Data Science

The document outlines a comprehensive curriculum covering advanced Excel, Power BI, SQL, Python, NumPy, Pandas, Matplotlib, Seaborn, statistics, machine learning, natural language processing, PySpark, web scraping, and time series analysis. Each module includes key concepts, techniques, and practical projects to enhance learning. The final capstone project involves IMDB movie analysis using Power BI, SQL, and Python.

Uploaded by

Kameshwaran King
Copyright
© All Rights Reserved

MODULE – 1 ADVANCED EXCEL

Data Validation in Excel

Conditional formatting

Pivot Tables

Power Pivot

Power Query

Important Functions & Formulae in Excel

Data Visualization in Excel

PROJECT – ANALYSIS OF BIKE SALES

MODULE – 2 POWER BI

Power BI Introduction and Installation

The Power BI user interface, including types of data sources and visualizations

Power BI Visualization

Power Query Editor

Modeling with Power BI

Power Query Editor Filter Data

Customize the data in Power BI

DAX Expressions

Power BI Service

Sample dashboard with Animation Visual

PROJECT – GATHERING INSIGHTS FROM IPL 2022


MODULE – 3 SQL

Introduction to SQL

Database Normalization and Entity Relationship Model

SQL Operators

Working with SQL: Joins, Tables, and Variables

SQL Functions

Working with Subqueries

SQL Views, Functions, and Stored Procedures

SQL Optimization and Performance

PROJECT – WAL-MART SALES ANALYSIS

MODULE – 4 PYTHON

Introduction to Python

Installation of Python

If – Else Statements

For loop

While loop

Break and Continue

Data Structures – List, Tuple, Set, Dictionary

Functions

List Comprehension

Lambda Functions

File Handling

Exception Handling

MODULE – 5 NUMPY

Introduction to NumPy

NumPy Arrays

Universal Functions (ufuncs)

Array Operations

Boolean Indexing

Linear Algebra with NumPy

Random Numbers in NumPy

File Input/Output with NumPy

Understanding and Using NumPy’s datetime64 and timedelta64

Introduction to Structured Arrays

MODULE – 6 PANDAS

Introduction to Pandas

Data Structures in Pandas

Data Cleaning and Preparation

Data Selection and Indexing

Data Cleaning Techniques

Grouping and Aggregation

Merging and Concatenating DataFrames

Working with Text Data in Pandas

Introduction to Pandas Profiling

PROJECT – ANALYSIS OF NETFLIX SERIES USING PANDAS


MODULE – 7 MATPLOTLIB

Introduction to Data Visualization

Importance of data visualization in data analysis

Overview of Matplotlib and its capabilities

Installation and basic setup

Basic Plotting with Matplotlib

Line plots, scatter plots, and bar plots

Customizing plot appearance: colors, markers, and styles

Multiple Plots and Subplots

Annotations and Text

Customizing Axes and Ticks

Histograms and Density Plots

3D Plots with Matplotlib

MODULE – 8 SEABORN

Introduction to Seaborn

Installation and basic setup

Basic Plots with Seaborn

Creating common plots like scatter plots, line plots, and bar plots

Customizing aesthetics using Seaborn's style options

Categorical Plots

Distribution Plots

Pair Plots and Heatmaps

Regression Plots

Matrix Plots

MODULE – 9 STATISTICS & PROBABILITY

Descriptive Statistics

Measures of Central Tendency

Five-Number Summary

Inferential Statistics

Correlation & Covariance

Confidence Intervals

Hypothesis Testing

Probability Distributions

Bayes’ Theorem

Central Limit Theorem

MODULE – 10 MACHINE LEARNING

Introduction to Machine Learning

Supervised Learning Algorithms

Regression & Classification Problems

Linear Regression

Logistic Regression

Decision Trees and Random Forests

Support Vector Machines (SVM)

Unsupervised Learning Algorithms

Clustering Algorithms

PROJECT – 1: CREDIT CARD FRAUD DETECTION

PROJECT – 2: GOLD PRICE PREDICTION


MODULE – 11 NATURAL LANGUAGE PROCESSING

Introduction to NLP

Text Processing with Python

Text Tokenization

Text Preprocessing

Bag-of-Words (BOW) Model

TF-IDF (Term Frequency-Inverse Document Frequency)

Text Classification Basics

Advanced Text Classification

Named Entity Recognition (NER)

PROJECT – SPAM MAIL FILTERING

MODULE – 12 PYSPARK

Introduction to PySpark

Introduction to Big Data

PySpark DataFrames and RDDs

Data Processing with PySpark

PySpark SQL

User-Defined Functions (UDFs)

Working with Different Data Sources

Integration with Pandas

Introduction to PySpark MLlib

Feature Engineering and Pipelines

Optimization and Performance Tuning


MODULE – 13 WEB SCRAPING

Data Extraction with Beautiful Soup

Web Scraping using Selenium

PROJECT – DATA EXTRACTION FROM AMAZON WEBSITE

MODULE – 14 TIME SERIES ANALYSIS

Components of Time Series Analysis

Stationarity

Autocorrelation and Cross-Correlation

Time Series Decomposition

Moving Averages and Smoothing

Anomaly Detection in Time Series

Time Series Visualization

PROJECT – STOCK MARKET ANALYSIS

MODULE – 15 CAPSTONE PROJECT

IMDB MOVIE ANALYSIS USING POWER BI, SQL AND PYTHON
