Data Analyst Nanodegree Program - Syllabus

This document outlines a data analyst syllabus that prepares students for a career as a data analyst through three months of coursework in Python data analysis libraries like NumPy and pandas. The first course introduces data analysis concepts and the pandas and NumPy libraries. The second course covers advanced data wrangling skills for assessing, cleaning, and structuring messy real-world datasets. Students complete two projects analyzing and wrangling real-world datasets in Jupyter Notebooks to build their portfolio.

Uploaded by Shaikh Saad Alam

Data Analyst

Syllabus

udacity.com

BEFORE YOU START

Overview:
Learn how to analyze data using in-demand Python libraries
like NumPy and pandas. You will start by going over the
basics of the data analysis process, then dive into advanced
data wrangling skills to work with messy, complex real-world
datasets. Finally, you will create highly customized
visualizations using the Matplotlib Python library.

Educational Objectives

This program prepares you for a career as a data analyst
by helping you learn to organize data, uncover patterns
and insights, draw meaningful conclusions, and clearly
communicate critical findings. You'll develop proficiency
in Python and its data analysis libraries (NumPy, pandas,
Matplotlib) as you build a portfolio of projects to
showcase in your job search.

Prerequisites

A well-prepared learner has experience with:
Basic Python
Descriptive Statistics

Length of Program*: 3 months
Skill level: Intermediate
School: School of Data Science

Software/Hardware and version requirements:

For this Nanodegree program, you will need access to the Internet.
Additional software such as Python and its common data analysis libraries (e.g., pandas and Matplotlib) will be
required, but the program includes Udacity Workspaces with all of the relevant packages installed, so students
will not need to download any additional software.

*The length of this program is an estimation of total hours the average student may take to complete all required coursework,
including lecture and project time. If you spend about 5-10 hours per week working through the program, you should finish within
the time provided. Actual hours may vary.


Course #1: Introduction to Data Analysis with Pandas and NumPy

PROJECT #1

Investigate a Dataset

In this project, you will analyze a dataset and then communicate
your findings about it. This includes asking questions, exploring
the dataset, performing basic data wrangling, drawing
conclusions, and presenting your findings with numbers and
visualizations. Your analysis will be performed in a Jupyter
Notebook using the NumPy and pandas Python libraries.
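
As a rough illustration of that workflow, here is a minimal sketch that reads a CSV, inspects it, does some basic wrangling, and summarizes the result. The movie table, its column names, and the `load_movies` helper are hypothetical stand-ins, not part of the course materials.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real dataset file.
CSV_TEXT = """title,year,budget,revenue
Alpha,2010,100,250
Beta,2011,80,
Gamma,2011,120,90
Delta,2012,60,200
"""


def load_movies(source=None):
    """Read the CSV into a DataFrame (a file path would work the same way)."""
    if source is None:
        source = io.StringIO(CSV_TEXT)
    return pd.read_csv(source)


movies = load_movies()

# Explore: table size and missing values per column.
n_rows, n_cols = movies.shape
missing_per_column = movies.isna().sum()

# Wrangle: drop rows with no revenue, then derive a profit column.
clean = movies.dropna(subset=["revenue"]).copy()
clean["profit"] = clean["revenue"] - clean["budget"]

# Draw conclusions: mean profit per release year.
mean_profit_by_year = clean.groupby("year")["profit"].mean()
print(mean_profit_by_year)
```

In a notebook, each of these steps would typically sit in its own code cell, with a Markdown cell explaining the question being asked.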

Supporting Lesson Content

The Data Analysis Process
Describe the types of problems that Data Analysts can solve
Describe the five steps in the data analysis process: Question, Wrangle, Explore, Draw Conclusions, and Communicate
Describe three important Python packages for data analysis: NumPy, pandas, and Matplotlib

Jupyter Notebooks
Explain that Jupyter Notebooks can combine explanatory text, math equations, code, and visualizations
Create a new Jupyter Notebook
Use code and Markdown cells in a Jupyter Notebook
Use keyboard shortcuts in a Jupyter Notebook
Use magic keywords in a Jupyter Notebook
Convert notebooks to other formats

Exploring and Inspecting Data
Form and ask questions about data
Define data wrangling and EDA
Gather data
Read CSV files with pandas
Use pandas to inspect and assess data

Manipulating Data Using Pandas and NumPy
Use pandas to perform simple data cleaning tasks
Use the pandas query function to filter data
Fix column data types using pandas
Use pandas concatenate and merge to combine data
Use pandas explode to expand data

Communicating Results
Use pandas to summarize a dataset
Use pandas plotting to create simple visualizations
Draw conclusions from data using descriptive statistics and visualizations
Use visuals to communicate results
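
The pandas operations named in this lesson list (query, fixing data types, concatenate, merge, and explode) can be sketched roughly as follows. The order and customer tables are made up for illustration; only the pandas calls themselves are standard.

```python
import pandas as pd

# Hypothetical order data; prices arrive as text and tags as lists.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "price": ["10.5", "20.0", "7.25"],
    "tags": [["a", "b"], ["c"], ["a"]],
})
more_orders = pd.DataFrame({
    "order_id": [4],
    "price": ["3.0"],
    "tags": [["b", "c"]],
})
customers = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer": ["Ann", "Bob", "Ann", "Cy"],
})

# Concatenate: stack the two order tables into one.
all_orders = pd.concat([orders, more_orders], ignore_index=True)

# Fix column data types: convert the text prices to floats.
all_orders["price"] = all_orders["price"].astype(float)

# Merge: attach the customer name to each order.
merged = all_orders.merge(customers, on="order_id", how="left")

# Query: keep only orders above a price threshold.
big_orders = merged.query("price > 5")

# Explode: expand each list of tags into one row per tag.
exploded = merged.explode("tags")
print(exploded[["order_id", "customer", "tags"]])
```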


Course #2: Advanced Data Wrangling

PROJECT #2

Wrangle and Analyze Data


Real-world data rarely comes clean. Using Python and its
libraries, you will gather data from a variety of sources and in a
variety of formats, assess its quality and structure, then clean it.
This is called data wrangling. You will document your wrangling
efforts in a Jupyter Notebook, plus showcase them through
analyses and visualizations using Python (and its libraries).
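
To make "a variety of sources and formats" concrete, the sketch below gathers three small tables from a zip archive, a JSON string, and a SQL database, then combines them. Everything is built in memory so it runs without files or network access; the table contents and names are hypothetical.

```python
import io
import json
import sqlite3
import zipfile

import pandas as pd

# Build a small in-memory zip archive so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ratings.csv", "id,rating\n1,4.5\n2,3.0\n")
buffer.seek(0)

# Source 1: unzip the archive and read the flat file with pandas.
with zipfile.ZipFile(buffer) as archive:
    with archive.open("ratings.csv") as handle:
        ratings = pd.read_csv(handle)

# Source 2: JSON text, such as a saved API response.
raw_json = '[{"id": 1, "name": "Alpha"}, {"id": 2, "name": "Beta"}]'
names = pd.DataFrame(json.loads(raw_json))

# Source 3: a SQL database (SQLite in memory stands in for a real one).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE votes (id INTEGER, votes INTEGER)")
connection.executemany("INSERT INTO votes VALUES (?, ?)", [(1, 120), (2, 45)])
votes = pd.read_sql_query("SELECT id, votes FROM votes", connection)
connection.close()

# Combine the three gathered tables on their shared key.
gathered = ratings.merge(names, on="id").merge(votes, on="id")
print(gathered)
```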

Supporting Lesson Content

Introduction to Data Wrangling
Identify each step of the data wrangling process (gathering, assessing, and cleaning)
Explain why data wrangling is important

Gathering Data
Describe the gathering phase
Unzip file archives using Python
Extract gathered tabular data from flat files using pandas
Gather data by programmatically downloading files
Extract data from text files using Python
Gather data by accessing APIs
Extract gathered data from JSON files
Gather and extract data from HTML files using BeautifulSoup
Extract data from a SQL database
Identify additional file formats that data analysts might encounter

Assessing Data
Describe the assessing phase
Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues)
Identify data quality issues and categorize them
Assess data quality visually
Assess data quality programmatically using pandas
Strategize about data structuring needed for analytical datasets
Assess data structure visually
Assess data structure using pandas

Cleaning Data
Describe the cleaning phase
Identify each step of the data cleaning process (defining, coding, and testing)
Define data cleaning tasks based on assessment findings
Clean data using Python
Test cleaning code visually
Test cleaning programmatically using Python
Store cleaned data using flat files
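
The assess step and the define, code, test cleaning cycle could look something like the sketch below. The patient table is invented for illustration: it has one quality issue (weights stored as text with a unit, plus a duplicate row) and one tidiness issue (one column per year).

```python
import pandas as pd

# Hypothetical raw data with quality and tidiness issues.
raw = pd.DataFrame({
    "patient": ["Ann", "Bob", "Bob"],
    "weight_2020": ["70kg", "82kg", "82kg"],
    "weight_2021": ["72kg", "80kg", "80kg"],
})

# Assess programmatically: count fully duplicated rows.
n_duplicates = int(raw.duplicated().sum())

# Define:
#   1. Drop duplicate rows.
#   2. Reshape the per-year columns into one row per patient and year.
#   3. Strip the "kg" suffix and store weights as numbers.

# Code:
clean = raw.drop_duplicates().copy()
clean = clean.melt(id_vars="patient", var_name="year", value_name="weight_kg")
clean["year"] = clean["year"].str.replace("weight_", "", regex=False).astype(int)
clean["weight_kg"] = (
    clean["weight_kg"].str.replace("kg", "", regex=False).astype(float)
)

# Test: confirm each cleaning task did what was defined.
assert not clean.duplicated().any()
assert clean["weight_kg"].dtype == float

# Store: write the cleaned table as CSV text (pass a path to save a file).
csv_text = clean.to_csv(index=False)
print(csv_text)
```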


Course #3: Data Visualization with Matplotlib and Seaborn


PROJECT #3

Communicate Data Findings


In Part I, Exploratory data visualization, you will use Python visualization libraries to systematically explore your selected
dataset, starting with plots of single variables and building up to plots of multiple variables.

In Part II, Explanatory data visualization, you will produce a short presentation that illustrates interesting properties, trends,
and relationships that you discovered in your selected dataset. The primary method of conveying your findings will be
through transforming your exploratory visualizations from the first part into polished, explanatory visualizations.

Supporting Lesson Content

Data Visualization in Data Analysis
Understand why visualization is important in the practice of data analysis.
Know what distinguishes exploratory analysis from explanatory analysis, and the role of data visualization in each.

Design of Visualizations
Interpret features in terms of the level of measurement.
Know different encodings that can be used to depict data in visualizations.
Understand various pitfalls that can affect the effectiveness and truthfulness of visualizations.

Univariate Exploration of Data
Use bar charts to depict distributions of categorical variables.
Use histograms to depict distributions of numeric variables.
Use axis limits and different scales to change how your data is interpreted.

Bivariate Exploration of Data
Use scatterplots to depict relationships between numeric variables.
Use violin and box charts to depict relationships between categorical and numeric variables.
Use clustered bar charts to depict relationships between categorical variables.
Use faceting to create plots across different subsets of the data.

Multivariate Exploration of Data
Use encodings like size, shape, and color to encode values of the third variable in a visualization.
Explore multiple relationships between multiple variables at the same time.
Use feature engineering to capture relationships between variables.

Explanatory Visualizations
Understand what it means to tell a compelling story with data.
Choose the best plot type, encodings, and annotations to polish your plots.
Create high-quality image files using a Jupyter Notebook to convey your findings.

Visualization Case Study
Apply your knowledge of data visualization to a dataset involving the characteristics of diamonds and their prices.
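
A minimal Matplotlib sketch of the univariate and bivariate ideas above: a histogram of one numeric variable, a scatterplot of two, a log scale, and a high-resolution image export. The carat and price values are synthetic numbers generated for this example, not the diamonds dataset used in the case study.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: price grows with carat, with multiplicative noise.
rng = np.random.default_rng(seed=0)
carat = rng.uniform(0.2, 2.5, size=200)
price = 4000 * carat ** 1.8 * rng.lognormal(0.0, 0.2, size=200)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate: histogram of a numeric variable.
ax_hist.hist(price, bins=20)
ax_hist.set_xlabel("Price (synthetic units)")
ax_hist.set_ylabel("Count")
ax_hist.set_title("Distribution of price")

# Bivariate: scatterplot, with a log scale on the skewed axis.
ax_scatter.scatter(carat, price, alpha=0.5)
ax_scatter.set_yscale("log")
ax_scatter.set_xlabel("Carat")
ax_scatter.set_ylabel("Price (log scale)")
ax_scatter.set_title("Price vs. carat")

fig.tight_layout()

# Export a high-resolution PNG (to memory here; a file path also works).
image_buffer = io.BytesIO()
fig.savefig(image_buffer, format="png", dpi=150)
```

Titles, axis labels, and a considered scale are the kind of polish that turns an exploratory plot into an explanatory one.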


Course #1 Instructor

Matt Maybeno
Principal Software Engineer

Matt is a Principal Software Engineer at SOCi. With a master's in Bioinformatics
from SDSU, he utilizes his cross-domain expertise to build solutions in NLP and
predictive analytics.

Course #2 Instructor

Ria Cheruvu
Intel NEX AI Ethics Lead Architect

Ria is Intel NEX AI Ethics Lead Architect, leading trustworthy AI. She is an emerging
industry speaker and has a master’s in data science from Harvard University. Ria
previously served as a Teaching Fellow for Harvard's 2021 Data Science graduate
curriculum and Lead Instructor for Eduonix's ML Deployment course.

Course #3 Instructor

Josh Magee
Senior Data Scientist

Josh is a Senior Data Scientist at Local Logic, where he models commercial real
estate trends, acquisitions, and sustainable cities. He was formerly Assistant
Professor of Data Analytics at Stonehill College, and was a postdoctoral researcher
in nuclear physics at Lawrence Livermore National Laboratory.

Learn More at www.udacity.com
