Coursera - Data Analytics - Course 4
Course 4 Notes
Week 1 - The Importance of Integrity
Data replication
The process of storing data in multiple locations.
Data Transfer
The process of copying data from a storage device to memory, or from one computer to
another.
Data Manipulation
The process of changing data to make it more organized and easier to read.
It's important to check that the data you use aligns with the business objective.
Use the following decision tree as a reminder of how to deal with data errors or
insufficient data:
The importance of sample size
Random Sampling
A way of selecting a sample from a population so that every possible type of sample has an
equal chance of being chosen.
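As a minimal sketch of random sampling in Python: the population of customer IDs, the sample
size, and the seed below are made-up values for illustration. The standard library's
random.sample draws without replacement, so every subset of the requested size is equally
likely to be chosen.

```python
import random

# Hypothetical population: customer IDs 1 through 1000 (illustrative only).
population = list(range(1, 1001))

# Seed a private generator so the example is reproducible; omit the seed in real use.
rng = random.Random(42)

# Draw 50 distinct members; every 50-member subset is equally likely.
sample = rng.sample(population, k=50)

print(len(sample), "members drawn,", len(set(sample)), "of them distinct")
```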
Hypothesis Testing
A way to see if a survey or experiment has meaningful results.
If a test is statistically significant, its results are unlikely to be explained by random
chance alone.
Statistical power is the probability that a test detects a real effect when one exists. Usually,
you need a statistical power of at least 0.8 (80%) to trust that a test can produce
statistically significant results.
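One way to check whether a difference could be due to random chance is a permutation test,
sketched below. This is only one of several possible techniques and is not taken from the
course. The function name permutation_p_value, the survey scores, and the 0.05 cutoff (a common
convention) are all assumptions made for illustration.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels: if the groups do not differ, any split is as likely.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Share of shuffles that gave a difference at least as large as the observed one.
    return extreme / n_permutations

# Made-up survey scores for two groups (illustrative only).
control = [12, 14, 11, 13, 12, 15, 13, 12]
treated = [16, 18, 17, 15, 19, 17, 16, 18]

p = permutation_p_value(control, treated)
print(f"p-value: {p:.4f}")
print("statistically significant at 0.05:", p < 0.05)
```

A small p-value means that shuffling the group labels rarely produces a difference as large as
the observed one, so random chance is an unlikely explanation.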
Clean data
Data that is complete, correct, and relevant to the problem you're trying to solve.
Text String
A group of characters within a cell, most often composed of letters.
Split
A tool that divides a text string around the specified character and puts each fragment into a
new and separate cell. Split is helpful when you have more than one piece of data in a cell
and you want to separate them out.
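The same idea can be sketched outside a spreadsheet with Python's built-in str.split. The cell
value and the comma delimiter below are made-up examples, not data from the course.

```python
# Hypothetical cell value holding two pieces of data (illustrative only).
cell = "Chicago, IL"

# Split around the delimiter, then trim stray whitespace from each fragment.
fragments = [part.strip() for part in cell.split(",")]
city, state = fragments

print(city)
print(state)
```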
CONCATENATE
A function that joins multiple text strings into a single string.
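A rough Python equivalent of joining text strings is shown below. The names and values are
made-up examples; the + operator and str.join are standard Python, and the spreadsheet
function itself is not being reproduced here.

```python
# Hypothetical cell values (illustrative only).
first_name = "Ada"
last_name = "Lovelace"

# Join two strings with a space between them.
full_name = first_name + " " + last_name

# Join any number of strings with a chosen separator.
date_parts = ["2024", "01", "15"]
date_text = "-".join(date_parts)

print(full_name)
print(date_text)
```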
Spreadsheets vs SQL
Features of Spreadsheets                              | Features of SQL Databases
Create graphs and visualizations in the same program  | Prepare data for further analysis in another software
Built-in spell check and other useful functions       | Fast and powerful functionality
Best when working solo on a project                   | Great for collaborative work and tracking queries run by all users
Advanced SQL