Data Analytics
Chapter II.
The life cycle of data is plan, capture, manage, analyze, archive, and destroy.
- Plan: Decide what kind of data is needed, how it will be managed throughout its
life, who will be responsible for it, and the optimal outcomes.
- Manage: how and where it’s stored, the tools used to keep it safe and secure,
and the action taken to make sure that it’s maintained properly.
- Analyze: The data is used to solve problems, make great decisions, and support
business goals.
- Archive: Store data in a place where it’s still available, but may not
be used again.
- Destroy: remove data from storage and delete any shared copies of the data.
B2:
Ask: we define the problem to be solved and we make sure that we fully
understand stakeholder expectations.
Stakeholders: People who have invested time and resources into a project
and are interested in its outcome.
Look at the current state and identify how it’s different from the ideal
state.
Determine who the stakeholders are.
Prepare: This is where data analysts collect and store data they’ll use for the
upcoming analysis process.
Process: Here data analysts find and eliminate any errors and inaccuracies that
can get in the way of results.
Cleaning data
Transforming data into a more useful format
Combining two or more datasets to make information more complete
Removing outliers
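The four Process tasks listed here can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the sample rows, the field names (`city`, `temp_c`, `region`), the lookup table, and the cutoff of 1.5 times the interquartile range for outliers are all assumptions made for the example.

```python
import statistics

# Hypothetical raw rows: stray spaces, one missing value, one extreme reading.
raw_rows = [
    {"city": " Hanoi ", "temp_c": "31.5"},
    {"city": "Hue", "temp_c": "29.0"},
    {"city": "Da Nang", "temp_c": ""},        # missing reading
    {"city": "Hai Phong", "temp_c": "28.0"},
    {"city": "Can Tho", "temp_c": "30.0"},
    {"city": "Vinh", "temp_c": "300.0"},      # probably a typing error
]
# Hypothetical second dataset used to make the rows more complete.
regions = {"Hanoi": "North", "Hue": "Central", "Hai Phong": "North",
           "Can Tho": "South", "Vinh": "Central"}


def process(rows, lookup):
    # Cleaning: trim whitespace and drop rows with missing values.
    cleaned = [{"city": r["city"].strip(), "temp_c": r["temp_c"].strip()} for r in rows]
    cleaned = [r for r in cleaned if r["city"] and r["temp_c"]]

    # Transforming: convert the text readings into numbers.
    for r in cleaned:
        r["temp_c"] = float(r["temp_c"])

    # Combining: attach the region from the second dataset.
    for r in cleaned:
        r["region"] = lookup.get(r["city"], "Unknown")

    # Removing outliers: keep readings within 1.5 * IQR of the quartiles.
    temps = [r["temp_c"] for r in cleaned]
    q1, _, q3 = statistics.quantiles(temps, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)
    return [r for r in cleaned if q1 - spread <= r["temp_c"] <= q3 + spread]


processed = process(raw_rows, regions)
for row in processed:
    print(row)
```

With this sample, the row with the missing reading and the 300.0 reading are both dropped. On real data the outlier rule should be chosen to suit the dataset rather than copied from this sketch.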
Analyze: Analyzing the data you’ve collected involves using tools to transform
and organize that information so that you can draw useful conclusions, make
predictions, and drive informed decision-making.
Share:
Query language: A computer programming language that allows you to retrieve and
manipulate data from a database
Makes it easier for you to learn and understand the requests made to
databases
Allows analysts to select, create, add, or download data from a database for
analysis.
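As a small illustration of a query language, the sketch below uses SQL through Python's built-in `sqlite3` module. The notes do not name a specific query language, so SQL is an assumption here, and the `weather` table, its columns, and the rainfall figures are made up for the example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Create a table and add rows (hypothetical data).
conn.execute("CREATE TABLE weather (city TEXT, year INTEGER, rainfall_mm REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        ("Hanoi", 2021, 1600.0),
        ("Hanoi", 2022, 1750.0),
        ("Hue", 2021, 2900.0),
        ("Hue", 2022, 3100.0),
    ],
)

# Select: retrieve the average rainfall per city.
rows = conn.execute(
    "SELECT city, AVG(rainfall_mm) FROM weather GROUP BY city ORDER BY city"
).fetchall()
conn.close()

print(rows)
```

The same create, add, and select steps apply to larger databases; only the connection details change.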
Visualization tools:
Refine the data and present the results of your data analysis.
Business task: The question or problem data analysis answers for a business
Analyze weather data from the last decade to identify predictable patterns.
Fairness means ensuring that your analysis doesn’t create or reinforce bias.
Consider fairness
The data analyst role is one of many job titles that contain the word “analyst”.
Data analytics consultant - analyzes the systems and models for using data.
Data scientist - uses expert skills in technology and social science to find trends
through data analysis.
Business task: The question or problem data analysis resolves for a business
Data analyst: Someone who collects, transforms, and organizes data in order to
draw
Data science: A field of study that uses raw data to create new ways of modeling
and
Data strategy: The management of the people, processes, and tools used in data
analysis
Fairness: A quality of data analysis that does not create or reinforce bias
data in a spreadsheet
Gap analysis: A method for examining and evaluating the current state of a
process in order to
population. This can help you better represent them and address imbalanced
datasets
themselves
Stakeholders: People who invest time and resources into a project and are
interested in its
outcome
a database
Technical mindset: The ability to break things down into smaller steps or pieces
and work with
Data engineer - prepares and integrates data from different sources for analytical
use
Data specialist - organizes or converts data for use in databases or software
systems
Part II:
Module 1:
In the ask step, we define the problem we're solving and make sure that we fully
understand stakeholder expectations.
Course 1: Foundations
Will learn:
Skill build:
Course 2: ASK
Will learn:
How data analysts solve problems with data
The use of analytics for making data-driven decisions
Spreadsheet formulas and functions
Dashboard basics, including an introduction to Tableau
Data reporting basics
Will build:
Summarizing data
Course 3: Prepare
will learn:
will build:
Course 4: Process
will learn:
Course 5: Analyze
will learn:
Course 6: Share
will learn:
Design thinking
How data analysts use visualizations to communicate about data
The benefits of Tableau for presenting data analysis findings
Data-driven storytelling
Dashboards and dashboard filters
Strategies for creating an effective data presentation
will build:
Creating visualizations and dashboards in Tableau
Addressing accessibility issues when communicating about data
Understanding the purpose of different business communication tools
Telling a data-driven story
Presenting to others about data
Answering questions about data
Course 7:
will learn:
Coding in R
Writing functions in R
Accessing data in R
Cleaning data in R
Generating data visualizations in R
Reporting on data analysis to stakeholders
Course 8: Capstone
will learn:
Step 1: Ask
It’s impossible to solve a problem if you don’t know what it is. There are some
things to consider:
Define the problem you’re trying to solve
Make sure you fully understand the stakeholder’s expectations
Focus on the actual problem and avoid any distractions
Collaborate with stakeholders and keep an open line of communication
Take a step back and see the whole situation in context
Questions to ask yourself in this step:
Now that I’ve identified the issues, how can I help the stakeholders resolve their
questions?
Step 2: Prepare
You will decide what data you need to collect in order to answer your questions and how to
organize it so that it is useful. You might use your business task to decide:
Step 3: Process
Clean data is the best data and you will need to clean up your data to get rid of any possible
errors, inaccuracies, or inconsistencies. This might mean:
Step 4: Analyze
You will want to think analytically about your data. At this stage, you might sort and format
your data to make it easier to:
Perform calculations
Combine data from multiple sources
Create tables with your results
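The Analyze tasks in this list (calculating, combining sources, sorting, and building a table) can be tried out with plain Python. The two data sources, the product names, and the prices below are hypothetical and only serve as a sketch.

```python
# Hypothetical sales records from two separate sources.
online = [
    {"product": "tea", "units": 40, "price": 2.5},
    {"product": "coffee", "units": 25, "price": 4.0},
]
in_store = [
    {"product": "tea", "units": 10, "price": 2.5},
    {"product": "juice", "units": 30, "price": 3.0},
]

# Combine data from multiple sources into one list.
combined = online + in_store

# Perform calculations: revenue per record, then total revenue per product.
totals = {}
for record in combined:
    revenue = record["units"] * record["price"]
    totals[record["product"]] = totals.get(record["product"], 0.0) + revenue

# Sort the results from highest to lowest revenue.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)


def format_table(rows):
    """Create a simple text table with one row per product."""
    lines = [f"{'product':<10}{'revenue':>10}"]
    lines += [f"{name:<10}{revenue:>10.2f}" for name, revenue in rows]
    return "\n".join(lines)


table = format_table(ranked)
print(table)
```

Sorting the totals first makes it easier to answer the question "What story is my data telling me?", because the largest contributors appear at the top of the table.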
Questions to ask yourself in this step:
1. What story is my data telling me?
2. How will my data help me solve this problem?
3. Who needs my company’s product or service? What type of person is most likely to
use it?
Step 5: Share
Everyone shares their results differently so be sure to summarize your results with clear and
enticing visuals of your analysis using data via tools like graphs or dashboards. This is your
chance to show the stakeholders you have solved their problem and how you got there.
Sharing will certainly help your team:
Step 6: Act
Now it’s time to act on your data. You will take everything you have learned from your data
analysis and put it to use. This could mean providing your stakeholders with
recommendations based on your findings so they can make data-driven decisions.
Finding patterns: Data analysts use data to find patterns by using historical
data to understand what happened in the past and is therefore likely to happen
again.
Specific: Is the question specific? Does it address the problem? Does it have
context? Will it uncover a lot of the information you need?
Measurable: Will the question give you answers that you can measure?
Action-oriented: Will the answers provide information that helps you devise
some type of plan?
Relevant: Is the question about the particular problem you are trying to solve?
Time-bound: Are the answers relevant to the specific time being studied?
Cloud: A place to keep data online, rather than a computer hard drive
Data analysis process: The six phases of ask, prepare, process, analyze, share, and
act whose purpose is to gain insights that drive informed decision-making
Data life cycle: The sequence of stages that data experiences, which include plan,
capture, manage, analyze, archive, and destroy
Problem types: The various problems that data analysts encounter, including
categorizing things, discovering connections, finding patterns, identifying themes, making
predictions, and spotting something unusual
Specific question: A question that is simple, significant, and focused on a single topic
or a few closely related ideas
Unit 13: Data trials and triumphs
Data-driven decisions: Using facts to guide business strategy. This approach is
limited by the quantity and quality of readily available data.
This includes:
Qualitative data tools: focus groups, social media text analysis, in-person
interviews.
Easy to design
Static.
Low maintenance
Can be confusing
Pivot table : A data summarization tool that is used in data processing. Pivot
tables are used to summarize, sort, reorganize, group, count, total or average
data stored in a database.
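To see what a pivot table does, the sketch below groups and totals records by two fields using only the Python standard library. It imitates the idea rather than any particular spreadsheet tool; the `region` and `quarter` fields and the sales numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (region, quarter, sales).
records = [
    ("North", "Q1", 120),
    ("North", "Q2", 150),
    ("South", "Q1", 90),
    ("South", "Q2", 110),
    ("North", "Q1", 30),
]


def pivot_sum(rows):
    """Group rows by region (table rows) and quarter (table columns), summing sales."""
    table = defaultdict(lambda: defaultdict(int))
    for region, quarter, sales in rows:
        table[region][quarter] += sales
    return {region: dict(columns) for region, columns in table.items()}


pivot = pivot_sum(records)

# Row totals, similar to a "Grand Total" column in a spreadsheet pivot table.
row_totals = {region: sum(columns.values()) for region, columns in pivot.items()}

print(pivot)
print(row_totals)
```

Swapping the sum for a count or an average gives the other summaries named in the definition.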
Metric: A single, quantifiable type of data that can be used for measurement.
Metric goal: A measurable goal set by a company and evaluated using
metrics.
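Return on investment (ROI), which is built from the metrics of investment and profit, is one way to make this concrete. The sketch assumes the common form ROI = net profit / cost of investment. The function name, the sample figures, and the 20% target used as a metric goal are hypothetical.

```python
def roi(net_profit: float, investment_cost: float) -> float:
    """Return ROI as a fraction (0.25 means 25%)."""
    if investment_cost <= 0:
        raise ValueError("investment_cost must be positive")
    return net_profit / investment_cost


# Hypothetical campaign: 12,000 invested, 3,000 net profit.
campaign_roi = roi(net_profit=3_000.0, investment_cost=12_000.0)

# Hypothetical metric goal: an ROI of at least 20%.
metric_goal = 0.20
goal_met = campaign_roi >= metric_goal

print(f"ROI: {campaign_roi:.0%}, goal met: {goal_met}")
```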
Dashboards are powerful visual tools that help you tell your data story. A dashboard is a tool that
monitors live, incoming data.
Created a dashboard:
Strategic: focuses on long term goals and strategies at the highest level of metrics
Because these dashboards contain information on a time scale of days, weeks, or months, they
can provide performance insight almost in real-time.
Analytical: consists of the datasets and the mathematics used in these sets
Small data:
Specific
Short time-period
Mathematical thinking is a powerful skill you can use to help you solve
problems and see new solutions.
Big Data:
Big decisions
Challenges and benefits
Here are some challenges you might face when working with big data:
A lot of organizations deal with data overload and far too much unimportant or
irrelevant information.
Important data can be hidden deep down with all of the unimportant data, which
makes it harder to find and use. This can lead to slower and less efficient
decision-making time frames.
The data you need isn’t always easily accessible.
Current technology tools and solutions still struggle to provide measurable and
reportable data. This can lead to unfair algorithmic bias.
There are gaps in many big data business solutions.
Now for the good news! Here are some benefits that come with big data:
When large amounts of data can be stored and analyzed, it can help companies
identify more efficient ways of doing business and save a lot of time and money.
Big data helps organizations spot trends in customer buying patterns and
satisfaction levels, which can help them create new products and solutions that
make customers happy.
By analyzing big data, businesses get a much better understanding of current
market conditions, which can help them stay ahead of the competition.
As in the earlier social media example, big data helps companies keep track of
their online presence, especially feedback, both good and bad, from customers.
This gives them the information they need to improve and protect their brand.
Next: Get to work with spreadsheets
Spreadsheet tasks:
Plan for the users who will work within a spreadsheet by developing
organizational standards.
Archive any spreadsheet that you don’t use often, but might need to reference
later with built-in tools.
Destroy your spreadsheet when you are certain that you will never need it again.
- To start, go to google.com
Context in data analytics is the conditions and circumstances that surround and
give meaning to the data.
Borders: Lines that can be added around two or more cells on a spreadsheet
Filtering: The process of showing only the data that meets a specified criteria
while hiding the rest
Header: The first row in a spreadsheet that labels the type of data in each
column
MAX: A spreadsheet function that returns the largest numeric value from a
range of cells
MIN: A spreadsheet function that returns the smallest numeric value from a
range of cells
Problem domain: The area of analysis that encompasses every activity affecting
or affected by a problem
Return on investment (ROI): A formula that uses the metrics of investment and
profit to evaluate the success of an investment
Revenue: The total amount of income generated by the sale of goods or services
SUM: A spreadsheet function that adds the values of a selected range of cells
Big data: Large, complex datasets typically involving long periods of time,
which enable data analysts to address far-reaching business problems
COUNT: A spreadsheet function that counts the number of cells in a range that
meet a specific criteria
Dashboard: A tool that monitors live, incoming data
Metric goal: A measurable goal set by a company and evaluated using metrics
Pivot table: A data summarization tool used to sort, reorganize, group, count,
total, or average data
Small data: Small, specific data points typically involving a short period of time,
which are useful for making day-to-day decisions
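The spreadsheet terms in this glossary (header, SUM, MAX, MIN, COUNT, filtering) have close equivalents in Python, which can help when checking a spreadsheet result by hand. The sheet contents below are hypothetical. The sketch treats COUNT as counting the numeric cells in a range, which is how the function behaves in common spreadsheet tools, and applies the criteria in the separate filtering step.

```python
# A small "sheet": the first row is the header, the rest are data rows.
sheet = [
    ["item", "revenue"],
    ["tea", 125.0],
    ["coffee", 100.0],
    ["juice", 90.0],
    ["water", None],  # an empty cell
]

header, *rows = sheet
col = header.index("revenue")

# Keep only numeric cells, as SUM, MAX, MIN, and COUNT ignore empty cells.
values = [row[col] for row in rows if isinstance(row[col], (int, float))]

total = sum(values)      # like =SUM(B2:B5)
largest = max(values)    # like =MAX(B2:B5)
smallest = min(values)   # like =MIN(B2:B5)
count = len(values)      # like =COUNT(B2:B5)

# Filtering: show only the rows that meet a criterion and hide the rest.
filtered = [row for row in rows if row[col] is not None and row[col] >= 100]

print(total, largest, smallest, count)
print(filtered)
```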
Question:
Database
Interviews
Observations
Forms
Questionnaires
Survey
Cookies
First-party data: Data collected by an individual or group using their own
resources
Second-party data:
Data collected by a group directly from its audience and then sold
Third-party data: Data collected from outside sources who did not collect
it directly.
Discrete data
Continuous data
Data that is measured and can have almost any numeric value
Nominal data
A type of qualitative data that is categorized without a set order
Ordinal data
Internal data
External data
Structured data
Unstructured data
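A short sketch can make four of these data types easier to tell apart. The notes define only continuous and nominal data; the sketch assumes the usual meanings for the other two (ordinal data is categorized with a set order, discrete data is counted and limited to certain values). All sample values are hypothetical.

```python
from collections import Counter

# Nominal: categories without a set order. They can be counted but not ranked.
answers = ["yes", "no", "not sure", "yes"]
answer_counts = Counter(answers)

# Ordinal: categories with a set order, so sorting by that order is meaningful.
scale = ["poor", "fair", "good", "excellent"]
ratings = ["good", "poor", "excellent", "fair"]
sorted_ratings = sorted(ratings, key=scale.index)

# Discrete: counted values, whole numbers only.
tickets_sold = [3, 0, 12]

# Continuous: measured values that can take almost any numeric value.
ride_minutes = [4.25, 12.5, 7.125]

print(answer_counts)
print(sorted_ratings)
print(sum(tickets_sold), sum(ride_minutes) / len(ride_minutes))
```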