Chapter 1 Data Science Fundamentals

This document introduces the Principles of Data Science course. Its objectives are to understand the fundamental concepts of data science, AI, machine learning, data acquisition and analysis, visualization, statistics, and predictive analysis. The course provides hands-on experience with popular Python packages and libraries and with Jupyter Notebook for data analysis on small and large datasets. It also explores the career opportunities open upon completing the course, in fields such as data engineering, data analysis, business intelligence, data science, and machine learning.

Uploaded by

cumar aadan apdi

Welcome to Principles of Data Science

Chapter 1
Introduction to Data Science Fundamentals
Lecturer: Engr. Hanad Mohamud Mohamed

Objectives
o Understanding fundamental concepts of Data Science, AI, and Machine Learning;
  Data Acquisition and Analysis; Visualization; Statistics; and Predictive Analysis.
o Working tools required for data analysis on small and large data sets
o Gaining practical hands-on experience with top data science
  packages and libraries in Python and Jupyter Notebook.
Exploring Career Opportunities Upon Course Completion

o Data Architect and Administrators
o Data Engineer
o Data Analyst
o Business Intelligence (BI) Developer
o Data Quality Analyst
o Data Scientist
o Machine Learning Engineer
o Business IT Analyst
o Marketing Analyst
Data – the new oil

“Data is the new oil. It’s valuable, but if unrefined, it
cannot really be used. It has to be changed into gas,
plastic, chemicals, etc., to create a valuable entity that
drives profitable activity; so must data be broken down
and analyzed for it to have value.”
These are the words of famous British mathematician
and data science entrepreneur Clive Humby, who
coined the phrase “data is the new oil” in 2006. In
2011, the Senior Vice President of Gartner, Peter
Sondergaard, took this concept further. “Information
is the oil of the 21st century and analytics is the
combustion engine,” he said. Today, “analytics is the
new oil.”
A Brief History of Data Science

The term “Data Science” was created in the early 1960s to describe a new profession
that would support the understanding and interpretation of the large amounts of
data that were being amassed at the time. (At the time, there was no way of
predicting the truly massive amounts of data that would accumulate over the next fifty years.) Data Science
continues to evolve as a discipline using computer science and statistical
methodology to make useful predictions and gain insights in a wide range of fields.
While Data Science is used in areas such as astronomy and medicine, it is also used in
business to help make smarter decisions.
Statistics, and the use of statistical models, are deeply rooted within the field of Data
Science. Data Science started with statistics and has evolved to include
concepts/practices such as artificial intelligence, machine learning, and the Internet
of Things, to name a few.
A Brief History of Data Science (cont.)

As more and more data has become available, first by way of recorded shopping
behaviors and trends, businesses have been collecting and storing it in ever greater
amounts. With the growth of the Internet, the Internet of Things, and the
exponential growth of data volumes available to enterprises, there has been a flood
of new information or big data. Once the doors were opened by businesses seeking
to increase profits and drive better decision-making, the use of big data started being
applied to other fields, such as medicine, engineering, and social sciences.
Data All-Around

o Lots of data is being collected and warehoused
o Web data, e-commerce
o Financial transactions, bank/credit transactions
o Online trading and purchasing
o Social networks
Types of Data

o Relational Data (Tables/Transaction/Legacy Data)
o Unstructured Text Data (Web)
o Semi-structured Data (XML)
o Streaming Data (images and videos)
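
The first three types of data listed above can be illustrated with a minimal Python sketch that uses only the standard library. All records, tag names, and variable names here are hypothetical and made up for illustration; they do not come from a real dataset.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Relational data: fixed columns, one record per row (hypothetical transactions).
relational_csv = "customer_id,amount\n1,25.50\n2,40.00\n1,10.00\n"
rows = list(csv.DictReader(io.StringIO(relational_csv)))
total_by_customer = {}
for row in rows:
    cid = row["customer_id"]
    total_by_customer[cid] = total_by_customer.get(cid, 0.0) + float(row["amount"])

# Semi-structured data: XML records share tags, but some fields may be absent.
semi_structured_xml = """
<orders>
  <order id="1"><item>book</item><note>gift wrap</note></order>
  <order id="2"><item>pen</item></order>
</orders>
""".strip()
root = ET.fromstring(semi_structured_xml)
orders = [
    {
        "id": order.get("id"),
        "item": order.findtext("item"),
        "note": order.findtext("note"),  # None when the tag is missing
    }
    for order in root.findall("order")
]

# Unstructured text: no schema at all, so structure is imposed by the analyst.
text = "Data is the new oil. Data must be refined."
words = [w.strip(".,").lower() for w in text.split()]
word_counts = Counter(words)

print(total_by_customer)
print(orders)
print(word_counts.most_common(3))
```

The relational rows can be aggregated directly, the XML needs a check for missing fields, and the free text has to be split into words before anything can be counted.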
What do you know about Data Science?
What is Data Science?

Data science is the field of study that combines domain expertise, programming skills,
and knowledge of mathematics and statistics to extract meaningful insights from data.
Data science practitioners apply machine learning algorithms to numbers, text, images,
video, audio, and more to produce artificial intelligence (AI) systems to perform tasks
that ordinarily require human intelligence.

Data scientists require a background in these fundamental related disciplines:
domain expertise, programming, and mathematics and statistics.
What is Data Science?

• An area that manages, manipulates, extracts, and interprets knowledge
  from a tremendous amount of data
• Data Science (DS) is a multidisciplinary field of study with the goal of
  addressing the challenges in big data
• Data science principles apply to all data – big and small
• Data Scientist
  • The best job of the 21st century
  • They find stories and extract knowledge. They are not reporters
Why study data science?

o Data is one of the most important assets of every organization or business
  for making informed decisions - ‘the new oil’.
o Big data - people and devices generate data at a growing and unprecedented
  speed, e.g. social media.
o Advances in computational capacity allow massive data storage and
  processing, yet little of that data is analyzed.
o Data scientists and analysts are influential in the decision-making of any
  company or organization.
o Data analysts and scientists are in high demand in every domain and
  discipline.
o New data science programs, companies, and institutes are being created every day.
o And many more.
Data Science Workflow
How Data Science Powers Business Value

Eight ways that data scientists can add value to a business:

1. Empowering management to make better decisions
2. Directing actions and defining goals based on trends
3. Challenging staff to adopt best practices and focus on issues that matter
4. Identifying business opportunities
5. Decision-making with quantifiable, data-driven evidence
6. Testing these decisions
7. Identifying and refining target audiences
8. Recruiting the right talent
Artificial Intelligence

What is AI?

– Actions that are indistinguishable from a human’s. (Alan Turing)
– The science and engineering of making intelligent machines, especially
  intelligent computer programs. (John McCarthy)
Artificial Intelligence

Artificial intelligence (AI)
The study of computer systems that attempt to model
and apply the intelligence of the human mind.
For example, writing a program to pick out objects in a
picture.

The Turing test
o Requires
  o Natural language
  o Knowledge representation
  o Automated reasoning
  o Machine learning
  o (vision, robotics) for the full test
– Turing predicted that by 2000, a machine might have a 30% chance
  of fooling a layperson for 5 minutes.
– Suggested major components of AI: knowledge representation,
  reasoning, natural language processing, machine learning
Machine Learning

o The study of computer algorithms that improve automatically through experience.


o It is the science of creating algorithms and programs able to learn on their own on
the basis of heterogeneous data sources such as systems, things, and humans.

The Discipline of Machine Learning, Tom M. Mitchell, 2006
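
As a rough illustration of "improving automatically through experience", the sketch below fits a straight line to a few example points by gradient descent, using plain Python only. The data, the learning rate, and the function names `mean_squared_error` and `train` are assumptions chosen for this example; this is one simple learning procedure, not the method described in the cited paper.

```python
def mean_squared_error(w, b, data):
    """Average squared gap between the prediction w*x + b and the target y."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data, epochs, learning_rate=0.05):
    """Fit y = w*x + b by gradient descent; each epoch is one pass over the data."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical examples generated from the rule y = 2x + 1.
data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

error_before = mean_squared_error(0.0, 0.0, data)  # untrained model
w, b = train(data, epochs=2000)
error_after = mean_squared_error(w, b, data)

print(f"error before training: {error_before:.3f}")
print(f"error after training:  {error_after:.6f}")
print(f"learned w={w:.3f}, b={b:.3f}")
```

The program is never told the rule behind the data; its error shrinks only because it keeps adjusting `w` and `b` after each pass over the examples.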


Machine Learning

Machine learning is continuously growing in
the IT world and gaining strength in different
business sectors. Although machine learning
is still developing, it is among the most
popular technologies. It is a field of study
that makes computers capable of
automatically learning and improving from
experience. Machine learning therefore
focuses on strengthening computer
programs by collecting data
from various observations.
Big Data

Big Data is any data that is expensive to manage and hard to
extract value from
• Volume
  • The size of the data
• Velocity
  • The latency of data processing relative to the growing
    demand for interactivity
• Variety and Complexity
  • The diversity of sources, formats, quality, and structures
Big Data

Definitions of 'Big Data' based on an online survey of 154 global executives

o Massive growth of transaction data, including data from
  customers and the supply chain
o Explosion of new data sources (social media, mobile devices,
  and machine-generated devices)
o Requirement to store and archive data for regulatory and
  compliance purposes
o New technologies designed to address the volume, variety,
  and velocity challenges of Big Data
o Some other definition

Although there is no commonly agreed definition of big data, it can be said to mean large and
complex data that cannot be handled with conventional data storage and processing tools.
Big Data

A lot of things happen in an internet minute – millions of messages, e-mails and texts are
sent, scrolled and uploaded, and hundreds of thousands of hours of content are consumed
Open Data

https://medium.com/@mselvaraaju/open-data-value-chain-6bf628ac13ae
Data Analytics

o Data analytics is the process of transforming raw data into meaningful insights for
better decision making, mostly using statistical processing and machine learning.

Type of analytics        Question answered     Example
Descriptive analytics+   What happened?        How many customers did we lose last year?
Diagnostic analytics*    Why did it happen?    What are the reasons for their churn?
Predictive analytics+    What will happen?     Who are the likely customers to churn next?
Prescriptive analytics*  What should we do?    Which customers should we target to retain?

The four types run from the past (descriptive) to the future (prescriptive).
* Some form of intelligence involved
+ Data or data with machine learning
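
The four churn questions can be sketched on a toy dataset in plain Python. Everything below is hypothetical: the customer records, the field names, and especially the "at risk" rule, which is a hand-written threshold standing in for a real predictive model.

```python
from collections import Counter

# Hypothetical customer records, for illustration only.
customers = [
    {"name": "A", "churned": True, "reason": "price", "visits": 1, "spend": 20},
    {"name": "B", "churned": True, "reason": "price", "visits": 2, "spend": 35},
    {"name": "C", "churned": True, "reason": "service", "visits": 1, "spend": 15},
    {"name": "D", "churned": False, "reason": None, "visits": 2, "spend": 80},
    {"name": "E", "churned": False, "reason": None, "visits": 9, "spend": 60},
    {"name": "F", "churned": False, "reason": None, "visits": 1, "spend": 30},
]

# Descriptive: what happened? How many customers did we lose?
churned = [c for c in customers if c["churned"]]
lost_count = len(churned)

# Diagnostic: why did it happen? Which reasons were given most often?
reason_counts = Counter(c["reason"] for c in churned)

# Predictive (assumed rule, not a trained model): remaining customers who
# visit no more often than the most active churned customer are "at risk".
max_churn_visits = max(c["visits"] for c in churned)
at_risk = [
    c for c in customers
    if not c["churned"] and c["visits"] <= max_churn_visits
]

# Prescriptive: what should we do? Target at-risk customers by spend.
to_target = sorted(at_risk, key=lambda c: c["spend"], reverse=True)

print("lost:", lost_count)
print("reasons:", reason_counts.most_common())
print("at risk:", [c["name"] for c in at_risk])
print("target first:", [c["name"] for c in to_target])
```

In practice the predictive step would usually be a model trained on historical data; the threshold here only shows where such a model would fit in the sequence.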
Data science concepts in one picture

Read the article at: https://www.kdnuggets.com/2016/03/data-science-puzzle-explained.html/2


Activity 1 – identify a potential DS use case

In your group or on your own (10 minutes):

1. Think of one current national (Somalia) problem where data
   science can help us understand it and/or find a solution to it.

2. What would be appropriate data for this problem, and how can we get it?

3. Make your notes and share your thoughts about the above with us.
Data Science Applications

o Data exploration and visualization
o Financial analysis and fraud detection
o Customer behavior analysis, customer churn
o News aggregation and summarization (Google)
o Image and face recognition (Facebook, Google, …)
o Question answering (Siri, IBM Watson, …)
o Healthcare analysis
o Driverless cars (Google)
o Election polling and predictions
Top 10 data science tools – last 10 years

Source: https://youtu.be/pKPaHH7hnv8
Why Python?

Python is a popular language for data science
because it is easy to learn, has a large and
active community, offers powerful libraries for
data analysis and visualization, and has
excellent machine-learning libraries.
Data scientists prefer Python for the following
application areas:
o Data Analysis
o Data Visualization
o Machine Learning
o Deep Learning
o Image Processing
o Computer Vision
o Natural Language Processing (NLP)
Activity – 2 Install Python
Anaconda Distribution

o The Anaconda distribution brings together thousands of open-source
  data science libraries and packages in a single distribution, e.g.,
  – data analysis (e.g., Pandas)
  – data visualization (e.g., Matplotlib)
  – statistical analysis (e.g., Statsmodels)
  – machine learning (e.g., Scikit-learn)
o It runs on all major OS platforms, e.g., Windows, macOS, and
  Linux, and includes the following:
  – Standard Python
  – Jupyter Notebook (interactive coding via a browser)
  – Spyder, a code editor

o Download from: https://www.anaconda.com/download/
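
After installing, one way to check the setup is a short script like the sketch below, run from Jupyter Notebook or Spyder. It assumes the import names `pandas`, `matplotlib`, `statsmodels`, and `sklearn` (the import name of Scikit-learn); the helper `check_packages` is a hypothetical name introduced for this example.

```python
import importlib.util
import sys

# Import names of the libraries mentioned on this slide and what they are for.
packages = {
    "pandas": "data analysis",
    "matplotlib": "data visualization",
    "statsmodels": "statistical analysis",
    "sklearn": "machine learning (Scikit-learn)",
}


def check_packages(names):
    """Return {name: True/False} depending on whether each package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(packages)

print("Python", sys.version.split()[0])
for name, installed in status.items():
    label = "installed" if installed else "missing"
    print(f"{name:12} {label:10} ({packages[name]})")
```

The script only looks the packages up without importing them, so it runs quickly and does not fail when a library is absent; any package reported as missing can then be installed separately.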


Jupyter Notebook

o Easy-to-use environment
o Web-based
o Combines both text and code into one
o Comes with a great number of useful packages
