Intro to Data Science - LVC1 With Markings
Data Science
• Data science is the domain of study that deals with vast volumes of data using modern tools
and techniques to find unseen patterns, derive meaningful information, and make business
decisions.
• Data science uses complex machine learning algorithms to build predictive models.
• The data used for analysis can come from many different sources and be presented in various
formats.
• Machine Learning
Machine learning (ML) is the backbone of data science.
Data scientists need a solid grasp of ML in addition to basic knowledge of statistics.
• Modeling
Mathematical models enable you to make quick calculations and predictions based on what you
already know about the data.
Modeling is also a part of Machine Learning and involves identifying which algorithm is the most
suitable to solve a given problem and how to train these models.
• Statistics
Statistics is at the core of data science.
A firm grasp of statistics can help you extract more insight and obtain more meaningful
results.
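The Modeling and Statistics points above can be illustrated with a minimal sketch. It assumes a made-up dataset of study hours and exam scores; the data and the helper name fit_line are hypothetical and chosen only for illustration. It summarizes the data with Python's standard statistics module, then fits a straight line by ordinary least squares and uses it to predict a value that was not observed.

```python
from statistics import mean, stdev

# Hypothetical data: hours studied and the exam score obtained.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

# Statistics: summarize what we already know about the data.
print("mean score:", mean(scores))
print("sample standard deviation:", round(stdev(scores), 2))


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Modeling: "train" the model on known data, then predict an unseen case.
slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept
print("predicted score for 6 hours:", predicted)
```

A straight line is only one possible model; choosing the algorithm that suits the problem is part of the modeling step described above.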
Prerequisites for Data Science
• Programming
Some level of programming is required to execute a successful data science project.
The most common programming languages for data science are Python and R.
Python is especially popular because it’s easy to learn, and it supports multiple
libraries for data science and ML.
• Databases
A capable data scientist needs to understand how databases work, how to manage them, and
how to extract data from them.
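As a small illustration of extracting data from a database, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name, column names, and rows are hypothetical and exist only for this example; a real project would connect to an existing database instead.

```python
import sqlite3

# An in-memory database keeps the example self-contained (no file is created).
conn = sqlite3.connect(":memory:")

# Manage: define a table and load a few hypothetical rows.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Asha", "Pune"), ("Ben", "Leeds"), ("Chen", "Pune")],
)
conn.commit()

# Extract: select only the rows we need, using a parameterized query.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Pune",)
).fetchall()
names = [name for (name,) in rows]
print(names)

conn.close()
```

The "?" placeholders let the database driver substitute the values safely instead of building the SQL text by hand.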
Data Science Lifecycle
• Capture:
Data Acquisition, Data Entry, Signal Reception, Data Extraction.
This stage involves gathering raw structured and unstructured data.
• Maintain:
Data Warehousing, Data Cleansing, Data Staging, Data Processing, Data Architecture.
This stage covers taking the raw data and putting it in a form that can be used.
• Process:
Data Mining, Clustering/Classification, Data Modeling, Data Summarization.
Data scientists take the prepared data and examine its patterns, ranges, and biases to
determine how useful it will be in predictive analysis.
• Analyze:
Exploratory/Confirmatory, Predictive Analysis, Regression, Text Mining, Qualitative Analysis.
Here is the real meat of the lifecycle.
This stage involves performing the various analyses on the data.
• Communicate:
Data Reporting, Data Visualization, Business Intelligence, Decision Making.
In this final step, analysts prepare the analyses in easily readable forms such as charts, graphs,
and reports.
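The five stages above can be sketched end to end on a toy dataset. This is only an illustration under stated assumptions: the sales figures are invented, and the "analysis" is a deliberately naive forecast that extends the average day-to-day change.

```python
from statistics import mean

# Capture: gather raw records (hypothetical daily sales; some entries are dirty).
raw = ["120", " 135 ", "", "n/a", "150", "165"]

# Maintain: clean the raw data and put it in a form that can be used.
clean = [int(value.strip()) for value in raw if value.strip().isdigit()]

# Process: summarize the prepared data (its range and average).
summary = {"min": min(clean), "max": max(clean), "mean": mean(clean)}

# Analyze: a naive prediction that extends the average day-to-day change.
changes = [later - earlier for earlier, later in zip(clean, clean[1:])]
forecast = clean[-1] + mean(changes)

# Communicate: present the result in an easily readable form.
report = (
    f"Sales ranged from {summary['min']} to {summary['max']} "
    f"(mean {summary['mean']}); next-day forecast: {forecast}"
)
print(report)
```

In practice each stage is far larger than one line, but the order of the steps stays the same.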
Data Science vs. Business Intelligence
3. Data
• Data science: deals with both structured and unstructured data.
• Business intelligence: deals mainly with structured data.
7. Expertise
• Data science: the data scientist.
• Business intelligence: the business user.
8. Questions
• Data science: what will happen, and what if.
• Business intelligence: what happened.
• Definition: Big Data refers to extremely large and complex datasets that cannot be easily
processed, managed, or analyzed using traditional data processing tools and methods.
• Volume, Velocity, Variety: Big Data is characterized by the three Vs—volume (large amount
of data), velocity (speed at which data is generated), and variety (diversity of data types and
sources).
• Processing Technologies: Technologies such as Hadoop, Spark, and NoSQL databases are
commonly used to store, process, and analyze big data.
• Applications: Big Data analytics is applied in areas like business intelligence, healthcare,
finance, and scientific research to extract valuable insights, patterns, and trends.
• Challenges: Challenges in dealing with Big Data include data security concerns, scalability
issues, data quality assurance, and the need for advanced analytics tools and skilled
professionals.
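The idea behind Big Data processing, splitting the work into pieces and then combining the partial results, can be sketched in plain Python. This is not the Hadoop or Spark API; it only imitates the map-and-reduce pattern on a tiny, made-up list of log lines, and the function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def read_chunks(lines, chunk_size):
    """Yield the input a few lines at a time, as if it were too big to load at once."""
    for start in range(0, len(lines), chunk_size):
        yield lines[start:start + chunk_size]


def map_chunk(chunk):
    """Map step: count the words within a single chunk."""
    return Counter(word.lower() for line in chunk for word in line.split())


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


# Hypothetical log lines standing in for a very large dataset.
log_lines = [
    "error disk full",
    "info job started",
    "error network down",
    "info job finished",
]

partials = [map_chunk(chunk) for chunk in read_chunks(log_lines, 2)]
totals = reduce(merge_counts, partials, Counter())
print(totals)
```

Because each chunk is counted independently, the map step could run on different machines, which is what makes the pattern scale.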
Introduction to Python
• Python is a popular programming language. It was created by Guido van Rossum and
released in 1991.
• It is used for:
✔ web development (server-side),
✔ software development,
✔ mathematics,
✔ system scripting.
Why Python?
✔ Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc.).
✔ Python has a simple syntax similar to the English language.
✔ Python has syntax that allows developers to write programs with fewer lines than some other
programming languages.
✔ Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
✔ Python can be treated in a procedural way, an object-oriented way or a functional way.
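To make the last point concrete, here is one small task, summing the squares of a list of numbers, written in each of the three styles. The class name SquareSummer and the variable names are hypothetical, chosen only for this sketch.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Procedural: a step-by-step loop that updates a running total.
total_procedural = 0
for n in numbers:
    total_procedural += n * n


# Object-oriented: data and behavior bundled together in a class.
class SquareSummer:
    def __init__(self, values):
        self.values = values

    def total(self):
        return sum(v * v for v in self.values)


total_oop = SquareSummer(numbers).total()

# Functional: compose map and reduce without mutating any state.
total_functional = reduce(
    lambda acc, square: acc + square,
    map(lambda n: n * n, numbers),
    0,
)

print(total_procedural, total_oop, total_functional)
```

All three produce the same total; which style to use depends on the size and structure of the program.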
• An IDE (integrated development environment) enables programmers to combine the different
aspects of writing a computer program.
• IDEs increase programmer productivity by bringing together features such as source-code
editing, building executables, and debugging.
Variables
Variables are containers for storing data values.
Creating Variables
A variable is created the moment you first assign a value to it.
Variables do not need to be declared with any particular type, and can even change type after
they have been set.
Example
x = 5
y = "John"
print(x)
print(y)
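The statement that a variable can change type after it has been set can be checked with the built-in type() function. A minimal sketch, using hypothetical variable names:

```python
x = 5                            # x is created here and refers to an int
first_type = type(x).__name__
print(x, "is of type", first_type)

x = "John"                       # the same name now refers to a str
second_type = type(x).__name__
print(x, "is of type", second_type)
```

No type declaration is needed in either assignment; the type belongs to the value, not to the variable name.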
Python Operators
Python divides its operators into the following groups:
✔ Arithmetic operators
✔ Assignment operators
✔ Comparison operators
✔ Logical operators
✔ Identity operators
✔ Membership operators
✔ Bitwise operators
Example
print(10 + 5)
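A short sketch with one example per operator group may help tell them apart. The variable names and values are arbitrary and chosen only for illustration.

```python
a, b = 10, 5
results = {}

# Arithmetic operators compute with numbers.
results["arithmetic"] = a + b

# Assignment operators store (and optionally update) a value.
total = a
total += b
results["assignment"] = total

# Comparison operators compare two values and return a bool.
results["comparison"] = a > b

# Logical operators combine conditions.
results["logical"] = a > 0 and b > 0

# Identity operators test whether two names refer to the same object.
first = [1, 2]
alias = first
twin = [1, 2]   # equal contents, but a separate object
results["identity"] = (alias is first, twin is first)

# Membership operators test whether a value is present in a sequence.
results["membership"] = 2 in first

# Bitwise operators work on binary digits: 10 is 0b1010 and 5 is 0b0101.
results["bitwise"] = (a & b, a | b)

for group, value in results.items():
    print(group, "->", value)
```

Note the difference between identity and equality: twin == first is true because the contents match, while twin is first is false because they are two separate list objects.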