0% found this document useful (0 votes)
5 views

CHAPTER 1

Uploaded by

rameshtharu076
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views

CHAPTER 1

Uploaded by

rameshtharu076
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 85

CHAPTER 1

Introduction to Data
science
1.1 What is data science?
Data Science is the in-depth study and analysis of massive
datasets, aimed at extracting meaningful insights from both
structured and unstructured data. By employing the scientific
method, advanced technologies, and powerful algorithms, this
multidisciplinary field uncovers new and valuable information
from raw data. It leverages tools and techniques to process,
analyze, and derive innovative solutions.

Prerequisites for Data Science


Before embarking on your data science journey, a grasp of these
core concepts is essential:
1. Machine Learning (ML)
Machine Learning (ML) As the backbone of data science, ML
enables data scientists to develop intelligent systems.
Understanding ML algorithms and a strong foundation in
statistics is crucial.
2. Modeling
Mathematical modeling allows for fast calculations and
predictions based on what you already know about the data. This
includes identifying suitable algorithms and training models to
solve specific problems—key aspects of ML.
3. Statistics
Statistics are at the heart of data science. A sturdy handle on
statistics can help you extract more intelligence and obtain more
meaningful results.
4. Programming
Some level of programming is required to execute a successful
data science project. Languages like Python and R are widely
used, with Python’s simplicity and extensive libraries making it
particularly popular.
5. Databases
Databases Knowledge of database management and data
extraction processes is vital for navigating and handling vast
datasets.

Use of Data Science


Data Science serves as a powerful tool for identifying patterns
within seemingly unstructured or unrelated datasets, enabling
actionable conclusions and accurate predictions. Its
transformative potential is shaping industries in innovative ways:
1. Pattern Recognition: Data Science uncovers hidden
trends and relationships within large datasets, empowering
businesses and researchers to make data-driven decisions.
2. Tech Industry: Organizations utilize user data to devise
strategies that convert raw information into valuable,
profitable insights, driving innovation and operational
excellence.
3. Transportation: Advances in driverless car technology
exemplify the application of Data Science. By analyzing
training datasets—considering factors such as highway
speed limits and busy streets—algorithms are optimized to
enhance road safety and minimize accidents.
4. Healthcare and Genomics: Data Science has
revolutionized therapeutic approaches, enabling
personalized solutions through the analysis of genetic and
genomic data. This facilitates customized treatment options
and improves patient care.
Data Science continues to drive progress, finding meaningful
applications across various domains and creating impactful
solutions.

Applications of Data Science


Data Science permeates almost every industry, driving
innovation and operational excellence. Some key applications
include:

1. Healthcare
Revolutionizing medical equipment and aiding in disease
detection and treatment.
2. Gaming
Video and computer games are now being created with the help
of data science and that has taken the gaming experience to the
next level.
3. Image Recognition
Identifying patterns and objects within images, from social media
tagging to diagnostics.
4. Recommendation Systems
Netflix and Amazon give movie and product recommendations
based on what you like to watch, purchase, or browse on their
platforms.
5. Logistics
Data Science is used by logistics companies to optimize routes to
ensure faster delivery of products and increase operational
efficiency.
6. Fraud Detection
Banking and financial institutions use data science and related
algorithms to detect fraudulent transactions.
7. Internet Search
When we think of search, we immediately think of Google. Right?
However, there are other search engines, such as Yahoo,
Duckduckgo, Bing, AOL, Ask, and others, that employ data
science algorithms to offer the best results for our searched
query in a matter of seconds. Given that Google handles more
than 20 petabytes of data per day. Google would not be the
'Google' we know today if data science did not exist.
8. Speech recognition
Speech recognition is dominated by data science techniques.
Virtual assistants like Siri, Alexa, and Google Assistant rely on
speech-to-text data science technologies.
9. Targeted Advertising
Digital marketing utilizes user behavior to tailor advertisements,
resulting in higher engagement compared to traditional
marketing.
10. Airline Route Planning
Predicting delays and optimizing routes for efficiency and
profitability.

1.2 Differences between Data Science and Big Data


Aspect Data Science Big Data
Definition Data Science is an area. Big Data is a technique to
collect, maintain, and
process vast amounts of
information.
Focus Involves the collection, Focuses on extracting vital
processing, analysis, and and valuable information
utilization of data. It is more from large datasets.
conceptual.
Nature A field of study, like Computer A technique for tracking and
Science, Applied Statistics, or discovering trends in
Applied Mathematics. complex data sets.
Goal Building data-centric products Making data useful by
for businesses or ventures. extracting only relevant
information from vast
datasets.
Tools Primarily includes SAS, R, Commonly uses Hadoop,
Used Python, and other similar tools. Spark, Flink, and similar
technologies.
Relations A superset of Big Data that A subset of Data Science,
hip includes techniques like data focusing on mining activities
scrapping, cleaning, and as part of the data pipeline.
visualization.
Primary Mainly for scientific and Primarily for business
Use research purposes. optimization and customer
satisfaction.
Emphasis Broadly focuses on the science Primarily involves processes
and study of data. to handle and manage
voluminous datasets.

1.3 Business Intelligence vs. Data Science


The following table states the key differences between business
intelligence and data science:

Factor Business Intelligence (BI) Data Science


Concept A collection of processes, tools,
Involves mathematical and
and technologies aiding statistical models to
businesses in analyzing data. process data, discover
patterns, and predict
future actions.
Data Works mainly with structured Handles both structured
data. and unstructured data.
Flexibilit Data sources are planned before Data sources can be added
y visualization. or updated as needed.
Approac Combines statistical and visual Includes advanced
h methods for data analysis. techniques like graph
analysis, NLP, machine
learning, and neural
networks.
Expertis Designed for business users to Requires in-depth
e analyze raw business information knowledge of data analysis
Required with minimal technical expertise. and programming.
Complex Simpler to use and suitable for More complex, requiring
ity individual users to visualize data. advanced skills and
methodologies.
Tools Includes tools like MS Excel, Utilizes tools such as
Power BI, SAS BI, IBM Cognos, Python, Hadoop, Spark, R,
and MicroStrategy. TensorFlow, MATLAB, and
BigML.

1.4 Setting the Research Goal: From Data to Actionable


Insights
A successful data science project follows a systematic approach
to ensure meaningful outcomes. Below is a detailed breakdown
of the key steps:

1. Setting the Research Goal: The foundation of any data


science project lies in understanding the business or activity
it supports. Begin by defining the "what," "why," and "how"
of your project in a detailed project charter. Clearly outline
objectives, set a timeline, and establish measurable key
performance indicators (KPIs). This essential first step
ensures your data initiative has direction and purpose.

2. Retrieving Data: Identifying and accessing the necessary


data is the next critical step. Combine data from as many
sources as possible, whether internal (within the company)
or external (third-party). Methods to gather usable data
include:
o Connecting to databases
o Using APIs
o Exploring open data repositories

3. Data Preparation: Data preparation is often the most


time-consuming phase, taking up to 80% of the project
timeline. This step includes:
a. Identifying and correcting data errors
b. Enhancing the dataset with information from other
sources
c. Transforming raw data into formats suitable for
modeling

4. Data Exploration: Once the data is clean, it’s time to dive


deeper using descriptive statistics and visual techniques.
Manipulating the data reveals hidden patterns and valuable
insights. Examples of exploratory tasks include:
a. Creating time-based features: Extracting date
components (e.g., month, hour, weekday) or
calculating differences between date columns.
b. Data enrichment: Merging datasets or referencing
additional data to supplement existing information.

5. Presentation and Automation: Presenting the results to


stakeholders is a crucial step. Leverage visualizations to
effectively communicate findings, especially when handling
large volumes of data. This phase may also involve
industrializing or automating analysis processes for
repeated use and integration with other tools. A structured,
methodical approach enhances the project's chances of
success while optimizing costs.
CHAPTER 2
Data science Process
2.1 Six steps of the data science process

Fig. 2.1. The six steps of the data science process


1. The first step of this process is setting a research goal. The
main purpose here is making sure all the stakeholders understand
the what, how, and why of the project. In every serious project
this will result in a project charter.
2. The second phase is data retrieval. You want to have data
available for analysis, so this step includes finding suitable data
and getting access to the data from the data owner. The result is
data in its raw form, which probably needs polishing and
transformation before it becomes usable.
3. Now that you have the raw data, it’s time to prepare it. This
includes transforming the data from a raw form into data that’s
directly usable in your models. To achieve this, you’ll detect and
correct different kinds of errors in the data, combine data from
different data sources, and transform it. If you have successfully
completed this step, you can progress to data visualization and
modeling.
4. The fourth step is data exploration. The goal of this step is to
gain a deep understanding of the data. You’ll look for patterns,
correlations, and deviations based on visual and descriptive
techniques. The insights you gain from this phase will enable you
to start modeling.
5. Finally, we get to: model building It is now that you attempt to
gain the insights or make the predictions stated in your project
charter. If you’ve done this phase right, you’re almost done.
6. The last step of the data science model is presenting your
results and automating the analysis if needed. One project goal is
to change a process and/or make better decisions. This is where
you can shine in your influencer role. The importance of this step
is more apparent in projects on a strategic and tactical level.
Specific projects require you to perform the business process over
and over again, so automating the project will save time.
Following these six steps pays off in terms of a higher project
success ratio and increased impact of research results. This
process ensures you have a well-defined research plan, a good
understanding of the business question, and clear deliverables
before you even start looking at data. Another benefit of following
a structured approach is that you work more in prototype mode
while you search for the best model. When building a prototype,
you’ll probably try multiple models and won’t focus heavily on
issues such as program speed or writing code against standards.
2.1. Step 1: Defining research goals and creating a project
charter:
The first step of this process is setting a research goal. In every
serious project this will result in a project charter.
A project starts by understanding the what, the why, and the how
of your project (figure 1.2). three questions (what, why, how) is
the goal of the first phase, so that everybody knows what to do
and can agree on the best course of action.

Fig. 2.1 Step 1: Setting the research goal


The outcome should be a clear research goal, a good
understanding of the context, well-defined deliverables, and a
plan of action with a timetable. This information is then best
placed in then a project charter.
Create a project charter
All this information is best collected in a project charter. For any
significant project this would be mandatory.
A project charter requires teamwork, and your input covers at
least the following:
 A clear research goal
 The project mission and context
 How you’re going to perform your analysis
 What resources you expect to use
 Proof that it’s an achievable project, or proof of concepts
 Deliverables and a measure of success
 A timeline

This information to make an estimation of the project costs and


the data and people required for your project to become a
success.
2.2. Step 2: Retrieving data
The next step in data science is to retrieve the required data
(figure 2.3). Sometimes you need to go into the field and design a
data collection process yourself, but most of the time you won’t
be involved in this step. Many companies will have already
collected and stored the data for you, and what they don’t have
can often be bought from third parties.

Fig. 2.3 Step 2: Retrieving data

Data can be stored in many forms, ranging from simple text files
to tables in a database. The objective now is acquiring all the data
you need. This may be difficult, and even if you succeed, data is
often like a diamond in the rough: it needs polishing to be of any
use to you.
2.3. Step 3: Data Preparation
Cleansing, integrating, and transforming data
The data received from the data retrieval phase. Your task now is
to sanitize and prepare it for use in the modeling and reporting
phase. Data model needs the data in a specific format, so data
transformation will always come into play. It’s a good habit to
correct data errors as early on in the process as possible. Figure

2.4. shows the most common actions to take during the data
cleansing, integration, and transformation phase.

Fig. 2.4 Step 3: Data preparation


Cleansing data
Data cleansing is a sub process of the data science process that
focuses on removing errors in your data so your data becomes a
true and consistent representation of the processes it originates
from.

Correct errors as early as possible


A good practice is to mediate data errors as early as possible in
the data collection chain and to fix as little as possible inside your
program while fixing the origin of the problem. Retrieving data is
a difficult task. The data collection process is error-prone.
Fixing the data as soon as it’s captured is nice in a perfect world.
If you can’t correct the data at the source, you’ll need to handle it
inside your code.
Integrating (Combining data from different data sources)
Your data comes from several different places, and in this sub
step we focus on integrating these different sources. Data varies
in size, type, and structure, ranging from databases and Excel
files to text documents. We focus on data in table structures in
this chapter for the sake of brevity.
 The different ways of combining data
You can perform two operations to combine information from
different data sets. The first operation is joining: enriching an
observation from one table with information from another table.
The second operation is appending or stacking: adding the
observations of one table to those of another table.
 you combine data, you have the option to create a new physical
table or a virtual table by creating a view. The advantage of a
view is that is doesn’t consume more disk space. “

 Joining tables
Joining tables allows you to combine the information of one
observation found in one table with the information that you find
in another table. The focus is on enriching a single observation.
Let’s say that the first table contains information about the
purchases of a customer and the other table contains information
about the address where your customer lives. Joining the tables
allows you to combine the information, as shown in figure 2.7.

Client Item Month


Ram Copy January
Sita Book January
Client Address
Ram Kohalpur
Sita Nepalgu
nj

Client Item Month Address


Ram Copy January Kohalpur
Sita Book January Nepalgunj
Fig.2.7 Joining two tables on the Item and Region keys

To join tables, you use variables that represent the same object in
both tables, such as a date, a country name, or a Social Security
number. These common fields are known as keys.
 Appending tables
Appending or stacking tables is effectively adding observations
from one table to another table. Figure 2.8 shows an example of
appending tables. One table contains the observations from the
month January and the second table contains observations from
the month March. The result of appending these tables is a larger
one with the observations from January as well as March. The
equivalent operation in set theory would be the union, and this is
also the command in SQL, the common language of relational
databases. Other set operators are also used in data science,
such as set difference and intersection.
Figure 2.8 Appending data from tables is a common operation but
requires an equal structure in
Client Item Month
the tables being appended.
Ram Pen March
Sita Pencil March
Client Item Month
Ram Copy January
Sita Book January

Client Item Month


Ram Copy January
Sita Book January
Ram Pen March
Site Pencil march

Fig. 2.8 Appending data


Transforming Data
Certain models require their data to be in a certain shape. Now
that you’ve cleansed and integrated the data, this is the next task
you’ll perform: transforming your data so it takes a suitable form
for data modeling.
2.4. Step 4: Exploratory data analysis
During exploratory data analysis you take a deep dive into the
data (figure 2.14). Information becomes much easier to grasp
when shown in a picture, therefore you mainly use graphical
(visualization) techniques to gain an understanding of your data
and the interactions between variables. This phase is about
exploring data.

Fig. 2.14. Step 4: Data exploration


The visualization techniques you use in this phase range from
simple line graphs or histograms, as shown in figure 2.9, to more
complex diagrams such as Sankey and network graphs.
Sometimes it’s useful to compose a composite graph from simple
graphs to get even more insight into the data. Figure 2.15. From
top to bottom, a bar chart, a line plot, and a distribution are some
of the graphs used in exploratory analysis. Such as
Fig. 2.9. a bar chart, a line plot, and a distribution
2.5. Step 5: Build the models
With clean data in place and a good understanding of the content,
you’re ready to build models with the goal of making better
predictions, classifying objects, or gaining an understanding of
the system that you’re modeling. This phase is much more
focused than the exploratory analysis step, because you know
what you’re looking for and what you want the outcome to be.
Figure 2.10 shows the components of model building.

Fig. 2.10. Step 5: Data modeling

The techniques you’ll use now are borrowed from the field of
machine learning, data mining, and/or statistics.
Building a model is an iterative process. The way you build your
model depends on whether you go with classic statistics or the
somewhat more recent machine learning, and the type of
technique you want to use. Either way, most models consist of the
following main steps:
1. Selection of a modeling technique and variables to enter in the
model
2. Execution of the model
3. Diagnosis and model comparison
Model and variable selection
You’ll need to select the variables you want to include in your
model and a modeling technique. Your findings from the
exploratory analysis should already given a fair idea of what
variables will help you construct a good model. Many modeling
techniques are available, and choosing the right model for a
problem requires judgment on your part.
Model execution
Once you’ve chosen a model you’ll need to implement it in code.
Most programming languages, such as Python, already have
libraries such as Stats Models or Scikit-learn. These packages use
several of the most popular techniques. Coding a model is a
nontrivial task in most cases, so having these libraries available
can speed up the process.
Model diagnostics and model comparison
You’ll be building multiple models from which you then choose
the best one based on multiple criteria. Working with a holdout
sample helps you pick the best-performing model. A holdout
sample is a part of the data you leave out of the model building
so it can be used to evaluate the model afterward. The principle
here is simple: the model should work on unseen data. You use
only a fraction of your data to estimate the model and the other
part, the holdout sample, is kept out of the equation. The model is
then unleashed on the unseen data and error measures are
calculated to evaluate it.
Figure 2.11. A holdout sample helps you compare models and
ensures that you can generalize results to data that the model
has not yet seen.
Fig. 2.11. A holdout sample
Many models make strong assumptions, such as independence of
the inputs, and you have to verify that these assumptions are
indeed met. This is called model diagnostics.
This section gave a short introduction to the steps required to
build a valid model. Once you have a working model, you’re ready
to go to the last step.
2.6. Step 6: Presenting findings and building applications
on top of them
After you’ve successfully analyzed the data and built a well-
performing model, you’re ready to present your findings to the
world figure 2.12. This is an exciting part; all your hours of hard
work have paid off and you can explain what you found to the
stakeholders.

Fig. 2.12. Presentation and automation


Sometimes people get so excited about your work that you’ll
need to repeat it over and over again because they value the
predictions of your models or the insights that you produced. For
this reason, you need to automate your models. This doesn’t
always mean that you have to redo all of your analysis all the
time. Sometimes it’s sufficient that you implement only the model
scoring; other times you might build an application that
automatically updates reports, Excel spreadsheets, or PowerPoint
presentations.
CHAPTER 3
Environmental Setup for
Using Data Science
Libraries in Python

Introduction
In this chapter, we will explore the data science library in
Python. However, before learning about various data
science libraries, it is quite important to create an
environmental setup for installing and using these data
science libraries in Python. Setting up the environment
for utilizing data science libraries like NumPy, SciPy,
Matplotlib, Pandas, and others in Python ensures
effective data analysis and modeling workflows. This
entails managing dependencies, version control, and
package management to guarantee project compatibility
and reproducibility. By creating isolated environments,
potential conflicts between different library versions are
mitigated, facilitating seamless collaboration and
reproducibility of results. So, let us get into the intricate
details of Python installation and Integrated
Development Environments (IDEs) like VSCode and
Jupyter Notebook for writing and executing the Python
code.

Structure
In this chapter, we will discuss the following topics:
 Introduction to Python
 Setup installation in Windows for Jupyter Notebook
 Insights of Jupyter Notebook
 Demo program using Jupyter Notebook
 Introduction to Data Science Libraries in Python

Objectives
This chapter aims to provide a comprehensive guide on
creating an efficient Python programming environment,
highlighting the importance of an Integrated Development
Environment (IDE). It begins by detailing the step-by-step
installation of Jupyter Notebook on a Windows system, offering
guidance on setup and functionality. Next, it covers the
installation and utilization of Visual Studio Code (VSCode) for
Python programming. Finally, the chapter introduces key data
science libraries in Python, equipping learners with essential
tools for their programming endeavors.

Introduction to python
Before diving into Python, it is crucial to set up a proper
development environment. This chapter will walk you through
the process of installing Python on your system, ensuring you
are ready to write and execute Python code.
Python is an open-source, high-level programming language
known for its simplicity, readability, and versatility. It supports
multiple programming paradigms, making it an excellent
choice for beginners and experienced developers alike.
Whether you are exploring web development, data analysis,
machine learning, or scientific computing, Python provides
powerful tools to support your journey.
The first step in using Python is installing the Python
interpreter, which executes Python code and grants access to
a vast ecosystem of libraries and tools. This chapter will guide
you through the installation process on Windows, macOS, and
Linux, ensuring a smooth setup for your development
environment.

The steps for installing Python on Windows are mentioned


below:
1. Visit the Python website:
https://www.python.org/downloads/ to
download the latest version of Python. Here, we
have downloaded the latest Python version 3.13.2.
2. We use 64-bit Windows OS and will click or execute
the Python installer file.
3. The following window will pop out, and the user can
choose Install Now or Customize installation. Here,
we will be selecting the Install Now option. Also,
remember initially, there are two unchecked
checkboxes which are:

a. Use admin privileges when installing py.exe.

b. Add python.exe to PATH.

4. Use admin privileges when installing py.exe. This


option grants the Python installer administrative
privileges during installation. It is essential for
installing Python in system- wide directories or when
installing packages that demand elevated
permissions. Enabling this ensures a smooth
installation process without encountering permission-
related issues. This option can also be used to change
the Python installation folder.
5. Add python.exe to PATH. By selecting this option, the
Python installer adds the directory containing
python.exe to the PATH environment variable. This
inclusion allows easy access to Python commands
from any command prompt or terminal window
without specifying the full path to the Python
executable. It streamlines the usage of Python across
the system, enhancing convenience for running
Python scripts and commands.
The steps for installing Python on macOS are mentioned
below:
1. Visit the python website go to:
https://www.python.org/downloads/ to
download the latest version of Python for macOS.
2. Run the installer: Open the downloaded .pkg file
and follow the instructions in the installation wizard.
3. Verify the installation: Open a terminal and type
python3 -

-version to confirm that Python has been installed


correctly.
The steps for installing Python on Linux are mentioned
below:
1. Use the package manager: Most Linux
distributions come with Python pre-installed.
However, if you need to install or upgrade Python,
you can use the package manager. For example, on
Ubuntu, you can use the following commands:
sudo apt update
sudo apt install python3
2. Verify the installation: Open a terminal and type
python3 --version to confirm that Python has been
installed correctly.

We have used Python version 3.11.4 in this book, which


means the micro release of Python 3.11 may contain
bug fixes and minor enhancements compared to earlier
micro releases within the Python 3.11 series. But to show
you all the installation steps, we will be showing you
Python version 3.12.2, which is the latest version as of 6
February 2024. By the end of this chapter, readers will
have a fully functional Python environment set up on their
system, ready to embark on their programming journey.
Whether you are a student, professional, or hobbyist,
Python offers a welcoming and intuitive platform for
turning your ideas into reality.
This revision includes installation instructions for
Windows, macOS, and Linux users, ensuring that the
content is comprehensive and helpful for all readers.
In most cases, it is recommended that you leave these
checkboxes checked to ensure a smooth installation
experience and convenient Python usage on your
system. Refer to the following figure for a better
understanding:
Figure 1.1: Python installation on running the Python executable
Refer to the following steps for a better understanding:
1. We will be selecting the Install Now option as a
recommended option where the default installation
path will be C:\Users\[user]\AppData\Local\Programs\Python\
Python[version] for the existing user, which will include
the IDLE, pip, and documentation and thus create
shortcuts and file associations as mentioned in Figure
1.1.
2. Once the installation has been done, the following
image will pop out, as displayed in Figure 1.2, where
the user can close the image.
Figure 1.2: Image depicting the successful installation of Python
version
Once Python is successfully installed, the user can view the
Python version using IDLE (Python 3.12 64-bit), which is
installed and can be searched in Windows apps. Pythonʼs
built-in IDE will be opened when IDLE is run. Another way
is that the user can navigate to the directory where
Python is installed on the system and double-click
python.exe.

Setup installation in Windows for Jupyter


Notebook
Now, we shall view the steps of installing Anaconda on
Windows OS:
1. First, we will download the Anaconda installer by
visiting the website
https://www.anaconda.com/download. Here,
we have downloaded Anaconda3 2024.02-1.
2. Once downloaded, we will run this installer file, and
we are just getting started with the pop-up of the
following image file, as shown in Figure 1.3:
Figure 1.3: Image depicting Step-1 of getting started of
Anaconda3

3. On clicking the Next button of Figure 1.3, read the


License Agreement as shown in Figure 1.4:
Figure 1.4: Image depicting Step-2 of license agreement of
Anaconda3

4. On clicking Next (refer to Figure 1.4), we will be


prompted with an installation type of choosing
either Just Me (Recommended) or All users
(which requires admin privileges), as shown in
Figure 1.5. Just me option provides the current user
account and All Users option provides installation
for all the user accounts which require privileges
(refer to Figure 1.4), we will be prompted to choose
an installation type: either just me (recommended)
or all Windows administrator.
Figure 1.5: Image depicting Step 3 of Installation Type of
Anaconda3

5. Then, clicking the Next button (shown in Figure 1.5)


will prompt us to select a destination folder to install
Anaconda. Install Anaconda to a directory path that
avoids spaces or unicode characters, as shown in
Figure 1.6:
Figure 1.6: Image depicting Step-4 of Destination folder selection of
Anaconda3

6. Then, when clicking the Next button in Figure 1.6, we


will see the advanced installation option, as shown
in Figure 1.7:
Figure 1.7: Image depicting Step-5 of Advanced Installation options of
Anaconda3
This Figure 1.7 provides us with three checkbox options
to select: Create start menu
shortcuts (supported
packages only): This option creates
shortcuts in the Start menu for supported
Anaconda packages, making it easier to
access Anaconda Navigator, Anaconda
Prompt, and other tools.
Use these tools frequently for your data science
work.
Register Anaconda3 as the system
Python 3.11: Enabling this sets Anaconda
Python as the default interpreter for the
system, meaning references to Python will
point to the Anaconda installation. It is helpful
if you prefer using Anaconda Python for all
your Python-related tasks. However, if you
have other Python installations or prefer
managing Python environments manually,
you may choose to keep this unchecked.
Clear the package cache upon
completion: Enabling this option clears the
package cache after installation, freeing up
disk space. This cache stores downloaded
packages and files used during installation.
While clearing it saves space, leaving it
unchecked retains cached files, which can
reduce download times for future updates or
reinstalls of packages.
We are checking all the three following options:
7. Clicking the Install button in Figure 1.7 will initiate
the installation process, as shown in Figure 1.8. The
user may click the Show Details button to observe
the packages installed on Anaconda3.

Figure 1.8: Image depicting Step-6 of packages installation of


Anaconda3

8. When the installation is complete, we will be


prompted with the following image where they can
click the Next button,
as shown in Figure 1.9:

Figure 1.9: Image depicting complete installation of packages of


Anaconda3

9. Then, click the Next button of Figure 1.10, which


displays the cloud notebook service of Anaconda:
Figure 1.10: Image depicting cloud notebook service of
Anaconda3

10. The Anaconda Distribution Tutorial offers a


comprehensive guide to installing and utilizing
Anaconda, a widely used platform for data science
and machine learning. This tutorial covers essential
aspects, including installation on various operating
systems, setup of Python environments, package
management with conda and pip, integration with
popular IDEs like Jupyter Notebook and Spyder,
practical usage examples for data analysis and
machine learning, and access to community
resources for support and learning. Getting Started
with Anaconda provides a beginner-friendly
introduction, focusing on installation, setup, and basic
usage to empower users to begin their data science
journey effectively with Anacondaʼs robust tools and
libraries. Finally, check the following two checkboxes,
as shown in Figure 1.11:
Figure 1.11: Anaconda distribution tutorial and getting started guide
You may check/uncheck it as you wish. We are getting
started with Anaconda and clicking the finish button at
the end, thus ensuring the complete installation of
Anaconda3 2024.02-1 (64- bit).
In Figure 1.12, some pre-installed tools available on Anaconda
Navigator can be viewed, and the user can Launch a
Jupyter Notebook on their default browser by clicking it, as
shown below:

Figure 1.12: Image depicting the Launch of Jupyter Notebook on


Anaconda Navigator

Note: If any reader is interested in installing


Anaconda on macOS / Linux, then you may
refer to the following two links as references:
https://docs.anaconda.com/anaconda/install/mac-
os/

https://docs.anaconda.com/anaconda/install/linux/

Insights of Jupyter Notebook


In the previous topic, we demonstrated how to launch a
Jupyter Notebook using the Anaconda Navigator. Now, we
will use the Anaconda prompt to launch by typing Jupyter
notebook, as shown in Figure 1.13:

Figure 1.13: Image depicting the Launch of Jupyter Notebook using


anaconda prompt
Using a web application, Jupyter Notebook, we can run
the code, embed explanatory text, and visualize all
under a single umbrella. On launching a web application,
Jupyter, the user will be prompted, as depicted in Figure
1.14:

Figure 1.14: Image depicting Jupyter Notebook web app

A new folder called Demo1 is created on the Desktop, as


shown in
Figure 1.15:
Figure 1.15: Image depicting Desktop/Demo1 folder in Jupyter
notebook
In Figure 1.15, we can see that a Jupyter notebook file
already has the extension Untitled1.ipynb. In the web app
shown above, the user can create a new Notebook, File,
Folder, and so on.
Suppose a new Notebook file has to be created, like
Untitled.ipynb.
Then, the user will click Notebook and be prompted to
Select the kernel, as shown in Figure 1.16:

Figure 1.16: Image depicting creation of new Notebook file


Desktop/Demo1 folder
In Jupyter Notebook, when we click New to create a new
notebook, we are prompted to select a kernel. A kernel,
in this context, refers to the computational engine that
executes the code within the notebook. The prompt
Python3 (ipykernel) indicates that you are selecting the
Python 3 kernel managed by the ipykernel package.
The meaning of each part is given below:
Python3: This specifies the
kernelʼs programming
language. In this case, it is Python 3,
indicating that the
notebook will execute Python code.
(ipykernel): This indicates the specific
implementation of the Python kernel.
ipykernel is the package responsible for
providing Jupyter with IPython-compatible
kernels. It enables the notebook to execute
Python code and manage interactions with
the Python interpreter.
By selecting Python3 (ipykernel) as the kernel when
creating a new notebook, we specify that the notebook
will use the Python 3 kernel provided by the ipykernel
package to execute Python code within the notebook. We
are checking the option Always start
the preferred kernel. The expanded image is shown in
Figure 1.17:

Figure 1.17: Image depicting Kernel selection for executing Python


code within notebook

Demo program using Jupyter Notebook


Observe Figure 1.18, with annotation in our current topic
demo program at Jupyter notebook. A Jupyter notebook
comprises several integral components:
Cell: The fundamental units of a Jupyter
notebook are cells, which come in two
primary types:
Code cell: These contain executable code
written in Python, R, or Julia. Code cells
execute independently, with their outputs
(such as text, plots, or errors) displayed
below each cell. In Figure 1.18, we have
discussed the code cell.
Markdown cells: These contain formatted
text written in Markdown syntax, enabling
the creation of structured documentation
with features like headings, lists, links,
and images. So, if we write 4 + 4 and then
press Shift + Enter, the literal visual output
will be 4 + 4, as shown in Figure 1.19.
Kernel: As the computational engine, the
kernel executes code within the notebookʼs
cells. Each notebook is linked to a specific
kernel, which dictates the programming
language
and execution environment. Additionally, the
kernel manages the notebookʼs state,
including variable values and imported
modules, as shown in Figure 1.18.
Tool bar: It offers swift access to various
notebook interactions, including saving,
adding cells, running cells, and modifying cell
types, as shown in Figure 1.18. The user may
click the icons in the toolbar according to
their needs.
Menu bar: It hosts dropdown menus and
provides supplemental functionalities for
managing the notebook, such as cell
insertion, type adjustments, and kernel
configuration, as shown in Figure 1.18. The
user may explore multiple options present in
the Menu bar, such as File, Edit, View, Insert,
and so on.
Output area: Situated beneath code cells,
the output area showcases execution results,
encompassing printed output, error
messages, and graphical plots generated by
the code, as shown in Figure 1.18.
These components synergize to establish an
interactive and adaptable environment for data
analysis, visualization, and documentation within
Jupyter notebooks.
Apart from these, there are two states or modes of a
notebook, viz Edit mode and Command mode:
Edit mode: When we are in Edit mode,
signified by a green cell border, we can
directly modify the content of a cell. This
mode allows us to type code or text within
the cell and perform editing actions like
copying, cutting, and pasting. To enter Edit
mode, simply click inside a cell or press Enter
when a cell is selected.
Command mode: Indicated by a blue cell
border, it enables users to execute
operations on cells without directly editing
their content. In this mode, the user can
perform tasks like moving, deleting, and
creating cells by using keyboard shortcuts. To
enter Command mode, press Esc or click
outside a cell after editing its content.
Refer to the following figure for a better understanding:

Figure 1.18: Image depicting Demo program on Jupyter notebook


Here, we have displayed a simple demo program using
Jupyter notebook. In the code cell, we are simply
initializing the values of two variables, multiplying these
two variables, and storing the result in mynum3.
mynum = 2
1
mynum = 3
2
mynum = mynum1 *
3 mynum2
In another code cell, we are only writing mynum3 and
getting the output as 6 in the output Area. Also, notice
there are 2 prompts namely In[] and Out[] which indicate
input and output, respectively, for code cells.
In[]:
This prompt denotes an input cell where
the user can write and execute code. The
number inside the square brackets indicates
the order in which the code cell was executed
within the notebook. For example, In[2]
indicates that this is the second code cell
executed in the notebook.
Out[]:This prompt denotes an output cell that
displays the result of the code execution
from the corresponding input
cell (In[]). The number inside the square
brackets corresponds to the input cellʼs
number (In[]) from which the output was
generated. For example, Figure 1.18 Out[2]
indicates that this is the output whose value
is 6 corresponding to the input cell In[2].
These prompts help to keep track of the
execution order and the associated input and
output for code cells in a Jupyter Notebook.
They provide a clear indication of the
codeʼs execution flow and the results
generated at each
step.
In Jupyter Notebook, Shift + Enter and Ctrl + Enter are both
keyboard shortcuts used to execute code cells, but they
operate differently:
Shift + Enter: This shortcut executes the
current cell and
moves the focus to the next cell. If there is no
subsequent cell, a new one is created below.
Itʼs commonly used when you want to
execute a cell and proceed to the next task
or
cell in the notebook.
Ctrl + Enter: Pressing Ctrl + Enter
executes the current
cell but keeps the focus within the same cell
after execution. Itʼs useful when you want to
execute a cell without advancing to the next
one, allowing you to stay
focused on the current cell for further editing
or analysis.
In essence, Shift + Enter executes and
advances to the next cell, while Ctrl + Enter
executes without moving to the next cell,
enabling you to control the flow of execution
based on your workflow in Jupyter Notebook.
We also saw usage of pwd in In[3] which indicates the
current working directory in Jupyter notebook. So, the file
Untitled.ipynb is saved in the following folder:
'C:\\Users\\SAURABH\\Desktop\\Demo_Jupyter'.
Now, we shall explain an example of Markdown, Raw
NBConvert, and Heading when the selection mode is
changed for a cell in
Jupyter notebook:

Figure 1.19: Image depicting Markdown Cell usage on Jupyter


notebook

Raw NBConvert Cell: In Jupyter Notebook, a


Raw NBConvert cell allows you to insert
unprocessed content, such as HTML, LaTeX,
or Markdown, which remains untouched
during the conversion process. For instance,
including a Raw NBConvert cell with custom
CSS styles ensures specific formatting
remains intact when exporting the notebook
to HTML or other formats, enhancing
document presentation without altering its
content within the notebook, as shown below:
Figure 1.20: Image depicting Raw NBConvert Cell usage on Jupyter
notebook

Heading cell: Heading cells in Jupyter


Notebook enable the creation of structured
section titles to organize content effectively.
Using Markdown syntax, preceded by #
symbols, users designate various heading
levels, aiding in content hierarchy and
navigation. For instance, employing #
Introduction for top-level sections and ##
Background for subtopics enhances
readability and comprehension, particularly
when navigating extensive notebooks or
generating navigational aids like table of
contents, as shown below:

Figure 1.21: Image depicting Raw usage on Jupyter notebook


The steps on how to setup installation in Windows for
VSCode is mentioned below:
1. First visit the
website
https://code.visualstudio.com/download and
download the installer file for windows using any
browser of your choice. There are other choices to
download for ubuntu or macOS. Here, we are
explaining the installation setup procedure in
Windows.
2. Run the installer file and will be prompted to accept
the agreement option, that is, VSCode terms and
conditions. We need to accept the radio-button
option I accept the agreement as shown in
Figure 1.22:
Figure 1.22: Image depicting Raw Terms and Conditions page
during Microsoft VS Code installation
3. Select the directory where we want to run Visual
Studio Code. Weʼll be prompted to browse for the
location. Afterward, click on the Next button to
proceed. Here, the default selected
path is C:\Users\6146c\
AppData\Local\Programs\Microsoft VS Code as shown in Figure
1.23 as follows:
Figure 1.23: Image depicting Destination folder selection during
Microsoft VS Code installation
4. Then, we will be prompted to select additional tasks
as shown in Figure 1.24 as follows. Just check the
options as displayed in the image and click the
Next button.
Figure 1.24: Image depicting Additional Tasks selection during
Microsoft VS Code installation
5. We will be prompted to start the installation setup as
shown in Figure 1.25 as follows. Click on the Install
button option and the installation procedure will
begin:
Figure 1.25: Image depicting Installation prompt during
Microsoft VS Code installation
6. The installation has been started and will take some
time to install. During the installation, we might
encounter the image shown in Figure 1.26 as follows:
Figure 1.26: Image depicting Installation action during Microsoft
VS Code installation
7. When the installation of VS Code is completed,
check the Launch Visual Studio Code and then
click Finish as shown in Figure 1.27:
Figure 1.27: Image depicting completion of Microsoft VS Code
installation
8. In the Visual Studio Code window, we have the
option to create a new file and select the preferred
programming language to kickstart our coding
journey! Then type Ctrl + Shift + X and then type
Python under EXTENSIONS as shown in Figure
1.28 as follows. Click Install. We are installing the
Python extension for Visual Studio Code:
Figure 1.28: Image depicting Python extension installation for VS
Code

Note: If any reader is interested in installing


VSCode in macOS/Linux, then you may refer to
the following two links as references:

https://code.visualstudio.com/docs/setup/mac

https://code.visualstudio.com/docs/setup/linux

Demo program using VSCode


Now, we shall see a simple demo program to print HelloWorld
which we have learned first before start learning any
programming language. Click the EXPLORER Icon which
is below the VS Code Icon at the top left. To explain to you
all the basic demo program I have installed VS Code on
my PC desktop and created a Demo folder under Downloads
by clicking New Folder icon option such that my present
working directory is C:\Users\6146c\Downloads\Demo1. Now,
press Ctrl + `, we shall be viewing the following tabs:
Problems: This refers to the panel where we
can view and manage diagnostic messages,
warnings, and errors reported by our code or
extensions. It helps us to identify and resolve
issues in our codebase.
Output: The Output panel displays the output
of tasks, extensions, and debug sessions. It
provides valuable information and feedback
from various processes running within Visual
Studio Code.
Debug Console: The Debug Console is where
we can interactively debug your code during
a debugging session. It allows us to view and
evaluate expressions, inspect variables, and
execute commands within the context of our
debug
session.
Terminal: Visual Studio Code includes an
integrated terminal that allows us to run
command-line tasks and interact with our
operating system directly within the editor.
It supports various shells and can be
customized to suit our preferences.
Ports: Ports refer to network ports used for
communication between Visual Studio Code
and external processes, such as debuggers
or language servers. Configuring and
managing ports may be necessary when
working with certain features or extensions
that require network connectivity.
Now, a new Python file helloworld.py is created by clicking
New File icon under EXPLORER section. We are saving
Python files with .py extension. It is a text file that
contains Python code and under this particular file we are
only writing the print statement and that too as shown in
Figure 1.29 as follows:
print(“HelloWorld”)

Figure 1.29: Image depicting helloworld.py file in VSCode


Now, at present by just writing pwd command we can know
the directory as shown in Figure 1.20 as follows:
Figure 1.30: Image depicting current working directory in VSCode
By typing, ls command the contents of the current
directory, including files and directories will be displayed
as shown in Figure
1.31 as follows:

Figure 1.31: Image depicting ls command in VSCode


So, now just write the command python .\helloworld.py under
the Terminal tab, we may get the error as shown in Figure
1.32 as VS Code was unable to locate the Python
interpreter.

Figure 1.32: Image depicting error on running the Python command


along with file name
So, just close VSCode, again restart it and run the
command python .\helloworld.py under the Terminal tab.
This time we may not get an error and the output
HelloWorld will be displayed to the user, as shown below:
Figure 1.33: Image depicting HelloWorld output to the user

Note: Executing this command in a terminal or


command prompt triggers the Python interpreter
to run the code within the helloworld.py script,
resulting in the generation of any output or
behavior defined within the script.

Introduction to data science libraries in


Python
The introductory overview of data science libraries in
Python encompasses essential tools for data analysis,
manipulation, and visualization. Foundational libraries like
NumPy, Pandas, Matplotlib, and Scipy cater to distinct
aspects of the data science workflow. NumPy excels in
array operations and mathematical functions, while
Pandas offers versatile data structures for flexible
manipulation and analysis. Matplotlib aids in crafting
high-quality visualizations, and Scipy extends functionality
with scientific computing tools and algorithms. Proficiency
in these libraries is paramount for aspiring data scientists
as they underpin Pythonʼs capabilities for data
exploration and analysis. Moreover, Polars and Seaborn
emerge as valuable additions to the data science toolkit.
Polars, a fast and efficient DataFrame library akin to
Pandas, excels in handling large-scale data processing
tasks with improved performance. Conversely, Seaborn,
built on Matplotlib, delivers a high-level interface for
creating captivating statistical graphics, enabling users to
generate various plots swiftly, including scatter plots,
histograms, and heatmaps, to glean insights and
communicate findings effectively. Integrating these
libraries elevates Pythonʼs prowess in data manipulation,
analysis, and visualization, fostering more efficient and
insightful data-driven decision- making processes. So, in
this book, we will be covering these data science
libraries with examples and various concepts
chapter-wise so that all the data science learners can
grasp these concepts in a well-structured manner.
Note: If you type press h letter by focusing on
Jupyter notebook, then the following image
will be popped up as shown in Figure 1.34
which will access the keyboard shortcuts help
menu. When you press h while in command
mode (blue cell border), a list of available
keyboard shortcuts in a popup dialog will be
displayed in the Jupyter notebook.

Figure 1.34: Image depicting shortcuts in Jupyter notebook


CHAPTER 4
Exploring NumPy Library
for Data Science in Python
NumPy stands for Numerical Python. It is an open-source library
created in 2005 by Travis Oliphant, designed for numerical
operations, providing a robust framework to handle and process
large datasets. It is fundamental to mastering data manipulation
and computational efficiency. Here's a breakdown tailored for
data scientists:

Why NumPy Matters for Data Science


1. Efficient Data Handling:

 NumPy arrays (ndarrays) are faster and more memory-


efficient than Python lists, making them ideal for handling
large-scale numerical datasets.

 Supports multi-dimensional arrays with a wide range of


operations, such as reshaping, slicing, and aggregations.
2. Mathematical and Statistical Computations:

 Offers built-in functions for linear algebra, matrix operations,


statistical measures, and random number generation.

 These tools are critical for tasks such as feature engineering,


data transformations, and preliminary data analysis.
3. Seamless Integration:

 Integrates effortlessly with other Python libraries like Pandas


(for data manipulation), Matplotlib (for visualization), and
SciPy (for advanced computations).

 Acts as the backbone for many machine learning and deep


learning libraries, including TensorFlow and scikit-learn.

4. Vectorization and Broadcasting:

 Operations on NumPy arrays are vectorized, meaning you


can perform element-wise operations without explicit loops,
boosting computational speed.
 Supports broadcasting, allowing operations between arrays
of different shapes with minimal code adjustments.

5. Data Cleaning and Preparation:

 NumPy arrays are ideal for working with missing values,


filtering data, and converting data types all essential steps
in data preprocessing pipelines.

NumPy Use Cases for Data Scientists

 Creating datasets and working with time series or tabular


data in a structured manner.

 Performing simulations, such as Monte Carlo experiments,


using its random sampling capabilities.

 Preprocessing images for computer vision applications by


treating image data as multi-dimensional arrays.
Installation of NumPy
 If you have Python and PIP already installed on a system,
then installation of NumPy
 Install it using this command:
C:\Users\Your Name>pip install numpy
 If this command fails, then use a python distribution that
already has NumPy installed like Jupyter Notebook,
Anaconda, Spyder etc.
Import NumPy
 Once NumPy is installed, import it in your applications by
adding the import keyword:
import numpy
NumPy as np

 NumPy is usually imported under the np alias.

 We often shorten numpy to np when importing it, like


this: import numpy as np .

 This convention is widely used and makes code easier to


read and understand.

 It’s recommended to always use np for consistency, so


others working with your code can easily follow along.
import numpy as np
Python list
A Python list is a built-in data structure used to store an ordered
collection of elements. It is incredibly versatile, allowing you to
group multiple items such as numbers, strings, or even other data
structures into a single variable.
Some key characteristics include:
Dynamic Size:
 Lists can grow or shrink dynamically, meaning you can add
or remove elements as needed.
Heterogeneous Elements:
 A single list can contain items of different types, such as
integers, strings, and floats.
Indexed Access:
 Each item in a list is indexed, starting from 0. For example:

In [ ] :list = [10, 20, 30]


print(list[1])
Out [ ] 20
:
Mutable:

 Lists are mutable, meaning you can modify their contents


(add, remove, or update elements) without creating a new
list.
Basic Operations:
 Creating a List:
In [ ] : list = [1,2,3,4]
print(list)
Out [ ] [1,2,3,4]
:

 Adding Elements:
In [ ] : list.append(5). # Adds 5 to the end
print(list)
Out [ ] [1,2,3,4,5]
:

 Removing Elements:
In [ ] : list.remove(3) # Removing the value 3
print(list)
Out [ ] [1,2,4,5]
:

Array
In NumPy, arrays are the backbone of data storage and
manipulation. They act as structured grids or tables where every
value has a consistent type, known as the "array dtype." This
consistency simplifies accessing, processing, and interpreting
individual elements.

You can access elements in an array using various methods, like


indexing with numbers, booleans, or even other arrays. Arrays
can have multiple dimensions, which is called the rank (like rows
and columns in a table), and the shape describes the size along
each dimension.

To create a NumPy array, you can start with a Python list, using
nested lists for data with multiple dimensions, such as a 2D or 3D
array.

When working with NumPy in Python, it's common practice to


import it using the shorthand np like this: import numpy as np.

Using np consistently allows others to understand your code


easily, promoting collaboration and standardization in data
science workflows.
dtype (Data Type in NumPy)
This is specific the type of data stored in a NumPy array, such as
integers, floats, or strings. If not explicitly provided, NumPy
automatically determines the smallest data type that can hold all
the elements in the array. For example, arr.dtype might
return int32 if the array contains 32-bit integers. dtype tells you
what kind of elements are inside the array (like int32 or float64).
type (Python Object Type)
A Python function used to identify the type of any object, such as
lists, integers, floats, or NumPy arrays. For example,
calling type(arr) on a NumPy array would output <class
'numpy.ndarray'>.
type tells you what kind of object it is (like a NumPy array).
Comparison between NumPy Arrays and Python Lists
comparison NumPy Arrays Python Lists
Memory Usage Store all elements in a Store each element
single contiguous block of separately, requiring
memory, resulting in lower more memory and
memory usage and faster making them slower
data access. This efficiency for large-scale
is particularly beneficial operations.
when working with large
datasets.
Speed Optimized for numerical Slower for numerical
and mathematical computations
operations. Since NumPy is because they are not
implemented in C, it optimized for
outperforms Python lists mathematical
significantly for tasks like operations on large
addition, multiplication, or data.
matrix operations.
Data Types Require all elements to be Can hold elements of
of the same data type different data types
(e.g., integers, floats, or (e.g., mixing numbers
strings), enabling faster and strings), offering
computations and flexibility but at the
consistency. cost of performance.

Mathematical Allow for "element-wise" Require loops or list


Operations: operations directly. For comprehensions for
example, multiplying an mathematical
entire array by 2 applies operations on
the operation to all individual elements,
elements at once, making which leads to slower
code concise and performance and
execution faster. more complex code.
Multidimension It is designed to handle It can handle multi-
al Data data in multiple dimensional data, but
Handling dimensions, such as 2D it gets tricky and isn’t
matrices or even 3D as smooth as NumPy
tensors, which are for things like tables
essential in data science or matrices.
and machine learning.
Lets take any random list i.e l = [4, 3, 'hello] in this list acoording
to storage element 4 can be stroe in 1 bit data element 3 can
store 2 bit data and 'hello' is storing lets say 3 bit data here
but in case of array ; every element into the list array stores equal
storage i.e from l list every element stores equal lets say 3 3 3 bit
of storage each
numpy shows the string data in unicode formate

Arrays with Zeros


The function np.zeros(shape, dtype) is used to create an array
that is filled entirely with zeros.
By default, the data type (dtype) of the elements is float64, but
you can specify a different type if needed, such as int.

Arrays with Zeros


The function np.ones(shape, dtype) is used to create an array
where every element is set to 1.
The default data type is also float64, but it can be changed to
other types such as int or bool if needed.
Random Matrix
Random Matrix Using NumPy's Random Function
The np.random.random () function generates a matrix filled with
random values.
These random values always lie within the range 0 to 1. Each
time the code is executed, it will give diff values everytime.
This function is highly useful for generating test datasets or
initializing model weights in machine learning tasks.
1D random array
2D random array

3D random array

Randint
Random Matrix Using NumPy's Randint
The np.random.randint () function generates random integer
values within a specified range. By default, the starting value (n)
is 0.
Numbers within the range can repeat, as the function allows
duplicates.
This is useful for creating random numerical datasets for
simulations or testing.
Rand
Random Matrix Using NumPy's Rand
The np.random.rand () function in NumPy generates random
numbers following a uniform distribution, where every number in
the range 0 to 1 has an equal chance of being selected. The
output values are floating-point numbers.
Randn
Random Matrix Using NumPy's Randn
The np.random.randn() function is a convenient method to create
matrices filled with random values drawn from a standard normal
distribution (mean of 0 and standard deviation of 1). This is often
useful in simulations, algorithm testing, or initializing parameters
in machine learning models. it gives any random no ,positive or
negative also .
Uniform
Random Matrix Using NumPy's Uniform
The numpy.random.uniform() function provides a method for
generating random floating-point numbers distributed uniformly
within a specified range. Here's how it works, explained:
Specified Range: If both low and high values are provided, the
function generates random floats exclusively within the specified
range, ensuring no repetition among the values.
Single Value: If only a single argument is passed (the high value),
the function assumes the range to be [0, high] by default.
Default Behavior: If no arguments are given, the function
generates random float values within the range [0, 1]
Choice
Random Matrix Using NumPy's Choice
The np.random.choice () function generates a random value from
a specified sequence (e.g., a list or array).
If no range is provided but only a single value is given, it defaults
to a range from 0 to the specified number and generates a
random value within this range.
The selected elements can be repeated by default. To prevent
repetition, set the parameter replace=False.
Arange
The np.arange() function generates a sequence of numerical
values, such as integers or floats, based on the provided range
and step size.
It accepts a flexible number of positional arguments:
1. np.arange(start, stop): Generates numbers starting from
start (inclusive) to stop (exclusive).
2. np.arange(start, stop, step): Additionally specifies a step size
to control the spacing between consecutive values.

The arange function is specifically designed for numerical values


and does not work with non-numeric types like strings or objects.

Identity Matrix
An identity matrix is a square matrix where all diagonal elements
are 1, and all off-diagonal elements are 0. It is widely used in data
science, especially in linear algebra and machine learning, for
matrix operations and solving systems of equations.

Reshape
The reshape function is used to change or modify the shape of a
NumPy array without altering its data.
The product of the dimensions specified in the reshape arguments
must equal the total number of elements in the original array.
now lets try to reshape the existing array
Scaler operations on array
A scalar operation involves a single number (scalar) operating on
every element of an array. These operations are fundamental
in data science for data preprocessing, feature scaling,
normalization, and other numerical transformations.
Arithmetic operation
Arithmetic operations refer to performing arithmetic calculations
like addition, subtraction, multiplication, division or modulo
directly on each element within a NumPy array. These operations
are applied uniformly across all elements, enabling efficient and
concise computations that are crucial in data science workflows.
Relational operation
Relational operators, also referred to as comparison operators,
are used to evaluate the relationship between values within a
dataset. These operators return a Boolean array where each
element is either True or False, based on whether the specified
condition is satisfied for the operands being compared.
Vector Operation

You might also like