The document provides a comprehensive guide on using Python for data science, including installation commands for essential packages like ipykernel, numpy, pandas, and matplotlib. It covers topics such as data acquisition, exploration, and ethical considerations in data science, alongside practical activities like personality prediction and dataset creation. Additionally, it explains key concepts in data science, data formats, and the functionalities of various Python libraries for data manipulation and visualization.
Commands

Command to install ipykernel with Jupyter Notebook:
>>> conda install ipykernel nb_conda jupyter

To launch Jupyter Notebook:
>>> jupyter notebook

Creating a virtual environment:
>>> conda create -n env python=3.7

Activating the virtual environment:
>>> conda activate env

Installing packages:
>>> conda install numpy
>>> conda install pandas
>>> conda install matplotlib

Data Sciences

Topics to be covered:
• Introduction to Data Science
• Applications of Data Science
• Python for Data Sciences
• Hands-on: Statistical Learning & Data Visualisation
• Activity: Personality Prediction
• Understanding the K-nearest neighbour model
• Ethical issues around Data Science

What is Data?
Data can be defined as a representation of facts or instructions about some entity (students, school, sports, business, animals, etc.) that can be processed or communicated by humans or machines. Data is a collection of facts, such as numbers, words, pictures, audio clips, videos, maps, measurements, observations or even just descriptions of things. Data may be represented with the help of characters such as alphabets (A-Z, a-z), digits (0-9) or special characters (+, -, /, *, <, >, =, etc.).

Humans are social animals. We tend to organise and/or participate in various kinds of social gatherings all the time. We love eating out with friends and family, which is why we can find restaurants almost everywhere, and many of these restaurants arrange buffets to offer a variety of food items to their customers. Be it small shops or big outlets, every restaurant prepares food in bulk as they expect a good crowd to come and enjoy their food. But in most cases, after the day ends, a lot of food is left over, which becomes unusable for the restaurant as they do not wish to serve stale food to their customers the next day. So, every day, they prepare food in large quantities keeping in mind the probable number of customers walking into their outlet. But if the expectations are not met, a good amount of food gets wasted, which eventually becomes a loss for the restaurant as they either have to dump it or give it to hungry people for free. And if this daily loss is taken into account for a year, it becomes quite a big amount.
Problem Scoping
Problem Statement Template
Data Acquisition
Data Exploration

Data Science
Data Science is a concept to unify statistics, data analysis, machine learning and their related methods in order to understand and analyse actual phenomena with data. It works around analysing the data, and when it comes to AI, the analysis helps in making the machine intelligent enough to perform tasks by itself. Explain how each of these fields uses Data Science.

Activity
Let us create a students' dataset for your class (the one given below is a sample; you can create one of your own).
• Does this dataset tell you a story?
• Do you think it mirrors an association between marks obtained and attendance?
• Can you extract 5 observations from this dataset?

Data Collection
While accessing data from any of the data sources, the following points should be kept in mind:
1. Only data which is available for public usage should be taken up.
2. Personal datasets should only be used with the consent of the owner.
3. One should never breach someone's privacy to collect data.
4. Data should only be taken from reliable sources, as data collected from random sources can be wrong or unusable.
5. Reliable sources of data ensure the authenticity of the data, which helps in proper training of the AI model.

Sources of Data

Types of Data Formats
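As a rough sketch of the dataset activity above, a small students' dataset can be built with pandas. The column names and all the values here are invented purely for illustration; your class dataset will look different:

```python
import pandas as pd

# Hypothetical students' dataset: names, attendance and marks are made up
data = {
    "Name": ["Asha", "Ravi", "Meena", "Karan", "Divya"],
    "Attendance": [95, 60, 88, 45, 76],   # percentage of classes attended
    "Marks": [89, 55, 80, 40, 70],        # marks obtained out of 100
}
df = pd.DataFrame(data)
print(df)

# One quick observation: does attendance move together with marks?
print("Correlation:", df["Attendance"].corr(df["Marks"]))
```

A correlation close to 1 would suggest the dataset mirrors an association between marks obtained and attendance, which is exactly the kind of story the activity asks you to look for.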
1. CSV: CSV stands for Comma Separated Values. It is a simple text file format in which each line is a data record and the values within a record are separated by commas.
2. Spreadsheet
3. SQL: SQL, also known as Structured Query Language, is a programming language used to store, manipulate and retrieve data in relational databases.

Python Packages
Introduction to Lists
Practical and demonstration

NumPy
NumPy stands for 'Numerical Python'. It is a package for data analysis and scientific computing with Python, and is commonly used for working with numbers. NumPy provides a wide range of arithmetic operations on numbers, giving us an easier approach to working with them. NumPy also works with arrays (homogeneous collections of data). In NumPy, the arrays used are known as ND-arrays (N-Dimensional Arrays), as NumPy comes with the feature of creating n-dimensional arrays in Python.

Creation of a NumPy Array from a List
import numpy as np
# The NumPy array() function converts a given list into an array.
# For example, create an array called array1 from the given list.
array1 = np.array([10, 20, 30])
# Display the contents of the array
print(array1)
# output: [10 20 30]

Pandas (Panel Data)
Pandas is a software library written for data manipulation and analysis. It offers data structures and operations for manipulating numerical tables and time series. Pandas is well suited for many different kinds of data:
• Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
• Ordered and unordered (not necessarily fixed-frequency) time series data
• Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
• Any other form of observational / statistical data sets; the data actually need not be labelled at all to be placed into a pandas data structure

Here are just a few of the things that pandas does well:
• Easy handling of missing data (represented as NaN)
• Size mutability: columns can be inserted and deleted from DataFrame and higher-dimensional objects
• Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc.
automatically align the data for you in computations
• Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
• Intuitive merging and joining of data sets
• Flexible reshaping and pivoting of data sets

Matplotlib
Matplotlib is a visualization library in Python for 2D plots of arrays. It is a multi-platform data visualization library built on NumPy arrays. Matplotlib comes with a wide variety of plots. Plots help us to understand trends and patterns, and to make correlations.

Package Installation
conda install numpy / pip install numpy
conda install pandas / pip install pandas
conda install matplotlib / pip install matplotlib
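As a minimal sketch of how Matplotlib plots NumPy arrays, the example below draws a simple 2D line plot. The monthly sales figures are invented for illustration, and the plot is saved to a file rather than shown in a window:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without needing a display window
import matplotlib.pyplot as plt

# Invented monthly sales figures, just to illustrate a line plot of an array
months = np.arange(1, 7)
sales = np.array([12, 15, 11, 18, 21, 19])

plt.plot(months, sales, marker="o")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.title("Monthly sales trend")
plt.savefig("sales_trend.png")  # save the plot as an image file
```

In a Jupyter Notebook you would typically call plt.show() instead of plt.savefig() to display the plot inline.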
Functions performed on a NumPy array:
ARR = numpy.array([1,2,3,4,5])

DataFrame: Reading a CSV file
import pandas as pd
df = pd.read_csv(r"path/filename.csv")
print(df)
print(df.head())      # display top five rows
print(df.head(10))    # display top ten rows
print(df.tail(10))    # display bottom ten rows
print(df.dtypes)

DataFrame: Handling missing values
# to remove all rows with null or empty values
newdf1 = df.dropna()
# to make the changes in the original dataframe
df.dropna(inplace = True)
# to fill null values with some value
newdf2 = df.fillna(newValue)
# to fill nulls only in a specified column (returns a Series)
newdf3 = df["ColumnName"].fillna(newValue)

DataFrame: Statistical functions
# calculating mean
mn = df["ColumnName"].mean()
# to fill the null values with the mean value
newdf4 = df["ColumnName"].fillna(mn)
# calculating median
med = df["ColumnName"].median()
# calculating mode
md = df["ColumnName"].mode()

DataFrame: Aggregation
# calculating sum
s = df.sum(axis = 0, skipna = True)   # column-wise
df.sum(axis = 1, skipna = True)       # row-wise
# to find the minimum value
df.min(axis = 0)
df.min(axis = 1)
# calculating the maximum
df.max(axis = 0)
df.max(axis = 1)
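The ARR array above can be explored with a few common NumPy functions. This is a minimal sketch; the functions chosen are typical examples, not an exhaustive list:

```python
import numpy as np

ARR = np.array([1, 2, 3, 4, 5])

print(ARR.sum())    # sum of all elements: 15
print(ARR.mean())   # average value: 3.0
print(ARR.min())    # smallest element: 1
print(ARR.max())    # largest element: 5
print(ARR.shape)    # dimensions of the array: (5,)
print(ARR * 2)      # element-wise arithmetic: [ 2  4  6  8 10]
```

Note that arithmetic like ARR * 2 applies to every element at once, which is the main convenience NumPy arrays offer over plain Python lists.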