Python For Data Science

Uploaded by

shobit98200

PYTHON FOR DATA SCIENCE

You learnt about two of Python's most essential and popular libraries, NumPy and Pandas.
You studied NumPy arrays in different dimensions and performed various mathematical operations on them. NumPy offers an extensive library of high-level mathematical functions that operate efficiently on arrays and matrices.
Then, you learnt about Pandas, which is built on top of NumPy. Pandas allows you to slice, index, and perform other DataFrame operations that are useful for cleaning and analysing data.

NumPy                                      Pandas
Create 1D, 2D, 3D arrays                   Rows and columns in a DataFrame
Operations on 1-D arrays                   Indexing and slicing
Mathematical operations on NumPy arrays    Operations on DataFrames
NumPy vs lists in Python                   Groupby functions
                                           Merging two DataFrames
                                           Pivot table

Common Interview Questions: PYTHON FOR DATA SCIENCE

1. What is NumPy?
2. How is vstack() different from hstack() in NumPy?
3. List the advantages NumPy arrays have over (nested) Python lists.
4. How do you convert a Pandas DataFrame to a NumPy array?
5. What are the different types of data structures in Pandas?
6. What are the most important features of the Pandas library?
7. How do you get the frequency count of the unique items in a Series?
8. What are the different ways of creating a DataFrame in Pandas? Explain with examples.
9. How are loc and iloc different in Pandas?
10. How does the groupby() method work in Pandas?
NUMPY (import numpy as np)

NumPy array indexing

[Figure: a 1D array (1 2 3), a 2D array (rows 1.5 2 3 and 4 5 6, with axis 0 and axis 1) and a 3D array (axis 0, axis 1, axis 2)]

a = np.array([20, 24, 28, 32, 36, 40])

Positive indexing:    0    1    2    3    4    5
Element:             20   24   28   32   36   40
Negative indexing:   -6   -5   -4   -3   -2   -1

Syntax: array[index]
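
Positive and negative indices address the same elements from opposite ends of the array. A minimal sketch using the array a from this sheet (the variable names first, last and same are illustrative only):

```python
import numpy as np

a = np.array([20, 24, 28, 32, 36, 40])

first = a[0]            # positive index 0 -> first element
last = a[-1]            # negative index -1 -> last element
same = a[2] == a[-4]    # index 2 and index -4 point at the same element

# For any k from 1 to len(a), a[-k] is the element at position len(a) - k.
consistent = all(a[-k] == a[len(a) - k] for k in range(1, len(a) + 1))

print(first, last, same, consistent)
```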

Create 1D, 2D, 3D arrays:
a = np.array([20, 24, 28, 32, 36, 40]) #1D array
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float) #2D array
c = np.array([[(1.5, 2, 3), (4, 5, 6)], [(3, 2, 1), (4, 5, 6)]], dtype=float) #3D array

Mathematical operations:
a.sum() #180; Sum of all elements
a.min() #20; To find minimum
a.max() #40; To find maximum
a.mean() #30.0; Average of all numbers
a/4 #[5., 6., 7., 8., 9., 10.] #Element-wise operation
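
The mathematical operations listed here act on the whole array at once, which is one practical difference from a plain Python list. A small sketch, assuming the same six values as the array a (the names values, quarter_array and quarter_list are illustrative and not from the sheet):

```python
import numpy as np

values = [20, 24, 28, 32, 36, 40]   # plain Python list (illustrative)
a = np.array(values)

# Aggregations reduce the whole array to a single number.
total = a.sum()
lowest = a.min()
highest = a.max()
average = a.mean()

# Arithmetic is applied to every element; a list needs an explicit loop.
quarter_array = a / 4
quarter_list = [v / 4 for v in values]

print(total, lowest, highest, average)
print(quarter_array)
```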

Slicing

1D array
a[:] #[20, 24, 28, 32, 36, 40] #Selects everything
a[2:5] #[28, 32, 36] #Selects the elements at indices 2 through 4 (index 5 is not included)

2D array
b[:,:] #[[1.5, 2., 3.], [4., 5., 6.]] #Selects all rows and all columns
b[:,0] #[1.5, 4.] #Selects all rows, and the zeroth column
b[0,:] #[1.5, 2., 3.] #Selects the zeroth row, and all columns in that row
b[0:2,:] #[[1.5, 2., 3.], [4., 5., 6.]] #Selects the zeroth and first row (the stop index 2 is excluded)
b[0:2,0:2] #[[1.5, 2.], [4., 5.]] #Selects the zeroth and first row, and the zeroth and first column

Note: For arrays with three or more dimensions, slicing works the same way, with one index or slice per axis, e.g. c[0,:,1] #[2., 5.]

Array Manipulation
a1 = np.array([20, 21, 22, 23, 24, 25])
a2 = np.arange(6) #[0, 1, 2, 3, 4, 5]
a1.reshape(2, 3) #Reshaping arrays without changing data
#[[20, 21, 22],
# [23, 24, 25]]
np.concatenate((a1, a2)) #Concatenate arrays
np.hstack((a1, a2)) #Stack arrays horizontally
#[20, 21, 22, 23, 24, 25, 0, 1, 2, 3, 4, 5]
np.vstack((a1, a2)) #Stack arrays vertically
#[[20, 21, 22, 23, 24, 25],
# [ 0, 1, 2, 3, 4, 5]]
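
One way to see how reshape, concatenate, hstack and vstack differ is to compare the shapes they return. A minimal sketch using the arrays a1 and a2 from this sheet (the result names are illustrative only):

```python
import numpy as np

a1 = np.array([20, 21, 22, 23, 24, 25])
a2 = np.arange(6)                      # [0, 1, 2, 3, 4, 5]

reshaped = a1.reshape(2, 3)            # same six values as 2 rows x 3 columns
joined = np.concatenate((a1, a2))      # one 1D array of length 12
side_by_side = np.hstack((a1, a2))     # for 1D inputs, same result as concatenate
stacked_rows = np.vstack((a1, a2))     # each input becomes a row: shape (2, 6)

print(reshaped.shape, joined.shape, side_by_side.shape, stacked_rows.shape)
```

For 1D inputs, hstack produces a longer 1D array while vstack produces a 2D array with one row per input.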
PANDAS (import pandas as pd)

[Figure: NumPy array, Pandas Series, Pandas DataFrame]

Pandas Series:
s = pd.Series([3, -5, 7, 4], index=[1, 2, 3, 4])

Output
1    3
2   -5
3    7
4    4
dtype: int64

Create a DataFrame from a dictionary:
Syntax: pd.DataFrame(dictionary_name)

Basic information about DataFrame
df.info() #Information about the DataFrame
df.describe() #To get summary statistics such as count, mean, standard deviation, min, max and percentiles (the 50th percentile is the median)
df.head() #To view the first five rows of a DataFrame
df.sort_index() #Sort the DataFrame by its index labels (also works on a hierarchical index)
df[start_index:end_index] #Subset the rows according to the start and end indices

#Conditional operator
df[df['Population_in_millions']>100]

   Country_name   Capital     Population_in_millions
0  India          New Delhi   1393.4
1  Brazil         Brasília     201.0
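
The dictionary route and the conditional filter can be tried together. A minimal sketch, assuming a hypothetical dictionary named country_data filled with the values from this sheet's country table:

```python
import pandas as pd

# Hypothetical dictionary; the values are copied from the sheet's country table.
country_data = {
    "Country_name": ["India", "Brazil", "Canada"],
    "Capital": ["New Delhi", "Brasília", "Ottawa"],
    "Population_in_millions": [1393.40, 201.00, 38.23],
}
df = pd.DataFrame(country_data)        # keys become column names

first_rows = df.head()                 # first five rows (here, all three)
summary = df.describe()                # count, mean, std, min, quartiles, max

# A boolean mask keeps only the rows where the condition is True.
large = df[df["Population_in_millions"] > 100]

print(large)
```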
Read an external CSV file:
Syntax: pd.read_csv(filepath, sep=',', header='infer')
separator (by default ',')
header (takes the top row by default, if not specified)
names (list of column names)

df = pd.read_csv('country.csv', index_col=0)

Country_name   Capital     Population_in_millions
India          New Delhi   1393.40
Brazil         Brasília     201.00
Canada         Ottawa        38.23

loc vs iloc in Pandas DataFrame
#The examples below assume the default integer row labels 0, 1, 2

#loc selects rows and columns with specific labels
df.loc[[0,1], ['Country_name']]

   Country_name
0  India
1  Brazil

#iloc selects rows and columns at specific integer positions
df.iloc[1,2] #Element in the second row and third column (positions start at 0)
201.0
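
The difference between loc and iloc is easiest to see when the row labels are not integers. A hedged sketch, assuming the country table is rebuilt by hand rather than read from country.csv (the names indexed, capital_by_label and capital_by_position are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Country_name": ["India", "Brazil", "Canada"],
    "Capital": ["New Delhi", "Brasília", "Ottawa"],
    "Population_in_millions": [1393.40, 201.00, 38.23],
})

# Default integer index: labels and positions happen to coincide.
by_label = df.loc[[0, 1], ["Country_name"]]   # rows labelled 0 and 1, one named column
by_position = df.iloc[1, 2]                   # second row, third column

# With country names as the index, labels and positions differ.
indexed = df.set_index("Country_name")
capital_by_label = indexed.loc["Brazil", "Capital"]   # label-based lookup
capital_by_position = indexed.iloc[1, 0]              # position-based, same cell

print(by_position, capital_by_label, capital_by_position)
```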
Statistical summary in Pandas
df.sum() #Sum of the values in each column
df.cumsum() #Cumulative sum of the values in each column
df.min()/df.max() #Min/max value of each column
df.idxmin()/df.idxmax() #Index label of the min/max value in each column
df.mean() #Mean of each column
df.median() #Median of each column
df.std() #Standard deviation of each column

GroupBy function:
DataFrame.groupby(by=['col_name'])
df.groupby(by="col") #Return a GroupBy object, grouped by values in column named "col".
df.groupby(level="ind") #Return a GroupBy object, grouped by values in index level named "ind".

Merging two DataFrames in Pandas
df_1.merge(df_2, on=['column_1', 'column_2'], how='____')
The parameter 'how' specifies the type of merge that is to be performed.
Merges are of several types as shown below:

[Figure: join diagrams for LEFT JOIN, RIGHT JOIN, INNER JOIN, FULL OUTER JOIN, LEFT JOIN (if NULL) and RIGHT JOIN (if NULL)]

left: Uses only the keys from the first DataFrame.
right: Uses only the keys from the second DataFrame.
outer: Union of the keys from both DataFrames.
inner: Intersection of the keys from both DataFrames.
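
The effect of each merge type shows up in the number of rows returned. A minimal sketch with two hypothetical frames sharing a column named key (all names and values here are illustrative). Reading "LEFT JOIN (if NULL)" as "rows of the first frame with no match in the second" is an assumption about the figure label, implemented with the indicator option of merge:

```python
import pandas as pd

# Hypothetical frames that share a key column.
df_1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df_2 = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Row count for each merge type.
sizes = {
    how: len(df_1.merge(df_2, on="key", how=how))
    for how in ("left", "right", "inner", "outer")
}
print(sizes)

# Assumed meaning of "LEFT JOIN (if NULL)": keep rows found only in df_1.
flagged = df_1.merge(df_2, on="key", how="left", indicator=True)
left_only = flagged[flagged["_merge"] == "left_only"]
print(left_only["key"].tolist())
```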

Pivot Table: #Summarise a DataFrame. A pivot table works like the groupby function, but it presents the data in a structured and simplified manner

df.pivot(columns='grouping_variable_col', values='value_to_aggregate', index='grouping_variable_row') #Reshapes only; raises an error if an index/column pair occurs more than once

df.pivot_table(values, index, aggfunc={'value_1': 'mean', 'value_2': ['min', 'max', 'mean']}) #Aggregates duplicate entries with the given functions
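
The link between groupby and a pivot table can be checked by computing the same totals both ways. A small sketch on hypothetical sales records (the frame sales and its column names are illustrative, not from the sheet):

```python
import pandas as pd

# Hypothetical sales records; the first two rows share region and product.
sales = pd.DataFrame({
    "region": ["North", "North", "North", "South", "South"],
    "product": ["A", "A", "B", "A", "B"],
    "units": [10, 5, 4, 7, 9],
})

# groupby: one total per (region, product) pair, returned as a Series.
grouped = sales.groupby(["region", "product"])["units"].sum()

# pivot_table: the same totals laid out as a region x product grid.
table = sales.pivot_table(values="units", index="region",
                          columns="product", aggfunc="sum")

print(grouped)
print(table)
```

Both results hold the same numbers; the pivot table spreads one grouping variable across the columns, which is often easier to read.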
