Python For Data Science

Uploaded by

shobit98200
Copyright
© © All Rights Reserved
PYTHON FOR DATA SCIENCE

You learnt about two of Python's most essential and popular libraries, NumPy and Pandas.
You studied NumPy arrays in different dimensions and performed various mathematical operations on them. NumPy offers an enormous library of high-level mathematical functions that operate efficiently on arrays and matrices.
Then, you learnt about Pandas, which is built on top of NumPy. Pandas allows you to slice, index, and perform other DataFrame operations that are useful for cleaning and analysing data.
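
As a rough illustration of the two points above (NumPy operating on whole arrays at once, and Pandas sitting on top of NumPy), here is a minimal sketch. The variable names and the column names "x" and "x_div_4" are made up for this example.

```python
import numpy as np
import pandas as pd

values = [20, 24, 28, 32, 36, 40]

# With a plain list, the division has to be written as an explicit loop.
halved_list = [v / 4 for v in values]

# With a NumPy array, the same operation applies to every element at once.
arr = np.array(values)
halved_arr = arr / 4

# A DataFrame can be built from NumPy arrays and converted back to one.
df = pd.DataFrame({"x": arr, "x_div_4": halved_arr})
as_array = df.to_numpy()

print(halved_arr)       # [ 5.  6.  7.  8.  9. 10.]
print(as_array.shape)   # (6, 2)
```

to_numpy() is also one way to answer the interview question about converting a DataFrame to a NumPy array.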

Common Interview Questions: PYTHON FOR DATA SCIENCE

1. What is NumPy?
2. How is vstack() different from hstack() in NumPy?
3. List the advantages NumPy arrays have over (nested) Python lists.
4. How do you convert a Pandas DataFrame to a NumPy array?
5. What are the different types of data structures in Pandas?
6. What are the most important features of the Pandas library?
7. How do you get the frequency count of the unique items in a Series?
8. What are the different ways of creating a DataFrame in Pandas? Explain with examples.
9. How are loc and iloc different in Pandas?
10. How does the groupby() method work in Pandas?

NumPy                                      Pandas
Create 1D, 2D, 3D arrays                   Rows and columns in a DataFrame
Operations on 1-D arrays                   Indexing and slicing
Mathematical operations on NumPy arrays    Operations on DataFrames
NumPy vs lists in Python                   Groupby functions
                                           Merging two DataFrames
                                           Pivot table

NUMPY(import numpy as np)

NumPy array

[Figure: a 1D array (1 2 3), a 2D array (rows 1.5 2 3 and 4 5 6) with axis 0 and axis 1, and a 3D array with axis 0, axis 1 and axis 2]

a = np.array([20,24,28,32,36,40]) #1D array
b = np.array([(1.5,2,3),(4,5,6)], dtype = float) #2D array
c = np.array([[(1.5,2,3),(4,5,6)],[(3,2,1),(4,5,6)]], dtype = float) #3D array

Indexing

Syntax: array[index]

Elements of a       20  24  28  32  36  40
Positive indexing    0   1   2   3   4   5
Negative indexing   -6  -5  -4  -3  -2  -1

Mathematical operations:
a.sum() #180; Sum of all elements
a.min() #20; To find minimum
a.max() #40; To find maximum
a.mean() #30.0; Average of all numbers
a/4 #[5., 6., 7., 8., 9., 10.] #Element-wise operation

Slicing

1D array
a[:] #[20, 24, 28, 32, 36, 40] #Selects everything
a[2:5] #[28, 32, 36] #Selects the elements at indices 2 through 4 (does not include index 5)

2D array
b[:,:] #[[1.5, 2., 3.],[4., 5., 6.]] #Selects all rows and all columns
b[:,0] #[1.5, 4.] #Selects all rows, and the zeroth column
b[0,:] #[1.5, 2., 3.] #Selects the zeroth row, and all columns in that row
b[0:2,:] #[[1.5, 2., 3.],[4., 5., 6.]] #Selects the zeroth and first row, but NOT the second
b[0:2,0:2] #[[1.5, 2.],[4., 5.]] #Selects the zeroth and first row, and the zeroth and first column

Note: For three- or more-dimensional arrays, the slicing method remains similar.

Array Manipulation
a1 = np.array([20, 21, 22, 23, 24, 25])
a2 = np.arange(6) #[0, 1, 2, 3, 4, 5]
a1.reshape(2,3) #Reshaping arrays without changing data
#[[20, 21, 22],
# [23, 24, 25]]
np.concatenate((a1, a2)) #Concatenate arrays
np.hstack((a1, a2)) #Stack arrays horizontally
#[20, 21, 22, 23, 24, 25, 0, 1, 2, 3, 4, 5]
np.vstack((a1, a2)) #Stack arrays vertically
#[[20, 21, 22, 23, 24, 25],
# [ 0, 1, 2, 3, 4, 5]]
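
The indexing, slicing and stacking calls in this section can be checked with a short sketch. It reuses the a1 and a2 arrays from the sheet; the names h, v and r are only for this example.

```python
import numpy as np

a1 = np.array([20, 21, 22, 23, 24, 25])
a2 = np.arange(6)

# Positive and negative indices address the same elements from either end.
print(a1[0], a1[-6])    # 20 20
print(a1[5], a1[-1])    # 25 25

# A slice includes the start index and excludes the stop index.
print(a1[2:5])          # [22 23 24]

# hstack joins 1D arrays end to end; vstack makes each one a row.
h = np.hstack((a1, a2))
v = np.vstack((a1, a2))
print(h.shape)          # (12,)
print(v.shape)          # (2, 6)

# reshape changes the shape but not the data; 2D slicing then works per axis.
r = a1.reshape(2, 3)
print(r[:, 0])          # [20 23]
```

The shapes are the quickest way to tell hstack() and vstack() apart: for two 1D arrays of length 6, one returns a single array of length 12, the other a 2 x 6 array.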
PANDAS(import pandas as pd)

NumPy array | Pandas Series | Pandas DataFrame

Pandas Series:
s = pd.Series([3,-5,7,4], index=[1,2,3,4])

Output

1    3
2   -5
3    7
4    4
dtype: int64

Create a DataFrame from a dictionary:
Syntax: pd.DataFrame(dictionary_name)

Read an external CSV file:
Syntax: pd.read_csv(filepath, sep = ',', header = 'infer')
separator (by default ',')
header (takes the top row by default, if not specified)
names (list of column names)

df = pd.read_csv('country.csv',index_col=0)

Country_name   Capital     Population_in_millions
India          New Delhi   1393.40
Brazil         Brasília     201.00
Canada         Ottawa        38.23

Basic information about DataFrame

df.info() #Information about the DataFrame (index, columns, dtypes, non-null counts)
df.describe() #Statistical summary: count, mean, std, min, max and percentiles
df.head() #Returns the first five rows of a DataFrame
df.sort_index() #Sorts the rows by their index labels
df[start_index:end_index] #Subset the rows according to the start and end indices

#Conditional operator
df[df['Population_in_millions']>100]

   Country_name   Capital     Population_in_millions
0  India          New Delhi   1393.4
1  Brazil         Brasília     201.0

loc vs iloc in Pandas DataFrame
#The examples below assume df has the default integer index (0, 1, 2), i.e. it was read without index_col

#loc selects rows and columns with specific labels
df.loc[[0,1], ['Country_name']]

   Country_name
0  India
1  Brazil

#iloc selects rows and columns at specific integer positions
df.iloc[1,2] #Element in the second row and third column (positions 1 and 2); 201.0 for the table above
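
To make the loc/iloc distinction concrete, here is a minimal sketch that rebuilds the country table from a dictionary instead of reading 'country.csv' (an assumption made so the example is self-contained). The names by_label, by_position, indexed and large are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Country_name": ["India", "Brazil", "Canada"],
    "Capital": ["New Delhi", "Brasília", "Ottawa"],
    "Population_in_millions": [1393.40, 201.00, 38.23],
})

# loc works with labels: row labels 0 and 1, column label "Country_name".
by_label = df.loc[[0, 1], ["Country_name"]]

# iloc works with integer positions: second row, third column.
by_position = df.iloc[1, 2]

# With a non-integer index the difference is easier to see.
indexed = df.set_index("Country_name")
capital_by_label = indexed.loc["Canada", "Capital"]
capital_by_position = indexed.iloc[2, 0]

# Boolean (conditional) selection keeps the rows where the mask is True.
large = df[df["Population_in_millions"] > 100]

print(by_label["Country_name"].tolist())       # ['India', 'Brazil']
print(by_position)                             # 201.0
print(capital_by_label, capital_by_position)   # Ottawa Ottawa
print(large["Country_name"].tolist())          # ['India', 'Brazil']
```

With the default integer index, labels and positions happen to coincide, which is why loc and iloc are easy to confuse. After set_index() they no longer do.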
Statistical summary in Pandas
df.sum() #Sum of the values of each object
df.cumsum() #Cumulative sum of the values of each object
df.min()/df.max() #Min/max value of each object
df.idxmin()/df.idxmax() #Index label of the min/max value of each object
df.mean() #Mean of each object
df.median() #Median of each object
df.std() #Standard deviation of each object

GroupBy function:
DataFrame.groupby(by=['col_name'])
df.groupby(by="col") #Returns a GroupBy object, grouped by values in the column named "col".
df.groupby(level="ind") #Returns a GroupBy object, grouped by values in the index level named "ind".

Merging two DataFrames in Pandas
df_1.merge(df_2, on = ['column_1', 'column_2'], how = '____')
The parameter 'how' specifies the type of merge that is to be performed.
Merges are of several types as shown below:

[Figure: join diagrams for LEFT JOIN, RIGHT JOIN, INNER JOIN, FULL OUTER JOIN, LEFT JOIN (if NULL) and RIGHT JOIN (if NULL)]

left: Keeps all the keys from the first DataFrame, with matching entries from the second.
right: Keeps all the keys from the second DataFrame, with matching entries from the first.
outer: Union of the keys from both DataFrames.
inner: Intersection of the keys from both DataFrames.
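
The effect of 'how' is easiest to see on two small frames. The sketch below uses made-up data: the column names "key", "left_val" and "right_val" are hypothetical. The last part shows one possible way to get the "LEFT JOIN (if NULL)" case, using the indicator parameter of merge.

```python
import pandas as pd

df_1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df_2 = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Count the rows each merge type produces on the same pair of frames.
row_counts = {
    how: len(df_1.merge(df_2, on="key", how=how))
    for how in ["left", "right", "inner", "outer"]
}
print(row_counts)  # {'left': 3, 'right': 3, 'inner': 2, 'outer': 4}

# "LEFT JOIN (if NULL)": rows of df_1 that have no match in df_2.
# indicator=True adds a "_merge" column that records where each row came from.
marked = df_1.merge(df_2, on="key", how="left", indicator=True)
left_only = marked[marked["_merge"] == "left_only"]
print(left_only["key"].tolist())  # ['a']
```

Keys "b" and "c" appear in both frames, so the inner merge has two rows, and the outer merge has four (a, b, c, d) with NaN where a side has no match.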

Pivot Table: #Summarises a DataFrame. A pivot table works like the groupby function, but it presents the data in a structured and simplified manner.

df.pivot(columns='grouping_variable_col', values='value_to_aggregate', index='grouping_variable_row')
#pivot only reshapes the data; it raises an error if an index/column pair occurs more than once. Use pivot_table to aggregate.

df.pivot_table(values, index, aggfunc={'value_1': 'mean', 'value_2': ['min', 'max', 'mean']})
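
The relation between groupby and pivot_table can be sketched on a small, invented sales table. The column names "region", "product" and "units" are assumptions for this example, not part of the original sheet.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "units": [10, 20, 30, 40],
})

# groupby: one aggregated value per group, returned as a Series.
totals = sales.groupby("region")["units"].sum()

# pivot_table: the same kind of aggregation, laid out as a grid
# with one row per region and one column per product.
table = sales.pivot_table(values="units", index="region",
                          columns="product", aggfunc="sum")

print(totals["North"])            # 30
print(table.loc["South", "B"])    # 40
print(table.sum(axis=1).tolist()) # [30, 70]
```

Summing the pivot table across its columns gives back the groupby totals, which is the sense in which the two "work alike".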
