Python For Data Science
You learnt about Python's two most essential and popular libraries, NumPy and Pandas.
You studied NumPy arrays in different dimensions and performed various mathematical operations on them. NumPy offers an extensive library of high-level mathematical functions that operate efficiently on arrays and matrices.
Then, you learnt about Pandas, which is built on top of NumPy. Pandas allows you to slice, index, and perform other DataFrame operations that are useful for cleaning and analysing data.
NumPy                                      Pandas
Create 1D, 2D, 3D arrays                   Rows and columns in a DataFrame
Operations on 1-D arrays                   Indexing and slicing
Mathematical operations on NumPy arrays    Operations on DataFrames
NumPy vs lists in Python                   Groupby functions
                                           Merging two DataFrames
                                           Pivot table

1. What is NumPy?
2. How is vstack() different from hstack() in NumPy?
3. List the advantages NumPy arrays have over (nested) Python lists.
4. How do you convert a Pandas DataFrame to a NumPy array?
5. What are the different types of data structures in Pandas?
6. What are the most important features of the Pandas library?
7. How do you get the frequency count of the unique items in a Series?
8. What are the different ways of creating a DataFrame in Pandas? Explain with examples.
9. How are loc and iloc different in Pandas?
10. How does the groupby() method work in Pandas?
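Questions 4 and 7 each come down to a single method call. The sketch below assumes a small hypothetical DataFrame (the `city`/`visits` columns and their values are invented for illustration) and uses `DataFrame.to_numpy()` and `Series.value_counts()`.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"city": ["Pune", "Delhi", "Pune", "Agra"],
                   "visits": [3, 5, 2, 5]})

# Question 4: DataFrame -> NumPy array.
visits_arr = df[["visits"]].to_numpy()   # one numeric column -> int array, shape (4, 1)
mixed_arr = df.to_numpy()                # mixed str/int columns -> dtype=object
print(visits_arr.shape, mixed_arr.dtype)

# Question 7: frequency of each unique value in a Series (most frequent first).
counts = df["city"].value_counts()
print(counts)
```

Note the design trade-off: converting a DataFrame with mixed column types produces an `object` array, so selecting numeric columns first keeps a fast numeric dtype.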
NUMPY(import numpy as np)
Indexing and slicing
Syntax: array[index]

1D array
a[:]          #[20, 24, 28, 32, 36, 40]  #Selects everything
a[2:5]        #[28, 32, 36]  #Selects the elements at indices 2 to 4 (the stop index 5 is excluded)

2D array
b[:,:]        #[[1.5, 2., 3.], [4., 5., 6.]]  #Selects all rows and all columns
b[:,0]        #[1.5, 4.]  #Selects all rows, and the zeroth column
b[0,:]        #[1.5, 2., 3.]  #Selects the zeroth row, and all columns in that row
b[0:2,:]      #[[1.5, 2., 3.], [4., 5., 6.]]  #Selects the zeroth and first rows, but NOT the second
b[0:2,0:2]    #[[1.5, 2.], [4., 5.]]  #Selects the zeroth and first rows, and the zeroth and first columns

Array Manipulation
a1 = np.array([20, 21, 22, 23, 24, 25])
a2 = np.arange(6)             #[0, 1, 2, 3, 4, 5]
a1.reshape(2,3)               #Reshapes an array without changing its data
                              #[[20, 21, 22],
                              # [23, 24, 25]]
np.concatenate((a1, a2))      #Concatenates arrays
np.hstack((a1, a2))           #Stacks arrays horizontally
                              #[20, 21, 22, 23, 24, 25, 0, 1, 2, 3, 4, 5]
np.vstack((a1, a2))           #Stacks arrays vertically
                              #[[20, 21, 22, 23, 24, 25],
                              # [ 0,  1,  2,  3,  4,  5]]
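The manipulation calls above can be checked with a short runnable sketch. It reuses `a1` and `a2` from the sheet; the extra 2D stacking lines are an added illustration of how `hstack` and `vstack` differ once the inputs are 2D (question 2).

```python
import numpy as np

a1 = np.array([20, 21, 22, 23, 24, 25])
a2 = np.arange(6)                      # values 0..5

grid = a1.reshape(2, 3)                # same 6 values, new shape (2, 3)
joined = np.concatenate((a1, a2))      # 1D result, length 12
side_by_side = np.hstack((a1, a2))     # for 1D inputs, same values as concatenate
stacked = np.vstack((a1, a2))          # each 1D input becomes a row -> shape (2, 6)

# With 2D inputs: hstack joins along columns, vstack along rows.
wide = np.hstack((grid, grid))         # shape (2, 6)
tall = np.vstack((grid, grid))         # shape (4, 3)

print(grid)
print(side_by_side)
print(stacked.shape, wide.shape, tall.shape)
```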
Note: For arrays with three or more dimensions, slicing works the same way, with one index or slice per axis.
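A minimal sketch of the slicing rules, using the `a` and `b` arrays from the sheet plus an assumed 3D array `c` (built with `np.arange(24).reshape(2, 3, 4)` purely for illustration):

```python
import numpy as np

a = np.array([20, 24, 28, 32, 36, 40])
b = np.array([[1.5, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(a[2:5])            # values 28, 32, 36 (stop index excluded)
print(b[:, 0])           # values 1.5, 4.0
print(b[0:2, 0:2])       # top-left 2x2 block

# 3D: one index or slice per axis, in the order (block, row, column).
c = np.arange(24).reshape(2, 3, 4)
col0_of_block1 = c[1, :, 0]        # column 0 of every row in block 1
sub = c[:, 0:2, 1:3]               # both blocks, rows 0-1, columns 1-2
print(col0_of_block1, sub.shape)
```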
PANDAS(import pandas as pd)
Series
1    3
2   -5
3    7
4    4
dtype: int64

DataFrame
df = pd.read_csv('country.csv', index_col=0)   #Reads a CSV file; the first column becomes the row index
India     New Delhi   1393.40
Brazil    Brasília     201.00
Canada    Ottawa        38.23

#Conditional operator
df[df['Population_in_millions']>100]   #Keeps only the rows where the condition is True
Country_name
0    India

#iloc selects rows and columns at specific integer positions
df.iloc[1,2]   #Element at row position 1 and column position 2 (counting from 0)
214.0
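To experiment without `country.csv`, the sketch below builds a comparable DataFrame in memory. Its layout is an assumption: a default integer index, a hypothetical `Capital` column name, and values copied from the printed table. It shows boolean filtering and the difference between `iloc` (positions) and `loc` (labels), which is question 9.

```python
import pandas as pd

# In-memory stand-in for pd.read_csv('country.csv', ...); the column name
# "Capital" and the default 0..2 index are assumptions for this sketch.
df = pd.DataFrame({
    "Country_name": ["India", "Brazil", "Canada"],
    "Capital": ["New Delhi", "Brasília", "Ottawa"],
    "Population_in_millions": [1393.40, 201.00, 38.23],
})

# Boolean mask: True where the condition holds; df[mask] keeps those rows.
mask = df["Population_in_millions"] > 100
big = df[mask]
print(big)

# iloc: integer positions (row 1, column 2, counting from 0).
print(df.iloc[1, 2])
# loc: labels; with the default 0..n-1 index, label 1 is also position 1.
print(df.loc[1, "Population_in_millions"])

# Slices differ: iloc stops before position 2, loc includes label 2.
print(len(df.iloc[0:2]), len(df.loc[0:2]))
```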
Statistical summary in Pandas
df.sum()                   #Sum of the values in each column
df.cumsum()                #Cumulative sum of the values in each column
df.min()/df.max()          #Minimum/maximum value of each column
df.idxmin()/df.idxmax()    #Index label of the minimum/maximum value of each column
df.mean()                  #Mean of each column
df.median()                #Median of each column
df.std()                   #Standard deviation of each column

Merging two DataFrames in Pandas
df_1.merge(df_2, on = ['column_1', 'column_2'], how = '____')
The parameter 'how' specifies the type of merge to perform: 'inner' (the default), 'left', 'right', 'outer' or 'cross'.
Merges are of several types, as shown in the figure:
[Figure: merge types: LEFT JOIN, FULL OUTER JOIN, LEFT JOIN (if NULL)]
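A small sketch of the summary functions and the merge types, on two hypothetical tables (`sales` and `regions` are invented for illustration). The "LEFT JOIN (if NULL)" case from the figure is sketched with `indicator=True`, which adds a `_merge` column marking where each row came from.

```python
import pandas as pd

# Hypothetical tables for illustration.
sales = pd.DataFrame({"store": ["A", "B", "C"], "units": [10, 30, 20]})
regions = pd.DataFrame({"store": ["A", "B", "D"], "region": ["North", "South", "East"]})

# Summary statistics: max gives the value, idxmax gives its index label.
units = sales.set_index("store")["units"]
print(units.max(), units.idxmax())
print(units.cumsum().tolist())          # running total

# Which stores survive each merge type.
results = {how: sorted(sales.merge(regions, on="store", how=how)["store"])
           for how in ["inner", "left", "right", "outer"]}
print(results)

# LEFT JOIN (if NULL): left rows with no match on the right.
flagged = sales.merge(regions, on="store", how="left", indicator=True)
left_only = flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")
print(left_only)
```

Unmatched rows in left, right and outer merges get NaN in the columns that come from the other table.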
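Pivot Table: #Summarises a DataFrame. A pivot table aggregates like the groupby function, but presents the result as a structured, simplified grid
df.pivot_table(index='grouping_variable_row', columns='grouping_variable_col',
               values='value_to_aggregate', aggfunc='mean')
#df.pivot() only reshapes the data and does not aggregate; it raises an error if an index/column pair occurs more than once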
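The link between `groupby` and a pivot table (question 10) can be sketched on hypothetical long-format data (the `year`/`city`/`sales` columns are invented). Both compute the same group means; the pivot table lays them out as a grid.

```python
import pandas as pd

# Hypothetical long-format data: (2022, "Pune") appears twice.
df = pd.DataFrame({
    "year": [2021, 2021, 2022, 2022, 2022],
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Pune"],
    "sales": [10, 20, 30, 40, 50],
})

# groupby: split rows into (year, city) groups, then take the mean of each group.
by_group = df.groupby(["year", "city"])["sales"].mean()

# pivot_table: same aggregation, rows = year, columns = city.
table = df.pivot_table(index="year", columns="city", values="sales", aggfunc="mean")

print(by_group)
print(table)
```

Because (2022, "Pune") occurs twice, `df.pivot(index="year", columns="city", values="sales")` would raise a `ValueError` on this data, while `pivot_table` averages the duplicates.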