EDA_CODE_SNIPPETS

The document provides a comprehensive collection of code snippets for Exploratory Data Analysis (EDA) using Pandas, NumPy, Matplotlib, and Seaborn. It includes over 30 operations for each library, covering data manipulation, statistical analysis, and various visualizations. Each snippet is accompanied by a brief description of its function, making it a useful reference for data analysis tasks.

Uploaded by

Nakshi Shah
Copyright
© All Rights Reserved

Comprehensive EDA Code Snippets with Descriptions

Pandas Snippets (30+ Operations)


• Display first 5 rows of DataFrame:
df.head()

• Display last 5 rows of DataFrame:
df.tail()

• Get summary statistics:
df.describe()

• Find the mean of a column:
df['column_name'].mean()

• Find the median of a column:
df['column_name'].median()

• Find the mode of a column:
df['column_name'].mode()[0]

• Calculate the variance of a column:
df['column_name'].var()

• Find the standard deviation of a column:
df['column_name'].std()

• Find the covariance matrix (numeric columns only):
df.cov(numeric_only=True)

• Calculate the correlation matrix (numeric columns only):
df.corr(numeric_only=True)

• Find unique values in a column:
df['column_name'].unique()

• Find value counts in a column:
df['column_name'].value_counts()

• Rename a column:
df.rename(columns={'old_name': 'new_name'}, inplace=True)

• Filter rows based on condition:
df[df['column_name'] > 10]

• Group by a column and compute the mean of the numeric columns:
df.groupby('column_name').mean(numeric_only=True)

• Create a new column based on operation:
df['new_col'] = df['col1'] + df['col2']

• Drop rows with missing values:
df.dropna(inplace=True)

• Fill missing values with mean:
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())

• Filter rows by multiple conditions:
df[(df['col1'] > 10) & (df['col2'] == 'value')]

• Reset index of DataFrame:
df.reset_index(drop=True, inplace=True)

• Sort DataFrame by column:
df.sort_values(by='column_name', ascending=False)

• Check for missing values:
df.isnull().sum()

• Convert column to datetime:
df['column_name'] = pd.to_datetime(df['column_name'])

• Create pivot table:
df.pivot_table(values='col1', index='col2', columns='col3')

• Find duplicates in a column:
df[df.duplicated(['column_name'])]

• Drop duplicates:
df.drop_duplicates(inplace=True)

• Apply function to a column:
df['new_column'] = df['column_name'].apply(lambda x: x * 2)

• Create dummy variables for categorical columns:
pd.get_dummies(df['category_column'], drop_first=True)

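
The Pandas snippets above assume a DataFrame named df already exists. The following is a minimal sketch of how several of them could be chained on a small table. The data and the column names (city, sales, units, revenue) are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "city": ["A", "B", "A", "B", "A"],
    "sales": [10.0, 20.0, np.nan, 40.0, 30.0],
    "units": [1, 2, 3, 4, 5],
})

# Count missing values per column.
missing = df.isnull().sum()

# Fill missing values with the column mean (plain assignment, no chained inplace).
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Create a new column, then filter rows on two conditions.
df["revenue"] = df["sales"] * df["units"]
big_a = df[(df["sales"] > 10) & (df["city"] == "A")]

# Group by a column and average the numeric columns.
city_means = df.groupby("city").mean(numeric_only=True)

print(missing)
print(big_a)
print(city_means)
```

The fill step uses assignment rather than inplace=True on a selected column, since the chained in-place form may not modify the original DataFrame in recent pandas versions.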
NumPy Snippets (30+ Operations)
• Create an array:
np.array([1, 2, 3])

• Create a zeros array:
np.zeros((3, 3))

• Create an identity matrix:
np.eye(3)

• Generate random numbers:
np.random.rand(3, 3)

• Generate random integers:
np.random.randint(0, 100, size=(5, 5))

• Find the mean of an array:
np.mean(arr)

• Find the median of an array:
np.median(arr)

• Find the variance of an array:
np.var(arr)

• Find the standard deviation:
np.std(arr)

• Reshape an array:
np.reshape(arr, (rows, cols))

• Find the dot product of two arrays:
np.dot(arr1, arr2)

• Transpose an array:
arr.T

• Find the inverse of a matrix:
np.linalg.inv(arr)

• Find eigenvalues and eigenvectors:
np.linalg.eig(arr)

• Find the determinant of a matrix:
np.linalg.det(arr)

• Sort an array:
np.sort(arr)

• Concatenate arrays:
np.concatenate((arr1, arr2), axis=0)

• Find the cumulative sum:
np.cumsum(arr)

• Find the cumulative product:
np.cumprod(arr)

• Get array of unique values:
np.unique(arr)

• Find indices of non-zero elements:
np.nonzero(arr)

• Check if any values in array are true:
np.any(arr)

• Check if all values in array are true:
np.all(arr)

• Find max element in array:
np.max(arr)

• Find min element in array:
np.min(arr)

• Get a random permutation of an array:
np.random.permutation(arr)

• Generate random samples from normal distribution:
np.random.normal(loc=0.0, scale=1.0, size=(3, 3))

• Find the 90th percentile of an array:
np.percentile(arr, 90)
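
The NumPy snippets above refer to arrays such as arr without defining them. As a rough sketch, the example below applies a few of the statistics and linear-algebra calls to small hand-picked arrays (an assumption made so the results are easy to check by hand).

```python
import numpy as np

# A small diagonal matrix chosen so the results are easy to verify.
arr = np.array([[2.0, 0.0],
                [0.0, 4.0]])

mean = np.mean(arr)                    # average of all four entries
det = np.linalg.det(arr)               # product of the diagonal for this matrix
inv = np.linalg.inv(arr)
eigvals, eigvecs = np.linalg.eig(arr)

# Multiplying a matrix by its inverse should give the identity matrix.
identity_check = np.allclose(np.dot(arr, inv), np.eye(2))

flat = np.array([3, 1, 2, 3, 0])
uniq = np.unique(flat)                 # sorted unique values
running = np.cumsum(flat)              # running total
p90 = np.percentile(flat, 90)          # 90th percentile
has_any = np.any(flat)                 # True if at least one element is non-zero
has_all = np.all(flat)                 # False here, because one element is 0

print(mean, det, identity_check)
print(uniq, running, p90, has_any, has_all)
```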

Pandas Snippets (continued)


• Select specific columns:
df[['column1', 'column2']]

• Calculate cumulative sum:
df['column_name'].cumsum()

• Create a rolling average:
df['column_name'].rolling(window=5).mean()

• Join two DataFrames:
pd.merge(df1, df2, on='common_column')

• Concatenate two DataFrames side by side (column-wise):
pd.concat([df1, df2], axis=1)
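
The difference between merging and concatenating is easier to see on concrete data. The sketch below uses two hypothetical tables (the names key, left_val and right_val are illustrative, not from the original) and also shows what a rolling mean returns for the first few rows.

```python
import pandas as pd

# Hypothetical tables; 'key', 'left_val' and 'right_val' are illustrative names.
df1 = pd.DataFrame({"key": [1, 2, 3], "left_val": [10, 20, 30]})
df2 = pd.DataFrame({"key": [2, 3, 4], "right_val": [200, 300, 400]})

# Inner join (the default): only keys present in both frames are kept.
merged = pd.merge(df1, df2, on="key")

# axis=1 places the frames side by side, aligned on the index.
side_by_side = pd.concat([df1, df2], axis=1)

# axis=0 stacks the rows instead.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# Rolling mean: the first window - 1 entries are NaN.
s = pd.Series([1, 2, 3, 4, 5])
rolling = s.rolling(window=3).mean()
running = s.cumsum()

print(merged)
print(side_by_side.shape, stacked.shape)
print(rolling.tolist(), running.tolist())
```

Note that pd.concat with axis=1 does not match rows on a shared column. It only aligns on the index, so pd.merge is usually the better fit when the tables share a key.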

NumPy Snippets (continued)


• Find the index of the maximum value:
np.argmax(arr)

• Find the index of the minimum value:
np.argmin(arr)

• Create an array of ones:
np.ones((3, 3))

• Flatten an array:
arr.flatten()

• Find the shape of an array:
arr.shape

• Find the rank of a matrix:
np.linalg.matrix_rank(arr)

• Find the trace of a matrix:
np.trace(arr)

• Tile (repeat) a whole array in a 2 x 2 grid:
np.tile(arr, (2, 2))

• Slice an array:
arr[1:3]
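
Two of the calls above behave in ways that are easy to misread: np.argmax returns an index into the flattened array, and np.tile repeats the whole array rather than individual elements. The sketch below illustrates both on a small hand-picked array. np.unravel_index and np.repeat are not in the original list and are included here only for comparison.

```python
import numpy as np

arr = np.array([[1, 9, 3],
                [4, 5, 6]])

# argmax on a 2-D array returns a position in the flattened array.
flat_idx = np.argmax(arr)
# Convert the flat position back to (row, column).
row_col = np.unravel_index(flat_idx, arr.shape)

# tile repeats the whole array in a 2 x 2 grid of copies.
tiled = np.tile(arr, (2, 2))
# repeat duplicates each element along the chosen axis.
repeated = np.repeat(arr, 2, axis=1)

rank = np.linalg.matrix_rank(arr)
second_row = arr[1:3]          # slicing past the end is allowed; only row 1 exists

print(flat_idx, row_col)
print(tiled.shape, repeated.shape, rank)
print(second_row)
```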

Matplotlib Snippets (30+ Visualizations)


• Create a simple line plot:
plt.plot(x, y)
plt.show()

• Set plot title and labels:
plt.title('Title')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')

• Create a bar chart:
plt.bar(x, y)

• Create a scatter plot:
plt.scatter(x, y)

• Create a histogram:
plt.hist(data, bins=10)

• Create a box plot:
plt.boxplot(data)

• Set axis limits:
plt.xlim(0, 10)
plt.ylim(0, 100)

• Display grid on the plot:
plt.grid(True)

• Create a subplot:
plt.subplot(2, 1, 1)
plt.plot(x, y)

• Save a plot as image:
plt.savefig('plot.png')

• Change line style and color:
plt.plot(x, y, linestyle='--', color='r')

• Create a pie chart:
plt.pie(sizes, labels=labels)

• Change figure size:
plt.figure(figsize=(8, 6))

• Create a filled plot:
plt.fill_between(x, y1, y2)

• Create a heatmap:
plt.imshow(data, cmap='hot')

• Add legend to the plot:
plt.legend(['Label1', 'Label2'])

• Annotate a point on plot:
plt.annotate('Point', xy=(x, y), xytext=(x + 1, y + 10),
             arrowprops=dict(facecolor='black'))

• Create a violin plot:
plt.violinplot(data)

• Create a stacked bar chart:
plt.bar(x, y1, label='Y1')
plt.bar(x, y2, bottom=y1, label='Y2')
plt.legend()

• Set logarithmic scale:
plt.xscale('log')

• Change marker style:
plt.plot(x, y, marker='o')

• Plot a function:
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))

• Set axis aspect ratio:
plt.gca().set_aspect('equal', adjustable='box')

• Fill under a line plot:
plt.fill_between(x, y)

• Create a polar plot:
plt.subplot(projection='polar')
plt.plot(theta, r)

• Create a quiver plot:
plt.quiver(x, y, u, v)

• Create a contour plot:
plt.contour(X, Y, Z)

• Add text to plot:
plt.text(1, 1, 'Text', fontsize=12)

• Draw a horizontal line:
plt.axhline(y=0.5, color='r')

• Draw a vertical line:
plt.axvline(x=0.5, color='g')

• Create a 3D plot:
ax = plt.axes(projection='3d')
ax.plot3D(x, y, z)
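
The Matplotlib snippets above are single calls. In practice several are combined on one figure. The sketch below is one possible combination, not the only way to arrange them. The Agg backend, the synthetic data and the markevery setting are assumptions made so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

plt.figure(figsize=(8, 6))

# Top panel: two overlaid lines with a legend and a horizontal reference line.
plt.subplot(2, 1, 1)
plt.plot(x, np.sin(x), linestyle="--", color="r", label="sin(x)")
plt.plot(x, np.cos(x), marker="o", markevery=10, label="cos(x)")
plt.axhline(y=0.0, color="g")
plt.title("Title")
plt.ylabel("Y-axis")
plt.grid(True)
plt.legend()

# Bottom panel: histogram of synthetic, seeded random data.
plt.subplot(2, 1, 2)
data = np.random.default_rng(0).normal(size=500)
counts, bin_edges, _ = plt.hist(data, bins=10)
plt.xlabel("X-axis")

plt.tight_layout()
plt.savefig("plot.png")
plt.close()
```

plt.hist returns the bin counts and bin edges, which can be handy for checking a histogram numerically as well as visually.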

Seaborn Snippets (30+ Visualizations)


• Create a seaborn scatter plot:
sns.scatterplot(x='col1', y='col2', data=df)

• Create a seaborn line plot:
sns.lineplot(x='col1', y='col2', data=df)

• Create a seaborn bar plot:
sns.barplot(x='col1', y='col2', data=df)

• Create a seaborn box plot:
sns.boxplot(x='col1', y='col2', data=df)

• Create a seaborn histogram:
sns.histplot(df['column'], kde=True)

Additional Matplotlib Snippets

• Add a legend:
plt.legend(['Label1', 'Label2'])

• Save a figure:
plt.savefig('figure.png')

• Create a pie chart:
plt.pie(sizes, labels=labels, autopct='%1.1f%%')

• Create a 3D plot:
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z)

• Create a heatmap:
plt.imshow(data, cmap='hot', interpolation='nearest')
plt.colorbar()

• Change figure size:
plt.figure(figsize=(10, 5))

• Add annotations:
plt.annotate('Point', xy=(x, y), xytext=(x + 1, y + 1),
             arrowprops=dict(facecolor='black', arrowstyle='->'))

• Create a pair plot using Seaborn:
import seaborn as sns
sns.pairplot(df)

• Customize tick marks:
plt.xticks(rotation=45)

• Create a polar plot:
plt.polar(theta, r)

• Create a histogram with density:
plt.hist(data, density=True, bins=10)

• Overlay multiple plots:
plt.plot(x, y1, label='Y1')
plt.plot(x, y2, label='Y2')
plt.legend()

• Show the plot:
plt.show()
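
With density=True, plt.hist scales the bar heights so that the total bar area is 1, rather than showing raw counts. The sketch below checks this numerically on synthetic data. The seeded random sample, the Agg backend and the filename are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic sample from a standard normal distribution (seeded for repeatability).
data = np.random.default_rng(0).normal(size=1000)

# heights are densities, edges are the bin boundaries.
heights, edges, _ = plt.hist(data, density=True, bins=10)

# Bar area = height * width, summed over all bins.
area = np.sum(heights * np.diff(edges))

plt.xticks(rotation=45)
plt.savefig("figure.png")
plt.close()

print(area)
```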
