pandas-cheat-sheet
Python For Data Science Cheat Sheet
Pandas Basics

Pandas
The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.

Use the following import convention:
>>> import pandas as pd
>>> help(pd.Series.loc)

Pandas Data Structures

Series
A one-dimensional labeled array capable of holding any data type

Index
  a     3
  b    -5
  c     7
  d     4

>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

DataFrame
A two-dimensional labeled data structure with columns of potentially different types

Index   Columns
        Country   Capital     Population
  0     Belgium   Brussels      11190846
  1     India     New Delhi   1303171035
  2     Brazil    Brasília     207847528

>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
...         'Capital': ['Brussels', 'New Delhi', 'Brasília'],
...         'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data,
...                   columns=['Country', 'Capital', 'Population'])

Selection (Also see NumPy Arrays)
>>> s['b']                               Get one element
-5
>>> df[1:]                               Get subset of a DataFrame
  Country    Capital  Population
1   India  New Delhi  1303171035
2  Brazil   Brasília   207847528

By Position
>>> df.iloc[0, 0]                        Select single value by row & column
'Belgium'
>>> df.iat[0, 0]
'Belgium'

By Label
>>> df.loc[0, 'Country']                 Select single value by row & column labels
'Belgium'
>>> df.at[0, 'Country']
'Belgium'

By Label/Position
The .ix indexer was deprecated in pandas 0.20 and removed in 1.0; use .loc (labels) or .iloc (positions) instead.
>>> df.loc[2]                            Select a single row
Country       Brazil
Capital       Brasília
Population    207847528
>>> df.loc[:, 'Capital']                 Select a single column
0    Brussels
1    New Delhi
2    Brasília
>>> df.loc[1, 'Capital']                 Select rows and columns
'New Delhi'

Boolean Indexing
>>> s[~(s > 1)]                          Series s where value is not >1
>>> s[(s < -1) | (s > 2)]                s where value is <-1 or >2
>>> df[df['Population']>1200000000]      Use filter to adjust DataFrame

>>> s['a'] = 6                           Set index a of Series s to 6

>>> s.drop(['a', 'c'])                   Drop values from rows (axis=0)
>>> df.drop('Country', axis=1)           Drop values from columns (axis=1)

Sort & Rank
>>> df.sort_index()                      Sort by labels along an axis
>>> df.sort_values(by='Country')         Sort by the values along an axis
>>> df.rank()                            Assign ranks to entries

Basic Information
>>> df.shape                             (rows, columns)
>>> df.index                             Describe index
>>> df.columns                           Describe DataFrame columns
>>> df.info()                            Info on DataFrame
>>> df.count()                           Number of non-NA values

Summary
>>> df.sum()                             Sum of values
>>> df.cumsum()                          Cumulative sum of values
>>> df.min()/df.max()                    Minimum/maximum values
>>> df.idxmin()/df.idxmax()              Minimum/maximum index value
>>> df.describe()                        Summary statistics
>>> df.mean()                            Mean of values
>>> df.median()                          Median of values

Applying Functions
>>> f = lambda x: x*2
>>> df.apply(f)                          Apply function
>>> df.applymap(f)                       Apply function element-wise

Data Alignment

Internal Data Alignment
NA values are introduced in the indices that don’t overlap:
>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b     NaN
c     5.0
d     7.0
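
The alignment behaviour shown for s + s3 can be reproduced with a short standalone script. This is a minimal sketch, assuming a fresh s with its original values (before any assignment to s['a']). The names aligned and filled are illustrative only, and the Series.add call with fill_value is one possible way to avoid the NaN, not something taken from the lines above.

```python
import pandas as pd

# Same Series as in the sheet; s3 has no entry for label 'b'.
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# Plain addition aligns on index labels, so 'b' (present only in s) becomes NaN.
aligned = s + s3

# Assumption for illustration: treat a label missing on one side as 0.
filled = s.add(s3, fill_value=0)

print(aligned)
print(filled)
```

The result is upcast to float because NaN cannot be stored in an integer Series, which is why the sheet shows 10.0 rather than 10.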