pandas-cheet-sheet

Cheat sheet for the Python library pandas

Uploaded by

Aman Singh
Copyright
© All Rights Reserved

Python For Data Science Cheat Sheet
Pandas Basics

Pandas

The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.

Asking For Help

>>> help(pd.Series.loc)

Selection                                   Also see NumPy Arrays

Getting

>>> s['b']                                  Get one element
-5
>>> df[1:]                                  Get subset of a DataFrame
  Country    Capital  Population
1   India  New Delhi  1303171035
2  Brazil   Brasília   207847528

Dropping

>>> s.drop(['a', 'c'])                      Drop values from rows (axis=0)
>>> df.drop('Country', axis=1)              Drop values from columns (axis=1)

Sort & Rank

>>> df.sort_index()                         Sort by labels along an axis
>>> df.sort_values(by='Country')            Sort by the values along an axis
>>> df.rank()                               Assign ranks to entries

Retrieving Series/DataFrame Information

Basic Information

>>> df.shape                                (rows, columns)
>>> df.index                                Describe index
>>> df.columns                              Describe DataFrame columns
>>> df.info()                               Info on DataFrame
>>> df.count()                              Number of non-NA values

Summary

>>> df.sum()                                Sum of values
>>> df.cumsum()                             Cumulative sum of values
>>> df.min()/df.max()                       Minimum/maximum values
>>> df.idxmin()/df.idxmax()                 Minimum/maximum index value
>>> df.describe()                           Summary statistics
>>> df.mean()                               Mean of values
>>> df.median()                             Median of values

Use the following import convention:

>>> import pandas as pd

Pandas Data Structures

Series

A one-dimensional labeled array capable of holding any data type

        a    3
Index   b   -5
        c    7
        d    4

>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

DataFrame

A two-dimensional labeled data structure with columns of potentially different types

                     Columns
           Country    Capital  Population
        0  Belgium   Brussels    11190846
Index   1    India  New Delhi  1303171035
        2   Brazil   Brasília   207847528

>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
...         'Capital': ['Brussels', 'New Delhi', 'Brasília'],
...         'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data,
...                   columns=['Country', 'Capital', 'Population'])

Selecting, Boolean Indexing & Setting

By Position

>>> df.iloc[0, 0]                           Select single value by row & column
'Belgium'
>>> df.iat[0, 0]
'Belgium'

By Label

>>> df.loc[0, 'Country']                    Select single value by row & column labels
'Belgium'
>>> df.at[0, 'Country']
'Belgium'

By Label/Position
(df.ix, the former mixed label/position indexer, was removed in pandas 1.0; use df.loc for labels and df.iloc for positions)

>>> df.iloc[2]                              Select single row of subset of rows
Country          Brazil
Capital        Brasília
Population    207847528
>>> df.loc[:, 'Capital']                    Select a single column of subset of columns
0     Brussels
1    New Delhi
2     Brasília
>>> df.loc[1, 'Capital']                    Select rows and columns
'New Delhi'

Boolean Indexing

>>> s[~(s > 1)]                             Series s where value is not >1
>>> s[(s < -1) | (s > 2)]                   s where value is <-1 or >2
>>> df[df['Population']>1200000000]         Use filter to adjust DataFrame

Setting

>>> s['a'] = 6                              Set index a of Series s to 6

Applying Functions

>>> f = lambda x: x*2
>>> df.apply(f)                             Apply function
>>> df.applymap(f)                          Apply function element-wise (named DataFrame.map in pandas 2.1 and later)

Data Alignment

Internal Data Alignment

NA values are introduced in the indices that don’t overlap:

>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b     NaN
c     5.0
d     7.0
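
The alignment rule can be checked directly. The sketch below rebuilds the two Series from this sheet with their original values (before any assignment to `s['a']`) and compares plain `+` with `Series.add` and its `fill_value` argument. The variable names `aligned` and `filled` are illustrative only.

```python
import pandas as pd

# The two Series from the cheat sheet; label 'b' is missing from s3.
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# Plain '+' aligns on index labels and yields NaN where a label
# exists on only one side.
aligned = s + s3

# Series.add with fill_value substitutes 0 for the missing operand
# before adding, so 'b' keeps its value from s.
filled = s.add(s3, fill_value=0)

print(aligned)
print(filled)
```

Because the result contains a missing value, pandas returns a floating-point Series even though both inputs hold integers.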

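The selection idioms listed under By Position, By Label and Boolean Indexing can be tried together on the sheet's own `df` and `s`. This is a minimal sketch that assumes a current pandas release, where `df.ix` no longer exists, so it uses `df.iloc`, `df.iat`, `df.loc` and `df.at` only. The result variable names are illustrative.

```python
import pandas as pd

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

# Scalars by integer position: iloc is general, iat is scalar-only.
first_by_position = df.iloc[0, 0]
first_by_iat = df.iat[0, 0]

# Scalars by label: loc is general, at is scalar-only.
capital_by_label = df.loc[1, 'Capital']
country_by_at = df.at[0, 'Country']

# A whole row (as a Series) and a whole column.
third_row = df.iloc[2]
capitals = df.loc[:, 'Capital']

# Boolean masks: ~ negates, | is element-wise "or".
not_above_one = s[~(s > 1)]
outside_range = s[(s < -1) | (s > 2)]
large_countries = df[df['Population'] > 1200000000]

print(first_by_position, capital_by_label)
print(large_countries)
```

Passing scalars (`df.iloc[0, 0]`) returns the value itself, whereas passing lists (`df.iloc[[0], [0]]`) returns a one-cell DataFrame.
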
Arithmetic Operations with Fill Methods

You can also do the internal data alignment yourself with the help of the fill methods:

>>> s.add(s3, fill_value=0)
a    10.0
b    -5.0
c     5.0
d     7.0

I/O

Read and Write to CSV

>>> pd.read_csv( , header=None, nrows=5)
>>> df.to_csv('myDataFrame.csv')

Read and Write to Excel

>>> pd.read_excel( )
>>> df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')

Read multiple sheets from the same file

>>> xlsx = pd.ExcelFile( )
>>> df = pd.read_excel(xlsx, 'Sheet1')

Read and Write to SQL Query or Database Table

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql("SELECT * FROM my_table;", engine)
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)

read_sql() is a convenience wrapper around read_sql_table() and read_sql_query()

>>> df.to_sql('myDf', engine)
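
As a rough illustration of the CSV calls, the sketch below round-trips a small DataFrame through an in-memory `io.StringIO` buffer instead of a file on disk. The buffer, the two-column frame and the variable names are assumptions made for the example. It also shows what `header=None` and `nrows` change when reading.

```python
import io
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1303171035, 207847528]})

# Write to an in-memory text buffer; index=False leaves out the row index.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Default header handling: the first line becomes the column names,
# and nrows caps the number of data rows parsed.
head = pd.read_csv(io.StringIO(csv_text), nrows=2)

# header=None treats the first line as data, so columns are numbered 0, 1.
raw = pd.read_csv(io.StringIO(csv_text), header=None, nrows=5)

print(head)
print(raw)
```

With `header=None` the header line lands in row 0, so every column is read as strings. That is usually only what you want for files that really have no header row.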

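The SQL calls can be exercised without a database server. The sketch below swaps the SQLAlchemy engine for a standard-library `sqlite3` connection, which `to_sql` and `read_sql_query` also accept. That substitution, the in-memory database and the query are assumptions made for the example, and `read_sql_table` is left out because it requires SQLAlchemy.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília']})

# An in-memory SQLite database; nothing is written to disk.
con = sqlite3.connect(':memory:')

# Write the DataFrame as table 'myDf', without the row index column.
df.to_sql('myDf', con, index=False)

# Read back a filtered subset; '?' is SQLite's placeholder style.
result = pd.read_sql_query(
    'SELECT Country, Capital FROM myDf WHERE Country = ?',
    con,
    params=('India',),
)
con.close()

print(result)
```

Passing the value through `params` rather than formatting it into the SQL string keeps the query safe from injection.
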