Pandas Cheat Sheet
Shwetank Singh
GritSetGrow - GSGLearn.com
gsglearn.com
Cheat Sheet: The pandas DataFrame Object
Start by importing these Python modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pandas import DataFrame, Series
Note: these are the recommended import aliases

Load a DataFrame from a CSV file
df = pd.read_csv('file.csv') # often works
df = pd.read_csv('file.csv', header=0,
    index_col=0, quotechar='"', sep=':',
    na_values=['na', '-', '.', ''])
Note: refer to pandas docs for all arguments
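As a quick self-contained check of the arguments above, the same call can be tested against an in-memory string: io.StringIO stands in for a file on disk, and the column names, separator and values here are invented for illustration.

```python
import io
import pandas as pd

# colon-separated text standing in for 'file.csv'
csv_text = "idx:a:b\nr1:1:2\nr2:3:na\n"
df = pd.read_csv(io.StringIO(csv_text), header=0,
                 index_col=0, sep=':', na_values=['na'])
print(df.shape)           # (2, 2)
print(df.loc['r2', 'a'])  # 3
```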
Conceptually, a DataFrame is a two-dimensional table of data: a collection of Series (the columns of data) that share a common row index (df.index) and a column index (df.columns).

Load DataFrames from a Microsoft Excel file
# Each Excel sheet in a Python dictionary
workbook = pd.ExcelFile('file.xlsx')
dictionary = {}
for sheet_name in workbook.sheet_names:
    df = workbook.parse(sheet_name)
    dictionary[sheet_name] = df
Note: the parse() method takes many arguments like
read_csv() above. Refer to the pandas documentation.
Series object: an ordered, one-dimensional array of
data with an index. All the data in a Series is of the
same data type. Series arithmetic is vectorised after first
aligning the Series index for each of the operands.
s1 = Series(range(0,4)) # -> 0, 1, 2, 3
s2 = Series(range(1,5)) # -> 1, 2, 3, 4
s3 = s1 + s2 # -> 1, 3, 5, 7
s4 = Series(['a','b'])*3 # -> 'aaa','bbb'

Load a DataFrame from a MySQL database
import pymysql
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://'
    + 'USER:PASSWORD@localhost/DATABASE')
df = pd.read_sql_table('table', engine)
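A minimal sketch of that index alignment, with made-up labels: where the operands' indexes do not overlap, the result is NaN.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])
s3 = s1 + s2  # aligned on the union of the two indexes
print(s3['b'])          # 21.0
print(s3.isna().sum())  # 2 (labels 'a' and 'd' have no match)
```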
Data in Series then combine into a DataFrame
# Example 1 ...
s1 = Series(range(6))
s2 = s1 * s1
s2.index = s2.index + 2 # misalign indexes
df = pd.concat([s1, s2], axis=1)

# Example 2 ...
s3 = Series({'Tom':1, 'Dick':4, 'Har':9})
s4 = Series({'Tom':3, 'Dick':2, 'Mar':5})
df = pd.concat({'A':s3, 'B':s4}, axis=1)
Note: 1st method has integer column labels
Note: 2nd method does not guarantee col order
Note: index alignment on DataFrame creation

The index object: The pandas Index provides the axis labels
for the Series and DataFrame objects. It can only contain
hashable objects. A pandas Series has one Index; and a
DataFrame has two Indexes.
# --- get Index from Series and DataFrame
idx = s.index # the Series index
idx = df.columns # the column index
idx = df.index # the row index

# --- some Index attributes
b = idx.is_monotonic_decreasing
b = idx.is_monotonic_increasing
b = idx.has_duplicates
i = idx.nlevels # multi-level indexes

# --- some Index methods
a = idx.values # get as numpy array
l = idx.tolist() # get as a python list
idx = idx.astype(dtype) # change data type
b = idx.equals(o) # check for equality
idx = idx.union(o) # union of two indexes
i = idx.nunique() # number unique labels
label = idx.min() # minimum label
label = idx.max() # maximum label

Get a DataFrame from data in a Python dictionary
# default --- assume data is in columns
df = DataFrame({
    'col0' : [1.0, 2.0, 3.0, 4.0],
    'col1' : [100, 200, 300, 400]
})
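The index-alignment note above can be verified with the Example 1 data: the combined DataFrame gets the union of the two row indexes, with NaN where one Series has no value (a sketch, not part of the original sheet).

```python
import pandas as pd

s1 = pd.Series(range(6))   # index 0..5
s2 = s1 * s1
s2.index = s2.index + 2    # misalign: index 2..7
df = pd.concat([s1, s2], axis=1)
print(df.shape)            # (8, 2) - union of indexes 0..7
```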
Version 2 May 2015 - [Draft – Mark Graph – mark dot the dot graph at gmail dot com – @Mark_Graph on twitter]
Get a DataFrame from data in a Python dictionary
# --- use helper method for data in rows
df = DataFrame.from_dict({ # data by row
    'row0' : {'col0':0, 'col1':'A'},
    'row1' : {'col0':1, 'col1':'B'}
}, orient='index')

df = DataFrame.from_dict({ # data by row
    'row0' : [1, 1+1j, 'A'],
    'row1' : [2, 2+2j, 'B']
}, orient='index')

Working with the whole DataFrame

Peek at the DataFrame contents
df.info() # index & data types
n = 4
dfh = df.head(n) # get first n rows
dft = df.tail(n) # get last n rows
dfs = df.describe() # summary stats cols
top_left_corner_df = df.iloc[:5, :5]
DataFrame non-indexing attributes
dfT = df.T # transpose rows and cols
l = df.axes # list row and col indexes
(r, c) = df.axes # from above
s = df.dtypes # Series column data types
b = df.empty # True for empty DataFrame
i = df.ndim # number of axes (2)
t = df.shape # (row-count, column-count)
(r, c) = df.shape # from above
i = df.size # row-count * column-count
a = df.values # get a numpy array for df

Create play/fake data (useful for testing)
# --- simple
df = DataFrame(np.random.rand(50,5))

# --- with a time-stamp row index:
df = DataFrame(np.random.rand(500,5))
df.index = pd.date_range('1/1/2006',
    periods=len(df), freq='M')

# --- with alphabetic row and col indexes
import string
import random
r = 52 # note: min r is 1; max r is 52
c = 5
df = DataFrame(np.random.randn(r, c),
    columns = ['col'+str(i) for i in range(c)],
    index = list((string.ascii_uppercase +
        string.ascii_lowercase)[0:r]))
df['group'] = list(''.join(
    random.choice('abcd')
    for _ in range(r)))

DataFrame utility methods
dfc = df.copy() # copy a DataFrame
dfr = df.rank() # rank each col (default)
dfs = df.sort() # sort each col (default)
dfc = df.astype(dtype) # type conversion

DataFrame iteration methods
df.iteritems() # (col-index, Series) pairs
df.iterrows() # (row-index, Series) pairs

# example ... iterating over columns
for (name, series) in df.iteritems():
    print('Col name: ' + str(name))
    print('First value: ' +
        str(series.iat[0]) + '\n')
Saving a DataFrame

Saving a DataFrame to a CSV file
df.to_csv('name.csv', encoding='utf-8')

Saving DataFrames to an Excel Workbook
from pandas import ExcelWriter
writer = ExcelWriter('filename.xlsx')
df1.to_excel(writer, 'Sheet1')
df2.to_excel(writer, 'Sheet2')
writer.save()

Saving a DataFrame to MySQL
import pymysql
from sqlalchemy import create_engine
e = create_engine('mysql+pymysql://' +
    'USER:PASSWORD@localhost/DATABASE')
df.to_sql('TABLE', e, if_exists='replace')
Note: if_exists -> 'fail', 'replace', 'append'

Saving a DataFrame to a Python dictionary
dictionary = df.to_dict()

Saving a DataFrame to a Python string
string = df.to_string()
Note: sometimes may be useful for debugging

Maths on the whole DataFrame (not a complete list)
df = df.abs() # absolute values
df = df.add(o) # add df, Series or value
s = df.count() # non NA/null values
df = df.cummax() # (cols default axis)
df = df.cummin() # (cols default axis)
df = df.cumsum() # (cols default axis)
df = df.cumprod() # (cols default axis)
df = df.diff() # 1st diff (col def axis)
df = df.div(o) # div by df, Series, value
df = df.dot(o) # matrix dot product
s = df.max() # max of axis (col def)
s = df.mean() # mean (col default axis)
s = df.median() # median (col default)
s = df.min() # min of axis (col def)
df = df.mul(o) # mul by df Series val
s = df.sum() # sum axis (cols default)
Note: The methods that return a series default to
working on columns.

DataFrame filter/select rows or cols on label info
df = df.filter(items=['a', 'b']) # by col
df = df.filter(items=[5], axis=0) # by row
df = df.filter(like='x') # keep x in col
df = df.filter(regex='x') # regex in col
df = df.select(crit=(lambda x: not x%5)) # rows
Note: select takes a Boolean function, for cols: axis=1
Note: filter defaults to cols; select defaults to rows
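The column-default behaviour noted above is easy to confirm on toy data (column names here are invented):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})
col_sums = df.sum()        # default: one value per column
row_sums = df.sum(axis=1)  # one value per row
print(col_sums['y'])       # 60
print(row_sums[0])         # 11
```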
Working with Columns

A DataFrame column is a pandas Series object

Columns value set based on criteria
df['b'] = df['a'].where(df['a']>0, other=0)
df['d'] = df['a'].where(df.b!=0, other=df.c)
Note: where other can be a Series or a scalar
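A small sketch of where() with made-up data: values satisfying the condition are kept, the rest are replaced by other.

```python
import pandas as pd

df = pd.DataFrame({'a': [-2, -1, 3, 4]})
df['b'] = df['a'].where(df['a'] > 0, other=0)
print(df['b'].tolist())   # [0, 0, 3, 4]
```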
Working with rows

Get the row index and labels
idx = df.index # get row index
label = df.index[0] # 1st row label
lst = df.index.tolist() # get as a list

Change the (row) index
df.index = idx # new ad hoc index
df.index = range(len(df)) # set with list
df = df.reset_index() # replace old w new
# note: old index stored as a col in df
df = df.reindex(index=range(len(df)))
df = df.set_index(keys=['r1','r2','etc'])
df.rename(index={'old':'new'},
    inplace=True)

Adding rows
df = original_df.append(more_rows_in_df)
Hint: convert to a DataFrame and then append. Both
DataFrames should have same column labels.

Dropping rows (by name)
df = df.drop('row_label')
df = df.drop(['row1','row2']) # multi-row

Boolean row selection by values in a column
df = df[df['col2'] >= 0.0]
df = df[(df['col3']>=1.0) |
    (df['col1']<0.0)]
df = df[df['col'].isin([1,2,5,7,11])]
df = df[~df['col'].isin([1,2,5,7,11])]
df = df[df['col'].str.contains('hello')]
Trap: bitwise "or", "and", "not" (ie. | & ~) co-opted to be
Boolean operators on a Series of Boolean
Trap: need parentheses around comparisons.

Selecting rows using isin over multiple columns
# fake up some data
data = {1:[1,2,3], 2:[1,4,9], 3:[1,8,27]}
df = pd.DataFrame(data)

# multi-column isin
lf = {1:[1, 3], 3:[8, 27]} # look for
f = df[df[list(lf)].isin(lf).all(axis=1)]

Selecting rows using an index
idx = df[df['col'] >= 2].index
print(df.ix[idx])

Select a slice of rows by integer position
[inclusive-from : exclusive-to [: step]]
default start is 0; default end is len(df)
df = df[:] # copy DataFrame
df = df[0:2] # rows 0 and 1
df = df[-1:] # the last row
df = df[2:3] # row 2 (the third row)
df = df[:-1] # all but the last row
df = df[::2] # every 2nd row (0 2 ..)
Trap: a single integer without a colon is a column label
for integer numbered columns.

Select a slice of rows by label/index
[inclusive-from : inclusive-to [ : step]]
df = df['a':'c'] # rows 'a' through 'c'
Trap: doesn't work on integer labelled rows

Append a row of column totals to a DataFrame
# Option 1: use dictionary comprehension
sums = {col: df[col].sum() for col in df}
sums_df = DataFrame(sums, index=['Total'])
df = df.append(sums_df)

# Option 2: All done with pandas
df = df.append(DataFrame(df.sum(),
    columns=['Total']).T)

Iterating over DataFrame rows
for (index, row) in df.iterrows(): # pass
Trap: row data type may be coerced.

Sorting DataFrame rows values
df = df.sort(df.columns[0],
    ascending=False)
df.sort(['col1', 'col2'], inplace=True)

Random selection of rows
import random as r
k = 20 # pick a number
selection = r.sample(range(len(df)), k)
df_sample = df.iloc[selection, :]
Note: this sample is not sorted

Sort DataFrame by its row index
df.sort_index(inplace=True) # sort by row
df = df.sort_index(ascending=False)

Drop duplicates in the row index
df['index'] = df.index # 1 create new col
df = df.drop_duplicates(cols='index',
    take_last=True) # 2 use new col
del df['index'] # 3 del the col
df.sort_index(inplace=True) # 4 tidy up

Test if two DataFrames have same row index
len(a)==len(b) and all(a.index==b.index)

Get the integer position of a row or col index label
i = df.index.get_loc('row_label')
Trap: index.get_loc() returns an integer for a unique
match. If not a unique match, may return a slice or
mask.

Get integer position of rows that meet condition
a = np.where(df['col'] >= 2) # numpy array

Test if the row index values are unique/monotonic
if df.index.is_unique: pass # ...
b = df.index.is_monotonic_increasing
b = df.index.is_monotonic_decreasing
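The parentheses trap above is worth a concrete check (data invented): each comparison must be wrapped before combining with | or &, because the bitwise operators bind more tightly than the comparison operators.

```python
import pandas as pd

df = pd.DataFrame({'col1': [-1.0, 0.5, 2.0],
                   'col3': [0.5, 1.5, 0.2]})
# parentheses required around each comparison:
sel = df[(df['col3'] >= 1.0) | (df['col1'] < 0.0)]
print(len(sel))   # 2 (the first two rows qualify)
```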
Working with cells

Selecting a cell by row and column labels
value = df.at['row', 'col']
value = df.loc['row', 'col']
value = df['col'].at['row'] # tricky
Note: .at[] fastest label based scalar lookup

Setting a cell by row and column labels
df.at['row', 'col'] = value
df.loc['row', 'col'] = value
df['col'].at['row'] = value # tricky

Selecting and slicing on labels
df = df.loc['row1':'row3', 'col1':'col3']
Note: the "to" on this slice is inclusive.

Setting a cross-section by labels
df.loc['A':'C', 'col1':'col3'] = np.nan
df.loc[1:2,'col1':'col2'] = np.zeros((2,2))
df.loc[1:2,'A':'C'] = othr.loc[1:2,'A':'C']
Remember: inclusive "to" in the slice

Selecting a cell by integer position
value = df.iat[9, 3] # [row, col]
value = df.iloc[0, 0] # [row, col]
value = df.iloc[len(df)-1,
    len(df.columns)-1]

Selecting a range of cells by int position
df = df.iloc[2:4, 2:4] # subset of the df
df = df.iloc[:5, :5] # top left corner
s = df.iloc[5, :] # returns row as Series
df = df.iloc[5:6, :] # returns row as row
Note: exclusive "to" – same as python list slicing.

Setting cell by integer position
df.iloc[0, 0] = value # [row, col]
df.iat[7, 8] = value # [row, col]

In summary: indexes and addresses
In the main, these notes focus on the simple, single
level Indexes. Pandas also has hierarchical or
multi-level Indexes (aka the MultiIndex).
• Typically, the column index (df.columns) is a list of
strings (observed variable names) or (less
commonly) integers (the default is numbered from 0
to length-1)
• Typically, the row index (df.index) might be:
  o Integers - for case or row numbers (default is
    numbered from 0 to length-1);
  o Strings – for case names; or
  o DatetimeIndex or PeriodIndex – for time series
    data (more below)

Indexing
# --- selecting columns
s = df['col_label'] # scalar
df = df[['col_label']] # one item list
df = df[['L1', 'L2']] # many item list
df = df[index] # pandas Index
df = df[s] # pandas Series

# --- selecting rows
df = df['from':'inc_to'] # label slice
df = df[3:7] # integer slice
df = df[df['col'] > 0.5] # Boolean Series

df = df.loc['label'] # single label
df = df.loc[container] # lab list/Series
df = df.loc['from':'to'] # inclusive slice
df = df.loc[bs] # Boolean Series
df = df.iloc[0] # single integer
df = df.iloc[container] # int list/Series
df = df.iloc[0:5] # exclusive slice
df = df.ix[x] # loc then iloc
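The inclusive-label versus exclusive-integer slicing rules can be contrasted directly (toy data):

```python
import pandas as pd

df = pd.DataFrame({'v': [10, 20, 30, 40]},
                  index=['a', 'b', 'c', 'd'])
by_label = df.loc['a':'c']  # inclusive "to": rows a, b, c
by_pos = df.iloc[0:2]       # exclusive "to": rows a, b
print(len(by_label), len(by_pos))   # 3 2
```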
Joining/Combining DataFrames

Three ways to join two DataFrames:
• merge (a database/SQL-like join operation)
• concat (stack side by side or one on top of the other)
• combine_first (splice the two together, choosing
values from one over the other)

Merge on indexes
df_new = pd.merge(left=df1, right=df2,
    how='outer', left_index=True,
    right_index=True)
How: 'left', 'right', 'outer', 'inner'
How: outer=union/all; inner=intersection

Merge on columns
df_new = pd.merge(left=df1, right=df2,
    how='left', left_on='col1',
    right_on='col2')
Trap: When joining on columns, the indexes on the
passed DataFrames are ignored.
Trap: many-to-many merges on a column can result in
an explosion of associated data.

Join on indexes (another way of merging)
df_new = df1.join(other=df2, on='col1',
    how='outer')
df_new = df1.join(other=df2, on=['a','b'],
    how='outer')
Note: DataFrame.join() joins on indexes by default.
DataFrame.merge() joins on common columns by default.

Simple concatenation is often the best
df = pd.concat([df1,df2], axis=0) # top/bottom
df = df1.append([df2, df3]) # top/bottom
df = pd.concat([df1,df2], axis=1) # left/right
Trap: can end up with duplicate rows or cols
Note: concat has an ignore_index parameter

Combine_first
df = df1.combine_first(other=df2)

Groupby: Split-Apply-Combine
The pandas "groupby" mechanism allows us to split the
data into groups, apply a function to each group
independently and then combine the results.

Grouping
gb = df.groupby('cat') # by one column
gb = df.groupby(['c1','c2']) # by 2 cols
gb = df.groupby(level=0) # multi-index gb
gb = df.groupby(level=['a','b']) # mi gb
print(gb.groups)
Note: groupby() returns a pandas groupby object
Note: the groupby object attribute .groups contains a
dictionary mapping of the groups.
Trap: NaN values in the group key are automatically
dropped – there will never be a NA group.

Iterating groups – usually not needed
for name, group in gb:
    print(name)
    print(group)

Selecting a group
dfa = df.groupby('cat').get_group('a')
dfb = df.groupby('cat').get_group('b')

Applying an aggregating function
# apply to a column ...
s = df.groupby('cat')['col1'].sum()
s = df.groupby('cat')['col1'].agg(np.sum)
# apply to every column in DataFrame
s = df.groupby('cat').agg(np.sum)
df_summary = df.groupby('cat').describe()
df_row_1s = df.groupby('cat').head(1)
Note: aggregating functions reduce the dimension by
one – they include: mean, sum, size, count, std, var,
sem, describe, first, last, min, max

Applying multiple aggregating functions
gb = df.groupby('cat')
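One way to apply several aggregating functions at once is agg() with a list of function names (a sketch with invented data; other forms, such as a dict of column-to-function mappings, also exist):

```python
import pandas as pd

df = pd.DataFrame({'cat': ['a', 'a', 'b'],
                   'col1': [1, 3, 10]})
gb = df.groupby('cat')
summary = gb['col1'].agg(['sum', 'mean'])
print(summary.loc['a', 'sum'])    # 4
print(summary.loc['b', 'mean'])   # 10.0
```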
Applying filtering functions
Filtering functions allow you to make selections based on
whether each group meets specified criteria
# select groups with more than 10 members
eleven = lambda x: (len(x['col1']) >= 11)
df11 = df.groupby('cat').filter(eleven)

Group by a row index (non-hierarchical index)
df = df.set_index(keys='cat')
s = df.groupby(level=0)['col1'].sum()
dfg = df.groupby(level=0).sum()

Pivot Tables

Pivot
Pivot tables move from long format to wide format data
df = DataFrame(np.random.rand(100,1))
df.columns = ['data'] # rename col
df.index = pd.period_range('3/3/2014',
    periods=len(df), freq='M')
df['year'] = df.index.year
df['month'] = df.index.month

# pivot to wide format
df = df.pivot(index='year',
    columns='month', values='data')

Working with dates, times and their indexes

Dates and time – points and spans
With its focus on time-series data, pandas has a suite of
tools for managing dates and time: either as a point in
time (a Timestamp) or as a span of time (a Period).
t = pd.Timestamp('2013-01-01')
t = pd.Timestamp('2013-01-01 21:15:06')
t = pd.Timestamp('2013-01-01 21:15:06.7')
p = pd.Period('2013-01-01', freq='M')
Note: Timestamps should be in the range of years 1678
to 2261. (Check Timestamp.max and Timestamp.min).

A Series of Timestamps or Periods
ts = ['2015-04-01 13:17:27',
      '2014-04-02 13:17:29']

# Series of Timestamps (good)
s = pd.to_datetime(pd.Series(ts))

# Series of Periods (often not so good)
s = pd.Series([pd.Period(x, freq='M')
    for x in ts])
s = pd.Series(pd.PeriodIndex(ts, freq='S'))
Note: While Periods make a very useful index; they may
be less useful in a Series.

# --- other ways to create date/period indexes
dti = pd.DatetimeIndex(date_strs)
df.index = pd.period_range('2015-01',
    periods=len(df), freq='M')
dti = pd.to_datetime(['04-01-2012'],
    dayfirst=True) # Australian date format
pi = pd.period_range('1960-01-01',
    '2015-12-31', freq='M')
Hint: unless you are working in less than seconds,
prefer PeriodIndex over DatetimeIndex.
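The point-versus-span distinction can be made concrete: a Period carries a start and an end, while a Timestamp is a single instant (the dates here are arbitrary).

```python
import pandas as pd

t = pd.Timestamp('2013-01-15 21:15:06')  # a point in time
p = pd.Period('2013-01-15', freq='M')    # the span 2013-01
print(t.day)              # 15
print(p.start_time.day)   # 1
print(p.end_time.day)     # 31
```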
Period frequency constants (not a complete list)
Name              Description
U                 Microsecond
L                 Millisecond
S                 Second
T                 Minute
H                 Hour
D                 Calendar day
B                 Business day
W-{MON, TUE, …}   Week ending on …
MS                Calendar start of month
M                 Calendar end of month
QS-{JAN, FEB, …}  Quarter start with year starting
                  (QS – December)
Q-{JAN, FEB, …}   Quarter end with year ending
                  (Q – December)
AS-{JAN, FEB, …}  Year start (AS – December)
A-{JAN, FEB, …}   Year end (A – December)

Upsampling and downsampling
# upsample from quarterly to monthly
pi = pd.period_range('1960Q1',
    periods=220, freq='Q')
df = DataFrame(np.random.rand(len(pi),5),
    index=pi)
dfm = df.resample('M', convention='end')
# use ffill or bfill to fill with values

# downsample from monthly to quarterly
dfq = dfm.resample('Q', how='sum')

Time zones
t = ['2015-06-30 00:00:00',
     '2015-12-31 00:00:00']
dti = pd.to_datetime(t
    ).tz_localize('Australia/Canberra')
dti = dti.tz_convert('UTC')
ts = pd.Timestamp('now',
    tz='Europe/London')

# get a list of all time zones
import pytz
for tz in pytz.all_timezones:
    print(tz)
Note: by default, Timestamps are created without time
zone information.

From DatetimeIndex to Python datetime objects
dti = pd.DatetimeIndex(pd.date_range(
    start='1/1/2011', periods=4, freq='M'))
s = Series([1,2,3,4], index=dti)
na = dti.to_pydatetime() # numpy array
na = s.index.to_pydatetime() # numpy array

From Timestamps to Python dates or times
df['date'] = [x.date() for x in df['TS']]
df['time'] = [x.time() for x in df['TS']]
Note: converts to datetime.date or datetime.time. But
does not convert to datetime.datetime.

Row selection with a time-series index
# start with the play data above
idx = pd.period_range('2015-01',
    periods=len(df), freq='M')
df.index = idx
february_selector = (df.index.month == 2)
february_data = df[february_selector]

From DatetimeIndex to PeriodIndex and back
df = DataFrame(np.random.randn(20,3))
df.index = pd.date_range('2015-01-01',
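A runnable sketch of the localise-then-convert pattern above; the expected hour assumes Canberra is UTC+10 at the end of June (outside daylight saving).

```python
import pandas as pd

dti = pd.to_datetime(['2015-06-30 00:00:00'])
print(dti.tz)   # None - naive by default
aware = dti.tz_localize('Australia/Canberra')
utc = aware.tz_convert('UTC')
print(utc[0])   # 2015-06-29 14:00:00+00:00
```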
Working with strings

Basic Statistics

Histogram binning
count, bins = np.histogram(df['col1'])
count, bins = np.histogram(df['col'],
    bins=5)
count, bins = np.histogram(df['col1'],
    bins=[-3,-2,-1,0,1,2,3,4])

Regression
import statsmodels.formula.api as sm
result = sm.ols(
    formula="col1 ~ col2 + col3",
    data=df).fit()
print(result.params)
print(result.summary())
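With explicit bin edges, np.histogram counts values per interval; a small worked case (the numbers are invented, and note the last bin is closed on the right):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [0.1, 0.4, 0.5, 0.9, 1.5]})
count, bins = np.histogram(df['col1'],
                           bins=[0, 0.5, 1.0, 2.0])
print(count.tolist())   # [2, 2, 1]
```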
Cautionary note