Pandas

Uploaded by

sketki0612
Copyright
© All Rights Reserved

Pandas (import pandas as pd; the examples also assume import numpy as np)

Data structures

s = pd.Series(np.random.rand(1000), index=range(1000)) # series


index = pd.date_range("2016-06-13", periods=1000)   # daily time index starting 13 June 2016 (ISO date avoids day/month ambiguity)
df = pd.DataFrame(np.zeros((1000, 3)), index=index,
                  columns=["A", "B", "C"])          # DataFrame
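
As a quick check on the two structures above, here is a minimal self-contained sketch. It assumes NumPy is imported as np and uses an ISO-formatted start date; the printed values are only there to show the resulting shapes.

```python
import numpy as np
import pandas as pd

# Series: 1-D labelled array; 1000 uniform random values on an integer index
s = pd.Series(np.random.rand(1000), index=range(1000))

# Daily DatetimeIndex starting on 13 June 2016
index = pd.date_range("2016-06-13", periods=1000)

# DataFrame: 2-D labelled table; 1000 rows of zeros in columns A, B, C
df = pd.DataFrame(np.zeros((1000, 3)), index=index, columns=["A", "B", "C"])

print(s.shape)      # (1000,)
print(df.shape)     # (1000, 3)
print(df.index[0])  # 2016-06-13 00:00:00
```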

DataFrame

df = pd.read_csv("filename.csv") # read and load CSV file in a DataFrame


raw = df.values   # get raw data out of the DataFrame as a NumPy array (df.to_numpy() is the currently recommended spelling)
cols = df.columns # get the column headers (an Index object)
df.dtypes         # get data types of all columns
df.head(5)        # get first 5 rows
df.describe()     # get basic statistics for the numeric columns
df.index          # get the row index
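
The inspection calls above can be tried without a file on disk. The sketch below is illustrative only: the CSV text, its column names, and its values are made up for the example, and io.StringIO stands in for "filename.csv".

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for "filename.csv"
csv_text = "name,age,score\nann,31,4.5\nbob,25,3.0\ncat,40,5.0\n"
df = pd.read_csv(io.StringIO(csv_text))

raw = df.to_numpy()       # same data as df.values, as a NumPy array
cols = list(df.columns)   # ['name', 'age', 'score']

print(df.dtypes)          # one dtype per column
print(df.head(2))         # first 2 rows
print(df.describe())      # summary statistics for the numeric columns only
print(df.index)           # RangeIndex(start=0, stop=3, step=1)
```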

# column slicing
# (.loc[] label slices include both endpoints; .ix[] was deprecated in pandas 0.20 and removed in 1.0)
df.col_name               # select column values as a Series by column name (not optimized)
df[['col_name']]          # select column values as a DataFrame by column name (not optimized)
df.loc[:, 'col_name']     # select column values as a Series by column name
df.loc[:, ['col_name']]   # select column values as a DataFrame by column name
df.iloc[:, 0]             # select by column position
df.iloc[:, [0]]           # select by column position, but as a DataFrame
# df.ix[:, 'col_name']    # hybrid approach with column name; removed in pandas 1.0, use df.loc[:, 'col_name']
# df.ix[:, 0]             # hybrid approach with column position; removed in pandas 1.0, use df.iloc[:, 0]
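
The difference between the Series and DataFrame forms above comes down to whether a single label or a list is passed. A small sketch on a made-up two-column frame (the "other" column is an assumption added for illustration):

```python
import pandas as pd

df = pd.DataFrame({"col_name": [1, 2, 3], "other": [4.0, 5.0, 6.0]})

as_series = df.loc[:, "col_name"]     # single label -> Series
as_frame = df.loc[:, ["col_name"]]    # list of labels -> DataFrame
by_pos = df.iloc[:, 0]                # single position -> Series
by_pos_frame = df.iloc[:, [0]]        # list of positions -> DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```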

# row slicing
print(df[:2])                 # print first 2 rows of the DataFrame
df.iloc[0:2, :]               # select first 2 rows of the DataFrame (positions 0 and 1)
df.loc[0:2, 'col_name']       # select first 3 rows of one column (labels 0 to 2, assuming the default integer index)
df.loc[0:2, ['col_name1', 'col_name3', 'col_name6']]   # select first 3 rows of the 3 different columns
df.iloc[0:2, 0:2]             # select first 2 rows and first 2 columns
# Again, .loc[] slices include the end label; .iloc[] slices exclude the end position
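
The off-by-one between label-based and position-based slicing is easy to verify. A minimal sketch, assuming a small frame with the default integer index (column names a, b, c are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10), "c": range(10, 15)})

by_label = df.loc[0:2, "a"]   # labels 0, 1, 2 -> 3 rows (end label included)
by_pos = df.iloc[0:2, 0:2]    # positions 0, 1 -> 2 rows and 2 columns (end excluded)

print(len(by_label))   # 3
print(by_pos.shape)    # (2, 2)
print(len(df[:2]))     # 2
```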

# Dicing
df[df.col_name < 7]                            # select all rows where col_name < 7
df[(df.col_name1 < 7) & (df.col_name2 == 0)]   # combine multiple boolean indexing conditions using the bitwise operators (&, |, ~)
# Regular Python boolean operators (and, or) cannot be used here.
# Be sure to wrap each condition in parentheses to make this work.
df[df.recency < 7] = -100                      # write to a slice: sets every column of the matching rows to -100
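
A short sketch of boolean indexing on made-up data (the "spend" column and all values are assumptions for illustration). It also shows one way to write to a single column of the matching rows with .loc[], rather than overwriting every column:

```python
import pandas as pd

df = pd.DataFrame({"recency": [3, 10, 5, 8], "spend": [1.0, 2.0, 3.0, 4.0]})

# Each condition is parenthesised and combined with & (not `and`)
mask = (df["recency"] < 7) & (df["spend"] > 1.5)
print(df[mask])   # only the row with recency 5 and spend 3.0

# Assign to the recency column only, for rows where recency < 7
df.loc[df["recency"] < 7, "recency"] = -100
print(df["recency"].tolist())   # [-100, 10, -100, 8]
```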
