Pandas - Colab
Series
import numpy as np
import pandas as pd
labels=['a','b','c']
my_data=[10,20,30]
arr=np.array(my_data)
d={'a':10,'b':20,'c':30}
pd.Series(data=my_data)
0 10
1 20
2 30
dtype: int64
pd.Series(data=my_data,index=labels)
a 10
b 20
c 30
dtype: int64
pd.Series(my_data,labels)
a 10
b 20
c 30
dtype: int64
pd.Series(arr,labels)
a 10
b 20
c 30
dtype: int64
pd.Series(d)
a 10
b 20
c 30
dtype: int64
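The four constructor calls above differ only in how the data and index are passed. The sketch below rebuilds `labels`, `my_data`, `arr` and `d` from the cell above and checks that each call produces the same labels and values. Comparing via `tolist()` is a choice made here so the check does not depend on the platform's default integer width.

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']
my_data = [10, 20, 30]
arr = np.array(my_data)
d = {'a': 10, 'b': 20, 'c': 30}

# Keyword arguments, positional arguments, a NumPy array, and a dict
# (whose keys become the index) should all give the same labelled Series.
results = [
    pd.Series(data=my_data, index=labels),
    pd.Series(my_data, labels),
    pd.Series(arr, labels),
    pd.Series(d),
]

for s in results:
    print(s.index.tolist(), s.tolist())
```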
labels
pd.Series(data=[sum,print,len])
https://colab.research.google.com/drive/1P8SeKK8QO7iDI1uHnkT5N935ZNnfjmfD#printMode=true 1/9
04/08/2024, 18:17 Pandas - Colab
ser1=pd.Series([1,2,3,4],['USA','Germany','Italy','Japan'])
ser1
USA 1
Germany 2
Italy 3
Japan 4
dtype: int64
ser2=pd.Series([1,2,5,4],['USA','Germany','Italy','Japan'])
ser2
USA 1
Germany 2
Italy 5
Japan 4
dtype: int64
ser1['USA']
ser3=pd.Series(data=labels)
ser3[0]
'a'
ser1+ser2
USA 2
Germany 4
Italy 8
Japan 8
dtype: int64
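`ser1 + ser2` matches values by index label, not by position. Here both Series share the same four labels, so every value has a partner. The sketch below uses hypothetical labels ('USSR' is not in the notebook) to show what happens when the labels differ, and how `Series.add` with `fill_value` handles the gaps.

```python
import pandas as pd

# Hypothetical labels: 'Italy' appears only in ser1, 'USSR' only in ser2.
ser1 = pd.Series([1, 2, 3, 4], ['USA', 'Germany', 'Italy', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], ['USA', 'Germany', 'USSR', 'Japan'])

# Labels present in only one operand produce NaN, which also
# upcasts the result from int64 to float64.
total = ser1 + ser2
print(total)

# fill_value treats a missing label as 0 instead of producing NaN.
filled = ser1.add(ser2, fill_value=0)
print(filled)
```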
ser2
USA 1
Germany 2
Italy 5
Japan 4
dtype: int64
np.random.seed(101)
# 5x4 table of standard-normal draws; randn lives in np.random
df = pd.DataFrame(np.random.randn(5,4),['A','B','C','D','E'],['W','X','Y','Z'])
df
W X Y Z
df['W']
A 2.706850
B 0.651118
C -2.018168
D 0.188695
E 0.190794
Name: W, dtype: float64
type(df['W'])
pandas.core.series.Series
def __init__(data=None, index=None, dtype: Dtype | None=None, name=None, copy: bool | None=None, fastpath: bool=False) -> None
Labels need not be unique but must be a hashable type. The object
supports both integer- and label-based indexing and provides a host of
methods for performing operations involving the index. Statistical
methods from ndarray have been overridden to automatically exclude
missing data (currently represented as NaN).
type(df)
pandas.core.frame.DataFrame
def __init__(data=None, index: Axes | None=None, columns: Axes | None=None, dtype: Dtype | None=None, copy: bool | None=None) -> None
df[['W','Z']]
W Z
A 2.706850 0.503826
B 0.651118 0.605965
C -2.018168 -0.589001
D 0.188695 0.955057
E 0.190794 0.683509
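The results of `df['W']` and `df[['W','Z']]` have different types. A single label returns one column as a Series. A list of labels returns a DataFrame, even when the list has only one name. The sketch below recreates the notebook's frame with the same seed and checks the types and shapes.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])

# A single label returns the column as a Series ...
col = df['W']
# ... while a list of labels (double brackets) returns a DataFrame.
sub = df[['W', 'Z']]
one_col_frame = df[['W']]

print(type(col).__name__, type(sub).__name__, type(one_col_frame).__name__)
print(sub.shape, one_col_frame.shape)
```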
df['new']=df['W']+df['Y']
df
W X Y Z new
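Assigning to a new key such as `df['new']` appends a column. The right-hand side is computed element-wise and aligned on row labels. The sketch below recreates the notebook's frame with seed 101 and spot-checks that, in each row, `new` equals W + Y from that same row.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])

# The new column is appended after the existing ones.
df['new'] = df['W'] + df['Y']
print(df.columns.tolist())

# Spot-check row 'A': its 'new' value is W + Y from the same row.
row_a = df.loc['A']
print(row_a['new'] == row_a['W'] + row_a['Y'])
```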
# df.drop('new',axis=1)
df.drop('new',axis=1,inplace=True)
df
W X Y Z
# df.drop('E',axis=0)
df.drop('E')
W X Y Z
df.shape
(5, 4)
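The cells above use `drop` in two ways. With `inplace=True` it modifies `df` directly. Without it, `drop` returns a new frame and leaves `df` unchanged, which is why `df.drop('E')` did not change `df.shape`. `axis=0` (the default) drops rows and `axis=1` drops columns. The sketch below also shows the `columns=` keyword, which avoids remembering the axis numbers.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])
df['new'] = df['W'] + df['Y']

# Without inplace=True, drop returns a new DataFrame and leaves df alone.
without_new = df.drop('new', axis=1)   # axis=1: drop a column
without_e = df.drop('E')               # default axis=0: drop a row

# Keyword form, equivalent to axis=1.
same_as_without_new = df.drop(columns='new')

print(df.shape, without_new.shape, without_e.shape)
```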
df
W X Y Z
df['W']
A 2.706850
B 0.651118
C -2.018168
D 0.188695
E 0.190794
Name: W, dtype: float64
df[['X','Z']]
X Z
A 0.628133 0.503826
B -0.319318 0.605965
C 0.740122 -0.589001
D -0.758872 0.955057
E 1.978757 0.683509
df
W X Y Z
df.loc['A']
W 2.706850
X 0.628133
Y 0.907969
Z 0.503826
Name: A, dtype: float64
df.iloc[2]
W -2.018168
X 0.740122
Y 0.528813
Z -0.589001
Name: C, dtype: float64
df.loc['C']
W -2.018168
X 0.740122
Y 0.528813
Z -0.589001
Name: C, dtype: float64
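`df.iloc[2]` and `df.loc['C']` printed the same row. `.loc` selects by label and `.iloc` by zero-based integer position, and 'C' is the third row. The sketch below, using the same seeded frame, confirms that the two selections are equal and that both keep the row label as the Series name.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])

by_label = df.loc['C']     # label-based
by_position = df.iloc[2]   # position-based (third row)

print(by_label.equals(by_position), by_label.name, by_position.name)
```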
df.loc['B','Y']
-0.8480769834036315
df
W X Y Z
df.loc[['A','B'],['W','Y']]
W Y
A 2.706850 0.907969
B 0.651118 -0.848077
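`.loc` takes a row selector and a column selector. Two single labels, as in `df.loc['B','Y']`, return a scalar. Two lists return a sub-DataFrame. The sketch below compares the scalar with the scalar-only accessors `.at` (by label) and `.iat` (by position), and with `.iloc`, on the same seeded frame.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])

# Single (row, column) labels give a scalar; lists give a DataFrame.
value = df.loc['B', 'Y']
block = df.loc[['A', 'B'], ['W', 'Y']]

# Row 'B' is position 1 and column 'Y' is position 2.
print(value == df.at['B', 'Y'] == df.iloc[1, 2] == df.iat[1, 2])
print(block.shape)
```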
outside
['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside
[1, 2, 3, 1, 2, 3]
list(zip(outside,inside))
[('G1', 1), ('G1', 2), ('G1', 3), ('G2', 1), ('G2', 2), ('G2', 3)]
hier_index
MultiIndex([('G1', 1),
('G1', 2),
('G1', 3),
('G2', 1),
('G2', 2),
('G2', 3)],
)
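The export does not show the cells that define `outside`, `inside` and `hier_index`. The sketch below assumes values reconstructed from the `zip` output above. It also assumes `pd.MultiIndex.from_tuples` was used to build the index; `pd.MultiIndex.from_arrays` is shown as an equivalent alternative.

```python
import pandas as pd

# Assumed definitions, reconstructed from the zip output above.
outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]

# Pair each outer label with its inner label, then build a two-level index.
hier_index = pd.MultiIndex.from_tuples(list(zip(outside, inside)))

# The same index built directly from the two parallel lists.
alt = pd.MultiIndex.from_arrays([outside, inside])

print(hier_index.nlevels, len(hier_index))
print(hier_index.equals(alt))
```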
df=pd.DataFrame(np.random.randn(6,2),hier_index,['A','B'])
df
A B
G1 1 0.302665 1.693723
2 -1.706086 -1.159119
3 -0.134841 0.390528
G2 1 0.166905 0.184502
2 0.807706 0.072960
3 0.638787 0.329646
df.loc['G1'].loc[1]
A 0.302665
B 1.693723
Name: 1, dtype: float64
df.index.names
FrozenList([None, None])
df.index.names=['Groups','Num']
df
A B
Groups Num
G1 1 0.302665 1.693723
2 -1.706086 -1.159119
3 -0.134841 0.390528
G2 1 0.166905 0.184502
2 0.807706 0.072960
3 0.638787 0.329646
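After `df.index.names` is set, each index level can be referred to by name instead of by position. The sketch below uses `Index.get_level_values` to pull out each level. For predictable output it fills the frame with placeholder values from `np.arange` instead of random numbers, so the numbers do not match the table above.

```python
import numpy as np
import pandas as pd

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
hier_index = pd.MultiIndex.from_tuples(list(zip(outside, inside)))

# Placeholder values (0.0 .. 11.0) instead of random draws.
df = pd.DataFrame(np.arange(12, dtype=float).reshape(6, 2), hier_index, ['A', 'B'])
df.index.names = ['Groups', 'Num']

# Once named, a level can be addressed by name rather than by position.
groups = df.index.get_level_values('Groups')
nums = df.index.get_level_values('Num')

print(list(df.index.names))
print(groups.tolist(), nums.tolist())
```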
df.loc['G1'].loc[2]['B']
-1.1591194155484297
df.loc['G2'].loc[3]['B']
0.32964629880452445
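The chained form `df.loc['G1'].loc[2]['B']` performs three separate lookups. A single `.loc` call can take the full row key as a `(Groups, Num)` tuple plus the column label. The sketch below checks that both forms return the same value on a frame filled with placeholder values (0.0 to 11.0), so the number differs from the output above.

```python
import numpy as np
import pandas as pd

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
hier_index = pd.MultiIndex.from_tuples(list(zip(outside, inside)), names=['Groups', 'Num'])
df = pd.DataFrame(np.arange(12, dtype=float).reshape(6, 2), hier_index, ['A', 'B'])

# Chained form used in the notebook: outer level, then inner, then column.
chained = df.loc['G1'].loc[2]['B']

# Single call: a (Groups, Num) tuple for the row plus the column label.
direct = df.loc[('G1', 2), 'B']

print(chained == direct, direct)
```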
df.xs
pandas.core.generic.NDFrame.xs
def xs(key: IndexLabel, axis: Axis=0, level: IndexLabel=None, drop_level: bool_t=True) -> NDFrameT
Parameters
df
A B
Groups Num
G1 1 0.302665 1.693723
2 -1.706086 -1.159119
3 -0.134841 0.390528
G2 1 0.166905 0.184502
2 0.807706 0.072960
3 0.638787 0.329646
df.loc['G1']
A B
Num
1 0.302665 1.693723
2 -1.706086 -1.159119
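`df.loc['G1']` selects on the outer level only. `df.xs`, whose signature appears above, takes a `level` argument, so it can also cut across an inner level. For example, it can return every `Num == 1` row from all groups. The sketch below uses placeholder values (0.0 to 11.0) instead of the random table above.

```python
import numpy as np
import pandas as pd

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
hier_index = pd.MultiIndex.from_tuples(list(zip(outside, inside)), names=['Groups', 'Num'])
df = pd.DataFrame(np.arange(12, dtype=float).reshape(6, 2), hier_index, ['A', 'B'])

# Select on the inner 'Num' level: the first row of every group.
num_one = df.xs(1, level='Num')
print(num_one)

# With no level given, xs selects on the outer level, like df.loc['G1'].
g1 = df.xs('G1')
print(g1.equals(df.loc['G1']))
```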