Day 1 Pandas Library in Python 1729578062
Pandas
Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool,
built on top of the Python programming language.
Pandas Series
A Pandas Series is like a column in a table. It is a one-dimensional labeled array holding data of any type.
Importing Pandas
In [1]:
import numpy as np
import pandas as pd
In [2]:
country = ['india','pakistan','usa','nepal','srilanka']

print(pd.Series(country))
# pandas is the library and Series is a class inside it;
# pd.Series(country) creates an object of the Series class,
# passing the country list to its constructor
0 india
1 pakistan
2 usa
3 nepal
4 srilanka
dtype: object
A Series object has two main parts: the values, and the index, which assigns a label to each value.
Here the dtype is object.
In pandas, object is the general-purpose dtype for arbitrary Python objects; it is what pandas has traditionally used to store strings.
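A minimal sketch of the two parts of a Series and of how pandas picks the dtype. The lists are made-up sample data, and the `dtype='string'` line is only an optional alternative that the notebook itself does not use.

```python
import pandas as pd

# sample data (hypothetical)
country = pd.Series(['india', 'usa', 'nepal'])
print(country.index.tolist())   # [0, 1, 2] -> default index labels
print(list(country.values))     # ['india', 'usa', 'nepal'] -> the values

# mixed Python types fall back to the general-purpose object dtype
mixed = pd.Series(['a', 1, 2.5])
print(mixed.dtype)              # object

# strings can also be stored in pandas' dedicated string dtype on request
names = pd.Series(['india', 'usa'], dtype='string')
print(names.dtype)              # string
```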
localhost:8888/notebooks/1. Python/4. Python libraries/2. Pandas/1. Pandas Series Campus X/1. pandas Series.ipynb 1/55
5/29/23, 11:14 AM 1. pandas Series - Jupyter Notebook
In [3]:
# integers series
runs = [13,24,56,78,110]
runs_score = pd.Series(runs)
runs_score
Out[3]:
0 13
1 24
2 56
3 78
4 110
dtype: int64
In [4]:
# custom index
marks = [67,57,80,100]
subject = ['maths','english','science','hindi']

# we want marks as the values and subjects as the index;
# for that, pass the index parameter to the Series constructor

pd.Series(marks,index=subject)
# here the values are integers
Out[4]:
maths 67
english 57
science 80
hindi 100
dtype: int64
In [5]:
Out[5]:
maths 67
english 57
science 80
hindi 100
Name: Himanshu marks, dtype: int64
In [6]:
Out[6]:
maths 67
english 57
science 80
hindi 100
Name: Himanshu marks, dtype: int64
In [7]:
marks = {'maths':67,'english':57,'science':80,'hindi':100}
mark_series = pd.Series(marks, name='Himanshu marks')
mark_series
Out[7]:
maths 67
english 57
science 80
hindi 100
Name: Himanshu marks, dtype: int64
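When a dict is passed, its keys become the index. The sketch below (toy marks, not the notebook's data) also shows what happens if `index` is passed together with a dict: pandas picks and reorders those keys, and a label missing from the dict becomes NaN, which forces a float dtype.

```python
import pandas as pd

marks = {'maths': 67, 'english': 57, 'science': 80}   # toy data

# dict keys become the index; name labels the Series as a whole
s = pd.Series(marks, name='marks')
print(s.index.tolist())   # ['maths', 'english', 'science']

# index + dict: select/reorder keys; 'hindi' is not in the dict -> NaN
picked = pd.Series(marks, index=['science', 'hindi'])
print(picked)
```

Here `picked` holds 80.0 for science and NaN for hindi, with dtype float64.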
Series Attributes
In [8]:
marks = {'maths':67,'english':57,'science':80,'hindi':100}
mark_series = pd.Series(marks, name='Himanshu marks')
mark_series
Out[8]:
maths 67
english 57
science 80
hindi 100
Name: Himanshu marks, dtype: int64
size
In [9]:
mark_series.size
Out[9]:
dtype
In [10]:
mark_series.dtype
Out[10]:
dtype('int64')
name
In [11]:
mark_series.name
Out[11]:
'Himanshu marks'
is_unique
It tells us whether all the values in the series are unique.
In [12]:
mark_series.is_unique
Out[12]:
True
In [13]:
pd.Series([1,1,2,3,5,6,6]).is_unique
Out[13]:
False
index
In [14]:
mark_series.index
Out[14]:
values
In [15]:
mark_series.values
Out[15]:
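The attributes above can be checked together in one small sketch. The marks are toy values, and `to_numpy()` is shown as the pandas method that returns the same values as `.values` for a plain integer Series.

```python
import pandas as pd

s = pd.Series({'maths': 67, 'english': 57, 'science': 80, 'hindi': 100},
              name='marks')   # toy data

print(s.size)             # 4 -> number of elements
print(s.dtype)            # int64
print(s.name)             # marks
print(s.is_unique)        # True -> no repeated values
print(s.index.tolist())   # ['maths', 'english', 'science', 'hindi']
print(s.to_numpy())       # the values as a NumPy array
print(pd.Series([1, 1, 2]).is_unique)   # False
```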
In [16]:
data = pd.read_csv('subs.csv')
print(data)
type(data)
# read_csv returns a DataFrame, even when the file has a single column
Subscribers gained
0 48
1 57
2 40
3 43
4 44
.. ...
360 231
361 226
362 155
363 144
364 172
Out[16]:
pandas.core.frame.DataFrame
In [17]:
0 48
1 57
2 40
3 43
4 44
...
360 231
361 226
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64
<class 'pandas.core.series.Series'>
C:\Users\gadha\AppData\Local\Temp\ipykernel_14052\2828103470.py:3: FutureWarning: The squeeze argument has been deprecated and will be removed in a future version. Append .squeeze("columns") to the call to squeeze.
print(pd.read_csv('subs.csv', squeeze=True))
C:\Users\gadha\AppData\Local\Temp\ipykernel_14052\2828103470.py:4: FutureWarning: The squeeze argument has been deprecated and will be removed in a future version. Append .squeeze("columns") to the call to squeeze.
print(type(pd.read_csv('subs.csv', squeeze=True)))
In [18]:
0 48
1 57
2 40
3 43
4 44
...
360 231
361 226
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64
C:\Users\gadha\AppData\Local\Temp\ipykernel_14052\1160397978.py:1: FutureWarning: The squeeze argument has been deprecated and will be removed in a future version. Append .squeeze("columns") to the call to squeeze.
Out[18]:
In [19]:
kohli = pd.read_csv('kohli_ipl.csv')
print(kohli)
print(type(kohli))
match_no runs
0 1 1
1 2 23
2 3 13
3 4 12
4 5 1
.. ... ...
210 211 0
211 212 20
212 213 73
213 214 25
214 215 7
In [20]:
match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64
C:\Users\gadha\AppData\Local\Temp\ipykernel_14052\3868690441.py:1: FutureWarning: The squeeze argument has been deprecated and will be removed in a future version. Append .squeeze("columns") to the call to squeeze.
Out[20]:
pandas.core.series.Series
In [21]:
print(pd.read_csv('bollywood.csv'))
type(pd.read_csv('bollywood.csv'))
movie lead
0 Uri: The Surgical Strike Vicky Kaushal
1 Battalion 609 Vicky Ahuja
2 The Accidental Prime Minister (film) Anupam Kher
3 Why Cheat India Emraan Hashmi
4 Evening Shadows Mona Ambegaonkar
... ... ...
1495 Hum Tumhare Hain Sanam Shah Rukh Khan
1496 Aankhen (2002 film) Amitabh Bachchan
1497 Saathiya (film) Vivek Oberoi
1498 Company (film) Ajay Devgn
1499 Awara Paagal Deewana Akshay Kumar
Out[21]:
pandas.core.frame.DataFrame
In [22]:
movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India Emraan Hashmi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object
C:\Users\gadha\AppData\Local\Temp\ipykernel_14052\3858941184.py:1: FutureWarning: The squeeze argument has been deprecated and will be removed in a future version. Append .squeeze("columns") to the call to squeeze.
Out[22]:
pandas.core.series.Series
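The warnings above suggest appending `.squeeze("columns")` instead of passing `squeeze=True`. A minimal sketch of that advice follows; the CSV contents are hypothetical stand-ins (first few rows only) built with `io.StringIO`, and using `index_col="match_no"` is an assumption about how the match-indexed series was loaded, based on its printed output.

```python
import io
import pandas as pd

# hypothetical stand-ins for subs.csv and kohli_ipl.csv
subs_csv = io.StringIO("Subscribers gained\n48\n57\n40\n")
kohli_csv = io.StringIO("match_no,runs\n1,1\n2,23\n3,13\n")

# one-column file: DataFrame -> Series
subs = pd.read_csv(subs_csv).squeeze("columns")

# two columns: use match_no as the index, then squeeze the remaining column
vk = pd.read_csv(kohli_csv, index_col="match_no").squeeze("columns")

print(type(subs))   # <class 'pandas.core.series.Series'>
print(vk)
```

`squeeze("columns")` only turns a DataFrame into a Series when exactly one column is left, which is why the second file needs `index_col` first.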
Series methods
head
In [23]:
Out[23]:
match_no
1 1
2 23
3 13
4 12
5 1
Name: runs, dtype: int64
In [24]:
subs.head()
Out[24]:
0 48
1 57
2 40
3 43
4 44
Name: Subscribers gained, dtype: int64
In [25]:
Out[25]:
movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India Emraan Hashmi
Evening Shadows Mona Ambegaonkar
Soni (film) Geetika Vidya Ohlyan
Fraud Saiyaan Arshad Warsi
Bombairiya Radhika Apte
Manikarnika: The Queen of Jhansi Kangana Ranaut
Thackeray (film) Nawazuddin Siddiqui
Name: lead, dtype: object
tail
In [26]:
Out[26]:
match_no
211 0
212 20
213 73
214 25
215 7
Name: runs, dtype: int64
In [27]:
Out[27]:
movie
Raaz (2002 film) Dino Morea
Zameen (2003 film) Ajay Devgn
Waisa Bhi Hota Hai Part II Arshad Warsi
Devdas (2002 Hindi film) Shah Rukh Khan
Kaante Amitabh Bachchan
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, dtype: object
sample
sample() picks random rows from the data (one row by default; pass a number to get more).
Because the rows are chosen at random, it gives a less biased view than head() or tail() when the data is ordered or skewed, since those only show the start or the end.
In [28]:
subs.sample()
Out[28]:
180 93
Name: Subscribers gained, dtype: int64
In [29]:
Out[29]:
movie
31st October (film) Soha Ali Khan
Brothers (2015 film) Akshay Kumar
Fredrick (film) Avinash Dhyani
Banjo (2016 film) Riteish Deshmukh
Jhootha Kahin Ka Rishi Kapoor
Name: lead, dtype: object
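A short sketch comparing the three methods on a toy Series of 20 numbers. The `random_state` argument is an addition not used in the notebook; it fixes the random seed so that `sample` returns the same rows on every run.

```python
import pandas as pd

s = pd.Series(range(100, 120))   # 20 toy values

first = s.head()                 # first 5 rows by default
last3 = s.tail(3)                # last 3 rows
rand = s.sample(5, random_state=42)   # 5 random rows, reproducible
print(first.tolist(), last3.tolist())
print(rand)
```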
value_counts()
In [30]:
# suppose we want to know how many movies each actor has done,
# i.e. the frequency count of each value
movies.value_counts()
# the result is sorted in descending order of count
Out[30]:
Akshay Kumar 48
Amitabh Bachchan 45
Ajay Devgn 38
Salman Khan 31
Sanjay Dutt 26
..
Diganth 1
Parveen Kaur 1
Seema Azmi 1
Akanksha Puri 1
Edwin Fernandes 1
Name: lead, Length: 566, dtype: int64
In [31]:
Out[31]:
Sharib Hashmi 1
Ravi Kishan 1
Sagar Bhangade 1
Harish Chabbra 1
Bidita Bag 1
..
Sanjay Dutt 26
Salman Khan 31
Ajay Devgn 38
Amitabh Bachchan 45
Akshay Kumar 48
Name: lead, Length: 566, dtype: int64
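The two outputs above differ only in ordering. A sketch on a tiny made-up list of lead actors (not the bollywood.csv data) shows `ascending=True` and also `normalize=True`, which returns proportions instead of counts.

```python
import pandas as pd

# toy data (hypothetical)
leads = pd.Series(['Akshay', 'Ajay', 'Akshay', 'Salman', 'Akshay', 'Ajay'])

counts = leads.value_counts()                   # most frequent first
rare_first = leads.value_counts(ascending=True) # least frequent first
share = leads.value_counts(normalize=True)      # fractions that sum to 1
print(counts)
print(share)
```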
sort_values()
In [32]:
vk.sort_values()
# sorts the whole series in ascending order (returns a new Series)
Out[32]:
match_no
87 0
211 0
207 0
206 0
91 0
...
164 100
120 100
123 108
126 109
128 113
Name: runs, Length: 215, dtype: int64
In [33]:
vk.sort_values(ascending=False)
# pass ascending=False for descending order
Out[33]:
match_no
128 113
126 109
123 108
164 100
120 100
...
93 0
211 0
130 0
8 0
135 0
Name: runs, Length: 215, dtype: int64
In [34]:
vk.sort_values(ascending=False).head(1).values
# returns a NumPy array holding 113
# sort_values does not change the original series
Out[34]:
array([113], dtype=int64)
In [35]:
vk.sort_values(ascending=False).head(1).values[0]
# this style is called method chaining: we call one method after another,
# and the output of each step becomes the input of the next
Out[35]:
113
In [36]:
Passing inplace=True makes the change permanent in the original series instead of returning a new sorted copy.
In [37]:
# print(vk)
# the changes have been made permanently in the original series
sort_index()
In [38]:
movies.sort_index()
# sorts by the index labels; inplace=True works here too to make the change permanent
Out[38]:
movie
1920 (film) Rajniesh Duggall
1920: London Sharman Joshi
1920: The Evil Returns Vicky Ahuja
1971 (2007 film) Manoj Bajpayee
2 States (2014 film) Arjun Kapoor
...
Zindagi 50-50 Veena Malik
Zindagi Na Milegi Dobara Hrithik Roshan
Zindagi Tere Naam Mithun Chakraborty
Zokkomon Darsheel Safary
Zor Lagaa Ke...Haiya! Meghan Jadhav
Name: lead, Length: 1500, dtype: object
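A sketch of the sorting points above on a toy runs Series (the numbers are made up). It shows that `sort_values` leaves the original alone, that method chaining reaches the top score, and that `inplace=True` changes the Series itself. `nlargest` is an extra pandas method, not used in the notebook, that gives the same top score more directly.

```python
import pandas as pd

runs = pd.Series([45, 113, 0, 73, 12], index=[1, 2, 3, 4, 5], name='runs')  # toy data

desc = runs.sort_values(ascending=False)   # a new, sorted Series
print(runs.tolist())                       # original order is unchanged

best = runs.sort_values(ascending=False).head(1).values[0]   # method chaining
print(best)                                # 113
print(runs.nlargest(1).iloc[0])            # 113, more directly

runs.sort_index(ascending=False, inplace=True)   # modifies runs itself, returns None
print(runs.index.tolist())                 # [5, 4, 3, 2, 1]
```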
count
In [39]:
vk.count()
Out[39]:
215
sum
In [40]:
subs.sum()
Out[40]:
49510
In [41]:
subs.product()
# multiplies all the values together
Out[41]:
mean
In [42]:
subs.mean()
Out[42]:
135.64383561643837
In [43]:
vk.median()
Out[43]:
24.0
In [44]:
print(movies.mode())
0 Akshay Kumar
Name: lead, dtype: object
In [45]:
subs.std()
Out[45]:
62.67502303725269
In [46]:
vk.var()
Out[46]:
688.0024777222344
min/max
In [47]:
subs.min()
Out[47]:
33
In [48]:
subs.max()
Out[48]:
396
describe
In [49]:
vk.describe()
Out[49]:
count 215.000000
mean 30.855814
std 26.229801
min 0.000000
25% 9.000000
50% 24.000000
75% 48.000000
max 113.000000
Name: runs, dtype: float64
In [50]:
subs.describe()
Out[50]:
count 365.000000
mean 135.643836
std 62.675023
min 33.000000
25% 88.000000
50% 123.000000
75% 177.000000
max 396.000000
Name: Subscribers gained, dtype: float64
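The statistics above are easier to check by hand on a small toy Series. The sketch also shows two details that are easy to miss: `std` and `var` use the sample formula (`ddof=1`) by default, and `count()` skips missing values while `size` does not.

```python
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])   # toy data

print(s.sum(), s.mean(), s.median())   # 40 5.0 4.5
print(s.mode().tolist())               # [4] (mode returns a Series)
print(s.var(), s.std())                # sample statistics, ddof=1 by default
print(s.std(ddof=0))                   # population std: 2.0

# count() skips missing values, while size counts every slot
with_gap = pd.Series([1.0, None, 3.0])
print(with_gap.count(), with_gap.size)   # 2 3

print(s.describe())
```

Here the squared deviations from the mean sum to 32, so the sample variance is 32/7 and the population variance is 32/8 = 4.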
Series Indexing
In [51]:
# integer indexing
x = pd.Series([12,13,14,35,46,57,58,79,9])
x
Out[51]:
0 12
1 13
2 14
3 35
4 46
5 57
6 58
7 79
8 9
dtype: int64
In [52]:
x[0]
Out[52]:
12
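With an integer index (the default 0, 1, 2, ... index or a custom integer one), negative indexing with [] does not work: -1 is looked up as a label, and no such label exists.
If the index is made of strings, [] falls back to position, so negative indexing works (in the pandas version used here; newer releases deprecate this fallback in favor of .iloc).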
In [53]:
x[-1]
# it will throw an error
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
384 try:
--> 385 return self._range.index(new_key)
386 except ValueError as err:
The above exception was the direct cause of the following exception:
In [55]:
vk
Out[55]:
match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64
In [56]:
vk[-1]
# this will throw an error
# here the custom index is made of integers
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3628 try:
-> 3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:
~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
In [57]:
movies
Out[57]:
movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India Emraan Hashmi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object
In [58]:
movies[-1]
# the index is made of strings, so -1 falls back to the last position
Out[58]:
'Akshay Kumar'
In [59]:
print(movies[0])
print(movies['Uri: The Surgical Strike'])
# both lines fetch the same value: the first by position, the second by label
Vicky Kaushal
Vicky Kaushal
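Because `[]` switches between label and position depending on the index type, a sketch with the explicit accessors may help: `.loc` always uses labels and `.iloc` always uses positions, so `.iloc[-1]` works for any index. The two small Series imitate `vk` and `movies` but are toy data.

```python
import pandas as pd

# integer labels that do not start at 0, like the match_no index (toy data)
runs = pd.Series([1, 23, 13], index=[1, 2, 3])
# string labels, like the movie titles (toy data)
leads = pd.Series(['Vicky Kaushal', 'Akshay Kumar'],
                  index=['Uri', 'Awara Paagal Deewana'])

print(runs.loc[1])     # 1  -> by label
print(runs.iloc[0])    # 1  -> by position
print(runs.iloc[-1])   # 13 -> negative positions work with iloc

raised_key_error = False
try:
    runs[-1]           # [] treats -1 as a label here, and no such label exists
except KeyError:
    raised_key_error = True
print(raised_key_error)  # True

print(leads.loc['Uri'])  # Vicky Kaushal
print(leads.iloc[-1])    # Akshay Kumar
```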
slicing
In [60]:
vk
Out[60]:
match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64
In [61]:
vk[5:16]
Out[61]:
match_no
6 9
7 34
8 0
9 21
10 3
11 10
12 38
13 3
14 11
15 50
16 2
Name: runs, dtype: int64
In [62]:
vk[-5:]
Out[62]:
match_no
211 0
212 20
213 73
214 25
215 7
Name: runs, dtype: int64
In [63]:
movies[-5:]
Out[63]:
movie
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, dtype: object
In [64]:
movies[::2]
# 2 is the step: every second element is taken
Out[64]:
movie
Uri: The Surgical Strike Vicky Kaushal
The Accidental Prime Minister (film) Anupam Kher
Evening Shadows Mona Ambegaonkar
Fraud Saiyaan Arshad Warsi
Manikarnika: The Queen of Jhansi Kangana Ranaut
...
Raaz (2002 film) Dino Morea
Waisa Bhi Hota Hai Part II Arshad Warsi
Kaante Amitabh Bachchan
Aankhen (2002 film) Amitabh Bachchan
Company (film) Ajay Devgn
Name: lead, Length: 750, dtype: object
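Note that `vk[5:16]` above returned matches 6 to 16, i.e. positions 5 to 15. A sketch on toy runs makes the two slicing rules explicit: `.iloc` slices by position and excludes the stop, while `.loc` slices by label and includes the stop.

```python
import pandas as pd

runs = pd.Series([1, 23, 13, 12, 1, 9, 34], index=range(1, 8))   # toy data

print(runs.iloc[2:5].tolist())   # positions 2, 3, 4 (stop excluded): [13, 12, 1]
print(runs.loc[2:5].tolist())    # labels 2..5 (stop included): [23, 13, 12, 1]
print(runs.iloc[-3:].tolist())   # last three: [1, 9, 34]
print(runs.iloc[::2].tolist())   # every second element: [1, 13, 1, 34]
```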
fancy indexing
In [65]:
vk[[1,3,5,6]]
Out[65]:
match_no
1 1
3 13
5 1
6 9
Name: runs, dtype: int64
In [66]:
movies
Out[66]:
movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India Emraan Hashmi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object
In [67]:
Out[67]:
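Fancy indexing means passing a list instead of a single key. A sketch on toy marks shows the list form with labels (`.loc`) and with positions (`.iloc`); the result follows the order of the list, not the original order.

```python
import pandas as pd

marks = pd.Series({'maths': 67, 'english': 57, 'science': 80, 'hindi': 100})  # toy data

print(marks.loc[['maths', 'hindi']].tolist())        # [67, 100] -> list of labels
print(marks.iloc[[0, 2]].tolist())                   # [67, 80]  -> list of positions
print(marks.loc[['hindi', 'maths']].index.tolist())  # ['hindi', 'maths']
```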
Editing Series
using indexing
In [68]:
marks = {'maths':67,'english':57,'science':80,'hindi':100}
mark_series = pd.Series(marks, name='Himanshu marks')
mark_series
Out[68]:
maths 67
english 57
science 80
hindi 100
Name: Himanshu marks, dtype: int64
In [69]:
mark_series[1]=100
mark_series
Out[69]:
maths 67
english 100
science 80
hindi 100
Name: Himanshu marks, dtype: int64
In [70]:
mark_series['social study'] = 80
# assigning to a label that does not exist does not raise an error;
# it adds a new entry to the existing Series
mark_series
Out[70]:
maths 67
english 100
science 80
hindi 100
social study 80
Name: Himanshu marks, dtype: int64
slicing
In [71]:
runs = [13,24,56,78,110]
runs_score = pd.Series(runs)
runs_score
Out[71]:
0 13
1 24
2 56
3 78
4 110
dtype: int64
In [72]:
runs_score
Out[72]:
0 13
1 24
2 56
3 78
4 110
dtype: int64
In [73]:
runs_score[2:4]=[90,95]
runs_score
Out[73]:
0 13
1 24
2 90
3 95
4 110
dtype: int64
fancy indexing
In [74]:
runs_score[[0,3,4]] = [0,0,0]
runs_score
Out[74]:
0 0
1 24
2 90
3 0
4 0
dtype: int64
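The edits in this section can be repeated with the explicit accessors, which avoids relying on `[]` guessing between labels and positions. This sketch uses `.iloc` for the positional write, since `mark_series[1] = 100` on a string index depends on a positional fallback that newer pandas releases deprecate. The data mirrors the notebook's toy values; the final results match Out[70] and Out[74] above.

```python
import pandas as pd

marks = pd.Series({'maths': 67, 'english': 57, 'science': 80, 'hindi': 100})
marks.iloc[1] = 100              # change by position
marks['social study'] = 80       # new label -> the Series grows by one entry

scores = pd.Series([13, 24, 56, 78, 110])
scores.iloc[2:4] = [90, 95]      # slice assignment
scores.loc[[0, 3, 4]] = 0        # fancy assignment, one scalar broadcast to all three
print(marks.tolist(), scores.tolist())
```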
In [75]:
movies
Out[75]:
movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India Emraan Hashmi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object
In [76]:
Out[76]:
movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India himanshu gadhavi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object
In [77]:
subs
Out[77]:
0 48
1 57
2 40
3 43
4 44
...
360 231
361 226
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64
In [78]:
print(len(subs))
365
In [79]:
print(type(subs))
<class 'pandas.core.series.Series'>
In [80]:
print(dir(subs))
In [81]:
print(sorted(subs))
# sorted() returns a plain Python list,
# in ascending order
[33, 33, 35, 37, 39, 40, 40, 40, 40, 42, 42, 43, 44, 44, 44, 45, 46, 46,
48, 49, 49, 49, 49, 50, 50, 50, 51, 54, 56, 56, 56, 56, 57, 61, 62, 64, 65,
65, 66, 66, 66, 66, 67, 68, 70, 70, 70, 71, 71, 72, 72, 72, 72, 72, 73,
74, 74, 75, 76, 76, 76, 76, 77, 77, 78, 78, 78, 79, 79, 80, 80, 80, 81, 81,
82, 82, 83, 83, 83, 84, 84, 84, 85, 86, 86, 86, 87, 87, 87, 87, 88, 88,
88, 88, 88, 89, 89, 89, 90, 90, 90, 90, 91, 92, 92, 92, 93, 93, 93, 93, 95,
95, 96, 96, 96, 96, 97, 97, 98, 98, 99, 99, 100, 100, 100, 101, 101, 101,
102, 102, 103, 103, 104, 104, 104, 105, 105, 105, 105, 105, 105, 105, 105,
105, 108, 108, 108, 108, 108, 108, 109, 109, 110, 110, 110, 111, 111, 112,
113, 113, 113, 114, 114, 114, 114, 115, 115, 115, 115, 117, 117, 117, 118,
118, 119, 119, 119, 119, 120, 122, 123, 123, 123, 123, 123, 124, 125, 126,
127, 128, 128, 129, 130, 131, 131, 132, 132, 134, 134, 134, 135, 135, 136,
136, 136, 137, 138, 138, 138, 139, 140, 144, 145, 146, 146, 146, 146, 147,
149, 150, 150, 150, 150, 151, 152, 152, 152, 153, 153, 153, 154, 154, 154,
155, 155, 156, 156, 156, 156, 157, 157, 157, 157, 158, 158, 159, 159, 160,
160, 160, 160, 162, 164, 166, 167, 167, 168, 170, 170, 170, 170, 171, 172,
172, 173, 173, 173, 174, 174, 175, 175, 176, 176, 177, 178, 179, 179, 180,
180, 180, 182, 183, 183, 183, 184, 184, 184, 185, 185, 185, 185, 186, 186,
186, 188, 189, 190, 190, 192, 192, 192, 196, 196, 196, 197, 197, 202, 202,
202, 203, 204, 206, 207, 209, 210, 210, 211, 212, 213, 214, 216, 219, 220,
221, 221, 222, 222, 224, 225, 225, 226, 227, 228, 229, 230, 231, 233, 236,
236, 237, 241, 243, 244, 245, 247, 249, 254, 254, 258, 259, 259, 261, 261,
265, 267, 268, 269, 276, 276, 290, 295, 301, 306, 312, 396]
In [82]:
print(sorted(subs, reverse=True))
# reverse=True sorts in descending order
[396, 312, 306, 301, 295, 290, 276, 276, 269, 268, 267, 265, 261, 261,
259, 259, 258, 254, 254, 249, 247, 245, 244, 243, 241, 237, 236, 236, 233,
231, 230, 229, 228, 227, 226, 225, 225, 224, 222, 222, 221, 221, 220, 219,
216, 214, 213, 212, 211, 210, 210, 209, 207, 206, 204, 203, 202, 202, 202,
197, 197, 196, 196, 196, 192, 192, 192, 190, 190, 189, 188, 186, 186, 186,
185, 185, 185, 185, 184, 184, 184, 183, 183, 183, 182, 180, 180, 180, 179,
179, 178, 177, 176, 176, 175, 175, 174, 174, 173, 173, 173, 172, 172, 171,
170, 170, 170, 170, 168, 167, 167, 166, 164, 162, 160, 160, 160, 160, 159,
159, 158, 158, 157, 157, 157, 157, 156, 156, 156, 156, 155, 155, 154, 154,
154, 153, 153, 153, 152, 152, 152, 151, 150, 150, 150, 150, 149, 147, 146,
146, 146, 146, 145, 144, 140, 139, 138, 138, 138, 137, 136, 136, 136, 135,
135, 134, 134, 134, 132, 132, 131, 131, 130, 129, 128, 128, 127, 126, 125,
124, 123, 123, 123, 123, 123, 122, 120, 119, 119, 119, 119, 118, 118, 117,
117, 117, 115, 115, 115, 115, 114, 114, 114, 114, 113, 113, 113, 112, 111,
111, 110, 110, 110, 109, 109, 108, 108, 108, 108, 108, 108, 105, 105, 105,
105, 105, 105, 105, 105, 105, 104, 104, 104, 103, 103, 102, 102, 101, 101,
101, 100, 100, 100, 99, 99, 98, 98, 97, 97, 96, 96, 96, 96, 95, 95, 93,
93, 93, 93, 92, 92, 92, 91, 90, 90, 90, 90, 89, 89, 89, 88, 88, 88, 88, 88,
87, 87, 87, 87, 86, 86, 86, 85, 84, 84, 84, 83, 83, 83, 82, 82, 81, 81,
80, 80, 80, 79, 79, 78, 78, 78, 77, 77, 76, 76, 76, 76, 75, 74, 74, 73, 72,
72, 72, 72, 72, 71, 71, 70, 70, 70, 68, 67, 66, 66, 66, 66, 65, 65, 64,
62, 61, 57, 56, 56, 56, 56, 54, 51, 50, 50, 50, 49, 49, 49, 49, 48, 46, 46,
45, 44, 44, 44, 43, 42, 42, 40, 40, 40, 40, 39, 37, 35, 33, 33]
In [83]:
print(min(subs))
33
In [84]:
print(max(subs))
396
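Python's built-in functions work on a Series because it is iterable and has a length. The sketch below uses the first five subscriber counts as toy data; note that `sorted` drops the index and returns a list, whereas the pandas method `sort_values` keeps the index.

```python
import pandas as pd

subs = pd.Series([48, 57, 40, 43, 44])   # toy data

print(len(subs))                      # 5
print(sorted(subs))                   # a plain Python list, ascending
print(sorted(subs, reverse=True))     # descending
print(min(subs), max(subs))           # 40 57
print(subs.sort_values())             # pandas equivalent, keeps the index
```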
type conversion
In [85]:
mark_series
Out[85]:
maths 67
english 100
science 80
hindi 100
social study 80
Name: Himanshu marks, dtype: int64
In [86]:
list(mark_series)
# convert to a list
Out[86]:
In [87]:
dict(mark_series)
# convert to a dictionary
Out[87]:
{'maths': 67, 'english': 100, 'science': 80, 'hindi': 100, 'social study': 80}
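pandas also offers its own conversion methods, `tolist()` and `to_dict()`, which give the same results as `list()` and `dict()` here. A minimal sketch on toy marks:

```python
import pandas as pd

marks = pd.Series({'maths': 67, 'english': 100}, name='marks')   # toy data

as_list = marks.tolist()    # values only, the index is dropped
as_dict = marks.to_dict()   # {index label: value}
print(as_list, as_dict)     # [67, 100] {'maths': 67, 'english': 100}
```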
membership operator
In [88]:
movies
Out[88]:
movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India himanshu gadhavi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object
In [89]:
True
Out[89]:
In [90]:
False
This returns False because the in operator checks the index labels only, not the values.
In [91]:
Out[91]:
True
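To make the membership rule concrete, here is a sketch with two toy movie entries: `in` on the Series searches the index, and searching the values requires going through `.values` explicitly.

```python
import pandas as pd

leads = pd.Series(['Vicky Kaushal', 'Akshay Kumar'],
                  index=['Uri', 'Awara Paagal Deewana'])   # toy data

print('Uri' in leads)                  # True  -> checks the index labels
print('Akshay Kumar' in leads)         # False -> values are not searched
print('Akshay Kumar' in leads.values)  # True  -> search the values explicitly
```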
looping
In [92]:
for i in movies:
    print(i)
Vicky Kaushal
Vicky Ahuja
Anupam Kher
himanshu gadhavi
Mona Ambegaonkar
Geetika Vidya Ohlyan
Arshad Warsi
Radhika Apte
Kangana Ranaut
Nawazuddin Siddiqui
Ali Asgar
Ranveer Singh
Prit Kamani
Ajay Devgn
Sushant Singh Rajput
Amitabh Bachchan
Abhimanyu Dasani
Talha Arshad Reshi
Nawazuddin Siddiqui
In [93]:
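Looping over a Series yields only the values. When the index labels are needed too, `items()` yields `(label, value)` pairs. A minimal sketch on toy marks:

```python
import pandas as pd

marks = pd.Series({'maths': 67, 'english': 57})   # toy data

for value in marks:                  # iterating a Series yields the values
    print(value)

for label, value in marks.items():   # items() yields (index label, value) pairs
    print(label, value)

pairs = list(marks.items())
```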
Arithmetic Operators
In [94]:
mark_series
Out[94]:
maths 67
english 100
science 80
hindi 100
social study 80
Name: Himanshu marks, dtype: int64
In [95]:
print(100 - mark_series)
# this is a good example of broadcasting: we use only one scalar (100),
# but the operation is applied to every element of the series
maths 33
english 0
science 20
hindi 0
social study 20
Name: Himanshu marks, dtype: int64
In [96]:
100 + mark_series
Out[96]:
maths 167
english 200
science 180
hindi 200
social study 180
Name: Himanshu marks, dtype: int64
In [97]:
2 * mark_series
Out[97]:
maths 134
english 200
science 160
hindi 200
social study 160
Name: Himanshu marks, dtype: int64
In [98]:
2 / mark_series
Out[98]:
maths 0.029851
english 0.020000
science 0.025000
hindi 0.020000
social study 0.025000
Name: Himanshu marks, dtype: float64
In [99]:
mark_series**2
Out[99]:
maths 4489
english 10000
science 6400
hindi 10000
social study 6400
Name: Himanshu marks, dtype: int64
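Broadcasting a scalar is the simple case. When both operands are Series, pandas lines them up by index label rather than by position, and a label present in only one of them produces NaN. A sketch with two made-up term scores; `add(..., fill_value=0)` is the pandas method for treating a missing side as 0.

```python
import pandas as pd

term1 = pd.Series({'maths': 60, 'science': 70, 'hindi': 80})    # toy data
term2 = pd.Series({'maths': 10, 'science': 5, 'english': 9})    # toy data

print(100 - term1)      # scalar broadcast to every element
total = term1 + term2   # aligned on labels, not on position
print(total)            # english and hindi become NaN: each appears in only one Series
filled = term1.add(term2, fill_value=0)   # treat a missing side as 0 instead
print(filled)
```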
Relational Operators
In [100]:
vk
Out[100]:
match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64
In [101]:
vk >= 50
# returns a boolean Series
Out[101]:
match_no
1 False
2 False
3 False
4 False
5 False
...
211 False
212 False
213 True
214 False
215 False
Name: runs, Length: 215, dtype: bool
In [102]:
vk >= 50
Out[102]:
match_no
1 False
2 False
3 False
4 False
5 False
...
211 False
212 False
213 True
214 False
215 False
Name: runs, Length: 215, dtype: bool
In [103]:
Indexing with this boolean Series keeps only the matches in which Virat Kohli scored 50 or more runs.
In [104]:
Out[104]:
50
In [105]:
Out[105]:
50
In [106]:
vk[ vk == 0]
Out[106]:
match_no
8 0
87 0
91 0
93 0
130 0
135 0
206 0
207 0
211 0
Name: runs, dtype: int64
In [107]:
vk[ vk == 0].size
Out[107]:
Count the number of days on which I gained more than 200 subscribers
In [108]:
subs
Out[108]:
0 48
1 57
2 40
3 43
4 44
...
360 231
361 226
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64
In [109]:
Out[109]:
165 225
166 249
167 265
168 306
169 261
170 222
225 224
226 254
227 214
228 236
229 261
230 247
231 207
232 254
233 301
234 233
240 202
246 259
In [110]:
Out[110]:
59
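The counting pattern used above (build a mask, index with it, then count) can be sketched on toy runs. Summing a boolean mask counts its True values, and several conditions are combined with `&` or `|`, each condition wrapped in parentheses.

```python
import pandas as pd

runs = pd.Series([45, 113, 0, 73, 12, 50, 0], index=range(1, 8), name='runs')  # toy data

mask = runs >= 50            # boolean Series with the same index
fifties = runs[mask]         # keep only the rows where mask is True
print(mask.sum())            # 3 -> each True counts as 1
print(runs[runs == 0].size)  # 2
# combine conditions with & or |, wrapping each one in parentheses
print(runs[(runs >= 50) & (runs < 100)].tolist())   # [73, 50]
```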
In [111]:
movies
Out[111]:
movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India himanshu gadhavi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object
In [112]:
movies.value_counts()
Out[112]:
Akshay Kumar 48
Amitabh Bachchan 45
Ajay Devgn 38
Salman Khan 31
Sanjay Dutt 26
..
Diganth 1
Parveen Kaur 1
Seema Azmi 1
Akanksha Puri 1
Edwin Fernandes 1
Name: lead, Length: 567, dtype: int64
In [113]:
num_movies = movies.value_counts()
# store the counts in a variable
In [114]:
num_movies > 20
Out[114]:
In [115]:
Out[115]:
Akshay Kumar 48
Amitabh Bachchan 45
Ajay Devgn 38
Salman Khan 31
Sanjay Dutt 26
Shah Rukh Khan 22
Name: lead, dtype: int64
In [116]:
subs.plot()
Out[116]:
<AxesSubplot:>
In [117]:
movies.value_counts()
# number of movies for each actor
Out[117]:
Akshay Kumar 48
Amitabh Bachchan 45
Ajay Devgn 38
Salman Khan 31
Sanjay Dutt 26
..
Diganth 1
Parveen Kaur 1
Seema Azmi 1
Akanksha Puri 1
Edwin Fernandes 1
Name: lead, Length: 567, dtype: int64
In [118]:
movies.value_counts().head(20)
# the top 20 actors
Out[118]:
Akshay Kumar 48
Amitabh Bachchan 45
Ajay Devgn 38
Salman Khan 31
Sanjay Dutt 26
Shah Rukh Khan 22
Emraan Hashmi 20
Saif Ali Khan 18
John Abraham 18
Shahid Kapoor 17
Sunny Deol 17
Jimmy Sheirgill 16
Tusshar Kapoor 16
Arjun Rampal 14
Manoj Bajpayee 14
Irrfan Khan 14
Anupam Kher 13
Hrithik Roshan 12
Kangana Ranaut 12
Ayushmann Khurrana 12
Name: lead, dtype: int64
In [119]:
movies.value_counts().head(20).plot(kind='bar')
Out[119]:
<AxesSubplot:>
In [120]:
movies.value_counts().head(20).plot(kind='barh')
Out[120]:
<AxesSubplot:>
In [121]:
movies.value_counts().head(20).plot(kind='pie')
Out[121]:
<AxesSubplot:ylabel='lead'>
astype
astype converts a Series to a different dtype. One common use is downcasting (for example from int64 to int16) to reduce the memory footprint of the data.
In [122]:
vk
Out[122]:
match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64
In [123]:
import sys
sys.getsizeof(vk)
# memory used by the Series in bytes, including its index
Out[123]:
11752
In [124]:
vk.astype('int16')
Out[124]:
match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int16
In [125]:
sys.getsizeof(vk.astype('int16'))
# the size is reduced
Out[125]:
10462
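The drop from 11752 to 10462 bytes is exactly 1290 = 215 × 6 bytes: each of the 215 values shrinks from 8 bytes (int64) to 2 bytes (int16), while the index and object overhead counted by sys.getsizeof stay the same. A minimal sketch that measures only the values with Series.memory_usage(index=False) (the 0 to 214 data is made up for illustration); since int16 can only hold -32768 to 32767, it checks the range before downcasting:

```python
import numpy as np
import pandas as pd

# Made-up scores standing in for the notebook's `vk` Series
runs = pd.Series(np.arange(215), name="runs", dtype="int64")

# int16 holds -32768..32767, so make sure every value fits first
limits = np.iinfo(np.int16)
assert limits.min <= runs.min() and runs.max() <= limits.max

small = runs.astype("int16")

# Bytes used by the values only (the index is excluded)
before = runs.memory_usage(index=False)
after = small.memory_usage(index=False)
print(before, after)  # 1720 430
```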
between
In [126]:
vk.between(51, 99)
# returns a boolean Series; both end points are included by default
Out[126]:
match_no
1 False
2 False
3 False
4 False
5 False
...
211 False
212 False
213 True
214 False
215 False
Name: runs, Length: 215, dtype: bool
In [127]:
vk[vk.between(51, 99)]
Out[127]:
match_no
34 58
41 71
44 56
45 67
52 70
57 57
68 73
71 51
73 58
74 65
80 57
81 93
82 99
85 56
97 67
99 73
103 51
In [128]:
vk[vk.between(51, 99)].count()
Out[128]:
43
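By default between() includes both end points, so a score of exactly 51 or 99 counts. Recent pandas versions (1.3 and later) also accept inclusive='neither', 'left' or 'right'. A small sketch with made-up scores:

```python
import pandas as pd

# Made-up scores, including both end points 51 and 99
scores = pd.Series([50, 51, 75, 99, 100])

both = scores.between(51, 99)                          # default: 51 <= x <= 99
neither = scores.between(51, 99, inclusive="neither")  # 51 < x < 99

print(scores[both].tolist())     # [51, 75, 99]
print(scores[neither].tolist())  # [75]

# Summing a boolean mask counts the True values,
# the same number as indexing and then calling .count()
print(both.sum())  # 3
```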
clip
In [129]:
subs
Out[129]:
0 48
1 57
2 40
3 43
4 44
...
360 231
361 226
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64
In [130]:
print(subs.clip(100, 200))
# values below 100 become 100,
# values above 200 become 200,
# values between 100 and 200 are left as they are
0 100
1 100
2 100
3 100
4 100
...
360 200
361 200
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64
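clip() is a shortcut for bounding values from below and above. The sketch below, on made-up numbers, checks it against the same rule written by hand with Series.where(), which keeps a value where the condition is True and substitutes the second argument elsewhere; passing only upper= (or only lower=) bounds one side:

```python
import pandas as pd

# Made-up daily subscriber counts
subs_demo = pd.Series([48, 150, 231, 100, 200])

clipped = subs_demo.clip(lower=100, upper=200)

# Same rule by hand: raise values below 100, then cap values above 200
manual = subs_demo.where(subs_demo >= 100, 100).where(subs_demo <= 200, 200)

# One-sided bound: only cap the high values
capped = subs_demo.clip(upper=200)

print(clipped.tolist())  # [100, 150, 200, 100, 200]
print(capped.tolist())   # [48, 150, 200, 100, 200]
```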
drop_duplicates
In [131]:
temp = pd.Series([1, 1, 2, 2, 3, 3, 4, 4])
temp
Out[131]:
0 1
1 1
2 2
3 2
4 3
5 3
6 4
7 4
dtype: int64
In [132]:
print(temp.drop_duplicates())
# the first occurrence of each value is kept, later repeats are dropped
0 1
2 2
4 3
6 4
dtype: int64
In [133]:
print(temp.drop_duplicates(keep='last'))
# the last occurrence of each value is kept, earlier ones are dropped
1 1
3 2
5 3
7 4
dtype: int64
In [134]:
movies.drop_duplicates()
Out[134]:
movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India himanshu gadhavi
Evening Shadows Mona Ambegaonkar
...
Sssshhh... Tanishaa Mukerji
Rules: Pyaar Ka Superhit Formula Tanuja
Right Here Right Now (film) Ankit
Talaash: The Hunt Begins... Rakhee Gulzar
The Pink Mirror Edwin Fernandes
Name: lead, Length: 567, dtype: object
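Besides keep='first' (the default) and keep='last', drop_duplicates() also accepts keep=False, which drops every value that occurs more than once. A short sketch with made-up numbers that shows which index labels survive in each case:

```python
import pandas as pd

# Made-up values: 1 and 3 repeat, 2 and 4 appear once
s = pd.Series([1, 1, 2, 3, 3, 4])

first = s.drop_duplicates()                  # keep="first" (default)
last = s.drop_duplicates(keep="last")
unique_only = s.drop_duplicates(keep=False)  # drop every repeated value

print(first.index.tolist())   # [0, 2, 3, 5]
print(last.index.tolist())    # [1, 2, 4, 5]
print(unique_only.tolist())   # [2, 4]

# drop_duplicates returns a new Series; s itself is unchanged
print(len(s))                 # 6
```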
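To find out how many duplicate values exist in the data, use .duplicated(): it returns a boolean Series that is True for every value that already appeared earlier, so summing it gives the number of duplicates.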
In [135]:
temp.duplicated()
Out[135]:
0 False
1 True
2 False
3 True
4 False
5 True
6 False
7 True
dtype: bool
In [136]:
temp.duplicated().sum()
# total number of duplicate values
Out[136]:
4
In [137]:
movies.duplicated().sum()
Out[137]:
933
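The 933 above fits the earlier output: movies has 1500 rows and value_counts() found 567 distinct lead actors, and 1500 - 567 = 933. In general, for data without missing values, duplicated().sum() equals the length minus nunique(). A sketch with made-up values, which also shows keep=False marking every occurrence of a repeated value:

```python
import pandas as pd

# Made-up values: "a" appears 3 times, "b" twice, "c" once
s = pd.Series(["a", "b", "a", "c", "b", "a"])

mask = s.duplicated()  # True for each repeat after the first occurrence
n_dupes = mask.sum()

# Without missing values: total length minus number of distinct values
print(n_dupes, len(s) - s.nunique())  # 3 3

# keep=False flags every row whose value occurs more than once
print(s.duplicated(keep=False).tolist())
# [True, True, True, False, True, True]
```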
isnull
In [138]:
temp = pd.Series([1, 2, 3, np.nan, 5, 6, np.nan, 8, np.nan, 10])
temp
Out[138]:
0 1.0
1 2.0
2 3.0
3 NaN
4 5.0
5 6.0
6 NaN
7 8.0
8 NaN
9 10.0
dtype: float64
In [139]:
temp.size
# total number of values, NaN included
Out[139]:
10
In [140]:
temp.count()
# counts only the non-null values
Out[140]:
7
In [141]:
temp.isnull()
Out[141]:
0 False
1 False
2 False
3 True
4 False
5 False
6 True
7 False
8 True
9 False
dtype: bool
In [142]:
temp.isnull().sum()
# there are 3 missing values
Out[142]:
3
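size, count() and isnull() fit together: size counts every entry, count() skips NaN, and isnull().sum() counts only NaN, so the last two always add up to the first. A sketch using the same temp values as above (isna() is an alias of isnull()):

```python
import numpy as np
import pandas as pd

# Same values as the notebook's `temp` Series
temp_demo = pd.Series([1, 2, 3, np.nan, 5, 6, np.nan, 8, np.nan, 10])

total = temp_demo.size              # every entry, NaN included
non_null = temp_demo.count()        # NaN entries skipped
missing = temp_demo.isnull().sum()  # only NaN entries

print(total, non_null, missing)  # 10 7 3

# isna() is an alias of isnull() and returns the same mask
print(temp_demo.isna().equals(temp_demo.isnull()))  # True
```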
dropna
In [143]:
temp.dropna()
Out[143]:
0 1.0
1 2.0
2 3.0
4 5.0
5 6.0
7 8.0
9 10.0
dtype: float64
fillna
In [144]:
temp.fillna(0)
# fill the missing values with zero
Out[144]:
0 1.0
1 2.0
2 3.0
3 0.0
4 5.0
5 6.0
6 0.0
7 8.0
8 0.0
9 10.0
dtype: float64
In [145]:
temp.fillna(temp.mean())
# fill the missing values with the mean of the non-missing values
Out[145]:
0 1.0
1 2.0
2 3.0
3 5.0
4 5.0
5 6.0
6 5.0
7 8.0
8 5.0
9 10.0
dtype: float64
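The 5.0 used above comes from mean() skipping NaN by default: the seven non-missing values sum to 35, and 35 / 7 = 5.0. Both fillna() and dropna() return a new Series, so temp itself still has its missing values unless the result is assigned back. A sketch with the same numbers:

```python
import numpy as np
import pandas as pd

temp_demo = pd.Series([1, 2, 3, np.nan, 5, 6, np.nan, 8, np.nan, 10])

# mean() skips NaN by default (skipna=True): 35 / 7
fill_value = temp_demo.mean()

filled = temp_demo.fillna(fill_value)  # NaN replaced with 5.0
dropped = temp_demo.dropna()           # NaN rows removed

print(fill_value)                # 5.0
print(len(dropped))              # 7
print(temp_demo.isnull().sum())  # 3, the original is unchanged
```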
isin
In [146]:
vk
Out[146]:
match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64
In [147]:
Out[147]:
match_no
1 False
2 False
3 False
4 False
5 False
...
211 False
212 False
213 False
214 False
215 False
Name: runs, Length: 215, dtype: bool
In [148]:
Out[148]:
match_no
82 99
86 49
Name: runs, dtype: int64
Suppose we want to find the matches in which Virat Kohli finished on 49, 99 or 79 runs.
isin helps us get all of these cases in one line, rather than writing a separate comparison for each value.
In [149]:
vk.isin([49, 99, 79])
Out[149]:
match_no
1 False
2 False
3 False
4 False
5 False
...
211 False
212 False
213 False
214 False
215 False
Name: runs, Length: 215, dtype: bool
In [150]:
vk[vk.isin([49, 99, 79])]
Out[150]:
match_no
82 99
86 49
117 79
Name: runs, dtype: int64
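Written by hand, the same filter needs one comparison per value joined with | (element-wise OR); isin() builds the identical mask from a list. A sketch on made-up scores that checks the two masks agree:

```python
import pandas as pd

# Made-up scores indexed by match number, standing in for `vk`
runs = pd.Series([99, 12, 49, 79, 49, 100], index=range(1, 7), name="runs")

# One comparison per value, combined with element-wise OR
manual = (runs == 49) | (runs == 99) | (runs == 79)

# isin() builds the same mask from a list of values
via_isin = runs.isin([49, 99, 79])

print(manual.equals(via_isin))  # True
print(runs[via_isin])
```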
apply
In [151]:
movies
Out[151]:
movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India himanshu gadhavi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object
Suppose that in the movies data we want only the first name of each lead actor, written in capital
letters.
In [152]:
movies.apply(lambda x: x.split())
# each actor's name becomes a list of words
Out[152]:
movie
Uri: The Surgical Strike [Vicky, Kaushal]
Battalion 609 [Vicky, Ahuja]
The Accidental Prime Minister (film) [Anupam, Kher]
Why Cheat India [himanshu, gadhavi]
Evening Shadows [Mona, Ambegaonkar]
...
Hum Tumhare Hain Sanam [Shah, Rukh, Khan]
Aankhen (2002 film) [Amitabh, Bachchan]
Saathiya (film) [Vivek, Oberoi]
Company (film) [Ajay, Devgn]
Awara Paagal Deewana [Akshay, Kumar]
Name: lead, Length: 1500, dtype: object
In [153]:
movies.apply(lambda x: x.split()[0])
# the first item of each list, i.e. the first name
Out[153]:
movie
Uri: The Surgical Strike Vicky
Battalion 609 Vicky
The Accidental Prime Minister (film) Anupam
Why Cheat India himanshu
Evening Shadows Mona
...
Hum Tumhare Hain Sanam Shah
Aankhen (2002 film) Amitabh
Saathiya (film) Vivek
Company (film) Ajay
Awara Paagal Deewana Akshay
Name: lead, Length: 1500, dtype: object
In [154]:
movies.apply(lambda x: x.split()[0].upper())
Out[154]:
movie
Uri: The Surgical Strike VICKY
Battalion 609 VICKY
The Accidental Prime Minister (film) ANUPAM
Why Cheat India HIMANSHU
Evening Shadows MONA
...
Hum Tumhare Hain Sanam SHAH
Aankhen (2002 film) AMITABH
Saathiya (film) VIVEK
Company (film) AJAY
Awara Paagal Deewana AKSHAY
Name: lead, Length: 1500, dtype: object
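apply() runs the lambda once per element. The .str accessor can express the same three steps (split, take the first word, upper-case) directly on the Series, and it returns NaN for a missing name instead of raising an error the way x.split() would on a float NaN. A sketch with a few names taken from the output above:

```python
import pandas as pd

# A few lead names taken from the movies output above
leads = pd.Series(["Vicky Kaushal", "Shah Rukh Khan", "himanshu gadhavi"], name="lead")

# Element by element with a Python function
via_apply = leads.apply(lambda x: x.split()[0].upper())

# Same steps with the vectorised string methods
via_str = leads.str.split().str[0].str.upper()

print(via_apply.tolist())  # ['VICKY', 'SHAH', 'HIMANSHU']
```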
In [155]:
subs
Out[155]:
0 48
1 57
2 40
3 43
4 44
...
360 231
361 226
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64
Suppose that in the subscribers data above we want to label each day: if we gain more subscribers than
the average, it is a good day; otherwise it is a bad day.
In [156]:
subs.mean()
Out[156]:
135.64383561643837
In [157]:
Out[157]:
0 bad day
1 bad day
2 bad day
3 bad day
4 bad day
...
360 good day
361 good day
362 good day
363 good day
364 good day
Name: Subscribers gained, Length: 365, dtype: object
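The code of cell [157] did not survive in this copy. One way to produce such labels, assuming "good day" means strictly more subscribers than the mean, is an apply() with a conditional lambda; numpy.where() gives the same result without a Python-level loop. A sketch on made-up numbers (it computes the mean once, outside the lambda, so it is not recomputed for every element):

```python
import numpy as np
import pandas as pd

# Made-up daily subscriber counts
subs_demo = pd.Series([48, 231, 155, 57, 144], name="Subscribers gained")

avg = subs_demo.mean()  # 127.0, computed once

labels = subs_demo.apply(lambda x: "good day" if x > avg else "bad day")

# Vectorised alternative: pick a label per element from a boolean mask
labels_np = pd.Series(
    np.where(subs_demo > avg, "good day", "bad day"),
    index=subs_demo.index,
    name=subs_demo.name,
)

print(labels.tolist())
# ['bad day', 'good day', 'good day', 'bad day', 'good day']
```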
copy
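In the pandas version used here, .head() and .tail() return a view of the data rather than a copy, so changing a value in the saved
head or tail also changes the original data. With Copy-on-Write enabled (opt-in in pandas 2.x and planned as the default in pandas 3.0) this no longer happens; calling .copy() makes the result independent in either case.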
In [158]:
vk
Out[158]:
match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64
In [159]:
vk.head(5)
Out[159]:
match_no
1 1
2 23
3 13
4 12
5 1
Name: runs, dtype: int64
In [160]:
new = vk.head(5)
In [161]:
new[1] = 10
In [162]:
vk
# the original data has changed as well
Out[162]:
match_no
1 10
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64
In [163]:
new1 = vk.head(5).copy()
new1
Out[163]:
match_no
1 10
2 23
3 13
4 12
5 1
Name: runs, dtype: int64
In [164]:
new1[1] = 100
new1
# the change is made only in new1,
# which is a copy of the original data
Out[164]:
match_no
1 100
2 23
3 13
4 12
5 1
Name: runs, dtype: int64
In [165]:
vk
# the original data remains unchanged
Out[165]:
match_no
1 10
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64
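Whether writing into the result of head() leaks back into the original depends on the pandas version and its Copy-on-Write setting, but a .copy() is independent in every version. A minimal sketch on made-up runs indexed by match number, using .loc so the assignment is clearly label-based:

```python
import pandas as pd

# Made-up runs indexed by match number, standing in for `vk`
runs = pd.Series([1, 23, 13, 12, 1, 0, 20], index=range(1, 8), name="runs")

snapshot = runs.head(5).copy()  # owns its own data
snapshot.loc[1] = 100           # change match 1 in the copy only

print(snapshot.loc[1])  # 100
print(runs.loc[1])      # 1, the original is unchanged
```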