
pandas-3

The document provides a tutorial on using the pandas library in Python, including data import, manipulation, and conversion techniques. It demonstrates how to handle null values, apply functions, and sort datasets, specifically using an automobile dataset. Additionally, it covers data type conversions and filtering methods to extract specific information from the dataset.

Uploaded by

praveen838307
Copyright
© All Rights Reserved

pandas-3

April 21, 2025

[1]: import pandas as pd


df1 = pd.read_csv('Auto.csv')
df1.head()

[1]: mpg cylinders displacement Horse Power weight acceleration year \
0 18.0 8.0 307.0 130 3504 12.0 70
1 15.0 8.0 350.0 165 3693 11.5 70
2 NaN 8.0 318.0 150 3436 11.0 70
3 NaN 8.0 NaN 150 3433 12.0 70
4 NaN 8.0 NaN 140 3449 10.5 70

origin name
0 1 chevrolet chevelle malibu
1 1 buick skylark 320
2 1 plymouth satellite
3 1 amc rebel sst
4 1 ford torino

[ ]: # soft conversion --> converting a value to another type wherever the conversion is possible

# bool --> int --> float --> complex --> str (str is at the highest level)

[3]: int(True)

[3]: 1

[4]: float(1)

[4]: 1.0

[5]: complex(1.0)

[5]: (1+0j)

[6]: str(1+0j)

[6]: '(1+0j)'

[ ]: # bool --> int --> float --> complex --> str (str is at the highest level)

[8]: pd.Series([1,2,3,4,5,1.4]) # a Series holds a single data type, so pandas assigns the highest type in the hierarchy (float64 here)

[8]: 0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 1.4
dtype: float64
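A small sketch of the same promotion rule on hypothetical lists: pandas keeps one dtype per Series and moves up to float64 as soon as a float appears. The same promotion can be requested explicitly with `astype`.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])          # hypothetical data: all ints --> an integer dtype
mixed = pd.Series([1, 2, 3, 1.4])    # one float --> the whole Series becomes float64
print(ints.dtype, mixed.dtype)

# Explicit promotion of the integer Series to float
as_float = ints.astype('float64')
print(as_float.tolist())
```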

[ ]: # Null values can come from:
# 1. machine error: the machine was not able to capture this information
# 2. human error: people did not enter the data
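Whatever their cause, a first step is usually to count the missing values per column. A minimal sketch on a small hypothetical frame shaped like the Auto data (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical mini version of the Auto data with gaps
df = pd.DataFrame({
    'mpg': [18.0, np.nan, np.nan, 14.0],
    'displacement': [307.0, 318.0, np.nan, 455.0],
    'weight': [3504, 3436, 3433, 4425],
})

null_counts = df.isna().sum()   # number of NaN values in each column
print(null_counts)
```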

[19]: int(2)*2

[19]: 4

[20]: int('2') *2

[20]: 4

[21]: '2'*2 # it repeats the string twice

[21]: '22'

[9]: pd.Series([1,2,3,4,5,'abc']) # mixing in a string gives dtype object: the values are stored as Python objects, not converted to strings

[9]: 0 1
1 2
2 3
3 4
4 5
5 abc
dtype: object
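The `object` dtype above does not mean the numbers became strings. A short sketch: the ints are kept as Python `int` objects, and only an explicit `astype(str)` turns every element into a string.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 'abc'])
print(s.dtype)            # object: the Series is no longer numeric
print(type(s[0]))         # the ints are kept as Python int objects

as_text = s.astype(str)   # explicit conversion: every element is now a str
print(type(as_text[0]))
```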

[11]: 1*2

[11]: 2

[16]: int('1') * 2

[16]: 2

[22]: 'a' *2 # the string is repeated that many times

[22]: 'aa'

[23]: '2' *2

[23]: '22'
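The same difference between `'2'*2` and `int('2')*2` shows up on a whole column. A sketch, assuming a hypothetical column of numbers stored as text (dtype object): multiplying repeats each string, while converting with `pd.to_numeric` first gives real arithmetic.

```python
import pandas as pd

# Hypothetical column of numbers stored as text
text = pd.Series(['2', '3'], dtype=object)

repeated = text * 2                  # string repetition, element by element
doubled = pd.to_numeric(text) * 2    # convert to numbers first, then multiply
print(repeated.tolist(), doubled.tolist())
```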

[ ]: import os
os.getcwd() # get current working directory

# os.chdir('') # put the path to the new folder inside the quotes, then uncomment (an empty path raises an error)
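A self-contained sketch of moving into another folder and coming back. It uses a temporary folder from `tempfile` as a stand-in for "the new folder", so it runs on any machine; in the notebook you would pass your own path to `os.chdir`.

```python
import os
import tempfile

original = os.getcwd()                  # remember where we started

with tempfile.TemporaryDirectory() as new_folder:
    os.chdir(new_folder)                # move into the (temporary) new folder
    try:
        moved = os.path.samefile(os.getcwd(), new_folder)
    finally:
        os.chdir(original)              # change back before the folder is deleted

print(moved, os.getcwd() == original)
```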

[ ]: # A function created with a def block can be reused any number of times across the Python code.

# If the function is needed only once and its logic is simple --> use a lambda instead.
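The def-versus-lambda choice can be shown side by side. In this sketch, `even_or_odd` is a hypothetical helper name and the acceleration values are made up; both versions give the same labels, the named function is just reusable elsewhere.

```python
import pandas as pd

def even_or_odd(x):
    """Named, reusable version of the Even/Odd rule."""
    return 'Even' if x % 2 == 0 else 'Odd'

acc = pd.Series([12.0, 11.5, 11.0])   # hypothetical acceleration values

by_def = acc.apply(even_or_odd)
by_lambda = acc.apply(lambda x: 'Even' if x % 2 == 0 else 'Odd')   # one-off, inline
print(by_def.tolist(), by_def.equals(by_lambda))
```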

[24]: df1['col1'] = df1['acceleration'].apply(lambda x: 'Even' if x%2 == 0 else 'Odd')

[25]: df1.head()

[25]: mpg cylinders displacement Horse Power weight acceleration year \
0 18.0 8.0 307.0 130 3504 12.0 70
1 15.0 8.0 350.0 165 3693 11.5 70
2 NaN 8.0 318.0 150 3436 11.0 70
3 NaN 8.0 NaN 150 3433 12.0 70
4 NaN 8.0 NaN 140 3449 10.5 70

origin name col1
0 1 chevrolet chevelle malibu Even
1 1 buick skylark 320 Odd
2 1 plymouth satellite Odd
3 1 amc rebel sst Even
4 1 ford torino Odd
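For a simple rule like this, a vectorized version with `np.where` avoids calling a Python function per row. This is a swapped-in alternative to `apply`, sketched on hypothetical values. Note that a missing value gives `NaN % 2 == 0` as False, so both versions label NaN as 'Odd'; `where` can blank those rows out again.

```python
import numpy as np
import pandas as pd

acc = pd.Series([12.0, 11.5, 11.0, np.nan])   # hypothetical acceleration values

labels = pd.Series(np.where(acc % 2 == 0, 'Even', 'Odd'), index=acc.index)
print(labels.tolist())                        # the NaN row ends up labelled 'Odd'

# Keep a label only where acceleration is present; missing rows become NaN
labels_with_missing = labels.where(acc.notna())
print(labels_with_missing.tolist())
```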

[ ]: # inplace = True --> changes are applied to df1 in memory, not to the file on disk (save them with to_csv / to_excel)
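A small sketch of what `inplace=True` changes, on a hypothetical frame: without it, `sort_values` returns a new DataFrame and leaves the original alone; with it, the DataFrame itself is modified and the call returns `None`. Either way, nothing on disk changes until the data is written out.

```python
import pandas as pd

df = pd.DataFrame({'acceleration': [12.0, 11.5, 24.8]})   # hypothetical data

sorted_copy = df.sort_values(by='acceleration')            # new DataFrame; df is unchanged
print(df['acceleration'].tolist())

result = df.sort_values(by='acceleration', inplace=True)   # modifies df itself
print(result, df['acceleration'].tolist())
```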

[27]: pwd

[27]: 'C:\\Users\\admin\\2802'

[26]: df1.to_excel('auto_updated.xlsx', index = False)
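Why `index = False`: by default the row index is written as an extra column, which comes back as an unnamed column when the file is read again. The sketch below uses `to_csv` with an in-memory buffer instead of `to_excel`, so it runs without an Excel engine such as openpyxl; the two rows are hypothetical.

```python
import io
import pandas as pd

df = pd.DataFrame({'mpg': [18.0, 15.0],
                   'name': ['chevrolet chevelle malibu', 'buick skylark 320']})

with_index = io.StringIO()
df.to_csv(with_index)                  # default: the row labels are written too
without_index = io.StringIO()
df.to_csv(without_index, index=False)  # only the data columns are written

cols_with = pd.read_csv(io.StringIO(with_index.getvalue())).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index.getvalue())).columns.tolist()
print(cols_with)      # the saved index returns as an extra unnamed column
print(cols_without)
```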

[ ]: # converting a string of letters to an integer is not possible

[17]: int('abc') # raises ValueError; pd.to_numeric(..., errors='coerce') instead returns a null value (NaN) where conversion is not possible
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[17], line 1
----> 1 int('abc')

ValueError: invalid literal for int() with base 10: 'abc'
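Two ways to handle a failed conversion, sketched with hypothetical values: catch the `ValueError` explicitly in plain Python, or let `pd.to_numeric(..., errors='coerce')` turn unparseable entries of a column into NaN. The `'?'` placeholder is an assumed example of a bad entry.

```python
import pandas as pd

# Plain Python: catch the error explicitly
try:
    value = int('abc')
except ValueError:
    value = None

# pandas: errors='coerce' turns entries that cannot be parsed into NaN
hp = pd.Series(['130', '165', '?'])   # hypothetical column with a '?' placeholder
hp_numeric = pd.to_numeric(hp, errors='coerce')
print(value, hp_numeric.tolist())
```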

[18]: help(pd.to_numeric)

Help on function to_numeric in module pandas.core.tools.numeric:

to_numeric(arg, errors: 'DateTimeErrorChoices' = 'raise', downcast:
"Literal['integer', 'signed', 'unsigned', 'float'] | None" = None,
dtype_backend: 'DtypeBackend | lib.NoDefault' = <no_default>)
Convert argument to a numeric type.

The default return dtype is `float64` or `int64`
depending on the data supplied. Use the `downcast` parameter
to obtain other dtypes.

Please note that precision loss may occur if really large numbers
are passed in. Due to the internal limitations of `ndarray`, if
numbers smaller than `-9223372036854775808` (np.iinfo(np.int64).min)
or larger than `18446744073709551615` (np.iinfo(np.uint64).max) are
passed in, it is very likely they will be converted to float so that
they can be stored in an `ndarray`. These warnings apply similarly to
`Series` since it internally leverages `ndarray`.

Parameters
----------
arg : scalar, list, tuple, 1-d array, or Series
Argument to be converted.
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
- If 'raise', then invalid parsing will raise an exception.
- If 'coerce', then invalid parsing will be set as NaN.
- If 'ignore', then invalid parsing will return the input.

.. versionchanged:: 2.2

"ignore" is deprecated. Catch exceptions explicitly instead.

downcast : str, default None
Can be 'integer', 'signed', 'unsigned', or 'float'.
If not None, and if the data has been successfully cast to a
numerical dtype (or if the data was numeric to begin with),
downcast that resulting data to the smallest numerical dtype
possible according to the following rules:

- 'integer' or 'signed': smallest signed int dtype (min.: np.int8)
- 'unsigned': smallest unsigned int dtype (min.: np.uint8)
- 'float': smallest float dtype (min.: np.float32)

As this behaviour is separate from the core conversion to
numeric values, any errors raised during the downcasting
will be surfaced regardless of the value of the 'errors' input.

In addition, downcasting will only occur if the size
of the resulting data's dtype is strictly larger than
the dtype it is to be cast to, so if none of the dtypes
checked satisfy that specification, no downcasting will be
performed on the data.
dtype_backend : {'numpy_nullable', 'pyarrow'}, default 'numpy_nullable'
Back-end data type applied to the resultant :class:`DataFrame`
(still experimental). Behaviour is as follows:

* ``"numpy_nullable"``: returns nullable-dtype-backed :class:`DataFrame`
(default).
* ``"pyarrow"``: returns pyarrow-backed nullable :class:`ArrowDtype`
DataFrame.

.. versionadded:: 2.0

Returns
-------
ret
Numeric if parsing succeeded.
Return type depends on input. Series if Series, otherwise ndarray.

See Also
--------
DataFrame.astype : Cast argument to a specified dtype.
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
numpy.ndarray.astype : Cast a numpy array to a specified type.
DataFrame.convert_dtypes : Convert dtypes.

Examples
--------
Take separate series and convert to numeric, coercing when told to

>>> s = pd.Series(['1.0', '2', -3])
>>> pd.to_numeric(s)
0 1.0
1 2.0
2 -3.0
dtype: float64
>>> pd.to_numeric(s, downcast='float')
0 1.0
1 2.0
2 -3.0
dtype: float32
>>> pd.to_numeric(s, downcast='signed')
0 1
1 2
2 -3
dtype: int8
>>> s = pd.Series(['apple', '1.0', '2', -3])
>>> pd.to_numeric(s, errors='coerce')
0 NaN
1 1.0
2 2.0
3 -3.0
dtype: float64

Downcasting of nullable integer and floating dtypes is supported:

>>> s = pd.Series([1, 2, 3], dtype="Int64")
>>> pd.to_numeric(s, downcast="integer")
0 1
1 2
2 3
dtype: Int8
>>> s = pd.Series([1.0, 2.1, 3.0], dtype="Float64")
>>> pd.to_numeric(s, downcast="float")
0 1.0
1 2.1
2 3.0
dtype: Float32

[ ]: # when we run a pandas command, it returns the data and Jupyter displays it
# if you save this output to a variable, it is not displayed

[31]: df1['acceleration'].apply(lambda x: 'Even' if x%2 == 0 else 'Odd')

[31]: 0 Even
1 Odd
2 Odd
3 Even
4 Odd
...
392 Odd
393 Odd
394 Odd
395 Odd
396 Odd
Name: acceleration, Length: 397, dtype: object

[32]: df1['col1'] = df1['acceleration'].apply(lambda x: 'Even' if x%2 == 0 else 'Odd')

[33]: df1.head(2)

[33]: mpg cylinders displacement Horse Power weight acceleration year \
0 18.0 8.0 307.0 130 3504 12.0 70
1 15.0 8.0 350.0 165 3693 11.5 70

origin name col1
0 1 chevrolet chevelle malibu Even
1 1 buick skylark 320 Odd

[ ]: ['col1','col2']

[29]: # Sorting the data set

[30]: df1.sort_values(by = 'acceleration', ascending=False)


# ascending = False --> descending order
# ascending = True --> ascending order ( by default )

[30]: mpg cylinders displacement Horse Power weight acceleration year \
299 27.2 4.0 141.0 71 3190 24.8 79
393 44.0 4.0 97.0 52 2130 24.6 82
326 43.4 4.0 90.0 48 2335 23.7 80
59 23.0 4.0 97.0 54 2254 23.5 72
195 29.0 4.0 85.0 52 2035 22.2 76
.. … … … … … … …
12 15.0 NaN 400.0 150 3761 9.5 70
6 NaN 8.0 NaN 220 4354 9.0 70
7 14.0 8.0 NaN 215 4312 8.5 70
9 15.0 8.0 390.0 190 3850 8.5 70
11 14.0 NaN 340.0 160 3609 8.0 70

origin name col1
299 2 peugeot 504 Odd
393 2 vw pickup Odd
326 2 vw dasher (diesel) Odd
59 2 volkswagen type 3 Odd
195 1 chevrolet chevette Odd
.. … … …
12 1 chevrolet monte carlo Odd
6 1 chevrolet impala Odd
7 1 plymouth fury iii Odd
9 1 amc ambassador dpl Odd
11 1 plymouth 'cuda 340 Even

[397 rows x 10 columns]

[ ]: # df1.sort_values(by = ['acceleration','weight'], ascending=[False,True])
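The commented-out multi-column sort can be checked on a tiny hypothetical frame: rows are ordered by acceleration in descending order, and rows that tie on acceleration are then ordered by weight in ascending order.

```python
import pandas as pd

df = pd.DataFrame({
    'acceleration': [8.5, 24.8, 8.5],   # hypothetical values with a tie at 8.5
    'weight': [4312, 3190, 3850],
})

# acceleration descending; ties broken by weight ascending
out = df.sort_values(by=['acceleration', 'weight'], ascending=[False, True])
print(out)
```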

How to filter the dataset


[34]: df1.shape

[34]: (397, 10)

[ ]: # extract the rows where mpg value is greater than 20

[38]: df1['mpg']

[38]: 0 18.0
1 15.0
2 NaN
3 NaN
4 NaN
...
392 27.0
393 44.0
394 32.0
395 28.0
396 31.0
Name: mpg, Length: 397, dtype: float64

[40]: cond1 = df1['mpg'] > 20

df1[['mpg','weight','acceleration']][cond1] # it will return the rows where cond1 is set to True

[40]: mpg weight acceleration
14 24.0 2372 15.0
15 22.0 2833 15.5
17 21.0 2587 16.0
18 27.0 2130 14.5
19 26.0 1835 20.5
.. … … …
392 27.0 2790 15.6
393 44.0 2130 24.6
394 32.0 2295 11.6
395 28.0 2625 18.6
396 31.0 2720 19.4

[237 rows x 3 columns]
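The same selection can be written with `.loc`, which picks rows and columns in one step instead of chaining two indexing operations. A sketch on hypothetical rows; note that a missing mpg compares as False against `> 20`, so those rows are dropped by the filter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({                     # hypothetical rows
    'mpg': [18.0, 24.0, np.nan, 27.0],
    'weight': [3504, 2372, 3436, 2790],
    'acceleration': [12.0, 15.0, 11.0, 15.6],
})

cond1 = df['mpg'] > 20                  # NaN > 20 evaluates to False
subset = df.loc[cond1, ['mpg', 'weight', 'acceleration']]
chained = df[['mpg', 'weight', 'acceleration']][cond1]   # the notebook's chained form
print(subset)
```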

[ ]: df1['col1'] = df1['acceleration'].apply(lambda x: 'Even' if x%2 == 0 else 'Odd')

[ ]: # & --> and
# | --> or
# ~ --> negation (not)
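A sketch of all three operators on a hypothetical three-row frame. Each comparison needs its own parentheses because Python's `&`, `|` bind more tightly than `>` and `==`. `DataFrame.query` is shown as an alternative spelling of the AND filter.

```python
import pandas as pd

df = pd.DataFrame({'mpg': [18.0, 24.0, 31.0],
                   'col1': ['Even', 'Odd', 'Even']})   # hypothetical rows

both = df[(df['mpg'] > 20) & (df['col1'] == 'Even')]     # and
either = df[(df['mpg'] > 20) | (df['col1'] == 'Even')]   # or
negated = df[~(df['mpg'] > 20)]                          # not

# The same AND filter written as a query string
via_query = df.query("mpg > 20 and col1 == 'Even'")
```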

[41]: cond1 = (df1['mpg'] > 20) & (df1['col1'] == 'Even')


df1[cond1] # it will return the rows where cond1 is set to True

[41]: mpg cylinders displacement Horse Power weight acceleration year \
17 21.0 6.0 200.0 85 2587 16.0 70
31 25.0 4.0 113.0 95 2228 14.0 71
49 23.0 4.0 122.0 86 2220 14.0 71
50 28.0 4.0 116.0 90 2123 14.0 71
54 35.0 4.0 72.0 69 1613 18.0 71
77 22.0 4.0 121.0 76 2511 18.0 72
79 26.0 4.0 96.0 69 2189 18.0 72
80 22.0 4.0 122.0 86 2395 16.0 72
101 23.0 6.0 198.0 95 2904 16.0 73
113 21.0 6.0 155.0 107 2472 14.0 73
122 24.0 4.0 121.0 110 2660 14.0 73
148 26.0 4.0 116.0 75 2246 14.0 74
151 31.0 4.0 79.0 67 2000 16.0 74
167 29.0 4.0 97.0 75 2171 16.0 75
175 29.0 4.0 90.0 70 1937 14.0 75
234 24.5 4.0 151.0 88 2740 16.0 77
293 31.9 4.0 89.0 71 1925 14.0 79
305 28.4 4.0 151.0 90 2670 16.0 79
331 33.8 4.0 97.0 67 2145 18.0 80
349 34.1 4.0 91.0 68 1985 16.0 81
369 34.0 4.0 112.0 88 2395 18.0 82
371 29.0 4.0 135.0 84 2525 16.0 82
372 27.0 4.0 151.0 90 2735 18.0 82

origin name col1
17 1 ford maverick Even
31 3 toyota corona Even
49 1 mercury capri 2000 Even
50 2 opel 1900 Even
54 3 datsun 1200 Even
77 2 volkswagen 411 (sw) Even
79 2 renault 12 (sw) Even
80 1 ford pinto (sw) Even
101 1 plymouth duster Even
113 1 mercury capri v6 Even
122 2 saab 99le Even
148 2 fiat 124 tc Even
151 2 fiat x1.9 Even
167 3 toyota corolla Even
175 2 volkswagen rabbit Even
234 1 pontiac sunbird coupe Even
293 2 vw rabbit custom Even
305 1 buick skylark limited Even
331 3 subaru dl Even
349 3 mazda glc 4 Even
369 1 chevrolet cavalier 2-door Even
371 1 dodge aries se Even
372 1 pontiac phoenix Even

[42]: df1[(df1['mpg'] > 20) & (df1['col1'] == 'Even')] # same filter written inline, without storing the condition in cond1

[42]: mpg cylinders displacement Horse Power weight acceleration year \
17 21.0 6.0 200.0 85 2587 16.0 70
31 25.0 4.0 113.0 95 2228 14.0 71
49 23.0 4.0 122.0 86 2220 14.0 71
50 28.0 4.0 116.0 90 2123 14.0 71
54 35.0 4.0 72.0 69 1613 18.0 71
77 22.0 4.0 121.0 76 2511 18.0 72
79 26.0 4.0 96.0 69 2189 18.0 72
80 22.0 4.0 122.0 86 2395 16.0 72
101 23.0 6.0 198.0 95 2904 16.0 73
113 21.0 6.0 155.0 107 2472 14.0 73
122 24.0 4.0 121.0 110 2660 14.0 73
148 26.0 4.0 116.0 75 2246 14.0 74
151 31.0 4.0 79.0 67 2000 16.0 74
167 29.0 4.0 97.0 75 2171 16.0 75
175 29.0 4.0 90.0 70 1937 14.0 75
234 24.5 4.0 151.0 88 2740 16.0 77
293 31.9 4.0 89.0 71 1925 14.0 79
305 28.4 4.0 151.0 90 2670 16.0 79
331 33.8 4.0 97.0 67 2145 18.0 80
349 34.1 4.0 91.0 68 1985 16.0 81
369 34.0 4.0 112.0 88 2395 18.0 82
371 29.0 4.0 135.0 84 2525 16.0 82
372 27.0 4.0 151.0 90 2735 18.0 82

origin name col1
17 1 ford maverick Even
31 3 toyota corona Even
49 1 mercury capri 2000 Even
50 2 opel 1900 Even
54 3 datsun 1200 Even
77 2 volkswagen 411 (sw) Even
79 2 renault 12 (sw) Even
80 1 ford pinto (sw) Even
101 1 plymouth duster Even
113 1 mercury capri v6 Even
122 2 saab 99le Even
148 2 fiat 124 tc Even
151 2 fiat x1.9 Even
167 3 toyota corolla Even
175 2 volkswagen rabbit Even
234 1 pontiac sunbird coupe Even
293 2 vw rabbit custom Even
305 1 buick skylark limited Even
331 3 subaru dl Even
349 3 mazda glc 4 Even
369 1 chevrolet cavalier 2-door Even
371 1 dodge aries se Even
372 1 pontiac phoenix Even

[43]: df1[pd.isna(df1['mpg'])]

[43]: mpg cylinders displacement Horse Power weight acceleration year \
2 NaN 8.0 318.0 150 3436 11.0 70
3 NaN 8.0 NaN 150 3433 12.0 70
4 NaN 8.0 NaN 140 3449 10.5 70
5 NaN 8.0 NaN 198 4341 10.0 70
6 NaN 8.0 NaN 220 4354 9.0 70

origin name col1
2 1 plymouth satellite Odd
3 1 amc rebel sst Even
4 1 ford torino Odd
5 1 ford galaxie 500 Even
6 1 chevrolet impala Odd

[44]: df1[~(pd.isna(df1['mpg']))] # ~ negates the condition --> returns the rows where mpg is not null

[44]: mpg cylinders displacement Horse Power weight acceleration year \
0 18.0 8.0 307.0 130 3504 12.0 70
1 15.0 8.0 350.0 165 3693 11.5 70
7 14.0 8.0 NaN 215 4312 8.5 70
8 14.0 8.0 455.0 225 4425 10.0 70
9 15.0 8.0 390.0 190 3850 8.5 70
.. … … … … … … …
392 27.0 4.0 140.0 86 2790 15.6 82
393 44.0 4.0 97.0 52 2130 24.6 82
394 32.0 4.0 135.0 84 2295 11.6 82
395 28.0 4.0 120.0 79 2625 18.6 82
396 31.0 4.0 119.0 82 2720 19.4 82

origin name col1
0 1 chevrolet chevelle malibu Even
1 1 buick skylark 320 Odd
7 1 plymouth fury iii Odd
8 1 pontiac catalina Even
9 1 amc ambassador dpl Odd
.. … … …
392 1 ford mustang gl Odd
393 2 vw pickup Odd
394 1 dodge rampage Odd
395 1 ford ranger Odd
396 1 chevy s-10 Odd

[392 rows x 10 columns]
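The null filters above also have method forms: `Series.isna()` / `Series.notna()` read the same as `pd.isna(...)` and `~pd.isna(...)`, and `dropna(subset=...)` removes the rows with a missing value in the listed columns. A sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'mpg': [18.0, np.nan, 15.0],
                   'name': ['a', 'b', 'c']})   # hypothetical rows

missing = df[df['mpg'].isna()]          # same as df[pd.isna(df['mpg'])]
present = df[df['mpg'].notna()]         # same as df[~pd.isna(df['mpg'])]
dropped = df.dropna(subset=['mpg'])     # drops rows where mpg is missing
```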
