
DMV - 3 - Jupyter Notebook

Uploaded by

Anushka Jadhav
Copyright
© All Rights Reserved

10/6/24, 7:24 PM DMV_3 - Jupyter Notebook

In [1]: import pandas as pd

In [2]: df = pd.read_csv('Housing.csv')

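In [3]: # Normalize column names: strip surrounding whitespace, replace inner spaces
# with underscores, then remove any character that is not a letter, digit or underscore
df.columns = df.columns.str.strip()
df.columns = df.columns.str.replace(' ', '_')
df.columns = df.columns.str.replace('[^A-Za-z0-9_]', '', regex=True)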
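The three renaming steps can be checked in isolation. The sketch below wraps them in a hypothetical helper, `clean_columns`, and applies it to made-up messy headers (not taken from Housing.csv) so the effect of each step is visible:

```python
import pandas as pd


def clean_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with the same column clean-up as In [3]."""
    out = frame.copy()
    out.columns = (
        out.columns.str.strip()                          # trim surrounding whitespace
        .str.replace(' ', '_', regex=False)              # inner spaces -> underscores
        .str.replace('[^A-Za-z0-9_]', '', regex=True)    # drop any other character
    )
    return out


# Hypothetical headers, chosen only to exercise each step
raw = pd.DataFrame(columns=[' price ', 'hot water heating', 'area (sq ft)'])
print(clean_columns(raw).columns.tolist())  # ['price', 'hot_water_heating', 'area_sq_ft']
```

Returning a copy keeps the original frame untouched, which makes it easy to compare the headers before and after.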

In [4]: df.head()

Out[4]:
      price  area  bedrooms  bathrooms  stories  mainroad  guestroom  basement  hotwaterheating  airconditioning  parking  prefarea  furnishingstatus
0  13300000  7420         4          2        3       yes         no        no               no              yes        2       yes         furnished
1  12250000  8960         4          4        4       yes         no        no               no              yes        3        no         furnished
2  12250000  9960         3          2        2       yes         no       yes               no               no        2       yes    semi-furnished
3  12215000  7500         4          2        2       yes         no       yes               no              yes        3       yes         furnished
4  11410000  7420         4          1        2       yes        yes       yes               no              yes        2        no         furnished

In [5]: df.tail()

Out[5]:
       price  area  bedrooms  bathrooms  stories  mainroad  guestroom  basement  hotwaterheating  airconditioning  parking  prefarea  furnishingstatus
540  1820000  3000         2          1        1       yes         no       yes               no               no        2        no       unfurnished
541  1767150  2400         3          1        1        no         no        no               no               no        0        no    semi-furnished
542  1750000  3620         2          1        1       yes         no        no               no               no        0        no       unfurnished
543  1750000  2910         3          1        1        no         no        no               no               no        0        no         furnished
544  1750000  3850         3          1        2       yes         no        no               no               no        0        no       unfurnished

In [6]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545 entries, 0 to 544
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   price             545 non-null    int64
 1   area              545 non-null    int64
 2   bedrooms          545 non-null    int64
 3   bathrooms         545 non-null    int64
 4   stories           545 non-null    int64
 5   mainroad          545 non-null    object
 6   guestroom         545 non-null    object
 7   basement          545 non-null    object
 8   hotwaterheating   545 non-null    object
 9   airconditioning   545 non-null    object
 10  parking           545 non-null    int64
 11  prefarea          545 non-null    object
 12  furnishingstatus  545 non-null    object
dtypes: int64(6), object(7)
memory usage: 55.5+ KB
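info() reports six int64 columns and seven object columns; the object ones are the yes/no flags plus furnishingstatus, which are the columns one-hot encoded later in In [21]. Instead of typing that list by hand, it can be derived from the dtypes. A minimal sketch on a small made-up frame (`sample` is hypothetical); selecting with `exclude='number'` rather than `include='object'` is a deliberate choice here, so the result does not depend on whether pandas stores text as `object` or as a dedicated string dtype:

```python
import pandas as pd

# Hypothetical two-row sample with the same kinds of columns as Housing.csv
sample = pd.DataFrame({
    'price': [13300000, 1750000],
    'area': [7420, 3850],
    'mainroad': ['yes', 'yes'],
    'furnishingstatus': ['furnished', 'unfurnished'],
})

# Numeric columns (int64 here) versus everything else (the text/categorical columns)
numeric_cols = sample.select_dtypes(include='number').columns.tolist()
categorical_cols = sample.select_dtypes(exclude='number').columns.tolist()

print(numeric_cols)      # ['price', 'area']
print(categorical_cols)  # ['mainroad', 'furnishingstatus']
```

Applied to df before In [21], the same call should return the seven object columns listed by info().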

In [7]: df.describe()

Out[7]:
              price          area    bedrooms   bathrooms     stories     parking
count  5.450000e+02    545.000000  545.000000  545.000000  545.000000  545.000000
mean   4.766729e+06   5150.541284    2.965138    1.286239    1.805505    0.693578
std    1.870440e+06   2170.141023    0.738064    0.502470    0.867492    0.861586
min    1.750000e+06   1650.000000    1.000000    1.000000    1.000000    0.000000
25%    3.430000e+06   3600.000000    2.000000    1.000000    1.000000    0.000000
50%    4.340000e+06   4600.000000    3.000000    1.000000    2.000000    0.000000
75%    5.740000e+06   6360.000000    3.000000    2.000000    2.000000    1.000000
max    1.330000e+07  16200.000000    6.000000    4.000000    4.000000    3.000000
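By default describe() summarizes only the numeric columns, which is why the seven object columns are missing from the table above. Restricting the frame to its non-numeric columns first gives count, unique, top (most frequent value) and freq for each of them instead. A sketch on a hypothetical three-row `sample`:

```python
import pandas as pd

# Hypothetical sample; in the notebook this would be df before In [21] encodes it
sample = pd.DataFrame({
    'price': [13300000, 12250000, 1750000],
    'mainroad': ['yes', 'yes', 'no'],
    'prefarea': ['yes', 'no', 'no'],
})

# describe() on text columns reports count, unique, top and freq per column
summary = sample.select_dtypes(exclude='number').describe()
print(summary)
```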

In [8]: df.shape

Out[8]: (545, 13)

localhost:8888/notebooks/BE_PRACTICALS/DMV_3.ipynb 1/2

In [9]: df.columns

Out[9]: Index(['price', 'area', 'bedrooms', 'bathrooms', 'stories', 'mainroad',
       'guestroom', 'basement', 'hotwaterheating', 'airconditioning',
       'parking', 'prefarea', 'furnishingstatus'],
      dtype='object')

In [10]: df.isnull().sum()

Out[10]: price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 0
airconditioning 0
parking 0
prefarea 0
furnishingstatus 0
dtype: int64

In [16]: Categorical_Column = ['mainroad', 'guestroom', 'basement', 'hotwaterheating', 'airconditioning', 'prefarea', 'furnishingstatus']

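In [19]: # Boolean-mask filter: keep only rows whose price exceeds 100,000.
# Every row passes here, since the minimum price is 1,750,000 (see df.describe()).
filtered_data = df[df['price'] > 100000]
print("Filtered data: ", filtered_data.head())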

Filtered data:        price  area  bedrooms  bathrooms  stories  mainroad  guestroom  basement  \
0  13300000  7420         4          2        3       yes         no        no
1  12250000  8960         4          4        4       yes         no        no
2  12250000  9960         3          2        2       yes         no       yes
3  12215000  7500         4          2        2       yes         no       yes
4  11410000  7420         4          1        2       yes        yes       yes

   hotwaterheating  airconditioning  parking  prefarea  furnishingstatus
0               no              yes        2       yes         furnished
1               no              yes        3        no         furnished
2               no               no        2       yes    semi-furnished
3               no              yes        3       yes         furnished
4               no              yes        2        no         furnished
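As the output shows, the price > 100000 condition keeps every row. Filters that actually narrow the data usually combine several columns; each comparison has to be wrapped in parentheses and joined with & (and) or | (or), because Python's `and`/`or` do not work element-wise on a Series. A sketch on hypothetical rows (`sample` and the thresholds are made up for illustration):

```python
import pandas as pd

# Hypothetical rows mirroring a few Housing.csv columns
sample = pd.DataFrame({
    'price': [13300000, 12250000, 1820000, 1750000],
    'area': [7420, 8960, 3000, 3850],
    'airconditioning': ['yes', 'yes', 'no', 'no'],
})

# Single condition: a boolean mask inside []
expensive = sample[sample['price'] > 5_000_000]

# Two conditions combined with & (each comparison in its own parentheses)
large_with_ac = sample[(sample['area'] > 8000) & (sample['airconditioning'] == 'yes')]

# .loc filters rows and selects columns in one step
cheap = sample.loc[sample['price'] < 2_000_000, ['price', 'area']]

print(len(expensive), large_with_ac.index.tolist(), cheap.shape)  # 2 [1] (2, 2)
```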

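In [21]: categorical_cols = ['mainroad', 'guestroom', 'basement', 'hotwaterheating', 'airconditioning', 'prefarea', 'furnishingstatus']
# One-hot encode the categorical columns; drop_first=True drops the first level of each
# column, so that e.g. 'mainroad' becomes a single 'mainroad_yes' indicator
df = pd.get_dummies(df, columns=categorical_cols, drop_first=True)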
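With drop_first=True each categorical column loses its first level in sorted order, and that level becomes the all-zeros baseline: 'mainroad' turns into a single mainroad_yes column, and 'furnishingstatus' into two columns with 'furnished' as the baseline. A sketch on hypothetical rows; the `dtype=int` argument is added here only so the indicators print as 0/1. Recent pandas versions return bool columns by default, which is likely why the encoded columns do not appear in the describe() output of In [23].

```python
import pandas as pd

# Hypothetical rows covering every level of each column
sample = pd.DataFrame({
    'mainroad': ['yes', 'no', 'yes'],
    'furnishingstatus': ['furnished', 'semi-furnished', 'unfurnished'],
})

encoded = pd.get_dummies(
    sample,
    columns=['mainroad', 'furnishingstatus'],
    drop_first=True,  # drops 'no' and 'furnished', the first sorted level of each column
    dtype=int,        # 0/1 integers instead of the default booleans
)
print(encoded)
```

Row 0 ('yes', 'furnished') encodes as mainroad_yes = 1 with both furnishingstatus indicators at 0.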

In [23]: # IQR rule: treat prices outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as outliers
Q1 = df['price'].quantile(0.25)
Q3 = df['price'].quantile(0.75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Keep only the rows whose price lies within the bounds (inclusive)
data_no_outliers = df[(df['price'] >= lower_bound) & (df['price'] <= upper_bound)]
print("Data after removing outliers:\n", data_no_outliers.describe())

Data after removing outliers:
               price          area    bedrooms   bathrooms     stories  \
count  5.300000e+02    530.000000  530.000000  530.000000  530.000000
mean   4.600663e+06   5061.518868    2.943396    1.260377    1.788679
std    1.596119e+06   2075.449479    0.730515    0.464359    0.861190
min    1.750000e+06   1650.000000    1.000000    1.000000    1.000000
25%    3.430000e+06   3547.500000    2.000000    1.000000    1.000000
50%    4.270000e+06   4500.000000    3.000000    1.000000    2.000000
75%    5.600000e+06   6315.750000    3.000000    1.000000    2.000000
max    9.100000e+06  15600.000000    6.000000    3.000000    4.000000

          parking
count  530.000000
mean     0.664151
std      0.843320
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      3.000000
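Plugging in the quartiles from df.describe(): Q1 = 3,430,000 and Q3 = 5,740,000, so IQR = 2,310,000 and 1.5 * IQR = 3,465,000. That gives lower_bound = -35,000 and upper_bound = 9,205,000. No price is negative, so only the upper bound has any effect: the 15 rows priced above 9,205,000 are dropped (545 rows down to 530), consistent with the new maximum of 9,100,000. The same rule can be packaged as a reusable function; `remove_iqr_outliers` and its `k` multiplier are hypothetical names used for illustration, and the tiny `sample` is made up:

```python
import pandas as pd


def remove_iqr_outliers(frame: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `column` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # between() includes both bounds by default, like the >= / <= test in In [23]
    return frame[frame[column].between(lower, upper)]


# Hypothetical prices with one obvious outlier (100)
sample = pd.DataFrame({'price': [10, 12, 13, 14, 15, 100]})
print(remove_iqr_outliers(sample, 'price')['price'].tolist())  # [10, 12, 13, 14, 15]
```

For this sample, linear-interpolated quartiles are 12.25 and 14.75, so the bounds are 8.5 and 18.5 and only 100 is removed; calling `remove_iqr_outliers(df, 'price')` should reproduce data_no_outliers.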

