Open in Colab
(https://colab.research.google.com/github/JAYASURYAb/ML-project2/blob/master/Campus_recruitment.ipynb)
Importing Libraries
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
In [3]:
dataset
Out[3]:
[Table preview of dataset; visible columns: sl_no, gender, ssc_p, ssc_b, hsc_p, hsc_b, hsc_s, degree_p, degree_t, work...]
https://htmtopdf.herokuapp.com/ipynbviewer/temp/28bc1f836ea2273deddb4b773b060bee/Campus_recruitment.html?t=1622620941713 1/18
6/2/2021 temp-162262094050084495
In [4]:
dataset.describe()
Out[4]:
In [5]:
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 215 entries, 0 to 214
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sl_no 215 non-null int64
1 gender 215 non-null object
2 ssc_p 215 non-null float64
3 ssc_b 215 non-null object
4 hsc_p 215 non-null float64
5 hsc_b 215 non-null object
6 hsc_s 215 non-null object
7 degree_p 215 non-null float64
8 degree_t 215 non-null object
9 workex 215 non-null object
10 etest_p 215 non-null float64
11 specialisation 215 non-null object
12 mba_p 215 non-null float64
13 status 215 non-null object
14 salary 148 non-null float64
dtypes: float64(6), int64(1), object(8)
memory usage: 25.3+ KB
We can observe that only the salary column has null values: 215 - 148 = 67 missing entries.
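You can also get per-column missing counts directly with isnull().sum(), instead of subtracting non-null counts by hand. On the real data, dataset.isnull().sum() should report 67 for salary and 0 for every other column. The small frame below is synthetic and only demonstrates the call.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `dataset` (values are illustrative only)
df = pd.DataFrame({
    "status": ["Placed", "Not Placed", "Placed", "Not Placed"],
    "salary": [270000.0, np.nan, 250000.0, np.nan],
})

# Number of missing values in each column
missing = df.isnull().sum()
print(missing)
```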
In [6]:
dataset['salary'] = dataset['salary'].fillna(0)  # 67 missing salaries, matching the 67 'Not Placed' students
Filling the missing values with 0 removes all null values from the dataset, as the info() output below confirms.
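Filling with 0 treats a missing salary as "no offer". That is reasonable only if the missing salaries belong exactly to the unplaced students. The counts agree (67 each), and a row-level check takes one line. The sketch below uses a small synthetic frame: the column names follow the dataset, but the values are made up. It also assigns the filled column back rather than calling fillna(inplace=True) on a column selection.

```python
import numpy as np
import pandas as pd

# Synthetic frame; salary is NaN for unplaced students, as assumed for the real data
df = pd.DataFrame({
    "status": ["Placed", "Not Placed", "Placed", "Not Placed"],
    "salary": [270000.0, np.nan, 250000.0, np.nan],
})

# True only if the missing salaries and the 'Not Placed' rows are exactly the same rows
salary_missing = df["salary"].isnull()
not_placed = df["status"].eq("Not Placed")
aligned = salary_missing.equals(not_placed)

# Fill and assign back to the column
df["salary"] = df["salary"].fillna(0)
print(aligned, df["salary"].isnull().sum())
```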
In [7]:
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 215 entries, 0 to 214
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sl_no 215 non-null int64
1 gender 215 non-null object
2 ssc_p 215 non-null float64
3 ssc_b 215 non-null object
4 hsc_p 215 non-null float64
5 hsc_b 215 non-null object
6 hsc_s 215 non-null object
7 degree_p 215 non-null float64
8 degree_t 215 non-null object
9 workex 215 non-null object
10 etest_p 215 non-null float64
11 specialisation 215 non-null object
12 mba_p 215 non-null float64
13 status 215 non-null object
14 salary 215 non-null float64
dtypes: float64(6), int64(1), object(8)
memory usage: 25.3+ KB
In [8]:
# Show the class counts of every categorical (object-dtype) column
for col in dataset.select_dtypes(include=['object']):
    display(dataset[col].value_counts())
M 139
F 76
Name: gender, dtype: int64
Central 116
Others 99
Name: ssc_b, dtype: int64
Others 131
Central 84
Name: hsc_b, dtype: int64
Commerce 113
Science 91
Arts 11
Name: hsc_s, dtype: int64
Comm&Mgmt 145
Sci&Tech 59
Others 11
Name: degree_t, dtype: int64
No 141
Yes 74
Name: workex, dtype: int64
Mkt&Fin 120
Mkt&HR 95
Name: specialisation, dtype: int64
Placed 148
Not Placed 67
Name: status, dtype: int64
Except for hsc_s and degree_t, which have 3 classes each, all other categorical columns have 2 classes. We can also see that 148 students are placed and 67 are not placed. Now the challenge is:
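Expressing the same counts as proportions shows how imbalanced the target is: 148 / 215 ≈ 0.688, so roughly 69% of students are placed. A classifier that always predicts "Placed" would already reach about that accuracy, which makes it a useful baseline for the models trained later. The sketch below rebuilds the status column from the counts printed above.

```python
import pandas as pd

# Rebuild the status column from the printed counts (148 placed, 67 not placed)
status = pd.Series(["Placed"] * 148 + ["Not Placed"] * 67, name="status")

# Share of each class instead of raw counts
shares = status.value_counts(normalize=True)
print(shares.round(3))
```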
Gender
In [9]:
/usr/local/lib/python3.6/dist-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  import pandas.util.testing as tm
In [10]:
plt.style.use('seaborn-white')  # renamed 'seaborn-v0_8-white' in Matplotlib >= 3.6
f, ax = plt.subplots(1, 2, figsize=(18, 8))
dataset['gender'].value_counts().plot.pie(explode=[0, 0.05], autopct='%1.1f%%', ax=ax[0], shadow=True)
ax[0].set_title('gender')
sns.countplot(x='gender', hue='status', data=dataset, ax=ax[1])  # draw explicitly on the right-hand panel
ax[1].set_title('Influence of gender on placement')
plt.show()
ax = sns.barplot(x='gender', y='salary', data=dataset)  # bar height = mean salary per gender
plt.show()
So, we observe:
The number of placed male students is almost double the number of placed female students.
On average, male students are offered a slightly higher salary than female students.
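Missing salaries were filled with 0, so the bar plot's mean salary per gender mixes two things: how often a group is placed and how much placed students are paid. Comparing the mean over all students with the mean over placed students only separates the two effects. The data below is synthetic, chosen so the two averages point in opposite directions. It illustrates the effect and is not a result from this dataset.

```python
import pandas as pd

# Synthetic data: salary is 0 for students who were not placed (as after fillna(0))
df = pd.DataFrame({
    "gender": ["M", "M", "M", "F", "F", "F"],
    "status": ["Placed", "Placed", "Not Placed", "Placed", "Not Placed", "Not Placed"],
    "salary": [300000.0, 240000.0, 0.0, 300000.0, 0.0, 0.0],
})

# Mean over all students, zeros included (what the bar plot shows)
all_students = df.groupby("gender")["salary"].mean()
# Mean over placed students only (the salary actually offered)
placed_only = df[df["status"] == "Placed"].groupby("gender")["salary"].mean()
print(all_students, placed_only, sep="\n\n")
```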
Board of Education (ssc_b, hsc_b, hsc_s)
In [11]:
plt.figure(figsize=(10,8))
sns.countplot(x='ssc_b',hue='status',data=dataset)
sns.catplot(x='hsc_b',hue='hsc_s',col='status',data=dataset,kind='count')
plt.show()
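So, we observe:
In ssc_b, more Central board students are placed than students from other boards.
In hsc_b, however, more students from other boards are placed than Central board students.
These counts largely follow how many students come from each board (116 Central vs 99 Others in ssc_b; 84 Central vs 131 Others in hsc_b). Therefore, the board does not appear to matter in placements.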
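Raw counts track board sizes, so a fairer comparison uses the placement rate within each board. pd.crosstab with normalize="index" gives the share of each status per board, so every row sums to 1. The sketch below uses a small synthetic frame. On the real data, the same call would use dataset["ssc_b"] (or hsc_b) and dataset["status"].

```python
import pandas as pd

# Synthetic board/status pairs (illustrative only)
df = pd.DataFrame({
    "ssc_b": ["Central", "Central", "Central", "Central", "Others", "Others"],
    "status": ["Placed", "Placed", "Placed", "Not Placed", "Placed", "Not Placed"],
})

# Row-normalised table: share of each status within each board
rates = pd.crosstab(df["ssc_b"], df["status"], normalize="index")
print(rates)
```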
Degree, Specialisation
In [12]:
plt.figure(figsize=(7,3))
sns.countplot(x="degree_t", hue='status',data=dataset)
plt.show()
ax = sns.barplot(x="degree_t", y="salary", data=dataset)
plt.show()
sns.countplot(x="specialisation", hue='status',data=dataset)
plt.show()
ax = sns.barplot(x="specialisation", y="salary", data=dataset)
plt.show()
So, here we observe:
More Comm&Mgmt and Sci&Tech degree students are placed, and fewer students with other degrees are placed.
Salary-wise, Sci&Tech students are paid the most, followed by Comm&Mgmt students, while students with other degrees are paid less.
Specialisation matters a lot in placements. Mkt&Fin students get more placements than Mkt&HR students, and Mkt&Fin students are also paid more.
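Degree and specialisation can be compared together with a pivot table of mean salary among placed students. Unplaced students (salary 0) are filtered out first, so the averages reflect actual offers. Combinations with no placed student show up as NaN. The numbers below are synthetic and only illustrate the call.

```python
import pandas as pd

# Synthetic rows; the last student is not placed and has salary 0
df = pd.DataFrame({
    "degree_t": ["Sci&Tech", "Sci&Tech", "Comm&Mgmt", "Comm&Mgmt", "Others", "Others"],
    "specialisation": ["Mkt&Fin", "Mkt&HR", "Mkt&Fin", "Mkt&HR", "Mkt&Fin", "Mkt&HR"],
    "status": ["Placed", "Placed", "Placed", "Placed", "Placed", "Not Placed"],
    "salary": [320000.0, 260000.0, 300000.0, 240000.0, 220000.0, 0.0],
})

# Keep only placed students, then average salary per degree/specialisation pair
placed = df[df["status"] == "Placed"]
table = placed.pivot_table(index="degree_t", columns="specialisation",
                           values="salary", aggfunc="mean")
print(table)
```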
Percentage
In [13]:
Female students scored higher percentages than male students in all fields.
Students with higher percentages in 10th, 12th and degree have a better chance of placement.
A good MBA percentage does not guarantee placement.
Percentage also does not appear to influence salary.
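The mean of each percentage column per placement status is a compact numeric complement to these observations. The plotting cell for this section is not shown. The numbers below are synthetic and only illustrate the call; on the real data, the same groupby would run on dataset.

```python
import pandas as pd

# Synthetic percentages (illustrative only)
df = pd.DataFrame({
    "ssc_p": [85.0, 70.0, 60.0, 55.0],
    "hsc_p": [80.0, 72.0, 58.0, 60.0],
    "degree_p": [75.0, 70.0, 60.0, 58.0],
    "mba_p": [62.0, 60.0, 61.0, 63.0],
    "status": ["Placed", "Placed", "Not Placed", "Not Placed"],
})

# Average of each percentage column for placed vs. not placed students
means = df.groupby("status")[["ssc_p", "hsc_p", "degree_p", "mba_p"]].mean()
print(means.round(1))
```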
Work experience
In [14]:
plt.style.use('seaborn-white')  # renamed 'seaborn-v0_8-white' in Matplotlib >= 3.6
f, ax = plt.subplots(1, 2, figsize=(18, 8))
dataset['workex'].value_counts().plot.pie(explode=[0, 0.05], autopct='%1.1f%%', ax=ax[0], shadow=True)
ax[0].set_title('Work experience')
sns.countplot(x='workex', hue='status', data=dataset, ax=ax[1])  # draw explicitly on the right-hand panel
ax[1].set_title('Influence of experience on placement')
plt.show()
So, we observe:
Data preprocessing
In [15]:
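# Features: hsc_p, degree_p, workex, etest_p, specialisation, mba_p
x = dataset.iloc[:, [4, 7, 9, 10, 11, 12]].values
# Target: status (second-to-last column)
y = dataset.iloc[:, -2].values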
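The cells that encode the data and split it are not shown. x still contains workex and specialisation as strings, and y_train below is 0/1, so some encoding step must have happened. The sketch below is one possible version. It assumes LabelEncoder for the target, one-hot encoding (pd.get_dummies with drop_first=True) for the two text features, and test_size=0.3. That test size is consistent with the 150/65 train/test sizes shown below for 215 rows. The encoders and random_state are assumptions, the original notebook may differ, and the frame is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

rng = np.random.default_rng(0)
n = 215  # same number of rows as the campus dataset

# Synthetic stand-in with the six selected feature columns plus the target
df = pd.DataFrame({
    "hsc_p": rng.uniform(40, 98, n),
    "degree_p": rng.uniform(50, 92, n),
    "workex": rng.choice(["No", "Yes"], n),
    "etest_p": rng.uniform(50, 98, n),
    "specialisation": rng.choice(["Mkt&Fin", "Mkt&HR"], n),
    "mba_p": rng.uniform(51, 78, n),
    "status": rng.choice(["Placed", "Not Placed"], n),
})

# One-hot encode the two text features; drop_first keeps a single 0/1 column for each
features = pd.get_dummies(df.drop(columns="status"), columns=["workex", "specialisation"],
                          drop_first=True, dtype=int)
x = features.values

# LabelEncoder sorts classes alphabetically: 'Not Placed' -> 0, 'Placed' -> 1
y = LabelEncoder().fit_transform(df["status"])

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
print(x_train.shape, x_test.shape)
```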
In [16]:
In [17]:
Out[17]:
In [18]:
In [19]:
x_train
Out[19]:
In [20]:
x_test
Out[20]:
In [21]:
y_train
Out[21]:
array([1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0,
1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0,
1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1,
1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,
0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])
In [22]:
y_test
Out[22]:
array([1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1,
1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0])
Standardisation
In [23]:
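The standardisation cell is not shown. A common approach, assumed here, is scikit-learn's StandardScaler. It is fitted on the training features only and then applied to the test features, so test-set statistics do not leak into training. The small arrays below stand in for x_train and x_test.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small illustrative arrays standing in for x_train / x_test
x_train = np.array([[60.0, 1.0], [70.0, 0.0], [80.0, 1.0]])
x_test = np.array([[75.0, 0.0]])

sc = StandardScaler()
x_train = sc.fit_transform(x_train)  # learn per-column mean and std from training data only
x_test = sc.transform(x_test)        # reuse the training statistics on the test data
print(x_train.mean(axis=0), x_test)
```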
Logistic Regression
KNN
Decision Tree
Random Forest
Linear SVC
Kernel SVC
In [24]:
In [25]:
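from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
l_cla = LogisticRegression()
k_cla = KNeighborsClassifier(n_neighbors=10)
d_cla = DecisionTreeClassifier()
r_cla = RandomForestClassifier(n_estimators=200)
s_cla = SVC(kernel='linear')  # linear SVC
ks_cla = SVC(kernel='rbf')    # kernel (RBF) SVC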
In [26]:
l_cla.fit(x_train, y_train)
k_cla.fit(x_train, y_train)
d_cla.fit(x_train, y_train)
r_cla.fit(x_train, y_train)
s_cla.fit(x_train, y_train)
ks_cla.fit(x_train, y_train)
Out[26]:
In [27]:
l_pred = l_cla.predict(x_test)
k_pred = k_cla.predict(x_test)
d_pred = d_cla.predict(x_test)
r_pred = r_cla.predict(x_test)
s_pred = s_cla.predict(x_test)
ks_pred = ks_cla.predict(x_test)
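The six fit/predict pairs repeat the same two calls. Keeping the models in a dictionary makes it easy to add or drop a classifier. In the sketch below, the model settings mirror the cell above. The data is synthetic: make_classification with 215 samples and 6 features is an assumption standing in for the scaled campus features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the campus features
x, y = make_classification(n_samples=215, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# Same six models and settings as in the notebook
models = {
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(n_neighbors=10),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "Linear SVC": SVC(kernel="linear"),
    "Kernel SVC": SVC(kernel="rbf"),
}

# Fit every model and collect its test-set predictions by name
predictions = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    predictions[name] = model.predict(x_test)
```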
In [28]:
In [29]:
In [30]:
print(l_c)
print(k_c)
print(d_c)
print(r_c)
print(s_c)
print(ks_c)
[[15 5]
[ 2 43]]
[[13 7]
[ 3 42]]
[[14 6]
[11 34]]
[[15 5]
[ 1 44]]
[[13 7]
[ 1 44]]
[[10 10]
[ 0 45]]
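The cell that computes l_c ... ks_c is not shown; these matrices are presumably built with sklearn.metrics.confusion_matrix. In that convention, rows are the true class (0 = Not Placed, 1 = Placed) and columns are the predicted class. Accuracy is the diagonal sum divided by the total (65 test samples), so the printed matrices can be checked directly. Decision Tree and Random Forest were created without a fixed random_state, so their matrices may change between runs.

```python
import numpy as np

# Confusion matrices copied from the output above (rows: true 0/1, columns: predicted 0/1)
matrices = {
    "Logistic Regression": [[15, 5], [2, 43]],
    "KNN": [[13, 7], [3, 42]],
    "Decision Tree": [[14, 6], [11, 34]],
    "Random Forest": [[15, 5], [1, 44]],
    "Linear SVC": [[13, 7], [1, 44]],
    "Kernel SVC": [[10, 10], [0, 45]],
}

# Accuracy = correct predictions (diagonal) / all test samples
accuracy = {name: np.trace(np.array(m)) / np.sum(m) for name, m in matrices.items()}
for name, acc in sorted(accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
# Random Forest: 0.908
# Logistic Regression: 0.892
# Linear SVC: 0.877
# KNN: 0.846
# Kernel SVC: 0.846
# Decision Tree: 0.738
```

All six models beat the 45 / 65 ≈ 0.692 accuracy of always predicting "Placed" on this test set.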
In [31]:
In [32]:
print('Logistic Regression: ' + str(l_a) + '\nKNN: ' + str(k_a) + '\nDecision Tree: ' + str(d_a) +
      '\nRandom Forest: ' + str(r_a) + '\nLinear SVC: ' + str(s_a) + '\nKernel SVC: ' + str(ks_a))
So we can conclude that the Random Forest classifier is the best fit for our dataset: on this test split, it makes the fewest errors (6 of 65 samples).
Conclusion
More male students were placed than female students (since more male students sat for placements).
Male students got higher salaries than female students.
Board of Education doesn't matter in placements.
Students with higher percentages in 10th, 12th and degree have a better chance of placement, but MBA percentages don't influence placement.
Students with no work experience got placed more than students who had work experience.
Specialisation matters a lot in placements. Mkt&Fin students get more placements than Mkt&HR students, and Mkt&Fin students are also paid more.
Sci&Tech students get higher salaries than students from Comm&Mgmt and other degrees.
Thank you
Jayasurya B