IDS Unit 4 Case Study 1: Checking Patterns in Data
The answer to all these questions is one of the simplest things that most of us have
been doing since childhood. At school, we were often asked to fill in the missing letter
in a series, to predict which number would come next in a sequence, or to join the dots
to complete a figure. Predicting the missing number or letter meant analyzing the trend
followed by the numbers or letters already given. This is what pattern recognition means
in Machine Learning.
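The "next number in a sequence" idea can be sketched in a few lines. This is only an illustration of the simplest possible trend, a constant step between neighbouring terms; the function name `next_in_arithmetic_sequence` is hypothetical and not part of the case study.

```python
def next_in_arithmetic_sequence(values):
    """Return the next term if all neighbouring gaps are equal, else None."""
    if len(values) < 2:
        return None
    # The "trend" here is the gap between neighbouring terms.
    differences = [b - a for a, b in zip(values, values[1:])]
    if len(set(differences)) != 1:
        return None  # no single constant step, so this simple rule does not apply
    return values[-1] + differences[0]


print(next_in_arithmetic_sequence([2, 5, 8, 11]))  # 14
print(next_in_arithmetic_sequence([1, 2, 4, 8]))   # None
```

The second call returns None because the gaps (1, 2, 4) are not constant. A learning algorithm generalises the same idea: it fits a rule to the observed values and uses that rule to predict the unseen ones.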
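# Data handling (pd and np are used in the cells below)
import numpy as np
import pandas as pd
# Data Visualization
import seaborn as sns
import matplotlib.pyplot as plt
import missingno as msno
import warnings
warnings.filterwarnings("ignore")  # hide library warnings in the notebook output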
df.shape  # output: (1059, 8), i.e. 1059 rows and 8 columns
df.head()
Checking for NaN values
pd.DataFrame(df.isnull().sum(), columns=["Null Values"]).rename_axis("Column Name")
df.info()
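As a small illustration of the null check, the sketch below builds the same kind of summary table on a toy DataFrame and adds a percentage column. The values are made up for the example and are not the case-study data; only the column names `Temprature` and `Grade` are borrowed from it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the case-study dataset.
toy = pd.DataFrame({
    "Temprature": [35.0, np.nan, 40.0, np.nan],
    "Grade": ["high", "low", None, "low"],
})

null_counts = toy.isnull().sum()          # number of missing values per column
null_share = toy.isnull().mean() * 100    # percentage of rows that are missing

summary = pd.DataFrame({"Null Values": null_counts, "Null %": null_share})
summary = summary.rename_axis("Column Name")
print(summary)
```

A percentage is often easier to act on than a raw count, since it shows at a glance whether a column is mostly complete or mostly empty.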
IDS-Unit-4-BCA SEM-IV
Statistical information
df.describe(include = "all")
EDA
# Unique values in each column
for i in df.columns:
    print(i)
    print(df[i].unique())
    print('\n')
# Frequency of each value in each column
for i in df.columns:
    print(i)
    print(df[i].value_counts())
    print('\n')
# Distribution of each column (palette is omitted: it has no effect without hue)
for i in df.columns:
    plt.figure(figsize=(15,6))
    sns.histplot(df[i], kde = True, bins = 20)
    plt.xticks(rotation = 90)
    plt.show()
Correlation
Correlation is a statistical measure that expresses the extent to which two variables are linearly
related (meaning they change together at a constant rate). It's a common tool for describing simple
relationships without making a statement about cause and effect.
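To make "linearly related" concrete, the sketch below computes the Pearson correlation coefficient by hand with NumPy on made-up arrays. The helper name `pearson_r` and the data are assumptions for illustration only.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_linear = 2.0 * x + 1.0      # changes with x at a constant rate
y_curved = (x - 3.0) ** 2     # depends on x, but not linearly


def pearson_r(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    return (a_centered * b_centered).sum() / np.sqrt(
        (a_centered ** 2).sum() * (b_centered ** 2).sum()
    )


print(pearson_r(x, y_linear))  # close to 1.0
print(pearson_r(x, y_curved))  # close to 0.0
```

The second result shows why correlation describes only linear relationships: `y_curved` is fully determined by `x`, yet its correlation with `x` is zero.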
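# numeric_only=True skips non-numeric columns (for example a text-valued Grade),
# which would otherwise raise an error in recent pandas versions
df_corr = df.corr(numeric_only=True)
df_corr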
plt.figure(figsize=(10, 8))
matrix = np.triu(df_corr)
sns.heatmap(df_corr, annot=True, linewidth=.8, mask=matrix, cmap="rocket");
plt.show()
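A note on the mask used in the heatmap: `np.triu(df_corr)` returns the correlation values themselves, and when such an array is used as a mask its nonzero entries are treated as True. The sketch below, on a hypothetical 3x3 matrix, shows how that can differ from an explicit boolean upper-triangle mask when a correlation happens to be exactly 0. It uses only NumPy and does not call seaborn.

```python
import numpy as np

# Hypothetical correlation matrix; one off-diagonal entry is exactly 0.
corr = np.array([
    [1.0, 0.0, 0.4],
    [0.0, 1.0, -0.2],
    [0.4, -0.2, 1.0],
])

value_mask = np.triu(corr).astype(bool)                  # True only where the value is nonzero
explicit_mask = np.triu(np.ones_like(corr, dtype=bool))  # True for the whole upper triangle

print(value_mask)
print(explicit_mask)
```

With the value-based mask, the cell at row 0, column 1 stays False, so it would not be hidden. The explicit boolean mask covers the upper triangle regardless of the values in it.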
sns.pairplot(df,hue="Grade",height=3)
plt.show()
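# Train-Test Data
from sklearn.model_selection import train_test_split
X = df.drop("Temprature", axis = 1)
y = df["Temprature"]
# Assumed split settings (not given in the source): 80/20 split with a fixed seed
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)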
Model Building
# Linear Regression
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train,y_train)
y_pred = lr.predict(X_test)
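The sequence of split, fit, predict and score can be sketched end to end on synthetic data. Everything below is an assumption made for illustration: the data is randomly generated, the `_demo` names are hypothetical, and the 80/20 split, the seed and the choice of R^2 and mean absolute error as metrics are not taken from the case study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: the target is a noisy linear function of two features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows so the model is scored on data it has not seen.
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

demo_model = LinearRegression()
demo_model.fit(X_train_demo, y_train_demo)
y_pred_demo = demo_model.predict(X_test_demo)

print("R^2:", r2_score(y_test_demo, y_pred_demo))
print("MAE:", mean_absolute_error(y_test_demo, y_pred_demo))
```

On real data, `LinearRegression` needs numeric inputs, so any text-valued column in the features would have to be encoded or dropped before fitting.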