Lab_questionbank

Uploaded by devaadi0713
Copyright © All Rights Reserved

Lab Exam: Mid-Semester 1

Subject: AIML (Artificial Intelligence and Machine Learning)


Institute: IEM (Institute of Engineering and Management)

Answer one question from each group: Group A and Group B


*************************************************************************************
Group A
1. Load a dataset into Colab from your local system.
Analysis Part: After loading, explore the dataset using df.info() and df.describe() to understand its
structure.
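A minimal sketch of question 1, assuming the file is uploaded through Colab's `google.colab.files.upload()`, which returns a dict mapping each filename to its raw bytes. The helper name `load_uploaded_csv` and the tiny in-memory CSV are hypothetical stand-ins so the sketch runs outside Colab.

```python
import io
import pandas as pd

def load_uploaded_csv(uploaded: dict) -> pd.DataFrame:
    """Read the first file of a {filename: bytes} mapping into a DataFrame."""
    # google.colab.files.upload() returns a dict of this shape
    filename, raw_bytes = next(iter(uploaded.items()))
    return pd.read_csv(io.BytesIO(raw_bytes))

# In Colab you would obtain `uploaded` interactively:
#   from google.colab import files
#   uploaded = files.upload()
# Hypothetical in-memory stand-in so the sketch runs anywhere:
uploaded = {"sample.csv": b"a,b,c\n1,2.5,x\n2,3.5,y\n3,,z\n"}

df = load_uploaded_csv(uploaded)
df.info()             # column dtypes and non-null counts
print(df.describe())  # count, mean, std, min, quartiles, max for numeric columns
```

`df.info()` is where missing values first show up (a non-null count lower than the row count), which feeds directly into the missing-value questions later in the bank.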
2. Identify outliers in the 'DIS' column of the Boston dataset using a box plot.
Analysis Part: After plotting, identify any data points that lie outside the whiskers and discuss possible
causes.
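Points "outside the whiskers" are, by default in matplotlib and seaborn box plots, the values beyond Q1 - 1.5·IQR or Q3 + 1.5·IQR. A minimal sketch of that rule, assuming a hypothetical handful of values in place of the real Boston 'DIS' column; the helper name `iqr_outliers` is illustrative.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return values outside [Q1 - whis*IQR, Q3 + whis*IQR] (default box-plot whisker rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return s[(s < lower) | (s > upper)]

# Hypothetical stand-in values for the Boston 'DIS' column
df = pd.DataFrame({"DIS": [1.2, 2.1, 2.5, 3.0, 3.2, 3.8, 4.1, 4.5, 5.0, 12.1]})
outliers = iqr_outliers(df["DIS"])
print(outliers)

# The matching plot (seaborn assumed to be installed):
#   import seaborn as sns
#   sns.boxplot(x=df["DIS"])
```

The same helper applies unchanged to the other box-plot questions ('Fare', 'GrLivArea', 'area_mean', 'Age').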
3. Visualize the relationship between 'INDUS' and 'TAX' columns in the Boston dataset using a scatter
plot.
Analysis Part: Discuss whether there is a noticeable correlation between industrial areas and tax rates.
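A scatter plot shows the shape of a relationship; the Pearson coefficient puts a number on its linear strength. A minimal sketch, assuming made-up values in place of the Boston 'INDUS' and 'TAX' columns.

```python
import pandas as pd

# Hypothetical stand-in values for the Boston 'INDUS' and 'TAX' columns
df = pd.DataFrame({
    "INDUS": [2, 5, 8, 11, 14, 18, 21],
    "TAX":   [250, 270, 300, 330, 400, 620, 430],
})

# Pearson r: near +1 or -1 means a strong linear trend, near 0 means little linear trend
r = df["INDUS"].corr(df["TAX"])
print(f"Pearson r = {r:.2f}")

# Scatter plot (matplotlib assumed to be installed):
#   df.plot.scatter(x="INDUS", y="TAX")
```

Pearson r only captures linear association, so a curved or clustered pattern in the plot can matter even when r is modest. The same check fits questions 22, 27 and 31.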
4. Load the 'titanic.csv' dataset from Kaggle into Colab.
Analysis Part: Use sns.boxplot to identify outliers in the 'Fare' column.
5. Handle missing values in the 'Age' column of the Titanic dataset.
Analysis Part: Use df['Age'] = df['Age'].fillna(df['Age'].median()) to fill missing values and visualize the distribution
of the 'Age' column using a box plot.
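`fillna` returns a new Series rather than modifying the DataFrame, so the result has to be assigned back. A minimal sketch, assuming a hypothetical short 'Age' column with gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic 'Age' column, with gaps
df = pd.DataFrame({"Age": [22.0, np.nan, 26.0, 35.0, np.nan, 54.0, 2.0]})
print("missing before:", df["Age"].isnull().sum())

# The median is robust to extreme ages; assign the filled Series back to the column
median_age = df["Age"].median()
df["Age"] = df["Age"].fillna(median_age)
print("missing after:", df["Age"].isnull().sum())

# Box plot of the filled column (seaborn assumed to be installed):
#   sns.boxplot(x=df["Age"])
```

Filling many rows with one value piles them up at the median, which narrows the box in the plot; that effect is worth mentioning in the analysis.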
6. Encode categorical variables in the 'titanic.csv' dataset using one-hot encoding.
Analysis Part: Apply one-hot encoding to the 'Embarked' column and display the first five rows.
7. Perform label encoding on the 'Species' column of the 'iris.csv' dataset.
Analysis Part: Apply label encoding using LabelEncoder from sklearn and visualize the distribution of
encoded species.
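The two encodings in questions 6 and 7 can be sketched side by side, assuming hypothetical stand-in rows for 'Embarked' and 'Species'. `pd.get_dummies` creates one 0/1 column per category; `LabelEncoder` maps each class to an integer in sorted order.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for Titanic 'Embarked' and Iris 'Species'
titanic = pd.DataFrame({"Embarked": ["S", "C", "Q", "S", "C"]})
iris = pd.DataFrame({"Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]})

# One-hot: one indicator column per port, stored as 0/1 integers
embarked_dummies = pd.get_dummies(titanic, columns=["Embarked"], dtype=int)
print(embarked_dummies.head())

# Label encoding: classes sorted, then numbered 0, 1, 2, ...
le = LabelEncoder()
iris["Species_encoded"] = le.fit_transform(iris["Species"])
print(dict(zip(le.classes_, range(len(le.classes_)))))
print(iris["Species_encoded"].value_counts())  # counts to plot as a bar chart
```

Label encoding implies an order (0 < 1 < 2) that nominal categories do not have, which is why one-hot encoding is usually preferred for input features and label encoding for target classes such as 'Species' or 'diagnosis'.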
8. Normalize the 'sepal_length' and 'sepal_width' columns of the 'iris.csv' dataset.
Analysis Part: Use StandardScaler to standardize these columns (zero mean, unit variance) and visualize them with a scatter plot.
9. Identify missing values in the 'housing.csv' dataset from UCI Machine Learning Repository.
Analysis Part: Use df.isnull().sum() to check for missing values and visualize the counts using a bar plot.
10. Drop rows with missing values in the 'housing.csv' dataset.
Analysis Part: Use df.dropna() and compare the shape of the dataset before and after dropping rows.
11. Replace missing values in the 'TotalBsmtSF' column of the 'housing.csv' dataset with the mean value.
Analysis Part: Use df['TotalBsmtSF'] = df['TotalBsmtSF'].fillna(df['TotalBsmtSF'].mean()) and visualize using a histogram.
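Questions 9 to 11 form one missing-value workflow: count, drop, or fill. A minimal sketch, assuming a hypothetical five-row frame; 'LotArea' is an illustrative second column, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with gaps; 'LotArea' is an illustrative extra column
df = pd.DataFrame({
    "TotalBsmtSF": [850.0, np.nan, 920.0, 1100.0, np.nan],
    "LotArea":     [8450, 9600, np.nan, 9550, 14260],
})

# Q9: missing values per column (bar plot: missing.plot.bar(), matplotlib assumed)
missing = df.isnull().sum()
print(missing)

# Q10: dropna() removes every row that contains at least one NaN
before, after = df.shape, df.dropna().shape
print(f"shape before: {before}, after dropna: {after}")

# Q11: fill with the column mean, assigning the result back
df["TotalBsmtSF"] = df["TotalBsmtSF"].fillna(df["TotalBsmtSF"].mean())
```

Dropping rows here loses three of five rows even though each column is only partly empty, which is the trade-off question 10 asks you to compare against filling.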

*************************************************************************************
Group B

12. Visualize outliers in the 'GrLivArea' column of the 'housing.csv' dataset using a box plot.
Analysis Part: After plotting, discuss any visible outliers and potential impact on model performance.
13. Split the 'diabetes.csv' dataset from Kaggle into train and test sets.
Analysis Part: Use train_test_split to create training and testing sets, and display their shapes.
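A minimal sketch of question 13, assuming a randomly generated 20-row stand-in for 'diabetes.csv' and assuming 'Outcome' is the target column. `stratify=y` keeps the class ratio equal in both splits and `random_state` makes the split reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 20-row stand-in; 'Outcome' is assumed to be the target column
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Glucose": rng.integers(70, 200, size=20),
    "BMI": rng.uniform(18, 45, size=20).round(1),
    "Outcome": [0, 1] * 10,
})

X = df.drop(columns="Outcome")
y = df["Outcome"]

# 80/20 split, stratified on the target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```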
14. Normalize the 'BMI' and 'Glucose' columns of the 'diabetes.csv' dataset.
Analysis Part: Apply MinMaxScaler and visualize the normalized data with a scatter plot.
15. Handle missing values in the 'Pregnancies' column of the 'diabetes.csv' dataset.
Analysis Part: Use df['Pregnancies'] = df['Pregnancies'].fillna(0) to fill missing values and visualize the distribution with a bar plot.
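For the MinMaxScaler step in question 14, a minimal sketch with hypothetical 'BMI' and 'Glucose' values. Each column is rescaled independently with (x - min) / (max - min), so both end up in [0, 1].

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in values for the 'BMI' and 'Glucose' columns
df = pd.DataFrame({"BMI": [22.0, 30.0, 26.0, 38.0], "Glucose": [90, 150, 120, 180]})

# Per column: (x - min) / (max - min), mapping the column onto [0, 1]
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(df[["BMI", "Glucose"]]), columns=["BMI", "Glucose"])
print(scaled)

# Scatter plot of the scaled columns (matplotlib assumed to be installed):
#   scaled.plot.scatter(x="BMI", y="Glucose")
```

In a modelling pipeline the scaler would normally be fitted on the training split only and then applied to the test split, so test-set minima and maxima do not leak into training.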
16. Detect missing values in the 'abalone.csv' dataset from UCI Machine Learning Repository.
Analysis Part: Use df.isnull().sum() to check for missing values and create a heatmap visualization.
17. Drop columns with missing values in the 'abalone.csv' dataset.
Analysis Part: Use df.dropna(axis='columns') and discuss any columns that were dropped.
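To report which columns `dropna(axis='columns')` removed, compare the column sets before and after. A minimal sketch with one gap injected into a hypothetical abalone-like frame; the UCI listing describes abalone as having no missing values, so an unmodified copy may lose no columns at all.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for abalone.csv with one injected gap
df = pd.DataFrame({
    "Sex": ["M", "F", "I"],
    "Length": [0.455, 0.35, 0.53],
    "Diameter": [0.365, np.nan, 0.42],
})

# Any column containing at least one NaN is dropped entirely
cleaned = df.dropna(axis="columns")
dropped = sorted(set(df.columns) - set(cleaned.columns))
print("dropped columns:", dropped)
```

The same comparison answers question 34 for 'cars.csv'.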
18. Use one-hot encoding on the 'Sex' column of the 'abalone.csv' dataset.
Analysis Part: Apply one-hot encoding and visualize the distribution of each sex category.
19. Perform label encoding on the 'diagnosis' column of the 'cancer.csv' dataset from UCI Machine
Learning Repository.
Analysis Part: Use LabelEncoder to encode the 'diagnosis' column and visualize the class distribution.
20. Identify outliers in the 'area_mean' column of the 'cancer.csv' dataset using a box plot.
Analysis Part: Plot and identify any outliers in the 'area_mean' column.
21. Normalize the 'perimeter_mean' and 'concavity_mean' columns of the 'cancer.csv' dataset.
Analysis Part: Use StandardScaler to standardize the columns and visualize the scaled data with a scatter plot.
22. Visualize the relationship between 'age' and 'cholesterol' in the 'heart.csv' dataset from Kaggle.
Analysis Part: Use a scatter plot and discuss any visible patterns or correlations.
23. Handle missing values in the 'thalach' column of the 'heart.csv' dataset.
Analysis Part: Use df['thalach'] = df['thalach'].fillna(df['thalach'].median()) to fill missing values and visualize using a
histogram.
24. Encode the 'gender' column in the 'adult.csv' dataset from UCI Machine Learning Repository.
Analysis Part: Use label encoding and visualize the distribution of genders.
25. Normalize the 'hours-per-week' column of the 'adult.csv' dataset.
Analysis Part: Apply MinMaxScaler and create a box plot for the normalized data.
26. Drop rows with missing values in the 'LoanAmount' column of the 'loan.csv' dataset from Kaggle.
Analysis Part: Use df.dropna(subset=['LoanAmount']) and compare the dataset size before and after.
27. Create a scatter plot of 'ApplicantIncome' against 'LoanAmount' in the 'loan.csv' dataset.
Analysis Part: Plot and discuss any visible correlations or patterns.
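Unlike a plain `dropna()`, the `subset` argument in question 26 drops a row only when the listed column is missing. A minimal sketch with hypothetical values; 'Credit_History' is an illustrative extra column showing that gaps elsewhere survive.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for loan.csv; 'Credit_History' is an illustrative extra column
df = pd.DataFrame({
    "ApplicantIncome": [5000, 4000, 3000, 2500, 6000],
    "LoanAmount": [np.nan, 128.0, 66.0, 120.0, np.nan],
    "Credit_History": [1.0, 1.0, np.nan, 1.0, 1.0],
})

# Only NaNs in 'LoanAmount' trigger a drop; the NaN in 'Credit_History' is kept
subset_dropped = df.dropna(subset=["LoanAmount"])
print("rows before:", len(df), "after:", len(subset_dropped))
```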
28. Identify outliers in the 'Age' column of the 'credit.csv' dataset from UCI Machine Learning
Repository.
Analysis Part: Use a box plot to identify any outliers and discuss their implications.
29. Use one-hot encoding for the 'Education' column in the 'credit.csv' dataset.
Analysis Part: Apply one-hot encoding and visualize the distribution of the new columns.
30. Handle missing values in the 'CreditAmount' column of the 'credit.csv' dataset.
Analysis Part: Use df['CreditAmount'] = df['CreditAmount'].fillna(df['CreditAmount'].mean()) and visualize using a histogram.
31. Visualize the correlation between 'Age' and 'CreditAmount' in the 'credit.csv' dataset using a scatter plot.
Analysis Part: Plot and analyze any potential relationships.
32. Encode the 'Smoker' column in the 'insurance.csv' dataset from Kaggle using label encoding.
Analysis Part: Use label encoding and visualize the distribution of smokers.
33. Normalize the 'BMI' and 'Charges' columns of the 'insurance.csv' dataset.
Analysis Part: Use StandardScaler to standardize both columns and create a scatter plot of the scaled data.
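`StandardScaler` does z-score standardization, z = (x - mean) / std, rather than squeezing values into a fixed range, so the scaled columns have mean 0 and standard deviation 1. A minimal sketch with hypothetical values; the Kaggle file may use lowercase names such as 'bmi' and 'charges', so adjust the column names to match.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in values for the 'BMI' and 'Charges' columns
df = pd.DataFrame({"BMI": [20.0, 25.0, 30.0, 35.0], "Charges": [2000.0, 4000.0, 6000.0, 16000.0]})

# z = (x - mean) / std per column, where std is the population std (ddof=0)
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(scaled.round(3))

# Scatter plot of the scaled columns (matplotlib assumed to be installed):
#   scaled.plot.scatter(x="BMI", y="Charges")
```

The same call covers questions 8 and 21. Because z-scores are unbounded, a large value such as the last 'Charges' entry stays far from the others after scaling.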
34. Drop columns with missing values in the 'cars.csv' dataset from UCI Machine Learning Repository.
Analysis Part: Use df.dropna(axis='columns') and discuss which columns were dropped.
35. Identify and handle outliers in the 'Horsepower' column of the 'cars.csv' dataset.
Analysis Part: Use a box plot to detect outliers and discuss strategies for handling them (e.g., capping, removing).
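The two handling strategies named in question 35 can be compared on the same IQR bounds a box plot uses. A minimal sketch, assuming hypothetical 'Horsepower' values: removal drops the flagged rows, while capping clips them to the nearest bound and keeps every row.

```python
import pandas as pd

# Hypothetical stand-in values for the 'Horsepower' column
hp = pd.Series([70, 88, 95, 100, 110, 115, 130, 150, 165, 400], name="Horsepower", dtype=float)

q1, q3 = hp.quantile(0.25), hp.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Strategy 1: remove values outside the whisker bounds (row count shrinks)
removed = hp[(hp >= lower) & (hp <= upper)]

# Strategy 2: cap values at the bounds (row count unchanged)
capped = hp.clip(lower=lower, upper=upper)

print(f"bounds: [{lower:.3f}, {upper:.3f}]")
print("rows kept after removal:", len(removed), "of", len(hp))
print("max after capping:", capped.max())
```

Removal suits values that are clearly data-entry errors; capping suits genuine but extreme values when the row's other columns are still useful.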
