
NASHEEEEYYYYYY

The document consists of a series of multiple-choice questions and answers on data types, feature selection, machine learning concepts, and Python libraries such as NumPy, Pandas, and Matplotlib. It covers topics like structured vs unstructured data, types of numerical data, feature engineering, dimensionality reduction, data visualization, and classification and regression algorithms. The content is organized into three units (Unit 1 on data concepts and feature selection, Unit 2 on practical applications using Python, and Unit 3 on classification and regression), followed by a mixed review set.

Uploaded by rachitmadhal
Copyright © All Rights Reserved

ONE-FINAL-DANCE

AI VS AI 
UNIT 01

1. What is the primary difference between data and information?

A) Data is processed information
B) Information is raw data
C) Data is unstructured, whereas information is structured and meaningful
D) There is no difference between data and information
Answer: C

2. Which of the following is NOT a type of numerical data?

A) Discrete data
B) Continuous data
C) Ordinal data
D) Both A and B
Answer: C

3. Discrete data is characterized by:

A) Taking any value within a range
B) Taking distinct, countable values
C) Having no real value
D) Being unstructured
Answer: B

4. Which of the following is an example of continuous data?

A) Number of students in a class
B) Temperature in Celsius
C) Number of books on a shelf
D) Number of cars in a parking lot
Answer: B

5. Categorical data is classified into which two types?

A) Ordinal and Nominal
B) Discrete and Continuous
C) Structured and Unstructured
D) Labeled and Unlabeled
Answer: A

6. Which of the following is an example of ordinal data?

A) Blood type (A, B, AB, O)
B) Movie ratings (1 star, 2 stars, 3 stars, etc.)
C) Names of students
D) ZIP codes
Answer: B

7. What is nominal data?

A) Data with meaningful order
B) Data that represents categories with no intrinsic ranking
C) Data that can be counted
D) Data with numerical values
Answer: B

8. Time series data is characterized by:

A) Randomized values
B) Data collected over different categories
C) Data points indexed in time order
D) Lack of time dependency
Answer: C

9. Unstructured data can include:

A) Audio files
B) Images
C) Videos
D) All of the above
Answer: D

10. Which of the following is NOT an example of structured data?

A) Customer names and addresses in a database
B) Financial transaction records
C) Social media posts
D) Inventory details
Answer: C

11. Data labeling is required for:

A) Supervised learning
B) Unsupervised learning
C) Reinforcement learning
D) None of the above
Answer: A

12. The process of identifying and assigning meaningful labels to raw data is called:

A) Data annotation
B) Data cleansing
C) Data extraction
D) Data transformation
Answer: A

13. What is a feature in machine learning?

A) A function that transforms data
B) An attribute or property used for prediction
C) A type of data preprocessing technique
D) A neural network model
Answer: B

14. Why is feature selection important?

A) To improve model accuracy
B) To reduce computational cost
C) To prevent overfitting
D) All of the above
Answer: D

15. What is feature selection?

A) Extracting new features from existing data
B) Selecting a subset of relevant features for model building
C) Transforming categorical data into numerical data
D) Applying data augmentation
Answer: B

16. Which of the following is a feature selection algorithm?

A) Decision Trees
B) Principal Component Analysis (PCA)
C) Sequential Forward Selection (SFS)
D) K-Means Clustering
Answer: C

17. Sequential forward selection works by:

A) Removing one feature at a time
B) Adding one feature at a time based on model performance
C) Selecting all features initially and then pruning
D) Randomly selecting features
Answer: B
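Greedy forward selection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it scores each candidate subset by the training R² of an ordinary least-squares fit (a real workflow would score on held-out data or with cross-validation), and the helper names `r2_ols` and `forward_select` are hypothetical.

```python
import numpy as np

def r2_ols(X, y):
    """Fit ordinary least squares with an intercept and return the training R^2."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def forward_select(X, y, k):
    """Start empty; at each step add the single feature that raises R^2 the most."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < k:
        scores = {j: r2_ols(X[:, selected + [j]], y) for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Synthetic target: only features 1 and 3 matter, the other columns are noise.
y = 3 * X[:, 1] - 2 * X[:, 3] + rng.normal(scale=0.1, size=200)
print(forward_select(X, y, k=2))
```

Sequential backward selection is the mirror image: start with all features and repeatedly drop the one whose removal lowers the score least. For real projects, scikit-learn provides `SequentialFeatureSelector` with `direction="forward"` or `direction="backward"`.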

18. Sequential backward selection works by:

A) Adding one feature at a time
B) Removing one feature at a time based on performance
C) Selecting random features
D) Using only categorical features
Answer: B

19. Bidirectional feature selection is a combination of:

A) Forward and backward selection
B) PCA and Decision Trees
C) Feature extraction and transformation
D) Data labeling and augmentation
Answer: A

20. Feature extraction is different from feature selection because it:

A) Selects the most important features
B) Creates new features from existing ones
C) Removes unnecessary features
D) Works only with categorical data
Answer: B

21. What is an advantage of feature extraction?

A) Reduces dimensionality
B) Improves model accuracy
C) Handles correlated features effectively
D) All of the above
Answer: D

22. Principal Component Analysis (PCA) is used for:

A) Feature selection
B) Feature extraction
C) Data labeling
D) Clustering
Answer: B

23. Which feature selection method ranks features based on importance scores?

A) Wrapper method
B) Filter method
C) Embedded method
D) None of the above
Answer: B

24. The wrapper method of feature selection:

A) Evaluates different feature subsets using a predictive model
B) Uses statistical tests to rank features
C) Relies on expert knowledge
D) Creates new features from existing ones
Answer: A

25. Recursive Feature Elimination (RFE) is an example of:

A) Filter method
B) Wrapper method
C) Embedded method
D) Clustering technique
Answer: B

26. Feature selection helps in reducing _____.

A) Model interpretability
B) Overfitting
C) Data redundancy
D) Model complexity
Answer: B

27. A key drawback of wrapper methods is _____.

A) They are fast and efficient
B) They are computationally expensive
C) They use statistical tests for selection
D) They ignore feature interactions
Answer: B

28. Information Gain is used in _____.

A) Feature transformation
B) Decision tree-based feature selection
C) Feature encoding
D) Model evaluation
Answer: B

29. Mutual Information measures _____.

A) Model accuracy
B) Class imbalance
C) The dependency between two variables
D) Feature extraction efficiency
Answer: C

30. Overfitting occurs when _____.

A) The model has too few features
B) The model learns noise instead of patterns
C) Training and testing errors are both high
D) The model generalizes well to new data
Answer: B

31. Variance Inflation Factor (VIF) is used for _____.

A) Feature encoding
B) Model evaluation
C) Detecting multicollinearity among features
D) Imputing missing values
Answer: C
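VIF can be computed straight from its definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R² from regressing feature j on all the other features. The sketch below assumes NumPy, uses a hypothetical helper `vif`, and runs on synthetic data in which one column is nearly a copy of another. A common rule of thumb flags VIF values above roughly 5 or 10.

```python
import numpy as np

def vif(X):
    """Return the Variance Inflation Factor of every column of X."""
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns (with an intercept).
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(42)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + rng.normal(scale=0.05, size=500)  # c is almost a copy of a: multicollinear
X = np.column_stack([a, b, c])
print(vif(X).round(1))
```

Columns `a` and `c` get very large VIFs because each is almost perfectly predicted by the other, while the independent column `b` stays close to 1.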

32. High correlation among features can cause _____.

A) Data sparsity
B) Multicollinearity
C) Model underfitting
D) Data augmentation
Answer: B

33. Dimensionality reduction is crucial for _____.

A) Increasing computational cost
B) Making models more complex
C) Removing class imbalance
D) Improving model efficiency
Answer: D

34. The Curse of Dimensionality affects _____.

A) High-dimensional datasets
B) Low-dimensional datasets
C) Time-series analysis only
D) Feature selection algorithms
Answer: A

35. The main purpose of feature engineering is to _____.

A) Improve model performance
B) Increase dataset size
C) Reduce computation time
D) Convert all data into categorical form
Answer: A

36. A good feature should _____.

A) Be highly correlated with the target variable
B) Be redundant with other features
C) Be computationally expensive
D) Have high variance with low bias
Answer: A

37. Outliers can affect _____.

A) Mean-based models like linear regression
B) Decision trees only
C) Only categorical variables
D) None of the above
Answer: A

38. A balanced dataset improves _____.

A) Model interpretability
B) Training time
C) Model generalization
D) Overfitting
Answer: C

39. Normalization is used to _____.

A) Convert categorical data into numerical data
B) Scale features to a common range
C) Increase data complexity
D) Improve class imbalance
Answer: B
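The two most common scaling formulas, min-max normalization and z-score standardization, can be written in a few lines of NumPy. This is a short sketch on a made-up array, not tied to any particular library's scaler.

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Min-max normalization: (x - min) / (max - min) maps values into [0, 1].
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): (x - mean) / std gives mean 0 and standard deviation 1.
x_std = (x - x.mean()) / x.std()

print(x_minmax)
print(x_std.mean(), x_std.std())
```

Both transformations are linear, so they change the range and spread of a feature but not the shape of its distribution.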

40. Encoding categorical variables helps in _____.

A) Reducing noise in data
B) Converting text-based categories into numerical form
C) Handling missing values
D) Data augmentation
Answer: B

41. One-hot encoding is best suited for _____.

A) Ordinal data
B) Highly correlated features
C) Nominal categorical data
D) Continuous data
Answer: C
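A quick pandas sketch contrasts one-hot encoding for nominal data with an integer mapping for ordinal data. The column names and the `{"low": 0, ...}` mapping are made up for illustration.

```python
import pandas as pd

# Nominal: blood types have no order, so give each category its own 0/1 column.
df = pd.DataFrame({"blood_type": ["A", "B", "AB", "O", "A"]})
encoded = pd.get_dummies(df, columns=["blood_type"], dtype=int)
print(encoded)

# Ordinal: the categories do have an order, so an integer mapping keeps it.
ratings = pd.Series(["low", "high", "medium"])
ordinal = ratings.map({"low": 0, "medium": 1, "high": 2})
print(ordinal.tolist())
```

One-hot encoding avoids implying an order (such as "O > B") that a single integer column would suggest for nominal categories.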

42. Feature scaling is important for _____.

A) Decision trees
B) Rule-based systems
C) Distance-based models like KNN
D) Label encoding
Answer: C

43. Standardization transforms data into _____.

A) Values between 0 and 1
B) Values with mean 0 and standard deviation 1
C) Binary features
D) Integer values
Answer: B

44. Feature transformation can include _____.

A) Logarithmic scaling
B) Polynomial features
C) Normalization and standardization
D) All of the above
Answer: D

45. What is a common technique for handling missing values?

A) Dropping missing rows
B) Imputation using mean/median/mode
C) Predictive modeling for missing values
D) All of the above
Answer: D

46. L1 regularization can be used for _____.

A) Feature selection
B) Model evaluation
C) Encoding categorical data
D) Data augmentation
Answer: A

47. Feature selection can be done using _____.

A) Lasso regression
B) Recursive Feature Elimination (RFE)
C) Information Gain
D) All of the above
Answer: D

48. Data augmentation is primarily used in _____.

A) Structured datasets
B) Image and text data
C) Time-series forecasting
D) Feature selection
Answer: B

49. Variance Threshold method is a _____.

A) Wrapper method
B) Feature extraction technique
C) Filter method
D) Supervised learning approach
Answer: C

50. Feature engineering requires _____.

A) Domain knowledge
B) Data preprocessing techniques
C) Machine learning expertise
D) All of the above
Answer: D

UNIT 02

1. What is NumPy primarily used for in Python?

A) String manipulation
B) Handling large numerical datasets efficiently
C) Web development
D) File handling
Answer: B

2. Which of the following is the correct way to create a NumPy array?

A) array = np.array([1, 2, 3])
B) array = np.create([1, 2, 3])
C) array = numpy([1, 2, 3])
D) array = np.array[1, 2, 3]
Answer: A

3. What function is used to create a NumPy array of zeros?

A) np.ones(shape)
B) np.zeros(shape)
C) np.empty(shape)
D) np.full(shape, 0)
Answer: B

4. How can you access the shape of a NumPy array arr?

A) arr.dim()
B) arr.shape
C) arr.len()
D) arr.size()
Answer: B

5. Which NumPy function is used for element-wise addition of two arrays?

A) np.add(arr1, arr2)
B) arr1 + arr2
C) np.sum(arr1, arr2)
D) Both A and B
Answer: D

6. In NumPy, what does arr.reshape(2,3) do?

A) Returns the same data reshaped into 2 rows and 3 columns (the array must hold exactly 6 elements)
B) Flattens the array
C) Removes empty values
D) None of the above
Answer: A
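The array basics from questions 2 to 6 fit in one short NumPy sketch (the values are arbitrary):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])  # create an array from a Python list
zeros = np.zeros((2, 3))             # 2x3 array of zeros (float64 by default)

m = arr.reshape(2, 3)  # returns a new 2x3 array object; arr keeps its shape (6,)
print(m.shape)         # (2, 3)
print(m.T.shape)       # (3, 2): .T is the transpose

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
print(a + b)           # [11 22 33]
print(np.add(a, b))    # same element-wise result as a + b
```

Note that `shape` is an attribute, not a method, which is why `arr.shape` works and `arr.shape()` raises an error.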

7. Which Python library is used for data manipulation and analysis?

A) NumPy
B) Pandas
C) Matplotlib
D) Scikit-learn
Answer: B

8. What is a DataFrame in Pandas?

A) A table-like data structure
B) A one-dimensional array
C) A NumPy array
D) A type of function
Answer: A

9. How do you read a CSV file into a Pandas DataFrame?

A) pd.load_csv('file.csv')
B) pd.read_csv('file.csv')
C) pd.import_csv('file.csv')
D) pd.open_csv('file.csv')
Answer: B

10. What is the default axis for operations in Pandas?

A) axis=0 (Column-wise)
B) axis=1 (Row-wise)
C) axis=2
D) It depends on the function
Answer: A

11. How do you check for missing values in a DataFrame?

A) df.missing()
B) df.check_null()
C) df.isnull()
D) df.has_nan()
Answer: C

12. What function is used to remove missing values in Pandas?

A) dropna()
B) remove_na()
C) delete_null()
D) clean_na()
Answer: A
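Detecting, dropping, and imputing missing values in pandas can be sketched as follows. The small DataFrame is made up, and mean imputation plus an "Unknown" placeholder are just two of the options listed in Unit 1, question 45.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 40, 35],
    "city": ["Pune", "Delhi", None, "Mumbai"],
})

print(df.isnull())        # True wherever a value is missing
print(df.isnull().sum())  # number of missing values per column

dropped = df.dropna()     # keep only rows with no missing values
# Imputation: fill numeric gaps with the column mean and text gaps with a placeholder.
filled = df.fillna({"age": df["age"].mean(), "city": "Unknown"})
print(filled)
```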

13. Principal Component Analysis (PCA) is primarily used for?

A) Data Visualization
B) Dimensionality Reduction
C) Data Augmentation
D) Feature Selection
Answer: B

14. PCA works by reducing dimensions while preserving?

A) Exact data points
B) As much variance as possible
C) All categorical features
D) Column names
Answer: B

15. How is Linear Discriminant Analysis (LDA) different from PCA?

A) LDA focuses on class separability, while PCA focuses on variance
B) LDA is unsupervised, PCA is supervised
C) LDA increases the number of dimensions
D) PCA removes class labels
Answer: A

16. Which of the following is true about PCA?

A) It increases dimensionality
B) It finds new orthogonal directions (principal components) that capture the most variance
C) It replaces missing values
D) It only works for categorical data
Answer: B

17. What is the primary purpose of Matplotlib?

A) Machine learning
B) Data visualization
C) Database management
D) Image processing
Answer: B

18. Which function is used to create a line plot in Matplotlib?

A) plt.plot()
B) plt.line()
C) plt.scatter()
D) plt.bar()
Answer: A

19. Which function creates a figure and a grid of subplots in a single call?

A) plt.subplots()
B) plt.figure()
C) plt.subplot()
D) plt.multi_plot()
Answer: A

20. How do you create a scatter plot in Matplotlib?

A) plt.scatter(x, y)
B) plt.plot(x, y, 'o')
C) plt.scatterplot(x, y)
D) Both A and B
Answer: D

21. Which function is used to create a bar graph in Matplotlib?

A) plt.bar()
B) plt.bargraph()
C) plt.hist()
D) plt.bars()
Answer: A

22. What is the purpose of a histogram?

A) To show relationships between two variables
B) To display the distribution of numerical data
C) To plot categorical data
D) To visualize time-series data
Answer: B

23. How do you create a pie chart in Matplotlib?

A) plt.pie(values, labels=labels)
B) plt.circle(values, labels=labels)
C) plt.plot(values, labels=labels)
D) plt.piechart(values, labels=labels)
Answer: A

24. What command is used to display the plots created with Matplotlib?

A) plt.display()
B) plt.show()
C) plt.plot()
D) plt.draw()
Answer: B

25. Which function allows you to set labels for the x-axis and y-axis in Matplotlib?

A) plt.xlabel() and plt.ylabel()
B) plt.xlabels() and plt.ylabels()
C) plt.labelx() and plt.labely()
D) plt.axis_labels()
Answer: A

26. What is the default data type of an array created with np.zeros() or np.ones()?

A) int
B) float
C) str
D) object
Answer: B

27. Which NumPy function is used to generate an array of evenly spaced values?

A) np.linspace(start, stop, num)
B) np.arange(start, stop, step)
C) np.range(start, stop, step)
D) Both A and B
Answer: D
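Both functions produce evenly spaced values, but they are parameterized differently, as this small NumPy sketch shows:

```python
import numpy as np

# linspace: you choose how many points; the stop value is included by default.
pts = np.linspace(0, 1, 5)
print(pts)    # [0.   0.25 0.5  0.75 1.  ]

# arange: you choose the step; the stop value is excluded.
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]
```

With floating-point steps, `np.arange` can produce a surprising number of elements because of rounding, so `np.linspace` is usually the safer choice when the exact point count matters.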

28. What does arr.T do in NumPy?

A) Transposes the array
B) Converts the array into a tuple
C) Returns a flattened array
D) Sorts the array
Answer: A

29. How do you compute the mean of a NumPy array arr?

A) arr.mean()
B) np.mean(arr)
C) arr.average()
D) Both A and B
Answer: D

30. Which of the following statements about Pandas Series is true?

A) It is a one-dimensional labeled array
B) It can hold only numerical data
C) It does not support missing values
D) It is faster than NumPy arrays
Answer: A

31. How can you access the first 5 rows of a DataFrame df?

A) df.head(5)
B) df.first(5)
C) df.start(5)
D) df.initial(5)
Answer: A

32. What will df.info() return?

A) Summary of dataset including data types and non-null values
B) Descriptive statistics of numerical columns
C) Shape of the DataFrame
D) A graphical representation of data
Answer: A

33. How do you drop duplicate rows in a DataFrame?

A) df.remove_duplicates()
B) df.drop_duplicates()
C) df.delete_duplicates()
D) df.clear_duplicates()
Answer: B

34. What does df.iloc[0:5, 1:3] do?

A) Selects the first 5 rows and columns 1 to 3 (excluding 3)
B) Selects all rows and first 3 columns
C) Selects row index 1 to 3 and all columns
D) Returns the entire DataFrame
Answer: A
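Position-based slicing with `iloc` follows Python's half-open convention: the start is included and the stop is excluded. A small made-up DataFrame makes this concrete:

```python
import pandas as pd

df = pd.DataFrame({
    "a": range(10),
    "b": range(10, 20),
    "c": range(20, 30),
    "d": range(30, 40),
})

print(df.head())             # first 5 rows (head's default n is 5)
subset = df.iloc[0:5, 1:3]   # rows at positions 0-4, columns at positions 1 and 2
print(subset.shape)          # (5, 2)
print(list(subset.columns))  # ['b', 'c']
```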

35. What is the main objective of PCA?

A) To reduce dimensionality by creating new features (principal components) that retain most of the variance
B) To increase the dimensionality of data
C) To remove outliers
D) To balance class distribution
Answer: A

36. In PCA, principal components are selected based on?

A) Their variance
B) Their correlation with target variable
C) Their mean value
D) Their sum
Answer: A

37. How many principal components can be extracted from a dataset with n features?

A) n+1
B) n-1
C) n
D) n/2
Answer: C

38. Which step is performed first in PCA?

A) Compute eigenvectors and eigenvalues
B) Standardize the data
C) Select principal components
D) Transform the data
Answer: B
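The PCA steps from questions 36 to 38 (standardize, compute eigenvectors and eigenvalues, rank by variance, project) can be sketched from scratch in NumPy. This is a minimal teaching version with a hypothetical `pca` helper and synthetic data; in practice, scikit-learn's `sklearn.decomposition.PCA` centers the data but does not scale it, so you would standardize first when features are on different scales.

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA: return the projected data and the explained-variance ratios."""
    # Step 1: standardize each feature to mean 0 and standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the standardized features (features in columns).
    cov = np.cov(Z, rowvar=False)
    # Step 3: eigen-decomposition; eigh is meant for symmetric matrices and
    # returns eigenvalues in ascending order, so reverse that order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: keep the components with the largest variance and project onto them.
    W = eigvecs[:, :n_components]
    ratio = eigvals / eigvals.sum()
    return Z @ W, ratio[:n_components]

rng = np.random.default_rng(1)
x = rng.normal(size=300)
X = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=300),  # strongly correlated with column 0
    rng.normal(size=300),                     # independent noise
])
scores, ratio = pca(X, n_components=2)
print(scores.shape)  # (300, 2)
print(ratio)
```

A dataset with n features yields n eigenvalue/eigenvector pairs, which is why at most n principal components can be extracted.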

39. Linear Discriminant Analysis (LDA) is mainly used for?

A) Dimensionality reduction with class separation
B) Clustering
C) Outlier detection
D) Feature scaling
Answer: A

40. LDA maximizes which type of variance?

A) Within-class variance
B) Between-class variance
C) Total variance
D) Feature variance
Answer: B

41. Which type of plot is best for showing trends over time?

A) Bar plot
B) Line plot
C) Pie chart
D) Histogram
Answer: B

42. What does plt.hist(data) do?

A) Creates a histogram
B) Creates a scatter plot
C) Displays descriptive statistics
D) Sorts the data
Answer: A

43. What argument in Matplotlib controls line style?

A) linestyle='dashed'
B) linetype='solid'
C) style='line'
D) plot_type='curve'
Answer: A

44. How can you add a title to a plot in Matplotlib?

A) plt.set_title("Title")
B) plt.title("Title")
C) plt.add_title("Title")
D) plt.label("Title")
Answer: B

45. Which function is used to add grid lines in a Matplotlib plot?

A) plt.grid(True)
B) plt.show_grid()
C) plt.add_grid()
D) plt.enable_grid()
Answer: A

46. In Matplotlib, what does plt.legend() do?

A) Adds labels to axes
B) Adds a legend to the plot
C) Creates a bar chart
D) Hides grid lines
Answer: B

47. Which of the following is true about plt.subplot(rows, cols, index)?

A) Creates a figure with subplots
B) Defines the position of each subplot
C) Index starts from 1
D) All of the above
Answer: D

48. Which function is used to save a plot in Matplotlib?

A) plt.export("plot.png")
B) plt.save("plot.png")
C) plt.savefig("plot.png")
D) plt.store("plot.png")
Answer: C

49. What is the difference between plt.bar() and plt.hist()?

A) plt.bar() is for categorical data, plt.hist() is for continuous data
B) plt.bar() is a histogram, plt.hist() is a bar chart
C) They are the same
D) plt.bar() is used for scatter plots
Answer: A

50. Which function is used to set the figure size in Matplotlib?

A) plt.figure(figsize=(width, height))
B) plt.size(width, height)
C) plt.figsize(width, height)
D) plt.shape(width, height)
Answer: A
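Many of the Matplotlib calls in this unit can be combined into one figure. This sketch uses the object-oriented Axes methods (`ax.set_title`, `ax.set_xlabel`), which play the same role as `plt.title()` and `plt.xlabel()` do for the current plot. It assumes the non-interactive "Agg" backend and writes to a temporary file so it runs without a display; the file name is arbitrary.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
data = np.random.default_rng(0).normal(size=500)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))  # one figure, three subplots

axes[0].plot(x, np.sin(x), linestyle="dashed", label="sin(x)")  # line plot
axes[0].set_title("Line plot")
axes[0].set_xlabel("x")
axes[0].set_ylabel("sin(x)")
axes[0].legend()
axes[0].grid(True)

axes[1].scatter(x, np.cos(x), marker="x")  # scatter plot with a custom marker
axes[1].set_title("Scatter plot")

axes[2].hist(data, bins=20)  # distribution of a numerical variable
axes[2].set_title("Histogram")

out_path = os.path.join(tempfile.gettempdir(), "plots_demo.png")
fig.savefig(out_path)  # save to disk
plt.close(fig)  # in an interactive session you would call plt.show() instead
```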

UNIT 03

1. What is classification in machine learning?

A) A technique to assign data points to predefined classes (labels)
B) A technique to predict continuous values
C) A technique for clustering data
D) A method for dimensionality reduction
Answer: A

2. Which of the following algorithms is NOT used for classification?

A) K-Nearest Neighbors (KNN)
B) Decision Tree
C) Linear Regression
D) Naïve Bayes
Answer: C

3. In K-Nearest Neighbors (KNN), what does the ‘K’ represent?

A) Number of clusters
B) Number of nearest neighbors
C) Number of classes
D) Number of features
Answer: B

4. How does KNN classify a new data point?

A) By choosing the class that appears most frequently among its K nearest neighbors
B) By computing the mean of K nearest points
C) By using decision boundaries
D) By using probability scores
Answer: A
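The KNN rule described in questions 3 and 4 (Euclidean distance, then a majority vote among the K closest training points) can be sketched in NumPy. The `knn_predict` helper and the toy points are hypothetical; scikit-learn's `KNeighborsClassifier` is the usual production choice.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from the new point to every training point.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return str(votes.most_common(1)[0][0])

X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_train = np.array(["red", "red", "red", "blue", "blue", "blue"])

print(knn_predict(X_train, y_train, np.array([1.8, 1.8]), k=3))  # red
print(knn_predict(X_train, y_train, np.array([6.2, 6.4]), k=3))  # blue
```

Because the prediction depends entirely on distances, features on larger scales dominate unless they are scaled first, which is why feature scaling matters for KNN.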

5. What is the major drawback of KNN?

A) It requires a large amount of labeled training data
B) It performs poorly on high-dimensional data
C) It is sensitive to the choice of K
D) All of the above
Answer: D

6. Which of the following is true about the Decision Tree algorithm?

A) It is based on probability
B) It uses a tree-like model of decisions
C) It only works for binary classification
D) It is the same as KNN
Answer: B

7. What is the process of dividing a Decision Tree into multiple branches called?

A) Splitting
B) Pruning
C) Leafing
D) Partitioning
Answer: A

8. Which criterion is commonly used for selecting the best split in a Decision Tree?

A) Mean Square Error
B) Entropy
C) Root Mean Square Error
D) Adjusted R-Squared
Answer: B
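Entropy and information gain, the split criterion behind this answer, take only a few lines of standard-library Python. The labels below are a made-up 10-sample node split into two children; the helper names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(entropy(parent))                  # 1.0
print(information_gain(parent, split))  # about 0.278
```

A tree builder evaluates candidate splits this way and picks the one with the highest gain. scikit-learn's `DecisionTreeClassifier` uses Gini impurity by default and accepts `criterion="entropy"`.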

9. What is the role of pruning in Decision Trees?

A) It adds more branches to the tree
B) It removes branches to prevent overfitting
C) It increases the complexity of the tree
D) It normalizes the dataset
Answer: B

10. What assumption does the Naïve Bayes classifier make?

A) Features are dependent on each other
B) Features are independent given the class label
C) Data must be normally distributed
D) It does not require labeled data
Answer: B

11. Which formula is used in Naïve Bayes to calculate probabilities?

A) Bayes’ Theorem
B) Euclidean Distance Formula
C) Gradient Descent
D) Sigmoid Function
Answer: A
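A worked Bayes' theorem calculation shows what Naïve Bayes computes. All probabilities below are hypothetical numbers for a spam filter that looks at a single word.

```python
p_spam = 0.2             # prior P(spam)
p_word_given_spam = 0.5  # likelihood P(word | spam)
p_word_given_ham = 0.05  # likelihood P(word | not spam)

# Law of total probability: P(word) = P(word|spam)P(spam) + P(word|ham)P(ham)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.714
```

With many words, Naïve Bayes multiplies the per-word likelihoods together, which is valid only under its assumption that features are independent given the class.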

12. Which classifier is best suited for text classification problems like spam filtering?

A) Decision Tree
B) Naïve Bayes
C) KNN
D) SVM
Answer: B

13. What does the Support Vector Machine (SVM) algorithm do?

A) It finds the best hyperplane that separates data points from different classes
B) It constructs decision trees for classification
C) It predicts continuous values
D) It uses K nearest neighbors for classification
Answer: A

14. What is the main purpose of the kernel trick in SVM?

A) To reduce computation time
B) To handle nonlinear data by implicitly mapping it into a higher-dimensional space
C) To normalize the data
D) To prune the model
Answer: B

15. Which of the following is a commonly used kernel in SVM?

A) Linear Kernel
B) Polynomial Kernel
C) Radial Basis Function (RBF) Kernel
D) All of the above
Answer: D

16. In regression, what is the dependent variable?

A) The variable that is being predicted
B) The variable used to predict the outcome
C) The constant term in the equation
D) The error term
Answer: A

17. Which of the following is an assumption of Linear Regression?

A) The relationship between the independent and dependent variable is linear
B) There should be no multicollinearity among independent variables
C) Residuals should be normally distributed
D) All of the above
Answer: D

18. What does the coefficient in a Linear Regression model represent?

A) The change in the dependent variable for a one-unit change in the independent variable
B) The error term in the model
C) The total variance of the data
D) The intercept value
Answer: A

19. Which equation states the simple linear regression model, including the error term?

A) Y = mx + c
B) Y = b0 + b1X + e
C) Y = X1 + X2 + X3
D) Y = ax^2 + bx + c
Answer: B

20. Which metric is commonly used to evaluate the performance of a regression model?

A) Accuracy
B) Confusion Matrix
C) Mean Squared Error (MSE)
D) F1-Score
Answer: C
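The model Y = b0 + b1X + e can be fitted by ordinary least squares in closed form and then scored with MSE and R². This NumPy sketch uses five made-up data points.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares estimates: b1 = Sxy / Sxx, b0 = mean(y) - b1 * mean(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x
mse = np.mean((y - y_hat) ** 2)  # mean squared error
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)  # share of variance explained
print(b0, b1, mse, r2)  # slope about 1.99, intercept about 0.05
```

As a cross-check, `np.polyfit(x, y, 1)` returns the same slope and intercept (in that order).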

21. What is the main difference between Linear and Polynomial Regression?

A) Polynomial Regression captures nonlinear relationships by adding polynomial terms
B) Polynomial Regression requires fewer features
C) Linear Regression is used only for categorical variables
D) Polynomial Regression increases bias
Answer: A

22. If a regression model overfits the training data, what technique can be used?

A) Increasing the number of features
B) Regularization techniques like Lasso or Ridge
C) Ignoring irrelevant variables
D) Reducing the training set size
Answer: B

23. What is the main drawback of Polynomial Regression?

A) It is computationally expensive
B) It may overfit the data
C) It assumes linear relationships
D) It cannot handle large datasets
Answer: B

24. What is the role of the R-squared metric in regression?

A) It measures the accuracy of classification models
B) It tells how well the independent variables explain the variance of the dependent variable
C) It calculates the confusion matrix
D) It is used only in logistic regression
Answer: B

25. Which of the following algorithms can be used for both classification and regression?

A) K-Nearest Neighbors (KNN)
B) Decision Tree
C) Support Vector Machine (SVM)
D) All of the above
Answer: D

26. What is the main objective of classification algorithms?

A) Predicting continuous values
B) Assigning labels to data points
C) Finding clusters in data
D) Reducing dimensionality
Answer: B

27. What is the best way to handle missing values in KNN classification?

A) Remove rows with missing values
B) Replace missing values with mean/median
C) Use KNN imputation
D) Any of the above methods can work
Answer: D

28. Which distance metric is commonly used in KNN classification?

A) Manhattan Distance
B) Euclidean Distance
C) Cosine Similarity
D) Jaccard Similarity
Answer: B

29. What happens if we choose a very large K value in KNN?

A) Model may underfit
B) Model may overfit
C) Model will be too fast
D) Model becomes highly sensitive to noise
Answer: A

30. Which feature selection method is commonly used in Decision Trees?

A) Correlation Coefficient
B) Information Gain
C) K-Means Clustering
D) Cross-validation
Answer: B

31. What is the depth of a Decision Tree?

A) Number of leaves in the tree
B) Number of features in the dataset
C) Longest path from root to a leaf node
D) Number of internal nodes
Answer: C

32. What is the major disadvantage of Decision Trees?

A) High bias
B) Low accuracy
C) Overfitting
D) Cannot handle categorical data
Answer: C

33. How can you prevent overfitting in Decision Trees?

A) Pruning
B) Using a smaller dataset
C) Increasing tree depth
D) Using KNN instead
Answer: A

34. What is the primary advantage of Naïve Bayes classifier?

A) Works well with small datasets
B) Works well with high-dimensional data
C) Works well with text classification problems
D) All of the above
Answer: D

35. Which Naïve Bayes classifier is suitable for text classification?

A) Gaussian Naïve Bayes
B) Bernoulli Naïve Bayes
C) Multinomial Naïve Bayes
D) None of the above
Answer: C

36. What is the main assumption of Naïve Bayes?

A) Features are dependent on each other
B) Features are independent given the class label
C) Data is normally distributed
D) No missing values are allowed
Answer: B

37. What is the role of the hyperplane in Support Vector Machines (SVM)?

A) To separate different classes in the dataset
B) To minimize classification errors
C) To maximize the margin between different classes
D) All of the above
Answer: D

38. Which kernel function is best suited for non-linearly separable data in SVM?

A) Linear Kernel
B) Polynomial Kernel
C) RBF (Radial Basis Function) Kernel
D) Sigmoid Kernel
Answer: C

39. What is the role of the C parameter in SVM?

A) Controls the trade-off between margin size and misclassification
B) Determines the number of support vectors
C) Adjusts the number of classes
D) Controls the kernel function
Answer: A

40. What does regularization in regression help with?

A) Reducing overfitting
B) Increasing complexity of the model
C) Improving training accuracy
D) None of the above
Answer: A

41. Which of the following methods is used for regularization in regression?

A) L1 Regularization (Lasso)
B) L2 Regularization (Ridge)
C) Elastic Net
D) All of the above
Answer: D

42. Which metric is most commonly used for evaluating regression models?

A) Accuracy
B) F1-score
C) Mean Absolute Error (MAE)
D) Precision
Answer: C

43. What happens if we use a high-degree polynomial in Polynomial Regression?

A) Model may overfit the data
B) Model may underfit the data
C) Model will always be more accurate
D) Model will be unaffected
Answer: A

44. What is heteroscedasticity in regression?

A) Variance of residuals is constant
B) Variance of residuals changes across different values of independent variables
C) Data is normally distributed
D) None of the above
Answer: B

45. What is the key difference between classification and regression?

A) Classification predicts categorical labels, whereas regression predicts continuous values
B) Classification is always supervised, while regression is unsupervised
C) Classification is slower than regression
D) Regression requires more data than classification
Answer: A

46. What is the impact of outliers in Linear Regression?

A) They have no effect
B) They can significantly distort the regression line
C) They improve the accuracy of the model
D) They help in feature selection
Answer: B

47. Which method is used to detect multicollinearity in regression?

A) Variance Inflation Factor (VIF)
B) Confusion Matrix
C) ROC Curve
D) k-Fold Cross Validation
Answer: A

48. Which algorithm can be used for both classification and regression tasks?

A) Decision Tree
B) KNN
C) Support Vector Machine
D) All of the above
Answer: D

49. What is the role of the bias term in regression models?

A) It acts as the intercept, shifting predictions so the fitted line need not pass through the origin
B) It helps in reducing overfitting
C) It controls the learning rate
D) It ensures that the model always passes through the origin
Answer: A

50. What happens if the learning rate is too high when training a regression model with gradient descent?

A) The model converges too quickly
B) The model may fail to converge and oscillate
C) The model achieves better accuracy
D) The model performs well on test data
Answer: B
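The effect of the learning rate can be seen by running batch gradient descent on a tiny linear-regression problem twice. The `fit_gd` helper, the data, and the two learning rates are illustrative choices: for this data the smaller rate converges, while the larger one overshoots on every step and the parameters grow without bound.

```python
import numpy as np

def fit_gd(x, y, lr, steps):
    """Fit y ~ w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        error = (w * x + b) - y
        grad_w = (2 / n) * np.sum(error * x)  # dMSE/dw
        grad_b = (2 / n) * np.sum(error)      # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    return float(w), float(b)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1  # true parameters: w = 2, b = 1

print(fit_gd(x, y, lr=0.05, steps=1000))  # approximately (2.0, 1.0)
print(fit_gd(x, y, lr=0.2, steps=50))     # huge values: the updates diverge
```

A closed-form least-squares fit has no learning rate at all; the question only applies when the model is trained iteratively.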

BOOM-BOOM

1. What is the key difference between data and information?

A) Data is structured, while information is always unstructured
B) Data is raw facts, while information is processed and meaningful
C) Information is always numerical, while data is categorical
D) Data is always stored in a database, whereas information is not
Answer: B

2. Which of the following is an example of numerical (continuous) data?

A) Number of students in a class
B) Phone numbers
C) Temperature in Celsius
D) Zip codes
Answer: C

3. Which type of categorical data has an inherent order?

A) Nominal data
B) Ordinal data
C) Continuous data
D) Discrete data
Answer: B

4. What is the main characteristic of time-series data?

A) It contains missing values
B) It is always categorical
C) It has a time-dependent order
D) It does not require timestamps
Answer: C

5. What is the process of adding meaningful labels to raw data called?

A) Data Cleaning
B) Data Labeling
C) Feature Extraction
D) Data Transformation
Answer: B

6. Why is feature selection important in machine learning?

A) It improves model interpretability
B) It reduces overfitting
C) It speeds up model training
D) All of the above
Answer: D

7. Which of the following is NOT a feature selection technique?

A) Sequential Forward Selection
B) Principal Component Analysis (PCA)
C) Sequential Backward Selection
D) Bidirectional Feature Selection
Answer: B

8. What is the primary goal of feature extraction?

A) Reduce the number of input variables while retaining information
B) Remove missing values from the dataset
C) Create more categorical variables
D) Increase model complexity
Answer: A

9. Which of the following is a wrapper method for feature selection?

A) Recursive Feature Elimination (RFE)
B) Chi-Square Test
C) Pearson Correlation
D) Mutual Information
Answer: A

10. What is a major disadvantage of using too many features in a model?

A) Underfitting
B) Increased computational cost
C) Higher accuracy
D) Model becomes more interpretable
Answer: B

11. Which of the following is the correct way to create a 2D NumPy array?

A) np.array([[1,2,3], [4,5,6]])
B) np.array((1,2,3), (4,5,6))
C) np.array(1,2,3,4,5,6)
D) np.create([1,2,3], [4,5,6])
Answer: A

12. How can you find the shape of a NumPy array?

A) array.shape()
B) array.shape
C) array.size()
D) array.dimension()
Answer: B

13. Which of the following can be used to read a CSV file in Pandas?

A) pd.read_csv('file.csv')
B) pd.load_csv('file.csv')
C) pd.open_csv('file.csv')
D) pd.read('file.csv')
Answer: A

14. What does df.info() display in Pandas?

A) Summary statistics of numerical columns
B) Basic information about data types and missing values
C) A histogram of the dataset
D) Correlation between features
Answer: B

15. Which function is used to drop missing values from a Pandas DataFrame?

A) df.dropna()
B) df.clean()
C) df.remove_na()
D) df.drop_missing()
Answer: A

16. Which technique is commonly used for dimensionality reduction?

A) PCA (Principal Component Analysis)
B) K-Means Clustering
C) Logistic Regression
D) Cross-validation
Answer: A

17. What is the main difference between PCA and LDA?

A) PCA is supervised, LDA is unsupervised
B) PCA maximizes variance, LDA maximizes class separability
C) PCA is for classification, LDA is for regression
D) LDA requires categorical data, PCA does not
Answer: B

18. Which visualization is best for showing the distribution of a numerical variable?

A) Bar Graph
B) Histogram
C) Line Plot
D) Pie Chart
Answer: B

19. What is the purpose of plt.subplot() in Matplotlib?

A) Create multiple figures
B) Combine multiple plots in one figure
C) Remove a plot from a figure
D) Change plot colors
Answer: B

20. Which argument is used to change the marker type in a scatter plot?

A) marker=
B) symbol=
C) type=
D) point=
Answer: A

21. What is the main objective of classification?

A) Predict continuous values
B) Assign labels to input data
C) Reduce dimensions
D) Perform clustering
Answer: B

22. How does K-Nearest Neighbors (KNN) classify a new data point?

A) By taking a majority vote among the class labels of its K nearest neighbors
B) By computing the mean of nearest points
C) By building decision trees
D) By calculating probability distributions
Answer: A

23. Which algorithm is based on Bayes’ theorem?

A) Decision Tree
B) KNN
C) Naïve Bayes
D) SVM
Answer: C

24. What is the role of the hyperplane in SVM?

A) Separates classes in the dataset
B) Performs feature scaling
C) Reduces dimensionality
D) Identifies the most important features
Answer: A

25. Which metric is used to evaluate a classification model’s performance?

A) Mean Absolute Error
B) Accuracy Score
C) Root Mean Squared Error
D) R-squared Value
Answer: B

26. What is the output of a regression model?

A) A class label
B) A probability distribution
C) A continuous value
D) A confusion matrix
Answer: C

27. What assumption does Linear Regression make?

A) Linear relationship between variables
B) Features are dependent on each other
C) Features are non-linearly distributed
D) Data must be categorical
Answer: A

28. What is the main disadvantage of Polynomial Regression?

A) It cannot model nonlinear relationships
B) It is prone to overfitting
C) It is not interpretable
D) It only works with categorical data
Answer: B

29. What is the purpose of regularization in regression?

A) To reduce overfitting
B) To increase training accuracy
C) To decrease model complexity
D) To improve visualization
Answer: A

30. What does the R-squared value indicate in regression?

A) Model’s ability to classify labels
B) How well independent variables explain the variance in the dependent variable
C) The error rate in classification
D) The number of missing values in the dataset
Answer: B
