100 MCQs on Data Science and Machine Learning [BCA 6004]
1. What is the primary goal of Data Science?
a) Data Storage
b) Data Cleaning
c) Extracting knowledge from data
d) Web Designing
Answer: c) Extracting knowledge from data
2. Which of the following is an example of structured data?
a) Audio files
b) Excel spreadsheets
c) Images
d) Video streams
Answer: b) Excel spreadsheets
3. What does CSV stand for?
a) Column Separated Values
b) Comma Separated Values
c) Character Sorted Values
d) Common Standard Values
Answer: b) Comma Separated Values
4. Which Python library is primarily used for data manipulation?
a) Matplotlib
b) Seaborn
c) Pandas
d) NumPy
Answer: c) Pandas
5. Which function in Pandas is used to check for missing values?
a) isna()
b) dropna()
c) fillna()
d) checkna()
Answer: a) isna()
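Illustration for Q5: a minimal sketch of how the three real Pandas options differ, assuming pandas is installed. The DataFrame contents are made up for the example.

```python
import pandas as pd

# Hypothetical toy data with one missing value in each column
df = pd.DataFrame({"name": ["Asha", "Ben", None], "score": [82.0, None, 91.0]})

missing_per_column = df.isna().sum()   # isna() flags missing cells; sum() counts them per column
rows_without_gaps = df.dropna()        # dropna() removes rows that contain a missing value
filled = df.fillna({"name": "unknown", "score": 0.0})  # fillna() replaces missing values

print(missing_per_column)
```

Only isna() reports where values are missing; dropna() and fillna() act on them afterwards.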
6. Which type of plot best visualizes the distribution of a single numeric
variable?
a) Line plot
b) Histogram
c) Bar chart
d) Scatter plot
Answer: b) Histogram
7. Which keyword is used in Python to import external libraries?
a) include
b) load
c) use
d) import
Answer: d) import
8. Which data type does not support arithmetic such as subtraction or division directly in
Python?
a) int
b) float
c) string
d) bool
Answer: c) string
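Illustration for Q8: a short sketch of what happens when arithmetic is attempted on a string. Note that strings do accept + (concatenation) and * (repetition), so the point concerns numeric arithmetic. The variable names are chosen only for this example.

```python
# int, float and bool all take part in arithmetic (bool behaves like 0 or 1)
print(3 + 2.5)      # 5.5
print(True + True)  # 2

# A string that looks like a number is still text
text = "42"
try:
    text - 1
except TypeError as exc:
    error_message = str(exc)  # Python refuses str - int

# Convert explicitly before doing maths on text that holds a number
number = int(text) - 1
```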
9. Which one is a real-world application of Data Science?
a) Image recognition
b) Web development
c) Desktop publishing
d) Hardware repair
Answer: a) Image recognition
10. Which method in Pandas is used to remove duplicate rows?
a) dropna()
b) unique()
c) drop_duplicates()
d) remove()
Answer: c) drop_duplicates()
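Illustration for Q10: a minimal sketch contrasting drop_duplicates() with unique(), assuming pandas is installed. The data is invented for the example.

```python
import pandas as pd

# Hypothetical toy data: the second and third rows are identical
df = pd.DataFrame({"city": ["Pune", "Delhi", "Delhi"], "sales": [10, 20, 20]})

deduplicated = df.drop_duplicates()    # removes repeated rows, keeping the first occurrence
distinct_cities = df["city"].unique()  # lists distinct values of one column, removes nothing
```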
MCQs 11–20: Machine Learning Basics
11. Which of the following is a supervised learning algorithm?
a) K-Means
b) Linear Regression
c) DBSCAN
d) Apriori
Answer: b) Linear Regression
12. Which ML task predicts continuous numeric values?
a) Classification
b) Clustering
c) Regression
d) Dimensionality Reduction
Answer: c) Regression
13. Which algorithm works on the concept of proximity or similarity?
a) Naive Bayes
b) Linear Regression
c) KNN
d) Decision Tree
Answer: c) KNN
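Illustration for Q13: a bare-bones sketch of the proximity idea behind KNN, using only the standard library. The function name knn_predict and the sample points are assumptions made for this example; a real project would more likely use a library implementation.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])           # nearest first
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D points forming two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = ["A", "A", "A", "B", "B", "B"]

prediction_near_a = knn_predict(points, labels, (1.2, 1.4))
prediction_near_b = knn_predict(points, labels, (8.7, 8.6))
```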
14. What type of ML is used when no labeled data is available?
a) Supervised Learning
b) Reinforcement Learning
c) Unsupervised Learning
d) Transfer Learning
Answer: c) Unsupervised Learning
15. Which ML algorithm is best for binary classification problems?
a) Logistic Regression
b) K-Means
c) PCA
d) Apriori
Answer: a) Logistic Regression
16. Which of the following is not a supervised learning algorithm?
a) SVM
b) KNN
c) K-Means
d) Decision Tree
Answer: c) K-Means
17. Which metric is used to evaluate a classification model?
a) RMSE
b) Accuracy
c) MAE
d) MSE
Answer: b) Accuracy
18. Which of the following evaluation metrics is suited to regression models?
a) Confusion Matrix
b) Precision
c) Mean Squared Error
d) F1-Score
Answer: c) Mean Squared Error
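Illustration for Q17 and Q18: a small worked sketch of the regression error metrics named in the options (MSE, RMSE, MAE). The numbers are invented for the example.

```python
import math

y_true = [3.0, 5.0, 2.0, 7.0]  # hypothetical actual values
y_pred = [2.0, 5.0, 4.0, 8.0]  # hypothetical model predictions

errors = [p - t for p, t in zip(y_pred, y_true)]  # -1, 0, 2, 1
n = len(errors)

mse = sum(e ** 2 for e in errors) / n   # mean of squared errors: (1 + 0 + 4 + 1) / 4
rmse = math.sqrt(mse)                   # same units as the target variable
mae = sum(abs(e) for e in errors) / n   # mean of absolute errors: (1 + 0 + 2 + 1) / 4
```

Squaring makes MSE penalize the single large error (2) more heavily than MAE does.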
19. What is the purpose of a confusion matrix?
a) Optimize weights
b) Visualize data
c) Evaluate classification performance
d) Reduce dimensionality
Answer: c) Evaluate classification performance
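Illustration for Q19: a minimal sketch that builds a 2x2 confusion matrix by hand for a binary classifier and derives accuracy from it. The labels are made up, and the layout [[TN, FP], [FN, TP]] is one common convention assumed here.

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical true labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical predicted labels

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

confusion_matrix = [[tn, fp],
                    [fn, tp]]
accuracy = (tp + tn) / len(pairs)
```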
20. Which method reduces model overfitting in Decision Trees?
a) Clustering
b) Pruning
c) Feature scaling
d) Stacking
Answer: b) Pruning
MCQs 21–30: Advanced Concepts & Algorithms
21. Which algorithm is used for association rule learning?
a) Apriori
b) SVM
c) Linear Regression
d) KNN
Answer: a) Apriori
22. What is the role of the activation function in neural networks?
a) Reduce overfitting
b) Normalize weights
c) Introduce non-linearity
d) Store previous values
Answer: c) Introduce non-linearity
23. Which of the following is an ensemble technique?
a) PCA
b) Bagging
c) K-Means
d) Normalization
Answer: b) Bagging
24. Which of the following is NOT a boosting method?
a) AdaBoost
b) Gradient Boosting
c) XGBoost
d) Random Forest
Answer: d) Random Forest
25. What is the first step in any data science project?
a) Model Training
b) Data Cleaning
c) Problem Definition
d) Hyperparameter Tuning
Answer: c) Problem Definition
26. Which method is used to handle categorical features in machine learning?
a) Binarization
b) Normalization
c) Label Encoding
d) Standardization
Answer: c) Label Encoding
27. Which regularization technique adds only the absolute values of the coefficients as a penalty to the loss
function?
a) L2
b) Ridge
c) L1
d) ElasticNet
Answer: c) L1
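Illustration for Q27: a short sketch comparing the L1 and L2 penalty terms for the same coefficients. The coefficient values and the strength alpha are assumptions for the example.

```python
coefficients = [0.5, -2.0, 0.0, 3.0]  # hypothetical model coefficients
alpha = 0.1                           # hypothetical regularization strength

# L1 (Lasso): alpha times the sum of absolute values
l1_penalty = alpha * sum(abs(w) for w in coefficients)

# L2 (Ridge): alpha times the sum of squared values
l2_penalty = alpha * sum(w ** 2 for w in coefficients)

def penalized_loss(data_loss, penalty):
    """Total loss = fit to the data + regularization penalty."""
    return data_loss + penalty
```

ElasticNet combines both penalty terms, which is why it is not the answer to a question about the absolute-value penalty alone.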
28. Which of the following is used for feature selection?
a) Chi-square test
b) Linear Regression
c) SMOTE
d) Backpropagation
Answer: a) Chi-square test
29. Which of the following is used to address class imbalance in data?
a) Cross-validation
b) SMOTE
c) PCA
d) Label encoding
Answer: b) SMOTE
30. Which model is best suited for time-series forecasting?
a) KNN
b) Naive Bayes
c) ARIMA
d) Random Forest
Answer: c) ARIMA
MCQs 31–40: Real-World Tools and Concepts
31. Which Python library is most widely used for machine learning models?
a) Matplotlib
b) TensorFlow
c) Seaborn
d) Scikit-learn
Answer: d) Scikit-learn
32. Which of the following is an unsupervised learning technique?
a) Naive Bayes
b) Random Forest
c) K-Means Clustering
d) Logistic Regression
Answer: c) K-Means Clustering
33. Which cross-validation method is widely used for time series data?
a) K-Fold
b) Shuffle Split
c) Leave-One-Out
d) TimeSeriesSplit
Answer: d) TimeSeriesSplit
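Illustration for Q33: a hand-written sketch of the idea behind time-series cross-validation, where each test block comes strictly after its training data. The function expanding_window_splits is hypothetical and is not scikit-learn's TimeSeriesSplit; the library may size its folds differently.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with every test block after its training block.

    Simplification: if n_samples is not divisible by n_splits + 1,
    the leftover samples at the end are not used.
    """
    fold_size = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = i * fold_size
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))

splits = list(expanding_window_splits(n_samples=8, n_splits=3))
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

Unlike shuffled K-Fold, no split trains on observations that come later than the ones it is tested on.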
34. Which command is used to install Python packages?
a) get
b) install
c) pip install
d) py install
Answer: c) pip install
35. Which loss function is commonly used in regression problems?
a) Cross Entropy
b) Hinge Loss
c) Mean Squared Error
d) Log Loss
Answer: c) Mean Squared Error
36. Which technique combines multiple models to improve performance?
a) Regularization
b) Pruning
c) Ensembling
d) Normalization
Answer: c) Ensembling
37. Which concept describes learning from interaction with the environment?
a) Supervised Learning
b) Reinforcement Learning
c) Unsupervised Learning
d) Weak Supervision
Answer: b) Reinforcement Learning
38. Which of the following is used to interpret a model's predictions?
a) Confusion Matrix
b) SHAP Values
c) RMSE
d) ROC Curve
Answer: b) SHAP Values
39. Which of the following is a simple visual method for detecting outliers in data?
a) KNN
b) Boxplot
c) PCA
d) One-hot encoding
Answer: b) Boxplot
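Illustration for Q39: a boxplot usually marks points beyond 1.5 times the interquartile range (IQR) from the quartiles as outliers. The sketch below applies that rule numerically with the standard library; the data is invented, and exact quartile values depend on the interpolation method used.

```python
import statistics

data = [12, 14, 15, 15, 16, 17, 18, 19, 60]  # hypothetical measurements

q1, _median, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are the points a boxplot would draw individually
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```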
40. Which of the following metrics is suitable for multi-class classification?
a) MAE
b) Accuracy
c) MSE
d) R-Squared
Answer: b) Accuracy
41. Which type of ML algorithm predicts continuous values?
a) Classification
b) Regression
c) Clustering
d) Association
Answer: b) Regression
42. Which Python library is widely used for numerical computations?
a) Matplotlib
b) NumPy
c) Seaborn
d) Requests
Answer: b) NumPy
43. What does EDA stand for in Data Science?
a) Easy Data Analysis
b) Exploratory Data Analysis
c) External Data Automation
d) Extended Data Arrangement
Answer: b) Exploratory Data Analysis
44. Which ML concept divides the dataset into training and test data?
a) Data Cleaning
b) Data Splitting
c) Data Mining
d) Data Integration
Answer: b) Data Splitting
45. Which method reduces the number of input variables in a dataset?
a) Scaling
b) Normalization
c) Dimensionality Reduction
d) Cross-validation
Answer: c) Dimensionality Reduction
46. What is the full form of API?
a) Advanced Programming Interface
b) Application Programming Interface
c) Automated Program Integration
d) Application Protocol Interface
Answer: b) Application Programming Interface
47. What is 'Label Encoding' used for?
a) Scaling continuous variables
b) Encoding categorical variables into numeric form
c) Handling missing values
d) Data normalization
Answer: b) Encoding categorical variables into numeric form
48. Which of the following is a classification algorithm?
a) Random Forest
b) K-Means
c) PCA
d) DBSCAN
Answer: a) Random Forest
49. Which of the following is a benefit of Data Visualization?
a) Increase data redundancy
b) Detect data trends and patterns
c) Hide anomalies
d) Compress the dataset
Answer: b) Detect data trends and patterns
50. Which of these is an activation function in neural networks?
a) Sigmoid
b) Gradient Boosting
c) KNN
d) Clustering
Answer: a) Sigmoid
51. In ML, what is a 'hyperparameter'?
a) A parameter learned during training
b) A parameter set before the training begins
c) A test dataset
d) None
Answer: b) A parameter set before the training begins
52. Which of these is used for web scraping in Python?
a) Matplotlib
b) BeautifulSoup
c) NumPy
d) Keras
Answer: b) BeautifulSoup
53. What is a 'Confusion Matrix' used for?
a) Model visualization
b) Evaluate classification model performance
c) Regression accuracy
d) Data clustering
Answer: b) Evaluate classification model performance
54. Which distance metric is used in K-Means by default?
a) Cosine
b) Euclidean
c) Manhattan
d) Hamming
Answer: b) Euclidean
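Illustration for Q54: a sketch of one K-Means iteration (assign each point to its nearest centroid by Euclidean distance, then recompute the centroids). The helper names, points and starting centroids are assumptions for the example, and a full implementation would repeat these steps until the assignments stop changing.

```python
import math

def assign_to_nearest(points, centroids):
    """Index of the closest centroid (Euclidean distance) for every point."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def update_centroids(points, assignments, k):
    """Mean position of each cluster; assumes no cluster is empty."""
    new_centroids = []
    for c in range(k):
        members = [p for p, a in zip(points, assignments) if a == c]
        new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centroids

points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]  # hypothetical data
centroids = [(0.0, 0.0), (10.0, 10.0)]                      # hypothetical starting guess

assignments = assign_to_nearest(points, centroids)
centroids = update_centroids(points, assignments, k=2)
```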
55. Which tool is commonly used for big data processing?
a) MS Excel
b) Apache Hadoop
c) Notepad
d) Paint
Answer: b) Apache Hadoop
56. What is the goal of clustering algorithms?
a) Predict continuous values
b) Group similar data points
c) Remove outliers
d) Calculate accuracy
Answer: b) Group similar data points
57. What is the process of bringing data to a common scale called?
a) Data Imputation
b) Data Scaling
c) Data Mining
d) Data Collection
Answer: b) Data Scaling
58. Which data type is ideal for bar charts?
a) Continuous numerical
b) Categorical
c) Text
d) Boolean
Answer: b) Categorical
59. What is the range of a probability value?
a) 0 to 10
b) 0 to 1
c) 0 to 100
d) -1 to 1
Answer: b) 0 to 1
60. Which ML algorithm uses a sigmoid function?
a) Linear Regression
b) Logistic Regression
c) Decision Tree
d) K-Means
Answer: b) Logistic Regression
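Illustration for Q60: a minimal sketch of how logistic regression passes a weighted sum through the sigmoid function to obtain a probability. The weights and bias are hypothetical values, not the result of any training.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Logistic regression: sigmoid of the linear combination of the inputs."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

weights, bias = [0.8, -0.4], -0.2  # hypothetical parameters

probability = predict_proba([2.0, 1.0], weights, bias)  # z = 1.6 - 0.4 - 0.2 = 1.0
label = 1 if probability >= 0.5 else 0                  # threshold at 0.5
```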
61. Which of these is a data type in Python?
a) integer
b) matrix
c) pointer
d) character array
Answer: a) integer
62. Which algorithm is not suitable for regression problems?
a) Decision Tree
b) Linear Regression
c) Logistic Regression
d) Ridge Regression
Answer: c) Logistic Regression
63. What does 'outlier' mean in a dataset?
a) Normal data value
b) Value far from other observations
c) Missing value
d) Average value
Answer: b) Value far from other observations
64. What is TensorFlow mainly used for?
a) Web hosting
b) Deep Learning applications
c) Text editing
d) Video compression
Answer: b) Deep Learning applications
65. What is one-hot encoding used for?
a) Converting numerical to categorical
b) Converting categorical variables into binary vectors
c) Removing null values
d) Image processing
Answer: b) Converting categorical variables into binary vectors
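Illustration for Q47 and Q65: a small sketch showing label encoding and one-hot encoding side by side on the same column. The colour values are invented, and assigning codes in alphabetical order is an assumption of this example.

```python
colors = ["red", "green", "blue", "green"]  # hypothetical categorical column

categories = sorted(set(colors))            # ['blue', 'green', 'red']

# Label encoding: one integer per category
label_map = {category: index for index, category in enumerate(categories)}
label_encoded = [label_map[c] for c in colors]

# One-hot encoding: one binary column per category, exactly one 1 per row
one_hot = [[1 if c == category else 0 for category in categories] for c in colors]
```

Label encoding implies an order (blue < green < red) that may not exist in the data; one-hot encoding avoids that at the cost of extra columns.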
66. What does 'scaling' do to data?
a) Adds new features
b) Adjusts values to a common scale
c) Removes outliers
d) Deletes null values
Answer: b) Adjusts values to a common scale
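Illustration for Q57 and Q66: a sketch of two common ways to scale a numeric feature, min-max normalization and standardization (z-scores). The values are invented for the example.

```python
import statistics

values = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical feature values

# Min-max normalization: rescale to the range 0 to 1
low, high = min(values), max(values)
min_max_scaled = [(v - low) / (high - low) for v in values]

# Standardization: subtract the mean, divide by the standard deviation
mean = statistics.fmean(values)
std = statistics.pstdev(values)  # population standard deviation
standardized = [(v - mean) / std for v in values]
```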
67. Which is a deep learning library?
a) Scikit-learn
b) Keras
c) Pandas
d) Seaborn
Answer: b) Keras
68. Which ML technique learns from reward or penalty feedback?
a) Supervised Learning
b) Unsupervised Learning
c) Reinforcement Learning
d) None
Answer: c) Reinforcement Learning
69. Which function is used to read a CSV file in Pandas?
a) read()
b) read_csv()
c) csv_read()
d) load_csv()
Answer: b) read_csv()
70. Which of these is a common data visualization chart?
a) Pie chart
b) Dendrogram
c) Heatmap
d) All of the above
Answer: d) All of the above
71. What is the goal of feature selection?
a) Increase number of features
b) Choose important variables
c) Create new variables
d) Remove dataset
Answer: b) Choose important variables
72. Which of these is an ensemble learning technique?
a) Decision Tree
b) Random Forest
c) K-Means
d) Linear Regression
Answer: b) Random Forest
73. Which of the following is NOT a supervised learning algorithm?
a) Naive Bayes
b) Linear Regression
c) K-Means
d) Decision Tree
Answer: c) K-Means
74. What is the main advantage of Random Forest over Decision Tree?
a) Faster
b) Higher accuracy
c) More randomness
d) No need of training
Answer: b) Higher accuracy
75. What does 'F1 Score' measure?
a) Average accuracy
b) Balance between precision and recall
c) Total errors
d) Training time
Answer: b) Balance between precision and recall
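Illustration for Q75: a worked sketch of precision, recall and the F1 score (their harmonic mean). The counts are hypothetical.

```python
tp, fp, fn = 6, 2, 4  # hypothetical true positives, false positives, false negatives

precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found

# Harmonic mean: stays low unless both precision and recall are high
f1 = 2 * precision * recall / (precision + recall)
```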
76. Which Pandas method gives summary statistics of a DataFrame?
a) head()
b) describe()
c) tail()
d) shape()
Answer: b) describe()
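Illustration for Q69 and Q76: a minimal sketch of loading CSV data and inspecting it, assuming pandas is installed. The CSV text is invented and read from memory so the example needs no file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; read_csv also accepts a file path
csv_text = "name,age,salary\nAsha,29,52000\nBen,35,61000\nChen,41,58000\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))        # first rows
print(df.tail(1))        # last rows
print(df.shape)          # (rows, columns); shape is an attribute, not a method
summary = df.describe()  # count, mean, std, min, quartiles, max for numeric columns
print(summary)
```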
77. Which library is popular for interactive dashboards in Python?
a) Streamlit
b) NumPy
c) Pandas
d) Matplotlib
Answer: a) Streamlit
78. Which term describes reducing variance by training on random data subsets?
a) Bagging
b) Boosting
c) Stacking
d) Pruning
Answer: a) Bagging
79. What is the typical structure of a neural network?
a) Rows
b) Layers
c) Tables
d) Graphs
Answer: b) Layers
80. What is 'data leakage' in ML?
a) Data stored in wrong format
b) Using test data information in training
c) Loss of data
d) Corrupted dataset
Answer: b) Using test data information in training
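Illustration for Q80: one common form of data leakage is computing preprocessing statistics on the full dataset before splitting. The sketch below contrasts that with computing them on the training data only; the numbers are invented for the example.

```python
import statistics

train = [10.0, 12.0, 14.0, 16.0]  # hypothetical training values
test = [30.0, 11.0]               # hypothetical held-out values

# Leaky: the mean is influenced by test values the model should never see
leaky_mean = statistics.fmean(train + test)

# Sound: statistics come from the training data only...
train_mean = statistics.fmean(train)
train_std = statistics.pstdev(train)

# ...and are then reused, unchanged, to transform the test data
scaled_test = [(v - train_mean) / train_std for v in test]
```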
81. Which cloud platform provides ML services?
a) AWS
b) Photoshop
c) VLC
d) Adobe XD
Answer: a) AWS
82. What is an 'epoch' in neural network training?
a) One forward pass
b) One complete pass through the training dataset
c) One test run
d) Random split
Answer: b) One complete pass through the training dataset
83. Which term refers to synthetic data generation for balancing classes?
a) SMOTE
b) PCA
c) Binning
d) Label Encoding
Answer: a) SMOTE
84. What does 'bias' refer to in ML models?
a) Error due to overly simplistic assumptions
b) Overfitting issue
c) Data normalization
d) Training time
Answer: a) Error due to overly simplistic assumptions
85. What does 'variance' in ML indicate?
a) Error due to model's sensitivity to fluctuations in training data
b) Model's inability to learn
c) Corrupt dataset
d) Dataset size
Answer: a) Error due to model's sensitivity to fluctuations in training data
86. Which of these is a categorical variable?
a) Age
b) Salary
c) Gender
d) Temperature
Answer: c) Gender
87. Which type of ML technique is used to predict yes/no outcomes?
a) Regression
b) Classification
c) Clustering
d) Association
Answer: b) Classification
88. Which of these is a Python IDE commonly used for data science?
a) PyCharm
b) Notepad++
c) Paint
d) Eclipse
Answer: a) PyCharm
89. Which file format supports structured tabular data?
a) .csv
b) .jpg
c) .mp3
d) .doc
Answer: a) .csv
90. What is 'scikit-learn' mainly used for?
a) Machine Learning
b) Image Editing
c) Game Design
d) Video Editing
Answer: a) Machine Learning
91. What is K in KNN?
a) Number of classes
b) Number of nearest neighbors
c) Number of layers
d) Number of epochs
Answer: b) Number of nearest neighbors
92. What is the purpose of data wrangling?
a) Visualizing data
b) Preparing raw data for analysis
c) Deleting data
d) Compressing files
Answer: b) Preparing raw data for analysis
93. Which file type is commonly used to store serialized Python models?
a) .csv
b) .pkl
c) .mp3
d) .txt
Answer: b) .pkl
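Illustration for Q93: a .pkl file is typically produced by Python's pickle module. The sketch below saves and reloads an object; the dictionary is a hypothetical stand-in for a trained model, and the file is written to a temporary directory.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model object
model = {"weights": [0.8, -0.4], "bias": -0.2}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

with open(path, "wb") as f:   # pickle files are binary
    pickle.dump(model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)
```

Unpickling can execute arbitrary code, so only load .pkl files from sources you trust.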
94. Which of these is a neural network optimization algorithm?
a) Adam
b) NumPy
c) Seaborn
d) KNN
Answer: a) Adam
95. What is data augmentation used for?
a) Increase dataset size artificially
b) Remove null values
c) Data encryption
d) Text compression
Answer: a) Increase dataset size artificially
96. What type of chart is best for trend analysis over time?
a) Line chart
b) Bar chart
c) Pie chart
d) Heatmap
Answer: a) Line chart
97. What is the purpose of a validation set?
a) Final model evaluation
b) Hyperparameter tuning
c) Data visualization
d) Feature scaling
Answer: b) Hyperparameter tuning
98. Which of these is a supervised learning task?
a) Clustering
b) Regression
c) Association
d) Dimensionality reduction
Answer: b) Regression
99. What is 'cross-validation' used for?
a) Test model on unseen data multiple times
b) Increase data size
c) Data encryption
d) Generate synthetic data
Answer: a) Test model on unseen data multiple times
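Illustration for Q99: a hand-written sketch of how k-fold cross-validation rotates the held-out fold so that every sample is used for testing exactly once. The function k_fold_indices is hypothetical and does no shuffling; library implementations offer more options.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k consecutive folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A model would be trained and scored once per fold, and the scores averaged.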
100. What is the purpose of boosting techniques?
a) Reduce model bias and improve accuracy
b) Increase model size
c) Add random noise
d) Encrypt data
Answer: a) Reduce model bias and improve accuracy