Data Science 100 MCQs

The document contains 100 multiple-choice questions (MCQs) focused on Data Science and Machine Learning, covering topics such as data manipulation, machine learning algorithms, evaluation metrics, and real-world applications. Each question is followed by four answer options, with the correct answer provided. The questions are categorized into sections that include basic concepts, advanced algorithms, and practical tools.

Uploaded by

askeladd915

100 MCQs on Data Science and Machine Learning [BCA 6004]

1. What is the primary goal of Data Science?


a) Data Storage
b) Data Cleaning
c) Extracting knowledge from data
d) Web Designing
Answer: c) Extracting knowledge from data

2. Which of the following is a type of structured data?


a) Audio files
b) Excel spreadsheets
c) Images
d) Video streams
Answer: b) Excel spreadsheets

3. What does CSV stand for?


a) Column Separated Values
b) Comma Separated Values
c) Character Sorted Values
d) Common Standard Values
Answer: b) Comma Separated Values
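
Illustration (a minimal sketch, not part of the original question set): the snippet below parses a small comma-separated text with Python's standard csv module. The sample names and scores are hypothetical.

```python
import csv
import io

# Hypothetical sample data; a real file would be opened with open(path, newline="").
raw_text = "name,score\nAsha,91\nRavi,78\n"

# DictReader treats the first row as the column names.
rows = list(csv.DictReader(io.StringIO(raw_text)))

# Every value is read as a string; convert explicitly when numbers are needed.
scores = [int(row["score"]) for row in rows]
print(rows[0]["name"], scores)
```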

4. Which Python library is primarily used for data manipulation?


a) Matplotlib
b) Seaborn
c) Pandas
d) NumPy
Answer: c) Pandas

5. What function in Pandas is used to check for missing values?


a) isna()
b) dropna()
c) fillna()
d) checkna()
Answer: a) isna()

6. Which type of plot best visualizes the distribution of a single numeric variable?
a) Line plot
b) Histogram
c) Bar chart
d) Scatter plot
Answer: b) Histogram

7. Which keyword is used in Python to import external libraries?
a) include
b) load
c) use
d) import
Answer: d) import

8. Which data type does not support mathematical operations directly in Python?
a) int
b) float
c) string
d) bool
Answer: c) string

9. Which one is a real-world application of Data Science?


a) Image recognition
b) Web development
c) Desktop publishing
d) Hardware repair
Answer: a) Image recognition

10. Which method in Pandas is used to remove duplicate rows?


a) dropna()
b) unique()
c) drop_duplicates()
d) remove()
Answer: c) drop_duplicates()

MCQs 11–20: Machine Learning Basics

11. Which of the following is a supervised learning algorithm?


a) K-Means
b) Linear Regression
c) DBSCAN
d) Apriori
Answer: b) Linear Regression

12. Which ML task predicts continuous numeric values?


a) Classification
b) Clustering
c) Regression
d) Dimensionality Reduction
Answer: c) Regression

13. Which algorithm works on the concept of proximity or similarity?
a) Naive Bayes
b) Linear Regression
c) KNN
d) Decision Tree
Answer: c) KNN
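
Illustration (a rough sketch under simple assumptions, not a production implementation): KNN labels a query point by majority vote among its closest training points. The function name knn_predict and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only, then keep the labels of the k nearest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

prediction = knn_predict(points, labels, query=(1.1, 1.0), k=3)
print(prediction)  # A
```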

14. What type of ML is used when no labeled data is available?


a) Supervised Learning
b) Reinforcement Learning
c) Unsupervised Learning
d) Transfer Learning
Answer: c) Unsupervised Learning

15. Which ML algorithm is best for binary classification problems?


a) Logistic Regression
b) K-Means
c) PCA
d) Apriori
Answer: a) Logistic Regression

16. Which of the following is not a supervised learning algorithm?


a) SVM
b) KNN
c) K-Means
d) Decision Tree
Answer: c) K-Means

17. Which metric is used to evaluate a classification model?


a) RMSE
b) Accuracy
c) MAE
d) MSE
Answer: b) Accuracy

18. Which evaluation metric is best for regression models?


a) Confusion Matrix
b) Precision
c) Mean Squared Error
d) F1-Score
Answer: c) Mean Squared Error
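
Illustration (a small sketch with made-up numbers): MSE averages the squared differences between actual and predicted values. MAE and RMSE, which appear as options in the previous question, are computed alongside it for comparison.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 1 + 4) / 4 = 1.5
mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 1 + 2) / 4 = 1.0
rmse = math.sqrt(mse)                      # square root of the MSE
```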

19. What is the purpose of a confusion matrix?


a) Optimize weights
b) Visualize data
c) Evaluate classification performance
d) Reduce dimensionality
Answer: c) Evaluate classification performance
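
Illustration (a minimal sketch for the binary case, with hypothetical labels): a confusion matrix counts how often each actual class is predicted as each class. Accuracy can then be read off the four counts.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts

# Hypothetical labels: 1 is the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
print(counts, accuracy)
```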

20. Which method reduces model overfitting in Decision Trees?


a) Clustering
b) Pruning
c) Feature scaling
d) Stacking
Answer: b) Pruning

MCQs 21–30: Advanced Concepts & Algorithms

21. Which algorithm is used for association rule learning?


a) Apriori
b) SVM
c) Linear Regression
d) KNN
Answer: a) Apriori

22. What is the role of the activation function in neural networks?


a) Reduce overfitting
b) Normalize weights
c) Introduce non-linearity
d) Store previous values
Answer: c) Introduce non-linearity

23. Which of the following is an ensemble technique?


a) PCA
b) Bagging
c) K-Means
d) Normalization
Answer: b) Bagging

24. Which of the following is NOT a boosting method?


a) AdaBoost
b) Gradient Boosting
c) XGBoost
d) Random Forest
Answer: d) Random Forest

25. What is the first step in any data science project?


a) Model Training
b) Data Cleaning
c) Problem Definition
d) Hyperparameter Tuning
Answer: c) Problem Definition

26. Which method is used to handle categorical features in machine learning?


a) Binarization
b) Normalization
c) Label Encoding
d) Standardization
Answer: c) Label Encoding

27. Which regularization technique adds the absolute value of the coefficients to the loss function?
a) L2
b) Ridge
c) L1
d) ElasticNet
Answer: c) L1
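
Illustration (a sketch of the penalty terms only, with hypothetical coefficients): L1 sums absolute values, while L2 (Ridge) sums squares. The parameter name alpha for the penalty strength is an assumption of this example.

```python
def l1_penalty(coefficients, alpha=1.0):
    """L1 (Lasso-style) penalty: alpha times the sum of absolute coefficients."""
    return alpha * sum(abs(w) for w in coefficients)

def l2_penalty(coefficients, alpha=1.0):
    """L2 (Ridge-style) penalty: alpha times the sum of squared coefficients."""
    return alpha * sum(w * w for w in coefficients)

# Hypothetical model coefficients.
weights = [0.5, -2.0, 0.0, 1.5]

l1 = l1_penalty(weights)  # 0.5 + 2.0 + 0.0 + 1.5 = 4.0
l2 = l2_penalty(weights)  # 0.25 + 4.0 + 0.0 + 2.25 = 6.5
```

In either case the penalty is added to the data loss (for example MSE) to form the objective that training minimizes.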

28. Which of the following is used for feature selection?


a) Chi-square test
b) Linear Regression
c) SMOTE
d) Backpropagation
Answer: a) Chi-square test

29. Which of the following is used to improve class imbalance in data?


a) Cross-validation
b) SMOTE
c) PCA
d) Label encoding
Answer: b) SMOTE

30. Which model is best suited for time-series forecasting?


a) KNN
b) Naive Bayes
c) ARIMA
d) Random Forest
Answer: c) ARIMA

MCQs 31–40: Real-World Tools and Concepts

31. Which Python library is most widely used for machine learning models?
a) Matplotlib
b) TensorFlow
c) Seaborn
d) Scikit-learn
Answer: d) Scikit-learn

32. Which of the following is an unsupervised learning technique?


a) Naive Bayes
b) Random Forest
c) K-Means Clustering
d) Logistic Regression
Answer: c) K-Means Clustering

33. Which cross-validation method is widely used for time series data?
a) K-Fold
b) Shuffle Split
c) Leave-One-Out
d) TimeSeriesSplit
Answer: d) TimeSeriesSplit
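
Illustration (a plain-Python sketch of the idea, not scikit-learn's own code): time-series splitting keeps every test block after its training block so that the model never trains on the future. The helper name expanding_window_splits is hypothetical.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices); each test block follows its training block."""
    test_size = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        # The training window grows with each split; the test window slides forward.
        train_end = n_samples - (n_splits - i + 1) * test_size
        train = list(range(train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test

splits = list(expanding_window_splits(n_samples=10, n_splits=3))
for train, test in splits:
    print(train, test)
```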

34. Which command is used to install Python packages?


a) get
b) install
c) pip install
d) py install
Answer: c) pip install

35. Which loss function is commonly used in regression problems?


a) Cross Entropy
b) Hinge Loss
c) Mean Squared Error
d) Log Loss
Answer: c) Mean Squared Error

36. Which technique combines multiple models to improve performance?


a) Regularization
b) Pruning
c) Ensembling
d) Normalization
Answer: c) Ensembling

37. Which concept describes learning from interaction with the environment?
a) Supervised Learning
b) Reinforcement Learning
c) Unsupervised Learning
d) Weak Supervision
Answer: b) Reinforcement Learning

38. Which of the following is used to interpret a model's predictions?


a) Confusion Matrix
b) SHAP Values
c) RMSE
d) ROC Curve
Answer: b) SHAP Values

39. Which method can detect outliers in data?


a) KNN
b) Boxplot
c) PCA
d) One-hot encoding
Answer: b) Boxplot
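
Illustration (a minimal sketch, assuming the common 1.5 * IQR whisker rule that boxplots usually draw): points beyond the whiskers are flagged as outliers. The readings are hypothetical.

```python
import statistics

def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical readings with one obviously extreme value.
readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
outliers = iqr_outliers(readings)
print(outliers)  # [102]
```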

40. Which of the following metrics is suitable for multi-class classification?


a) MAE
b) Accuracy
c) MSE
d) R-Squared
Answer: b) Accuracy

41. Which type of ML algorithm predicts continuous values?


a) Classification
b) Regression
c) Clustering
d) Association
Answer: b

42. Which Python library is widely used for numerical computations?


a) Matplotlib
b) NumPy
c) Seaborn
d) Requests
Answer: b

43. What does EDA stand for in Data Science?


a) Easy Data Analysis
b) Exploratory Data Analysis
c) External Data Automation
d) Extended Data Arrangement
Answer: b

44. Which ML concept divides the dataset into training and test data?
a) Data Cleaning
b) Data Splitting
c) Data Mining
d) Data Integration
Answer: b
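
Illustration (a rough sketch, not a library function): data splitting shuffles the rows and holds out a fraction for testing. The helper name split_rows, the 25% test fraction, and the seed are assumptions of this example.

```python
import random

def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 20 row identifiers.
data = list(range(20))
train, test = split_rows(data, test_fraction=0.25, seed=42)
print(len(train), len(test))  # 15 5
```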

45. Which method reduces the number of input variables in a dataset?


a) Scaling
b) Normalization
c) Dimensionality Reduction
d) Cross-validation
Answer: c

46. What is the full form of API?


a) Advanced Programming Interface
b) Application Programming Interface
c) Automated Program Integration
d) Application Protocol Interface
Answer: b

47. What is 'Label Encoding' used for?


a) Scaling continuous variables
b) Encoding categorical variables into numeric form
c) Handling missing values
d) Data normalization
Answer: b
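
Illustration (a minimal sketch with hypothetical categories): label encoding maps each distinct category to an integer. Here the categories are sorted alphabetically before numbering, which is one common convention and an assumption of this example.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in sorted order."""
    categories = sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping

colors = ["red", "green", "blue", "green"]
codes, mapping = label_encode(colors)
print(codes)    # [2, 1, 0, 1]
print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
```

The integer codes imply an ordering, so this encoding fits ordinal categories or tree-based models better than models that treat the codes as magnitudes.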

48. Which of the following is a classification algorithm?


a) Random Forest
b) K-Means
c) PCA
d) DBSCAN
Answer: a

49. Which of the following is a benefit of Data Visualization?


a) Increase data redundancy
b) Detect data trends and patterns
c) Hide anomalies
d) Compress the dataset
Answer: b

50. Which of these is an activation function in neural networks?


a) Sigmoid
b) Gradient Boosting
c) KNN
d) Clustering
Answer: a

51. In ML, what is a 'hyperparameter'?


a) A parameter learned during training
b) A parameter set before the training begins
c) A test dataset
d) None
Answer: b

52. Which of these is used for web scraping in Python?


a) Matplotlib
b) BeautifulSoup
c) NumPy
d) Keras
Answer: b

53. What is a 'Confusion Matrix' used for?


a) Model visualization
b) Evaluate classification model performance
c) Regression accuracy
d) Data clustering
Answer: b

54. Which distance metric is used in K-Means by default?


a) Cosine
b) Euclidean
c) Manhattan
d) Hamming
Answer: b
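
Illustration (a sketch of the assignment step only, with hypothetical points and centroids): K-Means assigns each point to the centroid with the smallest Euclidean distance.

```python
import math

def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its closest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
        for point in points
    ]

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]

assignments = assign_to_nearest(points, centroids)
print(assignments)  # [0, 0, 1, 1]
```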

55. Which tool is commonly used for big data processing?


a) MS Excel
b) Apache Hadoop
c) Notepad
d) Paint
Answer: b

56. What is the goal of clustering algorithms?


a) Predict continuous values
b) Group similar data points
c) Remove outliers
d) Calculate accuracy
Answer: b

57. What is the process of normalizing data to a common scale?
a) Data Imputation
b) Data Scaling
c) Data Mining
d) Data Collection
Answer: b

58. Which data type is ideal for bar charts?


a) Continuous numerical
b) Categorical
c) Text
d) Boolean
Answer: b

59. What is the range of a probability value?


a) 0 to 10
b) 0 to 1
c) 0 to 100
d) -1 to 1
Answer: b

60. Which ML algorithm uses a sigmoid function?


a) Linear Regression
b) Logistic Regression
c) Decision Tree
d) K-Means
Answer: b
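
Illustration (a minimal sketch, with hypothetical weights and features): logistic regression passes a weighted sum through the sigmoid function to obtain a value between 0 and 1 that is read as a probability.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid: 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)

def predict_probability(features, weights, bias):
    """Weighted sum of the features followed by the sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical model parameters and input.
probability = predict_probability(features=[2.0, 1.0], weights=[0.8, -0.4], bias=-0.5)
predicted_class = 1 if probability >= 0.5 else 0
```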

61. Which of these is a data type in Python?


a) integer
b) matrix
c) pointer
d) character array
Answer: a

62. Which algorithm is not suitable for regression problems?


a) Decision Tree
b) Linear Regression
c) Logistic Regression
d) Ridge Regression
Answer: c

63. What does 'outlier' mean in a dataset?


a) Normal data value
b) Value far from other observations
c) Missing value
d) Average value
Answer: b

64. What is TensorFlow mainly used for?


a) Web hosting
b) Deep Learning applications
c) Text editing
d) Video compression
Answer: b

65. What is one-hot encoding used for?


a) Converting numerical to categorical
b) Converting categorical variables into binary vectors
c) Removing null values
d) Image processing
Answer: b
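
Illustration (a small sketch with hypothetical categories): one-hot encoding turns each category into a binary vector with a single 1. The column order here follows the alphabetically sorted categories, which is an assumption of this example.

```python
def one_hot_encode(values):
    """Return (categories, vectors) with one binary column per category."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors

sizes = ["S", "M", "L", "M"]
categories, vectors = one_hot_encode(sizes)
print(categories)  # ['L', 'M', 'S']
print(vectors)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```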

66. What does 'scaling' do to data?


a) Adds new features
b) Adjusts values to a common scale
c) Removes outliers
d) Deletes null values
Answer: b
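
Illustration (a minimal sketch of one scaling method, min-max scaling, with hypothetical ages): each value is mapped into the range 0 to 1 using the minimum and maximum of the column.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]

ages = [20, 30, 40, 60]
scaled = min_max_scale(ages)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```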

67. Which is a deep learning library?


a) Scikit-learn
b) Keras
c) Pandas
d) Seaborn
Answer: b

68. Which ML technique uses feedback to learn?


a) Supervised Learning
b) Unsupervised Learning
c) Reinforcement Learning
d) None
Answer: c

69. Which function is used to read a CSV file in Pandas?


a) read()
b) read_csv()
c) csv_read()
d) load_csv()
Answer: b

70. Which of these is a common data visualization chart?
a) Pie chart
b) Dendrogram
c) Heatmap
d) All of the above
Answer: d

71. What is the goal of feature selection?


a) Increase number of features
b) Choose important variables
c) Create new variables
d) Remove dataset
Answer: b

72. Which of these is an ensemble learning technique?


a) Decision Tree
b) Random Forest
c) K-Means
d) Linear Regression
Answer: b

73. Which of the following is NOT a supervised learning algorithm?


a) Naive Bayes
b) Linear Regression
c) K-Means
d) Decision Tree
Answer: c

74. What is the main advantage of Random Forest over Decision Tree?
a) Faster
b) Higher accuracy
c) More randomness
d) No need of training
Answer: b

75. What does 'F1 Score' measure?


a) Average accuracy
b) Balance between precision and recall
c) Total errors
d) Training time
Answer: b
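
Illustration (a short sketch using hypothetical counts): the F1 score is the harmonic mean of precision and recall, so it is high only when both are high.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, f1) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 3 true positives, 1 false positive, 2 false negatives.
precision, recall, f1 = f1_from_counts(tp=3, fp=1, fn=2)
print(precision, recall)  # 0.75 0.6
```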

76. Which Pandas method gives summary statistics of a DataFrame?


a) head()
b) describe()
c) tail()
d) shape()
Answer: b

77. Which library is popular for interactive dashboards in Python?


a) Streamlit
b) NumPy
c) Pandas
d) Matplotlib
Answer: a

78. Which term describes reducing variance by training on random data subsets?
a) Bagging
b) Boosting
c) Stacking
d) Pruning
Answer: a

79. What is the typical structure of a neural network?


a) Rows
b) Layers
c) Tables
d) Graphs
Answer: b

80. What is 'data leakage' in ML?


a) Data stored in wrong format
b) Using test data information in training
c) Loss of data
d) Corrupted dataset
Answer: b

81. Which cloud platform provides ML services?


a) AWS
b) Photoshop
c) VLC
d) Adobe XD
Answer: a

82. What is 'epoch' in neural network training?


a) One forward pass
b) One complete cycle through training dataset
c) One test run
d) Random split
Answer: b

83. Which term refers to synthetic data generation for balancing classes?
a) SMOTE
b) PCA
c) Binning
d) Label Encoding
Answer: a

84. What does 'bias' refer to in ML models?


a) Error due to overly simplistic assumptions
b) Overfitting issue
c) Data normalization
d) Training time
Answer: a

85. What does 'variance' in ML indicate?


a) Error due to model's sensitivity to fluctuations in training data
b) Model's inability to learn
c) Corrupt dataset
d) Dataset size
Answer: a

86. Which of these is a categorical variable?


a) Age
b) Salary
c) Gender
d) Temperature
Answer: c

87. Which type of ML technique is used to predict yes/no outcomes?


a) Regression
b) Classification
c) Clustering
d) Association
Answer: b

88. Which of the following is a Python IDE commonly used for data science?


a) PyCharm
b) Notepad++
c) Paint
d) Eclipse
Answer: a

89. Which file format supports structured tabular data?


a) .csv
b) .jpg
c) .mp3
d) .doc
Answer: a

90. What is 'scikit-learn' mainly used for?


a) Machine Learning
b) Image Editing
c) Game Design
d) Video Editing
Answer: a

91. What is K in KNN?


a) Number of classes
b) Number of nearest neighbors
c) Number of layers
d) Number of epochs
Answer: b

92. What is the purpose of data wrangling?


a) Visualizing data
b) Preparing raw data for analysis
c) Deleting data
d) Compressing files
Answer: b

93. Which file type is commonly used to store serialized Python models?


a) .csv
b) .pkl
c) .mp3
d) .txt
Answer: b

94. Which of these is a neural network optimization algorithm?


a) Adam
b) NumPy
c) Seaborn
d) KNN
Answer: a

95. What is data augmentation used for?


a) Increase dataset size artificially
b) Remove null values
c) Data encryption
d) Text compression
Answer: a

96. What type of chart is best for trend analysis over time?
a) Line chart
b) Bar chart
c) Pie chart
d) Heatmap
Answer: a

97. What is the purpose of a validation set?


a) Final model evaluation
b) Hyperparameter tuning
c) Data visualization
d) Feature scaling
Answer: b

98. Which of these is a supervised learning task?


a) Clustering
b) Regression
c) Association
d) Dimensionality reduction
Answer: b

99. What is 'cross-validation' used for?


a) Evaluate the model on multiple held-out splits of the data
b) Increase data size
c) Data encryption
d) Generate synthetic data
Answer: a
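
Illustration (a plain-Python sketch of k-fold splitting, not a library implementation): the data is divided into k folds, and each fold serves once as the held-out test set while the rest is used for training. The helper name k_fold_indices is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print(len(train), test)
```

A model would be trained and scored once per fold, and the scores averaged to estimate performance on unseen data.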

100. What is the purpose of boosting techniques?


a) Reduce model bias and improve accuracy
b) Increase model size
c) Add random noise
d) Encrypt data
Answer: a
