DS_MCQs
Answer: b) the transformation of various forms of information into data to be used in analysis
Answer: b) a function that describes the likelihood of different outcomes in a random variable
4. Which skill set is NOT typically considered essential for a data scientist?
a) statistical knowledge
b) coding skills
c) domain expertise
d) financial auditing
Answer: d) financial auditing
A) Data storage
B) Data collection
C) Extracting insights and knowledge from data
D) Data entry
Answer: B) The process of cleaning and transforming raw data into a usable format
8. Which programming language is widely used for statistical analysis and visualization in Data Science?
A) Java
B) R
C) C#
D) Ruby
Answer: B) R
A) To confirm hypotheses
B) To summarize the main characteristics of a dataset
C) To create predictive models
D) To store data efficiently
10. What does data preprocessing typically involve in the context of data science?
a) Designing machine learning algorithms
b) Cleaning and transforming raw data into a usable format
c) Building predictive models
d) Analyzing data trends and patterns
Answer: b) Cleaning and transforming raw data into a usable format
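Note: as a rough illustration of the answer above, the sketch below cleans a few raw records into a usable format. The field names ("name", "age"), the sample rows, and the helper clean_records are made up for this example; real preprocessing depends on the dataset.

```python
def clean_records(raw_rows):
    """Drop incomplete rows, trim whitespace, and convert ages to integers."""
    cleaned = []
    for row in raw_rows:
        name = (row.get("name") or "").strip()
        age_text = (row.get("age") or "").strip()
        if not name or not age_text.isdigit():
            continue  # skip rows with a missing name or a non-numeric age
        cleaned.append({"name": name.title(), "age": int(age_text)})
    return cleaned


# Hypothetical raw input with stray whitespace and missing values.
raw = [
    {"name": "  alice ", "age": "34"},
    {"name": "bob", "age": ""},
    {"name": None, "age": "29"},
    {"name": "carol", "age": " 41 "},
]
cleaned = clean_records(raw)
print(cleaned)
```

Only the rows for Alice and Carol survive, with names normalized and ages stored as integers.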
Module 2:
a) Ronald Fisher
b) John Tukey
c) Karl Pearson
d) Francis Galton
3. In the Data Science Process, what is typically the step that follows data cleaning?
a) Model deployment
b) Model evaluation
c) Exploratory Data Analysis (EDA)
d) Data collection
Answer: c) Exploratory Data Analysis (EDA)
a) Box plots
b) Linear regression
c) Histograms
d) Summary statistics
A) Linear Regression
B) K-Means Clustering
C) Decision Trees
D) Principal Component Analysis
A) Supervised learning
B) Unsupervised learning
C) Reinforcement learning
D) Semi-supervised learning
A) The intercept
B) The relationship between the dependent and independent variable
C) The error term
D) The correlation coefficient
9. Which of the following is NOT a key objective of Exploratory Data Analysis (EDA)?
a) Identifying patterns or anomalies in the data
b) Testing the performance of predictive models
c) Checking assumptions required for statistical modeling
d) Summarizing data distributions
Answer: b) Testing the performance of predictive models
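Note: the EDA objectives listed above (summarizing distributions, spotting anomalies) can be sketched with Python's standard statistics module. The sample values are made up, and the 1.5 x IQR rule used here is just one common convention for flagging outliers, not the only one.

```python
import statistics

# Made-up sample; the last value is deliberately far from the rest.
values = [12, 15, 15, 18, 21, 24, 95]

# Summary statistics describing the distribution.
summary = {
    "count": len(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "min": min(values),
    "max": max(values),
}

# Flag anomalies with the 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(summary)
print("outliers:", outliers)
```

No model is trained or scored here, which is why testing predictive performance is not an EDA objective.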
Module 3:
1. Why are algorithms like Linear Regression and k-Nearest Neighbors (k-NN) considered poor choices for filtering spam?
a) They require too much labeled data for effective spam filtering
b) They are computationally too complex for real-time spam filtering
c) They do not handle text data and high-dimensional features well
d) They are unsupervised algorithms, which makes them unsuitable for spam filtering
Answer: c) They do not handle text data and high-dimensional features well
2. Which of the following is the primary reason Naïve Bayes works well for spam filtering?
Answer: b) It assumes feature independence, making it effective even with limited data
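Note: a minimal sketch of a Naive Bayes spam filter, assuming whitespace tokenization, two labels ("spam" and "ham"), and add-one (Laplace) smoothing. The training messages and the helpers train and predict are invented for illustration. The independence assumption shows up as a simple sum of per-word log-probabilities.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, label) pairs with label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def predict(text, word_counts, label_counts):
    """Return the label with the higher log-probability score."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_msgs = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_msgs)  # class prior
        for word in text.lower().split():
            # Each word contributes independently (the "naive" assumption);
            # add-one smoothing avoids log(0) for unseen words.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Tiny made-up training set.
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
]
word_counts, label_counts = train(training)
print(predict("free money prize", word_counts, label_counts))
print(predict("team meeting tomorrow", word_counts, label_counts))
```

Even with four training messages the classifier separates the two test messages, which illustrates why the method can work with limited data.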
4. Which of the following tools would be useful for web scraping to gather data from websites?
a) SQL
b) BeautifulSoup
c) TensorFlow
d) PyTorch
Answer: b) BeautifulSoup
6. Which type of machine learning algorithm is commonly used for spam filtering?
A) Linear Regression
B) Decision Trees
C) Naive Bayes
D) K-Means Clustering
Answer: C) Naive Bayes
7. In spam filtering, what does the term "false positive" refer to?
10. What is one advantage of using Support Vector Machines (SVM) for spam filtering over k-NN?
a) SVMs are unsupervised, making them easier to implement
b) SVMs are computationally simpler than k-NN for large datasets
c) SVMs can better handle high-dimensional feature spaces
d) SVMs require no preprocessing of data
Answer: c) SVMs can better handle high-dimensional feature spaces
11. Which of the following libraries is commonly used for handling large amounts of data in Python?
a) BeautifulSoup
b) NumPy
c) TensorFlow
d) scikit-learn
Answer: b) NumPy
12. In the context of APIs, what does the term “endpoint” refer to?
a) The location of the database server
b) A specific URL used to access a function or data
c) A graphical interface for interacting with the API
d) A library used for authenticating users
Answer: b) A specific URL used to access a function or data
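Note: to make the idea of an endpoint concrete, the sketch below builds endpoint URLs from a base address. The base URL https://api.example.com/v1/, the resource name "users", and the helper build_endpoint are all hypothetical; no request is actually sent.

```python
from urllib.parse import urlencode, urljoin, urlparse

BASE_URL = "https://api.example.com/v1/"  # hypothetical API base address


def build_endpoint(resource, **params):
    """Join the base URL with a resource path and optional query parameters."""
    url = urljoin(BASE_URL, resource)
    if params:
        url += "?" + urlencode(params)
    return url


users_url = build_endpoint("users", page=2)
print(users_url)                 # the full endpoint URL
print(urlparse(users_url).path)  # the path part that names the resource
```

Each distinct path under the base (for example a users resource versus an orders resource) would be a separate endpoint exposing its own function or data.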
Module 4:
1. In the context of feature generation, which of the following is essential for creating meaningful features?
a) Mutual Information
b) Recursive Feature Elimination (RFE)
c) Principal Component Analysis (PCA)
d) Chi-square test
4. Which of the following describes the primary function of filters in feature selection?
A) Cross-validation
B) Recursive Feature Elimination (RFE)
C) Principal Component Analysis (PCA)
D) All of the above
10. Which of the following is a dimensionality reduction technique that preserves variance in data?
a) Recursive Feature Elimination (RFE)
b) Principal Component Analysis (PCA)
c) Mutual Information
d) Chi-square test
Answer: b) Principal Component Analysis (PCA)
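Note: a small sketch of what "preserves variance" means for PCA, restricted to two features so the eigenvalues of the covariance matrix can be written in closed form without a linear algebra library. The data points and the helper explained_variance_2d are made up; in practice a library implementation would normally be used.

```python
import math
import statistics


def explained_variance_2d(xs, ys):
    """Return the share of total variance captured by the first principal component."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    n = len(xs)
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    c = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix.
    centre = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    largest, smallest = centre + spread, centre - spread
    return largest / (largest + smallest)


# Made-up, strongly correlated features.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3.1, 3.9, 5]
ratio = explained_variance_2d(xs, ys)
print(round(ratio, 4))
```

Because the two features move almost together, a single component keeps nearly all of the variance, so dropping the second component loses very little information.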
Module 5:
a) A subset of nodes that interact with nodes outside the subset more than with each other
b) A subset of nodes with denser connections among themselves than with the rest of the graph
c) A group of disconnected nodes
d) The smallest possible group of nodes in a graph
Answer: b) A subset of nodes with denser connections among themselves than with the rest of the graph
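Note: the definition in the answer above can be checked by counting edges. The sketch below assumes an undirected graph stored as a list of edge pairs; the six-node graph and the helper edge_counts are invented for illustration.

```python
def edge_counts(edges, community):
    """Count edges inside a node subset and edges crossing its boundary."""
    members = set(community)
    internal = sum(1 for u, v in edges if u in members and v in members)
    boundary = sum(1 for u, v in edges if (u in members) != (v in members))
    return internal, boundary


# Two triangles (a-b-c and d-e-f) joined by the single edge c-d.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("c", "d"),
    ("d", "e"), ("d", "f"), ("e", "f"),
]
internal, boundary = edge_counts(edges, {"a", "b", "c"})
print(internal, boundary)
```

The subset {a, b, c} has three internal edges and only one edge leaving it, so it fits the description of a community.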
7. Which of the following is a common metric used to analyze social network graphs?
A) PageRank
B) Linear regression
C) K-Means clustering
D) Random forest
Answer: A) PageRank
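Note: a minimal power-iteration sketch of PageRank, assuming a directed graph given as a dictionary of outgoing links, a damping factor of 0.85, and a fixed number of iterations. The four-node graph is made up, and production implementations add convergence checks and sparse data structures.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph: dict mapping each node to the list of nodes it links to."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a baseline share from random jumps.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical link graph: a, b, and d all point (directly or partly) at c.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

Node c ends up with the highest score because it collects links from three other nodes, while d, which nobody links to, keeps only the baseline share.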
8. What does the term "degree centrality" refer to in social network analysis?
A) The number of connections a node has
B) The average distance from a node to all other nodes
C) The measure of how well-connected a network is
D) The importance of a node based on its connections
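Note: degree centrality is conventionally computed by counting the edges attached to each node, optionally divided by (n - 1) to normalize. The sketch below assumes an undirected graph given as edge pairs; the names and the helper degree_centrality are made up.

```python
def degree_centrality(edges):
    """Return raw degrees and degrees normalized by (number of nodes - 1)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    normalized = {node: d / (n - 1) for node, d in degree.items()}
    return degree, normalized


# Hypothetical friendship network.
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
degree, normalized = degree_centrality(edges)
print(degree)
print(normalized)
```

Here ann is connected to every other node, so her normalized degree centrality is 1.0.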
10. Which algorithm is commonly used for detecting communities in a social network?
a) PageRank
b) K-means
c) Louvain method
d) Apriori algorithm
Answer: c) Louvain method
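Note: the Louvain method searches for a partition that maximizes modularity. The sketch below does not implement Louvain itself; it only computes the modularity score of a given partition for an undirected, unweighted graph, so that a good and a poor split can be compared. The graph, the partitions, and the helper modularity are made up.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for members in communities:
        members = set(members)
        internal = sum(1 for u, v in edges if u in members and v in members)
        total_degree = sum(degree.get(node, 0) for node in members)
        # Fraction of edges inside the community minus the expected fraction.
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


# Two triangles joined by one edge.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("c", "d"),
    ("d", "e"), ("d", "f"), ("e", "f"),
]
good_split = [{"a", "b", "c"}, {"d", "e", "f"}]
poor_split = [{"a", "d"}, {"b", "e"}, {"c", "f"}]
q_good = modularity(edges, good_split)
q_poor = modularity(edges, poor_split)
print(round(q_good, 4), round(q_poor, 4))
```

The split along the two triangles scores higher than the arbitrary pairing, which is the kind of improvement a modularity-based community detection algorithm looks for.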