DS_MCQs

The document consists of multiple-choice questions and answers related to data science concepts, including datafication, statistical inference, exploratory data analysis, machine learning algorithms, and feature selection. It covers topics such as the importance of domain expertise, the role of algorithms in spam filtering, and the principles of data visualization. Each module provides insights into essential skills and methodologies in the field of data science.


Module 1:

1. Which of the following best describes the term datafication?

a) the process of analyzing data to find patterns and trends
b) the transformation of various forms of information into data to be used in analysis
c) the practice of storing data in large, distributed databases
d) the creation of algorithms for predictive modeling

Answer: b) the transformation of various forms of information into data to be used in analysis

2. In data science, statistical inference primarily helps in:

a) building predictive models
b) drawing conclusions about a population based on sample data
c) organizing and storing large datasets
d) visualizing data for easy interpretation

Answer: b) drawing conclusions about a population based on sample data

3. What is a probability distribution in the context of statistical modeling?

a) a method of organizing data in databases for faster retrieval
b) a function that describes the likelihood of different outcomes in a random variable
c) a tool for visualizing data with charts and graphs
d) a type of algorithm used in machine learning for clustering

Answer: b) a function that describes the likelihood of different outcomes in a random variable
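
For illustration, a minimal Python sketch (not part of the original question set) of a discrete probability distribution: the binomial PMF gives the likelihood of each possible outcome of a random variable, here the number of heads in 10 fair coin flips.

```python
# Minimal sketch: a probability distribution as a function mapping
# outcomes of a random variable to their likelihoods.
from scipy.stats import binom

n, p = 10, 0.5  # 10 trials, success probability 0.5
for k in range(n + 1):
    print(f"P(X = {k}) = {binom.pmf(k, n, p):.4f}")
```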

4. Which skill set is NOT typically considered essential for a data scientist?

a) statistical knowledge
b) coding skills
c) domain expertise
d) financial auditing

Answer: d) financial auditing

5. What is the primary goal of Data Science?

A) Data storage
B) Data collection
C) Extracting insights and knowledge from data
D) Data entry

Answer: C) Extracting insights and knowledge from data


6. Which of the following describes "structured data"?

A) Data that is unorganized and free-form
B) Data that is organized in a fixed field within a record
C) Data that cannot be stored in databases
D) Data that is only textual

Answer: B) Data that is organized in a fixed field within a record

7. What does the term "data wrangling" refer to?

A) The process of collecting data from various sources
B) The process of cleaning and transforming raw data into a usable format
C) The visualization of data using graphs
D) The statistical analysis of data

Answer: B) The process of cleaning and transforming raw data into a usable format
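
As a small, hedged example of data wrangling with pandas (the column names and values below are hypothetical): dropping incomplete rows and converting types turns raw records into a usable format.

```python
# Data-wrangling sketch: clean and transform raw records with pandas.
import pandas as pd

raw = pd.DataFrame({
    "age": ["25", "31", None, "42"],
    "signup_date": ["2024-01-05", "2024-02-10", "2024-02-11", None],
})

clean = (
    raw.dropna()                                    # drop incomplete rows
       .assign(age=lambda d: d["age"].astype(int),  # fix numeric type
               signup_date=lambda d: pd.to_datetime(d["signup_date"]))
)
print(clean.dtypes)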

8. Which programming language is widely used for statistical analysis and visualization in
Data Science?

A) Java
B) R
C) C#
D) Ruby

Answer: B) R

9. In data analysis, what is the purpose of exploratory data analysis (EDA)?

A) To confirm hypotheses
B) To summarize the main characteristics of a dataset
C) To create predictive models
D) To store data efficiently

Answer: B) To summarize the main characteristics of a dataset

10. What does data preprocessing typically involve in the context of data science?
a) Designing machine learning algorithms
b) Cleaning and transforming raw data into a usable format
c) Building predictive models
d) Analyzing data trends and patterns
Answer: b) Cleaning and transforming raw data into a usable format
Module 2:

1. Which of the following is a primary goal of Exploratory Data Analysis (EDA)?

a) To apply machine learning models to the data
b) To summarize and visualize the main characteristics of a dataset
c) To clean and preprocess data for analysis
d) To implement algorithms for predictive accuracy

Answer: b) To summarize and visualize the main characteristics of a dataset

2. Who is considered the pioneer of Exploratory Data Analysis (EDA)?

a) Ronald Fisher
b) John Tukey
c) Karl Pearson
d) Francis Galton

Answer: b) John Tukey

3. In the Data Science Process, what is typically the step that follows data cleaning?

a) Model deployment
b) Model evaluation
c) Exploratory Data Analysis (EDA)
d) Data collection

Answer: c) Exploratory Data Analysis (EDA)

4. Which of the following would not be considered a basic tool of EDA?

a) Box plots
b) Linear regression
c) Histograms
d) Summary statistics

Answer: b) Linear regression
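
A brief sketch of the basic EDA tools named above (summary statistics, a histogram, and a box plot) using pandas and matplotlib; the column name "value" and the generated data are placeholders.

```python
# EDA sketch: summary statistics, histogram, and box plot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"value": np.random.default_rng(0).normal(50, 10, 200)})

print(df["value"].describe())   # summary statistics
df["value"].hist(bins=20)       # histogram
plt.show()
df.boxplot(column="value")      # box plot
plt.show()
```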

5. Which of the following algorithms is used for classification tasks?

A) Linear Regression
B) K-Means Clustering
C) Decision Trees
D) Principal Component Analysis

Answer: C) Decision Trees

6. What type of algorithm is K-Means?

A) Supervised learning
B) Unsupervised learning
C) Reinforcement learning
D) Semi-supervised learning

Answer: B) Unsupervised learning
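
A minimal K-Means sketch with scikit-learn on made-up points: no labels are supplied, which is what makes the algorithm unsupervised.

```python
# K-Means clustering sketch: the model groups unlabeled points.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # learned centroids
```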

7. In linear regression, what does the slope of the line represent?

A) The intercept
B) The relationship between the dependent and independent variables
C) The error term
D) The correlation coefficient

Answer: B) The relationship between the dependent and independent variables
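
A short sketch of this idea with scikit-learn (toy data, y = 2x + 1): the fitted slope quantifies how the dependent variable changes per unit change in the independent variable.

```python
# Simple linear regression sketch: read the slope and intercept.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4]])   # independent variable
y = np.array([3, 5, 7, 9])           # dependent variable (y = 2x + 1)
model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # slope ~2.0, intercept ~1.0
```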

8. Which of the following is a common evaluation metric for classification algorithms?

A) Mean Absolute Error (MAE)
B) Root Mean Square Error (RMSE)
C) Accuracy
D) R-squared

Answer: C) Accuracy
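
A tiny sketch of accuracy as a classification metric, computed with scikit-learn on made-up labels.

```python
# Accuracy: fraction of predictions that match the true labels.
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(accuracy_score(y_true, y_pred))  # 5 of 6 correct -> 0.833...
```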

9. Which of the following is NOT a key objective of Exploratory Data Analysis (EDA)?
a) Identifying patterns or anomalies in the data
b) Testing the performance of predictive models
c) Checking assumptions required for statistical modeling
d) Summarizing data distributions
Answer: b) Testing the performance of predictive models

10. What is a typical output of Exploratory Data Analysis (EDA)?

a) A fully trained machine learning model
b) Insights about data structure, distributions, and potential relationships
c) A cleaned dataset ready for deployment
d) A final business decision based on data

Answer: b) Insights about data structure, distributions, and potential relationships
Module 3:

1. Why are algorithms like Linear Regression and k-Nearest Neighbors (k-NN) considered
poor choices for filtering spam?

a) They require too much labeled data for effective spam filtering
b) They are computationally too complex for real-time spam filtering
c) They do not handle text data and high-dimensional features well
d) They are unsupervised algorithms, which makes them unsuitable for spam filtering

Answer: c) They do not handle text data and high-dimensional features well

2. Which of the following is the primary reason Naïve Bayes works well for spam filtering?

a) It uses clustering to separate spam and non-spam emails
b) It assumes feature independence, making it effective even with limited data
c) It applies deep learning techniques to classify emails
d) It requires a very large dataset to work effectively

Answer: b) It assumes feature independence, making it effective even with limited data
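
For illustration, a minimal Naïve Bayes spam-filter sketch with scikit-learn; the example emails are made up, and a real filter would need far more training data.

```python
# Naive Bayes spam-filter sketch: word counts + MultinomialNB.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting at noon tomorrow",
          "free money click here", "project report attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vec = CountVectorizer()
X = vec.fit_transform(emails)            # word-frequency features
clf = MultinomialNB().fit(X, labels)     # assumes feature independence

print(clf.predict(vec.transform(["claim your free prize"])))  # likely [1]
```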

3. In data wrangling, an API (Application Programming Interface) is commonly used to:

a) Train machine learning models
b) Access and retrieve data from external sources
c) Visualize data with graphical tools
d) Store data in a relational database

Answer: b) Access and retrieve data from external sources

4. Which of the following tools would be useful for web scraping to gather data from
websites?

a) SQL
b) BeautifulSoup
c) TensorFlow
d) PyTorch

Answer: b) BeautifulSoup
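
A short web-scraping sketch with requests and BeautifulSoup; the URL is a placeholder, and a real scraper should respect the target site's terms and robots.txt.

```python
# Web-scraping sketch: fetch a page and extract its title and links.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com")
soup = BeautifulSoup(resp.text, "html.parser")

print(soup.title.string)                          # page title
print([a.get("href") for a in soup.find_all("a")])  # link targets
```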

5. What is the primary goal of a spam filter?

A) To improve email delivery speed
B) To block unwanted emails
C) To enhance email security
D) To organize inbox messages

Answer: B) To block unwanted emails

6. Which type of machine learning algorithm is commonly used for spam filtering?

A) Linear Regression
B) Decision Trees
C) Naive Bayes
D) K-Means Clustering

Answer: C) Naive Bayes

7. In spam filtering, what does the term "false positive" refer to?

A) A legitimate email marked as spam
B) A spam email correctly identified as spam
C) A legitimate email delivered to the inbox
D) An email that is neither spam nor legitimate

Answer: A) A legitimate email marked as spam

8. Which of the following features is often used in spam detection?

A) Email sender's IP address
B) Frequency of certain words
C) Presence of links
D) All of the above

Answer: D) All of the above

9. Which of the following is a limitation of Naïve Bayes in spam filtering?

a) It struggles with large datasets
b) It fails when features are not truly independent
c) It is unable to process textual data
d) It cannot handle binary classification problems

Answer: b) It fails when features are not truly independent

10. What is one advantage of using Support Vector Machines (SVM) for spam filtering over
k-NN?
a) SVMs are unsupervised, making them easier to implement
b) SVMs are computationally simpler than k-NN for large datasets
c) SVMs can better handle high-dimensional feature spaces
d) SVMs require no preprocessing of data
Answer: c) SVMs can better handle high-dimensional feature spaces

11. Which of the following libraries is commonly used for handling large amounts of data
in Python?
a) BeautifulSoup
b) NumPy
c) TensorFlow
d) scikit-learn
Answer: b) NumPy
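
A quick illustration of why NumPy suits large numerical data: vectorised operations act on whole arrays at once instead of looping in Python.

```python
# NumPy sketch: standardise a million values with vectorised operations.
import numpy as np

data = np.arange(1_000_000, dtype=np.float64)
scaled = (data - data.mean()) / data.std()   # applied to all elements at once
print(scaled[:5])
```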

12. In the context of APIs, what does the term “endpoint” refer to?
a) The location of the database server
b) A specific URL used to access a function or data
c) A graphical interface for interacting with the API
d) A library used for authenticating users
Answer: b) A specific URL used to access a function or data
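
A hedged sketch of calling an API endpoint with the requests library; the endpoint URL and parameters below are hypothetical placeholders, not a real service.

```python
# API sketch: the endpoint is a specific URL exposing one function or dataset.
import requests

endpoint = "https://api.example.com/v1/users"   # hypothetical endpoint
resp = requests.get(endpoint, params={"id": 42}, timeout=10)
if resp.ok:
    print(resp.json())
```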
Module 4:

1. In the context of feature generation, which of the following is essential for creating
meaningful features?

a) Using only numerical data
b) Relying on domain expertise to guide feature creation
c) Choosing features that correlate highly with each other
d) Applying standard machine learning models without preprocessing

Answer: b) Relying on domain expertise to guide feature creation

2. Which of the following methods is a wrapper technique for feature selection?

a) Mutual Information
b) Recursive Feature Elimination (RFE)
c) Principal Component Analysis (PCA)
d) Chi-square test

Answer: b) Recursive Feature Elimination (RFE)
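
A brief Recursive Feature Elimination sketch with scikit-learn on synthetic data: as a wrapper method, RFE repeatedly fits a model and drops the weakest features.

```python
# RFE sketch: keep the 3 strongest features according to a fitted model.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
print(rfe.support_)   # boolean mask of selected features
print(rfe.ranking_)   # 1 = selected, higher = eliminated earlier
```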

3. In recommendation systems, Singular Value Decomposition (SVD) is mainly used for:

a) Enhancing data visualization
b) Reducing the dimensionality of large datasets
c) Increasing the number of available features
d) Selecting important features

Answer: b) Reducing the dimensionality of large datasets
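
A small truncated-SVD sketch on a toy user-item ratings matrix (the values are made up), reducing it to two latent dimensions as a recommendation system might.

```python
# SVD sketch: low-rank approximation of a ratings matrix.
import numpy as np

ratings = np.array([[5, 4, 0, 1],
                    [4, 5, 1, 0],
                    [0, 1, 5, 4],
                    [1, 0, 4, 5]], dtype=float)

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-2 reconstruction
print(np.round(approx, 2))
```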

4. Which of the following describes the primary function of filters in feature selection?

a) Evaluating the usefulness of each feature based on model performance
b) Using tree-based algorithms to identify important features
c) Scoring each feature independently based on statistical properties
d) Combining features to create a new, simplified dataset

Answer: c) Scoring each feature independently based on statistical properties

5. What is feature generation?

A) The process of selecting the most important features from a dataset
B) The process of creating new features from existing data
C) The process of removing unnecessary features from a dataset
D) The process of visualizing data distributions

Answer: B) The process of creating new features from existing data

6. Which of the following is an example of feature generation?

A) Converting categorical variables into numerical values
B) Normalizing the data
C) Creating interaction terms between variables
D) All of the above

Answer: D) All of the above
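
A feature-generation sketch in pandas showing one of the options above, an interaction term created from two existing columns; the column names are hypothetical.

```python
# Feature generation sketch: build an interaction term from existing columns.
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 15.0], "quantity": [3, 1, 2]})
df["price_x_quantity"] = df["price"] * df["quantity"]   # new feature
print(df)
```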

7. What is the main purpose of feature selection?

A) To improve model accuracy by reducing overfitting
B) To create new features that better represent the data
C) To visualize the data
D) To increase the number of features in the model

Answer: A) To improve model accuracy by reducing overfitting

8. Which of the following techniques is commonly used for feature selection?

A) Cross-validation
B) Recursive Feature Elimination (RFE)
C) Principal Component Analysis (PCA)
D) All of the above

Answer: D) All of the above

9. What is a common characteristic of features generated through feature engineering?

a) They are always numerical
b) They capture domain-specific insights to improve model performance
c) They eliminate the need for data cleaning
d) They are automatically created without human intervention

Answer: b) They capture domain-specific insights to improve model performance

10. Which of the following is a dimensionality reduction technique that preserves variance
in data?
a) Recursive Feature Elimination (RFE)
b) Principal Component Analysis (PCA)
c) Mutual Information
d) Chi-square test
Answer: b) Principal Component Analysis (PCA)
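
A short PCA sketch showing how much variance each principal component preserves, using scikit-learn's built-in iris data.

```python
# PCA sketch: inspect the variance preserved by each component.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)   # variance preserved per component
```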
Module 5:

1. In social network analysis, a community in a graph is best described as:

a) A subset of nodes that interact with nodes outside the subset more than with each other
b) A subset of nodes with denser connections among themselves than with the rest of the graph
c) A group of disconnected nodes
d) The smallest possible group of nodes in a graph

Answer: b) A subset of nodes with denser connections among themselves than with the rest of
the graph

2. Partitioning of graphs in social network analysis is typically used to:

a) Visualize data points in two-dimensional space
b) Divide the graph into distinct groups or communities
c) Increase the number of edges in a graph
d) Reduce the number of nodes without affecting the graph structure

Answer: b) Divide the graph into distinct groups or communities

3. Which of the following is a basic principle of data visualization?

a) Avoid using labels and legends to keep the visualization clean
b) Emphasize clarity and simplicity for better understanding
c) Use as many colors and fonts as possible to capture attention
d) Limit the visualization to only bar and line charts

Answer: b) Emphasize clarity and simplicity for better understanding

4. Which of the following best represents an ethical issue in data science?

a) Using clustering algorithms to detect communities in a social network
b) Applying data science techniques to predict customer preferences
c) Collecting user data without consent for targeted advertising
d) Using decision trees to analyze patterns in customer feedback

Answer: c) Collecting user data without consent for targeted advertising

5. What is a social network graph?

A) A visual representation of social media posts
B) A graph that represents individuals as nodes and relationships as edges
C) A chart showing social media metrics over time
D) A diagram illustrating marketing strategies

Answer: B) A graph that represents individuals as nodes and relationships as edges

7. Which of the following is a common metric used to analyze social network graphs?
A) PageRank
B) Linear regression
C) K-Means clustering
D) Random forest

Answer: A) PageRank

8. What does the term "degree centrality" refer to in social network analysis?
A) The number of connections a node has
B) The average distance from a node to all other nodes
C) The measure of how well-connected a network is
D) The importance of a node based on its connections

Answer: A) The number of connections a node has
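
A minimal networkx sketch of degree centrality; the edges below are made up. Degree centrality is the (normalised) count of connections each node has.

```python
# Degree centrality sketch: count connections per node in a small graph.
import networkx as nx

G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
print(nx.degree_centrality(G))   # A has the most connections
print(dict(G.degree()))          # raw connection counts
```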

9. Which algorithm is commonly used for community detection in social networks?

A) K-Means clustering
B) Spectral clustering
C) Breadth-first search
D) Apriori algorithm

Answer: B) Spectral clustering

10. Which algorithm is commonly used for detecting communities in a social network?
a) PageRank
b) K-means
c) Louvain method
d) Apriori algorithm
Answer: c) Louvain method
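
A hedged community-detection sketch with the Louvain method, assuming a recent networkx release (2.8 or later) where louvain_communities is available; the karate club graph is a classic small social network.

```python
# Louvain community detection sketch on a built-in social network graph.
import networkx as nx

G = nx.karate_club_graph()
communities = nx.community.louvain_communities(G, seed=42)
print([sorted(c) for c in communities])
```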
