
TYCS Data Science Questions Bank

The document outlines questions that cover key concepts in data science across three units, including data preparation techniques, machine learning algorithms, and model evaluation metrics. Topics covered include data types, data cleaning, feature engineering, supervised and unsupervised learning, linear and logistic regression, decision trees, ensemble methods, and model performance metrics such as precision, recall, and F1 score.

Uploaded by

Gaurav bansode
Copyright
© All Rights Reserved

TY BSc CS - Data Science Question Bank

UNIT 1

1. Applications and domains of data science.
2. Difference between data science and business intelligence.
3. Difference between data science and artificial intelligence.
4. Difference between data science and machine learning.
5. Difference between data warehousing and data mining, OR what are data warehousing and data mining?
6. Difference between structured and unstructured data.
7. Write a short note on different data sources.
8. Explain the difference between structured, unstructured and semi-structured data.
9. Explain the various strategies to handle missing values.
10. Explain the various strategies to detect outliers and treat them.
11. Explain what data cleaning is and its techniques (handling missing values, handling duplicates, handling outliers).
12. Explain various data transformation techniques (scaling and normalization).
13. Explain various techniques to handle categorical variables/data.
14. What are features? Explain the relevant ways to select features.
15. Explain Joins in SQL.
16. Explain what data wrangling is and the types of data wrangling techniques.
17. Explain what feature engineering is and how it is performed.
18. Explain what dummification is and how to create dummy variables from categorical variables.
19. Explain standardization and normalization in feature scaling.
20. Explain the various tools and libraries used in data science.
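As a study aid for questions 9 to 11 (missing values, duplicates, outliers), here is a minimal sketch in plain Python. It assumes numeric data held in lists, with None marking a missing value. The function names are hypothetical, and in practice a library such as pandas is more common. Mean imputation and the 1.5 x IQR (Tukey) fence are only two of the possible strategies these questions cover.

```python
from statistics import mean, quantiles

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence and the original order."""
    return list(dict.fromkeys(rows))

def iqr_fences(values, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def detect_outliers(values, k=1.5):
    """List the values that fall outside the IQR fences."""
    low, high = iqr_fences(values, k)
    return [v for v in values if v < low or v > high]

def cap_outliers(values, k=1.5):
    """Treat outliers by capping (winsorizing) them at the nearest fence."""
    low, high = iqr_fences(values, k)
    return [min(max(v, low), high) for v in values]

print(impute_mean([1.0, None, 3.0]))                   # [1.0, 2.0, 3.0]
print(drop_duplicates([(1, "a"), (2, "b"), (1, "a")]))  # [(1, 'a'), (2, 'b')]
print(detect_outliers([10, 12, 11, 13, 12, 95]))        # [95]
```

Quartile values depend on the interpolation method, so other tools may place the fences slightly differently.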

3 marks questions -
1) What is data?
2) What is data science?
3) Explain handling missing values.
4) Explain handling duplicates.
5) Explain handling outliers.
6) Explain the min-max scaler.
7) Explain standardization.
8) Explain the one-hot encoder.
9) Explain label encoding.
10) Explain structured, unstructured and semi-structured data.
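A minimal sketch of the scaling and encoding techniques named in questions 6 to 9, written in plain Python as an assumption (scikit-learn's MinMaxScaler, StandardScaler, OneHotEncoder and LabelEncoder are the usual library equivalents). The function names are hypothetical. Categories are sorted so the codes are deterministic. The drop_first option illustrates dummification, where one level is dropped to avoid the dummy variable trap.

```python
from statistics import mean, pstdev

def min_max_scale(xs):
    """Rescale values to [0, 1]: (x - min) / (max - min). Assumes max != min."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Z-score: (x - mean) / std, using the population standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def label_encode(categories):
    """Map each category to an integer code in sorted order; return codes and mapping."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping

def one_hot_encode(categories, drop_first=False):
    """One 0/1 column per level; drop_first=True keeps k-1 dummy columns."""
    levels = sorted(set(categories))
    if drop_first:
        levels = levels[1:]  # the dropped level is represented by all zeros
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return rows, levels

print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(label_encode(["red", "green", "red", "blue"]))
# ([2, 1, 2, 0], {'blue': 0, 'green': 1, 'red': 2})
print(one_hot_encode(["red", "green", "blue"]))
# ([[0, 0, 1], [0, 1, 0], [1, 0, 0]], ['blue', 'green', 'red'])
```

Label encoding implies an order between categories, which is why one-hot encoding is usually preferred for nominal features.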

UNIT 2

1. Explain the various data visualization techniques with examples (histogram, bar chart, line chart, scatter plot, box plot).
2. Explain what hypothesis testing is and its types.
3. Explain the classification of machine learning (supervised, unsupervised and reinforcement learning).
4. Explain what supervised learning is (regression and classification).
5. Explain the difference between classification and regression.
6. Explain what clustering is, with an example.
7. Explain the bias-variance tradeoff.
8. Explain the difference between overfitting and underfitting, OR explain what underfitting and overfitting are.
9. Explain linear regression in detail.
10. Explain logistic regression in detail (non-linear sigmoid function).
11. Explain the confusion matrix, precision, recall and F1-score with an example, OR explain what precision, F1 score, recall and accuracy are.
12. Explain what cross-validation is (k-fold, stratified).
13. Explain the working of decision trees.
14. Explain what a random forest classifier is.
15. Explain the SVM algorithm.
16. Explain the architecture of ANN.
17. Explain what ensemble learning is.
18. Explain the difference between bagging and boosting techniques.
19. Explain the working of the K-NN algorithm.
20. Explain how gradient descent is used for optimization.
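For questions 9 and 20, a minimal sketch of batch gradient descent fitting simple linear regression, y = w*x + b, by minimizing mean squared error. The function name, learning rate and number of epochs are illustrative assumptions rather than tuned values; a learning rate that is too large makes the updates diverge.

```python
def gradient_descent_linreg(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model: prediction minus target
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2)
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = gradient_descent_linreg([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(w, b)
```

For points lying exactly on y = 2x + 1, the printed parameters should come out very close to w = 2 and b = 1. The same update rule, "move each parameter a small step opposite its gradient", underlies the training of logistic regression and neural networks.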

3 marks questions -
1) What is dimensionality reduction?
2) What is bias?
3) What is variance?
4) What is overfitting?
5) What is underfitting?
6) What are mean, median, mode and standard deviation?
7) What is hyperparameter tuning?
8) What is ANOVA?
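A minimal sketch for questions 6 and 8, using Python's standard statistics module for the summary measures and a hand-written one-way ANOVA F-statistic. The describe and one_way_anova_f helpers are hypothetical. The latter computes only F (between-group mean square divided by within-group mean square), not a p-value, which would need the F distribution (for example via scipy.stats.f_oneway).

```python
from statistics import mean, median, mode, pstdev, stdev

def describe(data):
    """Basic summary statistics for a list of numbers."""
    return {
        "mean": mean(data),             # arithmetic average
        "median": median(data),         # middle value of the sorted data
        "mode": mode(data),             # most frequent value
        "population_sd": pstdev(data),  # divides by n
        "sample_sd": stdev(data),       # divides by n - 1
    }

def one_way_anova_f(groups):
    """F = (SS_between / (k - 1)) / (SS_within / (N - k))."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
```

A large F means the group means differ by much more than the spread inside each group would explain.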

UNIT 3

1. Explain the metrics used to evaluate the performance of a classification model (confusion matrix, precision, F1 score, recall and accuracy).
2. Explain the working of weighted balanced accuracy.
3. Explain what the F-beta score is.
4. What are the principles of effective data visualization?
5. Explain the types of visualization with examples (bar chart, scatter plot, box plot, line chart, heat map).
6. What are the visualization tools used for analysis? (Matplotlib, Seaborn, Tableau, Power BI)
7. Explain what storytelling in analysis is and how to communicate insights through visualization.
8. Explain some of the data management activities.
9. Write a short note on ETL.
10. Explain why data governance and data quality are important, OR explain the difference between data governance and data quality.
11. What is data privacy and how is it managed?
12. Explain the types of data security considerations.
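A minimal sketch for questions 1 to 3, computing binary classification metrics from scratch. It assumes labels are 0/1 with 1 as the positive class, and the function names are hypothetical. Balanced accuracy is computed here as the plain mean of recall and specificity; the "weighted" variant named in question 2 is not specified in this bank, so it is not modeled. The F-beta score weights recall beta times as much as precision, and beta = 1 gives F1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def f_beta(precision, recall, beta):
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0

def classification_metrics(y_true, y_pred, beta=2.0):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f_beta(precision, recall, 1.0),
        f"f{beta:g}": f_beta(precision, recall, beta),
        "balanced_accuracy": (recall + specificity) / 2,
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 2, 5)
print(classification_metrics(y_true, y_pred))
```

In this example precision is 2/3 and recall is 1/2, so F2 (which favours recall) is lower than F1.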

3 marks questions -
1. Explain the following concepts (any 3) -
(confusion matrix, ROC AUC curve, precision, F1 score, recall and accuracy).
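For the ROC AUC curve concept, a minimal sketch that builds the ROC curve by sweeping the decision threshold over every distinct score and then measures the area under it with the trapezoidal rule. It assumes labels are 0/1 with 1 as positive and that a higher score means "more likely positive"; the function names are hypothetical. The pairwise version is a cross-check: AUC equals the probability that a random positive is scored above a random negative, with ties counting half.

```python
def roc_curve_points(y_true, scores):
    """(FPR, TPR) points, predicting positive when score >= threshold."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t != 1)
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Area under a curve given as points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_curve_points(y_true, scores)
print(pts)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc_trapezoid(pts), auc_pairwise(y_true, scores))  # 0.75 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.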
