Data Science QnA

The document provides a quiz on key concepts in data science and artificial intelligence. It includes multiple choice and true/false questions testing understanding of topics like data types, data analysis tools like NumPy and Pandas, and basic statistical measures. It also includes calculations and short answer questions to further assess comprehension.

Uploaded by

Roboticol
Copyright
© All Rights Reserved

Data Science

Read the following statements and select the correct option

1. This is the fuel in AI that helps in extracting useful and meaningful
information.
a) Algorithm b) Data c) Operating system
2. It is a field focused on extracting knowledge and insights from data by
using scientific methods.
a) Data science b) Machine learning
c) Artificial intelligence
3. The three broad domains in AI are:
a) Data science, NLP & CV
b) Algorithm, data & coding
c) CV, NLP & algorithm
4. Which of the following is one of the key data science skills?
a) Statistics
b) Machine learning
c) Data visualisation
5. This computer, which is powered by AI, can collect, absorb and
process data much quicker than humans.
a) Expert system
b) Supercomputer
c) Both of them
6. Which one of the following is an example of an expert system?
a) IBM Watson
b) Siri
c) Alexa
7. This type of data answers key questions such as how many, how
much and how often.
a) Numerical data
b) Quantitative data
c) Any of them
8. A data type that cannot be measured or expressed as a number.
a) Qualitative data
b) Categorical data
c) Any of them
9. It is first-hand data, collected by the researcher for their own work.
a) Primary data
b) Secondary data
c) Personal data
10. It is second-hand data which is readily available from
other sources.
a) Outsourced data
b) Second-party data
c) Secondary data
11. 'Colour of the sea' is an example of this type of data.
a) Quantitative data
b) Qualitative data
c) None of them
12. 'Your shoe size' is an example of this type of data.
a) Quantitative data
b) Qualitative data
c) None of them
13. Textbooks, newspapers and online articles are examples of this
type of data.
a) Primary data
b) Secondary data
c) Text data
14. It is a plain text file that contains a list of data.
a) CSV file
b) SQL file
c) Both of them
15. A process of collecting quantitative facts in a systematic and
organised manner to report any social problem or the status of facts in a
certain area of society is known as
a) Case study
b) Historical research
c) Survey
16. Which one of these is an example of processed data?
a) CCTV recordings of shopper visits
b) Customer comments
c) Tables from service
17. Which one of these is an example of raw data?
a) A report on a specific industrial sector
b) Annual company report
c) A report on qualitative focus groups
d) A transcript from a group
18. The full form of CSV is
a) Comma separated values
b) Computer structured values
c) None of the above
19. The full form of SQL is
a) Serial query list
b) Structured query language
c) None of the above
20. It is an open source Python package that is most widely used
for data science/data analysis and machine learning tasks.
a) Pandas
b) NumPy
c) Matplotlib
21. It is a Python package for the computation and processing of
multidimensional and single-dimensional array elements.
a) NumPy
b) Matplotlib
c) Pandas
22. It is a data visualisation and graphical plotting library built on
NumPy arrays in Python.
a) Pandas
b) Python Plot
c) Matplotlib
23. NumPy stands for
a) Numbering Python
b) Number in Python
c) Numerical Python
d) None of the above
24. What is/are the advantage(s) of NumPy arrays over classic
Python lists?
a) Insertion, concatenation and deletion are faster
b) They have better support for mathematical operations
c) They consume less memory
d) All of them
25. In data science, which of these Python libraries is the most popular?
a) NumPy
b) Pandas
c) OpenCV
d) Django
26. We can analyse data in Pandas with:
a) Series
b) DataFrame
c) Both a and b
27. The key data structure in Pandas is called a
a) Keyframe
b) DataFrame
c) Statistic
28. What does numpy.array(list) do?
a) It converts an array to a list
b) It converts a list to an array
c) It converts an array to an array
d) None of these
29. What will be the output for the following code?
import numpy as np
a=np.array([1,2,3])
print(a)
a) [[1,2,3]]
b) [1]
c) [1,2,3]
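
Note : Questions 28 and 29 can be checked by running the code. The sketch below assumes NumPy is installed; the variable name `values` is only illustrative. NumPy prints array elements separated by spaces rather than commas, so the printed result is `[1 2 3]`, which is closest to option c).

```python
import numpy as np

values = [1, 2, 3]        # an ordinary Python list
a = np.array(values)      # np.array converts the list into a NumPy array

print(type(a).__name__)   # ndarray
print(a)                  # [1 2 3]
```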
30. *@Ac# is this type of data.

(a) Alphanumeric
(b) Alphabetic
(c) Numeric

31. If temperature is less than zero, then what is the data in the statement?

(a) temperature
(b) temperature <0
(c) 0
(d) <0

32. When an investigator uses data which has already been collected by
others, such data is called:

(a) Primary data
(b) Collected data
(c) Processed data
(d) Secondary data

33. Point out the correct statement.

(a) Raw data is original source of data

(b) Pre-processed data is original data source

(c) None of these

(d) Raw data is the data obtained after processing steps

34. The government and non-government publications are considered as:

(a) External secondary data sources

(b) Internal secondary data sources

(c) External primary data sources

(d) Internal primary data sources


35. Reports on quality control, production and financial accounts issued by
companies are considered as:

(a) External secondary data sources

(b) Internal secondary data sources

(c) External primary data sources

(d) Internal primary data sources

36. This is a library used to create data structures and carry out scientific
calculations.
(a) Python
(b) Numpy
(c) Pandas
(d) Matplotlib

37. How do you import NumPy for use?

(a) import numpy as np
(b) import pandas as pd
(c) import matplotlib.pyplot as plt

38. Which of the following statements is False in the case of the KNN Algorithm?
(a) For a very large value of K, points from other classes may be included in the
neighborhood
(b) For the very small value of K, the algorithm is very sensitive to noise
(c) KNN is used only for classification problem statements
(d) KNN is a lazy learner

II. Calculate and write the correct answer:

1. Rohan played 7 games last weekend. His scores are: 155, 165,
138, 172, 127, 193, 142. What is the range of Rohan's scores?
Answer : 66 (highest score 193 minus lowest score 127)
2. If the standard deviation of a data set is 4, what is the variance?
Answer : 16 (the variance is the square of the standard deviation)
3. What is the standard deviation for this given data?
5, 10, 7, 12, 0, 20, 15, 22, 8, 2
Answer : 6.89
4. Students were asked how many hours a night they sleep.
The responses were: 10, 8, 7.5, 6, 5, 5.5, 6.5, 9
Find the standard deviation.
Answer : 1.638
5. What is the variance of the first 10 natural numbers?
Answer : 8.25 using the population formula (dividing by n), as in questions 3 and 4; 9.167 using the sample formula (dividing by n - 1)
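
Note : The calculations in this section can be checked with Python's built-in `statistics` module. The sketch below is one way to do so; the variable names are illustrative. It uses the population formulas (dividing by n) for the standard deviations, and shows both the population and the sample variance for the last question, since the two conventions give different results.

```python
import statistics

# Question 1: range = highest value - lowest value
scores = [155, 165, 138, 172, 127, 193, 142]
score_range = max(scores) - min(scores)

# Question 2: variance is the square of the standard deviation
variance_from_sd = 4 ** 2

# Questions 3 and 4: population standard deviation (divides by n)
data = [5, 10, 7, 12, 0, 20, 15, 22, 8, 2]
sd_data = statistics.pstdev(data)

sleep_hours = [10, 8, 7.5, 6, 5, 5.5, 6.5, 9]
sd_sleep = statistics.pstdev(sleep_hours)

# Question 5: first 10 natural numbers, under both conventions
first_ten = list(range(1, 11))
population_variance = statistics.pvariance(first_ten)  # divides by n
sample_variance = statistics.variance(first_ten)       # divides by n - 1

print(score_range)            # 66
print(variance_from_sd)       # 16
print(round(sd_data, 2))      # 6.89
print(round(sd_sleep, 3))     # 1.638
print(population_variance, round(sample_variance, 3))
```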

III. Give one word answer for the following statements:

1. The type of data that cannot be counted, measured or easily expressed using
numbers.
Ans : Qualitative Data
2. This type of data describes qualities or characteristics.
Ans : Qualitative Data
3. A data visualization and graphical plotting library built on NumPy arrays.
Ans : Matplotlib
4. This is the average of numbers in the list.
Ans : Mean
5. The term used to describe the middle of a sorted list of numbers.
Ans : Median
6. The number which appears most often in a sorted array.
Ans : Mode
7. This is defined as the average of the squared differences from the Mean.
Ans : Variance

IV. Questions and Answers

1. What do you understand about Data Science?


Ans : Data Science is a field focused on extracting knowledge and insights from
data by using scientific methods.

2. Name the three domains into which AI can be classified.

Ans : AI can be classified into three (3) broad domains, i.e. Data Science, Natural
Language Processing and Computer Vision.
3. What are the Applications of Data science?
Ans : Applications of data science are
a) Image Recognition
b) Speech Recognition
c) Fraud & Risk Detection
d) Internet Search etc.

4. What are the differences between Artificial Intelligence and Data Science?
Ans
5. Explain what is meant by revisiting the AI Project Cycle.
Ans :
a) Problem Scoping : Problem scoping refers to understanding a problem, finding out
the various factors which affect the problem, and defining the goal or aim of the
project.
The 4Ws of Problem Scoping are Who, What, Where and Why. These 4Ws help
in identifying and understanding the problem in a better and more efficient manner.

1) Who - The "Who" part helps us in comprehending and categorizing who is
affected directly and indirectly by the problem; these people are called the
Stakeholders.
2) What - The "What" part helps us in understanding and identifying the nature of
the problem and the evidence which shows that the problem exists.
3) Where - "Where" does the problem arise? This covers the situation and the location.
4) Why - "Why" is the given problem worth solving?

b) Data Acquisition : Data acquisition is the process of gathering, filtering
and cleaning data before it is stored in a data storage system.
c) Data Visualization : Data visualization is the presentation of data in graphical format. It helps
people understand the significance of data by summarizing and presenting
huge amounts of data in a simple and easy-to-understand format, and helps
to communicate information clearly and effectively.
d) Modelling : Creating a model from the data.
e) Evaluation : Evaluating the project.

6. What do you mean by Data?

Ans : Data is a collection of facts, figures and observations. It is one of the most valuable resources.

7. What are the different types of data ?


Ans : Qualitative Data and Quantitative Data

8. What is the difference between Quantitative Data and Qualitative Data?

Ans :

Quantitative Data is information that can be counted or measured and expressed
using numbers. It answers questions such as how many, how much and how often.

Qualitative Data describes qualities or characteristics and cannot be counted or
measured using numbers.

9. What is the classification of Data?

Ans :
Classification is the process of categorizing a given set of data into classes. It can be
performed on both structured and unstructured data.
10. What do you mean by Data Collection?
Ans :
Data Collection is a process of collecting the information from all the relevant
sources to find answers to the project problem. Data collection can be classified
into primary data collection and secondary data collection.

11. Differentiate between Primary Data Collection and Secondary Data
Collection.
Ans :
12. What are the methods of Data Collection?
Ans :
Most common types of data collection are surveys, interviews, online tracking,
social media platforms, literature sources etc.

13. Define the term Data type. What are the file formats of Datasets?
Ans :
Data type, in this context, refers to the file format used for encoding and storing
data in a computer file.
Commonly used file formats for datasets are CSV files, spreadsheets and SQL.

14. What is CSV?

Ans : CSV (comma separated values) is a plain text file in which each line is a
data record and each record consists of one or more data fields,
separated by commas.
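
Note : As an illustration of this record-and-field structure, the sketch below reads a small piece of CSV text with Python's built-in `csv` module. The column names and values are made up for the example, and the text is held in a string rather than a real file.

```python
import csv
import io

# Hypothetical CSV content: the first line is the header,
# every following line is one data record with comma-separated fields.
csv_text = "name,age,city\nAsha,14,Pune\nRavi,15,Delhi\n"

# DictReader maps each record's fields to the header names.
reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)

for record in records:
    print(record["name"], record["age"], record["city"])
```

All values are read as text, so a field such as `age` must be converted with `int()` before doing arithmetic on it.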

15. Explain the term NumPy.

Ans : NumPy stands for Numerical Python and is used for scientific
programming in Python, especially for working with numbers.
16. Define the term SQL.
Ans : SQL stands for Structured Query Language used for relational databases.
It can be used to insert, search, update and delete database records.
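
Note : The four operations named above can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the table `students` and its columns are hypothetical, and the database lives in memory so nothing is written to disk.

```python
import sqlite3

# An in-memory database; it disappears when the connection is closed.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# Insert two records.
cur.executemany(
    "INSERT INTO students (name, marks) VALUES (?, ?)",
    [("Asha", 82), ("Ravi", 74)],
)

# Update one record.
cur.execute("UPDATE students SET marks = ? WHERE name = ?", (90, "Asha"))

# Delete one record.
cur.execute("DELETE FROM students WHERE name = ?", ("Ravi",))

# Search (query) the remaining records.
rows = cur.execute("SELECT name, marks FROM students").fetchall()
print(rows)  # [('Asha', 90)]

conn.close()
```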

17. What is Pandas?

Ans : Pandas is an open source Python package that is most widely used for
data science/data analysis and machine learning tasks.
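
Note : A minimal sketch of the two main Pandas data structures, Series and DataFrame, is given below. It assumes Pandas is installed; the names and marks are invented for the example.

```python
import pandas as pd

# A Series is a one-dimensional labelled column of values.
marks = pd.Series([82, 74, 91], name="marks")

# A DataFrame is a two-dimensional table made of named columns.
df = pd.DataFrame({"name": ["Asha", "Ravi", "Meena"], "marks": [82, 74, 91]})

average = df["marks"].mean()     # average of the marks column
top = df[df["marks"] > 80]       # keep only rows with marks above 80

print(marks.max())
print(round(average, 2))
print(top)
```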

18. What is the use of Arrays?


Ans : Arrays are used to store homogeneous data (same data type) of fixed size
in sequential order in memory.

19. Explain the term List.

Ans : Lists are used to store data of growing size; the data can be stored in any
available place (not necessarily sequential) in memory.
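
Note : The sketch below contrasts a Python list with a NumPy array, assuming NumPy is installed. It shows that a list can grow and hold mixed types, while an array holds one data type and supports element-wise mathematics. The variable names are illustrative.

```python
import numpy as np

# A list can hold values of different types and can grow in place.
mixed_list = [1, "two", 3.0]
mixed_list.append(4)

# A NumPy array stores elements of a single data type.
arr = np.array([1, 2, 3])

doubled = arr * 2          # element-wise multiplication on the array
repeated = [1, 2, 3] * 2   # the same operator repeats a list instead

print(doubled)    # [2 4 6]
print(repeated)   # [1, 2, 3, 1, 2, 3]
```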

20. What is the difference between Python List and Numpy Array?
Ans

21. What do you understand about Matplotlib?


Ans : Matplotlib is a data visualization and graphical plotting library built on
NumPy arrays in Python for making 2D plots from data in arrays.

22. Differentiate between Pandas and Numpy.


Ans
23. What is Statistics?
Ans : Statistics is a mathematical science concerned with data collection, analysis and
interpretation. It is used to solve large, complex problems by looking for
meaningful patterns and trends in data.

Statistical methods are useful to ensure that data is interpreted correctly. A
data set contains information about a sample. It consists of cases, which are the
objects in the collection. A case has one or more attributes called variables,
which are characteristics of cases.

24. What are the types of variables?

Ans :
Continuous Variable : A variable is continuous if the values of the variable form
an interval. For example, the weight of a student in a school can be 60 or 70 kg
but can also be 60.5 kg. It can take any of infinitely many values within a range.

Discrete Variable : A variable is discrete if its possible values form a set of separate
numbers. For example, medical data in which only specific separate values are possible and not
an interval.

25. Define the term Measures of Central Tendency. What are the different
methods of measuring it?
Ans : Measures of central tendency are values that describe the
"centre" around which a set of values tends to cluster.
There are three measures of central tendency:
a) Mean is the average of a collection of numbers, i.e. their sum divided by how many there are.
b) Median is the 'middle' value of a sorted list of numbers.
c) Mode is simply the number which appears most often in the list.
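
Note : The three measures can be computed with Python's built-in `statistics` module. The list of numbers below is made up for the example.

```python
import statistics

numbers = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(numbers)      # sum (30) divided by count (6)
median_value = statistics.median(numbers)  # average of the two middle values, 3 and 5
mode_value = statistics.mode(numbers)      # 3 appears most often

print(mean_value, median_value, mode_value)
```

Because the list has an even number of values, the median is the average of the two middle values rather than a value from the list itself.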

26. Explain the term KNN.

Ans :
K-Nearest Neighbour (K-NN) is one of the simplest Machine Learning algorithms, based
on the Supervised Learning technique.

The K-NN algorithm stores all the available data and classifies a new data point
based on its similarity to the stored data. This means that when new data appears, it can be easily
classified into a well-suited category by using the K-NN algorithm.
The K-NN algorithm can be used for Regression as well as for Classification, but
it is mostly used for Classification problems.
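
Note : The idea can be sketched in a few lines of plain Python. This is a simplified illustration for classification only, not a library implementation: the function name `knn_classify`, the training points and the labels "A" and "B" are all hypothetical, and similarity is measured with straight-line (Euclidean) distance.

```python
import math
from collections import Counter


def knn_classify(training_points, new_point, k=3):
    """Return the most common label among the k stored points nearest to new_point."""
    # Sort the stored points by their distance to the new point.
    by_distance = sorted(
        training_points, key=lambda item: math.dist(item[0], new_point)
    )
    # Take the labels of the k nearest points and pick the majority.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Stored data: (coordinates, label) pairs forming two separate groups.
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B"),
]

print(knn_classify(training, (1.2, 1.4)))  # A
print(knn_classify(training, (6.4, 6.2)))  # B
```

No model is built in advance; all the work happens when a new point arrives, which is why K-NN is described as a lazy learner.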

27. What are the advantages and disadvantages of KNN?


Ans :

28. Where to use KNN?

Ans :
It is mostly used in simple recommendation systems (YouTube, Amazon, Netflix,
etc.), image recognition and general decision models.
