Chapter 04: Data and Analysis
SHORT QUESTION AND ANSWERS
1. What is the primary purpose of data analysis?
Ans: The primary purpose of data analysis is inspecting, cleaning and summarizing
data to extract useful information for decision-making. This process helps in
building statistical models to make predictions or understand real-world
situations. It reduces the risks in decision-making by providing valuable
information and statistics.
2. What is statistical modeling?
Ans: Statistical modeling is the process of applying statistical techniques to analyze
data. It involves creating a model that represents the relationships between two
or more variables. Statistical modeling is used to understand these
relationships, draw meaningful conclusions and make predictions about real-
world situations. For example, statistical modeling can be applied to sales data
from the past few years to predict sales for the upcoming year.
3. What is a use case?
Ans: A use case is a technique in system analysis that describes how a system will be
used to achieve a specific goal or solve a real-world problem. It helps to identify,
clarify and organize the tasks needed to achieve that goal.
4. What are the common components of planning a use case?
Ans: The common components include outlining a clear goal and expected
outcomes, understanding the scope of work, assessing available resources,
providing the required data, evaluating risks and defining key performance
indicators as measures of success.
5. What are some common approaches to solving use cases?
Ans: The common approaches include forecasting, classification, pattern and
anomaly detection, recommendations and image recognition.
6. Can you give examples of typical use cases across different fields?
Ans: The examples of typical use cases include predicting customer churn rates,
segmenting customers, detecting fraud, developing recommendation systems
and optimizing prices.
7. What is the first step in solving a data science case study?
Ans: The first step is formulating the right question. This involves reviewing the
available literature in order to understand the business problem and translating
it into a clear question. It often involves interacting with stakeholders and
defining specific objectives.
8. What is the use of data wrangling?
Ans: Data wrangling is used to organize and preprocess the collected data for
analysis. It includes cleaning data by removing duplicates, correcting errors and
handling missing values.
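As a rough illustration of these data wrangling steps, here is a minimal sketch using the pandas library; the library choice, the file name sales.csv and the column name price are assumptions for illustration, not taken from the text.
```python
import pandas as pd

# Load the collected data (hypothetical file and column names)
df = pd.read_csv("sales.csv")

# Remove duplicate rows
df = df.drop_duplicates()

# Correct an obvious error: treat negative prices as invalid
df.loc[df["price"] < 0, "price"] = None

# Handle missing values by filling them with the column mean
df["price"] = df["price"].fillna(df["price"].mean())

print(df.head())
```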
9. In the context of weather forecasting, what types of data would you collect
and why?
Ans: In the context of weather forecasting, the data is collected about temperature,
amount of rainfall, wind speed and direction, atmospheric pressure etc. This
data is essential to build models that can predict future weather conditions and
provide accurate forecasts.
10. What is machine learning?
Ans: Machine learning is a branch of artificial intelligence. It involves using
algorithms to build systems that can learn from data and make predictions or
decisions. These algorithms analyze patterns and insights from the data to
improve their performance over time. Once trained on a dataset, machine
learning models can make decisions or predictions based on patterns without
requiring further human input, often functioning independently.
11. What are the two main categories of statistical modeling methods used in
data analysis?
Ans: The two main categories are supervised learning and unsupervised learning.
12. How does supervised learning differ from unsupervised learning?
Ans: In supervised learning, the model is trained on a dataset with labeled inputs
and corresponding outputs. It learns the relationship between inputs and
outputs by analyzing the labeled data. In unsupervised learning, the algorithms
find patterns and relationships within the data without any prior knowledge.
The model tries to search for patterns and gives the desired result.
13. What is a linear regression model?
Ans: A linear regression model is a mathematical equation that is used to predict a
response for a given predictor value. It uses the independent variable (x) for
prediction, and the dependent variable (y) is predicted based on the value of
variable x.
14. What does the equation y = mx + b represent in linear regression?
Ans: In linear regression, y = mx + b represents the equation of a line where y is the
dependent variable, x is the independent variable, m is the slope of the line and
b is the y-intercept.
15. What are the roles of the intercept and slope in a linear regression model?
Ans: The intercept (y-intercept) is the point where the line crosses the y-axis and the
slope represents the rate of change in the dependent variable with respect to
the independent variable.
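A minimal sketch of fitting such a line in Python with NumPy; the sample values are invented for illustration and happen to lie on the line y = 3x + 4.
```python
import numpy as np

# Sample data: x is the independent variable, y the dependent variable
x = np.array([1, 2, 3, 4, 5])
y = np.array([7, 10, 13, 16, 19])

# polyfit with degree 1 returns the slope (m) and the y-intercept (b)
m, b = np.polyfit(x, y, 1)
print("slope m =", m)        # approximately 3.0
print("intercept b =", b)    # approximately 4.0

# Predict a response for a new predictor value
x_new = 6
print("prediction:", m * x_new + b)   # 3*6 + 4 = 22
```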
16. What is a classification model in statistical modeling?
Ans: A classification model is used when the result is a discrete value such as
predicting whether an employee will receive a salary raise or not. It categorizes
data into predefined categories.
17. How does classification differ from regression in statistical modeling?
Ans: Regression models are used to predict continuous values such as predicting a
person’s salary or age. The output is a number that can take any value within a
range. Classification models are used to predict discrete values such as yes/no
or true/false. For example, classifying an email as spam or not spam. The output
is a category or class label not a number.
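To make the contrast concrete, here is a small sketch of a classification model using scikit-learn; the library choice and the two made-up numeric features are assumptions for illustration only.
```python
from sklearn.tree import DecisionTreeClassifier

# Toy features per e-mail: [number of links, number of capitalized words]
X = [[8, 20], [1, 2], [7, 15], [0, 1], [9, 30], [2, 3]]
# Labels are discrete categories, not numbers
y = ["spam", "not spam", "spam", "not spam", "spam", "not spam"]

model = DecisionTreeClassifier()
model.fit(X, y)

# The prediction is a class label such as 'spam', not a numeric value
print(model.predict([[6, 18]]))
```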
18. What is unsupervised learning?
Ans: Unsupervised learning is a statistical modeling technique used in machine
learning that deals with unlabeled data. The algorithms of this technique find
patterns and relationships within the data without any prior knowledge. It
means that data is not assigned to any category. The model tries to search for
a pattern and gives the desired result.
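A minimal sketch of unsupervised learning using k-means clustering from scikit-learn; the specific algorithm and library are assumptions, since the text does not name one. The algorithm groups the unlabeled points into clusters on its own.
```python
from sklearn.cluster import KMeans

# Unlabeled two-dimensional data points (no categories given)
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

# Ask the algorithm to discover 2 clusters by itself
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X)

print(model.labels_)           # cluster assigned to each point
print(model.cluster_centers_)  # centre of each discovered cluster
```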
33. What is the purpose of data collection methods? Name any two.
Ans: Data collection methods are the techniques used to gather reliable data for
analysis or research purposes. The collected data is analyzed to make informed
decisions or solve problems. Two data collection methods are the primary data
collection method and the secondary data collection method.
34. What is primary data and how does it differ from secondary data?
Ans: Primary data is data that is collected directly from the source and has never been
used before. Secondary data refers to data that has already been collected by
someone else. Primary data collection provides more authentic and original
data than secondary data collection. Primary data is typically more expensive
than secondary data due to the costs of data collection and analysis.
35. What is the purpose of using interviews as a primary data collection method?
Ans: Interviews are used to collect detailed and specific information directly from
individuals. Interviews provide flexibility because the questions can be adjusted
or changed at any time according to the situation.
36. How does the observation method work in primary data collection and what
are its advantages?
Ans: The observation method is used to observe a situation and record the findings.
It can be used to evaluate the behavior of different people. It is a very effective
method because it does not directly depend on other participants, which can
reduce bias.
37. What is the use of surveys and questionnaires?
Ans: Surveys and questionnaires are used to gather information from a large group
of people quickly and efficiently. They can be conducted face-to-face, by post
or over the internet to get respondents from anywhere in the world. The
answers can be yes or no, true or false, multiple choice and even open-ended
questions.
38. What is focus group in primary data collection? Give its benefit and drawback.
Ans: A focus group is similar to an interview but it is conducted with a group of
people who all have something in common. The data collected is similar to that
of in-person interviews. The benefit is that they offer a better understanding of why
a certain group of people thinks in a particular way. A drawback of this method
is the lack of privacy as many people are present.
44. What role does A/B testing play in YouTube’s platform development?
Ans: YouTube uses A/B testing to compare two versions of new features, such as a
new recommendation algorithm or a platform layout. It randomly shows each
version to different users and collects data about user interaction with each
version.
45. What type of data does YouTube collect and why?
Ans: YouTube collects data on user interactions such as clickthrough rates, watch
time and user feedback. It then evaluates the performance of each version. This
process helps YouTube to improve its algorithms, recommend more relevant
content to users and improve its features for better user satisfaction.
46. What is the purpose of data exploration in the data analysis process?
Ans: Data exploration is the initial step in the data analysis process. It is used to
examine data to understand its main characteristics using visual and statistical
methods. This process helps in identifying patterns, trends and relationships
within the data.
47. Why is data cleaning important before performing statistical analysis and
visualization?
Ans: Data cleaning involves detecting and correcting errors and inconsistencies in
data to improve its quality. The unnecessary data is removed to ensure the
analysis focuses on relevant information. Summary statistics are then calculated
to understand the central tendency and dispersion of the data. The central
tendency shows how closely the data values cluster, while the dispersion shows
how scattered they are.
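A short pandas sketch of such summary statistics; the sample numbers are made up for illustration.
```python
import pandas as pd

data = pd.Series([12, 15, 14, 10, 18, 16])

# Central tendency: where the values cluster
print("mean:", data.mean())
print("median:", data.median())

# Dispersion: how scattered the values are
print("standard deviation:", data.std())
print("range:", data.max() - data.min())
```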
48. What are data products?
Ans: Data products are tools that help businesses to improve their decision-making
processes and operations. These products are designed to solve problems,
automate processes, improve decision-making or create new opportunities
using data. The primary goal of data products is to deliver value.
49. What is data visualization and why is it important?
Ans: Data visualization is the graphical representation of information and data. It
represents data using visual elements such as charts, graphs and maps. Data
visualization tools make it easier to see and understand trends and patterns in
the data. They are useful for presenting data clearly and effectively to non-
technical users. Data visualization tools are essential to analyze massive
amounts of information and make data-driven decisions.
50. What types of visual elements are commonly used in data visualization?
Ans: Some common visual elements used in data visualization include charts, graphs
and maps etc.
51. What role does Python play in data visualization?
Ans: Python plays a significant role in data visualization by providing various libraries
and tools that support the creation of a wide range of visualizations. Libraries
such as Matplotlib, Seaborn and Plotly offer functionalities for generating
charts, graphs and interactive plots, making it easier to analyze and
interpret data.
52. What is Matplotlib and what is its primary use in data visualization?
Ans: Matplotlib is the most widely used library for data visualization. It is used to
create static, animated and interactive visualizations in Python. It is primarily
used to represent data graphically, which makes it easier to analyze and
understand. Most of the Matplotlib utilities are available in the pyplot submodule.
53. Which Matplotlib method is used to create a scatter plot?
Ans: The scatter() method in Matplotlib library is used to create a scatter plot. It is
used to plot data points as dots on a two-dimensional graph where each dot
represents a pair of values from the dataset.
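A minimal example of the scatter() method from the pyplot submodule; the data values are invented for illustration.
```python
import matplotlib.pyplot as plt

# Each dot represents a pair of values (x, y) from the dataset
x = [1, 2, 3, 4, 5, 6]
y = [5, 7, 4, 9, 8, 10]

plt.scatter(x, y)
plt.xlabel("x values")
plt.ylabel("y values")
plt.title("Scatter plot example")
plt.show()
```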
54. What types of plots can you create in Python?
Ans: Different types of plots can be created in Python such as scatter plot, line plot,
histogram, bar chart, pie chart and box plot etc.
55. What tools are commonly used for building statistical models?
Ans: The common tools for building statistical models include MS-Excel, Weka, R
Studio and Python.
56. Where can you find datasets for statistical modeling?
Ans: Datasets can be found on platforms such as www.kaggle.com and
https://github.com.
57. What does the equation y = 3x + 4 represent in Python?
Ans: The equation y = 3x + 4 represents a linear relationship between the variables x
and y. It is used to generate and plot data points for statistical modelling and
analysis.
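A short sketch of generating and plotting data points for y = 3x + 4 in Python; the range of x values is an assumption chosen for illustration.
```python
import numpy as np
import matplotlib.pyplot as plt

# Generate x values and compute y = 3x + 4 for each one
x = np.arange(0, 11)
y = 3 * x + 4

plt.plot(x, y, marker="o")
plt.xlabel("x")
plt.ylabel("y")
plt.title("y = 3x + 4")
plt.show()
```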
58. What is Google Colab and how is it used in statistical modelling with Python?
Ans: Google Colab is an online platform that provides a Python environment for
coding and executing Python scripts. It is used for building and testing statistical
models by allowing users to write and run Python code in a browser.