
Computer Science Class-11

Chapter
04 Data and Analysis
SHORT QUESTION AND ANSWERS
1. What is the primary purpose of data analysis?
Ans: The primary purpose of data analysis is inspecting, cleaning and summarizing
data to extract useful information for decision-making. This process helps in
building statistical models to make predictions or understand real-world
situations. It reduces the risks in decision-making by providing valuable
information and statistics.
2. What is statistical modeling?
Ans: Statistical modeling is the process of applying statistical techniques to analyze
data. It involves creating a model that represents the relationships between two
or more variables. Statistical modeling is used to understand these
relationships, draw meaningful conclusions and make predications about real-
world situations. For example, statistical modeling can be applied to sales data
from the past few years to predict sales for the upcoming year.
3. What is a use case?
Ans: A use case is a technique in system analysis that describes how a system will be
used to achieve a specific goal or solve a real-world problem. It helps to identify,
clarify and organize the tasks needed to achieve that goal.
4. What are the common components of planning a use case?
Ans: The common components include outlining a clear goal and expected
outcomes, understanding the scope of work, assessing available resources,
providing the required data, evaluating risks and defining key performance
indicators as measures of success.
5. What are some common approaches to solving use cases?
Ans: The common approaches include forecasting, classification, pattern and
anomaly detection, recommendations and image recognition.


6. Can you give examples of typical use cases across different fields?
Ans: The examples of typical use cases include predicting customer churn rates,
segmenting customers, detecting fraud, developing recommendation systems
and optimizing prices.
7. What is the first step in solving a data science case study?
Ans: The first step is formulating the right question that involves reviewing the
available literature in order to understand the business problem and translate
it into a clear question. This often involves interacting with stakeholders and
defining specific objectives.
8. What is the use of data wrangling?
Ans: Data wrangling is used to organize and preprocess the collected data for
analysis. It includes cleaning data by removing duplicates, correcting errors and
handling missing values.
9. In the context of weather forecasting, what types of data would you collect
and why?
Ans: In the context of weather forecasting, the data is collected about temperature,
amount of rainfall, wind speed and direction, atmospheric pressure etc. This
data is essential to build models that can predict future weather conditions and
provide accurate forecasts.
10. What is machine learning?
Ans: Machine learning is a branch of artificial intelligence. It involves using
algorithms to build systems that can learn from data and make predictions or
decisions. These algorithms analyze patterns and insights from the data to
improve their performance over time. Once trained on a dataset, machine
learning models can make decisions or predictions based on patterns, often
functioning independently without requiring further human input.
11. What are the two main categories of statistical modeling methods used in
data analysis?
Ans: The two main categories are supervised learning and unsupervised learning.
12. How does supervised learning differ from unsupervised learning?
Ans: In supervised learning, the model is trained on a dataset with labeled inputs
and corresponding outputs. It learns the relationship between inputs and
outputs by analyzing the labeled data. In unsupervised learning, the algorithms
find patterns and relationships within the data without any prior knowledge.
The model tries to search for a pattern and gives the desired result.
13. What is a linear regression model?
Ans: A linear regression model is a mathematical equation that is used to predict a
response for a given predictor value. It uses the independent variable (x) for
prediction and dependent variable (y) to be predicted based on value of
variable x.
14. What does the equation y = mx + b represent in linear regression?
Ans: In linear regression, y = mx + b represents the equation of a line where y is the
dependent variable, x is the independent variable, m is the slope of the line and
b is the y-intercept.
15. What are the roles of the intercept and slope in a linear regression model?
Ans: The intercept (y-intercept) is the point where the line crosses the y-axis and the
slope represents the rate of change in the dependent variable with respect to
the independent variable.
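As a sketch of how the slope m and intercept b can be estimated from data, here is a minimal least-squares fit written in plain Python; the data points are hypothetical, chosen to lie exactly on y = 2x + 1:

```python
# A minimal least-squares fit of y = mx + b using only built-in functions.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope m: covariance of x and y divided by the variance of x.
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept b: the fitted line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]   # hypothetical data, exactly y = 2x + 1
m, b = fit_line(xs, ys)
print(m, b)             # slope close to 2, intercept close to 1
```

With real, noisy data the recovered m and b would only approximate the underlying relationship rather than match it exactly.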
16. What is a classification model in statistical modeling?
Ans: A classification model is used when the result is a discrete value such as
predicting whether an employee will receive a salary raise or not. It categorizes
data into predefined categories.
17. How does classification differ from regression in statistical modeling?
Ans: Regression models are used to predict continuous values such as predicting a
person’s salary or age. The output is a number that can take any value within a
range. Classification models are used to predict discrete values such as yes/no
or true/false. For example, classifying an email as spam or not spam. The output
is a category or class label not a number.
18. What is unsupervised learning?
Ans: Unsupervised learning is a statistical modeling technique used in machine
learning that deals with unlabeled data. The algorithm finds patterns and
relationships within the data without any prior knowledge. It
means that data is not assigned to any category. The model tries to search for
a pattern and gives the desired result.


19. What does clustering mean in the context of unsupervised learning?
Ans: Clustering involves grouping data items based on their similarities. For example,
customers may be clustered into groups based on their usage patterns such as
long call duration or heavy internet usage.
20. What is the purpose of association rules in unsupervised learning?
Ans: Association is a method that identifies the relationships or associations among
a set of items within large datasets. It identifies the combination of items that
often occur together. For example, an association exists among bread, milk and
butter. If a customer buys bread, he may also buy butter and milk. Many
retailers and e-commerce platforms often use this method to recommend
products to their customers to increase sales.
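The bread, milk and butter example above can be sketched by counting how often pairs of items appear together in shopping baskets; the baskets below are hypothetical, and this pair-counting is a simplified stand-in for full association-rule mining:

```python
from itertools import combinations
from collections import Counter

# Hypothetical market baskets for the bread/milk/butter example.
baskets = [
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter", "eggs"},
]

# Count how often each pair of items occurs together (its "support").
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(2))  # the most frequently co-purchased pairs
```

A retailer could use such counts to recommend butter to a customer who has just added bread to their cart.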
21. What is the K-means clustering algorithm?
Ans: K-means clustering is a popular clustering algorithm. It partitions the data
points into a specified number (k) of groups based on their similarities.
22. What is experimental design in data science?
Ans: Experimental design is a systematic technique to plan, organize and conduct
experiments effectively. It focuses on careful planning to ensure that the
collected data is suitable for analysis and the experiment is structured to get
reliable results.
23. Why is random assignment important in experimental design?
Ans: Random assignment helps reduce bias and other effects by ensuring that
participants or treatments are randomly distributed across experimental
conditions.
24. Which statistical methods can be used in data analysis for experimental
design?
Ans: The statistical methods used in data analysis include hypothesis testing,
regression analysis, ANOVA (analysis of variance) and other techniques
depending on the experimental design and research question.
25. How does randomization improve the validity of an experiment?
Ans: Randomization improves the validity of an experiment by randomly dividing
people into different treatment groups. It ensures that there is no bias or
preplanned grouping.


26. What is the purpose of replication in experimental design?
Ans: The purpose of replication in experimental design is to ensure that the results
are consistent and reliable. It makes sure that the results were not a
coincidence. It is a fundamental principle that helps to validate the results.
27. What is the role of a control group in an experiment?
Ans: A control group is a group that does not receive any treatment. It is used as a
baseline to compare the outcomes of the treatment group. It ensures that the
findings in the treatment group are actually caused by the treatment itself.
28. What is the difference between correlation and causation?
Ans: Correlation shows that two variables are related but it does not mean that one
causes the other. Causation means one variable directly causes a change in another.
Correlation does not imply causation but causation always implies correlation.
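The strength of a correlation can be measured with the Pearson correlation coefficient r, which ranges from -1 to +1. Here is a minimal sketch in plain Python; the temperature and sales figures are hypothetical:

```python
import math

# Pearson correlation coefficient r for paired samples xs and ys.
def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical temperatures (°C) and ice-cream sales: they rise together,
# so r is near +1 — but that alone does not prove one causes the other.
print(correlation([20, 25, 30, 35], [200, 250, 300, 350]))
```

An r near +1 or -1 indicates a strong linear relationship; an r near 0 indicates little or none. Establishing causation would additionally require a controlled experiment like those described above.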
29. What is the difference between a population and a sample in statistics?
Ans: A population refers to the entire group of individuals or objects under study
while a sample refers to a subset of the population selected for study. A
population is typically larger than a sample. A population provides exact
information about the whole group whereas a sample provides estimates about
the population.
30. Define the term parameter in the context of statistics.
Ans: A parameter is a number that describes something about the whole population.
It describes the entire population based on all possible observations. The
parameters are usually unknown. They are often estimated using sample data.
There will always be some uncertainty about the accuracy of the estimates.
31. Differentiate between parameter and statistic.
Ans: A parameter refers to a characteristic of a population while a statistic refers to
a characteristic of a sample. A parameter describes the whole population
accurately while a statistic estimates the parameter based on sample data. A
parameter is typically unknown whereas a statistic is known when sample data
is available.
32. What are three common types of averages used to represent typical values in a
population?
Ans: The three common types of averages are mean, median and mode. The mean
is the arithmetic average of a set of values. The median is the middle value when
the data is ordered. The mode is the value that appears most frequently in the
dataset.
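Python's standard statistics module computes all three averages directly; the marks used here are hypothetical:

```python
import statistics

# Hypothetical test marks for a small class.
marks = [70, 75, 75, 80, 85, 90, 95]

print(statistics.mean(marks))    # arithmetic average of the values
print(statistics.median(marks))  # middle value of the ordered data
print(statistics.mode(marks))    # most frequently occurring value
```

For these marks the median is 80 and the mode is 75, since 75 appears twice while every other mark appears once.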

33. What is the purpose of data collection methods? Name any two.
Ans: Data collection methods are the techniques used to gather reliable data for
analysis or research purpose. The collected data is analyzed to make informed
decisions or solve problems. The two main types are primary data collection
methods and secondary data collection methods.
34. What is primary data and how does it differ from secondary data?
Ans: Primary data is collected directly from the source and has never been used
before. Secondary data refers to data that has already been collected by
someone else. Primary data is more authentic and original than secondary
data. Primary data is also typically more expensive than secondary data due to
the costs of collection and analysis.
35. What is the purpose of using interviews as a primary data collection method?
Ans: Interviews are used to collect detailed and specific information directly from
individuals. Interviews provide flexibility because the questions can be adjusted
or changed anytime according to the situation.
36. How does the observation method work in primary data collection and what
are its advantages?
Ans: The observation method is used to observe a situation and record the findings.
It can be used to evaluate the behavior of different people. It is an effective
method because it does not directly depend on other participants, which can
reduce bias.
37. What is the use of surveys and questionnaires?
Ans: Surveys and questionnaires are used to gather information from a large group
of people quickly and efficiently. They can be conducted face-to-face, by post
or over the internet to get respondents from anywhere in the world. The
answers can be yes or no, true or false, multiple choice and even open-ended
questions.
38. What is focus group in primary data collection? Give its benefit and drawback.
Ans: A focus group is similar to an interview but it is conducted with a group of
people who all have something in common. The data collected is similar to in-
person interviews. The benefit is that they offer a better understanding of why
a certain group of people thinks in a particular way. A drawback of this method
is the lack of privacy as many people are present.


39. How are oral histories used in data collection?
Ans: Oral histories involve collecting the opinions and personal experience of people
about a particular event. This method focuses on collecting details about one
particular topic or issue. People share their stories and memories through
recorded interviews.
40. How are government archives used in data collection? What challenges does
this method have?
Ans: Government archives are large collections of official documents and records.
They may contain letters, reports, registers, maps, photos, audio and videos. An
important advantage is that the information is reliable and verifiable. The
challenges include potential difficulty in accessing restricted information and
the time-consuming process of finding and retrieving relevant data.
41. What role do libraries play in secondary data collection?
Ans: The libraries have a large collection of important and authentic information on
different topics. They also have business directories, annual reports and other
similar documents that help businesses in their research.
42. How does Airbnb use A/B testing?
Ans: Airbnb uses statistical experimentation such as A/B testing to improve its
platform and user experience. For example, when testing a new feature such
as a new booking process or search method, Airbnb creates two different
versions. It randomly shows each version to different users, then collects and
analyzes data on how users interact with each version to determine which one
performs better.
43. Why is a data-driven approach important for Airbnb when making changes to
its platform?
Ans: This data-driven approach enables Airbnb to make informed decisions about
changes that can increase user satisfaction and booking rates. Airbnb performs
continuous testing and uses real data to ensure that the updates are effective
and based on actual evidence. This method leads to a more user-friendly and
efficient service.


44. What role does A/B testing play in YouTube’s platform development?
Ans: YouTube uses A/B testing to compare two versions of a new feature such as a
new recommendation algorithm or platform layout. It randomly shows each
version to different users and collects data about user interaction with each
version.
45. What type of data does YouTube collect and why?
Ans: YouTube collects data on user interactions such as clickthrough rates, watch
time and user feedback. It then evaluates the performance of each version. This
process helps YouTube to improve its algorithms, recommend more relevant
content to users and improve the features for better user satisfaction.
46. What is the purpose of data exploration in the data analysis process?
Ans: Data exploration is the initial step in data analysis process. It is used to
examine data to understand its main characteristics using visual and statistical
methods. This process helps in identifying patterns, trends and relationships
within the data.
47. Why is data cleaning important before performing statistical analysis and
visualization?
Ans: Data cleaning involves detecting and correcting errors and inconsistencies in
data to improve its quality. The unnecessary data is removed to ensure the
analysis focuses on relevant information. Summary statistics are then calculated
to understand the central tendency and dispersion. The central tendency shows
where the data is centered while the dispersion shows how scattered it is.
48. What are data products?
Ans: Data products are the tools that help businesses to improve the decision-
making processes and operation. These products are designed to solve
problems, automate processes, improve decision-making or create new
opportunities using data. The primary goal of data products is to deliver value
to users.
49. What is data visualization and why is it important?
Ans: Data visualization is the graphical representation of information and data. It
represents data using visual elements such as charts, graphs and maps. Data
visualization tools make it easier to see and understand trends and patterns in
the data. They are useful for presenting data clearly and effectively to non-
technical users. Data visualization tools are essential to analyze massive
amounts of information and make data-driven decisions.


50. What types of visual elements are commonly used in data visualization?
Ans: Some common visual elements used in data visualization include charts, graphs
and maps etc.
51. What role does Python play in data visualization?
Ans: Python plays a significant role in data visualization by providing various libraries
and tools that support the creation of a wide range of visualizations. Libraries
such as Matplotlib, Seaborn and Plotly offer functionalities for generating
charts, graphs and interactive plots, making it easier to analyze and interpret
data.
52. What is Matplotlib and what is its primary use in data visualization?
Ans: Matplotlib is the most widely used Python library for data visualization. It is
used to create static, animated and interactive visualizations in Python. It is
primarily used to represent data graphically, which makes it easier to analyze
and understand. Most of the Matplotlib utilities are available in the pyplot
submodule.
53. Which Matplotlib method is used to create a scatter plot?
Ans: The scatter() method in Matplotlib library is used to create a scatter plot. It is
used to plot data points as dots on a two-dimensional graph where each dot
represents a pair of values from the dataset.
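A minimal sketch of the scatter() method in use, assuming Matplotlib is installed; the hours-studied versus marks-obtained data is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")          # render off-screen; no display window needed
import matplotlib.pyplot as plt

# Hypothetical data: hours studied vs. marks obtained.
hours = [1, 2, 3, 4, 5, 6]
marks = [35, 48, 55, 62, 70, 81]

fig, ax = plt.subplots()
ax.scatter(hours, marks)       # each dot is one (hours, marks) pair
ax.set_xlabel("Hours studied")
ax.set_ylabel("Marks obtained")
ax.set_title("Scatter plot with Matplotlib")
fig.savefig("scatter.png")     # save to a file instead of plt.show()
```

In an interactive environment such as Google Colab, plt.show() would display the figure inline instead of saving it to a file.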
54. What types of plots can you create in Python?
Ans: Different types of plots can be created in Python such as scatter plot, line plot,
histogram, bar chart, pie chart and box plot etc.
55. What tools are commonly used for building statistical models?
Ans: The common tools for building statistical model include MS-Excel, Weka, R
Studio and Python.
56. Where can you find datasets for statistical modeling?
Ans: Datasets can be found on platforms such as www.kaggle.com and
https://github.com
57. What does the equation y = 3x + 4 represent in Python?
Ans: The equation y = 3x + 4 represents a linear relationship between the variables x
and y. It is used to generate and plot data points for statistical modelling and
analysis.
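A short sketch of generating data points from y = 3x + 4 in Python; the resulting pairs could then be plotted with a Matplotlib line or scatter plot:

```python
# Generate (x, y) pairs from the linear equation y = 3x + 4.
xs = list(range(6))            # x = 0, 1, 2, 3, 4, 5
ys = [3 * x + 4 for x in xs]   # apply y = 3x + 4 to each x

print(list(zip(xs, ys)))       # the generated data points
```

Because the relationship is perfectly linear, these points lie exactly on a straight line with slope 3 and y-intercept 4.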


58. What is Google Colab and how is it used in statistical modelling with Python?
Ans: Google Colab is an online platform that provides a Python environment for
coding and executing Python scripts. It is used for building and testing statistical
models by allowing users to write and run Python code in a browser.

