
Home Assignment Dataliteracy

The document is a home assignment focused on data literacy, covering multiple-choice questions, short answer questions, long answer questions, and Python programming tasks related to data analysis and visualization. It emphasizes the importance of data literacy in making informed decisions, understanding data, and enhancing career skills. Additionally, it includes practical exercises on data preprocessing, visualization techniques, and statistical measures.

Uploaded by

Neha Makhija
Copyright © All Rights Reserved

HOME ASSIGNMENT – DATA LITERACY

A. Multiple-choice questions
1. Which of the following best defines data literacy?
A) The ability to read and write data B) The ability to find and use data effectively
C) The ability to analyse data using AI D) The ability to collect and store data securely
Answer: B) The ability to find and use data effectively. Data literacy refers to the ability to read, interpret, create, and communicate data in a meaningful way.

2. What is the purpose of data preprocessing?


A) To make data more complex B) To make data less accessible
C) To clean and prepare data for analysis D) To increase the size of the dataset
Answer: C) To clean and prepare data for analysis. Data preprocessing involves cleaning, transforming, and organizing raw data so that it is ready for analysis.

3. How can missing data be handled in a dataset?


A) By ignoring it B) By replacing missing values with estimates
C) By deleting rows or columns with missing values D) By converting missing values to zero
Answer: B) and C). Missing values in a dataset can be handled by replacing them with estimates (for example, the mean or median of the column) or by deleting the rows or columns that contain missing values.
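A minimal sketch of options B and C in plain Python. The exam scores are hypothetical, `None` stands in for a missing value, and filling with the mean is only one possible estimate (the median or a domain-specific value are common alternatives):

```python
from statistics import mean

# Hypothetical exam scores; None marks a missing value
scores = [72, None, 85, 90, None, 68]

# Option C: delete the entries (rows) that contain missing values
dropped = [s for s in scores if s is not None]

# Option B: replace each missing value with an estimate, here the mean of the known scores
fill_value = mean(dropped)
imputed = [fill_value if s is None else s for s in scores]

print(dropped)  # [72, 85, 90, 68]
print(imputed)  # [72, 78.75, 85, 90, 78.75, 68]
```

In pandas, the same two strategies are usually written with `DataFrame.dropna()` and `DataFrame.fillna()`.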

4. Which of the following statements about the quantity of data needed for machine learning projects is true?
A) More data is always better for good predictions.
B) Small batches of data are sufficient for complex models.
C) Data quantity depends solely on the number of features.
D) Data diversity is not essential for model performance.
Answer: A) More data is always better for good predictions. Machine learning models generally perform better with more data because it helps the model learn patterns and generalize to new data, which reduces overfitting.

5. Which of the following is an example of a primary source of data collection?


A) Web scraping B) Social media data tracking C) Surveys D) Kaggle datasets

6. What method of data collection involves direct communication with individuals or groups to gather information?
A) Observations B) Experiments C) Interviews D) Marketing campaigns

7. Which of the following is an example of ratio scale data?


A) Grading students' exam papers as "A," "B," "C," "D," and "F"
B) Measuring the temperature in Celsius
C) Rating a meal at a restaurant as "unpalatable," "unappetizing," "just okay," "tasty," and "delicious"
D) Recording the weight of a person in kilograms

8. What is the distinguishing feature of ratio scale data?


A) It involves categories without a specific order
B) It has a zero point and allows for ratios to be calculated
C) It involves categories with a strict order but no measurable differences between categories
D) It has a definite order, but the differences between categories cannot be measured

Answer: B) It has a zero point and allows for ratios to be calculated. The distinguishing feature of ratio scale data is the presence of a true zero point, which indicates the absence of the quantity being measured. This allows meaningful comparisons using ratios (e.g., "twice as much" or "half as much"). Examples include weight, height, age, and distance.

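To see why the true zero matters, the short Python sketch below compares a ratio scale (weight in kilograms) with an interval scale (temperature in Celsius). All values are hypothetical:

```python
# Ratio scale: weight in kilograms has a true zero (0 kg means no weight),
# so dividing one weight by another gives a meaningful ratio.
weight_a_kg, weight_b_kg = 80, 40
weight_ratio = weight_a_kg / weight_b_kg
print(weight_ratio)  # 2.0 -> person A weighs twice as much as person B

# Celsius is an interval scale: 0 °C does not mean "no temperature",
# so the same division gives a misleading ratio.
temp_a_c, temp_b_c = 20, 10
celsius_ratio = temp_a_c / temp_b_c
print(celsius_ratio)  # 2.0, yet 20 °C is not twice as hot as 10 °C

# Kelvin has a true zero, so the ratio computed in kelvin is the meaningful one.
kelvin_ratio = (temp_a_c + 273.15) / (temp_b_c + 273.15)
print(round(kelvin_ratio, 3))  # 1.035
```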
9. Which statistical measure is most suitable for data sets with evenly spread values and no exceptionally high or
low values?
A) Mean B) Median C) Mode D) Variance
Answer: A) Mean. The mean is the most suitable measure for data sets with evenly spread values and no outliers because it uses every data point, and without extreme values nothing pulls the average away from the center of the data.
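A quick check with Python's `statistics` module, using hypothetical values: when the data are evenly spread, the mean and median agree, but a single extreme value pulls the mean away while the median stays put:

```python
from statistics import mean, median

# Evenly spread values with no outliers: mean and median agree
even_values = [10, 12, 14, 16, 18]
print(mean(even_values), median(even_values))  # 14 14

# Replace the largest value with an outlier: the mean shifts, the median does not
with_outlier = [10, 12, 14, 16, 100]
print(mean(with_outlier), median(with_outlier))  # 30.4 14
```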

10. What is the term used to describe the graphical or pictorial representation of data?
A) Statistical summary B) Data organization
C) Data visualization D) Data interpretation

B. Short answer questions:

1. Explain the concept of data literacy and its importance in today's digital age.
Answer: Data literacy is the ability to read, understand, analyse and communicate data effectively. It enables individuals to make informed decisions, assess data critically, and solve problems using data.

Importance in Today's Digital Age:


• Informed Decisions: Helps people make decisions based on facts rather than assumptions.
• Navigating Data: In a data-driven world, it allows individuals to use data for personal and professional benefit.
• Critical Thinking: Enables evaluation of data quality and credibility, helping to avoid bias.
• Career Skills: Data literacy is increasingly valued across industries, improving job prospects.
• Digital Citizenship: Promotes responsible data use and an understanding of privacy and ethics.

2. What is data preprocessing?(HW)


3. What is data visualization and why is it important?
Answer: Data visualization is the process of representing data graphically or visually through charts, graphs, maps, or diagrams. It simplifies complex data sets, making them easier to understand, interpret, and analyze.

Data visualization is important in data analysis because it:
• Simplifies complex data.
• Reveals patterns, trends, and outliers.
• Aids data-driven decision-making.
• Enhances communication and understanding.
• Saves time in interpreting data.

4. How does a line graph differ from a bar graph?(HW)


5. When would you use a scatter plot?
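Answer: A scatter plot is used to visualize the relationship or correlation between two numerical variables. Each point represents one observation, with one variable on the x-axis and the other on the y-axis. Scatter plots are ideal for checking whether two variables move together, show no relationship, or contain outliers.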
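A minimal matplotlib sketch of a scatter plot. The hours-studied and exam-score values are hypothetical, chosen only to show a positive relationship:

```python
import matplotlib.pyplot as plt

# Hypothetical data: hours studied and exam score for eight students
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_scores = [52, 55, 61, 64, 70, 72, 79, 83]

fig, ax = plt.subplots(figsize=(6, 4))

# Each point is one student: x = hours studied, y = exam score
ax.scatter(hours_studied, exam_scores, color='b')

ax.set_title('Hours Studied vs. Exam Score')
ax.set_xlabel('Hours Studied')
ax.set_ylabel('Exam Score')
ax.grid(True)
plt.show()
```

If the points rise from left to right, the two variables are positively correlated; a downward drift suggests a negative correlation, and a shapeless cloud suggests little or no linear relationship.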

6. What is data?(HW)
7. What do you mean by web scraping?(HW)

8. If a matrix has 6 elements, what are the possible orders it can have?(HW)

9. Construct a 3 × 2 matrix where each element is given by $a_{ij} = i \ast$ ?(HW)

10. Find the transpose of the matrix $B = \begin{bmatrix} 5 & -1 & 4 \\ 2 & 3 & 6 \end{bmatrix}$ ?(HW)

C. Long answer questions:
1. Discuss the advantages and limitations of using a pie chart in data visualization. Provide examples to illustrate
your points.
Answer:

Advantages of Pie Charts

1. Simple and Clear: Easily shows proportions and percentages.
   Example: Market share distribution.
2. Quick Understanding: Effective for non-technical audiences.
   Example: Election vote percentages.
3. Good for Few Categories: Best with 3-4 categories.
   Example: Survey results like "Yes," "No," "Maybe."

Limitations of Pie Charts

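1. Lacks Precision: Hard to compare similar-sized segments.
   Example: Two products with nearly equal market shares look almost identical.
2. Overcrowding: Becomes unreadable with many categories.
   Example: Sales split across 15 product categories produces many thin, cluttered slices.
3. No Trends: Cannot show changes over time.
   Example: Monthly sales over a year are better shown with a line chart.
4. Needs Context: Percentages alone can be misleading.
   Example: "50% of respondents" means something very different for 10 respondents than for 10,000.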
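The precision limitation can be sketched by drawing the same hypothetical market-share data as a pie chart and as a bar chart side by side. This assumes matplotlib 3.4 or later, which added `Axes.bar_label`:

```python
import matplotlib.pyplot as plt

# Hypothetical market shares (%) with nearly equal segments
products = ['Product A', 'Product B', 'Product C']
shares = [34, 33, 33]

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Pie chart: the three slices are almost impossible to tell apart
ax_pie.pie(shares, labels=products, startangle=90)
ax_pie.set_title('Pie chart')

# Bar chart of the same data: a value axis plus labelled bars make small differences readable
bars = ax_bar.bar(products, shares, color='skyblue')
ax_bar.bar_label(bars)
ax_bar.set_ylabel('Market share (%)')
ax_bar.set_title('Bar chart')

plt.tight_layout()
plt.show()
```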

2. Explain the terms mean, median and mode?(HW)


3. Explain the four levels of measurement?(HW)
4. Given the matrices A and B, calculate A + B and B − A.(HW)

D. Python Programs
1. The ages of a group of people in a community are: 25, 28, 30, 35, 40, 45, 50, 55, 60, 65.
Write a program to calculate the mean, median, and mode of the ages.
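One possible program, using Python's built-in `statistics` and `collections` modules. Because every age in the list occurs exactly once, the data set has no mode; `statistics.mode()` would silently return the first value (25) in Python 3.8 and later, so this sketch checks the counts explicitly:

```python
from collections import Counter
from statistics import mean, median

ages = [25, 28, 30, 35, 40, 45, 50, 55, 60, 65]

mean_age = mean(ages)      # sum of ages / number of people
median_age = median(ages)  # even count, so the average of the two middle values

# A mode exists only if at least one age repeats
counts = Counter(ages)
highest_count = max(counts.values())
mode_ages = [age for age, c in counts.items() if c == highest_count] if highest_count > 1 else []

print("Mean:", mean_age)      # Mean: 43.3
print("Median:", median_age)  # Median: 42.5
print("Mode:", mode_ages if mode_ages else "no mode (every age occurs once)")
```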

2. A company recorded the daily temperatures (in degrees Celsius) for five consecutive days:
20°C, 22°C, 25°C, 18°C, and 23°C. Determine the variance and standard deviation of the temperatures.
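Worked by hand, the mean is (20 + 22 + 25 + 18 + 23) / 5 = 21.6 °C and the squared deviations from the mean sum to 29.2. The question does not say whether the five days are the whole data set or a sample, so the sketch below, using Python's `statistics` module, reports both the population version (divide by n = 5) and the sample version (divide by n − 1 = 4):

```python
from statistics import mean, pstdev, pvariance, stdev, variance

temperatures = [20, 22, 25, 18, 23]  # daily temperatures in °C

# Population formulas: divide the sum of squared deviations by n
pop_var = pvariance(temperatures)
pop_sd = pstdev(temperatures)

# Sample formulas: divide the sum of squared deviations by n - 1
samp_var = variance(temperatures)
samp_sd = stdev(temperatures)

print(f"Mean: {mean(temperatures)} °C")                   # Mean: 21.6 °C
print(f"Population variance: {pop_var:.2f} °C²")          # Population variance: 5.84 °C²
print(f"Population standard deviation: {pop_sd:.2f} °C")  # Population standard deviation: 2.42 °C
print(f"Sample variance: {samp_var:.2f} °C²")             # Sample variance: 7.30 °C²
print(f"Sample standard deviation: {samp_sd:.2f} °C")     # Sample standard deviation: 2.70 °C
```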

3. Plot a line chart representing the weekly number of customer inquiries received by a
customer service center:
• Week 1: 150 inquiries
• Week 2: 170 inquiries
• Week 3: 180 inquiries
• Week 4: 200 inquiries

Answer:
import matplotlib.pyplot as plt

# Weekly customer inquiries
weeks = ['Week 1', 'Week 2', 'Week 3', 'Week 4']
inquiries = [150, 170, 180, 200]

# Plotting the line chart (solid blue line with circle markers)
plt.plot(weeks, inquiries, marker='o', color='b', linestyle='-', linewidth=2, markersize=6)

# Adding title and labels
plt.title('Weekly Number of Customer Inquiries', fontsize=14)
plt.xlabel('Week', fontsize=12)
plt.ylabel('Number of Inquiries', fontsize=12)

# Adding a grid and displaying the plot
plt.grid(True)
plt.show()

4. Plot a bar chart representing the number of books sold by different genres in a bookstore:
• Fiction: 120 books
• Mystery: 90 books
• Science Fiction: 80 books
• Romance: 110 books
• Biography: 70 books
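The question leaves the styling open; one possible matplotlib sketch, following the same approach as the line chart in question 3 (the colors and figure size are arbitrary choices):

```python
import matplotlib.pyplot as plt

# Books sold per genre
genres = ['Fiction', 'Mystery', 'Science Fiction', 'Romance', 'Biography']
books_sold = [120, 90, 80, 110, 70]

fig, ax = plt.subplots(figsize=(8, 5))

# One bar per genre; bar height = number of books sold
ax.bar(genres, books_sold, color='skyblue', edgecolor='black')

# Adding title and labels
ax.set_title('Number of Books Sold by Genre', fontsize=14)
ax.set_xlabel('Genre', fontsize=12)
ax.set_ylabel('Books Sold', fontsize=12)

plt.show()
```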

5. Visualize the distribution of different types of transportation used by commuters in a city using a pie chart:
• Car: 40%
• Public Transit: 30%
• Walking: 20%
• Bicycle: 10%

Answer:
import matplotlib.pyplot as plt

# Data for transportation distribution
transport_modes = ['Car', 'Public Transit', 'Walking', 'Bicycle']
percentages = [40, 30, 20, 10]
clr = ['skyblue', 'orange', 'lightgreen', 'gold']

# Plotting the pie chart:
# autopct prints each slice's percentage, startangle=90 starts the first slice at the top,
# and explode pulls the 'Car' slice slightly out from the center
plt.figure(figsize=(8, 6))
plt.pie(percentages, labels=transport_modes, autopct='%1.1f%%', colors=clr, startangle=90, explode=(0.1, 0, 0, 0))

# Adding title
plt.title('Distribution of Transportation Modes Among Commuters', fontsize=14)

# Displaying the pie chart
plt.show()
