Here is a simplified Python script for the given problem. It assumes you have a dataset
(e.g., a CSV file) with student names and their scores in several subjects (e.g., Math, Science, English).
### Python Code
```python
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load the dataset into a Pandas DataFrame
# Replace 'your_file.csv' with the path to your dataset
df = pd.read_csv('your_file.csv')
# Handle missing numeric values by replacing them with the mean of the respective column
df = df.fillna(df.mean(numeric_only=True))
# Calculate the average score for each student
df['Average_Score'] = df.iloc[:, 1:].mean(axis=1) # Assuming the first column is student names
# Categorize students into performance levels
def categorize_performance(avg_score):
    if avg_score >= 80:
        return 'High'
    elif avg_score >= 50:
        return 'Medium'
    else:
        return 'Low'
df['Performance_Category'] = df['Average_Score'].apply(categorize_performance)
# Identify the subject with the highest average score across students
subject_avg_scores = df.iloc[:, 1:-2].mean()
highest_avg_subject = subject_avg_scores.idxmax()
# Determine the number of students in each performance category
category_counts = df['Performance_Category'].value_counts()
# Visualization: Bar chart for average score per subject
subject_avg_scores.plot(kind='bar', title='Average Score Per Subject', ylabel='Average Score',
xlabel='Subjects', color='skyblue')
plt.show()
# Visualization: Pie chart for performance category distribution
category_counts.plot(kind='pie', autopct='%1.1f%%', title='Performance Category Distribution',
ylabel='')
plt.show()
```
### Explanation
1. **Data Loading and Cleaning:**
- Loads a CSV file into a Pandas DataFrame.
- Handles missing values by replacing them with the column mean.
2. **Data Manipulation:**
- Calculates the average score for each student.
- Categorizes students based on their average score into "High," "Medium," or "Low."
3. **Analysis:**
- Finds the subject with the highest average score across all students.
- Counts the number of students in each performance category.
4. **Visualization:**
- Creates a bar chart showing the average scores for each subject.
- Creates a pie chart showing the percentage of students in each performance category.
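The averaging and categorization steps above can be sketched on a tiny inline DataFrame; the names and scores below are made up purely for illustration:

```python
import pandas as pd

# Small illustrative dataset (all values are made up)
df = pd.DataFrame({
    'Name': ['Asha', 'Ben', 'Chen'],
    'Math': [90, 55, 40],
    'Science': [85, 60, 30],
    'English': [80, 50, 45],
})

# Row-wise mean over the score columns
df['Average_Score'] = df[['Math', 'Science', 'English']].mean(axis=1)

def categorize_performance(avg_score):
    if avg_score >= 80:
        return 'High'
    elif avg_score >= 50:
        return 'Medium'
    else:
        return 'Low'

df['Performance_Category'] = df['Average_Score'].apply(categorize_performance)
print(df[['Name', 'Average_Score', 'Performance_Category']])
```

Here Asha averages 85 ("High"), Ben 55 ("Medium"), and Chen about 38.3 ("Low"), which exercises all three branches of the categorizer.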
### Instructions to Run
1. Upload your dataset (e.g., `your_file.csv`) to Google Colab.
2. Replace `'your_file.csv'` in the code with the actual file path.
3. Run the code cells step-by-step in Google Colab.
### Sample Output (Assuming Example Dataset)
**Bar Chart:**
Displays a bar chart with average scores for Math, Science, and English.
**Pie Chart:**
Shows a pie chart with categories like "High" (30%), "Medium" (50%), and "Low" (20%).
**Console Output:**
- Subject with the highest average score: `Science`
- Performance category counts:
```
Medium 5
High 3
Low 2
Name: Performance_Category, dtype: int64
```
Here’s a concise Python script that you can run in Google Colab to analyze a COVID-19 dataset as
described in the question.
### Python Code
```python
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
# Load the dataset
# Replace 'covid_data.csv' with the path to your dataset
df = pd.read_csv('covid_data.csv')
# Handle missing values and duplicates
df.fillna(0, inplace=True)
df.drop_duplicates(inplace=True)
# Parse 'Date' and extract Year, Month, and Day
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
# Add a new column for daily new cases (computed per country, in date order,
# so one country's totals are never subtracted from another's)
df = df.sort_values(['Country', 'Date'])
df['New_Cases'] = df.groupby('Country')['Total_Cases'].diff().fillna(0)
# Calculate total cases and deaths globally (latest cumulative total per country)
total_cases = df.groupby('Country')['Total_Cases'].max().sum()
total_deaths = df.groupby('Country')['Total_Deaths'].max().sum()
# Identify the country with the highest number of cases and deaths
country_cases = df.groupby('Country')['Total_Cases'].max()
country_deaths = df.groupby('Country')['Total_Deaths'].max()
highest_cases_country = country_cases.idxmax()
highest_deaths_country = country_deaths.idxmax()
# Filter the data to the last 30 days for a closer look at recent trends
last_30_days = df[df['Date'] >= (df['Date'].max() - pd.Timedelta(days=30))]
# Visualization: Line chart for total cases trend
df.groupby('Date')['Total_Cases'].sum().plot(kind='line', title='Trend of Total COVID-19 Cases Over Time',
                                             ylabel='Total Cases', xlabel='Date')
plt.show()
# Bar chart for top 5 countries with the highest cases
top_5_countries = country_cases.nlargest(5)
top_5_countries.plot(kind='bar', title='Top 5 Countries with Highest Cases', ylabel='Total Cases',
xlabel='Countries', color='orange')
plt.show()
# Pie chart for proportion of cases in continents
continent_cases = df.groupby('Continent')['Total_Cases'].sum()
continent_cases.plot(kind='pie', autopct='%1.1f%%', title='Proportion of Cases by Continent', ylabel='')
plt.show()
# Print key results (with thousands separators for readability)
print(f"Total cases globally: {total_cases:,}")
print(f"Total deaths globally: {total_deaths:,}")
print(f"Country with highest cases: {highest_cases_country}")
print(f"Country with highest deaths: {highest_deaths_country}")
```
---
### Explanation
1. **Data Loading and Cleaning:**
- The dataset is loaded into a DataFrame, missing values are replaced with 0, and duplicates are
dropped.
2. **Data Manipulation:**
- Calculates daily new cases (`New_Cases`).
- Extracts `Year`, `Month`, and `Day` from the `Date` column for analysis.
3. **Analysis:**
- Computes total global cases and deaths.
- Identifies the countries with the highest cases and deaths.
- Filters data for the last 30 days to analyze trends.
4. **Visualization:**
- **Line Chart:** Shows the trend of total cases over time.
- **Bar Chart:** Displays the top 5 countries with the highest cases.
- **Pie Chart:** Shows the proportion of cases by continent.
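The `New_Cases` step deserves care when the file contains several countries: a plain `diff()` would subtract the last row of one country from the first row of the next. A minimal sketch of the per-country version, using a tiny made-up frame:

```python
import pandas as pd

# Tiny illustrative frame with two countries (all values are made up)
df = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03'] * 2),
    'Total_Cases': [10, 15, 25, 100, 120, 150],
})

# Sort so diff() runs in date order within each country
df = df.sort_values(['Country', 'Date'])
# groupby('Country') keeps each country's series separate;
# the first day of each country has no prior value, hence fillna(0)
df['New_Cases'] = df.groupby('Country')['Total_Cases'].diff().fillna(0)
print(df)
```

Country B's first row gets 0 rather than `100 - 25 = 75`, which is the whole point of grouping before differencing.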
---
### Instructions to Run
1. Upload your dataset (e.g., `covid_data.csv`) to Google Colab.
2. Replace `'covid_data.csv'` in the code with the file name.
3. Run each code cell step-by-step to load, analyze, and visualize the data.
---
### Sample Output (Assuming Example Dataset)
**Console Output:**
```
Total cases globally: 500,000,000
Total deaths globally: 5,000,000
Country with highest cases: USA
Country with highest deaths: Brazil
```
**Visualizations:**
1. Line chart showing the rising trend of total cases globally.
2. Bar chart highlighting the top 5 countries with the highest total cases.
3. Pie chart dividing the proportion of cases by continent.
Here’s a simple Python script that you can run in Google Colab to analyze a sales dataset as described
in the question.
### Python Code
```python
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
# Load the dataset
# Replace 'sales_data.csv' with the path to your dataset
df = pd.read_csv('sales_data.csv')
# Handle missing values and duplicates
df.fillna(0, inplace=True)
df.drop_duplicates(inplace=True)
# Add a new column for total revenue
df['Total_Revenue'] = df['Quantity'] * df['Price']
# Group by product category to calculate total revenue and number of items sold
category_summary = df.groupby('Product_Category').agg(
    Total_Revenue=('Total_Revenue', 'sum'),
    Total_Quantity=('Quantity', 'sum'),
)
# Identify the top 3 products generating the highest revenue
top_products = df.groupby('Product').agg(Total_Revenue=('Total_Revenue', 'sum')).nlargest(3, 'Total_Revenue')
# Determine the month with the highest total sales
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.to_period('M')
monthly_sales = df.groupby('Month').agg(Total_Revenue=('Total_Revenue', 'sum'))
highest_sales_month = monthly_sales['Total_Revenue'].idxmax()
# Visualization: Bar chart for total revenue by product category
category_summary['Total_Revenue'].plot(kind='bar', title='Total Revenue by Product Category',
ylabel='Total Revenue', xlabel='Product Category', color='green')
plt.show()
# Visualization: Line graph for monthly sales trends
monthly_sales.plot(kind='line', title='Monthly Sales Trends', ylabel='Total Revenue', xlabel='Month',
marker='o', color='blue')
plt.show()
# Print key results
print("Top 3 products generating highest revenue:")
print(top_products)
print(f"Month with highest total sales: {highest_sales_month}")
```
---
### Explanation
1. **Data Loading and Cleaning:**
- Loads the sales dataset into a Pandas DataFrame.
- Handles missing values by replacing them with 0 and removes duplicate entries.
2. **Data Manipulation:**
- Calculates `Total_Revenue` for each transaction as `Quantity × Price`.
- Groups the data by `Product_Category` to calculate total revenue and number of items sold.
3. **Analysis:**
- Identifies the top 3 products generating the highest revenue.
- Determines the month with the highest total sales.
4. **Visualization:**
- **Bar Chart:** Displays total revenue by product category.
- **Line Graph:** Shows monthly sales trends.
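The grouped summary relies on pandas named aggregation (keyword arguments to `.agg()` of the form `new_name=(column, function)`). On a tiny made-up frame it looks like this:

```python
import pandas as pd

# Illustrative transactions (all values are made up)
df = pd.DataFrame({
    'Product_Category': ['Electronics', 'Electronics', 'Furniture'],
    'Quantity': [2, 1, 4],
    'Price': [300.0, 500.0, 50.0],
})
df['Total_Revenue'] = df['Quantity'] * df['Price']

# Named aggregation: output column name = (input column, aggregation)
category_summary = df.groupby('Product_Category').agg(
    Total_Revenue=('Total_Revenue', 'sum'),
    Total_Quantity=('Quantity', 'sum'),
)
print(category_summary)
```

Electronics ends up with revenue 1100.0 across 3 items, Furniture with 200.0 across 4, and the output columns carry the names you chose rather than pandas defaults.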
---
### Instructions to Run
1. Upload your dataset (e.g., `sales_data.csv`) to Google Colab.
2. Replace `'sales_data.csv'` in the code with your dataset's filename.
3. Run each code cell step-by-step to analyze and visualize the data.
---
### Sample Output (Assuming Example Dataset)
**Console Output:**
```
Top 3 products generating highest revenue:
Total_Revenue
Product
Product_A 100000.00
Product_B 80000.00
Product_C 75000.00
Month with highest total sales: 2024-05
```
**Visualizations:**
1. **Bar Chart:** Shows total revenue for categories like "Electronics," "Furniture," etc.
2. **Line Graph:** Displays sales trends over months with peaks and valleys.
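The monthly grouping behind the "month with highest total sales" result uses `dt.to_period('M')`, which collapses every date to its calendar month. A quick made-up example:

```python
import pandas as pd

# Three illustrative transactions across two months (values are made up)
df = pd.DataFrame({
    'Date': pd.to_datetime(['2024-04-10', '2024-05-02', '2024-05-20']),
    'Total_Revenue': [100.0, 400.0, 300.0],
})

# to_period('M') maps each date to its month, e.g. 2024-05-02 -> 2024-05
df['Month'] = df['Date'].dt.to_period('M')
monthly_sales = df.groupby('Month')['Total_Revenue'].sum()
print(monthly_sales.idxmax())  # the Period for May, printed as 2024-05
```

May totals 700.0 against April's 100.0, so `idxmax()` returns the May `Period`, matching the `2024-05`-style output shown above.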
Below is a Python code template for the tourism data analysis problem described. You'll need a
tourism dataset in CSV format to run it. The code includes the required steps, explanations, and
instructions to execute it in Google Colab.
### Code
```python
# Step 1: Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
# Step 2: Load the Dataset
# Running this cell prompts you to upload your CSV (e.g., tourism_data.csv)
from google.colab import files
uploaded = files.upload()  # Upload the dataset
data = pd.read_csv(list(uploaded.keys())[0])
# Step 3: Data Cleaning
data.drop_duplicates(inplace=True) # Remove duplicate rows
data.dropna(inplace=True) # Drop rows with missing values
# Step 4: Data Manipulation
# Add Total Visitors column
data['Total_Visitors'] = data['Domestic_Visitors'] + data['International_Visitors']
# Extract year and month from the 'Date' column
data['Date'] = pd.to_datetime(data['Date'])
data['Year'] = data['Date'].dt.year
data['Month'] = data['Date'].dt.month
# Step 5: Analysis
# Identify the record (date) with the highest total visitors
highest_month = data.loc[data['Total_Visitors'].idxmax()]
# Calculate the average number of visitors per year
average_visitors_per_year = data.groupby('Year')['Total_Visitors'].mean()
# Proportion of domestic vs international visitors by year
proportion = data.groupby('Year')[['Domestic_Visitors', 'International_Visitors']].sum()
proportion['Domestic_Proportion'] = proportion['Domestic_Visitors'] / (
    proportion['Domestic_Visitors'] + proportion['International_Visitors'])
proportion['International_Proportion'] = proportion['International_Visitors'] / (
    proportion['Domestic_Visitors'] + proportion['International_Visitors'])
# Step 6: Visualization
# Bar Chart - Total Visitors per Month
monthly_totals = data.groupby('Month')['Total_Visitors'].sum()
monthly_totals.plot(kind='bar', title='Total Visitors Per Month', ylabel='Visitors', xlabel='Month')
plt.show()
# Pie Chart - Proportion of Domestic vs International Visitors
latest_year = data['Year'].max()
latest_data = proportion.loc[latest_year]
latest_data[['Domestic_Proportion', 'International_Proportion']].plot(kind='pie', autopct='%1.1f%%',
title=f'Domestic vs International Visitors ({latest_year})', ylabel='')
plt.show()
# Line Graph - Trend of Total Visitors Over the Years
yearly_totals = data.groupby('Year')['Total_Visitors'].sum()
yearly_totals.plot(kind='line', title='Total Visitors Over the Years', ylabel='Visitors', xlabel='Year')
plt.show()
# Step 7: Output Results
print("Month with Highest Total Visitors:")
print(highest_month)
print("\nAverage Visitors Per Year:")
print(average_visitors_per_year)
print("\nProportion of Domestic vs International Visitors by Year:")
print(proportion)
```
---
### Instructions to Run in Google Colab
1. **Upload the Dataset**: When you run the `files.upload()` block, it will prompt you to upload
your file (e.g., `tourism_data.csv`); the code then reads whichever file you upload.
2. **Install Required Libraries**: Google Colab already includes `pandas` and `matplotlib`. No
additional installations are necessary.
3. **Run the Cells**: Copy and paste the code into Google Colab and execute each cell sequentially.
---
### Explanation of the Code
1. **Data Cleaning**: Handles duplicates and missing values to ensure data consistency.
2. **Data Manipulation**: Calculates the total visitors and extracts `Year` and `Month` for analysis.
3. **Analysis**:
- Finds the month with the highest visitors.
- Calculates average yearly visitors.
- Analyzes proportions of domestic and international visitors.
4. **Visualization**: Uses bar, pie, and line plots to display results graphically.
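The proportion step normalizes each year's visitor counts by that year's total, so the two proportion columns always sum to 1. A minimal sketch with made-up yearly totals:

```python
import pandas as pd

# Illustrative yearly totals (all numbers are made up)
proportion = pd.DataFrame(
    {'Domestic_Visitors': [2_000_000, 1_200_000],
     'International_Visitors': [700_000, 500_000]},
    index=pd.Index([2019, 2020], name='Year'),
)

# Divide each column by the year's combined total
total = proportion['Domestic_Visitors'] + proportion['International_Visitors']
proportion['Domestic_Proportion'] = proportion['Domestic_Visitors'] / total
proportion['International_Proportion'] = proportion['International_Visitors'] / total
print(proportion.round(3))
```

For 2019 the domestic share is 2,000,000 / 2,700,000 ≈ 0.741, and each row's two proportions sum to 1 by construction.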
---
### Output
The output includes:
1. **Text Outputs**:
- The month with the highest visitors.
- The average number of visitors per year.
- Proportion data.
2. **Graphs**:
- A bar chart for total visitors per month.
- A pie chart for domestic vs international visitor proportions.
- A line graph showing the trend of total visitors over the years.
Run the code to view the exact outputs based on your dataset. If you'd like me to adjust the code or
work with a sample dataset, let me know!
Here's an example of what the output might look like with a fictional tourism dataset. This will give
you an idea of the expected results:
### **Sample Text Output**
#### **Month with Highest Total Visitors**
```
Date                      2023-07-01 00:00:00
Domestic_Visitors                      500000
International_Visitors                 300000
Total_Visitors                         800000
Year                                     2023
Month                                       7
Name: 189, dtype: object
```
#### **Average Visitors Per Year**
```
Year
2019    450000.0
2020    200000.0
2021    350000.0
2022    500000.0
2023    600000.0
Name: Total_Visitors, dtype: float64
```
#### **Proportion of Domestic vs International Visitors by Year**
```
      Domestic_Visitors  International_Visitors  Domestic_Proportion  International_Proportion
Year
2019            2000000                  700000                0.741                     0.259
2020            1200000                  500000                0.706                     0.294
2021            1500000                  700000                0.682                     0.318
2022            2000000                 1000000                0.667                     0.333
2023            2500000                 1300000                0.658                     0.342
```
---
### **Sample Visualizations**
1. **Bar Chart: Total Visitors Per Month**
A bar chart showing total visitors for each month, with July as the peak month.
2. **Pie Chart: Proportion of Domestic vs International Visitors (2023)**
A pie chart for 2023 might show:
- **65.8% Domestic Visitors**
- **34.2% International Visitors**
3. **Line Graph: Total Visitors Over the Years**
A line graph showing a general upward trend in tourism, with a dip in 2020 (consistent with
pandemic-era travel restrictions) and steady growth afterward.
---
### **Key Notes**
- The outputs will vary depending on your dataset.
- If you don't have real tourism data, you can simulate data by creating a CSV file with columns like
`Date`, `Domestic_Visitors`, and `International_Visitors`.
- Let me know if you’d like help generating sample data for testing!
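Following up on the note about simulating data: a small synthetic CSV can be generated like this. The column names match the code above; the date range, value ranges, and file name are illustrative assumptions.

```python
import pandas as pd
import numpy as np

# Seeded generator so the sample file is reproducible
rng = np.random.default_rng(42)

# One row per month from Jan 2019 through Dec 2023 (60 rows)
dates = pd.date_range('2019-01-01', '2023-12-01', freq='MS')
sample = pd.DataFrame({
    'Date': dates,
    'Domestic_Visitors': rng.integers(100_000, 500_000, len(dates)),
    'International_Visitors': rng.integers(30_000, 200_000, len(dates)),
})
sample.to_csv('tourism_data.csv', index=False)
print(len(sample), 'rows written')
```

Upload the resulting `tourism_data.csv` when prompted by `files.upload()`, and the analysis code runs end to end on the simulated data.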
Here is how you can run the bank campaign analysis code on **Google Colab**, step-by-step:
---
### **Step 1: Open Google Colab**
1. Go to [Google Colab](https://colab.research.google.com/).
2. Create a new notebook by clicking on **"File > New Notebook"**.
---
### **Step 2: Upload the Dataset**
1. Save your dataset (e.g., `bank_campaign_data.csv`) on your local machine.
2. In Google Colab, click on the folder icon in the left sidebar.
3. Click the upload icon and upload your dataset.
---
### **Step 3: Run the Code**
1. Copy and paste the following Python code into a code cell in Colab:
```python
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Step 1: Load the dataset
from google.colab import files
uploaded = files.upload() # Upload the dataset here
file_path = list(uploaded.keys())[0] # Get the uploaded file name
data = pd.read_csv(file_path)
# Step 2: Data Cleaning
# Handle missing values with a forward fill
data.ffill(inplace=True)
# Drop duplicate entries
data.drop_duplicates(inplace=True)
# Step 3: Data Manipulation
# Add a column for Contacted_Last_Month
data['Contacted_Last_Month'] = data['campaign'].apply(lambda x: 'Yes' if x > 0 else 'No')
# One-hot encode categorical variables into a separate frame, so the original
# 'job' column stays available for the analysis and plots below
categorical_cols = ['job', 'marital', 'education']
data_encoded = pd.get_dummies(data, columns=categorical_cols, drop_first=True)
# Step 4: Analysis
# Average age of customers who subscribed
avg_age = data[data['y'] == 'yes']['age'].mean()
# Most common job category for subscribed customers
most_common_job = data[data['y'] == 'yes']['job'].mode()[0]
# Proportion of subscribed customers
subscribed_proportion = len(data[data['y'] == 'yes']) / len(data)
# Step 5: Visualization
# Bar chart showing subscription rate by job
sns.countplot(x='job', hue='y', data=data)
plt.title('Subscription Rate by Job')
plt.xticks(rotation=45)
plt.show()
# Pie chart showing subscription proportion (label by value, not by position)
data['y'].value_counts().rename({'no': 'Not Subscribed', 'yes': 'Subscribed'}).plot.pie(autopct='%1.1f%%')
plt.title('Subscription Proportion')
plt.ylabel('')
plt.show()
# Histogram for age distribution
data['age'].plot.hist(bins=10)
plt.title('Distribution of Customer Ages')
plt.xlabel('Age')
plt.show()
# Print analysis results
print(f"Average Age of Subscribed Customers: {avg_age:.2f}")
print(f"Most Common Job for Subscribed Customers: {most_common_job}")
print(f"Proportion of Subscribed Customers: {subscribed_proportion:.2%}")
```
2. Run the cell.
3. When prompted, upload your dataset (e.g., `bank_campaign_data.csv`).
---
### **Sample Output**:
1. The console will display:
```
Average Age of Subscribed Customers: 41.20
Most Common Job for Subscribed Customers: admin
Proportion of Subscribed Customers: 12.50%
```
2. Visualizations:
- **Bar Chart**: Subscription rate by job category.
- **Pie Chart**: Proportion of subscribed vs. not subscribed customers.
- **Histogram**: Age distribution of customers.
---
### **Note**:
Make sure your dataset includes the necessary columns like `age`, `job`, `campaign`, `y`, and other
required fields. Adjust column names in the code if they differ in your dataset.
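One subtlety worth noting: `pd.get_dummies` replaces the original categorical columns, so any code that later reads `data['job']` would fail if the encoding were done in place. Keeping the encoded frame separate avoids this; a minimal sketch with made-up rows:

```python
import pandas as pd

# Tiny illustrative frame (all rows are made up)
data = pd.DataFrame({
    'age': [30, 45, 52],
    'job': ['admin', 'technician', 'admin'],
    'y': ['yes', 'no', 'yes'],
})

# Encode into a NEW frame so 'job' stays available for mode() and plots
data_encoded = pd.get_dummies(data, columns=['job'], drop_first=True)

most_common_job = data[data['y'] == 'yes']['job'].mode()[0]
print(most_common_job)               # raw 'job' column still usable
print(list(data_encoded.columns))    # 'job' replaced by dummy columns here
```

The original frame keeps its `job` column for the analysis, while `data_encoded` holds the numeric dummies (with `drop_first=True` dropping one category per variable) ready for any downstream modeling.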