Practical programs 1
Step 1: Introduction
In this project, we will analyze meteor shower data to predict visibility from different cities.
Using historical data, we will determine the best times and locations for observing meteor
showers. The analysis will consider comet paths, local conditions, and historical visibility.
Step 2: Data Collection and Preparation
We will collect and prepare datasets that include information about meteor showers, their
parent comets, historical visibility data, and city-specific conditions (such as light pollution
and weather).
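As a rough illustration of how such datasets could be laid out, the sketch below builds two small tables and pairs every shower with every city. The column names, the city names, the light pollution scale, and all numeric values are assumptions made for this example, and the peak dates are placeholders rather than authoritative values.

```r
# Shower table: the shower/parent-comet pairings are well known,
# but the peak dates are illustrative placeholders only.
showers <- data.frame(
  shower = c("Perseids", "Orionids", "Leonids"),
  parent_comet = c("Swift-Tuttle", "Halley", "Tempel-Tuttle"),
  peak_date = as.Date(c("2023-08-12", "2023-10-21", "2023-11-17")),
  stringsAsFactors = FALSE
)

# City conditions: hypothetical cities with made-up values.
cities <- data.frame(
  city = c("City A", "City B"),
  light_pollution = c(8, 3),      # assumed scale: 1 (dark) to 9 (bright)
  cloud_cover_pct = c(60, 20),    # assumed average cloud cover in percent
  stringsAsFactors = FALSE
)

# Cross join: one row for every (shower, city) combination.
observations <- merge(showers, cities, by = NULL)
print(observations)
```

Keeping showers and cities in separate tables and joining them afterwards means each fact is stored once, which makes later corrections easier.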
Step 3: Data Cleaning
We will clean the data to handle any missing or inconsistent values. This will involve imputing
missing values or removing the affected rows where necessary.
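One possible way to carry out this cleaning is sketched below. It assumes a small hypothetical table, drops rows whose key field (the city) is missing, and fills numeric gaps with the column median. The choice of median imputation is an assumption for this example; other strategies may suit the real data better.

```r
# Hypothetical raw data with gaps (values are made up for illustration).
raw <- data.frame(
  city = c("City A", "City B", NA, "City D"),
  light_pollution = c(8, NA, 5, 3),
  cloud_cover_pct = c(60, 20, 40, NA),
  stringsAsFactors = FALSE
)

# Remove rows where the key field is missing: they cannot be tied to a location.
cleaned <- raw[!is.na(raw$city), ]

# Impute remaining numeric gaps with the median of each column.
for (col in c("light_pollution", "cloud_cover_pct")) {
  missing <- is.na(cleaned[[col]])
  cleaned[[col]][missing] <- median(cleaned[[col]], na.rm = TRUE)
}

print(cleaned)
```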
Step 4: Data Analysis
We will analyze the cleaned data to find patterns in historical meteor shower visibility.
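A simple first pass at this kind of analysis is to compare how often a shower was visible per city and per shower. The sketch below assumes a hypothetical history table with a 0/1 `visible` column; the records are invented for illustration.

```r
# Hypothetical historical records: 1 = shower was visible, 0 = not visible.
history <- data.frame(
  shower = c("Perseids", "Perseids", "Leonids", "Leonids", "Perseids", "Leonids"),
  city = c("City A", "City B", "City A", "City B", "City B", "City B"),
  visible = c(0, 1, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Share of observations with a visible shower, grouped by city and by shower.
rate_by_city <- aggregate(visible ~ city, data = history, FUN = mean)
rate_by_shower <- aggregate(visible ~ shower, data = history, FUN = mean)

print(rate_by_city)
print(rate_by_shower)
```

Because `visible` is coded as 0 or 1, its mean within a group is the fraction of observations in which the shower was seen.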
Step 5: Predictive Modeling
Using machine learning techniques, we will develop a predictive model to forecast meteor
shower visibility from the gathered features. This model can help users estimate the
likelihood that an upcoming meteor shower will be visible from a specific location.
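The sketch below shows one way such a model could be set up. It is not the project's actual model: it uses base R logistic regression (`glm`) on simulated data, and the feature names, the simulated relationship, and the 80/20 split are all assumptions made for this example.

```r
set.seed(42)  # make the simulated data and the split reproducible

# Simulated feature table (hypothetical features and ranges).
n <- 200
model_data <- data.frame(
  light_pollution = runif(n, 1, 9),
  cloud_cover_pct = runif(n, 0, 100)
)

# Simulated outcome: darker and clearer skies are more likely to give visibility.
true_prob <- plogis(3 - 0.4 * model_data$light_pollution -
                      0.03 * model_data$cloud_cover_pct)
model_data$visible <- rbinom(n, 1, true_prob)

# 80/20 train/test split.
train_index <- sample(seq_len(n), size = floor(0.8 * n))
train_data <- model_data[train_index, ]
test_data <- model_data[-train_index, ]

# Logistic regression: models the probability that the shower is visible.
model <- glm(visible ~ light_pollution + cloud_cover_pct,
             data = train_data, family = binomial)

# Predicted probabilities on held-out data, thresholded at 0.5.
probabilities <- predict(model, newdata = test_data, type = "response")
predictions <- as.integer(probabilities > 0.5)

accuracy <- mean(predictions == test_data$visible)
cat("Test accuracy:", round(accuracy, 3), "\n")
```

Evaluating on rows the model never saw during fitting gives a more honest estimate of how well it would forecast future showers.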
Step 6: Visualization
We will create visualizations to present our findings effectively.
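As one example of such a visualization, the sketch below draws a bar chart of visibility rates per city with ggplot2. The cities and rates are hypothetical values chosen only to make the plot render.

```r
library(ggplot2)

# Hypothetical visibility rates per city (made-up values).
rates <- data.frame(
  city = c("City A", "City B", "City C"),
  visibility_rate = c(0.25, 0.75, 0.50)
)

# Bar chart: one bar per city, height = share of nights the shower was visible.
rate_plot <- ggplot(rates, aes(x = city, y = visibility_rate)) +
  geom_col(fill = "steelblue") +
  labs(
    title = "Meteor shower visibility by city (illustrative data)",
    x = "City",
    y = "Share of nights with a visible shower"
  )

print(rate_plot)
```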
Step 7: Conclusion
Finally, we will summarize our findings, discussing the effectiveness of our predictive model
and providing recommendations for optimal meteor shower viewing times and locations.
Program:
# Load necessary libraries
library(dplyr)
library(ggplot2)
library(lubridate)
library(caret)
# Impute missing values or remove rows if necessary (not shown in this sample)
test_data <- model_data[-train_index, ]
# Predictions
predictions <- predict(model, test_data)
# Step 7: Conclusion
cat("\nSummary of findings and recommendations for meteor shower viewing.\n")
Sample output:
PROJECT- II
Analyzing Basketball Stats: A Data Science Project Inspired by Space Jam
Step 1: Introduction
In this project, we aim to analyze basketball statistics to understand player performance and
identify patterns. We will work with sample datasets representing human players and Tune
Squad characters. The analysis will include data cleaning, visualization, and the identification
of scoring trends.
Step 2: Creating the Datasets
We will create sample datasets directly within the R code. These datasets will include player
names, points scored, assists, and rebounds. The data will include some missing values to
simulate real-world data challenges.
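A minimal sketch of how these sample datasets could be defined is shown below. The human player names and most of the numbers are hypothetical placeholders; only the column layout (names, points, assists, rebounds, with some `NA` values) follows the description above.

```r
# Hypothetical human players; NA marks a deliberately missing value.
human_players <- data.frame(
  Player = c("Human 1", "Human 2", "Human 3", "Human 4", "Human 5"),
  Points = c(25, 30, NA, 22, 28),
  Assists = c(5, 7, 6, 4, NA),
  Rebounds = c(3, 2, NA, 5, 4),
  stringsAsFactors = FALSE
)

# Tune Squad characters with made-up statistics.
tune_squad <- data.frame(
  Player = c("Bugs Bunny", "Daffy Duck", "Lola Bunny", "Tasmanian Devil", "Tweety"),
  Points = c(40, NA, 45, 38, 42),
  Assists = c(10, 8, NA, 6, 9),
  Rebounds = c(NA, 7, 9, 12, 3),
  stringsAsFactors = FALSE
)

# Count the missing values in each column of both datasets.
print(colSums(is.na(human_players)))
print(colSums(is.na(tune_squad)))
```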
Step 3: Handling Missing Values
We will check for missing values in our datasets. Any missing value found will be replaced
with the mean of its column. This step ensures that our analysis can be performed without
the errors and gaps that missing data would otherwise cause.
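The mean-imputation step described here can be sketched as follows. The helper name `fill_with_mean` and the sample values are assumptions for this example, not part of the original program.

```r
# Hypothetical player data with missing values.
players <- data.frame(
  Player = paste("Player", 1:5),
  Points = c(25, 30, NA, 22, 28),
  Assists = c(5, 7, 6, 4, NA),
  Rebounds = c(3, 2, NA, 5, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace NA in every numeric column with that column's mean.
fill_with_mean <- function(df) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  df[numeric_cols] <- lapply(df[numeric_cols], function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)  # mean of the observed values only
    x
  })
  df
}

players_clean <- fill_with_mean(players)
print(players_clean)
```

Mean imputation keeps every row available for analysis, at the cost of slightly understating the variability in each column.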
Step 4: Combining and Comparing the Data
Once our data is clean, we will combine the datasets for human players and Tune Squad
characters into a single dataset. We will then create visualizations to compare the points
scored by each type of player. This will help us identify performance trends and differences
between the two groups.
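One way to combine the two groups and compare their scoring is sketched below. The `Type` column, the box plot, and all values are assumptions for this example; the data is taken to be already cleaned.

```r
library(ggplot2)

# Hypothetical cleaned datasets (made-up values, no missing data).
human_players <- data.frame(
  Player = c("Human 1", "Human 2", "Human 3", "Human 4", "Human 5"),
  Points = c(25, 30, 26, 22, 28),
  stringsAsFactors = FALSE
)
tune_squad <- data.frame(
  Player = c("Bugs Bunny", "Daffy Duck", "Lola Bunny", "Tasmanian Devil", "Tweety"),
  Points = c(40, 41, 45, 38, 42),
  stringsAsFactors = FALSE
)

# Label each group, then stack the two tables into one.
human_players$Type <- "Human"
tune_squad$Type <- "Tune Squad"
all_players <- rbind(human_players, tune_squad)

# Average points per player type.
mean_points <- aggregate(Points ~ Type, data = all_players, FUN = mean)
print(mean_points)

# Box plot comparing the spread of points between the two groups.
comparison_plot <- ggplot(all_players, aes(x = Type, y = Points, fill = Type)) +
  geom_boxplot() +
  labs(title = "Points scored by player type (illustrative data)",
       x = "Player type", y = "Points")

print(comparison_plot)
```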
Step 5: Distribution of Points
We will create a histogram to visualize the distribution of points scored by all players. This
visualization will help us check whether the distribution is bimodal, which would indicate
distinct scoring patterns among the different player types.
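The histogram and a quick bimodality check could look like the sketch below. The point values and the bin width of 5 are assumptions chosen so that two clusters are easy to see.

```r
library(ggplot2)

# Hypothetical points for all players combined (made-up values).
all_points <- data.frame(
  Points = c(22, 25, 26, 28, 30, 38, 40, 41, 42, 45)
)

# Bin counts without plotting, using bins of width 5 from 20 to 50.
bin_counts <- hist(all_points$Points, breaks = seq(20, 50, by = 5), plot = FALSE)
print(bin_counts$counts)

# Histogram with the same bins; an empty bin between two peaks hints at bimodality.
points_hist <- ggplot(all_points, aes(x = Points)) +
  geom_histogram(binwidth = 5, boundary = 20, fill = "orange", color = "black") +
  labs(title = "Distribution of points scored (illustrative data)",
       x = "Points", y = "Number of players")

print(points_hist)
```

With these sample values the 30 to 35 bin is empty while the bins on either side are populated, which is the kind of two-peak shape that suggests two distinct groups of scorers.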
Step 6: Conclusion
After analyzing the data, we will summarize our findings regarding player performance. We
will also discuss the effectiveness of our data cleaning techniques and suggest potential
areas for further analysis or improvement.
Program:
# Load necessary libraries
library(dplyr)
library(ggplot2)
Assists = c(5, 7, 6, 4, NA),
Rebounds = c(3, 2, NA, 5, 4)
)
Sample Output: