
PROJECT I

Meteor Showers Visibility Prediction Project


Step 1: Introduction

In this project, we will analyze meteor shower data to predict visibility from different cities.
By using historical data, we will determine the best times and locations for observing meteor
showers. The analysis will consider comet paths, local conditions, and historical visibility.

Step 2: Data Collection

We will collect and prepare datasets that include information about meteor showers, their
origins from comets, historical visibility data, and city-specific conditions (like light pollution
and weather).

Step 3: Data Preprocessing

We will clean the data to handle any missing or inconsistent values. This will involve:

• Checking for missing values in our datasets.
• Imputing or removing missing values where necessary.
• Ensuring that the data types are appropriate for analysis.
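These checks can be sketched in R with dplyr. The data frame `df` below is a made-up stand-in for any of our datasets, and mean imputation is just one simple strategy among several:

```r
library(dplyr)

# Hypothetical dataset with a gap, standing in for any of our tables
df <- data.frame(
  Visibility = c(8, NA, 5),
  LightPollution = c(8, 4, 6)
)

# Check for missing values per column
print(colSums(is.na(df)))

# Impute numeric gaps with the column mean (one simple choice)
df <- df %>%
  mutate(across(where(is.numeric), ~ ifelse(is.na(.), mean(., na.rm = TRUE), .)))

# Confirm the data types are still appropriate for analysis
str(df)
```

Mean imputation keeps every row but shrinks the column's variance, so for larger datasets a model-based method may be preferable.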

Step 4: Data Analysis

We will analyze the cleaned data to find patterns in meteor shower visibility based on
historical data. This will include:

• Identifying the best times to observe each meteor shower.
• Analyzing how local conditions (like moon phase and weather) affect visibility.
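As a sketch of the condition-based analysis, we can group a toy table of observations by weather and compare average visibility. The column names and values here are made up for illustration:

```r
library(dplyr)

# Toy visibility records under different local conditions
obs <- data.frame(
  WeatherCondition = c('Clear', 'Clear', 'Cloudy', 'Cloudy'),
  Visibility = c(8, 7, 3, 4)
)

# Average visibility per weather condition
cond_summary <- obs %>%
  group_by(WeatherCondition) %>%
  summarise(MeanVisibility = mean(Visibility))

print(cond_summary)
```

The same `group_by()`/`summarise()` pattern extends directly to moon phase or any other categorical condition we collect.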

Step 5: Prediction Model

Using machine learning techniques, we will develop a predictive model to forecast meteor
shower visibility based on the gathered features. This model can help users determine the
likelihood of visibility for upcoming meteor showers from specific locations.

Step 6: Visualization

We will create visualizations to present our findings effectively. This will include:

• Time series plots showing historical meteor shower visibility.
• Heat maps indicating visibility from various cities during different meteor showers.
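The program below draws a scatter plot; a heat map like the one described here could be sketched with ggplot2's `geom_tile()` on a city-by-shower grid. The visibility values in this grid are invented for illustration:

```r
library(ggplot2)

# Made-up visibility grid: every city paired with every shower
grid <- expand.grid(
  City = c('New York', 'Phoenix'),
  Shower = c('Perseid', 'Geminid')
)
grid$Visibility <- c(4, 9, 6, 10)

# Heat map of visibility by city and shower
p <- ggplot(grid, aes(x = Shower, y = City, fill = Visibility)) +
  geom_tile() +
  scale_fill_gradient(low = 'black', high = 'yellow') +
  ggtitle('Visibility by City and Shower') +
  theme_minimal()
print(p)
```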

Step 7: Conclusion

Finally, we will summarize our findings, discussing the effectiveness of our predictive model
and providing recommendations for optimal meteor shower viewing times and locations.

Program:
# Load necessary libraries
library(dplyr)
library(ggplot2)
library(lubridate)
library(caret)

# Sample data creation (replace with actual CSV loading, e.g. read.csv())
meteor_data <- data.frame(
  Shower = c('Perseid', 'Geminid', 'Quadrantid', 'Lyrid', 'Orionid'),
  PeakDate = as.Date(c('2023-08-12', '2023-12-13', '2023-01-04', '2023-04-23', '2023-10-21')),
  Visibility = c(8, 10, 5, 7, 6)  # scale from 1 to 10
)

# Sample dataset for cities and their observing conditions
city_data <- data.frame(
  City = c('New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'),
  LightPollution = c(8, 4, 6, 5, 3),  # scale from 1 to 10
  WeatherCondition = c('Clear', 'Clear', 'Cloudy', 'Clear', 'Clear')
)

# Step 3: Data Preprocessing

# Check for missing values
cat("Missing values in meteor data:\n")
print(colSums(is.na(meteor_data)))

cat("\nMissing values in city data:\n")
print(colSums(is.na(city_data)))

# Impute missing values or remove rows if necessary (not needed for this sample)

# Step 4: Data Analysis

# Cross-join the datasets (by = NULL gives the Cartesian product,
# pairing every shower with every city)
combined_data <- merge(meteor_data, city_data, by = NULL)

# Analyze visibility against light pollution
ggplot(combined_data, aes(x = LightPollution, y = Visibility, color = Shower)) +
  geom_point(size = 3) +
  geom_smooth(method = 'lm', se = FALSE) +
  ggtitle('Meteor Shower Visibility vs Light Pollution') +
  xlab('Light Pollution Level') +
  ylab('Visibility Rating') +
  theme_minimal()

# Step 5: Prediction Model

# Prepare data for the model
model_data <- combined_data %>%
  select(LightPollution, Visibility)

# Split the data 80/20 and train a simple linear regression model
set.seed(123)
train_index <- createDataPartition(model_data$Visibility, p = 0.8, list = FALSE)
train_data <- model_data[train_index, ]
test_data <- model_data[-train_index, ]

model <- train(Visibility ~ LightPollution, data = train_data, method = 'lm')

# Predictions
predictions <- predict(model, test_data)

# Step 6: Visualization of Predictions

test_data$PredictedVisibility <- predictions
ggplot(test_data, aes(x = LightPollution)) +
  geom_point(aes(y = Visibility), color = 'blue') +
  geom_point(aes(y = PredictedVisibility), color = 'red') +
  ggtitle('Actual vs Predicted Meteor Shower Visibility') +
  xlab('Light Pollution Level') +
  ylab('Visibility Rating') +
  theme_minimal()

# Step 7: Conclusion
cat("\nSummary of findings and recommendations for meteor shower viewing.\n")
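The program trains the model but never reports an error metric. A self-contained sketch of how the fit could be scored, mirroring Step 5 with a tiny made-up dataset and base R's `lm()` in place of caret, uses root-mean-square error (RMSE):

```r
# Made-up data mirroring the LightPollution/Visibility relationship
fit_df <- data.frame(
  LightPollution = c(8, 4, 6, 5, 3),
  Visibility = c(5, 8, 6, 7, 9)
)

# Fit a simple linear model
fit <- lm(Visibility ~ LightPollution, data = fit_df)

# Score predictions with root-mean-square error (lower is better)
preds <- predict(fit, fit_df)
rmse <- sqrt(mean((fit_df$Visibility - preds)^2))
cat("RMSE:", round(rmse, 2), "\n")
```

In the real project the same RMSE computation would be applied to `test_data` and the caret `model`, so the score reflects held-out rather than training data.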

Sample output:

PROJECT II
Analyzing Basketball Stats: A Data Science Project Inspired by Space Jam

Step 1: Introduction

In this project, we aim to analyze basketball statistics to understand player performance and
identify patterns. We will work with sample datasets representing human players and Tune
Squad characters. The analysis will include data cleaning, visualization, and the identification
of scoring trends.

Step 2: Data Collection

We will create sample datasets directly within the R code. These datasets will include player
names, points scored, assists, and rebounds. The data will include some missing values to
simulate real-world data challenges.

Step 3: Data Preprocessing

We will check for missing values in our datasets. For any missing values found, we will use
the mean of the respective columns to fill in these gaps. This step is crucial for ensuring that
our analysis can be performed without the issues caused by missing data.

Step 4: Data Analysis

Once our data is clean, we will combine the datasets for human players and Tune Squad
characters into a single dataset. We will then create visualizations to compare the points
scored by each type of player. This will help us identify performance trends and differences
between the two groups.

Step 5: Identifying Bimodal Distributions

We will create a histogram to visualize the distribution of points scored by all players. This
visualization will help us check for bimodal distributions, indicating distinct scoring patterns
among different player types.
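Beyond eyeballing the histogram, a rough numeric check (a sketch, not a formal statistical test) is to compare each group's center and spread; a gap between group means that is much larger than the within-group spreads suggests the combined distribution is bimodal. The scores below are invented for illustration:

```r
library(dplyr)

# Toy scores for two player types
scores <- data.frame(
  Type = rep(c('Human', 'Tune Squad'), each = 4),
  Points = c(10, 12, 11, 13, 28, 30, 29, 31)
)

# Per-group center and spread
type_summary <- scores %>%
  group_by(Type) %>%
  summarise(Mean = mean(Points), SD = sd(Points))

print(type_summary)
```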

Step 6: Conclusion

After analyzing the data, we will summarize our findings regarding player performance. We
will also discuss the effectiveness of our data cleaning techniques and suggest potential
areas for further analysis or improvement.

Program:
# Load necessary libraries
library(dplyr)
library(ggplot2)

# Sample data creation

human_players <- data.frame(
  Player = c('Player A', 'Player B', 'Player C', 'Player D', 'Player E'),
  Points = c(10, 22, 15, NA, 35),
  Assists = c(5, 7, 6, 4, NA),
  Rebounds = c(3, 2, NA, 5, 4)
)

tune_squad_players <- data.frame(
  Player = c('Bugs Bunny', 'Daffy Duck', 'Porky Pig', 'Lola Bunny', 'Tweety'),
  Points = c(25, NA, 30, 22, 10),
  Assists = c(3, 2, NA, 5, 1),
  Rebounds = c(2, 4, 3, 5, NA)
)

# Step 3: Data Preprocessing

cat("Missing values in human players before imputation:\n")
print(colSums(is.na(human_players)))

cat("\nMissing values in Tune Squad players before imputation:\n")
print(colSums(is.na(tune_squad_players)))

# Impute missing values in numeric columns with the column mean
human_players <- human_players %>%
  mutate(across(where(is.numeric), ~ ifelse(is.na(.), mean(., na.rm = TRUE), .)))

tune_squad_players <- tune_squad_players %>%
  mutate(across(where(is.numeric), ~ ifelse(is.na(.), mean(., na.rm = TRUE), .)))

cat("\nMissing values in human players after imputation:\n")
print(colSums(is.na(human_players)))

cat("\nMissing values in Tune Squad players after imputation:\n")
print(colSums(is.na(tune_squad_players)))

# Step 4: Data Analysis

# Label each group and combine into one dataset
human_players$Type <- 'Human'
tune_squad_players$Type <- 'Tune Squad'
combined_data <- rbind(human_players, tune_squad_players)

# Compare points scored by player type
ggplot(combined_data, aes(x = Type, y = Points)) +
  geom_boxplot() +
  ggtitle('Points Scored by Player Type') +
  theme_minimal()

# Step 5: Identifying Bimodal Distributions

# Histogram of points with a density curve overlaid; the density is rescaled
# to the count axis (density * n gives counts, times binwidth 5) so the red
# curve is visible at the same scale as the bars
ggplot(combined_data, aes(x = Points)) +
  geom_histogram(binwidth = 5, fill = 'blue', alpha = 0.7, boundary = 0) +
  geom_density(aes(y = after_stat(count) * 5), color = 'red') +
  ggtitle('Distribution of Points Scored') +
  xlab('Points') +
  ylab('Frequency') +
  theme_minimal()

Sample Output:
