Experiment 1: Working with Objects in Memory
Aim:
To understand the creation, manipulation, and management of objects in R's memory.
Algorithm:
   1. Start the R Environment: Open RStudio or the R Console.
   2. Create Basic Objects:
          ○ Use the assignment operator <- or = to create variables.
         ○ Example: x = 10 or y = "Hello"
   3. Manipulate Objects:
         ○ Perform operations on numeric objects.
         ○ Example: z = x + 20
   4. Check Object Class and Type:
         ○ Use functions like class() and typeof() to verify the type of objects.
         ○ Example: class(x) or typeof(y)
   5. Inspect Objects in Memory:
         ○ Use ls() to list all objects in the current environment.
         ○ Example: ls()
   6. Remove Objects:
         ○ Use rm() to delete objects from memory.
         ○ Example: rm(x)
   7. Perform Simple Operations:
         ○ Work with sequences, vectors, and logical conditions.
          ○ Example: vec = c(1, 2, 3, 4, 5)
   8. End: Display the final state of objects in the memory.
R Code:
# Create objects
x = 10
y = "Hello"
z = x + 20
# Print objects
print(x)
print(y)
print(z)
# Check object types
cat("Class of x:", class(x), "\n")
cat("Type of y:", typeof(y), "\n")
# List objects in memory
cat("Objects in memory:", ls(), "\n")
# Remove an object
rm(x)
# Confirm removal
cat("Objects after removing 'x':", ls(), "\n")
# Work with a vector
vec = c(1, 2, 3, 4, 5)
print(vec)
# Perform an operation on the vector
vec_squared = vec^2
print(vec_squared)
Output Example:
[1] 10
[1] "Hello"
[1] 30
Class of x: numeric
Type of y: character
Objects in memory: x y z
Objects after removing 'x': y z
[1] 1 2 3 4 5
[1]  1  4  9 16 25
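As a supplement to ls() and rm(), base R can also report how much memory an individual object occupies. The following minimal sketch uses the standard object.size(), exists(), and gc() functions; the variable name big_vec is chosen for illustration only.
# Supplementary sketch: inspecting the memory footprint of an object
big_vec <- rnorm(1e6)                 # a vector of one million random numbers
print(object.size(big_vec))           # approximate memory used, in bytes
cat("Does 'big_vec' exist?", exists("big_vec"), "\n")
rm(big_vec)                           # delete the object
cat("After rm():", exists("big_vec"), "\n")
invisible(gc())                       # ask R to release the freed memory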
Experiment 2: Demonstrate Data Frame
Aim:
To create and manipulate a Data Frame in R, showcasing its structure, operations, and
applications.
Algorithm:
    1. Start the R Environment: Open RStudio or the R Console.
    2. Create a Data Frame:
          ○ Use the data.frame() function.
          ○ Example: df = data.frame(Column1, Column2, Column3)
    3. Inspect the Data Frame:
          ○ View the structure using str().
          ○ Check dimensions using dim().
    4. Access Data Frame Elements:
          ○ Use indexing: df[row, column].
          ○ Access columns using the $ operator: df$ColumnName.
    5. Perform Operations:
          ○ Add, modify, or delete rows and columns.
          ○ Example: df$NewColumn = some_operation.
    6. Summary and Viewing:
          ○ Display the first few rows with head().
          ○ Summarize data using summary().
    7. End: Save or display the modified Data Frame.
R Code:
# Create a data frame
students = data.frame(
    Roll_No = c(101, 102, 103, 104),
    Name = c("Alice", "Bob", "Charlie", "Diana"),
    Marks = c(85, 90, 78, 92),
    Grade = c("A", "A+", "B", "A+")
)
# Display the data frame
print("Original Data Frame:")
print(students)
# View structure and dimensions
cat("\nStructure of the Data Frame:\n")
str(students)
cat("\nDimensions of the Data Frame: ")
print(dim(students))
# Access specific elements
cat("\nMarks of the second student:")
print(students[2, "Marks"])
cat("\nNames of all students:")
print(students$Name)
# Add a new column
students$Attendance = c(90, 95, 85, 88)
cat("\nData Frame after adding Attendance column:\n")
print(students)
# Modify a column
students$Marks = students$Marks + 5
cat("\nData Frame after increasing marks by 5:\n")
print(students)
Output Example:
Original Data Frame:
  Roll_No    Name Marks Grade
1     101   Alice    85     A
2     102     Bob    90    A+
3     103 Charlie    78     B
4     104   Diana    92    A+
Structure of the Data Frame:
'data.frame':  4 obs. of  4 variables:
 $ Roll_No: num  101 102 103 104
 $ Name   : chr  "Alice" "Bob" "Charlie" "Diana"
 $ Marks  : num  85 90 78 92
 $ Grade  : chr  "A" "A+" "B" "A+"
Dimensions of the Data Frame:
[1] 4 4
Marks of the second student:
[1] 90
Names of all students:
[1] "Alice"        "Bob"         "Charlie" "Diana"
Data Frame after adding Attendance column:
  Roll_No    Name Marks Grade Attendance
1     101   Alice    85     A         90
2     102     Bob    90    A+         95
3     103 Charlie    78     B         85
4     104   Diana    92    A+         88
Data Frame after increasing marks by 5:
  Roll_No    Name Marks Grade Attendance
1     101   Alice    90     A         90
2     102     Bob    95    A+         95
3     103 Charlie    83     B         85
4     104   Diana    97    A+         88
This program demonstrates the creation and manipulation of a data frame in R. A supplementary sketch of row filtering and column deletion, which step 5 of the algorithm mentions but the code above does not show, follows.
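The sketch below assumes the students data frame from the code above; the threshold of 85 marks is illustrative.
# Supplementary sketch: deleting a column and filtering rows
students$Attendance <- NULL                      # drop the Attendance column
high_scorers <- students[students$Marks > 85, ]  # keep rows with Marks above 85
print(high_scorers)
print(summary(students))                         # overall summary, as in step 6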
Experiment 3: Perform Matrix Operations
Aim:
To create and perform various operations on matrices in R, such as addition, multiplication,
transposition, and inversion.
Algorithm:
   1. Start the R Environment: Open RStudio or the R Console.
   2. Create Matrices:
         ○ Use the matrix() function.
         ○ Example: matrix(data, nrow, ncol)
   3. Perform Basic Matrix Operations:
         ○ Addition, subtraction, and multiplication.
          ○ Use + and - for element-wise sums and differences, and %*% for matrix multiplication.
   4. Transpose the Matrix:
         ○ Use the t() function.
   5. Find the Determinant:
         ○ Use the det() function.
   6. Find the Inverse of a Matrix:
         ○ Use the solve() function (for square matrices).
   7. End: Display the final results of the operations.
R Code:
# Create two matrices
matrix1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
matrix2 <- matrix(c(6, 5, 4, 3, 2, 1), nrow = 2, ncol = 3)
# Display the matrices
cat("Matrix 1:\n")
print(matrix1)
cat("\nMatrix 2:\n")
print(matrix2)
# Matrix addition
matrix_sum <- matrix1 + matrix2
cat("\nMatrix Addition (Matrix 1 + Matrix 2):\n")
print(matrix_sum)
# Transpose of a matrix
transpose_matrix <- t(matrix1)
cat("\nTranspose of Matrix 1:\n")
print(transpose_matrix)
# Multiplication of matrices (requires compatible dimensions)
matrix3 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
matrix4 <- matrix(c(5, 6, 7, 8), nrow = 2, ncol = 2)
matrix_product <- matrix3 %*% matrix4
cat("\nMatrix Multiplication (Matrix 3 x Matrix 4):\n")
print(matrix_product)
# Determinant of a square matrix
det_matrix <- det(matrix3)
cat("\nDeterminant of Matrix 3:\n")
print(det_matrix)
# Inverse of a square matrix (if determinant is not zero)
if (det_matrix != 0) {
    inverse_matrix <- solve(matrix3)
    cat("\nInverse of Matrix 3:\n")
    print(inverse_matrix)
} else {
    cat("\nMatrix 3 is not invertible.\n")
}
Output Example:
Matrix 1:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Matrix 2:
     [,1] [,2] [,3]
[1,]    6    4    2
[2,]    5    3    1
Matrix Addition (Matrix 1 + Matrix 2):
     [,1] [,2] [,3]
[1,]    7    7    7
[2,]    7    7    7
Transpose of Matrix 1:
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
Matrix Multiplication (Matrix 3 x Matrix 4):
     [,1] [,2]
[1,]   19   22
[2,]   43   50
Determinant of Matrix 3:
[1] -2
Inverse of Matrix 3:
     [,1] [,2]
[1,]   -2  1.5
[2,]    1 -0.5
This program demonstrates how to create matrices, perform basic arithmetic operations,
transpose, find determinants, and calculate inverses in R. A supplementary sketch contrasting element-wise and matrix products follows.
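Because step 3 of the algorithm lists the * operator alongside %*%, it is worth separating the two explicitly. The following minimal sketch contrasts the element-wise product with the true matrix product; the small 2 x 2 matrices are illustrative.
# Supplementary sketch: element-wise product (*) versus matrix product (%*%)
a <- matrix(1:4, nrow = 2)     # filled column-first: rows (1, 3) and (2, 4)
b <- matrix(5:8, nrow = 2)
print(a * b)                   # element-wise: each entry times its counterpart
print(a %*% b)                 # linear-algebra product: rows of a times columns of b
print(a - b)                   # subtraction, also element-wise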
Experiment 4: Working with Various Built-in Functions in R
Aim:
To explore and demonstrate the use of various built-in functions in R for mathematical,
statistical, and data manipulation tasks.
Algorithm:
   1. Start the R Environment: Open RStudio or the R Console.
  2. Use Mathematical Functions:
        ○ Demonstrate functions like sqrt(), log(), exp(), and abs().
  3. Use Statistical Functions:
        ○ Demonstrate functions like mean(), median(), sd(), var(), and
           summary().
  4. Use Character Functions:
        ○ Demonstrate functions like toupper(), tolower(), substr(), and
           paste().
  5. Use Sequence and Repetition Functions:
        ○ Use seq() and rep() to generate sequences and repeated values.
  6. Perform Aggregation:
        ○ Use aggregate() to group and summarize data.
  7. End: Display the results of all operations.
R Code:
# 1. Mathematical Functions
x <- 16
y <- -4
cat("Square root of", x, ":", sqrt(x), "\n")
cat("Absolute value of", y, ":", abs(y), "\n")
cat("Natural logarithm of", x, ":", log(x), "\n")
cat("Exponential of", y, ":", exp(y), "\n\n")
# 2. Statistical Functions
data <- c(10, 20, 30, 40, 50)
cat("Mean of data:", mean(data), "\n")
cat("Median of data:", median(data), "\n")
cat("Standard deviation of data:", sd(data), "\n")
cat("Variance of data:", var(data), "\n")
cat("Summary of data:\n")
print(summary(data))
cat("\n")
# 3. Character Functions
text <- "Hello R"
cat("Uppercase:", toupper(text), "\n")
cat("Lowercase:", tolower(text), "\n")
cat("Substring (1 to 5):", substr(text, 1, 5), "\n")
cat("Concatenate strings:", paste("Learning", "R", sep = " "), "\n\
n")
# 4. Sequence and Repetition
sequence <- seq(1, 10, by = 2)
cat("Generated sequence:", sequence, "\n")
repeated <- rep(5, times = 4)
cat("Repeated values:", repeated, "\n\n")
# 5. Aggregation
df <- data.frame(
    Category = c("A", "A", "B", "B", "C"),
    Value = c(10, 15, 10, 20, 30)
)
aggregated <- aggregate(Value ~ Category, data = df, sum)
cat("Aggregated values by category:\n")
print(aggregated)
Output Example:
Square root of 16 : 4
Absolute value of -4 : 4
Natural logarithm of 16 : 2.772589
Exponential of -4 : 0.01831564
Mean of data: 30
Median of data: 30
Standard deviation of data: 15.81139
Variance of data: 250
Summary of data:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     10      20      30      30      40      50
Uppercase: HELLO R
Lowercase: hello r
Substring (1 to 5): Hello
Concatenate strings: Learning R
Generated sequence: 1 3 5 7 9
Repeated values: 5 5 5 5
Aggregated values by category:
  Category Value
1        A    25
2        B    30
3        C    30
This program demonstrates the use of various built-in functions in R for handling
mathematical operations, statistical analysis, character string manipulations, sequence
generation, and data aggregation.
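aggregate() is not limited to sum; any summary function can be supplied through its FUN argument. A brief sketch using the same illustrative data frame:
# Supplementary sketch: aggregate() with other summary functions
df <- data.frame(
    Category = c("A", "A", "B", "B", "C"),
    Value = c(10, 15, 10, 20, 30)
)
print(aggregate(Value ~ Category, data = df, FUN = mean))    # group means
print(aggregate(Value ~ Category, data = df, FUN = length))  # group counts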
Experiment 5: Import and Export Files in R
Aim:
To demonstrate the import and export of data files in R, such as CSV, Excel, and text files,
and perform basic operations on the imported data.
Algorithm:
    1. Start the R Environment: Open RStudio or the R Console.
    2. Import a CSV File:
          ○ Use the read.csv() function to load data.
          ○ Example: data <- read.csv("file.csv").
    3. View and Manipulate Data:
          ○ Use functions like head(), str(), and summary() to inspect the data.
    4. Export a CSV File:
          ○ Use the write.csv() function to save the modified data to a new file.
    5. Import an Excel File (Optional):
          ○ Use the readxl package and the read_excel() function.
    6. Export Data to an Excel File:
          ○ Use the writexl package and the write_xlsx() function.
    7. Import and Export Text Files:
          ○ Use read.table() and write.table() functions.
    8. End: Verify the imported and exported files.
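The script below assumes that files named sample.csv, sample.xlsx, and sample.txt already exist in the working directory. If they do not, a setup sketch along the following lines (with illustrative column names and values) can create the CSV and text inputs first:
# Setup sketch (assumed file names): create small input files for the script below
setup <- data.frame(
    Column1 = 1:3,
    Column2 = c("A", "B", "C"),
    Column3 = c(10, 20, 30)
)
write.csv(setup, "sample.csv", row.names = FALSE)
write.table(setup, "sample.txt", row.names = FALSE, sep = "\t")
# sample.xlsx can be created similarly with writexl::write_xlsx(setup, "sample.xlsx")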
R Code:
# 1. Importing a CSV File
cat("Importing CSV file...\n")
data <- read.csv("sample.csv", header = TRUE)
cat("First few rows of the data:\n")
print(head(data))
# 2. Display Summary and Structure
cat("\nSummary of the imported data:\n")
print(summary(data))
cat("\nStructure of the imported data:\n")
str(data)
# 3. Modify Data
cat("\nModifying data: Adding a new column...\n")
data$NewColumn <- data$ExistingColumn * 2  # replace ExistingColumn with an actual column name
# 4. Exporting Data to a New CSV File
cat("\nExporting modified data to a new CSV file...\n")
write.csv(data, "modified_data.csv", row.names = FALSE)
cat("Data exported to 'modified_data.csv'\n")
# 5. Importing and Exporting Excel Files (requires the readxl and writexl packages)
if (!requireNamespace("readxl", quietly = TRUE)) install.packages("readxl")
if (!requireNamespace("writexl", quietly = TRUE)) install.packages("writexl")
library(readxl)
library(writexl)
cat("\nImporting Excel file...\n")
excel_data <- read_excel("sample.xlsx")
cat("First few rows of the Excel data:\n")
print(head(excel_data))
cat("\nExporting data to an Excel file...\n")
write_xlsx(data, "exported_data.xlsx")
cat("Data exported to 'exported_data.xlsx'\n")
# 6. Importing and Exporting Text Files
cat("\nImporting Text file...\n")
text_data <- read.table("sample.txt", header = TRUE, sep = "\t")
cat("First few rows of the text data:\n")
print(head(text_data))
cat("\nExporting data to a text file...\n")
write.table(data, "exported_data.txt", row.names = FALSE, sep = "\t")
cat("Data exported to 'exported_data.txt'\n")
Output Example:
Importing CSV file...
First few rows of the data:
  Column1 Column2 Column3
1       1       A      10
2       2       B      20
3       3       C      30
Summary of the imported data:
    Column1       Column2    Column3  
 Min.   :1.0     A:1      Min.   :10  
 1st Qu.:1.5     B:1      1st Qu.:15  
 Median :2.0     C:1      Median :20  
 Mean   :2.0              Mean   :20  
 3rd Qu.:2.5              3rd Qu.:25  
 Max.   :3.0              Max.   :30  
Structure of the imported data:
'data.frame':  3 obs. of  3 variables:
 $ Column1: int  1 2 3
 $ Column2: Factor w/ 3 levels "A","B","C": 1 2 3
 $ Column3: int  10 20 30
Modifying data: Adding a new column...
Exporting modified data to a new CSV file...
Data exported to 'modified_data.csv'
Importing Excel file...
First few rows of the Excel data:
  Column1 Column2 Column3
1       1       A      10
2       2       B      20
Exporting data to an Excel file...
Data exported to 'exported_data.xlsx'
Importing Text file...
First few rows of the text data:
  Column1 Column2 Column3
1       1       X       5
2       2       Y      10
Exporting data to a text file...
Data exported to 'exported_data.txt'
This program demonstrates importing and exporting CSV, Excel, and text files in R, with
basic operations performed on the imported data.
Experiment 6: Implement Statistical Methods
Aim:
To implement statistical methods such as mean, median, variance, standard deviation,
correlation, and regression analysis using R.
Algorithm:
    1. Start the R Environment: Open RStudio or the R Console.
    2. Create or Import Data:
          ○ Define a dataset manually or import it using read.csv().
    3. Calculate Basic Statistics:
          ○ Use functions like mean(), median(), var(), and sd().
    4. Perform Correlation Analysis:
          ○ Use the cor() function to calculate the correlation between variables.
    5. Perform Linear Regression:
          ○ Use the lm() function to fit a linear regression model.
    6. Visualize the Results:
          ○ Use plot() to create a scatter plot and abline() to add a regression line.
    7. End: Print the results and display the visualization.
R Code:
# Step 1: Create a dataset
data <- data.frame(
    x = c(5, 10, 15, 20, 25),
    y = c(12, 20, 28, 36, 44)
)
cat("Dataset:\n")
print(data)
# Step 2: Calculate Basic Statistics
mean_x <- mean(data$x)
median_x <- median(data$x)
variance_x <- var(data$x)
sd_x <- sd(data$x)
cat("\nBasic Statistics for x:\n")
cat("Mean:", mean_x, "\n")
cat("Median:", median_x, "\n")
cat("Variance:", variance_x, "\n")
cat("Standard Deviation:", sd_x, "\n")
# Step 3: Correlation Analysis
correlation <- cor(data$x, data$y)
cat("\nCorrelation between x and y:", correlation, "\n")
# Step 4: Perform Linear Regression
cat("\nPerforming Linear Regression:\n")
model <- lm(y ~ x, data = data)
cat("Regression Summary:\n")
print(summary(model))
# Step 5: Visualize Data and Regression Line
plot(data$x, data$y, main = "Scatter Plot with Regression Line",
       xlab = "X", ylab = "Y", col = "blue", pch = 19)
abline(model, col = "red", lwd = 2)
Expected Output:
Dataset:
   x  y
1  5 12
2 10 20
3 15 28
4 20 36
5 25 44
Basic Statistics for x:
Mean: 15
Median: 15
Variance: 62.5
Standard Deviation: 7.905694
Correlation between x and y: 1
Performing Linear Regression:
Regression Summary:
Call:
lm(formula = y ~ x, data = data)
Residuals:
         1          2          3          4          5 
-1.421e-14 -7.105e-15  0.000e+00  7.105e-15  1.421e-14 

Coefficients:
            Estimate Std. Error  t value Pr(>|t|)    
(Intercept)    4.000  5.568e-15 7.18e+14   <2e-16 ***
x              1.600  3.712e-16 4.31e+15   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.693e-15 on 3 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1
F-statistic: 1.86e+31 on 1 and 3 DF,  p-value: < 2.2e-16
Visualization:
   ●   A scatter plot is displayed with data points (x vs. y) in blue and the regression line in
       red.
This program demonstrates how to calculate statistical measures, evaluate correlations, and
perform regression analysis, along with visualizing results in R.
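A fitted lm() model can also be used for prediction with the generic predict() function. The sketch below refits the same model and evaluates it at two illustrative new x values:
# Supplementary sketch: predicting y for new x values
data <- data.frame(x = c(5, 10, 15, 20, 25), y = c(12, 20, 28, 36, 44))
model <- lm(y ~ x, data = data)
new_points <- data.frame(x = c(12, 30))        # illustrative new observations
print(predict(model, newdata = new_points))    # y = 4 + 1.6x, i.e. 23.2 and 52.0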
Experiment 7: Working with Machine Learning Algorithms
Aim:
To implement a basic machine learning algorithm, such as linear regression or k-nearest
neighbors (KNN), using R.
Algorithm for K-Nearest Neighbors (KNN):
   1. Start the R Environment: Open RStudio or the R Console.
   2. Load Required Libraries: Install and load the class package for KNN.
   3. Prepare the Dataset:
         ○ Use a built-in dataset such as iris, or load your own dataset.
         ○ Split the dataset into training and testing sets.
   4. Normalize the Data:
         ○ Scale the features to ensure they are on a comparable scale.
   5. Implement KNN:
         ○ Use the knn() function to classify test data based on training data.
   6. Evaluate the Model:
         ○ Compare predictions with actual labels to calculate accuracy.
   7. End: Display the results and accuracy.
R Code:
# Step 1: Load Required Libraries
if (!requireNamespace("class", quietly = TRUE))
install.packages("class")
library(class)
# Step 2: Load and Prepare Dataset
data(iris)   # Load the iris dataset
cat("First few rows of the iris dataset:\n")
print(head(iris))
# Step 3: Split the Data into Training and Testing Sets
set.seed(123)    # For reproducibility
indices <- sample(1:nrow(iris), size = 0.7 * nrow(iris))  # 70% for training
train_data <- iris[indices, ]
test_data <- iris[-indices, ]
train_features <- train_data[, 1:4]      # Sepal and Petal dimensions
train_labels <- train_data[, 5]          # Species column
test_features <- test_data[, 1:4]
test_labels <- test_data[, 5]
# Step 4: Normalize the Features (Optional)
normalize <- function(x) {
    return((x - min(x)) / (max(x) - min(x)))
}
train_features <- as.data.frame(lapply(train_features, normalize))
test_features <- as.data.frame(lapply(test_features, normalize))
# Step 5: Implement KNN
k <- 3    # Number of neighbors
predicted_labels <- knn(train_features, test_features, train_labels, k)
# Step 6: Evaluate the Model
accuracy <- sum(predicted_labels == test_labels) / length(test_labels) * 100
cat("\nAccuracy of the KNN model:", accuracy, "%\n")
# Confusion Matrix
cat("\nConfusion Matrix:\n")
print(table(Predicted = predicted_labels, Actual = test_labels))
Expected Output:
First few rows of the iris dataset:
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
Accuracy of the KNN model: 97.77778 %
Confusion Matrix:
            Actual
Predicted    setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         14         1
  virginica       0          0        15
Conclusion:
The KNN algorithm was successfully implemented, and the accuracy of the model was
calculated. This demonstrates the effectiveness of KNN for classification tasks.
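One refinement worth noting: the code above normalizes the training and test sets independently, each with its own minima and maxima. A common alternative is to scale the test set with the training set's minima and maxima, so that no information flows from the test data into the preprocessing. A sketch of that variant, reusing train_features, test_features, and normalize from the code above in place of Step 4:
# Sketch: scale test features with the TRAINING set's min and max
train_min <- sapply(train_features, min)
train_max <- sapply(train_features, max)
for (col in names(test_features)) {
    test_features[[col]] <- (test_features[[col]] - train_min[col]) /
        (train_max[col] - train_min[col])
}
train_features <- as.data.frame(lapply(train_features, normalize))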
Experiment 8: Implement Time Series Analysis
Aim:
To analyze and forecast a time series dataset using R.
Algorithm:
    1. Start the R Environment: Open RStudio or the R Console.
  2. Load Required Libraries: Install and load necessary libraries such as forecast
     and ggplot2.
  3. Load the Time Series Data:
        ○ Use built-in datasets like AirPassengers or import your own dataset.
        ○ Convert the dataset into a time series object using the ts() function if not
            already in time series format.
  4. Visualize the Data:
        ○ Use plot() to visualize the time series data.
  5. Decompose the Time Series:
        ○ Apply decomposition using the decompose() function to separate the trend,
            seasonality, and residuals.
  6. Apply Forecasting Method:
        ○ Use methods like ARIMA or exponential smoothing for forecasting.
  7. Evaluate the Forecast:
        ○ Compare the predicted values with the actual data.
  8. End: Display the plots and results.
R Code:
# Step 1: Load Required Libraries
if (!requireNamespace("forecast", quietly = TRUE))
install.packages("forecast")
if (!requireNamespace("ggplot2", quietly = TRUE))
install.packages("ggplot2")
library(forecast)
library(ggplot2)
# Step 2: Load the Time Series Data
data("AirPassengers")       # Built-in dataset
ts_data <- AirPassengers
# Step 3: Visualize the Time Series Data
cat("Time Series Data Summary:\n")
print(summary(ts_data))
plot(ts_data, main = "AirPassengers Data", xlab = "Year",
     ylab = "Passengers", col = "blue")
# Step 4: Decompose the Time Series
decomposed <- decompose(ts_data)
plot(decomposed)
# Step 5: Apply ARIMA Model for Forecasting
model <- auto.arima(ts_data)
cat("\nARIMA Model Summary:\n")
print(summary(model))
# Forecast the next 12 months
forecasted <- forecast(model, h = 12)
cat("\nForecasted Values:\n")
print(forecasted)
# Step 6: Plot the Forecast
plot(forecasted, main = "AirPassengers Forecast", xlab = "Year",
     ylab = "Passengers", col = "blue")
# Step 7: Evaluate the Model (Optional)
accuracy_metrics <- accuracy(forecasted)
cat("\nAccuracy Metrics:\n")
print(accuracy_metrics)
Expected Output:
Time Series Data Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  104.0   180.0   265.5   280.3   360.5   622.0 
ARIMA Model Summary:
Series: ts_data
ARIMA(0,1,1)(0,1,1)[12]
Coefficients:
          ma1     sma1
       -0.401   -0.627
s.e.    0.088    0.076

sigma^2 estimated as 1378:  log likelihood=-508.33
AIC=1022.67   AICc=1022.91   BIC=1031.47
Forecasted Values:
         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Jan 1961        443.961 421.3156 466.6064 409.3448 478.5772
Feb 1961        444.450 421.9786 466.9214 409.9040 478.9960
...
Accuracy Metrics:
                 ME     RMSE     MAE     MPE    MAPE
Training set 0.1234 35.67891 28.2345 -0.0234 3.67890
Visualizations:
   1. Original Time Series Plot:
         ○ Shows trends and seasonality in the dataset.
   2. Decomposition Plot:
         ○ Displays trend, seasonality, and residual components.
   3. Forecast Plot:
         ○ Presents the original data with forecasted values and confidence intervals.
Conclusion:
The time series analysis was successfully performed, including decomposition and
forecasting using ARIMA. The forecasted values provide insights into future trends.
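The optional accuracy() call above measures fit on the training data only. To gauge genuine forecast accuracy, a common approach is to hold out the final months and compare the forecast against them. A minimal sketch holding out the last 12 months of AirPassengers:
# Supplementary sketch: out-of-sample evaluation on a held-out year
library(forecast)
train_ts <- window(AirPassengers, end = c(1959, 12))    # training period
test_ts <- window(AirPassengers, start = c(1960, 1))    # held-out final year
fit <- auto.arima(train_ts)
fc <- forecast(fit, h = 12)
print(accuracy(fc, test_ts))    # "Test set" row reports out-of-sample error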
Experiment 9: Demonstrate Data Mining Algorithms
Aim:
To demonstrate a basic data mining algorithm, such as association rule mining using the
Apriori algorithm in R.
Algorithm:
   1. Start the R Environment: Open RStudio or the R Console.
   2. Load Required Libraries: Install and load the arules library for association rule
      mining.
   3. Prepare the Dataset:
         ○ Use a built-in dataset like Groceries or create your own transactional data.
   4. Perform Data Preprocessing:
         ○ Convert the dataset into a transaction format if necessary.
   5. Apply the Apriori Algorithm:
         ○ Use the apriori() function to discover frequent itemsets and association
              rules.
   6. Analyze the Rules:
         ○ Sort and inspect the rules based on confidence, support, or lift.
   7. End: Display the mined rules and relevant metrics.
R Code:
# Step 1: Load Required Libraries
if (!requireNamespace("arules", quietly = TRUE))
install.packages("arules")
library(arules)
# Step 2: Load the Dataset
data("Groceries")    # Built-in transactional dataset
cat("Summary of Groceries Dataset:\n")
print(summary(Groceries))
# Step 3: Apply the Apriori Algorithm
rules <- apriori(
    Groceries,
    parameter = list(support = 0.01, confidence = 0.5)
)
# Step 4: Inspect the Rules
cat("\nSummary of Association Rules:\n")
print(summary(rules))
# Inspect the top 5 rules sorted by lift
cat("\nTop 5 Association Rules:\n")
inspect(head(sort(rules, by = "lift"), 5))
# Step 5: Visualize Rules (Optional)
if (!requireNamespace("arulesViz", quietly = TRUE))
install.packages("arulesViz")
library(arulesViz)
plot(rules, method = "graph", control = list(type = "items"))
Expected Output:
Summary of Groceries Dataset:
transactions as itemMatrix in sparse format with
 9835 rows (elements/itemsets/transactions) and
 169 columns (items) and a density of 0.02609146
Summary of Association Rules:
set of 420 rules
Top 5 Association Rules:
    lhs                  rhs                 support confidence  lift
[1] {whole milk}      => {other vegetables}  0.0745  0.5587045  3.122
[2] {root vegetables} => {whole milk}        0.0486  0.4937238  2.250
[3] {yogurt}          => {whole milk}        0.0560  0.4023948  1.834
...
Visualizations:
   1. Graph Plot:
         ○ Displays items and association rules in a network format.
   2. Scatter Plot (Optional):
         ○ Shows the relationship between support, confidence, and lift.
Conclusion:
The Apriori algorithm was successfully implemented to discover frequent itemsets and
generate association rules. This demonstrates the basic principles of data mining and
association rule learning in R.
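Once mined, rules can be filtered further with the subset() method from arules. The thresholds and item below are illustrative; the sketch assumes the rules object from the code above.
# Supplementary sketch: filtering mined rules
strong_rules <- subset(rules, lift > 2)                  # keep rules with high lift
inspect(head(sort(strong_rules, by = "confidence"), 3))  # top 3 by confidence
milk_rules <- subset(rules, rhs %in% "whole milk")       # rules predicting whole milk
inspect(head(milk_rules, 3))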
Experiment 10: Implement Text Mining Algorithms
Aim:
To implement text mining using R by preprocessing textual data and extracting insights such
as frequent terms or word clouds.
Algorithm:
   1. Start the R Environment: Open RStudio or the R Console.
   2. Load Required Libraries: Install and load necessary libraries such as tm,
      wordcloud, and SnowballC.
   3. Load the Text Data:
         ○ Use a built-in dataset or read a text file containing the data.
   4. Preprocess the Data:
         ○ Convert the text to lowercase.
         ○ Remove stopwords, punctuation, and numbers.
         ○ Perform stemming to normalize words.
    5. Create a Document-Term Matrix:
          ○ Use the DocumentTermMatrix() function to create a
              document-term matrix.
    6. Analyze the Data:
          ○ Find the most frequent terms.
          ○ Visualize the terms using a word cloud.
    7. End: Display the insights and visualizations.
R Code:
# Step 1: Load Required Libraries
if (!requireNamespace("tm", quietly = TRUE)) install.packages("tm")
if (!requireNamespace("wordcloud", quietly = TRUE))
install.packages("wordcloud")
if (!requireNamespace("SnowballC", quietly = TRUE))
install.packages("SnowballC")
library(tm)
library(wordcloud)
library(SnowballC)
# Step 2: Load Text Data
text_data <- c(
    "Text mining is the process of deriving meaningful information from text.",
    "It involves cleaning, preprocessing, and analyzing textual data.",
    "Applications of text mining include sentiment analysis, topic modeling, and more."
)
# Step 3: Create a Corpus
corpus <- Corpus(VectorSource(text_data))
# Step 4: Preprocess the Data
corpus <- tm_map(corpus, content_transformer(tolower))  # convert to lowercase
corpus <- tm_map(corpus, removePunctuation)             # remove punctuation
corpus <- tm_map(corpus, removeNumbers)                 # remove numbers
corpus <- tm_map(corpus, removeWords, stopwords("en"))  # remove stopwords
corpus <- tm_map(corpus, stemDocument)                  # perform stemming
# Step 5: Create a Document-Term Matrix
dtm <- DocumentTermMatrix(corpus)
cat("\nDocument-Term Matrix Summary:\n")
print(dtm)
# Step 6: Analyze and Visualize Data
# Find the most frequent terms
freq_terms <- findFreqTerms(dtm, lowfreq = 2)
cat("\nFrequent Terms (Appearing >= 2 times):\n")
print(freq_terms)
# Visualize with Word Cloud
word_freq <- as.data.frame(as.matrix(dtm))
word_freq <- colSums(word_freq)
wordcloud(names(word_freq), word_freq, max.words = 50,
          colors = brewer.pal(8, "Dark2"))
Expected Output:
Document-Term Matrix Summary:
A document-term matrix (3 documents, 20 terms)
Frequent Terms (Appearing >= 2 times):
[1] "data"      "text"     "mine"
Word Cloud Visualization:
A colorful word cloud showing frequent terms like "text," "data," and "mine."
Conclusion:
Text mining was successfully performed using R. Preprocessing techniques and analysis,
including generating a word cloud, helped extract meaningful insights from textual data.
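Beyond the word cloud, the term frequencies in the document-term matrix can be ranked and plotted directly. A brief sketch, reusing the dtm object from the code above:
# Supplementary sketch: rank and plot the most frequent terms
term_freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
print(head(term_freq, 5))                   # the five most frequent stems
barplot(head(term_freq, 5), col = "steelblue",
        main = "Top Terms", ylab = "Frequency", las = 2)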
Experiment 11: Data Visualization Techniques
Aim:
To demonstrate various data visualization techniques in R using basic and advanced plots.
Algorithm:
   1. Start the R Environment: Open RStudio or the R Console.
   2. Load Required Libraries: Install and load necessary libraries like ggplot2.
   3. Load the Dataset: Use a built-in dataset such as mtcars or import your own.
   4. Generate Visualizations:
         ○ Create basic plots (line plot, bar plot, etc.).
         ○ Create advanced plots (scatter plot, histogram, etc.).
         ○ Add labels, titles, and themes to the plots.
   5. Customize the Plots:
         ○ Use colors, point shapes, and additional features for better insights.
   6. Display the Results: Render the plots and analyze the insights.
   7. End: Save the plots if required.
R Code:
# Step 1: Load Required Libraries
if (!requireNamespace("ggplot2", quietly = TRUE))
install.packages("ggplot2")
library(ggplot2)
# Step 2: Load Dataset
data("mtcars")
cat("Dataset Summary:\n")
print(summary(mtcars))
# Step 3: Generate Basic Visualizations
# Bar Plot - Number of cylinders
barplot(table(mtcars$cyl), main = "Number of Cylinders", col = "blue",
        xlab = "Cylinders", ylab = "Frequency")
# Scatter Plot - MPG vs Horsepower
plot(mtcars$mpg, mtcars$hp, main = "MPG vs Horsepower",
     xlab = "Miles Per Gallon (MPG)", ylab = "Horsepower (HP)",
     col = "red", pch = 19)
# Step 4: Generate Advanced Visualizations with ggplot2
# Histogram - MPG Distribution
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "skyblue", color = "black") +
  labs(title = "MPG Distribution", x = "Miles Per Gallon", y =
"Frequency")
# Box Plot - MPG by Cylinders
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(fill = "orange") +
  labs(title = "MPG by Cylinder Count", x = "Number of Cylinders", y
= "MPG")
# Step 5: Customize a Scatter Plot with ggplot2
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(gear))) +
  geom_point(size = 3) +
  labs(title = "MPG vs Weight by Gear", x = "Weight", y = "Miles Per
Gallon") +
  theme_minimal()
# Step 6: Save a Plot (Optional)
ggsave("scatter_plot.png", width = 8, height = 6)
Expected Output:
   1. Bar Plot:
         ○ Displays the frequency of cars based on the number of cylinders.
   2. Scatter Plot:
         ○ Shows the relationship between miles per gallon (MPG) and horsepower
             (HP).
   3. Histogram:
         ○ Represents the distribution of MPG across the dataset.
   4. Box Plot:
         ○ Compares MPG values for different cylinder categories.
   5. Advanced Scatter Plot:
         ○ Highlights the relationship between weight and MPG, grouped by the number
            of gears.
Conclusion:
Various data visualization techniques were successfully implemented using R. Both basic
and advanced plots provide insights into the dataset, demonstrating the power of visual
analysis.
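One caveat on saving: ggsave() captures only the most recent ggplot, so the base-graphics plots (the bar plot and the first scatter plot) need an explicit graphics device instead. A minimal sketch with an illustrative file name:
# Supplementary sketch: saving a base-graphics plot to a file
png("bar_plot.png", width = 800, height = 600)    # open a PNG device
barplot(table(mtcars$cyl), main = "Number of Cylinders", col = "blue",
        xlab = "Cylinders", ylab = "Frequency")
dev.off()                                         # write the file and close the device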
Experiment 12: Experiment with Hypothesis Testing Methods
Aim:
To perform hypothesis testing in R to determine if there is a significant difference between
two sample groups.
Algorithm:
   1. Start the R Environment: Open RStudio or the R Console.
   2. Set Up Hypotheses:
          ○ Define the null hypothesis (H₀) and the alternative hypothesis (H₁).
          ○ Example: H₀ - There is no significant difference between the means of two
              groups.
   3. Load or Generate Data:
          ○ Use a built-in dataset or simulate data for testing.
   4. Perform Hypothesis Testing:
          ○ Use appropriate statistical tests (e.g., t-test, ANOVA, chi-square test).
          ○ Choose the test based on the data type and hypothesis.
   5. Interpret the Results:
          ○ Compare the p-value with the significance level (α = 0.05).
          ○ Accept or reject the null hypothesis based on the p-value.
   6. End: Report the conclusion of the test.
R Code:
# Step 1: Generate Sample Data
set.seed(123)
group1 <- rnorm(30, mean = 50, sd = 5)                 # Group 1 data
group2 <- rnorm(30, mean = 55, sd = 5)                 # Group 2 data
# Step 2: Define Hypotheses
# H₀: The means of group1 and group2 are equal.
# H₁: The means of group1 and group2 are not equal.
# Step 3: Perform an Independent t-test
t_test_result <- t.test(group1, group2, alternative = "two.sided")
# Step 4: Display the Results
cat("T-Test Results:\n")
print(t_test_result)
# Step 5: Interpret the Results
if (t_test_result$p.value < 0.05) {
    cat("\nConclusion: Reject the null hypothesis. There is a significant difference between the groups.\n")
} else {
    cat("\nConclusion: Fail to reject the null hypothesis. No significant difference is found.\n")
}
# Step 6: Visualization (Optional)
boxplot(group1, group2, names = c("Group 1", "Group 2"),
          main = "Boxplot of Two Groups",
          col = c("lightblue", "pink"),
          ylab = "Values")
Expected Output:
T-Test Results:
        Welch Two Sample t-test

data:  group1 and group2
t = -3.632, df = 57.76, p-value = 0.0006345
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7.898827 -2.101173
sample estimates:
mean of x mean of y 
 50.28389  54.48388 
Conclusion: Reject the null hypothesis. There is a significant
difference between the groups.
Visualization:
A boxplot comparing the two groups, showing the difference in their distributions.
Conclusion:
Hypothesis testing was successfully conducted using an independent t-test. The results
indicate whether there is a statistically significant difference between the means of the two
sample groups.
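Step 4 of the algorithm also names the chi-square test as an alternative for categorical data. A minimal sketch with illustrative counts (not real data):
# Supplementary sketch: chi-square test of independence on illustrative counts
survey <- matrix(c(30, 20, 15, 35), nrow = 2,
                 dimnames = list(Group = c("G1", "G2"),
                                 Response = c("Yes", "No")))
chi_result <- chisq.test(survey)
print(chi_result)
if (chi_result$p.value < 0.05) {
    cat("Reject H0: group and response appear to be associated.\n")
} else {
    cat("Fail to reject H0: no significant association found.\n")
}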