23DSCP206-Data Mining Using R
1. Simple Calculator
Aim
To write an R script to create a Simple Calculator that performs basic arithmetic operations:
addition, subtraction, multiplication, and division, using functions and control structures.
Procedure
1. Define Functions: Create separate functions for each arithmetic operation (addition,
subtraction, multiplication, division).
2. User Input: Use the readline() function to accept input for the two numbers and the
operation choice.
3. Display Menu: Show a menu with operation options and prompt the user to select one.
4. Decision Making: Use conditional statements (if-else) to call the appropriate function
based on the user’s choice.
5. Loop for Continuation: Use a while loop to allow the user to perform multiple
calculations until they choose to exit.
6. Handle Division: Check for a zero divisor before dividing and report an error, because
R would otherwise return Inf (or NaN for 0 / 0) instead of failing.
Program:
# Define functions for arithmetic operations
add <- function(n1, n2) {
  print(paste(n1, "+", n2, "=", n1 + n2))
}
subtract <- function(n1, n2) {
  print(paste(n1, "-", n2, "=", n1 - n2))
}
multiply <- function(n1, n2) {
  print(paste(n1, "*", n2, "=", n1 * n2))
}
divide <- function(n1, n2) {
  if (n2 != 0) {
    print(paste(n1, "/", n2, "=", n1 / n2))
  } else {
    print("Error: Division by zero is not allowed.")
  }
}
# Display program header
print("*** Simple Calculator ***")
print("-----------------------------")
ch <- "y" # Initialize user choice variable
# Start the loop; readline() waits for input only in an interactive R session
while (ch == "y" || ch == "Y") {
  # Input two numbers (as.numeric() keeps decimal input; as.integer() would truncate it)
  n1 <- as.numeric(readline(prompt = "Enter the value for n1: "))
  n2 <- as.numeric(readline(prompt = "Enter the value for n2: "))
  # Display operation menu
  print("1. Addition")
  print("2. Subtraction")
  print("3. Multiplication")
  print("4. Division")
  # Accept operation choice
  op <- as.integer(readline(prompt = "Enter the operation number (1/2/3/4): "))
  # Perform the selected operation; non-numeric input is converted to NA,
  # which must be caught before the comparisons below
  if (is.na(n1) || is.na(n2) || is.na(op)) {
    print("Invalid input. Please enter numeric values.")
  } else if (op == 1) {
    add(n1, n2)
  } else if (op == 2) {
    subtract(n1, n2)
  } else if (op == 3) {
    multiply(n1, n2)
  } else if (op == 4) {
    divide(n1, n2)
  } else {
    print("Invalid operation. Please enter a valid choice.")
  }
  # Prompt for continuation
  print("Do you want to continue?")
  ch <- readline(prompt = "Enter y / n: ")
}
Output:
[1] "*** Simple Calculator ***"
[1] "-----------------------------"
Enter the value for n1: 12
Enter the value for n2: 8
[1] "1. Addition"
[1] "2. Subtraction"
[1] "3. Multiplication"
[1] "4. Division"
Enter the operation number (1/2/3/4): 1
[1] "12 + 8 = 20"
[1] "Do you want to continue?"
Enter y / n: y
Enter the value for n1: 10
Enter the value for n2: 0
[1] "1. Addition"
[1] "2. Subtraction"
[1] "3. Multiplication"
[1] "4. Division"
Enter the operation number (1/2/3/4): 4
[1] "Error: Division by zero is not allowed."
[1] "Do you want to continue?"
Enter y / n: n
Result
The program implements a Simple Calculator in R, demonstrating the use of functions,
conditional statements, and loops. Users can perform multiple calculations until they
choose to exit.
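Because readline() waits for input only in an interactive session, the arithmetic is easier to
check when it is separated from the prompts. The following is a minimal sketch, not part of the
program above: calculate() is a hypothetical helper that takes the two numbers and the menu
choice and returns the result instead of printing it. Returning NA for a zero divisor or an
invalid choice is an assumption made for this illustration.

```r
# Hypothetical helper (not in the original program): returns the result
# of the chosen operation instead of printing it.
calculate <- function(n1, n2, op) {
  # switch() matches the choice against the names "1" to "4";
  # the final unnamed argument is the default for any other choice.
  switch(as.character(op),
    "1" = n1 + n2,
    "2" = n1 - n2,
    "3" = n1 * n2,
    "4" = if (n2 != 0) n1 / n2 else NA_real_, # assumption: NA on a zero divisor
    NA_real_                                   # assumption: NA on an invalid choice
  )
}

calculate(12, 8, 1) # 20
calculate(7, 2, 4)  # 3.5
calculate(10, 0, 4) # NA
```

Keeping the computation in one function that returns a value means the same logic can be reused
by the interactive loop and checked without typing any input.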
2. Understanding Vector Concepts in R
Aim
To explore and perform various operations and manipulations using vectors in R,
including creation, indexing, modification, operations, and handling missing values.
Procedure
1. Create Vectors: Learn to create numeric, character, and logical vectors. Explore
sequences and replications.
2. Access and Index Elements: Access vector elements using positive, negative, and
logical indexing.
3. Modify Vectors: Update existing elements, add new elements, and extend vectors.
4. Perform Vector Operations: Apply arithmetic, scalar, and logical operations on vectors.
5. Use Built-in Functions: Utilize functions like sum(), mean(), and sort() to analyze
vectors.
6. Understand Recycling: Observe how shorter vectors are recycled during operations.
7. Filter Vector Elements: Extract elements that satisfy certain conditions.
8. Combine Vectors: Combine multiple vectors into one.
9. Handle Missing Values: Identify and handle missing values using is.na() and
na.omit().
10. Apply Functions to Vectors: Use functions to manipulate vector elements.
11. Understand Coercion: Explore how R coerces elements of different types in a vector.
Program:
# 1. Creating Vectors
numeric_vec <- c(10, 20, 30, 40)
char_vec <- c("R", "Programming", "Vector")
logical_vec <- c(TRUE, FALSE, TRUE, FALSE)
seq_vec <- seq(1, 10, by = 2)
rep_vec <- rep(5, times = 4)
# 2. Accessing and Indexing
first_element <- numeric_vec[1]
subset_elements <- numeric_vec[2:4]
specific_elements <- numeric_vec[c(1, 3)]
exclude_element <- numeric_vec[-2]
logical_indexing <- numeric_vec[logical_vec]
# 3. Modifying Vectors
numeric_vec[2] <- 25
numeric_vec <- c(numeric_vec, 50)
# 4. Vector Operations
vec_a <- c(1, 2, 3)
vec_b <- c(4, 5, 6)
addition <- vec_a + vec_b
multiplication <- vec_a * vec_b
scalar_multiplication <- vec_a * 2
logical_comparison <- vec_a > 2
# 5. Vector Functions
vector_sum <- sum(numeric_vec)
vector_mean <- mean(numeric_vec)
vector_length <- length(numeric_vec)
sorted_vector <- sort(numeric_vec)
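# 6. Vector Recycling
vec_short <- c(1, 2)
# vec_a has 3 elements, so vec_short is recycled as c(1, 2, 1); R issues a
# warning because 3 is not a multiple of 2
recycled_operation <- vec_a + vec_short # 2 4 4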
# 7. Filtering Vectors
filtered_vec <- numeric_vec[numeric_vec > 20]
# 8. Combining Vectors
combined_vec <- c(vec_a, vec_b)
# 9. Handling Missing Values (NA)
vec_with_na <- c(1, 2, NA, 4)
na_removed <- na.omit(vec_with_na) # 1 2 4 (the removed position is kept as an attribute)
na_check <- is.na(vec_with_na) # FALSE FALSE TRUE FALSE
# 10. Apply Function to Each Element
sqrt_vec <- sqrt(numeric_vec)
# 11. Coercion in Vectors
mixed_vec <- c(1, "two", TRUE) # every element becomes character: "1" "two" "TRUE"
Result:
The program demonstrates successful creation, manipulation, and operations on
vectors, including indexing, filtering, and handling missing values. Additionally, vector-specific
functions, recycling, and coercion are effectively applied and observed.
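The program above stores its results without displaying them, so the effects of recycling,
coercion, and missing values are easy to miss. The short sketch below uses its own small
vectors (not the ones from the program) to make each effect visible; the commented values are
what R returns for these inputs.

```r
# Recycling: the shorter vector is repeated to the length of the longer one.
vec_long <- c(1, 2, 3, 4)
vec_short <- c(10, 20)
recycled <- vec_long + vec_short # 11 22 13 24 (no warning: 4 is a multiple of 2)

# Coercion: c() promotes all elements to the most general type present.
class(c(1, TRUE))        # "numeric"
class(c(1, "two", TRUE)) # "character"

# Missing values propagate through calculations unless removed explicitly.
vec_with_na <- c(1, 2, NA, 4)
sum(vec_with_na)                # NA
sum(vec_with_na, na.rm = TRUE)  # 7
mean(vec_with_na, na.rm = TRUE) # 2.333333
```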
3. Understanding Lists in R
Aim: To understand the creation, manipulation, and various operations on lists in R, including
accessing elements, modifying components, combining lists, using nested lists, and applying
functions on list elements.
Procedure:
1. Create Lists: Define a list containing various data types such as numeric, character, and
logical vectors.
2. Access Elements: Access components of the list using index, name, or specific indices
within a component.
3. Modify Elements: Update components, add new components, and remove existing
ones.
4. Combine Lists: Merge two lists into a single list.
5. Apply Functions: Use lapply and sapply to apply functions across list components.
6. Work with Nested Lists: Create and access nested lists.
7. Unlist: Flatten a list component into a simple vector.
8. Check and Iterate: Inspect the structure of a list using str() and iterate over its
components.
9. Use Cases: Extract specific components and perform summary statistics.
Program:
# 1. Creating Lists
list_example <- list(
  numbers = c(1, 2, 3, 4), # Numeric vector
  words = c("apple", "banana", "cherry"), # Character vector
  logicals = c(TRUE, FALSE, TRUE) # Logical vector
)
print("Original List:")
print(list_example)
# 2. Accessing List Elements
# By index
print("Access by index (first element):")
print(list_example[[1]])
# By name
print("Access by name (words):")
print(list_example$words)
# Access specific element in a component
print("Access specific value in words (second element):")
print(list_example$words[2])
# 3. Modifying List Elements
list_example$numbers <- c(10, 20, 30) # Replace the numeric vector
print("Modified List (numbers replaced):")
print(list_example)
# Adding new components
list_example$new_component <- "Hello, R!" # Add a new component
print("List after adding a new component:")
print(list_example)
# 4. Removing List Elements
list_example$new_component <- NULL # Remove the new component
print("List after removing the new component:")
print(list_example)
# 5. Combining Lists
list_combined <- c(list_example, list(additional = c(100, 200)))
print("Combined List:")
print(list_combined)
# 6. Applying Functions on Lists
# Using lapply to apply a function to each element
print("Length of each component in the list:")
print(lapply(list_example, length))
# Using sapply to simplify the result
print("Length of each component (simplified):")
print(sapply(list_example, length))
# 7. Nested Lists
nested_list <- list(
  first = list(a = 1, b = 2),
  second = list(c = "hello", d = "world")
)
print("Nested List:")
print(nested_list)
# Accessing elements in nested lists
print("Access nested element (second$c):")
print(nested_list$second$c)
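# 8. Unlisting
# list_example["numbers"] is a one-component list; unlist() flattens it to a vector
# (list_example$numbers is already a vector, so unlist() would leave it unchanged)
flat_vector <- unlist(list_example["numbers"], use.names = FALSE)
print("Unlisted numeric vector:")
print(flat_vector)
# 9. Checking and Iterating Over a List
print("Checking structure of the list:")
str(list_example)
print("Iterating over the list:")
for (item in list_example) {
  print(item)
}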
# 10. Use Cases
# Extracting specific components
selected_components <- list_example[c("numbers", "words")]
print("Selected components (numbers and words):")
print(selected_components)
# Summary of the numbers in the list
print("Summary of the numbers component:")
print(summary(list_example$numbers))
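The loop in step 9 prints each component but not its name. As a small sketch (using a separate,
hypothetical list called inventory), iterating over names() keeps the label available, and
unlist() applied to a whole list shows how mixed components are coerced to one type.

```r
inventory <- list(numbers = c(10, 20, 30), words = c("apple", "banana"))

# Iterate over the names so each component can be labelled.
for (name in names(inventory)) {
  cat(name, "has", length(inventory[[name]]), "elements\n")
}

# unlist() on the whole list flattens every component into one vector;
# numbers and strings together are coerced to character.
flat <- unlist(inventory)
class(flat)  # "character"
length(flat) # 5
```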
Result:
The R script successfully demonstrates various operations on lists, including creation,
access, modification, combination, nesting, unlisting, and applying functions. Additionally, it
showcases the utility of lists in handling heterogeneous data and complex structures effectively.
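Two distinctions from the program are worth seeing side by side: single versus double brackets,
and lapply() versus sapply(). The sketch below uses a hypothetical list named scores; vapply()
is included as a stricter alternative to sapply() and is not used in the program above.

```r
scores <- list(math = c(80, 90), art = c(70, 75, 95))

# Single brackets return a list; double brackets return the component itself.
class(scores["math"])   # "list"
class(scores[["math"]]) # "numeric"

# lapply() always returns a list; sapply() simplifies to a vector when it can.
lapply(scores, mean) # list with math = 85 and art = 80
sapply(scores, mean) # named numeric vector: math 85, art 80

# vapply() behaves like sapply() but checks the type and length of each result.
means <- vapply(scores, mean, FUN.VALUE = numeric(1))
```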
4. Matrix Operations in R
Aim:
To understand the creation, manipulation, and operations on matrices in R programming.
Procedure:
1. Creating Matrices:
   ○ Use the matrix() function to create matrices, filling column-wise or row-wise.
   ○ Combine vectors into matrices using rbind() and cbind().
2. Accessing Matrix Elements:
   ○ Access specific elements, rows, or columns using indexing.
3. Modifying Matrices:
   ○ Update elements or rows in a matrix using indexing.
4. Matrix Arithmetic:
   ○ Perform element-wise addition, scalar multiplication, and matrix multiplication.
5. Matrix Functions:
   ○ Calculate transpose, row sums, column sums, row means, and column means.
6. Logical Operations:
   ○ Apply conditions to matrices and return logical matrices.
7. Binding Matrices:
   ○ Combine matrices by columns or rows using cbind() and rbind().
8. Diagonal and Identity Matrices:
   ○ Create identity matrices using diag() and customize diagonal elements.
9. Matrix Subsetting:
   ○ Extract submatrices using row and column indexing.
10. Matrix Operations:
   ○ Calculate the determinant and inverse of matrices.
11. Checking Properties:
   ○ Use functions like dim(), nrow(), and ncol() to check matrix properties.
Program:
# 1. Creating Matrices
matrix1 <- matrix(1:9, nrow = 3, ncol = 3)
print("Matrix 1 (3x3 filled column-wise):")
print(matrix1)
matrix2 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print("Matrix 2 (2x3 filled row-wise):")
print(matrix2)
row1 <- c(1, 2, 3)
row2 <- c(4, 5, 6)
matrix3 <- rbind(row1, row2)
print("Matrix 3 (using rbind):")
print(matrix3)
col1 <- c(1, 4)
col2 <- c(2, 5)
matrix4 <- cbind(col1, col2)
print("Matrix 4 (using cbind):")
print(matrix4)
# 2. Accessing Matrix Elements
print(matrix2[2, 3])
print(matrix2[2, ])
print(matrix1[, 3])
# 3. Modifying Matrices
matrix1[1, 1] <- 10
matrix2[2, ] <- c(7, 8, 9)
# 4. Matrix Arithmetic
matrix_add <- matrix1 + matrix(9:1, 3, 3)
matrix_mult <- matrix1 * 2
matrix_dot <- matrix1 %*% t(matrix1)
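# 5. Matrix Functions (print() is required for output when the script is run with source())
print(t(matrix1))
print(rowSums(matrix1))
print(colSums(matrix1))
print(rowMeans(matrix1))
print(colMeans(matrix1))
# 6. Logical Operations
print(matrix1 > 5)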
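# 7. Binding Matrices
# matrix2 is 2x3, so it is transposed to 3x2 to match the 3 rows of matrix1
matrix_combined <- cbind(matrix1, t(matrix2)) # 3x5
matrix_stacked <- rbind(matrix1, matrix2) # 5x3 (both have 3 columns)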
# 8. Diagonal and Identity Matrices
diag_matrix <- diag(1, nrow = 3, ncol = 3)
custom_diag <- diag(c(1, 2, 3))
# 9. Matrix Subsetting
sub_matrix <- matrix1[1:2, 2:3]
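# 10. Matrix Operations
det_matrix <- det(matrix1) # -27 after the change in step 3, so matrix1 is invertible
inv_matrix <- solve(matrix1) # inverse of matrix1; solve() fails if the determinant is 0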
# 11. Checking Properties
print(dim(matrix1))
print(nrow(matrix1))
print(ncol(matrix1))
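One detail of steps 2 and 9 is easy to overlook: selecting a single row or column returns a
plain vector, not a matrix. The sketch below, using a small matrix m that is not part of the
program, shows the drop = FALSE argument that keeps the matrix structure.

```r
m <- matrix(1:6, nrow = 2) # columns are (1, 2), (3, 4), (5, 6)

# Selecting one column drops the result to a plain vector by default.
col_vec <- m[, 2]   # 3 4
is.matrix(col_vec)  # FALSE

# drop = FALSE keeps the result as a 2x1 matrix.
col_mat <- m[, 2, drop = FALSE]
dim(col_mat)        # 2 1
```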
Result:
The matrix operations, including creation, modification, arithmetic, and analysis using R
functions like matrix(), rbind(), cbind(), and det(), were successfully executed. Key
results such as matrix dimensions, subsetting, and determinant calculation were verified.
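An inverse is only meaningful for a square matrix with a non-zero determinant; for example,
matrix(1:9, nrow = 3) in its unmodified form has determinant 0 and cannot be inverted. As a
hedged sketch with a hypothetical 2x2 matrix A, the inverse can be checked by multiplying it
back and comparing with the identity matrix.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2) # columns are (2, 1) and (1, 3)
det(A)                               # 5, so A is invertible
A_inv <- solve(A)

# A %*% A_inv equals the identity only up to floating-point rounding,
# so compare with all.equal() rather than ==.
isTRUE(all.equal(A %*% A_inv, diag(2))) # TRUE

# Element-wise (*) and matrix (%*%) multiplication are different operations.
A * A   # squares each entry: 4 1 / 1 9
A %*% A # row-by-column product: 5 5 / 5 10
```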
5. DataFrames in R
Aim:
To learn the creation, manipulation, and analysis of DataFrames in R, including
operations like accessing, filtering, sorting, and handling missing data.
Procedure:
1. Creating a DataFrame: Use the data.frame() function to create a structured table of
data.
2. Accessing Data: Retrieve specific rows, columns, or cells using indexing and column
names.
3. Adding a New Column: Append a column with the $ operator.
4. Adding a New Row: Add rows using the rbind() function.
5. Removing a Column: Use the $ operator to assign NULL to a column for removal.
6. Summary Statistics: Use functions like summary() to analyze columns.
7. Handling Missing Data: Introduce missing values, identify them using is.na(), and
replace them with computed values.
8. Filtering Data: Filter rows using the subset() function.
9. Sorting Data: Sort rows using the order() function.
10. Saving and Loading Data: Save the DataFrame to a file using write.csv() and load
it back with read.csv().
Program:
# Step 1: Creating a DataFrame
students <- data.frame(
  StudentID = 1:5,
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Age = c(20, 22, 23, 21, 22),
  Marks = c(85, 90, 78, 88, 95),
  Passed = c(TRUE, TRUE, TRUE, TRUE, TRUE)
)
print("Step 1: DataFrame Created")
print(students)
# Step 2: Accessing Data
cat("\nStep 2: Accessing Data\n") # cat() interprets "\n"; print() would show it literally
print("Access the first row:")
print(students[1, ])
print("Access the 'Name' column:")
print(students$Name)
print("Access specific data (Marks of 3rd student):")
print(students[3, "Marks"])
# Step 3: Adding a New Column
cat("\nStep 3: Adding a New Column\n")
students$Grade <- c("A", "A+", "B", "A", "A+")
print("DataFrame after adding 'Grade' column:")
print(students)
# Step 4: Adding a New Row
cat("\nStep 4: Adding a New Row\n")
new_row <- data.frame(
  StudentID = 6, Name = "Frank", Age = 23, Marks = 80,
  Passed = TRUE, Grade = "B+"
)
students <- rbind(students, new_row)
print("DataFrame after adding a new row:")
print(students)
# Step 5: Removing a Column
cat("\nStep 5: Removing a Column\n")
students$Passed <- NULL
print("DataFrame after removing the 'Passed' column:")
print(students)
# Step 6: Summary Statistics
cat("\nStep 6: Summary Statistics\n")
print("Summary of the 'Marks' column:")
print(summary(students$Marks))
# Step 7: Handling Missing Data
cat("\nStep 7: Handling Missing Data\n")
students$Marks[4] <- NA # Introduce a missing value
print("DataFrame with a missing value:")
print(students)
print("Replacing missing values with the column mean:")
students$Marks[is.na(students$Marks)] <- mean(students$Marks, na.rm = TRUE)
print(students)
# Step 8: Filtering Data
cat("\nStep 8: Filtering Data\n")
print("Filter students with Marks > 85:")
filtered_students <- subset(students, Marks > 85)
print(filtered_students)
# Step 9: Sorting Data
cat("\nStep 9: Sorting Data\n")
print("Sort students by Age in descending order:")
sorted_students <- students[order(-students$Age), ]
print(sorted_students)
# Step 10: Saving and Loading Data
cat("\nStep 10: Saving and Loading Data\n")
write.csv(students, "students.csv", row.names = FALSE)
print("DataFrame saved to 'students.csv'.")
loaded_students <- read.csv("students.csv")
print("DataFrame loaded from 'students.csv':")
print(loaded_students)
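In Step 3 the grades are typed in by hand. As a hedged sketch, they could instead be derived
from the marks with cut(). The grade boundaries below (up to 79 is B, 80 to 89 is A, 90 and
above is A+, for whole-number marks) are assumptions chosen to reproduce the grades used in the
program; they are not rules stated in the exercise.

```r
marks <- c(85, 90, 78, 88, 95)

# Assumed boundaries, for illustration only. With the default right = TRUE
# the intervals are (-Inf, 79], (79, 89], and (89, Inf).
grade <- cut(marks, breaks = c(-Inf, 79, 89, Inf), labels = c("B", "A", "A+"))

as.character(grade) # "A" "A+" "B" "A" "A+"
```

cut() returns a factor, so as.character() is used when plain text labels are needed.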
Result:
The DataFrame was created, manipulated, and analyzed using operations such as adding
and removing columns and rows, filtering, and sorting. The missing mark was replaced with
the column mean, and the DataFrame was saved to a CSV file and loaded back from it.
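The same steps can be checked on a smaller table. The sketch below uses a hypothetical
three-row DataFrame df and writes to a temporary file (via tempfile(), an assumption made so
that the working directory is left untouched) before confirming that the reloaded data matches.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Marks = c(85, NA, 78)
)

# Replace the missing mark with the mean of the observed marks: (85 + 78) / 2 = 81.5
df$Marks[is.na(df$Marks)] <- mean(df$Marks, na.rm = TRUE)

# Logical indexing is equivalent to subset() for a simple condition.
high <- df[df$Marks > 80, ] # Alice and Bob

# order(..., decreasing = TRUE) also works for text columns,
# where negating the column with "-" would not.
by_name_desc <- df[order(df$Name, decreasing = TRUE), ]

# Round trip through a temporary CSV file.
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
df_back <- read.csv(path)
isTRUE(all.equal(df, df_back)) # TRUE
```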
6. Descriptive Statistics in R
Aim:
To calculate and interpret descriptive statistics (mean, median, mode, variance, standard
deviation, and summary statistics) for a dataset in R.
Procedure
1. Import or create a dataset in R.
2. Compute basic descriptive statistics for numeric columns in the dataset:
   ○ Mean, Median, Mode, Variance, and Standard Deviation
3. Generate summary statistics for the entire dataset.
4. Visualize the data using histograms and boxplots to understand the distribution and
variability.
Program:
# Step 1: Load or Create a Dataset
students <- data.frame(
  StudentID = 1:10,
  Marks = c(78, 85, 62, 90, 88, 75, 80, 68, 95, 84),
  Age = c(20, 21, 20, 22, 21, 20, 22, 19, 23, 21)
)
print("Dataset:")
print(students)
# Step 2: Compute Descriptive Statistics
# Mean of Marks and Age
mean_marks <- mean(students$Marks)
mean_age <- mean(students$Age)
print(paste("Mean Marks:", mean_marks))
print(paste("Mean Age:", mean_age))
# Median of Marks and Age
median_marks <- median(students$Marks)
median_age <- median(students$Age)
print(paste("Median Marks:", median_marks))
print(paste("Median Age:", median_age))
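# Mode Function: returns the most frequent value; on a tie, the value that
# appears first in x is returned
get_mode <- function(x) {
  uniq_vals <- unique(x)
  uniq_vals[which.max(tabulate(match(x, uniq_vals)))]
}
# Mode of Marks and Age
# Every mark is unique, so get_mode() simply returns the first mark (78);
# ages 20 and 21 both occur three times, and 20 is returned because it appears first
mode_marks <- get_mode(students$Marks)
mode_age <- get_mode(students$Age)
print(paste("Mode Marks:", mode_marks))
print(paste("Mode Age:", mode_age))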
# Variance of Marks (var() is the sample variance, with divisor n - 1)
variance_marks <- var(students$Marks)
print(paste("Variance of Marks:", variance_marks))
# Standard Deviation of Marks (sd() is the square root of the sample variance)
std_dev_marks <- sd(students$Marks)
print(paste("Standard Deviation of Marks:", std_dev_marks))
# Step 3: Summary Statistics
summary_stats <- summary(students)
print("Summary Statistics:")
print(summary_stats)
# Step 4: Visualization
# Histogram for Marks
hist(
  students$Marks,
  main = "Histogram of Marks",
  xlab = "Marks",
  col = "lightblue",
  border = "black"
)
# Boxplot for Marks
boxplot(
  students$Marks,
  main = "Boxplot of Marks",
  ylab = "Marks",
  col = "lightgreen"
)
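The boxplot is drawn from a five-number summary (minimum, lower hinge, median, upper hinge,
maximum). As a short sketch, the same numbers can be obtained directly with fivenum() and
boxplot.stats(), which makes the plot easier to interpret; the marks vector repeats the values
from the dataset above.

```r
marks <- c(78, 85, 62, 90, 88, 75, 80, 68, 95, 84)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(marks) # 62 75 82 88 95

# boxplot.stats() returns the values the boxplot draws, plus any outliers
stats <- boxplot.stats(marks)
stats$stats # same five numbers, since no mark lies beyond the whiskers
stats$out   # empty: no outliers in this dataset
```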
Result:
The program successfully computes descriptive statistics (mean, median, mode,
variance, standard deviation) for the dataset and visualizes the data using a histogram and
boxplot.
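The get_mode() function reports a single value even when several values tie or when no value
repeats. The sketch below shows one possible alternative, a hypothetical get_modes() that
returns every tied value and NA when all values are unique (both behaviours are assumptions
made for this illustration). It also shows how the sample variance from var() relates to the
population variance.

```r
# Hypothetical variant of get_mode(): returns every value tied for the
# highest frequency, or NA when no value repeats.
get_modes <- function(x) {
  counts <- table(x)
  if (max(counts) == 1) return(NA)
  as.numeric(names(counts)[counts == max(counts)])
}

age <- c(20, 21, 20, 22, 21, 20, 22, 19, 23, 21)
marks <- c(78, 85, 62, 90, 88, 75, 80, 68, 95, 84)

get_modes(age)   # 20 21 (both occur three times)
get_modes(marks) # NA (every mark is unique)

# var() divides by n - 1; rescale to get the population variance (divide by n).
n <- length(marks)
sample_var <- var(marks)
population_var <- sample_var * (n - 1) / n
```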