UNIT-II_R_programming-1
UNIT-II
R-Ready Data-Sets
• R provides built-in data-sets.
• Data-sets are also present in user-contributed-packages.
• The data-sets are useful for learning, practice and experimentation.
• The datasets are useful for data analysis and statistical modelling.
• data() can be used to access a list of the data-sets.
• The list of available data-sets is organized
i) alphabetically by name and
ii) grouped by package.
• The availability of the data-sets depends on the installed contributed-packages.
Built-in Data-Sets
• These datasets are included in the base R installation
• These data-sets are found in the package called "datasets."
For example,
R> library(help="datasets")   # To view a summary of the data-sets within the package
Contributed Data-Sets
• Contributed datasets are created by the R-community.
• The datasets are not included in the base R installation.
• But the datasets are available through additional packages.
• You can install and load additional packages containing the required datasets.
For example,
R> install.packages("tseries") # to install the package
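The steps above can be sketched as follows. This is a minimal illustration, assuming a standard R installation; `mtcars` is one of the built-in data-sets, and the `tseries` lookup only runs if that package happens to be installed.

```r
# List the data-sets available in the currently attached packages
available <- data()
head(available$results[, c("Package", "Item")])

# Built-in data-sets from the "datasets" package can be used directly
head(mtcars, 3)

# For a contributed package, list its data-sets only if it is installed
if (requireNamespace("tseries", quietly = TRUE)) {
  print(data(package = "tseries")$results[, "Item"])
}
```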
Table Format
• Table-format files are plain-text files with three features:
1) Header: If present, the header is the first line of the file.
The header provides column names.
2) Delimiter: The delimiter is a character used to separate entries in each line.
3) Missing Value: A unique character string denoting missing values
This is converted to `NA` when reading.
• Table-format files typically have extensions like `.txt` or `.csv`.
Syntax:
read.table(file, header = FALSE, sep = "")
where,
file: The name of the file from which data should be read.
This can be a local file path or a URL.
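header: This indicates whether the first row of the file contains column names.
Default is FALSE.
sep: The field separator character. The default "" means any whitespace.
Example: Suppose the file "data.txt" contains the following table:
Name Age City
Krishna 26 Mysore
Arjuna 31 Mandya
Karna 29 Maddur
• To read this data into R and create a data frame from it:
# Specify the file path
file_path <- "data.txt"
# Read the file; header = TRUE because the first line holds the column names
my_data <- read.table(file_path, header = TRUE)
print(my_data)
Output: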
Name Age City
Krishna 26 Mysore
Arjuna 31 Mandya
Karna 29 Maddur
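The three features of a table-format file (header, delimiter, missing-value string) can be tried out with a self-contained sketch. The sample rows and the use of "*" as the missing-value marker are assumptions made for illustration; the file is written to a temporary location so that nothing needs to exist on disk beforehand.

```r
# Write a small comma-delimited table to a temporary file (hypothetical data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Name,Age,City",
             "Krishna,26,Mysore",
             "Arjuna,*,Mandya"), tmp)

# header = TRUE: first line holds column names
# sep = ",": the delimiter; na.strings = "*": the missing-value marker
people <- read.table(tmp, header = TRUE, sep = ",", na.strings = "*")
print(people)
is.na(people$Age)   # the "*" entry has been converted to NA
```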
Web-Based Files
• read.table() can be used for reading tabular data from web-based files.
• We can import data directly from the internet.
Example: To read tabular data from a web-based file located at the following URL:
https://example.com/data.txt
# Specify the URL of the web-based file
url <- "https://example.com/data.txt"
# Read the data directly from the URL
my_data <- read.table(url, header = TRUE)
print(my_data)
Spreadsheet Workbooks
• R often deals with spreadsheet software file formats, such as Microsoft Office
Excel's `.xls` or `.xlsx`.
• Exporting spreadsheet files to a table format, like CSV, is generally preferable
before working with R.
header: This indicates whether the first row of the file contains column names.
Note that read.csv() defaults to header = TRUE, whereas read.table() defaults to FALSE.
• To read this data into R and create a data frame from it:
# Specify the file path
file_path <- "data.csv"
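A minimal sketch of reading an exported CSV file with `read.csv()`. The file contents are hypothetical and written to a temporary file here so the example runs on its own.

```r
# Create a small CSV file, as a spreadsheet export would produce
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Age,City",
             "Krishna,26,Mysore",
             "Karna,29,Maddur"), csv_path)

# read.csv() assumes a header line and a comma delimiter by default
csv_data <- read.csv(csv_path)
str(csv_data)
```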
Writing Files
• write.table() is used to write a data frame to a text file.
Syntax:
write.table(x, file, sep = " ", row.names = TRUE, col.names = TRUE, quote = TRUE)
where
x: The data frame or matrix to be written to the file.
file: The name of the file where the data should be saved.
sep: This represents the field separator character (e.g., "\t" for tab-separated
values, "," for comma-separated values).
row.names: A logical value indicating whether row names should be written to
the file. Default is TRUE.
BGS FGC Mysuru
Statistical Analysis and R Programming
# Confirmation message
cat(paste("Data saved to", file_name))
• We close the PDF device using dev.off() to complete the PDF file.
city = "Mysore",
)
# Use dget to read and recreate the R object from the text file
recreated_list <- dget(file = "my_list.txt")
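The writing functions can be exercised end to end with a short sketch. The data frame, the list and the temporary file names are assumptions made for illustration: `write.table()` saves a data frame as text, while `dput()` and `dget()` save and recreate an arbitrary R object.

```r
# Write a data frame as a tab-separated text file
df <- data.frame(Name = c("Krishna", "Arjuna"), Age = c(26, 31))
out_file <- tempfile(fileext = ".txt")
write.table(df, file = out_file, sep = "\t", row.names = FALSE, quote = FALSE)
readLines(out_file)

# Save a list with dput() and recreate it with dget()
my_list <- list(name = "Krishna", age = 26, city = "Mysore")
list_file <- tempfile(fileext = ".txt")
dput(my_list, file = list_file)
recreated <- dget(file = list_file)
identical(my_list, recreated)
```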
Calling Functions
Scoping
• Scoping-rules determine how the language accesses objects within a session.
• These rules also dictate when duplicate object-names can coexist.
Environments
• Environments are like separate compartments where data structures and
functions are stored.
• They help distinguish identical names associated with different scopes.
• Environments are dynamic and can be created, manipulated, or removed.
• Three important types of environments are:
1) Global Environment
2) Local Environment
3) Package Environment
Global Environment
• It is the space where all user-defined objects exist by default.
• When objects are created outside of any function, they are stored in global
environment.
• Use: Objects in the global environment are accessible from anywhere within the
session. Thus they are globally available.
• `ls()` lists objects in the current global environment.
• Example:
R> v1 <- 9
R> v2 <- "victory"
R> ls()
[1] "v1" "v2"
Local Environment
• Local environment is created when a function is called.
• Objects defined within a function are typically stored in its local environment.
• When a function completes, its local environment is automatically removed.
• These environments are isolated from the Global Environment.
• This allows identical argument-names in functions and the global workspace.
• Use: Local environments protect objects from accidental modification by other
functions.
# Define a function with a local environment
my_function <- function() {
local_var <- 42
return(local_var)
}
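The isolation of a local environment can be checked directly. In this sketch, `show_x` is a hypothetical function added to illustrate that a name inside a function does not clash with the same name in the global environment.

```r
my_function <- function() {
  local_var <- 42
  ls()              # lists the objects in the function's local environment
}
my_function()
exists("local_var") # FALSE: the local environment is gone after the call

# The same name can exist in both scopes without conflict
x <- "global"
show_x <- function() {
  x <- "local"      # creates a local x; the global x is untouched
  x
}
show_x()
x
```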
• Example:
R> ls("package:graphics")   # lists objects contained in the graphics package environment
"abline" "arrows" "assocplot" "axis"
Search-path
• A search-path is used to access data structures and functions from different
environments.
• The search-path is a list of environments available in the session.
• search() is used to view the search-path.
• Example:
R> search()
".GlobalEnv" "package:stats" "package:graphics" "package:base"
• The search-path
i) starts at the global environment (.GlobalEnv) and
ii) ends with the base package environment (package:base).
• When looking for an object, R searches environments in the specified order.
• If the object isn't found in one environment, R proceeds to the next in the
search-path.
• environment() can be used to determine a function's environment.
R> environment(seq)
<environment: namespace:base>
R> environment(arrows)
<environment: namespace:graphics>
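A short sketch of inspecting the search-path programmatically, assuming a default session in which the graphics package is attached. `environmentName()` and `find()` are base/utils functions not used elsewhere in these notes.

```r
# The search-path starts at the global environment and ends at base
sp <- search()
sp[1]
sp[length(sp)]

# Name of the environment (namespace) in which a function was defined
environmentName(environment(seq))
environmentName(environment(graphics::arrows))

# find() reports where on the search-path a name is located
find("arrows")
```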
Protected Names
• These names are associated with built-in functions and objects.
• These names are predefined and have specific functionalities.
• These names should not be directly modified or reassigned by users.
• Examples:
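As an illustration, reserved words such as `TRUE`, `NULL` and `if` cannot be reassigned at all, whereas `T` and `F` are ordinary predefined variables that can be overwritten (which is best avoided). The sketch below wraps the failing assignment in `tryCatch()` so that the script keeps running.

```r
# Attempting to assign to a reserved word produces an error
attempt <- tryCatch(
  eval(parse(text = "TRUE <- 5")),
  error = function(e) conditionMessage(e)
)
attempt

# T is only a predefined variable, so it can be masked (not recommended)
T <- 0
isTRUE(T)   # FALSE while the masking copy exists
rm(T)       # remove the copy; T refers to the built-in value again
isTRUE(T)
```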
Argument Matching
• Argument matching refers to the process by which function-arguments are
matched to their corresponding parameter-names within a function call.
• Five ways to match function arguments are
1) Exact matching
2) Partial matching
3) Positional matching
4) Mixed matching
5) Ellipsis (...) argument
Exact
• Exact matching is the default argument matching method.
• In this, arguments are matched based on their exact parameter-names.
• Advantages:
1) Less prone to mis-specification of arguments.
2) The order of arguments doesn't matter.
• Disadvantages:
1) Can be cumbersome for simple operations.
2) Requires users to remember or look up full, case-sensitive tags.
• Example:
R> mat <- matrix(data=1:4, nrow=2, ncol=2, dimnames=list(c("A","B"), c("C","D")))
R> mat
  C D
A 1 3
B 2 4
Partial Matching
• Partial matching allows you to specify only the first part of a parameter-name
as the argument tag.
• The argument is matched to the parameter whose name starts with the provided
partial name.
• Example:
R> mat <- matrix(nr=2, di=list(c("A","B"), c("C","D")), nc=2, dat=1:4)
R> mat
  C D
A 1 3
B 2 4
• Advantages:
1) Requires less code compared to exact matching.
2) Argument tags are still visible, reducing the chance of mis-specification.
• Disadvantages:
1) Can become tricky when multiple arguments share the same starting
letters in their tags.
2) Each tag must be uniquely identifiable, which can be challenging to
remember.
Positional Matching
• Positional matching occurs when you specify arguments in the order in which
the parameters are defined in the function's definition.
• Arguments are matched to parameters based on their position.
• args() can be used to find the order of arguments in the function.
• Example:
R> args(matrix)
function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
NULL
R> mat <- matrix(1:4, 2, 2, F, list(c("A","B"), c("C","D")))
R> mat
  C D
A 1 3
B 2 4
• Advantages:
1) Results in shorter, cleaner code for routine tasks.
2) No need to remember specific argument tags.
• Disadvantages:
1) Requires users to know and match the defined order of arguments.
2) Reading code from others can be challenging, especially for unfamiliar
functions.
Mixed Matching
• Mixed matching allows a combination of exact, partial, and positional matching
in a single function call.
• Example:
R> mat <- matrix(1:4, 2, 2, dim=list(c("A","B"),c("C","D")))
R> mat
  C D
A 1 3
B 2 4
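The matching styles can be compared side by side. The sketch below builds the same matrix four ways and checks that the results are identical; it also illustrates the ellipsis (`...`) argument with `sum_all`, a hypothetical wrapper that passes any number of arguments on to `sum()`.

```r
dn <- list(c("A", "B"), c("C", "D"))

exact      <- matrix(data = 1:4, nrow = 2, ncol = 2, dimnames = dn)
partial    <- matrix(dat = 1:4, nr = 2, nc = 2, di = dn)
positional <- matrix(1:4, 2, 2, FALSE, dn)
mixed      <- matrix(1:4, 2, 2, dim = dn)

# All four calls produce the same object
identical(exact, partial) && identical(exact, positional) && identical(exact, mixed)

# Ellipsis: extra arguments are collected by ... and passed on unchanged
sum_all <- function(...) sum(...)
sum_all(1, 2, 3)
```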
if Statement
The if statement is the simplest decision-making statement; it lets us take a
decision on the basis of a condition.
The block of code inside the if statement is executed only when the boolean
expression evaluates to TRUE. If it evaluates to FALSE, the block is skipped and
execution continues with the code after the if statement.
Syntax:
if(boolean_expression)
{
# If the boolean expression is true, then statement(s) will be executed.
}
Example:
x <- 20
y <- 24
if (x < y)
{
cat(x, "is a smaller number\n")
}
Output: 20 is a smaller number
If-else statement
Another decision-making statement is the if-else statement: an if statement
followed by an else statement. The else block is executed when the boolean
expression evaluates to FALSE.
Syntax:
if(boolean_expression)
{
# statement(s) will be executed if the boolean expression is true.
} else {
# statement(s) will be executed if the boolean expression is false.
}
Note: at the top level, else must be on the same line as the closing brace of the if block.
Example:
a <- 100
if (a < 20)
{
cat("a is less than 20\n")
} else {
cat("a is not less than 20\n")
}
cat("The value of a is", a)
Output:
a is not less than 20
The value of a is 100
Example:
marks=83;
if(marks>75){
print("First class")
}else if(marks>65){
print("Second class")
}else if(marks>55){
print("Third class")
}else{
print("Fail")
}
Output: First class
nested if Statement
An if-else statement within another if-else statement is called a nested if statement.
It is used when an action has to be performed based on several decisions. Hence,
it is also called a multi-way decision.
Syntax:
if(expr1)
{
  if(expr2)
    statement1
  else
    statement2
} else {
  if(expr3)
    statement3
  else
    statement4
}
Here, firstly expr1 is evaluated to true or false.
➢ If the expr1 is evaluated to true, then expr2 is evaluated to true or false.
• If the expr2 is evaluated to true, then statement1 is executed.
o If the expr2 is evaluated to false, then statement2 is executed.
➢ If the expr1 is evaluated to false, then expr3 is evaluated to true or false.
• If the expr3 is evaluated to true, then statement3 is executed.
o If the expr3 is evaluated to false, then statement4 is executed.
Example:
a <- 7
b <- 8
c <- 6
if (a > b) {
  if (a > c) {
    cat("largest =", a, "\n")
  } else {
    cat("largest =", c, "\n")
  }
} else {
  if (b > c) {
    cat("largest =", b, "\n")
  } else {
    cat("largest =", c, "\n")
  }
}
Output:
largest = 8
Syntax:
ifelse(test, yes, no)
where
test: A logical vector or expression that specifies the condition to be tested.
yes: The value to be returned when the condition is TRUE.
no: The value to be returned when the condition is FALSE.
Example:
# Create a numeric vector
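The vectorised behaviour of `ifelse()` can be sketched as follows. The vector `marks` and the pass mark of 50 are assumptions chosen for illustration; the test is applied to every element and a result of the same length is returned.

```r
# Hypothetical marks for four students
marks <- c(82, 45, 67, 30)

# Element-wise test: "Pass" where marks >= 50, otherwise "Fail"
result <- ifelse(marks >= 50, "Pass", "Fail")
result
```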
switch Statement
This is basically a “multi-way” decision statement.
This is used when we must choose among many alternatives.
Syntax:
switch(expression,
case1 = result1,
case2 = result2,
...,
default)
where
expression: The expression whose value you want to match against the cases.
case1, case2, ...: Values to compare against the expression.
result1, result2, ...: Code blocks when the expression matches the
corresponding case.
default: (Optional) Code block when none of the cases match.
Example:
grade <- "B"
# Check the grade and provide feedback
switch(grade,
"A" = cat("Excellent!\n"),
"B" = cat("Well done\n"),
"C" = cat("You passed\n"),
"D" = cat("Better try again\n"),
cat("Invalid grade\n")
)
Output:
Well done
Coding Loops
• Loops are used to execute one or more statements repeatedly.
• There are 2 types of loops:
1) while loop
2) for loop
for Loop
• A `for` loop is useful when iterating over the elements of vectors, lists or
data-frames.
• Syntax:
for (variable in sequence) {
# Code to be executed in each iteration
}
where
variable: The loop-variable that takes on values from the sequence in each
iteration.
sequence: The sequence of values over which the loop iterates.
Example:
numbers <- c(1, 2, 3, 4, 5)
for (i in numbers) {
print(2*i)
}
Output:
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
Example:
for (i in 1:3)
{
  for (j in 1:3)
  {
    product <- i * j
    cat(product, "\t")
  }
  cat("\n")
}
Output:
1    2    3
2    4    6
3    6    9
while Loop
• A while loop statement can be used to execute a set of statements repeatedly as
long as a given condition is true.
• Syntax:
while(expression)
{
statement1;
}
• Firstly, the expression is evaluated to true or false.
• If the expression is evaluated to false, the control comes out of the loop without
executing the body of the loop.
• If the expression is evaluated to true, the body of the loop (i.e. statement1) is
executed.
• After executing the body of the loop, control goes back to the beginning of the
while statement.
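The flow described above can be traced with a small sketch that adds the numbers 1 to 5. The variable names `i` and `total` are chosen here for illustration.

```r
i <- 1
total <- 0
while (i <= 5) {        # condition is checked before every iteration
  total <- total + i    # body of the loop
  i <- i + 1            # without this update the loop would never end
}
total
```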
Example:
Output:
Mean Scores: 60 70 80 90
WRITING FUNCTIONS
Function Creation
• A function is a block of code to perform a specific task.
Syntax:
function_name <- function(arg1, arg2, ...)
{
# Function body
# Perform some operations using arg1, arg2, and other arguments
# Optionally, return a result using 'return' statement
}
Where
`function_name`: This is the name of the function.
`arg1, arg2, ...`: These are the function-arguments.
`{ ... }`: This is the body of the function, enclosed in curly braces `{}`.
`return(...)`: Optionally, you can use the `return` statement to return values.
Example:
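square <- function(x) {
result <- x * x
return(result)
}
# Call the function and print the result
result <- square(5)
cat("The square of 5 is:", result, "\n")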
• We call the `square` function with the argument `5` and store the result in the
`result` variable.
• Finally, we print the result, which is "The square of 5 is: 25".
Output:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144
Passing arguments
Example: Function to print area of circle
Circ.area <- function(r)
{
area <- pi * r^2
return(area)
}
Circ.area(5)
Output: [1] 78.53982
Using return
• return is used to specify what value should be returned as the result of the
function
• This allows you to pass a value or an object back to the calling code.
• If there's no `return` statement inside a function:
i) The function ends when the last line in the body code is executed.
ii) It returns the most recently assigned or created object in the function.
iii) If nothing is created, the function returns `NULL`.
Example:
add_numbers <- function(x, y)
{
result <- x + y
return(result)
}
# Call the function and store the result in a variable
sum_result <- add_numbers(5, 3)
# Print the result
cat("The sum is:", sum_result, "\n")
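Output:
The sum is: 8
• We define a function called add_numbers that takes two arguments, x and y,
and stores their sum in result.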
• We use the return statement to specify that the result should be returned as the
output of the function.
• When we call add_numbers(5, 3), it calculates the sum of 5 and 3 and returns
the result, which is 8.
• We store the returned result in the variable sum_result and then print it.
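The behaviour without an explicit `return` can be sketched as well. `add_implicit` and `no_value` are hypothetical names: the first returns the value of the last expression evaluated in its body, the second has an empty body and returns `NULL`.

```r
# No return(): the value of the last evaluated expression is returned
add_implicit <- function(x, y) {
  x + y
}

# Empty body: nothing is created, so the result is NULL
no_value <- function() {}

add_implicit(5, 3)
is.null(no_value())
```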
Arguments
Lazy Evaluation
• Lazy evaluation means expressions are evaluated only when needed.
• The evaluation of function-arguments is deferred until they are actually needed.
• The arguments are not evaluated immediately when a function is called but are
evaluated when they are accessed within the function.
• This can help optimize performance and save computational resources.
• Example:
lazy_example <- function(a, b)
{
cat("Inside the function\n")
cat("a =", a, "\n")
cat("b =", b, "\n")
cat("Performing some operations...\n")
result <- a + b
cat("Operations completed\n")
return(result)
}
# Create two variables
x <- 10
y <- 20
# Call the function with the variables
lazy_example(x, y)
Output:
Inside the function
a = 10
b = 20
Performing some operations...
Operations completed
[1] 30
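Laziness becomes visible when an argument is never used. In the sketch below (`lazy_demo` is a hypothetical function), the second argument is an expression that would raise an error if evaluated; because the function returns before touching `b`, the error never occurs.

```r
lazy_demo <- function(a, b) {
  if (a > 0) {
    return("b was never evaluated")   # b is not accessed on this path
  }
  b                                   # b is evaluated only here
}

# stop() is passed as an argument but is not run, because b is never needed
lazy_demo(1, stop("this error is never triggered"))
```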
Setting Defaults
• You can provide predefined values for some or all of the arguments in a
function.
• Useful for providing a default behavior if user doesn't specify a value for
arguments.
• Syntax:
function_name <- function(arg1 = default_value1, arg2 = default_value2, ...)
{
# Function body
# Use arg1, arg2, and other arguments
}
Where
`arg1`, `arg2`, etc.: These are the function-arguments for which you want
to set default values.
`default_value1`, `default_value2`, etc.: These are the values you assign
as defaults for the respective arguments.
• Example: Function to calculate the area of a rectangle
calculate_rectangle_area <- function(width = 2, height = 3) {
area <- width * height
return(area)
}
# Call the function without specifying width and height
default_area <- calculate_rectangle_area()
# Call the function with custom width and height
custom_area <- calculate_rectangle_area(width = 5, height = 4)
cat("Default Area:", default_area, "\n")
cat("Custom Area:", custom_area, "\n")
Output:
Default Area: 6
Custom Area: 20
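An argument without a default can also be tested for with `missing()`, which reports whether a value was supplied in the call. A minimal sketch, assuming a one-argument function named `check_argument`:

```r
check_argument <- function(x) {
  if (missing(x)) {
    cat("The argument 'x' is missing\n")
  } else {
    cat("The argument 'x' is provided with a value of", x, "\n")
  }
}

check_argument()     # x not supplied
check_argument(42)   # x supplied
```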
Output:
The argument 'x' is missing
The argument 'x' is provided with a value of 42
Explanation of above program:
• We define a function called `check_argument` that takes one argument, `x`.
• Inside the function, we use the `missing` function to check if the argument `x`
is missing (not provided). If it is missing, we print a message indicating that it is
missing. Otherwise, we print the value of `x`.
• When we call `check_argument()` without providing `x`, the function uses
`missing` to check if `x` is missing and prints "The argument 'x' is missing."
• When we call `check_argument(42)` with a value of 42 for `x`, the function uses
`missing` to check that `x` is provided with a value and prints "The argument 'x'
is provided with a value of 42."
}
if (plotit) {
plot(1:length(fibseq), fibseq, ...)
} else {
return(fibseq)
}
}
Figure 11-1: The default plot produced by a call to myfibplot, with thresh=150
Specialized Functions
Helper Functions
• These functions are designed to assist another function in performing
computations.
• They enhance the readability of complex functions.
• They can be either defined internally or externally.
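The difference between an externally and an internally defined helper can be sketched as follows. All function names are hypothetical: `square_ext` lives in the global environment and can be reused anywhere, while `add_up` exists only inside `sum_of_squares`.

```r
# Externally defined helper: available to any function in the session
square_ext <- function(x) x * x

sum_of_squares <- function(v) {
  # Internally defined helper: visible only inside sum_of_squares
  add_up <- function(values) sum(values)
  add_up(square_ext(v))
}

sum_of_squares(c(1, 2, 3))
exists("add_up")   # FALSE: the internal helper is not visible globally
```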
Output:
The average is: 30
Output:
The square of 5 is: 25
Explanation of above program:
• We define an internally defined helper function called `square`. This function
calculates the square of a number `x`.
• Inside the function, we perform the calculation and store the result in the `result`
variable.
• We use the `return` statement to specify that the `result` should be returned as
the output of the function.
• We then call the `square` function with a value of `5` and store the result in the
`squared_num` variable.
• Finally, we print the squared value using the `cat` function.
Disposable Functions
• These functions are created and used for a specific, one-time task.
• They are not intended for reuse or long-term use.
• They are often employed to perform a single, temporary operation.
• They are discarded after use.
• Example: A disposable function to calculate the area of a rectangle once
calculate_rectangle_area <- function(length, width)
{
area <- length * width
cat("The area of the rectangle is:", area, "\n")
}
# Use the disposable function to calculate the area of a specific rectangle
calculate_rectangle_area(5, 3)
Output:
The area of the rectangle is: 15
Explanation of above program:
• We define a function called `calculate_rectangle_area` that calculates the area
of a rectangle based on its length and width.
• We use this function once to calculate the area of a specific rectangle with a
length of 5 units and a width of 3 units.
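A disposable function is often written without a name at all, directly inside the call that uses it. The sketch below passes such an anonymous function to `sapply()`; the widths and the fixed height of 3 are assumptions for illustration.

```r
widths <- c(2, 5, 4)

# The anonymous function exists only for the duration of this call
areas <- sapply(widths, function(w) w * 3)
areas
```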
Recursive Functions
• These functions call themselves within their own definition.
• They solve problems by breaking them down into smaller, similar sub-
problems.
• They consist of two parts: a base case and a recursive case.
• The base case defines the condition under which the recursion stops.
• The recursive case defines how the problem is divided into smaller subproblems
and solved recursively.
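The two parts can be seen in a short recursive sketch. It assumes the Fibonacci sequence is numbered from 1 and starts 1, 1, 2, 3, 5; `fib` is a name chosen here for illustration.

```r
fib <- function(n) {
  if (n <= 2) {
    return(1)                 # base case: the first two terms are 1
  }
  fib(n - 1) + fib(n - 2)     # recursive case: sum of the two previous terms
}

cat("The 5th Fibonacci number is:", fib(5), "\n")
```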
Output:
The 5th Fibonacci number is: 5
Exception
When there’s an unexpected problem during execution of a function, R will notify
you with either a warning or an error.
In R, you can issue warnings with the warning() function, and you can throw
errors with the stop() function.
Example for warning command:
warn_test <- function(x){
if(x<=0){
warning("'x' is less than or equal to 0 but setting it to 1 and continuing")
x <- 1
}
return(5/x)
}
warn_test(0)
Output:
[1] 5
Warning message:
In warn_test(0) :
'x' is less than or equal to 0 but setting it to 1 and continuing
Explanation:
In warn_test, if x is nonpositive, the function issues a warning, and x is
overwritten to be 1.
warn_test continues to execute and returns the value 5.
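For comparison, a function that throws an error with `stop()` can be sketched as below. The function name `error_test` follows the explanation that comes next; the wording of the error message is an assumption. The call is wrapped in `tryCatch()` only so that the script can continue and show the message.

```r
error_test <- function(x) {
  if (x <= 0) {
    stop("'x' is less than or equal to 0, terminating")  # exits immediately
  }
  return(5 / x)   # reached only when x is positive
}

# error_test(0) alone would halt execution; here the message is captured
outcome <- tryCatch(error_test(0), error = function(e) conditionMessage(e))
outcome
```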
Explanation:
In error_test, on the other hand, if x is nonpositive, the function throws an error
and terminates immediately.
The call to error_test did not return anything because R exited the function at the
stop command.
Example:
v <- list(1, 2, 4, '0', 5)
for (i in v)
{
try(print(5/i))
}
Output:
[1] 5
[1] 2.5
[1] 1.25
Error in 5/i : non-numeric argument to binary operator
[1] 1
Explanation:
In the example above, the list contains a non-numeric value ('0'), and we try to
divide 5 by every element of the list. A list is used because c() would coerce
every element to a character string.
Because each division is wrapped in try(), the loop continues with the remaining
elements even after the error in one of the iterations.
Using tryCatch
The try block prevents your code from stopping but does not provide a way to
handle exceptions. tryCatch helps to handle the conditions and control what
happens based on the conditions.
Syntax:
check = tryCatch({
expression
}, warning = function(w){
code that handles the warning
}, error = function(e){
code that handles the error
}, finally = {
clean-up code that always runs
})
Example:
check <- function(expression){
tryCatch(expression,
warning = function(w){
message("warning:\n", conditionMessage(w))
},
error = function(e){
message("error:\n", conditionMessage(e))
},
finally = {
message("Completed")
})
}
check({10/2})
check({10/0})
check({10/'noe'})
Output:
Timing
It is often useful to keep track of progress or see how long a certain task took to
complete.
If you want to know how long a computation takes to complete, you can use the
Sys.time command.
This command outputs an object that details current date and time information
based on your system.
Sys.time()
Output: "2016-03-06 16:39:27 NZDT"
Syntax:
Starttime <- Sys.time()
{
Func()
}
Endtime <- Sys.time()
Example:
Sleep_func <- function()
{
Sys.sleep(5)
}
Starttime <- Sys.time()
{
Sleep_func()
}
Endtime <- Sys.time()
print(Endtime - Starttime)
Output:
Time difference of 5.008 secs (the exact value varies from run to run)
Explanation:
Store/Record the time before the execution in the variable Starttime, then after
the execution of the function, store the time in Endtime variable.
The difference between Endtime and Starttime gives the running time of the
function.
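The same idea in a compact sketch, using a short pause so that it runs quickly. The variable names are chosen for illustration; `difftime()` with `units = "secs"` is used so that the elapsed time is always reported in seconds.

```r
start_time <- Sys.time()
Sys.sleep(0.2)                       # stand-in for the task being timed
end_time <- Sys.time()

# Convert the time difference to a plain number of seconds
elapsed <- as.numeric(difftime(end_time, start_time, units = "secs"))
elapsed
```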
Visibility
The location where we can find a variable and also access it if required is called
the scope of a variable. There are mainly two types of variable scopes:
Global Variables: As the name suggests, Global Variables can be accessed from
any part of the program.
• They are available throughout the lifetime of a program.
• They are declared anywhere in the program outside all of the functions or
blocks.
• Declaring global variables: Global variables are usually declared outside
of all of the functions and blocks. They can be accessed from any portion
of the program.
Example:
# global variable
global = 5
# within a function
display = function()
{
print(global)
}
display()
# changing the value of the global variable
global = 10
display()
Output:
[1] 5
[1] 10
Local Variables: Variables defined within a function or block are said to be local
to those functions.
• Local variables do not exist outside the block in which they are declared,
i.e. they can not be accessed or used outside that block.
• Declaring local variables: Local variables are declared inside a block.
Example:
func = function()
{
# this variable is local to the
# function func() and cannot be
# accessed outside this function
age = 18
print(age)
}
cat("Age is:\n")
func()
Output:
Age is:
[1] 18