Concepts

R programming

Uploaded by

khh8ga6y
Concepts:

1. Lab program 1: Write an R program for different types of data structures in R
Data Structure: A data structure is a particular way of organizing data in a
computer so that it can be used effectively. The idea is to reduce the space and
time complexities of different tasks. Data structures in R programming are
tools for holding multiple values.
R's base data structures are often organized by their dimensionality (1D, 2D,
or nD) and whether they are homogeneous (all elements must be of the
same type) or heterogeneous (the elements can be of different types). This
gives rise to the five data structures that are most frequently used in data
analysis.
The most essential data structures used in R include:
a) Vectors
b) Lists
c) Data frames
d) Matrices
e) Arrays
a) Vectors
A vector is an ordered collection of values of one basic data type, with a
given length. The key point is that all the elements of a vector must be of the
same data type, i.e. vectors are homogeneous data structures. Vectors are
one-dimensional data structures.
To combine items into a vector, use the c() function and separate the
items with commas.
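As a minimal sketch of the c() function described above (the variable names
marks, fruits, flags and mixed are only illustrative), the following creates
vectors of different types and shows that mixing types forces a single common
type:

```r
# Numeric, character and logical vectors built with c()
marks  <- c(72, 85, 91)
fruits <- c("apple", "banana", "cherry")
flags  <- c(TRUE, FALSE, TRUE)

# Mixing types coerces every element to one common type (here: character)
mixed <- c(1, "two", TRUE)

print(length(marks))   # number of elements in the vector
print(class(fruits))   # "character"
print(class(mixed))    # "character"
```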
b) Lists
A list is a generic object consisting of an ordered collection of objects. Lists
are heterogeneous data structures. Like vectors, they are one-dimensional, but
the elements of a list may themselves be vectors, matrices, character strings,
functions, other lists and so on.
To create a list, use the list() function.
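The following sketch, using a hypothetical student record, shows a list that
holds a character string, a numeric vector, a logical value and a function, and
how its elements can be accessed:

```r
# A list can hold elements of different types, including functions
student <- list(
  name      = "Asha",
  marks     = c(72, 85, 91),
  passed    = TRUE,
  summarise = mean
)

str(student)                              # compact view of the structure
print(student$name)                       # access an element by name
print(student[["marks"]][2])              # second mark
print(student$summarise(student$marks))   # call the stored function
```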
c) Data frames
Data frames are generic data objects in R that are used to store tabular
data. They are among the most popular data objects in R programming because
data in tabular form is familiar and easy to read. They are two-dimensional,
heterogeneous data structures. Internally, a data frame is a list of vectors of
equal length.
Data frames have the following constraints placed upon them:
• A data frame must have column names and every row must have a
unique name.
• Each column must have the same number of items.
• Each item in a single column must be of the same data type.
• Different columns may have different data types.
To create a data frame, use the data.frame() function.
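A small sketch of the constraints listed above, using a made-up students table:
each column is a vector of the same length, and the columns have different
types.

```r
# Each column is a vector of equal length; columns may differ in type
students <- data.frame(
  name   = c("Asha", "Ravi", "Meena"),
  age    = c(20L, 21L, 19L),
  passed = c(TRUE, FALSE, TRUE)
)

str(students)                    # column names and types
print(dim(students))             # rows and columns
print(sapply(students, class))   # one class per column
```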
d) Matrices
A matrix is a rectangular arrangement of values (most often numbers) in rows
and columns. Rows run horizontally and columns run vertically. Matrices are
two-dimensional, homogeneous data structures.
To create a matrix in R, use the matrix() function. Its first argument is a
vector containing the elements, and the nrow and ncol arguments give the
number of rows and columns of the matrix. An important point to remember is
that, by default, matrices are filled in column-wise order; pass byrow = TRUE
to fill them row by row.
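To illustrate the fill order described above, this sketch builds the same six
values into a 2 x 3 matrix twice, once with the default column-wise order and
once with byrow = TRUE (the names m_col and m_row are illustrative):

```r
# Default: elements fill the matrix column by column
m_col <- matrix(1:6, nrow = 2, ncol = 3)

# byrow = TRUE: elements fill the matrix row by row
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(m_col)
print(m_row)
print(dim(m_col))   # number of rows and columns
```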
e) Arrays
Arrays are R data objects that can store data in more than two
dimensions. Arrays are n-dimensional, homogeneous data structures. For
example, an array of dimensions (2, 3, 3) holds 3 rectangular matrices, each
with 2 rows and 3 columns.
To create an array in R, use the array() function. Its arguments are a vector
containing the elements and the dim parameter, a vector that specifies the
dimensions of the array.
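The (2, 3, 3) example above can be sketched as follows; the values 1 to 18 are
arbitrary filler data.

```r
# 18 values arranged as three 2 x 3 matrices
a <- array(1:18, dim = c(2, 3, 3))

print(dim(a))       # 2 3 3
print(a[, , 2])     # the second 2 x 3 matrix
print(a[1, 2, 3])   # row 1, column 2 of the third matrix
```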

2. Lab program 2: Write an R program that includes variables, constants, data types.
i. R Variables:
A variable is a named block of memory allocated for the storage of specific
data; the name associated with the variable is used to refer to this reserved
block.
R is a dynamically typed language, i.e. R variables are not declared with a
data type; rather, they take the data type of the R object assigned to them.
R supports three ways of variable assignment:
• Using the equal operator: the value on the right of = is assigned to
the name on the left.
• Using the leftward operator: data is copied from right to left.
• Using the rightward operator: data is copied from left to right.

Syntax for creating R Variables
Types of Variable Creation in R:
• Using the equal operator
variable_name = value
• Using the leftward operator
variable_name <- value
• Using the rightward operator
value -> variable_name

Methods for R Variables
R provides some useful functions to perform operations on variables.
These functions are used to determine the data type of a variable, find a
variable, delete a variable, etc. One of them is:
1. class() function
This built-in function is used to determine the data type of the variable
provided to it.
The R variable to be checked is passed as an argument, and the function
returns its data type (class).
Syntax
class(variable)
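A short sketch of the three assignment forms together with class(); the
variable names x, y and z are arbitrary. The last lines illustrate dynamic
typing: the same name takes the type of whatever is assigned to it.

```r
x = 10          # equal operator
y <- "hello"    # leftward operator
TRUE -> z       # rightward operator

print(class(x))   # "numeric"
print(class(y))   # "character"
print(class(z))   # "logical"

# Dynamic typing: reassigning changes the type of x
x <- "ten"
print(class(x))   # "character"
```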

ii. R Constants: Constants are entities whose values are not meant to
be changed anywhere in the code. R has no dedicated keyword for declaring
constants; a value is assigned to a name using the <- symbol and is treated
as a constant by convention. There are 2 basic types of constants:
numeric constants and character constants.
a. Numeric Constants: All the numbers used within a program
fall under this category. There are subtypes such as integer, double and
complex, which can be checked using the typeof() function.
b. Character Constants: These are written using either single
quotes (') or double quotes (") as delimiters.
The five types of R constants are numeric, integer, complex, logical and string.
In addition to these, there are 4 special R constants:
NULL, NA, Inf, NaN.
Built-in constants:
• LETTERS - a vector of all uppercase letters
• letters - a vector of all lowercase letters
• month.abb - the 3-letter abbreviations of the English month names
• pi - the numerical value of the constant pi
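The following sketch checks the subtypes of a few literal constants with
typeof() and prints some of the built-in and special constants listed above.

```r
# Subtypes of literal constants
print(typeof(5))        # "double"
print(typeof(5L))       # "integer"
print(typeof(3 + 2i))   # "complex"
print(typeof("R"))      # "character"

# Built-in constants
print(LETTERS[1:3])
print(letters[24:26])
print(month.abb[1:3])
print(pi)

# Special constants
print(is.null(NULL))
print(is.na(NA))
print(1 / 0)            # Inf
print(0 / 0)            # NaN
```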
iii. R Data Types:
R data types specify the kind of data that can be stored
in a variable.
For efficient memory use and precise computation, the right data
type must be selected.
Each R data type has its own set of rules and restrictions. Variables do not
need to be declared with a data type in R, and the data type of a variable can
even change when a new value is assigned.
Data Types in R Programming Language
Each variable in R has an associated data type. Each data type requires a
different amount of memory and supports specific operations.
Data types in R are:
a) numeric – (3, 6.7, 121)
b) integer – (2L, 42L; where 'L' declares the value as an integer)
c) logical – (TRUE, FALSE)
d) complex – (7 + 5i; where 'i' is the imaginary unit)
e) character – ("a", "B", "c is third", "69")
f) raw – (as.raw(55); as.raw() converts a value to the raw (byte) type)
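As an illustration of the six types listed above, this sketch stores one value
of each type in a list (the list name values is arbitrary) and asks R for the
class of each:

```r
# One example value per data type
values <- list(
  numeric   = 6.7,
  integer   = 42L,
  logical   = TRUE,
  complex   = 7 + 5i,
  character = "69",
  raw       = as.raw(55)
)

types <- sapply(values, class)
print(types)

# Raw values print in hexadecimal: 55 is shown as 37
print(values$raw)
```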

3. Lab program 3: Write an R program that includes operators, control structures, default values for arguments, returning complex objects
i. R Operators:
Operators are symbols that direct the interpreter to perform
various kinds of operations on the operands. Operators carry out
the mathematical, logical, and decision operations performed on
complex numbers, integers, and other numeric values given as input
operands.
R supports five main kinds of operators.
Types of operators in R:
• Arithmetic Operators [+, -, *, /, ^, %%, %/%]
• Logical Operators [&, &&, |, ||, !]
• Relational Operators [==, !=, >, <, >=, <=]
• Assignment Operators [=, <-, <<-]
• Miscellaneous Operators [:, %in%, %*%]
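A brief sketch exercising one or two operators from each group above; the
values 17 and 5 are arbitrary.

```r
a <- 17
b <- 5

# Arithmetic
print(a %% b)    # remainder: 2
print(a %/% b)   # integer division: 3
print(a ^ 2)     # power: 289

# Logical (element-wise) and relational
print(c(TRUE, FALSE) & c(TRUE, TRUE))
print(a >= b)

# Miscellaneous
print(3 %in% 1:5)    # membership test
print(1:3 %*% 1:3)   # matrix (inner) product, a 1 x 1 matrix
```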
ii. Control Structures: Control statements are expressions used to
control the execution and flow of the program based on the conditions
provided in the statements. These structures are used to make a decision
after evaluating a condition.
In R programming, there are 8 types of control statements as follows:
• if condition
• if-else condition
• for loop
• nested loops
• while loop
• repeat and break statement
• return statement
• next statement
if-else condition
It is similar to the if condition, but when the test expression in the if
condition is FALSE, the statements in the else block are executed. The else
keyword must be on the same line as the closing brace of the if block.
Syntax:
if (expression) {
statements
....
....
} else {
statements
....
....
}
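The sketch below shows several of the control statements listed above in one
place: if-else, a for loop with next, a while loop, and repeat with break. The
function name classify and the variables are illustrative only.

```r
# if-else: classify a number as even or odd
classify <- function(n) {
  if (n %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}
print(classify(4))

# for loop with next: add up only the odd numbers from 1 to 10
odd_sum <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  odd_sum <- odd_sum + i
}
print(odd_sum)

# while loop: double n until it reaches at least 100
n <- 1
while (n < 100) {
  n <- n * 2
}
print(n)

# repeat with break: loop until the counter reaches 3
count <- 0
repeat {
  count <- count + 1
  if (count == 3) break
}
print(count)
```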
iii. Function Arguments in R Programming:
Arguments are the values provided to a function so that it can perform
its operations. In R programming, a function can take as many arguments
as needed, separated by commas. There is no limit on the number of
arguments a function can have in R.
Adding Arguments in R
We can pass an argument to a function when calling it by simply
giving the value inside the parentheses. The general form of a function
definition is shown below.
Syntax:
function_name <- function(arg1, arg2, … )
{
code
}
Adding Default Value in R
A default value is a value for an argument that does not need to be specified
each time the function is called. If a value is passed by the caller, that
value is used by the function; otherwise, the default value is used.
Function as Argument
In R programming, functions can be passed to other functions as arguments.
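A minimal sketch of default values, functions as arguments, and returning a
complex object (a list). The function names power, apply_twice and describe are
hypothetical and chosen only for this illustration.

```r
# Default value: exponent is 2 unless the caller supplies one
power <- function(base, exponent = 2) {
  base ^ exponent
}
print(power(4))       # uses the default exponent
print(power(2, 10))   # overrides the default

# Function as argument: apply f to x two times
apply_twice <- function(f, x) {
  f(f(x))
}
print(apply_twice(sqrt, 16))

# Returning a complex object: a list with several results.
# The summary function itself is an argument with a default (mean).
describe <- function(x, stat = mean) {
  list(n = length(x), value = stat(x), range = range(x))
}
print(describe(c(2, 4, 9)))
print(describe(c(2, 4, 9), stat = median)$value)
```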

4. Lab program 4: Write an R program for quick sort implementation, binary search tree.
a. Quick Sort in R:
Quick Sort is a sorting algorithm based on the divide-and-conquer
approach: it picks an element as a pivot and partitions the given array
around the picked pivot by placing the pivot in its correct position in the
sorted array.
How does Quick Sort work?
The key process in Quick Sort is partition(). The goal of partitioning is to
place the pivot (any element can be chosen to be a pivot) at its correct position
in the sorted array and put all smaller elements to the left of the pivot, and all
greater elements to the right of the pivot.
After the pivot is placed in its correct position, the same partitioning is
applied recursively to the sub-arrays on each side of the pivot, which finally
sorts the array.
Choice of Pivot:
There are many different choices for picking pivots.
o Always pick the first element as a pivot.
o Always pick the last element as a pivot.
o Pick a random element as a pivot.
o Pick the middle element as the pivot.
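One possible sketch of the procedure described above, using the last element as
the pivot. The function names partition and quick_sort and the sample data are
assumptions made for this illustration. Because R copies vectors on
modification, partition() returns both the rearranged vector and the final
pivot position.

```r
# Place the pivot (last element of v[low..high]) at its sorted position
partition <- function(v, low, high) {
  pivot <- v[high]
  i <- low - 1                      # end of the "smaller than pivot" region
  for (j in low:(high - 1)) {
    if (v[j] <= pivot) {
      i <- i + 1
      tmp <- v[i]; v[i] <- v[j]; v[j] <- tmp
    }
  }
  # Move the pivot just after the smaller elements
  tmp <- v[i + 1]; v[i + 1] <- v[high]; v[high] <- tmp
  list(v = v, index = i + 1)
}

# Recursively sort the parts to the left and right of the pivot
quick_sort <- function(v, low = 1, high = length(v)) {
  if (low < high) {
    p <- partition(v, low, high)
    v <- quick_sort(p$v, low, p$index - 1)
    v <- quick_sort(v, p$index + 1, high)
  }
  v
}

print(quick_sort(c(10, 7, 8, 9, 1, 5)))
```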
b. Binary Search Tree:
A Binary Search Tree is a data structure used in computer science for
organizing and storing data in a sorted manner. Each node in a Binary Search
Tree has at most two children, a left child and a right child; every value in
the left subtree is less than the value of the parent node and every value in
the right subtree is greater than it. This hierarchical structure allows for
efficient searching, insertion, and deletion operations on the data stored in the
tree.
• Inorder Traversal: First traverse the left subtree, then visit the root, and
then traverse the right subtree.
Follow the steps below to implement the idea:
  • Traverse the left subtree
  • Visit the root and print the data
  • Traverse the right subtree
The inorder traversal of a BST gives the values of the nodes in sorted order.
To get decreasing order, visit the right subtree, the root, and then the left
subtree.
• Preorder Traversal: First visit the root, then traverse the left
subtree, and then traverse the right subtree.
Follow the steps below to implement the idea:
  • Visit the root and print the data
  • Traverse the left subtree
  • Traverse the right subtree
• Postorder Traversal: First traverse the left subtree, then traverse
the right subtree, and then visit the root.
Follow the steps below to implement the idea:
  • Traverse the left subtree
  • Traverse the right subtree
  • Visit the root and print the data
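The insertion rule and the three traversals above can be sketched with nested
lists, where an empty subtree is NULL. This is only one way to model a tree in
R; the function names (bst_insert, inorder, preorder, postorder) and the sample
keys are assumptions for this example, and duplicate keys are simply ignored.

```r
# Insert a value and return the updated (sub)tree
bst_insert <- function(node, value) {
  if (is.null(node)) {
    return(list(value = value, left = NULL, right = NULL))
  }
  if (value < node$value) {
    node$left <- bst_insert(node$left, value)
  } else if (value > node$value) {
    node$right <- bst_insert(node$right, value)
  }
  node
}

# Each traversal returns the visited values as a vector
inorder <- function(node) {
  if (is.null(node)) return(c())
  c(inorder(node$left), node$value, inorder(node$right))
}
preorder <- function(node) {
  if (is.null(node)) return(c())
  c(node$value, preorder(node$left), preorder(node$right))
}
postorder <- function(node) {
  if (is.null(node)) return(c())
  c(postorder(node$left), postorder(node$right), node$value)
}

tree <- NULL
for (v in c(50, 30, 70, 20, 40, 60, 80)) {
  tree <- bst_insert(tree, v)
}

print(inorder(tree))     # sorted order
print(preorder(tree))
print(postorder(tree))
```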

5. Lab program 5: Write an R program for calculating cumulative sums and products, minima, maxima and calculus:
a. Cumulative Sums: The cumulative sum is a running total: element i
of the result is the sum of elements 1 through i of the input
sequence.
cumsum() function in R is used to calculate the
cumulative sum of the vector passed as argument.
Syntax: cumsum(x)
Parameters:
x: numeric object
b. Cumulative Product: cumprod() function in R is used
to calculate the cumulative product of the vector passed as
argument.
Syntax: cumprod(x)
Parameters:
x: numeric object
c. Cumulative minima: cummin() function in R is used
to calculate the cumulative minima of the values of the vector
passed as argument.
Syntax: cummin(x)
Parameters:
x: numeric object
d. Cumulative maxima: The cumulative maximum at position i is the
largest value among elements 1 through i of the given vector.
cummax() function in R is used to calculate the
cumulative maxima of the values of the vector passed as argument.
Syntax: cummax(x)
Parameters:
x: numeric object
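The four cumulative functions above can be compared on one small vector (the
values are arbitrary):

```r
x <- c(3, 1, 4, 1, 5)

print(cumsum(x))    # 3 4 8 9 14
print(cumprod(x))   # 3 3 12 12 60
print(cummin(x))    # 3 1 1 1 1
print(cummax(x))    # 3 3 4 4 5
```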
e. Calculus:
• expression() function: expression() in R is used to create an expression
from the values passed as arguments. It creates an object of the
expression class.
Syntax: expression(...)
Parameters:
...: R expressions, such as calls, symbols, constants
• D() function: In R programming, the derivative of a function can be
computed using the deriv() and D() functions. They are used to compute
derivatives of simple expressions.
Syntax:
deriv(expr, namevec)
D(expr, name)
Parameters:
expr: an expression (or, for deriv(), a formula with no LHS)
name, namevec: character string or vector naming the variable(s) with
respect to which derivatives are computed
• integrate() function: integrate() in R is used to
compute the one-dimensional definite integral of the function provided.
Syntax: integrate(f, lower, upper)
Parameters:
f: represents a function
lower: represents lower limit of integration
upper: represents upper limit of integration
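As a small worked sketch of the functions above, the following differentiates
x^2 + 3x symbolically, evaluates the derivative at x = 2, and integrates x^2
from 0 to 3. The example functions are chosen only for illustration.

```r
# Build an expression and differentiate it with respect to x
f_expr <- expression(x^2 + 3 * x)
df <- D(f_expr, "x")
print(df)

# Evaluate the derivative at x = 2 (2 * 2 + 3 = 7)
slope <- eval(df, list(x = 2))
print(slope)

# Definite integral of x^2 from 0 to 3 (exact value is 9)
area <- integrate(function(x) x^2, lower = 0, upper = 3)
print(area$value)
```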

6. Lab program 6: Write an R program for finding the stationary distribution of Markov chains:
a. Markov Chain: A Markov chain, named after Andrey Markov, is a
stochastic model that depicts a sequence of possible events in which
the probabilities for the next state depend solely on the current
state, not on the states before it. In simple words, the
probability that step n+1 will be x depends only on step n,
not on the complete sequence of steps that came before n. This
property is known as the Markov property or memorylessness.

b. Stationary Distribution of Markov Chain: A stationary
distribution of a Markov chain is a probability distribution that
remains unchanged in the Markov chain as time progresses.
Typically, it is represented as a row vector π whose entries are
probabilities summing to 1, and given transition matrix P, it
satisfies

π = πP.

In other words, π is invariant under the matrix P.


c. stopifnot() function: If any of the expressions in ... is
not all TRUE, stop() is called, producing an error message indicating
the first of the expressions in ... that was not TRUE.
Usage
stopifnot(...)
Arguments
...: any number of (logical) R expressions, which should
evaluate to TRUE.
Value
NULL if all statements in ... are TRUE.
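A minimal sketch of one way to find a stationary distribution, assuming a
made-up two-state transition matrix P. It uses stopifnot() to validate P and
then takes the left eigenvector of P for eigenvalue 1 (an eigenvector of t(P)),
rescaled to sum to 1. The eigenvector approach is an assumption of this
example; solving the linear system directly is another option.

```r
# Hypothetical transition matrix: each row sums to 1
P <- matrix(c(0.9, 0.1,
              0.5, 0.5), nrow = 2, byrow = TRUE)

# Validate the input before computing anything
stopifnot(is.matrix(P), nrow(P) == ncol(P),
          all(P >= 0), all(abs(rowSums(P) - 1) < 1e-12))

# pi = pi P means pi is an eigenvector of t(P) with eigenvalue 1
e <- eigen(t(P))
idx <- which.min(abs(e$values - 1))
stat_dist <- Re(e$vectors[, idx])
stat_dist <- stat_dist / sum(stat_dist)   # normalise to a probability vector

print(stat_dist)
print(stat_dist %*% P)   # unchanged by one step of the chain
```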

7. Lab program 7: Write an R program that includes linear algebra operations on vectors and matrices:
a. Vectors: Vectors are the most basic data structures in R. Even a single
value is stored as a vector of length one. Vectors are similar to
arrays in other languages. Vectors contain
a sequence of values of a single type. If mixed values are
given, R automatically converts them to a common type according to
its coercion precedence. There are various operations that can be
performed on vectors in R.
• Arithmetic operations
We can perform arithmetic operations between 2 vectors. These
operations are performed element-wise, so both vectors should normally
have the same length (if they do not, R recycles the shorter
vector). The arithmetic operations on vectors
include: Addition (+), Subtraction (-), Multiplication (*), Division (/)
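The element-wise behaviour described above can be sketched with two
arbitrary vectors u and v; the last line shows recycling when the lengths
differ.

```r
u <- c(1, 2, 3)
v <- c(4, 5, 6)

print(u + v)   # 5 7 9
print(u - v)   # -3 -3 -3
print(u * v)   # 4 10 18
print(u / v)   # 0.25 0.40 0.50

# Recycling: the shorter vector is repeated to match the longer one
print(c(1, 2, 3, 4) + c(10, 20))   # 11 22 13 24
```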
b. Matrix: Matrices in R are collections of values, either real or
complex numbers, arranged in a fixed number of rows
and columns. Matrices are used to present data in a structured
and well-organized format. In mathematical notation, the elements
of a matrix are enclosed in parentheses or brackets.
• Operations on Matrices
There are four basic operations, i.e. DMAS (Division, Multiplication,
Addition, Subtraction), that can be done with matrices. For these
element-wise operations, both matrices must have the same number of rows
and columns.
i. Matrices Addition
The addition of two same-ordered matrices M (r*c) and N (r*c) yields a matrix
R (r*c) where every element is the sum of the corresponding elements of the
input matrices. The '+' operator is used for matrix addition.
ii. Matrices Subtraction
The subtraction of two same-ordered matrices M (r*c) and N (r*c) yields a
matrix R (r*c) where every element is the difference obtained by subtracting
the corresponding element of the second input matrix from the first. The '-'
operator is used for matrix subtraction.
iii. Matrices Multiplication
The '*' operator multiplies a matrix by a scalar or performs element-wise
multiplication of two same-ordered matrices M (r*c) and N (r*c), yielding a
matrix R (r*c) where every element is the product of the corresponding
elements of the input matrices. The '%*%' operator performs true matrix
multiplication: for M (r*c) and N (c*k), the number of columns of M must
equal the number of rows of N, and the result is a matrix of order r*k.
iv. Multiplication with scalar:
If you multiply a matrix by a scalar value, then every element of the
matrix is multiplied by that scalar.
v. Determinant of matrix:
det() function in R is used to calculate the determinant of the
specified square matrix.
Syntax: det(x, …)
Parameters:
x: numeric matrix
vi. Transpose of matrix: t() function in R is used to
calculate the transpose of a matrix or data frame.
Syntax: t(x)
Parameters:
x: matrix or data frame
vii. Inverse of matrix:
The inverse of a matrix plays the role of a reciprocal, as in
normal arithmetic for a single number, and is used to solve
equations to find the value of unknown variables. The inverse of a
matrix is the matrix which, when multiplied with the original matrix,
gives the identity matrix. Only square matrices with a non-zero
determinant have an inverse.
• solve() is a generic built-in function in R which can be used
to find the inverse of a matrix.
• inv() is not part of base R; it is provided by add-on packages
such as matlib and is used specifically to find the inverse of a matrix.
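A compact sketch of the matrix operations in this section using base R only;
the 2 x 2 matrices M and N are made-up examples. Note the difference between
element-wise multiplication (*) and the matrix product (%*%).

```r
M <- matrix(c(2, 1, 1, 3), nrow = 2)   # rows: (2, 1) and (1, 3)
N <- matrix(c(1, 0, 2, 1), nrow = 2)   # rows: (1, 2) and (0, 1)

print(M + N)     # element-wise addition
print(M - N)     # element-wise subtraction
print(M * N)     # element-wise multiplication
print(M %*% N)   # matrix product
print(3 * M)     # multiplication by a scalar

print(det(M))    # determinant
print(t(N))      # transpose

M_inv <- solve(M)      # inverse of M
print(M %*% M_inv)     # identity matrix, up to rounding
```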

8. Lab program 8: Write an R program for visual representation of an object by creating graphs using graphic functions: plot(), hist(), line chart, pie(), boxplot(), scatter plot.
R – graphs
R can draw many kinds of charts and graphs, for example bar plot,
box plot, mosaic plot, dot chart, coplot, histogram, pie chart, scatter graph, etc.
Types of R – Charts
o Plot
o Line Chart
o Pie Diagram or Pie Chart
o Histogram
o Box Plot
o Scatter Plot
a. plot() function: The plot() function is used to draw points (markers) in a
diagram. The function takes parameters for specifying points in the
diagram.
Parameter 1 specifies points on the x-axis.
Parameter 2 specifies points on the y-axis.
b. Line Chart: A line graph has a line that connects all the points in a
diagram.
To create a line, use the plot() function and add the type parameter with a
value of "l".
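A minimal sketch of both calls with made-up data: the first draws the points,
the second connects the same points with a line.

```r
x <- 1:10
y <- x^2

# Points only: parameter 1 is the x-axis, parameter 2 the y-axis
plot(x, y)

# Same data as a line chart
plot(x, y, type = "l")
```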
c. Pie Chart: A pie chart is a circular chart divided into segments
according to the proportions of the data provided. The whole pie represents
100% and each segment shows a fraction of the whole. It is another method
of representing statistical data in graphical form, and the pie() function
is used to draw it.
Syntax: pie(x, labels, col, main, radius)
where,
• x is the data vector
• labels gives the names of the slices
• col gives the colors used to fill the slices
• main gives the title of the pie chart
• radius indicates the radius of the pie chart; the pie is drawn in a
square box whose sides range from -1 to +1
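A short sketch of pie() with hypothetical sales figures; the labels include the
percentage share of each slice.

```r
sales   <- c(40, 25, 20, 15)
regions <- c("North", "South", "East", "West")

# Percentage share of each slice
pct <- round(100 * sales / sum(sales))

pie(sales,
    labels = paste0(regions, " (", pct, "%)"),
    col    = c("tomato", "gold", "skyblue", "palegreen"),
    main   = "Sales by region",
    radius = 0.9)
```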

d. Histogram: A histogram is a graph with bars representing the
frequency of grouped data in a vector. A histogram resembles a bar chart;
the difference is that a histogram represents the frequency of grouped
(binned) data rather than the data itself.
Syntax: hist(x, col, border, main, xlab, ylab)
where:
• x is the data vector
• col specifies the fill color of the bars
• border specifies the color of the border of the bars
• main specifies the title of the histogram
• xlab specifies the x-axis label
• ylab specifies the y-axis label
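A sketch of hist() on a made-up vector of ages. The breaks argument, which is
not in the syntax line above, is a standard hist() argument used here to fix
the bin edges; the returned object holds the bin counts.

```r
ages <- c(21, 22, 22, 23, 25, 27, 28, 31, 33, 35, 36, 41)

h <- hist(ages,
          breaks = seq(20, 45, by = 5),   # bins of width 5
          col    = "skyblue",
          border = "black",
          main   = "Age distribution",
          xlab   = "Age",
          ylab   = "Frequency")

print(h$counts)   # frequency of each bin
```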
e. Box Plot: A box plot shows how the data is distributed in the data vector.
It represents five values in the graph, i.e. the minimum, first quartile, second
quartile (median), third quartile, and maximum value of the data vector.
Syntax: boxplot(x, xlab, ylab, notch)
where,
• x specifies the data vector
• xlab specifies the label for x-axis
• ylab specifies the label for y-axis
• notch, if TRUE, draws a notch on each side of the box
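A sketch of boxplot() on hypothetical exam scores. The value returned by
boxplot() includes a stats component with the five values drawn in the plot.

```r
scores <- c(52, 58, 61, 64, 67, 70, 73, 77, 85)

b <- boxplot(scores,
             xlab  = "Class A",
             ylab  = "Score",
             notch = FALSE)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$stats)
```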
f. Scatter Plot: A scatter plot is another type of graphical representation,
used to plot points that show the relationship between two data vectors.
One of the data vectors is represented on the x-axis and the other on the y-axis.
Syntax: plot(x, y, type, xlab, ylab, main)
where,
• x is the data vector represented on x-axis
• y is the data vector represented on y-axis
• type specifies the type of plot to be drawn, for example "l" for
lines, "p" for points, "s" for stair steps, etc.
• xlab specifies the label for x-axis
• ylab specifies the label for y-axis
• main specifies the title of the graph
9. Lab program 9: Write an R program with any dataset containing
data frame objects, indexing and subsetting data frames, and
employ manipulating and analysing data.
• dplyr: The dplyr package in R is a
grammar of data manipulation that provides a uniform set of
verbs, helping to resolve the most frequent data manipulation
hurdles.
Functions of dplyr:
i. filter() function: For choosing cases (rows) based on their values.
Syntax: filter(data, condition)
Where,
data: The data selected by the user
condition: A logical condition on the values that the user wants to fetch
ii. select() function: For choosing variables (columns) based on their
names.
Syntax: select(data, column names)
Where,
data: The data selected by the user
column names: Variable names of the data
iii. arrange() function: For reordering the cases.
Syntax: #To arrange the data in ascending order
arrange(data, column name)
#To arrange the data in descending order
arrange(data, desc(column name))
iv. mutate() function: For adding new variables that are
functions of existing variables.
Syntax: mutate(data, new variable = expression)
v. rename() function: For changing the names of variables.
Syntax: rename(data, new variable name = previous variable name)
vi. distinct() function: The distinct() function is a data manipulation
function provided by the dplyr package in R. Its primary purpose is
to identify and return unique rows or distinct combinations of values
within a data frame based on specified columns. This function is
particularly useful for data cleaning, exploratory data analysis, and
obtaining unique records from a dataset.
1. It is important to remove duplicate rows, as data duplication can be
caused by errors in data entry, merging data from different
sources, inconsistent naming conventions, or data scraping issues.
2. Duplicate data in a dataset can lead to biased results.
Syntax: distinct(data, variable name)
vii. summarize(): For condensing multiple values into a single value.
Syntax: summarize(data, name = summary function(variable))
Where,
Summary functions include: mean, median, min, max, 1st quartile, 3rd
quartile
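The dplyr verbs above can be sketched on a small, made-up employee table. This
assumes the dplyr package is installed; the data frame emp and its column names
are hypothetical.

```r
library(dplyr)

# Hypothetical data with one duplicated row
emp <- data.frame(
  name   = c("Asha", "Ravi", "Meena", "Ravi"),
  dept   = c("HR", "IT", "IT", "IT"),
  salary = c(30000, 45000, 50000, 45000)
)

emp_unique <- distinct(emp)                        # drop duplicate rows
it_staff   <- filter(emp_unique, dept == "IT")     # rows matching a condition
names_only <- select(emp_unique, name, salary)     # chosen columns
by_salary  <- arrange(emp_unique, desc(salary))    # highest salary first
with_bonus <- mutate(emp_unique, bonus = salary * 0.10)
renamed    <- rename(emp_unique, department = dept)
stats      <- summarize(emp_unique,
                        mean_salary = mean(salary),
                        max_salary  = max(salary))

print(by_salary)
print(stats)
```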

10. Lab program 10: Write a program to create an application of linear regression in a multivariate context for predictive purposes:
A linear model is used to predict the value of a response (dependent)
variable based on one or more independent variables. It is mostly used
for finding the relationship between variables and for forecasting.
The lm() function is used to fit linear models to data frames in R.
It can be used to carry out regression, single stratum analysis
of variance, and analysis of covariance. The fitted model can then be used
to predict the value corresponding to data that is not in the data frame.
Such models are helpful in tasks like predicting the price of real estate,
weather forecasting, etc.
Syntax:
lm( fitting_formula, dataframe )
Parameters:
• fitting_formula: the formula for the linear model, e.g. y ~ x1 + x2 for a
model with two predictors.
• dataframe: the data frame that contains the data.
Predict values for unknown data points using the fitted model
To predict values for new inputs using the fitted linear model, we
use the predict() function. The predict() function takes the model and a data
frame with new data points and predicts the value for each data point according
to the fitted model.
Syntax:
predict( model, data )
Parameters:
model: the fitted linear model.
data: the data frame with the new data points.
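A minimal sketch of a regression with two predictors, assuming R's built-in
mtcars dataset as the example data: fuel efficiency (mpg) is modelled from
weight (wt) and horsepower (hp), and the fitted model is then used to predict
mpg for two hypothetical cars.

```r
# Fit mpg as a linear function of weight and horsepower
model <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(model))

# New, unseen data points (hypothetical cars)
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))

predicted <- predict(model, newdata = new_cars)
print(predicted)
```

Calling summary(model) additionally reports standard errors and R-squared for
the fit.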
