Introduction to R, Chapter 2

R is a programming language designed for statistical computing and data analysis, widely used for data visualization and modeling. It was developed to improve upon the S language and is known for its ease of use and extensibility. The document covers basic mathematical operations, variable declaration, data types, and data structures in R, along with examples of scripts and functions.

Uploaded by

saicharith035

CHAPTER 02

INTRODUCTION TO R
17-12-2024

● R is a programming language and environment specifically designed for statistical computing and data analysis.
● It is case sensitive.
● It is widely used for data visualization, statistical modeling, machine learning, and scientific research.
● It includes powerful libraries for statistical techniques.
● R was first released in 1995.
● R was inspired by the S programming language.
● R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.

WHY WAS R DEVELOPED?

●​ Improvement Over S Language


●​ Ease of Use for Statisticians
●​ Reproducibility of Research
●​ Extensibility
●​ Academic and Scientific Focus

BASIC MATHS

> 1+1
[1] 2

> 3*6
[1] 18

> 9/2
[1] 4.5

> 8*2-9
[1] 7

> 5^2
[1] 25

> "monica"
[1] "monica"

VARIABLES

DECLARATION
A value can be assigned to a variable with <-, = or ->.

> x<-90
> y=8
> 15->z
> sum(x,y,z)
[1] 113

assign() also assigns a value to a variable. The variable name is given as a string.

> assign("q",40)
> q
[1] 40

To find the data type of a variable, use

● class(variable)

Variable names can contain any combination of alphanumeric characters, periods (.) and underscores (_).

A name cannot start with a number or an underscore.

> 8num=9
Error: unexpected symbol in "8num"
> _num=1
Error: unexpected symbol in "_num"
> num1=30

DATATYPES

R has four main data types:

● Numeric (integers and decimals)
● Character (strings)
● Date / POSIXct (time based)
● Logical (TRUE/FALSE)

Numeric data

● The most commonly used data type is numeric.
● It is similar to a float or double in other languages.
● It handles integers and decimals, both positive and negative, and of course zero.
● A number stored in a variable is automatically treated as numeric.
● Whether a variable is numeric can be tested with the function is.numeric.

> is.numeric(num1)
[1] TRUE

Integer
● As the name implies, this type is for whole numbers only, no decimals.
● To store an integer in a variable, append a capital L to the value.
● i<-5L

Without the L, a whole number is stored as numeric. An integer still counts as numeric, so is.numeric returns TRUE for it as well.

> j<-3L
> class(j)
[1] "integer"
> j=3
> class(j)
[1] "numeric"

> is.numeric(j)
[1] TRUE
> is.integer(j)
[1] FALSE

TO REMOVE A VARIABLE

rm(x)
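
The notes above show how class() reports a type. As a small sketch of moving a value between types, the base functions as.integer(), as.character() and as.numeric() can be used. The variable names and values here are made up for illustration.

```r
# Converting between numeric, integer and character values.
# n, i, s and back are illustrative names, not from the notes.
n <- 42.9
i <- as.integer(n)         # drops the decimal part
s <- as.character(i)       # integer to character
back <- as.numeric("3.5")  # character to numeric

print(class(i))
print(class(s))
print(back)
```

Note that as.integer() cuts off the decimal part instead of rounding.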

CHARACTER DATA TYPE

● Character
● Factor

> x="monica"
> x
[1] "monica"
> class(x)
[1] "character"
> y=factor("welcome")
> y
[1] welcome
Levels: welcome

Length of character/number

nchar(x) returns the number of characters in a string. A number is first converted to a string, so nchar(123) is 3.

> nchar(x)
[1] 6

DATE

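● R has several different types of dates. The most useful are Date and POSIXct.
● Date stores just a date.
● The default format is yyyy-mm-dd.
● POSIXct stores a date and time.
● Both are stored internally as numbers: Date as the number of days since 1970-01-01, and POSIXct as the number of seconds since 1970-01-01 (UTC).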

Example

> date=as.Date("2024-12-19")
> date
[1] "2024-12-19"
> class(date)
[1] "Date"

as.numeric gives the number of days from 1970-01-01 to the given date.

> as.numeric(date)
[1] 20076

POSIXct

> date1=as.POSIXct("2024-12-19 8:28")
> date1
[1] "2024-12-19 08:28:00 IST"

> class(date1)
[1] "POSIXct" "POSIXt"

as.Date keeps only the date part and drops the time.

> date3=as.Date("2024-12-19 3:30")
> date3
[1] "2024-12-19"
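
To make the difference in storage concrete, the sketch below converts a Date and a POSIXct value to numbers. It assumes the time zone is set explicitly with tz = "UTC" so the result does not depend on the machine's local time zone (the notes above used IST).

```r
# Date counts days, POSIXct counts seconds, both from 1970-01-01.
# tz = "UTC" is an assumption made here to keep the result reproducible.
d  <- as.Date("2024-12-19")
dt <- as.POSIXct("2024-12-19 08:28:00", tz = "UTC")

days    <- as.numeric(d)   # days since 1970-01-01
seconds <- as.numeric(dt)  # seconds since 1970-01-01 00:00:00 UTC

print(days)
print(seconds)

# The seconds are the whole days plus the time of day (8 h 28 min).
print(seconds == days * 86400 + 8 * 3600 + 28 * 60)
```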

LOGICAL

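Logical values are TRUE and FALSE. In arithmetic, TRUE counts as 1 and FALSE as 0. is.logical tests whether a value is logical.

> TRUE*4
[1] 4
> FALSE*3
[1] 0
> Y=TRUE*29
> Y
[1] 29

> is.logical(Y)
[1] FALSE

This is FALSE because Y holds the number 29, the result of the multiplication, and not a logical value.

> class(Y)
[1] "numeric"

Comparison operators return logical values.

> 8==3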
[1] FALSE
> 8>9
[1] FALSE
> 9>12
[1] FALSE
> 12>6
[1] TRUE
> 5>=6
[1] FALSE
> 7<=4
[1] FALSE
> 7<=9
[1] TRUE

DATA STRUCTURES

A vector is a collection of elements of the same type, similar to a one-dimensional array in other languages.

c(elements)
● Elements should be of a common type.
● Vectors cannot be of mixed type. If types are mixed, R converts all elements to one common type.
● Vectors do not have dimensions in R.
● The most common way to create a vector is c().
● A sequence of numbers needs no for loop or range function; the : operator creates it.
Example
> a=c(2,4,6,8)
> a
[1] 2 4 6 8

No loop is needed, just give the range.

> b=1:10
> b
[1] 1 2 3 4 5 6 7 8 9 10
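
Since a vector holds only one type, mixing types in c() does not fail; R quietly converts the elements. A short sketch of this behaviour, with illustrative variable names:

```r
# Mixing types inside c(): R converts everything to one common type.
mixed <- c(1, "two", TRUE)  # a string is present, so all become character
nums  <- c(1, TRUE, FALSE)  # logicals become numbers (TRUE = 1, FALSE = 0)

print(mixed)
print(class(mixed))
print(nums)
print(class(nums))
```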

VECTOR OPERATIONS

Arithmetic operators apply to each element of a vector. For example, the multiplication operator (*) below multiplies each element by 3.

> s=1:10
> s
[1] 1 2 3 4 5 6 7 8 9 10
> s*3
[1] 3 6 9 12 15 18 21 24 27 30

> s+2
[1] 3 4 5 6 7 8 9 10 11 12

> s-1
[1] 0 1 2 3 4 5 6 7 8 9

> s/3
[1] 0.3333333 0.6666667 1.0000000 1.3333333 1.6666667 2.0000000 2.3333333
[8] 2.6666667 3.0000000 3.3333333

> s^5
[1] 1 32 243 1024 3125 7776 16807 32768 59049 100000

A range can also start from a negative number, and it can run in decreasing as well as increasing order.
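
A brief sketch of such ranges, using the : operator and the base function seq() for a step other than 1. The variable names are illustrative.

```r
# Ranges with the : operator and with seq().
neg  <- -3:3               # starts below zero
down <- 10:1               # decreasing order
step <- seq(10, 1, by = -3) # decreasing in steps of 3

print(neg)
print(down)
print(step)
```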

VECTOR ADDITION

The two vectors should have the same length. If the lengths differ, R repeats (recycles) the elements of the shorter vector, and gives a warning when the longer length is not a multiple of the shorter one.

> x=-5:4
> y=1:10
> x
[1] -5 -4 -3 -2 -1 0 1 2 3 4
> y
[1] 1 2 3 4 5 6 7 8 9 10
> x+y
[1] -4 -2 0 2 4 6 8 10 12 14

> m=1:5
> n=0:2
> m+n
[1] 1 3 5 4 6
Warning message:
In m + n : longer object length is not a multiple of shorter object length

> x-y
[1] -6 -6 -6 -6 -6 -6 -6 -6 -6 -6

> x*y
[1] -5 -8 -9 -8 -5 0 7 16 27 40

> x/y
[1] -5.0000000 -2.0000000 -1.0000000 -0.5000000 -0.2000000 0.0000000 0.1428571 0.2500000
[9] 0.3333333 0.4000000

READ LINE

readline() reads a line of input from the user and returns it as a string.

cat() concatenates any number of values and prints them.

#read a number
x=readline(prompt ="enter a number a=")
x=as.integer(x)
y=readline(prompt="enter a number b=")
y=as.integer(y)
z=x+y
cat("sum is",z)

enter a number a=2
enter a number b=3
sum is 5

WRITE AN R SCRIPT TO ACCEPT TWO INTEGERS AND PRINT THE PRODUCT OF THE NUMBERS

a=readline(prompt="enter number a=")
a=as.integer(a)
b=readline(prompt="enter number b=")
b=as.integer(b)
d=a*b
cat("product is",d)

enter number a=3
enter number b=9
product is 27

SIMPLE IF

#check whether a given number is greater than 5


a=readline(prompt="enter a number=")
a=as.integer(a)
if(a>5)
{
cat(a,"is greater than 5")
}

enter a number=8
8 is greater than 5

IF ELSE

#CHECK WHETHER GIVEN NUMBER IS POSITIVE, NEGATIVE OR ZERO
a=readline(prompt="enter a number=")
a=as.integer(a)
if(a>0)
{
cat(a,"is positive")
}else if(a<0)
{
cat(a,"is negative")
}else
{
cat(a,"is zero") #without this branch, 0 would be reported as negative
}

enter a number=6
6 is positive
enter a number=-1
-1 is negative

#find the greatest of 2 numbers


a=readline(prompt="enter first number=")
a=as.integer(a)
b=readline(prompt="enter second number=")
b=as.integer(b)
if(a>b)
{
cat(a,"is largest")
}else
{
cat(b,"is largest")
}

enter first number=3
enter second number=6
6 is largest

Write an R script to find sum and average of first 10 numbers

total=0 #named total to avoid confusion with the built-in sum() function
for(i in 1:10)
{
total=total+i
}
avg=total/10
cat("Sum of the first 10 numbers:", total,"\n")
cat("Average of the first 10 numbers:",avg)

Sum of the first 10 numbers: 55
Average of the first 10 numbers: 5.5
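
The loop above shows how a for loop works. As an alternative sketch, the same result can be had without a loop, because the built-in functions sum() and mean() work on a whole vector at once. The variable names are illustrative.

```r
# Sum and average of the first 10 numbers without a loop.
nums  <- 1:10
total <- sum(nums)   # adds all elements
avg   <- mean(nums)  # arithmetic mean

cat("Sum of the first 10 numbers:", total, "\n")
cat("Average of the first 10 numbers:", avg, "\n")
```
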
Write an R script to print numbers from 5 to 0 using a while loop

i=5
while(i>=0)
{
cat(i,"")
i=i-1
}

5 4 3 2 1 0

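Break

Terminates the loop.

Next

Skips the rest of the current iteration.

i=5
while(i>=0)
{
cat(i,"")
if(i==2)
break
i=i-1
}

5 4 3 2

Next

i=5
while(i>=0)
{
if(i==2)
{
i=i-1 #decrement first, otherwise the loop would never end
next #skips the cat() below when i is 2
}
cat(i,"")
i=i-1
}

5 4 3 1 0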

Individual elements of a vector are accessed using square brackets ([]).

The first element of x is retrieved by typing x[1]
The first 2 elements by x[1:2]
Non-consecutive elements by x[c(1,4)]

> y=5:10
> y
[1] 5 6 7 8 9 10
> y[2]
[1] 6
> y[c(6,9)]
[1] 10 NA

An index beyond the length of the vector returns NA, as y[9] does here.

This works for all types of vectors.
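
Square brackets also accept negative indices and logical conditions. The sketch below reuses the vector y from above. The other variable names are illustrative.

```r
# More ways to index a vector with [].
y <- 5:10

without_first <- y[-1]         # a negative index drops that element
above_seven   <- y[y > 7]      # a condition keeps the matching elements
last_one      <- y[length(y)]  # the last element

print(without_first)
print(above_seven)
print(last_one)
```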

DATA STRUCTURES

Vectors
Lists
Data frames
Matrices
Arrays
Strings
Factors

DATA FRAMES
A data frame represents data in the form of rows and columns.

x=1:5
y=5:1
z=c("c","c++","java","python","r")
df=data.frame(x,y,z)
print(df)

  x y      z
1 1 5      c
2 2 4    c++
3 3 3   java
4 4 2 python
5 5 1      r

TO FIND THE SHAPE

x=1:5
y=5:1
z=c("c","c++","java","python","r")
df=data.frame(x,y,z)
print(df)
cat("number of rows",nrow(df))
cat("\nnumber of columns",ncol(df))
cat("\ndimension",dim(df))

number of rows 5
number of columns 3
dimension 5 3

Column names can be set when the data frame is created

df=data.frame(first=x,second=y,course=z)
print(df)

  first second course
1     1      5      c
2     2      4    c++
3     3      3   java
4     4      2 python
5     5      1      r

WRITE AN R SCRIPT TO CREATE DATA FRAME WITH COLUMNS AS ROLL NUMBER,NAME,SGPA,CGPA FOR 5 STUDENTS

rollno=1:5
name=c("nithya","harika","monica","arun","vinay")
sgpa=c(8.9,7.6,6.5,8,9)
cgpa=c(8.8,9,7.9,8.8,7.5)
df=data.frame(rollno,name,sgpa,cgpa)
print(df)
cat("number of rows",nrow(df))
cat("\nnumber of columns",ncol(df))
cat("\ndimension",dim(df))
print(head(df)) #prints the first 6 rows by default

  rollno   name sgpa cgpa
1      1 nithya  8.9  8.8
2      2 harika  7.6  9.0
3      3 monica  6.5  7.9
4      4   arun  8.0  8.8
5      5  vinay  9.0  7.5

number of rows 5
number of columns 4
dimension 5 4

> class(rollno)
[1] "integer"

print(head(df,n=3))
rollno name sgpa cgpa
1 1 nithya 8.9 8.8
2 2 harika 7.6 9.0
3 3 monica 6.5 7.9

print(tail(df,n=3))
rollno name sgpa cgpa
3 3 monica 6.5 7.9
4 4 arun 8.0 8.8
5 5 vinay 9.0 7.5

rownames(df)=c("1st","2nd","3rd","4th","5th")
print(df)
rollno name sgpa cgpa
1st 1 nithya 8.9 8.8
2nd 2 harika 7.6 9.0
3rd 3 monica 6.5 7.9
4th 4 arun 8.0 8.8
5th 5 vinay 9.0 7.5

print(df$cgpa) #to access a specific column

[1] 8.8 9.0 7.9 8.8 7.5
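
Besides picking a column with $, rows can be selected with a condition inside square brackets. The sketch below rebuilds the student data frame from above. The thresholds and the names top and names_only are made up for illustration, and stringsAsFactors = FALSE is passed so the names stay character in every R version.

```r
# Selecting rows of a data frame by a condition.
rollno <- 1:5
name   <- c("nithya", "harika", "monica", "arun", "vinay")
sgpa   <- c(8.9, 7.6, 6.5, 8, 9)
cgpa   <- c(8.8, 9, 7.9, 8.8, 7.5)
df <- data.frame(rollno, name, sgpa, cgpa, stringsAsFactors = FALSE)

top <- df[df$sgpa > 8, ]                  # rows where sgpa is above 8
names_only <- df[df$cgpa >= 8.8, "name"]  # one column of the matching rows

print(top)
print(names_only)
print(mean(df$sgpa))                      # average of a column
```

The empty position after the comma in df[df$sgpa > 8, ] means "all columns".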

ARRAY

An array can have more than 2 dimensions.

array(vector,dim=c(nrow,ncol,nmat))
nrow=number of rows
ncol=number of columns
nmat=number of matrices

If the vector is shorter than the array, its values are reused to fill the array.

a=array(c(1:6),dim=c(2,3,2))
print(a)

, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

a=array(c(1:12),dim=c(3,2,2))
print(a)

, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

a=array(c(1:12),dim=c(2,2,3))
print(a)

, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8

, , 3

     [,1] [,2]
[1,]    9   11
[2,]   10   12

b=array(c(1,0,0,2,0,1,0,1,1,1),dim=c(2,3,3))
print(b)

, , 1

     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    1

, , 2

     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    1    1    0

, , 3

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    2    1    1

cat ("\n 1st row 2nd col mat1",b[1,2,1])
cat ("\n 1st row 3rd col mat3",b[1,3,3])
cat ("\n 1st row of matrix 2",b[c(1), ,2])

1st row 2nd col mat1 0
1st row 3rd col mat3 0
1st row of matrix 2 0 1 1

cat ("\n 2nd col of matrix 3:",b[ ,c(2),3])

2nd col of matrix 3: 0 1

%in% checks whether a value occurs anywhere in the array.

cat("\n 2 is in b:",2%in%b)
cat("\n 6 is in b:",6%in%b)

2 is in b: TRUE
6 is in b: FALSE

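
A matrix is the two-dimensional case of an array. As a short sketch, the base function matrix() creates one, and dim() reports the dimensions of either a matrix or an array. The variable names are illustrative.

```r
# A 2 x 3 matrix and a 2 x 3 x 2 array, both filled column by column.
m   <- matrix(1:6, nrow = 2, ncol = 3)
arr <- array(1:12, dim = c(2, 3, 2))

print(m)
print(dim(m))        # rows and columns
print(dim(arr))      # rows, columns and number of matrices
print(m[2, 3])       # 2nd row, 3rd column
print(arr[1, 1, 2])  # 1st row, 1st column of the 2nd matrix
```
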
STRING

Any value within a pair of single or double quotes in R is treated as a string.

s1="hello"
print(s1)
s2="welcome to data visualization"
print(s2)

[1] "hello"
[1] "welcome to data visualization"

Length of the string

cat("\n length of the string1:",nchar(s1))

length of the string1: 5

Concatenation

print(paste("\n joining two strings (concatenation):",s1,s2))

[1] "\n joining two strings (concatenation): hello welcome to data visualization"

Equal and upper

cat("\n s1 and s2 equal:",s1==s2)
cat("\n s1 in upper case:",toupper(s1))

s1 and s2 equal: FALSE
s1 in upper case: HELLO
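
A few more base string functions, sketched with the same s1 and s2 as above. The other variable names are illustrative.

```r
# tolower(), substr() and paste() with a custom separator.
s1 <- "hello"
s2 <- "welcome to data visualization"

lower      <- tolower("HELLO")           # opposite of toupper()
first_word <- substr(s2, 1, 7)           # characters 1 to 7
joined     <- paste(s1, s2, sep = ", ")  # sep controls what goes between
no_gap     <- paste0(s1, "!")            # paste0() joins with no separator

print(lower)
print(first_word)
print(joined)
print(no_gap)
```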

PLOT

The plot() function is used to plot points on a graph.

par(mar=c(1,1,1,1)) #sets the margins
plot(3,4)
x=c(1,3,5,7)
y=c(2,4,6,8)
plot(x,y,col="red")

LINE CHART USING PLOT FUNCTION

The type argument of plot() selects the kind of plot:
l=line plot
b=both lines and points
s=step plot
n=no plot
h=histogram-like vertical lines
x=c(2,3,4,6,8)
plot(x,type="l",col="black")
x=c(2,3,4,6,8)
plot(x,type="b",col="black")

x=c(2,3,4,6,8)
plot(x,type="h",col="red")

x=c(2,3,4,6,8)
plot(x,type="s",col="blue",main="line chart",xlab="x",ylab="y",lwd=3,lty=2)
x=c(2,3,4,6,8)
plot(x,type="l",col="red",main="line chart")

Main denotes the title of the chart.


x=c(2,3,4,6,8)
plot(x,type="l",col="red",main="line chart",xlab="x",ylab="y")

1 col = colour of the line or points
2 lwd = line width (changes the thickness of the line)
3 lty = line style (a value between 0 and 6)

FOR MULTIPLE LINES

line1=c(1,3,5,7,9)
line2=c(2,4,6,8,10)
plot(line1,type="l",col="blue",lwd=3,lty=3)
lines(line2,col="red",lwd=3,lty=3)
line1=c(1,3,5,7,9)
line2=c(2,4,6,8,10)
line3=c(2,3,4,5,6)
plot(line1,type="l",col="blue",lwd=3,lty=3)
lines(line2,col="red",lwd=3,lty=3)
lines(line3,col="green",lwd=3,lty=3)
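
In the code above, the y axis is scaled to line1 only, so the top value of line2 (10) falls outside the plot. One possible fix, sketched here, is to compute a common range and pass it as ylim. The legend and the axis labels are additions for illustration and are not part of the original script.

```r
# Multiple lines on one chart with a shared y-axis range and a legend.
line1 <- c(1, 3, 5, 7, 9)
line2 <- c(2, 4, 6, 8, 10)
line3 <- c(2, 3, 4, 5, 6)

y_range <- range(line1, line2, line3)  # smallest and largest value overall

plot(line1, type = "l", col = "blue", lwd = 3, ylim = y_range,
     xlab = "index", ylab = "value", main = "multiple lines")
lines(line2, col = "red", lwd = 3)
lines(line3, col = "green", lwd = 3)
legend("topleft", legend = c("line1", "line2", "line3"),
       col = c("blue", "red", "green"), lwd = 3)
```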

PIE PLOT

x=c(50,60,70,80,90)
pie(x)
x=c(50,60,70,80,90)
lab=c("c","c++","java","r","python")
pie(x,labels=lab)

x=c(50,60,70,80,90)
lab=c("c","c++","java","r","python")
colors=c("green","blue","purple","yellow","red")
pie(x,labels=lab,col=colors)
pie(x,labels=x,col=colors,main="MARKS")

BAR PLOT

y=c(4,8,9,12,15)
barplot(y,main="bar plot")

HORIZONTAL

y=c(4,8,9,12,15)
#barplot(y,main="bar plot")
barplot(y,main="bar plot",horiz=TRUE)

BOX PLOT

A box plot is a chart that shows the distribution of a data set, with one box drawn for each group of values.

It is based on 5 values:
Minimum
First quartile
Median
Third quartile
Maximum

boxplot(x,data,notch,varwidth,names,main)
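
A minimal sketch of a box plot, using the sgpa and cgpa values from the student data frame earlier in these notes. fivenum() is a base function that returns the five values listed above. The title and colour are made up for illustration.

```r
# Box plots for two groups of marks, plus the five-number summary.
sgpa <- c(8.9, 7.6, 6.5, 8, 9)
cgpa <- c(8.8, 9, 7.9, 8.8, 7.5)

five <- fivenum(sgpa)  # min, lower hinge, median, upper hinge, max
print(five)

boxplot(sgpa, cgpa, names = c("sgpa", "cgpa"),
        main = "SGPA and CGPA", col = "lightblue")
```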

MAPS

leaflet package

Leaflet is one of the most popular open-source JavaScript libraries for mobile-friendly interactive maps. The leaflet package makes it available in R.
Popups and map tiles can be added to a map.

To load any package

library(package_name)
library(leaflet)

addTiles() adds a tile layer to the map. If no argument is passed, it uses OpenStreetMap tiles by default.

Pipe operator (%>%)

The output of one function is taken as the input to the next function.

map=leaflet()%>%addTiles()%>%addMarkers(lng,lat,popup)

library(leaflet)
map=leaflet()%>%addTiles()%>%addMarkers(lng=77.5946,lat=12.9716,popup='bengaluru')
par(mar=c(1,1,1,1))
print(map)

Create a data frame named city with the columns city name, latitude and longitude

   cityname latitude longitude
1 bengaluru  12.9716   77.5946
2 hyderabad  17.4065   78.4772
3    mysuru  12.2958   76.6394
4   chennai  13.0843   80.2705
5     kochi   9.9312   76.2673

Multiple city

library(leaflet)
city=data.frame(
name=c("bengaluru","hyderabad","mysuru","chennai","kochi"),
lat=c(12.9716,17.4065,12.2958,13.0843,9.9312),
lng=c(77.5946,78.4772,76.6394,80.2705,76.2673))
city_map=leaflet()%>%addTiles()
city_map <- city_map %>%addCircleMarkers(
data = city,
lat = ~lat,
lng = ~lng,
color = "red",
popup = ~paste("City:", name)
)
print(city_map)

SCATTER PLOT

CREATE A DF WITH COL NAMES AS ROLLNO, SEM, SGPA, CGPA FOR 5 STUDENTS

rollno=c(314,310,322,335,334)
sem=c(5,6,4,5,3)
sgpa=c(8.9,7.6,6.5,8,9)
cgpa=c(8.8,9,7.9,8.8,7.5)
df=data.frame(rollno,sem,sgpa,cgpa)
print(df)
par(mar=c(1,1,1,1))
plot(df[['sgpa']],df[['cgpa']]) #plot() draws the chart itself, print() is not needed

x=c(1,2,3,4)
y=2*x
plot(x,y)

rbind is used to add rows to a data frame
cbind is used to add columns to a data frame

Rbind

data=data.frame(fruit=c("apple","orange","kiwi"),
color=c("red","orange",'brown'))
print(data)
new_df=rbind(data,c("grapes","green"))
#print(new_df)

fruit color
1 apple red
2 orange orange
3 kiwi brown

data=data.frame(fruit=c("apple","orange","kiwi"),
color=c("red","orange",'brown'))
#print(data)
new_df=rbind(data,c("grapes","green"))
print(new_df)

fruit color
1 apple red
2 orange orange
3 kiwi brown
4 grapes green

cbind

data=data.frame(fruit=c("apple","orange","kiwi"),
color=c("red","orange",'brown'))
#print(data)
new_df=rbind(data,c("grapes","green"))
#print(new_df)
new_col=cbind(data,price=c(250,300,150))
print(new_col)

fruit color price
1 apple red 250
2 orange orange 300
3 kiwi brown 150

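
When a data frame has columns of different types, one option is to pass the new row to rbind() as a one-row data frame, so that each value keeps its type. This is a sketch of that variation. The price of 90 for grapes is a made-up value, and stringsAsFactors = FALSE is passed so the text columns stay character in every R version.

```r
# Adding a row as a one-row data frame, then adding a column.
data <- data.frame(fruit = c("apple", "orange", "kiwi"),
                   color = c("red", "orange", "brown"),
                   stringsAsFactors = FALSE)
new_row <- data.frame(fruit = "grapes", color = "green",
                      stringsAsFactors = FALSE)

more <- rbind(data, new_row)                     # 4 rows now
more <- cbind(more, price = c(250, 300, 150, 90)) # 90 is hypothetical

print(more)
```
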
QUESTIONS

2m
1. List the basic arithmetic operators
2. List the data types in R programming
3. Describe how to convert an integer to the character data type
4. Describe the difference between Date and POSIXct
5. How do you identify the data type of a variable?
6. Define a vector
7. List any 3 vector operations
8. What is readline()?
9. Differentiate between vector and list.
10. List any 3 functions of the list data structure
11. Create a matrix with 2 rows and 3 columns using an R statement
12. Difference between matrices and arrays
13. List any 3 functions of the string data type
14. List any 3 functions of data frames
15. Describe the functions to create a scatter plot and a box plot
16. What is the leaflet package?

5m

1. Create a df with col names as state,city,population (any 5 states)
2. Write an R program to print even and odd numbers from 1 to 20
3. Write an R program to find the reverse of a given number
4. Implement a map with markers for any 5 cities in India
5. Consider 2 matrices, a with 3 rows 3 cols and b with 3 rows 3 cols, and perform matrix addition and multiplication
6. Consider a sample data frame of your choice and implement the following graphs using an R program (line,bar,scatter,box)
7. Replace the given string with a new string

str_replace, from the stringr package, replaces part of a string
str_c joins strings
str_sub extracts a substring of a given string

library(stringr)
print (str_c("data", "science"))
print (str_sub("data science", 4,8))
print (str_replace ("data science", "data", "political"))
