Introduction to r Chap 2
INTRODUCTION TO R
17-12-2024
WHY DEVELOPED?
BASIC MATHS
● > 1+1
[1] 2
● > 3*6
[1] 18
● > 9/2
[1] 4.5
● > 8*2-9
[1] 7
● > 5^2
[1] 25
● > "monica"
[1] "monica"
VARIABLES
DECLARATION
x<- 90
y=8
15->z
> x<-90
> y=8
> 15->z
> sum(x,y,z)
[1] 113
Variable names can contain any combination of alphanumeric characters,
periods (.) and underscores (_), but they cannot start with a number or an underscore.
> 8num=9
Error: unexpected symbol in "8num"
> _num=1
Error: unexpected symbol in "_num"
> num1=30
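The naming rules above can be checked without triggering a parse error. This is a minimal sketch, assuming that a name counts as valid when base R's make.names() leaves it unchanged; the names my.var and my_var are hypothetical additions to the examples from the notes.

```r
# Candidate names: the first three come from the examples above,
# my.var and my_var are hypothetical extras
candidates <- c("8num", "_num", "num1", "my.var", "my_var")

# make.names() rewrites anything that is not a syntactically valid name,
# so an unchanged result means the name was already valid
is_valid <- make.names(candidates) == candidates

print(data.frame(name = candidates, valid = is_valid))
```

Names that start with a digit or an underscore come back changed, which matches the two errors shown above.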
DATATYPES
4 TYPES
● Numeric (int,float)
● Character (strings)
● Date /POSIXct (time based)
● Logical (true/false)
Numeric data
> is.numeric(num1)
[1] TRUE
Integer
● As the name implies, this type is for whole numbers only, with no decimals.
● To assign an integer to a variable, append a capital L to the value; without the L, R stores the number as numeric.
● For example, 5L is an integer, while 5 is numeric.
> j<-3L
> class(j)
[1] "integer"
> j=3
> class(j)
[1] "numeric"
> is.numeric(j)
[1] TRUE
> is.integer(j)
[1] FALSE
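Converting between types follows the same pattern as the class checks above. The sketch below uses the base R functions as.integer(), as.character() and as.numeric(); the variable names k, s and n are chosen only for illustration.

```r
j <- 3L                  # integer, because of the L suffix
k <- as.integer(7.9)     # as.integer() drops the decimal part, giving 7L
s <- as.character(j)     # integer to character
n <- as.numeric("4.5")   # character to numeric

print(class(j))
print(class(k))
print(class(s))
print(class(n))
```

Note that as.integer() truncates rather than rounds, so round() is needed first if rounding is intended.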
TO REMOVE A VARIABLE
rm(x)
● Char
● Factor
> x="monica"
> x
[1] "monica"
> class(x)
[1] "character"
> y=factor("welcome")
> y
[1] welcome
Levels: welcome
Length of character/number
nchar(x)
> nchar(x)
[1] 6
DATE
● R has several different types of dates. The most useful are Date and POSIXct.
● Date stores just a date
● Format: yyyy-mm-dd
● POSIXct stores a date and time
● Both are stored as numbers: a Date is the number of days since 1970-01-01, and a POSIXct is the number of seconds since 1970-01-01 00:00:00 UTC.
DATE
● example
date=as.Date("2024-12-19")
> date
[1] "2024-12-19"
> class(date)
[1] "Date"
as.numeric gives the number of days from 1970-01-01 to the given date
> as.numeric(date)
[1] 20076
POSIXct
> class(date1)
[1] "POSIXct" "POSIXt"
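The notes do not show how date1 was created. A minimal sketch, assuming a hypothetical timestamp on the same day as the Date example and a fixed UTC time zone so the result does not depend on the machine:

```r
# Hypothetical timestamp; the value used in the original notes is not shown
date1 <- as.POSIXct("2024-12-19 10:30:00", tz = "UTC")

print(class(date1))

# as.numeric() on a POSIXct gives seconds since 1970-01-01 00:00:00 UTC
print(as.numeric(date1))

# Dividing by the number of seconds in a day recovers the day count
# that as.numeric() returned for the Date object
print(floor(as.numeric(date1) / 86400))
```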
LOGICAL
is.logical
TRUE=1
FALSE=0
> TRUE*4
[1] 4
> FALSE*3
[1] 0
> Y=TRUE*29
> Y
[1] 29
> is.logical(Y)
[1] FALSE
(FALSE because Y is not logical: TRUE*29 is stored as the number 29)
> class(Y)
[1] "numeric"
> 8==3
[1] FALSE
> 8>9
[1] FALSE
> 9>12
[1] FALSE
> 12>6
[1] TRUE
> 5>=6
[1] FALSE
> 7<=4
[1] FALSE
> 7<=9
[1] TRUE
DATA STRUCTURES
c(elements)
● Elements should be of a common type.
● Vectors cannot be of mixed type.
● Vectors do not have any dimensions in the R programming language.
● The most common way to create a vector is c().
● Operations apply to every element at once, so no for loop or range function is needed.
Example
● > a=c(2,4,6,8)
● > a
● [1] 2 4 6 8
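The points above about a common type and missing dimensions can be seen directly. In this sketch the vector named mixed is a hypothetical example; when c() receives values of different types it converts them all to one type rather than raising an error.

```r
a <- c(2, 4, 6, 8)            # all numeric
mixed <- c(1, "two", TRUE)    # hypothetical mixed input

print(class(a))
print(class(mixed))   # every element has been converted to character
print(mixed)

print(dim(a))         # NULL: a vector has no dimensions
print(length(a))      # length() is used instead
```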
VECTOR OPERATIONS
> s=1:10
> s
[1] 1 2 3 4 5 6 7 8 9 10
> s*3
[1] 3 6 9 12 15 18 21 24 27 30
> s+2
[1] 3 4 5 6 7 8 9 10 11 12
> s-1
[1] 0 1 2 3 4 5 6 7 8 9
> s/3
[1] 0.3333333 0.6666667 1.0000000 1.3333333 1.6666667 2.0000000 2.3333333 2.6666667 3.0000000
[10] 3.3333333
> s^5
[1] 1 32 243 1024 3125 7776 16807 32768 59049 100000
VECTOR ADDITION
> x=-5:4
> y=1:10
> x
[1] -5 -4 -3 -2 -1 0 1 2 3 4
> y
[1] 1 2 3 4 5 6 7 8 9 10
> x+y
[1] -4 -2 0 2 4 6 8 10 12 14
> m=1:5
> n=0:2
> m+n
[1] 1 3 5 4 6
Warning message:
In m + n : longer object length is not a multiple of shorter object length
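The warning above comes from recycling: when two vectors differ in length, R repeats the shorter one until it matches the longer one. The sketch below spells that out with rep_len(); the name n_recycled is used only for illustration.

```r
m <- 1:5
n <- 0:2

# rep_len() makes explicit the recycling that m + n performs implicitly
n_recycled <- rep_len(n, length(m))
print(n_recycled)        # 0 1 2 0 1
print(m + n_recycled)    # 1 3 5 4 6, the same values as m + n but without a warning

# When the longer length is a multiple of the shorter one, R gives no warning
print(1:6 + 0:2)         # 1 3 5 4 6 8
```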
> x-y
[1] -6 -6 -6 -6 -6 -6 -6 -6 -6 -6
> x*y
[1] -5 -8 -9 -8 -5 0 7 16 27 40
> x/y
[1] -5.0000000 -2.0000000 -1.0000000 -0.5000000 -0.2000000 0.0000000 0.1428571 0.2500000
[9] 0.3333333 0.4000000
READ LINE
#read two numbers and print their sum (readline() waits for input only in an interactive session)
x=readline(prompt ="enter a number a=")
x=as.integer(x)
y=readline(prompt="enter a number b=")
y=as.integer(y)
z=x+y
cat("sum is",z)
SIMPLE IF
enter a number=8
8 is greater than 5
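The code behind the output above is not included in the notes. A minimal sketch of a simple if statement that would produce it, written as a hypothetical helper function so it can run without keyboard input:

```r
# Hypothetical helper; in an interactive session the value could come from
# as.integer(readline(prompt = "enter a number=")) instead
check_greater_than_5 <- function(x) {
  if (x > 5) {
    cat(x, "is greater than 5\n")
  }
  invisible(x > 5)   # return the test result without printing it
}

check_greater_than_5(8)   # prints a message
check_greater_than_5(3)   # prints nothing, because there is no else branch
```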
IF ELSE
enter a number=6
6 is positive
enter a number=-1
-1 is negative
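As with the simple if, only the output is shown above. The following sketch reproduces it with an if else statement; sign_label is a hypothetical helper, and treating zero as positive is an assumption, since the notes do not say how zero was handled.

```r
# Hypothetical helper returning the message for a given number
sign_label <- function(x) {
  if (x >= 0) {          # assumption: zero is reported as positive
    paste(x, "is positive")
  } else {
    paste(x, "is negative")
  }
}

cat(sign_label(6), "\n")
cat(sign_label(-1), "\n")
```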
WHILE LOOP
i=5
while(i>=0)
{
cat(i,"")
i=i-1
}
5 4 3 2 1 0
Break
Next
i=5
while(i>=0)
{
cat(i,"")
if(i==2)
break
i=i-1
}
5 4 3 2
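Next
i=5
while(i>=0)
{
if(i==2)
{
i=i-1 #decrement before next, otherwise i stays at 2 and the loop never ends
next #skips the cat() below when i is 2
}
cat(i,"")
i=i-1
}
5 4 3 1 0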
Accessing individual elements of a vector is done using square brackets([])
> y=5:10
> y
[1] 5 6 7 8 9 10
> y[2]
[1] 6
> y[c(6,9)]
[1] 10 NA
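Square brackets accept more than single positions. The sketch below shows a few other standard forms of indexing on the same vector: a range, a negative index, and a logical condition.

```r
y <- 5:10

print(y[2:4])       # a range of positions: 6 7 8
print(y[-1])        # a negative index drops that position: 6 7 8 9 10
print(y[y > 7])     # a logical condition keeps matching elements: 8 9 10
print(y[c(6, 9)])   # position 9 does not exist, so R returns NA for it
```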
DATA STRUCTURES
Vectors
Lists
Data frames
Matrices
Arrays
Strings
Factors
DATA FRAMES
A data frame represents data in the form of rows and columns
x=1:5
y=5:1
z=c("c","c++","java","python","r")
df=data.frame(x,y,z)
print(df)
x y z
1 1 5 c
2 2 4 c++
3 3 3 java
4 4 2 python
5 5 1 r
TO FIND THE SHAPE
x=1:5
y=5:1
z=c("c","c++","java","python","r")
df=data.frame(x,y,z)
print(df)
cat("number of rows",nrow(df))
cat("\nnumber of columns",ncol(df))
cat("\ndimension",dim(df))
number of rows 5
number of columns 3
dimension 5 3
df=data.frame(first=x,second=y,course=z)
first second course
1 1 5 c
2 2 4 c++
3 3 3 java
4 4 2 python
5 5 1 r
rollno=1:5
name=c("nithya","harika","monica","arun","vinay")
sgpa=c(8.9,7.6,6.5,8,9)
cgpa=c(8.8,9,7.9,8.8,7.5)
df=data.frame(rollno,name,sgpa,cgpa)
print(df)
cat("number of rows",nrow(df))
cat("\nnumber of columns",ncol(df))
cat("\ndimension",dim(df))
print(head(df)) #prints the first 6 rows by default (here all 5)
number of rows 5
number of columns 4
dimension 5 4
> class(rollno)
[1] "integer"
print(head(df,n=3))
rollno name sgpa cgpa
1 1 nithya 8.9 8.8
2 2 harika 7.6 9.0
3 3 monica 6.5 7.9
print(tail(df,n=3))
rollno name sgpa cgpa
3 3 monica 6.5 7.9
4 4 arun 8.0 8.8
5 5 vinay 9.0 7.5
rownames(df)=c("1st","2nd","3rd","4th","5th")
print(df)
rollno name sgpa cgpa
1st 1 nithya 8.9 8.8
2nd 2 harika 7.6 9.0
3rd 3 monica 6.5 7.9
4th 4 arun 8.0 8.8
5th 5 vinay 9.0 7.5
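Once a data frame exists, individual columns and rows can be pulled out of it. This is a small sketch using the same student data as above; the threshold of 8 for sgpa is chosen only for illustration.

```r
df <- data.frame(
  rollno = 1:5,
  name = c("nithya", "harika", "monica", "arun", "vinay"),
  sgpa = c(8.9, 7.6, 6.5, 8, 9),
  cgpa = c(8.8, 9, 7.9, 8.8, 7.5)
)

print(df$name)        # one column, returned as a vector
print(df[2, ])        # the second row, all columns
print(df[df$sgpa > 8, c("name", "sgpa")])   # rows with sgpa above 8, two columns only
print(mean(df$cgpa))  # a summary computed on one column
```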
ARRAY
a=array(c(1:12),dim=c(3,2,2))
print(a)
,,1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
,,2
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
a=array(c(1:12),dim=c(2,2,3))
print(a)
,,1
[,1] [,2]
[1,] 1 3
[2,] 2 4
,,2
[,1] [,2]
[1,] 5 7
[2,] 6 8
,,3
[,1] [,2]
[1,] 9 11
[2,] 10 12
b=array(c(1,0,0,2,0,1,0,1,1,1),dim=c(2,3,3))
print(b)
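,,1
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 2 1
,,2
[,1] [,2] [,3]
[1,] 0 1 1
[2,] 1 1 0
,,3
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 2 1 1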
2 is in mat 1: TRUE
2 is in mat 2: FALSE
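The code that produced the membership lines above is not included in the notes. One way to get such a check is the %in% operator applied to each matrix of the array, as in this sketch. Note that the 10 supplied values are recycled to fill all 18 cells, so the third matrix repeats values from the start of the input.

```r
b <- array(c(1, 0, 0, 2, 0, 1, 0, 1, 1, 1), dim = c(2, 3, 3))

# b[, , k] selects the k-th 2 x 3 matrix of the array
for (k in seq_len(dim(b)[3])) {
  cat("2 is in mat", k, ":", 2 %in% b[, , k], "\n")
}
```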
STRING
Any value written within a pair of single quotes or double quotes in R is treated as a string.
s1="hello"
print(s1)
s2="welcome to data visualization"
print(s2)
[1] "hello"
[1] "welcome to data visualization"
Concatenation
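The notes give no example for concatenation. A minimal sketch using the base R functions paste() and paste0() on the two strings defined above:

```r
s1 <- "hello"
s2 <- "welcome to data visualization"

print(paste(s1, s2))               # joined with a single space by default
print(paste(s1, s2, sep = ", "))   # sep sets the separator
print(paste0(s1, s2))              # paste0() joins with no separator
print(nchar(paste0(s1, s2)))       # length of the combined string
```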
PLOT
l=line plot
b=both lines and points
s=step plot
n=no plot (only the axes are drawn)
h=histogram-like vertical lines
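To compare the type values listed above side by side, they can be drawn in one window. This sketch assumes a 2 by 3 panel layout set with par(mfrow = ...), which is a choice made here for illustration and not something the notes prescribe.

```r
x <- c(2, 3, 4, 6, 8)
types <- c("l", "b", "s", "n", "h")

# Split the plotting window into 2 rows and 3 columns, remembering the old setting
old_par <- par(mfrow = c(2, 3))

# Draw the same data once for each plot type
for (tp in types) {
  plot(x, type = tp, main = paste("type =", tp))
}

par(old_par)   # restore the previous layout
```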
x=c(2,3,4,6,8)
plot(x,type="l",col="black")
x=c(2,3,4,6,8)
plot(x,type="b",col="black")
x=c(2,3,4,6,8)
plot(x,type="h",col="red")
x=c(2,3,4,6,8)
plot(x,type="s",col="blue",main="line chart",xlab="x",ylab="y",lwd=3,lty=2)
x=c(2,3,4,6,8)
plot(x,type="l",col="red",main="line chart")
line1=c(1,3,5,7,9)
line2=c(2,4,6,8,10)
plot(line1,type="l",col="blue",lwd=3,lty=3)
lines(line2,col="red",lwd=3,lty=3)
line1=c(1,3,5,7,9)
line2=c(2,4,6,8,10)
line3=c(2,3,4,5,6)
plot(line1,type="l",col="blue",lwd=3,lty=3)
lines(line2,col="red",lwd=3,lty=3)
lines(line3,col="green",lwd=3,lty=3)
PIE PLOT
x=c(50,60,70,80,90)
pie(x)
x=c(50,60,70,80,90)
lab=c("c","c++","java","r","python")
pie(x,labels=lab) #the argument is named labels
x=c(50,60,70,80,90)
lab=c("c","c++","java","r","python")
colors=c("green","blue","purple","yellow","red")
pie(x,labels=lab,col=colors)
pie(x,labels=x,col=colors,main="MARKS")
BAR PLOT
y=c(4,8,9,12,15)
barplot(y,main="bar plot")
HORIZONTAL
y=c(4,8,9,12,15)
#barplot(y,main="bar plot")
barplot(y,main="bar plot",horiz=TRUE)
BOX PLOT
A box plot is a chart that displays the distribution of a data set, drawing one box for each group or variable.
It is based on a five-number summary:
Min
First quartile
Median
Third quartile
Max
boxplot(x,data,notch,varwidth,names,main)
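The notes list the arguments of boxplot() but give no call. A minimal sketch, reusing two vectors from the earlier plot examples as hypothetical groups; fivenum() prints the five values the box is built from (its hinges are close to, though not always identical to, the quartiles).

```r
x <- c(2, 3, 4, 6, 8)
y <- c(4, 8, 9, 12, 15)

# Five-number summary: min, lower hinge, median, upper hinge, max
print(fivenum(x))
print(fivenum(y))

# One box per group; notch and varwidth are left at FALSE here
boxplot(list(x = x, y = y),
        names = c("x", "y"),
        main = "box plot",
        notch = FALSE,
        varwidth = FALSE)
```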
MAPS
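leaflet package
Leaflet is one of the most popular open-source JavaScript libraries for mobile-friendly interactive maps; the leaflet package makes it usable from R.
It can add popups, markers and map tiles.
addTiles(): if no argument is passed, it adds the default OpenStreetMap tiles as the base layer of the map.
Pipe operator (%>%): passes the result on its left as the first argument of the function on its right.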
library(leaflet)
map=leaflet()%>%addTiles()%>%addMarkers(lng=77.5946,lat=12.9716,popup='bengaluru')
par(mar=c(1,1,1,1))
print(map)
Create a data frame named city with columns for the city name, latitude and longitude
Multiple cities
library(leaflet)
city=data.frame(
name=c("bengaluru","hyderabad","mysuru","chennai","kochi"),
lat=c(12.9716,17.4065,12.2958,13.0843,9.9312),
lng=c(77.5946,78.4772,76.6394,80.2705,76.2673))
city_map=leaflet()%>%addTiles()
city_map <- city_map %>% addCircleMarkers(
  data = city,
  lat = ~lat,
  lng = ~lng,
  color = "red", #the argument is named color; col only works through partial matching
  popup = ~paste("City:", name)
)
print(city_map)
CREATE A DF WITH COL NAMES AS ROLLNO , SEM, SGPA, CGPA FOR 5
STUDENTS
SCATTER PLOT
rollno=c(314,310,322,335,334)
sem=c(5,6,4,5,3)
sgpa=c(8.9,7.6,6.5,8,9)
cgpa=c(8.8,9,7.9,8.8,7.5)
df=data.frame(rollno,sem,sgpa,cgpa)
print(df)
par(mar=c(1,1,1,1))
plot(df[['sgpa']],df[['cgpa']]) #plot() draws directly; wrapping it in print() only prints NULL
x=c(1,2,3,4)
y=2*x
plot(x,y)
rbind is used to add rows to a data frame
cbind is used to add columns to a data frame
rbind
data=data.frame(fruit=c("apple","orange","kiwi"),
color=c("red","orange",'brown'))
print(data)
new_df=rbind(data,c("grapes","green"))
#print(new_df)
fruit color
1 apple red
2 orange orange
3 kiwi brown
data=data.frame(fruit=c("apple","orange","kiwi"),
color=c("red","orange",'brown'))
#print(data)
new_df=rbind(data,c("grapes","green"))
print(new_df)
fruit color
1 apple red
2 orange orange
3 kiwi brown
4 grapes green
cbind
data=data.frame(fruit=c("apple","orange","kiwi"),
color=c("red","orange",'brown'))
#print(data)
new_df=rbind(data,c("grapes","green"))
#print(new_df)
new_col=cbind(data,price=c(250,300,150))
print(new_col)
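Adding a row as a plain vector works above because every column is text. Once a numeric column such as price exists, a character vector would turn it into text, so a common alternative is to bind a one-row data frame instead. In this sketch the price of 120 for grapes is a hypothetical value.

```r
data <- data.frame(fruit = c("apple", "orange", "kiwi"),
                   color = c("red", "orange", "brown"))

# Add the numeric column first
priced <- cbind(data, price = c(250, 300, 150))

# A one-row data frame keeps each column's type; the price is a made-up value
new_row <- data.frame(fruit = "grapes", color = "green", price = 120)
priced <- rbind(priced, new_row)

print(priced)
print(class(priced$price))   # the price column is still numeric
```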
2m
1. List the basic arithmetic operators.
2. List the data types in R programming.
3. Describe how to convert an integer to the character data type.
4. Describe the difference between Date and POSIXct.
5. How do you identify the data type of a variable?
6. Define a vector.
7. List any 3 vector operations.
8. What is readline()?
9. Differentiate between a vector and a list.
10. List any 3 functions of the list data structure.
11. Create a matrix with 2 rows and 3 columns using an R statement.
12. What is the difference between matrices and arrays?
13. List any 3 functions of the string data type.
14. List any 3 functions of data frames.
15. Describe the functions used to create a scatter plot and a box plot.
16. What is the leaflet package?
5m
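str_replace: replaces a matched pattern in a string (stringr package)
str_sub: extracts a substring of a given string (stringr package)
str_c: concatenates strings (stringr package)
library(stringr)
print(str_c("data", "science")) #"datascience"
print(str_sub("data science", 4, 8)) #"a sci"
print(str_replace("data science", "data", "political")) #"political science"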