This document provides an introduction to programming in R. It begins with an overview of R as a programming language and its history. Some key points covered include R's basics and features, how it compares to other languages like Python and Java, and its main IDE RStudio. The document then discusses variables, operators, and basic data types in R like vectors, matrices, arrays, lists, and data frames. It provides examples of how to create and manipulate objects of each data type.
Introduction To Data Science With R Programming
Introduction to Data Science with R Programming
Dr. D. Vimal Kumar
Associate Professor, Department of Computer Science, Nehru Arts and Science College, Coimbatore

TABLE OF CONTENTS
• Programming Language
• History
• R Basics & Features
• Comparison with other programming languages
• RStudio
• Merits & Demerits
• Variables
• Operators
• Data Types

Programming Language
• What is a programming language? A programming language is a set of rules that provides a way of telling a computer what operations to perform.
• What determines a good programming language? Run-time performance; ease of designing, coding, debugging and maintenance; reusability.

Comparison with other Languages
R Programming | Python | Java
Its first stable release (version 1.0.0) was in 2000. | Version 1.0 was released in 1994. | It was first released in 1995.
It has more functions and packages. | It has fewer functions and packages. | It has a large number of built-in functions and packages.
It is an interpreter-based language. | It is an interpreter-based language. | It is both an interpreted and a compiled language.
It is a statistical design and graphics language. | It is a general-purpose programming language. | It is a general-purpose programming language designed for web applications.
It is difficult to learn and understand. | It is easy to understand. | It is easy to learn and understand.
R is mostly used for data analysis. | Generic programming tasks such as the design of software or applications. | Java is mostly used in the design of Windows applications and web applications.

History of R
• A Programming Language
• Graphics Representation
• A Statistical Package
• An Interpreted Computer Language
• Open Source
• Object-Oriented Language
• Reporting
• Used by statisticians and data miners for data analysis

Cont....
• R was created by Ross Ihaka and Robert Gentleman.
• Well-developed, simple and effective programming language
• Effective data handling and storage facility
• Collection of operators for calculations on arrays, lists, vectors and matrices
• Provides a large, coherent and integrated collection of tools for data analysis
• Provides graphical facilities for data analysis
• R can be interfaced with languages and tools such as Python, C, C++, MATLAB and Hadoop

RStudio
• RStudio is designed to make it easy to write scripts.
• RStudio makes it convenient to view and interact with the objects stored in your environment. ...
• RStudio makes it easy to set your working directory and access files on your computer. ...
• RStudio makes graphics much more accessible for a casual user.

RStudio IDE

Merits of R
• Open Source. R is an open-source programming language. ...
• Exemplary Support for Data Wrangling.
• The Array of Packages.
• Quality Plotting and Graphing. ...
• Highly Compatible. ...
• Platform Independent. ...
• Eye-Catching Reports. ...
• Machine Learning Operations

Disadvantages of R Programming
• Weak Origin. R shares its origin with a much older programming language, "S".
• Data Handling. In R, the physical memory stores the objects. ...
• Basic Security. R lacks basic security. ...
• Complicated Language. R is not an easy language to learn. ...
• Lesser Speed. ...
• Spread Across Various Packages

So why learn R??

Variable

Operators

Cont....
• Arithmetic Operators: These operators perform the basic arithmetic operations such as addition, subtraction and multiplication.
• Relational Operators: These operators perform relational operations such as checking whether a variable is greater than, less than or equal to another variable. The output of a relational operation is always a logical value.
• Logical Operators: These operators combine or negate boolean (logical) values; the basic ones are 'and', 'or' and 'not'.
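The three operator families above can be tried out with a short sketch. The values of x and y are arbitrary and chosen only for illustration; they do not come from the slides.

```r
# Two illustrative values (hypothetical, chosen for this sketch)
x <- 7
y <- 2

# Arithmetic operators
total <- x + y        # addition
quotient <- x %/% y   # integer division
remainder <- x %% y   # modulus (remainder after division)

# Relational operators always return a logical value
is_greater <- x > y
is_equal <- x == y

# Logical operators combine or negate logical values
both <- is_greater & is_equal     # and
either <- is_greater | is_equal   # or
negated <- !is_equal              # not

print(c(total, quotient, remainder))
print(c(is_greater, is_equal, both, either, negated))
```

Each relational result is a single TRUE or FALSE, which is why it can be passed straight to the logical operators.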
Arithmetic Operators

Relational Operators

Logical Operators

Assignment Operator
Assignment Operators: These operators are used to assign values to variables in R. The assignment can be performed with either the assignment operator (<-) or the equals operator (=). A value can be assigned in two directions: left assignment (<-) and right assignment (->).

Cont....

Sample Program
    # Read the name and age from the console
    My.name <- readline(prompt = "Enter name:")
    My.age <- readline(prompt = "Enter age:")
    # Convert character to integer
    My.age <- as.integer(My.age)
    print(paste("Hi,", My.name, "next year you will be", My.age + 1, "years old."))

Sample Program – Data Visualisation
    # Histogram of miles per gallon from the built-in mtcars data set
    hist(mtcars$mpg)
    # The same histogram with a suggested number of breaks and red bars
    hist(mtcars$mpg, breaks = 3, col = "red")

Data Types

Vectors
• Vectors are the most basic R data objects. A vector contains elements of the same type. The data types can be logical, integer, double, character, complex or raw. A vector's type can be checked with the typeof() function. Another important property of a vector is its length.
• remove() and rm() can be used to remove objects.
• Positive Index – Each value in a vector has an index. A positive index inside the brackets retrieves the member at that position.
• Negative Index – Used to exclude the member at that position from the result.
• Range Index – Produces a vector slice between two indexes by using the colon operator.
• Named Vector – Vector members can be assigned names and retrieved using names. Names can also be reversed in string vectors.

Cont....
    # Create a vector.
    a <- c(3, 4, 5, 6, 8)
    print(a)
    print(length(a))
    print(max(a))
    print(min(a))
    print(head(a, 2))
    print(tail(a, 3))
    # Naming the vector
    v <- c(1, 2, 3)
    names(v) <- c("First", "Second", "Third")
    v["First"]
    print(v["First"])

Types of Vectors

Matrix
• A matrix is an R object in which the elements are arranged in a two-dimensional rectangular layout.
The basic syntax for creating a matrix in R is −
    matrix(data, nrow, ncol, byrow, dimnames)
Where:
• data is the input vector which becomes the data elements of the matrix.
• nrow is the number of rows to be created.
• ncol is the number of columns to be created.
• byrow is a logical value. If TRUE, the input vector elements are arranged by row.
• dimnames is the names assigned to the rows and columns.

Matrix – Example
    Mymatrix <- matrix(c(1:25), nrow = 5, ncol = 5, byrow = TRUE)
    print(Mymatrix)
• Output:
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    2    3    4    5
    [2,]    6    7    8    9   10
    [3,]   11   12   13   14   15
    [4,]   16   17   18   19   20
    [5,]   21   22   23   24   25

Example – Matrix Operation
    M1 <- matrix(c(2, 4, 5, 6, 7, 8, 7, 1), nrow = 2, byrow = TRUE)
    M2 <- matrix(c(9, 8, 7, 6, 5, 4, 3, 2), nrow = 2, byrow = FALSE)
    # Addition of two matrices
    addmatrix <- M1 + M2
    print(addmatrix)
    # Subtraction of two matrices
    submatrix <- M1 - M2
    print(submatrix)
    # Element-wise multiplication of two matrices (%*% is the matrix product)
    multiplymatrix <- M1 * M2
    print(multiplymatrix)
    # Transpose of a matrix
    tranmatrix <- t(M2)
    print(tranmatrix)

Arrays
While matrices are confined to two dimensions, arrays in R are data objects which can be used to store data in more than two dimensions. An array takes vectors as input and uses the values in the dim parameter to create the array.
The basic syntax for creating an array in R is −
    array(data, dim, dimnames)
Where:
• data is the input vector which becomes the data elements of the array.
• dim is the dimension of the array, where you pass the number of rows, the number of columns and the number of matrices to be created.
• dimnames is the names assigned to the rows and columns.

Example - Array
    # Create an array.
    a <- array(c('green', 'yellow'), dim = c(3, 3, 2))
    print(a)
When we execute the above code, it produces the following result:
    , , 1

         [,1]     [,2]     [,3]
    [1,] "green"  "yellow" "green"
    [2,] "yellow" "green"  "yellow"
    [3,] "green"  "yellow" "green"

    , , 2

         [,1]     [,2]     [,3]
    [1,] "yellow" "green"  "yellow"
    [2,] "green"  "yellow" "green"
    [3,] "yellow" "green"  "yellow"

Difference between Vectors and Lists
S.No | Vectors | List
1 | A vector stores elements of the same type, or converts them implicitly. | A list holds different data types such as numeric, character, logical, etc.
2 | A vector is not recursive. | Lists are recursive.
3 | A vector is a one-dimensional object. | A list is a multidimensional object.

List
Lists are R objects which contain elements of different types, such as numbers, strings, vectors and even another list.
• A list can contain elements of different data types.
• It can contain numbers, vectors or another list inside itself.
• It is created using the list() function.
Syntax
    Listname <- list(values)
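The difference between vectors and lists can be checked with a small sketch. The variable names v and l are illustrative only, and the example assumes nothing beyond base R.

```r
# A vector converts mixed inputs to a single common type
v <- c(1, "two", TRUE)
print(typeof(v))          # every element has become character

# A list keeps each element's own type and can hold another list
l <- list(1, "two", TRUE, list("inner", 2))
print(sapply(l, typeof))  # one type per element

# Lists are recursive, atomic vectors are not
print(is.recursive(v))
print(is.recursive(l))
```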
Convert List to Vector - unlist()
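A minimal sketch of the conversion, using made-up numeric data (the name scores is hypothetical):

```r
# A small list whose elements are all numeric (hypothetical data)
scores <- list(10, 20, c(30, 40))

# unlist() flattens the list into a single atomic vector
flat <- unlist(scores)
print(flat)
print(length(flat))

# Once flattened, vector arithmetic works on all values at once
print(sum(flat))
```

The nested vector c(30, 40) contributes two elements, so the result is longer than the original list.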
List - Example
    listdata <- list("Green", "Red", c(21, 32, 11), TRUE, 24.5, 11)
    print(listdata)
    # To see the values stored
    print(listdata[1])
    print(listdata[3])
    # Assign names to the list (only the first three elements receive names)
    names(listdata) <- c("lst quarter", "A_matrix", "A Innerlist")
    # Remove the fourth element from the list
    listdata[4] <- NULL
    # Print the 4th element (the former 5th element)
    print(listdata[4])
    # Update the 2nd element
    listdata[2] <- "Updated element"
    print(listdata[2])
    listdata[2] <- 45.5
    print(listdata[2])

Data Frames
• A Data Frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.
Below are some characteristics of a Data Frame that need to be considered every time we work with one:
• The column names should be non-empty.
• Each column should contain the same number of data items.
• The data stored in a data frame can be of numeric, factor or character type.
• The row names should be unique.

Create Dataframe
    emp_id <- c(100:104)
    emp_name <- c("John", "Henry", "Adam", "Ron", "Gary")
    dept <- c("Sales", "Finance", "Marketing", "HR", "R & D")
    emp.data <- data.frame(emp_id, emp_name, dept)
    print(emp.data)

FACTORS
• Factors are data objects that help to categorise the data and store it as levels.
• Factor variables are used for statistical modelling.
• They can store both string and integer data.

Factor - Example
    # Create the vectors for the data frame.
    height <- c(132, 151, 162, 139, 166, 147, 122)
    weight <- c(48, 49, 66, 53, 67, 52, 40)
    gender <- c("male", "male", "female", "female", "male", "female", "male")
    input_data <- data.frame(height, weight, gender)
    print(input_data)
    # Convert the gender column to a factor and print it to see the levels.
    print(as.factor(input_data$gender))
    # Print the stored column; as.factor() above did not modify it.
    print(input_data$gender)

Thank You
Any Queries?