R is a programming language designed specifically for data analysis, visualization, and statistical modeling. Here, we'll walk through the basics of programming with R, from installation to writing our first lines of code, then on to data structures, functions, visualization, and a small Shiny app.
Why might someone choose to learn R?
- Data Analysis: It's great for understanding and analyzing data of any size.
- Statistics: R has powerful tools for statistical analysis, which are essential for researchers and analysts.
- Visualization: With R, we can create eye-catching visuals to explore and present data effectively.
- Machine Learning: While not as popular as Python, R still offers machine learning capabilities for tasks like classification and regression.
- Reproducible Research: R enables transparent and reproducible research by combining code, data, and text in one document.
1. Installation
The first step is to install R on our system. R is open-source software, so it is free to download and use. The latest version is available from the Comprehensive R Archive Network (CRAN), R's official distribution site.
2. Assignment
R provides several assignment operators. We will discuss each of them.
1. Leftward Assignment: This is the most common way to assign values in R. It uses the <- operator, where the value is assigned to the variable on the left-hand side.
Syntax:- x <- 5
2. Rightward Assignment: We can also use the -> operator for assignment, where the variable is specified on the right-hand side.
Syntax:- 5 -> x
3. Equal Sign Assignment: Although less common, you can use the equal sign (=) for assignment as well.
Syntax:- x = 5
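The three forms can be compared side by side. The following is a minimal sketch; the names a, b, d, m1, m2, and y are illustrative only. It also shows one practical difference: inside a function call, = names an argument, while <- performs an assignment in the calling environment.

```r
# Three ways to bind the value 5 to a name
a <- 5   # leftward assignment
5 -> b   # rightward assignment
d = 5    # equal-sign assignment

print(identical(a, b) && identical(b, d))

# Inside a call, = names the argument x; no variable called x is created by it
m1 <- mean(x = c(1, 2, 3))

# Inside a call, <- first assigns y in the calling environment, then passes it on
m2 <- mean(y <- c(4, 5, 6))

print(m1)
print(m2)
print(y)
```

This difference is one reason many style guides prefer <- for ordinary assignment and reserve = for argument names.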
3. Data Types
In R, variables are containers used to store data values. These data values can belong to different types, such as numeric, character, logical, and more.
Numeric Data Type
Numeric variables in R represent numerical values, including integers and floating-point numbers.
R
# Numeric variables
x <- 10 # Whole number (R stores it as a double; write 10L for a true integer)
y <- 3.14 # Floating-point number
# Print variables
print(x)
print(y)
Output:
[1] 10
[1] 3.14
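To see how R actually stores a number, we can inspect it with class() and typeof(). This is a small sketch; the variable n is an illustrative addition that uses the L suffix to request integer storage.

```r
x <- 10     # numeric literal without a suffix: stored as a double
n <- 10L    # the L suffix asks for integer storage
y <- 3.14   # floating-point number

print(class(x))       # "numeric"
print(typeof(x))      # "double"
print(typeof(n))      # "integer"
print(typeof(y))      # "double"
print(is.numeric(n))  # TRUE: integers also count as numeric
```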
Character Data Type
Character variables store text data, such as strings of characters.
R
# Character variables
name <- "John Doe"
city <- 'New York'
# Print variables
print(name)
print(city)
Output:
[1] "John Doe"
[1] "New York"
Logical Data Type
Logical variables can have only two possible values: TRUE or FALSE, representing boolean values.
R
# Logical variables
is_raining <- TRUE
is_sunny <- FALSE
# Print variables
print(is_raining)
print(is_sunny)
Output:
[1] TRUE
[1] FALSE
Factors Data Type
Factors are used to represent categorical data with a fixed number of unique levels.
R
# Factor variable
gender <- c("Male", "Female", "Male", "Female", "Male")
gender_factor <- factor(gender)
# Print factors
print(gender_factor)
Output:
[1] Male Female Male Female Male
Levels: Female Male
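Under the hood, a factor stores integer codes plus a set of level labels, and the levels default to alphabetical order. The sketch below inspects both and shows how the order could be set explicitly; gender_ordered is an illustrative name.

```r
gender <- c("Male", "Female", "Male", "Female", "Male")
gender_factor <- factor(gender)

print(levels(gender_factor))      # "Female" "Male" (alphabetical by default)
print(as.integer(gender_factor))  # 2 1 2 1 2 (position of each value in the levels)
print(table(gender_factor))       # count of observations per level

# Setting the level order explicitly
gender_ordered <- factor(gender, levels = c("Male", "Female"))
print(levels(gender_ordered))     # "Male" "Female"
```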
4. Data Structures
Vectors
Vectors are one-dimensional arrays that can hold numeric, character, or logical values. They are created using the c() function.
R
# Numeric vector
num_vector <- c(1, 2, 3, 4, 5)
# Character vector
char_vector <- c("apple", "banana", "orange")
# Logical vector
logical_vector <- c(TRUE, FALSE, TRUE)
# Print vectors
print(num_vector)
print(char_vector)
print(logical_vector)
Output:
[1] 1 2 3 4 5
[1] "apple" "banana" "orange"
[1] TRUE FALSE TRUE
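Most operations on vectors work element by element, and indexing starts at 1. The following sketch illustrates both, plus the coercion that happens when types are mixed in c(); all names other than num_vector are illustrative.

```r
num_vector <- c(1, 2, 3, 4, 5)

doubled <- num_vector * 2                  # arithmetic is applied to every element
first_two <- num_vector[1:2]               # indexing is 1-based
above_three <- num_vector[num_vector > 3]  # a logical condition selects elements

print(doubled)
print(first_two)
print(above_three)

# A vector holds a single type, so mixed inputs are coerced (here to character)
mixed <- c(1, "apple", TRUE)
print(class(mixed))
```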
Lists
Lists are versatile data structures that can hold elements of different data types. They are created using the list() function.
R
# List
my_list <- list(name = "John", age = 30, is_student = TRUE)
# Print list
print(my_list)
Output:
$name
[1] "John"
$age
[1] 30
$is_student
[1] TRUE
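List elements can be read and changed by name. As a brief sketch: $ and [[ ]] return the element itself, while single brackets return a shorter list. The city element and the name age_part are illustrative additions.

```r
my_list <- list(name = "John", age = 30, is_student = TRUE)

print(my_list$name)      # element by name with $
print(my_list[["age"]])  # element by name with double brackets

# Single brackets keep the list wrapper
age_part <- my_list["age"]
print(class(age_part))   # "list"

# Assigning to a new name adds an element
my_list$city <- "New York"
print(names(my_list))
```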
Data Frames
Data frames are two-dimensional structures that resemble tables or spreadsheets. They are used to store datasets, with rows representing observations and columns representing variables. Data frames can contain different types of data.
R
# Creating a data frame
df <- data.frame(
  name = c("John", "Emma", "Alice"),
  age = c(25, 30, 35),
  gender = c("Male", "Female", "Female")
)
df
Output:
   name age gender
1  John  25   Male
2  Emma  30 Female
3 Alice  35 Female
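Once a data frame exists, columns are reached with $ and rows are selected with a logical condition inside [rows, columns]. A short sketch using the same df; the names older and age_next_year are illustrative.

```r
df <- data.frame(
  name = c("John", "Emma", "Alice"),
  age = c(25, 30, 35),
  gender = c("Male", "Female", "Female")
)

print(df$age)    # one column as a vector
print(nrow(df))  # number of observations
print(ncol(df))  # number of variables

# Keep the rows where age exceeds 28; the empty slot after the comma keeps all columns
older <- df[df$age > 28, ]
print(older)

# Add a derived column
df$age_next_year <- df$age + 1
str(df)
```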
Matrices
Matrices are two-dimensional arrays that contain elements of the same data type. They are created using the matrix() function.
R
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
mat
Output:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
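As the output shows, matrix() fills column by column unless told otherwise. The sketch below contrasts that default with byrow = TRUE and shows element access and transposition; by_row is an illustrative name.

```r
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

print(dim(mat))   # rows, then columns
print(mat[2, 3])  # element in row 2, column 3
print(t(mat))     # transpose: 3 rows, 2 columns

# Fill row by row instead of column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
print(by_row)
```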
5. Control Structures
Control structures in R are essential for controlling the flow of execution in our code. They allow us to make decisions, repeat tasks, and execute blocks of code conditionally.
If-Else Statements
If-else statements allow you to execute different blocks of code based on whether a condition is true or false.
R
# If-else statement
x <- 10
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is less than or equal to 5")
}
Output:
[1] "x is greater than 5"
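An if statement tests a single condition. To apply the same test to every element of a vector, base R offers the vectorized ifelse() function. A minimal sketch, with values and labels as illustrative names:

```r
values <- c(3, 8, 5, 12)

# ifelse(test, yes, no) evaluates the test for each element
labels <- ifelse(values > 5,
                 "greater than 5",
                 "less than or equal to 5")

print(labels)
```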
For Loops
For loops are used to iterate over a sequence of values and execute a block of code for each iteration.
R
# For loop
for (i in 1:5) {
  print(paste("Iteration:", i))
}
Output:
[1] "Iteration: 1"
[1] "Iteration: 2"
[1] "Iteration: 3"
[1] "Iteration: 4"
[1] "Iteration: 5"
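When looping over the positions of an existing vector, seq_along() is a safer choice than 1:length(x), because it yields an empty sequence for an empty vector. A small sketch, with fruits and empty as illustrative names:

```r
fruits <- c("apple", "banana", "orange")

for (i in seq_along(fruits)) {
  print(paste(i, fruits[i]))
}

# With an empty vector the loop body never runs
empty <- character(0)
for (i in seq_along(empty)) {
  print("this line is never reached")
}

# By contrast, 1:length(empty) counts down from 1 to 0
print(1:length(empty))
```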
While Loops
While loops continue executing a block of code as long as a specified condition is true.
R
# While loop
x <- 1
while (x <= 5) {
  print(paste("Value of x:", x))
  x <- x + 1
}
Output:
[1] "Value of x: 1"
[1] "Value of x: 2"
[1] "Value of x: 3"
[1] "Value of x: 4"
[1] "Value of x: 5"
Repeat Loop
Repeat loops repeatedly execute a block of code until a break statement is encountered.
R
# Repeat loop
x <- 1
repeat {
  print(paste("Value of x:", x))
  x <- x + 1
  if (x > 5) {
    break
  }
}
Output:
[1] "Value of x: 1"
[1] "Value of x: 2"
[1] "Value of x: 3"
[1] "Value of x: 4"
[1] "Value of x: 5"
Switch Statement
Switch statements provide a way to select one of many blocks of code to be executed.
R
# Switch statement
day <- "Monday"
switch(day,
       "Monday" = print("It's Monday!"),
       "Tuesday" = print("It's Tuesday!"),
       "Wednesday" = print("It's Wednesday!"),
       "Thursday" = print("It's Thursday!"),
       "Friday" = print("It's Friday!"),
       "Saturday" = print("It's Saturday!"),
       "Sunday" = print("It's Sunday!"))
Output:
[1] "It's Monday!"
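switch() also returns a value, supports a default branch (an unnamed final argument), and lets several names share one result by leaving a branch empty. The following sketch wraps this in a hypothetical helper, day_type(), which is not part of R.

```r
# Hypothetical helper that classifies a day name
day_type <- function(day) {
  switch(day,
         "Saturday" = ,        # empty branch: falls through to the next one
         "Sunday" = "weekend",
         "weekday")            # unnamed last argument acts as the default
}

print(day_type("Saturday"))
print(day_type("Sunday"))
print(day_type("Monday"))
```

Note that the default here treats any unmatched string as a weekday, so input validation would be needed in real use.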
6. Functions in R
Functions play a crucial role in R programming, allowing us to encapsulate reusable pieces of code. They enable us to break down complex tasks into smaller, manageable units, making our code more modular, readable, and maintainable.
Defining a Function
In R, we can define our own functions using the function keyword. A function typically consists of a name, a list of parameters (arguments), and a block of code that defines its behavior. In the example that follows:
- my_function is the name of the function.
- x and y are the parameters of the function.
- result <- x + y is the code block that computes the result.
- return(result) specifies the value that the function should return.
R
# Defining a function
my_function <- function(x, y) {
  result <- x + y
  return(result)
}
Calling a Function
Once a function is defined, we can call it by its name and pass arguments to it.
R
# Calling the function
output <- my_function(3, 5)
print(output)
Output:
[1] 8
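Parameters can have default values, and arguments can be passed by name in any order. The sketch below uses a hypothetical function, add_scaled(), to illustrate both; it also relies on the fact that a function returns the value of its last evaluated expression when return() is omitted.

```r
# Hypothetical example function: y and scale have default values
add_scaled <- function(x, y = 1, scale = 1) {
  (x + y) * scale  # the last evaluated expression is the return value
}

print(add_scaled(3))                 # (3 + 1) * 1 = 4
print(add_scaled(3, 5))              # (3 + 5) * 1 = 8
print(add_scaled(3, 5, scale = 2))   # (3 + 5) * 2 = 16
print(add_scaled(scale = 2, x = 3))  # (3 + 1) * 2 = 8
```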
7. Pre-built datasets in R
Pre-built datasets in R are ready-to-use collections of data that come bundled with the R programming language. These datasets cover various topics and are available for users to practice data analysis and visualization without the need to import external data.
R
# List pre-built datasets in R
data()
Output:
Data sets in package ‘datasets’:
AirPassengers      Monthly Airline Passenger Numbers 1949-1960
BJsales            Sales Data with Leading Indicator
BJsales.lead (BJsales)
                   Sales Data with Leading Indicator
BOD                Biochemical Oxygen Demand
CO2                Carbon Dioxide Uptake in Grass Plants
ChickWeight        Weight versus age of chicks on different diets
DNase              Elisa assay of DNase
EuStockMarkets     Daily Closing Prices of Major European Stock
                   Indices, 1991-1998
Formaldehyde       Determination of Formaldehyde
HairEyeColor       Hair and Eye Color of Statistics Students
Harman23.cor       Harman Example 2.3
Harman74.cor       Harman Example 7.4
Indometh           Pharmacokinetics of Indomethacin
InsectSprays       Effectiveness of Insect Sprays
JohnsonJohnson     Quarterly Earnings per Johnson & Johnson Share
LakeHuron          Level of Lake Huron 1875-1972
LifeCycleSavings   Intercountry Life-Cycle Savings Data
Loblolly           Growth of Loblolly pine trees
Nile               Flow of the River Nile
Orange             Growth of Orange Trees
OrchardSprays      Potency of Orchard Sprays
PlantGrowth        Results from an Experiment on Plant Growth
Puromycin          Reaction Velocity of an Enzymatic Reaction
Seatbelts          Road Casualties in Great Britain 1969-84
Theoph             Pharmacokinetics of Theophylline
Titanic            Survival of passengers on the Titanic
ToothGrowth        The Effect of Vitamin C on Tooth Growth in ...
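Any of these datasets can be used directly by name. As a quick sketch, the following inspects mtcars, the dataset used in the plotting and dplyr examples later in this guide.

```r
# Built-in datasets are available by name without importing anything
print(head(mtcars, 3))      # first three rows
print(dim(mtcars))          # number of rows and columns
print(names(mtcars))        # column names
print(summary(mtcars$mpg))  # five-number summary plus the mean for one column
```

Running ?mtcars in an interactive session opens the help page that describes each column.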
8. Visualization with R
In R, visualization is a powerful tool for exploring data, communicating insights, and presenting findings effectively. Several packages offer diverse functionalities for creating various types of plots and graphics. Some popular R packages for visualization:
1. ggplot2: ggplot2 is a versatile and widely used package for creating static, publication-quality graphics. It follows the grammar of graphics paradigm, making it intuitive to use for creating a wide range of visualizations. With ggplot2, users can easily customize plots by adding layers, adjusting aesthetics, and modifying themes.
R
install.packages("ggplot2")
library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  labs(title = "Fuel Efficiency vs Horsepower",
       x = "Miles per Gallon", y = "Horsepower")
Output:
[Figure: scatter plot titled "Fuel Efficiency vs Horsepower"]
2. plotly: plotly is an interactive visualization package that allows users to create web-based, interactive plots. It supports a wide range of chart types, including scatter plots, line plots, bar charts, and 3D plots. plotly visualizations can be easily embedded into websites or shared online.
R
install.packages("plotly")
library(plotly)
# mode = "markers" applies to scatter traces, so the trace type is "scatter"
plot_ly(mtcars, x = ~mpg, y = ~hp, type = "scatter", mode = "markers",
        marker = list(color = 'rgba(255, 100, 100, 0.5)')) %>%
  layout(title = "Fuel Efficiency vs Horsepower",
         xaxis = list(title = "Miles per Gallon"),
         yaxis = list(title = "Horsepower"))
Output:
[Figure: interactive plotly chart titled "Fuel Efficiency vs Horsepower"]
3. lattice: lattice is a package for creating trellis plots, which are multi-panel displays of data. It provides a high-level interface for creating conditioned plots, such as scatter plots, histograms, and boxplots, with a single function call.
R
install.packages("lattice")
library(lattice)
xyplot(hp ~ mpg | cyl, data = mtcars,
       main = "Fuel Efficiency vs Horsepower by Cylinder Count",
       xlab = "Miles per Gallon", ylab = "Horsepower")
Output:
[Figure: lattice plot titled "Fuel Efficiency vs Horsepower by Cylinder Count"]
9. Data manipulation
Data manipulation involves the process of transforming and modifying data to extract useful information or prepare it for analysis. This can include tasks such as filtering rows, selecting columns, creating new variables, aggregating data, and joining datasets. The dplyr package in R is a powerful tool for data manipulation tasks.
Filtering Rows: Selecting rows based on certain conditions.
R
# Load the dplyr package
library(dplyr)
# Filter cars with mpg greater than 30
filtered_cars <- mtcars %>%
  filter(mpg > 30)
head(filtered_cars)
Output:
                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2
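The other tasks mentioned above (selecting columns, creating new variables, and aggregating) follow the same pipe pattern. The sketch below chains several dplyr verbs on mtcars, assuming dplyr is installed; the names power_to_weight, n_cars, mean_mpg, and summary_by_cyl are illustrative.

```r
library(dplyr)

summary_by_cyl <- mtcars %>%
  select(mpg, cyl, hp, wt) %>%            # keep only the columns we need
  mutate(power_to_weight = hp / wt) %>%   # create a new variable
  group_by(cyl) %>%                       # one group per cylinder count
  summarise(n_cars = n(),                 # aggregate within each group
            mean_mpg = mean(mpg),
            mean_power_to_weight = mean(power_to_weight)) %>%
  arrange(desc(mean_mpg))                 # highest average mpg first

print(summary_by_cyl)
```

Each verb takes a data frame and returns a data frame, which is what allows the steps to be chained with %>%.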
10. Exploring Shiny
Shiny is an R package that allows us to build interactive web applications directly from R. It bridges the gap between data analysis in R and web development, enabling us to create interactive dashboards, data visualization tools, and more without requiring knowledge of HTML, CSS, or JavaScript.
Creating a Basic Shiny App with the Iris Dataset
Let's create a basic Shiny app using the iris dataset. Our app will have interactive elements such as dropdown menus to select the axes for plotting, and another dropdown menu to choose different types of plots. Additionally, we'll provide an option to download the plotted image.
R
# Install and load required packages
if (!require("shiny")) install.packages("shiny")
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("dplyr")) install.packages("dplyr")
library(shiny)
library(ggplot2)
library(dplyr)
# Define UI
ui <- fluidPage(
  titlePanel("Interactive Iris Data Visualization"),
  sidebarLayout(
    sidebarPanel(
      selectInput(inputId = "x_axis",
                  label = "Select X-axis:",
                  choices = c("Sepal Length", "Sepal Width", "Petal Length",
                              "Petal Width"),
                  selected = "Sepal Length"),
      selectInput(inputId = "y_axis",
                  label = "Select Y-axis:",
                  choices = c("Sepal Length", "Sepal Width", "Petal Length",
                              "Petal Width"),
                  selected = "Sepal Width"),
      selectInput(inputId = "plot_type",
                  label = "Select Plot Type:",
                  choices = c("Scatter Plot", "Line Plot", "Bar Plot"),
                  selected = "Scatter Plot"),
      downloadButton(outputId = "download_plot", label = "Download Plot")
    ),
    mainPanel(
      plotOutput(outputId = "iris_plot")
    )
  )
)
# Define server logic
server <- function(input, output) {
  # Build the plot in a reactive expression so that renderPlot() and the
  # download handler can both use it. An output such as output$iris_plot
  # cannot be read back or called from inside the server function.
  iris_plot <- reactive({
    x_var <- switch(input$x_axis,
                    "Sepal Length" = "Sepal.Length",
                    "Sepal Width" = "Sepal.Width",
                    "Petal Length" = "Petal.Length",
                    "Petal Width" = "Petal.Width")
    y_var <- switch(input$y_axis,
                    "Sepal Length" = "Sepal.Length",
                    "Sepal Width" = "Sepal.Width",
                    "Petal Length" = "Petal.Length",
                    "Petal Width" = "Petal.Width")
    plot_data <- iris
    # .data[[name]] looks up a column by its string name; it replaces the
    # deprecated aes_string()
    if (input$plot_type == "Scatter Plot") {
      ggplot(plot_data, aes(x = .data[[x_var]], y = .data[[y_var]])) +
        geom_point() +
        labs(x = input$x_axis, y = input$y_axis,
             title = "Scatter Plot of Iris Dataset")
    } else if (input$plot_type == "Line Plot") {
      ggplot(plot_data, aes(x = .data[[x_var]], y = .data[[y_var]],
                            group = Species, color = Species)) +
        geom_line() +
        labs(x = input$x_axis, y = input$y_axis,
             title = "Line Plot of Iris Dataset")
    } else if (input$plot_type == "Bar Plot") {
      ggplot(plot_data, aes(x = Species, y = .data[[y_var]], fill = Species)) +
        geom_bar(stat = "identity") +
        labs(x = "Species", y = input$y_axis,
             title = "Bar Plot of Iris Dataset")
    }
  })
  # Display the current plot
  output$iris_plot <- renderPlot({
    iris_plot()
  })
  # Save the same plot to a PNG file when the download button is clicked
  output$download_plot <- downloadHandler(
    filename = function() {
      paste("iris_plot_", Sys.Date(), ".png", sep = "")
    },
    content = function(file) {
      ggsave(file, plot = iris_plot(), device = "png")
    }
  )
}
# Run the application
shinyApp(ui = ui, server = server)
Output:
[Screenshot: Basic Shiny App]
Conclusion
This guide has covered the basics of programming with R, a language designed for data analysis and visualization. We started with installation and explored assignment, data types, data structures, control structures, functions, and pre-built datasets. We then looked at visualization with ggplot2, plotly, and lattice, and at data manipulation with dplyr. Finally, we introduced Shiny for building interactive web applications.