How To Start Programming With R

Last Updated : 02 May, 2024

R is a programming language designed specifically for data analysis, visualization, and statistical modeling. Here, we'll walk through the basics of programming with R, from installation and our first lines of code to data structures, control flow, functions, visualization, data manipulation, and a small Shiny app.

Why might someone choose to learn R?

  1. Data Analysis: R is well suited to exploring and analyzing datasets, from small tables to large ones that fit in memory.
  2. Statistics: R has powerful tools for statistical analysis, which are essential for researchers and analysts.
  3. Visualization: With R, we can create eye-catching visuals to explore and present data effectively.
  4. Machine Learning: While not as popular as Python, R still offers machine learning capabilities for tasks like classification and regression.
  5. Reproducible Research: R enables transparent and reproducible research by combining code, data, and text in one document.

1. Installation

The first step in starting our journey with R is to install it on our system. R is open-source software, which means it's freely available for download and use. We can download the latest version of R from the Comprehensive R Archive Network (CRAN), which is linked from the official R website.

[Screenshot: Download R for Windows]
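
Once the installer finishes, a quick way to confirm that R works is to open the R console and ask it for its version. The snippet below is a minimal sketch; the variable names r_major and r_minor are only illustrative, and the version string will differ from machine to machine.

```r
# Print the version of the running R installation
print(R.version.string)

# The major and minor version numbers are also available as strings
r_major <- R.version$major
r_minor <- R.version$minor
print(paste("Major:", r_major, "Minor:", r_minor))
```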

2. Assignment

R provides three assignment operators. We will look at each of them in turn.

1. Leftward Assignment: This is the most common way to assign values in R. It uses the <- operator, where the value is assigned to the variable on the left-hand side.

Syntax:- x <- 5

2. Rightward Assignment: We can also use the -> operator for assignment, where the value comes first and the variable is specified on the right-hand side.

Syntax:- 5 -> x

3. Equal Sign Assignment: Although less common, you can use the equal sign (=) for assignment as well.

Syntax:- x = 5
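
To see the three forms side by side, here is a small sketch. The variable names are arbitrary; the last two lines illustrate that inside a function call the equal sign names an argument rather than creating a variable.

```r
# Three ways to bind a value to a name
x <- 5      # leftward assignment
10 -> y     # rightward assignment
z = 15      # equal-sign assignment

print(c(x, y, z))

# Inside a function call, = names an argument; the variable x above
# keeps its value of 5
m <- mean(x = c(1, 2, 3))
print(m)
```

Most R style guides prefer <- for assignment and reserve = for naming arguments, which keeps the two roles visually distinct.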

3. Data Types

In R, variables are containers used to store data values. These data values can belong to different types, such as numeric, character, logical, and more.

Numeric Data Type

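Numeric variables in R represent numbers. Unless told otherwise, R stores both whole and fractional values as double-precision floating-point numbers; a true integer is written with an L suffix, such as 10L.

R
# Numeric variables
x <- 10      # Whole number, stored as a double (use 10L for an integer)
y <- 3.14    # Floating-point number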

# Print variables
print(x)
print(y)

Output:

[1] 10
[1] 3.14
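
The functions typeof() and class() report how R stores a value. The following sketch uses them to show the difference between a plain number and one written with the L suffix.

```r
x <- 10     # no suffix: stored as a double
y <- 10L    # L suffix: stored as an integer

print(typeof(x))      # "double"
print(typeof(y))      # "integer"
print(class(x))       # "numeric"
print(is.numeric(y))  # TRUE: integers also count as numeric
```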

Character Data Type

Character variables store text data, such as strings of characters.

R
# Character variables
name <- "John Doe"
city <- 'New York'

# Print variables
print(name)
print(city)

Output:

[1] "John Doe"
[1] "New York"

Logical Data Type

Logical variables can have only two possible values: TRUE or FALSE, representing boolean values.

R
# Logical variables
is_raining <- TRUE
is_sunny <- FALSE

# Print variables
print(is_raining)
print(is_sunny)

Output:

[1] TRUE
[1] FALSE

Factors Data Type

Factors are used to represent categorical data with a fixed number of unique levels.

R
# Factors variables
gender <- c("Male", "Female", "Male", "Female", "Male")
gender_factor <- factor(gender)

# Print factors
print(gender_factor)

Output:

[1] Male   Female Male   Female Male  
Levels: Female Male
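
Internally, a factor stores one integer code per element plus a table of level labels. This short sketch, which rebuilds the same factor so it can run on its own, shows how to inspect both and how to count the observations in each level.

```r
gender_factor <- factor(c("Male", "Female", "Male", "Female", "Male"))

# Level labels, sorted alphabetically by default
lvls <- levels(gender_factor)
print(lvls)

# Integer codes that point into the level labels
codes <- as.integer(gender_factor)
print(codes)

# Number of observations per level
counts <- table(gender_factor)
print(counts)
```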

4. Data Structures

Vectors

Vectors are one-dimensional arrays whose elements all share a single type, such as numeric, character, or logical. They are created using the c() function.

R
# Numeric vector
num_vector <- c(1, 2, 3, 4, 5)

# Character vector
char_vector <- c("apple", "banana", "orange")

# Logical vector
logical_vector <- c(TRUE, FALSE, TRUE)

# Print vectors
print(num_vector)
print(char_vector)
print(logical_vector)

Output:

[1] 1 2 3 4 5
[1] "apple"  "banana" "orange"
[1]  TRUE FALSE  TRUE
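
A few common vector operations are sketched below: indexing (which starts at 1 in R), element-wise arithmetic, filtering with a logical condition, and what happens when types are mixed. The variable names are illustrative.

```r
num_vector <- c(1, 2, 3, 4, 5)

# Indexing is 1-based
second <- num_vector[2]
print(second)

# Arithmetic applies to every element
doubled <- num_vector * 2
print(doubled)

# A logical condition selects matching elements
big <- num_vector[num_vector > 3]
print(big)

# Mixing types coerces every element to a common type (character here)
mixed <- c(1, "apple", TRUE)
print(mixed)
```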

Lists

Lists are versatile data structures that can hold elements of different data types. They are created using the list() function.

R
# List
my_list <- list(name = "John", age = 30, is_student = TRUE)

# Print list
print(my_list)

Output:

$name
[1] "John"

$age
[1] 30

$is_student
[1] TRUE
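
List elements can be read by name with $ or [[ ]], while single brackets return a smaller list. The sketch below also adds a new element; the city field is made up for illustration.

```r
my_list <- list(name = "John", age = 30, is_student = TRUE)

# $ and [[ ]] return the element itself
print(my_list$name)
print(my_list[["age"]])

# Single brackets return a list containing the element
sub_list <- my_list["age"]
print(class(sub_list))

# Assigning to a new name appends an element
my_list$city <- "New York"
print(names(my_list))
```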

Data Frames

Data frames are two-dimensional structures that resemble tables or spreadsheets. They are used to store datasets, with rows representing observations and columns representing variables. Data frames can contain different types of data.

R
# Creating a data frame
df <- data.frame(
  name = c("John", "Emma", "Alice"),
  age = c(25, 30, 35),
  gender = c("Male", "Female", "Female")
)
df

Output:

   name age gender
1  John  25   Male
2  Emma  30 Female
3 Alice  35 Female
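
Typical first steps with a data frame are pulling out a column, filtering rows, and adding a derived column. The sketch below recreates the same data frame so it runs on its own; the column age_next_year is invented for the example.

```r
df <- data.frame(
  name = c("John", "Emma", "Alice"),
  age = c(25, 30, 35),
  gender = c("Male", "Female", "Female")
)

# Extract one column as a vector
ages <- df$age
print(mean(ages))

# Keep only the rows where age is above 26 (all columns)
older <- df[df$age > 26, ]
print(older)

# Add a new column computed from an existing one
df$age_next_year <- df$age + 1

# Compact overview of column names and types
str(df)
```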

Matrices

Matrices are two-dimensional arrays that contain elements of the same data type. They are created using the matrix() function.

R
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
mat

Output:

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
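
As the output shows, matrix() fills column by column by default. The following sketch shows how to check the dimensions, read a single cell, fill by row instead, and transpose.

```r
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Number of rows and columns
print(dim(mat))

# Element in row 2, column 3
cell <- mat[2, 3]
print(cell)

# byrow = TRUE fills the matrix row by row instead
mat_by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
print(mat_by_row)

# t() swaps rows and columns
print(dim(t(mat)))
```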

5. Control Structures

Control structures in R are essential for controlling the flow of execution in our code. They allow us to make decisions, repeat tasks, and execute blocks of code conditionally.

If-Else Statements

If-else statements allow you to execute different blocks of code based on whether a condition is true or false.

R
# If-else statement
x <- 10

if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is less than or equal to 5")
}

Output:

[1] "x is greater than 5"
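
Two related forms are worth knowing. An else if branch handles more than two cases, and the ifelse() function applies a condition to every element of a vector at once. The sketch below uses made-up labels to show both.

```r
x <- 5

# Chain conditions with else if
if (x > 5) {
  label <- "greater than 5"
} else if (x == 5) {
  label <- "equal to 5"
} else {
  label <- "less than 5"
}
print(label)

# ifelse() evaluates the condition for each element of a vector
values <- c(2, 5, 9)
sizes <- ifelse(values > 5, "big", "small")
print(sizes)
```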

For Loops

For loops are used to iterate over a sequence of values and execute a block of code for each iteration.

R
# For loop
for (i in 1:5) {
  print(paste("Iteration:", i))
}

Output:

[1] "Iteration: 1"
[1] "Iteration: 2"
[1] "Iteration: 3"
[1] "Iteration: 4"
[1] "Iteration: 5"
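
A for loop can also walk over the elements of any vector, not just a range of numbers. This sketch uses seq_along() to get the positions of a character vector and accumulates a running total over a numeric one.

```r
fruits <- c("apple", "banana", "orange")

# seq_along() yields 1, 2, 3 for a vector of length 3
for (i in seq_along(fruits)) {
  print(paste(i, fruits[i]))
}

# Loop directly over values and accumulate a sum
total <- 0
for (value in c(1, 2, 3, 4, 5)) {
  total <- total + value
}
print(total)
```

For simple aggregations like this one, the built-in sum() gives the same result without an explicit loop.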

While Loops

While loops continue executing a block of code as long as a specified condition is true.

R
# While loop
x <- 1

while (x <= 5) {
  print(paste("Value of x:", x))
  x <- x + 1
}

Output:

[1] "Value of x: 1"
[1] "Value of x: 2"
[1] "Value of x: 3"
[1] "Value of x: 4"
[1] "Value of x: 5"

Repeat Loop

Repeat loops repeatedly execute a block of code until a break statement is encountered.

R
# Repeat loop
x <- 1

repeat {
  print(paste("Value of x:", x))
  x <- x + 1
  if (x > 5) {
    break
  }
}

Output:

[1] "Value of x: 1"
[1] "Value of x: 2"
[1] "Value of x: 3"
[1] "Value of x: 4"
[1] "Value of x: 5"

Switch Statement

Switch statements provide a way to select one of many blocks of code to be executed.

R
# Switch statement
day <- "Monday"

switch(day,
       "Monday" = print("It's Monday!"),
       "Tuesday" = print("It's Tuesday!"),
       "Wednesday" = print("It's Wednesday!"),
       "Thursday" = print("It's Thursday!"),
       "Friday" = print("It's Friday!"),
       "Saturday" = print("It's Saturday!"),
       "Sunday" = print("It's Sunday!"))

Output:

[1] "It's Monday!"
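
When the value matches none of the named cases, switch() returns NULL invisibly unless a final unnamed argument supplies a default. Because switch() returns a value, the result can also be stored and printed once. A small sketch, with a made-up fallback message:

```r
day <- "Holiday"

# The last, unnamed argument is the default for unmatched values
message_text <- switch(day,
                       "Monday" = "It's Monday!",
                       "Tuesday" = "It's Tuesday!",
                       "Not a day we know")
print(message_text)
```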

6. Functions in R

Functions play a crucial role in R programming, allowing us to encapsulate reusable pieces of code. They enable us to break down complex tasks into smaller, manageable units, making our code more modular, readable, and maintainable.

Defining a Function

In R, we can define our own functions using the function keyword. A function typically consists of a name, a list of parameters (arguments), and a block of code that defines its behavior. In the example that follows:

  1. my_function is the name of the function.
  2. x and y are the parameters of the function.
  3. result <- x + y is the code block that computes the result.
  4. return(result) specifies the value that the function should return.
R
# Defining a function
my_function <- function(x, y) {
  result <- x + y
  return(result)
}

Calling a Function

Once a function is defined, we can call it by its name and pass arguments to it.

R
# Calling the function
output <- my_function(3, 5)
print(output) 

Output:

[1] 8
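
Parameters can have default values, and arguments can be passed by name in any order. The sketch below uses a hypothetical power() function to show both; it also relies on the fact that a function returns the value of its last expression when return() is omitted.

```r
# exponent defaults to 2 when the caller does not supply it
power <- function(base, exponent = 2) {
  base ^ exponent
}

squared <- power(3)                      # uses the default exponent
cubed <- power(2, 3)                     # positional arguments
named <- power(exponent = 3, base = 2)   # named arguments, any order

print(c(squared, cubed, named))
```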

7. Pre-built datasets in R

Pre-built datasets in R are ready-to-use collections of data that come bundled with the R programming language. These datasets cover various topics and are available for users to practice data analysis and visualization without the need to import external data.

R
# List pre-built datasets in R
data()

Output:

Data sets in package ‘datasets’:

AirPassengers           Monthly Airline Passenger Numbers 1949-1960
BJsales                 Sales Data with Leading Indicator
BJsales.lead (BJsales)  Sales Data with Leading Indicator
BOD                     Biochemical Oxygen Demand
CO2                     Carbon Dioxide Uptake in Grass Plants
ChickWeight             Weight versus age of chicks on different diets
DNase                   Elisa assay of DNase
EuStockMarkets          Daily Closing Prices of Major European Stock Indices, 1991-1998
Formaldehyde            Determination of Formaldehyde
HairEyeColor            Hair and Eye Color of Statistics Students
Harman23.cor            Harman Example 2.3
Harman74.cor            Harman Example 7.4
Indometh                Pharmacokinetics of Indomethacin
InsectSprays            Effectiveness of Insect Sprays
JohnsonJohnson          Quarterly Earnings per Johnson & Johnson Share
LakeHuron               Level of Lake Huron 1875-1972
LifeCycleSavings        Intercountry Life-Cycle Savings Data
Loblolly                Growth of Loblolly pine trees
Nile                    Flow of the River Nile
Orange                  Growth of Orange Trees
OrchardSprays           Potency of Orchard Sprays
PlantGrowth             Results from an Experiment on Plant Growth
Puromycin               Reaction Velocity of an Enzymatic Reaction
Seatbelts               Road Casualties in Great Britain 1969-84
Theoph                  Pharmacokinetics of Theophylline
Titanic                 Survival of passengers on the Titanic
ToothGrowth             The Effect of Vitamin C on Tooth Growth in ...
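
Any of these datasets can be used by name. As a minimal sketch, the code below loads mtcars (the dataset used in the later sections), looks at its first rows, and computes a simple summary; the variable avg_mpg is only illustrative.

```r
# Make the dataset available explicitly (it is also found by name alone)
data(mtcars)

# First three rows
print(head(mtcars, 3))

# Number of rows and columns: 32 cars, 11 variables
print(dim(mtcars))

# Average fuel efficiency across all cars
avg_mpg <- mean(mtcars$mpg)
print(avg_mpg)
```

Running ?mtcars in the console opens the help page that describes each column.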

8. Visualization with R

In R, visualization is a powerful tool for exploring data, communicating insights, and presenting findings effectively. Several packages offer diverse functionalities for creating various types of plots and graphics. Some popular R packages for visualization:

1. ggplot2: ggplot2 is a versatile and widely used package for creating static, publication-quality graphics. It follows the grammar of graphics paradigm, making it intuitive to use for creating a wide range of visualizations. With ggplot2, users can easily customize plots by adding layers, adjusting aesthetics, and modifying themes.

R
install.packages("ggplot2")
library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = hp)) + 
  geom_point() + 
  labs(title = "Fuel Efficiency vs Horsepower", 
       x = "Miles per Gallon", y = "Horsepower")

Output:

[Figure: ggplot2 scatter plot of horsepower against miles per gallon for the mtcars dataset]

2. plotly: plotly is an interactive visualization package that allows users to create web-based, interactive plots. It supports a wide range of chart types, including scatter plots, line plots, bar charts, and 3D plots. plotly visualizations can be easily embedded into websites or shared online.

R
install.packages("plotly")
library(plotly)
plot_ly(mtcars, x = ~mpg, y = ~hp, type = "scatter", mode = "markers", 
        marker = list(color = 'rgba(255, 100, 100, 0.5)')) %>%
  layout(title = "Fuel Efficiency vs Horsepower", 
         xaxis = list(title = "Miles per Gallon"), 
         yaxis = list(title = "Horsepower"))

Output:

[Figure: interactive plotly chart of horsepower against miles per gallon for the mtcars dataset]


3. lattice: lattice is a package for creating trellis plots, which are multi-panel displays of data. It provides a high-level interface for creating conditioned plots, such as scatter plots, histograms, and boxplots, with a single function call.

R
install.packages("lattice")
library(lattice)
xyplot(hp ~ mpg | cyl, data = mtcars, 
       main = "Fuel Efficiency vs Horsepower by Cylinder Count", 
       xlab = "Miles per Gallon", ylab = "Horsepower")

Output:

[Figure: lattice plot of horsepower against miles per gallon, one panel per cylinder count]
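
For comparison, the same relationship can be drawn with the graphics functions that ship with base R, with no extra package. This is only a sketch: the colors and titles are arbitrary choices, and the fitted line comes from a simple linear model added for illustration.

```r
# Scatter plot with base R graphics
plot(mtcars$mpg, mtcars$hp,
     main = "Fuel Efficiency vs Horsepower",
     xlab = "Miles per Gallon", ylab = "Horsepower",
     pch = 19)

# Overlay a straight-line fit of horsepower on miles per gallon
fit <- lm(hp ~ mpg, data = mtcars)
abline(fit, col = "red")

# Histogram of fuel efficiency
hist(mtcars$mpg,
     main = "Distribution of Miles per Gallon",
     xlab = "Miles per Gallon")
```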

9. Data manipulation

Data manipulation involves the process of transforming and modifying data to extract useful information or prepare it for analysis. This can include tasks such as filtering rows, selecting columns, creating new variables, aggregating data, and joining datasets. The dplyr package in R is a powerful tool for data manipulation tasks.

Filtering Rows: Selecting rows based on certain conditions.

R
# Load the dplyr package
library(dplyr)

# Filter cars with mpg greater than 30
filtered_cars <- mtcars %>%
  filter(mpg > 30)

head(filtered_cars)

Output:

                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2
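
The other tasks mentioned above follow the same pattern. The sketch below, which assumes dplyr is already installed, selects columns, creates a new variable, and aggregates by group. The column kpl and the object names are invented for the example.

```r
library(dplyr)

# Select a few columns and add kilometres per litre
# (1 mile per US gallon is about 0.425144 km per litre)
cars_kpl <- mtcars %>%
  select(mpg, cyl, hp) %>%
  mutate(kpl = mpg * 0.425144)

print(head(cars_kpl, 3))

# Average mpg and number of cars per cylinder count, best first
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n()) %>%
  arrange(desc(mean_mpg))

print(by_cyl)
```

Each verb takes a data frame and returns a new one, which is what lets the steps be chained with %>%.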

10. Exploring Shiny

Shiny is an R package that allows us to build interactive web applications directly from R. It bridges the gap between data analysis in R and web development, enabling us to create interactive dashboards, data visualization tools, and more without requiring knowledge of HTML, CSS, or JavaScript.

Creating a Basic Shiny App with the Iris Dataset

Let's create a basic Shiny app using the iris dataset. Our app will have interactive elements such as dropdown menus to select the axes for plotting, and another dropdown menu to choose different types of plots. Additionally, we'll provide an option to download the plotted image.

R
# Install and load required packages
if (!require("shiny")) install.packages("shiny")
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("dplyr")) install.packages("dplyr")

library(shiny)
library(ggplot2)
library(dplyr)

# Define UI
ui <- fluidPage(
  titlePanel("Interactive Iris Data Visualization"),
  
  sidebarLayout(
    sidebarPanel(
      selectInput(inputId = "x_axis",
                  label = "Select X-axis:",
                  choices = c("Sepal Length", "Sepal Width", "Petal Length", 
                              "Petal Width"),
                  selected = "Sepal Length"),
      
      selectInput(inputId = "y_axis",
                  label = "Select Y-axis:",
                  choices = c("Sepal Length", "Sepal Width", "Petal Length", 
                              "Petal Width"),
                  selected = "Sepal Width"),
      
      selectInput(inputId = "plot_type",
                  label = "Select Plot Type:",
                  choices = c("Scatter Plot", "Line Plot", "Bar Plot"),
                  selected = "Scatter Plot"),
      
      downloadButton(outputId = "download_plot", label = "Download Plot")
    ),
    
    mainPanel(
      plotOutput(outputId = "iris_plot")
    )
  )
)

# Define server logic
server <- function(input, output) {
  # Build the plot in a reactive expression so that both the on-screen
  # output and the download handler can reuse the same ggplot object
  iris_plot <- reactive({
    x_var <- switch(input$x_axis,
                    "Sepal Length" = "Sepal.Length",
                    "Sepal Width" = "Sepal.Width",
                    "Petal Length" = "Petal.Length",
                    "Petal Width" = "Petal.Width")
    
    y_var <- switch(input$y_axis,
                    "Sepal Length" = "Sepal.Length",
                    "Sepal Width" = "Sepal.Width",
                    "Petal Length" = "Petal.Length",
                    "Petal Width" = "Petal.Width")
    
    plot_data <- iris
    
    # .data[[name]] looks up a column by its string name inside aes();
    # it replaces the deprecated aes_string()
    if (input$plot_type == "Scatter Plot") {
      ggplot(plot_data, aes(x = .data[[x_var]], y = .data[[y_var]])) +
        geom_point() +
        labs(x = input$x_axis, y = input$y_axis, title = "Scatter Plot of Iris Dataset")
    } else if (input$plot_type == "Line Plot") {
      ggplot(plot_data, aes(x = .data[[x_var]], y = .data[[y_var]],
                            group = Species, color = Species)) +
        geom_line() +
        labs(x = input$x_axis, y = input$y_axis, title = "Line Plot of Iris Dataset")
    } else if (input$plot_type == "Bar Plot") {
      ggplot(plot_data, aes(x = Species, y = .data[[y_var]], fill = Species)) +
        geom_bar(stat = "identity") +
        labs(x = "Species", y = input$y_axis, title = "Bar Plot of Iris Dataset")
    }
  })
  
  # Draw the current plot in the main panel
  output$iris_plot <- renderPlot({
    iris_plot()
  })
  
  output$download_plot <- downloadHandler(
    filename = function() {
      paste("iris_plot_", Sys.Date(), ".png", sep = "")
    },
    
    content = function(file) {
      # Output objects cannot be read back, so save the reactive plot
      ggsave(file, plot = iris_plot(), device = "png")
    }
  )
}

# Run the application
shinyApp(ui = ui, server = server)

Output:

[Animation: Basic Shiny App]

Conclusion

This guide has covered the basics of programming with R, a language designed for data analysis and visualization. We started with installation and explored assignment, data types, data structures, control structures, functions, and pre-built datasets. We then looked at visualization with packages such as ggplot2, plotly, and lattice, and at data manipulation with dplyr. Finally, we introduced Shiny for building interactive web applications.

