Working with Sparse Matrices in R Programming
Last Updated :
08 Jun, 2023
Sparse matrices are matrices in which most elements are zero and only a small number are non-zero. Storing such data in a fully dense matrix wastes memory and slows down computation, because every zero is stored and processed explicitly. Sparse matrix data structures avoid this by storing only the non-zero elements together with their positions.
Creating a Sparse Matrix
R ships with the recommended package "Matrix" (note the capital M), which provides classes and methods for creating and working with sparse matrices in R.
library(Matrix)
The following code snippet illustrates the usage of the Matrix package:
R
# loading the Matrix package
library(Matrix)
# declaring a sparse matrix of 1000 rows and 1000 cols
mat1 <- Matrix(0, nrow = 1000,
               ncol = 1000,
               sparse = TRUE)
# setting the value at 1st row
# and 1st col to be 5
mat1[1, 1] <- 5
print("Size of sparse mat1")
print(object.size(mat1))
Output:
[1] "Size of sparse mat1"
5440 bytes
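The sparse matrix occupies far less space because it stores only the non-zero values and their positions. For comparison, a dense 1000 x 1000 matrix of doubles needs roughly 8 MB (1,000,000 elements at 8 bytes each).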
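The size difference can be checked directly. The following is a minimal sketch, not part of the original example: the names dense and sparse are illustrative, and the exact byte counts depend on the R and Matrix versions, so no figures are quoted here.

```r
library(Matrix)

# Dense 1000 x 1000 matrix of zeros with a single non-zero entry
dense <- matrix(0, nrow = 1000, ncol = 1000)
dense[1, 1] <- 5

# Sparse copy of the same data
sparse <- as(dense, "sparseMatrix")

# The dense matrix stores all 1,000,000 doubles,
# the sparse one stores only the non-zero entry and its position
print(object.size(dense), units = "Mb")
print(object.size(sparse), units = "Kb")
```

Both objects hold the same values, so sparse[1, 1] still returns 5.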
Constructing Sparse Matrices From Dense
A dense matrix can be created with the built-in matrix() function in R. It is then passed to the as() function, which comes from the methods package that R attaches by default. The call has the following form:
Syntax: as(dense_matrix, Class)
Parameters:
dense_matrix : A numeric or logical matrix.
Class : The name of the target class. Passing "sparseMatrix" normally yields a dgCMatrix for a general numeric matrix, that is, the compressed sparse column (CSC) format. The dgRMatrix class stores the data in compressed sparse row (CSR) format instead; recent releases of the Matrix package recommend requesting these formats through the virtual class names "CsparseMatrix" and "RsparseMatrix".
The following code snippet shows the conversion of a dense matrix to a sparse matrix in R:
R
library(Matrix)
# construct a matrix with values
# 0 with probability 0.80
# 6 with probability 0.10
# 7 with probability 0.10
set.seed(0)
rows <- 4L
cols <- 6L
vals <- sample(
  x = c(0, 6, 7),
  prob = c(0.8, 0.1, 0.1),
  size = rows * cols,
  replace = TRUE
)
dense_mat <- matrix(vals, nrow = rows)
print("Dense Matrix")
print(dense_mat)
# Convert to sparse
sparse_mat <- as(dense_mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)
Output:
[1] "Dense Matrix"
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 7 6 0 0 0 0
[2,] 0 0 0 0 0 6
[3,] 0 7 0 0 6 0
[4,] 0 6 0 0 0 0
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 7 6 . . . .
[2,] . . . . . 6
[3,] . 7 . . 6 .
[4,] . 6 . . . .
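To see what the CSC format actually stores, the slots of the resulting dgCMatrix can be inspected. This is a sketch that re-enters the 4 x 6 matrix printed above by hand, so it does not depend on the random seed; the slot names x, i and p belong to the dgCMatrix class, and the row indices and column pointers are zero-based.

```r
library(Matrix)

# The same 4 x 6 matrix as in the output above, entered column by column
dense_mat <- matrix(c(7, 0, 0, 0,
                      6, 0, 7, 6,
                      0, 0, 0, 0,
                      0, 0, 0, 0,
                      0, 0, 6, 0,
                      0, 6, 0, 0), nrow = 4)

sparse_mat <- as(dense_mat, "sparseMatrix")

# Non-zero values, stored column by column
print(sparse_mat@x)   # 7 6 7 6 6 6
# Zero-based row index of each stored value
print(sparse_mat@i)   # 0 0 2 3 2 1
# Column pointers: column j holds the values p[j] .. p[j + 1] - 1
print(sparse_mat@p)   # 0 1 4 4 4 5 6
```

Empty columns (3 and 4 here) cost nothing beyond a repeated pointer value, which is why the format scales with the number of non-zero entries rather than with the full dimensions.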
Operations on Sparse Matrices
Various arithmetic and binding operations can be performed on Sparse Matrices in R:
Addition and subtraction by Scalar Value
Adding a scalar to a sparse matrix, or subtracting one from it, applies the operation to every element, including the zeros. Because the former zeros become non-zero, the result is a dense matrix (class dgeMatrix in the output below). The following code demonstrates the + and - operators:
R
# Loading Library
library(Matrix)
# construct a matrix with values
# 0 with probability 0.85
# 10 with probability 0.15
set.seed(0)
rows <- 4L
cols <- 6L
vals <- sample(
  x = c(0, 10),
  prob = c(0.85, 0.15),
  size = rows * cols,
  replace = TRUE
)
dense_mat <- matrix(vals, nrow = rows)
# Convert to sparse
sparse_mat <- as(dense_mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)
print("Addition")
# adding a scalar value 5
# to the sparse matrix
print(sparse_mat + 5)
print("Subtraction")
# subtracting a scalar value 1
# from the sparse matrix
print(sparse_mat - 1)
Output:
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 10 10 . . . .
[2,] . . . . . 10
[3,] . 10 . . 10 .
[4,] . 10 . . . .
[1] "Addition"
4 x 6 Matrix of class "dgeMatrix"
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 15 15 5 5 5 5
[2,] 5 5 5 5 5 15
[3,] 5 15 5 5 15 5
[4,] 5 15 5 5 5 5
[1] "Subtraction"
4 x 6 Matrix of class "dgeMatrix"
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 9 9 -1 -1 -1 -1
[2,] -1 -1 -1 -1 -1 9
[3,] -1 9 -1 -1 9 -1
[4,] -1 9 -1 -1 -1 -1
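If the intent is to shift only the stored non-zero entries and keep the matrix sparse, one option is to modify the x slot directly. This is a sketch under that assumption rather than something the original example does; the matrix m and the name shifted are illustrative.

```r
library(Matrix)

# Small sparse matrix with two non-zero entries
m <- sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(10, 10), dims = c(4, 6))

# m + 5 touches every element, so the zeros become 5
dense_result <- m + 5

# Adding 5 to the stored values only keeps the zeros at zero
shifted <- m
shifted@x <- shifted@x + 5

print(class(dense_result))
print(shifted)
```

The two results are different matrices: dense_result has 5 wherever m had a zero, while shifted keeps those positions empty.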
Multiplication or Division by Scalar
Multiplying or dividing by a non-zero scalar leaves the zero entries at zero, so only the stored non-zero elements change and the result remains a sparse matrix:
R
library(Matrix)
# construct a matrix with values
# 0 with probability 0.85
# 10 with probability 0.15
set.seed(0)
rows <- 4L
cols <- 6L
vals <- sample(
  x = c(0, 10),
  prob = c(0.85, 0.15),
  size = rows * cols,
  replace = TRUE
)
dense_mat <- matrix(vals, nrow = rows)
# Convert to sparse
sparse_mat <- as(dense_mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)
print("Multiplication")
# multiplying the sparse matrix
# by the scalar value 10
print(sparse_mat * 10)
print("Division")
# dividing the sparse matrix
# by the scalar value 10
print(sparse_mat / 10)
Output:
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 10 10 . . . .
[2,] . . . . . 10
[3,] . 10 . . 10 .
[4,] . 10 . . . .
[1] "Multiplication"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 100 100 . . . .
[2,] . . . . . 100
[3,] . 100 . . 100 .
[4,] . 100 . . . .
[1] "Division"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 1 1 . . . .
[2,] . . . . . 1
[3,] . 1 . . 1 .
[4,] . 1 . . . .
Matrix Multiplication
Matrices can be multiplied with each other using the %*% operator, regardless of whether they are sparse or dense. However, the number of columns of the first matrix must equal the number of rows of the second.
R
library(Matrix)
# construct a matrix with values
# 0 with probability 0.85
# 10 with probability 0.15
set.seed(0)
rows <- 4L
cols <- 6L
vals <- sample(
  x = c(0, 10),
  prob = c(0.85, 0.15),
  size = rows * cols,
  replace = TRUE
)
dense_mat <- matrix(vals, nrow = rows)
# Convert to sparse
sparse_mat <- as(dense_mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)
# computing transpose of matrix
transpose_mat <- t(sparse_mat)
# computing multiplication of matrix
# and its transpose
mul_mat <- sparse_mat %*% transpose_mat
print("Multiplication of Matrices")
print(mul_mat)
Output:
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 10 10 . . . .
[2,] . . . . . 10
[3,] . 10 . . 10 .
[4,] . 10 . . . .
[1] "Multiplication of Matrices"
4 x 4 sparse Matrix of class "dgCMatrix"
[1,] 200 . 100 100
[2,] . 100 . .
[3,] 100 . 200 100
[4,] 100 . 100 100
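The same product can also be computed with tcrossprod(), and a sparse matrix can be multiplied by an ordinary dense matrix. The sketch below rebuilds the matrix shown above with sparseMatrix() so that it does not depend on the random seed; the names a and b are illustrative.

```r
library(Matrix)

# The 4 x 6 matrix from the output above, given as (row, column, value) triplets
a <- sparseMatrix(i = c(1, 1, 2, 3, 3, 4),
                  j = c(1, 2, 6, 2, 5, 2),
                  x = rep(10, 6),
                  dims = c(4, 6))

# Two ways to compute a %*% t(a)
prod1 <- a %*% t(a)
prod2 <- tcrossprod(a)
print(all.equal(as.matrix(prod1), as.matrix(prod2)))

# Sparse times dense: a 6 x 2 dense matrix of ones gives a 4 x 2 result
b <- matrix(1, nrow = 6, ncol = 2)
print(dim(a %*% b))
```

Multiplying in the other order, b %*% a, fails because b has 2 columns while a has 4 rows.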
Multiplication by a Vector
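A sparse matrix can also be multiplied element-wise by a one-dimensional vector using the * operator. R recycles the vector down the columns of the matrix, so when the vector length divides the number of rows, each row is effectively scaled by one element of the vector. In the example below, the vector c(3, 2) is recycled over the 4 rows, so rows 1 and 3 are multiplied by 3 and rows 2 and 4 by 2. Note that this is not the matrix-vector product, which uses the %*% operator.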
R
library(Matrix)
# construct a matrix with values
# 0 with probability 0.85
# 10 with probability 0.15
set.seed(0)
rows <- 4L
cols <- 6L
vals <- sample(
  x = c(0, 10),
  prob = c(0.85, 0.15),
  size = rows * cols,
  replace = TRUE
)
dense_mat <- matrix(vals, nrow = rows)
# Convert to sparse
sparse_mat <- as(dense_mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)
# declaring a vector
vec <- c(3, 2)
print("Multiplication by vector")
print(sparse_mat * vec)
Output:
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 10 10 . . . .
[2,] . . . . . 10
[3,] . 10 . . 10 .
[4,] . 10 . . . .
[1] "Multiplication by vector"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 30 30 . . . .
[2,] . . . . . 20
[3,] . 30 . . 30 .
[4,] . 20 . . . .
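The difference between element-wise scaling and a true matrix-vector product can be sketched as follows. The matrix is the one shown above, rebuilt by hand; the names row_factors and v6 are illustrative, and Diagonal() is used here as one possible way to express row scaling as a matrix product.

```r
library(Matrix)

# The 4 x 6 matrix from the output above
a <- sparseMatrix(i = c(1, 1, 2, 3, 3, 4),
                  j = c(1, 2, 6, 2, 5, 2),
                  x = rep(10, 6),
                  dims = c(4, 6))

# Element-wise: c(3, 2) is recycled down the columns,
# which scales rows 1 and 3 by 3 and rows 2 and 4 by 2
scaled <- a * c(3, 2)

# The same row scaling written as a product with a diagonal matrix
row_factors <- c(3, 2, 3, 2)
scaled_diag <- Diagonal(x = row_factors) %*% a
print(all.equal(as.matrix(scaled), as.matrix(scaled_diag)))

# Matrix-vector product: needs a vector with one entry per column
v6 <- 1:6
product <- a %*% v6
print(as.vector(as.matrix(product)))   # 30 60 70 20
```

The element-wise result keeps the 4 x 6 shape, while the matrix-vector product collapses each row into a single number.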
Combination of Matrices
Matrices can be combined with vectors or other matrices using the column bind cbind() or row bind rbind() operations. With rbind(), the number of rows in the result is the sum of the rows of the inputs; with cbind(), the number of columns in the result is the sum of the columns of the inputs.
R
library(Matrix)
# construct a matrix with values
# 0 with probability 0.85
# 10 with probability 0.15
set.seed(0)
rows <- 4L
cols <- 6L
vals <- sample(
  x = c(0, 10),
  prob = c(0.85, 0.15),
  size = rows * cols,
  replace = TRUE
)
dense_mat <- matrix(vals, nrow = rows)
# Convert to sparse
sparse_mat <- as(dense_mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)
# combining matrix through rows
row_bind <- rbind(sparse_mat, sparse_mat)
# printing matrix after row bind
print("Row Bind")
print(row_bind)
Output:
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 10 10 . . . .
[2,] . . . . . 10
[3,] . 10 . . 10 .
[4,] . 10 . . . .
[1] "Row Bind"
8 x 6 sparse Matrix of class "dgCMatrix"
[1,] 10 10 . . . .
[2,] . . . . . 10
[3,] . 10 . . 10 .
[4,] . 10 . . . .
[5,] 10 10 . . . .
[6,] . . . . . 10
[7,] . 10 . . 10 .
[8,] . 10 . . . .
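Column binding works the same way. The sketch below, which assumes a reasonably recent R and Matrix installation where cbind() dispatches to the sparse methods, binds the matrix shown above to itself and to a plain vector; the names col_bind and with_vec are illustrative.

```r
library(Matrix)

# The 4 x 6 matrix from the output above
a <- sparseMatrix(i = c(1, 1, 2, 3, 3, 4),
                  j = c(1, 2, 6, 2, 5, 2),
                  x = rep(10, 6),
                  dims = c(4, 6))

# Binding two matrices by column: 6 + 6 = 12 columns
col_bind <- cbind(a, a)
print(dim(col_bind))

# Binding a vector adds one column; its length must match the row count
with_vec <- cbind(a, c(1, 0, 0, 2))
print(dim(with_vec))
```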
Properties of Sparse Matrices
- NA Values
NA values are not equivalent to zero, so they are stored explicitly, just like non-zero values. In arithmetic they behave as they do elsewhere in R: an operation involving an NA entry generally returns NA for that entry.
R
library(Matrix)
# declaring original matrix
mat <- matrix(data = c(5.5, 0, NA,
                       0, 0, NA), nrow = 3)
print("Original Matrix")
print(mat)
sparse_mat <- as(mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)
Output:
[1] "Original Matrix"
[,1] [,2]
[1,] 5.5 0
[2,] 0.0 0
[3,] NA NA
[1] "Sparse Matrix"
3 x 2 sparse Matrix of class "dgCMatrix"
[1,] 5.5 .
[2,] . .
[3,] NA NA
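That the NA entries are stored explicitly can be confirmed by looking at the x slot of the sparse matrix. This is a small sketch based on the same 3 x 2 matrix; the names stored and doubled are illustrative.

```r
library(Matrix)

mat <- matrix(data = c(5.5, 0, NA,
                       0, 0, NA), nrow = 3)
sparse_mat <- as(mat, "sparseMatrix")

# Stored entries: the one non-zero value plus the two NAs
stored <- sparse_mat@x
print(length(stored))           # 3

# NA propagates through arithmetic as usual
doubled <- sparse_mat * 2
print(is.na(doubled[3, 1]))     # TRUE

# Summing the stored values while skipping the NAs
print(sum(stored, na.rm = TRUE))   # 5.5
```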
- Sparse matrix data can be written to an ordinary file in the Matrix Market format (.mtx). The writeMM() function from the Matrix package writes a sparse matrix to such a file.
writeMM(obj = sparse_mat, file = "fname.mtx")
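A round trip through a Matrix Market file might look like the sketch below. It writes to a temporary file rather than a fixed name, and reads the data back with readMM(), which is also part of the Matrix package; the names m, path and restored are illustrative.

```r
library(Matrix)

# Small sparse matrix to export
m <- sparseMatrix(i = c(1, 3, 4), j = c(1, 2, 6), x = c(5, 7, 9), dims = c(4, 6))

# Write to a temporary .mtx file
path <- tempfile(fileext = ".mtx")
writeMM(m, file = path)

# Read it back; the result is again a sparse matrix
restored <- readMM(path)
print(all.equal(as.matrix(m), as.matrix(restored)))

# Clean up the temporary file
unlink(path)
```

The matrix returned by readMM() may use a different sparse storage class than the one that was written, which is why the comparison is done on the dense values.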