subsampling

A major challenge in big data statistical analysis is the demand for computing resources. For example, when fitting a logistic regression model to a binary response variable with an $N \times d$ covariate matrix, the computational complexity of estimating the coefficients using the iteratively reweighted least squares (IRLS) algorithm is $O(\zeta N d^2)$, where $\zeta$ is the number of iterations. When $N$ is large, the cost can be prohibitive, especially if high-performance computing resources are unavailable. Subsampling has become a widely used technique to balance computational efficiency against statistical efficiency.
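
To make the $O(\zeta N d^2)$ cost concrete, here is a minimal IRLS sketch for logistic regression in base R. It is an illustration only, not the package's implementation; the function name `irls_logistic` and the simulated data are assumptions made for this example.

```r
# Minimal IRLS (Newton) sketch for logistic regression.
# X: N x d design matrix including an intercept column; y: 0/1 response.
# `irls_logistic` is a hypothetical helper, not part of the subsampling package.
irls_logistic <- function(X, y, max_iter = 25, tol = 1e-8) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(max_iter)) {
    p <- plogis(drop(X %*% beta))   # fitted probabilities, O(N d)
    w <- p * (1 - p)                # IRLS weights
    # Forming X' W X costs O(N d^2) and dominates each iteration.
    H <- crossprod(X, X * w)
    g <- crossprod(X, y - p)        # score vector, O(N d)
    step <- drop(solve(H, g))       # O(d^3), negligible when N >> d
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  list(coefficients = beta, iterations = iter)
}

set.seed(42)
N <- 5000
X <- cbind(1, matrix(rnorm(N * 3), N, 3))
y <- rbinom(N, 1, plogis(drop(X %*% c(-0.5, 0.5, -0.5, 0.5))))
fit <- irls_logistic(X, y)

# Compare with glm(); the difference should be near zero.
ref <- coef(glm(y ~ X - 1, family = binomial))
max(abs(fit$coefficients - ref))
```

With $\zeta$ iterations, the `crossprod(X, X * w)` step runs $\zeta$ times over all $N$ rows, which is the cost that subsampling aims to reduce.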

The R package subsampling provides optimal subsampling methods for various statistical models, including generalized linear models (GLMs), softmax (multinomial) regression, rare event logistic regression, quantile regression, and GLMs with rare features. Specialized subsampling techniques address specific challenges across different models and datasets. Given the specified model assumptions and subsampling technique, the package draws a subsample from the full data, fits the model on the subsample, and performs statistical inference.
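
The draw, fit, and infer workflow can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's internal algorithm: the sampling probabilities are taken proportional to $|y_i - \hat p_i| \, \lVert x_i \rVert$, which is one common form of an L-optimality-style criterion for logistic regression, and all variable names are made up for this example.

```r
set.seed(2)
N <- 1e4
X <- matrix(rnorm(N * 3), N, 3)
y <- rbinom(N, 1, plogis(-0.5 + drop(X %*% rep(-0.5, 3))))
dat <- data.frame(y = y, X)   # columns: y, X1, X2, X3

n_plt <- 200   # pilot sample size
n_ssp <- 600   # expected subsample size

# Step 1: uniform pilot sample and pilot estimate.
plt_idx <- sample.int(N, n_plt)
pilot <- glm(y ~ ., data = dat[plt_idx, ], family = binomial)

# Step 2: approximate sampling probabilities from the pilot fit
# (assumed criterion: |y - p_hat| times the norm of (1, x)).
p_hat <- predict(pilot, newdata = dat, type = "response")
score_norm <- abs(dat$y - p_hat) * sqrt(1 + rowSums(X^2))
incl_prob <- pmin(1, n_ssp * score_norm / sum(score_norm))

# Step 3: Poisson sampling; each row is kept independently,
# so the realized subsample size is random.
keep <- runif(N) < incl_prob
sub <- dat[keep, ]
sub$w <- 1 / incl_prob[keep]

# Step 4: inverse-probability-weighted fit on the subsample.
fit <- glm(y ~ X1 + X2 + X3, data = sub, family = quasibinomial, weights = w)
coef(fit)
sum(keep)   # realized subsample size
```

The standard errors that `summary(fit)` reports here do not account for the sampling design, so they should not be used for inference. Producing valid standard errors for the subsample estimator is one of the things the package handles.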

Installation

You can install the package with:

# Install from CRAN
install.packages("subsampling")

# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("dqksnow/subsampling")

Getting Started

The online documentation provides a quick-start guide.

Example

The following example applies an optimal subsampling method to logistic regression:

library(subsampling)
set.seed(1)

# Simulate N observations with d correlated normal covariates
N <- 1e4
beta0 <- rep(-0.5, 7)   # true coefficients: intercept plus 6 slopes
d <- length(beta0) - 1
corr <- 0.5
sigmax  <- matrix(corr, d, d) + diag(1-corr, d)
X <- MASS::mvrnorm(N, rep(0, d), sigmax)
colnames(X) <- paste("V", 1:ncol(X), sep = "")

# Logistic success probabilities and binary response
P <- 1 - 1 / (1 + exp(beta0[1] + X %*% beta0[-1]))
Y <- rbinom(N, 1, P)
data <- as.data.frame(cbind(Y, X))
formula <- Y ~ .

# Pilot sample size and expected subsample size
n.plt <- 200
n.ssp <- 600
ssp.results <- ssp.glm(formula = formula,
                       data = data,
                       n.plt = n.plt,
                       n.ssp = n.ssp,
                       family = "quasibinomial",
                       criterion = "optL",
                       sampling.method = "poisson",
                       likelihood = "weighted"
                       )
summary(ssp.results)
#> Model Summary
#> 
#> Call:
#> 
#> ssp.glm(formula = formula, data = data, n.plt = n.plt, n.ssp = n.ssp, 
#>     family = "quasibinomial", criterion = "optL", sampling.method = "poisson", 
#>     likelihood = "weighted")
#> 
#> Subsample Size:
#>                                
#> 1       Total Sample Size 10000
#> 2 Expected Subsample Size   600
#> 3   Actual Subsample Size   635
#> 4   Unique Subsample Size   635
#> 5  Expected Subample Rate    6%
#> 6    Actual Subample Rate 6.35%
#> 7    Unique Subample Rate 6.35%
#> 
#> Coefficients:
#> 
#>           Estimate Std. Error z value Pr(>|z|)
#> Intercept  -0.4149     0.0924 -4.4920  <0.0001
#> V1         -0.5874     0.1084 -5.4191  <0.0001
#> V2         -0.4723     0.1283 -3.6812   0.0002
#> V3         -0.5492     0.1163 -4.7205  <0.0001
#> V4         -0.4044     0.1173 -3.4471   0.0006
#> V5         -0.3725     0.1234 -3.0177   0.0025
#> V6         -0.6703     0.1138 -5.8929  <0.0001
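
As a reference point, the subsample estimates above can be compared with a fit on the full data. The sketch below regenerates the same simulated data and fits an ordinary logistic regression with `glm()`. It is a sanity check added for illustration, it assumes the MASS package is installed, and the name `full.fit` is made up for this example.

```r
set.seed(1)
N <- 1e4
beta0 <- rep(-0.5, 7)
d <- length(beta0) - 1
corr <- 0.5
sigmax <- matrix(corr, d, d) + diag(1 - corr, d)
X <- MASS::mvrnorm(N, rep(0, d), sigmax)
colnames(X) <- paste("V", 1:ncol(X), sep = "")
P <- 1 - 1 / (1 + exp(beta0[1] + X %*% beta0[-1]))
Y <- rbinom(N, 1, P)
data <- as.data.frame(cbind(Y, X))

# Full-data maximum likelihood fit, using all N rows
full.fit <- glm(Y ~ ., data = data, family = binomial)

# Side-by-side view of the true coefficients and the full-data estimates
round(cbind(truth = beta0, full = coef(full.fit)), 4)
```

The full-data fit uses every observation, so its standard errors should be smaller than those of the subsample fit. The gap between the two shows how much statistical efficiency is traded for computing on about 6% of the data.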

Acknowledgments

The development of this package was supported by the National Eye Institute of the National Institutes of Health under Award Number R21EY035710.
