
Advanced and Fast Data Transformation with collapse :: CHEAT SHEET

Introduction

collapse is a C/C++ based package supporting advanced (grouped, weighted, time series, panel data and recursive) statistical operations in R, with very efficient low-level vectorizations across both groups and columns.

It also offers a flexible, class-agnostic approach to data transformation in R: handling matrix and data frame based objects in a uniform, attribute-preserving way, and ensuring seamless compatibility with dplyr / (grouped) tibble, data.table, xts, sf and plm classes for panel data ('pseries', 'pdata.frame').

collapse provides full control to the user for statistical programming - with several ways to reach the same outcome and rich optimization possibilities. Its default is na.rm = TRUE, implemented at very low cost at the algorithm level.

Calling help("collapse-documentation") brings up a detailed documentation, which is also available online. See also the fastverse package/project for a recommended set of complementary packages and easy package management.

Row/Column Arithmetic (by Reference)

Column-wise sweeping out of vectors/matrices/DFs/lists:
%cr%, %c+%, %c-%, %c*%, %c/%   e.g. Z = X %c/% rowSums(X)

Row-wise sweeping of vectors from vectors/matrices/DFs/lists:
%rr%, %r+%, %r-%, %r*%, %r/%   e.g. Z = X %r/% colSums(X)

Standard (column-wise) math by reference (returns invisibly):
%+=%, %-=%, %*=%, %/=%   e.g. X %-=% rowSums(X)

setop() does the same and also supports row-wise operations by reference:
setop(X, "/", rowSums(X))
setop(X, "/", colSums(X), rowwise = TRUE)
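A minimal sketch combining these operators (object names here are illustrative, not from the package):

library(collapse)
X <- qM(mtcars[c("mpg", "hp", "wt")])       # matrix copy of three numeric columns
Z <- X %c/% rowSums(X)                      # new matrix: each column divided by row totals
X %-=% rowSums(X)                           # column-wise subtraction by reference
setop(X, "+", colMeans(X), rowwise = TRUE)  # row-wise addition, also by reference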
Transform Data by (Grouped) Replacing or Sweeping out Statistics (by Reference)

A generalisation of row-wise operations that also supports sweeping by groups, e.g. of aggregate statistics.

TRA(x, STATS, FUN = "-", g = NULL, set = FALSE)
setTRA(x, STATS, FUN = "-", g = NULL)

x      vector, matrix, or (grouped) data frame / list
STATS  statistics matching (columns of) x (i.e. aggregated vector, matrix or data frame / list)
FUN    integer/string indicating the transformation to perform:

  Int.  String          Description
  0     "replace_NA"    replace missing values in x
  1     "replace_fill"  replace data and missing values in x
  2     "replace"       replace data but preserve missing values in x
  3     "-"             subtract: x - STATS(g)
  4     "-+"            x - STATS(g) + fmean(STATS, w = GRPN)
  5     "/"             divide: x / STATS(g)
  6     "%"             compute percentages: x * 100 / STATS(g)
  7     "+"             add: x + STATS(g)
  8     "*"             multiply: x * STATS(g)
  9     "%%"            modulus: x %% STATS(g)
  10    "-%%"           subtract modulus: x - x %% STATS(g)

g      [optional] (list of) vectors / factors or GRP() object
set    TRUE transforms x by reference. setTRA() is equivalent to invisible(TRA(..., set = TRUE))
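For instance, expressing each value as a percentage of its group total (a small sketch using FUN = "%", i.e. option 6):

library(collapse)
g <- GRP(iris, ~ Species)
TRA(iris[1:4], fsum(iris[1:4], g), "%", g) |> head(3)  # percentages of grouped column sums
fsum(iris[1:4], g, TRA = "%") |> head(3)               # same via the TRA argument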
Fast Statistical Functions

Fast functions to perform column-wise grouped and weighted computations on matrix-like objects:

fmean, fmedian, fmode, fsum, fprod, fsd, fvar,
fmin, fmax, fnth, ffirst, flast, fnobs, fndistinct

Syntax

FUN(x, g = NULL, [w = NULL], TRA = NULL, [na.rm = TRUE],
    use.g.names = TRUE, [drop = TRUE], [nthreads = 1L])

x     vector, matrix, or (grouped) data frame / list
g     [optional] (list of) vectors / factors or GRP() object
w     [optional] vector of (frequency) weights
TRA   [optional] operation to transform data with computed statistics (see FUN argument to TRA() and Examples)
drop  drop matrix / data frame dimensions, default TRUE

Examples

fmean(AirPassengers) # Vector
## [1] 280.2986
fmean(AirPassengers, w = cycle(AirPassengers)) # Weighted mean
## [1] 284.3397
fmean(EuStockMarkets) # Matrix
##      DAX      SMI      CAC     FTSE
## 2530.657 3376.224 2227.828 3565.643
fmean(EuStockMarkets, drop = FALSE) # Don't drop dimensions
##           DAX      SMI      CAC     FTSE
## [1,] 2530.657 3376.224 2227.828 3565.643
fmean(airquality) # Data Frame (can also use drop = FALSE)
##      Ozone    Solar.R       Wind       Temp      Month        Day
##  42.129310 185.931507   9.957516  77.882353   6.993464  15.803922
fmean(iris[1:4], g = iris$Species) # Grouped
##            Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa            5.006       3.428        1.462       0.246
## versicolor        5.936       2.770        4.260       1.326
## virginica         6.588       2.974        5.552       2.026
X = iris[1:4]; g = iris$Species; w <- abs(rnorm(nrow(X)))
fmean(X, g, w) # Grouped and weighted (random weights)
##            Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa         5.011663    3.467638     1.504067   0.2525002
## versicolor     5.930365    2.773558     4.238593   1.3136082
## virginica      6.588903    2.978017     5.552375   2.0221178
## Transformations: here centering data on the weighted group median
TRA(X, fmedian(X, g, w), "-", g) |> head(3)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          0.1         0.0         -0.1           0
## 2         -0.1        -0.5         -0.1           0
## 3         -0.3        -0.3         -0.2           0
fmedian(X, g, w, TRA = "-") |> head(3) # Same thing: more compact
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          0.1         0.0         -0.1           0
## 2         -0.1        -0.5         -0.1           0
## 3         -0.3        -0.3         -0.2           0
fmedian(X, g, w, "-", set = TRUE) # Modify in-place (same as setTRA())
head(iris, 3) # Changed iris too, as X = iris[1:4] did a shallow copy
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          0.1         0.0         -0.1           0  setosa
## 2         -0.1        -0.5         -0.1           0  setosa
## 3         -0.3        -0.3         -0.2           0  setosa

Basic Computing with R Functions

Apply R functions to rows or columns (by groups):

dapply(x, FUN, ..., MARGIN = 2) - column/row apply
BY(x, g, FUN, ...) - split-apply-combine computing
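A brief sketch of both functions (output omitted):

library(collapse)
dapply(mtcars, median)                      # column-wise medians
dapply(mtcars, quantile, MARGIN = 1)        # row-wise quantiles
BY(iris[1:4], iris$Species, quantile, 0.9)  # 90% quantiles by species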
Grouping and Ordering

Optimized functions for grouping, ordering, unique values, splitting & recombining, and dealing with factors.

GRP() - create a grouping object (class 'GRP'): pass to g arg.
g <- GRP(iris, ~ Species) # or GRP(iris$Species) or GRP(iris["Species"])
fndistinct(iris[1:4], g) # Computation without grouping overhead
##            Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa               15          16            9           6
## versicolor           21          14           19           9
## virginica            21          13           20          12

fgroup_by() - attach 'GRP' object to data: a class-agnostic grouped frame supporting fast computations
mtcars |> fgroup_by(cyl, vs, am) |> ss(1:2)
##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
##
## Grouped by:  cyl, vs, am  [7 | 5 (3.8) 1-12]
# Group Stats: [N. groups | mean (sd) min-max of group sizes]
# Fast Functions also have a grouped_df method: here wt-weighted medians
mtcars |> fgroup_by(cyl, vs, am) |> fmedian(wt) |> head(3)
##   cyl vs am sum.wt  mpg  disp hp drat  qsec gear carb
## 1   4  0  1  2.140 26.0 120.3 91 4.43 16.70    5    2
## 2   4  1  0  8.805 22.8 140.8 95 3.70 20.01    4    2
## 3   4  1  1 14.198 30.4  79.0 66 4.08 18.61    4    1

GRPN(), fgroup_vars(), fungroup() - get group count, grouping columns/variables, and ungroup data
qF(), qG() - quick as.factor, and vector grouping object of class 'qG': a factor-light without levels attribute
group() - (multivariate) group id ('qG') in appearance order
groupid() - run-length-type group id ('qG')
seqid() - group-id from integer sequences ('qG')
radixorder[v]() - (multivariate) radix-based ordering
finteraction() - fast factor interactions (or return 'qG')
fdroplevels() - fast removal of unused factor levels
f[n]unique() - fast unique values / rows (by columns)
gsplit() - fast splitting of a vector based on 'GRP' objects
greorder() - efficiently reorder y = unlist(gsplit(x, g)) such that identical(greorder(y, g), x)

collapse optimizes grouping using both factors / 'qG' objects and 'GRP' objects. 'GRP' objects contain most information and are thus most efficient for complex computations.
X <- iris[1:4]; v <- as.character(iris$Species)
f <- qF(v, na.exclude = FALSE) # Adds 'na.included' class: no NA checks
gv <- group(v) # 'qG' object: first appearance order, with 'na.included'
microbenchmark(fmode(X, v), fmode(X, f), fmode(X, gv), fmode(X, g))
## Unit: microseconds
##          expr    min      lq     mean  median      uq     max neval
##   fmode(X, v) 11.890 12.9150 15.17697 13.3455 13.7350 162.073   100
##   fmode(X, f)  9.225  9.8195 11.33035 10.0860 10.4550  92.947   100
##  fmode(X, gv)  8.569  9.3480 10.73667  9.6555 10.1065  73.021   100
##   fmode(X, g)  6.683  7.2980  7.71620  7.5440  7.7490  13.489   100
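A small sketch of gsplit() and greorder() (using a 'GRP' object as above):

library(collapse)
g <- GRP(iris$Species)
s <- gsplit(iris$Sepal.Length, g)                     # list of vectors, one per group
identical(greorder(unlist(s), g), iris$Sepal.Length)  # greorder() restores the original order: TRUE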
Quick Conversions

Fast and exact conversion of common data objects:

qM(), qDF(), qDT(), qTBL() - convert vectors, arrays, data.frames or lists to matrix, data.frame, data.table or tibble
m[r|c]tl() - matrix rows/cols to list, data.frame or data.table
qF(), as_numeric_factor(), as_character_factor() - convert to/from factors or all factors in a list / data.frame
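For example (a minimal sketch):

library(collapse)
m  <- qM(mtcars)                     # data frame -> matrix
df <- qDF(m, row.names.col = "car")  # matrix -> data.frame, keeping row names as a column
tb <- qTBL(iris)                     # data.frame -> tibble
l  <- mctl(EuStockMarkets)           # matrix columns -> list of vectors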
Fast Data Manipulation

Minimal-overhead implementations of common data manipulation functions:

fselect[<-]() - select/replace columns
fsubset() - subset data (rows and columns)
ss() - fast alternative to [, particularly for data frames
[row|col]order[v]() - reorder (sort) rows and columns
fmutate(), fsummarise() - dplyr-like, incl. across() feature
[f|set]transform[v][<-]() - transform cols (by reference)
fcompute[v]() - compute new cols dropping existing ones
[f|set]rename() - rename (any object with 'names' attribute)
[set]relabel() - assign/change variable labels ('label' attr.)
get_vars[<-]() - select/replace columns (standard eval.)
[num|cat|char|fact|logi|date]_vars[<-]() - select/replace columns by data type or retrieve names/indices
add_vars[<-]() - add or column-bind columns

Examples

mtcars |> fsubset(mpg > fnth(mpg, 0.95), disp:wt, cylinders = cyl)
##                disp hp drat    wt cylinders
## Fiat 128       78.7 66 4.08 2.200         4
## Toyota Corolla 71.1 65 4.22 1.835         4
mtcars |> colorder(cyl, vs, am, pos = 'after') |> head(2)
##               mpg cyl vs am disp  hp drat    wt  qsec gear carb
## Mazda RX4      21   6  0  1  160 110  3.9 2.620 16.46    4    4
## Mazda RX4 Wag  21   6  0  1  160 110  3.9 2.875 17.02    4    4
i <- base::invisible # These are equivalent, the second option is faster:
mtcars |> fgroup_by(cyl, vs, am) |> fmutate(sum_mpg = fsum(mpg)) |> i()
mtcars |> fmutate(sum_mpg = fsum(mpg, list(cyl, vs, am), TRA = 1)) |> i()
# These are also equivalent (weighted means), again the second is faster
mtcars |> fgroup_by(cyl) |> fmutate(across(disp:drat, fmean, wt)) |> i()
mtcars |> ftransformv(disp:drat, fmean, cyl, wt, 1, apply = FALSE) |> i()
# ftransform()/fcompute() support list input and ignore attached groupings
mtcars %>% fgroup_by(cyl) %>% ftransform(fselect(., hp:qsec) %>%
  fmedian(TRA = 1) %>% fungroup() %>% fsum(TRA = "/")) |> i()
# Again a faster equivalent: note the use of 'set' to avoid a deep copy
mtcars %>% ftransform(fselect(., hp:qsec) %>% fmedian(cyl, TRA = 1) %>%
  fsum(TRA = "/", set = TRUE)) %>% i()
# Aggregation: weighted standard deviations
mtcars |> fgroup_by(vs) |> fsummarise(across(disp:drat, fsd, w = wt))
##   vs      disp       hp      drat
## 1  0 101.80094 54.79388 0.4249447
## 2  1  56.30073 23.17952 0.4915196
# Grouped linear models: .apply = FALSE applies functions to DF subset
qTBL(mtcars) |> fgroup_by(vs) |> fsummarise(across(disp:drat,
  function(x) list(models = list(lm(disp ~ ., x))), .apply = FALSE))
## # A tibble: 2 x 2
##      vs models
##   <dbl> <list>
## 1     0 <lm>
## 2     1 <lm>
# Adding some columns. Use ftransform<- to also replace existing ones
add_vars(iris) <- num_vars(iris) |> fsum(TRA = '%') |> add_stub("perc_")

Multi-Type Aggregation

Convenient interface to complex multi-type aggregations:

collap(data, by, FUN = fmean, catFUN = fmode,
       cols = NULL, w = NULL, wFUN = fsum,
       custom = NULL, keep.col.order = TRUE, ...)

# Population weighted mean (PCGDP, LIFEEX) & mode (country), and sum(POP)
collap(wlddev, country + PCGDP + LIFEEX ~ income, w = ~ POP)
##         country              income      PCGDP   LIFEEX          POP
## 1 United States         High income 31284.7366 75.69257  58840837058
## 2      Ethiopia          Low income   557.1427 53.50608  20949161394
## 3         India Lower middle income  1238.8280 60.58651 113837684528
## 4         China Upper middle income  4145.6844 68.26984 119606023798
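The custom argument lets different functions aggregate different columns (a sketch, output not shown):

library(collapse)
collap(wlddev, ~ income, custom = list(fmean = c("PCGDP", "LIFEEX"), fmode = "country", fsum = "POP"))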
Advanced Transformations

Common transformations (in econometrics).

Scaling, Centering and Averaging

fscale(x, g = NULL, w = NULL, na.rm = TRUE, mean = 0, sd = 1, ...)
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, mean = 0, theta = 1, ...)
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, ...)

Higher-Dimensional Centering/Avg. and Linear Prediction

fhdwithin(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, lm.method = "qr", ...)
fhdbetween() - same arguments as fhdwithin()

Statistical Operators (function shorthands with extra features)

STD(), W(), B(), HDW(), HDB()

Examples

# Grouped scaling
iris |> fgroup_by(Species) |> fscale() |> head(2)
##   Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1  setosa    0.2666745   0.1899414   -0.3570112  -0.4364923
## 2  setosa   -0.3007180  -1.1290958   -0.3570112  -0.4364923
STD(iris, ~ Species, stub = FALSE) |> invisible() # Same thing + faster
# Grouped and weighted scaling. Operators support formulas and keep ids
STD(mtcars, mpg + carb ~ cyl, w = ~ wt) |> head(2)
##               cyl    wt   STD.mpg STD.carb
## Mazda RX4       6 2.620 0.9691687 0.386125
## Mazda RX4 Wag   6 2.875 0.9691687 0.386125
# Much shorter than fsubset(mpg > fmean(mpg, cyl, TRA = "replace"))
mtcars |> fsubset(mpg > B(mpg, cyl)) |> head(2)
##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
# Regression with cyl fixed effects - a la Mundlak (1978)
lm(mpg ~ carb + B(carb, cyl), data = mtcars) |> coef()
##  (Intercept)         carb B(carb, cyl)
##    34.829652    -0.465511    -4.775032
# Fast grouped (vs) bivariate regression slopes: mpg ~ carb
mtcars |> fgroup_by(vs) |> fmutate(dm_carb = W(carb)) |>
  fsummarise(beta = fsum(mpg, dm_carb) %/=% fsum(dm_carb^2))
##   vs       beta
## 1  0 -0.5557241
## 2  1 -2.0706468
# Residuals from regressing on 'Petal' vars and 'Species' FE
fhdwithin(iris[1:2], iris[3:5]) |> head(2)
##   Sepal.Length Sepal.Width
## 1   0.14989286   0.1102684
## 2  -0.05010714  -0.3897316
# Detrending with country-level cubic polynomials
HDW(wlddev, PCGDP + LIFEEX + POP ~ iso3c * poly(year, 3)) |> head(2)
##    HDW.PCGDP HDW.LIFEEX   HDW.POP
## 43 -258.4069  0.2360285 -317459.1
## 44 -119.5600  0.1136432  -33900.2
# Note: HD centering/prediction and polynomials requires package 'fixest'

Linear Models

Fast (barebones) linear model fitting with 6 different solvers:
flm(y, X, w = NULL, add.icpt = FALSE, method = "lm")

Fast R²-based F-test of exclusion restrictions for lm's (with FE):
fFtest(y, exc, X = NULL, w = NULL, full.df = TRUE)

Both functions also have formula interfaces:
flm(cbind(mpg, disp) ~ hp + carb, weights = wt, mtcars)
##                     mpg       disp
## (Intercept) 28.48401839  42.155002
## hp          -0.06834996   2.101036
## carb         0.33207257 -38.183910
# Test the exclusion of cyl-dummies and hp.
fFtest(mpg ~ qF(cyl) + hp | carb + qF(am), weights = wt, mtcars)
##                   R-Sq. DF1 DF2 F-Stat. P-Value
## Full Model        0.812   5  26  22.479   0.000
## Restricted Model  0.674   2  29  30.041   0.000
## Exclusion Rest.   0.138   3  26   6.351   0.002
Time Series and Panel Series

Fast and flexible indexed series and data frames: a modern upgrade of plm's 'pseries' and 'pdata.frame'.

Turn a DF into an 'indexed frame' using id and/or time vars:
data_ix = findex_by(data, id1, ..., time)
data_ix$indexed_series - columns are 'indexed series'
index_df = findex(data_ix) - retrieve 'index_df': DF of ids
index_df = with(data_ix, findex(indexed_series)) - can fetch 'index_df' from 'indexed series' in any caller environment
data = unindex(data_ix) - unindex (also 'indexed series')
reindex(data, index = index_df) - reindex / new pointers

'indexed series' can be 1-or-2D atomic objects. Vectors / time series / matrices can also be indexed directly using:
reindex(vec/mat, index = vec/index_df)

is_irregular() - irregularity in any index[ed] obj. or time vec

Example: Indexing Panel Data

wldi <- wlddev |> findex_by(iso3c, year) # Balanced: 216 countries
fsubset(wldi, 1:2, iso3c, year, PCGDP:POP)
##   iso3c year PCGDP LIFEEX GINI       ODA     POP
## 1   AFG 1960    NA 32.446   NA 116769997 8996973
## 2   AFG 1961    NA 32.962   NA 232080002 9169410
##
## Indexed by:  iso3c [1] | year [2 (61)]
# Index stats: [N. ids] | [N. periods (tot. N. periods: (max-min)/GCD)]
LIFEEXi = wldi$LIFEEX # Indexed series
str(LIFEEXi, strict.width = "cut")
## 'indexed_series' num [1:13176] 32.4 33 33.5 34 34.5 ...
## - attr(*, "index_df")=Classes 'index_df', 'pindex' and 'data.frame'..
##  ..$ iso3c: Factor w/ 216 levels "ABW","AFG","AGO",..: 2 2 2 2 2 2 ..
##  ..$ year : Ord.factor w/ 61 levels "1960"<"1961"<..: 1 2 3 4 5 6 7..
LIFEEXi[1:7] # Subsetting indexed series
## [1] 32.446 32.962 33.471 33.971 34.463 34.948 35.430
##
## Indexed by:  iso3c [1] | year [7 (61)]
c(is_irregular(LIFEEXi), is_irregular(LIFEEXi[-5])) # Is irregular?
## [1] FALSE  TRUE

Note: 'indexed series' and frames are supported via existing 'pseries'/'pdata.frame' methods for time series/panel functions.

Fast functions to perform time-based computations on (irregular) time series and (unbalanced) panel data.

Lags/Leads, Differences, Growth Rates and Cumulative Sums

flag(x, n = 1, g = NULL, t = NULL, fill = NA, ...)
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL,
      fill = NA, log = FALSE, rho = 1, ...)
fgrowth(x, n = 1, diff = 1, g = NULL, t = NULL,
        fill = NA, logdiff = FALSE, scale = 100, power = 1, ...)
fcumsum(x, g = NULL, o = NULL, na.rm = TRUE,
        fill = FALSE, check.o = TRUE, ...)

Statistical Operators: L(), F(), D(), Dlog(), G()

Example: Computing Growth Rates

# Ad-hoc use: note that G() supports formulas which fgrowth() doesn't
fgrowth(AirPassengers) |> head()
## [1]        NA  5.357143 11.864407 -2.272727 -6.201550 11.570248
G(wlddev, c(1, 10), by = PCGDP ~ iso3c, t = ~ year) |> ss(11:12)
##   iso3c year G1.PCGDP L10G1.PCGDP
## 1   AFG 1970       NA          NA
## 2   AFG 1971       NA          NA
wlddev |> fgroup_by(iso3c) |> fselect(iso3c, year, PCGDP, LIFEEX) |>
  fmutate(PCGDP_growth = fgrowth(PCGDP, t = year)) |> head(2)
##   iso3c year PCGDP LIFEEX PCGDP_growth
## 1   AFG 1960    NA 32.446           NA
## 2   AFG 1961    NA 32.962           NA
settransform(wlddev, PCGDP_growth = G(PCGDP, g = iso3c, t = year))
# Note: can omit t -> requires consecutive observations and groups
# Usage with indexed series / frames:
G(wldi) |> head(2) # default: compute growth of num_vars(), keep ids
##   iso3c year G1.decade G1.PCGDP G1.LIFEEX G1.GINI   G1.ODA   G1.POP
## 1   AFG 1960        NA       NA        NA      NA       NA       NA
## 2   AFG 1961         0       NA  1.590335      NA 98.74969 1.916611
##
## Indexed by:  iso3c [1] | year [2 (61)]
settransform(wldi, PCGDP_growth = fgrowth(PCGDP))
lm(G(PCGDP) ~ L(G(LIFEEX), 0:2), wldi) |> summary() |> coef() |> round(3)
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)            1.718      0.081  21.256    0.000
## L(G(LIFEEX), 0:2)--    0.062      0.175   0.353    0.724
## L(G(LIFEEX), 0:2)L1    0.368      0.220   1.672    0.095
## L(G(LIFEEX), 0:2)L2    0.254      0.173   1.468    0.142

psacf(), pspacf(), psccf() - panel series ACF/PACF/CCF
psmat() - panel data to array conversion/reshaping
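A small sketch of flag() and fdiff() on grouped panel data (output omitted):

library(collapse)
wlddev |> fselect(iso3c, year, LIFEEX) |>
  fmutate(L1_LIFEEX = flag(LIFEEX, 1, iso3c, year),     # 1-period lag by country
          D1_LIFEEX = fdiff(LIFEEX, 1, 1, iso3c, year)) |> head(3)
L(wlddev, 1:2, LIFEEX ~ iso3c, ~ year) |> head(3)       # operator equivalent with a formula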
Summary Statistics

qsu() - fast (grouped, weighted, panel-decomposed) summary statistics for cross-sectional and panel data
# Panel data statistics: overall, on group-means and group-centered data
qsu(iris, pid = Sepal.Length ~ Species, higher = TRUE)
##          N/T    Mean      SD     Min     Max     Skew    Kurt
## Overall  150  5.8433  0.8281     4.3     7.9   0.3118  2.4264
## Between    3  5.8433  0.7951   5.006   6.588  -0.2112     1.5
## Within    50  5.8433  0.5113  4.1553  7.1553   0.1187  3.2633

descr() - detailed statistical description of data.frame
varying() - check variation within groups (panel-ids)
pwcor(), pwcov(), pwnobs() - pairwise correlations, covariance and obs. (with P-value and pretty printing)
qtab() - faster table() function, incl. weights & custom funs
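A brief sketch of the remaining functions (output omitted):

library(collapse)
descr(wlddev)                          # detailed description of each column
varying(wlddev, ~ iso3c)               # which variables vary within countries?
pwcor(nv(wlddev), N = TRUE, P = TRUE)  # pairwise correlations with obs. counts and p-values
qtab(wlddev$OECD, wlddev$income)       # fast contingency table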
List Processing

Functions to process (nested) lists (of data objects):

ldepth() - level of nesting of list
is_unlistable() - is list composed of atomic objects
has_elem() - search if list contains certain elements
get_elem() - pull out elements from list / subset list
atomic_elem[<-](), list_elem[<-]() - get list with atomic / sub-list elements, examining only first level of list
reg_elem(), irreg_elem() - get full list tree leading to atomic ('regular') or non-atomic ('irregular') elements
rsplit() - efficient (recursive) splitting
t_list() - efficient list transpose (transpose lists of lists)
rapply2d() - recursive apply to lists of data objects
unlist2d() - recursive row-binding to data.frame

Example: Nested Linear Models

(dl <- mtcars |> rsplit(mpg + hp + carb ~ vs + am)) |> str(max.level = 2)
## List of 2
##  $ 0:List of 2
##   ..$ 0:'data.frame': 12 obs. of 3 variables:
##   ..$ 1:'data.frame': 6 obs. of 3 variables:
##  $ 1:List of 2
##   ..$ 0:'data.frame': 7 obs. of 3 variables:
##   ..$ 1:'data.frame': 7 obs. of 3 variables:
nest_lm <- dl |> rapply2d(lm, formula = mpg ~ .)
(nest_coef <- nest_lm |> rapply2d(summary, classes = "lm") |>
   get_elem("coefficients")) |> str(give.attr = FALSE, strict.width = "cut")
## List of 2
##  $ 0:List of 2
##   ..$ 0: num [1:3, 1:4] 15.8791 0.0683 -4.5715 3.655 0.0345 ...
##   ..$ 1: num [1:3, 1:4] 26.9556 -0.0319 -0.308 2.293 0.0149 ...
##  $ 1:List of 2
##   ..$ 0: num [1:3, 1:4] 30.896903 -0.099403 -0.000332 3.346033 0.035..
##   ..$ 1: num [1:3, 1:4] 37.0012 -0.1155 0.4762 7.3316 0.0894 ...
nest_coef |> unlist2d(c("vs", "am"), row.names = "variable") |> head(2)
##   vs am    variable    Estimate Std. Error  t value   Pr(>|t|)
## 1  0  0 (Intercept) 15.87914500 3.65495315 4.344555 0.001865018
## 2  0  0          hp  0.06832467 0.03449076 1.980956 0.078938069
Recode and Replace Values

recode_num(), recode_char() - recode numeric / character values (+ regex recoding) in matrix-like objects
replace_[NA|Inf|outliers]() - replace special values
pad() - add (missing) observations / rows, i.e. expand objects
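For example (a minimal sketch):

library(collapse)
x <- c("low", "mid", "high", NA)
recode_char(x, low = "L", mid = "M", high = "H")  # recode character values by name
replace_NA(airquality, value = 0)                 # replace all missing values with 0
replace_Inf(log(mtcars), value = NA)              # replace (-)Inf (here from log(0)) with NA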
(Memory) Efficient Programming

Functions for (memory) efficient R programming:

any|all[v|NA], which[v|NA], %[=|!]=%, copyv, setv, alloc,
missing_cases, na_[insert|rm|omit], vlengths, vtypes,
vgcd, frange, fnlevels, fn[row|col], fdim, seq_[row|col]

fsubset(wlddev, year %==% 2010) # 2x faster fsubset(wlddev, year == 2010)
attach(mtcars) # Efficient sub-assignment by reference, various options...
setv(am, 0, vs); setv(am, 1:10, vs); setv(am, 1:10, vs[10:20])

Small (Helper) Functions

Functions for (meta-)programming and attributes:

.c, massign, %=%, vlabels[<-], setLabels, vclasses,
namlab, [add|rm]_stub, %!in%, ckmatch, all_identical,
all_obj_equal, all_funs, set[Dim|Row|Col]names,
unattrib, setAttrib, copyAttrib, copyMostAttrib

.c(var1, var2, var3) # Non-standard concatenation
## [1] "var1" "var2" "var3"
.c(values, vectors) %=% eigen(cov(mtcars)) # Multiple Assignment
# Variable labels: vlabels[<-], [set]relabel() etc. namlab() shows a summary
namlab(wlddev[c(2, 9)], N = TRUE, Ndist = TRUE, class = TRUE)
##   Variable   Class     N Ndist                              Label
## 1    iso3c  factor 13176   216                       Country Code
## 2    PCGDP numeric  9470  9470 GDP per capita (constant 2010 US$)

API Extensions

Shorthands for frequently used functions:

fselect -> slt, fsubset -> sbt, fmutate -> mtt,
[f/set]transform[v] -> [set]tfm[v], fsummarise -> smr,
across -> acr, fgroup_by -> gby, finteraction -> itn,
findex_by -> iby, findex -> ix, frename -> rnm,
get_vars -> gv, num_vars -> nv, add_vars -> av

Namespace masking

Can set options(collapse_mask = c(...)) with a vector of functions starting with f-, to export versions without f-, masking base R or dplyr. A few keywords exist to mask multiple functions, see help("collapse-options"). This allows clean & fast code, but poses additional namespace challenges:

# Masking all f- functions and specials n = GRPN and table = qtab
options(collapse_mask = "all")
library(collapse)
# The following is 100% collapse code, apart from the base pipe
wlddev |>
  subset(year >= 1990) |>
  group_by(year) |>
  summarise(n = n(), across(PCGDP:GINI, mean, w = POP))
with(mtcars, table(cyl, vs, am))
sum(mtcars)
diff(EuStockMarkets)
droplevels(wlddev)
mean(nv(iris), g = iris$Species)
scale(nv(GGDC10S), g = GGDC10S$Variable)
unique(GGDC10S, cols = c("Variable", "Country"))
range(wlddev$date)
wlddev |>
  index_by(iso3c, year) |>
  mutate(PCGDP_lag = lag(PCGDP),
         PCGDP_diff = PCGDP - PCGDP_lag,
         PCGDP_growth = growth(PCGDP)) |> unindex()

The best way to set this option is inside an .Rprofile file placed in the user or project directory. Use it carefully.
CC-BY-SA Sebastian Krantz • Learn more at sebkrantz.github.io/collapse • Source code at github.com/SebKrantz/collapse • Updates announced at twitter.com/collapse_R • #rcollapse • Cheatsheet created for collapse version 1.8.8 • Updated: 2022-08
