Collapse Cheat Sheet

collapse is a C/C++ package for R that provides efficient statistical functions and data manipulation capabilities. It allows fast grouped, weighted, and time series computations on matrices and data frames. collapse handles data transformation uniformly while preserving attributes and ensuring compatibility with packages like dplyr, data.table, and panel data classes. It provides full user control for statistical programming with optimization possibilities.
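As a minimal sketch of the grouped, weighted, and transforming computations described above, the snippet below calls `fmean()` on the built-in `iris` data. The unit weight vector `w` and the variable names are illustrative assumptions, not part of the cheat sheet.

```r
library(collapse)

# Grouped means: one row per species, columns are the numeric variables
grouped_means <- fmean(iris[1:4], g = iris$Species)

# Same statistic swept out of the data: TRA = "-" subtracts each
# group's mean from its rows, so the result keeps all 150 rows
centered <- fmean(iris[1:4], g = iris$Species, TRA = "-")

# Grouped and weighted means; unit weights are an illustrative
# assumption and reproduce the unweighted result
w <- rep(1, nrow(iris))
weighted_means <- fmean(iris[1:4], g = iris$Species, w = w)
```

Passing `g` and `w` directly to the statistical function avoids a separate split-apply-combine step, which is the usage pattern the cheat sheet's syntax block describes.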

Advanced and Fast Data Transformation with collapse :: CHEAT SHEET

Introduction Fast Statistical Functions Grouping and Ordering Fast Data Manipulation
collapse is a C/C++ based package supporting advanced Fast functions to perform column–wise grouped and Optimized functions for grouping, ordering, unique Minimal overhead implementations
(grouped, weighted, time series, panel data and recursive) weighted computations on matrix-like objects values, splitting & recombining, and dealing with factors
fselect[<-]() - select/replace columns
statistical operations in R, with very efficient low-level
vectorizations across both groups and columns. fmean, fmedian, fmode, fsum, fprod, fsd, fvar GRP() - create a grouping object (class ’GRP’): pass to g arg. fsubset() - subset data (rows and columns)
fmin, fmax, fnth, ffirst, flast, fnobs, fndistinct g <- GRP(iris, ~ Species) # or GRP(iris$Species) or GRP(iris["Species"])
It also offers a flexible, class-agnostic, approach to data fndistinct(iris[1:4], g) # Computation without grouping overhead ss() - fast alternative to [, particularly for data frames
transformation in R: handling matrix and data frame based Syntax ## Sepal.Length Sepal.Width Petal.Length Petal.Width [row|col]order[v]() - reorder (sort) rows and columns
objects in a uniform, attribute preserving, way, and ensuring ## setosa 15 16 9 6
## versicolor 21 14 19 9 fmutate(), fsummarise() - dplyr -like, incl. across() feature
seamless compatibility with dplyr / (grouped) tibble, data.table, FUN(x, g = NULL, [w = NULL], TRA = NULL, ## virginica 21 13 20 12
xts, sf and plm classes for panel data (’pseries’, ’pdata.frame’). [na.rm = TRUE], use.g.names = TRUE, [f|set]transform[v][<-]() - transform cols (by reference)
fgroup_by() - attach ’GRP’ object to data: a class-agnostic
collapse provides full control to the user for statistical [drop = TRUE], [nthreads = 1L])
grouped frame supporting fast computations fcompute[v]() - compute new cols dropping existing ones
programming - with several ways to reach the same outcome mtcars |> fgroup_by(cyl, vs, am) |> ss(1:2)
and rich optimization possibilities. Its default is na.rm = TRUE, x vector, matrix, or (grouped) data frame / list [f|set]rename() - rename (any object with ’names’ attribute)
## mpg cyl disp hp drat wt qsec vs am gear carb
and implemented at very low cost at the algorithm level. g [optional] (list of) vectors / factors or GRP() object ## Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4 [set]relabel() - assign/change variable labels (’label’ attr.)
## Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
Calling help("collapse-documentation") brings up a w [optional] vector of (frequency) weights ## get_vars[<-]() - select/replace columns (standard eval.)
## Grouped by: cyl, vs, am [7 | 5 (3.8) 1-12]
detailed documentation, which is also available online. See TRA [optional] operation to transform data with computed [num|cat|char|fact|logi|date] vars[<-]() - select/
# Group Stats: [N. groups | mean (sd) min-max of group sizes]
also the fastverse package/project for a recommended set of statistics (see FUN argument to TRA() and Examples) # Fast Functions also have a grouped_df method: here wt-weighted medians replace columns by data type or retrieve names/indices
complementary packages and easy package management. mtcars |> fgroup_by(cyl, vs, am) |> fmedian(wt) |> head(3)
drop drop matrix / data frame dimensions. default TRUE add_vars[<-]() - add or column-bind columns
##   cyl vs am sum.wt  mpg  disp hp drat  qsec gear carb
## 1   4  0  1  2.140 26.0 120.3 91 4.43 16.70    5    2
## 2   4  1  0  8.805 22.8 140.8 95 3.70 20.01    4    2
## 3   4  1  1 14.198 30.4  79.0 66 4.08 18.61    4    1
Examples Examples
Row/Column Arithmetic (by Reference) fmean(AirPassengers) # Vector
mtcars |> fsubset(mpg > fnth(mpg, 0.95), disp:wt, cylinders = cyl)
Column-wise sweeping out of vectors/matrices/DFs/lists ## [1] 280.2986 GRPN(), fgroup_vars(), fungroup() - get group count, ## disp hp drat wt cylinders
fmean(AirPassengers, w = cycle(AirPassengers)) # Weighted mean grouping columns/variables, and ungroup data ## Fiat 128 78.7 66 4.08 2.200 4
%cr%, %c+%, %c-%, %c*%, %c/% e.g. Z = X %c/% rowSums(X) ## [1] 284.3397
## Toyota Corolla 71.1 65 4.22 1.835 4

Row-wise sweeping vectors from vectors/matrices/DFs/lists fmean(EuStockMarkets) # Matrix

qF(), qG() - quick as.factor, and vector grouping object mtcars |> colorder(cyl, vs, am, pos = 'after') |> head(2)

## DAX SMI CAC FTSE

of class ’qG’: a factor-light without levels attribute ## mpg cyl vs am disp hp drat wt qsec gear carb
%rr%, %r+%, %r-%, %r*%, %r/% e.g. Z = X %r/% colSums(X) ## 2530.657 3376.224 2227.828 3565.643
## Mazda RX4 21 6 0 1 160 110 3.9 2.620 16.46 4 4
group() - (multivariate) group id (’qG’) in appearance order ## Mazda RX4 Wag 21 6 0 1 160 110 3.9 2.875 17.02 4 4
Standard (column-wise) math by reference (returns invisibly) fmean(EuStockMarkets, drop = FALSE) # Don't drop dimensions
i <- base::invisible # These are equivalent, the second option is faster:
## DAX SMI CAC FTSE groupid() - run-length-type group id (’qG’) mtcars |> fgroup_by(cyl, vs, am) |> fmutate(sum_mpg = fsum(mpg)) |> i()
%+=%, %-=%, %*=%, %/=% e.g. X %-=% rowSums(X) ## [1,] 2530.657 3376.224 2227.828 3565.643 mtcars |> fmutate(sum_mpg = fsum(mpg, list(cyl, vs, am), TRA = 1)) |> i()
fmean(airquality) # Data Frame (can also use drop = FALSE)
seqid() - group-id from integer-sequences (’qG’) # These are also equivalent (weighted means), again the second is faster
Same thing, also supports row-wise operations by reference mtcars |> fgroup_by(cyl) |> fmutate(across(disp:drat, fmean, wt)) |> i()
## Ozone Solar.R Wind Temp Month Day radixorder[v]() - (multivariate) radix-based ordering mtcars |> ftransformv(disp:drat, fmean, cyl, wt, 1, apply = FALSE) |> i()
setop(X, "/", rowSums(X)) ## 42.129310 185.931507 9.957516 77.882353 6.993464 15.803922
# ftransform()/fcompute() support list input and ignore attached groupings
setop(X, "/", colSums(X), rowwise = TRUE) fmean(iris[1:4], g = iris$Species) # Grouped finteraction() - fast factor interactions (or return ’qG’) mtcars %>% fgroup_by(cyl) %>% ftransform(fselect(., hp:qsec) %>%
## Sepal.Length Sepal.Width Petal.Length Petal.Width fmedian(TRA = 1) %>% fungroup() %>% fsum(TRA = "/")) |> i()
fdroplevels() - fast removal of unused factor levels # Again a faster equivalent: note the use of 'set' to avoid a deep copy
## setosa 5.006 3.428 1.462 0.246
## versicolor 5.936 2.770 4.260 1.326 mtcars %>% ftransform(fselect(., hp:qsec) %>% fmedian(cyl, TRA = 1) %>%
Transform Data by (Grouped) Replacing or f[n]unique() - fast unique values / rows (by columns) fsum(TRA = "/", set = TRUE)) %>% i()
## virginica 6.588 2.974 5.552 2.026
# Aggregation: weighted standard deviations
Sweeping out Statistics (by Reference) X = iris[1:4]; g = iris$Species; w <- abs(rnorm(nrow(X))) gsplit() - fast splitting vector based on ’GRP’ objects mtcars |> fgroup_by(vs) |> fsummarise(across(disp:drat, fsd, w = wt))
fmean(X, g, w) # Grouped and weighted (random weights)
A generalisation of rowwise operations, that also ## Sepal.Length Sepal.Width Petal.Length Petal.Width greorder() - efficiently reorder y = unlist(gsplit(x, g)) ## vs disp hp drat
supports sweeping by groups e.g. aggregate statistics ## 1 0 101.80094 54.79388 0.4249447
## setosa 5.011663 3.467638 1.504067 0.2525002 such that identical(greorder(y, g), x) ## 2 1 56.30073 23.17952 0.4915196
## versicolor 5.930365 2.773558 4.238593 1.3136082
## virginica 6.588903 2.978017 5.552375 2.0221178 # Grouped linear models: .apply = FALSE applies functions to DF subset
TRA(x, STATS, FUN = "-", g = NULL, set = FALSE) collapse optimizes grouping using both factors / ’qG’ objects qTBL(mtcars) |> fgroup_by(vs) |> fsummarise(across(disp:drat,
## Transformations: here centering data on the weighted group median
setTRA(x, STATS, FUN = "-", g = NULL) TRA(X, fmedian(X, g, w), "-", g) |> head(3) and ’GRP’ objects. ’GRP’ objects contain most information function(x) list(models = list(lm(disp ~., x))), .apply = FALSE))

## Sepal.Length Sepal.Width Petal.Length Petal.Width

and are thus most efficient for complex computations. ## # A tibble: 2 x 2
x vector, matrix, or (grouped) data frame / list ## vs models
## 1 0.1 0.0 -0.1 0 X <- iris[1:4]; v <- as.character(iris$Species) ## <dbl> <list>
## 2 -0.1 -0.5 -0.1 0 f <- qF(v, na.exclude = FALSE) # Adds 'na.included' class: no NA checks ## 1 0 <lm>
STATS statistics matching (columns of) x (i.e. aggregated ## 3 -0.3 -0.3 -0.2 0 gv <- group(v) # 'qG' object: first appearance order, with 'na.included' ## 2 1 <lm>
vector, matrix or data frame / list) fmedian(X, g, w, TRA = "-") |> head(3) # Same thing: more compact microbenchmark(fmode(X, v), fmode(X, f), fmode(X, gv), fmode(X, g))
# Adding some columns. Use ftransform<- to also replace existing ones
## Sepal.Length Sepal.Width Petal.Length Petal.Width ## Unit: microseconds add_vars(iris) <- num_vars(iris) |> fsum(TRA = '%') |> add_stub("perc_")
FUN integer/string indicating transformation to perform: ## 1 0.1 0.0 -0.1 0 ## expr min lq mean median uq max neval
## 2 -0.1 -0.5 -0.1 0 ## fmode(X, v) 11.890 12.9150 15.17697 13.3455 13.7350 162.073 100
Int. String Description
## 3 -0.3 -0.3 -0.2 0 ## fmode(X, f) 9.225 9.8195 11.33035 10.0860 10.4550 92.947 100
0 "replace NA" replace missing values in x ## fmode(X, gv) 8.569 9.3480 10.73667 9.6555 10.1065 73.021 100
1 "replace fill" replace data and missing values in x fmedian(X, g, w, "-", set = TRUE) # Modify in-place (same as setTRA()) ## fmode(X, g) 6.683 7.2980 7.71620 7.5440 7.7490 13.489 100 Multi-Type Aggregation
2 "replace" replace data but preserve missing values in x head(iris, 3) # Changed iris too, as X = iris[1:4] did a shallow copy
3 "-" subtract: x - STATS(g) ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
Convenient interface to complex multi-type aggregations
4 "-+" x - STATS(g) + fmean(STATS, w = GRPN) ## 1 0.1 0.0 -0.1 0 setosa
5 "/" divide: x / STATS(g)
collap(data, by, FUN = fmean, catFUN = fmode,
6 "%" compute percentages: x * 100/STATS(g)
## 2 -0.1 -0.5 -0.1 0 setosa
## 3 -0.3 -0.3 -0.2 0 setosa
Quick Conversions cols = NULL, w = NULL, wFUN = fsum,
7 "+" add: x + STATS(g)
Fast and exact conversion of common data objects custom = NULL, keep.col.order = TRUE, ...)
8 "*" multiply: x * STATS(g)
9 "%%" modulus: x %% STATS(g) # Population weighted mean (PCGDP, LIFEEX) & mode (country), and sum(POP)
qM(), qDF(), qDT(), qTBL() - convert vectors, arrays,
10 "-%%" subtract modulus: x - x %% STATS(g) Basic Computing with R Functions data.frames or lists to matrix, data.frame, data.table or tibble
collap(wlddev, country + PCGDP + LIFEEX ~ income, w = ~ POP)
## country income PCGDP LIFEEX POP
g [optional] (list of) vectors / factors or GRP() object Apply R functions to rows or columns (by groups) ## 1 United States High income 31284.7366 75.69257 58840837058
m[r|c]tl() - matrix rows/cols to list, data.frame or data.table ## 2 Ethiopia Low income 557.1427 53.50608 20949161394
set TRUE transforms x by reference. setTRA is dapply(x, FUN, ..., MARGIN = 2) - column/row apply ## 3 India Lower middle income 1238.8280 60.58651 113837684528
qF(), as_numeric_factor(), as_character_factor() -
## 4 China Upper middle income 4145.6844 68.26984 119606023798
equivalent to invisible(TRA(..., set = TRUE)) BY(x, g, FUN, ...) - split-apply-combine computing convert to/from factors or all factors in a list / data.frame
Page 1 of 2 CC-BY-SA Sebastian Krantz • Learn more at sebkrantz.github.io/collapse • Source code at github.com/SebKrantz/collapse • Updates announced at twitter.com/collapse_R - #rcollapse • Cheatsheet created for collapse version 1.8.8 • Updated: 2022-08
Advanced Transformations Time Series and Panel Series G(wldi) |> head(2) # default: compute growth of num_vars(), keep ids Recode and Replace Values
Common transformations (in econometrics) Fast and flexible indexed series and data frames: a ## iso3c year G1.decade G1.PCGDP G1.LIFEEX G1.GINI G1.ODA G1.POP recode_num(), recode_char() - recode numeric / character
## 1 AFG 1960 NA NA NA NA NA NA
modern upgrade of plm’s ’pseries’ and ’pdata.frame’ ## 2 AFG 1961 0 NA 1.590335 NA 98.74969 1.916611
values (+ regex recoding) in matrix-like objects
Scaling, Centering and Averaging
##
fscale(x, g = NULL, w = NULL, na.rm = TRUE, ## Indexed by: iso3c [1] | year [2 (61)]
replace_[NA|Inf|outliers]() - replace special values
mean = 0, sd = 1, ...) Turn DF into an ’indexed frame’ using id and/or time vars
data_ix = findex_by(data, id1, ..., time) settransform(wldi, PCGDP_growth = fgrowth(PCGDP)) pad() - add (missing) observations / rows i.e. expand objects
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, lm(G(PCGDP) ~ L(G(LIFEEX), 0:2), wldi) |> summary() |> coef() |> round(3)
mean = 0, theta = 1, ...) data_ix$indexed_series - columns are ’indexed_series’ ## Estimate Std. Error t value Pr(>|t|)
fbetween(x, g = NULL, w = NULL, na.rm = TRUE,
index_df = findex(data_ix) - retrieve ’index_df’: DF of ids
## (Intercept) 1.718 0.081 21.256 0.000
## L(G(LIFEEX), 0:2)-- 0.062 0.175 0.353 0.724
(Memory) Efficient Programming
fill = FALSE, ...) ## L(G(LIFEEX), 0:2)L1 0.368 0.220 1.672 0.095 Functions for (memory) efficient R programming
index_df = with(data_ix, findex(indexed_series)) - can ## L(G(LIFEEX), 0:2)L2 0.254 0.173 1.468 0.142
Higher-Dimensional Centering/Avg. and Linear Prediction any|all[v|NA], which[v|NA], %[=|!]=%, copyv, setv, alloc
fetch ’index_df’ from ’indexed_series’ in any caller environment
fhdwithin(x, fl, w = NULL, na.rm = TRUE, psacf(), pspacf(), psccf() - panel series ACF/PACF/CCF missing_cases, na_[insert|rm|omit], vlengths, vtypes,
fill = FALSE, lm.method = "qr", ...) data = unindex(data_ix) - unindex (also ’indexed_series’) psmat() - panel data to array conversion/reshaping vgcd, frange, fnlevels, fn[row|col], fdim, seq_[row|col]
fhdbetween() - same arguments as fhdwithin() reindex(data, index = index_df) - reindex / new pointers fsubset(wlddev, year %==% 2010) # 2x faster fsubset(wlddev, year == 2010)
attach(mtcars) # Efficient sub-assignment by reference, various options...
Statistical Operators (function shorthands with extra features) ’indexed_series’ can be 1-or-2D atomic objects. Vectors / time Summary Statistics setv(am, 0, vs); setv(am, 1:10, vs); setv(am, 1:10, vs[10:20])
STD(), W(), B(), HDW(), HDB() series / matrices can also be indexed directly using:
qsu() - fast (grouped, weighted, panel-decomposed)
reindex(vec/mat, index = vec/index_df)
Examples summary statistics for cross-sectional and panel data
# Grouped scaling
is_irregular() - irregularity in any index[ed] obj. or time vec # Panel data statistics: overall, on group-means and group-centered data Small (Helper) Functions
qsu(iris, pid = Sepal.Length ~ Species, higher = TRUE)
iris |> fgroup_by(Species) |> fscale() |> head(2) Functions for (meta-)programming and attributes
## Species Sepal.Length Sepal.Width Petal.Length Petal.Width Example: Indexing Panel Data ## N/T Mean SD Min Max Skew Kurt
## 1 setosa 0.2666745 0.1899414 -0.3570112 -0.4364923
## Overall 150 5.8433 0.8281 4.3 7.9 0.3118 2.4264 .c, massign, %=%, vlabels[<-], setLabels, vclasses,
wldi <- wlddev |> findex_by(iso3c, year) # Balanced: 216 countries ## Between 3 5.8433 0.7951 5.006 6.588 -0.2112 1.5
## 2 setosa -0.3007180 -1.1290958 -0.3570112 -0.4364923 fsubset(wldi, 1:2, iso3c, year, PCGDP:POP) namlab, [add|rm]_stub, %!in%, ckmatch, all_identical,
## Within 50 5.8433 0.5113 4.1553 7.1553 0.1187 3.2633
STD(iris, ~ Species, stub = FALSE) |> invisible() # Same thing + faster ## iso3c year PCGDP LIFEEX GINI ODA POP all_obj_equal, all_funs, set[Dim|Row|Col]names,
# Grouped and weighted scaling. Operators support formulas and keep ids ## 1 AFG 1960 NA 32.446 NA 116769997 8996973 qtab() - faster table() function, incl. weights & custom funs unattrib, setAttrib, copyAttrib, copyMostAttrib
STD(mtcars, mpg + carb ~ cyl, w = ~ wt) |> head(2) ## 2 AFG 1961 NA 32.962 NA 232080002 9169410
## cyl wt STD.mpg STD.carb ## descr() - detailed statistical description of data.frame .c(var1, var2, var3) # Non-standard concatenation

## Mazda RX4 6 2.620 0.9691687 0.386125 ## Indexed by: iso3c [1] | year [2 (61)] ## [1] "var1" "var2" "var3"
## Mazda RX4 Wag 6 2.875 0.9691687 0.386125
varying() - check variation within groups (panel-ids) .c(values, vectors) %=% eigen(cov(mtcars)) # Multiple Assignment
# Index stats: [N. ids] | [N. periods (tot.N. periods: (max-min)/GCD)]
# Much shorter than fsubset(mpg > fmean(mpg, cyl, TRA = "replace")) LIFEEXi = wldi$LIFEEX # Indexed series pwcor(), pwcov(), pwnobs() - pairwise correlations, # Variable labels: vlabels[<-], [set]relabel() etc. namlab() shows summary
str(LIFEEXi, strict.width = "cut") namlab(wlddev[c(2, 9)], N = TRUE, Ndist = TRUE, class = TRUE)
mtcars |> fsubset(mpg > B(mpg, cyl)) |> head(2) covariance and obs. (with P-value and pretty printing)
## mpg cyl disp hp drat wt qsec vs am gear carb ## 'indexed_series' num [1:13176] 32.4 33 33.5 34 34.5 ... ## Variable Class N Ndist Label
## Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4 ## - attr(*, "index_df")=Classes 'index_df', 'pindex' and 'data.frame'.. ## 1 iso3c factor 13176 216 Country Code
## ..$ iso3c: Factor w/ 216 levels "ABW","AFG","AGO",..: 2 2 2 2 2 2 .. ## 2 PCGDP numeric 9470 9470 GDP per capita (constant 2010 US$)
## Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
# Regression with cyl fixed effects - a la Mundlak (1978)
## ..$ year : Ord.factor w/ 61 levels "1960"<"1961"<..: 1 2 3 4 5 6 7.. List Processing
lm(mpg ~ carb + B(carb, cyl), data = mtcars) |> coef() LIFEEXi[1:7] # Subsetting indexed series Functions to process (nested) lists (of data objects)
## (Intercept) carb B(carb, cyl)
## 34.829652 -0.465511 -4.775032
## [1] 32.446 32.962 33.471 33.971 34.463 34.948 35.430
## ldepth() - level of nesting of list API Extensions
# Fast grouped (vs) bivariate regression slopes: mpg ~ carb
## Indexed by: iso3c [1] | year [7 (61)]
is_unlistable() - is list composed of atomic objects Shorthands for frequently used functions
mtcars |> fgroup_by(vs) |> fmutate(dm_carb = W(carb)) |> c(is_irregular(LIFEEXi), is_irregular(LIFEEXi[-5])) # Is irregular?
fsummarise(beta = fsum(mpg, dm_carb) %/=% fsum(dm_carb^2)) has_elem() - search if list contains certain elements fselect -> slt, fsubset -> sbt, fmutate -> mtt,
## [1] FALSE TRUE
## vs beta
[f/set]transform[v] -> [set]tfm[v], fsummarise ->
## 1 0 -0.5557241 Note: ’indexed_series’ and frames are supported via existing get_elem() - pull out elements from list / subset list smr, across -> acr, fgroup_by -> gby, finteraction
## 2 1 -2.0706468 ’pseries’/’pdata.frame’ methods for time series/panel functions. atomic_elem[<-](), list_elem[<-]() - get list with atomic / -> itn, findex_by -> iby, findex -> ix, frename ->
# Residuals from regressing on 'Petal' vars and 'Species' FE sub-list elements, examining only first level of list rnm, get_vars -> gv, num_vars -> nv, add_vars -> av
fhdwithin(iris[1:2], iris[3:5]) |> head(2) Fast functions to perform time-based computations on
## Sepal.Length Sepal.Width reg_elem(), irreg_elem() - get full list tree leading to atomic
(irregular) time series and (unbalanced) panel data Namespace masking
## 1 0.14989286 0.1102684 (’regular’) or non-atomic (’irregular’) elements
## 2 -0.05010714 -0.3897316 Can set options(collapse_mask = c(...)) with a vector of
# Detrending with country-level cubic polynomials Lags/Leads, Differences, Growth Rates and Cumulative Sums rsplit() - efficient (recursive) splitting
functions starting with f-, to export versions without f-, masking
HDW(wlddev, PCGDP + LIFEEX + POP ~ iso3c * poly(year, 3)) |> head(2) flag(x, n = 1, g = NULL, t = NULL, fill = NA, ...) t_list() - efficient list transpose (transpose lists of lists) base R or dplyr. A few keywords exist to mask multiple
## HDW.PCGDP HDW.LIFEEX HDW.POP fdiff(x, n = 1, diff = 1, g = NULL, t = NULL,
## 43 -258.4069 0.2360285 -317459.1 rapply2d() - recursive apply to lists of data objects functions, see help("collapse-options"). This allows clean
fill = NA, log = FALSE, rho = 1, ...)
## 44 -119.5600 0.1136432 -33900.2 & fast code, but poses additional namespace challenges:
fgrowth(x, n = 1, diff = 1, g = NULL, t = NULL, fill unlist2d() - recursive row-binding to data.frame
# Note: HD centering/prediction and polynomials requires package 'fixest' # Masking all f- functions and specials n = GRPN and table = qtab
= NA, logdiff = FALSE, scale = 100, power = 1, ...) options(collapse_mask = "all")
fcumsum(x, g = NULL, o = NULL, na.rm = TRUE, Example: Nested Linear Models library(collapse)
fill = FALSE, check.o = TRUE, ...) (dl <- mtcars |> rsplit(mpg + hp + carb ~ vs + am)) |> str(max.level = 2) # The following is 100% collapse code, apart from the base pipe
Linear Models ## List of 2
wlddev |>
Statistical Operators: L(), F(), D(), Dlog(), G() ## $ 0:List of 2
## ..$ 0:'data.frame': 12 obs. of 3 variables: subset(year >= 1990) |>
Fast (barebones) linear model fitting with 6 different solvers group_by(year) |>
## ..$ 1:'data.frame': 6 obs. of 3 variables:
flm(y, X, w = NULL, add.icpt = FALSE, method = "lm") Example: Computing Growth Rates ## $ 1:List of 2 summarise(n = n(), across(PCGDP:GINI, mean, w = POP))
## ..$ 0:'data.frame': 7 obs. of 3 variables:
Fast R2 -based F-test of exclusion restrictions for lm’s (with FE) # Ad-hoc use: note that G() supports formulas which fgrowth() doesn't
## ..$ 1:'data.frame': 7 obs. of 3 variables: with(mtcars, table(cyl, vs, am))
fgrowth(AirPassengers) |> head()
fFtest(y, exc, X = NULL, w = NULL, full.df = TRUE) nest_lm <- dl |> rapply2d(lm, formula = mpg ~ .)
sum(mtcars)
## [1] NA 5.357143 11.864407 -2.272727 -6.201550 11.570248 diff(EuStockMarkets)
(nest_coef <- nest_lm |> rapply2d(summary, classes = "lm") |> droplevels(wlddev)
Both functions also have formula interfaces: G(wlddev, c(1, 10), by = PCGDP ~ iso3c, t = ~ year) |> ss(11:12) get_elem("coefficients")) |> str(give.attr = FALSE, strict = "cut") mean(nv(iris), g = iris$Species)
flm(cbind(mpg, disp) ~ hp + carb, weights = wt, mtcars) ## iso3c year G1.PCGDP L10G1.PCGDP ## List of 2 scale(nv(GGDC10S), g = GGDC10S$Variable)
## 1 AFG 1970 NA NA ## $ 0:List of 2 unique(GGDC10S, cols = c("Variable", "Country"))
## mpg disp
## 2 AFG 1971 NA NA ## ..$ 0: num [1:3, 1:4] 15.8791 0.0683 -4.5715 3.655 0.0345 ... range(wlddev$date)
## (Intercept) 28.48401839 42.155002
## hp -0.06834996 2.101036 wlddev |> fgroup_by(iso3c) |> fselect(iso3c, year, PCGDP, LIFEEX) |> ## ..$ 1: num [1:3, 1:4] 26.9556 -0.0319 -0.308 2.293 0.0149 ...
## carb 0.33207257 -38.183910 fmutate(PCGDP_growth = fgrowth(PCGDP, t = year)) |> head(2) ## $ 1:List of 2 wlddev |>
## iso3c year PCGDP LIFEEX PCGDP_growth ## ..$ 0: num [1:3, 1:4] 30.896903 -0.099403 -0.000332 3.346033 0.035.. index_by(iso3c, year) |>
# Test the exclusion of cyl-dummies and hp.
## 1 AFG 1960 NA 32.446 NA ## ..$ 1: num [1:3, 1:4] 37.0012 -0.1155 0.4762 7.3316 0.0894 ... mutate(PCGDP_lag = lag(PCGDP),
fFtest(mpg ~ qF(cyl) + hp | carb + qF(am), weights = wt, mtcars)
## 2 AFG 1961 NA 32.962 NA nest_coef |> unlist2d(c("vs", "am"), row.names = "variable") |> head(2) PCGDP_diff = PCGDP - PCGDP_lag,
## R-Sq. DF1 DF2 F-Stat. P-Value PCGDP_growth = growth(PCGDP)) |> unindex()
## Full Model 0.812 5 26 22.479 0.000 settransform(wlddev, PCGDP_growth = G(PCGDP, g = iso3c, t = year)) ## vs am variable Estimate Std. Error t value Pr(>|t|)
## Restricted Model 0.674 2 29 30.041 0.000 # Note: can omit t -> requires consecutive observations and groups ## 1 0 0 (Intercept) 15.87914500 3.65495315 4.344555 0.001865018 The best way to set this option is inside an .Rprofile file
## Exclusion Rest. 0.138 3 26 6.351 0.002 # Usage with indexed series / frames: ## 2 0 0 hp 0.06832467 0.03449076 1.980956 0.078938069
placed in the user or project directory. Use it carefully.
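The advice above about setting the masking option in an .Rprofile could look roughly like the following sketch. It assumes a project-level `.Rprofile` and reuses the `"all"` keyword shown in the cheat sheet's own example; whether to load collapse from the profile at all is a design choice rather than a requirement.

```r
# Hypothetical project-level .Rprofile (illustrative, not from the cheat sheet).
# The option is set before collapse is loaded, mirroring the order
# used in the cheat sheet's masking example.
options(collapse_mask = "all")

# Loading the package here is optional; scripts can call library(collapse)
# themselves, as long as the option above has been set first.
library(collapse)
```

Because masking changes what names such as `subset` or `mean` refer to, keeping the option in one visible place makes it easier to see which scripts depend on it.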
