Enhanced Data

This document provides information on the enhanced data.table package in R. It offers fast and memory efficient file reading/writing, aggregations, updates, and joins. Data tables inherit from data frames and are compatible with functions accepting data frames. The document describes the syntax for subsetting and grouping data tables using the i, j, and by arguments. It provides usage examples and tips for learning features of the data.table package.

Uploaded by

Denis Gontarev

Enhanced data.frame

Description
data.table inherits from data.frame. It offers fast and memory efficient: file reader and
writer, aggregations, updates, equi, non-equi, rolling, range and interval joins, in a short and
flexible syntax, for faster development.
It is inspired by A[B] syntax in R where A is a matrix and B is a 2-column matrix. Since
a data.table is a data.frame, it is compatible with R functions and packages that
accept only data.frames.

Type vignette(package="data.table") to get started. The Introduction to
data.table vignette introduces data.table's x[i, j, by] syntax and is a good place to start.
If you have read the vignettes and the help page below, please read the data.table support guide.
Please check the homepage for up-to-the-minute live NEWS.
Tip: one of the quickest ways to learn the features is to type example(data.table) and study
the output at the prompt.
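Because a data.table is also a data.frame, functions written for data.frames generally accept it unchanged. The following is a minimal sketch, assuming only that the data.table package is installed; the table contents are arbitrary.

```r
library(data.table)

DT <- data.table(x = c("b", "a", "c"), v = 1:3)

# A data.table carries both classes, so data.frame methods still dispatch
class(DT)          # "data.table" "data.frame"
is.data.frame(DT)  # TRUE

# Base functions that treat a data.frame as a list of equal-length columns work as usual
dim(DT)            # 3 2
sapply(DT, class)  # x: "character", v: "integer"
```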

Usage
data.table(..., keep.rownames=FALSE, check.names=FALSE, key=NULL,
           stringsAsFactors=FALSE)

## S3 method for class 'data.table'
x[i, j, by, keyby, with = TRUE,
  nomatch = getOption("datatable.nomatch"),                 # default: NA_integer_
  mult = "all",
  roll = FALSE,
  rollends = if (roll=="nearest") c(TRUE,TRUE)
             else if (roll>=0) c(FALSE,TRUE)
             else c(TRUE,FALSE),
  which = FALSE,
  .SDcols,
  verbose = getOption("datatable.verbose"),                 # default: FALSE
  allow.cartesian = getOption("datatable.allow.cartesian"), # default: FALSE
  drop = NULL, on = NULL]

Arguments
... Just as ... in data.frame. Usual recycling rules are applied to vectors
of different lengths to create a list of equal length vectors.
keep.rownames If ... is a matrix or data.frame, TRUE will retain the rownames of
that object in a column named rn.

check.names Just as check.names in data.frame.

key Character vector of one or more column names which is passed
to setkey. It may be a single comma separated string such
as key="x,y,z", or a vector of names such
as key=c("x","y","z").

stringsAsFactors Logical (default is FALSE). Convert all character columns
to factors?

x A data.table.

i Integer, logical or character vector, single column numeric matrix,
expression of column names, list, data.frame or data.table.

integer and logical vectors work the same way they do
in [.data.frame, except that logical NAs are treated as FALSE.

expression is evaluated within the frame of the data.table (i.e. it
sees column names as if they are variables) and can evaluate to any of the
other types.

character, list and data.frame input to i is converted into
a data.table internally using as.data.table.

If i is a data.table, the columns in i to be matched against x can be
specified in one of these ways:

on argument (see below). It allows for both equi- and the newly
implemented non-equi joins.

If not, x must be keyed. Key can be set using setkey. If i is also keyed,
then the first key column of i is matched against the first key column of x, the second
against the second, etc.

If i is not keyed, then the first column of i is matched against the first key column
of x, the second column of i against the second key column of x, etc.

The number of columns joined is therefore
min(length(key(x)), if (haskey(i)) length(key(i)) else ncol(i)).

Using on= is recommended (even during keyed joins) as it makes the code
easier to understand and also allows for non-equi joins.

When the binary operator == alone is used, an equi join is performed. In
SQL terms, x[i] then performs a right join by default. i prefixed
with ! signals a not-join or not-select.
Support for non-equi joins was recently implemented, which allows for the
other binary operators >=, >, <= and <.
See the vignettes Keys and fast binary search based subset and Secondary indices and
auto indexing.
Advanced: When i is a single variable name, it is not considered an
expression of column names and is instead evaluated in the calling scope.
j When with=TRUE (default), j is evaluated within the frame of the
data.table; i.e., it sees column names as if they are variables. This allows
not just selecting columns in j, but also computing on them, e.g., x[, a]
and x[, sum(a)] return x$a and sum(x$a) as a vector respectively.
x[, .(a, b)] and x[, .(sa=sum(a), sb=sum(b))] each return a two-column
data.table, the first simply selecting columns a, b and the second
computing their sums.

The expression '.()' is a shorthand alias to list(); they both mean the
same. (An exception is made for the use of .() within a call to bquote,
where .() is left unchanged.) As long as j returns a list, each element
of the list becomes a column in the resulting data.table. This is the
default enhanced mode.

When with=FALSE, j can be a vector of column names or positions to
select (as in data.frame), or a logical vector with
length ncol(x) defining columns to select. Note: if a logical vector with
length k < ncol(x) is passed, it will be filled to
length ncol(x) with FALSE, which is different from data.frame,
where the vector is recycled.

Advanced: j also allows the use of special read-only symbols: .SD, .N, .I, .GRP, .BY.

Advanced: When i is a data.table, the columns of i can be referred
to in j by using the prefix i., e.g., X[Y, .(val, i.val)].
Here val refers to X's column and i.val to Y's.

Advanced: Columns of x can now be referred to using the prefix x., which is
particularly useful during joining to refer to x's join columns as they are
otherwise masked by i's. For example, X[Y, .(x.a-i.a, b), on="a"].
See the Introduction to data.table vignette and the examples.
by Column names are seen as if they are variables (as
in j when with=TRUE). The data.table is then grouped according to
by, and j is evaluated within each group. The order of the rows within
each group is preserved, as is the order of the groups. by accepts:
A single unquoted column name: e.g., DT[, .(sa=sum(a)), by=x]
a list() of expressions of column names: e.g., DT[, .(sa=sum(a)), by=.(x=x>0, y)]
a single character string containing comma separated column names
(where spaces are significant since column names may contain spaces even
at the start or end): e.g., DT[, sum(a), by="x,y,z"]
a character vector of column names: e.g., DT[, sum(a), by=c("x", "y")]
or of the form startcol:endcol: e.g., DT[, sum(a), by=x:z]

Advanced: When i is
a list (or data.frame or data.table), DT[i, j,
by=.EACHI] evaluates j for the groups in 'DT' that each row in i joins
to. That is, you can join (in i) and aggregate (in j) simultaneously. We
call this grouping by each i. See this StackOverflow answer for a more
detailed explanation until we roll out vignettes.
Advanced: In the X[Y, j] form of grouping, the j expression sees
variables in X first, then Y. We call this join inherited scope. If the variable
is not in X or Y then the calling frame is searched, its calling frame, and so
on in the usual way up to and including the global environment.
keyby Same as by, but with an additional setkey() run on the by columns of
the result, for convenience. It is common practice to use 'keyby=' routinely
when you wish the result to be sorted.
with By default with=TRUE and j is evaluated within the frame of x; column
names can be used as variables.

When with=FALSE, j is a character vector of column names, a
numeric/logical vector of column positions to select, or of the
form startcol:endcol, and the value returned is always
a data.table. with=FALSE is often useful in data.table to
select columns dynamically. Note that x[, cols, with=FALSE] is
equivalent to x[, .SD, .SDcols=cols].

nomatch Same as nomatch in match. When a row in i has no match
to x, nomatch=NA (default) means NA is returned. 0 means no rows will
be returned for that row of i.
Use options(datatable.nomatch=0) to change the default value
(used when nomatch is not supplied).
mult When i is a list (or data.frame or data.table)
and multiple rows in x match to the row in i, mult controls which are
returned: "all" (default), "first" or "last".

roll When i is a data.table and its row matches to all but the last x join
column, and its value in the last i join column falls in a gap (including
after the last observation in x for that group), then:

+Inf (or TRUE) rolls the prevailing value in x forward. It is also known
as last observation carried forward (LOCF).
-Inf rolls backwards instead; i.e., next observation carried backward (NOCB).
finite positive or negative number limits how far values are carried forward
or backward.
"nearest" rolls the nearest value instead.
Rolling joins apply to the last join column, generally a date but can be any
variable. It is particularly fast using a modified binary search.
A common idiom is to select a contemporaneous regular time series (dts)
across a set of identifiers
(ids): DT[CJ(ids,dts),roll=TRUE] where DT has a 2-column
key (id,date) and CJ stands for cross join.

rollends A logical vector of length 2 (a single logical is recycled) indicating whether
values falling before the first value or after the last value for a group should
be rolled as well.
If rollends[2]=TRUE, it will roll the last value forward. TRUE by
default for LOCF and FALSE for NOCB rolls.
If rollends[1]=TRUE, it will roll the first value backward. TRUE by
default for NOCB and FALSE for LOCF rolls.
When roll is a finite number, that limit is also applied when rolling the
ends.
which TRUE returns the row numbers of x that i matches to. If NA, returns the
row numbers of i that have no match in x. By default FALSE and the rows
in x that match are returned.

.SDcols Specifies the columns of x to be included in the special symbol .SD, which
stands for Subset of data.table. May be character column names
or numeric positions. This is useful for speed when applying a function
across a subset of (possibly very many) columns; e.g.,
DT[, lapply(.SD, sum), by="x,y", .SDcols=301:350].
For convenient interactive use, the form startcol:endcol is also
allowed (as in by), e.g., DT[, lapply(.SD, sum), by=x:y, .SDcols=a:f]

verbose TRUE turns on status and information messages to the console. Turn this
on by default using options(datatable.verbose=TRUE). The
quantity and types of verbosity may be expanded in future.
allow.cartesian FALSE prevents joins that would result in more
than nrow(x)+nrow(i) rows. This is usually caused by duplicate
values in i's join columns, each of which join to the same group in 'x' over
and over again: a misspecified join. Usually this was not intended and the
join needs to be changed. The word 'cartesian' is used loosely in this
context. The traditional cartesian join is (deliberately) difficult to achieve
in data.table: where every row in i joins to every row
in x (a nrow(x)*nrow(i) row result). 'cartesian' is just meant in a
'large multiplicative' sense.
drop Never used by data.table. Do not use. It needs to be here
because data.table inherits from data.frame. See datatable-faq.

on Indicate which columns in x should be joined with which columns
in i along with the type of binary operator to join with (see non-equi joins
below on this). When specified, this overrides the keys set on x and i.
There are multiple ways of specifying the on argument:
As an unnamed character vector, e.g., X[Y, on=c("a", "b")], used
when columns a and b are common to both X and Y.
Foreign key joins: As a named character vector when the join columns
have different names in X and Y. For example, X[Y, on=c(x1="y1", x2="y2")]
joins X and Y by matching columns x1 and x2 in X with
columns y1 and y2 in Y, respectively.
From v1.9.8, you can also express foreign key joins using the binary
operator ==, e.g. X[Y, on=c("x1==y1", "x2==y2")].
NB: shorthand like X[Y, on=c("a", V2="b")] is also possible if,
e.g., column "a" is common between the two tables.
For convenience during interactive scenarios, it is also possible to
use .() syntax as X[Y, on=.(a, b)].
From v1.9.8, (non-equi) joins using binary operators >=, >, <=, < are
also possible, e.g., X[Y, on=c("x>=a", "y<=b")], or for
interactive use as X[Y, on=.(x>=a, y<=b)].
See the examples as well as the Secondary indices and auto indexing vignette.
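The join-related arguments (on, nomatch, mult, roll and which) interact, so it can help to see them on one small pair of tables. This is a minimal sketch, not taken from the package documentation; the tables X and Y and their columns id, t and val are made up for illustration.

```r
library(data.table)

# Hypothetical example data: X holds observations, Y holds the lookup values
X <- data.table(id = c("a", "a", "b"), t = c(1L, 5L, 2L), val = c(10, 50, 20))
Y <- data.table(id = c("a", "b", "c"), t = c(3L, 2L, 1L))

# Equi join on both columns: rows of Y without a match get NA (default nomatch)
X[Y, on = .(id, t)]

# nomatch=0 drops unmatched rows of Y, i.e. an inner join; only ("b", 2) survives
X[Y, on = .(id, t), nomatch = 0]

# Joining on id only: "a" matches two rows of X, mult="first" keeps the first
X[Y, on = "id", mult = "first"]

# roll=TRUE (LOCF) on the last join column t: ("a", 3) takes the prevailing
# row ("a", 1); "c" has no group in X at all, so it still gets NA
X[Y, on = .(id, t), roll = TRUE]

# which=TRUE returns the matching row numbers of X instead of the rows
X[Y, on = .(id, t), which = TRUE]
```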

Details
data.table builds on base R functionality to reduce 2 types of time: programming time (easier
to write, read, debug and maintain), and compute time (fast and memory efficient). The general
form of data.table syntax is:
DT[ i,  j,  by ] # + extra arguments
    |   |   |
    |   |    -------> grouped by what?
    |    -------> what to do?
     ---> on which rows?

The way to read this out loud is: "Take DT, subset rows by i, then compute j grouped by by."
Here are some basic usage examples expanding on this definition. See the vignette (and examples)
for working examples.
X[, a] # return col 'a' from X as vector. If not found, search in parent frame.
X[, .(a)] # same as above, but return as a data.table.
X[, sum(a)] # return sum(a) as a vector (with same scoping rules as above)
X[, .(sum(a)), by=c] # get sum(a) grouped by 'c'.
X[, sum(a), by=c] # same as above, .() can be omitted in by on single expression for convenience
X[, sum(a), by=c:f] # get sum(a) grouped by all columns in between 'c' and 'f' (both inclusive)

X[, sum(a), keyby=b] # get sum(a) grouped by 'b', and sort that result by the grouping column 'b'
X[, sum(a), by=b][order(b)] # same order as above, but by chaining compound expressions
X[c>1, sum(a), by=c] # get rows where c>1 is TRUE, and on those rows, get sum(a) grouped by 'c'
X[Y, .(a, b), on="c"] # get rows where Y$c == X$c, and select columns 'X$a' and 'X$b' for those rows
X[Y, .(a, i.a), on="c"] # get rows where Y$c == X$c, and then select 'X$a' and 'Y$a' (=i.a)
X[Y, sum(a*i.a), on="c", by=.EACHI] # for *each* 'Y$c', get sum(a*i.a) on matching rows in 'X$c'

X[, plot(a, b), by=c] # j accepts any expression, generates plot for each group and returns no data
# see ?assign to add/update/delete columns by reference using the same consistent interface
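The listing above uses placeholder tables X and Y. Below is a minimal sketch that makes a few of those lines runnable; the column values are invented for illustration, and the column is named c only to mirror the listing.

```r
library(data.table)

# Hypothetical data standing in for X and Y in the listing
X <- data.table(a = 1:4, b = c(2, 4, 6, 8), c = c("p", "p", "q", "r"))
Y <- data.table(c = c("p", "q"), a = c(10L, 20L))

X[, sum(a), by = c]                        # p: 3, q: 3, r: 4
X[a > 1, sum(b), by = c]                   # subset rows first, then group: p: 4, q: 6, r: 8
X[Y, .(xa = a, ya = i.a), on = "c"]        # X's a alongside Y's a (i.a)
X[Y, sum(a * i.a), on = "c", by = .EACHI]  # p: 1*10 + 2*10 = 30, q: 3*20 = 60
```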
A data.table is a list of vectors, just like a data.frame. However:
1. it never has or uses rownames. Rownames based indexing can be done by setting a key of
one or more columns or done ad-hoc using the on argument (now preferred).
2. it has enhanced functionality in [.data.table for fast joins of keyed tables, fast
aggregation, fast last observation carried forward (LOCF) and fast add/modify/delete of
columns by reference with no copy at all.
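Point 1 above in practice: instead of rownames, the row labels live in an ordinary column and are looked up either ad hoc with on= or through a key. A minimal sketch; the id column is a hypothetical stand-in for rownames.

```r
library(data.table)

# With a data.frame, rows are often looked up by rowname
DF <- data.frame(v = c(10, 20, 30), row.names = c("a", "b", "c"))
DF["b", "v"]              # 20

# A data.table keeps the labels in a regular column instead
DT <- data.table(id = c("a", "b", "c"), v = c(10, 20, 30))
DT["b", v, on = "id"]     # ad hoc join on 'id' (preferred): 20

kDT <- copy(DT)
setkey(kDT, id)           # alternatively, key the table once
kDT["b", v]               # joins to the key column 'id': 20
```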

See the See Also section of the help page for several other methods that are available for operating on
data.tables efficiently.

Note
If keep.rownames or check.names are supplied they must be written in full because R does
not allow partial argument names after '...'. For example, data.table(DF,
keep=TRUE) will create a column called "keep" containing TRUE and this is correct
behaviour; data.table(DF, keep.rownames=TRUE) was intended.
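The effect of the partial argument name is easy to check directly. A minimal sketch with an arbitrary two-row DF:

```r
library(data.table)

DF <- data.frame(v = 1:2, row.names = c("r1", "r2"))

# 'keep' follows '...', so it is not matched to keep.rownames;
# it becomes a new column, recycled to the number of rows
data.table(DF, keep = TRUE)           # columns: v, keep

# The full argument name keeps the rownames in a column named rn
data.table(DF, keep.rownames = TRUE)  # columns: rn, v
```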

POSIXlt is not supported as a column type because it uses 40 bytes to store a single datetime.
Such columns are implicitly converted to POSIXct type with a warning. You may also be interested
in IDateTime instead; it has methods to convert to and from POSIXlt.
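A minimal sketch of the conversion described above; the timestamp is arbitrary, and the exact wording of the warning may differ between data.table versions.

```r
library(data.table)

lt <- as.POSIXlt("2020-01-01 12:00:00", tz = "UTC")

# data.table() converts the POSIXlt column to POSIXct and emits a warning
DT <- data.table(when = lt)
class(DT$when)       # "POSIXct" "POSIXt"

# IDateTime() returns a data.table with an integer-backed date (IDate)
# column and time-of-day (ITime) column
IDateTime(DT$when)
```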

Examples
## Not run:
example(data.table) # to run these examples at the prompt
## End(Not run)

DF = data.frame(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)

DT = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)
DF
DT
identical(dim(DT), dim(DF)) # TRUE
identical(DF$a, DT$a) # TRUE
is.list(DF) # TRUE
is.list(DT) # TRUE

is.data.frame(DT) # TRUE

tables()

# basic row subset operations

DT[2] # 2nd row
DT[3:2] # 3rd and 2nd row
DT[order(x)] # no need for order(DT$x)
DT[order(x), ] # same as above. The ',' is optional
DT[y>2] # all rows where DT$y > 2
DT[y>2 & v>5] # compound logical expressions
DT[!2:4] # all rows other than 2:4
DT[-(2:4)] # same

# select|compute columns data.table way

DT[, v] # v column (as vector)
DT[, list(v)] # v column (as data.table)
DT[, .(v)] # same as above, .() is a shorthand alias to list()
DT[, sum(v)] # sum of column v, returned as vector
DT[, .(sum(v))] # same, but return data.table (column autonamed V1)
DT[, .(sv=sum(v))] # same, but column named "sv"
DT[, .(v, v*2)] # return two column data.table, v and v*2
# subset rows and select|compute data.table way
DT[2:3, sum(v)] # sum(v) over rows 2 and 3, return vector
DT[2:3, .(sum(v))] # same, but return data.table with column V1
DT[2:3, .(sv=sum(v))] # same, but return data.table with column sv
DT[2:5, cat(v, "\n")] # just for j's side effect

# select columns the data.frame way

DT[, 2, with=FALSE] # 2nd column, returns a data.table always
colNum = 2
DT[, colNum, with=FALSE] # same, equivalent to DT[, .SD, .SDcols=colNum]
DT[["v"]] # same as DT[, v] but much faster

# grouping operations - j and by

DT[, sum(v), by=x] # ad hoc by, order of groups preserved in result
DT[, sum(v), keyby=x] # same, but order the result on by cols
DT[, sum(v), by=x][order(x)] # same but by chaining expressions together

# fast ad hoc row subsets (subsets as joins)

DT["a", on="x"] # same as x == "a" but uses binary search (fast)
DT["a", on=.(x)] # same, for convenience, no need to quote every column
DT[.("a"), on="x"] # same
DT[x=="a"] # same, single "==" internally optimised to use binary search (fast)
DT[x!="b" | y!=3] # not yet optimized, currently vector scan subset
DT[.("b", 3), on=c("x", "y")] # join on columns x,y of DT; uses binary search (fast)
DT[.("b", 3), on=.(x, y)] # same, but using on=.()
DT[.("b", 1:2), on=c("x", "y")] # no match returns NA
DT[.("b", 1:2), on=.(x, y), nomatch=0] # no match row is not returned
DT[.("b", 1:2), on=c("x", "y"), roll=Inf] # locf, nomatch row gets rolled by previous row
DT[.("b", 1:2), on=.(x, y), roll=-Inf] # nocb, nomatch row gets rolled by next row
DT["b", sum(v*y), on="x"] # on rows where DT$x=="b", calculate sum(v*y)

# all together now

DT[x!="a", sum(v), by=x] # get sum(v) by "x" for each i != "a"
DT[!"a", sum(v), by=.EACHI, on="x"] # same, but using subsets-as-joins
DT[c("b","c"), sum(v), by=.EACHI, on="x"] # same
DT[c("b","c"), sum(v), by=.EACHI, on=.(x)] # same, using on=.()

# joins as subsets
X = data.table(x=c("c","b"), v=8:7, foo=c(4,2))
X

DT[X, on="x"] # right join

X[DT, on="x"] # left join
DT[X, on="x", nomatch=0] # inner join
DT[!X, on="x"] # not join
DT[X, on=c(y="v")] # join using column "y" of DT with column "v" of X
DT[X, on="y==v"] # same as above (v1.9.8+)

DT[X, on=.(y<=foo)] # NEW non-equi join (v1.9.8+)

DT[X, on="y<=foo"] # same as above
DT[X, on=c("y<=foo")] # same as above
DT[X, on=.(y>=foo)] # NEW non-equi join (v1.9.8+)
DT[X, on=.(x, y<=foo)] # NEW non-equi join (v1.9.8+)
DT[X, .(x,y,x.y,v), on=.(x, y>=foo)] # Select x's join columns as well

DT[X, on="x", mult="first"] # first row of each group

DT[X, on="x", mult="last"] # last row of each group
DT[X, sum(v), by=.EACHI, on="x"] # join and eval j for each row in i
DT[X, sum(v)*foo, by=.EACHI, on="x"] # join inherited scope
DT[X, sum(v)*i.v, by=.EACHI, on="x"] # 'i.v' refers to X's v column
DT[X, on=.(x, v>=v), sum(y)*foo, by=.EACHI] # NEW non-equi join with by=.EACHI (v1.9.8+)

# setting keys
kDT = copy(DT) # (deep) copy DT to kDT to work with it.
setkey(kDT,x) # set a 1-column key. No quotes, for convenience.
setkeyv(kDT,"x") # same (v in setkeyv stands for vector)
v="x"
setkeyv(kDT,v) # same
# key(kDT)<-"x" # copies whole table, please use set* functions instead
haskey(kDT) # TRUE
key(kDT) # "x"

# fast keyed subsets

kDT["a"] # subset-as-join on *key* column 'x'
kDT["a", on="x"] # same, being explicit using 'on=' (preferred)

# all together
kDT[!"a", sum(v), by=.EACHI] # get sum(v) for each i != "a"

# multi-column key
setkey(kDT,x,y) # 2-column key
setkeyv(kDT,c("x","y")) # same

# fast keyed subsets on multi-column key

kDT["a"] # join to 1st column of key
kDT["a", on="x"] # on= is optional, but is preferred
kDT[.("a")] # same, .() is an alias for list()
kDT[list("a")] # same
kDT[.("a", 3)] # join to 2 columns
kDT[.("a", 3:6)] # join 4 rows (2 missing)
kDT[.("a", 3:6), nomatch=0] # remove missing
kDT[.("a", 3:6), roll=TRUE] # locf rolling join
kDT[.("a", 3:6), roll=Inf] # same as above
kDT[.("a", 3:6), roll=-Inf] # nocb rolling join
kDT[!.("a")] # not join
kDT[!"a"] # same

# more on special symbols, see also ?"special-symbols"

DT[.N] # last row
DT[, .N] # total number of rows in DT
DT[, .N, by=x] # number of rows in each group
DT[, .SD, .SDcols=x:y] # select columns 'x' and 'y'
DT[, .SD[1]] # first row of all columns
DT[, .SD[1], by=x] # first row of 'y' and 'v' for each group in 'x'
DT[, c(.N, lapply(.SD, sum)), by=x] # get rows *and* sum columns 'v' and 'y' by group
DT[, .I[1], by=x] # row number in DT corresponding to each group
DT[, grp := .GRP, by=x] # add a group counter column
X[, DT[.BY, y, on="x"], by=x] # join within each group

# add/update/delete by reference (see ?assign)

print(DT[, z:=42L]) # add new column by reference
print(DT[, z:=NULL]) # remove column by reference
print(DT["a", v:=42L, on="x"]) # subassign to existing v column by reference
print(DT["b", v2:=84L, on="x"]) # subassign to new column by reference (NA padded)

DT[, m:=mean(v), by=x][] # add new column by reference by group

# NB: postfix [] is shortcut to print()
# advanced usage
DT = data.table(x=rep(c("b","a","c"),each=3), v=c(1,1,1,2,2,1,1,2,2),
y=c(1,3,6), a=1:9, b=9:1)

DT[, sum(v), by=.(y%%2)] # expressions in by

DT[, sum(v), by=.(bool = y%%2)] # same, using a named list to change by column name
DT[, .SD[2], by=x] # get 2nd row of each group
DT[, tail(.SD,2), by=x] # last 2 rows of each group
DT[, lapply(.SD, sum), by=x] # sum of all (other) columns for each group
DT[, .SD[which.min(v)], by=x] # nested query by group

DT[, list(MySum=sum(v),
MyMin=min(v),
MyMax=max(v)),
by=.(x, y%%2)] # by 2 expressions

DT[, .(a = .(a), b = .(b)), by=x] # list columns

DT[, .(seq = min(a):max(b)), by=x] # j is not limited to just aggregations
DT[, sum(v), by=x][V1<20] # compound query
DT[, sum(v), by=x][order(-V1)] # ordering results
DT[, c(.N, lapply(.SD,sum)), by=x] # get number of observations and sum per group
DT[, {tmp <- mean(y);
      .(a = a-tmp, b = b-tmp)
      }, by=x] # anonymous lambda in 'j', j accepts any valid
               # expression. TO REMEMBER: every element of
               # the list becomes a column in result.
pdf("new.pdf")
DT[, plot(a,b), by=x] # can also plot in 'j'
dev.off()

# using rleid, get max(y) and min of all cols in .SDcols for each consecutive run of 'v'
DT[, c(.(y=max(y)), lapply(.SD, min)), by=rleid(v), .SDcols=v:b]

# Support guide and links:

# https://github.com/Rdatatable/data.table/wiki/Support

## Not run:
if (interactive()) {
vignette("datatable-intro")
vignette("datatable-reference-semantics")
vignette("datatable-keys-fast-subset")
vignette("datatable-secondary-indices-and-auto-indexing")
vignette("datatable-reshape")
vignette("datatable-faq")

test.data.table() # over 6,000 low level tests

# keep up to date with latest stable version on CRAN
update.packages()

# get the latest devel version (compiled binary for Windows available -- no tools needed)
# https://github.com/Rdatatable/data.table/wiki/Installation
}

## End(Not run)

11 pages
DP Bluetooth 14044 Drivers
No ratings yet
DP Bluetooth 14044 Drivers
583 pages
AEC-Q007-001_Rev-(中)
No ratings yet
AEC-Q007-001_Rev-(中)
38 pages
Pub Mac Os X Hacks
No ratings yet
Pub Mac Os X Hacks
557 pages
Attendence Management System
100% (2)
Attendence Management System
78 pages
Techniques For Power System Protection and Control: Dr. Mohamed Amer Hassan Abobmahdi
No ratings yet
Techniques For Power System Protection and Control: Dr. Mohamed Amer Hassan Abobmahdi
6 pages
Unit 1 - A&r - Koe-091
No ratings yet
Unit 1 - A&r - Koe-091
20 pages
Wireless Communications 1st Edition Bin Tian - Instantly access the complete ebook with just one click
No ratings yet
Wireless Communications 1st Edition Bin Tian - Instantly access the complete ebook with just one click
56 pages
Unit 7 Assessment - Attempt Review - Saylor Academy
No ratings yet
Unit 7 Assessment - Attempt Review - Saylor Academy
18 pages
Dynamic Data Analysis PDF
No ratings yet
Dynamic Data Analysis PDF
557 pages
Spare Parts For CD / DVD & Phono: Section 05
No ratings yet
Spare Parts For CD / DVD & Phono: Section 05
30 pages
VPN
No ratings yet
VPN
4 pages
TLE-ICT-10-Q4-INC-Week1-4 - (Key Concepts) - 012724
100% (1)
TLE-ICT-10-Q4-INC-Week1-4 - (Key Concepts) - 012724
8 pages