Title: | Tools for Data Manipulation |
---|---|
Description: | An implementation of common higher-order functions with syntactic sugar for anonymous functions. Also provides a link to 'dplyr' and 'data.table' for common transformations on data frames, working around non-standard evaluation by default. |
Authors: | Sebastian Warnholz [aut, cre] |
Maintainer: | Sebastian Warnholz <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.5.1 |
Built: | 2025-02-01 02:56:34 UTC |
Source: | https://github.com/wahani/dat |
Convert a formula into a function. See map and extract for examples.
    ## S3 method for class 'formula'
    as.function(x, ...)
x | (formula) see examples |
... | not used |
An object inheriting from class function.
    as.function(~ .)(1)
    as.function(x ~ x)(1)
    as.function(f(x, y) ~ c(x, y))(1, 2)
    as.function(numeric : x ~ x)(1) # check for class
    as.function(numeric(1) : x ~ x)(1) # check for class + length
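The idea of turning a formula into a function can be pictured with a small base R sketch. This is not the package's implementation: `formula_to_function` is a hypothetical helper that only handles the one-sided form, treats `.` as the single argument, and skips the type-checking variants shown in the examples.

```r
# Hypothetical helper (not part of 'dat'): turn a one-sided formula such as
# ~ . + 1 into a function of a single argument named `.`.
formula_to_function <- function(fml) {
  stopifnot(inherits(fml, "formula"), length(fml) == 2L)
  f <- function(.) NULL
  body(f) <- fml[[2L]]                 # the right-hand side becomes the body
  environment(f) <- environment(fml)   # keep the scope the formula was created in
  f
}

identity_fun <- formula_to_function(~ .)
add_one <- formula_to_function(~ . + 1)

identity_fun(1)
add_one(1)
```

Keeping the formula's environment means that free variables in the right-hand side resolve where the formula was written, which is the behaviour one would expect from an anonymous function.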
This is a wrapper around rbindlist to preserve the input class.
    bindRows(x, id = NULL, useNames = TRUE, fill = TRUE)
x | (list) a list of data frames |
id, useNames, fill | passed to rbindlist |
If the first element of x inherits from data.frame, the result has the type of that first element. Otherwise x is returned.
This is a 'data.table' like implementation of a data.frame. Either dplyr or
data.table is used as backend. The only purpose is to have R CMD check
friendly syntax.
    DataFrame(...)

    as.DataFrame(x, ...)

    ## Default S3 method:
    as.DataFrame(x, ...)

    ## S3 method for class 'data.frame'
    as.DataFrame(x, ...)

    ## S3 method for class 'DataFrame'
    x[i, j, ..., by, sby, drop]
... | arbitrary number of args |
x | (DataFrame | data.frame) |
i | (logical | numeric | integer | OneSidedFormula | TwoSidedFormula | FormulaList) see the examples. |
j | (logical | character | TwoSidedFormula | FormulaList | function) character values beginning with '^' are interpreted as regular expressions |
by, sby | (character) variables to group by. by will be used to do transformations within groups. sby will collapse each group to one row. |
drop | (ignored) never drops the class. |
OneSidedFormula is always used for subsetting rows.
TwoSidedFormula is used instead of name-value expressions in summarise and mutate.
    data("airquality")
    dat <- as.DataFrame(airquality)
    dat[~ Month > 4, ][meanWind ~ mean(Wind), sby = "Month"]["meanWind"]
    dat[FL(.n ~ mean(.n), .n = c("Wind", "Temp")), sby = "Month"]
Extract elements from an object as S4 generic function. See the examples.
    extract(x, ind, ...)

    ## S4 method for signature 'list,'function''
    extract(x, ind, ...)

    ## S4 method for signature 'atomic,'function''
    extract(x, ind, ...)

    ## S4 method for signature 'ANY,formula'
    extract(x, ind, ...)

    ## S4 method for signature 'atomicORlist,numericORintegerORlogical'
    extract(x, ind, ...)

    ## S4 method for signature 'ANY,character'
    extract(x, ind, ...)

    ## S4 method for signature 'data.frame,character'
    extract(x, ind, ...)

    extract2(x, ind, ...)

    ## S4 method for signature 'atomicORlist,numericORinteger'
    extract2(x, ind, ...)

    ## S4 method for signature 'ANY,formula'
    extract2(x, ind, ...)

    ## S4 method for signature 'atomicORlist,'function''
    extract2(x, ind, ...)

    ## S4 method for signature 'ANY,character'
    extract2(x, ind, ...)
x | (atomic | list) a vector. |
ind | (function | formula | character | numeric | integer | logical) a formula is coerced into a function. For lists the function is applied to each element (and has to return a logical of length 1). For atomics a vectorized function is expected. If you supply an atomic it is used for subsetting. A character of length 1 beginning with "^" is interpreted as a regular expression. |
... | arguments passed to ind. |
    extract(1:15, ~ 15 %% . == 0)
    extract(list(xy = 1, zy = 2), "^z")
    extract(list(x = 1, z = 2), 1)
    extract(list(x = 1, y = ""), is.character)

    # Example: even numbers:
    is.even <- function(x) (x %% 2) == 0
    sum((1:10)[is.even(1:10)])
    extract(1:10, ~ . %% 2 == 0) %>% sum
    extract(1:10, is.even) %>% sum

    # Example: factors of 15
    extract(1:15, ~ 15 %% . == 0)

    # Example: relative prime numbers
    gcd <- function(a, b) {
      .gcd <- function(a, b) if (b == 0) a else Recall(b, a %% b)
      flatmap(a ~ b, .gcd)
    }

    extract(1:10, x ~ gcd(x, 10) == 1)

    # Example: real prime numbers
    isPrime <- function(n) {
      .isPrime <- function(n) {
        iter <- function(i) {
          if (i * i > n) TRUE
          else if (n %% i == 0 || n %% (i + 2) == 0) FALSE
          else Recall(i + 6)
        }
        if (n <= 1) FALSE
        else if (n <= 3) TRUE
        else if (n %% 2 == 0 || n %% 3 == 0) FALSE
        else iter(5)
      }
      flatmap(n, x ~ .isPrime(x))
    }

    extract(1:10, isPrime)
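For orientation, the index types described for ind correspond roughly to the following base R idioms. This is only an analogy, assuming extract behaves as its argument description states; it neither uses nor reproduces the package's code.

```r
x <- list(xy = 1, zy = 2, a = "text")

# A predicate applied to each list element
# (compare extract(x, is.character)).
by_predicate <- Filter(is.character, x)

# A vectorised predicate on an atomic vector
# (compare extract(1:15, ~ 15 %% . == 0)).
v <- 1:15
factors_of_15 <- v[15 %% v == 0]

# A character index starting with "^", read as a regular expression
# that is matched against the names.
by_regex <- x[grepl("^z", names(x))]

# A plain numeric index is ordinary subsetting.
by_position <- x[1]
```

The point of a single generic is that these four idioms share one call shape, so the index can be swapped without rewriting the surrounding code.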
Function to dynamically generate formulas - (F)ormula (L)ist - to be used in mutar.
    FL(..., .n = NULL, pattern = "\\.n")

    makeFormulas(..., .n, pattern = "\\.n")

    ## S3 method for class 'FormulaList'
    update(object, data, ...)
... | (formulas) |
.n | names to be used in formulas. Can be any object which can be used by extract to select columns. NULL is interpreted to use the formulas without change. |
pattern | (character) pattern to be replaced in formulas |
object | (FormulaList) |
data | (data.frame) |
    FL(.n ~ mean(.n), .n = "variable")
    as(makeFormulas(.n ~ mean(.n), .n = "variable"), "FormulaList")
An implementation of map and flatmap. They support the use of formulas as syntactic sugar for anonymous functions.
    map(x, f, ...)

    ## S4 method for signature 'ANY,formula'
    map(x, f, ...)

    ## S4 method for signature 'atomic,'function''
    map(x, f, ...)

    ## S4 method for signature 'list,'function''
    map(x, f, p = function(x) TRUE, ...)

    ## S4 method for signature 'list,numericORcharacteORlogical'
    map(x, f, ...)

    ## S4 method for signature 'MList,'function''
    map(x, f, ..., simplify = FALSE)

    ## S4 method for signature 'formula,'function''
    map(x, f, ...)

    flatmap(x, f, ..., flatten = unlist)

    ## S4 method for signature 'ANY,formula'
    flatmap(x, f, ..., flatten = unlist)

    sac(x, f, by, ..., combine = bindRows)

    ## S4 method for signature 'data.frame,'function''
    sac(x, f, by, ..., combine = bindRows)

    ## S4 method for signature 'ANY,formula'
    sac(x, f, by, ..., combine = bindRows)

    vmap(x, f, ..., .mc = min(length(x), detectCores()), .bar = "bar")
x | (vector | data.frame | formula) if x inherits from data.frame, a data.frame is returned. Use as.list if this is not what you want. When x is a formula it is interpreted to trigger a multivariate map. |
f | (function | formula | character | logical | numeric) something which can be interpreted as a function. formula objects are coerced to a function. atomics are used for subsetting in each element of x. See the examples. |
... | further arguments passed to the apply function. |
p | (function | formula) a predicate function indicating which columns in a data.frame to use in map. This is a filter for the map operation, the full data.frame is returned. |
simplify | see SIMPLIFY in mapply |
flatten | (function | formula) a function used to flatten the results. |
by | (e.g. character) argument is passed to extract to select columns. |
combine | (function | formula) a function which knows how to combine the list of results. bindRows is the default. |
.mc | (integer) the number of cores. Passed down to mclapply or mcmapply. |
.bar | (character) see verboseApply. |
map will dispatch to lapply. When x is a formula this is interpreted as a multivariate map; this is implemented using mapply. When x is a data.frame map will iterate over columns, however the return value is a data.frame. p can be used to map over a subset of x.

flatmap will dispatch to map. The result is then wrapped by flatten which is unlist by default.

sac is a naive implementation of split-apply-combine and implemented using flatmap.

vmap is a 'verbose' version of map and provides a progress bar and a link to parallel map (mclapply).

map, flatmap, and sac can be extended; they are S4 generic functions. You don't and should not implement a new method for formulas. This method will coerce a formula into a function and pass it down to your map(newtype, function) method.
    # Sugar for anonymous functions
    map(data.frame(y = 1:10, z = 2), x ~ x + 1)
    map(data.frame(y = 1:10, z = 2), x ~ x + 1, is.numeric)
    map(data.frame(y = 1:10, z = 2), x ~ x + 1, x ~ all(x == 2))
    sac(data.frame(y = 1:10, z = 1:2), df ~ data.frame(my = mean(df$y)), "z")

    # Trigger a multivariate map with a formula
    map(1:2 ~ 3:4, f(x, y) ~ x + y)
    map(1:2 ~ 3:4, f(x, y) ~ x + y, simplify = TRUE)
    map(1:2 ~ 3:4, f(x, y, z) ~ x + y + z, z = 1)

    # Extracting values from lists
    map(list(1:2, 3:4), 2)
    map(list(1:3, 2:5), 2:3)
    map(list(1:3, 2:5), c(TRUE, FALSE, TRUE))

    # Some type checking along the way
    map(as.numeric(1:2), numeric : x ~ x)
    map(1:2, integer(1) : x ~ x)
    map(1:2, numeric(1) : x ~ x + 0.5)
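The dispatch described for map, flatmap and sac can be mirrored in base R. The sketch below is an analogy rather than the package's code: it calls lapply, mapply and split directly to show what a plain map, a multivariate map and a split-apply-combine step compute.

```r
# Plain map over a vector: lapply returns a list.
plain <- lapply(1:2, function(x) x + 1)

# Flattening the result with unlist, the default for flatten in flatmap.
flat <- unlist(plain)

# Multivariate map over two vectors, comparable to
# map(1:2 ~ 3:4, f(x, y) ~ x + y).
multi <- mapply(function(x, y) x + y, 1:2, 3:4, SIMPLIFY = FALSE)

# Split-apply-combine: mean of y within each level of z.
df <- data.frame(y = 1:10, z = 1:2)
pieces <- split(df, df$z)
means <- lapply(pieces, function(d) data.frame(my = mean(d$y)))
combined <- do.call(rbind, means)
```

Here rbind stands in for the combine step; the package's default is bindRows, which is documented as a wrapper around rbindlist.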
mutar is literally the same function as [.DataFrame and can be used as an interface to dplyr or data.table. The other functions listed here are a convenience to mimic dplyr's syntax in an R CMD check friendly way.
These functions can also be used with S4 data.frame(s) / data_frame(s) /
data.table(s). They will always try to preserve the input class.
    mutar(x, i, j, ..., by, sby, drop)

    filtar(x, i)

    sumar(x, ..., by)

    withReference(expr)
x | (DataFrame | data.frame) |
i | (logical | numeric | integer | OneSidedFormula | TwoSidedFormula | FormulaList) see the examples. |
j | (logical | character | TwoSidedFormula | FormulaList | function) character values beginning with '^' are interpreted as regular expressions |
... | arbitrary number of args |
by, sby | (character) variables to group by. by will be used to do transformations within groups. sby will collapse each group to one row. |
drop | (ignored) never drops the class. |
expr | (expression) any R expression that should be evaluated using data.table's reference semantics on data transformations. |
The real workhorse of this interface is mutar. All other functions exist to ease the transition from dplyr.

OneSidedFormula is always used for subsetting rows.

TwoSidedFormula is used instead of name-value expressions. Instead of writing x = 1 you simply write x ~ 1.

FormulaList can be used to repeat the same operation on different columns. See more details in FL.
    data("airquality")
    airquality %>%
      filtar(~Month > 4) %>%
      mutar(meanWind ~ mean(Wind), by = "Month") %>%
      sumar(meanWind ~ mean(Wind), by = "Month") %>%
      extract("meanWind")

    airquality %>%
      sumar(
        .n ~ mean(.n) | c("Wind", "Temp"),
        by = "Month"
      )

    # Enable data.tables reference semantics with:
    withReference({
      x <- data.table::data.table(x = 1)
      mutar(x, y ~ 2)
    })

    ## Not run:
    # Use dplyr as back-end:
    options(dat.use.dplyr = TRUE)
    x <- data.frame(x = 1)
    mutar(x, y ~ dplyr::n())

    ## End(Not run)
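The difference between by and sby can be illustrated without the package. In this base R analogy (not the 'dat' implementation), ave plays the role of a within-group transformation, which keeps every row, while aggregate collapses each group to a single row.

```r
data("airquality")

# Within-group transformation (like by): one value per original row.
within_groups <- ave(airquality$Wind, airquality$Month, FUN = mean)

# Collapse per group (like sby): one row per month.
collapsed <- aggregate(Wind ~ Month, data = airquality, FUN = mean)

length(within_groups)  # same as nrow(airquality)
nrow(collapsed)        # number of distinct months
```

The first form is what a grouped mutate produces and the second is what a grouped summarise produces.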
This function replaces elements in a vector. It is a link to replace as a generic function.
    replace(x, ind, values, ...)

    ## S4 method for signature 'ANY,'function''
    replace(x, ind, values, ...)

    ## S4 method for signature 'ANY,formula'
    replace(x, ind, values, ...)

    ## S4 method for signature 'ANY,character'
    replace(x, ind, values, ...)
x | (atomic | list) a vector. |
ind | used as index for elements to be replaced. See details. |
values | the values used for replacement. |
... | arguments passed to |
The idea is to provide a more flexible interface for the specification of the index. It can be a character, numeric, integer or logical which is then simply used in base::replace. It can be a regular expression in which case x should be named – a character of length 1 and a leading "^" is interpreted as regex. When ind is a function (or formula) and x is a list then it should be a predicate function – see the examples. When x is an atomic the function is applied on x and the result is used for subsetting.
    replace(c(1, 2, NA), is.na, 0)
    replace(c(1, 2, NA), rep(TRUE, 3), 0)
    replace(c(1, 2, NA), 3, 0)
    replace(list(x = 1, y = 2), "x", 0)
    replace(list(x = 1, y = 2), "^x$", 0)
    replace(list(x = 1, y = "a"), is.character, NULL)
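The three kinds of index described in the details can be spelled out in base R. The sketch below is an analogy under the assumption that the generic behaves as documented; it is not the package's code.

```r
x <- c(1, 2, NA)

# Function index on an atomic: apply it to x and use the result as the index.
replaced_na <- base::replace(x, is.na(x), 0)

# Regular expression index: match against the names of a named list.
l <- list(x = 1, y = 2)
l[grepl("^x$", names(l))] <- 0

# Predicate on list elements: find the matching positions first,
# then assign to them.
m <- list(x = 1, y = "a")
is_chr <- vapply(m, is.character, logical(1))
m[is_chr] <- list("replaced")
```

In each case the work is in computing a logical index; the replacement itself is ordinary assignment.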
This apply function has a progress bar and enables computations in parallel. By default it is not verbose. For an interactive version that is verbose by default, use vmap.
    verboseApply(x, f, ..., .mc = 1, .mapper = mclapply, .bar = "none")
x | (vector) |
f | (function) |
... | arguments passed to |
.mc | (integer) the number of processes to start |
.mapper | (function) the actual apply function used. Should have an argument |
.bar | (character) one of 'none', '.' or 'bar' |
    ## Not run:
    verboseApply(
      1:4,
      function(...) Sys.sleep(1),
      .bar = "bar",
      .mc = 2
    )

    ## End(Not run)
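An apply function with a progress bar can be sketched in base R as follows. `apply_with_bar` is a hypothetical helper, not part of 'dat'; it runs sequentially and uses utils::txtProgressBar, so it shows only the progress-reporting idea and none of the parallel behaviour.

```r
# Hypothetical helper (not part of 'dat'): a sequential apply that
# advances a text progress bar after each element.
apply_with_bar <- function(x, f, ...) {
  if (length(x) == 0L) return(list())   # txtProgressBar needs max > min
  bar <- utils::txtProgressBar(min = 0, max = length(x), style = 3)
  on.exit(close(bar))
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)
    utils::setTxtProgressBar(bar, i)
  }
  out
}

res <- apply_with_bar(1:4, function(i) i^2)
```

The sketch assumes f never returns NULL, since assigning NULL with `[[<-` would drop the element from the result list.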