Provides utility functions for, and drawing on, the 'data.table' package. The package also collates useful miscellaneous functions extending base R not available elsewhere. The name is a portmanteau of 'utils' and the author.
My name is Hugh and this is a package of functions I often put in
The package attempts to provide lightweight, fast, and stable functions for common operations.
By lightweight, I mean in terms of dependencies: we import package:data.table and package:fastmatch, which do require compilation, but in C. Since so many operations handle data frames, the data.table dependency is worthwhile; besides, its compile time is not too onerous.
Otherwise, no dependency requires compilation. (I also try to minimize the cardinality of package imports, but it's mostly the compile time I'm concerned about.)
By fast, I mean essentially as fast as possible without using compilation.
By stable, I mean that unit tests should not change unless the major version also changes. To make this completely transparent, tests include the version of their introduction and are guaranteed to not be modified (not even in the sense of adding extra, independent tests) while the major version is 1. Tests that do not include the version in their filename may be modified from version to version (though this will be avoided).
%<->%, to swap values between objects
average_bearing, the bearing bisecting two vectors
dir2, (Windows only) a much faster version of
Mode, statistical mode
replace_pattern_in, to find-and-replace on a pattern in all files in a directory
samp, a 'safe' version of
drop_empty_cols should now be faster, especially when there are few empty columns.
rows.out < 1 to produce a sample.
weight2rows is now faster for default arguments, by using the rep(x, w) trick used in
mutate_ntile now works for a variable with
quantile but for weighted data
dplyr::ntile but for weighted data
mutate_ntile: convenience function for adding a new column with
trim_common_affixes, and associated helpers
rows.out argument to specify the number of rows in the result.
RQ(p, yes, no): short for if (!requireNamespace("p", quietly = TRUE)) yes else no.
isAttached: for conveniently determining whether a namespace is attached
ahull: for locating rectangles in a plot, as for automatically locating a text box.
Switch: vectorized version of switch to avoid nested
drop_grep is an alias for
if_else reports a clearer error message when length(condition) == 1.
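The RQ(p, yes, no) entry above spells out its own expansion, so it can be sketched directly in base R. This is a minimal illustration, not the package's source: the name RQ_sketch is hypothetical, and it assumes p is passed as a character string.

```r
# Hypothetical stand-in for RQ(); follows the expansion quoted in the entry above.
RQ_sketch <- function(p, yes, no) {
  # `yes` is returned when package `p` is NOT available, `no` when it is.
  # R evaluates arguments lazily, so only the chosen branch is computed.
  if (!requireNamespace(p, quietly = TRUE)) yes else no
}

# 'stats' ships with R, so the `no` branch is taken here.
status <- RQ_sketch("stats", "not installed", "installed")
print(status)
```

Because the unused branch is never evaluated, an expensive or failing expression can be passed safely as the alternative.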
Change of stable test:
which_lines to allow multiple lines per file, not just the first (the default).
auc: area under the curve given predicted and actual values.
select_grep: select columns matching a pattern.
dev_copy2a4: convenience function for copying to an A4 PDF.
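For readers unfamiliar with the quantity that auc reports, here is a base R sketch of the area under the ROC curve computed with the rank-sum (Mann-Whitney) identity. The function name auc_sketch and its argument order are assumptions for illustration, and the package's own auc may be implemented differently.

```r
# Hypothetical helper: area under the ROC curve via the rank-sum identity.
# `actual` holds the true classes (0/1 or logical); `pred` holds the scores.
auc_sketch <- function(actual, pred) {
  stopifnot(length(actual) == length(pred), !anyNA(actual), !anyNA(pred))
  actual <- as.logical(actual)
  n_pos <- sum(actual)
  n_neg <- sum(!actual)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(pred)  # ties receive average ranks
  (sum(r[actual]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

actual <- c(0, 0, 1, 1)
pred <- c(0.1, 0.4, 0.35, 0.8)
print(auc_sketch(actual, pred))  # 0.75
```

The result is the proportion of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half.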
Other minor changes:
if_else should be slightly faster when the condition contains NAs. Before: 3.6 ms; now: 1.5 ms (for 100,000 entries; see vignette).
drop_constant_cols first checks whether the first and second entries are identical before working out the number of unique values.
%notchin% for a 'safer' alternative to
implies, logical implies.
print_transpose_data_table: for glimpsing data tables by rows
drop_empty_cols retains non-empty columns when duplicate names are used.
coalesce errors if there is
...wrongly contains factors.
if_else to reflect dplyr's formals so it can be a drop-in replacement.
NEWS.md file to track changes to the package.
%enotin%: avoid misspellings in filters
drop_colr: drop columns matching pattern
ngrep: negate regular expression
set_colsuborder: change the order of some columns without affecting the order of others
weight2rows: convert a weighted data.table to an unweighted one by repeating rows by the weight
if_else: lightweight versions of
set_cols_first|last now respects the order of the supplied columns
mutate_other now accepts a mass argument as another way to generate an 'Other' column.
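The weight2rows entries above describe turning a weighted table into an unweighted one by repeating each row according to its weight. A minimal base R sketch of that idea on a plain data frame follows. The name weight2rows_sketch, the weight_col argument, and the choice to keep the weight column in the result are all assumptions for illustration, not the package's interface.

```r
# Hypothetical helper: expand a weighted data frame by repeating rows.
# Assumes non-negative whole-number weights in the column named `weight_col`.
weight2rows_sketch <- function(df, weight_col) {
  w <- df[[weight_col]]
  stopifnot(is.numeric(w), !anyNA(w), all(w >= 0), all(w == round(w)))
  # Repeat each row index by its weight, then subset once.
  out <- df[rep(seq_len(nrow(df)), times = w), , drop = FALSE]
  rownames(out) <- NULL
  out
}

df <- data.frame(group = c("a", "b"), w = c(2, 3))
res <- weight2rows_sketch(df, "w")
print(res$group)  # "a" "a" "b" "b" "b"
```

Indexing once with a repeated row vector avoids growing the result row by row, which is the appeal of the rep-based approach.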