Provides utility functions for, and drawing on, the 'data.table' package. The package also collates useful miscellaneous functions extending base R not available elsewhere. The name is a portmanteau of 'utils' and the author.
Miscellaneous R functions and aliases
My name is Hugh and this is a package of functions I often put in R/utils.s. Hence, hutils.
The package attempts to provide lightweight, fast, and stable functions for common operations.
By lightweight, I mean in terms of dependencies: we import package:data.table and package:fastmatch, which do require compilation, but in C. Since so many operations handle data frames, data.table seemed worthwhile, and besides, its compile time is not too onerous. None of the other dependencies require compilation. (I also try to minimize the number of package imports, but it's mostly the compile time I'm focused on.)
By fast, I mean essentially as fast as possible without using compilation.
By stable, I mean that unit tests should not change unless the major version also changes. To make this completely transparent, tests include the version of their introduction and are guaranteed to not be modified (not even in the sense of adding extra, independent tests) while the major version is 1. Tests that do not include the version in their filename may be modified from version to version (though this will be avoided).
New functions:
- %<->% to swap values between objects
- average_bearing, the bearing bisecting two vectors
- dir2, (Windows only) a much faster version of dir()
- Mode, statistical mode
- replace_pattern_in to find-and-replace on a pattern in all files in a directory
- samp, a 'safe' version of sample.

Enhancements:
- drop_empty_cols should now be faster, especially when there are few empty columns.
- weight2rows supports rows.out < 1 to produce a sample.
- weight2rows is now faster for default arguments, by using the rep(x, w) trick used in tidyr::uncount.
- mutate_ntile now works for a variable with DT
- find_pattern_in now accepts file_contents_ignore_case.
- find_pattern_in may

Bug fixes:
- weight2rows:
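
The rep(x, w) trick mentioned under Enhancements can be sketched in base R. This is only an illustration of the idea, not the code used by weight2rows; the data frame and its column names (id, w) are made up for the example.

```r
# Hypothetical weighted data; the column names are assumptions for illustration.
df <- data.frame(id = c("a", "b", "c"), w = c(2L, 0L, 3L))

# Repeat each row index as many times as its weight, then subset by that index.
idx <- rep(seq_len(nrow(df)), times = df$w)
unweighted <- df[idx, "id", drop = FALSE]
rownames(unweighted) <- NULL

# Rows with weight 0 disappear; the others appear once per unit of weight.
unweighted
```

Because the whole expansion is a single vectorised call to rep, no loop over rows is needed, which is presumably why the approach is fast for integer weights.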
New functions:
- weighted_quantile, like quantile but for weighted data
- weighted_ntile, like dplyr::ntile but for weighted data
- mutate_ntile, convenience function for adding new column with ntiles
- trim_common_affixes, and associated helpers longest_prefix and longest_suffix.

Enhancements:
- weight2rows gains a rows.out argument to specify the number of rows in the result.

New functions:
- RQ(p, yes, no), short for if (!requireNamespace("p", quietly = TRUE)) yes else no.
- isAttached, for conveniently determining whether a namespace is attached
- ahull, for locating rectangles in a plot, as for automatically locating a text box.
- Switch, vectorized version of switch to avoid nested if_else's.
- drop_grep is an alias for drop_colr.
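
The expansion given for RQ above can be written as a small base R function. The sketch below uses a hypothetical name, rq_sketch, and assumes the package name is passed as a character string; the real RQ may accept its first argument differently.

```r
# Hypothetical stand-in for illustration; not the package's own implementation.
rq_sketch <- function(p, yes, no) {
  # `yes` is returned when package `p` is NOT available, `no` when it is.
  if (!requireNamespace(p, quietly = TRUE)) yes else no
}

# 'stats' ships with R, so the `no` branch is returned here.
rq_sketch("stats", "not installed", "installed")

# R evaluates arguments lazily, so the unused branch is never run.
rq_sketch("stats", stop("never evaluated"), "installed")
```

Lazy evaluation is what makes this shorthand practical: the fallback can be an expensive expression or an error, and it costs nothing when the package is present.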

Minor changes:
- if_else reports a clearer error message when length(condition) == 1.

Change of stable test:

Bug fix:
- find_pattern_in respects include.comments

Enhancement:
- find_pattern_in accepts argument which_lines to allow multiple lines per file, not just the first (the default).

New functions:
- auc: area under the curve given predicted and actual values.
- select_grep: select columns matching a pattern.
- dev_copy2a4: convenience function for copying to an A4 PDF.

Other minor changes:
- if_else should be slightly faster when the condition contains NAs. Before: 3.6 ms; now: 1.5 ms (for 100,000 entries; see vignette).
- drop_constant_cols first checks whether the first and second entries are identical before working out the number of unique values.
- %notchin% for a 'safer' alternative to %notin%.
- implies, logical implies.
- drop_constant_cols
- print_transpose_data_table for glimpsing data tables by rows
- pow for exponentiation.
- drop_empty_cols retains non-empty columns when duplicate names are used.
- coalesce errors if ... wrongly contains factors.
- if_else to reflect dplyr's formals so it can be a drop-in replacement.
- missing value in if_else when length-one condition.
- NEWS.md file to track changes to the package.
- %ein% and %enotin%: avoid misspellings in filters
- AND, NEITHER, NOR, OR, nor, neither: logical aliases
- drop_colr: drop columns matching pattern
- ngrep: negate regular expression
- select_which: similar to dplyr::select_if
- set_colsuborder: change the order of some columns without affecting the order of others
- weight2rows: convert a weighted data.table to an unweighted one by repeating rows by the weight
- coalesce and if_else: lightweight versions of dplyr:: equivalents
- set_cols_first|last now respects the order of the supplied columns
- mutate_other now accepts a mass argument as another way to generate an 'Other' column.
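
To illustrate what a dependency-free coalesce of the kind listed above might look like, here is a minimal base R sketch. The name coalesce_sketch is hypothetical and the code is an assumption about the general technique (fill missing values from successive candidates), not the package's implementation.

```r
# Hypothetical base-R sketch; the name and details are assumptions.
coalesce_sketch <- function(x, ...) {
  for (y in list(...)) {
    missing_x <- is.na(x)
    # Recycle a length-one replacement to the length of x before filling.
    x[missing_x] <- rep_len(y, length(x))[missing_x]
  }
  x
}

# Each NA is filled from the first later argument that has a value there.
coalesce_sketch(c(1, NA, NA), c(NA, 2, NA), 0)
# returns 1 2 0
```

The sketch does no type checking, which is the kind of gap the factor check noted in the list above is meant to guard against.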