Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends.
One of the most difficult parts of any graphics package is scaling, converting from data values to perceptual properties. The inverse of scaling, making guides (legends and axes) that can be used to read the graph, is often even harder! The scales package provides the internal scaling infrastructure to ggplot2, and its functions allow users to customize the transformations, breaks, guides and palettes used in visualizations.
The idea of the scales package is to implement scales in a way that is graphics system agnostic, so that everyone can benefit by pooling knowledge and resources about this tricky topic.
# Scales is installed when you install ggplot2 or the tidyverse.
# But you can install just scales from CRAN:
install.packages("scales")

# Or the development version from Github:
# install.packages("devtools")
devtools::install_github("r-lib/scales")
Outside of ggplot2, where it powers all the aesthetic scales, axis formatting, and data transformations internally, the scales package also provides useful helper functions for formatting numeric data for all types of presentation.
library(scales)
set.seed(1234)

# percent() function takes a numeric and does your division and labelling for you
percent(c(0.1, 1 / 3, 0.56))
#> [1] "10.0%" "33.3%" "56.0%"

# comma() adds commas into large numbers for easier readability
comma(10e6)
#> [1] "10,000,000"

# dollar() adds currency symbols
dollar(c(100, 125, 3000))
#> [1] "$100" "$125" "$3,000"

# unit_format() adds unique units
# the scale argument can do simple conversion on the fly
unit_format(unit = "ha", scale = 1e-4)(c(10e6, 10e4, 8e3))
#> [1] "1 000 ha" "10 ha" "1 ha"
All of these formatters are based on the underlying number() formatter which has additional arguments that allow further customisation. This can be especially useful for meeting diverse international standards.
# for instance, European number formatting is easily set:
number(c(12.3, 4, 12345.789, 0.0002), big.mark = ".", decimal.mark = ",")
#> [1] "12" "4" "12.346" "0"

# these functions round by default, but you can set the accuracy
number(c(12.3, 4, 12345.789, 0.0002),
  big.mark = ".",
  decimal.mark = ",",
  accuracy = .01
)
#> [1] "12,30" "4,00" "12.345,79" "0,00"

# percent formatting in the French style
french_percent <- percent_format(decimal.mark = ",", suffix = " %")
french_percent(runif(10))
#> [1] "11,4 %" "62,2 %" "60,9 %" "62,3 %" "86,1 %" "64,0 %" "0,9 %"
#> [8] "23,3 %" "66,6 %" "51,4 %"

# currency formatting Euros (and simple conversion!)
usd_to_euro <- dollar_format(prefix = "", suffix = "\u20ac", scale = .85)
usd_to_euro(100)
#> [1] "85€"
These palette functions power the scales in ggplot2, but you can use them in any plotting system. The following example shows how you might apply them to a base plot.
# pull a list of colours from any palette
viridis_pal()(4)
#> [1] "#440154FF" "#31688EFF" "#35B779FF" "#FDE725FF"

# use in combination with baseR `palette()` to set new defaults
palette(brewer_pal(palette = "Set2")(4))
plot(Sepal.Length ~ Sepal.Width, data = iris, col = Species, pch = 20)
scales provides a handful of functions for rescaling data to fit new ranges.
# squish() will squish your values into a specified range
squish(c(-1, 0.5, 1, 2, NA), range = c(0, 1))
#> [1] 0.0 0.5 1.0 1.0 NA

# Useful for setting the `oob` argument for a colour scale with reduced limits
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Petal.Length)) +
  geom_point() +
  scale_color_continuous(limits = c(2, 4), oob = scales::squish)
# the rescale functions can rescale continuous vectors to new min, mid, or max values
x <- runif(5, 0, 1)
rescale(x, to = c(0, 50))
#> [1] 32.063194 20.465217 0.000000 50.000000 0.747796
rescale_mid(x, mid = .25)
#> [1] 0.8293505 0.7190081 0.5243035 1.0000000 0.5314180
rescale_max(x, to = c(0, 50))
#> [1] 37.55502 29.50807 15.30882 50.00000 15.82766
scales also gives users the ability to define and apply their own custom transformation functions for repeated use.
# use trans_new to build a new transformation
logp3_trans <- trans_new(
  name = "logp",
  trans = function(x) log(x + 3),
  inverse = function(x) exp(x) - 3,
  breaks = log_breaks()
)

library(dplyr)
dsamp <- sample_n(diamonds, 100)

ggplot(dsamp, aes(x = carat, y = price, colour = color)) +
  geom_point() +
  scale_y_continuous(trans = logp3_trans)
# You can always call the functions from the trans object separately
logp3_trans$breaks(dsamp$price)
#> [1] 300 1000 3000 10000 30000

# scales has some breaks helper functions too
log_breaks(base = exp(1))(dsamp$price)
#> [1] 403.4288 1096.6332 2980.9580 8103.0839 22026.4658
pretty_breaks()(dsamp$price)
#> [1] 0 5000 10000 15000 20000
comma_format(), percent_format() and unit_format() gain new arguments: accuracy, scale, prefix, suffix, decimal.mark, big.mark (@larmarange, #146).
dollar_format() gains new arguments: accuracy, scale, decimal.mark, trim (@larmarange, #148).
New number_bytes_format() and number_bytes() format numeric vectors into byte measurements (@hrbrmstr, @dpseidel).
New number_format() provides a generic formatter for numbers (@larmarange, #142).
New pvalue_format() formats p-values (@larmarange, #145).
ordinal_format() gains new arguments: prefix, suffix, big.mark, rules; rules for French and Spanish are also provided (@larmarange, #149).
scientific_format() gains new arguments: scale, prefix, suffix, decimal.mark, trim (@larmarange, #147).
New time_format() formats POSIXt and hms objects (@dpseidel, #88).
boxcox_trans() is now invertible for x >= 0 and requires positive values. A new argument offset allows specification of both type-1 and type-2 Box-Cox transformations (@dpseidel, #103).
log_breaks() returns integer multiples of integer powers of base when finer breaks are needed (@ThierryO, #117).
New function modulus_trans() implements the modulus transformation for positive and negative values (@dpseidel).
New pseudo_log_trans() for transforming numerics into a signed logarithmic scale with a smooth transition to a linear scale around 0 (@lepennec, #106).
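As a rough illustration of the pseudo_log_trans() entry above, the sketch below applies the transformation with its default settings to a vector that contains negative values and zero. The input vector is made up for the example, and the checks only demonstrate the properties described in the entry (finite output, preserved sign, invertibility).

```r
library(scales)

# Build the transformation with its default settings.
tr <- pseudo_log_trans()

# Hypothetical input spanning negative values, zero, and positive values.
x <- c(-1000, -10, 0, 10, 1000)
y <- tr$transform(x)

# Zero and negative inputs stay finite, unlike with log().
all(is.finite(y))

# The sign of each input is preserved.
identical(sign(y), sign(x))

# Applying the inverse recovers the original values (up to floating-point error).
isTRUE(all.equal(tr$inverse(y), x))
```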
scales functions now work as expected when used inside a for loop. In previous package versions, if a scales function was called with variable custom parameters inside a for loop, some of the parameters were not evaluated until the end of the loop, because of how R's lazy evaluation works (@zeehio, #81).
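The lazy-evaluation pitfall described above can be reproduced in base R without scales. The sketch below uses two hypothetical function factories, make_scaler_lazy() and make_scaler_forced(); they are illustrative names, not part of the package, and the sketch shows the general mechanism rather than the package's actual fix.

```r
# Without force(): `k` is a promise that is only evaluated on first use.
make_scaler_lazy <- function(k) {
  function(x) x * k
}

# With force(): `k` is evaluated as soon as the factory is called.
make_scaler_forced <- function(k) {
  force(k)
  function(x) x * k
}

lazy <- list()
forced <- list()
for (i in 1:3) {
  lazy[[i]] <- make_scaler_lazy(i)
  forced[[i]] <- make_scaler_forced(i)
}

# By now `i` is 3, so the first lazy closure multiplies by 3 instead of 1.
lazy[[1]](10)
#> [1] 30
forced[[1]](10)
#> [1] 10
```

Calling force() on each argument at the top of a function factory is the usual way to make the returned closure independent of later changes to the loop variable.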
colour_ramp() now uses alpha = TRUE by default (@clauswilke, #108).
date_breaks() now supports subsecond intervals (@dpseidel, #85).
Removes dichromat and plyr dependencies. dichromat is now suggested (@dpseidel, #118).
expand_range() arguments mul and add now affect scales with a range of 0 (@dpseidel, ggplot2-2281).
extended_breaks() now allows user specification of the labeling::extended() argument only.loose to permit more flexible breaks specification (@dpseidel, #99).
New rescale() and rescale_mid() methods support dist objects (@zeehio, #105).
rescale_mid() now properly handles NAs (@foo-bar-baz-qux, #104).
New function regular_minor_breaks() calculates minor breaks as a property of the transformation (@karawoo).
Adds viridis_pal() for creating palettes with color maps from the viridisLite package (@karawoo).
Switched from reference classes to R6 (#96).
rescale() and rescale_mid() are now S3 generics, and work with numeric, Date, POSIXct, POSIXlt and bit64::integer64 objects (@zeehio, #74).
extended_breaks() no longer fails on pathological inputs.
New hms_trans() for transforming hms time vectors.
train_discrete() gets a new na.rm argument which controls whether NAs are preserved or dropped.
Switched from NEWS to NEWS.md.
manual_pal() produces a warning if n is greater than the number of values in the palette (@jrnold, #68).
precision(0) now returns 1, which means percent(0) now returns 0% (#50).
scale_continuous() uses a more correct check for numeric values.
NaN is correctly recognised as a missing value by the gradient palettes (ggplot2-1482).
rescale() preserves missing values in input when the range of x is (effectively) 0 (ggplot2-985).
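To make the zero-range behaviour of rescale() concrete, here is a small sketch with a made-up input in which every non-missing value is the same. It assumes the default target range of 0 to 1, so the non-missing values land on the midpoint.

```r
library(scales)

# All non-missing values are identical, so the range of x is zero.
x <- c(2, 2, NA, 2)

# Non-missing values map to the middle of the target range; NA stays NA.
res <- rescale(x)
res
#> [1] 0.5 0.5  NA 0.5
```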
Continuous colour palettes now use colour_ramp() instead of colorRamp(). This only supports interpolation in Lab colour space, but is hundreds of times faster.
date_format() gains an option to specify time zone (#51).
dollar_format() is now more flexible and can add either prefixes or suffixes for different currencies (#53). It gains a negative_parens argument to show negative values as ($100) and now passes missing values through unchanged (@dougmitarotonda, #40).
New ordinal_format() generates ordinal numbers (1st, 2nd, etc) (@aaronwolen, #55).
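A short sketch of ordinal_format() with its default (English) rules follows. The input values are chosen for illustration so that the teens, which take "th" rather than "st", "nd" or "rd", are covered.

```r
library(scales)

# ordinal_format() returns a function that formats numbers as ordinals.
to_ordinal <- ordinal_format()

labels <- to_ordinal(c(1, 2, 3, 4, 11, 12, 13, 21, 22, 23))
labels
#> [1] "1st"  "2nd"  "3rd"  "4th"  "11th" "12th" "13th" "21st" "22nd" "23rd"
```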
New unit_format() makes it easier to add units to labels, optionally scaling (@ThierryO, #46).
New wrap_format() function to wrap character vectors to a desired width (@jimhester, #37).
New color scaling functions col_numeric(), col_bin(), col_quantile(), and col_factor(). These functions provide concise ways to map continuous or categorical values to color spectra.
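As a minimal sketch of the colour scaling functions, the example below uses col_numeric() to build a mapping from a numeric domain to a two-colour gradient. The palette and domain are arbitrary choices for illustration; the result is a function that returns one colour string per input value.

```r
library(scales)

# Map the numeric domain 0-10 onto a white-to-red gradient.
pal <- col_numeric(palette = c("white", "red"), domain = c(0, 10))

# The returned function converts values into hexadecimal colour strings.
cols <- pal(c(0, 5, 10))
cols
```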
New colour_ramp() function for performing color interpolation in the CIELAB color space (like grDevices::colorRamp(space = 'Lab'), but much faster).
boxcox_trans() returns correct value when p is close to zero (#31).
dollar() and percent() both correctly return a zero length string for zero length input (@BrianDiggs, #35).
brewer_pal() gains a direction argument to easily invert the order of colours (@jiho, #36).
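The effect of the direction argument can be checked with a quick comparison. This sketch assumes the "Blues" ColorBrewer palette and five colours, both picked arbitrarily.

```r
library(scales)

# Default direction: colours in the palette's original order.
forward <- brewer_pal(palette = "Blues")(5)

# direction = -1 returns the same colours in reverse order.
reversed <- brewer_pal(palette = "Blues", direction = -1)(5)

all(reversed == rev(forward))
#> [1] TRUE
```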
show_col() has additional options to showcase colors better (@jiho, #52).
Relaxed tolerance in zero_range() to .Machine$double.eps * 1000 (#33).
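The tolerance used by zero_range() can be probed with values just inside and just outside it. The sketch below assumes the default tolerance stated in the entry above; the multipliers 99 and 1001 are chosen only to fall clearly on either side of it.

```r
library(scales)

eps <- .Machine$double.eps

# A difference well inside the tolerance counts as a zero-width range.
zero_range(c(1, 1 + 99 * eps))
#> [1] TRUE

# A difference just beyond the tolerance does not.
zero_range(c(1, 1 + 1001 * eps))
#> [1] FALSE

# An ordinary range is never treated as zero.
zero_range(c(1, 2))
#> [1] FALSE
```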
Eliminate stringr dependency.
Fix outstanding errors in R CMD check.
floor_time() calls to_time(), but that function was moved into a function so it was no longer available in the scales namespace. Now floor_time() has its own copy of that function (Thanks to Stefan Novak).
Color palettes generated by brewer_pal() no longer give warnings when fewer than 3 colors are requested (@wch).
abs_area() and rescale_max() functions have been added, for scaling the area of points to be proportional to their value. These are used by scale_size_area() in ggplot2.
zero_range() has improved behaviour thanks to Brian Diggs.
brewer_pal() complains if you give it an incorrect palette type. (Fixes #15, thanks to Jean-Olivier Irisson).
shape_pal() warns if asked for more than 6 values. (Fixes #16, thanks to Jean-Olivier Irisson).
time_trans() gains an optional argument tz to specify the time zone to use for the times. If not specified, it will be guessed from the first input with a non-null time zone.
date_trans() and time_trans() now check that their inputs are of the correct type. This prevents ggplot2 scales from silently giving incorrect outputs when given incorrect inputs.
Change the default breaks algorithm for cbreaks() and trans_new(). Previously it was pretty_breaks(), and now it's extended_breaks(), which uses the extended() algorithm from the labeling package.
fixed namespace problem with fullseq().
suppressWarnings from train_continuous() so zero-row or all infinite data frames don't potentially cause problems.
check for zero-length colour in gradient_n_pal().
added extended_breaks() which implements an extension to Wilkinson's labelling approach, as implemented in the labeling package. This should generally produce nicer breaks than pretty_breaks().
alpha() can now preserve existing alpha values if the alpha argument is missing.
log_breaks() always gives breaks evenly spaced on the log scale, never evenly spaced on the data scale. This will result in really bad breaks for some ranges (e.g. 0.5-0.6), but you probably shouldn't be using log scales in that situation anyway.
censor() and squish() gain only.finite argument and default to operating only on finite values. This is needed for ggplot2, and reflects the use of Inf and -Inf as special values.
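The only.finite argument described above can be illustrated with a vector that mixes finite and infinite values. The input and the target range of 0 to 1 are made up for this sketch.

```r
library(scales)

x <- c(-Inf, -1, 0.5, 2, Inf)

# By default only finite out-of-range values are squished; infinities pass through.
squish(x, range = c(0, 1))
#> [1] -Inf  0.0  0.5  1.0  Inf

# With only.finite = FALSE, infinite values are squished as well.
squish(x, range = c(0, 1), only.finite = FALSE)
#> [1] 0.0 0.0 0.5 1.0 1.0

# censor() replaces finite out-of-range values with NA and keeps infinities.
censor(x, range = c(0, 1))
#> [1] -Inf   NA  0.5   NA  Inf
```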
bounds functions now force evaluation of range to avoid bug with S3 method dispatch inside primitive functions (e.g. [).
Simplified algorithm for discrete_range() that is robust to stringsAsFactors global option. Now, the order of a factor will only be preserved if the full factor is the first object seen, and all subsequent inputs are subsets of the levels of the original factor.
scientific() ensures output is always in scientific format and of the specified number of significant digits. comma() ensures output is never in scientific format (Fixes #7).
Another tweak to zero_range() to better detect when a range has zero length (Fixes #6).