Scale Functions for Visualization

Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends.



Overview

One of the most difficult parts of any graphics package is scaling: converting from data values to perceptual properties. The inverse of scaling, making guides (legends and axes) that can be used to read the graph, is often even harder! The scales package provides the internal scaling infrastructure for ggplot2, and its functions let users customize the transformations, breaks, guides and palettes used in visualizations.

The idea of the scales package is to implement scales in a way that is graphics system agnostic, so that everyone can benefit by pooling knowledge and resources about this tricky topic.

Installation

# Scales is installed when you install ggplot2 or the tidyverse.
# But you can install just scales from CRAN:
install.packages("scales")
 
# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("r-lib/scales")

Usage

Formatters

Inside ggplot2, scales powers all the aesthetic scales, axis formatting, and data transformations. Outside of ggplot2, it also provides helper functions for formatting numeric data for all types of presentation.

library(scales)
set.seed(1234)
 
# percent() takes a numeric, multiplies it by 100, and adds the % label for you
percent(c(0.1, 1 / 3, 0.56))
#> [1] "10.0%" "33.3%" "56.0%"
 
# comma() adds commas into large numbers for easier readability
comma(10e6)
#> [1] "10,000,000"
 
# dollar() adds currency symbols
dollar(c(100, 125, 3000))
#> [1] "$100"   "$125"   "$3,000"
 
# unit_format() adds custom units
# the scale argument can do simple conversion on the fly
unit_format(unit = "ha", scale = 1e-4)(c(10e6, 10e4, 8e3))
#> [1] "1 000 ha" "10 ha"    "1 ha"

All of these formatters are based on the underlying number() formatter, which has additional arguments that allow further customisation. This can be especially useful for meeting diverse international standards.

# for instance, European number formatting is easily set:
number(c(12.3, 4, 12345.789, 0.0002), big.mark = ".", decimal.mark = ",")
#> [1] "12"     "4"      "12.346" "0"
 
# these functions round by default, but you can set the accuracy
number(c(12.3, 4, 12345.789, 0.0002),
  big.mark = ".",
  decimal.mark = ",",
  accuracy = .01
)
#> [1] "12,30"     "4,00"      "12.345,79" "0,00"
 
# percent formatting in the French style
french_percent <- percent_format(decimal.mark = ",", suffix = " %")
french_percent(runif(10))
#>  [1] "11,4 %" "62,2 %" "60,9 %" "62,3 %" "86,1 %" "64,0 %" "0,9 %" 
#>  [8] "23,3 %" "66,6 %" "51,4 %"
 
# currency formatting in euros (with a simple conversion on the fly!)
usd_to_euro <- dollar_format(prefix = "", suffix = "\u20ac", scale = .85)
usd_to_euro(100)
#> [1] "85€"

Colour palettes

These are used to power the scales in ggplot2, but you can use them in any plotting system. The following example shows how you might apply them to a base plot.

# pull a list of colours from any palette
viridis_pal()(4)
#> [1] "#440154FF" "#31688EFF" "#35B779FF" "#FDE725FF"
 
# use in combination with base R `palette()` to set new defaults
palette(brewer_pal(palette = "Set2")(4))
plot(Sepal.Length ~ Sepal.Width, data = iris, col = Species, pch = 20)

Bounds, breaks, & transformations

scales provides a handful of functions for rescaling data to fit new ranges.

# squish() will squish your values into a specified range
squish(c(-1, 0.5, 1, 2, NA), range = c(0, 1))
#> [1] 0.0 0.5 1.0 1.0  NA
 
# Useful for setting the `oob` argument for a colour scale with reduced limits
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Petal.Length)) +
  geom_point() +
  scale_color_continuous(limits = c(2, 4), oob = scales::squish)
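Where squish() clamps out-of-bounds values to the nearest limit, censor() replaces them with NA. The comparison below is a minimal sketch on the same input as above; the variable names are illustrative only.

```r
library(scales)

x <- c(-1, 0.5, 1, 2, NA)

# squish() clamps out-of-range values to the nearest limit
squished <- squish(x, range = c(0, 1))
squished
#> [1] 0.0 0.5 1.0 1.0  NA

# censor() replaces out-of-range values with NA instead
censored <- censor(x, range = c(0, 1))
censored
#> [1]  NA 0.5 1.0  NA  NA
```

Either function can be passed as the `oob` argument of a ggplot2 scale, depending on whether out-of-range points should be kept at the limit or dropped.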

# the rescale functions can rescale continuous vectors to new min, mid, or max values
x <- runif(5, 0, 1)
rescale(x, to = c(0, 50))
#> [1] 32.063194 20.465217  0.000000 50.000000  0.747796
rescale_mid(x, mid = .25)
#> [1] 0.8293505 0.7190081 0.5243035 1.0000000 0.5314180
rescale_max(x, to = c(0, 50))
#> [1] 37.55502 29.50807 15.30882 50.00000 15.82766
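Because the outputs above depend on random draws, a fixed input may make the three functions easier to compare. The sketch below assumes the default target range `to = c(0, 1)` and the default `from = range(x)`; the variable names are illustrative.

```r
library(scales)

x <- c(1, 2, 5)

# rescale(): linear map sending min(x) to 0 and max(x) to 1
r_lin <- rescale(x)
r_lin
#> [1] 0.00 0.25 1.00

# rescale_mid(): linear map that pins `mid` to the centre of the target range,
# so here 2 maps to 0.5 and the value furthest from mid reaches the edge
r_mid <- rescale_mid(x, mid = 2)

# rescale_max(): divides by the maximum, so 0 stays at 0
r_max <- rescale_max(x)
r_max
#> [1] 0.2 0.4 1.0
```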

scales also gives users the ability to define and apply their own custom transformation functions for repeated use.

# use trans_new to build a new transformation
logp3_trans <- trans_new(
  name = "logp",
  # spell out the full argument name rather than relying on partial matching
  transform = function(x) log(x + 3),
  inverse = function(x) exp(x) - 3,
  breaks = log_breaks()
)
 
library(dplyr)
dsamp <- sample_n(diamonds, 100)
ggplot(dsamp, aes(x = carat, y = price, colour = color)) +
  geom_point() +
  scale_y_continuous(trans = logp3_trans)

# You can always call the functions from the trans object separately
logp3_trans$breaks(dsamp$price)
#> [1]   300  1000  3000 10000 30000
 
# scales has some breaks helper functions too
log_breaks(base = exp(1))(dsamp$price)
#> [1]   403.4288  1096.6332  2980.9580  8103.0839 22026.4658
 
pretty_breaks()(dsamp$price)
#> [1]     0  5000 10000 15000 20000
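A transformation object also carries its forward and inverse functions, which can be checked against each other. The sketch below rebuilds the same transformation and assumes the object exposes them as `$transform` and `$inverse`; the input values are arbitrary.

```r
library(scales)

logp3_trans <- trans_new(
  name = "logp",
  transform = function(x) log(x + 3),
  inverse = function(x) exp(x) - 3,
  breaks = log_breaks()
)

x <- c(0, 7, 997)

# forward: log(3), log(10), log(1000)
y <- logp3_trans$transform(x)

# inverse should recover the original values up to floating point error
x_back <- logp3_trans$inverse(y)
all.equal(x_back, x)
#> [1] TRUE
```

A round-trip check like this is a cheap way to catch a mismatched inverse before the transformation is used in a plot.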

News

scales 1.0.0

New Features

Formatters

  • comma_format(), percent_format() and unit_format() gain new arguments: accuracy, scale, prefix, suffix, decimal.mark, big.mark (@larmarange, #146).

  • dollar_format() gains new arguments: accuracy, scale, decimal.mark, trim (@larmarange, #148).

  • New number_bytes_format() and number_bytes() format numeric vectors into byte measurements (@hrbrmstr, @dpseidel).

  • New number_format() provides a generic formatter for numbers (@larmarange, #142).

  • New pvalue_format() formats p-values (@larmarange, #145).

  • ordinal_format() gains new arguments: prefix, suffix, big.mark, rules; rules for French and Spanish are also provided (@larmarange, #149).

  • scientific_format() gains new arguments: scale, prefix, suffix, decimal.mark, trim (@larmarange, #147).

  • New time_format() formats POSIXt and hms objects (@dpseidel, #88).
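As a brief illustration of two of the formatters listed above, the sketch below calls ordinal() and pvalue() directly with their default arguments (English rules for ordinals and an accuracy of 0.001 for p-values are assumed to be the defaults).

```r
library(scales)

# ordinal numbers using the default (English) rules
ordinal(c(1, 2, 3, 11))
#> [1] "1st"  "2nd"  "3rd"  "11th"

# p-values below the accuracy threshold are reported as "<0.001"
pvalue(c(0.5, 0.0001))
#> [1] "0.500"  "<0.001"
```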

Transformations & breaks

  • boxcox_trans() is now invertible for x >= 0 and requires positive values. A new argument offset allows specification of both type-1 and type-2 Box-Cox transformations (@dpseidel, #103).

  • log_breaks() returns integer multiples of integer powers of base when finer breaks are needed (@ThierryO, #117).

  • New function modulus_trans() implements the modulus transformation for positive and negative values (@dpseidel).

  • New pseudo_log_trans() for transforming numerics into a signed logarithmic scale with a smooth transition to a linear scale around 0 (@lepennec, #106).
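The behaviour described for pseudo_log_trans() can be checked numerically. The sketch below assumes the transformation object exposes `$transform` and `$inverse`, and uses base 10 with the default `sigma`; the input values are arbitrary.

```r
library(scales)

# base-10 pseudo-log; sigma (default 1) controls the width of the linear region
tr <- pseudo_log_trans(base = 10)

x <- c(-100, -1, 0, 1, 100)
y <- tr$transform(x)

y[3] == 0                      # zero maps to zero, unlike a plain log
all.equal(y, -rev(y))          # the transform is sign-symmetric
all.equal(tr$inverse(y), x)    # and invertible for negative values too
```

For large magnitudes the result is close to a signed log10 (roughly -2 and 2 for -100 and 100), while values near zero are mapped almost linearly.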

Minor bug fixes and improvements

  • scales functions now work as expected when they are used inside a for loop. In previous package versions, if a scales function was used with variable custom parameters inside a for loop, some of the parameters were not evaluated until the end of the loop, due to how R lazy evaluation works (@zeehio, #81).

  • colour_ramp() now uses alpha = TRUE by default (@clauswilke, #108).

  • date_breaks() now supports subsecond intervals (@dpseidel, #85).

  • Removes dichromat and plyr dependencies. dichromat is now suggested (@dpseidel, #118).

  • expand_range() arguments mul and add now affect scales with a range of 0 (@dpseidel, ggplot2-2281).

  • extended_breaks() now allows user specification of the labeling::extended() argument only.loose to permit more flexible breaks specification (@dpseidel, #99).

  • New rescale() and rescale_mid() methods support dist objects (@zeehio, #105).

  • rescale_mid() now properly handles NAs (@foo-bar-baz-qux, #104).
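The lazy-evaluation issue behind the for-loop fix (#81) can be reproduced in plain R. The sketch below uses two hypothetical factory functions that are not part of scales; it only illustrates the general mechanism and the usual remedy of calling force(), not the package's actual code.

```r
# Hypothetical factory, not part of scales: the argument stays a lazy promise
make_suffix_lazy <- function(suffix) {
  function(x) paste0(x, suffix)
}

# Same factory, but force() evaluates the argument when the factory is called
make_suffix_forced <- function(suffix) {
  force(suffix)
  function(x) paste0(x, suffix)
}

lazy <- list()
forced <- list()
for (s in c(" m", " km")) {
  lazy[[s]] <- make_suffix_lazy(s)
  forced[[s]] <- make_suffix_forced(s)
}

# The promise is only evaluated now, when s already holds the last loop value
lazy[[" m"]](1)
#> [1] "1 km"

forced[[" m"]](1)
#> [1] "1 m"
```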

scales 0.5.0

  • New function regular_minor_breaks() calculates minor breaks as a property of the transformation (@karawoo).

  • Adds viridis_pal() for creating palettes with color maps from the viridisLite package (@karawoo).

  • Switched from reference classes to R6 (#96).

  • rescale() and rescale_mid() are now S3 generics, and work with numeric, Date, POSIXct, POSIXlt and bit64::integer64 objects (@zeehio, #74).

scales 0.4.1

  • extended_breaks() no longer fails on pathological inputs.

  • New hms_trans() for transforming hms time vectors.

  • train_discrete() gets a new na.rm argument which controls whether NAs are preserved or dropped.

scales 0.4.0

  • Switched from NEWS to NEWS.md.

  • manual_pal() produces a warning if n is greater than the number of values in the palette (@jrnold, #68).

  • precision(0) now returns 1, which means percent(0) now returns 0% (#50).

  • scale_continuous() uses a more correct check for numeric values.

  • NaN is correctly recognised as a missing value by the gradient palettes (ggplot2-1482).

scales 0.3.0

  • rescale() preserves missing values in input when the range of x is (effectively) 0 (ggplot2-985).

  • Continuous colour palettes now use colour_ramp() instead of colorRamp(). This only supports interpolation in Lab colour space, but is hundreds of times faster.

scales 0.2.5

Improved formatting functions

  • date_format() gains an option to specify time zone (#51).

  • dollar_format() is now more flexible and can add either prefixes or suffixes for different currencies (#53). It gains a negative_parens argument to show negative values as ($100) and now passes missing values through unchanged (@dougmitarotonda, #40).

  • New ordinal_format() generates ordinal numbers (1st, 2nd, etc) (@aaronwolen, #55).

  • New unit_format() makes it easier to add units to labels, optionally scaling (@ThierryO, #46).

  • New wrap_format() function to wrap character vectors to a desired width. (@jimhester, #37).

New colour scaling functions

  • New color scaling functions col_numeric(), col_bin(), col_quantile(), and col_factor(). These functions provide concise ways to map continuous or categorical values to color spectra.

  • New colour_ramp() function for performing color interpolation in the CIELAB color space (like grDevices::colorRamp(space = 'Lab'), but much faster).
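A minimal sketch of the colour mapping functions follows. The palettes, domains and variable names are arbitrary choices for illustration; exact hex values are not shown because they depend on the interpolation.

```r
library(scales)

# continuous values -> colours interpolated between two endpoints
pal_num <- col_numeric(palette = c("white", "black"), domain = c(0, 1))
cols_num <- pal_num(c(0, 0.5, 1))

# categorical values -> colours drawn from a ColorBrewer palette
pal_fct <- col_factor(palette = "Set2", domain = c("a", "b", "c"))
cols_fct <- pal_fct(c("a", "c", "a"))

# each call returns a character vector of hex colours, one per input value
cols_num
cols_fct
```

Each col_*() call returns a function, so the mapping can be built once and reused on new data with the same domain.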

Other bug fixes and minor improvements

  • boxcox_trans() returns correct value when p is close to zero (#31).

  • dollar() and percent() both correctly return a zero length string for zero length input (@BrianDiggs, #35).

  • brewer_pal() gains a direction argument to easily invert the order of colours (@jiho, #36).

  • show_col() has additional options to showcase colors better (@jiho, #52).

  • Relaxed tolerance in zero_range() to .Machine$double.eps * 1000 (#33).
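The effect of the zero_range() tolerance can be seen with a few hand-picked ranges. This is a sketch that assumes the default tolerance is still 1000 * .Machine$double.eps, as stated above.

```r
library(scales)

zero_range(c(1, 1))                             # TRUE: identical endpoints
zero_range(c(1, 1 + 10 * .Machine$double.eps))  # TRUE: difference within tolerance
zero_range(c(1, 1 + 1e-9))                      # FALSE: small but above tolerance
zero_range(c(1, 2))                             # FALSE
```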

scales 0.2.4

  • Eliminate stringr dependency.

  • Fix outstanding errors in R CMD check.

scales 0.2.3

  • floor_time() calls to_time(), but to_time() had been moved inside another function, so it was no longer available in the scales namespace. Now floor_time() has its own copy of that function (Thanks to Stefan Novak).

  • Color palettes generated by brewer_pal() no longer give warnings when fewer than 3 colors are requested (@wch).

  • abs_area() and rescale_max() functions have been added, for scaling the area of points to be proportional to their value. These are used by scale_size_area() in ggplot2.

scales 0.2.2

  • zero_range() has improved behaviour thanks to Brian Diggs.

  • brewer_pal() complains if you give it an incorrect palette type. (Fixes #15, thanks to Jean-Olivier Irisson).

  • shape_pal() warns if asked for more than 6 values. (Fixes #16, thanks to Jean-Olivier Irisson).

  • time_trans() gains an optional argument tz to specify the time zone to use for the times. If not specified, it will be guessed from the first input with a non-null time zone.

  • date_trans() and time_trans() now check that their inputs are of the correct type. This prevents ggplot2 scales from silently giving incorrect outputs when given incorrect inputs.

  • Change the default breaks algorithm for cbreaks() and trans_new(). Previously it was pretty_breaks(), and now it's extended_breaks(), which uses the extended() algorithm from the labeling package.

  • fixed namespace problem with fullseq().

scales 0.2.1

  • suppressWarnings from train_continuous() so zero-row or all infinite data frames don't potentially cause problems.

  • check for zero-length colour in gradient_n_pal().

  • added extended_breaks() which implements an extension to Wilkinson's labelling approach, as implemented in the labeling package. This should generally produce nicer breaks than pretty_breaks().

  • alpha() can now preserve existing alpha values if the alpha argument is missing.

  • log_breaks() always gives breaks evenly spaced on the log scale, never evenly spaced on the data scale. This will result in really bad breaks for some ranges (e.g. 0.5-0.6), but you probably shouldn't be using log scales in that situation anyway.

scales 0.2.0

  • censor() and squish() gain only.finite argument and default to operating only on finite values. This is needed for ggplot2, and reflects the use of Inf and -Inf as special values.

  • bounds functions now force evaluation of range to avoid bug with S3 method dispatch inside primitive functions (e.g. [).

  • Simplified algorithm for discrete_range() that is robust to stringsAsFactors global option. Now, the order of a factor will only be preserved if the full factor is the first object seen, and all subsequent inputs are subsets of the levels of the original factor.

  • scientific() ensures output is always in scientific format and of the specified number of significant digits. comma() ensures output is never in scientific format (Fixes #7).

  • Another tweak to zero_range() to better detect when a range has zero length (Fixes #6).
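The only.finite argument mentioned above can be illustrated on a vector containing infinite values. This sketch assumes the default `only.finite = TRUE`, under which Inf and -Inf pass through untouched.

```r
library(scales)

x <- c(-Inf, -1, 0.5, 2, Inf)

# default: only finite values are squished, infinities are left alone
squish(x, range = c(0, 1))
#> [1] -Inf  0.0  0.5  1.0  Inf

# only.finite = FALSE also clamps the infinite values
squish(x, range = c(0, 1), only.finite = FALSE)
#> [1] 0.0 0.0 0.5 1.0 1.0

# censor() follows the same rule, replacing finite outliers with NA
censor(x, range = c(0, 1))
#> [1] -Inf   NA  0.5   NA  Inf
```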


1.0.0 by Hadley Wickham, 9 months ago


https://scales.r-lib.org, https://github.com/r-lib/scales


Report a bug at https://github.com/r-lib/scales/issues


Browse source code at https://github.com/cran/scales


Authors: Hadley Wickham [aut, cre] , RStudio [cph]




MIT + file LICENSE license


Imports labeling, munsell, R6, RColorBrewer, Rcpp, viridisLite

Suggests dichromat, bit64, covr, hms, testthat

Linking to Rcpp


Imported by AFM, AeRobiology, BALCONY, BAMBI, BAwiR, BIGL, BatchGetSymbols, CGPfunctions, DEploid, DataExplorer, Deducer, DescribeDisplay, DiagrammeR, EHRtemporalVariability, EMMIXgene, EasyHTMLReport, EventStudy, ExPanDaR, FAOSTAT, GERGM, GGEBiplots, GSIF, IPtoCountry, IncDTW, LexisNexisTools, MKmisc, MarketMatching, MaxMC, MetaboList, Miso, NetworkExtinction, NeuralNetTools, QRAGadget, RADanalysis, RAM, RDS, RIdeogram, RNewsflow, RRphylo, RSDA, RSSL, Rcan, RcmdrPlugin.KMggplot2, SEERaBomb, SHELF, STMotif, SVMMaj, SWMPrExtension, SeqFeatR, Seurat, SixSigma, SourceSet, SubgrPlots, TDA, TSstudio, UpSetR, abjutils, afmToolkit, alakazam, anipaths, aqp, basicTrendline, bayesammi, baystability, bdscale, bea.R, billboarder, blorr, bossMaps, brainGraph, bridgesampling, cellWise, cholera, clifro, colordistance, complmrob, condformat, cosinor2, cowplot, cr17, cregg, crplyr, ctmm, d3heatmap, dendroTools, descriptr, di, dlstats, drc, dtwSat, ebirdst, echarts4r, esquisse, evaluator, ez, fSRM, factorMerger, fastqcr, fic, finalfit, findviews, fingerPro, fingertipscharts, forestmangr, funModeling, ganalytics, gazepath, genBart, geneNetBP, geomerge, getTBinR, ggChernoff, ggExtra, ggQQunif, ggallin, ggalt, gganimate, ggasym, ggedit, ggeffects, ggetho, ggforce, ggfortify, ggimage, ggiraphExtra, ggmap, ggnormalviolin, ggplot2, ggplotAssist, ggpmisc, ggpubr, ggquickeda, ggraph, ggrepel, ggridges, ggsci, ggspatial, ggspectra, ggstatsplot, ggtern, ggthemes, ggupset, ggwordcloud, gluvarpro, googleway, graphlayouts, gwdegree, halfcircle, heatmaply, hierarchicalSets, hrbrthemes, iDINGO, idefix, iheatmapr, inlmisc, iprior, jcolors, jskm, kableExtra, kayadata, khroma, ldatuning, ldhmm, leaflet, learningCurve, lemon, linear.tools, longitudinalcascade, mafs, mapview, marcher, metR, metacoder, metaplot, modelplotr, mousetrap, mplot, myTAI, ncappc, neutralitytestr, nima, obliqueRSF, optiRum, otvPlots, paleobioDB, peakPantheR, pedquant, pheatmap, pinbasic, pixiedust, plotKML, plotluck, plotly, 
powerlmm, ppcSpatial, primerTree, processmapR, prophet, psycho, qdap, qicharts, qicharts2, quadmesh, quadrupen, quantable, rAvis, rENA, rPackedBar, radiant.basics, radiant.data, radiant.multivariate, randomcoloR, raptr, rayshader, rbokeh, rcartocolor, rcicr, regrrr, remote, rfPermute, rnoaa, roahd, robustSingleCell, ruv, saotd, sbpiper, seaaroundus, segclust2d, sensobol, sergeant, sharpshootR, shazam, shinyWidgets, sigmajs, simmer.plot, simrel, sjPlot, sleepwalk, smartR, spikeSlabGAM, splashr, ssdtools, stability, starma, statebins, stminsights, strvalidator, superheat, surveydata, survivalAnalysis, survminer, survxai, tcR, teachingApps, telefit, themetagenomics, tidytransit, trread, ufs, ukbtools, useful, userfriendlyscience, vetools, vmd, voteogram, wilson, xray, ypr, zoocat, ztable.

Depended on by ACSNMineR, DeducerSpatial, EpiCurve, MortalityTables, STPGA, TriMatch, bios2mds, dials, dslice, gofMC, precintcon, weightr.

Suggested by Census2016, DirectEffects, GGally, HistData, JWileymisc, LBSPR, NitrogenUptake2016, NlsyLinks, PairViz, R6, RBesT, RImagePalette, RcmdrPlugin.MA, Rdtq, RxODE, SACOBRA, VarSelLCM, Wats, ahpsurvey, ballr, bayesplot, bench, bfw, cancensus, cansim, chron, colormap, colorspace, colourvalues, countyfloods, dimRed, dint, disclapmix, dodgr, eechidna, emmeans, fivethirtyeight, ggdendro, ggformula, ggparliament, ggvoronoi, grattan, gsDesign, guardianapi, hesim, hhh4contacts, httk, huxtable, igraph, imager, incidence, inferr, ipumsr, jtools, jubilee, kdtools, likeLTD, loon, lspline, nzelect, oddsratio, olsrr, performance, pwr, rODE, rattle, raw, rbgm, rdpla, recurse, refuge, rtide, rtimes, shinyaframe, shipunov, sjstats, solarius, sparseHessianFD, sparseMVN, spm12r, streamDepletr, surveillance, sweep, tidyquant, tidytext, tikzDevice, timetk, tis, tsbox, ukpolice, usmap, vcfR, vinereg, viridis, voronoiTreemap, zoo.

