Provides a 'tbl_ts' class (the 'tsibble') to store and manage temporal data in a data-centric format, which is built on top of the 'tibble'. The 'tsibble' aims at easily manipulating and analysing temporal data, including counting and filling in time gaps, aggregating over calendar periods, performing rolling window calculations, and more.
The tsibble package provides a data class, tbl_ts, to represent tidy time series data. A tsibble consists of a time index, key and other measured variables in a data-centric format, which is built on top of the tibble.
You could install the stable version on CRAN:
You could install the development version from GitHub using
The weather data included in the package nycflights13 is used as an example to illustrate. The “index” variable is time_hour, containing the date-times, and the “key” is origin, identifying the weather stations, created via id(). The key together with the index uniquely identifies each observation, which gives a valid tsibble. Other columns can be considered as measured variables.
library(dplyr)    # select() and the %>% pipe
library(tsibble)

weather <- nycflights13::weather %>%
  select(origin, time_hour, temp, humid, precip)
weather_tsbl <- as_tsibble(weather, key = id(origin), index = time_hour)
weather_tsbl
#> # A tsibble: 26,115 x 5 [1h] <America/New_York>
#> # Key: origin
#> origin time_hour temp humid precip
#> <chr> <dttm> <dbl> <dbl> <dbl>
#> 1 EWR 2013-01-01 01:00:00 39.0 59.4 0
#> 2 EWR 2013-01-01 02:00:00 39.0 61.6 0
#> 3 EWR 2013-01-01 03:00:00 39.0 64.4 0
#> 4 EWR 2013-01-01 04:00:00 39.9 62.2 0
#> 5 EWR 2013-01-01 05:00:00 39.0 64.4 0
#> # … with 2.611e+04 more rows
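The validity condition above (no two rows share the same key and index) can be checked with a few lines of base R. This is a minimal sketch, not tsibble's own check; the helper name is_valid_key_index() and the toy data are hypothetical.

```r
# Hypothetical helper (not part of tsibble): TRUE when the key and index
# columns together identify every row, i.e. no (key, index) pair repeats.
is_valid_key_index <- function(data, key, index) {
  # anyDuplicated() returns 0 when no row of the selected columns repeats
  anyDuplicated(data[, c(key, index), drop = FALSE]) == 0L
}

# Made-up toy observations for two stations
obs <- data.frame(
  origin    = c("EWR", "EWR", "JFK", "JFK"),
  time_hour = as.POSIXct("2013-01-01 01:00", tz = "UTC") + 3600 * c(0, 1, 0, 1),
  temp      = c(39.0, 39.0, 39.0, 39.9),
  stringsAsFactors = FALSE
)

is_valid_key_index(obs, key = "origin", index = "time_hour")
# Appending a copy of the first row creates a duplicated (key, index) pair
is_valid_key_index(rbind(obs, obs[1, ]), key = "origin", index = "time_hour")
```

The first call returns TRUE and the second FALSE; time_hour alone would not be enough here, because both stations report at the same hours.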
The key consists of one or more variables. See
Tsibble internally computes the interval for given time indices based on the time representation, ranging from year to nanosecond, from numerics to ordered factors. The POSIXct class corresponds to sub-daily, Date to daily, yearweek to weekly, yearmonth to monthly, yearquarter to quarterly, and
fill_gaps() to turn implicit missing values into explicit missing values
Often there are implicit missing cases in time series. If the observations are made at a regular time interval, we could turn this implicit missingness into explicit missingness simply by using fill_gaps(), filling gaps in precipitation (precip) with 0 in the meantime. It is quite common to replace NAs with the previous observation for each origin in time series analysis, which is easily done using fill() from tidyr.
library(tidyr)    # fill()

full_weather <- weather_tsbl %>%
  fill_gaps(precip = 0) %>%
  group_by(origin) %>%
  fill(temp, humid, .direction = "down")
full_weather
#> # A tsibble: 26,190 x 5 [1h] <America/New_York>
#> # Key: origin
#> # Groups: origin
#> origin time_hour temp humid precip
#> <chr> <dttm> <dbl> <dbl> <dbl>
#> 1 EWR 2013-01-01 01:00:00 39.0 59.4 0
#> 2 EWR 2013-01-01 02:00:00 39.0 61.6 0
#> 3 EWR 2013-01-01 03:00:00 39.0 64.4 0
#> 4 EWR 2013-01-01 04:00:00 39.9 62.2 0
#> 5 EWR 2013-01-01 05:00:00 39.0 64.4 0
#> # … with 2.618e+04 more rows
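To show what this pipeline does conceptually, here is a rough base-R sketch under stated assumptions: the index is an hourly POSIXct column, each key's grid runs from its own first to last observation, and only newly inserted rows receive the replacement value. The helpers fill_hourly_gaps() and locf() are hypothetical names; this illustrates the idea and does not mirror fill_gaps() or tidyr::fill().

```r
# Hypothetical helper: complete an hourly grid per key (from that key's first to
# last time stamp) and set `zero_cols` to 0 only on the rows that were added.
fill_hourly_gaps <- function(data, key, index, zero_cols = character(0)) {
  pieces <- lapply(split(data, data[[key]]), function(d) {
    grid <- data.frame(seq(min(d[[index]]), max(d[[index]]), by = "hour"))
    names(grid) <- index
    grid[[key]] <- d[[key]][1]
    out <- merge(grid, d, by = c(key, index), all.x = TRUE)
    out <- out[order(out[[index]]), , drop = FALSE]
    # rows whose time stamp was not observed are the newly explicit gaps
    added <- !(as.numeric(out[[index]]) %in% as.numeric(d[[index]]))
    for (col in zero_cols) out[[col]][added] <- 0
    out
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

# Last observation carried forward; leading NAs stay NA
locf <- function(x) {
  seen <- cumsum(!is.na(x))        # number of non-NA values seen so far
  c(NA, x[!is.na(x)])[seen + 1]
}

# Made-up toy data: EWR is missing 03:00, JFK is missing 02:00
obs <- data.frame(
  origin    = c("EWR", "EWR", "EWR", "JFK", "JFK"),
  time_hour = as.POSIXct("2013-01-01 01:00", tz = "UTC") + 3600 * c(0, 1, 3, 0, 2),
  temp      = c(39.0, 39.0, 39.9, 39.0, 39.9),
  precip    = c(0, 0, 0.1, 0, 0),
  stringsAsFactors = FALSE
)
full <- fill_hourly_gaps(obs, "origin", "time_hour", zero_cols = "precip")
full$temp <- ave(full$temp, full$origin, FUN = locf)  # fill down within each origin
full
```

Completing the grid first and filling afterwards follows the order of the pipeline above: the gaps become explicit rows, then the previous temperature is carried down within each origin.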
fill_gaps() also handles filling in time gaps by values or functions, and respects time zones for date-times. Want a quick overview of implicit missing values? Check out
summarise() to aggregate over calendar periods
index_by() is the counterpart of group_by() in temporal context, but it groups the index only. In conjunction with index_by(), summarise() and its scoped variants aggregate variables of interest over calendar periods.
index_by() goes hand in hand with the index
yearquarter(), as well as other friends from lubridate. For example, we might be interested in computing the average temperature and total precipitation per month, by applying yearmonth() to the hourly time index:
full_weather %>%
  group_by(origin) %>%
  index_by(year_month = yearmonth(time_hour)) %>% # monthly aggregates
  summarise(
    avg_temp = mean(temp, na.rm = TRUE),
    ttl_precip = sum(precip, na.rm = TRUE)
  )
#> # A tsibble: 36 x 4 [1M]
#> # Key: origin
#> origin year_month avg_temp ttl_precip
#> <chr> <mth> <dbl> <dbl>
#> 1 EWR 2013 Jan 35.6 3.53
#> 2 EWR 2013 Feb 34.2 3.83
#> 3 EWR 2013 Mar 40.1 3
#> 4 EWR 2013 Apr 53.0 1.47
#> 5 EWR 2013 May 63.3 5.44
#> # … with 31 more rows
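The same monthly collapse can be sketched in base R by turning each time stamp into a year-month label and summarising within each (origin, month) group. This is an illustration only: the "%Y-%m" label stands in for yearmonth(), and the data are made-up toy values that straddle a month boundary.

```r
# Made-up hourly observations crossing from January into February
hourly <- data.frame(
  origin    = "EWR",
  time_hour = as.POSIXct("2013-01-31 22:00", tz = "UTC") + 3600 * 0:3,
  temp      = c(30, 32, 34, 36),
  precip    = c(0.1, 0, 0.2, NA),
  stringsAsFactors = FALSE
)
# A "YYYY-MM" label plays the role of yearmonth() here
hourly$year_month <- format(hourly$time_hour, "%Y-%m")

# Summarise within each (origin, year_month) group
groups <- split(hourly, list(hourly$origin, hourly$year_month), drop = TRUE)
monthly <- do.call(rbind, lapply(groups, function(d) data.frame(
  origin     = d$origin[1],
  year_month = d$year_month[1],
  avg_temp   = mean(d$temp, na.rm = TRUE),
  ttl_precip = sum(d$precip, na.rm = TRUE),
  stringsAsFactors = FALSE
)))
rownames(monthly) <- NULL
monthly
```

Each output row covers one origin and one month, which is the same reduction in granularity that index_by() plus summarise() records by updating the index to the coarser year_month.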
While collapsing rows (like summarise()), group_by() and index_by() will take care of updating the key and index respectively. This index_by() + summarise() combo can help with regularising a tsibble of irregular time space too.
Time series often involve moving window calculations. Several functions in tsibble allow for different variations of moving windows using purrr-like syntax:
pslide(): sliding window with overlapping observations.
ptile(): tiling window without overlapping observations.
pstretch(): fixing an initial window and expanding to include more observations.
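To make the three window shapes concrete, the sketch below lists which positions each window covers for a series of length n. The helpers slide_windows(), tile_windows() and stretch_windows() are hypothetical and only illustrate the shapes; they are not tsibble's functions and ignore options such as partial windows or alignment.

```r
# Hypothetical helpers: index sets covered by each window over positions 1..n
slide_windows <- function(n, size) {
  # overlapping windows that move forward one observation at a time
  lapply(seq_len(max(0, n - size + 1)), function(i) i:(i + size - 1))
}
tile_windows <- function(n, size) {
  # non-overlapping blocks; the last block may be shorter than `size`
  unname(split(seq_len(n), ceiling(seq_len(n) / size)))
}
stretch_windows <- function(n, init) {
  # start from the first `init` observations and grow by one each step
  lapply(init:n, seq_len)
}

str(slide_windows(5, 3))    # windows 1:3, 2:4, 3:5
str(tile_windows(5, 2))     # windows 1:2, 3:4, 5
str(stretch_windows(5, 3))  # windows 1:3, 1:4, 1:5
```

Applying a summary function such as mean() over each index set then gives a sliding, tiling or stretching statistic.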
For example, a moving average of window size 3 is carried out on hourly temperatures for each group (origin).
full_weather %>%
  group_by(origin) %>%
  mutate(temp_ma = slide_dbl(temp, ~ mean(., na.rm = TRUE), .size = 3))
#> # A tsibble: 26,190 x 6 [1h] <America/New_York>
#> # Key: origin
#> # Groups: origin
#> origin time_hour temp humid precip temp_ma
#> <chr> <dttm> <dbl> <dbl> <dbl> <dbl>
#> 1 EWR 2013-01-01 01:00:00 39.0 59.4 0 NA
#> 2 EWR 2013-01-01 02:00:00 39.0 61.6 0 NA
#> 3 EWR 2013-01-01 03:00:00 39.0 64.4 0 39.0
#> 4 EWR 2013-01-01 04:00:00 39.9 62.2 0 39.3
#> 5 EWR 2013-01-01 05:00:00 39.0 64.4 0 39.3
#> # … with 2.618e+04 more rows
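The NA values at the start of temp_ma come from right alignment: a value is produced only once three observations are available. A minimal base-R sketch of that behaviour, assuming a plain numeric vector and using the rounded temperatures printed above; moving_average() is a hypothetical helper, not slide_dbl().

```r
# Hypothetical helper: right-aligned moving average; positions without a full
# window of `size` observations are left as NA
moving_average <- function(x, size) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= size) out[i] <- mean(x[(i - size + 1):i], na.rm = TRUE)
  }
  out
}

temp <- c(39.0, 39.0, 39.0, 39.9, 39.0)  # first five EWR temperatures as printed
round(moving_average(temp, 3), 1)        # NA NA 39.0 39.3 39.3
```

To apply it per origin, as the grouped pipeline does, one option is ave(temp, origin, FUN = function(v) moving_average(v, 3)).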
Looking for rolling in parallel? Their multiprocessing equivalents are prefixed with future_. More examples can be found at
Tsibble also serves as a natural input for forecasting and many other downstream analytical tasks. Stay tuned for tidyverts.org.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
append_row() for easily appending new observations to a tsibble. (#59)
/, consistent with
fill_na() for multiple replacements when used with group_by(), introduced in v0.5.1.
as_tsibble.grouped_df() respected its existing groups and removed argument
unnest.lst_ts respects the ordering of "key" values. (#56)
nest.tbl_ts() respects the order in which input variables appear. (#57)
key_indices() returns formats consistent with its generic.
key no longer accepts character input.
index_by() gives a more informative error when the LHS is named as index.
tile() gained a new argument .bind = FALSE.
-) for yearweek, yearmonth, and yearquarter.
new_interval() creates an "interval" object with the specified values.
fill_na() for replacing values when
stretch() use the same coercion rules as
.bind = TRUE.
This release introduced breaking changes to the "interval" class so that tsibble better supports finer time resolutions (e.g. millisecond, microsecond, and nanosecond). The "interval" format changes from upper case to shorthand. To support a new time index class, only
pull_interval() need to be defined now.
group_by_key() to easily group the key variables.
slide() gained a new argument .align = "right" to align at "right", "center", or "left". If the window size is even for center alignment, either "center-right" or "center-left" is needed.
-) for yearweek, yearmonth, and yearquarter.
stretch() gained a new argument .bind = FALSE.
0 in the "interval" class to make the representation simpler.
interval class has new slots of "millisecond", "microsecond", "nanosecond".
time_unit() is a function instead of an S3 generic, which made index extension a bit easier.
group_by.lst_ts() for dropping the grouping information.
.f to one input.
as_tsibble.grouped_df() for groups. (#44)
.fill = NULL for
purrr style exactly (#35):
stretch() return lists only, instead of numerics as before.
pslide() to map over multiple inputs simultaneously.
slide() gained a new argument .partial to support partial sliding.
pstretcher() support multiple inputs now, and split them in parallel.
holiday_aus() for Australian national and state-based public holidays.
diff() for year-week, year-month, and year-quarter.
yearquarter() supports character input.
pstretch() to slide over multiple inputs simultaneously (#33).
units_since() for index classes.
is_53weeks() for determining whether the year has 53 ISO weeks.
key_sum() for extending tsibble.
tsp attribute from the
count_gaps when a tsibble has an unknown interval.
as_tsibble.grouped_ts() now returns self (#34).
id() is used in the tsibble context (e.g. build_tsibble()) regardless of the conflicts with dplyr or plyr, to avoid a frustrating message (#36).
select.tbl_ts() now preserves the index.
as.ts.tbl_ts() for ignoring the value argument when the key is empty.
[.tbl_ts() when subsetting columns by characters (#30).
fill_na.tbl_ts() dropping custom index class (#32).
format.yearweek() due to the boundary issue (#27).
nycflights13 >= 1.0.0.
The tsibble package has a hexagon logo now! Thanks Mitch (@mitchelloharawild).
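The rule behind is_53weeks() can be illustrated with a standard ISO 8601 fact: 28 December always lies in the last ISO week of its year, so a year has 53 ISO weeks exactly when that date falls in week 53. The sketch uses base R's "%V" format (ISO week number); has_53_iso_weeks() is a hypothetical name, and this is not necessarily how tsibble implements it.

```r
# Hypothetical helper: TRUE when the ISO 8601 year has 53 weeks.
# 28 December always falls in the final ISO week of its year.
has_53_iso_weeks <- function(year) {
  format(as.Date(paste0(year, "-12-28")), "%V") == "53"
}

has_53_iso_weeks(c(2015, 2016, 2020))  # TRUE FALSE TRUE
```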
difference() computes lagged differences of a numeric vector. It returns a vector of the same length as the input, padded with NA. It works with
index2 will be part of grouping variables.
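The NA padding that keeps difference() length-preserving can be sketched with base R's diff(); lagged_difference() is a hypothetical stand-in, not tsibble's implementation, and assumes lag is smaller than the input length.

```r
# Hypothetical helper: lagged differences padded with `lag` leading NAs,
# so the result has the same length as the input
lagged_difference <- function(x, lag = 1) {
  c(rep(NA, lag), diff(x, lag = lag))
}

lagged_difference(c(1, 4, 9, 16))           # NA 3 5 7
lagged_difference(c(1, 4, 9, 16), lag = 2)  # NA NA 8 12
```

Keeping the original length is what lets such a result be added as a new column next to the input, for example inside mutate().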
This release (hopefully) marks the stability of a tsibble data object (tbl_ts). A tbl_ts contains the following components:
key: single or multiple columns that uniquely identify observational units over time. A key consisting of nested and crossed variables reflects the structure underlying the data. The programme itself takes care of updating the "key" when manipulating the data. The "key" differs from the grouping variables, which are manipulated by users.
index: a variable that represents time. Together with the "key", it uniquely identifies each observation in the data table.
index2: why do we need a second index? It means re-indexing to a variable, not a second index. It is identical to the index most of the time, but starts deviating when using index_by().
index_by() works similarly to group_by(), but groups the index only. The dplyr verbs, like mutate(), operate on each time group of the data defined by index_by(). You may wonder why we introduce a new function rather than using group_by(), which users are most familiar with. It's because time is indispensable to a tsibble, and index_by() provides a trace for understanding how the index changes. For this purpose, group_by() is just too general. For example, summarise() aggregates data to a less granular time period, leading to an update of the index, which is now handled nicely and intuitively.
interval: an interval class to save a list of time intervals. It computes the greatest common divisor of the time differences in the index column, which should give a sensible interval for almost all cases, compared to the minimal time distance. It also depends on the time representation. For example, if the data is monthly, the index should use a yearmonth() format instead of Date, as Date only gives the number of days, not the number of months.
regular: since a tsibble factors in the implicit missing cases, whether the data is regular or not cannot be determined. This relies on the user's specification.
ordered: time-wise and rolling window functions assume temporally ordered data. A tsibble is sorted by its time index. If a key is explicitly declared, the data are sorted by the key first and then by time in ascending order. If the data are not in time order, a warning is issued.
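The interval component's greatest-common-divisor rule, and why a Date index is a poor fit for monthly data, can be illustrated in base R. This is a sketch under simplifying assumptions (numeric time stamps with integer-valued gaps); gcd() and infer_interval() are hypothetical helpers, not tsibble's code.

```r
# Hypothetical helpers illustrating the GCD rule (not tsibble's implementation)
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# Infer a regular interval as the GCD of the gaps between distinct time points
infer_interval <- function(index) {
  gaps <- diff(sort(unique(as.numeric(index))))
  Reduce(gcd, gaps)
}

# Hourly data with implicit gaps: the GCD still recovers 3600 seconds (1 hour)
hours <- as.POSIXct("2013-01-01 01:00", tz = "UTC") + 3600 * c(0, 1, 3, 6)
infer_interval(hours)            # 3600

# Monthly data stored as Date: gaps of 31 and 28 days collapse to 1 day
months_as_date <- as.Date(c("2013-01-01", "2013-02-01", "2013-03-01"))
infer_interval(months_as_date)   # 1

# The same months counted as 12 * year + month: gaps of 1, i.e. 1 month
infer_interval(12 * 2013 + 1:3)  # 1
```

Because calendar months differ in length, only a month-based representation such as yearmonth() lets the divisor come out as one month rather than one day.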
tsummarise() and its scoped variants. It can be replaced by the combo
tsummarise() provides an unintuitive interface where the first argument keeps the same size as the index, but the remaining arguments reduce rows to a single one. Analogously, it does
summarise(). The proposed
index_by()solves the issue of index update.
find_duplicates() to better reflect its functionality.
group_vars() returns a vector of characters instead of a list.
distinct.tbl_ts()now returns a tibble instead of an error.
tidyr::fill(), as they respect the input structure.
index_sum(), and replaced by
index_valid() to extend index type support.
index_by() groups the time index, as the counterpart of group_by() in temporal context.
gaps() counts time gaps (implicit missing observations in time).
yearweek() creates and coerces to a year-week object. (#17)
fill_na.tbl_ts() gained a new argument .full = FALSE.
.full = FALSE (the default) inserts NA for each key within its time period, TRUE for the entire time span. This affects the results of fill_na.tbl_ts(), as it only took TRUE into account previously. (#15)
.drop in column-wise dplyr verbs.
group_by.tbl_ts() behaves exactly the same as group_by.tbl_df now. Grouping variables are temporary for data manipulation. Nested or crossed variables are not the type that
transmute.tbl_ts() for a univariate time series due to unregistered tidyselect helpers. (#9).
rename.tbl_ts() for not preserving grouped variables (#12).
rename.tbl_ts() for renaming grouped variables.
tbl_ts gains a new attribute index2, which is a candidate for the new index (symbol) used by
attr(grouped_ts, "vars") stores characters instead of names, same as
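The two .full settings differ in which time span gets completed for each key. A minimal sketch with a numeric index; time_grid() is a hypothetical helper that only builds the completed (key, index) grid, not tsibble's fill_na().

```r
# Hypothetical helper: the (key, index) grid that gap filling would complete.
# full = FALSE: each key spans its own first-to-last index;
# full = TRUE: every key spans the overall first-to-last index.
time_grid <- function(data, key, index, full = FALSE, by = 1) {
  overall <- range(data[[index]])
  grids <- lapply(split(data[[index]], data[[key]]), function(t) {
    span <- if (full) overall else range(t)
    seq(span[1], span[2], by = by)
  })
  data.frame(
    key = rep(names(grids), lengths(grids)),
    index = unlist(grids, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

obs <- data.frame(origin = c("EWR", "EWR", "JFK"), t = c(1, 3, 2),
                  stringsAsFactors = FALSE)
time_grid(obs, "origin", "t")               # EWR: 1, 2, 3; JFK: 2
time_grid(obs, "origin", "t", full = TRUE)  # EWR and JFK: 1, 2, 3 each
```

With the per-key span, JFK's single observation needs no filling; with the full span, JFK gains rows at times 1 and 3 that would be set to NA.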
This release introduces major changes to the underlying tbl_ts class to reduce the object size, and computed on the fly when printing.
tbl_ts object is a symbol now instead of a quosure.
tbl_ts object is an unnamed list of symbols.
key_update() to change/update the keys for a given tsibble.
unkey() as an S3 method for a tsibble of key size < 2.
key_indices() as an S3 method to extract key indices.
split_by() to split a tsibble into a list of data by unquoted variables.
build_tsibble() allows users to gain more control over a tsibble construction.
as_tsibble.msts() for multiple seasonality time series defined in the forecast package.
as_tsibble.ts() for daily time series (when frequency = 7).
group_by.tbl_ts() does not accept named expressions.
slice()). This avoids unnecessary re-computation for many function calls.
stretch(), are no longer defined as S3 methods. Several new variants have been introduced for the purpose of type stability, like slide_dfr() (a row-binding data frame), slide_dfc() (a column-binding data frame).
index variable must sit in the first name-value pair in tsummarise() instead of any position in the call.
transmute.tbl_ts() keeps the newly created variables along with index and keys, instead of throwing an error as before.
format.key() for nesting crossed with another nesting.
This release marks the complete support of dplyr key verbs.
NA backward or forward in tsibble.
dplyr::distinct() and return an error.
inform_duplicates() informs which row has duplicated elements of key and index variables.
tsummarise.tbl_ts(), when calling functions with no parameters like
tsummarise.tbl_ts(), one grouping level should be dropped for consistency with
dplyr::summarise() for a grouped
tbl_ts are supported in
as_tsibble(). An empty tsibble is not allowed.
group_by.tbl_ts(.data, ..., add = TRUE)works as expected now.
NEWS.md file to track changes to the package.