Provides a 'tbl_ts' class (the 'tsibble') to store and manage temporal data in a data-centric format, which is built on top of the 'tibble'. The 'tsibble' aims at easily manipulating and analysing temporal data, including counting and filling in time gaps, aggregating over calendar periods, performing rolling window calculations, etc.
The tsibble package provides a data class of tbl_ts to represent tidy
time series data. A tsibble consists of a time index, key and other
measured variables in a data-centric format, which is built on top of
the tibble.
You can install the stable version from CRAN with install.packages("tsibble").
You can install the development version from GitHub.
The weather data included in the package nycflights13 is used as an
example to illustrate. The "index" variable is time_hour, containing
the date-times, and the "key" is origin, the weather stations, created
via id(). The key together with the index uniquely identifies each
observation, which gives a valid tsibble. Other columns can be
considered as measured variables.
```r
library(dplyr)   # provides select() and the %>% pipe
library(tsibble)
weather <- nycflights13::weather %>%
  select(origin, time_hour, temp, humid, precip)
weather_tsbl <- as_tsibble(weather, key = id(origin), index = time_hour)
weather_tsbl
#> # A tsibble: 26,115 x 5 [1h] <America/New_York>
#> # Key:       origin
#>   origin time_hour            temp humid precip
#>   <chr>  <dttm>              <dbl> <dbl>  <dbl>
#> 1 EWR    2013-01-01 01:00:00  39.0  59.4      0
#> 2 EWR    2013-01-01 02:00:00  39.0  61.6      0
#> 3 EWR    2013-01-01 03:00:00  39.0  64.4      0
#> 4 EWR    2013-01-01 04:00:00  39.9  62.2      0
#> 5 EWR    2013-01-01 05:00:00  39.0  64.4      0
#> # … with 2.611e+04 more rows
```
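To make the validity rule concrete (key plus index must identify each row), here is a minimal base R sketch that checks it on a plain data frame. The helper is_valid_key_index() and the toy data are hypothetical illustrations, not tsibble's own validation code.

```r
# Hypothetical helper: TRUE when no (key, index) combination repeats,
# which is the condition a valid tsibble places on its key and index.
is_valid_key_index <- function(data, key, index) {
  anyDuplicated(data[c(key, index)]) == 0
}

# Toy data: two stations observed at the same two hours
obs <- data.frame(
  origin = c("EWR", "EWR", "JFK", "JFK"),
  time_hour = as.POSIXct(
    c("2013-01-01 01:00", "2013-01-01 02:00",
      "2013-01-01 01:00", "2013-01-01 02:00"),
    tz = "UTC"
  ),
  temp = c(39.0, 39.0, 39.0, 39.9)
)

# origin + time_hour identifies every row
is_valid_key_index(obs, key = "origin", index = "time_hour")
# Without the key, each hour appears twice, so the index alone is not enough
is_valid_key_index(obs, key = character(0), index = "time_hour")
```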
The key consists of one or more variables; see the package documentation for details.
Tsibble internally computes the interval for given time indices based
on the time representation, ranging from year to nanosecond, and from
numerics to ordered factors. POSIXct corresponds to sub-daily series,
Date to daily, yearweek to weekly, yearmonth to monthly, yearquarter
to quarterly, and so on.
fill_gaps() to turn implicit missing values into explicit missing values
Often there are implicit missing cases in time series. If the
observations are made at regular time intervals, we can turn this
implicit missingness into explicit missingness simply using
fill_gaps(), filling gaps in precipitation (precip) with 0 at the same
time. It is quite common to replace NAs with the previous observation
for each origin in time series analysis, which is easily done using
fill() from tidyr.
```r
full_weather <- weather_tsbl %>%
  fill_gaps(precip = 0) %>%
  group_by(origin) %>%
  fill(temp, humid, .direction = "down")
full_weather
#> # A tsibble: 26,190 x 5 [1h] <America/New_York>
#> # Key:       origin
#> # Groups:    origin
#>   origin time_hour            temp humid precip
#>   <chr>  <dttm>              <dbl> <dbl>  <dbl>
#> 1 EWR    2013-01-01 01:00:00  39.0  59.4      0
#> 2 EWR    2013-01-01 02:00:00  39.0  61.6      0
#> 3 EWR    2013-01-01 03:00:00  39.0  64.4      0
#> 4 EWR    2013-01-01 04:00:00  39.9  62.2      0
#> 5 EWR    2013-01-01 05:00:00  39.0  64.4      0
#> # … with 2.618e+04 more rows
```
fill_gaps() also handles filling in time gaps by values or functions,
and respects time zones for date-times. Want a quick overview of
implicit missing values? Check out has_gaps(), scan_gaps() and
count_gaps().
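The gap-filling workflow above can be approximated in base R to show what happens under the hood. This is a hypothetical sketch (locf(), fill_hourly_gaps() and the toy data are not part of tsibble): it makes hourly gaps explicit within each station's own time span, sets precip to 0 only in the newly created rows, as fill_gaps(precip = 0) does, and carries temperature forward like fill(.direction = "down").

```r
# Last observation carried forward (a base R stand-in for a "down" fill)
locf <- function(x) {
  last_seen <- cumsum(!is.na(x))                 # count of non-NA values so far
  x[c(NA_integer_, which(!is.na(x)))[last_seen + 1]]
}

# Hypothetical helper: complete each station's hourly grid, then fill
fill_hourly_gaps <- function(data) {
  pieces <- lapply(split(data, data$origin), function(d) {
    d$observed <- TRUE                           # marks rows that already existed
    grid <- data.frame(
      origin = d$origin[1],
      time_hour = seq(min(d$time_hour), max(d$time_hour), by = "hour")
    )
    out <- merge(grid, d, by = c("origin", "time_hour"), all.x = TRUE)
    out <- out[order(out$time_hour), ]
    gap <- is.na(out$observed)                   # rows created by completing the grid
    out$precip[gap] <- 0                         # only new rows get precip = 0
    out$temp <- locf(out$temp)                   # carry temperature forward
    out$observed <- NULL
    out
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# EWR is missing 03:00; JFK has no gaps
hours <- as.POSIXct("2013-01-01 01:00", tz = "UTC") + 3600 * c(0, 1, 3, 0, 1)
obs <- data.frame(
  origin = c("EWR", "EWR", "EWR", "JFK", "JFK"),
  time_hour = hours,
  temp = c(39.0, 39.9, 41.0, 39.0, 38.1),
  precip = c(0, 0.1, 0, 0, 0)
)
filled <- fill_hourly_gaps(obs)
filled
```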
summarise() to aggregate over calendar periods
index_by() is the counterpart of group_by() in temporal context, but it
groups the index only. In conjunction with index_by(), summarise() and
its scoped variants aggregate variables of interest over calendar
periods.
index_by() goes hand in hand with index functions such as yearmonth()
and yearquarter(), as well as other friends from lubridate. For
example, it would be of interest to compute the average temperature
and total precipitation per month, by applying yearmonth() to the
hourly time index.
```r
full_weather %>%
  group_by(origin) %>%
  index_by(year_month = yearmonth(time_hour)) %>% # monthly aggregates
  summarise(
    avg_temp = mean(temp, na.rm = TRUE),
    ttl_precip = sum(precip, na.rm = TRUE)
  )
#> # A tsibble: 36 x 4 [1M]
#> # Key:       origin
#>   origin year_month avg_temp ttl_precip
#>   <chr>       <mth>    <dbl>      <dbl>
#> 1 EWR      2013 Jan     35.6       3.53
#> 2 EWR      2013 Feb     34.2       3.83
#> 3 EWR      2013 Mar     40.1       3
#> 4 EWR      2013 Apr     53.0       1.47
#> 5 EWR      2013 May     63.3       5.44
#> # … with 31 more rows
```
While collapsing rows (like summarise()), group_by() and index_by() will
take care of updating the key and index respectively. This index_by() +
summarise() combo can help with regularising a tsibble of irregular
time space too.
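For intuition, here is a rough base R equivalent of the monthly aggregation shown earlier. The hourly time is mapped to a "YYYY-MM" label (a stand-in for yearmonth(), assumed here for illustration) and each origin-month group collapses to one row. monthly_summary() and the toy data are hypothetical.

```r
# Hypothetical sketch of index_by(year_month = ...) + summarise()
monthly_summary <- function(data) {
  data$year_month <- format(data$time_hour, "%Y-%m")   # coarser time label
  groups <- split(data, list(data$origin, data$year_month), drop = TRUE)
  rows <- lapply(groups, function(d) {
    data.frame(
      origin = d$origin[1],
      year_month = d$year_month[1],
      avg_temp = mean(d$temp, na.rm = TRUE),
      ttl_precip = sum(d$precip, na.rm = TRUE)
    )
  })
  out <- do.call(rbind, rows)
  out <- out[order(out$origin, out$year_month), ]      # key first, then time
  rownames(out) <- NULL
  out
}

obs <- data.frame(
  origin = c("EWR", "EWR", "EWR", "JFK"),
  time_hour = as.POSIXct(c("2013-01-01 01:00", "2013-01-31 23:00",
                           "2013-02-01 00:00", "2013-01-15 12:00"), tz = "UTC"),
  temp = c(30, 40, 35, 28),
  precip = c(0.1, NA, 0.2, 0)
)
monthly <- monthly_summary(obs)
monthly
```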
Time series often involve moving window calculations. Several functions in tsibble allow for different variations of moving windows using purrr-like syntax:
pslide(): sliding window with overlapping observations.
ptile(): tiling window without overlapping observations.
pstretch(): fixing an initial window and expanding to include more observations.
For example, a moving average of window size 3 is carried out on hourly temperatures for each group (origin).
```r
full_weather %>%
  group_by(origin) %>%
  mutate(temp_ma = slide_dbl(temp, ~ mean(., na.rm = TRUE), .size = 3))
#> # A tsibble: 26,190 x 6 [1h] <America/New_York>
#> # Key:       origin
#> # Groups:    origin
#>   origin time_hour            temp humid precip temp_ma
#>   <chr>  <dttm>              <dbl> <dbl>  <dbl>   <dbl>
#> 1 EWR    2013-01-01 01:00:00  39.0  59.4      0    NA
#> 2 EWR    2013-01-01 02:00:00  39.0  61.6      0    NA
#> 3 EWR    2013-01-01 03:00:00  39.0  64.4      0    39.0
#> 4 EWR    2013-01-01 04:00:00  39.9  62.2      0    39.3
#> 5 EWR    2013-01-01 05:00:00  39.0  64.4      0    39.3
#> # … with 2.618e+04 more rows
```
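The three window types listed above can be sketched in base R to show how they differ. These helpers (slide_mean(), tile_mean(), stretch_mean()) are hypothetical illustrations rather than tsibble's implementations; slide_mean() pads the first size - 1 positions with NA, matching the NA values in the output above.

```r
# slide: overlapping windows of `size` values ending at each position
slide_mean <- function(x, size) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= size) out[i] <- mean(x[(i - size + 1):i], na.rm = TRUE)
  }
  out
}

# tile: non-overlapping blocks of `size` values, one result per block
tile_mean <- function(x, size) {
  blocks <- split(x, ceiling(seq_along(x) / size))
  vapply(blocks, mean, numeric(1), na.rm = TRUE, USE.NAMES = FALSE)
}

# stretch: start with the first `init` values, then grow the window by one
stretch_mean <- function(x, init) {
  vapply(seq(init, length(x)),
         function(i) mean(x[seq_len(i)], na.rm = TRUE), numeric(1))
}

temp <- c(39.0, 39.0, 39.0, 39.9, 39.0, 40.2)
slide_mean(temp, 3)    # NA NA 39.0 39.3 39.3 39.7
tile_mean(temp, 3)     # 39.0 39.7
stretch_mean(temp, 3)  # 39.000 39.225 39.180 39.350
```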
Looking for rolling in parallel? Their multiprocessing equivalents are
prefixed with future_. More examples can be found in the package vignettes.
Tsibble also serves as a natural input for forecasting and many other downstream analytical tasks. Stay tuned for tidyverts.org.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
The tsibble data structure and API have reached the stable lifecycle stage.
v0.8.0
grouped data frames, tsibble allows for empty key values and disregards the lazily stored key. All operations now recalculate the keying structure.
grouped_ts) is a subclass of
.size is retired in
stretch() in favour of
stretch() gained a new
.fill = NA argument, which returns the same length as the input. To restore the previous behaviour, please use
.fill = NULL. (#88)
stretch_tsibble() provide fast and shorthand subsetting of a tsibble by rolling rows.
slide() gained a new
.step argument for calculating at every specified step instead of every single step.
update_tsibble() to update key and index a bit more easily.
rbind() for dropping custom index class. (#78)
count_gaps() for dropping custom index class.
count_gaps() now only summarises keys with gaps instead of all the keys.
Inf. (@jeffzi, #84)
fill_gaps() returns a grouped tsibble too.
fill_na() in favour of
.drop in column-wise dplyr verbs.
key_by() (no idea why it's there).
scan_gaps() joins the family of implicit missing value handlers.
future_. It requires the furrr package to be installed. (#66)
.data is a complete tsibble,
fill_gaps() gives a warning instead of an error when name-value pairs are supplied.
filter_index() works for a grouped tsibble.
This release simplifies the "key" structure. The nesting and crossing definition has been removed from the "key" specification. One or more variables forming the "key" are required to identify observational units over time, but the relationship between these variables is no longer assumed. The nesting and crossing structure will be dealt with by visualisation and forecasting reconciliation in downstream packages.
count_gaps.tbl_ts() returns a tibble containing gaps for each key value rather than an overall gap, which is consistent with the rest of tsibble methods. All output column names that are not supplied by users gain a "." prefix.
interval input instead of time vectors to avoid overhead, and is also marked as an internal function.
pslider() as new functions
.partial is removed from
pslider() to feature a simpler interface.
build_tsibble(). In order to construct a grouped tsibble,
x requires a grouped df.
has_gaps() to quickly check if there are implicit time gaps for each key in a tsibble.
new_data() to produce the future of a tsibble.
filter_index() to filter a time window of a tsibble.
time_in() to check if time falls in the ranges in compact expression, with no need for time zone specification.
new_tsibble() creates a subclass of a tsibble.
fill_gaps(), for a more expressive function name and consistency with
POSIXct, time zone will be displayed in the header via
holiday_aus() that requires the package "timeDate".
fill_na.tbl_ts() scoping issue (#67).
slice.tbl_ts() correctly handles logical
fill_na() will only replace implicit time gaps by values and functions, and leave originally explicit
tidyr::fill() gained support for class "grouped_ts", and it is re-exported again. (#73)
fill_na(), in favour of
find_duplicates(), in favour of
case_na(), and will be defunct in the next release.
split_by(), which is under development as an S3 generic in dplyr.
.drop argument in column-wise verbs, and suggested to use
select() doesn't select the index, it will inform users and automatically select it.
append_row() for easily appending new observations to a tsibble. (#59)
/, consistent with
fill_na() for multiple replacements when using with
group_by(), introduced in v0.5.1.
as_tsibble.grouped_df() respected its existing groups and removed argument
unnest.lst_ts respects the ordering of "key" values. (#56)
nest.tbl_ts() respect the appearance ordering of input variables. (#57)
key_indices() return formats consistent with its generic.
key no longer accepted character.
index_by() gives a more informative error when the LHS is named as index.
tile() gained a new argument
.bind = FALSE.
-) for yearweek, yearmonth, and yearquarter.
new_interval() creates an "interval" object with the specified values.
fill_na() for replacing values when
stretch() use the same coercion rules as
.bind = TRUE.
This release introduces breaking changes to the "interval" class so that tsibble better supports finer time resolutions (e.g. millisecond, microsecond, and nanosecond). The "interval" format changes from upper case to shorthand. To support a new time index class, only
pull_interval() need to be defined now.
group_by_key() to easily group the key variables.
slide() gained a new argument
.align = "right" to align at "right", "center", or "left". If the window size is even for center alignment, either "center-right" or "center-left" is needed.
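A hypothetical base R sketch of what the three alignments mean for a window of odd size: right alignment ends the window at the current position, left alignment starts it there, and center alignment puts the current position in the middle. window_at() and aligned_mean() are illustrative names, not tsibble functions; even-sized centered windows are deliberately rejected, because they need the "center-left"/"center-right" choice described above.

```r
# Hypothetical helper: indices of the window attached to position i
window_at <- function(i, size, align = c("right", "center", "left")) {
  align <- match.arg(align)
  switch(align,
    right = (i - size + 1):i,
    left = i:(i + size - 1),
    center = {
      stopifnot(size %% 2 == 1)   # even sizes need "center-left"/"center-right"
      half <- (size - 1) %/% 2
      (i - half):(i + half)
    }
  )
}

# NA wherever the window would run off either end of x
aligned_mean <- function(x, size, align = "right") {
  vapply(seq_along(x), function(i) {
    idx <- window_at(i, size, align)
    if (min(idx) < 1 || max(idx) > length(x)) NA_real_ else mean(x[idx])
  }, numeric(1))
}

x <- c(1, 2, 3, 4, 5)
aligned_mean(x, 3, "right")   # NA NA 2 3 4
aligned_mean(x, 3, "center")  # NA 2 3 4 NA
aligned_mean(x, 3, "left")    # 2 3 4 NA NA
```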
-) for yearweek, yearmonth, and yearquarter.
stretch() gained a new argument
.bind = FALSE.
0 in the "interval" class to make the representation simpler.
interval class has new slots of "millisecond", "microsecond", "nanosecond".
time_unit() is a function instead of an S3 generic, making index extension a bit easier.
group_by.lst_ts() for dropping the grouping information.
.f to one input.
as_tsibble.grouped_df() for groups. (#44)
.fill = NULL for
purrr style exactly (#35):
stretch() return lists only, instead of numerics as before.
pslide() to map over multiple inputs simultaneously.
slide() gained a new argument
.partial to support partial sliding.
pstretcher() support multiple inputs now, and split them in parallel.
holiday_aus() for Australian national and state-based public holidays.
diff() for year-week, year-month, and year-quarter.
yearquarter() supported for character.
pstretch() to slide over multiple inputs simultaneously (#33).
units_since() for index classes.
is_53weeks() to determine if the year has 53 ISO weeks.
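The ISO 8601 rule behind a check like is_53weeks() is compact: a year has 53 ISO weeks when 1 January falls on a Thursday, or when it is a leap year and 1 January falls on a Wednesday. A hypothetical base R sketch of that rule, not tsibble's own code:

```r
# Hypothetical sketch of the ISO 8601 53-week rule
is_53weeks_sketch <- function(year) {
  jan1 <- as.POSIXlt(as.Date(paste0(year, "-01-01")))$wday   # 0 = Sunday
  leap <- (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
  jan1 == 4 | (leap & jan1 == 3)   # Thursday, or Wednesday in a leap year
}

is_53weeks_sketch(2015:2021)
# TRUE FALSE FALSE FALSE FALSE TRUE FALSE  (2015 and 2020 have 53 weeks)
```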
key_sum() for extending tsibble.
tsp attribute from the
count_gaps when a tsibble has an unknown interval.
as_tsibble.grouped_ts() now returns self (#34).
id() is used in the tsibble context (e.g.
build_tsibble()) regardless of the conflicts with dplyr or plyr, to avoid a frustrating message (#36).
select.tbl_ts() now preserves the index.
as.ts.tbl_ts() for ignoring the
value argument when the key is empty.
[.tbl_ts() when subsetting columns by characters (#30).
fill_na.tbl_ts() dropping custom index class (#32).
format.yearweek() due to the boundary issue (#27).
nycflights13 >= 1.0.0.
The tsibble package has a hexagon logo now! Thanks Mitch (@mitchelloharawild).
difference() computes lagged differences of a numeric vector. It returns a vector of the same length as the input with
NA padded. It works with
index2 will be part of grouping variables.
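The difference() entry above describes lagged differences padded with NA so that the result keeps the input's length. A hypothetical base R stand-in (difference_sketch() is not a tsibble function, and its lag argument is an assumption) contrasts this with base diff(), which returns a shorter vector:

```r
# Hypothetical stand-in: lagged differences with leading NA padding
difference_sketch <- function(x, lag = 1) {
  c(rep(NA, lag), diff(x, lag = lag))
}

x <- c(3, 5, 9, 10)
diff(x)                       # 2 4 1     (one element shorter)
difference_sketch(x)          # NA 2 4 1  (same length as x)
difference_sketch(x, lag = 2) # NA NA 6 5
```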
This release (hopefully) marks the stability of a tsibble data object (
tbl_ts contains the following components:
key: single or multiple columns that uniquely identify observational units over time. A key consisting of nested and crossed variables reflects the structure underlying the data. The programme itself takes care of the updates in the "key" when manipulating the data. The "key" differs from the grouping variables with respect to variables manipulated by users.
index: a variable that represents time. Together with the "key", it uniquely identifies each observation in the data table.
index2: why do we need the second index? It means re-indexing to a variable, not the second index. It is identical to the
index most of the time, but starts deviating when using index_by().
index_by() works similarly to group_by(), but groups the index only. The dplyr verbs, like mutate(), operate on each time group of the data defined by index_by(). You may wonder why we introduce a new function rather than using group_by(), which users are most familiar with. It's because time is indispensable to a tsibble, and index_by() provides a trace for understanding how the index changes. For this purpose, group_by() is just too general. For example, summarise() aggregates data to a less granular time period, leading to an update of the index, which is now handled nicely and intuitively.
interval class to save a list of time intervals. It computes the greatest common divisor of the time differences in the index column, which should give a sensible interval for almost all cases, compared to the minimal time distance. It also depends on the time representation. For example, if the data is monthly, the index should use a yearmonth() format instead of Date, as Date only gives the number of days, not the number of months.
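The greatest-common-divisor idea can be sketched in base R. gcd() and infer_interval() are hypothetical helpers, not tsibble internals. The example also shows why the time representation matters: the day gaps between quarter starts (90, 91 and 92) only share a divisor of 1 day, whereas counting the same dates in whole months recovers a 3-month step.

```r
# Euclid's algorithm on non-negative whole numbers
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# Hypothetical helper: the common step shared by all gaps between distinct,
# sorted time points, in the index's own numeric units
infer_interval <- function(times) {
  gaps <- diff(sort(unique(as.numeric(times))))
  Reduce(gcd, gaps)
}

# Hourly data with an implicit 3-hour gap: the GCD still recovers 1 hour
hours <- as.POSIXct("2013-01-01 01:00", tz = "UTC") + 3600 * c(0, 1, 2, 5, 6)
infer_interval(hours)        # 3600 (seconds)

# Quarter starts stored as Date: only a 1-day step is shared
qtr_starts <- as.Date(c("2013-01-01", "2013-04-01", "2013-07-01", "2013-10-01"))
infer_interval(qtr_starts)   # 1 (day)

# The same dates counted in whole months
lt <- as.POSIXlt(qtr_starts)
month_index <- (lt$year + 1900) * 12 + lt$mon
infer_interval(month_index)  # 3 (months)
```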
regular: since a tsibble factors in the implicit missing cases, whether the data is regular or not cannot be determined. This relies on the user's specification.
ordered: time-wise and rolling window functions assume data of temporal ordering. A tsibble will be sorted by its time index. If a key is explicitly declared, the key will be sorted first and followed by arranging time in ascending order. If it's not in time order, it broadcasts a warning.
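A hypothetical base R sketch of the ordering rule: key first, then time in ascending order, with a warning when the input was out of order. arrange_key_index() is illustrative only; tsibble's actual checks and messages may differ.

```r
# Hypothetical helper: sort by key columns, then by the index ascending
arrange_key_index <- function(data, key, index) {
  ord <- do.call(order, unname(as.list(data[c(key, index)])))
  if (is.unsorted(ord)) {   # any permutation other than the identity
    warning("data were not ordered by key and index; rows have been sorted")
  }
  data[ord, , drop = FALSE]
}

obs <- data.frame(
  origin = c("JFK", "EWR", "EWR"),
  hour = c(1, 2, 1),
  temp = c(38, 40, 39)
)
sorted <- arrange_key_index(obs, key = "origin", index = "hour")  # warns
sorted$temp  # 39 40 38
```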
tsummarise() and its scoped variants. It can be replaced by the combo
tsummarise() provides an unintuitive interface where the first argument keeps the same size of the index, but the remaining arguments reduce rows to a single one. Analogously, it does
summarise(). The proposed
index_by() solves the issue of index update.
find_duplicates() to better reflect its functionality.
group_vars() return a vector of characters instead of a list.
distinct.tbl_ts() now returns a tibble instead of an error.
tidyr::fill(), as they respect the input structure.
index_sum(), and replaced by
index_valid() to extend index type support.
index_by() groups time index, as the counterpart of
group_by() in temporal context.
gaps() counts time gaps (implicit missing observations in time).
yearweek() creates and coerces to a year-week object. (#17)
fill_na.tbl_ts() gained a new argument of
.full = FALSE.
.full = FALSE (the default) inserts
NA for each key within its time period,
TRUE for the entire time span. This affects the results of
fill_na.tbl_ts() as it only took
TRUE into account previously. (#15)
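To show the difference between the two spans, a hypothetical base R sketch: complete_grid() (not a tsibble function) completes each key over its own first-to-last time when full = FALSE, and over the span of the whole data set when full = TRUE. Evenly spaced numeric time steps are assumed for simplicity.

```r
# Hypothetical helper: make implicit gaps explicit per key
complete_grid <- function(data, key, index, step = 1, full = FALSE) {
  all_range <- range(data[[index]])
  pieces <- lapply(split(data, data[[key]]), function(d) {
    rng <- if (full) all_range else range(d[[index]])   # span to complete
    grid <- data.frame(d[[key]][1], seq(rng[1], rng[2], by = step))
    names(grid) <- c(key, index)
    merge(grid, d, by = c(key, index), all.x = TRUE)    # gaps become NA rows
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

obs <- data.frame(
  station = c("a", "a", "b", "b"),
  time = c(1, 3, 2, 5),
  value = c(10, 30, 20, 50)
)
nrow(complete_grid(obs, "station", "time", full = FALSE))  # a: 1-3, b: 2-5 -> 7
nrow(complete_grid(obs, "station", "time", full = TRUE))   # both 1-5 -> 10
```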
.drop in column-wise dplyr verbs.
group_by.tbl_ts() behaves exactly the same as
group_by.tbl_df now. Grouping variables are temporary for data manipulation. Nested or crossed variables are not the type that
transmute.tbl_ts() for a univariate time series due to unregistered tidyselect helpers. (#9).
rename.tbl_ts() for not preserving grouped variables (#12).
rename.tbl_ts() for renaming grouped variables.
tbl_ts gains a new attribute
index2, which is a candidate for the new index (symbol) used by
attr(grouped_ts, "vars") stores characters instead of names, same as
This release introduces major changes to the underlying
tbl_ts class to reduce the object size, and computed on the fly when printing.
tbl_ts object is a symbol now instead of a quosure.
tbl_ts object is an unnamed list of symbols.
key_update() to change/update the keys for a given tsibble.
unkey() as an S3 method for a tsibble of key size < 2.
key_indices() as an S3 method to extract key indices.
split_by() to split a tsibble into a list of data by unquoted variables.
build_tsibble() allows users to gain more control over a tsibble construction.
as_tsibble.msts() for multiple seasonality time series defined in the forecast package.
as_tsibble.ts() for daily time series (when frequency = 7).
group_by.tbl_ts() does not accept named expressions.
slice()). This avoids unnecessary re-computation for many function calls.
stretch(), are no longer defined as S3 methods. Several new variants have been introduced for the purpose of type stability, like
slide_dfr() (a row-binding data frame),
slide_dfc() (a column-binding data frame).
index variable must sit in the first name-value pair in
tsummarise() instead of any position in the call.
transmute.tbl_ts() keeps the newly created variables along with index and keys, instead of throwing an error as before.
format.key() for nesting crossed with another nesting.
This release marks the complete support of dplyr key verbs.
NA backward or forward in tsibble.
dplyr::distinct() and return an error.
inform_duplicates() informs which row has duplicated elements of key and index variables.
tsummarise.tbl_ts(), when calling functions with no parameters like
tsummarise.tbl_ts(), one grouping level should be dropped for consistency with
dplyr::summarise() for a grouped
tbl_ts are supported in
as_tsibble(). An empty tsibble is not allowed.
group_by.tbl_ts(.data, ..., add = TRUE) works as expected now.
NEWS.md file to track changes to the package.