Simple Data Frames

Provides a 'tbl_df' class (the 'tibble') that provides stricter checking and better formatting than the traditional data frame.


tibble implements a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. It extracts these basic ideas out of dplyr, which is now more clearly focused on data manipulation. tibble provides a lighter-weight package for the basic care and feeding of tbl_dfs, aka "tibble diffs" or just "tibbles". Tibbles are data.frames with nicer behavior around printing, subsetting, and factor handling.

You can create a tibble from an existing object with as_tibble():

library(tibble)
as_tibble(iris)
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
#> 1           5.1         3.5          1.4         0.2  setosa
#> 2           4.9         3.0          1.4         0.2  setosa
#> 3           4.7         3.2          1.3         0.2  setosa
#> 4           4.6         3.1          1.5         0.2  setosa
#> 5           5.0         3.6          1.4         0.2  setosa
#> 6           5.4         3.9          1.7         0.4  setosa
#> 7           4.6         3.4          1.4         0.3  setosa
#> 8           5.0         3.4          1.5         0.2  setosa
#> 9           4.4         2.9          1.4         0.2  setosa
#> 10          4.9         3.1          1.5         0.1  setosa
#> # ... with 140 more rows

This will work for reasonable inputs that are already data frames, lists, matrices, or tables.
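
As a minimal sketch of the non-data-frame inputs, the following converts a named list, a matrix with column names, and a one-way table. The object names (`l`, `m`, `tb_list`, and so on) are illustrative and not part of the original text.

```r
library(tibble)

# A named list of equal-length vectors: one column per element
l <- list(id = 1:3, label = c("a", "b", "c"))
tb_list <- as_tibble(l)

# A matrix with column names: one column per matrix column
m <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("u", "v")))
tb_mat <- as_tibble(m)

# A table: one row per cell, with the counts in a separate column
tb_tab <- as_tibble(table(cyl = mtcars$cyl))

print(tb_list)
print(tb_mat)
print(tb_tab)
```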

You can also create a new tibble from vectors that represent the columns with tibble():

tibble(x = 1:5, y = 1, z = x ^ 2 + y)
#> # A tibble: 5 × 3
#>       x     y     z
#>   <int> <dbl> <dbl>
#> 1     1     1     2
#> 2     2     1     5
#> 3     3     1    10
#> 4     4     1    17
#> 5     5     1    26

tibble() does much less than data.frame(): it never changes the type of the inputs (e.g. it never converts strings to factors!), it never changes the names of variables, and it never creates row.names(). You can read more about these features in the vignette, vignette("tibble").
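
A small comparison makes the first two points concrete. This is only a sketch; the column names `my var` and `name` are made up for illustration.

```r
library(tibble)

# data.frame() rewrites non-syntactic names by default; tibble() keeps them
df_names <- names(data.frame(`my var` = 1:2))
tb_names <- names(tibble(`my var` = 1:2))
df_names
#> [1] "my.var"
tb_names
#> [1] "my var"

# Strings stay character vectors in a tibble
tb <- tibble(name = c("a", "b"))
class(tb$name)
#> [1] "character"
```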

You can define a tibble row-by-row with tribble():

tribble(
  ~x, ~y,  ~z,
  "a", 2,  3.6,
  "b", 1,  8.5
)
#> # A tibble: 2 × 3
#>       x     y     z
#>   <chr> <dbl> <dbl>
#> 1     a     2   3.6
#> 2     b     1   8.5
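
For comparison, the same data can be written column-by-column with tibble(). The sketch below builds both forms and checks that they agree; `by_row` and `by_col` are illustrative names.

```r
library(tibble)

# Row-by-row layout: convenient for small tables typed by hand
by_row <- tribble(
  ~x, ~y,  ~z,
  "a", 2,  3.6,
  "b", 1,  8.5
)

# Column-by-column layout holding the same values
by_col <- tibble(
  x = c("a", "b"),
  y = c(2, 1),
  z = c(3.6, 8.5)
)

# Compare the two tibbles
all.equal(by_row, by_col)
```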

You can see why this variant of the data.frame is called a "tibble diff" from its class:

class(as_tibble(iris))
#> [1] "tbl_df"     "tbl"        "data.frame"

There are two main differences in the usage of a data frame vs a tibble: printing, and subsetting.

Tibbles have a refined print method that shows only the first 10 rows, and all the columns that fit on screen. This makes it much easier to work with large data. In addition to its name, each column reports its type, a nice feature borrowed from str():

library(nycflights13)
flights
#> # A tibble: 336,776 × 19
#>     year month   day dep_time sched_dep_time dep_delay arr_time
#>    <int> <int> <int>    <int>          <int>     <dbl>    <int>
#> 1   2013     1     1      517            515         2      830
#> 2   2013     1     1      533            529         4      850
#> 3   2013     1     1      542            540         2      923
#> 4   2013     1     1      544            545        -1     1004
#> 5   2013     1     1      554            600        -6      812
#> 6   2013     1     1      554            558        -4      740
#> 7   2013     1     1      555            600        -5      913
#> 8   2013     1     1      557            600        -3      709
#> 9   2013     1     1      557            600        -3      838
#> 10  2013     1     1      558            600        -2      753
#> # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> #   minute <dbl>, time_hour <dttm>
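
The defaults can be overridden per call. The sketch below uses the built-in mtcars data so that it runs without nycflights13, and relies on the n and width arguments of the tibble print method; glimpse() gives a transposed view with one line per column.

```r
library(tibble)

tb <- as_tibble(mtcars)

# Show only the first three rows
print(tb, n = 3)

# Show the first three rows and every column, regardless of console width
print(tb, n = 3, width = Inf)

# One line per column, with its type and the first few values
glimpse(tb)
```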

Tibbles are strict about subsetting. If you try to access a variable that does not exist via $, you'll get a warning:

flights$yea
#> Warning: Unknown column 'yea'
#> NULL
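
To see why this matters, compare it with a base data frame, where $ silently completes a partial name. This is a sketch with made-up objects `df` and `tb`; the warning from the tibble is suppressed here only so that the result can be inspected.

```r
library(tibble)

df <- data.frame(year = 2013L)
tb <- as_tibble(df)

# Base data frame: "yea" is partially matched to "year"
partial <- df$yea
partial
#> [1] 2013

# Tibble: no partial matching, so the result is NULL (plus a warning)
strict <- suppressWarnings(tb$yea)
is.null(strict)
#> [1] TRUE

# Checking the name explicitly avoids relying on the warning
"yea" %in% names(tb)
#> [1] FALSE
```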

Tibbles also clearly delineate [ and [[: [ always returns another tibble, [[ always returns a vector. No more drop = FALSE!

class(iris[ , 1])
#> [1] "numeric"
class(iris[ , 1, drop = FALSE])
#> [1] "data.frame"
class(as_tibble(iris)[ , 1])
#> [1] "tbl_df"     "tbl"        "data.frame"
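
When a plain vector is wanted from a tibble, [[ or $ is the way to get it. A minimal sketch (variable names are illustrative):

```r
library(tibble)

tb <- as_tibble(iris)

# [ keeps the tibble structure, even for a single column
col_tbl <- tb[, "Sepal.Length"]
is_tibble(col_tbl)
#> [1] TRUE

# [[ and $ extract the column as a vector
col_vec <- tb[["Sepal.Length"]]
col_vec2 <- tb$Sepal.Length
is.numeric(col_vec)
#> [1] TRUE
identical(col_vec, col_vec2)
#> [1] TRUE
```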

tibble is on CRAN; install it using:

install.packages("tibble")

You can try out the dev version with:

# install.packages("devtools")
devtools::install_github("tidyverse/tibble")

News

tibble 1.2 (2016-08-26)

  • The tibble.width option is used for glimpse() only if it is finite (#153, @kwstat).
  • New as_tibble.poly() to support conversion of a poly object to a tibble (#110).
  • add_row() now correctly handles existing columns of type list that are not updated (#148).
  • all.equal() doesn't throw an error anymore if one of the columns is named na.last, decreasing or method (#107, @BillDunlap).
  • New add_column(), analogously to add_row() (#99).
  • print.tbl_df() gains an n_extra argument and will have the same interface as trunc_mat() from now on.
  • add_row() and add_column() gain .before and .after arguments which indicate the row (by number) or column (by number or name) before or after which the new data are inserted. Updated or added columns cannot be named .before or .after (#99).
  • Rename frame_data() to tribble(), which stands for "transposed tibble". The former is still available as an alias (#132, #143).
  • add_row() now can add multiple rows, with recycling (#142, @jennybc).
  • Use multiply character × instead of x when printing dimensions (#126). Output tests had to be disabled for this on Windows.
  • Back-tick non-semantic column names on output (#131).
  • Use dttm instead of time for POSIXt values (#133). The label time is now used for columns of the difftime class.
  • Better output for 0-row results when total number of rows is unknown (e.g., for SQL data sources).
  • New object summary vignette that shows which methods to define for custom vector classes to be used as tibble columns (#151).
  • Added more examples for print.tbl_df(), now using data from nycflights13 instead of Lahman (#121), with guidance to install nycflights13 package if necessary (#152).
  • Minor changes in vignette (#115, @helix123).
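
The .before and .after arguments described above can be sketched as follows. The tibble `df` and the inserted values are made up for illustration.

```r
library(tibble)

df <- tibble(x = 1:3, y = c("a", "b", "c"))

# Insert a new row before the current row 2
with_row <- add_row(df, x = 10L, y = "z", .before = 2)
with_row$x
#> [1]  1 10  2  3

# Insert a new column after column "x" (a column number also works)
with_col <- add_column(df, w = c(TRUE, FALSE, TRUE), .after = "x")
names(with_col)
#> [1] "x" "w" "y"
```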

tibble 1.1 (2016-07-01)

Follow-up release.

  • tibble() is no longer an alias for frame_data() (#82).
  • Remove tbl_df() (#57).
  • $ returns NULL if column not found, without partial matching. A warning is given (#109).
  • [[ returns NULL if column not found (#109).
  • Reworked output: More concise summary (begins with hash # and contains more text (#95)), removed empty line, showing number of hidden rows and columns (#51). The trailing metadata also begins with hash # (#101). Presence of row names is indicated by a star in printed output (#72).
  • Format NA values in character columns as <NA>, like print.data.frame() does (#69).
  • The number of printed extra cols is now an option (#68, @lionel-).
  • Computation of column width properly handles wide (e.g., Chinese) characters; tests still fail on Windows (#100).
  • glimpse() shows nesting structure for lists and uses angle brackets for type (#98).
  • Tibbles with POSIXlt columns can be printed now; the text <POSIXlt> is shown as a placeholder to encourage usage of POSIXct (#86).
  • type_sum() shows only topmost class for S3 objects.
  • Strict checking of integer and logical column indexes. For integers, passing a non-integer index or an out-of-bounds index raises an error. For logicals, only vectors of length 1 or ncol are supported. Passing a matrix or an array now raises an error in any case (#83).
  • Warn if setting non-NULL row names (#75).
  • Consistently surround variable names with single quotes in error messages.
  • Use "Unknown column 'x'" as error message if column not found, like base R (#94).
  • stop() and warning() are now always called with call. = FALSE.
  • The .Dim attribute is silently stripped from columns that are 1d matrices (#84).
  • Converting a tibble without row names to a regular data frame does not add explicit row names.
  • as_tibble.data.frame() preserves attributes, and uses as_tibble.list() to avoid calling overridden methods, which may lead to endless recursion.
  • New has_name() (#102).
  • Prefer tibble() and as_tibble() over data_frame() and as_data_frame() in code and documentation (#82).
  • New is.tibble() and is_tibble() (#79).
  • New enframe() that converts vectors to two-column tibbles (#31, #74).
  • obj_sum() and type_sum() show "tibble" instead of "tbl_df" for tibbles (#82).
  • as_tibble.data.frame() gains a validate argument (as in as_tibble.list()); if TRUE, the input is validated.
  • Implement as_tibble.default() (#71, hadley/dplyr#1752).
  • has_rownames() supports arguments that are not data frames.
  • Two-dimensional indexing with [[ works (#58, #63).
  • Subsetting with empty index (e.g., x[]) also removes row names.
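
A few of the helpers introduced in this release, shown as a brief sketch on made-up inputs:

```r
library(tibble)

# enframe() turns a named vector into a two-column tibble (name, value)
ef <- enframe(c(a = 1, b = 2))
names(ef)
#> [1] "name"  "value"

# has_name() checks for a name without any partial matching
has_name(iris, "Species")
#> [1] TRUE
has_name(iris, "Spec")
#> [1] FALSE

# is_tibble() distinguishes tibbles from plain data frames
is_tibble(iris)
#> [1] FALSE
is_tibble(as_tibble(iris))
#> [1] TRUE
```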

Documentation

  • Document behavior of as_tibble.tbl_df() for subclasses (#60).
  • Document and test that subsetting removes row names.
  • Don't rely on knitr internals for testing (#78).
  • Fix compatibility with knitr 1.13 (#76).
  • Enhance knit_print() tests.
  • Provide default implementation for tbl_sum.tbl_sql() and tbl_sum.tbl_grouped_df() to allow dplyr release before a tibble release.
  • Explicit tests for format_v() (#98).
  • Test output for NULL value of tbl_sum().
  • Test subsetting in all variants (#62).
  • Add missing test from dplyr.
  • Use new expect_output_file() from testthat.

tibble 1.0 (2016-03-21)

  • Initial CRAN release

  • Extracted from dplyr 0.4.3

  • Exported functions:

    • tbl_df()
    • as_data_frame()
    • data_frame(), data_frame_()
    • frame_data(), tibble()
    • glimpse()
    • trunc_mat(), knit_print.trunc_mat()
    • type_sum()
    • New lst() and lst_() create lists in the same way that data_frame() and data_frame_() create data frames (hadley/dplyr#1290). lst(NULL) doesn't raise an error (#17, @jennybc), but always uses deparsed expression as name (even for NULL).
    • New add_row() makes it easy to add a new row to data frame (hadley/dplyr#1021).
    • New rownames_to_column() and column_to_rownames() (#11, @zhilongjia).
    • New has_rownames() and remove_rownames() (#44).
    • New repair_names() fixes missing and duplicate names (#10, #15, @r2evans).
    • New is_vector_s3().
  • Features

    • New as_data_frame.table() with argument n to control name of count column (#22, #23).
    • Use tibble prefix for options (#13, #36).
    • glimpse() now (invisibly) returns its argument (hadley/dplyr#1570). It is now a generic, the default method dispatches to str() (hadley/dplyr#1325). The default width is obtained from the tibble.width option (#35, #56).
    • as_data_frame() is now an S3 generic with methods for lists (the old as_data_frame()), data frames (trivial), matrices (with efficient C++ implementation) (hadley/dplyr#876), and NULL (returns a 0-row 0-column data frame) (#17, @jennybc).
    • Non-scalar input to frame_data() and tibble() (including lists) creates list-valued columns (#7). These functions return 0-row but n-col data frame if no data.
  • Bug fixes

    • frame_data() properly constructs rectangular tables (hadley/dplyr#1377, @kevinushey).
  • Minor modifications

    • Uses setOldClass(c("tbl_df", "tbl", "data.frame")) to help with S4 (hadley/dplyr#969).
    • tbl_df() automatically generates column names (hadley/dplyr#1606).
    • tbl_dfs gain $ and [[ methods that are ~5x faster than the defaults, never do partial matching (hadley/dplyr#1504), and throw an error if the variable does not exist. [[.tbl_df() falls back to regular subsetting when used with anything other than a single string (#29). base::getElement() now works with tibbles (#9).
    • all_equal() allows comparing data frames while ignoring row and column order, and optionally ignoring minor differences in type (e.g. int vs. double) (hadley/dplyr#821). Used by all.equal() for tibbles. (This package contains a pure R implementation of all_equal(); the dplyr code has identical behavior but is written in C++ and thus faster.)
    • The internals of data_frame() and as_data_frame() have been aligned, so as_data_frame() will now automatically recycle length-1 vectors. Both functions give more informative error messages if you are attempting to create an invalid data frame. You can no longer create a data frame with duplicated names (hadley/dplyr#820). Both functions now check that you don't have any POSIXlt columns, and tell you to use POSIXct if you do (hadley/dplyr#813). data_frame(NULL) raises error "must be a 1d atomic vector or list".
    • trunc_mat() and print.tbl_df() are considerably faster if you have very wide data frames. They will now also only list the first 100 additional variables not already on screen - control this with the new n_extra parameter to print() (hadley/dplyr#1161). The type of list columns is printed correctly (hadley/dplyr#1379). The width argument is used also for 0-row or 0-column data frames (#18).
    • When used in list-columns, S4 objects only print the class name rather than the full class hierarchy (#33).
    • Add test that [.tbl_df() does not change class (#41, @jennybc). Improve [.tbl_df() error message.
  • Documentation

    • Update README, with edits (#52, @bhive01) and enhancements (#54, @jennybc).
    • vignette("tibble") describes the difference between tbl_dfs and regular data frames (hadley/dplyr#1468).
  • Code quality

    • Test using new-style Travis-CI and AppVeyor. Full test coverage (#24, #53). Regression tests load known output from file (#49).
    • Renamed obj_type() to obj_sum(), improvements, better integration with type_sum().
    • Internal cleanup.
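
The row-name helpers and lst() from this release can be sketched as follows, using the built-in mtcars data. The column name "model" is an arbitrary choice for illustration.

```r
library(tibble)

# Move row names into an explicit column, and back again
has_rownames(mtcars)
#> [1] TRUE
with_col <- rownames_to_column(mtcars, var = "model")
has_rownames(with_col)
#> [1] FALSE
round_trip <- column_to_rownames(with_col, var = "model")
identical(rownames(round_trip), rownames(mtcars))
#> [1] TRUE

# lst() builds a list sequentially, so later elements can use earlier ones
l <- lst(n = 3, x = seq_len(n))
l$x
#> [1] 1 2 3
```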


1.3.3 by Kirill Müller, 3 months ago


http://tibble.tidyverse.org/, https://github.com/tidyverse/tibble


Report a bug at https://github.com/tidyverse/tibble/issues


Browse source code at https://github.com/cran/tibble


Authors: Kirill Müller [aut, cre], Hadley Wickham [aut], Romain Francois [ctb], RStudio [cph]




MIT + file LICENSE license


Imports methods, rlang, Rcpp, utils

Suggests covr, dplyr, knitr, microbenchmark, nycflights13, testthat, rmarkdown, withr

Linking to Rcpp


Imported by DiagrammeR, HURDAT, KraljicMatrix, REDCapR, SanFranBeachWater, Tmisc, abjutils, afmToolkit, alfred, anomalyDetection, antaresViz, atlantistools, babynames, banR, bikedata, biomartr, blkbox, blob, blockTools, bold, breathtestcore, breathteststan, bsam, ccafs, cdata, cellranger, cepR, charlatan, ciTools, condformat, congressbr, corrr, countyweather, cpr, crplyr, dat, datadogr, dataonderivatives, datastepr, dbplyr, docxtractr, dplyr, easyformatr, ecoseries, edeaR, enigma, esc, eurostat, evaluator, fastqcr, fbar, fcuk, feather, filesstrings, flextable, fmbasics, foghorn, forcats, ftDK, gastempt, gdns, getCRUCLdata, getlandsat, ggalt, ggeffects, ggenealogy, ggformula, ggfortify, ggguitar, ggimage, ggplot2, ggpmisc, giphyr, gitlabr, hansard, haploR, haven, hddtools, heemod, highcharter, huxtable, hypoparsr, iadf, inferr, influxdbr, isdparser, jpmesh, jpndistrict, lifelogr, mnis, modelr, modeval, monkeylearn, mregions, mrgsolve, msgtools, myTAI, nandb, naniar, natserv, nneo, nycflights13, oai, observer, officer, olsrr, openadds, pangaear, parlitools, photobiology, photobiologyInOut, phylopath, pkggraph, plotly, poio, polypoly, postlightmercury, prcr, prisonbrief, purrr, radiant.data, randNames, rbcb, rbgm, rbhl, rcv, rdefra, rdiversity, rdpla, readr, readtext, readxl, recipes, refimpact, rematch2, rerddap, reutils, rgbif, rgho, riem, rif, rio, ritis, rmapzen, rnoaa, rodham, rorcid, rsample, rtide, rtimes, rtimicropem, sfdct, sjPlot, sjlabelled, sjmisc, sjstats, solrium, spbabel, spdplyr, spocc, srvyr, survminer, sweep, tabularaster, taxa, taxize, tetraclasse, tidyRSS, tidygraph, tidyquant, tidyr, tidyverse, timetk, unpivotr, valr, waccR, wand, wikitaxa, worrms, zFactor, zeligverse.

Depended on by fileplyr, manifestoR, pdfsearch, pinnacle.data, simglm.

Suggested by GSODR, batchtools, checkmate, dataCompareR, datacheckr, datapasta, dotwhisker, drake, geojson, knitr, noaastormevents, odbc, rmarkdown, sf, snakecase, survtmle, units.

