Read 'IPUMS' Extract Files

An easy way to import census, survey, and geographic data provided by 'IPUMS' into R, plus tools to help use the associated metadata to make analysis easier. 'IPUMS' data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from our website <https://ipums.org>.



The ipumsr package helps import IPUMS extracts from the IPUMS website into R. IPUMS provides census and survey data from around the world integrated across time and space. IPUMS integration and documentation make it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within family and community context. Data and services are available free of charge.

The ipumsr package can be installed by running the following command:

install.packages("ipumsr")

Or, you can install the development version using the following commands:

if (!require(devtools)) install.packages("devtools")
 
devtools::install_github("mnpopcenter/ipumsr")

Learning More

The vignettes are a great place to learn more about ipumsr and IPUMS data:

  • For a general introduction see the ipums vignette.

  • For a more detailed look at some of the features, see these vignettes:

    • value-labels
      • Provides guidance for using the value labels provided by IPUMS
    • ipums-geography
      • Provides guidance for using R as GIS tool with IPUMS data
    • ipums-bigdata
      • Provides guidance for handling large IPUMS data extracts, with examples of the chunked versions of the microdata reading functions
  • Or to see examples of how to work through data from particular projects, see these vignettes:

    • ipums-cps
      • An example of using CPS data with the ipumsr package
    • ipums-nhgis
      • An example of using NHGIS data with the ipumsr package
    • ipums-terra
      • An example of using IPUMS Terra Data with the ipumsr package
    • And more project-specific examples are available on the Data Training Exercises section of the IPUMS website.

You can access them with the vignette() command (e.g., vignette("value-labels")).
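
As a small sketch of the vignette() usage above, the snippet below lists whichever ipumsr vignettes are installed before opening one. It uses only base R; the requireNamespace() guard and the topics variable are illustrative additions, and the list will be empty if the package is missing or was installed without vignettes.

```r
# Topic names of the installed ipumsr vignettes.
# Stays empty if ipumsr is not installed or was built without vignettes.
topics <- character(0)
if (requireNamespace("ipumsr", quietly = TRUE)) {
  topics <- vignette(package = "ipumsr")$results[, "Item"]
}
print(topics)

# Open one of the listed topics by name, for example:
# vignette("value-labels", package = "ipumsr")
```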

If you are installing from GitHub and want the vignettes, you’ll need to run the following commands first:

devtools::install_github("mnpopcenter/ipumsr/ipumsexamples")
devtools::install_github("mnpopcenter/ipumsr", build_vignettes = TRUE)

Development

We greatly appreciate bug reports, suggestions, or pull requests. They can be submitted via GitHub, or by email to [email protected]

Before contributing, please be sure to read the Contributing Guidelines and the Code of Conduct.

News

ipumsr 0.3.0

  • Lots of improvements for users who wish to use "big data" sized IPUMS extracts. For the full details, see the vignette by running vignette("ipums-bigdata", package = "ipumsr").

    • There are now chunked versions of the microdata reading functions, which let you apply functions to subsets of the data as you read it in (read_ipums_micro_chunked() and read_ipums_micro_list_chunked()).

    • There is a new function ipums_collect(), which combines dplyr::collect() with set_ipums_attributes() to add value and variable labels to data collected from a database.

    • When reading gzipped files, ipumsr no longer has to store the full text in memory.

  • Added pillar printing for labelled classes in tibbles. This means that labels are printed alongside the values when a labelled column is shown in a tibble (in a subtle grey color when the terminal supports it). To turn this feature off, run options("ipumsr.show_pillar_labels" = FALSE).

  • The approach to reading hierarchical data files is much faster.

  • Arguments to read_ipums_sp() are now in the same order as read_ipums_sf().

  • read_ipums_sf() and read_ipums_sp() gain two new arguments: vars, which lets you pick a subset of variables, and add_layer_var, which lets you add a variable indicating which layer each record came from.

  • You can now use your inside voice for variable names with the new argument lower_vars for read_ipums_ddi() and read_ipums_micro() family of functions so that the variable names are lower case.

  • ipumsr is compatible with versions of haven newer than 2.0 (while maintaining compatibility with earlier versions). (#31)
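
The chunked readers and the lower_vars argument described in the 0.3.0 notes above might be combined roughly as follows. This is a sketch under assumptions, not taken from the package documentation: the file name cps_extract.xml, the month variable, and the count_by() helper are hypothetical, and the chunk size is arbitrary. Each chunk is reduced to a small table of counts, so the full extract never has to sit in memory at once.

```r
# Hypothetical path to the DDI codebook (.xml) downloaded with an extract.
ddi_file <- "cps_extract.xml"

# Hypothetical base-R helper: count the rows in one chunk for each value of `var`.
count_by <- function(chunk, var) {
  as.data.frame(
    table(value = as.vector(chunk[[var]])),
    responseName = "n",
    stringsAsFactors = FALSE
  )
}

# Only runs when ipumsr is installed and the (assumed) extract is present.
if (requireNamespace("ipumsr", quietly = TRUE) && file.exists(ddi_file)) {
  per_chunk <- ipumsr::read_ipums_micro_chunked(
    ddi_file,
    callback = ipumsr::IpumsDataFrameCallback$new(
      # `x` is the current chunk; `pos` is the position of its first row.
      function(x, pos) count_by(x, "month")
    ),
    chunk_size = 1000,  # arbitrary; tune to the memory available
    lower_vars = TRUE   # variable names arrive in lower case ("month")
  )

  # Sum the per-chunk counts into one total per value.
  totals <- aggregate(n ~ value, data = per_chunk, FUN = sum)
  print(totals)
}
```

IpumsDataFrameCallback row-binds the per-chunk results, which is why a final aggregate() step is needed to sum counts for values that appear in more than one chunk.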

ipumsr 0.2.0

  • IPUMS Terra is now officially supported! Read raster, area, or microdata extracts using the functions read_terra_raster(), read_terra_raster_list(), read_terra_area(), read_terra_area_sf(), and read_terra_micro().

  • Add support for keyvar in DDI, which will (eventually) help link data across record types in hierarchical extracts. To be effective, this requires more support on the ipums.org website, which is hopefully coming soon (#25 - thanks @mpadge!)

  • Improved main vignette instructions for Safari users (#27)

  • Fix for selecting columns from csv extracts (#26 - thanks forum user JCambon_OIS!)

  • Fixes for the ipums_list_*() family of functions.

ipumsr 0.1.1

  • Fixed a bug in ipums_shape_*_join functions when using integer ID columns. (#16)

  • Allow for unzipped folders because Safari on macOS unzips folders by default (#17)

  • lbl_relabel behavior is improved so that labels aren't assigned sequentially (#21)


0.4.1 by Derek Burk, 9 days ago


https://www.ipums.org, https://github.com/mnpopcenter/ipumsr


Report a bug at https://github.com/mnpopcenter/ipumsr/issues


Browse source code at https://github.com/cran/ipumsr


Authors: Greg Freedman Ellis [aut] , Derek Burk [aut, cre] , Joe Grover [ctb] , Minnesota Population Center [cph]




Task views: Official Statistics & Survey Methodology


Mozilla Public License 2.0 license


Imports cli, crayon, dplyr, haven, hipread, pillar, purrr, R6, raster, readr, rlang, tibble, tidyselect, xml2, Rcpp, zeallot

Suggests DT, ggplot2, htmltools, knitr, rgdal, rmarkdown, rstudioapi, scales, sf, sp, shiny, testthat, covr, biglm, DBI, RSQLite, dbplyr

Linking to Rcpp


Suggested by Ecfun.

