Import 'Stata' Data Files

Function to read and write the 'Stata' file format.


Package to read Stata files of all formats (version 15 and older) into an R data.frame and to write a data.frame back to a Stata file. The dta file format versions 102 to 118 are supported.

The function read.dta from the foreign package imports only dta files from Stata versions <= 12. Due to the different structure and features of dta 117 files, we wrote a new file reader in Rcpp.

Additionally, the package supports many features of the Stata dta format, such as label sets in different languages (?set.lang) or business calendars (?as.caldays).

Installation

The package is now hosted on CRAN.

install.packages("readstata13")

Usage

library(readstata13)
dat <- read.dta13("path to file.dta")
save.dta13(dat, file="newfile.dta")
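As a minimal sketch of the round trip above, the following writes a small data.frame to a temporary dta file and reads it back. The data frame `dat` and the temporary path are illustrative assumptions; only read.dta13() and save.dta13() come from the package.

```r
library(readstata13)

# Hypothetical example data; the column names avoid dots and other
# characters that are not allowed in Stata variable names.
dat <- data.frame(
  id    = 1:3,
  score = c(1.5, 2.5, NA),
  group = factor(c("a", "b", "a"))
)

# Write to a temporary file, then read the file back.
tmp <- tempfile(fileext = ".dta")
save.dta13(dat, file = tmp)
back <- read.dta13(tmp)

# Inspect the structure of the original and the re-imported data.
str(dat)
str(back)
```

The re-imported data.frame carries additional attributes (for example formats and label information), so comparing the values column by column is more informative than comparing the two objects as a whole.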

Development Version

To install the current release from GitHub you need the platform-specific build tools. On Windows a current installation of Rtools is necessary, while OS X users need to install Xcode.

# install.packages("devtools")
devtools::install_github("sjewo/readstata13", ref="0.9.2")

Older versions of devtools require a username option:

devtools::install_github("readstata13", username="sjewo", ref="0.9.2")

To install the current development version from GitHub:

devtools::install_github("sjewo/readstata13", ref="testing")

Changelog and Features

Version Changes
0.9.2 Fix Build on MacOS X
0.9.1 Allow reading only pre-selected variables
0.9.1 Experimental support for format 119
0.9.1 Improvements to partial reading. Idea by Kevin Jin
0.9.1 Export of binary data from dta-files
0.9.1 new function get.label.tables() to show all Stata label sets
0.9.1 Fix check for duplicate labels and fixes in set.lang()
0.9.0 Generate unique factor labels to prevent errors in factor definition
0.9.0 check interrupt for long read. Patch by Giovanni Righi
0.9.0 Updates to notes, roxygen and register
0.9.0 Fixed size of character length. Bug reported by Yiming (Paul) Li
0.9.0 Fix saving characters containing missings. Bug reported by Eivind H. Olsen
0.9.0 Adjustments to convert.underscore. Patch by luke-m-olson
0.9.0 Allow partial reading of selected rows
0.8.5 Fix errors on big-endian systems
0.8.4 Fix valgrind errors. converting from dta.write to writestr
0.8.4 Fix for empty data label
0.8.4 Make replace.strl default
0.8.3 Restrict length of varnames to 32 chars for compatibility with Stata 14
0.8.3 Add many function tests
0.8.3 Avoid converting doubles to floats while writing compressed files
0.8.2 Save NA values in character vector as empty string
0.8.2 Convert.underscore=T will convert all non-literal characters to underscores
0.8.2 Fix saving of Dates
0.8.2 Save with convert.factors by default
0.8.2 Test for NaN and inf values while writing missing values and replace with NA
0.8.2 Remove message about saving factors
0.8.1 Convert non-integer variables to factors (nonint.factors=T)
0.8.1 Handle large datasets
0.8.1 Working with strL variables is now a lot faster
<0.8.1 Read data files from disk or URL and create a data.frame
<0.8.1 Saving dta files to disk - most features of the dta file format are supported
<0.8.1 Assign variable names
<0.8.1 Read the new strL strings and save them as attribute
<0.8.1 Convert Stata labels to factors and save them as attribute
<0.8.1 Read some meta data (timestamp, dataset label, formats,...)
<0.8.1 Convert strings to system encoding
<0.8.1 Handle different NA values
<0.8.1 Handle multiple label languages
<0.8.1 Convert dates
<0.8.1 Reading business calendar files
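
The partial-reading entries for 0.9.0 and 0.9.1 in the table above can be sketched as follows. This assumes the select.rows and select.cols arguments of read.dta13() as described in the package's help page (?read.dta13); the temporary file and the choice of the built-in mtcars data are purely illustrative.

```r
library(readstata13)

# Write a built-in data set to a temporary dta file so there is something to read.
tmp <- tempfile(fileext = ".dta")
save.dta13(mtcars, file = tmp)

# Read only a range of rows (per ?read.dta13, two values are taken as a row range).
first_rows <- read.dta13(tmp, select.rows = c(1, 5))

# Read only the named variables.
two_cols <- read.dta13(tmp, select.cols = c("mpg", "cyl"))

dim(first_rows)
names(two_cols)
```

Reading only the rows or variables that are needed can reduce memory use when the dta file is large.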

Test

Because our attributes differ from those of foreign::read.dta, all.equal() and identical() do not report TRUE when applied to the whole data.frames. If you check the values column by column, everything is identical.

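library("foreign")
library("readstata13")

# Import the same data set with both readers
r12 <- read.dta("http://www.stata-press.com/data/r12/auto.dta")
r13 <- read.dta13("http://www.stata-press.com/data/r13/auto.dta")

# Compare the values column by column
Map(identical, r12, r13)

# Compare the attributes one by one
att <- names(attributes(r12))
for (i in seq_along(att))
    cat(att[i], ":", all.equal(attr(r12, att[i]), attr(r13, att[i])), "\n")

# Repeat the comparison without converting labelled values to factors
r12 <- read.dta("http://www.stata-press.com/data/r12/auto.dta", convert.factors = FALSE)
r13 <- read.dta13("http://www.stata-press.com/data/r13/auto.dta", convert.factors = FALSE)

Map(identical, r12, r13)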
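
To see why identical() can fail on whole objects even when every value matches, here is a small base-R sketch. The data frames `a` and `b` are made up for illustration and merely stand in for r12 and r13; the attribute name `datalabel` is likewise only an example.

```r
# Two data frames with the same values ...
a <- data.frame(x = 1:3, y = c("u", "v", "w"), stringsAsFactors = FALSE)
b <- a

# ... but one of them carries an additional attribute.
attr(b, "datalabel") <- "example label"

# Whole-object comparison also compares the attributes.
identical(a, b)

# Column-wise comparison looks at each variable separately.
same_cols <- unlist(Map(identical, a, b))
same_cols

# Names of the attributes present on b but not on a.
extra_attr <- setdiff(names(attributes(b)), names(attributes(a)))
extra_attr
```

The same pattern applies to the imported data: a column-wise Map(identical, ...) checks the values, while the attribute loop shows where the two readers store their metadata differently.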

Authors

Marvin Garbuszus (JanMarvin) and Sebastian Jeworutzki (sjewo)

Licence

GPL2

News

0.9.2

  • fix build on OSX

0.9.1

  • allow reading only pre-selected variables
  • experimental support for format 119
  • improve partial reading
  • export of binary data from dta-files
  • new function get.label.tables() to show all Stata label sets
  • fix check for duplicate labels
  • fixes in set.lang

0.9.0

  • generate unique factor labels to prevent errors in factor definition
  • check interrupt for long read
  • fix storage size of character vectors in save.dta13
  • fix saving characters containing missings
  • implement partial reading of dta-files
  • fix an integer bug with saving data.frames of length requiring uint64_t

0.8.5

  • fix errors on big-endian systems

0.8.4

  • fix valgrind errors. converting from dta.write to writestr
  • fix for empty data label
  • make replace.strl default

0.8.3

  • restrict length of varnames to 32 chars for compatibility with Stata 14
  • stop compression of doubles as floats; now test whether compression of doubles as integer types is possible
  • add many function tests

0.8.2

  • save NA values in character vector as empty string
  • convert.underscore=T will convert all non-literal characters to underscores
  • fix saving of Dates
  • save with convert.factors by default
  • test for NaN and inf values while writing missing values and replace with NA
  • remove message about saving factors

0.8.1

  • convert non-integer variables to factors (nonint.factors=T)
  • working with strL variables is now a lot faster (thanks to Magnus Thor Torfason)
  • fix handling of large datasets
  • some code cleanups

0.8

  • implement reading of all versions prior to 13.
  • clean up code.
  • fix a crash when varlabels do not match ncols.
  • update leap seconds R code with foreign.

0.7.1

  • fix saving of files > 2GB

0.7

  • read and write Stata 14 files (ver 118)
  • fix save for variables without non-missing values
  • read strings from different file encodings
  • code cleanups

0.6.1

  • fix heap overflow

0.6

  • various fixes
  • reading stbcal-files

0.5

0.4

  • convert.dates from foreign::read.dta()
  • handle different NA values
  • convert strings to system encoding
  • some checks on label assignment

0.3

  • reading file from url. Example: read.dta13("http://www.stata-press.com/data/r13/auto.dta")
  • convert.underscore from foreign::read.dta(): converts _ to .
  • missing.type parts from foreign::read.dta(). If TRUE return "missing"
  • replace.strl option to replace the reference to a STRL string in the data.frame with the actual value

0.2

  • read stata characteristics and save them in extension.table attribute
  • more robust handling of factor labels
  • set file encoding for all strings and convert them to system encoding
  • fixed compiler warnings

0.1

  • reading data files and create a data.frame
  • assign variable names
  • read the new strL strings and save them as attribute
  • convert Stata labels to factors and save them as attribute
  • read some meta data (timestamp, dataset label, formats,...)


0.9.2 by Sebastian Jeworutzki, a year ago


https://github.com/sjewo/readstata13


Report a bug at https://github.com/sjewo/readstata13/issues


Browse source code at https://github.com/cran/readstata13


Authors: Jan Marvin Garbuszus [aut] , Sebastian Jeworutzki [aut, cre] , R Core Team [cph] , Magnus Thor Torfason [ctb] , Luke M. Olson [ctb] , Giovanni Righi [ctb] , Kevin Jin [ctb]




License: GPL-2 | file LICENSE


Imports Rcpp

Suggests testthat

Linking to Rcpp


Imported by RcmdrMisc, RcmdrPlugin.EZR.

Suggested by SUMMER.

