Fast Access to Large ASCII Files

Methods for fast access to large ASCII files. The following file formats are currently supported: comma-separated values (CSV) and fixed-width format. The files are assumed to be too large to fit into memory, although the package can also be used to efficiently access files that do fit into memory. Methods are provided to access and process files blockwise. Furthermore, an opened file can be accessed in the same way as an ordinary data.frame. The LaF vignette gives an overview of the functionality provided.
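The two access styles described above can be sketched as follows. This is a minimal illustration, not a complete tour of the package: the file contents, the column names (id, group, value) and the block size of two rows are made up for the example, and the calls used (laf_open_csv, begin, next_block) should be checked against the reference manual for the installed version.

```r
library(LaF)

# Write a small CSV file to a temporary location (illustrative data only).
path <- tempfile(fileext = ".csv")
writeLines(c("1,a,1.5", "2,b,2.5", "3,a,3.5", "4,b,4.5"), path)

# Open the file. Only a handle is created; the data stays on disk.
laf <- laf_open_csv(path,
  column_types = c("integer", "categorical", "double"),
  column_names = c("id", "group", "value"))

# data.frame-style indexing reads only the requested rows.
first_rows <- laf[1:2, ]

# Blockwise access: go to the start of the file and read two rows at a
# time until an empty block signals the end of the file.
begin(laf)
total <- 0
repeat {
  block <- next_block(laf, nrows = 2)
  if (nrow(block) == 0) break
  total <- total + sum(block$value)
}
```

In practice the block size would be far larger than two rows; it is kept small here so that the loop runs more than once on the toy file.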


LaF version 0.6.2

  • Minor update. Fixes error with '~' in the path name.

LaF version 0.6.1

  • Minor update. Fixes bug in the C++ code. This bug only affected colfreq and colnmissing on some platforms when there are missing values in the numeric columns.

LaF version 0.6

  • Internal changes: switched documentation to roxygen; changes to namespace and description files to pass tests.

LaF version 0.5

  • Added ability to use data models for opening files. These contain a description of the column names, column types, etc. The data models can be saved to and read from YAML files. For CSV files the data models can also be detected automatically. Blaise data models are also supported (see laf_open, write_dm, read_dm, read_dm_blaise, detect_dm_csv).
  • Added the ability to set the levels of columns. When read, these columns are automatically converted to factors.
  • Some routines were added to read specific or random lines from a text file (see get_lines, sample_lines) and to determine the number of lines in a text file (see determine_nlines).
  • Fixed bug with nrows, which could return the wrong number of lines when the size of the file exceeded 2 GB.
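
The data model workflow and the line utilities named in the 0.5 notes can be sketched roughly as below. The file contents and temporary paths are invented for the example, the yaml package is assumed to be installed (it is only a suggested dependency), and the argument names should be verified against the reference manual.

```r
library(LaF)

# Small CSV file with a header line (illustrative data only).
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,1.5", "2,2.5", "3,3.5"), path)

# Detect column names and types from the file itself.
model <- detect_dm_csv(path, sep = ",", header = TRUE)

# Save the data model to a YAML file, read it back and use it to open
# the data file.
model_file <- tempfile(fileext = ".yaml")
write_dm(model, model_file)
model_from_disk <- read_dm(model_file)
laf <- laf_open(model_from_disk)

# Utilities that work on the raw text file rather than on the laf object.
n_lines <- determine_nlines(path)   # counts all lines, header included
second_line <- get_lines(path, 2)   # the raw text of line 2
```

Storing the model next to the data file means the column description only has to be written or detected once.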

LaF version 0.4

  • Added methods for the calculation of simple column statistics such as mean, sum, number of missings, frequency table, min, max.
  • Fixed bug in the skip option of laf_open_csv on Windows machines for files with DOS line breaks.
  • laf_open_csv and laf_open_fwf now also accept R-style column types: numeric and factor, in addition to double and categorical.
  • Fixed bug with checking whether a file can be opened on some Windows network shares.
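
A rough sketch of the column statistics introduced in 0.4, using the R-style type name numeric mentioned above. The data are invented, and only colmean and colsum are shown; the other statistics (colfreq, colnmissing and the range functions) are assumed to follow the same calling pattern, which should be confirmed in the reference manual.

```r
library(LaF)

# Two-column CSV file without a header (illustrative data only).
path <- tempfile(fileext = ".csv")
writeLines(c("1,1.5", "2,2.5", "3,3.5"), path)

laf <- laf_open_csv(path,
  column_types = c("integer", "numeric"),
  column_names = c("id", "value"))

# Statistics are computed by streaming through the file; the column is
# selected by its index.
value_mean <- colmean(laf, columns = 2)
value_sum <- colsum(laf, columns = 2)
```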

LaF version 0.3

  • Fixed bug with categorical columns. The levels of the categorical columns did not always match those in the file.
  • Added option to laf_open_fwf and laf_open_csv to trim white space from string and categorical columns.
  • Added option to laf_open_csv to skip the first lines in the file.
  • Added ability to read floating point numbers in scientific format.
  • Added tests.

Reference manual

0.7.1 by Jan van der Laan, 12 days ago

Authors: Jan van der Laan

Task views: High-Performance and Parallel Computing with R

GPL-3 license

Imports Rcpp

Depends on methods, utils

Suggests testthat, yaml

Linking to Rcpp

Imported by EdSurvey, SEERaBomb, chunked.

Suggested by ffbase.

