Fast Access to Large ASCII Files

Methods for fast access to large ASCII files. Two file formats are currently supported: comma-separated values (CSV) and fixed-width format. The package assumes that files are too large to fit into memory, although it can also be used to efficiently access files that do fit. Methods are provided to access and process files blockwise. Furthermore, an opened file can be accessed much like an ordinary data.frame. The LaF vignette gives an overview of the functionality provided.
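
The blockwise idea can be illustrated independently of the package. The following is a minimal sketch in Python, not LaF's R API: the helper names process_in_blocks and add_second_column are hypothetical, and the block size of 2 rows is chosen only to keep the example small. Each block is passed to a callback together with the previous result, so only one block is held in memory at a time.

```python
import csv
import io
from itertools import islice


def process_in_blocks(lines, fun, nrows=5000):
    """Call fun(block, previous_result) for each block of at most nrows rows.

    Hypothetical helper for illustration; returns the result of the last call,
    or None if the input contains no rows.
    """
    reader = csv.reader(lines)
    result = None
    while True:
        # Read at most nrows parsed rows; the rest of the input stays unread.
        block = list(islice(reader, nrows))
        if not block:
            return result
        result = fun(block, result)


def add_second_column(block, previous):
    """Accumulate the sum of the second column across blocks."""
    total = 0.0 if previous is None else previous
    return total + sum(float(row[1]) for row in block)


# A small in-memory stand-in for a large CSV file.
sample = io.StringIO("1,2.5\n2,4.0\n3,1.5\n")
total = process_in_blocks(sample, add_second_column, nrows=2)
```

With a real file, the StringIO object would be replaced by an open file handle; the accumulation logic stays the same.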


LaF version 0.8

  • Detects and ignores the byte order mark (BOM)
  • Added option to ignore (set NA) fields that could not be converted; previously this generated an error.
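
The two behaviours listed for this version can be sketched as follows. This is an illustration in Python under stated assumptions, not the package's implementation: it handles only a UTF-8 byte order mark, and the helper names strip_bom and to_int_or_none are hypothetical.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark, if present (assumption: UTF-8 only)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def to_int_or_none(field: str, ignore_failed: bool = True):
    """Convert a field to int; on failure return None (a stand-in for NA) or raise."""
    try:
        return int(field)
    except ValueError:
        if ignore_failed:
            return None
        raise


raw = codecs.BOM_UTF8 + b"1,x,3\n"
fields = strip_bom(raw).decode("utf-8").strip().split(",")
values = [to_int_or_none(f) for f in fields]
```

Passing ignore_failed=False restores the stricter behaviour, in which a field that cannot be converted raises an error.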

LaF version 0.7

  • Fixed memory leak
  • Added option to show progress bar in process_blocks

LaF version 0.6.3

  • Minor update. Fixes an error in detect_dm_csv with sample = TRUE, as reported by spinwards.

LaF version 0.6.2

  • Minor update. Fixes error with '~' in the path name.

LaF version 0.6.1

  • Minor update. Fixes a bug in the C++ code. This bug only affected colfreq and colnmissing on some platforms when there were missing values in the numeric columns.

LaF version 0.6

  • Internal changes: switched documentation to roxygen; changes to namespace and description files to pass tests.

LaF version 0.5

  • Added ability to use data models for opening files. These contain a description of the column names, column types, etc. The data models can be saved to and read from YAML files. For CSV files the data models can also be detected automatically. Blaise data models are also supported (see laf_open, write_dm, read_dm, read_dm_blaise, detect_dm_csv).
  • Added the ability to set the levels of columns. When read, these columns are automatically converted to factors.
  • Some routines were added to read specific or random lines from a text file (see get_lines, sample_lines) and to determine the number of lines in a text file (see determine_nlines).
  • Fixed bug with nrows, which could return the wrong number of lines when the size of the file exceeded 2 GB.
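
Counting lines and drawing random lines from a text file can both be done in a single pass without loading the file. The sketch below is in Python and uses reservoir sampling for the random draw; this is a swapped-in technique for illustration, and nothing here states how the package's own routines work. The names count_lines and reservoir_sample_lines are hypothetical.

```python
import io
import random


def count_lines(stream, chunk_size=1 << 16):
    """Count lines in a binary stream by reading fixed-size chunks."""
    n = 0
    last = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        n += chunk.count(b"\n")
        last = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last and last != b"\n":
        n += 1
    return n


def reservoir_sample_lines(stream, k, rng):
    """Return k lines chosen uniformly at random in one pass over the stream."""
    reservoir = []
    for i, line in enumerate(stream):
        if i < k:
            reservoir.append(line)
        else:
            # Keep the new line with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return reservoir


data = b"a\nb\nc\nd\ne"
n_lines = count_lines(io.BytesIO(data))
picked = reservoir_sample_lines(io.BytesIO(data), 2, random.Random(0))
```

Reading in chunks keeps memory use constant regardless of file size, which is the property that matters for files of several gigabytes.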

LaF version 0.4

  • Added methods for the calculation of simple column statistics such as mean, sum, number of missings, frequency table, min, max.
  • Fixed bug in the skip option of laf_open_csv on Windows machines for files with DOS line breaks.
  • laf_open_csv and laf_open_fwf now also accept R-style column types: numeric and factor instead of double and categorical.
  • Fixed bug with checking whether a file can be opened on some Windows network shares.

LaF version 0.3

  • Fixed bug with categorical columns. The levels of the categorical columns did not always match those in the file.
  • Added option to laf_open_fwf and laf_open_csv to trim white space from string and categorical columns.
  • Added option to laf_open_csv to skip the first lines in the file.
  • Added ability to read floating point numbers in scientific format.
  • Added tests.


0.8.4 by Jan van der Laan, 2 years ago

Authors: Jan van der Laan

Task views: High-Performance and Parallel Computing with R

GPL-3 license

Imports Rcpp

Depends on methods, utils

Suggests testthat, yaml

Linking to Rcpp

Imported by EdSurvey, SEERaBomb, chunked.

Depended on by easyPSID.

Suggested by blaise, disk.frame, ffbase.