Accessing and Processing a 'Mega2' Genetic Database

Takes as input genetic data that have been reformatted and stored in a 'SQLite' database; this database is initially created by the standalone 'mega2' C++ program (available freely from <>). The package loads and manipulates data frames containing genotype, phenotype, and family information from the input 'SQLite' database, and decompresses the needed subsets of the genotype data on the fly in a memory-efficient manner. It also provides several functions that both illustrate how to use the data frames and perform useful tasks. These functions permit one:

-- to run the 'pedgene' package to carry out gene-based association tests on family data using selected marker subsets;

-- to run the 'SKAT' package to carry out gene-based association tests using selected marker subsets;

-- to run the 'famSKATRC' package to carry out gene-based association tests, optionally on families, with rare or common variants using selected marker subsets;

-- to output the 'Mega2R' data as a VCF file and related files (for phenotype and family data);

-- to convert the data frames into CoreArray Genomic Data Structure (GDS) format.
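
The general idea of reading tables from a 'SQLite' file into in-memory tables can be sketched outside R with Python's standard sqlite3 module. This is only an illustration under stated assumptions, not Mega2R code: the `load_table` helper and the `person` and `marker` tables with their columns are hypothetical, and the schema that 'mega2' actually writes may differ.

```python
import sqlite3


def load_table(conn, table):
    """Return every row of `table` as a list of dicts (a stand-in for a data frame)."""
    # Table names cannot be bound as SQL parameters, so validate against the catalog.
    known = {row[0] for row in
             conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"no such table: {table}")
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur]


# Tiny in-memory database with a HYPOTHETICAL schema (not the real Mega2 layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER, pedigree TEXT, sex INTEGER);
    INSERT INTO person VALUES (1, 'fam1', 1), (2, 'fam1', 2), (3, 'fam2', 1);
    CREATE TABLE marker (marker_id INTEGER, name TEXT, chromosome INTEGER);
    INSERT INTO marker VALUES (1, 'rs0001', 1), (2, 'rs0002', 1);
""")

persons = load_table(conn, "person")
markers = load_table(conn, "marker")
print(len(persons), "persons;", len(markers), "markers")
```

Each table is read only when it is requested, which loosely mirrors the idea of pulling in just the subsets that are needed rather than the whole database.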

Mega2R is an R package that makes it easy to load SQLite databases created by Mega2 directly into R as data frames. It also provides support for carrying out gene-based association tests, automatically looping over genes, using a variety of other R packages. For more information about Mega2R, see this web page:

The latest development snapshot of the Mega2R R package can be obtained from this Bitbucket repository. Please note that this development snapshot is not as thoroughly tested as our stable release version, but does contain the newest features and changes.

To obtain the stable release version, please go to


Daniel E. Weeks, Ph.D.
Professor of Human Genetics and Biostatistics
Department of Human Genetics
University of Pittsburgh
Public Health 3119
130 DeSoto Street
Pittsburgh, PA 15261

Work: 1 412 624 5388
Email: [email protected]
Web site:
Twitter: @StatGenDan


Version 1.0.5 (2019-02-27)

-- Bug fix: The 'mkfam' function now correctly extracts the case/control trait when the database contains more than one phenotype. Previously it worked correctly only when the Mega2 database contained a single phenotype, as in our simple example data set.

-- The 'Mega2pedgene' function now allows the trait name to be specified.

Version 1.0.4 (2018-06-18)

-- Improvements to use compressed data created by Mega2 version 5.0.0 or higher.

Version 1.0.3 (2018-05-22)

-- Removed strict dependency on GenABEL because GenABEL has been archived.

Version 1.0.2 (2018-04-03)

-- Bug fix: The 'init_pedgene' function now sets up the trait and pedigree structure correctly.

Version 1.0.0 (2017-08-22)

-- Initial CRAN release.

Authors: Robert V. Baron [aut], Daniel E. Weeks [aut, cre], University of Pittsburgh [cph]

GPL-2 license

