Importing Interlinearized Corpora and Dictionaries as Produced by Descriptive Linguistics Software

Interlinearized glossed texts (IGT) are used in descriptive linguistics to represent the morphological analysis of a text through a morpheme-by-morpheme gloss. 'interlineaR' provides a set of functions that target several popular IGT formats ('SIL Toolbox', 'EMELD XML') and turn an IGT into a set of data frames following a relational model, in which the tables represent the different linguistic units: texts, sentences, words and morphemes. The same pieces of software ('SIL FLEx', 'SIL Toolbox') typically also produce dictionaries of the morphemes used in the glosses. 'interlineaR' provides a function that turns the LIFT XML dictionary format into a set of data frames following a relational model, representing the dictionary entries, the sense(s) attached to each entry, the example(s) attached to each sense, etc.
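To make the relational model concrete, here is a minimal base-R sketch built on hand-made toy tables. The table and column names (`text_id`, `sentence_id`, `word_id`, `morpheme`, `gloss`) are hypothetical, chosen for illustration only; they are not necessarily the columns that interlineaR produces. Each lower-level row carries the identifier of its parent unit, so `merge()` recovers the sentence and text each morpheme belongs to.

```r
# Hypothetical toy tables illustrating a relational IGT model
# (column names are illustrative assumptions, not interlineaR's schema).
texts <- data.frame(text_id = 1L, title = "Toy text")

sentences <- data.frame(
  sentence_id = 1:2,
  text_id     = c(1L, 1L)       # both sentences belong to text 1
)

words <- data.frame(
  word_id     = 1:3,
  sentence_id = c(1L, 1L, 2L),  # words 1-2 in sentence 1, word 3 in sentence 2
  form        = c("w1", "w2", "w3"),
  stringsAsFactors = FALSE
)

morphemes <- data.frame(
  morpheme_id = 1:4,
  word_id     = c(1L, 1L, 2L, 3L),  # word 1 has two morphemes
  morpheme    = c("m1", "m2", "m3", "m4"),
  gloss       = c("g1", "g2", "g3", "g4"),
  stringsAsFactors = FALSE
)

# Walk up the hierarchy: morpheme -> word -> sentence -> text
full <- merge(morphemes, words, by = "word_id")
full <- merge(full, sentences, by = "sentence_id")
full <- merge(full, texts, by = "text_id")

# Number of morphemes per sentence
counts <- table(full$sentence_id)
```

Keeping each unit in its own table avoids repeating sentence- or text-level information on every morpheme row; the joins rebuild a flat view only when an analysis needs it.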

Author: Sylvain Loiseau
License: BSD_3_clause


devtools::install_github("sylvainloiseau/interlineaR", build_vignettes=TRUE)


Import an interlinearized corpus in the EMELD XML format (as exported, for instance, from SIL FieldWorks):

path <- system.file("exampleData", "tuwariInterlinear.xml", package="interlineaR")
corpus <- read.emeld(path, vernacular.languages="tww")

Import an interlinearized corpus in the SIL Toolbox format:

path <- system.file("exampleData", "tuwariToolbox.txt", package="interlineaR")
corpus <- read.toolbox(path)

Import a dictionary in the LIFT XML format (as exported, for instance, from SIL FieldWorks):

dicpath <- system.file("exampleData", "tuwariDictionary.lift", package="interlineaR")
dictionary <- read.lift(dicpath, language.code="tww")
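The dictionary follows the same relational idea: entries, senses and examples live in separate tables linked by identifiers. The base-R sketch below uses hypothetical toy tables; the column names (`entry_id`, `sense_id`, `lexeme`, `definition`) are assumptions for illustration and are not necessarily those returned by `read.lift()`. It counts the senses of each entry and uses a left join so that entries without any sense are kept.

```r
# Hypothetical toy dictionary tables (illustrative column names only).
entries <- data.frame(
  entry_id = c("e1", "e2", "e3"),
  lexeme   = c("lex1", "lex2", "lex3"),
  stringsAsFactors = FALSE
)

senses <- data.frame(
  sense_id   = c("s1", "s2", "s3"),
  entry_id   = c("e1", "e1", "e2"),  # e1 has two senses, e3 has none
  definition = c("def1", "def2", "def3"),
  stringsAsFactors = FALSE
)

# Count senses per entry; using all entry ids as factor levels gives 0 for e3
n_senses <- table(factor(senses$entry_id, levels = entries$entry_id))
entries$n_senses <- as.integer(n_senses[entries$entry_id])

# Left join: entries without a sense are kept, with definition = NA
joined <- merge(entries, senses, by = "entry_id", all.x = TRUE)
```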


See the vignette "interlineaR" for an overview of the data model and of the package's functions.




1.0 by Sylvain Loiseau, 9 months ago


Authors: Sylvain Loiseau [aut, cre]

Documentation: PDF Manual

License: BSD_3_clause + file LICENSE

Depends on xml2, reshape2

Suggests kableExtra, knitr, rmarkdown, testthat
