Reading, Extracting, and Converting an Mbox File into a Tibble

Importing and converting an mbox file into a tibble object.


mboxr


The goal of mboxr is to let R users conveniently import an mbox file into an R tibble for hands-on analysis in the R environment.

Installation

Python Dependencies

mboxr requires an Anaconda Python environment on your system PATH.

If you have not installed a Conda environment on your system, please download and install Anaconda (Python 3.6 or later is recommended).

For this package, I have wrapped Python's mailbox.mbox, email.header.decode_header, email.utils, and pandas.DataFrame in R using reticulate.
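To make the role of these Python pieces concrete, here is a minimal sketch of how an mbox file can be turned into one record per message using only the standard library. It is not the script shipped with mboxr: the helper names decode_field and mbox_to_rows, the sample message, and the column names are assumptions for illustration. The pandas.DataFrame step is left out so the sketch stays dependency-free.

```python
import mailbox
import os
import tempfile
from email.header import decode_header
from email.utils import parsedate_to_datetime


def decode_field(value):
    """Decode an RFC 2047 encoded header value into a plain string."""
    if value is None:
        return None
    parts = []
    for text, charset in decode_header(value):
        if isinstance(text, bytes):
            # Unknown charsets are not handled here; a real script would need to.
            text = text.decode(charset or "utf-8", errors="replace")
        parts.append(text)
    return "".join(parts)


def mbox_to_rows(path):
    """Return one dict per message (hypothetical helper, not mboxr's code)."""
    box = mailbox.mbox(path, create=False)
    try:
        rows = []
        for msg in box:
            payload = msg.get_payload(decode=True)  # None for multipart messages
            rows.append({
                "date": parsedate_to_datetime(msg["Date"]) if msg["Date"] else None,
                "from": decode_field(msg["From"]),
                "to": decode_field(msg["To"]),
                "cc": decode_field(msg["Cc"]),
                "subject": decode_field(msg["Subject"]),
                "content": payload.decode("utf-8", errors="replace") if payload else None,
            })
        return rows
    finally:
        box.close()


# A made-up single-message mbox used only for this sketch.
SAMPLE = (
    "From author@example.com Fri Jul  8 12:08:34 2011\n"
    "From: Author <author@example.com>\n"
    "To: Recipient <recipient@example.com>\n"
    "Subject: =?utf-8?q?Sample_message?=\n"
    "Date: Fri, 08 Jul 2011 12:08:34 +0000\n"
    "\n"
    "This is the body.\n"
)

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.mbox")
    with open(sample_path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(SAMPLE)
    rows = mbox_to_rows(sample_path)

print(rows[0]["subject"])
```

A list of dicts like this maps directly onto a data frame with one column per key, which is the shape the package returns as a tibble.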

R Package Installation

You can install the latest development version as follows:

if (!require(devtools)) {
  install.packages("devtools")
}
 
devtools::install_github("jooyoungseo/mboxr")

Stable Version

You can install the released version of mboxr from CRAN with:

install.packages('mboxr')

Usage

After loading the mboxr library, use the read_mbox() function as shown below:

library(mboxr)
# Import your mbox file into R:
test <- system.file("extdata", "test1.mbox", package = "mboxr")
data <- read_mbox(test)
data
# > # A tibble: 2 x 6
# >   date                from      to         cc    subject  content
# >   <dttm>              <chr>     <chr>      <chr> <chr>    <chr>
# > 1 2011-07-08 12:08:34 Author <~ Recipient~ <NA>  Sample ~ 'This is the bod~
# > 2 2011-07-08 12:08:34 Author <~ Recipient~ <NA>  Sample ~ 'This is the sec~
 
# Or save the result as an RDS file while assigning the tibble to a variable
# at the same time:
data <- read_mbox(mbox = test, file = "output.rds")
data
# > # A tibble: 2 x 6
# >   date                from      to         cc    subject  content
# >   <dttm>              <chr>     <chr>      <chr> <chr>    <chr>
# > 1 2011-07-08 12:08:34 Author <~ Recipient~ <NA>  Sample ~ 'This is the bod~
# > 2 2011-07-08 12:08:34 Author <~ Recipient~ <NA>  Sample ~ 'This is the sec~
 
# You can merge all mbox files in your current directory, or in any specified
# path, into one tibble and save the merged result as an RDS file:
test_path <- system.file("extdata", package = "mboxr")
all_data <- merge_mbox_all(path = test_path, file = "all_merged_mbox.rds")
## Find the 'all_merged_mbox.rds' file saved in your working directory; the
## imported tibble remains available in your R session.
 
all_data
# > # A tibble: 4 x 6
# >   date                from      to         cc    subject   content
# >   <dttm>              <chr>     <chr>      <chr> <chr>     <chr>
# > 1 2011-07-08 12:08:34 Author <~ Recipient~ <NA>  Sample m~ 'This is the bo~
# > 2 2011-07-08 12:08:34 Author <~ Recipient~ <NA>  Sample m~ 'This is the se~
# > 3 2011-07-09 12:09:35 Author <~ Recipient~ <NA>  Another ~ 'R is the best!~
# > 4 2011-07-10 10:03:32 Author <~ Recipient~ <NA>  The last~ 'This is the la~

News

mboxr 0.1.5

  • Both read_mbox() and merge_mbox_all() now save the mbox object as an RDS file instead of an Rda file.
  • Some parameter names have changed to better reflect their usage in both read_mbox() and merge_mbox_all() (e.g., from file to mbox, and from out to file).
  • New variables (i.e., columns) have been added to the returned mbox tibble: message_ID, in_reply_to, and references. (These are commonly used entries in mailing systems such as Mailman.)
  • The underlying Python script now returns the mbox file name when an error occurs. Sometimes a line starting with an unescaped 'From ' in the middle of a message body causes an error. This is not a problem in mboxr itself; users have to edit their original mbox files manually, replacing each body line that starts with 'From ' with '>From '.
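The manual 'From ' fix described above can be scripted. The sketch below is a heuristic and not part of mboxr. It assumes a genuine separator line has the form "From <address> <asctime-style timestamp>" and prefixes every other line starting with 'From ' with '>'. The names ENVELOPE and escape_from_lines are hypothetical. Separators written in another format (for example with a time zone) would be escaped by mistake, so run it on a copy of the file and check the result.

```python
import re

# Assumed separator shape: "From addr Fri Jul  8 12:08:34 2011".
ENVELOPE = re.compile(
    r"^From \S+\s+[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}\s*$"
)


def escape_from_lines(text):
    """Prefix body lines that start with 'From ' with '>' (heuristic)."""
    fixed = []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and not ENVELOPE.match(line):
            line = ">" + line
        fixed.append(line)
    return "".join(fixed)


# A made-up message whose body contains an unescaped 'From ' line.
broken = (
    "From author@example.com Fri Jul  8 12:08:34 2011\n"
    "Subject: Sample message\n"
    "\n"
    "Hello,\n"
    "From now on, please use the new list address.\n"
)

repaired = escape_from_lines(broken)
print(repaired)
```

Because the function works on text, a file can be repaired by reading it, passing its contents through escape_from_lines, and writing the result to a new file.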

mboxr 0.1.4

  • The underlying Python code has been modified to address critical issues that caused errors when parsing the "From" and "To" variables; this change resolves unexpected stops in read_mbox() and merge_mbox_all().
  • read_mbox() and merge_mbox_all() no longer save their output as a CSV file, since the CSV format is not well suited to large amounts of data and can even cause issues; the Rda file format is now supported instead.
  • A "cc" variable has been added to the tibble returned for an mbox file.
  • The "date" variable is now automatically converted with lubridate::as_datetime(), using UTC as the default time zone.
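As an illustration of what normalising the "date" variable to UTC means, the following Python sketch converts an email Date header that carries a UTC offset into a UTC timestamp. mboxr performs its conversion in R with lubridate, so this is only an analogy. The function name header_date_to_utc and the sample header are assumptions, as is the choice to treat a header without an offset as already being in UTC.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def header_date_to_utc(date_header):
    """Parse an RFC 2822 date string and express it in UTC."""
    parsed = parsedate_to_datetime(date_header)
    if parsed.tzinfo is None:
        # Assumption for this sketch: no offset means the time is already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


utc_date = header_date_to_utc("Fri, 08 Jul 2011 08:08:34 -0400")
print(utc_date.isoformat())
```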

mboxr 0.1.3

  • "UTF-8" has been applied to test mbox and Python files.
  • Author field has been corrected in the Vignette.

mboxr 0.1.2

  • Example mbox files have been added for testing purposes.
  • A new argument, path, has been added to merge_mbox_all().
  • A new column, to, has been added to the tibble output for a given mbox file.

mboxr 0.1.1

  • A new function, merge_mbox_all(), has been added for merging all available mbox files in the current working directory into a single tibble.

mboxr 0.1.0

  • Added a NEWS.md file to track changes to the package.
  • Initial release.


0.1.5 by JooYoung Seo, 2 months ago


https://github.com/jooyoungseo/mboxr


Report a bug at https://github.com/jooyoungseo/mboxr/issues


Browse source code at https://github.com/cran/mboxr


Authors: JooYoung Seo [aut, cre] , Soyoung Choi [aut]




GPL-3 license


Imports reticulate, tibble, magrittr, purrr, dplyr, lubridate

Suggests knitr, rmarkdown, testthat, covr

System requirements: Anaconda (https://www.anaconda.com/download/)


Imported by ezpickr.

