Manipulation of Microsoft Word and PowerPoint Documents

Access and manipulate 'Microsoft Word' and 'Microsoft PowerPoint' documents from R. The package focuses on tabular and graphical reporting from R; it also provides two functions that let users get document content into data objects. A set of functions lets you add and remove images, tables and paragraphs of text in new or existing documents. When working with 'PowerPoint' presentations, slides can be added or removed; shapes inside slides can also be added or removed. When working with 'Word' documents, a cursor can be used to help insert or delete content at a specific location in the document. The package does not require any Microsoft product to be installed in order to write Microsoft files.


The officer package lets R users manipulate Word (*.docx) and PowerPoint (*.pptx) documents. In short, one can add images, tables and text into documents from R. An initial document can be provided; the contents, styles and properties of the original document will then be available.

This package is close to ReporteRs in that it produces Word and PowerPoint files, but it is faster, does not require rJava (it relies on xml2 instead) and has fewer functions, which makes it easier to maintain.

Word documents

The read_docx() function will read an initial Word document (an empty one by default) and let you modify its content later.

The package provides functions to add R outputs into a Word document:

  • images: produce your plot in png or emf files and add them into the document, as a whole paragraph or inside a paragraph.
  • tables: add data.frames as tables; the format is defined by the associated Word table style.
  • text: add text as paragraphs or inside an existing paragraph; the format is defined by the associated Word paragraph and text styles.
  • field codes: add Word field codes inside paragraphs. Field codes are a long-standing feature of MS Word used to create calculated elements such as tables of contents, automatic numbering and hyperlinks.

In a Word document, one can use cursor functions to reach the beginning or end of a document, or a particular paragraph containing a given text. This cursor concept has been implemented to make the post-processing of files easier.

File generation is performed with the print() function.
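The Word workflow described above (read a document, add content, move the cursor, write the file) could look like the following minimal sketch. The style names "heading 1", "Normal" and "table_template" are assumed to exist in the default template, and the output path is an arbitrary temporary file.

```r
library(officer)

# Start from the default (empty) template shipped with officer
doc <- read_docx()

# Add a heading, a paragraph and a data.frame rendered as a table
# (style names are assumptions about the default template)
doc <- body_add_par(doc, "Report title", style = "heading 1")
doc <- body_add_par(doc, "First rows of the iris dataset.", style = "Normal")
doc <- body_add_table(doc, head(iris), style = "table_template")

# Move the cursor to the paragraph containing a given text,
# then insert a new paragraph right after it
doc <- cursor_reach(doc, keyword = "First rows")
doc <- body_add_par(doc, "Inserted after the intro paragraph.", style = "Normal")

# Write the document to disk
out_docx <- tempfile(fileext = ".docx")
print(doc, target = out_docx)
```

With an existing template of your own, styles_info(doc) lists the style names that are actually available.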

The function docx_summary() reads and imports content of a Word document into a tibble object. The function handles paragraphs, tables and section breaks.
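As an illustration, the sketch below writes a small document and reads it back with docx_summary(). The style names and the temporary file path are assumptions, and the exact columns of the result may differ between officer versions.

```r
library(officer)

# Build a small document to have something to import
doc <- read_docx()
doc <- body_add_par(doc, "Hello from officer", style = "Normal")
doc <- body_add_table(doc, head(mtcars, 3), style = "table_template")

src_docx <- tempfile(fileext = ".docx")
print(doc, target = src_docx)

# Read the file back and import its content as a data.frame-like object
docx_content <- docx_summary(read_docx(src_docx))

# Inspect the kinds of content that were found
table(docx_content$content_type)
```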

PowerPoint documents

The function read_pptx() will read an initial PowerPoint document (an empty one by default) and let you modify its content later.

The package provides functions to add R outputs into existing or new PowerPoint slides:

  • images: produce your plot in png or emf files and add them in a slide.
  • tables: add data.frames as tables; the format is defined by the associated PowerPoint table style.
  • text: add text as paragraphs or inside an existing paragraph; the format is defined in the corresponding layout of the slide.

In a PowerPoint document, one can set a slide as selected and reach a particular shape (and remove it or add text).

File generation is performed with the print() function.
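A minimal sketch of the PowerPoint workflow follows. It assumes a recent officer release in which ph_with() and ph_location_type() are available (these names do not appear in the text above; older releases used functions such as ph_with_text()), and it assumes the default template provides a "Title and Content" layout in the "Office Theme" master.

```r
library(officer)

# Start from the default (empty) presentation shipped with officer
pres <- read_pptx()

# Add a slide; layout and master names are assumptions about the default template
pres <- add_slide(pres, layout = "Title and Content", master = "Office Theme")

# Fill the title placeholder with text and the body placeholder with a table
pres <- ph_with(pres, value = "Hello from R",
                location = ph_location_type(type = "title"))
pres <- ph_with(pres, value = head(iris),
                location = ph_location_type(type = "body"))

# Select a slide by its index before making further changes to it
pres <- on_slide(pres, index = 1)

# Write the presentation to disk
out_pptx <- tempfile(fileext = ".pptx")
print(pres, target = out_pptx)
```

For another template, layout_summary(pres) shows which layout and master names can be used.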

Import a PowerPoint document into a data.frame

The pptx_summary() function reads and imports content of a PowerPoint document into a tibble object. The function handles paragraphs, tables and images.
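For example, the sketch below writes a one-slide presentation and imports it again with pptx_summary(). It assumes a recent officer release providing ph_with() and ph_location_type(), and the layout and master names are assumptions about the default template.

```r
library(officer)

# Build a one-slide presentation to have something to import
pres <- read_pptx()
pres <- add_slide(pres, layout = "Title and Content", master = "Office Theme")
pres <- ph_with(pres, value = "Hello from R",
                location = ph_location_type(type = "title"))

src_pptx <- tempfile(fileext = ".pptx")
print(pres, target = src_pptx)

# Read the file back and import its content as a data.frame-like object
pptx_content <- pptx_summary(read_pptx(src_pptx))

# Look at the imported text
pptx_content$text
```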

Extensions

Tables and package flextable

The package flextable brings a full API to produce nice tables and use them with officer. Tables can be written in PowerPoint documents and Word documents. An option is available to render flextables in rmarkdown (HTML and Word outputs).

Vector graphics with package rvg

The package rvg brings an API to produce nice vector graphics that can be embedded in PowerPoint documents or Excel workbooks with officer.

Native office charts with package mschart

The package mschart combined with officer can produce native office charts in PowerPoint and Word documents.

Installation

You can get the development version from GitHub:

devtools::install_github("davidgohel/officer")

Or the latest version on CRAN:

install.packages("officer")

Resources

Online documentation

The help pages are located at https://davidgohel.github.io/officer/.

Getting help

This project is developed and maintained on my own time. To help me maintain the package, please do not send me private emails if you only have questions about how to use it. Instead, ask on Stack Overflow, where officer has its own tag. I usually read these questions and answer when possible.

Contributing to the package

Code of Conduct

Anyone getting involved in this package agrees to our Code of Conduct.

Bug reports

When you file a bug report, please spend some time making it easy for me to follow and reproduce. The more time you spend on making the bug report coherent, the more time I can dedicate to investigating the bug as opposed to the bug report.

Contributing to the package development

A great way to start is to contribute an example or improve the documentation.

If you want to submit a Pull Request to integrate functions of yours, please provide:

  • the new function(s) with code and roxygen tags (with examples)
  • a new section in the appropriate vignette that describes how to use the new function
  • corresponding tests in the tests/testthat directory.

Running rhub::check_for_cran() from the rhub package will show you whether everything is OK. Once submitted, the PR will be evaluated automatically on Travis and AppVeyor, and you will be able to see if something broke.

News

officer 0.2.1

Issues

  • fix issue #97 with function pptx_summary()

officer 0.2.0

Enhancement

  • new function body_replace_all_text() to replace any text in a Word document
  • new functions for xlsx files (experimental).
  • new functions ph_with_gg() and ph_with_gg_at() to make it easier to add ggplot objects to PowerPoint
  • new function ph_with_ul() to make it easier to add unordered lists of text to PowerPoint

Issues

  • an error is raised when adding an image with blank(s) in its basename (e.g. /home/user/bla bla.png).

officer 0.1.8

Issues

  • decrease the execution time needed to add elements to a big slide deck
  • fix encoding issue in function "*_add_table"
  • fix an issue with complex slide layouts (one issue remains, and it is not yet clear how to handle it)

Changes

  • Functions slide_summary and layout_properties now return inches.

officer 0.1.7

Enhancement

  • new function body_replace_at to replace text inside bookmark
  • argument header for body_add_table and ph_with_table.
  • layout_properties now returns placeholder id when available.

Issues

  • an error is now raised when an incorrect index is used with ph_with_* functions.

officer 0.1.6

Enhancement

  • function ph_empty_at can now make new shapes inherit properties from template

Changes

  • drop gdtools dependency

officer 0.1.5

Enhancement

  • new function body_default_section
  • fp_border supports width in double precision

Issues

  • characters <, > and & are now html encoded
  • on_slide index is now the correct slide number id.

Changes

  • drop dplyr deprecated verbs from code
  • rename break_column to break_column_before.

officer 0.1.4

Issues

  • body_end_section is only meant to work with the cursor on a paragraph; an error is now raised when ending a section on anything other than a paragraph.

Enhancement

  • read_pptx runs faster than in the previous version thanks to some code refactoring

officer 0.1.3

new feature

  • new function media_extract to extract a media file from a document object. This function can be used to access images stored in a PowerPoint file.

Issues

  • drop magick dependency

officer 0.1.2

new features

  • new functions docx_summary and pptx_summary to import content of an Office document into a tidy data.frame.
  • new function docx_dim() returns the current page dimensions.
  • new functions set_doc_properties and doc_properties to let you modify/access metadata of Word and PowerPoint documents.
  • cursor can now reach paragraphs with a bookmark (functions body_bookmark and cursor_bookmark).
  • Content can be inserted at any arbitrary location in PowerPoint (functions ph_empty_at, ph_with_img_at and ph_with_table_at).

Issues

  • cast all columns of data.frame as character when using ph_with_table and body_add_table
  • fix pptx when more than 9 slides

officer 0.1.1

Enhancement

  • argument style of functions body_add* and slip_in* now uses the docx default style if not specified
  • new function body_add_gg to add ggplots to Word documents
  • new function test_zip for diagnostic purpose

API changes

  • classes docx and pptx have been renamed rdocx and rpptx to avoid conflict with package ReporteRs.

0.3.0 by David Gohel, 17 days ago


https://davidgohel.github.io/officer


Report a bug at https://github.com/davidgohel/officer/issues


Browse source code at https://github.com/cran/officer


Authors: David Gohel [aut, cre], Frank Hangler [ctb] (function body_replace_all_text), Liz Sander [ctb] (several documentation fixes), Jon Calder [ctb] (update vignettes), John Harrold [ctb] (function annotate_base)




GPL-3 license


Imports Rcpp, R6, R.utils, grDevices, base64enc, zip, digest, uuid, utils, stats, magrittr, htmltools, xml2

Suggests testthat, devEMF, knitr, tibble, ggplot2, rmarkdown

Linking to Rcpp


Imported by SWMPrExtension, WordR, flextable, mschart, rvg.

Suggested by huxtable.

