Extract Text from Microsoft Word Documents

Wraps the 'AntiWord' utility to extract text from Microsoft Word documents. The utility supports only the legacy 'doc' format, not the newer XML-based 'docx' format. Use the 'xml2' package to read the latter.
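
As a rough illustration of the split described above, the following sketch dispatches on the file extension: 'doc' files go to antiword::antiword(), while 'docx' files are unzipped and parsed with 'xml2'. The helper name extract_word_text and the file name report.doc are hypothetical, and the 'docx' branch is only one plausible way to use 'xml2' here, not something the package itself provides.

```r
# Hypothetical helper (not part of the antiword package): choose an
# extraction route based on the file extension.
extract_word_text <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "doc") {
    # Legacy format: hand the file to the antiword wrapper
    antiword::antiword(path)
  } else if (ext == "docx") {
    # A .docx file is a zip archive; the body lives in word/document.xml
    tmp <- tempfile("docx_")
    utils::unzip(path, files = "word/document.xml", exdir = tmp)
    doc <- xml2::read_xml(file.path(tmp, "word", "document.xml"))
    # Each <w:p> element is a paragraph; xml_text() joins its text runs
    paragraphs <- xml2::xml_find_all(doc, "//w:p")
    paste(xml2::xml_text(paragraphs), collapse = "\n")
  } else {
    stop("Unsupported file extension: ", ext)
  }
}

# Usage with a hypothetical file; runs only if the file actually exists
if (file.exists("report.doc")) {
  cat(extract_word_text("report.doc"))
}
```

The 'docx' branch keeps only paragraph text and ignores tables, headers and footnotes, so treat it as a starting point rather than a full converter.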



  • Fix for sys 2.0 (do not quote shQuote args anymore)


  • Fix gcc8 warning (requested by CRAN)


  • Windows: shQuote() path to file to make it work for paths with spaces
  • Capture error messages sent to stderr() by antiword
  • Simplify build structure a bit
  • Fix UBSAN error


1.3 by Jeroen Ooms, 2 years ago

https://github.com/ropensci/antiword#readme (devel) http://www.winfield.demon.nl (upstream)

Report a bug at http://github.com/ropensci/antiword/issues

Browse source code at https://github.com/cran/antiword

Authors: Jeroen Ooms [aut, cre], Adri van Os [cph] (Author 'antiword' utility)

GPL-2 license

Imports sys

Imported by eurlex, readtext, textreadr.

Suggested by ezpickr, tm.
