Fast, Consistent Tokenization of Natural Language Text
Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
Bibtex Parser
Utility to parse a bibtex file.
Dynamic Function-Oriented 'Make'-Like Declarative Pipelines
Pipeline tools coordinate the pieces of computationally demanding analysis projects. The 'targets' package is a 'Make'-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU 'Make' (2015, ISBN:978-9881443519) and 'drake' (2018,
Import 'OpenStreetMap' Data as Simple Features or Spatial Objects
Download and import of 'OpenStreetMap' ('OSM') data as 'sf' or 'sp' objects. 'OSM' data are extracted from the 'Overpass' web server (<https://overpass-api.de/>) and processed with very fast 'C++' routines for return to 'R'.
Collecting Twitter Data
An implementation of calls designed to collect and organize Twitter data via Twitter's REST and stream Application Program Interfaces (API), which can be found at the following URL: <https://developer.twitter.com/en/docs>.
Managing Larger Data on a GitHub Repository
Because larger (> 50 MB) data files cannot easily be committed to git, a different approach is required to manage data associated with an analysis in a GitHub repository. This package provides a simple work-around by allowing larger (up to 2 GB) data files to piggyback on a repository as assets attached to individual GitHub releases. These files are not handled by git in any way, but instead are uploaded, downloaded, or edited directly by calls through the GitHub API. These data files can be versioned manually by creating different releases. This approach works equally well with public or private repositories. Data can be uploaded and downloaded programmatically from scripts. No authentication is required to download data from public repositories.
Straightforward 'BibTeX' and 'BibLaTeX' Bibliography Management
Provides tools for importing and working with bibliographic references. It greatly enhances the 'bibentry' class by providing a class 'BibEntry' which stores 'BibTeX' and 'BibLaTeX' references, supports 'UTF-8' encoding, and can be easily searched by any field, by date ranges, and by various formats for name lists (author by last names, translator by full names, etc.). Entries can be updated, combined, sorted, printed in a number of styles, and exported. 'BibTeX' and 'BibLaTeX' '.bib' files can be read into 'R' and converted to 'BibEntry' objects. Interfaces to 'NCBI Entrez', 'CrossRef', and 'Zotero' are provided for importing references, and references can be created from locally stored 'PDF' files using 'Poppler'. Includes functions for citing and generating a bibliography with hyperlinks for documents prepared with 'RMarkdown' or 'RHTML'.
Generate Citation File Format ('cff') Metadata for R Packages
The Citation File Format version 1.2.0
A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker
Provides functions to download and parse 'robots.txt' files. Ultimately the package makes it easy to check if bots (spiders, crawlers, scrapers, ...) are allowed to access specific resources on a domain.
Dendrograms for Evolutionary Analysis
Contains functions for developing phylogenetic trees as deeply-nested lists ("dendrogram" objects). Enables bi-directional conversion between dendrogram and "phylo" objects (see Paradis et al (2004)