Simple, Consistent Wrappers for Common String Operations

A consistent, simple and easy-to-use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with NAs and zero-length vectors in the same way, and the output from one function is easy to feed into the input of another.


Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R.

The stringr package aims to remedy these problems by providing a clean, modern interface to common string operations. More concretely, stringr:

  • Uses consistent functions and argument names.

  • Simplifies string operations by eliminating options that you don't need 95% of the time.

  • Produces outputs that can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero-length inputs result in zero-length outputs.

  • Is built on top of stringi, which uses the ICU library to provide fast, correct implementations of common string manipulations.
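The missing-in, missing-out and zero-length-in, zero-length-out rules can be checked directly. This is a minimal sketch that assumes a current stringr release from CRAN. str_to_upper() is used for illustration and is not mentioned in the text above.

```r
library(stringr)

# Missing inputs give missing outputs, element by element
str_length(c("apple", NA))       # 5 NA
str_to_upper(c("apple", NA))     # "APPLE" NA

# Zero-length inputs give zero-length outputs
str_detect(character(0), "a")    # logical(0)
str_length(character(0))         # integer(0)
```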

To get the current released version from CRAN:

install.packages("stringr")

To get the current development version from GitHub:

# install.packages("devtools")
devtools::install_github("tidyverse/stringr")

stringr provides the pipe, %>%, from magrittr to make it easy to string together sequences of string operations:

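library(stringr)

# Pad each letter to width 5 with spaces on the right, then append the letter again
letters %>%
  str_pad(5, "right") %>%
  str_c(letters)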

News

stringr 1.1.0

  • Add sample datasets: fruit, words and sentences.

  • fixed(), regex(), and coll() now throw an error if you use them with anything other than a plain string (#60). I've clarified that the replacement for perl() is regex() not regexp() (#61). boundary() has improved defaults when splitting on non-word boundaries (#58, @lmullen).

  • str_detect() can now detect boundaries (by checking for str_count() > 0) (#120). str_subset() works similarly.

  • str_extract() and str_extract_all() now work with boundary(). This is particularly useful if you want to extract logical constructs like words or sentences. str_extract_all() respects the simplify argument when used with fixed() matches.

  • str_subset() now respects custom options for fixed() patterns (#79, @gagolews).

  • str_replace() and str_replace_all() now behave correctly when a replacement string contains $s, \\\\1, etc. (#83, #99).

  • str_split() gains a simplify argument to match str_extract_all() etc.

  • str_view() and str_view_all() create HTML widgets that display regular expression matches (#96).

  • word() returns NA for indexes greater than the number of words (#112).
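Several of the 1.1.0 changes are illustrated below. This is a sketch that assumes a current stringr release, so it shows today's behaviour rather than the exact 1.1.0 release. The input strings are arbitrary examples.

```r
library(stringr)

# str_extract_all() with boundary(): pull out words, skipping punctuation
str_extract_all("Hello, world!", boundary("word"))   # list(c("Hello", "world"))

# Backreferences such as \\1 in the replacement refer to capture groups
str_replace_all("2016-06-24", "(\\d+)-(\\d+)-(\\d+)", "\\3/\\2/\\1")   # "24/06/2016"

# str_split(simplify = TRUE) returns a character matrix, padding short rows with ""
str_split(c("a-b", "c-d-e"), "-", simplify = TRUE)

# word() extracts words by position; negative positions count from the end
word("The quick brown fox", 2)    # "quick"
word("The quick brown fox", -1)   # "fox"
```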

stringr 1.0.0

  • stringr is now powered by stringi instead of base R regular expressions. This improves Unicode support and makes most operations considerably faster. If you find stringr inadequate for your string processing needs, I highly recommend looking at stringi in more detail.

  • stringr gains a vignette, currently a straightforward update of the article that appeared in the R Journal.

  • str_c() now returns a zero length vector if any of its inputs are zero length vectors. This is consistent with all other functions, and standard R recycling rules. Similarly, using str_c("x", NA) now yields NA. If you want "xNA", use str_replace_na() on the inputs.

  • str_replace_all() gains a convenient syntax for applying multiple pairs of pattern and replacement to the same vector:

    library(stringr)
    input <- c("abc", "def")
    str_replace_all(input, c("[ad]" = "!", "[cf]" = "?"))
    # returns "!b?" "!e?"

  • str_match() now returns NA if an optional group doesn't match (previously it returned ""). This is more consistent with str_extract() and other match failures.

  • New str_subset() keeps values that match a pattern. It's a convenient wrapper for x[str_detect(x, pattern)] (#21, @jiho).

  • New str_order() and str_sort() allow you to sort and order strings in a specified locale.

  • New str_conv() to convert strings from a specified encoding to UTF-8.

  • New modifier boundary() allows you to count, locate and split by character, word, line and sentence boundaries.

  • The documentation got a lot of love, and very similar functions (e.g. first and all variants) are now documented together. This should hopefully make it easier to locate the function you need.

  • ignore.case(x) has been deprecated in favour of fixed|regex|coll(x, ignore_case = TRUE), perl(x) has been deprecated in favour of regex(x).

  • str_join() is deprecated, please use str_c() instead.
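A few of the 1.0.0 changes are shown below. This is a sketch that assumes a current stringr release. In current releases the case-insensitivity argument of regex() is spelled ignore_case. The vector fruit_basket is a hypothetical example and is unrelated to the bundled fruit dataset.

```r
library(stringr)

# str_c() propagates NA; use str_replace_na() first if you want "NA" pasted in
str_c("x", NA)                    # NA
str_c("x", str_replace_na(NA))    # "xNA"

# str_match(): an optional group that doesn't match gives NA, not ""
str_match("ab", "a(x)?b")         # columns "ab" and NA

# str_subset() keeps the matching elements
fruit_basket <- c("apple", "banana", "pear")
str_subset(fruit_basket, "an")    # "banana"

# Locale-aware sorting: in Czech, "ch" sorts after "h"
str_sort(c("a", "c", "ch", "h", "z"), locale = "cs")   # "a" "c" "h" "ch" "z"

# boundary() works for counting too
str_count("This is a sentence.", boundary("word"))     # 4

# Case-insensitive matching through the modifier's argument
str_detect("STRINGR", regex("stringr", ignore_case = TRUE))   # TRUE
```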

stringr 0.6.2

  • fixed path in str_wrap example so that it works on more R installations.

  • remove dependency on plyr

stringr 0.6.1

  • Zero-length input to str_split_fixed returns a 0-row matrix with n columns

  • Export str_join

stringr 0.6

  • new modifier perl that switches to Perl regular expressions

  • str_match now uses new base function regmatches to extract matches - this should hopefully be faster than my previous pure R algorithm
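The shape of str_match()'s result is shown below. Column 1 holds the whole match and the remaining columns hold the capture groups. This sketch assumes a current stringr release, and the pairs vector is a made-up input.

```r
library(stringr)

# One row per input string; a string with no match gives a row of NA
pairs <- c("x=1", "y=22", "no match")
str_match(pairs, "(\\w)=(\\d+)")
```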

stringr 0.5

  • new str_wrap function which gives strwrap output in a more convenient format

  • new word function extracts words from a string given a user-defined separator (thanks to suggestion by David Cooper)

  • str_locate now returns a consistent type when matching empty string (thanks to Stavros Macrakis)

  • new str_count counts number of matches in a string.

  • str_pad and str_trim receive performance tweaks - for large vectors this should give at least a two orders of magnitude speed-up

  • str_length returns NA for invalid multibyte strings

  • fix small bug in internal recyclable function
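The functions introduced or changed in 0.5 are demonstrated below. This is a sketch that assumes a current stringr release. The date string and the sentence are arbitrary inputs.

```r
library(stringr)

# str_count(): number of matches in each string
str_count(c("banana", "kiwi"), "a")          # 3 0

# str_locate(): start and end of the first match, as a matrix
str_locate("banana", "an")                   # start 2, end 3

# word() with a user-defined separator
word("2016-06-24", 2, sep = fixed("-"))      # "06"

# str_pad() and str_trim() mirror each other's side argument
str_pad("x", width = 5, side = "both")       # "  x  "
str_trim("  x  ", side = "left")             # "x  "

# str_wrap() returns one string with embedded newlines
cat(str_wrap("The quick brown fox jumps over the lazy dog", width = 15))
```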

stringr 0.4

  • all functions now vectorised with respect to the string, pattern and (where appropriate) replacement parameters
  • fixed() function now tells stringr functions to use fixed matching, rather than escaping the regular expression. Should improve performance for large vectors.
  • new ignore.case() modifier tells stringr functions to ignore case of pattern.
  • str_replace renamed to str_replace_all and new str_replace function added. This makes str_replace consistent with all functions.
  • new str_sub<- function (analogous to substring<-) for substring replacement
  • str_sub now understands negative positions as a position from the end of the string. -1 replaces Inf as indicator for string end.
  • str_pad side argument can be left, right, or both (instead of center)
  • str_trim gains side argument to better match str_pad
  • stringr now has a namespace and imports plyr (rather than requiring it)
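The 0.4 behaviours for substrings, first-versus-all replacement and fixed() can be seen below. This is a sketch that assumes a current stringr release, and x is a throwaway example variable.

```r
library(stringr)

x <- "abcdef"
# Negative positions count from the end; -1 is the last character
str_sub(x, -3)        # "def"
str_sub(x, 2, -2)     # "bcde"

# str_sub<- replaces a substring in place
str_sub(x, 1, 1) <- "Z"
x                     # "Zbcdef"

# str_replace() changes the first match; str_replace_all() changes every match
str_replace("aaa", "a", "b")       # "baa"
str_replace_all("aaa", "a", "b")   # "bbb"

# fixed() matches "." literally; as a regex, "." matches any character
str_detect(c("a.b", "abc"), fixed("."))   # TRUE FALSE
str_detect(c("a.b", "abc"), ".")          # TRUE TRUE
```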

stringr 0.3

  • fixed() now also escapes |
  • str_join() renamed to str_c()
  • all functions more carefully check input and return informative error messages if not as expected.
  • add invert_match() function to convert a matrix of location of matches to locations of non-matches
  • add fixed() function to allow matching of fixed strings.
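Below, str_c() is used for joining and fixed() treats "|" as an ordinary character instead of regex alternation. This is a sketch that assumes a current stringr release. The sep and collapse arguments are not mentioned in the note above.

```r
library(stringr)

# sep goes between arguments; collapse goes between elements of the result
str_c("a", "b", sep = "-")              # "a-b"
str_c(c("a", "b"), collapse = "+")      # "a+b"

# With fixed(), "|" is a literal character
str_count("a|b|c", fixed("|"))           # 2
str_split("a|b|c", fixed("|"))[[1]]      # "a" "b" "c"
```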

stringr 0.2

  • str_length now returns correct results when used with factors
  • str_sub now correctly replaces Inf in end argument with length of string
  • new function str_split_fixed returns fixed number of splits in a character matrix
  • str_split no longer uses strsplit, so trailing breaks are preserved
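The difference between str_split_fixed() and str_split() is shown below. This is a sketch that assumes a current stringr release, where short rows in the fixed variant are padded with "".

```r
library(stringr)

# str_split_fixed(): exactly n pieces per string; the last piece holds the remainder
str_split_fixed(c("a-b-c", "d"), "-", n = 2)
# row 1: "a" "b-c"; row 2: "d" ""

# str_split() keeps the empty piece after a trailing separator
str_split("a,b,", ",")[[1]]    # "a" "b" ""
```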


1.2.0 by Hadley Wickham, 8 months ago


http://stringr.tidyverse.org, https://github.com/tidyverse/stringr


Report a bug at https://github.com/tidyverse/stringr/issues


Browse source code at https://github.com/cran/stringr


Authors: Hadley Wickham [aut, cre, cph], RStudio [cph]




GPL-2 | file LICENSE license


Imports stringi, magrittr

Suggests testthat, knitr, htmltools, htmlwidgets, rmarkdown, covr


Imported by AFM, ALA4R, APSIM, AmostraBrasil, ApacheLogProcessor, BETS, BTLLasso, BatchGetSymbols, BatchJobs, BayesFactor, BioInstaller, CIAAWconsensus, CLME, CRANsearcher, Causata, CoFRA, CollapsABEL, DeLorean, DiagrammeR, EasyMARK, EventStudy, Evomorph, FFTrees, GADMTools, GCalignR, GERGM, GUIgems, GenomicTools, GetHFData, GetITRData, GetLattesData, GetTDData, Greg, HTSSIP, HURDAT, HistogramTools, HydeNet, IATscores, IRISMustangMetrics, IRISSeismic, ISOweek, KoNLP, LAGOSNE, MODIStsp, MSbox, MazamaSpatialUtils, MetaIntegrator, NFP, NMF, NNTbiomarker, OpenRepGrid, P2C2M, PATHChange, PKPDmisc, PWFSLSmoke, Plasmidprofiler, PubMedWordcloud, QCAtools, R2ucare, RDML, RInno, RLogicalOps, RNeXML, RQGIS, RSDA, RSentiment, RSiteCatalyst, RcppOctave, RefManageR, RevEcoR, Rilostat, RndTexExams, Rnightlights, SSRA, SciencesPo, SeerMapper, Seurat, ShinyItemAnalysis, ShinyTester, SnakeCharmR, SocialMediaLab, Stack, TLBC, TSTr, TcGSA, VDAP, WikiSocio, Xplortext, abcrf, abjutils, aemo, afex, algstat, alphavantager, apa, aqp, asciiSetupReader, aslib, assignPOP, atlantistools, auk, badgecreatr, banR, banxicoR, bea.R, beepr, betalink, bib2df, bibliometrix, bibtex, biogeo, biomartr, blastula, blkbox, bold, boostr, boxr, breathtestcore, breathteststan, bridgesampling, broom, brr, bsplus, censusr, choroplethr, civis, ck37r, colormap, commentr, configr, congressbr, crtests, cruts, cymruservices, d3Tree, dMod, dartR, data.tree, datacheck, datadogr, dataone, datasus, detector, difconet, distcomp, docopt, docxtools, dotwhisker, dplR, dplyrAssist, drLumi, drake, dynamichazard, eclust, eemR, eiCompare, elementR, emuR, enaR, epitable, etl, eurostat, evaluate, exampletestr, eyelinker, ez, fastLink, fbRanks, fbar, fergm, fitbitScraper, flextable, fragilityindex, fungible, futureheatwaves, fuzzyjoin, gaiah, games, gastempt, genderizeR, genemodel, geoparser, gfcanalysis, ggformula, ggplotgui, ggraptR, gitlabr, gogamer, googlesheets, gphmm, gsheet, gutenbergr, hddtools, highcharter, hoardeR, 
htmlTable, hurricaneexposure, hybridModels, imager, io, kableExtra, kehra, keyringr, kgschart, knitr, kntnr, kokudosuuchi, latex2exp, leaflet.extras, lexRankr, lidR, lifelogr, liftr, linear.tools, lmem.gwaser, lmem.qtler, lubridate, m2r, madrat, mailR, managelocalrepo, matlabr, mau, mem, memapp, metacoder, metagear, mglR, mgm, micromapST, modeval, modules, morse, mrMLM, mtconnectR, muir, nandb, narray, nauf, net.security, netgen, networkreporting, ngstk, nhanesA, nmfgpu4R, noaastormevents, nparACT, nscprepr, nucim, oai, optiRum, optiSel, optiSolve, optim.functions, outreg, packagedocs, parsemsf, phenopix, phrasemachine, phybreak, pipefittr, pixiedust, pkgcopier, pkgmaker, plotKML, pmml, pointblank, pollstR, polywog, postGIStools, powerbydesign, pre, predatory, primerTree, prisonbrief, processmapR, profr, profvis, proustr, prozor, pryr, ptstem, pubprint, pxweb, qrcode, qualtRics, quantoptr, rAvis, rClinicalCodes, rNOMADS, rPraat, rSQM, rUnemploymentData, radiant.model, randomcoloR, rapport, ratios, rattle, rcrossref, rcv, readJDX, redcapAPI, reportRx, reshape2, revengc, rmarkdown, rngtools, rnrfa, robotstxt, rodham, rollply, ropercenter, roxygen2, rpcdsearch, rpdo, rprime, rpubchem, rslp, rsnps, rsunlight, rtide, rtimicropem, rusda, ryouready, sasMap, satscanMapper, sbtools, scholar, scientoText, searchConsoleR, selectr, sidrar, simPH, simcausal, simr, sjmisc, smpic, snakecase, sophisthse, spant, spatsurv, spellcheckr, sperrorest, spind, sqliter, stacomiR, standardize, starmie, statar, stationaRy, statquotes, statsDK, stm, stormwindmodel, stplanr, stremr, stringformattr, striprtf, subspace, surveydata, survtmle, sweidnumbr, swirl, swirlify, tangram, taxa, taxize, templates, textmineR, textreuse, tibbletime, tidycensus, tidyquant, tidytext, tidyverse, tigris, timelineR, tmlenet, touch, translateSPSS2R, treeman, tropr, tspmeta, uavRmp, ucbthesis, ukds, uptasticsearch, urlshorteneR, utilsIPEA, vagalumeR, validaRA, valr, vcfR, vembedr, vennplot, vetools, vortexR, vows, 
vqtl, webTRISr, webchem, wikilake, wikipediatrend, wordbankr, wux, x.ent, x12, x12GUI, xesreadR, yhatr, ztype.

Depended on by AnDE, FRESA.CAD, Fgmutils, LindenmayeR, Maeswrap, NNS, PepPrep, PersomicsArray, PhysActBedRest, RGENERATEPREC, RJafroc, RSMET, VarfromPDB, ViSiElse, acs, dataPreparation, eqs2lavaan, exsic, filesstrings, geotopbricks, installr, lero.lero, lettercase, mpoly, mtk, muRL, neuroim, orgR, pMineR, pafdR, patchSynctex, pxR, quipu, recoder, rsgcc, rsurfer, sdcTable, sim1000G, snpReady, sqlutils, ssh.utils, surveybootstrap, tumblR, vardpoor.

Suggested by BradleyTerryScalable, ClimClass, GSIF, MARSS, ProjectTemplate, SoundexBR, arsenal, blogdown, cytominer, dtree, eeptools, envDocument, fivethirtyeight, fontMPlus, frequencyConnectedness, ggenealogy, ggmap, heemod, icd, jpmesh, leaflet.esri, lemon, miscset, optparse, plotROC, ragtop, rex, rfordummies, rmonad, simPop, sweep, taRifx, text2vec, tikzDevice, timetk, unpivotr, usmap, valaddin, vkR, wingui, wsrf.

