Fuzzy String Matching

Fuzzy string matching implementation of the 'fuzzywuzzy' <https://github.com/seatgeek/fuzzywuzzy> 'python' package. It uses the Levenshtein Distance <https://en.wikipedia.org/wiki/Levenshtein_distance> to calculate the differences between sequences.



The fuzzywuzzyR package is a fuzzy string matching implementation of the fuzzywuzzy Python package. It uses the Levenshtein distance to calculate the differences between sequences. More details on the functionality of fuzzywuzzyR can be found in the package vignette.
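As a rough illustration of the metric mentioned above, the following sketch computes the Levenshtein distance between two strings with the standard dynamic-programming recurrence. The function name `levenshtein` is hypothetical, and this is not the code that fuzzywuzzy or fuzzywuzzyR runs internally; it only shows what the distance measures.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    # previous[j] holds the distance between the empty string and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means the two strings are more similar; identical strings have a distance of 0.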


System Requirements


  • Python (>= 2.4)

  • difflib (part of the Python standard library)

  • fuzzywuzzy (>= 0.15.0)

  • python-Levenshtein (>= 0.12.0, optional; provides a 4-10x speedup in string matching, though it may produce different results in certain cases)
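One way to confirm that the requirements listed above are visible to a given interpreter is to ask Python itself. The sketch below assumes Python 3.4 or later (for `importlib.util.find_spec`) and assumes the import names `fuzzywuzzy` and `Levenshtein` for the two third-party distributions; the helper `module_status` is hypothetical. Run it with the same interpreter that the R session reports as its default.

```python
import importlib.util

# Assumed import names: the python-Levenshtein distribution is imported
# as "Levenshtein"; the other two match their package names.
# Each value records whether the module is required (True) or optional (False).
MODULES = {"difflib": True, "fuzzywuzzy": True, "Levenshtein": False}


def module_status(modules=MODULES):
    """Map each module name to True if this interpreter can locate it."""
    return {name: importlib.util.find_spec(name) is not None
            for name in modules}


for name, found in module_status().items():
    kind = "required" if MODULES[name] else "optional"
    state = "found" if found else "missing"
    print(f"{name} ({kind}): {state}")
```

A missing optional module only affects speed, whereas a missing required module needs to be installed before the R package is used.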


Before installing any Python modules, check the Python configuration with:


reticulate::py_config()
 

All modules should be installed in the default Python configuration (the configuration that the R session displays as default); otherwise, errors will occur during package installation.


Debian/Ubuntu/Fedora


Python 2

sudo apt-get install python-pip
sudo pip install --upgrade pip
pip install fuzzywuzzy
pip install python-Levenshtein

Python 3

sudo apt-get install python3-pip
sudo pip3 install --upgrade pip
pip3 install fuzzywuzzy
pip3 install python-Levenshtein



Macintosh OSX


sudo easy_install pip
sudo pip install fuzzywuzzy
sudo pip install python-Levenshtein

Windows OS


  • Download get-pip.py
  • Update the environment variables (Control Panel >> System and Security >> System >> Advanced system settings >> Environment variables >> System variables >> Path >> Edit) by adding, for instance in the case of Python 2.7:

C:\Python27;C:\Python27\Scripts

  • Install the modules:

pip install fuzzywuzzy
pip install python-Levenshtein

Installation of the fuzzywuzzyR package


To install the package from CRAN, use:

 
install.packages('fuzzywuzzyR')
 
 

To install the latest version from GitHub, use the install_github function of the devtools package:

 
devtools::install_github(repo = 'mlampros/fuzzywuzzyR')
 

Use the following link to report bugs or issues:

https://github.com/mlampros/fuzzywuzzyR/issues


News

fuzzywuzzyR 1.0.3

I added an exception in the additional tests to prevent an error on Solaris OS when Python is not available.

fuzzywuzzyR 1.0.2

I added the decoding parameter to the following classes: FuzzExtract, FuzzMatcher and FuzzUtils. The decoding parameter does not apply to the GetCloseMatches and SequenceMatcher classes, because the difflib Python library has no force_ascii parameter. The decoding parameter applies only to Python 2 configurations, because in Python 3 character strings are decoded to Unicode by default. For reference, see the following GitHub issue: https://github.com/mlampros/fuzzywuzzyR/issues/3
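The point about difflib can be checked directly from Python. The sketch below inspects the signatures of the two standard-library entry points that the GetCloseMatches and SequenceMatcher classes are named after; the helper `accepts_parameter` is hypothetical and exists only for this illustration.

```python
import difflib
import inspect


def accepts_parameter(func, name):
    """Return True if `func` declares a parameter called `name`."""
    return name in inspect.signature(func).parameters


# Neither difflib entry point declares a force_ascii parameter.
print(accepts_parameter(difflib.get_close_matches, "force_ascii"))  # False
print(accepts_parameter(difflib.SequenceMatcher, "force_ascii"))    # False

# get_close_matches takes only the word, the candidates, n, and cutoff.
print(difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"]))
# ['apple', 'ape']
```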

fuzzywuzzyR 1.0.1

I added links to the GitHub repository (master repository, issues).

fuzzywuzzyR 1.0.0


1.0.3 by Lampros Mouselimis, 2 years ago


https://github.com/mlampros/fuzzywuzzyR


Report a bug at https://github.com/mlampros/fuzzywuzzyR/issues


Browse source code at https://github.com/cran/fuzzywuzzyR


Authors: Lampros Mouselimis




GPL-2 license


Imports reticulate, R6

Suggests testthat, covr, knitr, rmarkdown

System requirements: Python (>= 2.4), difflib, fuzzywuzzy ( >=0.15.0 ), python-Levenshtein ( >=0.12.0 ). Detailed installation instructions for each operating system can be found in the README file.

