Get the Category of Content Hosted by a Domain

Get the category of content hosted by a domain using Shallalist <http://shalla.de/>, VirusTotal (which provides access to many services) <https://www.virustotal.com/>, McAfee <https://www.trustedsource.org/>, Alexa <https://aws.amazon.com/awis/>, DMOZ <http://www.dmoz.org/>, or validated machine learning classifiers based on Shallalist data.


The package provides several ways to classify domains by their content. You can get categorizations from Shallalist, McAfee TrustedSource, DMOZ, the Alexa API (which uses the DMOZ data), or the VirusTotal API, or you can use validated machine learning models based on the Shallalist data.
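As a rough illustration of what a list-based categorization amounts to, the sketch below looks up domains in a small category table using base R only. It is not the package's implementation: the table `toy_list`, its domains and categories, and the helper `lookup_category()` are all made up for this example.

```r
# Toy category table; the domains and categories are invented for illustration.
toy_list <- data.frame(
  domain   = c("example.com", "example.org", "example.net"),
  category = c("news", "shopping", "news, sports"),
  stringsAsFactors = FALSE  # keep text columns as character, not factor
)

# Hypothetical helper: look up the category of one or more domains.
lookup_category <- function(domains, cat_table = toy_list) {
  # Normalize input: lower-case and strip a leading "www."
  clean <- sub("^www\\.", "", tolower(domains))
  # match() returns NA for domains that are not in the table
  idx <- match(clean, cat_table$domain)
  data.frame(
    domain   = domains,
    category = cat_table$category[idx],
    stringsAsFactors = FALSE
  )
}

res <- lookup_category(c("www.example.com", "EXAMPLE.net", "unknown.test"))
print(res)
```

Domains missing from the table come back with `NA` as their category rather than raising an error, and a vector of several domains is handled in one call.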

To get the current released version from CRAN:

install.packages("rdomains")

To get the current development version from GitHub:

# install.packages("devtools")
devtools::install_github("soodoku/domain_classifier/rdomains", build_vignettes = TRUE)

To learn how to use rdomains, read the package vignette, or launch it within R:

vignette("rdomains", package = "rdomains")

Scripts are released under the MIT License.

News

rdomains 0.1.5

  • Shallalist and DMOZ data are now read in with stringsAsFactors = FALSE.
  • Swapped the DMOZ data for domain-level category data, added English translations of non-English categories, and quote-protected entries with multiple categories.
  • Accounted for changes in RSelenium; startServer(), for instance, is deprecated. Currently, passing a log is supported only for trusted_cat.
  • Fixed a bug in shalla_cat when multiple domain names are passed as arguments.
  • Fixed small issue with adult_ml1_cat() whose returned data.frame had a column that was a named list. The column is now a vector.
  • If an unknown domain is passed to virustotal, it now returns an empty data.frame rather than throwing an error.

rdomains 0.1.0

  • Initial release

0.1.5 by Gaurav Sood, 5 months ago


Browse source code at https://github.com/cran/rdomains


Authors: Gaurav Sood [aut, cre]



MIT + file LICENSE license


Imports Matrix, urltools, glmnet, stats, methods, RSelenium, XML, curl, virustotal, aws.alexa

Suggests testthat, rmarkdown, knitr

