Get the Category of Content Hosted by a Domain

Get the category of content hosted by a domain using Shallalist <http://shalla.de/>, VirusTotal <https://www.virustotal.com/> (which aggregates results from many services), McAfee TrustedSource <https://www.trustedsource.org/>, Alexa <https://aws.amazon.com/awis/>, DMOZ <http://www.dmoz.org/>, or validated machine-learning classifiers trained on Shallalist data.


The package provides several ways to classify domains by their content. You can get categorizations from Shallalist, McAfee TrustedSource (trusted), DMOZ (the service has since shut down), the Alexa API (which uses DMOZ data), or the VirusTotal API, or you can use validated machine-learning models trained on Shallalist data.
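List-based sources such as Shallalist work by matching a domain against a table of known domains and their categories. A minimal base-R sketch of that idea follows, assuming a tiny, made-up category table. The table and the helpers `normalize_domain()` and `lookup_category()` are hypothetical illustrations, not part of the rdomains API or its internal data format, and the package's own functions may normalize domains differently.

```r
# Hypothetical stand-in for a Shallalist-style category list (made-up rows,
# not the real Shallalist data and not rdomains' internal format).
category_table <- data.frame(
  domain   = c("google.com", "facebook.com", "espn.com"),
  category = c("searchengines", "socialnet", "recreation/sports"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reduce a URL to a bare, lower-case host name.
normalize_domain <- function(urls) {
  host <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://", "", urls)  # drop the scheme, e.g. "http://"
  host <- sub("[/:?#].*$", "", host)                     # drop port, path, query, fragment
  sub("^www\\.", "", tolower(host))                      # lower-case, drop a leading "www."
}

# Hypothetical helper: exact-match each host against the table.
# Hosts that are not listed get NA instead of raising an error.
lookup_category <- function(urls, lookup) {
  hosts <- normalize_domain(urls)
  data.frame(
    domain   = hosts,
    category = lookup$category[match(hosts, lookup$domain)],
    stringsAsFactors = FALSE
  )
}

result <- lookup_category(
  c("http://www.google.com", "https://facebook.com/login", "example.org"),
  category_table
)
print(result)
```

Unlisted domains come back as NA rather than as an error. A real list lookup would also need to handle subdomains and URL paths, which this sketch ignores.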

Installation

To get the current release version from CRAN:

install.packages("rdomains")

To get the current development version from GitHub:

# install.packages("devtools")
devtools::install_github("themains/rdomains", build_vignettes = TRUE)

Usage

To learn how to use rdomains, read the package vignette, or launch it from within R:

vignette("rdomains", package = "rdomains")

License

Scripts are released under the MIT License.

News

rdomains 0.1.7

  • Changes due to the move to a new repository.
  • Added a basic BrightCloud function.

rdomains 0.1.6

  • Adds a not_news classifier that identifies non-news domains, based on published work.
  • Passes expect_lint_free().

rdomains 0.1.5

  • Shallalist and DMOZ data are read in with stringsAsFactors = FALSE.
  • Swapped the DMOZ data to domain-level category data, included English translations of non-English categories, and added quote protection for multiple categories.
  • Accounts for changes in RSelenium; for instance, startServer() is deprecated. For now, trusted_cat only allows passing of the log.
  • Fixed a bug in shalla_cat() when multiple domain names are passed.
  • Fixed a small issue with adult_ml1_cat(), whose returned data.frame had a column that was a named list. The column is now a vector.
  • If an unknown domain is passed to virustotal, it now returns an empty data.frame rather than throwing an error.

rdomains 0.1.0

  • Initial release



0.1.7 by Gaurav Sood, 3 months ago


Browse source code at https://github.com/cran/rdomains


Authors: Gaurav Sood [aut, cre]




License: MIT + file LICENSE


Imports: Matrix, urltools, glmnet, stats, methods, RSelenium, XML, httr, xml2, curl, virustotal, aws.alexa, rlang

Suggests: testthat, rmarkdown, knitr, lintr

