A Simple Data Science Challenge System

A simple data science challenge system using R Markdown and Dropbox <https://www.dropbox.com/>. It requires no network configuration, does not depend on external platforms such as Kaggle <https://www.kaggle.com/>, and can be easily installed on a personal computer.


The rchallenge R package provides a simple data science competition system using R Markdown and Dropbox with the following features:

  • No network configuration required.
  • Does not depend on external platforms such as Kaggle.
  • Can be easily installed on a personal computer.
  • Provides a customizable template in English and French.

Further documentation is available in the Reference manual.

Please report bugs, problems, or topics for discussion on the Issues tracker. Any contribution to improve the package is welcome.

Install the R package from CRAN:

install.packages("rchallenge")

or install the latest development version from GitHub:

# install.packages("devtools")
devtools::install_github("adrtod/rchallenge")

A recent version of pandoc (>= 1.12.3) is also required. See the pandoc installation instructions for details on installing pandoc for your platform.
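
The version requirement above can be checked from R before rendering. The following is a minimal base-R sketch, not part of rchallenge: the helper names `parse_pandoc_version` and `pandoc_ok` are hypothetical, and it assumes that the first line of `pandoc --version` has the form `pandoc <version>` with a purely numeric version string.

```r
# Hypothetical helpers (not part of rchallenge): check that pandoc is on the
# PATH and at least as recent as the required version.

parse_pandoc_version <- function(first_line) {
  # Assumes a line such as "pandoc 2.19.2" or "pandoc.exe 1.12.3".
  numeric_version(sub("^pandoc(\\.exe)?[[:space:]]+", "", trimws(first_line)))
}

pandoc_ok <- function(min_version = "1.12.3") {
  exe <- Sys.which("pandoc")
  if (!nzchar(exe)) return(FALSE)  # pandoc was not found on the PATH
  out <- system2(exe, "--version", stdout = TRUE)
  parse_pandoc_version(out[1]) >= numeric_version(min_version)
}

pandoc_ok()
```

The rmarkdown package also offers `rmarkdown::pandoc_available("1.12.3")`, which may be preferable because it can find a pandoc copy that is not on the PATH, such as the one bundled with RStudio.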

Install a new challenge in Dropbox/mychallenge:

setwd("~/Dropbox/mychallenge")
library(rchallenge)
?new_challenge
new_challenge()

or for a French version:

new_challenge(template = "fr")

You will obtain a ready-to-use challenge in the folder Dropbox/mychallenge containing:

  • challenge.rmd: template R Markdown script for the webpage.
  • data: directory containing the data_train and data_test datasets.
  • submissions: directory of the submissions. It will contain one subdirectory per team, in which the team places its submission files. The subdirectories are shared with the teams through Dropbox.
  • history: directory where the submissions history is stored.
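
To confirm that the folder has the layout listed above, a small base-R check may help. This is only a sketch: `check_challenge_layout` is a hypothetical helper, not an rchallenge function, and it assumes the default file and directory names (these can differ if the challenge was created with non-default options).

```r
# Hypothetical helper (not part of rchallenge): report which of the expected
# files and directories are present in a challenge folder.
check_challenge_layout <- function(path = ".") {
  # Default names taken from the list above; adjust if you changed them.
  expected <- c("challenge.rmd", "data", "submissions", "history")
  present <- file.exists(file.path(path, expected))
  setNames(present, expected)
}

# Returns a named logical vector, TRUE for each entry that exists.
check_challenge_layout("~/Dropbox/mychallenge")
```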

The default challenge provided is a binary classification problem on the German Credit Card dataset.

You can easily customize the challenge in two ways:

  • During the creation of the challenge: by using the options of the new_challenge function.
  • After the creation of the challenge: by manually replacing the data files in the data subdirectory and the baseline predictions in submissions/baseline and by customizing the template challenge.rmd as needed.

To complete the installation:

  1. Create and share subdirectories in submissions for each team:

    ?new_team
    new_team("team_foo", "team_bar")
  2. Render the HTML page:

    ?publish
    publish()

    Use the output_dir argument to change the output directory. Make sure the output HTML file is served as a web page, e.g. using GitHub Pages.

  3. Give the URL to your HTML file to the participants.

  4. Refresh the webpage by repeating step 2 on a regular basis. See below for automating this step.

From now on, a fully autonomous challenge system is set up requiring no further administration. With each update, the program automatically performs the following tasks using the functions available in our package:

On Unix-like systems, you can add the following line to your crontab using crontab -e (mind the quotes):

0 * * * * Rscript -e 'rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd")'

This will render an HTML webpage every hour. Use the output_dir argument to change the output directory.

If your challenge is hosted in a GitHub repository, you can automate the push:

0 * * * * cd ~/Dropbox/mychallenge && Rscript -e 'rchallenge::publish()' && git commit -m "update html" index.html && git push

You might have to add the path to Rscript and pandoc at the beginning of your crontab:

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

Depending on your system or pandoc version you might also have to explicitly add the encoding option to the command:

0 * * * * Rscript -e 'rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd", encoding = "utf8")'
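
Because the quoting in these crontab entries is easy to get wrong, the line can also be generated from R and pasted into crontab -e. The sketch below is illustrative only: `cron_publish_line` is a hypothetical helper, not part of rchallenge, and it merely assembles the same text as the entries above.

```r
# Hypothetical helper (not part of rchallenge): build a crontab entry that
# calls rchallenge::publish() on a given R Markdown file.
cron_publish_line <- function(rmd, schedule = "0 * * * *", encoding = NULL) {
  # Double quotes inside the R call, single quotes around the whole -e argument.
  args <- sprintf('"%s"', rmd)
  if (!is.null(encoding)) {
    args <- paste0(args, sprintf(', encoding = "%s"', encoding))
  }
  sprintf("%s Rscript -e 'rchallenge::publish(%s)'", schedule, args)
}

# Print the hourly entry, with and without the explicit encoding option.
cat(cron_publish_line("~/Dropbox/mychallenge/challenge.rmd"), "\n")
cat(cron_publish_line("~/Dropbox/mychallenge/challenge.rmd", encoding = "utf8"), "\n")
```

The schedule argument uses the usual five-field crontab syntax, so changing it to "*/30 * * * *", for example, would give a refresh every 30 minutes on cron implementations that support step values.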

On Windows, you can use the Task Scheduler to create a new task with a "Start a program" action and the following settings (mind the quotes):

  • Program/script: Rscript.exe
  • options: -e rchallenge::publish('~/Dropbox/mychallenge/challenge.rmd')

The rendering of HTML content by Dropbox is discontinued as of 3 October 2016 for Basic users and 1 September 2017 for Pro and Business users. See https://www.dropbox.com/help/16. As an alternative, GitHub Pages provides an easy way to publish HTML from a simple GitHub repository.

Please contact me to add yours.

Copyright (C) 2014-2015 Adrien Todeschini.

Contributions from Robin Genuer.

Design inspired by Datascience.net, a French platform for data science challenges.

The rchallenge package is licensed under the GPLv2 (https://www.gnu.org/licenses/gpl-2.0.html).

To do:

  • [ ] do not take baseline into account in ranking
  • [ ] examples, tests, vignettes
  • [ ] interactive plots with ggvis
  • [ ] check arguments
  • [ ] interactive webpage using Shiny

News

rchallenge 1.3.0 (23-10-2016)

  • output_dir argument of publish function now defaults to "index.html". Useful for hosting the challenge on a GitHub repo with GitHub Pages.
  • glyphicon is defunct. Use icon instead of glyphicon.
  • print_readerr displays a table.
  • get_best returns a single data.frame instead of a list with one data.frame per metric. The ranking can be based on several metrics in a specific order to break ties.
  • update_rank_diff and print_leaderboard take a single data.frame as input.

rchallenge 1.2.0 (05-10-2016)

  • output_dir argument of publish function now defaults to the input directory instead of "~/Dropbox/Public" because Dropbox rendering of HTML content is discontinued.

rchallenge 1.1.1 (25-11-2015)

  • fixed bug for "submitted after the deadline"
  • added some comments in template rmd files
  • added overall package documentation

rchallenge 1.1 (16-05-2015)

  • added out_rmdfile argument to new_challenge
  • changed template argument to c("en", "fr")
  • fixed bugs
  • added examples to doc
  • available on CRAN

rchallenge 1.0 (15-04-2015)

  • new name
  • changes in readme
  • new_team can create several teams
  • instructions for Windows

rchallenge 0.2 (05-03-2015)

  • exported new_team function
  • suppressed dependency to caret package
  • fixed change of directory in publish
  • improved messages

rchallenge 0.1 (21-01-2015)

  • initial package release
  • easy installation
  • roxygen documentation
  • English and French templates


1.3.0 by Adrien Todeschini, 7 months ago


https://adrtod.github.io/rchallenge


Report a bug at https://github.com/adrtod/rchallenge/issues


Browse source code at https://github.com/cran/rchallenge


Authors: Adrien Todeschini [aut, cre], Robin Genuer [ctb]


Documentation:   PDF Manual  


GPL-2 license


Imports rmarkdown, knitr

System requirements: pandoc (>= 1.12.3) - http://johnmacfarlane.net/pandoc

