'DataRobot' Predictive Modeling API

For working with the 'DataRobot' predictive modeling platform's API <https://www.datarobot.com/>.


News

v2.8.0

New features:

  • A new premium feature, Time Series, is now available. New projects can be created as time series projects which automatically derive features from past data and forecast the future. See the time series documentation in the web app for more information.
  • The DataRobot API supports creating, training, and predicting with multiclass classification projects. By default, DataRobot treats a dataset with a numeric target column as regression. If your numeric target has up to 10 distinct values, you can override this behavior and create a multiclass classification project instead. To do so, use the SetTarget function, setting targetType = TargetType$Multiclass. If DataRobot recognizes your target as categorical and it has up to 10 classes, using multiclass will create a project that classifies which label each row belongs to.
  • With the introduction of Multiclass Classification projects, DataRobot needed a better way to explain the performance of a multiclass model, so we created a new Confusion Chart. The API now supports retrieving and interacting with confusion charts.
  • GetFeatureInfo and ListFeatureInfo now return the EDA summary statistics (i.e., mean, median, minimum, maximum, and standard deviation) for features where this is available (e.g., numeric, date, time, currency, and length features). These summary statistics are formatted in the same way as the data they summarize.
  • The DataRobot API now includes Rating Tables. A rating table is an exportable CSV representation of a model. Users can influence predictions by modifying the rating table and creating a new model from the modified table.
  • You can now set scaleoutModelingMode when setting a project target. It can be used to control whether scaleout models appear in the autopilot and/or available blueprints. Scaleout models are only supported in the Hadoop environment with the corresponding user permission set.
  • You can now set accuracyOptimizedBlueprints when setting a project target. Accuracy optimized blueprints are longer running model blueprints that provide increased accuracy over the normal blueprints that run during autopilot.
  • DataRobot now supports retrieving model blueprint charts via GetModelBlueprintChart and model blueprint documentation via GetModelBlueprintDocumentation. These are like regular blueprint charts and blueprint documentation, but for model blueprints: reduced representations of the blueprint that include only the branches actually executed by the model.
  • The DataRobot API now supports generating and retrieving training predictions, which are predictions made by the model on out-of-fold training data. Users can start a job that makes training predictions and then retrieve them. See the training predictions documentation in the web app for more information on how to use training predictions.
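
As a rough illustration of the multiclass override described in the list above, the call might be wrapped as follows. This is a sketch under assumptions: it presumes the datarobot package is installed and a connection has already been established; the wrapper name SetMulticlassTarget and its arguments are hypothetical, and only SetTarget, targetType, and TargetType$Multiclass come from the notes above.

```r
# Hypothetical wrapper: nothing contacts the server until the function is called.
# `project` is an existing project (or project id); `target` is the name of the
# target column. Both are placeholders supplied by the caller.
SetMulticlassTarget <- function(project, target) {
  datarobot::SetTarget(
    project = project,
    target = target,
    # Override the default regression handling of a numeric target.
    targetType = datarobot::TargetType$Multiclass
  )
}
```

Wrapping the call keeps the example loadable without a live server; in practice one would likely call SetTarget directly.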

Enhancements:

  • CreateDatetimePartitionSpecification now includes the optional disableHoldout flag that can be used to disable the holdout fold when creating a project with datetime partitioning.
  • The advanced options available when setting the target have been extended to include the new parameters offset and exposure to allow specifying offset and exposure columns to apply to predictions generated by models within the project. See the user guide documentation in the web app for more information on offset and exposure columns.
  • The advanced options available when setting the target have been extended to include the new parameter eventsCount to allow specifying the events count column. See the user guide documentation in the web app for more information on events count.
  • File URIs can now be used as source data when creating a project or uploading a prediction dataset. The file URI must refer to an allowed location on the server, which is configured as described in the user guide documentation.
  • If this package is used in RStudio v1.1 or higher, it is possible to use the RStudio Connections UI to open a DataRobot connection.
  • When retrieving reason codes on a project using an exposure column, predictions that are adjusted for exposure can be retrieved.
  • ConnectToDataRobot now supports an option sslVerify that turns off SSL verification if set to FALSE.
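
A minimal sketch of the sslVerify option mentioned above, assuming the datarobot package is installed. The wrapper name is hypothetical, and the endpoint and token arguments are assumed to be the usual connection arguments rather than something stated in these notes.

```r
# Hypothetical wrapper: defines the connection call without executing it.
# `endpoint` and `token` are placeholders for your server URL and API token.
ConnectWithoutSslVerification <- function(endpoint, token) {
  datarobot::ConnectToDataRobot(
    endpoint = endpoint,
    token = token,
    sslVerify = FALSE  # turns off SSL certificate verification
  )
}
```

Turning verification off removes protection against impersonated servers, so it is probably only appropriate for trusted internal deployments, for example ones using self-signed certificates.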

Bugfixes:

  • Fixes a bug that prevented GetReasonCodesMetadataFromJobId from being called with a project directly (instead of a project id).
  • Fixes a bug that prevented RequestNewModel from being called when options(stringsAsFactors = TRUE) is set.
  • Fixes a bug that prevented more than one blueprint document from being returned by GetBlueprintDocuments (now named GetBlueprintDocumentation).

Deprecated and Defunct:

  • The quickrun parameter on SetTarget, the ability to use GetFeatureInfo with feature IDs, the GetRecommendedBlueprints function, and the RequestPredictions function were all originally planned to be deprecated in version 3.0. These features and functions will now be deprecated in v2.10 instead.
  • GetModelObject is replaced by GetModel and deprecated (and will be removed in v2.10).
  • GetAllModels is replaced by ListModels and deprecated (and will be removed in v2.10).
  • GetBlueprintDocuments is replaced by GetBlueprintDocumentation and deprecated (and will be removed in v2.10).

v2.7.1

Documentation Changes:

  • The modelwordcloud package is now available on CRAN, so the documentation has been updated to reflect CRAN installation instructions.

v2.7.0

New features:

  • Word cloud data for text processing models can be retrieved using the GetWordCloud function.
  • The scoring code JAR file can be downloaded for models supporting code generation using the DownloadScoringCode function.
  • Lift chart data can be retrieved using the GetLiftCharts and GetAllLiftCharts functions.
  • ROC curve data for binary classification projects can be retrieved using GetRocCurve and GetAllRocCurves.
  • Status and information about individual jobs can be retrieved using the GetPredictJob, GetModelJob, and GetJob functions. Any job can be retrieved via GetJob, which is less specific. Only prediction jobs can be retrieved with GetPredictJob, and only modeling jobs can be retrieved with GetModelJob.

Enhancements:

  • GetModelParameters now includes an additional key showing the coefficients for individual stages of multistage models (e.g. Frequency-Severity models).
  • When training a DatetimeModel on a window of data, a timeWindowSamplePct can be specified to take a uniform random sample of the training data instead of using all data within the window.

Bugfixes:

  • Fixed a bug where depending on what version of the R curl library was installed, the client could hang after requesting certain DataRobot jobs.
  • DownloadTransferrableModel now correctly handles HTTP errors.

Dependency Changes:

  • To support new features, jsonlite at version 1.0 or higher and curl at version 1.1 or higher are now required.

Deprecated and Defunct:

  • Semi-automatic autopilot mode is removed. Quick or manual mode can be used instead to get a sparser autopilot.

v2.6.0

New features:

  • The function CreateDerivedFeatureIntAsCategorical has been added. It creates a new categorical feature from a parent numerical feature, truncating numerical values to integers. (All of the data in the column should be considered categorical in its string form when cast to an int by truncation. For example, the value 3 will be cast as the string 3 and the value 3.14 will also be cast as the string 3. Further, the value -3.6 will become the string -3. Missing values will still be recognized as missing.)
  • Reason Codes, a new feature in DataRobot, is fully supported in the package through several new functions.
  • Functions for accessing blueprint charts and documentation have been added.
  • Model parameters can now be retrieved using the GetModelParameters function.
  • A new partitioning method (datetime partitioning) has been added. The recommended workflow is to preview the partitioning by creating a DatetimePartitioningSpecification using the CreateDatetimePartition and CreateBacktestSpecification functions and passing it into GenerateDatetimePartition; inspect the results and, as needed for the specific project dataset, adjust the DatetimePartitioningSpecification and regenerate; then set the target by passing the final DatetimePartitioningSpecification object to the partitioning_method parameter of SetTarget.
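
The truncation rule described above for CreateDerivedFeatureIntAsCategorical can be illustrated in base R. This is only a local imitation of the described semantics (truncate toward zero, then cast to string); it does not call the DataRobot API, and the helper name TruncateToCategorical is hypothetical.

```r
# Base-R illustration only: truncate toward zero, then convert to a string.
# Missing values stay missing.
TruncateToCategorical <- function(x) {
  as.character(as.integer(trunc(x)))
}

TruncateToCategorical(c(3, 3.14, -3.6, NA))
# "3" "3" "-3" NA
```

Note that truncation differs from rounding and from floor: -3.6 becomes -3, not -4.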

Enhancements:

  • The default value of the maxWait parameter used to control how long asynchronous routes are polled has been changed from 1 minute to 10 minutes.

API Changes:

  • projectId has been added to the Feature schema.
  • The UnpauseQueue function will no longer set the autopilot mode of a project to full autopilot. This means that projects using the (deprecated) SemiAuto autopilot mode will require the autopilot to be advanced via the web app.

v2.5.0

New features:

  • Functions RequestFrozenModel, GetFrozenModel, GetFrozenModelFromJobId have been added. They allow users to create a model with the same tuning parameters as its parent model but a different data sample size, and to get information about frozen models in a project.
  • Functions RequestBlender, GetBlenderModelFromJobId, GetBlenderModel have been added. They allow users to create blender models and get information about blender models in a project.
  • Projects created via the API can now use smart downsampling when setting the target by passing smartDownsampled and majorityDownsamplingRate into the SetTarget function.

Enhancements:

  • Meaningful error messages have been added when the DataRobot endpoint is incorrectly specified in a way that causes redirects (e.g. specifying http for an https endpoint).
  • Previously it was not possible to use user partition columns with cross-validation without specifying a holdout level using the API. This can now be done by either omitting the cvHoldoutLevel parameter or providing it as NA.

Bugfixes:

API Changes:

Deprecated and Defunct:

  • Support for recommender models has been removed from the DataRobot API. The package has been updated to remove functionality that formerly used this feature.

Documentation Changes:

v2.4.0

New features:

  • The premium feature DataRobot Prime has been added. You can now approximate a model on the leaderboard and download executable code for it. Talk to your account representative if the feature is not available on your account. The new related functions are GetPrimeEligibility, RequestApproximation, ListPrimeModels, GetPrimeModel, GetRulesets, RequestPrimeModel, GetPrimeModelFromJobId, CreatePrimeCode, GetPrimeFileFromJobid, ListPrimeFiles, GetPrimeFile, DownloadPrimeCode
  • A utility function, WaitForJobToComplete, has been added. It will block until the specified job finishes, or raise an error if it does not finish within a specified timeout.
  • Functions SetupProjectFromMySQL, SetupProjectFromOracle, SetupProjectFromPostgreSQL and SetupProjectFromHDFS have been added. They allow users to create DataRobot projects from MySQL, Oracle, PostgreSQL and HDFS data sources.
  • Functions RequestTransferrableModel, DownloadTransferrableModel, UploadTransferrableModel, GetTransferrableModel, ListTransferrableModels, UpdateTransferrableModel, DeleteTransferrableModel have been added. They allow users to download models from the modeling server and transfer them to a dedicated prediction server. (These functions are only useful to users with an on-premise environment.)
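
The blocking behavior of WaitForJobToComplete described in the list above could be used roughly as follows. This is a sketch under assumptions: the wrapper name WaitOrReport is hypothetical, and the argument names jobId and maxWait (in seconds) are assumed rather than taken from these notes.

```r
# Hypothetical helper: returns TRUE if the wait completes, FALSE if the call
# raises an error (for example, a timeout). Not executed until called.
WaitOrReport <- function(project, jobId, maxWait = 600) {
  tryCatch({
    # Assumed signature: blocks until the job finishes or maxWait elapses.
    datarobot::WaitForJobToComplete(project, jobId, maxWait = maxWait)
    TRUE
  }, error = function(e) {
    message("Waiting for job ", jobId, " failed: ", conditionMessage(e))
    FALSE
  })
}
```

Catching the error lets a script decide whether to retry or give up instead of stopping at the first timeout.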

Enhancements:

  • An optional maxWait parameter has been added to GetModelFromJobId and GetFeatureImpactForJobId, to allow users to specify an amount of time to wait for the job to complete other than the default 60 seconds.
  • Projects can now be run in quickrun mode (which skips some autopilot stages and longer-running models) by passing "quick" as the mode parameter, in the same way "auto" and "manual" modes can be specified.
  • The client will now check the API version offered by the server specified in configuration, and issue a warning if the client version is newer than the server version. The DataRobot server is always backwards compatible with old clients, but new clients may have functionality that is not implemented on older server versions. This issue mainly affects users with on-premise deployments of DataRobot.
  • SetupProject and UploadPredictionDataset now accept a URL as the dataSource parameter.
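
As an illustration of the quick mode mentioned in the list above, the mode can be passed when setting the target. This is a sketch assuming the datarobot package is installed and connected; the wrapper name StartQuickAutopilot is hypothetical.

```r
# Hypothetical wrapper: `project` and `target` are supplied by the caller.
# Passing "quick" as the mode skips some autopilot stages and longer-running
# models, in the same way "auto" and "manual" select the other modes.
StartQuickAutopilot <- function(project, target) {
  datarobot::SetTarget(project = project, target = target, mode = "quick")
}
```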

Bugfixes:

  • If a model job errors, GetModelFromJobId will now immediately raise an exception, rather than waiting for the timeout.
  • The maxWait parameter on UploadPredictionDataset will now be correctly applied.

API Changes:

Deprecated and Defunct:

  • The quickrun parameter on SetTarget is deprecated (and will be removed in 3.0). Pass "quick" as the mode parameter instead.

Documentation Changes:

v2.3.0

Enhancements:

  • When project creation using SetupProject times out, the error message now includes a URL to use with the new ProjectFromAsyncUrl function to resume waiting for the project creation.
  • GetFeatureInfo now supports retrieving features by feature name. (For backwards compatibility, feature IDs are still supported until 3.0.)
  • The package no longer requires a particular version of the methods package. (This dependency was too strict and required some users to unnecessarily upgrade R.)
  • The projectName argument of SetupProject no longer defaults to the string 'None'. (The new default is not to send a name, which results in the name 'Untitled Project'.)
  • The maxWait argument for SetupProject now controls the timeout for the initial POST request and has a larger default value. The reason for this is that for large project creation file uploads, the server may take a longer-than-normal amount of time to respond, and waiting longer than the default timeout may be necessary.

Deprecated and Defunct:

  • The ability to use GetFeatureInfo with feature IDs is deprecated (and will be removed in 3.0). Use feature names instead.
  • GetRecommendedBlueprints is replaced by ListBlueprints and deprecated (and will be removed in 3.0).
  • RequestPredictions is deprecated and replaced by RequestPredictionsForDataset. RequestPredictionsForDataset will be renamed to RequestPredictions in 3.0.
  • DeletePendingJobs is removed; use DeleteModelJob instead
  • GetFeatures is removed; use ListModelFeatures instead
  • GetPendingJobs is removed; use GetModelJobs instead
  • StartAutopilot is removed; use SetTarget instead
  • parameter url is removed from ConnectToDataRobot
  • parameter jobStatus is removed from GetModelJobs
  • parameters saveFile and csvExtension are removed from RequestPredictions
  • parameters saveFile and csvExtension are removed from SetupProject
  • "semi" mode option (functions SetTarget, StartNewAutoPilot) is deprecated (and will be removed in 3.0).

New features:

  • The API now supports the new Feature Impact feature. Use RequestFeatureImpact to start a job to compute FeatureImpact, and GetFeatureImpactForModel or GetFeatureImpactForJobId to retrieve the completed Feature Impact results.
  • The new functions CreateDerivedFeatureAsCategorical, CreateDerivedFeatureAsText, CreateDerivedFeatureAsNumeric can be used to create derived features as type transforms of existing features.
  • The API now supports uploading (UploadPredictionDataset), listing (ListPredictionDatasets), and deleting (DeletePredictionDataset) datasets for prediction as well as requesting predictions (RequestPredictionsForDataset) against such datasets.

Bugfixes

  • as.data.frame fixed for empty listOfBlueprints, listOfFeaturelists, listOfModels
  • The documentation for SetTarget incorrectly referred to the 'semiauto' (rather than 'semi') autopilot setting. This is fixed.
  • GetPredictions previously used a maxWait of 60, regardless of what maxWait the user specified. This is fixed.

v2.2.33

Bugfixes

  • GetModelJobFromId was broken by v2.2.32 and is now fixed.
  • CreateFeaturelist was broken by v2.2.32 and is now fixed.

v2.2.32

API Changes

  • Package renamed to datarobot.

New features:

  • ListJobs and DeleteJob functions added. ListJobs lists the jobs in the project queue (of any type). DeleteJob can be used to cancel one of these jobs.
  • ListFeatureInfo (for all features) and GetFeatureInfo (for one feature) have been added for retrieving feature details.

Enhancements:

  • In line with new functionality in version 2.2 of the DataRobot API, CreateUserPartition now allows holdoutLevel to be NULL (which results in not sending the holdout level, in line with backend API changes to allow user partitions to be created without a holdout level).
  • Slices using [ from objects of type listOfBlueprints, listOfFeaturelists, and listOfModels will now retain the appropriate type.
  • Several functions (e.g. ConnectToDataRobot, DeleteModel, PauseQueue, etc.) used to return TRUE as their only possible return value. Now they return nothing instead.
  • GetValidMetrics no longer has special-casing for the situation when the project is not yet ready to give you the valid metrics for a potential target. In this case, an error will now be returned from the server.
  • Error messages from the server now include additional detail.
  • To improve error messages, in several places error messages no longer reference the top-level function the user called.
  • The SetTarget function will now properly block execution until the server indicates the project has finished initializing and is ready to build models.

Deprecated and Defunct:

  • GetFeatures has been deprecated and renamed to ListModelFeatures (for more clarity/consistency in naming and to avoid confusion with the new GetFeatureInfo and ListFeatureInfo).
  • Support for authenticating via username/password has been removed. Use an API Token instead.
  • Removed broken UpdateDefaultPartition. To use one of the default partition methods with updated settings, please use CreateRandomPartition or CreateStratifiedPartition.

v2.1.31

Enhancements

  • Use of the WaitForAutopilot function will no longer trigger deprecation warnings

v2.1.30

Bugfixes

  • Due to a dependency on the methods package (which is loaded by default interactively but not when running Rscript), RequestPredictions did not work when invoked with Rscript. This is fixed. The methods package is now in 'depends' instead of 'imports' to prevent this problem from ever occurring again.

v2.1.29

Deprecated & Defunct

  • Removed broken UpdateDefaultPartition. Please use the other partition-creating functions.

v2.1.28

Bugfixes

  • Due to a dependency on the methods package (which is loaded by default interactively but not when running Rscript), some functions did not work when invoked with Rscript. This is fixed.
  • SetupProject and GetPredictions now check for and display errors in project creation. (Previously they would keep waiting and time out if there were errors.)
  • Previously errors would sometimes appear missing a space between two words. This is fixed.

v2.1.27

Bugfixes

  • Fixed a problem that caused an error when getting predictions if the installed version of the httr package was 1.0 or older.

v2.1.26

Enhancements:

  • HTTP requests now include User-Agent headers for logging purposes, e.g. "DataRobotRClient/2.0.25 (Darwin 14.5.0 x86_64)".
  • We now provide a more informative error message after receiving HTML from the server when we expected JSON.
  • We avoid httr encoding warning messages by specifying UTF-8.
  • It is now possible to not specify the desired jobStatus in GetPendingJobs (by passing NULL for the jobStatus argument, which is now the default).
  • GetPredictions now checks whether a prediction job has errored or been canceled and will error right away in that case (instead of waiting until the timeout)
  • When specifying the data source as a dataframe (in RequestPredictions or SetupProject), the class may now be a subclass of dataframe (it need not be equal to dataframe).
  • Previously GetModelJobs returned a dataframe when there are jobs but an empty list when there are none. Now it consistently returns a dataframe (with zero rows if there are no jobs) either way.

New features:

  • ConnectToDataRobot can now read from a YAML config file.
  • On package startup, we look for a config file in the default location, so the user does not need to call ConnectToDataRobot explicitly.
  • WaitForAutopilot function added. This function periodically checks whether Autopilot is finished and returns only after it is.
  • SetupProject and RequestPredictions now default to using a tempfile instead of placing the file to be uploaded into the current working directory.
  • New function StartNewAutopilot can be used to restart autopilot on a specific featurelist if it was previously running on a different one.
  • New function SetTarget provides the functionality that StartAutopilot used to be responsible for. StartAutopilot is now deprecated, and SetTarget should be used instead. This function can now take a featurelistId argument, specifying which featurelist to use.

Bugfixes:

  • GetPendingJobs (now deprecated in favor of GetModelJobs) was broken and is now fixed.
  • GetValidMetrics was broken and is now fixed.
  • GetProjectList no longer errors when there are no projects. It now returns an object whose structure matches the returned object when there are projects.

Deprecated and Defunct:

  • The arguments controlling where the tempfile goes (in SetupProject and RequestPredictions) are now deprecated
  • DeletePendingJob is deprecated (use DeleteModelJob instead)
  • GetPendingJob is deprecated (use GetModelJob instead)
  • jobStatus argument to GetModelJob/GetPendingJob is deprecated (use status instead)
  • StartAutopilot is deprecated (use SetTarget instead).

API Changes Summary:

  • Support for the experimental date partitioning has been removed in DataRobot API, so it is being removed from the client immediately - the CreateDatePartition function has been removed.

v2.0.25

Enhancements:

  • Codebase cleaned of many lint violations.

New Features:

  • DeletePredictJob, GetPredictJobs, GetPredictions, RequestPredictions all added to control the prediction functionality created in v2.0 featureset of the API.
  • "quickrun" parameter added to StartAutopilot. This boolean enables use of the quickrun autopilot feature of DataRobot.

Bugfixes: None

Deprecated and Defunct: None

API Changes: None

v0.2.24

  • fixes the maxWait parameter that was unsuccessfully introduced in 0.2.23

v0.2.23

  • maxWait parameter added to SetupProject to allow for datasets that take very long to initialize on the DataRobot server

v0.2.22

  • Documentation structure changed to use Roxygen2

install.packages("datarobot")

2.8.0 by Peter Hurford, 4 months ago


Browse source code at https://github.com/cran/datarobot


Authors: Ron Pearson [aut], Zachary Deane-Mayer [aut], David Chudzicki [aut], Dallin Akagi [aut], Sergey Yurgenson [aut], Thakur Raj Anand [aut], Peter Hurford [aut]


Task views: Web Technologies and Services


MIT + file LICENSE license


Imports httr, jsonlite, yaml, curl

Depends on methods, stats

Suggests knitr, testthat, lintr, stubthat, data.table, car, MASS, mlbench, beanplot, doBy, insuranceData, rmarkdown, ggplot2, colormap, modelwordcloud

