'AWS S3' Client Package

A simple client package for the Amazon Web Services ('AWS') Simple Storage Service ('S3') 'REST' 'API' <https://aws.amazon.com/s3/>.


aws.s3 is a simple client package for the Amazon Web Services (AWS) Simple Storage Service (S3) REST API. While other packages currently connect R to S3, they do so incompletely (mapping only some of the API endpoints to R) and most implementations rely on the AWS command-line tools, which users may not have installed on their system.

To use the package, you will need an AWS account and to enter your credentials into R. Your keypair can be generated on the IAM Management Console under the heading Access Keys. Note that you only have access to your secret key once. After it is generated, you need to save it in a secure location. New keypairs can be generated at any time if yours has been lost, stolen, or forgotten. The aws.iam package provides tools for working with IAM, including creating roles, users, groups, and credentials programmatically; it is not needed in order to use IAM credentials.

A detailed description of how credentials can be specified is provided at: https://github.com/cloudyr/aws.signature/. The easiest way is to simply set environment variables on the command line prior to starting R or via an Renviron.site or .Renviron file, which are used to set environment variables in R during startup (see ?Startup). Or they can be set within R:

Sys.setenv("AWS_ACCESS_KEY_ID" = "mykey",
           "AWS_SECRET_ACCESS_KEY" = "mysecretkey",
           "AWS_DEFAULT_REGION" = "us-east-1",
           "AWS_SESSION_TOKEN" = "mytoken")

To use the package with S3-compatible storage provided by other cloud platforms, set the AWS_S3_ENDPOINT environment variable to the appropriate host name. By default, the package uses the AWS endpoint: s3.amazonaws.com
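For example, to point the package at a hypothetical S3-compatible service, set the variable before making any requests (the host name below is a placeholder):

# use an S3-compatible endpoint instead of the AWS default
Sys.setenv("AWS_S3_ENDPOINT" = "storage.example.com")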

Code Examples

The package can be used to examine publicly accessible S3 buckets and objects without registering an AWS account. If credentials have been generated in the AWS console and made available in R, you can find your available buckets using:

library("aws.s3")
bucketlist()

If your credentials are incorrect, this function will return an error. Otherwise, it will return a data frame of information about the buckets you have access to.

Buckets

To get a listing of all objects in a public bucket, simply call

get_bucket(bucket = '1000genomes')

Amazon maintains a listing of Public Data Sets on S3.
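The changelog below also mentions get_bucket_df(), which returns the same listing as a data frame; a minimal sketch, assuming the max argument behaves as it does for get_bucket():

# return the first 20 objects in the bucket as a data frame
get_bucket_df(bucket = '1000genomes', max = 20)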

To get a listing for all objects in a private bucket, pass your AWS key and secret in as parameters. (As described above, all functions in aws.s3 will look for your keys as environment variables by default, greatly simplifying the process of making an S3 request.)

# specify keys in-line
get_bucket(
  bucket = 'my_bucket',
  key = YOUR_AWS_ACCESS_KEY,
  secret = YOUR_AWS_SECRET_ACCESS_KEY
)
 
# specify keys as environment variables
Sys.setenv("AWS_ACCESS_KEY_ID" = "mykey",
           "AWS_SECRET_ACCESS_KEY" = "mysecretkey")
get_bucket("my_bucket")

S3 can be a bit picky about region specifications. bucketlist() will return buckets from all regions, but all other functions require specifying a region. A default of "us-east-1" is relied upon if none is specified explicitly and the correct region can't be detected automatically. (Note: using an incorrect region is one of the most common - and hardest to figure out - errors when working with S3.)
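For example, the default region can be set once via the environment variable shown earlier so that subsequent requests target the correct region:

# set the default region for subsequent requests
Sys.setenv("AWS_DEFAULT_REGION" = "eu-west-1")
get_bucket("my_bucket")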

Objects

There are eight main functions that will be useful for working with objects in S3:

  1. s3read_using() provides a generic interface for reading from S3 objects using a user-defined function (an example appears after the code block below).
  2. s3write_using() provides a generic interface for writing to S3 objects using a user-defined function.
  3. get_object() returns a raw vector representation of an S3 object. This might then be parsed in a number of ways, such as rawToChar(), xml2::read_xml(), jsonlite::fromJSON(), and so forth, depending on the file format of the object.
  4. save_object() saves an S3 object to a specified local file.
  5. put_object() stores a local file into an S3 bucket.
  6. s3save() saves one or more in-memory R objects to an .Rdata file in S3 (analogously to save()). s3saveRDS() is an analogue for saveRDS().
  7. s3load() loads one or more objects into memory from an .Rdata file stored in S3 (analogously to load()). s3readRDS() is an analogue for readRDS().
  8. s3source() sources an R script directly from S3.

They behave as you would probably expect:

# save an in-memory R object into S3
s3save(mtcars, bucket = "my_bucket", object = "mtcars.Rdata")
 
# `load()` R objects from the file
s3load("mtcars.Rdata", bucket = "my_bucket")
 
# get file as raw vector
get_object("mtcars.Rdata", bucket = "my_bucket")
# alternative 'S3 URI' syntax:
get_object("s3://my_bucket/mtcars.Rdata")
 
# save file locally
save_object("mtcars.Rdata", file = "mtcars.Rdata", bucket = "my_bucket")
 
# put local file into S3
put_object(file = "mtcars.Rdata", object = "mtcars2.Rdata", bucket = "my_bucket")
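
The generic helpers s3read_using() and s3write_using() from the list above wrap a user-supplied read or write function, and the RDS helpers work like saveRDS() and readRDS(). A minimal sketch using base R's write.csv() and read.csv() (the bucket name is a placeholder):

# write a data frame to S3 as CSV via a user-supplied writer function
s3write_using(mtcars, FUN = write.csv, object = "mtcars.csv", bucket = "my_bucket")

# read it back with the matching reader function
mtcars_csv <- s3read_using(FUN = read.csv, object = "mtcars.csv", bucket = "my_bucket")

# single-object analogues of saveRDS()/readRDS()
s3saveRDS(mtcars, object = "mtcars.rds", bucket = "my_bucket")
mtcars_rds <- s3readRDS(object = "mtcars.rds", bucket = "my_bucket")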

Installation


This package is on CRAN; the latest stable version can also be installed from the cloudyr drat repository:

# latest stable version
install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"))
 
# on Windows you may need:
install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"), INSTALL_opts = "--no-multiarch")

Or, to pull a potentially unstable version directly from GitHub:

if (!require("remotes")) {
    install.packages("remotes")
}
remotes::install_github("cloudyr/aws.s3")


News

aws.s3 0.3.12

  • s3write_using() now attaches the correct file extension to the temporary file being written to (just as s3read_using() already did). (#226, h/t @jon-mago)

aws.s3 0.3.11

  • s3sync() gains a direction argument allowing for unidirectional (upload-only or download-only) synchronization. The default remains bi-directional.
  • New functions put_encryption(), get_encryption(), and delete_encryption() implement bucket-level encryption so that encryption does not need to be specified for each put_object() call. (#183, h/t Dan Tenenbaum)
  • Fixed typos in s3sync(). (#211, h/t Nirmal Patel)
  • put_bucket() only includes a LocationConstraint body when the region != "us-east-1". (#171, h/t David Griswold)

aws.s3 0.3.10

  • Fixed a typo in setup_s3_url(). (#223, h/t Peter Foley)
  • Signatures are now calculated correctly when a port is specified. (#221, h/t @rvolykh)

aws.s3 0.3.9

  • Fixed a bug in s3write_using(). (#205, h/t Patrick Miller)
  • Bumped aws.signature dependency to v0.3.7 to take advantage of automatic credential loading. (#184, h/t Dan Tenenbaum)
  • acl argument was ignored by put_bucket(). This is now fixed. (#172)
  • The base_url argument in s3HTTP() now defaults to an environment variable - AWS_S3_ENDPOINT - or the AWS S3 default in order to facilitate using the package with S3-compatible storage. (#189, #191, #194)

aws.s3 0.3.8

  • save_object() now uses httr::write_disk() to avoid having to load a file into memory. (#158, h/t Arturo Saco)

aws.s3 0.3.7

  • Remove usage of endsWith() in two places to reduce (implicit) base R dependency. (#147, h/t Huang Pan)

aws.s3 0.3.6

  • Bump aws.signature dependency to 0.3.4. (#142, #143, #144)

aws.s3 0.3.5

  • Attempt to fix bug introduced in 0.3.4. (#142)

aws.s3 0.3.4

  • Update code and documentation to use aws.signature (>=0.3.2) credentials handling.

aws.s3 0.3.3

  • put_object() and put_bucket() now expose explicit acl arguments. (#137)
  • get_acl() and put_acl() are now exported. (#137)
  • Added a high-level put_folder() convenience function for creating an empty pseudo-folder.

aws.s3 0.3.2

  • put_bucket() now errors if the request is unsuccessful. (#132, h/t Sean Kross)
  • Fixed a bug in the internal function setup_s3_url() when region = "".

aws.s3 0.3.1

  • DESCRIPTION file fix for CRAN.

aws.s3 0.3.0

  • CRAN (beta) release. (#126)
  • bucketlist() gains both an alias, bucket_list_df(), and an argument add_region to add a region column to the output data frame.

aws.s3 0.2.8

  • Exported the s3sync() function. (#20)
  • save_object() now creates a local directory if needed before trying to save. This is useful for object keys that contain /.

aws.s3 0.2.7

  • Some small bug fixes.
  • Updated examples and links to API documentation.

aws.s3 0.2.6

  • Tweak region checking in s3HTTP().

aws.s3 0.2.5

  • Fix reversed argument order in s3readRDS() and s3saveRDS().
  • Fixed the persistent bug related to s3readRDS(). (#59)
  • Updated some documentation.

aws.s3 0.2.4

  • Mocked up multipart upload functionality within put_object(). (#80)
  • Use tempfile() instead of rawConnection() for high-level read/write functions. (#128)
  • Allow multiple CommonPrefix values in get_bucket(). (#88)
  • get_object() now returns a pure raw vector (without attributes). (#94)
  • s3sync() relies on get_bucket(max = Inf). (#20)
  • s3HTTP() gains a base_url argument to (potentially) support S3-compatible storage on non-AWS servers. (#109)
  • s3HTTP() gains a dualstack argument to provide "dual stack" (IPv4 and IPv6) support. (#62)

aws.s3 0.2.3

  • Fixed a bug in get_bucket() when max = Inf. (#127, h/t Liz Macfie)

aws.s3 0.2.2

  • Two new functions - s3read_using() and s3write_using() provide a generic interface to reading and writing objects from S3 using a specified function. This provides a simple and extensible interface for the import and export of objects (such as data frames) in formats other than those provided by base R. (#125, #99)

aws.s3 0.2.1

  • s3HTTP() gains a url_style argument to control use of "path"-style (new default) versus "virtual"-style URL paths. (#23, #118)

aws.s3 0.2.0

  • All functions now produce errors when requests fail rather than returning an object of class "aws_error". (#86)

aws.s3 0.1.39

  • s3save() gains an envir argument. (#115)

aws.s3 0.1.38

  • get_bucket() now automatically handles pagination based upon the specified number of objects to return. (PR #104, h/t Thierry Onkelinx)
  • get_bucket_df() now uses an available (but unexported) as.data.frame.s3_bucket() method. The resulting data frame always returns character rather than factor columns.

aws.s3 0.1.37

  • Further changes to region verification in s3HTTP(). (#46, #106; h/t John Ramey)

aws.s3 0.1.36

  • bucketlist() now returns (in addition to past behavior of printing) a data frame of buckets.
  • New function get_bucket_df() returns a data frame of bucket contents. get_bucket() continues to return a list. (#102, h/t Dean Attali)

aws.s3 0.1.35

  • s3HTTP() gains a check_region argument (default is TRUE). If TRUE, attempts are made to verify the bucket's region before performing the operation in order to avoid confusing out-of-region errors. (#46)
  • Object keys can now be expressed using "S3URI" syntax, e.g., object = "s3://bucket_name/object_key". In all cases, the bucket name and object key will be extracted from this string (meaning that a bucket does not need to be explicitly specified). (#100; h/t John Ramey)
  • Fixed several places where query arguments were incorrectly being passed to the API as object key names, producing errors.

aws.s3 0.1.34

  • Update and rename policy-related functions.

aws.s3 0.1.33

  • Exported the get_bucket() S3 generic and methods.

aws.s3 0.1.32

  • Fixed a bug related to the handling of object keys that contained spaces. (#84, #85; h/t Bao Nguyen)

aws.s3 0.1.29

  • Fixed a bug related to the handling of object keys that contained atypical characters (e.g., =). (#64)
  • Added a new function s3save_image() to save an entire workspace.
  • Added a temporary fix for GitHub installation using the DESCRIPTION Remotes field.

aws.s3 0.1.25

  • Added function s3source() as a convenience function to source an R script directly from S3. (#54)

aws.s3 0.1.23

  • Added support for S3 "Acceleration" endpoints, enabling faster cross-region file transfers. (#52)
  • s3save(), s3load(), s3saveRDS(), and s3readRDS() no longer write to disk, improving performance. (#51)

aws.s3 0.1.22

  • Added new functions s3saveRDS() and s3readRDS(). (h/t Steven Akins, #50)

aws.s3 0.1.21

  • Operations on non-default buckets (outside "us-east-1") now infer bucket region from bucket object. Some internals were simplified to better handle this. (h/t Tyler Hunt, #47)

aws.s3 0.1.18

  • All functions now use snake case (e.g., get_object()). Previously available functions that did not conform to this format have been deprecated. They continue to work, but issue a warning. (#28)
  • Separated authenticated and unauthenticated testthat tests, conditional on presence of AWS keys.
  • Numerous documentation fixes and consolidations.
  • Dropped XML dependency in favor of xml2. (#40)

aws.s3 0.1.17

  • The structure of an object of class "s3_bucket" has changed. It now is simply a list of objects of class "s3_object" and bucket attributes are stored as attributes to the list.
  • The order of bucket and object names was swapped in most object-related functions and the Bucket name has been added to the object lists returned by getbucket(). This means that bucket can be omitted when object is an object of class "s3_object".

aws.s3 0.1.1

  • Initial release.


install.packages("aws.s3")

0.3.12 by Thomas J. Leeper, 10 months ago


https://github.com/cloudyr/aws.s3


Report a bug at https://github.com/cloudyr/aws.s3/issues


Browse source code at https://github.com/cran/aws.s3


Authors: Thomas J. Leeper [aut, cre], Carl Boettiger [ctb], Andrew Martin [ctb], Mark Thompson [ctb], Tyler Hunt [ctb], Steven Akins [ctb], Bao Nguyen [ctb], Thierry Onkelinx [ctb]


Documentation:   PDF Manual  


Task views: Web Technologies and Services


GPL (>= 2) license


Imports utils, tools, httr, xml2, base64enc, digest, aws.signature

Suggests testthat, datasets


Imported by analogsea, aws.cloudtrail, awspack, cloudSimplifieR, mlflow.

Suggested by DatabaseConnector, aws.lambda, aws.transcribe, memoise.


See at CRAN