Parse dates automatically, without specifying a format. The package currently includes git's date parser, and it can also recognize and parse all ISO 8601 formats.
This R package has three functions for dealing with dates.
parse_iso_8601 recognizes and parses all valid ISO 8601 date and time formats. It can also be used as an ISO 8601 validator.
parse_date can parse a date when you don't know which format it is in. First it tries all ISO 8601 formats. Then it tries git's versatile date parser. Lastly, it tries as.POSIXct.
format_iso_8601 formats a date (and time) in a specific ISO 8601 format.
The git parser does not work for dates before 1970 or after 2100. For these dates the current year is used instead:
library(parsedate)
parse_date("april 15 1971")
parse_date("april 15 1969")
parse_date("april 15 2110")
parse_iso_8601 recognizes all valid ISO 8601 formats, and gives an NA for invalid dates. Here are some examples:
library(parsedate)
parse_iso_8601("2013-02-08 09")
## [1] "2013-02-08 09:00:00 UTC"
parse_iso_8601("2013-02-08 09:30")
## [1] "2013-02-08 09:30:00 UTC"
parse_iso_8601("2013-02-08T09")
## [1] "2013-02-08 09:00:00 UTC"
parse_iso_8601("2013-02-08T09:30")
## [1] "2013-02-08 09:30:00 UTC"
parse_iso_8601("2013-02-08T09:30:26")
## [1] "2013-02-08 09:30:26 UTC"
parse_iso_8601("2013-02-08T09:30:26.123")
## [1] "2013-02-08 09:30:26 UTC"
parse_iso_8601("2013-02-08T09:30.5")
## [1] "2013-02-08 09:30:30 UTC"
parse_iso_8601("2013-02-08T09,25")
## [1] "2013-02-08 09:15:00 UTC"
parse_iso_8601("2013-02-08T09:30:26Z")
## [1] "2013-02-08 09:30:26 UTC"
parse_iso_8601("2013-W06-5")
## [1] "2013-02-08 UTC"
parse_iso_8601("2013-W01-1")
## [1] "2012-12-31 UTC"
parse_iso_8601("2009-W01-1")
## [1] "2008-12-29 UTC"
parse_iso_8601("2009-W53-7")
## [1] "2010-01-03 UTC"
parse_iso_8601("2013-039")
## [1] "2013-02-08 UTC"
parse_iso_8601("2013-039 09:30:26Z")
## [1] "2013-02-08 09:30:26 UTC"
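Because parse_iso_8601 gives NA for anything that is not valid ISO 8601, a validity check can be built on top of it. The following is a minimal sketch that assumes the parsedate package is installed; is_iso_8601 is a hypothetical helper name, not a function of the package.

```r
library(parsedate)

# Hypothetical helper (not part of parsedate): TRUE where the input
# parses as ISO 8601, FALSE where parse_iso_8601() gives NA.
is_iso_8601 <- function(x) {
  !is.na(parse_iso_8601(x))
}

candidates <- c("2013-02-08T09:30:26Z", "2013-W06-5", "not a date")
valid <- is_iso_8601(candidates)
print(valid)
```

The helper is vectorized because parse_iso_8601 is, so it can be used directly to filter a character vector.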
Sometimes one has to work with a large number of dates in arbitrary formats. It is of course impossible to reliably guess the format of some dates, because of ambiguity. But it is often not critical to get the date exactly right in the ambiguous cases, and this is when the parse_date function is useful. It tries a large number of formats. Here is the algorithm it uses:
1. Try parsing the dates as ISO 8601, using parse_iso_8601.
2. If this fails, try git's date parser.
3. If this also fails, try as.POSIXct. (It is unlikely that this step will parse any dates that the first two steps couldn't, but it is still a logical fallback, to make sure that we can parse at least as many dates as as.POSIXct.)
Here are some examples. The first ones are easy.
parse_date("2014-12-12")
## [1] "2014-12-12 UTC"
parse_date("04/15/99")
## [1] "1999-04-15 UTC"
parse_date("15/04/99")
## [1] "1999-04-15 UTC"
The following formats are ambiguous and are parsed as month/day/year.
parse_date("12/11/99")
## [1] "1999-12-11 UTC"
parse_date("11/12/99")
## [1] "1999-11-12 UTC"
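To see why such inputs are ambiguous, the same string can be read with two explicit formats in base R. This is only an illustration using as.Date; it says nothing about how parse_date resolves the ambiguity internally.

```r
x <- "12/11/99"

# Month first, which matches how parse_date reads it in the examples above.
month_first <- as.Date(x, format = "%m/%d/%y")
# Day first, the other plausible reading of the same string.
day_first <- as.Date(x, format = "%d/%m/%y")

print(month_first)
print(day_first)

# "15/04/99" is not ambiguous: there is no month 15, so the
# month-first reading fails and gives NA.
not_month_first <- as.Date("15/04/99", format = "%m/%d/%y")
print(not_month_first)
```

Both readings of "12/11/99" are valid dates a month apart, which is why a parser has to pick a convention.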
parse_date("03/20")
## [1] "2019-03-20 UTC"
parse_date("12")
## [1] "2019-05-12 UTC"
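In the two examples above, the missing fields are filled in from the current date. This does not happen for the following input, because a four-digit year on its own is valid ISO 8601: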
parse_date("2014")
## [1] "2014-01-01 UTC"
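The order in which parse_date tries its parsers (ISO 8601 first, then git's parser, then as.POSIXct) follows a general pattern: try each parser in turn and keep the first result that is not NA. The sketch below illustrates that pattern in base R only. It is not parsedate's implementation; parse_with_fallback, iso_datetime and us_date are hypothetical names, and the two strptime-based parsers merely stand in for the real ones.

```r
# Hypothetical helper: apply each parser in turn to the elements that are
# still unparsed, and keep the first non-NA result for each element.
parse_with_fallback <- function(x, parsers) {
  result <- as.POSIXct(rep(NA_real_, length(x)),
                       origin = "1970-01-01", tz = "UTC")
  for (parser in parsers) {
    todo <- is.na(result)
    if (!any(todo)) break
    result[todo] <- parser(x[todo])
  }
  result
}

# Stand-in parsers; each gives NA for input it does not understand.
iso_datetime <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
}
us_date <- function(x) {
  as.POSIXct(x, format = "%m/%d/%y", tz = "UTC")
}

parsed <- parse_with_fallback(
  c("2013-02-08T09:30:26", "04/15/99", "nonsense"),
  list(iso_datetime, us_date)
)
print(parsed)
```

Putting the strictest parser first means a permissive fallback never overrides a result that a stricter one already produced.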
The format_iso_8601 function formats a date (and time) in a fixed format that is valid ISO 8601 and can be used to compare dates as character strings. It converts the date(s) to UTC.
format_iso_8601(parse_iso_8601("2013-02-08"))
## [1] "2013-02-08T00:00:00+00:00"
format_iso_8601(parse_iso_8601("2013-02-08 09:34:00"))
## [1] "2013-02-08T09:34:00+00:00"
format_iso_8601(parse_iso_8601("2013-02-08 09:34:00+01:00"))
## [1] "2013-02-08T08:34:00+00:00"
format_iso_8601(parse_iso_8601("2013-W06-5"))
## [1] "2013-02-08T00:00:00+00:00"
format_iso_8601(parse_iso_8601("2013-039"))
## [1] "2013-02-08T00:00:00+00:00"
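Comparing dates as character strings works because every field has a fixed width, the fields run from most to least significant, and all values share the same UTC offset. The sketch below checks this in base R. As an assumption for illustration, it uses format() with a hand-written pattern as a stand-in for format_iso_8601, so that it runs without the package.

```r
# Three instants, deliberately out of chronological order.
times <- as.POSIXct(
  c("2013-02-08 09:34:00", "2012-12-31 23:59:59", "2013-02-08 08:34:00"),
  tz = "UTC"
)

# Stand-in for format_iso_8601(): same fixed layout, always in UTC.
iso <- format(times, "%Y-%m-%dT%H:%M:%S+00:00", tz = "UTC")

# Sorting the strings gives the same order as sorting the date-times.
same_order <- identical(order(iso), order(times))
print(same_order)
```

The same check would not hold for strings that carry different offsets, which is why the conversion to UTC matters.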
parse_date() and parse_iso_8601() now support a default time zone, which is used for dates that do not explicitly specify one.
Reimplement parse_iso_8601() with vectorized code, for speed (#9).
Fix parse_date() and parse_iso_8601() for zero-length input (#20).
parse_date() parses strings with + characters correctly now (#23).
Drop lubridate package dependency.
Fix parsing dates consisting of six or eight digits, e.g. 20140922 and 092214.
NA is returned by parse_date for nonsensical numerical dates, e.g. 000202.
Fix parse_date time zone that was wrong for some dates.
Fix parse_date for dates that are not in DST.
format_iso_8601, on platforms that have a buggy %z