Amend, Augment and Aid Analysis of John Snow's Cholera Map

Amends errors, augments data and aids analysis of John Snow's map of the 1854 London cholera outbreak.


package features

  • Fixes three apparent coding errors in Dodson and Tobler’s 1992 digitization of Snow’s map.
  • “Unstacks” the data in two ways to make analysis and visualization easier and more meaningful.
  • Computes and visualizes “pump neighborhoods” based on Voronoi tessellation, Euclidean distance, and walking distance.
  • Overlays graphical elements and features like kernel density, Voronoi diagrams, Snow’s Broad Street neighborhood, and notable landmarks (John Snow’s residence, the Lion Brewery, etc.) via add*() functions.
  • Includes a variety of functions to find and highlight specific cases, roads, pumps and paths.
  • Appends street names to the roads data set.
  • Includes the revised pump data used in the second version of Snow’s map from the Vestry report, which also includes the “correct” location of the Broad Street pump.
  • Adds two different aggregate time series fatalities data sets, taken from the Vestry report.

background

John Snow’s map of the 1854 cholera outbreak in London is one of the best known examples of data visualization and information design.

By plotting the number and location of fatalities on a map, Snow was able to do something that is easily taken for granted today: create and disseminate a visualization of a spatial distribution. To our modern eye, the pattern is unmistakable. It seems self-evident that the map elegantly supports Snow’s claims that cholera is a waterborne disease and that the pump on Broad Street is the source of the outbreak. And yet, despite its virtues, the map failed to convince both the authorities and Snow’s colleagues in the medical and scientific communities.

Beyond considerations of time and place, there are “scientific” reasons for this failure. The map shows a concentration of cases around the Broad Street pump, but that alone should not convince us that Snow is right. The map doesn’t refute the primary rival explanation, miasma theory: the pattern we see is not unlike what airborne transmission might look like. And while the presence of a pump near or at the epicenter of the distribution of fatalities is strong circumstantial evidence, it is still circumstantial. There are a host of rival explanations that the map doesn’t consider and cannot rule out: location of sewer grates, elevation, weather patterns, etc.

Arguably, this may be one reason why Snow added a graphical annotation to the second, lesser-known version of the map that was published in the official report on the outbreak (Report On The Cholera Outbreak In The Parish Of St. James, Westminster, During The Autumn Of 1854).

pump neighborhoods

The annotation outlines what we might call the Broad Street pump neighborhood: the set of addresses that are, according to Snow, within “close” walking distance to the pump. The notion of a pump neighborhood is important because it provides a prediction about where we should and should not expect to find cases. If water is cholera’s mode of transmission and if water pumps are the primary source of drinking water, then most, if not all, fatalities should be found within the pump neighborhood. The disease should stop at the neighborhood’s borders.

Creating this annotation is not a trivial matter. To identify the neighborhood of the Broad Street pump, you need to identify the neighborhoods of surrounding pumps. Snow writes: “The inner dotted line on the map shews [sic] the various points which have been found by careful measurement to be at an equal distance by the nearest road from the pump in Broad Street and the surrounding pumps …” (Ibid., p. 109.).

I build on Snow’s efforts by writing functions that allow you to compute three flavors of pump neighborhoods. The first is based on Voronoi tessellation, which uses the Euclidean distances between pumps. It’s easy to compute and has been a popular choice for analysts of Snow’s map. However, it has two drawbacks: 1) roads and buildings play no role in determining neighborhoods (it assumes that people walk directly, “as the crow flies”, to their preferred pump); and 2) it’s not what Snow had in mind. For that, you’ll need the third type of neighborhood, which is based on walking distance.

plot(neighborhoodVoronoi())

The second flavor is based on the Euclidean distances between cases and pumps. It serves as a check on the Voronoi-based method, offers a more granular estimate of Euclidean-distance-based neighborhoods, and provides a more flexible way to visualize them.

plot(neighborhoodEuclidean())
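
As a rough illustration of the idea behind Euclidean neighborhoods (not the package's implementation), the base R sketch below assigns each case to the pump with the smallest straight-line distance. The coordinates and the helper toyNearestPump() are hypothetical and exist only for this example.

```r
# Hypothetical pump and case coordinates (illustrative only; not package data).
pumps <- data.frame(id = c("A", "B", "C"),
                    x = c(0, 10, 5),
                    y = c(0, 0, 8),
                    stringsAsFactors = FALSE)
cases <- data.frame(x = c(1, 9, 5, 4),
                    y = c(1, 2, 6, 1))

# Hypothetical helper: for each case, return the ID of the pump with the
# smallest straight-line (Euclidean) distance.
toyNearestPump <- function(cases, pumps) {
  vapply(seq_len(nrow(cases)), function(i) {
    d <- sqrt((pumps$x - cases$x[i])^2 + (pumps$y - cases$y[i])^2)
    pumps$id[which.min(d)]
  }, character(1L))
}

neighborhood <- toyNearestPump(cases, pumps)

# Number of cases that fall in each pump's neighborhood.
table(neighborhood)
```

Because a pump's Voronoi cell is the set of locations closer to that pump than to any other, classifying cases this way should agree with reading them off the Voronoi diagram, which is why it can serve as a check on that method.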

The third and final flavor is based on the walking distance along the roads on the map. While more accurate, it’s computationally more demanding. To compute these distances, I transform the roads on the map into a network graph and turn the computation of walking distance into a graph theory problem. For each case (observed or simulated), I compute the shortest path, weighted by the length of roads (edges), to the nearest pump. “Rinse and repeat” and the different pump neighborhoods emerge:

plot(neighborhoodWalking())
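
To make the graph formulation concrete, here is a minimal base R sketch of a shortest-path computation on a toy road network, using Dijkstra's algorithm with road length as the edge weight. The network, the node names and the helper toyShortestDistances() are assumptions made up for this example; the package itself imports ‘igraph’, and this is not its implementation.

```r
# Hypothetical road network: each row is a road segment (edge) joining two
# intersections (nodes), weighted by its length. Not package data.
edges <- data.frame(from = c("a", "a", "b", "c", "d"),
                    to = c("b", "c", "d", "d", "e"),
                    length = c(4, 2, 5, 1, 3),
                    stringsAsFactors = FALSE)

# Hypothetical helper: Dijkstra's algorithm on an undirected, weighted graph.
# Returns the shortest distance from 'source' to every node.
toyShortestDistances <- function(edges, source) {
  nodes <- unique(c(edges$from, edges$to))
  distance <- setNames(rep(Inf, length(nodes)), nodes)
  distance[[source]] <- 0
  unvisited <- nodes
  while (length(unvisited) > 0) {
    # Visit the closest node that has not been finalized yet.
    current <- unvisited[which.min(distance[unvisited])]
    unvisited <- setdiff(unvisited, current)
    # Relax every segment that touches the current node, in either direction.
    touching <- which(edges$from == current | edges$to == current)
    for (i in touching) {
      neighbor <- if (edges$from[i] == current) edges$to[i] else edges$from[i]
      candidate <- distance[[current]] + edges$length[i]
      if (candidate < distance[[neighbor]]) distance[[neighbor]] <- candidate
    }
  }
  distance
}

# Walking distance from a case at node "a" to pumps at nodes "b" and "e".
d <- toyShortestDistances(edges, "a")
pump.nodes <- c("b", "e")
nearest <- pump.nodes[which.min(d[pump.nodes])]
```

Repeating the last three lines for every case, and grouping cases by their nearest pump, gives the walking neighborhoods in this toy setting.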

To explore the data, you can consider a variety of scenarios by computing different sets of neighborhoods. Here’s the result excluding the Broad Street pump.

plot(neighborhoodWalking(-7))

“expected” pump neighborhoods

You can also explore “expected” neighborhoods. Currently, you can do so in three ways. The first colors roads.

plot(neighborhoodWalking(case.set = "expected"))

The second colors the expected area of neighborhoods using points().

plot(neighborhoodWalking(case.set = "expected"), type = "area.points")

The third colors the expected area of neighborhoods using polygon().

plot(neighborhoodWalking(case.set = "expected"), type = "area.polygons")

For exploration, the first two options are faster.

getting started

To install ‘cholera’ from CRAN:

install.packages("cholera")

To install the development version of ‘cholera’ from GitHub:

# Note that you may need to install the 'devtools' package:
# install.packages("devtools")
 
# For 'devtools' (< 2.0.0)
devtools::install_github("lindbrook/cholera", build_vignettes = TRUE)
 
# For 'devtools' (>= 2.0.0)
devtools::install_github("lindbrook/cholera", build_opts = c("--no-resave-data", "--no-manual"))

vignettes

The vignettes, which are available in the package as well as online at the links below, go into detail on a variety of topics.

Duplicate and Missing Cases describes the two coding errors and three misplaced cases I argue are present in Dodson and Tobler’s (1992) digitization of Snow’s map. Documentation and details about the fix are found online in “Note on Duplicate and Missing Cases”.

“Unstacking” Bars discusses the inferential and visual importance of “unstacking” the bars in Snow’s map and the two “unstacked” data sets, which use “fatalities” and “addresses” as the units of observation.

Pump Neighborhoods expands on the notion of a pump neighborhood and describes the three flavors of neighborhoods: two based on Euclidean distance (i.e., neighborhoodEuclidean() and neighborhoodVoronoi()) and one based on walking distance (i.e., neighborhoodWalking()).

Roads covers issues related to roads. This includes discussion of how and why I move pump #5 from Queen Street (I) to Marlborough Mews, the overall structure of the roads data set, “valid” road names, and my back of the envelope translation from the map’s nominal scale to meters (and yards).

deldirVertices(): Tiles, Triangles and Polygons focuses on deldirVertices(), which extracts the vertices of triangles (Delaunay triangulation) and tiles (Dirichlet or Voronoi tessellation) from deldir::deldir() for use with polygon() and related functions.

Kernel Density Plot discusses the syntax of addKernelDensity(), which allows you to define “populations” and subsets of pumps. This syntax is used in many of the functions in ‘cholera’.

Time Series discusses functions and data related to fatalities time series data and the question of the effect of the removal of the handle from the Broad Street pump.
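
To illustrate the two units of observation mentioned under “Unstacking” Bars, the base R sketch below starts from one row per fatality and collapses it to one row per address. The data frame and its column names are hypothetical; they are not the package's data sets.

```r
# Hypothetical "unstacked" data: one row per fatality, each tagged with the
# address (bar) it belongs to. Not package data.
fatalities <- data.frame(case = 1:6,
                         address = c("A", "A", "A", "B", "C", "C"),
                         stringsAsFactors = FALSE)

# Collapse to one row per address, with a count of fatalities at each.
addresses <- as.data.frame(table(address = fatalities$address),
                           responseName = "fatalities",
                           stringsAsFactors = FALSE)
addresses
```

The first form keeps every death as its own observation; the second treats each address as a single observation with a count attached.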

lab notes

The lab notes, which are only available online, go into greater detail about some of the issues and topics discussed in the vignettes:

note on duplicate and missing cases documents the specifics of how I “fixed” two apparent coding errors and three misplaced cases in Dodson and Tobler’s data.

computing street addresses discusses how I use orthogonal projection and hierarchical cluster analysis to “unstack” bars and compute a stack’s “address”.

Euclidean v. Voronoi neighborhoods discusses why there are separate functions for neighborhoodEuclidean() and neighborhoodVoronoi().

points v. polygons discusses the tradeoff between using points() and polygon() to plot “expected” area neighborhood plots and the computation of polygon vertices.

references is an informal list of articles and books about cholera, John Snow and the 1854 outbreak.
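
As a rough illustration of the orthogonal projection mentioned in the computing street addresses note, the base R sketch below projects a point onto a line segment and reports the distance between the two. The helper toyProjectOntoSegment() and the coordinates are hypothetical, and clamping the projection to the segment's endpoints is an assumption of this sketch, not a description of the package's orthogonalProjection().

```r
# Hypothetical helper: project point p onto the segment from a to b.
# If the perpendicular foot falls outside the segment, the nearest endpoint
# is used instead (an assumption of this sketch).
toyProjectOntoSegment <- function(p, a, b) {
  ab <- b - a
  # Position of the foot along the segment: 0 at a, 1 at b.
  t <- sum((p - a) * ab) / sum(ab^2)
  t <- max(0, min(1, t))
  foot <- a + t * ab
  list(point = foot, distance = sqrt(sum((p - foot)^2)))
}

# Hypothetical case at (3, 4) and road segment from (0, 0) to (10, 0).
proj <- toyProjectOntoSegment(p = c(3, 4), a = c(0, 0), b = c(10, 0))
proj$point     # location of the projected "address" on the road
proj$distance  # distance from the case to the road
```

Applying this to every candidate road segment and keeping the one with the smallest distance is one plausible way to pick the road a case belongs to.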

note on neighborhoodWalking()

neighborhoodWalking() is computationally intensive. Using R version 3.5.2 on a single core of a 2.3 GHz Intel i7, plotting observed paths to PDF takes about 5 seconds; doing the same for expected paths takes about 30 seconds. Using the function’s parallel implementation on 4 physical (8 logical) cores, the times fall to about 4 and 13 seconds.

Note that parallelization is currently only available on Linux and macOS.

Also, note that although some precautions are taken in R.app on macOS, the developers of the ‘parallel’ package, which neighborhoodWalking() uses, strongly discourage using parallelization within a GUI or embedded environment. See vignette("parallel") for details.
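
For readers unfamiliar with the ‘parallel’ package, the sketch below shows the general pattern of fork-based parallelism with parallel::mclapply(), which may explain the platform restriction: forking is not available on Windows, where mclapply() only works with a single core. The function toySlowSquare() and the choice of two cores are assumptions for this example; it does not show how neighborhoodWalking() itself is parallelized.

```r
library(parallel)

# Forking is unavailable on Windows, so fall back to one core there.
# The choice of two cores elsewhere is arbitrary.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical stand-in for an expensive per-case computation.
toySlowSquare <- function(x) {
  Sys.sleep(0.01)
  x^2
}

# Apply the function to each input, spreading the work across 'cores'.
result <- unlist(mclapply(1:8, toySlowSquare, mc.cores = cores))
```

The results come back in the same order as the inputs, so the parallel version can usually be swapped in for lapply() without other changes.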

contributing

Contributions to the ‘cholera’ package are welcome. If interested, please see the suggested guidelines.

News

cholera 0.6.0

Fixes

  • fix title in euclideanPath(type = "case-pump").
  • fix destination label for walkingPath(destination = NULL).

Data Changes

  • add Earl of Aberdeen residence (Argyll House).
  • nominal and orthogonal coordinates for landmarks.

Function Changes

  • addNeighborhood() -> addNeighborhoodWalking()

Function Changes - new arguments

  • addSnow(type = "perimeter", line.width = 2)
  • neighborhoodData(embed = TRUE, embed.landmarks = TRUE)
  • neighborhoodEuclidean(case.set = "expected")
  • plot.voronoi(voronoi.cells = TRUE, delauny.triangles = FALSE)
  • snowMap(...)
  • streetNameLocator(add.subtitle = TRUE, token = id)
  • streetNumberLocator(add.subtitle = TRUE, token = id)

Function Changes - polygon.method argument

  • addNeighborhoodEuclidean(polygon.method = "traveling.salesman")

  • plot.euclidean(polygon.method = "traveling.salesman")

  • addNeighborhoodWalking(polygon.method = "pearl.string")

  • plot.walking(polygon.method = "pearl.string")

Function Change - landmarks as origin and/or destination (treated as cases)

  • euclideanPath()
  • walkingPath()
  • find nearest case or landmark, given pump (i.e., reverse lookup)

Function Changes - case.location argument: "address" or "nominal"

  • addVoronoi(case.location = "nominal")
  • euclideanPath(case.location = "nominal")
  • neighborhoodEuclidean(case.location = "nominal")
  • addNeighborhoodEuclidean(case.location = "nominal")

New Functions

  • addCase()
  • addDelauny()
  • addNeighborhoodCases()
  • deldirVertices()
  • orthogonalProjection()
  • profile2D()
  • profile3D()
  • streetHighlight()

New Exported Functions

  • fixFatalities()
  • landmarkData()

New S3 Functions

  • pearsonResiduals()
  • plot.neighborhood_data()

New Vignette

  • "deldirVertices(): Tiles, Triangles and Polygons"

Deprecated Functions

  • euclideanDistance()
  • walkingDistance()

cholera 0.5.1

Fixes

  • backward compatibility (R 3.4.4) related to base::isFALSE() & bug fix.
  • fix for multiple results in walkingDistance() and walkingPath().

Function Changes

  • enable ellipses (...) in plot.time_series() (#1).
  • enable ellipses and negative selection in addPump().
  • consolidate addEuclideanPath(), euclideanDistance(), euclideanPath(), walkingDistance() and walkingPath()

New Functions

  • addBorder()
  • addRoads()
  • mapRange()

cholera 0.5.0

Data Changes

  • regular.cases and sim.ortho.proj: increase number of observations from 5K to 20K.

Function Changes

  • "alpha.level" argument to control path transparency added to: addEuclideanPath() and addWalkingPath()

  • distance- and time-based "mileposts" added to: addEuclideanPath(), addWalkingPath(), plot.euclidean_path(), plot.walking_path() and addMilePosts().

  • "pump.subset" and "pump.select" arguments added to: addCase(), addKernelDensity(), addMilePosts(), addNeighborhood(), neighborhoodEuclidean(), neighborhoodWalking()

  • "walking.speed" argument added to: addMilePosts(), nearestPump(), addEuclideanPath(), euclideanDistance(), euclideanPath(), addWalkingPath(), walkingDistance(), walkingPath()

  • euclideanDistance() no longer S3. generic S3 functionality moved to euclideanPath().

  • multiCore() moved to multiCore.R.

  • neighborhoodVoronoi() plot.voronoi() adds "euclidean.paths" argument for star graph.

  • neighborhoodWalking() "area.polygons" related functions for plot_walking() moved to pearlString.R.

  • simulateFatalities(): default is now 20K observations. use proximate in addition to orthogonal distances to find "addresses".

  • snowMap() new arguments: "add.cases", "add.pumps", "add.roads".

  • unitMeter() default unit of measurement is now "meter".

  • walkingAuxillaryFunctions.R: location of walking related helper functions.

  • walkingDistance() no longer S3. generic S3 functionality moved to walkingPath().

New Functions

  • addCase()
  • addEuclideanPath()
  • addMilePosts()
  • addNeighborhood()
  • addWalkingPath()
  • distanceTime()

New S3 Functions

  • euclideanPath()
  • walkingPath()
  • neighborhoodEuclidean()

Vignette Changes

  • Lab Notes available online and on GitHub: "duplicate.missing.cases.notes" "pump.neighborhoods.notes" "unstacking.bars.notes"

cholera 0.4.0

Data Changes

  • ortho.proj.pump and ortho.proj.pump.vestry now include node ID.

  • roads and road.segments amend street names: "Unknown-B" to "Tent Court" (Edmund Cooper's map). "Unknown-D" to "St James's Market" (https://maps.nls.uk). "Unknown-E" to "Market Street (II)" (https://maps.nls.uk).

Function Changes

  • addKernelDensity() uses "pump.subset" and "pump.select" arguments.

  • addLandmarks() add landmarks from Edmund Cooper's map.

  • classifierAudit() can return coordinates of address.

  • nearestPump() now incorporates nearestPath().

  • neighborhoodWalking() segment and sub-segment implementation.

  • pumpData() returns node ID.

  • timeSeries() includes day of the week.

  • walkingDistance() add "simulated" expected cases.

New Functions

  • addNeighborhood()

New S3 Implementations

  • plot.walking type = "area.points" and type = "area-polygons". type = "area-polygons" via pearlString() replaces alphahull::ashape().

  • print.walking() uses expectedCount().

Vignette Changes

  • add "Kernel Density Plot".
  • update "Pump Neighborhoods" with discussion of area plots.

cholera 0.3.0

Data Changes

  • ortho.proj: reclassify case 483: Pulteney Court (I) ("242-1") -> Little Windmill Street ("326-2"). reclassify cases 369, 434, 11, 53, 193: Poland Street ("194-1") -> St James Workhouse ("148-1").

Function Changes

  • addSnow() "area", "street" and "boundary" graphical annotation.

  • caseLocator() highlight home road segment.

  • neighborhoodWalking() "case.set" argument: "observed", "expected" and "snow". updated implementation and improved performance. pre-computed configurations from version 0.2.1 removed.

  • segmentLocator(), streetNameLocator() and streetNumberLocator() highlight segment or street cases. option to plot all cases, anchor cases or no cases.

New S3 Implementations

  • timeSeries()
  • walkingDistance() incorporates and deprecates walkingPath().

New Functions

  • addIndexCase()
  • nearestPath()
  • nearestPump()
  • nodeData()
  • segmentLength()
  • snowNeighborhood()
  • streetLength()
  • unitMeter()

New S3 Functions

  • classifierAudit()
  • euclideanDistance()

cholera 0.2.1

  • Initial CRAN release.

0.6.0 by Peter Li, 2 months ago


https://github.com/lindbrook/cholera


Report a bug at https://github.com/lindbrook/cholera/issues


Browse source code at https://github.com/cran/cholera


Authors: Peter Li [aut, cre]


GPL (>= 2) license


Imports deldir, HistData, ggplot2, igraph, KernSmooth, pracma, RColorBrewer, scales, sp, threejs, TSP

Suggests knitr, rmarkdown

