SEER and Atomic Bomb Survivor Data Analysis Tools

Creates SEER (Surveillance, Epidemiology and End Results) and A-bomb data binaries from ASCII sources and provides tools for estimating SEER second cancer risks. Methods are described in .


CHANGES IN VERSION 2019.1 (First version since SEER release of April 2019)


o The SEER9 registries now all start in 1975. This caused problems since I had assumed that 
1973 would always be the starting year. If you want to use older SEER data that 
starts in 1973, please use a 2018 version of SEERaBomb. 

o The three new registries (Idaho, Massachusetts and New York) do not seem to 
be present in the custom ASCII SEER data release.

CHANGES IN VERSION 2018.3 (Third version since SEER release of April 2018)


o added mkMrtLocal() which makes mortality binaries from a local copy of the HMD data. This replaces mkMrt().
The R package demograhphy used by mkMrt() is thus no longer needed. Use this new function if mkMrt() gives you 
a password error, but you can get into HMD with your username and password (and thus know they work). Before 
running it download to your local drive (and unzip into your ~/data folder) at least one
of the two full HMD data file trees (organized by country at the top, the other by statistics at the top).  

CHANGES IN VERSION 2018.2 (Second version since SEER release of April 2018)


o simulated survival simSurv().
o make life tables mkLT().

CHANGES IN VERSION 2018.1 (First version as of SEER release of April 2018)


o csd() now gets second cancer factor level order from seerSet$sec and moves years in t by 0.1 for each sec level
to separate error bars in plots.

o mkAbomb now sorts cols into blocks: 1) grouping; 2) PY-weighted means; and 3) sums. DG ends the 1st
and py starts the last. Cols ending in G are SEERaBomb group defaults. New t is time since bombing (yrs)

o incidAbomb takes a grouped A-bomb tibble with DG and py, and forms means, sums, and incidences.

o incidSEER converts canc and popsae into a data frame with py and cases, filtering years on cancers
that had not yet started in SEER. 

o mkSEER now automatically maps tAML 9920's into tMDS 9987's post 2009. All 9987 cases after 2009
are  ones that were mapped over, so users can undo this if they wish, but I think its best to leave it. 
Values of 9999=missing in canc$surv are now set to missing in mkSEER().

o pickFields now sorts picks internally, so the user no longer needs to worry about pick ordering.

CHANGES IN VERSION 2017.3 (Third version working with SEER data of 2017)


o desc field of getFields() is now var_label() in mkSEER() output (depends on R package labelled)
o added sas names to output of getFields and pickField
o shifted dependence from XLConnect to openxlsx to get rid of rJava dependence  
o mapCODs  maps COD integer codes to strings in CODS     
o p2s generates times between primary/first and second cancers, at the individual level     


o Fixed riskVsAge() so it no longer depends on trt being "rad" and "noRad" 

CHANGES IN VERSION 2017.2 (Second version working with SEER data of 2017)


o mkDemographics is a new function that generates median age and OS tables in an Excel file     


o There is now an ASCII version of the SEER data with radiatn and chemo as rightmost columns, so please ignore
my Notes below. I  assume users of SEERaBomb will get this "CUSTOM"  dataset by filling out the  data use
agreement: radiatn and chemo are now default fields in pickFields.

CHANGES IN VERSION 2017.1 (First version working with SEER data of 2017)


o The field radiatn is no longer in the regular SEER release, so it has been removed from the pickFields default
and mkSEER no longer calls mapTrts unless the field is there (if using data released in 2016). 
While one can get the radiation data by signing an extra DUA, it is only available through 
SEER*stat (one would have to export it from there) and SEER*stat only runs on Windows (and I'm on a mac). 
If you need this data acquisition route and are willing to replace getFields, 
pickFields, and mkSEER by a new function mkSEERcrt (crt for chemo radiation therapy data out of SEER*Stat),
I'm happy to add you as an author to the SEERaBomb project on GitHub. Meanwhile, I will keep data released 
in 2016 in ~/data/SEER13 and will place the most recent data in ~/data/SEER. Thus, scripts 
that use radiatn now load data from ~/data/SEER13. 

CHANGES IN VERSION 2016.3 (Third version working with SEER data of 2016)


o riskVsAge is a new function that yields risks vs attained age after a first cancer     

o esd (event since diagnosis) is an msd wrapper. It uses SEER incidences instead of mortalities 
to yield RR time courses for user defined second cancer data.  

CHANGES IN VERSION 2016.2 (Second version working with SEER data of 2016)


o msd now takes a ybrks vector for showing calendar year trends     

Notes o mkExcel2 is now mkExcelCsd, and mkExcel is now mkExcelTsd, to emphasize that their inputs are the outputs of csd and tsd, respectively. mkExcelTsd and tsd are internal.

CHANGES IN VERSION 2016.1 (First version working with the new SEER data release of 2016)


o AML/MDS (leukemia 2016), CLL (leukemia research 2016), and CML (AAPSJ 2016) scripts were added to doc/papers      

Notes o Many fields were deleted or renamed in the SAS file of the 2016 SEER data release. Names used in my scipts that changed were reverted to originals in getFields() to avoid downstream problems. Numprims is no longer
there, so use seqnum instead. A few fields were also added to the new SEER data release.

CHANGES IN VERSION 2015.3 (Third version working with the SEER release of 2015)


o The function msd (mortality since diagnosis) computes mortality RR since diagnosis of cancer.  

o The function csd (cancer since diagnosis), now replaces tsd in computing 2nd cancer RR after a 1st cancer. 
  An added feature of csd is that it handles different intervals of years and ages at diagnosis. This changed
  the internal list of lists structure of the output, which getDF (instead of mkDF) converts to a data.frame (DF) 
  that ggplot2 can use, and which mkExcel2 (instead of mkExcel) uses to make an Excel file. 

Notes o tsd, mkExcel, and mkDF are now internal functions (in terms of help pages), i.e. they can still be used by old scripts o mkSEER no longer makes age19 in canc and it no longer makes popsga. If you still want 19-age groups use mkSEERold.

CHANGES IN VERSION 2015.2 (Second version working with the SEER release of 2015)


o The function tsd now takes a firstS argument, a verctor of first cancers of interest. This doesn't speed things 
  up much but it makes the output of tst easier to read and the output of mkExcel much more focused. 

o The latest National Vital Statistics Report data is now included as a data.frame.  

CHANGES IN VERSION 2015.1 (First version working with the SEER release of 2015)


o simSeerSet simulates a seerSet class object for two cancer types, A and B. 

o mkSEER (mkSEER2->mkSEER and mkSEER>-mkSEERold) uses stdUS 2000 to extend single age popsa to popsae. It also 
  makes a trt column for radiation therapy by default.

Notes o random crashes from the C++ code have been debugged. Variants of fillPYM in straight C and R used to debug it are now in the folder doc/extraFuncs



o added seetSet to make objects of class seerSet for the pipeline seerSet->mk2D->tsd.    

o added mk2D that makes two-dimensional spline fits of incidence versus age and calendar year and plot2D to look at them.    

o added tsd to compute RR over various time since diagnosis intervals
  and mkExcel to look at the results.

o added helper functions: getBinInfo, getPY, post1PYO, getE    

o mapCancs is now called within mkSEER to make a cancer field in those files as well.  

Notes o Month is no longer in the package version number and year now reflects the SEER data release year rather than the SEERaBomb release year.

o published paper scripts now use the cancer field instead of histo2 (which is no longer included in pickfields by default) and  plots now also use quartz() when on a mac. The old courses folder is now called examples; scripts therein were updated
  and some were removed. 



o added US 2000 populations as 19-, 86-, and 100-age group data.frames.    

o created mkSEER2 to create pooled db files directly into e.g. ~\data\SEER\mrgd  


o mkHema87 was removed. Its functionality is now included in mkAbomb


o fixed pickFields() to make country and state of birth strings rather than integers



o changed default seerHome directory to ~\data\SEER to no longer assume admin rights

o updated 2010 to 2011 in mkSEER and in several help pages to match April 15,2014 SEER release

o changed mkSEER SQL default to TRUE 

o added pops single ages (popsa) to binary files and all.db  



o added mkAbomb() to convert RERF file lsshempy.csv into a binary and SQLite database

o added mkHema87() to convert RERF file HEMA87.DAT into a binary with column names

o sq vs ad lung incidence and survival added to doc/courses

o JCO 2013 survival plots added to doc/papers/JCO2013

o activeMS is now doc/papers/radiatEnvironBiophys2013


o old demos are now examples in doc/courses


o vignettes are not possible since SEER and Abomb data must be signed off on. 



o Now handles SEER data release of April 24, 2013

o Package version number now matches latest SEER release date.


o No longer handles SEER data released in 2012.


o The SEER SAS file changed: fields are no longer a continuum.  
  The implication is that getFields is now field name centric, since 
  I can no longer count on all fields being in the SAS file.
o Survival is now already in months in SEER, so mapping it is no longer needed. 
  Indeed, the SAS file no longer includes a field for survival with years and 
  months as two character strings in a field of 4 chars, though this field does
  still remain in the SEER data files. 



o eod13 and eod2 are now set to strings in pickFields to avoid errors when picked 

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.


2019.2 by Tomas Radivoyevitch, a year ago

Browse source code at

Authors: Tomas Radivoyevitch [aut, cre] , R. Molenaar [ctb]

Documentation:   PDF Manual  

GPL (>= 2) license

Imports Rcpp, reshape2, mgcv, tibble, LaF, DBI, RSQLite, openxlsx, WriteXLS, labelled, scales, forcats, purrr, readr, tidyr, stringr, plyr, survival

Depends on dplyr, ggplot2, rgl, demography

Suggests bbmle

Linking to Rcpp

See at CRAN