A Collection of Small Text Corpora of Interesting Data

A collection of small text corpora of interesting data. It contains all data sets from 'dariusk/corpora'. Some examples: names of animals: birds, dinosaurs, dogs; foods: beer categories, pizza toppings; geography: English towns, rivers, oceans; humans: authors, US presidents, occupations; science: elements, planets; words: adjectives, verbs, proverbs, US president quotes.



R package that contains all data sets from https://github.com/dariusk/corpora

Installation

devtools::install_github("gaborcsardi/rcorpora")

Usage

Calling the corpora() function without arguments lists all data sets in the package; calling it with the name of a data set returns the data set itself. For example:

library(rcorpora)
corpora()
#>   [1] "animals/birds_antarctica"                                       
#>   [2] "animals/birds_north_america"                                    
#>   [3] "animals/cats"                                                   
#>   [4] "animals/collateral_adjectives"                                  
#>   [5] "animals/common"                                                 
#>   [6] "animals/dinosaurs"                                              
#>   [7] "animals/dog_names"                                              
#>   [8] "animals/dogs"                                                   
#>   [9] "animals/donkeys"                                                
#>  [10] "animals/horses"                                                 
#>  [11] "animals/ponies"                                                 
#>  [12] "archetypes/artifact"                                            
#>  [13] "archetypes/character"                                           
#>  [14] "archetypes/event"                                               
#>  [15] "archetypes/setting"                                             
#>  [16] "architecture/passages"                                          
#>  [17] "architecture/rooms"                                             
#>  [18] "art/isms"                                                       
#>  [19] "colors/crayola"                                                 
#>  [20] "colors/dulux"                                                   
#>  [21] "colors/google_material_colors"                                  
#>  [22] "colors/paints"                                                  
#>  [23] "colors/palettes"                                                
#>  [24] "colors/web_colors"                                              
#>  [25] "colors/xkcd"                                                    
#>  [26] "corporations/cars"                                              
#>  [27] "corporations/djia"                                              
#>  [28] "corporations/fortune500"                                        
#>  [29] "corporations/industries"                                        
#>  [30] "corporations/nasdaq"                                            
#>  [31] "corporations/newspapers"                                        
#>  [32] "divination/tarot_interpretations"                               
#>  [33] "divination/zodiac"                                              
#>  [34] "film-tv/game-of-thrones-houses"                                 
#>  [35] "film-tv/iab_categories"                                         
#>  [36] "film-tv/netflix-categories"                                     
#>  [37] "film-tv/popular-movies"                                         
#>  [38] "film-tv/tv_shows"                                               
#>  [39] "foods/apple_cultivars"                                          
#>  [40] "foods/bad_beers"                                                
#>  [41] "foods/beer_categories"                                          
#>  [42] "foods/beer_styles"                                              
#>  [43] "foods/breads_and_pastries"                                      
#>  [44] "foods/combine"                                                  
#>  [45] "foods/condiments"                                               
#>  [46] "foods/curds"                                                    
#>  [47] "foods/fruits"                                                   
#>  [48] "foods/herbs_n_spices"                                           
#>  [49] "foods/hot_peppers"                                              
#>  [50] "foods/iba_cocktails"                                            
#>  [51] "foods/menuItems"                                                
#>  [52] "foods/pizzaToppings"                                            
#>  [53] "foods/sandwiches"                                               
#>  [54] "foods/sausages"                                                 
#>  [55] "foods/scotch_whiskey"                                           
#>  [56] "foods/tea"                                                      
#>  [57] "foods/vegetable_cooking_times"                                  
#>  [58] "foods/vegetables"                                               
#>  [59] "foods/wine_descriptions"                                        
#>  [60] "games/bannedGames/argentina/bannedList"                         
#>  [61] "games/bannedGames/brazil/bannedList"                            
#>  [62] "games/bannedGames/china/bannedList"                             
#>  [63] "games/bannedGames/denmark/bannedList"                           
#>  [64] "games/cluedo"                                                   
#>  [65] "games/dark_souls_iii_messages"                                  
#>  [66] "games/jeopardy_questions"                                       
#>  [67] "games/pokemon"                                                  
#>  [68] "games/scrabble"                                                 
#>  [69] "games/street_fighter_ii"                                        
#>  [70] "games/trivial_pursuit"                                          
#>  [71] "games/wrestling_moves"                                          
#>  [72] "games/zelda"                                                    
#>  [73] "geography/canada_provinces_and_territories"                     
#>  [74] "geography/canadian_municipalities"                              
#>  [75] "geography/countries_with_capitals"                              
#>  [76] "geography/countries"                                            
#>  [77] "geography/english_towns_cities"                                 
#>  [78] "geography/japanese_prefectures"                                 
#>  [79] "geography/london_underground_stations"                          
#>  [80] "geography/nationalities"                                        
#>  [81] "geography/norwegian_cities"                                     
#>  [82] "geography/nyc_neighborhood_zips"                                
#>  [83] "geography/oceans"                                               
#>  [84] "geography/rivers"                                               
#>  [85] "geography/sf_neighborhoods"                                     
#>  [86] "geography/us_airport_codes"                                     
#>  [87] "geography/us_cities"                                            
#>  [88] "geography/us_counties"                                          
#>  [89] "geography/us_metropolitan_areas"                                
#>  [90] "geography/us_state_capitals"                                    
#>  [91] "geography/venues"                                               
#>  [92] "geography/winds"                                                
#>  [93] "governments/mass-surveillance-project-names"                    
#>  [94] "governments/nsa_projects"                                       
#>  [95] "governments/uk_political_parties"                               
#>  [96] "governments/us_federal_agencies"                                
#>  [97] "governments/us_mil_operations"                                  
#>  [98] "humans/2016_us_presidential_candidates"                         
#>  [99] "humans/atus_activities"                                         
#> [100] "humans/authors"                                                 
#> [101] "humans/bodyParts"                                               
#> [102] "humans/britishActors"                                           
#> [103] "humans/celebrities"                                             
#> [104] "humans/descriptions"                                            
#> [105] "humans/englishHonorifics"                                       
#> [106] "humans/famousDuos"                                              
#> [107] "humans/firstNames"                                              
#> [108] "humans/lastNames"                                               
#> [109] "humans/moods"                                                   
#> [110] "humans/norwayFirstNamesBoys"                                    
#> [111] "humans/norwayFirstNamesGirls"                                   
#> [112] "humans/norwayLastNames"                                         
#> [113] "humans/occupations"                                             
#> [114] "humans/prefixes"                                                
#> [115] "humans/richpeople"                                              
#> [116] "humans/scientists"                                              
#> [117] "humans/spanishFirstNames"                                       
#> [118] "humans/spanishLastNames"                                        
#> [119] "humans/spinalTapDrummers"                                       
#> [120] "humans/suffixes"                                                
#> [121] "humans/thirdPersonPronouns"                                     
#> [122] "humans/tolkienCharacterNames"                                   
#> [123] "humans/us_presidents"                                           
#> [124] "humans/wrestlers"                                               
#> [125] "instructions/laundry_care"                                      
#> [126] "materials/abridged-body-fluids"                                 
#> [127] "materials/building-materials"                                   
#> [128] "materials/carbon-allotropes"                                    
#> [129] "materials/decorative-stones"                                    
#> [130] "materials/fabrics"                                              
#> [131] "materials/fibers"                                               
#> [132] "materials/gemstones"                                            
#> [133] "materials/layperson-metals"                                     
#> [134] "materials/metals"                                               
#> [135] "materials/natural-materials"                                    
#> [136] "materials/packaging"                                            
#> [137] "materials/plastic-brands"                                       
#> [138] "materials/sculpture-materials"                                  
#> [139] "materials/technical-fabrics"                                    
#> [140] "mathematics/fibonnaciSequence"                                  
#> [141] "mathematics/primes_binary"                                      
#> [142] "mathematics/primes"                                             
#> [143] "mathematics/trigonometry"                                       
#> [144] "medicine/diagnoses"                                             
#> [145] "medicine/drugNameStems"                                         
#> [146] "medicine/drugs"                                                 
#> [147] "medicine/hospitals"                                             
#> [148] "music/a_list_of_guitar_manufacturers"                           
#> [149] "music/bands_that_have_opened_for_tool"                          
#> [150] "music/female_classical_guitarists"                              
#> [151] "music/genres"                                                   
#> [152] "music/hamilton_musical_obcrecording_actors_characters"          
#> [153] "music/instruments"                                              
#> [154] "music/mtv_day_one"                                              
#> [155] "music/rock_hall_of_fame"                                        
#> [156] "music/xxl_freshman"                                             
#> [157] "mythology/greek_gods"                                           
#> [158] "mythology/greek_monsters"                                       
#> [159] "mythology/greek_myths_master"                                   
#> [160] "mythology/greek_titans"                                         
#> [161] "mythology/hebrew_god"                                           
#> [162] "mythology/lovecraft"                                            
#> [163] "mythology/monsters"                                             
#> [164] "mythology/norse_gods"                                           
#> [165] "objects/clothing"                                               
#> [166] "objects/corpora_winners"                                        
#> [167] "objects/objects"                                                
#> [168] "plants/cannabis"                                                
#> [169] "plants/flowers"                                                 
#> [170] "plants/plants"                                                  
#> [171] "religion/christian_saints"                                      
#> [172] "religion/fictional_religions"                                   
#> [173] "religion/parody_religions"                                      
#> [174] "religion/religions"                                             
#> [175] "science/elements"                                               
#> [176] "science/hail_size"                                              
#> [177] "science/minor_planets"                                          
#> [178] "science/planets"                                                
#> [179] "science/pregnancy"                                              
#> [180] "science/toxic_chemicals"                                        
#> [181] "science/weather_conditions"                                     
#> [182] "societies_and_groups/animal_welfare"                            
#> [183] "societies_and_groups/designated_terrorist_groups/australia"     
#> [184] "societies_and_groups/designated_terrorist_groups/canada"        
#> [185] "societies_and_groups/designated_terrorist_groups/china"         
#> [186] "societies_and_groups/designated_terrorist_groups/egypt"         
#> [187] "societies_and_groups/designated_terrorist_groups/european_union"
#> [188] "societies_and_groups/designated_terrorist_groups/india"         
#> [189] "societies_and_groups/designated_terrorist_groups/iran"          
#> [190] "societies_and_groups/designated_terrorist_groups/israel"        
#> [191] "societies_and_groups/designated_terrorist_groups/kazakhstan"    
#> [192] "societies_and_groups/designated_terrorist_groups/russia"        
#> [193] "societies_and_groups/designated_terrorist_groups/saudi_arabia"  
#> [194] "societies_and_groups/designated_terrorist_groups/tunisia"       
#> [195] "societies_and_groups/designated_terrorist_groups/turkey"        
#> [196] "societies_and_groups/designated_terrorist_groups/uae"           
#> [197] "societies_and_groups/designated_terrorist_groups/ukraine"       
#> [198] "societies_and_groups/designated_terrorist_groups/united_kingdom"
#> [199] "societies_and_groups/designated_terrorist_groups/united_nations"
#> [200] "societies_and_groups/designated_terrorist_groups/united_states" 
#> [201] "societies_and_groups/fraternities/coeducational_fraternities"   
#> [202] "societies_and_groups/fraternities/defunct"                      
#> [203] "societies_and_groups/fraternities/fraternities"                 
#> [204] "societies_and_groups/fraternities/professional"                 
#> [205] "societies_and_groups/fraternities/service"                      
#> [206] "societies_and_groups/fraternities/sororities"                   
#> [207] "societies_and_groups/semi_secret"                               
#> [208] "sports/football/epl_teams"                                      
#> [209] "sports/football/laliga_teams"                                   
#> [210] "sports/football/serieA"                                         
#> [211] "sports/mlb_teams"                                               
#> [212] "sports/nba_mvps"                                                
#> [213] "sports/nba_teams"                                               
#> [214] "sports/nfl_teams"                                               
#> [215] "sports/nhl_teams"                                               
#> [216] "sports/olympics"                                                
#> [217] "technology/appliances"                                          
#> [218] "technology/computer_sciences"                                   
#> [219] "technology/fireworks"                                           
#> [220] "technology/guns_n_rifles"                                       
#> [221] "technology/knots"                                               
#> [222] "technology/lisp"                                                
#> [223] "technology/new_technologies"                                    
#> [224] "technology/photo_sharing_websites"                              
#> [225] "technology/programming_languages"                               
#> [226] "technology/social_networking_websites"                          
#> [227] "technology/video_hosting_websites"                              
#> [228] "transportation/commercial-aircraft"                             
#> [229] "travel/lcc"                                                     
#> [230] "words/adjs"                                                     
#> [231] "words/adverbs"                                                  
#> [232] "words/closed_pairs"                                             
#> [233] "words/common"                                                   
#> [234] "words/compounds"                                                
#> [235] "words/crash_blossoms"                                           
#> [236] "words/eggcorns"                                                 
#> [237] "words/emoji/cute_kaomoji"                                       
#> [238] "words/emoji/emoji"                                              
#> [239] "words/encouraging_words"                                        
#> [240] "words/ergative_verbs"                                           
#> [241] "words/expletives"                                               
#> [242] "words/harvard_sentences"                                        
#> [243] "words/infinitive_verbs"                                         
#> [244] "words/interjections"                                            
#> [245] "words/literature/infinitejest"                                  
#> [246] "words/literature/lovecraft_words"                               
#> [247] "words/literature/mr_men_little_miss"                            
#> [248] "words/literature/shakespeare_phrases"                           
#> [249] "words/literature/shakespeare_sonnets"                           
#> [250] "words/literature/shakespeare_words"                             
#> [251] "words/literature/technology_quotes"                             
#> [252] "words/nouns"                                                    
#> [253] "words/oprah_quotes"                                             
#> [254] "words/personal_nouns"                                           
#> [255] "words/personal_pronouns"                                        
#> [256] "words/possessive_pronouns"                                      
#> [257] "words/prefix_root_suffix"                                       
#> [258] "words/prepositions"                                             
#> [259] "words/proverbs"                                                 
#> [260] "words/resume_action_words"                                      
#> [261] "words/rhymeless_words"                                          
#> [262] "words/spells"                                                   
#> [263] "words/state_verbs"                                              
#> [264] "words/states_of_drunkenness"                                    
#> [265] "words/stopwords/ar"                                             
#> [266] "words/stopwords/bg"                                             
#> [267] "words/stopwords/cs"                                             
#> [268] "words/stopwords/da"                                             
#> [269] "words/stopwords/de"                                             
#> [270] "words/stopwords/en"                                             
#> [271] "words/stopwords/es"                                             
#> [272] "words/stopwords/fi"                                             
#> [273] "words/stopwords/fr"                                             
#> [274] "words/stopwords/gr"                                             
#> [275] "words/stopwords/it"                                             
#> [276] "words/stopwords/jp"                                             
#> [277] "words/stopwords/lv"                                             
#> [278] "words/stopwords/nl"                                             
#> [279] "words/stopwords/no"                                             
#> [280] "words/stopwords/pl"                                             
#> [281] "words/stopwords/pt"                                             
#> [282] "words/stopwords/ru"                                             
#> [283] "words/stopwords/sk"                                             
#> [284] "words/stopwords/sv"                                             
#> [285] "words/stopwords/tr"                                             
#> [286] "words/strange_words"                                            
#> [287] "words/units_of_time"                                            
#> [288] "words/us_president_quotes"                                      
#> [289] "words/verbs_with_conjugations"                                  
#> [290] "words/verbs"                                                    
#> [291] "words/word_clues/clues_five"                                    
#> [292] "words/word_clues/clues_four"                                    
#> [293] "words/word_clues/clues_six"
corpora("foods/pizzaToppings")
#> $description
#> [1] "A list of pizza toppings."
#> 
#> $pizzaToppings
#>  [1] "anchovies"        "artichoke"        "bacon"           
#>  [4] "breakfast bacon"  "Canadian bacon"   "cheese"          
#>  [7] "chicken"          "chili peppers"    "feta"            
#> [10] "garlic"           "green peppers"    "grilled onions"  
#> [13] "ground beef"      "ham"              "hot sauce"       
#> [16] "meatballs"        "mushrooms"        "olives"          
#> [19] "onions"           "pepperoni"        "pineapple"       
#> [22] "sausage"          "spinach"          "sun-dried tomato"
#> [25] "tomatoes"
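
The returned list can be handled with ordinary base R tools. The following is a minimal sketch, assuming the package is installed and that the data set has the structure shown in the output above. The variable names (toppings, picks, stopword_sets) are illustrative and not part of the package.

```r
library(rcorpora)

# With a data set name, corpora() returns a named list (see the output above).
toppings <- corpora("foods/pizzaToppings")

# The description and the data themselves are plain list elements.
toppings$description
head(toppings$pizzaToppings)

# Draw three toppings at random; set.seed() only makes the draw repeatable.
set.seed(1)
picks <- sample(toppings$pizzaToppings, 3)
picks

# Without arguments, corpora() returns the data set names as a character
# vector, so base R pattern matching can narrow the listing.
stopword_sets <- grep("^words/stopwords/", corpora(), value = TRUE)
stopword_sets
```

Filtering the names with grep() is only one way to find related data sets; it relies on nothing beyond the "category/name" naming pattern visible in the listing.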

License

CC0

News

2.0.0

  • Data sets are now cached to minimize loading times (#2, @richfitz)

  • Data files are always read in UTF-8 Encoding now (#3, #5, @isteves)

New data sets:

  • animals/cats, animals/donkeys, animals/horses, animals/ponies Lists of cat, donkey, horse, and pony breeds sourced from Wikipedia.

  • animals/collateral_adjectives List of animals plus collateral adjectives.

  • animals/dog_names List of dog names.

  • colors/dulux Dulux colors.

  • colors/google_material_colors Material Design Style Color Palette.

  • colors/palettes The top 200 most popular palettes on colourlovers.com.

  • colors/xkcd The 954 most common RGB monitor colors, as defined by several hundred thousand participants in the xkcd color name survey.

  • divination/zodiac Zodiac signs and associated information, both Western and Eastern.

  • film-tv/game-of-thrones-houses Game of Thrones Houses.

  • film-tv/iab_categories Categories from Interactive Advertising Bureau.

  • film-tv/netflix-categories Netflix Movie Categories.

  • film-tv/popular-movies A bunch of movies, mostly Best Picture winners or nominees, scraped from the web.

  • foods/bad_beers Beers with the 100 lowest scores on BeerAdvocate, adapted from https://www.beeradvocate.com/lists/bottom/

  • foods/iba_cocktails Cocktails recognized by the International Bartenders Association for use in the World Cocktail Competition.

  • foods/sausages A list of sausages.

  • foods/scotch_whiskey A list of Scotch whiskies.

  • games/zelda List of Zelda characters by game.

  • geography/canadian_municipalities Top 100 Canadian municipalities by 2011 population.

  • geography/countries_with_capitals A list of countries and their respective capitals.

  • geography/japanese_prefectures Japanese regions and prefectures.

  • geography/nationalities A list of nationalities.

  • geography/norwegian_cities Top Norwegian Cities by 2017 population.

  • geography/nyc_neighborhood_zips Neighborhoods of New York City and their corresponding ZIP codes.

  • geography/sf_neighborhoods San Francisco neighborhoods and their locations.

  • geography/us_airport_codes IATA and ICAO airport codes for the primary commercial airports in each state.

  • geography/us_counties U.S. Counties by State.

  • geography/us_metropolitan_areas U.S. Metropolitan, Micropolitan and Combined Statistical Areas with 2016 population estimates.

  • geography/us_state_capitals U.S. State Capitals.

  • geography/winds A list of regional and local winds and weather phenomena.

  • governments/mass-surveillance-project-names This is a list of government surveillance projects and related databases throughout the world.

  • humans/2016_us_presidential_candidates All individuals who filed a Statement of Candidacy with the FEC to register as a presidential candidate in the 2016 United States election.

  • humans/atus_activities Activity category codes used by the US Bureau of Labor Statistics in its American Time Use Survey.

  • humans/celebrities Celebrities.

  • humans/descriptions A list of adjectives for describing people, taken from www.enchantedlearning.com/wordlist/adjectivesforpeople.shtml.

  • humans/norwayFirstNamesBoys First names of boys, pulled from Statistics Norway 2015.

  • humans/norwayFirstNamesGirls First names of girls, pulled from Statistics Norway 2015.

  • humans/norwayLastNames Last names of people, pulled from Statistics Norway 2015.

  • humans/thirdPersonPronouns Third person personal pronouns with case.

  • humans/tolkienCharacterNames Character names from Tolkien's Middle-earth, from https://en.wikipedia.org/wiki/List_of_Middle-earth_characters

  • mathematics/primes_binary The first 1000 prime numbers in binary.

  • medicine/hospitals A partial list of the hospitals in the United States.

  • music/a_list_of_guitar_manufacturers A list of guitar manufacturers.

  • music/female_classical_guitarists A list of women classical guitarists.

  • music/hamilton_musical_obcrecording_actors_characters Actors and the named characters played by them in the Original Broadway Cast recording of Hamilton: An American Musical.

  • music/instruments Musical Instruments.

  • music/xxl_freshman Every rapper that's ever made the XXL Annual Freshman Cover.

  • mythology/greek_myths_master Greek Myths Actors.

  • objects/clothing List of clothing types.

  • objects/corpora_winners Winners in the Corpora Brackets.

  • science/weather_conditions A list of phrases describing weather conditions.

  • plants/plants List of plants by common name.

  • sports/football/epl_teams Current (as of November 2016) teams in the EPL (English Premier League) and where they play.

  • sports/football/laliga_teams Teams in the Spanish Primera División, La Liga (2017-18), with their details.

  • sports/football/serieA Teams in the Italian first division, Serie A (2017-18), with their details.

  • sports/mlb_teams Current (as of 2016) Major League Baseball teams and where they play.

  • sports/nba_mvps NBA MVP award winners 1956-2017.

  • sports/nba_teams Current (as of 2016) teams in the NBA and where they play.

  • sports/nhl_teams Current (as of 2016) teams in the NHL and where they play.

  • sports/olympics Olympic Games summary data.

  • transportation/commercial-aircraft List of aircraft manufacturers and some of their aircraft types currently in use.

  • travel/lcc A list of low cost air carriers.

  • words/compounds A partial list of English compound words.

  • words/emoji/emoji All the Unicode emoji.

  • words/ergative_verbs 'Ergative' verbs in English can be used both transitively and intransitively.

  • words/expletives Common expletives and spelling variants used in internet comments.

  • words/harvard_sentences The Harvard sentences are a collection of sample phrases that are used for standardized testing of Voice over IP, cellular, and other telephone systems.

  • words/infinitive_verbs Infinitive verbs.

  • words/literature/infinitejest List of names from the novel Infinite Jest by David Foster Wallace.

  • words/literature/lovecraft_words H. P. Lovecraft's favorite words.

  • words/literature/technology_quotes Edited passages from public domain works. These quotes are intended as standard propaganda in science-fiction stories.

  • words/personal_pronouns Personal pronouns.

  • words/possessive_pronouns Possessive pronouns.

  • words/prepositions A list of English prepositions.

  • words/state_verbs State verbs.

  • words/strange_words Some strange sounding words.

  • words/units_of_time A list of units of time ordered by magnitude.

  • words/verbs_with_conjugations Verbs with conjugations.

Updated data sets:

  • animals/birds_north_america Birds of North America, updated per ABA Checklist Version 7.9.0 (July 2016).

  • Updates: animals/dogs, divination/tarot_interpretations, film-tv/tv_shows, foods/fruits, foods/sandwiches, foods/tea, foods/vegetables, geography/countries, geography/us_cities, geography/venues, humans/occupations, humans/prefixes, mathematics/primes, music/genres, mythology/lovecraft, objects/objects, religion/christian_saints, science/elements, sports/nfl_teams, technology/computer_sciences, technology/new_technologies, technology/programming_languages, words/adjs, words/stopwords/bg.

Deleted data sets:

  • animals/birds_uk Birds of the United Kingdom. The copyright notice of the source (RSPB) does not clearly allow the file's inclusion in the corpora project.

  • words/emoji/positive_emoji and words/emoji/sea_emoji, see words/emoji/emoji instead.

1.2.0

  • categories() lists subcategories as well

New data sets:

  • animals/birds_north_america Birds of North America, grouped by family. Source: http://listing.aba.org/aba-checklist/

  • architecture/passages Ways to enter or exit a place.

  • corporations/industries A list of all industries on LinkedIn, as of May 21, 2013. Source: http://robertwdempsey.com/liindustries

  • divination/tarot_interpretations Tarot card interpretations, from Mark McElroy's A Guide to Tarot Meanings (http://www.madebymark.com/a-guide-to-tarot-card-meanings/)

  • film-tv/tv_shows 1000 entries from the list of TV shows at http://en.wikipedia.org/wiki/List_of_television_programs_by_name

  • foods/apple_cultivars The 1000 most popular apple cultivars in the USDA's Pomological Watercolor collection.

  • foods/combine A list of recipe instructions.

  • foods/tea Types of tea.

  • foods/vegetable_cooking_times Approximate cooking times for various vegetables. Source: http://recipes.howstuffworks.com/tools-and-techniques/how-to-cook-vegetables24.htm

  • foods/wine_descriptions A list of words commonly used to describe wine.

  • games/bannedGames/argentina/bannedList A list of video games banned in Argentina.

  • games/bannedGames/brazil/bannedList A list of video games banned in Brazil.

  • games/bannedGames/china/bannedList A list of video games banned in China.

  • games/bannedGames/denmark/bannedList A list of video games banned in Denmark.

  • games/dark_souls_iii_messages Organized components from the Dark Souls III message system.

  • games/wrestling_moves A list of professional wrestling moves.

  • humans/englishHonorifics English honorifics.

  • humans/famousDuos Famous duos.

  • humans/lastNames Last names of people, pulled from the US Census for the 2000s.

  • materials/gemstones A list of the names of materials commonly used as gemstones. Source: https://en.wikipedia.org/wiki/List_of_gemstone_species

  • mathematics/fibonnaciSequence The first 1000 numbers in the Fibonacci sequence.

  • mathematics/primes The first 1000 prime numbers.

  • mathematics/trigonometry A list of trigonometric functions, formulas, equations, etc.

  • medicine/diagnoses International Statistical Classification of Diseases and Related Health Problems, 10th revision. Source: http://www.cdc.gov/nchs/icd/icd10cm.htm

  • medicine/drugNameStems A list of generic pharmaceutical drug name stems. Hyphens indicate whether a stem appears at the beginning, middle, or end of the name. Source: http://druginfo.nlm.nih.gov/drugportal/jsp/drugportal/DrugNameGenericStems.jsp

  • medicine/drugs A list of pharmaceutical drug names. Source: The United States National Library of Medicine, http://druginfo.nlm.nih.gov/drugportal/

  • music/bands_that_have_opened_for_tool Bands that have opened for Tool. You must be really dedicated to your music if you are willing to play before Tool fans.

  • music/rock_hall_of_fame Artists who have been inducted into the Rock and Roll Hall of Fame, along with their year of induction. Source: https://en.wikipedia.org/wiki/List_of_Rock_and_Roll_Hall_of_Fame_inductees

  • mythology/greek_gods Gods and goddesses from Greek myth.

  • mythology/greek_monsters Monsters from Greek myth.

  • mythology/greek_titans Titans from Greek myth.

  • mythology/hebrew_god Hebrew names of God used in the Old Testament Bible.

  • mythology/monsters A list of monsters and other mythic creatures.

  • mythology/norse_gods Gods and goddesses of Norse and Germanic myth.

  • plants/cannabis 420 popular strains of cannabis.

  • religion/christian_saints

  • religion/fictional_religions

  • religion/parody_religions

  • religion/religions

  • science/minor_planets List of names of the first 1000 numbered minor planets.

  • societies_and_groups/animal_welfare

  • societies_and_groups/designated_terrorist_groups/australia

  • societies_and_groups/designated_terrorist_groups/canada

  • societies_and_groups/designated_terrorist_groups/china

  • societies_and_groups/designated_terrorist_groups/egypt

  • societies_and_groups/designated_terrorist_groups/european_union

  • societies_and_groups/designated_terrorist_groups/india

  • societies_and_groups/designated_terrorist_groups/iran

  • societies_and_groups/designated_terrorist_groups/israel

  • societies_and_groups/designated_terrorist_groups/kazakhstan

  • societies_and_groups/designated_terrorist_groups/russia

  • societies_and_groups/designated_terrorist_groups/saudi_arabia

  • societies_and_groups/designated_terrorist_groups/tunisia

  • societies_and_groups/designated_terrorist_groups/turkey

  • societies_and_groups/designated_terrorist_groups/ukraine

  • societies_and_groups/designated_terrorist_groups/uae

  • societies_and_groups/designated_terrorist_groups/united_kingdom

  • societies_and_groups/designated_terrorist_groups/united_nations

  • societies_and_groups/designated_terrorist_groups/united_states

  • societies_and_groups/fraternities/coeducational_fraternities

  • societies_and_groups/fraternities/defunct

  • societies_and_groups/fraternities/fraternities

  • societies_and_groups/fraternities/professional

  • societies_and_groups/fraternities/service

  • societies_and_groups/fraternities/sororities

  • societies_and_groups/semi_secret

  • sports/nfl_teams Current (as of 2015) teams in the NFL and where they play.

  • technology/lisp A list of LISP dialects.

  • technology/new_technologies New or emerging technologies.

  • technology/photo_sharing_websites Photo sharing websites.

  • technology/programming_languages

  • technology/social_networking_websites Social networking websites.

  • technology/video_hosting_websites Video hosting websites.

  • words/closed_pairs Closed pairs in English, i.e. pairs of words that rhyme with each other and only with each other. From https://en.wikipedia.org/wiki/List_of_closed_pairs_of_English_rhyming_words

  • words/emoji/cute_kaomoji A general corpus of cute kaomoji.

  • words/emoji/positive_emoji A general corpus of positive emoji.

  • words/emoji/sea_emoji A general corpus of emoji of sea/water creatures.

  • words/encouraging_words A list of encouraging words to tell someone about something they created.

  • words/interjections A list of exclamatory words and expressions, from http://www.enchantedlearning.com/wordlist/interjections.shtml

  • words/literature/mr_men_little_miss Mr Men and Little Miss characters. Source: http://www.mrmen.com

  • words/literature/shakespeare_phrases Phrases coined by Shakespeare, from http://www.pathguy.com/shakeswo.htm

  • words/literature/shakespeare_sonnets Shakespeare's sonnets.

  • words/literature/shakespeare_words Words coined by Shakespeare, from http://www.pathguy.com/shakeswo.htm

  • words/personal_nouns List of personal nouns in the 1890 Webster's Unabridged Dictionary. Assembled by Cory Taylor from Project Gutenberg's HTML edition of the dictionary: http://www.gutenberg.org/ebooks/673. Source: https://github.com/coryandrewtaylor/Personal-Nouns

  • words/resume_action_words Resume action words. Source: http://careercenter.umich.edu/article/resume-action-words

  • words/rhymeless_words English words for which there is no perfect rhyme, taken from https://en.wikipedia.org/wiki/List_of_English_words_without_rhymes

  • words/spells A list of Harry Potter spells and descriptions.

  • words/stopwords/ar Arabic stop words.

  • words/stopwords/bg Bulgarian stop words.

  • words/stopwords/cs Czech stop words.

  • words/stopwords/da Danish stop words.

  • words/stopwords/de German stop words.

  • words/stopwords/en English stop words.

  • words/stopwords/es Spanish stop words.

  • words/stopwords/fi Finnish stop words.

  • words/stopwords/fr French stop words.

  • words/stopwords/gr Greek stop words.

  • words/stopwords/it Italian stop words.

  • words/stopwords/jp Japanese stop words.

  • words/stopwords/lv Latvian stop words.

  • words/stopwords/nl Dutch stop words.

  • words/stopwords/no Norwegian stop words.

  • words/stopwords/pl Polish stop words.

  • words/stopwords/pt Portuguese stop words.

  • words/stopwords/ru Russian stop words.

  • words/stopwords/sk Slovakian stop words.

  • words/stopwords/sv Swedish stop words.

  • words/stopwords/tr Turkish stop words.

1.1.1

  • Get rid of R CMD check notes.

1.1.0

New data sets:

  • architecture/rooms Different kinds of rooms.

  • art/isms A list of modernist art isms.

  • corporations/fortune500 The 2014 Fortune 500 list.

  • foods/breads_and_pastries A list of classic breads and sweet pastries.

  • foods/condiments A list of condiments.

  • foods/curds A list of curds, cheeses, and other fermented dairy products.

  • games/street_fighter_ii Street Fighter II fighting moves.

  • governments/uk_political_parties A list of UK political parties. Source: http://www.electoralcommission.org.uk/ export of 8 May 2015.

  • humans/moods A list of words that naturally complete the phrase 'They were feeling...'.

  • materials/abridged-body-fluids Abridged body fluids.

  • materials/building-materials Building materials.

  • materials/carbon-allotropes Carbon allotropes.

  • materials/decorative-stones Decorative stones.

  • materials/fabrics Fabrics.

  • materials/fibers Fibers.

  • materials/layperson-metals Layperson metals.

  • materials/natural-materials Natural materials.

  • materials/packaging Packaging.

  • materials/plastic-brands Plastic brands.

  • materials/sculpture-materials Sculpture materials.

  • materials/technical-fabrics Technical fabrics.

  • music/genres A list of musical genres taken from Wikipedia article titles.

  • music/mtv_day_one Music videos broadcast on MTV's first day. Source: https://en.wikipedia.org/wiki/First_music_videos_aired_on_MTV

  • mythology/lovecraft Deities and supernatural creatures from the works of Lovecraft and the Cthulhu mythos.

  • technology/appliances A list of home appliances.

1.0.1

First release.


install.packages("rcorpora")

2.0.0 by Gábor Csárdi, 2 years ago


https://github.com/gaborcsardi/rcorpora


Report a bug at https://github.com/gaborcsardi/rcorpora/issues


Browse source code at https://github.com/cran/rcorpora


Authors: Darius Kazemi, Cole Willsea, Serin Delaunay, Karl Swedberg, Matthew Rothenberg, Greg Kennedy, Nathaniel Mitchell, Javier Arce, Mark Sample, Parker Higgins, Allison Parrish, Matthew Hokanson, Aaron Marriner, Casey Kolderup, Michael Paulukonis, Neil Freeman, nathan lachenmyer, Brett O'Connor, Christian Leon Christensen, David Edgar, Greg Borenstein, Jeffery Bennett, Kris Baillargeon, M. Nowak, Peter Organisciak, Rachel White, Tod Robbins, John Wiseman, Alex Fox, Alice Maz, Becca Ricks, Chris Spurgeon, Colin Mitchell, David Whitten, Mary Dickson Diaz, Michael R. Bernstein, Mike Watson, Patrick Rodriguez, Rebecca Sherman, Rebecca Turner, Ross Barclay, Ross Binden, Ryan Freebern, Will Hankinson, Stefan Bohacek, Justin Alford, Brian Detweiler, Ed Lea, John Ohno, Daniel McNally, Sean May, Tariq Ali, shubham kumar, adam malantonio, Alan Hussey, Amanda Visconti, Andreas Fuchs, Andy Craze, Andy Dayton, Ashur Cabrera, Austin Davis-Richardson, Ben Williams, Brian Chitester, Brian Gawalt, Brian Jones, Casey Olson, Chad Nelson, Cliff Rodgers, Cristian Rivas Gómez, Dan Sumption, Edward Loveall, Elijah Cobb, Garrett Miller, Grant Williamson, Ian McCowan, Jacob Fauber, Jay Mahabal, Jeoff Villanueva, Jesse Spielman, Joe Mahoney, Jordan Killpack, Josh Leong, Kay Belardinelli, K Adam White, Kristian Wichmann, Kyle McDonald, Liam Cooke, Marcos Wright-Kuhns, Mark Wunsch, Matt Beiswenger, Matthew McVickar, Matthew Molnar, Max Bittker, Michael Dewberry, Nathan Black, Noah Kantrowitz, Noah Swartz, Ranjit Bhatnagar, Ray Martinez, Rob Huzzey, Ryan Giglio, Sabareesh Iyer, Sam Raker, Tia Esguerra, Utsav Chadha, Vincent Bruijn, Will Thompson, Zac Moody, aarón montoya-moraga, Alex Miller, Delacannon, Scott Lieber, Pace Ricciardelli, Ruta Kruliauskaite, Scott Grant




CC0 license


Imports jsonlite


Imported by crsra.

Suggested by ids.

