R6 Objects for Text and Data

For natural language processing and analysis of qualitative text coding structures which provide a way to bind together text and text data are fundamental. The package provides such a structure and accompanying methods in form of R6 objects. The 'rtext' class allows for text handling and text coding (character or regex based) including data updates on text transformations as well as aggregation on various levels. Furthermore, the usage of R6 enables inheritance and passing by reference which should enable 'rtext' instances to be used as back-end for R based graphical text editors or text coding GUIs.




Status

getting stable

R6 Objects for Text and Data



Version

0.1.19
2016-10-24 (last change to R folder)



Description

For natural language processing and analysis of qualitative text coding structures which provide a way to bind together text and text data are fundamental. The package provides such a structure and accompanying methods in form of R6 objects. The 'rtext' class allows for text handling and text coding (character or regex based) including data updates on text transformations as well as aggregation on various levels. Furthermore, the usage of R6 enables inheritance and passing by reference which should enable 'rtext' instances to be used as back-end for R based graphical text editors or text coding GUIs.



License

MIT + file LICENSE
Peter Meissner [aut, cre], Ulrich Sieberer [cph], University of Konstanz [cph]



Citation

Meißner P (2016). rtext. R package version 0.1.19, <URL: https://github.com/petermeissner/rtext>.

Sieberer U, Meißner P, Keh J and Müller W (2016). "Mapping and Explaining Parliamentary Rule Changes in Europe: A Research Program." Legislative Studies Quarterly, 41(1), pp. 61-88. ISSN 1939-9162, doi: 10.1111/lsq.12106 (URL: http://doi.org/10.1111/lsq.12106), <URL: http://dx.doi.org/10.1111/lsq.12106>.



BibTex for citing

@Manual{Meissner2016, title = {rtext}, author = {Peter Meißner}, year = {2016}, note = {R package version 0.1.19}, url = {https://github.com/petermeissner/rtext}, }

@Article{Sieberer2016, title = {Mapping and Explaining Parliamentary Rule Changes in Europe: A Research Program}, author = {Ulrich Sieberer and Peter Meißner and Julia F. Keh and Wolfgang C. Müller}, journal = {Legislative Studies Quarterly}, volume = {41}, number = {1}, issn = {1939-9162}, url = {http://dx.doi.org/10.1111/lsq.12106}, doi = {10.1111/lsq.12106}, pages = {61--88}, year = {2016}, }



Installation

stable CRAN version

install.packages("rtext")
library(rtext)

(stable) development version

standard_repos <- options("repos")$repos
install.packages( "rtext", repos = c(standard_repos, "https://petermeissner.github.io/drat/"))
library(rtext)



Package Contents

library(rtext)
```
``` r
objects("package:rtext")
##  [1] "%>%"               "modus"             "prometheus_early"  "prometheus_late"   "R6_rtext_extended"
##  [6] "rtext"             "rtext_base"        "rtext_export"      "rtext_loadsave"    "rtext_tokenize"



Contribution

Note, that this package uses a Contributor Code of Conduct. By participating in this project you agree to abide by its terms: http://contributor-covenant.org/version/1/0/0/ (basically this should be a place were people get along with each other respectful and nice because it's simply more fun that way for everybody)

Contributions are very much welcome, e.g. in the form of:



Example Usage



... starting up ...

library(rtext)



... creating a text object ...

# initialize (with text or file)
quote_text <- "Outside of a dog, a book is man's best friend. Inside of a dog it's too dark to read."
quote <- rtext$new(text = quote_text)
## rtext : initializing



... setting and getting data ...

# add some data
quote$char_data_set("first", 1, TRUE)
quote$char_data_set("last", quote$char_length(), TRUE)
 
# get the data
quote$char_data_get()
##    i char first last
## 1  1    O  TRUE   NA
## 2 85    .    NA TRUE



... text transformation and data update ...

# transform text
quote$char_add("[this is an insertion] \n", 47)
 
# get the data again (see, the data moved along with the text)
quote$text_get()
## [1] "Outside of a dog, a book is man's best friend. [this is an insertion] \nInside of a dog it's too dark to read."
quote$char_data_get()
##     i char first last
## 1   1    O  TRUE   NA
## 2 109    .    NA TRUE



... using regular expression for setting data ...

# do some convenience coding (via regular expressions)
quote$char_data_set_regex("dog_friend", "dog", "dog")
quote$char_data_set_regex("dog_friend", "friend", "friend")
quote$char_data_get()
##      i char first last dog_friend
## 1    1    O  TRUE   NA       <NA>
## 2   14    d    NA   NA        dog
## 3   15    o    NA   NA        dog
## 4   16    g    NA   NA        dog
## 5   40    f    NA   NA     friend
## 6   41    r    NA   NA     friend
## 7   42    i    NA   NA     friend
## 8   43    e    NA   NA     friend
## 9   44    n    NA   NA     friend
## 10  45    d    NA   NA     friend
## 11  84    d    NA   NA        dog
## 12  85    o    NA   NA        dog
## 13  86    g    NA   NA        dog
## 14 109    .    NA TRUE       <NA>



... data aggregation via regex ...

quote$tokenize_data_regex(split="(dog)|(friend)", non_token = TRUE, join = "full")
##   token_i from  to                                   token is_token first last dog_friend
## 1       1    1  13                           Outside of a      TRUE  TRUE   NA       <NA>
## 2       2   14  16                                     dog    FALSE    NA   NA        dog
## 3       3   17  39                 , a book is man's best      TRUE    NA   NA       <NA>
## 4       4   40  45                                  friend    FALSE    NA   NA     friend
## 5       5   46  83 . [this is an insertion] \nInside of a      TRUE    NA   NA       <NA>
## 6       6   84  86                                     dog    FALSE    NA   NA        dog
## 7       7   87 109                  it's too dark to read.     TRUE    NA TRUE       <NA>



... data aggregation by words ...

quote$tokenize_data_words(non_token = TRUE, join="full")
##    token_i from  to     token is_token first last dog_friend
## 1        1    1   7   Outside     TRUE  TRUE   NA       <NA>
## 2        2    8   8              FALSE    NA   NA       <NA>
## 3        3    9  10        of     TRUE    NA   NA       <NA>
## 4        4   11  11              FALSE    NA   NA       <NA>
## 5        5   12  12         a     TRUE    NA   NA       <NA>
## 6        6   13  13              FALSE    NA   NA       <NA>
## 7        7   14  16       dog     TRUE    NA   NA        dog
## 8        8   17  18        ,     FALSE    NA   NA       <NA>
## 9        9   19  19         a     TRUE    NA   NA       <NA>
## 10      10   20  20              FALSE    NA   NA       <NA>
## 11      11   21  24      book     TRUE    NA   NA       <NA>
## 12      12   25  25              FALSE    NA   NA       <NA>
## 13      13   26  27        is     TRUE    NA   NA       <NA>
## 14      14   28  28              FALSE    NA   NA       <NA>
## 15      15   29  31       man     TRUE    NA   NA       <NA>
## 16      16   32  32         '    FALSE    NA   NA       <NA>
## 17      17   33  33         s     TRUE    NA   NA       <NA>
## 18      18   34  34              FALSE    NA   NA       <NA>
## 19      19   35  38      best     TRUE    NA   NA       <NA>
## 20      20   39  39              FALSE    NA   NA       <NA>
## 21      21   40  45    friend     TRUE    NA   NA     friend
## 22      22   46  48       . [    FALSE    NA   NA       <NA>
## 23      23   49  52      this     TRUE    NA   NA       <NA>
## 24      24   53  53              FALSE    NA   NA       <NA>
## 25      25   54  55        is     TRUE    NA   NA       <NA>
## 26      26   56  56              FALSE    NA   NA       <NA>
## 27      27   57  58        an     TRUE    NA   NA       <NA>
## 28      28   59  59              FALSE    NA   NA       <NA>
## 29      29   60  68 insertion     TRUE    NA   NA       <NA>
## 30      30   69  71      ] \n    FALSE    NA   NA       <NA>
## 31      31   72  77    Inside     TRUE    NA   NA       <NA>
## 32      32   78  78              FALSE    NA   NA       <NA>
## 33      33   79  80        of     TRUE    NA   NA       <NA>
## 34      34   81  81              FALSE    NA   NA       <NA>
## 35      35   82  82         a     TRUE    NA   NA       <NA>
## 36      36   83  83              FALSE    NA   NA       <NA>
## 37      37   84  86       dog     TRUE    NA   NA        dog
## 38      38   87  87              FALSE    NA   NA       <NA>
## 39      39   88  89        it     TRUE    NA   NA       <NA>
## 40      40   90  90         '    FALSE    NA   NA       <NA>
## 41      41   91  91         s     TRUE    NA   NA       <NA>
## 42      42   92  92              FALSE    NA   NA       <NA>
## 43      43   93  95       too     TRUE    NA   NA       <NA>
## 44      44   96  96              FALSE    NA   NA       <NA>
## 45      45   97 100      dark     TRUE    NA   NA       <NA>
## 46      46  101 101              FALSE    NA   NA       <NA>
## 47      47  102 103        to     TRUE    NA   NA       <NA>
## 48      48  104 104              FALSE    NA   NA       <NA>
## 49      49  105 108      read     TRUE    NA   NA       <NA>
## 50      50  109 109         .    FALSE    NA TRUE       <NA>



... data aggregation by lines ...

quote$tokenize_data_lines()
##   token_i from  to                                                                  token is_token first
## 1       1    1  70 Outside of a dog, a book is man's best friend. [this is an insertion]      TRUE    NA
## 2       2   72 109                                 Inside of a dog it's too dark to read.     TRUE    NA
##   last dog_friend
## 1   NA     friend
## 2   NA        dog



... text plotting with data highlighting ...

plot(quote, "dog_friend")



... adding further data to the plot ...

plot(quote, "dog_friend")
plot(quote, "first", col="steelblue", add=TRUE)
plot(quote, "last", col="steelblue", add=TRUE)

News

NEWS rtext

version 0.1.20 // 2016-11-01 ...

  • BUGFIXES
  • FEATURE

    • adding some examples
  • DEV

version 0.1.19 // 2016-11-01 ...

  • BUGFIXES
  • FEATURE
  • DEV
    • documentation
    • CRAN publication polishing

version 0.1.18 // 2016-10-24 ...

  • BUGFIXES
  • FEATURE

    • tokenize_data_sequences()
  • DEV

  • BUGFIXES

    • R6_extended$ls() breaks on more than on class
  • FEATURE

  • Development
  • BUGFIXES
  • FEATURE
  • Development
    • rtext_extended : moving verbose to self$options
    • introducing self$options as general place to put options
  • BUGFIXES
  • FEATURE
    • adding export functionality to SQLite databases
  • Development
    • rtext_loadsave : restructuring
  • BUGFIXES
    • minor fixes discovered by testing:
  • FEATURE
    • testing
  • Development
    • testing testing testing
    • restructuring plot.rtext()
  • BUGFIXES
  • FEATURE
    • R6_rtext_extended : ls()
    • R6_rtext_extended : hash(), hashed(), hashes
    • R6_rtext_extended : get()
    • adding second order data (data that is less important than first order data) to allow for inheritance of data while not overriding first class data
  • Development
    • splitting rtext class into R6_rtext_extended, rtext_base and rtext and using inheritance
  • BUGFIXES
  • FEATURE
  • Development
    • restructuring internal hashing functions and data to be more streamlined
  • BUGFIXES
    • fixing non-split char bug
    • fixing minor issues: unnecessary exports, un-documented, ...
  • FEATURE
  • Development
  • FEATURE
    • rtext : tokenize_data_regex()
    • rtext : tokenize_data_lines()
    • rtext : tokenize_data_words()
  • Development
  • BUGFIXES
    • fixing which_token supressing results error
  • FEATURE
    • rtext : char_code() -> char_data_set()
    • rtext : char_get_code() -> char_data_get()
    • rtext : introducing tokens and token_data
    • introducing Rcpp to speed up: which_token_worker, which_token
  • Development
  • BUGFIXES
  • FEATURE
    • rtext : char_code()
    • rtext : char_length()
  • Development
  • BUGFIXES

    • rtext : text_get() and char_get() would return text decently encoded as UTF-8 nut fail to tell Windows about that
  • FEATURE

  • Development
  • BUGFIXES
  • FEATURE
    • rtext : save()
    • rtext : load()
    • rtext : text is tokenized into characters and then stored in characters
    • rtext : char_add()
    • rtext : char_delete()
    • rtext : char_replace()
    • rtext : text_hash()
    • rtext : hash_text()
  • Development
  • BUGFIXES

    • rtext : getting tokenization on init right
  • FEATURE

    • dp_text : !!! rename to rtext :-) !!!
  • Development
  • BUGFIXES

    • fixing documentation and minor build check complaints
  • FEATURE

    • dp_text() : add tokenization to initializetion stage
  • Development
  • BUGFIXES
  • FEATURES
    • tools : text_tokenize()
    • tools : text_tokenize_words()
  • Development
  • BUGFIXES
  • FEATURES
    • dp_text : show_text()
    • dp_text : info()
    • dp_text : get_text()
  • Development
  • FEATURES
    • dp_text : an object for text (basic layout)
    • tools : text_read() function for reading text
    • tools : text_snippet() function for getting snippet of text
  • Development
  • START of development

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("rtext")

0.1.20 by Peter Meissner, 7 months ago


https://github.com/petermeissner/rtext


Report a bug at https://github.com/petermeissner/rtext/issues


Browse source code at https://github.com/cran/rtext


Authors: Peter Meissner [aut, cre], Ulrich Sieberer [cph], University of Konstanz [cph]


Documentation:   PDF Manual  


MIT + file LICENSE license


Imports R6, hellno, magrittr, Rcpp, digest, RSQLite, stats, graphics

Depends on stringb

Suggests testthat, knitr, rmarkdown

Linking to Rcpp


Depended on by diffrprojects.


See at CRAN