This tool parses the 'DrugBank' XML database <https://www.drugbank.ca/>. The parsed data are returned as 'R' data frames, with the option of saving them to a given database.
The main purpose of the dbparser package is to parse the DrugBank database, which is downloadable in XML format from the DrugBank website. The parsed data can then be explored and analyzed as desired by the user. The dbparser package also provides the facility of saving the parsed data into a given database.
You can install the released version of dbparser from CRAN with:
install.packages("dbparser")
The following basic example loads a DrugBank XML file and parses a few of its elements into data frames:

    ## load dbparser package
    library(dbparser)
    ## load drugbank xml database
    get_xml_db_rows(system.file("extdata", "drugbank_record.xml", package = "dbparser"))
    ## load drugs data
    drugs <- parse_drug()
    ## load drug groups data
    drug_groups <- parse_drug_groups()
    ## load drug targets actions data
    drug_targets_actions <- parse_drug_targets_actions()
The parsed data may be saved into a given database. Databases supported by dbparser include MS SQL Server, MySQL and any database supported by the DBI package. The following is an example of saving the parsed data into an MS SQL Server database.
    library(dbparser)
    ## open a connection to the desired database engine with an already
    ## created database
    open_db(xml_db_name = "drugbank.xml", driver = "SQL Server",
            server = "ServerName\\\\SQL2016", output_database = "drugbank")
    ## save 'drugs' dataframe to DB
    parse_drug(TRUE)
    ## save 'drug_groups' dataframe to DB
    parse_drug_groups(TRUE)
    ## save 'drug_targets_actions' dataframe to DB
    parse_drug_targets_actions(TRUE)
    ## finally close db connection
    close_db()
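Since dbparser can target any database reachable through DBI, the general pattern of writing a data frame to a database can be sketched with DBI alone. The snippet below is only an illustration and not dbparser's own implementation: it assumes the DBI and RSQLite packages are installed, uses an in-memory SQLite database, and the `toy_drugs` data frame is a made-up placeholder standing in for parsed output.

```r
library(DBI)

## hypothetical stand-in for a parsed 'drugs' data frame
toy_drugs <- data.frame(
  primary_key = c("drug_1", "drug_2", "drug_3"),
  type = c("biotech", "biotech", "small molecule"),
  stringsAsFactors = FALSE
)

## open a connection to an in-memory SQLite database
## (assumption: the RSQLite package is installed)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

## write the data frame as a table, then read it back to check it
dbWriteTable(con, "drugs", toy_drugs, overwrite = TRUE)
saved_drugs <- dbReadTable(con, "drugs")

## close the connection when done
dbDisconnect(con)
```

Swapping the `dbConnect()` arguments for another DBI backend should leave the rest of the pattern unchanged, which is the main appeal of going through DBI.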
The following example takes a quick look at a few aspects of the parsed data. First, we look at the proportions of biotech and small-molecule drugs in the data.
    ## load packages used for the exploration
    library(dplyr)
    library(ggplot2)
    ## view proportions of the different drug types (biotech vs. small molecule)
    drugs %>%
      select(type) %>%
      ggplot(aes(x = type)) +
      geom_bar() +
      guides(fill = "none") ## removes legend for the bar colors
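The bar chart shows counts per drug type; the same comparison can also be expressed numerically as proportions. This is a minimal sketch, assuming dplyr is installed and using a small made-up data frame (`toy_drugs`, hypothetical) in place of the parsed `drugs` data:

```r
library(dplyr)

## hypothetical stand-in for the parsed 'drugs' data frame
toy_drugs <- data.frame(
  primary_key = c("drug_1", "drug_2", "drug_3", "drug_4"),
  type = c("biotech", "small molecule", "small molecule", "small molecule"),
  stringsAsFactors = FALSE
)

## count rows per drug type and convert the counts to proportions
type_proportions <- toy_drugs %>%
  count(type) %>%
  mutate(proportion = n / sum(n))

type_proportions
```

With the toy data above, biotech accounts for a proportion of 0.25 and small molecule for 0.75; on real parsed data, `toy_drugs` would be replaced by `drugs`.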
Below, we view the different drug_groups in the data and how prevalent they are.
    ## view proportions of the different drug types for each drug group
    drugs %>%
      rename(parent_key = primary_key) %>%
      full_join(drug_groups, by = 'parent_key') %>%
      select(type, text) %>%
      ggplot(aes(x = text, fill = type)) +
      geom_bar() +
      theme(legend.position = 'bottom') +
      labs(x = 'Drug Group',
           y = 'Quantity',
           title = "Drug Type Distribution per Drug Group",
           caption = "created by ggplot") +
      coord_flip()
Finally, we look at the drug_targets_actions to observe their proportions as well.
    ## get counts of the different target actions in the data
    targetActionCounts <-
      drug_targets_actions %>%
      group_by(text) %>%
      summarise(count = n()) %>%
      arrange(desc(count))
    ## get bar chart of the 10 most occurring target actions in the data
    p <-
      ggplot(targetActionCounts[1:10, ],
             aes(x = reorder(text, count), y = count, fill = letters[1:10])) +
      geom_bar(stat = 'identity') +
      labs(fill = 'action',
           x = 'Target Action',
           y = 'Quantity',
           title = 'Target Actions Distribution',
           subtitle = 'Distribution of Target Actions in the Data',
           caption = 'created by ggplot') +
      guides(fill = "none") + ## removes legend for the bar colors
      coord_flip() ## switches the X and Y axes
    ## display plot
    p
Added a NEWS.md file to track changes to the package.