Pathway enrichment analysis enables researchers to uncover mechanisms
underlying the phenotype. pathfindR is a tool for pathway enrichment analysis
utilizing active subnetworks. It identifies active subnetworks in a
protein-protein interaction network (PIN) using a user-provided list of genes.
It then performs pathway enrichment analyses on the identified subnetworks.
pathfindR also offers functionalities to cluster enriched pathways and identify
representative pathways and to score the pathways per sample. The method is
described in detail in Ulgen E, Ozisik O, Sezerman OU. 2018. pathfindR: An R
Package for Pathway Enrichment Analysis Utilizing Active Subnetworks. bioRxiv.
In plot_scores, added the argument label_cases to indicate whether or not to label the cases in the pathway scoring heatmap plot. Also added the argument case_control_titles, which allows the user to change the default ‘Case’ and ‘Control’ headers. Also added the arguments low and high, used to change the low and high end colors of the scoring color gradient.
In plot_scores, reversed the color gradient to match the coloring scheme used by pathview (i.e., red for positive values, green for negative values).
score_quan_thr. This was done so that the scoring filter for active subnetworks could be performed based on the distribution of the current active subnetworks and not using a constant empirical score value threshold.
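The idea behind a quantile-based score filter such as score_quan_thr can be sketched as follows. This is a Python illustration, not pathfindR's R code: the function names, the linear-interpolation quantile, and the 0.80 default are assumptions made for the example.

```python
def quantile(values, q):
    """Quantile with linear interpolation between the two nearest order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)                          # index just below the target position
    upper = min(lower + 1, len(ordered) - 1)  # index just above, clamped to the end
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def filter_by_score_quantile(scores, score_quan_thr=0.80):
    """Return the indices of subnetworks scoring at or above the chosen quantile.

    The cutoff is derived from the current score distribution, so it adapts
    to each run instead of relying on a fixed empirical score value.
    """
    if not 0 <= score_quan_thr <= 1:
        raise ValueError("score_quan_thr must be between 0 and 1")
    if not scores:
        return []
    cutoff = quantile(scores, score_quan_thr)
    return [i for i, score in enumerate(scores) if score >= cutoff]


# Hypothetical subnetwork scores, for illustration only.
example_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
print(filter_by_score_quantile(example_scores, 0.8))
```

Because the cutoff is recomputed from each run's scores, the same setting keeps roughly the same share of subnetworks regardless of the absolute score scale.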
Changed sig_gene_thr from 2 to 10, as we observed that in most cases this resulted in faster runs with comparable results.
In choose_clusters, added the argument p_val_threshold to be used as the p value threshold for filtering the enriched pathways prior to clustering.
In choose_clusters, added the option to use pathway names instead of pathway IDs when visualizing the clustering dendrogram and heatmap.
run_pathfindR. For this, the gene_sets argument should be set to "Custom" and custom_pathways should be provided.
Fixed an issue in calculate_pw_scores where, if there was only one DEG, subsetting the experiment matrix failed.
calculate_pw_scores. If there is none, the pathway is skipped.
If cases are provided, the pathways are reordered according to their activity in cases before plotting the heat map and returning the matrix. This way, "up" pathways are grouped together, and likewise "down" pathways.
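One way to picture reordering pathways by their activity in cases is the sketch below. It is a Python illustration under stated assumptions, not the package's implementation: the function name, the dictionary layout, and the use of the mean score across case samples as the "activity" measure are all hypothetical.

```python
def reorder_by_case_activity(score_matrix, sample_ids, cases):
    """Order pathways by their mean score across the case samples, highest first.

    score_matrix maps a pathway name to its per-sample scores, in the same
    order as sample_ids. Sorting by mean case score places pathways that are
    "up" in cases at the top and pathways that are "down" at the bottom.
    """
    case_cols = [i for i, sample in enumerate(sample_ids) if sample in cases]
    if not case_cols:
        return dict(score_matrix)  # nothing to order by, keep the input order

    def mean_case_score(row):
        return sum(row[i] for i in case_cols) / len(case_cols)

    ordered = sorted(
        score_matrix.items(),
        key=lambda item: mean_case_score(item[1]),
        reverse=True,
    )
    return dict(ordered)


# Hypothetical scores for three pathways across three samples.
example_scores = {
    "pwA": [-1.0, -0.5, 0.8],
    "pwB": [1.2, 0.9, -0.7],
    "pwC": [0.1, 0.3, 0.0],
}
reordered = reorder_by_case_activity(example_scores, ["s1", "s2", "s3"], {"s1", "s2"})
print(list(reordered))
```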
In calculate_pwd, if a pathway has perfect overlap with other pathways, the correlation value is now set to 1 instead of NA.
If result_df has fewer than 3 pathways, clustering is not performed.
run_pathfindR checks whether the output directory (output_dir) already exists and, if it exists, now appends "(1)" to output_dir and displays a warning message. This was implemented to prevent writing over existing results.
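The output directory behavior described above can be sketched as follows. This is a Python illustration, not pathfindR's R code, and the function name is hypothetical. The source does not say what happens when the "(1)" directory also exists; in this sketch that case raises an error rather than overwriting anything.

```python
import os
import warnings


def resolve_output_dir(output_dir):
    """Create the output directory without writing over existing results.

    If output_dir already exists, "(1)" is appended to its name and a
    warning is issued. Missing parent directories are created as well.
    """
    if os.path.exists(output_dir):
        renamed = output_dir + "(1)"
        warnings.warn(
            f'There already is a directory named "{output_dir}". '
            f'Writing the results to "{renamed}" instead.'
        )
        output_dir = renamed
    # os.makedirs creates intermediate directories; it raises FileExistsError
    # if the final directory is already present (an assumption of this sketch).
    os.makedirs(output_dir)
    return output_dir
```

Calling the function twice with the same path creates the requested directory first and the "(1)" variant on the second call.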
In run_pathfindR, recursive creation of the output directory (output_dir) is now supported.
In run_pathfindR, if no pathways are found, the function returns an empty data frame instead of raising an error.
Implemented the (per subject) pathway scoring function calculate_pw_scores and the function to plot the heatmap of pathway scores per subject, plot_scores.
Added the auto parameter. If auto == TRUE (default), the function chooses the optimal number of clusters k automatically, as the value that maximizes the average silhouette width. It then returns a data frame with the cluster assignments and the representative/member statuses of each pathway.
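Choosing k as the value that maximizes the average silhouette width can be sketched as below. This is a Python illustration, not the package's R code. The function names are hypothetical, and the candidate partitions (one cluster assignment per k) are assumed to be supplied by the caller, for example by cutting a clustering dendrogram at each k.

```python
def average_silhouette_width(dist, labels):
    """Mean silhouette width for one partition, given a pairwise distance matrix."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)


def choose_k(dist, partitions):
    """Return the k whose partition has the largest average silhouette width.

    Ties are resolved in favor of the smallest k.
    """
    return max(
        sorted(partitions),
        key=lambda k: average_silhouette_width(dist, partitions[k]),
    )


# Hypothetical example: four items on a line forming two obvious groups.
positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in positions] for p in positions]
partitions = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
best_k = choose_k(dist, partitions)
print(best_k)
```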
Added the Fold_Enrichment column to the resulting data frame of enrichment and, as a corollary, to the resulting data frame of run_pathfindR.
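The changelog does not state how Fold_Enrichment is computed. As a rough illustration only, the Python sketch below uses a commonly used definition: the proportion of input genes that fall in the pathway, divided by the proportion of background genes that fall in the pathway. The function and parameter names are hypothetical and the formula is an assumption, not a confirmed description of pathfindR.

```python
def fold_enrichment(hits_in_input, input_size, pathway_size, background_size):
    """Ratio of the observed to the expected proportion of pathway genes.

    observed = share of the input genes that belong to the pathway
    expected = share of the background genes that belong to the pathway
    """
    if input_size <= 0 or pathway_size <= 0 or background_size <= 0:
        raise ValueError("sizes must be positive")
    observed = hits_in_input / input_size
    expected = pathway_size / background_size
    return observed / expected


# Hypothetical numbers: 8 of 16 input genes are in a pathway that covers
# 125 of 1000 background genes.
print(fold_enrichment(8, 16, 125, 1000))
```

Under this definition, values above 1 indicate that the pathway is over-represented among the input genes relative to the background.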
Added the option bubble to plot a bubble chart displaying the enrichment results in run_pathfindR using the helper function enrichment_chart. To plot the bubble chart, set bubble = TRUE in run_pathfindR or use enrichment_chart directly.
Added the parameter silent_option. If silent_option == TRUE (default), the console outputs during active subnetwork search are printed to a file named "console_out.txt". If silent_option == FALSE, the output is printed on the screen. The default was set to TRUE because multiple console outputs are printed simultaneously when running in parallel.
Added the list_active_snw_genes parameter. If list_active_snw_genes == TRUE, the function adds the column non_DEG_Active_Snw_Genes, which reports the non-DEG active subnetwork genes for the active subnetwork that was enriched for the given pathway with the lowest p value.
Added the data RA_clustered, which is the example output of the clustering workflow.
In the function run_pathfindR, added the option to specify the argument output_dir, which sets the directory to be created under the current working directory for storing the result HTML files. output_dir is "pathfindR_Results" by default.
run_pathfindR now checks whether the output directory (output_dir) already exists and, if it exists, stops and displays an error message. This was implemented to prevent writing over existing results.
genes_table.html now contains a second table displaying the input gene symbols for which there were no interactions in the PIN.
run_pathfindR to choose between different gene sets. Available gene sets are
BioCarta and Gene Ontology gene sets (
cluster_pathways automatically recognizes the ID type and chooses the gene sets accordingly.
In input_processing, genes for which no interactions are found in the PIN are now removed before active subnetwork search.
run_pathfindR returns to the user's working directory.