Enrichment analysis enables researchers to uncover mechanisms
underlying a phenotype. However, conventional methods for enrichment
analysis do not take into account protein-protein interaction information,
resulting in incomplete conclusions. pathfindR is a tool for enrichment
analysis utilizing active subnetworks. The main function identifies active
subnetworks in a protein-protein interaction network using a user-provided
list of genes and associated p values. It then performs enrichment analyses
on the identified subnetworks, identifying enriched terms (i.e. pathways or,
more broadly, gene sets) that possibly underlie the phenotype of interest.
pathfindR also offers functionalities to cluster the enriched terms and
identify representative terms in each cluster, to score the enriched terms
per sample and to visualize analysis results. The enrichment, clustering and
other methods implemented in pathfindR are described in detail in
Ulgen E, Ozisik O, Sezerman OU. 2019. pathfindR: An R Package for
Comprehensive Identification of Enriched Pathways in Omics Data Through
Active Subnetworks. Front. Genet.
- Split run_pathfindR into individual functions: active_snw_search, enrichment_analyses, summarize_enrichment_results, annotate_pathway_DEGs, visualize_pws.
- Renamed pathmap as visualize_hsa_KEGG and updated the function to produce different visualizations for inputs with binary change values (ordered) and no change values (the input_processing function assigns a change value of 100 to all).
- Added visualize_pw_interactions, which creates PNG files visualizing the interactions (in the selected PIN) of genes involved in the given pathways.
- Added create_kappa_matrix, hierarchical_pw_clustering, fuzzy_pw_clustering and cluster_pathways.
- Added cluster_graph_vis for visualizing graph diagrams of clustering results.
- Fixed an issue where the arguments score_quan_thr and sig_gene_thr for run_pathfindR were not being utilized.
- In run_pathfindR, added a message at the end of the run reporting the number of enriched pathways.
- run_pathfindR now creates a variable org_dir that is the "path/to/original/working/directory". org_dir is used in multiple functions to return to the original working directory if anything fails. Previously, if a function stopped with an error, the directory was changed to "..", i.e. the parent directory. The change was adopted so that the user is returned to the original working directory even if they supply a recursive (nested) output folder (output_dir, e.g. "./ALL_RESULTS/RESULT_A").
- In input_processing, added the argument human_genes to only perform alias symbol conversion when human gene symbols are provided.
- Updated the Rmd files used to create the report HTML files.
- Added GO-All, all annotations in the GO database (BP+MF+CC).
- Updated "pathfindR - An R Package for Pathway Enrichment Analysis Utilizing Active Subnetworks" to reflect the new functionalities.
- In plot_scores, added the argument label_cases to indicate whether or not to label the cases in the pathway scoring heatmap plot. Also added the argument case_control_titles, which allows the user to change the default ‘Case’ and ‘Control’ headers, and the arguments low and high, used to change the low and high end colors of the scoring color gradient.
- In plot_scores, reversed the color gradient to match the coloring scheme used by pathview (i.e. red for positive values, green for negative values).
- In parseActiveSnwSearch, replaced score_thr by score_quan_thr. This was done so that the scoring filter for active subnetworks could be performed based on the distribution of the current active subnetworks and not using a constant empirical score value threshold.
- In parseActiveSnwSearch, increased sig_gene_thr from 2 to 10, as we observed that in most cases this resulted in faster runs with comparable results.
- In choose_clusters, added the argument p_val_threshold to be used as the p value threshold for filtering the enriched pathways prior to clustering.
- ... pathview.
- In choose_clusters, added the option to use pathway names instead of pathway ids when visualizing the clustering dendrogram and heatmap.
- Added support for custom gene sets in run_pathfindR. For this, the gene_sets argument should be set to "Custom" and custom_genes and custom_pathways should be provided.
- Fixed a bug in calculate_pw_scores where, if there was only one DEG, subsetting the experiment matrix failed.
- ... calculate_pw_scores. If there is none, the pathway is skipped.
- In calculate_pw_scores, if cases are provided, the pathways are reordered according to their activity in cases before plotting the heat map and returning the matrix. This way, "up" pathways are grouped together, same for "down" pathways.
- In calculate_pwd, if a pathway has perfect overlap with other pathways, the correlation value is set to 1 instead of NA.
- In choose_clusters, if result_df has fewer than 3 pathways, clustering is not performed.
- run_pathfindR checks whether the output directory (output_dir) already exists and, if it exists, now appends "(1)" to output_dir and displays a warning message. This was implemented to prevent writing over existing results.
- In run_pathfindR, recursive creation of the output directory (output_dir) is now supported.
- In run_pathfindR, if no pathways are found, the function returns an empty data frame instead of raising an error.
- Implemented the (per subject) pathway scoring function calculate_pw_scores and the function to plot the heatmap of pathway scores per subject, plot_scores.
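The output_dir behavior described in the entries above (appending "(1)" when the directory exists, recursive creation, and returning to the original working directory on failure) can be sketched in base R. This is only an illustration under stated assumptions: resolve_output_dir and run_in_output_dir are hypothetical helper names, not pathfindR functions, and the package's actual implementation may differ.

```r
# Hypothetical helper (not part of pathfindR): choose a directory that does
# not overwrite existing results, then create it (including parent folders).
resolve_output_dir <- function(output_dir) {
  if (dir.exists(output_dir)) {
    new_dir <- paste0(output_dir, "(1)")
    warning("'", output_dir, "' already exists; using '", new_dir, "' instead")
    output_dir <- new_dir
  }
  # recursive = TRUE allows nested paths such as "./ALL_RESULTS/RESULT_A"
  dir.create(output_dir, recursive = TRUE)
  output_dir
}

# Hypothetical wrapper: run `work` inside the output directory and always
# return to the original working directory, even if `work` raises an error.
run_in_output_dir <- function(output_dir, work) {
  org_dir <- getwd()
  on.exit(setwd(org_dir), add = TRUE)  # executed on normal exit and on error
  setwd(resolve_output_dir(output_dir))
  work()
}
```

Registering the return with on.exit is one way to guarantee the working directory is restored regardless of where a failure occurs, which is the effect the org_dir entry describes.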
Added the auto parameter to choose_clusters. When auto == TRUE (default), the function chooses the optimal number of clusters k automatically, as the value which maximizes the average silhouette width. It then returns a data frame with the cluster assignments and the representative/member statuses of each pathway.
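Selecting k by maximum average silhouette width can be illustrated with a minimal sketch. It assumes average-linkage hierarchical clustering on a generic distance object and uses silhouette() from the cluster package (a recommended package distributed with R); choose_k_by_silhouette is a hypothetical name, and the toy data stand in for pathway distances, so this is not the code inside choose_clusters.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper: return the k in k_range with the largest
# average silhouette width for a hierarchical clustering of dist_mat.
choose_k_by_silhouette <- function(dist_mat, k_range = 2:5) {
  hc <- hclust(dist_mat, method = "average")
  avg_sil <- sapply(k_range, function(k) {
    memberships <- cutree(hc, k = k)           # cluster label per item
    sil <- silhouette(memberships, dist_mat)   # per-item silhouette values
    mean(sil[, "sil_width"])
  })
  k_range[which.max(avg_sil)]
}

# Toy data: three well-separated groups of points on a line
toy <- c(0, 0.1, 0.2, 5, 5.1, 5.2, 10, 10.1, 10.2)
best_k <- choose_k_by_silhouette(dist(toy), k_range = 2:5)
```

On this toy input the three groups are far apart relative to their spread, so the average silhouette width peaks at k = 3.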
Added the Fold_Enrichment column to the resulting data frame of enrichment, and consequently to the resulting data frame of run_pathfindR.
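The changelog does not state how Fold_Enrichment is computed. The sketch below assumes one common definition, the proportion of input genes annotated to a term divided by the proportion of background genes annotated to it, alongside a hypergeometric p value. The function and argument names are hypothetical and are not taken from pathfindR.

```r
# Hypothetical helper: fold enrichment under the assumed definition
#   (hits / n_input) / (term_size / n_background)
fold_enrichment <- function(hits, n_input, term_size, n_background) {
  (hits / n_input) / (term_size / n_background)
}

# Hypergeometric enrichment p value, P(X >= hits), where X is the number of
# term genes among n_input genes drawn from the background.
enrichment_p <- function(hits, n_input, term_size, n_background) {
  phyper(hits - 1, term_size, n_background - term_size, n_input,
         lower.tail = FALSE)
}

# Example: 10 of 100 input genes fall in a 200-gene term,
# with a background of 20000 genes.
fe <- fold_enrichment(10, 100, 200, 20000)
p  <- enrichment_p(10, 100, 200, 20000)
```

Under this definition a value above 1 means the term is over-represented in the input relative to the background.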
Added the option bubble to plot a bubble chart displaying the enrichment results in run_pathfindR using the helper function enrichment_chart. To plot the bubble chart, set bubble = TRUE in run_pathfindR or use enrichment_chart(your_result_df).
Added the parameter silent_option to run_pathfindR. When silent_option == TRUE (default), the console outputs during active subnetwork search are printed to a file named "console_out.txt". If silent_option == FALSE, the output is printed on the screen. The default was set to TRUE because multiple console outputs are printed simultaneously when running in parallel.
Added the list_active_snw_genes parameter to run_pathfindR. When list_active_snw_genes == TRUE, the function adds the column non_DEG_Active_Snw_Genes, which reports the non-DEG active subnetwork genes for the active subnetwork which was enriched for the given pathway with the lowest p value.
Added the data RA_clustered, which is the example output of the clustering workflow.
In run_pathfindR, added the argument output_dir, which specifies the directory to be created under the current working directory for storing the result HTML files. output_dir is "pathfindR_Results" by default.
run_pathfindR now checks whether the output directory (output_dir) already exists and, if it exists, stops and displays an error message. This was implemented to prevent writing over existing results.
genes_table.html now contains a second table displaying the input gene symbols for which there were no interactions in the PIN.
- Added the gene_sets option in run_pathfindR to choose between different gene sets. Available gene sets are KEGG, Reactome, BioCarta and Gene Ontology gene sets (GO-BP, GO-CC and GO-MF).
- cluster_pathways automatically recognizes the ID type and chooses the gene sets accordingly.
- ... input_processing.
- In input_processing, genes for which no interactions are found in the PIN are now removed before active subnetwork search.
- ... input_processing.
- run_pathfindR returns to the user's working directory.
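The removal of input genes without interactions in the PIN can be sketched as a simple membership filter. The column names (gene, p_value, gene_a, gene_b), the toy gene symbols and the helper split_by_pin are assumptions made for illustration; they do not describe the internals of input_processing.

```r
# Hypothetical helper: separate input genes that appear in the PIN from
# those with no interactions, assuming the PIN is given as an edge list.
split_by_pin <- function(input_df, pin_edges) {
  pin_genes <- unique(c(pin_edges$gene_a, pin_edges$gene_b))
  in_pin <- input_df$gene %in% pin_genes
  list(
    kept = input_df[in_pin, , drop = FALSE],  # passed on to the subnetwork search
    no_interaction = input_df$gene[!in_pin]   # could be listed in a second report table
  )
}

# Toy input: GENE_X is a made-up symbol that is absent from the toy PIN
input_df <- data.frame(
  gene = c("TP53", "BRCA1", "GENE_X"),
  p_value = c(0.001, 0.02, 0.04),
  stringsAsFactors = FALSE
)
pin_edges <- data.frame(
  gene_a = c("TP53", "MDM2"),
  gene_b = c("MDM2", "BRCA1"),
  stringsAsFactors = FALSE
)
res <- split_by_pin(input_df, pin_edges)
```

Keeping the dropped symbols in a separate vector mirrors the idea of reporting them to the user rather than discarding them silently.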