Classification and Feature Selection for microRNA/mRNA Interactions

Comprises a pipeline for predicting microRNA/mRNA interactions, as detailed in Williams, Calinescu, Mohorianu (2020). Its input consists of [a] a messenger RNA (mRNA) dataset (either in fasta format, focused on 3' UTRs, or in gtf format; for the latter, the sequences of the 3' UTRs are generated using the genomic coordinates), [b] a microRNA dataset (in fasta format, retrieved from miRBase) and [c] an interaction dataset (in csv format, from miRTarBase). To characterise and predict microRNA/mRNA interactions, we use [a] statistical analyses based on Chi-squared and Fisher exact tests and [b] Machine Learning classifiers (decision trees, random forests and support vector machines). To enhance the accuracy of the classifiers, we also employ feature selection approaches in conjunction with them: a voting scheme for decision trees, a measure based on the Gini index for random forests, and forward feature selection and Genetic Algorithms for support vector machines (SVMs). The pipeline also includes a novel approach based on embryonic Genetic Algorithms, which combines and optimises forward feature selection and Genetic Algorithms. All analyses, including classification and feature selection, can be applied to the microRNA seed features (default), to the full microRNA features and/or to flanking features on the mRNA. The sets of features can be combined.
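To make the statistical step more concrete, the following is a minimal sketch in plain Python of how positional seed features might be encoded and then tested for association with interaction labels. It is not the package's own code or API: the function names (seed_one_hot, fisher_exact_two_sided, chi_squared_2x2) are hypothetical, the seed is assumed to be nucleotides 2 to 8 of the microRNA (a common convention that the package may define differently), and the counts in the usage lines are invented purely for illustration.

```python
from math import comb


def seed_one_hot(mirna, first=2, last=8):
    """One-hot encode the seed region of a microRNA sequence.

    Assumption: the seed spans 1-based positions first..last (2..8 by default).
    Returns a dict such as {"pos2_A": 0, "pos2_C": 0, "pos2_G": 1, ...}.
    """
    seed = mirna.upper().replace("T", "U")[first - 1:last]
    return {
        f"pos{i}_{nt}": int(base == nt)
        for i, base in enumerate(seed, start=first)
        for nt in "ACGU"
    }


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


if __name__ == "__main__":
    # Encode the seed of an example microRNA sequence.
    features = seed_one_hot("UGAGGUAGUAGGUUGUAUAGUU")
    print([name for name, value in features.items() if value])

    # Hypothetical counts for one feature:
    # rows = feature present / absent, columns = positive / negative interactions.
    print(fisher_exact_two_sided(3, 1, 1, 3))
    print(chi_squared_2x2(3, 1, 1, 3))
```

In a full analysis, a table of this kind would be built for each feature from the positive and negative interaction sets, and the resulting p-values would indicate which positions and nucleotides discriminate between the two classes.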



0.1.0 by Eleanor Williams, a year ago

Authors: Eleanor Williams [aut, cre], Irina Mohorianu [aut]

Documentation:   PDF Manual  

GPL-2 license

Imports stringr, randomForest, rpart, rpart.plot, GA, e1071, ggplot2, magrittr, tibble, dplyr, reticulate

Suggests parallel, doParallel

System requirements: Python (>=3.6), sreformat, patman
