OpenBioLink: A benchmarking framework for large-scale biomedical link prediction.

May 30, 2020, 2:48 p.m. By: Merlyn Shelley


Machine learning research has moved quickly, and biomedical link prediction now has a dedicated framework for benchmarking the progress of its algorithms: OpenBioLink.

The OpenBioLink software suite is designed to provide a transparent, reproducible, and highly configurable evaluation framework.

It evaluates biomedical link prediction algorithms on large-scale, challenging benchmark datasets, with an emphasis on data quality and reproducibility.

Key points from the preprint

1. The OpenBioLink framework consists of three modules:

  • Graph creation module

  • Train-test-split creation module

  • Training and evaluation module

2. The graph creation module builds benchmark datasets from multiple public data sources.

3. The train-test-split creation module divides the dataset into training and test sets, either randomly or by time slices. In the training and evaluation module, models are trained with external graph embedding libraries such as PyKEEN and evaluated with hits@k, mean reciprocal rank (MRR), area under the receiver operating characteristic curve (ROC AUC), and area under the precision-recall curve (PR AUC).
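
The two splitting strategies mentioned above can be illustrated with a minimal Python sketch. It is not OpenBioLink's own code: the edge layout, the `year` field, and the function names `random_split` and `time_slice_split` are assumptions made purely for illustration.

```python
import random
from typing import List, Tuple

# An edge is (head, relation, tail, year). The year field is an assumption,
# added only so that a time-slice split can be demonstrated.
Edge = Tuple[str, str, str, int]


def random_split(edges: List[Edge], test_fraction: float = 0.25, seed: int = 0):
    """Shuffle the edges reproducibly and hold out a fraction for testing."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def time_slice_split(edges: List[Edge], cutoff_year: int):
    """Train on edges dated up to cutoff_year, test on edges dated after it."""
    train = [e for e in edges if e[3] <= cutoff_year]
    test = [e for e in edges if e[3] > cutoff_year]
    return train, test


# Toy edges with made-up identifiers.
edges = [
    ("GENE_A", "GENE_DISEASE", "DISEASE_X", 2016),
    ("GENE_B", "GENE_DISEASE", "DISEASE_Y", 2017),
    ("DRUG_C", "DRUG_TARGET", "GENE_A", 2019),
    ("DRUG_D", "DRUG_TARGET", "GENE_B", 2020),
]

rand_train, rand_test = random_split(edges, test_fraction=0.25, seed=0)
time_train, time_test = time_slice_split(edges, cutoff_year=2018)
```

A fixed seed keeps the random split reproducible, while the time-slice split mimics predicting newer findings from older ones.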

The OpenBioLink benchmark dataset has seven node types and 30 edge types, covering relationships among a wide range of ontology terms and biomedical entities.

Based on the confidence scores of the data sources, the benchmark dataset is available in four filter settings: high, medium, low, and all. To support a wide range of link prediction methods, the OpenBioLink benchmark graph is provided in directed (reverse edges) and undirected versions.
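
As a rough sketch of what confidence-based filtering could look like, the Python snippet below keeps only edges at or above a cutoff. The cutoff values, the normalised score range, and the name `filter_edges` are hypothetical; the actual thresholds used by OpenBioLink are not given here and may differ per data source.

```python
from typing import List, Tuple

# (head, relation, tail, confidence). A confidence normalised to [0, 1]
# is an assumption for this sketch.
ScoredEdge = Tuple[str, str, str, float]

# Hypothetical cutoffs, chosen only to illustrate the four settings.
CUTOFFS = {"all": 0.0, "low": 0.15, "medium": 0.4, "high": 0.7}


def filter_edges(edges: List[ScoredEdge], setting: str) -> List[ScoredEdge]:
    """Keep the edges whose confidence is at least the cutoff of the setting."""
    cutoff = CUTOFFS[setting]
    return [e for e in edges if e[3] >= cutoff]


# Toy edges with made-up identifiers and scores.
scored_edges = [
    ("GENE_A", "GENE_GENE", "GENE_B", 0.9),
    ("GENE_A", "GENE_GENE", "GENE_C", 0.5),
    ("GENE_B", "GENE_GENE", "GENE_D", 0.2),
    ("GENE_C", "GENE_GENE", "GENE_D", 0.05),
]

counts = {name: len(filter_edges(scored_edges, name)) for name in CUTOFFS}
```

In this sketch each stricter setting yields a subset of the looser one, so "all" is the largest graph and "high" the smallest.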

TransE and TransR graph embedding methods were run as a preliminary baseline evaluation. Hyperparameter optimisation was also carried out, so that the best model configuration could be trained and tested against the OpenBioLink benchmark dataset.
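
To make the baseline more concrete, here is a small Python sketch of TransE-style scoring followed by two rank-based metrics, MRR and the fraction of test cases ranked in the top k. It uses hand-written toy embeddings and the L2 variant of the TransE distance; it is neither PyKEEN's API nor the exact evaluation protocol of the preprint, and all names in it are illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """TransE plausibility: negative L2 distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_of_true_tail(h: Vector, r: Vector, true_tail: str,
                      candidates: Dict[str, Vector]) -> int:
    """Rank of the true tail among all candidates (1 is best)."""
    true_score = transe_score(h, r, candidates[true_tail])
    better = sum(
        1 for name, vec in candidates.items()
        if name != true_tail and transe_score(h, r, vec) > true_score
    )
    return better + 1


def mean_reciprocal_rank(ranks: List[int]) -> float:
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: List[int], k: int) -> float:
    """Fraction of test cases whose true entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy 2-D embeddings with made-up values; real embeddings are learned.
head = (0.0, 0.0)
relation = (1.0, 1.0)
candidates = {
    "DISEASE_X": (1.0, 1.0),
    "DISEASE_Y": (3.0, 0.0),
    "DISEASE_Z": (0.0, 4.0),
}

# Two hypothetical test triples sharing the same head and relation.
ranks = [
    rank_of_true_tail(head, relation, "DISEASE_X", candidates),
    rank_of_true_tail(head, relation, "DISEASE_Z", candidates),
]
mrr = mean_reciprocal_rank(ranks)
top1 = hits_at_k(ranks, 1)
```

Higher MRR and top-k values both mean that true edges are ranked closer to the top of the candidate list.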

The best of OpenBioLink is yet to come. More real-life datasets are expected to be added to the benchmark for training and testing, and OpenBioLink is expected to become a valuable tool for advancing biomedical research and development.

Paper preprint on arXiv.

PDF Link: OpenBioLink: A benchmarking framework for large-scale biomedical link prediction

GitHub Link: OpenBioLink