Improving compound-protein interaction prediction by building up highly credible negative samples

Hui Liu; Jianjiang Sun; Jihong Guan; Jie Zheng; Shuigeng Zhou

doi:10.1093/bioinformatics/btv256

Bioinformatics. 2015 Jun 15;31(12):i221-9. doi: 10.1093/bioinformatics/btv256.

Authors

Hui Liu¹, Jianjiang Sun², Jihong Guan², Jie Zheng², Shuigeng Zhou²

Affiliations

¹ Lab of Information Management, Changzhou University, Jiangsu 213164, China, School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore, Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200433, China and Department of Computer Science and Technology, Tongji University, Shanghai 201804, China.
² Lab of Information Management, Changzhou University, Jiangsu 213164, China, School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore, Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200433, China and Department of Computer Science and Technology, Tongji University, Shanghai 201804, China.

Abstract

Motivation: Computational prediction of compound-protein interactions (CPIs) is of great importance for drug design and development, as genome-scale experimental validation of CPIs is not only time-consuming but also prohibitively expensive. With the availability of an increasing number of validated interactions, the performance of computational prediction approaches is severely impeded by the lack of reliable negative CPI samples. A systematic method of screening reliable negative samples therefore becomes critical to improving the performance of in silico prediction methods.

Results: This article aims to build a set of highly credible negative CPI samples via an in silico screening method. Most existing computational models assume that similar compounds are likely to interact with similar target proteins, and they achieve remarkable performance under that assumption. It is therefore rational to identify potential negative samples based on the converse proposition: proteins dissimilar to every known or predicted target of a compound are unlikely to be targeted by that compound, and vice versa. We integrated various resources, including chemical structures, chemical expression profiles and side effects of compounds, as well as amino acid sequences, the protein-protein interaction network and functional annotations of proteins, into a systematic screening framework. We first tested the screened negative samples on six classical classifiers, and all of these classifiers achieved remarkably higher performance on our negative samples than on randomly generated negative samples for both human and Caenorhabditis elegans. We then verified the negative samples on three existing prediction models, namely the bipartite local model, Gaussian kernel profile and Bayesian matrix factorization, and found that the performance of these models is also significantly improved on the screened negative samples. Moreover, we validated the screened negative samples on a drug bioactivity dataset. Finally, we derived two sets of new interactions by training a support vector machine classifier on the positive interactions annotated in DrugBank and our screened negative interactions. The screened negative samples and the predicted interactions provide the research community with a useful resource for identifying new drug targets and a helpful supplement to the current curated compound-protein databases.
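
The screening idea in the Results paragraph can be illustrated with a minimal Python sketch. It is not the authors' framework: it assumes a single precomputed similarity score per compound pair and per protein pair (the article integrates several resources), and the function names, the threshold of 0.3, the default of 0.0 for unlisted pairs and the toy data are all hypothetical choices made for illustration.

```python
from itertools import product


def sim(table, a, b):
    """Symmetric similarity lookup.

    Assumption of this sketch: pairs missing from the table score 0.0.
    """
    return 1.0 if a == b else table.get(frozenset((a, b)), 0.0)


def screen_negatives(interactions, compounds, proteins,
                     compound_sim, protein_sim, threshold=0.3):
    """Return compound-protein pairs that look like credible negatives.

    A pair (c, p) is kept only if p is dissimilar to every known target
    of c AND c is dissimilar to every compound known to target p.
    """
    negatives = []
    for c, p in product(compounds, proteins):
        if (c, p) in interactions:
            continue  # known positive, never a negative
        targets = {q for (d, q) in interactions if d == c}
        ligands = {d for (d, q) in interactions if q == p}
        if not targets or not ligands:
            continue  # conservative: no evidence on one side, so skip
        far_from_targets = max(sim(protein_sim, p, q) for q in targets) < threshold
        far_from_ligands = max(sim(compound_sim, c, d) for d in ligands) < threshold
        if far_from_targets and far_from_ligands:
            negatives.append((c, p))
    return sorted(negatives)


# Hypothetical toy data: two compounds, three proteins, two known interactions.
compounds = ["c1", "c2"]
proteins = ["p1", "p2", "p3"]
interactions = {("c1", "p1"), ("c2", "p3")}
compound_sim = {frozenset(("c1", "c2")): 0.1}
protein_sim = {
    frozenset(("p1", "p2")): 0.9,
    frozenset(("p1", "p3")): 0.1,
    frozenset(("p2", "p3")): 0.2,
}

negatives = screen_negatives(interactions, compounds, proteins,
                             compound_sim, protein_sim)
print(negatives)
```

In this toy case the pair (c1, p2) is rejected because p2 closely resembles p1, a known target of c1, whereas pairs that are dissimilar on both the compound side and the protein side are retained. Skipping pairs with no known partner on either side is a conservative choice of this sketch, not something stated in the abstract.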

Availability: Supplementary files are available at: http://admis.fudan.edu.cn/negative-cpi/.

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Bayes Theorem
Computer Simulation
Databases, Protein
Drug Discovery*
Humans
Pharmaceutical Preparations / chemistry
Protein Interaction Mapping
Proteins / chemistry*
Support Vector Machine

Substances

Pharmaceutical Preparations
Proteins