Host: Sohan Seth
Abstract:
Searching a database for similar compounds is the most important process in in-silico drug screening.
Because a query compound is an important starting point for a new drug, a query holder who fears that the query will be monitored by the database server usually downloads every record in the database and searches it within a closed network. A serious dilemma arises, however, when the database holder also wants to reveal no information beyond the search results, and this dilemma prevents efficient use of many important databases. There is therefore an emerging demand for new technology that overcomes this dilemma. In this study, we propose a novel protocol that enables database searching while preserving the privacy of both the query holder and the database holder. Generally, such privacy-preserving protocols entail highly time-consuming cryptographic techniques such as general-purpose multi-party computation, but our protocol is designed without relying on such techniques and is built solely from an additive-homomorphic cryptosystem. Hence it is highly efficient in both CPU time and communication size, and it scales easily to large databases. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method is 50,000 times faster in CPU time and 12,000 times more efficient in communication size than general-purpose multi-party computation. So far, technology related to privacy issues has scarcely been discussed in the field of bioinformatics; we therefore think our study serves as an important early model for examining practical applications of privacy-preserving data mining.
Last updated on 2 Sep 2013 by Antti Ukkonen - Page created on 2 Sep 2013 by Antti Ukkonen