Abstract
| - This paper evaluates the effectiveness of various similarity coefficients for 2D similarity searching when multiple bioactive target structures are available. Similarity searches using several different activity classes within the MDL Drug Data Report and the Dictionary of Natural Products databases are performed using BCI 2D fingerprints. Using data fusion techniques to combine the resulting nearest neighbor lists, we obtain group recall results which, in many cases, are a considerable improvement on standard average recall values obtained for individual structures. It is shown that the degree of improvement can be related to the structural diversity of the activity class that is searched for, the best results being found for the most diverse groups. The group recall of active compounds using subsets of the class is also investigated: for highly self-similar activity classes, the group recall improvement saturates well before the full activity class size is reached. A rough correlation is found between the relative improvement using the group recall and the square of the number of unique compounds available in all of the merged lists. The Tanimoto coefficient is found unambiguously to be the best coefficient to use for the recovery of active compounds using multiple targets. Furthermore, when using the Tanimoto coefficient, the “MAX” fusion rule is found to be more effective than the “SUM” rule for the combination of similarity searches from multiple targets. The use of group recall can lead to improved enrichment in database searches and virtual screening.
|
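The Tanimoto coefficient and the “MAX” and “SUM” fusion rules named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: fingerprints are modelled as sets of on-bit positions rather than BCI 2D fingerprints, fusion is applied to raw similarity scores (an assumption; fusion can also be applied to rank positions), and the names `tanimoto`, `fuse_scores`, `rank`, and the toy compounds are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Assumption: a fingerprint is the set of bit positions that are switched on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: common bits / (bits in a + bits in b - common bits)."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def fuse_scores(targets: List[Fingerprint],
                database: Dict[str, Fingerprint],
                rule: str = "MAX") -> Dict[str, float]:
    """Combine the per-target similarity scores of each database compound.

    "MAX" keeps the highest score over all targets; "SUM" adds the scores.
    """
    if rule not in ("MAX", "SUM"):
        raise ValueError(f"unknown fusion rule: {rule!r}")
    fused = {}
    for name, fp in database.items():
        scores = [tanimoto(target, fp) for target in targets]
        fused[name] = max(scores) if rule == "MAX" else sum(scores)
    return fused


def rank(fused: Dict[str, float]) -> List[str]:
    """Order compounds by decreasing fused score (ties broken by name)."""
    return sorted(fused, key=lambda name: (-fused[name], name))


# Toy data: two structurally different targets and four database compounds.
targets = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
database = {
    "A": frozenset({1, 2, 3, 5}),      # close to the first target only
    "B": frozenset({10, 11, 12, 13}),  # close to the second target only
    "C": frozenset({1, 2, 10, 11}),    # moderately similar to both targets
    "D": frozenset({20, 21}),          # unrelated to either target
}

max_ranking = rank(fuse_scores(targets, database, "MAX"))
sum_ranking = rank(fuse_scores(targets, database, "SUM"))
```

In this toy case the two rules disagree on compounds A and C: MAX favors a compound that closely resembles one target, while SUM favors a compound that is moderately similar to several. The example only shows how the rules differ; it says nothing about which one retrieves more actives on real data.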