Abstract
| - The concept of compound class-specific profiling and scaling of molecular fingerprints for similarity searching is discussed and applied to newly designed fingerprint representations. The approach is based on the analysis of characteristic patterns of bits in keyed fingerprints that are set on in compounds having equivalent biological activity. Once a fingerprint profile is generated for a particular activity class, scaling factors that are weighted according to observed bit frequencies are applied to signature bit positions when searching for similar compounds. In systematic similarity search calculations over 23 diverse activity classes, profile scaling consistently increased the performance of fingerprints containing property descriptors and/or structural keys. A significant improvement of ∼15% was observed for a new fingerprint consisting of binary encoded molecular property descriptors and structural keys. Under scaling conditions, this fingerprint, termed MP-MFP, correctly recognized on average close to 60% of all active test compounds, with only a few false positives. MP-MFP outperformed MACCS keys and other reference fingerprints. In general, optimum performance in scaling calculations was achieved at higher threshold values of the Tanimoto coefficient than in nonscaled calculations, thereby increasing the search selectivity. Putting relatively high weight on signature bit positions that were always, or almost always, set on was found to be the most effective scaling procedure. Analysis of class-specific search performance revealed that profile scaling of MP-MFP improved the similarity search results for each of the 23 activity classes.
|
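
The profiling and scaling idea summarized above can be illustrated with a minimal sketch. The abstract does not specify the scaling scheme, so everything concrete here is an assumption made for illustration: the function names, the frequency cutoff `min_freq`, the weight `factor`, and the use of a weighted Tanimoto coefficient in which signature bits simply count `factor` times. It should not be read as the MP-MFP implementation.

```python
from typing import List, Sequence


def bit_frequencies(fingerprints: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of compounds in one activity class that set each bit on."""
    n = len(fingerprints)
    n_bits = len(fingerprints[0])
    return [sum(fp[i] for fp in fingerprints) / n for i in range(n_bits)]


def scaling_weights(freqs: Sequence[float],
                    min_freq: float = 0.9,   # hypothetical cutoff for "almost always set on"
                    factor: float = 3.0      # hypothetical scaling factor
                    ) -> List[float]:
    """Give signature bits (frequency >= min_freq) a higher weight; others keep weight 1."""
    return [factor if f >= min_freq else 1.0 for f in freqs]


def weighted_tanimoto(a: Sequence[int], b: Sequence[int],
                      weights: Sequence[float]) -> float:
    """Tanimoto coefficient in which each bit position contributes its weight.

    With all weights equal to 1 this reduces to the usual c / (a + b - c).
    """
    common = sum(w for x, y, w in zip(a, b, weights) if x and y)
    either = sum(w for x, y, w in zip(a, b, weights) if x or y)
    return common / either if either else 0.0


# Toy 4-bit fingerprints for one hypothetical activity class.
actives = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
]
profile = bit_frequencies(actives)       # bits 0 and 3 are always set on
weights = scaling_weights(profile)       # those two bits become signature bits

query = [1, 1, 0, 1]
candidate = [1, 0, 1, 1]
unscaled = weighted_tanimoto(query, candidate, [1.0] * 4)
scaled = weighted_tanimoto(query, candidate, weights)
```

In this toy case the candidate shares both signature bits with the query, so its scaled similarity is higher than its unscaled similarity. That is in line with the observation that scaled searches can be run at higher Tanimoto thresholds.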