Abstract
| - Among the many dimensionality reduction techniques that have appeared in the statistical literature,multidimensional scaling and nonlinear mapping are unique for their conceptual simplicity and ability toreproduce the topology and structure of the data space in a faithful and unbiased manner. However, a majorshortcoming of these methods is their quadratic dependence on the number of objects scaled, which imposessevere limitations on the size of data sets that can be effectively manipulated. Here we describe a novelapproach that combines conventional nonlinear mapping techniques with feed-forward neural networks,and allows the processing of data sets orders of magnitude larger than those accessible with conventionalmethodologies. Rooted on the principle of probability sampling, the method employs a classical algorithmto project a small random sample, and then “learns” the underlying nonlinear transform using a multilayerneural network trained with the back-propagation algorithm. Once trained, the neural network can be usedin a feed-forward manner to project the remaining members of the population as well as new, unseen sampleswith minimal distortion. Using examples from the fields of image processing and combinatorial chemistry,we demonstrate that this method can generate projections that are virtually indistinguishable from thosederived by conventional approaches. The ability to encode the nonlinear transform in the form of a neuralnetwork makes nonlinear mapping applicable to a wide variety of data mining applications involving verylarge data sets that are otherwise computationally intractable.
|