N. H. Bshouty and P. M. Long. Linear classifiers are
nearly optimal when hidden variables have diverse effects. COLT'09.
Abstract
We analyze classification problems in which data is generated
by a two-tiered random process. The class is generated
first, then a layer of conditionally independent hidden variables, and
finally the observed variables. For sources like this, the
Bayes-optimal rule for predicting the class given the values of the
observed variables is a two-layer neural network. We show that, if
the hidden variables have non-negligible effects on many observed
variables, a linear classifier approximates the error rate of the
Bayes-optimal classifier up to lower-order terms.
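
As an illustration of the setup described above, the following Python sketch simulates one two-tiered source and writes out its Bayes-optimal score next to a simple linear score. The specifics are assumptions made for illustration and are not taken from the paper: the class and all variables are binary, the class prior is uniform, each observed bit depends on exactly one hidden bit, and the parameters `p_hidden` and `p_obs` as well as all function names are hypothetical. Under these assumptions the Bayes log-odds is a sum, over hidden variables, of a nonlinear function of a linear statistic of the observed bits, which is the two-layer structure the abstract refers to.

```python
import math
import random


def sample(rng, k, m, p_hidden, p_obs):
    """Draw (y, x) from an assumed two-tiered source.

    y is a uniform class bit; each of the k hidden bits copies y with
    probability p_hidden; each hidden bit emits m observed bits that copy
    it with probability p_obs.
    """
    y = rng.randrange(2)
    x = []
    for _ in range(k):
        h = y if rng.random() < p_hidden else 1 - y
        x.extend(h if rng.random() < p_obs else 1 - h for _ in range(m))
    return y, x


def bayes_log_odds(x, k, m, p_hidden, p_obs):
    """Return log P(y=1 | x) - log P(y=0 | x) for the assumed source."""
    total = 0.0
    for i in range(k):
        ones = sum(x[i * m:(i + 1) * m])  # linear statistic of group i
        # Likelihood of the group's bits given hidden bit h = 1 or h = 0.
        l1 = p_obs ** ones * (1 - p_obs) ** (m - ones)
        l0 = (1 - p_obs) ** ones * p_obs ** (m - ones)
        # Marginalize the hidden bit under each class: one "hidden unit".
        total += math.log(p_hidden * l1 + (1 - p_hidden) * l0)
        total -= math.log((1 - p_hidden) * l1 + p_hidden * l0)
    return total


def linear_vote(x):
    """Unweighted linear score: (number of ones) - (number of zeros)."""
    return 2 * sum(x) - len(x)


def error_rate(score, n, k, m, p_hidden, p_obs, seed=0):
    """Monte Carlo estimate of the error of the rule 'predict 1 iff score > 0'."""
    rng = random.Random(seed)
    mistakes = 0
    for _ in range(n):
        y, x = sample(rng, k, m, p_hidden, p_obs)
        mistakes += int((score(x) > 0) != (y == 1))
    return mistakes / n
```

To compare the two rules empirically, one could call `error_rate(linear_vote, ...)` and `error_rate(lambda x: bayes_log_odds(x, k, m, p_hidden, p_obs), ...)` with the same parameters. Any gap observed this way is specific to this toy source and says nothing about the bounds proved in the paper.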