P. M. Long and V. B. Vega. Boosting and microarray data.
Machine Learning, 52(1):31-44, 2003.
(Available in Postscript and PDF formats. Software also
available.)
Abstract
We have found one reason why AdaBoost tends not to perform well
on gene expression data, and identified simple modifications
that improve its ability to find accurate class prediction rules.
These modifications appear to be needed especially
when there is a strong association between expression profiles
and class designations. Cross-validation analysis of six microarray
datasets with different characteristics suggests that, suitably modified,
boosting provides competitive classification accuracy in general.
Sometimes the goal in a microarray analysis is to find a class prediction rule
that is not only accurate, but that also depends on the expression levels
of only a few genes. Because boosting makes an effort to
find genes that are complementary sources of
evidence of the correct classification of a tissue sample,
it appears especially useful for such gene-efficient
class prediction. This appears to be particularly true
when there is a strong association between expression profiles and
class designations, which is often the case, for example, when comparing
tumor and normal samples.
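
To make the idea of gene-efficient class prediction concrete, the following is a minimal sketch of standard AdaBoost over single-gene threshold rules (decision stumps). It is not the modified algorithm from the paper, whose details the abstract does not give. The function names (best_stump, adaboost, predict, genes_used) and the toy expression matrix are hypothetical and chosen only for illustration. Each round selects one gene, so the number of rounds bounds the number of genes the final rule depends on.

```python
import math


def stump_output(row, gene, threshold, sign):
    """Predict +1/-1 from a single gene's expression level."""
    return sign if row[gene] > threshold else -sign


def best_stump(X, y, w):
    """Return (weighted_error, gene, threshold, sign) for the best stump."""
    best = None
    for gene in range(len(X[0])):
        values = sorted(set(row[gene] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_output(row, gene, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, gene, t, sign)
    return best


def adaboost(X, y, rounds):
    """Plain AdaBoost; returns a list of (alpha, gene, threshold, sign)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, gene, t, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, gene, t, sign))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_output(row, gene, t, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Weighted vote of the selected single-gene rules."""
    score = sum(a * stump_output(row, g, t, s) for a, g, t, s in model)
    return 1 if score > 0 else -1


def genes_used(model):
    """Indices of the distinct genes the rule depends on."""
    return sorted({g for _, g, _, _ in model})


# Toy data (made up): 6 samples x 4 genes; +1 = tumor, -1 = normal.
# Genes 0-2 each miss a different tumor sample; gene 3 is noise.
X = [[5.0, 1.0, 5.0, 2.0],
     [5.0, 5.0, 1.0, 4.0],
     [1.0, 5.0, 5.0, 2.0],
     [1.0, 1.0, 1.0, 4.0],
     [1.0, 1.0, 1.0, 2.0],
     [1.0, 1.0, 1.0, 4.0]]
y = [1, 1, 1, -1, -1, -1]

model = adaboost(X, y, rounds=3)
train_errors = sum(predict(model, row) != yi for row, yi in zip(X, y))
print("genes used:", genes_used(model), "training errors:", train_errors)
```

On this toy data no single gene separates the classes, but reweighting pushes each round toward a gene that corrects the previous round's mistakes, so three rounds yield a rule over three complementary genes. Training error on six made-up samples says nothing about real accuracy; the abstract's claims rest on cross-validation over actual microarray datasets.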