N. Cesa-Bianchi, P. M. Long and M. K. Warmuth.
Worst-case quadratic loss bounds for
prediction using linear functions and gradient descent.
IEEE Transactions on Neural Networks, 7(3):604-619, 1996.
(Available in Postscript and PDF formats.)
Abstract
In this paper we study the performance of gradient
descent when applied to the problem of on-line linear
prediction in arbitrary inner product spaces. We prove
worst-case bounds on the sum of the squared prediction errors
under various assumptions concerning the amount of a
priori information about the sequence to predict. The algorithms
we use are variants and extensions of on-line gradient
descent. Whereas our algorithms always predict using linear
functions as hypotheses, none of our results requires the
data to be linearly related. In fact, the bounds proved on the
total prediction loss are typically expressed as a function of
the total loss of the best fixed linear predictor with bounded
norm. All the upper bounds are tight to within constants.
Matching lower bounds are provided in some cases. Finally,
we apply our results to the problem of online prediction for
classes of smooth functions.