## Online Learning with Adaptive Local Step Sizes

N. N. Schraudolph. Online Learning with Adaptive Local Step Sizes. In Neural Nets – WIRN Vietri-99: Proc. 11th Italian Workshop on Neural Networks, pp. 151–156, Springer Verlag, Berlin, Vietri sul Mare, Salerno, Italy, 1999.


### Abstract

Almeida et al. have recently proposed online algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton's work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods as well as stochastic gradient descent with fixed learning rate and momentum.

### BibTeX Entry

@inproceedings{Schraudolph99c,
author = {Nicol N. Schraudolph},
title = {\href{http://nic.schraudolph.org/pubs/Schraudolph99c.pdf}{
Online Learning with Adaptive Local Step Sizes}},
pages = {151--156},
editor = {Maria Marinaro and Roberto Tagliaferri},
booktitle = {Neural Nets\,---\,WIRN Vietri-99: Proc.\ 11$^{th}$
Italian Workshop on Neural Networks},
series = {Perspectives in Neural Computing},
address = {Vietri sul Mare, Salerno, Italy},
publisher = {\href{http://www.springer.de/}{Springer Verlag}, Berlin},
year =  1999,
b2h_type = {Other},
b2h_topic = {>Stochastic Meta-Descent},
abstract = {
Almeida {\em et al.}\ have recently proposed {\em online}\/
algorithms for local step size adaptation in nonlinear systems
trained by gradient descent.  Here we develop an alternative to their
approach by extending Sutton's work on linear systems to the general,
nonlinear case.  The resulting algorithms are computationally little
more expensive than other acceleration techniques, do not assume
statistical independence between successive training patterns, and
do not require an arbitrary smoothing parameter.  In our benchmark
experiments, they consistently outperform other acceleration methods
as well as stochastic gradient descent with fixed learning rate
and momentum.
}}

