Towards Stochastic Conjugate Gradient Methods

N. N. Schraudolph and T. Graepel. Towards Stochastic Conjugate Gradient Methods. In Proc. 9th Intl. Conf. Neural Information Processing (ICONIP), pp. 853–856, IEEE, 2002.

Download

pdf (61.9kB), djvu (57.3kB), ps.gz (33.8kB)

Abstract

The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore a number of ways to adopt ideas from conjugate gradient in the stochastic setting, using fast Hessian-vector products to obtain curvature information cheaply. In our benchmark experiments the resulting highly scalable algorithms converge about an order of magnitude faster than ordinary stochastic gradient descent.
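To make the idea of a Hessian-vector product concrete, here is a minimal sketch in Python. It is not the method used in the paper, which the abstract does not spell out. It substitutes a plain central-difference approximation that obtains H v from two gradient evaluations without ever forming the Hessian H. The quadratic objective, the function names (`grad`, `hessian_vector_product`), and the step size `eps` are all assumptions made for illustration.

```python
# Illustrative sketch only: approximate a Hessian-vector product H v
# from two gradient evaluations, without building the Hessian itself.
# The objective and every name below are assumptions for this example.

def grad(w, A, b):
    """Gradient of f(w) = 0.5 * w^T A w - b^T w, i.e. A w - b (A symmetric)."""
    n = len(w)
    return [sum(A[i][j] * w[j] for j in range(n)) - b[i] for i in range(n)]


def hessian_vector_product(grad_fn, w, v, eps=1e-4):
    """Central-difference estimate of H v: (g(w + eps*v) - g(w - eps*v)) / (2*eps)."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad_fn(w_plus)
    g_minus = grad_fn(w_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


# Small symmetric test problem (hypothetical values).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, -1.0]
w = [0.5, -0.25]
v = [1.0, 2.0]

hv = hessian_vector_product(lambda x: grad(x, A, b), w, v)

# For a quadratic the Hessian is A, so the exact product is A v.
exact = [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
```

For a quadratic objective the central difference agrees with A v up to floating-point rounding. The cost is two gradient evaluations per product, which is what makes curvature information affordable in large systems.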

BibTeX Entry

@inproceedings{SchGra02b,
     author = {Nicol N. Schraudolph and Thore Graepel},
      title = {\href{http://nic.schraudolph.org/pubs/SchGra02b.pdf}{
               Towards Stochastic Conjugate Gradient Methods}},
      pages = {853--856},
     editor = {Lipo Wang and Jagath C. Rajapakse and Kunihiko Fukushima
               and Soo-Young Lee and Xin Yao},
  booktitle = {Proc.\ 9$^{th}$ Intl.\ Conf.\ Neural
               Information Processing (ICONIP)},
  publisher = {IEEE},
       year =  2002,
   b2h_note = {<a href="b2hd-SchGra03.html">Related paper</a>},
   b2h_type = {Other},
  b2h_topic = {Gradient Descent},
   abstract = {
    The method of conjugate gradients provides a very effective way to
    optimize large, deterministic systems by gradient descent.  In its
    standard form, however, it is not amenable to stochastic approximation
    of the gradient.  Here we explore a number of ways to adopt ideas from
    conjugate gradient in the stochastic setting, using fast Hessian-vector
    products to obtain curvature information cheaply.  In our benchmark
    experiments the resulting highly scalable algorithms converge about
    an order of magnitude faster than ordinary stochastic gradient descent.
}}
