Towards Stochastic Conjugate Gradient Methods
N. N. Schraudolph and T. Graepel.
Towards Stochastic Conjugate Gradient Methods. In Proc. 9th Intl.
Conf. Neural Information Processing (ICONIP), pp. 853–856,
IEEE, 2002.
Abstract
The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore a number of ways to adapt ideas from conjugate gradient methods to the stochastic setting, using fast Hessian-vector products to obtain curvature information cheaply. In our benchmark experiments the resulting highly scalable algorithms converge about an order of magnitude faster than ordinary stochastic gradient descent.
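
The key ingredient mentioned in the abstract is the fast Hessian-vector product: H v can be obtained at roughly the cost of one extra gradient evaluation, without ever forming the Hessian, by differentiating the gradient along the direction v (Pearlmutter's R-operator). Below is a minimal sketch of this idea in JAX, assuming an illustrative least-squares objective; the names loss and hvp, and the final curvature-scaled step, are assumptions for illustration and not the algorithm proposed in the paper.

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        # Illustrative least-squares objective (a stand-in, not the paper's benchmark).
        return 0.5 * jnp.mean((x @ w - y) ** 2)

    def hvp(w, v, x, y):
        # Hessian-vector product H(w) v via forward-over-reverse differentiation:
        # push the tangent v through the gradient function (R-operator trick).
        grad_fn = lambda u: jax.grad(loss)(u, x, y)
        _, hv = jax.jvp(grad_fn, (w,), (v,))
        return hv

    # Toy mini-batch data (hypothetical shapes, for demonstration only).
    key = jax.random.PRNGKey(0)
    x = jax.random.normal(key, (32, 10))
    w_true = jnp.ones(10)
    y = x @ w_true
    w = jnp.zeros(10)

    # One CG-flavored step: scale the stochastic gradient g by the step length
    # alpha = (g.g) / (g.Hg), which minimizes the local quadratic model along -g.
    g = jax.grad(loss)(w, x, y)
    Hg = hvp(w, g, x, y)
    alpha = jnp.dot(g, g) / jnp.dot(g, Hg)
    w = w - alpha * g

Because H v costs only a small constant factor more than the gradient itself, curvature terms such as gᵀHg become affordable on individual mini-batches, which is what makes CG-style step sizes and directions feasible in the stochastic setting.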
BibTeX Entry
@inproceedings{SchGra02b,
  author    = {Nicol N. Schraudolph and Thore Graepel},
  title     = {\href{http://nic.schraudolph.org/pubs/SchGra02b.pdf}{Towards Stochastic Conjugate Gradient Methods}},
  pages     = {853--856},
  editor    = {Lipo Wang and Jagath C. Rajapakse and Kunihiko Fukushima and Soo-Young Lee and Xin Yao},
  booktitle = {Proc.\ 9$^{th}$ Intl.\ Conf.\ Neural Information Processing (ICONIP)},
  publisher = {IEEE},
  year      = 2002,
  b2h_note  = {<a href="b2hd-SchGra03.html">Related paper</a>},
  b2h_type  = {Other},
  b2h_topic = {Gradient Descent},
  abstract  = {The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore a number of ways to adapt ideas from conjugate gradient methods to the stochastic setting, using fast Hessian-vector products to obtain curvature information cheaply. In our benchmark experiments the resulting highly scalable algorithms converge about an order of magnitude faster than ordinary stochastic gradient descent.}
}