## Towards Stochastic Conjugate Gradient Methods

N. N. Schraudolph and T. Graepel. **Towards Stochastic Conjugate Gradient Methods**. In *Proc. 9th Intl. Conf. Neural Information Processing (ICONIP)*, pp. 853–856, IEEE, 2002.

Related paper


### Abstract

The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore a number of ways to adopt ideas from conjugate gradient in the stochastic setting, using fast Hessian-vector products to obtain curvature information cheaply. In our benchmark experiments the resulting highly scalable algorithms converge about an order of magnitude faster than ordinary stochastic gradient descent.
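The idea of obtaining curvature information from Hessian-vector products, without ever forming the Hessian, can be illustrated with a small sketch. It is not the algorithm from the paper: it assumes a toy two-dimensional quadratic objective and approximates the product H v by a central difference of gradients, a simpler stand-in for an exact fast Hessian-vector product. All names here (`grad`, `hessian_vector_product`, `curvature_step`, `A`, `b`) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; not the method described in the paper.
# Assumed toy objective: f(w) = 0.5 * w^T A w - b^T w, with gradient A w - b.

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))

def grad(w, A, b):
    """Gradient of the toy quadratic: A w - b."""
    n = len(w)
    return [sum(A[i][j] * w[j] for j in range(n)) - b[i] for i in range(n)]

def hessian_vector_product(grad_fn, w, v, eps=1e-4):
    """Approximate H v by a central difference of gradients along v.

    Costs two gradient evaluations; the Hessian itself is never built.
    """
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad_fn(w_plus)
    g_minus = grad_fn(w_minus)
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]

def curvature_step(grad_fn, w, d):
    """Step along d with alpha = -(g . d) / (d . H d).

    This alpha minimises the local quadratic model along direction d.
    """
    g = grad_fn(w)
    Hd = hessian_vector_product(grad_fn, w, d)
    alpha = -dot(g, d) / dot(d, Hd)
    w_new = [wi + alpha * di for wi, di in zip(w, d)]
    return w_new, alpha

# Assumed example problem (symmetric positive definite A).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
grad_fn = lambda w: grad(w, A, b)

w0 = [0.0, 0.0]
d0 = [-g for g in grad_fn(w0)]  # steepest-descent direction at w0
w1, alpha = curvature_step(grad_fn, w0, d0)
```

After the step, the gradient at `w1` is orthogonal to `d0`, which is the line-minimisation property that conjugate gradient relies on. In a stochastic setting both the gradient and the Hessian-vector product would be estimated from a sample of the data, which is where the difficulties discussed in the abstract arise.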

### BibTeX Entry

    @inproceedings{SchGra02b,
      author = {Nicol N. Schraudolph and Thore Graepel},
      title = {\href{http://nic.schraudolph.org/pubs/SchGra02b.pdf}{Towards Stochastic Conjugate Gradient Methods}},
      pages = {853--856},
      editor = {Lipo Wang and Jagath C. Rajapakse and Kunihiko Fukushima and Soo-Young Lee and Xin Yao},
      booktitle = {Proc.\ 9$^{th}$ Intl.\ Conf.\ Neural Information Processing (ICONIP)},
      publisher = {IEEE},
      year = 2002,
      b2h_note = {<a href="b2hd-SchGra03.html">Related paper</a>},
      b2h_type = {Other},
      b2h_topic = {Gradient Descent},
      abstract = {The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore a number of ways to adopt ideas from conjugate gradient in the stochastic setting, using fast Hessian-vector products to obtain curvature information cheaply. In our benchmark experiments the resulting highly scalable algorithms converge about an order of magnitude faster than ordinary stochastic gradient descent.}
    }