Fast Curvature Matrix-Vector Products
N. N. Schraudolph. Fast Curvature Matrix-Vector Products. In Proc. Intl. Conf.
Artificial Neural Networks (ICANN), Vienna, Austria, pp. 19–26.
Lecture Notes in Computer Science 2130, Springer-Verlag, Berlin, 2001.
Latest version
Abstract
The Gauss-Newton approximation of the Hessian guarantees positive semi-definiteness while retaining more second-order information than the Fisher information. We extend it from nonlinear least squares to all differentiable objectives such that positive semi-definiteness is maintained for the standard loss functions in neural network regression and classification. We give efficient algorithms for computing the product of extended Gauss-Newton and Fisher information matrices with arbitrary vectors, using techniques similar to but even cheaper than the fast Hessian-vector product (Pearlmutter, 1994). The stability of SMD, a learning rate adaptation method that uses curvature matrix-vector products, improves when the extended Gauss-Newton matrix is substituted for the Hessian.
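The matrix-free product described in the abstract can be sketched for the simplest possible case, a single linear layer: a forward (directional-derivative) pass yields Jv, the loss curvature H_L is applied in output space, and a reverse pass yields Jᵀ(H_L Jv), so the curvature matrix G = Jᵀ H_L J is never formed. This is an illustrative toy sketch, not the paper's general algorithm; the function name `gn_vp` and the toy setup are my own. H_L = I recovers the squared-error case, and H_L = diag(p) − ppᵀ corresponds to softmax cross-entropy, one of the standard losses for which the extended Gauss-Newton matrix stays positive semi-definite.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def gn_vp(W, x, v, loss="squared"):
    """Matrix-free extended Gauss-Newton vector product G v = J^T H_L J v
    for a single linear layer z = W x; v has the same shape as W."""
    z = W @ x
    Jv = v @ x                        # forward pass: directional derivative of outputs
    if loss == "squared":
        HJv = Jv                      # H_L = I for 0.5 * ||z - y||^2
    else:                             # softmax cross-entropy
        p = softmax(z)
        HJv = p * Jv - p * (p @ Jv)   # H_L = diag(p) - p p^T, applied matrix-free
    return np.outer(HJv, x)           # reverse pass: J^T (H_L J v)
```

For this layer the explicit curvature matrix is the Kronecker product G = H_L ⊗ xxᵀ, so the sketch can be checked against `np.kron(H_L, np.outer(x, x)) @ v.ravel()`; the matrix-free version costs only two matrix-vector passes instead.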
BibTeX Entry
@inproceedings{Schraudolph01,
author = {Nicol N. Schraudolph},
title = {\href{http://nic.schraudolph.org/pubs/Schraudolph01.pdf}{
Fast Curvature Matrix-Vector Products}},
pages = {19--26},
editor = {Georg Dorffner and Horst Bischof and Kurt Hornik},
booktitle = icann,
address = {Vienna, Austria},
volume = 2130,
series = {\href{http://www.springer.de/comp/lncs/}{
Lecture Notes in Computer Science}},
publisher = {\href{http://www.springer.de/}{Springer Verlag}, Berlin},
year = 2001,
b2h_type = {Top Conferences},
b2h_topic = {>Stochastic Meta-Descent},
b2h_note = {<a href="b2hd-Schraudolph02.html">Latest version</a>},
abstract = {
The Gauss-Newton approximation of the Hessian guarantees positive
semi-definiteness while retaining more second-order information than
the Fisher information. We extend it from nonlinear least squares to
all differentiable objectives such that positive semi-definiteness
is maintained for the standard loss functions in neural network
regression and classification. We give efficient algorithms for
computing the product of extended Gauss-Newton and Fisher information
matrices with arbitrary vectors, using techniques similar to but even
cheaper than the fast Hessian-vector product (Pearlmutter, 1994).
The stability of SMD, a learning rate adaptation method that uses
curvature matrix-vector products, improves when the extended
Gauss-Newton matrix is substituted for the Hessian.
}}