## Accelerated Gradient Descent by Factor-Centering Decomposition

N. N. Schraudolph. **Accelerated Gradient Descent by Factor-Centering Decomposition**. Technical Report IDSIA-33-98, Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, 1998.


### Abstract

*Gradient factor centering* is a new methodology for decomposing neural
networks into *biased* and *centered* subnets which are then trained
in parallel. The decomposition can be applied to any pattern-dependent factor
in the network's gradient, and is designed such that the subnets are more amenable
to optimization by gradient descent than the original network: biased subnets because
of their simplified architecture, centered subnets due to a modified gradient that
improves conditioning. The architectural and algorithmic modifications mandated
by this approach include both familiar and novel elements, often in prescribed
combinations. The framework suggests, for instance, that *shortcut connections*---a
well-known architectural feature---should work best in conjunction with *slope
centering*, a new technique described herein. Our benchmark experiments
bear out this prediction, and show that factor-centering decomposition can speed
up learning significantly without adversely affecting the trained network's generalization
ability.
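The conditioning benefit that centering provides can be seen in a toy setting. The sketch below (an illustration of the standard input-centering observation the paper builds on, not the paper's own decomposition algorithm) compares the condition number of the Gauss-Newton/Hessian matrix `X.T @ X` for a linear model before and after subtracting the input mean; all data and sizes are made up for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Inputs with a large common mean: the dominant eigendirection of the
# Hessian X^T X is then the mean direction itself, so gradient descent
# must use a tiny step size and crawls along the remaining directions.
X = rng.normal(loc=5.0, scale=1.0, size=(500, 3))

# Centered (mean-free) inputs: the Hessian becomes (a multiple of) the
# sample covariance, which here is close to the identity.
Xc = X - X.mean(axis=0)

cond_raw = np.linalg.cond(X.T @ X)
cond_centered = np.linalg.cond(Xc.T @ Xc)
print(f"condition number, raw inputs:      {cond_raw:.1f}")
print(f"condition number, centered inputs: {cond_centered:.1f}")
```

With these synthetic inputs the raw condition number is roughly the squared input mean times the dimension, while the centered one is near 1, so plain gradient descent on the centered subnet can take far larger stable steps. In the paper's decomposition, the mean response that centering strips out is carried by the biased subnet (e.g. via shortcut connections) rather than discarded.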

### BibTeX Entry

```bibtex
@techreport{facede,
  author      = {Nicol N. Schraudolph},
  title       = {\href{http://nic.schraudolph.org/pubs/facede.pdf}{Accelerated
                 Gradient Descent by Factor-Centering Decomposition}},
  number      = {IDSIA-33-98},
  institution = {Istituto Dalle Molle di Studi sull'Intelligenza Artificiale},
  address     = {Galleria 2, CH-6928 Manno, Switzerland},
  year        = 1998,
  b2h_type    = {Other},
  b2h_topic   = {>Preconditioning},
  abstract    = { {\em Gradient factor centering}\/ is a new methodology for
    decomposing neural networks into {\em biased}\/ and {\em centered}\/
    subnets which are then trained in parallel. The decomposition can be
    applied to any pattern-dependent factor in the network's gradient, and is
    designed such that the subnets are more amenable to optimization by
    gradient descent than the original network: biased subnets because of
    their simplified architecture, centered subnets due to a modified gradient
    that improves conditioning. The architectural and algorithmic
    modifications mandated by this approach include both familiar and novel
    elements, often in prescribed combinations. The framework suggests for
    instance that {\em shortcut connections}\/---\,a well-known architectural
    feature\,---\,should work best in conjunction with
    \href{b2hd-slope}{\em slope centering}, a new technique described herein.
    Our benchmark experiments bear out this prediction, and show that
    factor-centering decomposition can speed up learning significantly without
    adversely affecting the trained network's generalization ability. }}
```