Accelerated Gradient Descent by Factor-Centering Decomposition
N. N. Schraudolph. Accelerated Gradient Descent by Factor-Centering Decomposition. Technical Report IDSIA-33-98, Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, 1998.
Abstract
Gradient factor centering is a new methodology for decomposing neural networks into biased and centered subnets which are then trained in parallel. The decomposition can be applied to any pattern-dependent factor in the network's gradient, and is designed such that the subnets are more amenable to optimization by gradient descent than the original network: biased subnets because of their simplified architecture, centered subnets due to a modified gradient that improves conditioning. The architectural and algorithmic modifications mandated by this approach include both familiar and novel elements, often in prescribed combinations. The framework suggests for instance that shortcut connections---a well-known architectural feature---should work best in conjunction with slope centering, a new technique described herein. Our benchmark experiments bear out this prediction, and show that factor-centering decomposition can speed up learning significantly without adversely affecting the trained network's generalization ability.
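To illustrate the flavor of the approach, the sketch below applies centering to one pattern-dependent gradient factor, the activation-function slope of the hidden units, in a toy two-layer network. This is a minimal illustration, not the report's full decomposition: it assumes tanh hidden units, squared error, and a single batch, and it simply subtracts each hidden unit's mean slope over the batch before forming the input-weight gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: 32 patterns, 4 inputs, scalar target (all hypothetical)
X = rng.normal(size=(32, 4))
y = rng.normal(size=(32, 1))

# Two-layer network with 5 tanh hidden units
W1 = rng.normal(scale=0.5, size=(4, 5))
W2 = rng.normal(scale=0.5, size=(5, 1))

# Forward pass
h = np.tanh(X @ W1)
out = h @ W2
err = out - y                                   # dE/d(out) for squared error

# Standard backprop slope factor for the hidden layer: tanh'(net) = 1 - h^2
slope = 1.0 - h**2

# Centering: subtract each unit's mean slope over the batch, so only the
# pattern-dependent part of this gradient factor drives the weight update
slope_c = slope - slope.mean(axis=0, keepdims=True)

# Input-weight gradient computed with the centered slope factor
grad_W1_centered = X.T @ ((err @ W2.T) * slope_c)
```

In the full method, the mean slope removed here would not be discarded but handled by a simplified biased subnet (e.g. shortcut connections), which is what the prescribed pairing of architectural and algorithmic elements refers to.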
BibTeX Entry
@techreport{facede,
  author      = {Nicol N. Schraudolph},
  title       = {\href{http://nic.schraudolph.org/pubs/facede.pdf}{
                 Accelerated Gradient Descent by
                 Factor-Centering Decomposition}},
  number      = {IDSIA-33-98},
  institution = {Istituto Dalle Molle di Studi sull'Intelligenza Artificiale},
  address     = {Galleria 2, CH-6928 Manno, Switzerland},
  year        = 1998,
  b2h_type    = {Other},
  b2h_topic   = {>Preconditioning},
  abstract    = {
    {\em Gradient factor centering}\/ is a new methodology for decomposing
    neural networks into {\em biased}\/ and {\em centered}\/ subnets which
    are then trained in parallel. The decomposition can be applied to any
    pattern-dependent factor in the network's gradient, and is designed such
    that the subnets are more amenable to optimization by gradient descent
    than the original network: biased subnets because of their simplified
    architecture, centered subnets due to a modified gradient that improves
    conditioning.
    The architectural and algorithmic modifications mandated by this
    approach include both familiar and novel elements, often in prescribed
    combinations. The framework suggests for instance that {\em shortcut
    connections}\/---\,a well-known architectural feature\,---\,should
    work best in conjunction with \href{b2hd-slope}{\em slope centering},
    a new technique described herein. Our benchmark experiments bear
    out this prediction, and show that factor-centering decomposition
    can speed up learning significantly without adversely affecting the
    trained network's generalization ability.
  }
}