Accelerated Gradient Descent by Factor-Centering Decomposition
N. N. Schraudolph. Accelerated Gradient Descent by Factor-Centering Decomposition. Technical Report IDSIA-33-98, Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, 1998.
Abstract
Gradient factor centering is a new methodology for decomposing neural networks into biased and centered subnets which are then trained in parallel. The decomposition can be applied to any pattern-dependent factor in the network's gradient, and is designed such that the subnets are more amenable to optimization by gradient descent than the original network: biased subnets because of their simplified architecture, centered subnets due to a modified gradient that improves conditioning. The architectural and algorithmic modifications mandated by this approach include both familiar and novel elements, often in prescribed combinations. The framework suggests for instance that shortcut connections---a well-known architectural feature---should work best in conjunction with slope centering, a new technique described herein. Our benchmark experiments bear out this prediction, and show that factor-centering decomposition can speed up learning significantly without adversely affecting the trained network's generalization ability.
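To illustrate the flavor of the approach, the sketch below applies centering to one pattern-dependent gradient factor, the activation-function slope of the hidden units, in a toy two-layer network. This is a minimal illustration, not the report's full decomposition: it assumes tanh hidden units, squared error, and a single batch, and it simply subtracts each hidden unit's mean slope over the batch before forming the input-weight gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: 32 patterns, 4 inputs, scalar target (all hypothetical)
X = rng.normal(size=(32, 4))
y = rng.normal(size=(32, 1))

# Two-layer network with 5 tanh hidden units
W1 = rng.normal(scale=0.5, size=(4, 5))
W2 = rng.normal(scale=0.5, size=(5, 1))

# Forward pass
h = np.tanh(X @ W1)
out = h @ W2
err = out - y                                   # dE/d(out) for squared error

# Standard backprop slope factor for the hidden layer: tanh'(net) = 1 - h^2
slope = 1.0 - h**2

# Centering: subtract each unit's mean slope over the batch, so only the
# pattern-dependent part of this gradient factor drives the weight update
slope_c = slope - slope.mean(axis=0, keepdims=True)

# Input-weight gradient computed with the centered slope factor
grad_W1_centered = X.T @ ((err @ W2.T) * slope_c)
```

In the full method, the mean slope removed here would not be discarded but handled by a simplified biased subnet (e.g. shortcut connections), which is what the prescribed pairing of architectural and algorithmic elements refers to.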
BibTeX Entry
@techreport{facede,
  author      = {Nicol N. Schraudolph},
  title       = {\href{http://nic.schraudolph.org/pubs/facede.pdf}{
                 Accelerated Gradient Descent by
                 Factor-Centering Decomposition}},
  number      = {IDSIA-33-98},
  institution = {Istituto Dalle Molle di Studi sull'Intelligenza Artificiale},
  address     = {Galleria 2, CH-6928 Manno, Switzerland},
  year        = 1998,
  b2h_type    = {Other},
  b2h_topic   = {>Preconditioning},
  abstract    = {
    {\em Gradient factor centering}\/ is a new methodology for decomposing
    neural networks into {\em biased}\/ and {\em centered}\/ subnets which
    are then trained in parallel. The decomposition can be applied to any
    pattern-dependent factor in the network's gradient, and is designed such
    that the subnets are more amenable to optimization by gradient descent
    than the original network: biased subnets because of their simplified
    architecture, centered subnets due to a modified gradient that improves
    conditioning.
    The architectural and algorithmic modifications mandated by this
    approach include both familiar and novel elements, often in prescribed
    combinations. The framework suggests for instance that {\em shortcut
    connections}\/---\,a well-known architectural feature\,---\,should
    work best in conjunction with \href{b2hd-slope}{\em slope centering},
    a new technique described herein. Our benchmark experiments bear
    out this prediction, and show that factor-centering decomposition
    can speed up learning significantly without adversely affecting the
    trained network's generalization ability.
  }
}