Centering Neural Network Gradient Factors

N. N. Schraudolph. Centering Neural Network Gradient Factors. In G. B. Orr and K.-R. Müller, editors, Neural Networks: Tricks of the Trade, vol. 1524 of Lecture Notes in Computer Science, pp. 207–226, Springer Verlag, Berlin, 1998.

Download

pdf (237.6 kB)   djvu (156.6 kB)   ps.gz (109.3 kB)

Abstract

It has long been known that neural networks can learn faster when their input and hidden unit activity is centered about zero; recently we have extended this approach to also encompass the centering of error signals (Schraudolph & Sejnowski, 1996). Here we generalize this notion to all factors involved in the network's gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network's generalization ability.
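The three centering operations named above can be illustrated in a few lines of NumPy. The following is a minimal sketch, not the chapter's reference implementation: the toy task, network sizes, and learning rate are hypothetical, and per-batch means stand in for whatever running averages the chapter may prescribe.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy regression task: y = sin(x1) + 0.5*x2 plus noise.
    X = rng.uniform(-2, 2, size=(256, 2))
    y = (np.sin(X[:, 0]) + 0.5 * X[:, 1]
         + 0.1 * rng.standard_normal(256)).reshape(-1, 1)

    # Input centering: shift the inputs to zero mean.
    X = X - X.mean(axis=0)

    n_in, n_hid, n_out = 2, 8, 1
    W1 = rng.standard_normal((n_in, n_hid)) * 0.5   # input -> hidden
    W2 = rng.standard_normal((n_hid, n_out)) * 0.5  # hidden -> output
    Ws = rng.standard_normal((n_in, n_out)) * 0.5   # shortcut: input -> output
    b1 = np.zeros(n_hid)
    b2 = np.zeros(n_out)

    lr = 0.05
    for step in range(2000):
        # Forward pass: tanh hidden units, linear output with shortcuts.
        net = X @ W1 + b1
        h = np.tanh(net)
        h_c = h - h.mean(axis=0)        # activity centering of hidden units
        out = h_c @ W2 + X @ Ws + b2

        err = out - y                   # dE/d(out) for squared error
        err_c = err - err.mean(axis=0)  # error centering; mean goes to bias

        # Slope centering: subtract the mean activation-function slope, so
        # the linear component of the backpropagated error is removed and
        # left to the shortcut weights Ws.
        slope = 1.0 - h**2              # tanh'(net)
        slope_c = slope - slope.mean(axis=0)

        d_hidden = (err_c @ W2.T) * slope_c

        W2 -= lr * h_c.T @ err_c / len(X)
        Ws -= lr * X.T @ err / len(X)   # shortcuts see the uncentered error
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * X.T @ d_hidden / len(X)
        b1 -= lr * d_hidden.mean(axis=0)

        if step % 500 == 0:
            print(f"step {step:4d}  mse {np.mean(err**2):.4f}")

Note how the shortcut weights Ws are trained on the uncentered error: they are the part of the network that can absorb the linear component which slope centering removes from the backpropagated signal, which is why the abstract ties slope centering to networks with shortcut connections.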

BibTeX Entry

@incollection{Schraudolph98,
     author = {Nicol N. Schraudolph},
      title = {\href{http://nic.schraudolph.org/pubs/Schraudolph98.pdf}{
               Centering Neural Network Gradient Factors}},
      pages = {207--226},
     editor = {Genevieve B. Orr and Klaus-Robert M\"uller},
  booktitle = {Neural Networks: Tricks of the Trade},
     series = {\href{http://www.springer.de/comp/lncs/}{
               Lecture Notes in Computer Science}},
     volume =  1524,
  publisher = {\href{http://www.springer.de/}{Springer Verlag}},
    address = {Berlin},
       year =  1998,
   b2h_type = {Book Chapters},
  b2h_topic = {>Preconditioning},
   abstract = {
    It has long been known that neural networks can learn faster when their
    input and hidden unit activity is centered about zero; recently we have
    extended this approach to also encompass the centering of error signals
    \href{b2hd-nips95}{(Schraudolph \& Sejnowski, 1996)}.  Here we generalize
    this notion to {\em all}\/ factors involved in the network's gradient,
    leading us to propose centering the slope of hidden unit activation
    functions as well.  Slope centering removes the linear component of
    backpropagated error; this improves credit assignment in networks with
    shortcut connections.  Benchmark results show that this can speed up
    learning significantly without adversely affecting the trained network's
    generalization ability.
}}
