Gradient-Based Manipulation of Nonparametric Entropy Estimates
N. N. Schraudolph. Gradient-Based Manipulation of Nonparametric Entropy Estimates. IEEE Transactions on Neural Networks, 15(4):828–837, 2004.
Abstract
We derive a family of differential learning rules that optimize the Shannon entropy at the output of an adaptive system via kernel density estimation. In contrast to parametric formulations of entropy, this nonparametric approach assumes no particular functional form of the output density. We address problems associated with quantized data and finite sample size, and implement efficient maximum likelihood techniques for optimizing the regularizer. We also develop a normalized entropy estimate that is invariant with respect to affine transformations, facilitating optimization of the shape, rather than the scale, of the output density. Kernel density estimates are smooth and differentiable; this makes the derived entropy estimates amenable to manipulation by gradient descent. The resulting weight updates are surprisingly simple and efficient learning rules that operate on pairs of input samples. They can be tuned for data-limited or memory-limited situations, or modified to give a fully online implementation.
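To make the core idea concrete, here is a minimal sketch (not the paper's exact learning rules) of a leave-one-out Parzen (kernel density) entropy estimate and a gradient step through a linear map y = Xw with scalar output. The function names, the Gaussian kernel, the fixed kernel width sigma, and the learning rate lr are illustrative assumptions; the pairwise structure of the update mirrors the "pairs of input samples" described in the abstract.

import numpy as np

def gaussian_kernel(d, sigma):
    """Gaussian kernel evaluated at pairwise differences d (assumed scalar outputs)."""
    return np.exp(-0.5 * (d / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def parzen_entropy(y, sigma):
    """Leave-one-out Parzen estimate of Shannon entropy:
    H_hat = -mean_i log p_hat(y_i), p_hat(y_i) = mean_{j != i} K_sigma(y_i - y_j)."""
    n = len(y)
    d = y[:, None] - y[None, :]          # pairwise differences y_i - y_j
    k = gaussian_kernel(d, sigma)
    np.fill_diagonal(k, 0.0)             # leave-one-out: exclude j == i
    p_hat = k.sum(axis=1) / (n - 1)
    return -np.mean(np.log(p_hat))

def entropy_gradient_step(w, x, sigma, lr=0.01):
    """One gradient-descent step on the Parzen entropy of y = X w (minimizes H_hat;
    flip the sign of lr to maximize). Illustrative sketch, not the paper's algorithm."""
    n = x.shape[0]
    y = x @ w
    d = y[:, None] - y[None, :]
    k = gaussian_kernel(d, sigma)
    np.fill_diagonal(k, 0.0)
    p_hat = k.sum(axis=1) / (n - 1)
    g = -d / sigma**2 * k                # g[i, j] = dK_sigma(y_i - y_j) / dy_i
    a = g / p_hat[:, None]               # a[i, j] = g[i, j] / p_hat_i
    # dH_hat/dy_m collects the pair terms where y_m is the evaluation point
    # (row sum) and where it is a kernel centre in p_hat(y_i), i != m (column sum).
    grad_y = -(a.sum(axis=1) - a.sum(axis=0)) / (n * (n - 1))
    grad_w = x.T @ grad_y                # chain rule through y = X w
    return w - lr * grad_w

The pairwise weight update above is O(n^2) per batch; the paper's data-limited, memory-limited, and online variants trade off how many such pairs are retained or revisited.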
BibTeX Entry
@article{Schraudolph04,
  author    = {Nicol N. Schraudolph},
  title     = {\href{http://nic.schraudolph.org/pubs/Schraudolph04.pdf}{Gradient-Based
               Manipulation of Nonparametric Entropy Estimates}},
  journal   = {{IEEE} Transactions on Neural Networks},
  volume    = 15,
  number    = 4,
  pages     = {828--837},
  year      = 2004,
  b2h_type  = {Journal Papers},
  b2h_topic = {>Entropy Optimization},
  abstract  = {We derive a family of differential learning rules that optimize the
               Shannon entropy at the output of an adaptive system via kernel density
               estimation. In contrast to parametric formulations of entropy, this
               nonparametric approach assumes no particular functional form of the
               output density. We address problems associated with quantized data and
               finite sample size, and implement efficient maximum likelihood
               techniques for optimizing the regularizer. We also develop a normalized
               entropy estimate that is invariant with respect to affine
               transformations, facilitating optimization of the shape, rather than
               the scale, of the output density. Kernel density estimates are smooth
               and differentiable; this makes the derived entropy estimates amenable
               to manipulation by gradient descent. The resulting weight updates are
               surprisingly simple and efficient learning rules that operate on pairs
               of input samples. They can be tuned for data-limited or memory-limited
               situations, or modified to give a fully online implementation.}
}