## Optimization of Entropy with Neural Networks

N. N. Schraudolph. Optimization of Entropy with Neural Networks. Ph.D. Thesis, University of California, San Diego, 1995.


### Abstract

The goal of unsupervised learning algorithms is to discover concise yet informative representations of large data sets; the minimum description length principle and exploratory projection pursuit are two representative attempts to formalize this notion. When implemented with neural networks, both suggest the minimization of entropy at the network's output as an objective for unsupervised learning.

The empirical computation of entropy or its derivative with respect to parameters of a neural network unfortunately requires explicit knowledge of the local data density; this information is typically not available when learning from data samples. This dissertation discusses and applies three methods for making density information accessible in a neural network: parametric modelling, probabilistic networks, and nonparametric estimation.

By imposing their own structure on the data, parametric density models implement impoverished but tractable forms of entropy such as the log-variance. We have used this method to improve the adaptive dynamics of an anti-Hebbian learning rule which has proven successful in extracting disparity from random stereograms.

In probabilistic networks, node activities are interpreted as the defining parameters of a stochastic process. The entropy of the process can then be calculated from its parameters, and hence optimized. The popular logistic activation function defines a binomial process in this manner; by optimizing the information gain of this process we derive a novel nonlinear Hebbian learning algorithm.

The nonparametric technique of Parzen window or kernel density estimation leads us to an entropy optimization algorithm in which the network adapts in response to the distance between pairs of data samples. We discuss distinct implementations for data-limited or memory-limited operation, and describe a maximum likelihood approach to setting the kernel shape, the regularizer for this technique. This method has been applied with great success to the problem of pose alignment in computer vision.

These experiments demonstrate a range of techniques that allow neural networks to learn concise representations of empirical data by minimizing its entropy. We have found that simple gradient descent in various entropy-based objective functions can lead to novel and useful algorithms for unsupervised neural network learning.
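The log-variance mentioned in the abstract can be illustrated with a small sketch: under a Gaussian parametric model, the differential entropy of a unit's output is h = ½ ln(2πeσ²), so up to an additive constant minimizing entropy reduces to minimizing the log of the output variance. This is our own illustrative reconstruction of that identity, not code from the thesis; the function and variable names are ours.

```python
import numpy as np

def gaussian_entropy(samples):
    """Differential entropy of the samples under a fitted Gaussian model:
    h = 0.5 * ln(2*pi*e*sigma^2). Since h differs from 0.5*ln(sigma^2)
    only by a constant, minimizing entropy == minimizing log-variance."""
    var = np.var(samples)
    return 0.5 * np.log(2 * np.pi * np.e * var)

rng = np.random.default_rng(0)
wide = rng.normal(scale=2.0, size=10_000)
narrow = rng.normal(scale=0.5, size=10_000)

# a narrower (lower-variance) output distribution has lower model entropy
assert gaussian_entropy(narrow) < gaussian_entropy(wide)
```

A parametric model of this kind trades fidelity for tractability: it ignores all structure in the data beyond second moments, which is exactly why the abstract calls it an "impoverished but tractable" form of entropy.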
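The probabilistic-network idea can also be sketched: reading the logistic output y = σ(w·x) as the parameter of a Bernoulli process gives the closed-form entropy H(y) = −y ln y − (1−y) ln(1−y), which can be differentiated through the network. The thesis optimizes the information gain of this process; as a simpler hedged illustration we descend the output entropy itself, which already yields a Hebb-like update (pre-synaptic input times a nonlinear function of the post-synaptic activation). All names below are ours.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bernoulli_entropy(y, eps=1e-12):
    """Entropy of a Bernoulli process with parameter y."""
    y = np.clip(y, eps, 1 - eps)
    return -(y * np.log(y) + (1 - y) * np.log1p(-y))

def entropy_grad(w, x):
    """Gradient of H(sigmoid(w @ x)) w.r.t. w.
    With a = w @ x and y = sigmoid(a):
    dH/dy = ln((1-y)/y) = -a  and  dy/dw = y*(1-y)*x,
    so dH/dw = -a * y * (1-y) * x -- a nonlinear Hebbian form."""
    a = w @ x
    y = sigmoid(a)
    return -a * y * (1 - y) * x

# one gradient-descent step on the unit's output entropy
rng = np.random.default_rng(0)
w = rng.normal(size=5)
x = rng.normal(size=5)
w_new = w - 0.1 * entropy_grad(w, x)

# the step pushes the activation away from 0 (y away from 0.5),
# so the output entropy cannot increase
assert bernoulli_entropy(sigmoid(w_new @ x)) <= bernoulli_entropy(sigmoid(w @ x))
```

The update is "Hebbian" in that it is proportional to the input x scaled by a function of the net activation; the y(1−y) factor is just the familiar logistic derivative.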
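Finally, the Parzen-window approach can be made concrete with a minimal sketch. A kernel density estimate p̂(y_i) is a mean of kernels centred on the other samples, so the plug-in entropy estimate Ĥ = −(1/N) Σ_i ln p̂(y_i) depends only on pairwise distances between samples, which is what lets the network adapt "in response to the distance between pairs of data samples." This is an illustrative one-dimensional version under our own naming, not the thesis's implementation.

```python
import numpy as np

def parzen_entropy(y, h=0.5):
    """Parzen-window (kernel density) entropy estimate of samples y:
    H_hat = -mean_i log( mean_{j != i} K_h(y_i - y_j) )
    with a Gaussian kernel of width h, using leave-one-out densities.
    The estimate is a function of pairwise distances only."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d2 = (y[:, None] - y[None, :]) ** 2           # pairwise squared distances
    k = np.exp(-d2 / (2 * h * h)) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(k, 0.0)                       # leave each point out of its own density
    p = k.sum(axis=1) / (n - 1)
    return -np.mean(np.log(p))

rng = np.random.default_rng(0)
clustered = rng.normal(scale=0.1, size=200)
spread = rng.normal(scale=3.0, size=200)

# tightly clustered data has a lower estimated entropy than spread-out data
assert parzen_entropy(clustered) < parzen_entropy(spread)
```

The kernel width h is the regularizer the abstract refers to; in the thesis it is set by maximum likelihood, which in this sketch would amount to choosing h to maximize the leave-one-out log-likelihood Σ_i ln p̂(y_i).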

### BibTeX Entry

@phdthesis{Schraudolph95,
author = {Nicol N. Schraudolph},
title = {\href{http://nic.schraudolph.org/pubs/Schraudolph95.pdf}{\bf
Optimization of Entropy with Neural Networks}},
school = {University of California, San Diego},
year =  1995,
b2h_type = {Other},
b2h_topic = {>Entropy Optimization},
b2h_note = {<a href="b2hd-intro">Introduction only</a> &nbsp;&nbsp;&nbsp; Related papers: &nbsp; <a href="b2hd-SchSej92">Chapter 2</a> &nbsp; <a href="b2hd-SchSej93">Chapter 3</a> &nbsp; <a href="b2hd-SchSej95">Chapter 3</a> &nbsp; <a href="b2hd-VioSchSej96">Chapter 4</a>},
abstract = {
The goal of unsupervised learning algorithms is to discover concise yet
informative representations of large data sets; the minimum description
length principle and exploratory projection pursuit are two representative
attempts to formalize this notion.  When implemented with neural networks,
both suggest the minimization of entropy at the network's output as an
objective for unsupervised learning.
The empirical computation of entropy or its derivative with respect to
parameters of a neural network unfortunately requires explicit knowledge
of the local data density; this information is typically not available
when learning from data samples.  This dissertation discusses and applies
three methods for making density information accessible in a neural
network: parametric modelling, probabilistic networks, and nonparametric
estimation.
By imposing their own structure on the data, parametric density models
implement impoverished but tractable forms of entropy such as the
log-variance.  We have used this method to improve the adaptive dynamics
of an anti-Hebbian learning rule which has proven successful in extracting
disparity from random stereograms.
In probabilistic networks, node activities are interpreted as the defining
parameters of a stochastic process.  The entropy of the process can then
be calculated from its parameters, and hence optimized.  The popular
logistic activation function defines a binomial process in this manner;
by optimizing the information gain of this process we derive a novel
nonlinear Hebbian learning algorithm.
The nonparametric technique of Parzen window or kernel density estimation
leads us to an entropy optimization algorithm in which the network adapts
in response to the distance between pairs of data samples.  We discuss
distinct implementations for data-limited or memory-limited operation,
and describe a maximum likelihood approach to setting the kernel shape,
the regularizer for this technique.  This method has been applied with
great success to the problem of pose alignment in computer vision.
These experiments demonstrate a range of techniques that allow neural
networks to learn concise representations of empirical data by
minimizing its entropy.  We have found that simple gradient descent
in various entropy-based objective functions can lead to novel and
useful algorithms for unsupervised neural network learning.
}}


Generated by bib2html.pl (written by Patrick Riley) on Thu Sep 25, 2014 12:00:33