A Quasi-Newton Approach to Nonsmooth Convex Optimization
J. Yu, S. V. N. Vishwanathan, S. Günter, and N. N. Schraudolph.
A Quasi-Newton Approach to Nonsmooth Convex Optimization. In Proc. 25th
Intl. Conf. Machine Learning (ICML), pp. 1216–1223, Omnipress, Helsinki,
Finland, 2008.
Long version
Abstract
We extend the well-known BFGS quasi-Newton method and its limited-memory variant LBFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local quadratic model, the identification of a descent direction, and the Wolfe line search conditions. We apply the resulting subLBFGS algorithm to $L_2$-regularized risk minimization with binary hinge loss, and its direction-finding component to $L_1$-regularized risk minimization with logistic loss. In both settings our generic algorithms perform comparably to, or better than, their counterparts in specialized state-of-the-art solvers.
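To give a feel for the setting, here is a minimal, purely illustrative Python sketch. It is not the paper's subBFGS direction-finding procedure (which solves an inner problem over the full subdifferential); it simply feeds one arbitrary element of the subdifferential of the $L_2$-regularized binary hinge loss into the standard LBFGS two-loop recursion. All function names here (hinge_subgradient, lbfgs_direction) are hypothetical.

import numpy as np

def hinge_subgradient(w, X, y, lam):
    """One element of the subdifferential of
    lam/2 ||w||^2 + (1/n) sum_i max(0, 1 - y_i x_i^T w).
    At the hinge's kink (margin exactly 1) we pick the zero branch."""
    margins = y * (X @ w)
    active = margins < 1.0                       # points with positive hinge loss
    g = lam * w - (X[active] * y[active, None]).sum(axis=0) / len(y)
    return g

def lbfgs_direction(g, s_hist, y_hist):
    """Standard LBFGS two-loop recursion: approximates -H g from curvature
    pairs (s_i, y_i) = (x_{i+1} - x_i, g_{i+1} - g_i), assuming s_i^T y_i > 0."""
    q = g.copy()
    alphas = []
    for s, yv in reversed(list(zip(s_hist, y_hist))):   # newest pair first
        rho = 1.0 / (yv @ s)
        a = rho * (s @ q)
        q -= a * yv
        alphas.append((a, rho, s, yv))
    if s_hist:                                   # initial Hessian scaling H0 = gamma I
        s, yv = s_hist[-1], y_hist[-1]
        q *= (s @ yv) / (yv @ yv)
    for a, rho, s, yv in reversed(alphas):       # oldest pair first
        b = rho * (yv @ q)
        q += (a - b) * s
    return -q                                    # quasi-Newton descent direction

With empty curvature histories the sketch reduces to steepest descent along the chosen subgradient; the paper's contribution lies precisely in handling the nonsmooth points that this naive combination glosses over.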
BibTeX Entry
@inproceedings{YuVisGueSch08,
  author    = {Jin Yu and S.~V.~N. Vishwanathan and Simon G\"unter and Nicol N. Schraudolph},
  title     = {\href{http://nic.schraudolph.org/pubs/YuVisGueSch08.pdf}{A Quasi-{N}ewton Approach to Nonsmooth Convex Optimization}},
  pages     = {1216--1223},
  editor    = {Andrew McCallum and Sam Roweis},
  booktitle = {Proc.\ 25$^{th}$ Intl.\ Conf.\ Machine Learning (ICML)},
  address   = {Helsinki, Finland},
  publisher = {Omnipress},
  year      = 2008,
  b2h_type  = {Top Conferences},
  b2h_topic = {>Quasi-Newton Methods},
  b2h_note  = {<a href="b2hd-YuVisGueSch10.html">Long version</a>},
  abstract  = {We extend the well-known BFGS quasi-Newton method and its limited-memory variant LBFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local quadratic model, the identification of a descent direction, and the Wolfe line search conditions. We apply the resulting subLBFGS algorithm to $L_{2}$-regularized risk minimization with binary hinge loss, and its direction-finding component to $L_{1}$-regularized risk minimization with logistic loss. In both settings our generic algorithms perform comparably to, or better than, their counterparts in specialized state-of-the-art solvers.}
}