10:00 - 10:15 | Opening Remarks |
10:15 - 11:00 | Amir Beck (Technion) |
11:00 - 11:15 | Coffee Break |
11:15 - 12:00 | Dima Drusvyatskiy (University of Washington) |
12:00 - 12:45 | Dan Garber (TTI Chicago) |
12:45 - 14:15 | Lunch |
14:15 - 15:00 | Michael Zibulevsky (Technion) | 15:00 - 15:30 | Coffee Break |
15:30 - 16:30 | Ronny Ben Tal (Technion) |
16:30 - 17:15 | Boris Polyak (Institute for Control Sciences, Russian Academy of Sciences) |
In recent years, due to the “big data” revolution, large-scale and even huge-scale applications have started to emerge, and consequently the need for new, fast and reliable decomposition methods has become evident. We consider several classes of block descent methods that involve the solution of many simple optimization subproblems of small dimension in place of the original problem. Rates of convergence of different variants will be explored.
First-order methods with optimal convergence rates for smooth convex minimization have had a profound impact on computation. Two important downsides of such methods, often remarked in the literature, is a lack of geometric intuition and non-monotone behavior. I will begin by describing a new intuitive viewpoint on acceleration based on optimally averaging quadratic lower models of the function. Averaging only two quadratics at a time gives a new interpretation of the geometric descent method of Bubeck-Lee-Singh , while a longer sequence of quadratics yields limited memory extensions. Without convexity, best-known complexity bounds worsen to those achieved by simple gradient descent. In the second part of the talk, I will present a "universal" accelerated method for non-convex optimization, which automatically accelerates in presence of convexity. The scheme is in the same spirit as the universal catalyst of Lin-Mairal-Harchaoui for convex minimization. A large class of functions, not amenable to such techniques, consists of compositions of non-smooth convex functions and smooth maps. I will explain how, nonetheless, a combination of smoothing and acceleration yields a method with complexity O(1/e^3) in ``stationarity’’. This rate is appealing since in the smooth setting, the analogous complexity achieved by gradient descent, improves to O(1/e^2). Joint work with M. Fazel (U. Washington), Z. Harchaoui (U. Washington), A.S. Lewis (Cornell), H. Lin (Inria), J. Mairal (Inria), C. Paquette (U. Washington), S.Roy (U. Washington).
The conditional gradient method (aka Frank-Wolfe algorithm) has regained much interest in recent years. Part of this interest is in understanding the convergence rates achievable under various assumptions. In this talk we revisit a classical result by Polyak & Levitin who showed that when the feasible set is strongly convex and the gradient vector is lower bounded in magnitude by a constant everywhere in the set, the CG method converges with a linear rate. However, the assumption on the magnitude of the gradient can be very restrictive. In this talk we show that when both the objective and the feasible set are strongly convex (in fact a weaker assumption on the objective is required), and without any additional assumption on the magnitude of the gradient, the CG method converges with a rate of roughly 1/t^2, which gives a quadratic improvement over the standard convergence rate. This is joint work with Elad Hazan that was inspired by a fruitful discussion with Roman.
In this talk, we will study optimization problems with ambiguous stochastic constraints where only partial information consisting of means and dispersion measures of the underlying random parameter is available. Whereas the past literature used the variance as the dispersion measure, here we use the mean absolute deviation from the mean (MAD). The approach is based on the availability of tight upper and lower bounds on the expectation of a convex function of a random variable, first discovered in 1972. We then use these bounds to derive exact robust counterparts of expected feasibility of convex constraints and to construct new safe tractable approximations of chance constraints. We test the applicability of the theoretical results numerically on various practical problems in Operations Research and Engineering
This is an attempt to construct Newton-like methods for solving m nonlinear equations with n variables for m less than n. First we study solvability problem. Second novel methods are proposed which combine fast convergence and large attraction domain. The methods are specified for particular cases (scalar equations and inequalities, quadratic equations, systems with special structure).