Previous Up Next

16.3.2  Minimization on bounded convex polyhedra

The frank_wolfe command finds the minimum of a (twice) differentiable convex function f:ℝn→ℝ on a bounded convex polyhedron defined by the set A⊂ℝn of its vertices by using the method of Frank and Wolfe with backtracking. The problem can be written as:

 
min
xconvA
 f(x),     (2)

where convA denotes the convex hull of A which coincides with the given polyhedron.

Note that (2) can be solved efficiently for n=1 by using the find_minimum command (see Section 16.3.1). Therefore, frank_wolfe requires that n≥ 2.

Examples

The Lasso Problem.

Solve the following ℓ1-constrained problem in n dimensions:

min
1
2
||Axb||2,   ||x||≤ 1,

where A is a m× n matrix with real coefficients, x∈ℝn and b∈ℝm. It can be shown that vertices of the polyhedron defined by ||x||1≤ 1 are defined by the orthonormal basis vectors and their counterparts (2n vertices in total). Hence the set of vertices V can be represented by a matrix with 2n rows and n columns which is obtained by gluing the identity matrix In and its negative −In together:

V=


In
In


Since the objective function is convex, the problem can be solved by the Frank-Wolfe method. Let n=10 and A be a randomly generated square matrix with coefficients in [−1,1].

n:=10:; A:=ranm(n,n,uniformd(-1,1)):;

Now let y∈ℝn be a random point on the sphere with radius r=1.1. Then define b=Ay. Thus the global minimum of f is located near the polyhedron, but outside it.

y:=ranv(n,randnorm):; y:=1.1*y/l2norm(y):; b:=y*tran(A):;

Define V, x, f and call frank_wolfe. You get a vector which depends on the choices of A and y.

V:=tran(augment(idn(n),-idn(n))):; x:=symbol_array("x",n,purge):; f:=simplify(1/2*l2norm(x*tran(A)-b)^2):; fw:=frank_wolfe(f,V)
     

0.0,0.0,0.08905128279,0.0853981396774,0.0,0.0,0.469342995518,0.0,−0.355655381935,0.0
          

The optimal value of the objective function is:

subs(f,x,fw)
     
1.0435553106           

COBYLA, for instance, returns a weaker solution (see Section 16.3.3), which is expected since it does not benefit from gradient information:

cb:=fMin(f,[l1norm(x)<=1],x,[0.01$n],1e-4):; subs(f,x,cb)
     
1.16374646004           

The strength of the Frank-Wolfe method, besides automatic feasibility, is its ability to detect whether the objective function value is sufficiently close to the (unknown) otimal value.

Non-convex objectives.

Suppose that you want to minimize the function f(x,y)=xcos(y)−ysin(x2) inside the quadrilateral Q with vertices (1,1), (1,3), (3,2) and (3,1). The function f has two local minima in Q, one of which is global. To run frank_wolfe from the points (3/2,3/2) and (5/2,3/2), enter:

purge(x,y):; f,Q:=x*cos(y)-y*sin(x^2),[[1,1],[1,3],[3,2],[3,1]]:; ip1,ip2:=[1.5,1.5],[2.5,1.5]:; min1,min2:=frank_wolfe(f,Q,ip1),frank_wolfe(f,Q,ip2)
     

1.26907401345,2.86473876147
,
2.78442772106,2.10749634066
          

To plot the function f, quadrilateral Q, initial points and obtained minima, enter:

plotdensity(f,[x=0.8..3.2,y=0.0..3.5]); polygon(op(B),color=line_width_3+quadrant4,legend="boundary"); point(min1,color=yellow+point_width_2,legend="min1"); point(min2,color=yellow+point_width_2,legend="min2"); point(ip1,legend="ip1"); point(ip2,legend="ip2")

Points min1 and min2 are the global and a local minimum, respectively:

subs(f,[x,y],min1),subs(f,[x,y],min2)
     
−4.08322337291,−3.52045371621           

The algorithm gets stuck in the local minimum min2 when starting from ip2 because it is not allowed to go uphill. To find the global minimum one must apply multi-start, which you can select by using the rand option. Indeed:

frank_wolfe(f,Q,rand)
     

1.26907407408,2.86474054616
          

As another example, frank_wolfe is used to find the global minimum of the six-hump camel function:

f(x,y)=


4−2.1x2+
x4
3



x2+xy+(−4+4y2)y2 

in the rectangle [−3,3]×[−2,2] (see here). The function has six local minima, two of which are global: (0.0898,−0.7126) and (−0.0898,0.7126). Indeed:

f,R:=(4-2.1*x^2+x^4/3)*x^2+x*y+(-4+4*y^2)*y^2,[[-3,-2],[-3,2],[3,2],[3,-2]]:; frank_wolfe(f,R,rand=10000)
     
[0.0898388758644,−0.712673226867]           

By evaluating the above commandline several times in a row, you always get one of the two global minima.

Boundary discretization.

When minimizing on a convex domain, you can simplify the problem by discretizing the domain boundary. For example, minimize

f(x,y)=4ex+5cosyx2y+4x+5y−25e−(x−1)2−(y−2)2 

in the convex set D⊂ℝ2 bounded by the closed parametric curve C defined by

x(t)=


2+cos


t
π
6






sin(sint),   y(t)=2(2+cost)cos


t+
π
6



,   0≤ t≤ 2π. 

Let A={(x(tk),y(tk)):tk=2kπ/n,k=1,2,…,n} for some n≥ 2. Now the convex hull of A can be used as an approximation of D. To define f and the C (with u and v instead of x and y), enter:

f:=4*exp(x)+5*cos(y)-x^2*y+4x+5y-25*exp(-(x-1)^2-(y-2)^2):; u(t):=(2+cos(t-pi/6))*sin(sin(t)):; v(t):=2*(2+cos(t))*cos(t+pi/6):;

Now generate the hull vertices with, say, n=1000. Since linspace includes both 0 and 2π in s, the former is discarded by using tail.

s:=evalf(tail(linspace(0,2*pi,1000))):; A:=tran(apply(u,s),apply(v,s)):;

Use the rand option when calling frank_wolfe because the function f is not convex has several local minima in D.

p:=frank_wolfe(f,A,rand)
     

0.0565029231227,−1.78083520026
          

To plot the function f, curve C and point p, enter:

densityplot(f,[x=-2..3,y=-3..6]); paramplot([u(t),v(t)],t=0..2pi,display=line_width_3); point(p,display=point_width_2+red)
Input with functions.

In some problems, computing the objective function value and/or its derivative may be too complex for automatic handling. In such cases you should provide giac functions which compute these values. For example, suppose you wish to minimize the function

f(x,y)=−Ai(1+x)cos


y
π
8



,   x≥ −1, 

where Ai is the Airy function of the first kind, on the domain enclosed by the pentagon with vertices corresponding to complex solutions of the equation z5=1. To define the boundary, enter:

boundary:=polygon(op(cZeros(z^5=1,z)),display=quadrant4)

To define the objective function, enter:

f(x,y):=-Airy_Ai(1+x)*cos(y-pi/8):;

Computing the gradient of f automatically is difficult because the grad command does not know how to compute the derivative of Ai function, so you will have to provide the gradient function yourself. By the relation between Airy functions and modified Bessel functions, you get:

Ai′(z)=−
z
π
3
K2/3(ζ), 

where ζ=2/3z3/2 and

Kν(z)=
π
2νΓ(ν+1/2)
ez
z


0



2+
t
z



ν−1/2



 
tν−1/2etdt

The above integral is easily approximated with Gaussian quadrature for z>0 real. Therefore, the derivative of Ai can be computed for positive arguments by using the following giac function:

Ai_diff:=proc(x) local zeta,K,K0,t; zeta:=2/3*x^(3/2); purge(t); K0:=exp(-zeta)/(4^(1/3)*Gamma(7/6)*sqrt(zeta)); K:=K0*int((2t+t^2/zeta)^(1/6)*exp(-t),t=0..inf); return evalf(-x/sqrt(3*pi)*K); end

Now, with some elementary differentiation by hand or by the help of grad, we get the gradient of f:

∇ f(x,y)=








    −Ai′(1+x)cos


y
π
8



    Ai(1+x)sin


y
π
8











which is computed by the function

g:=proc(x,y) return evalf([-Ai_diff(x+1)*cos(y-pi/8),Airy_Ai(x+1)*sin(y-pi/8)]); end

To get the coordinates of pentagon vertices, enter (note the conversion to floating-point values):

A:=evalf(coordinates(vertices(boundary)))
     






0.309016994375−0.951056516295
−0.809016994375−0.587785252292
−0.8090169943750.587785252292
0.3090169943750.951056516295
1.00.0






          

Now find the minimum of f inside the pentagon:

frank_wolfe([f,g],evalf(coordinates(vertices(p))))
     

−0.809016994375,0.393184912674
          

Indeed:

plotdensity(f(x,y),[x=-1..1,y=-1..1]); display(boundary,line_width_2); point(p,display=point_width_2+red)

Note that using the rand option in this case would also require a function which computes the Hessian of f at a given point.


Previous Up Next