Lagrange multiplier in L-BFGS-B

Wishing to do a harder problem, I tried to attack it from a simpler point of view with the following toy minimization equation

enter image description here

which does not have a trivial solution.

In order to do that, I need to use an augmented lagrangian / dual function 1 with its gradient 2, and the equilibrium point 3. The augmented lagrangian version of the previous problem:

enter image description here

The point of a Lagrange multiplier is to optimize over mu in [0,inf] in order to take into account the weird constraints of our problem.

Running the following code

a = 1
nbtests = 5 ; minmu = 0 ; maxmu = 5
def dual(mu) :
    x =  spo.fsolve(lambda x : 2*x  -  mu * (np.exp(x)+1) , 1)
    return   (- (x**2 - mu *(np.exp(x) + x -a )) ,- (np.exp(x) + x-a))

pl = np.empty((nbtests,2))
for i,nu in enumerate(np.linspace(minmu,maxmu,pl.shape[0]))  :
    res = spo.fmin_l_bfgs_b(dual,nu,bounds=[(0,None)],factr=1e6)
    print(nu,res[0],res[2]['task'])
    pl[i] = [nu,spo.fsolve(lambda x :  2*x  -  res[0] * (np.exp(x)+1), 1)]

plt.plot(pl[:,0],pl[:,1],label="L-BFGS-B")

plt.axhline(y=spo.fsolve(lambda x : np.exp(x)+x-a, 1)[0], label="Target",color="red")
plt.legend()
plt.show()  

plt.plot(pl[:,0],pl[:,1],label="L-BFGS-B")

plt.axhline(y=spo.fsolve(lambda x : np.exp(x)+x-a, 1)[0], label="Target",color="red")
plt.legend()
plt.show()   

Here, we are trying to L-BFGS-B optimizer in Python (which is the fastest one, since we have access to the gradient) from the dual problem, then revert to the original solution with fsolve. Note that the – signs inside the function and gradient are because the minimisation of the primal problem is equal to the maximistation of the dual problem, so they are for the actual maximisation of the equations. I plot here the result of the optimizer with respect to the initial guess of mu, and shows it’s very very dependent on the initialisation, not robust at all. When changing the parameter a to other values, the convergence of the method is even worse.

The plot of the optimizer result with respesct to mu (for a bigger nbtests)

enter image description here

and the output of the program

0.0 [0.] b'ABNORMAL_TERMINATION_IN_LNSRCH'
0.5555555555555556 [0.55555556] b'ABNORMAL_TERMINATION_IN_LNSRCH'
1.1111111111111112 [3.52870269] b'ABNORMAL_TERMINATION_IN_LNSRCH'
1.6666666666666667 [3.52474085] b'ABNORMAL_TERMINATION_IN_LNSRCH'
2.2222222222222223 [3.5243099] b'ABNORMAL_TERMINATION_IN_LNSRCH'
2.7777777777777777 [3.49601967] b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
3.3333333333333335 [3.52020875] b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
3.8888888888888893 [3.88888889] b'ABNORMAL_TERMINATION_IN_LNSRCH'
4.444444444444445 [4.44444354] b'ABNORMAL_TERMINATION_IN_LNSRCH'
5.0 [5.] b'ABNORMAL_TERMINATION_IN_LNSRCH'

The first column is the initial guess, and the second one is the estimated guess after the optimzer, and we see that it does not optimize at all for most of the values. For a <= 0 , the domain of the function is x < 0, which gives a trivial minimisation x^2 = 0, for mu = 0. So all the solutions should give 0 at the return of the optimizer.

The error b’ABNORMAL_TERMINATION_IN_LNSRCH comes from a bad gradient, but here, is it the real gradient of the function…

What did I miss?

Answer

There are multiple wrong signs in your code, which, by the way, violates the DRY principle. It should be x**2 + mu * (np.exp(x) + x - a) instead of x**2 - mu * (np.exp(x) + x - a) and similar for the derivative. IMHO, something like

from scipy.optimize import fsolve, fmin_l_bfgs_b

a = 1
nbtests = 5
minmu = 0
maxmu = 5

def lagrange(x, mu):
    return x**2 + mu * (np.exp(x) + x - a)

def lagrange_grad(x, mu):
    grad_x = 2*x + mu * (np.exp(x) + 1)
    grad_mu = np.exp(x) + x - a
    return grad_x, grad_mu

def dual(mu):
    x = fsolve(lambda x: lagrange_grad(x, mu)[0], x0=1)
    obj_val = lagrange(x, mu)
    grad = lagrange_grad(x, mu)[1]
    return -1.0*obj_val, -1.0*grad

pl = np.empty((nbtests, 2))
for i, nu in enumerate(np.linspace(minmu,maxmu,nbtests)):
    res = fmin_l_bfgs_b(dual, x0=nu, bounds=[(0,None)], factr=1e6)
    mu_opt = res[0]
    x_opt = fsolve(lambda x: lagrange_grad(x, mu_opt)[0], x0=1)
    pl[i] = [nu, *x_opt]

is much cleaner. This gives me

array([[0.  , 0.  ],
       [1.25, 0.  ],
       [2.5 , 0.  ],
       [3.75, 0.  ],
       [5.  , 0.  ]])

as desired.