# Description of the method

This page provides a detailed mathematical description of the algorithms underlying COBYQA. COBYQA is a derivative-free trust-region SQP method designed to tackle nonlinearly constrained optimization problems, including problems with equality and inequality constraints. A particular feature of COBYQA is that it only ever visits points that respect the bound constraints, if any. This is useful because the objective functions of applications that admit bound constraints are often undefined when the bounds are violated.

For a complete description of COBYQA, we refer to Chapters 5 to 7 of the following Ph.D. thesis:

## Statement of the problem

The problems we consider are of the form

$\begin{split}\min_{x \in \R^n} f(x) \quad \text{s.t.} \quad \left\{ \begin{array}{l} g(x) \le 0,\\ h(x) = 0,\\ l \le x \le u, \end{array} \right.\end{split}$

where $$f$$ is the objective function, and $$g$$ and $$h$$ are the inequality and equality constraint functions, respectively. The lower bounds $$l \in (\R \cup \set{-\infty})^n$$ and the upper bounds $$u \in (\R \cup \set{\infty})^n$$ satisfy $$l \le u$$. Note that the bound constraints are not included in the inequality constraints $$g(x) \le 0$$, because COBYQA handles them separately.
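For concreteness, here is a small instance of this problem class in Python. The functions and data below are an illustrative toy example, not taken from COBYQA:

```python
import numpy as np

# Toy instance of the problem statement (illustrative data only):
#   min (x1 - 2)^2 + x2^2
#   s.t. x1^2 + x2^2 - 1 <= 0,  x1 - x2 = 0,  0 <= x <= 1.
def f(x):
    return (x[0] - 2.0) ** 2 + x[1] ** 2

def g(x):  # inequality constraints, g(x) <= 0
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0])

def h(x):  # equality constraints, h(x) = 0
    return np.array([x[0] - x[1]])

l, u = np.zeros(2), np.ones(2)  # bound constraints, handled separately

def is_feasible(x, tol=1e-8):
    """Check the three constraint groups at a point x."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(g(x) <= tol) and np.all(np.abs(h(x)) <= tol)
                and np.all(l - tol <= x) and np.all(x <= u + tol))
```

For instance, `is_feasible([0.5, 0.5])` holds, while `[1.0, 0.0]` violates the equality constraint.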

## The derivative-free trust-region SQP method

We present in this section the general framework of COBYQA, whose name stands for Constrained Optimization BY Quadratic Approximations. The method does not use derivatives of the objective function or the nonlinear constraint functions; instead, it models them by underdetermined quadratic interpolation based on the derivative-free symmetric Broyden update.

Recall that the basic trust-region SQP method models the objective and constraint functions based on their gradients and Hessian matrices. Since we do not have access to such information, we use interpolation-based quadratic models of these functions. Specifically, we use quadratic models obtained by underdetermined interpolation based on the derivative-free symmetric Broyden update.

At the $$k$$-th iteration, the objective function $$f$$ is approximated by the quadratic model $$\hat{f}_k$$ that solves

$\min_{Q \in \mathcal{P}_{2, n}} \norm{\nabla^2 Q - \nabla^2 \hat{f}_{k - 1}}_{\mathsf{F}} \quad \text{s.t.} \quad Q(y) = f(y), ~ y \in \mathcal{Y}_k,$

where $$\mathcal{P}_{2, n}$$ is the set of quadratic polynomials on $$\R^n$$, $$\norm{\cdot}_{\mathsf{F}}$$ denotes the Frobenius norm, and $$\mathcal{Y}_k \subseteq \R^n$$ is a finite interpolation set updated along the iterations. The cardinality of the interpolation set must satisfy

$n + 2 \le \card (\mathcal{Y}_k) \le \frac{1}{2} (n + 1) (n + 2),$

so that the quadratic model $$\hat{f}_k$$ is well-defined (note that COBYQA ensures that $$\mathcal{Y}_k$$ is always sufficiently well-poised). The quadratic models $$\hat{g}_k$$ and $$\hat{h}_k$$ of $$g$$ and $$h$$ are defined similarly.
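To illustrate the least-change idea, the sketch below solves the minimum-Frobenius-norm interpolation subproblem directly via its KKT system, with the previous model Hessian `H_prev` standing in for $$\nabla^2 \hat{f}_{k - 1}$$. This is a toy illustration of the subproblem only; COBYQA's actual implementation maintains and updates these systems far more efficiently across iterations:

```python
import numpy as np

def min_frob_quadratic(points, fvals, H_prev=None):
    """Solve min ||nabla^2 Q - H_prev||_F s.t. Q(y) = f(y) on the set,
    for a quadratic Q(x) = c + g.x + x.H.x/2, via its KKT system."""
    Y = np.asarray(points, dtype=float)
    f = np.asarray(fvals, dtype=float)
    m, n = Y.shape
    assert n + 2 <= m <= (n + 1) * (n + 2) // 2, "cardinality bound violated"
    if H_prev is None:
        H_prev = np.zeros((n, n))
    # Parametrize the symmetric update D = H - H_prev by its upper triangle.
    iu = [(i, j) for i in range(n) for j in range(i, n)]
    nv = 1 + n + len(iu)  # variables z = (c, g, vech(D))
    # ||D||_F^2 counts each off-diagonal entry twice.
    w = np.array([1.0 if i == j else 2.0 for i, j in iu])
    # Interpolation conditions A z = b, with the known H_prev contribution
    # moved to the right-hand side.
    A = np.zeros((m, nv))
    A[:, 0] = 1.0
    A[:, 1:1 + n] = Y
    for col, (i, j) in enumerate(iu):
        A[:, 1 + n + col] = (0.5 if i == j else 1.0) * Y[:, i] * Y[:, j]
    b = f - 0.5 * np.einsum('ki,ij,kj->k', Y, H_prev, Y)
    # KKT system of: min z' W z  s.t.  A z = b (W vanishes on c and g).
    W = np.zeros((nv, nv))
    W[1 + n:, 1 + n:] = np.diag(w)
    K = np.block([[2.0 * W, A.T], [A, np.zeros((m, m))]])
    z = np.linalg.solve(K, np.concatenate([np.zeros(nv), b]))[:nv]
    c, g = z[0], z[1:1 + n]
    D = np.zeros((n, n))
    for col, (i, j) in enumerate(iu):
        D[i, j] = D[j, i] = z[1 + n + col]
    return c, g, H_prev + D
```

With `H_prev` set to the Hessian of the previous model, this is exactly the subproblem defining $$\hat{f}_k$$; at the first iteration one may take `H_prev = 0`, in which case the solver returns the interpolant with the smallest Hessian in the Frobenius norm.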

### The derivative-free trust-region SQP framework

We now present the derivative-free trust-region SQP framework employed by COBYQA. We denote for convenience by $$\hat{\mathcal{L}}_k$$ the Lagrangian function

$\hat{\mathcal{L}}_k(x, \lambda, \mu) = \hat{f}_k(x) + \lambda^{\T} \hat{g}_k(x) + \mu^{\T} \hat{h}_k(x).$

Also, given a penalty parameter $$\gamma_k \ge 0$$, we denote by $$\varphi_k$$ the $$\ell_2$$-merit function

$\varphi_k(x) = f(x) + \gamma_k \sqrt{\norm{[g(x)]^+}_2^2 + \norm{h(x)}_2^2},$

where $$[\cdot]^+$$ is the componentwise positive-part operator. Finally, we denote by $$\hat{\varphi}_k$$ the model of the $$\ell_2$$-merit function along a step $$d$$, namely

$\hat{\varphi}_k(d) = \hat{f}_k(x_k) + \nabla \hat{f}_k(x_k)^{\T} d + \frac{1}{2} d^{\T} \nabla^2 \hat{\mathcal{L}}_k(x_k, \lambda_k, \mu_k) d + \gamma_k \sqrt{\norm{[\hat{g}_k(x_k) + \nabla \hat{g}_k(x_k) d]^+}_2^2 + \norm{\hat{h}_k(x_k) + \nabla \hat{h}_k(x_k) d}_2^2},$

where $$x_k \in \mathcal{Y}_k$$ is the best interpolation point according to the merit function $$\varphi_{k - 1}$$, and $$\lambda_k$$ and $$\mu_k$$ are approximations of the Lagrange multipliers associated with the inequality and equality constraints, respectively.
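For concreteness, the merit function $$\varphi_k$$ can be evaluated directly from its definition. In this sketch, the callables `f`, `g`, and `h` are placeholders for the user's functions:

```python
import numpy as np

def l2_merit(f, g, h, gamma, x):
    """ell_2-merit function: f(x) + gamma * sqrt(||[g(x)]+||^2 + ||h(x)||^2)."""
    gplus = np.maximum(np.asarray(g(x), dtype=float), 0.0)  # positive part
    hval = np.asarray(h(x), dtype=float)
    return f(x) + gamma * np.sqrt(gplus @ gplus + hval @ hval)
```

With $$\gamma_k = 0$$, the merit function reduces to the objective function, which is why the constraint violation only starts to matter once the penalty parameter becomes positive.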

We provide below a simplified framework of the COBYQA algorithm.

Simplified framework of COBYQA

Data: Initial trust-region radius $$\Delta_0 > 0$$.

• Set the penalty parameter $$\gamma_{-1} \gets 0$$

• Build the initial interpolation set $$\mathcal{Y}_0 \subseteq \R^n$$

• Define $$x_0$$ as a solution of $$\min_{y \in \mathcal{Y}_0} \varphi_{-1}(y)$$

• Estimate the Lagrange multipliers $$\lambda_0$$ and $$\mu_0$$

• For $$k = 0, 1, \dots$$ until convergence, do

  • Compute the models $$\hat{f}_k$$, $$\hat{g}_k$$, and $$\hat{h}_k$$

  • Set the trial step $$d_k$$ to an approximate solution to

  $\begin{split}\min_{d \in \R^n} \hat{f}_k(x_k) + \nabla \hat{f}_k(x_k)^{\T} d + \frac{1}{2} d^{\T} \nabla^2 \hat{\mathcal{L}}_k(x_k, \lambda_k, \mu_k) d \quad \text{s.t.} \quad \left\{ \begin{array}{l} \hat{g}_k(x_k) + \nabla \hat{g}_k(x_k) d \le 0,\\ \hat{h}_k(x_k) + \nabla \hat{h}_k(x_k) d = 0,\\ l \le x_k + d \le u,\\ \norm{d} \le \Delta_k \end{array} \right.\end{split}$

  • Pick a penalty parameter $$\gamma_k \ge \max \set[\big]{\gamma_{k - 1}, \sqrt{\norm{\lambda_k}^2 + \norm{\mu_k}^2}}$$ ensuring that $$\hat{\varphi}_k(d_k) < \hat{\varphi}_k(0)$$

  • Evaluate the trust-region ratio

  $\rho_k \gets \frac{\varphi_k(x_k) - \varphi_k(x_k + d_k)}{\hat{\varphi}_k(0) - \hat{\varphi}_k(d_k)}$

  • If $$\rho_k > 0$$, then

    • Choose a point $$\bar{y} \in \mathcal{Y}_k$$ to remove from $$\mathcal{Y}_k$$

  • Else

    • Choose a point $$\bar{y} \in \mathcal{Y}_k \setminus \set{x_k}$$ to remove from $$\mathcal{Y}_k$$

  • Update the interpolation set $$\mathcal{Y}_{k + 1} \gets (\mathcal{Y}_k \setminus \set{\bar{y}}) \cup \set{x_k + d_k}$$

  • Update the current iterate $$x_{k + 1}$$ to a solution of $$\min_{y \in \mathcal{Y}_{k + 1}} \varphi_k(y)$$

  • Estimate the Lagrange multipliers $$\lambda_{k + 1}$$ and $$\mu_{k + 1}$$

  • Update the trust-region radius $$\Delta_{k + 1}$$

  • Improve the geometry of $$\mathcal{Y}_{k + 1}$$ if necessary
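The acceptance test and radius update in the loop above follow the usual trust-region pattern: $$\rho_k$$ compares the actual merit reduction with the reduction predicted by the model. The sketch below shows the ratio computation and one conventional radius rule; the thresholds and scaling factors are illustrative assumptions, not COBYQA's exact values, which are given in the thesis:

```python
def trust_region_ratio(merit_x, merit_xd, model_0, model_d):
    """rho_k = (phi_k(x_k) - phi_k(x_k + d_k)) / (phihat_k(0) - phihat_k(d_k)).

    The choice of the penalty parameter guarantees that the denominator
    (the predicted merit reduction) is positive.
    """
    return (merit_x - merit_xd) / (model_0 - model_d)

def update_radius(delta, rho, eta1=0.1, eta2=0.7, dec=0.5, inc=2.0):
    """A conventional trust-region radius update (illustrative constants)."""
    if rho < eta1:
        return dec * delta  # poor agreement between merit and model: shrink
    if rho >= eta2:
        return inc * delta  # very good agreement: allow a larger region
    return delta            # acceptable agreement: keep the radius
```

Note that a step is accepted whenever $$\rho_k > 0$$, i.e., whenever it reduces the merit function at all, while the radius only grows when the model predicts the reduction accurately.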

### A lot of questions need to be answered

The framework above is a simplified version of the COBYQA algorithm. Many questions need to be answered to completely define the algorithm; we provide some examples below.

1. How to calculate the trial step? What if the trust-region subproblem is infeasible?

2. What are the approximate Lagrange multipliers? How to estimate them?

3. How to update the penalty parameter?

4. How to update the trust-region radius?

5. What if the interpolation set $$\mathcal{Y}_k$$ is almost nonpoised?

The answers to these questions (and more) are provided in the Ph.D. thesis mentioned at the beginning of this page.
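As an illustration of question 2, one standard choice (shown here for illustration only, not necessarily COBYQA's exact rule) is the least-squares multiplier estimate, which fits $$\lambda$$ and $$\mu$$ to the stationarity condition of the Lagrangian defined above and projects the inequality multipliers onto $$\lambda \ge 0$$:

```python
import numpy as np

def ls_multipliers(grad_f, jac_g, jac_h, active):
    """Least-squares Lagrange multiplier estimates (a standard choice).

    Solves, in the least-squares sense, the stationarity condition of the
    Lagrangian f + lambda' g + mu' h, restricted to the active inequality
    constraints, then clips the inequality multipliers at zero.
    """
    grad_f = np.asarray(grad_f, dtype=float)
    jac_g = np.atleast_2d(np.asarray(jac_g, dtype=float))
    jac_h = np.atleast_2d(np.asarray(jac_h, dtype=float))
    active = np.asarray(active, dtype=bool)
    na = int(active.sum())
    A = np.vstack([jac_g[active], jac_h])  # rows: active constraint gradients
    sol, *_ = np.linalg.lstsq(A.T, -grad_f, rcond=None)
    lam = np.zeros(jac_g.shape[0])
    lam[active] = np.maximum(sol[:na], 0.0)  # inequality multipliers are >= 0
    mu = sol[na:]
    return lam, mu
```

In a derivative-free setting, the gradients fed to such an estimator would come from the interpolation models rather than the true functions.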