Non-negative least squares

In mathematical optimization, the problem of non-negative least squares (NNLS) is a type of constrained least squares problem where the coefficients are not allowed to become negative. That is, given a matrix $A$ and a (column) vector of response variables $y$, the goal is to find^[1]

\operatorname {arg\,min} \limits _{\mathbf {x} }\|\mathbf {Ax} -\mathbf {y} \|_{2}^{2}

subject to $\mathbf{x} \geq 0$.

Here $x \geq 0$ means that each component of the vector $x$ should be non-negative, and $\|\cdot\|_2$ denotes the Euclidean norm.
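As a small illustration (the numbers are chosen for this example and do not come from the cited sources), take $A$ to be the $2 \times 2$ identity matrix and $y = (1, -1)^{\mathsf T}$. The objective then separates into one term per component:

$$
\|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2^2 = (x_1 - 1)^2 + (x_2 + 1)^2 .
$$

Without the constraint, the minimizer is $x = (1, -1)^{\mathsf T}$ with objective value 0. With $x \geq 0$, the second term is smallest at $x_2 = 0$, so the NNLS solution is $x = (1, 0)^{\mathsf T}$ with objective value 1.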

Non-negative least squares problems turn up as subproblems in matrix decomposition, e.g. in algorithms for PARAFAC^[2] and non-negative matrix/tensor factorization.^[3]^[4] The latter can be considered a generalization of NNLS.^[1]

Another generalization of NNLS is bounded-variable least squares (BVLS), with simultaneous upper and lower bounds $α_i \leq x_i \leq β_i$.^[5]^: 291^[6]

Quadratic programming version

The NNLS problem is equivalent to a quadratic programming problem

\operatorname {arg\,min} \limits _{\mathbf {x\geq 0} }\left({\frac {1}{2}}\mathbf {x} ^{\mathsf {T}}\mathbf {Q} \mathbf {x} +\mathbf {c} ^{\mathsf {T}}\mathbf {x} \right),

where $Q = A^{\mathsf T} A$ and $c = -A^{\mathsf T} y$. This problem is convex, as $Q$ is positive semidefinite and the non-negativity constraints form a convex feasible set.^[7]
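Expanding the squared norm shows where $Q$ and $c$ come from. The following is a routine derivation sketch rather than a quotation from the cited sources:

$$
\tfrac{1}{2}\|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2^2
= \tfrac{1}{2}\mathbf{x}^{\mathsf T}\mathbf{A}^{\mathsf T}\mathbf{A}\mathbf{x} - \mathbf{y}^{\mathsf T}\mathbf{A}\mathbf{x} + \tfrac{1}{2}\mathbf{y}^{\mathsf T}\mathbf{y}
= \tfrac{1}{2}\mathbf{x}^{\mathsf T}\mathbf{Q}\mathbf{x} + \mathbf{c}^{\mathsf T}\mathbf{x} + \tfrac{1}{2}\mathbf{y}^{\mathsf T}\mathbf{y} .
$$

The term $\tfrac{1}{2}\mathbf{y}^{\mathsf T}\mathbf{y}$ does not depend on $\mathbf{x}$ and the factor $\tfrac{1}{2}$ is positive, so neither changes the minimizer.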

Algorithms

The first widely used algorithm for solving this problem is an active-set method published by Lawson and Hanson in their 1974 book Solving Least Squares Problems.^[5]^: 291 In pseudocode, this algorithm looks as follows:^[1]^[2]

Inputs:
- a real-valued matrix $A$ of dimension $m \times n$,
- a real-valued vector $y$ of dimension $m$,
- a real value $ε$, the tolerance for the stopping criterion.
Initialize:
- Set $P = \emptyset$.
- Set $R = \{1, \ldots, n\}$.
- Set $x$ to an all-zero vector of dimension $n$.
- Set $w = A^{\mathsf T} (y - A x)$.
- Let $w^R$ denote the sub-vector with indexes from $R$.
Main loop: while $R \neq \emptyset$ and $\max(w^R) > ε$:
- Let $j$ in $R$ be the index of $\max(w^R)$ in $w$.
- Add $j$ to $P$ .
- Remove $j$ from $R$ .
- Let $A^P$ be $A$ restricted to the variables included in $P$.
- Let $s$ be a vector of the same length as $x$. Let $s^P$ denote the sub-vector with indexes from $P$, and let $s^R$ denote the sub-vector with indexes from $R$.
- Set $s^P = ((A^P)^{\mathsf T} A^P)^{-1} (A^P)^{\mathsf T} y$.
- Set $s^R$ to zero.
- While $\min(s^P) \leq 0$:
  - Let $α = \min \frac{x_i}{x_i - s_i}$ for $i$ in $P$ where $s_i \leq 0$.
  - Set $x$ to $x + α (s - x)$.
  - Move to $R$ all indices $j$ in $P$ such that $x_j \leq 0$.
  - Set $s^P = ((A^P)^{\mathsf T} A^P)^{-1} (A^P)^{\mathsf T} y$.
  - Set $s^R$ to zero.
- Set $x$ to $s$.
- Set $w$ to $A^{\mathsf T} (y - A x)$.
Output: x
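
The pseudocode above can be sketched in Python with NumPy roughly as follows. This is an illustrative transcription under stated assumptions, not the reference implementation from Lawson and Hanson: the function name `nnls_active_set`, the iteration cap `max_iter`, and the threshold `zero_tol` are hypothetical additions, and the restricted solve uses `numpy.linalg.lstsq` instead of forming the inverse explicitly. Degenerate cases (for example a zero denominator in the step length) are not handled.

```python
import numpy as np


def nnls_active_set(A, y, eps=1e-10, max_iter=None, zero_tol=1e-12):
    """Active-set NNLS sketch that follows the pseudocode step by step."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n = A.shape[1]
    if max_iter is None:
        max_iter = 3 * n  # assumed safety cap; the pseudocode has none

    passive = np.zeros(n, dtype=bool)  # True: index is in P, False: in R
    x = np.zeros(n)
    w = A.T @ (y - A @ x)

    def restricted_solution():
        # s^P is the least-squares solution on the columns in P; s^R = 0.
        s = np.zeros(n)
        if passive.any():
            s[passive] = np.linalg.lstsq(A[:, passive], y, rcond=None)[0]
        return s

    for _ in range(max_iter):
        if passive.all() or w[~passive].max() <= eps:
            break
        # Move the index in R with the largest w_j into P.
        j = int(np.argmax(np.where(passive, -np.inf, w)))
        passive[j] = True
        s = restricted_solution()

        # Step from x towards s until every entry of s^P is positive.
        while passive.any() and s[passive].min() <= 0:
            idx = np.flatnonzero(passive & (s <= 0))
            ratios = x[idx] / (x[idx] - s[idx])
            alpha = ratios.min()
            x = x + alpha * (s - x)
            x[idx[np.argmin(ratios)]] = 0.0  # the blocking entry hits zero
            passive &= x > zero_tol          # such indices return to R
            x[~passive] = 0.0
            s = restricted_solution()

        x = s
        w = A.T @ (y - A @ x)
    return x
```

At a solution, the vector $w = A^{\mathsf T}(y - Ax)$ has no positive entries and is zero wherever $x$ is positive, which gives a simple check on the returned vector.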

This algorithm takes a finite number of steps to reach a solution and smoothly improves its candidate solution as it goes (so it can find good approximate solutions when cut off at a reasonable number of iterations), but is very slow in practice, owing largely to the computation of the pseudoinverse $((A^P)^{\mathsf T} A^P)^{-1}$.^[1] Variants of this algorithm are available in MATLAB as the routine lsqnonneg^[8]^[1] and in SciPy as optimize.nnls.^[9]
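
A minimal usage sketch of the SciPy routine, assuming NumPy and SciPy are installed. The matrix and vector are made-up illustration data. `scipy.optimize.nnls` returns the solution vector together with the Euclidean norm of the residual.

```python
import numpy as np
from scipy.optimize import nnls

# Illustration data: three observations, two non-negative coefficients.
A = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
y = np.array([2.0, 1.0, 1.0])

# x is approximately [1.5, 1.0]; residual_norm is approximately 0.7071.
x, residual_norm = nnls(A, y)

# With an all-negative target, the constraint forces x to zero.
x_neg, residual_norm_neg = nnls(A, -np.ones(3))
```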

Many improved algorithms have been suggested since 1974.^[1] Fast NNLS (FNNLS) is an optimized version of the Lawson–Hanson algorithm.^[2] Other algorithms include variants of Landweber's gradient descent method^[10] and coordinate-wise optimization based on the quadratic programming problem above.^[7]
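
The coordinate-wise idea can be sketched as follows: each variable in turn is set to the value that minimizes the quadratic objective with the others held fixed, then clipped at zero. This is an illustrative sketch in the spirit of that approach, not the exact published method. The function name `nnls_coordinate_descent`, the sweep limit `n_sweeps`, and the stopping rule are assumptions made for this example.

```python
import numpy as np


def nnls_coordinate_descent(A, y, n_sweeps=500, tol=1e-10):
    """Coordinate-wise minimization of 1/2 x^T Q x + c^T x subject to x >= 0."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    Q = A.T @ A
    c = -A.T @ y
    n = Q.shape[0]

    x = np.zeros(n)
    mu = c.copy()  # gradient Q x + c, evaluated at x = 0

    for _ in range(n_sweeps):
        largest_change = 0.0
        for k in range(n):
            if Q[k, k] == 0.0:
                continue  # zero column: x_k does not affect the objective
            # Exact minimizer along coordinate k, projected onto x_k >= 0.
            new_xk = max(0.0, x[k] - mu[k] / Q[k, k])
            delta = new_xk - x[k]
            if delta != 0.0:
                mu += delta * Q[:, k]  # keep the gradient up to date
                x[k] = new_xk
                largest_change = max(largest_change, abs(delta))
        if largest_change < tol:
            break
    return x
```

Each update costs one column of $Q$, so a full sweep is $O(n^2)$ once $Q$ has been formed. Convergence can be slow when the columns of $A$ are strongly correlated.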

References

[1] Chen, Donghui; Plemmons, Robert J. (2009). Nonnegativity constraints in numerical analysis. Symposium on the Birth of Numerical Analysis. CiteSeerX 10.1.1.157.9203.
[2] Bro, Rasmus; De Jong, Sijmen (1997). "A fast non-negativity-constrained least squares algorithm". Journal of Chemometrics. 11 (5): 393. doi:10.1002/(SICI)1099-128X(199709/10)11:5<393::AID-CEM483>3.0.CO;2-L.
[3] Lin, Chih-Jen (2007). "Projected Gradient Methods for Nonnegative Matrix Factorization" (PDF). Neural Computation. 19 (10): 2756–2779. CiteSeerX 10.1.1.308.9135. doi:10.1162/neco.2007.19.10.2756. PMID 17716011.
[4] Boutsidis, Christos; Drineas, Petros (2009). "Random projections for the nonnegative least-squares problem". Linear Algebra and Its Applications. 431 (5–7): 760–771. arXiv:0812.4547. doi:10.1016/j.laa.2009.03.026.
[5] Lawson, Charles L.; Hanson, Richard J. (1995). "23. Linear Least Squares with Linear Inequality Constraints". Solving Least Squares Problems. SIAM. p. 161. doi:10.1137/1.9781611971217.ch23. ISBN 978-0-89871-356-5.
[6] Stark, Philip B.; Parker, Robert L. (1995). "Bounded-variable least-squares: an algorithm and applications" (PDF). Computational Statistics. 10: 129.
[7] Franc, Vojtěch; Hlaváč, Václav; Navara, Mirko (2005). "Sequential Coordinate-Wise Algorithm for the Non-negative Least Squares Problem". Computer Analysis of Images and Patterns. Lecture Notes in Computer Science. Vol. 3691. pp. 407–414. doi:10.1007/11556121_50. ISBN 978-3-540-28969-2.
[8] "lsqnonneg". MATLAB Documentation. Retrieved October 28, 2022.
[9] "scipy.optimize.nnls". SciPy v0.13.0 Reference Guide. Retrieved 25 January 2014.
[10] Johansson, B. R.; Elfving, T.; Kozlov, V.; Censor, Y.; Forssén, P. E.; Granlund, G. S. (2006). "The application of an oblique-projected Landweber method to a model of supervised learning". Mathematical and Computer Modelling. 43 (7–8): 892. doi:10.1016/j.mcm.2005.12.010.
