Robert Bajons | Joint work with Kurt Hornik
Vienna University of Economics and Business
International Workshop on Statistical Modelling 2024
Jul 16, 2024
Applications in sports:
Masarotto and Varin (2012) use paired comparisons of sports teams (i.e. matches played) to rank them via the ranking lasso:
Use similar ideas for different problems:
\[\min_{\boldsymbol{\beta}} -\ell\left(Y;X,\boldsymbol{\beta}\right)+\lambda ||D\boldsymbol{\beta}||_1,\]
with \(\ell\left(Y;X,\boldsymbol{\beta}\right) = \sum_{i = 1}^{n} y_i \log(\frac{1}{1+\exp(-\boldsymbol{\beta}^{\top} x_i)})+(1-y_i)\log(\frac{1}{1+\exp(\boldsymbol{\beta}^{\top} x_i)})\).
\(D\): Matrix representing structural constraint on parameters:
Basic definition:
\[ \begin{align} \text{minimize } & a_0^{\top}x \\ \text{s.t. } & Ax+s = b, \quad s \in \mathcal{K}. \end{align} \]
\(\mathcal{K}\) is a composition of simple convex cones.
E.g.: Any standard linear program \(\min_x a_0^{\top}x\) s.t. \(Ax \le b\) is a conic problem: the slack \(s = b - Ax\) lies in a product of linear cones \(K_{{\rm lin}} = \{x \in \mathbb{R} \mid x \ge 0\}\).
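As a quick numeric sanity check (a Python sketch rather than the R/ROI calls used later; the problem data are made up for illustration), the inequality form and the slack/conic form of a linear program return the same optimum:

```python
# Check that the inequality form and the slack (conic) form of an LP agree.
import numpy as np
from scipy.optimize import linprog

a0 = np.array([-1.0, -2.0])
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 4.0])

# Inequality form: min a0'x  s.t.  Ax <= b
res_ineq = linprog(a0, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)

# Conic form: min a0'x  s.t.  Ax + s = b,  s in K_lin (i.e. s >= 0),
# written as an equality-constrained LP over the stacked variable (x, s).
c = np.concatenate([a0, np.zeros(3)])
A_eq = np.hstack([A, np.eye(3)])
res_cone = linprog(c, A_eq=A_eq, b_eq=b,
                   bounds=[(None, None)] * 2 + [(0, None)] * 3)

print(res_ineq.fun, res_cone.fun)  # both -8.0: optimum at x = (0, 4)
```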
Any convex problem can be reformulated into CP via its epigraph form (Boyd and Vandenberghe 2004).
Initial convex problem:
\[ \begin{align*} \text{minimize } & f_0(x) \\ \text{s.t. } & f_i(x) \le 0. \end{align*} \]
\[\Leftrightarrow\]
Epigraph form:
\[ \begin{align*} \text{minimize } & t \\ \text{s.t. } & f_0(x) \le t, \quad f_i(x) \le 0. \end{align*} \]
A few convex cones suffice to model a wide variety of problems:
Algorithmic advantages of CPs:
Proven to be robust, reliable and effective for similar types of problems (Schwendinger, Grün, and Hornik 2021).
Readily implemented in R via package ROI (Theußl, Schwendinger, and Hornik 2020).
\[ \begin{align} \min_{\boldsymbol{\beta}} \quad -\sum_{i = 1}^{n} \big(y_i \log(\frac{1}{1+\exp(-\boldsymbol{\beta}^{\top} x_i)})+(1-y_i)\log(\frac{1}{1+\exp(\boldsymbol{\beta}^{\top} x_i)})\big)+ \\ \lambda \sum_{j = 1}^m w_{j}|D_j| \end{align} \]
\(y_i\) binary outcome, \(x_i\) the \(i\)-th row of \(X \in \mathbb{R}^{n\times p}\), \(\boldsymbol{\beta} \in \mathbb{R}^p\).
\(D_j\): \(j\)-th component of \(D\boldsymbol{\beta}\), with \(D\in \mathbb{R}^{m \times p}\).
\(w_j\): adaptively control for degree of shrinkage for specific terms (Zou 2006).
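In code, the objective reads as follows (a Python sketch with simulated data; `agll_objective` and all numeric values are illustrative, not from the study):

```python
import numpy as np

def agll_objective(beta, X, y, D, w, lam):
    """Negative Bernoulli log-likelihood plus weighted generalized lasso penalty."""
    u = X @ beta
    # y*log(1/(1+exp(-u))) + (1-y)*log(1/(1+exp(u))), negated and stabilized:
    nll = np.sum(y * np.logaddexp(0.0, -u) + (1 - y) * np.logaddexp(0.0, u))
    return nll + lam * np.sum(w * np.abs(D @ beta))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, 1.0, 0.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))
D = np.array([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]])  # fuse adjacent coefficients
val = agll_objective(beta_true, X, y, D, w=np.ones(2), lam=0.5)
```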
\[ \begin{align} \min_{(\boldsymbol{r},\boldsymbol{\beta},\boldsymbol{s},\boldsymbol{t},\boldsymbol{z}_1,\boldsymbol{z}_2)} & \sum_{i = 1}^n t_i +\lambda r \\ \text{s.t.} & \left(z_{i1}, 1, u_i-t_i\right) \in K_{\rm exp}, \\ & \left(z_{i2}, 1,-t_i\right) \in K_{\rm exp}, \\ & z_{i1}+z_{i2} \leq 1, & i = 1,\dots,n, \\ & -s_j \le D_j \le s_j, & j = 1,\dots,m, \\ & r - w_1s_1 - \dots - w_ms_m \ge 0. \end{align} \]
Conic program on an augmented set of variables \((\boldsymbol{r},\boldsymbol{\beta},\boldsymbol{s},\boldsymbol{t},\boldsymbol{z}_1,\boldsymbol{z}_2)\), where \(u_i = \pm \boldsymbol{\beta}^\top x_i\) (sign depending on \(y_i\)).
Composition of two classes of convex cones: exponential cones for the likelihood terms, linear cones for the penalty and auxiliary constraints.
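A small numeric check (Python sketch, illustrative values only) that for fixed \(u_i\) the tightest feasible \(t_i\) in this program is the logistic loss term \(\log(1+\exp(u_i))\):

```python
import numpy as np

# At t_i = log(1 + exp(u_i)), the natural choice z_i1 = exp(u_i - t_i),
# z_i2 = exp(-t_i) satisfies both cone constraints and z_i1 + z_i2 = 1.
def in_Kexp(x1, x2, x3, tol=1e-12):
    # (x1, x2, x3) in K_exp  <=>  x1 >= x2 * exp(x3 / x2), x2 > 0
    return x2 > 0 and x1 >= x2 * np.exp(x3 / x2) - tol

for u in [-3.0, 0.0, 2.5]:
    t = np.log1p(np.exp(u))              # candidate t_i
    z1, z2 = np.exp(u - t), np.exp(-t)
    assert in_Kexp(z1, 1.0, u - t)
    assert in_Kexp(z2, 1.0, -t)
    assert z1 + z2 <= 1 + 1e-12          # holds with equality at this point
```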
R: package ROI allows for the use of a variety of solvers.
Weights: \[w_{j}=\left|(D\tilde{\boldsymbol{\beta}}_{\epsilon})_j\right|^{-1}, \quad \tilde{\boldsymbol{\beta}}_{\epsilon}={\arg \min}_{\boldsymbol{\beta}} \left\{-\ell(\boldsymbol{\beta})+\epsilon \sum_{i}\beta_i^2\right\}.\]
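A sketch of the weight computation (Python; the data, the ridge parameter \(\epsilon = 10^{-4}\), and the plain lasso structure \(D = I\) are illustrative choices, not the study's settings):

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik_ridge(beta, X, y, eps):
    # negative logistic log-likelihood plus a small ridge penalty
    u = X @ beta
    nll = np.sum(y * np.logaddexp(0.0, -u) + (1 - y) * np.logaddexp(0.0, u))
    return nll + eps * np.sum(beta ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([1.0, 1.0, -1.0, 0.0]))))
D = np.eye(4)  # plain lasso structure, for illustration only

beta_tilde = minimize(neg_loglik_ridge, np.zeros(4), args=(X, y, 1e-4)).x
w = 1.0 / np.abs(D @ beta_tilde)  # adaptive weights
```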
Comparison of approaches:
Conic approaches: ROI implementation with different solvers (MOSEK, Clarabel, ECOS).
Augmented Lagrangian methods: implementation with package alabama: auglag, constrOptim.nl.
Simulation studies with \(n = 5000\) (sample size) and \(p = 20, 40, 50, 80\) (number of covariates).
50 repetitions for each scenario.
Covariate structure: 4 groups of coefficients, 2/3 of the coefficients relevant ( \(X\) Gaussian).
Logistic regression model: the binary response \(Y\) is drawn from \[y_i \mid x_{i1},\dots,x_{im} \sim {\rm Ber}(p_i), \quad m = \lfloor \tfrac{2}{3}p \rfloor,\] where \(p_i = 1/\big(1+\exp(-\boldsymbol{\beta}^{\top} x_i)\big)\).
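The design can be sketched as follows (Python; the group values 2, 1, \(-1\) are illustrative, since the slides do not state the true coefficients):

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 5000, 20
m = (2 * p) // 3                      # floor(2p/3) relevant covariates

# 4 groups of coefficients: 3 nonzero groups plus the zero (irrelevant) group
beta = np.zeros(p)
for block, val in zip(np.array_split(np.arange(m), 3), [2.0, 1.0, -1.0]):
    beta[block] = val

X = rng.normal(size=(n, p))           # Gaussian covariates
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))
```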
Evaluation of simulation:
Average number of groups detected (in 50 runs of simulations, fixed \(\lambda\)):
Conic solver substantially faster than augmented Lagrangian methods:
ROI implementation leads to runtime overhead but allows for flexibility.
Conic solvers more accurate in terms of finding the optimal solution.
Conic solvers more accurately retrieve implicit grouping structure than alabama.
Conic solvers require no starting values!
Aim: rank players based on on-ice time during goals.
Data set of hockey goals (Gramacy, Jensen, and Taddy 2013):
Idea:
Set up matrix \(D \Rightarrow\) separate ranking lasso penalty for player columns and team columns
\[ \begin{align} D = \left(\begin{array}{cc} D_{\rm player} & \boldsymbol{0} \\ \boldsymbol{0} & D_{\rm team} \end{array}\right), \quad D_{\rm player} = \left(\begin{array}{ccc} A_1 & B_1 & C_1 \\ \vdots & \vdots & \vdots \\ A_{n_p-1} & B_{n_p-1} & C_{n_p-1} \end{array}\right). \end{align} \]
\(A_i\): zero matrix of dimension \((n_p-i) \times (i-1)\).
\(B_i\): vector of ones of length \(n_p-i\).
\(C_i\): negative identity matrix of dimension \((n_p-i)\).
Row \(k\) of block \(i\) thus encodes the pairwise difference \(\beta_i - \beta_{i+k}\).
Set up \(D_{\rm team}\) analogously.
Total dimension of \(D\): \(3030195 \times 2769\) (Simulations: \(p = 50\) \(\rightarrow\) \(1275 \times 50\)!).
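The block construction can be checked in a few lines (Python sketch). Note the quoted row counts exceed the pure pairwise count \(p(p-1)/2\) by exactly one row per column (e.g. \(1275 = 1225 + 50\)), which suggests an identity block (a plain lasso row per coefficient) is stacked on top; only the pairwise part is built here:

```python
import numpy as np

def build_D_pairwise(n_p):
    """Stack the blocks (A_i | B_i | C_i), i = 1, ..., n_p - 1."""
    blocks = []
    for i in range(1, n_p):
        rows = n_p - i
        A = np.zeros((rows, i - 1))    # A_i: zero matrix
        B = np.ones((rows, 1))         # B_i: column of ones
        C = -np.eye(rows)              # C_i: negative identity
        blocks.append(np.hstack([A, B, C]))
    return np.vstack(blocks)

D = build_D_pairwise(50)
assert D.shape == (50 * 49 // 2, 50)           # all pairs i < j
assert np.allclose(D @ np.ones(50), 0)         # rows encode beta_i - beta_j
```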
Fast, flexible and robust implementation of the adaptive generalized logistic lasso (AGLL) model using conic programming methods.
Application to ranking in sports \(\Rightarrow\) high-dimensional and sparse data setup.
Work in progress and extensions:
Implementation in R via package ROI \(\Rightarrow\) flexible but overhead in computation (work in progress).
Generalization to other GLM-type models and non-linear effects is possible.
Thank you for your attention!
Original Problem:
\[ \begin{align*} \text{minimize}_y & \quad \exp(2y_1+y_2) \\ \text{s.t. } & \quad 5y_1+2 y_2 \le 4. \end{align*} \]
Epigraph form:
\[ \begin{align*} \text{minimize}_{y,t} & \quad t \\ \text{s.t. } & \quad \exp(2y_1+y_2) \le t \\ & \quad \quad 5y_1+2 y_2 \le 4. \end{align*} \]
Two convex cones necessary to write in conic form:
Exponential cone \(K_{\exp} = \{x \in \mathbb{R}^3 | x_1 \ge x_2 \exp(x_3/x_2), x_1 > 0, x_2 > 0\}\):
\[\exp(2y_1+y_2) \le t \Leftrightarrow (t,1,2y_1+y_2) \in K_{\rm exp}\]
Linear cone:
\[5y_1+2 y_2 \le 4 \Leftrightarrow 4-(5y_1+2 y_2) \in K_{\text{lin}}\]
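Both memberships can be verified at a sample point (Python sketch with arbitrary values):

```python
import numpy as np

def in_Kexp(x1, x2, x3):
    # (x1, x2, x3) in K_exp  <=>  x1 >= x2 * exp(x3 / x2), x2 > 0
    return x2 > 0 and x1 >= x2 * np.exp(x3 / x2)

y1, y2 = -1.0, 0.5
t = np.exp(2 * y1 + y2) + 0.1          # any t above exp(2*y1 + y2) works
assert in_Kexp(t, 1.0, 2 * y1 + y2)    # exponential-cone constraint holds
assert 4 - (5 * y1 + 2 * y2) >= 0      # linear-cone slack is nonnegative
```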
Likelihood terms: \[ \begin{align} &- y_i \log(\frac{1}{1+\exp(-\boldsymbol{\beta}^{\top} x_i)})-(1-y_i)\log(\frac{1}{1+\exp(\boldsymbol{\beta}^{\top} x_i)}) \le t_i \\ \Leftrightarrow & \begin{cases}\log(1+\exp(-\beta^\top x_i)) \le t_i &, y_i = 1\\ \log(1+\exp(\beta^\top x_i)) \le t_i &, y_i = 0\end{cases} \end{align} \]
Constraint \[t \ge \log(1+\exp(u)) \Leftrightarrow \exp(u-t) + \exp(-t) \le 1\]
Exponential function can be modeled via exponential cone: \[ \begin{align} z_1 \ge \exp(u-t) &\Leftrightarrow (z_1,1,u-t) \in K_{\rm exp}. \\ z_2 \ge \exp(-t) &\Leftrightarrow (z_2,1,-t) \in K_{\rm exp} \end{align} \]
Additional linear cone is necessary for \(z_1 + z_2 \le 1\).
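A numeric check of the equivalence \(t \ge \log(1+\exp(u)) \Leftrightarrow \exp(u-t)+\exp(-t) \le 1\), testing just above the boundary (where both sides hold) and just below it (where both fail):

```python
import numpy as np

for u in np.linspace(-5.0, 5.0, 21):
    t_star = np.logaddexp(0.0, u)      # boundary value: log(1 + exp(u))
    above, below = t_star + 1e-6, t_star - 1e-6
    # t slightly above the boundary: the cone-side inequality holds ...
    assert np.exp(u - above) + np.exp(-above) <= 1.0
    # ... and slightly below it fails, so the two conditions coincide.
    assert np.exp(u - below) + np.exp(-below) > 1.0
```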
3 steps necessary:
Advantage of ROI: flexible access to conic solvers for R users.
Weights: Masarotto and Varin (2012) suggest weights inversely proportional to a modified MLE with a small ridge penalty on the parameters.
\[w_{j}=\left|(D\tilde{\boldsymbol{\beta}}_{\epsilon})_j\right|^{-1}, \quad \tilde{\boldsymbol{\beta}}_{\epsilon}={\arg \min}_{\boldsymbol{\beta}} \left\{-\ell(\boldsymbol{\beta})+\epsilon \sum_{i}\beta_i^2\right\}.\]
Compute AGLL solution for a range of values of \(\lambda\).
Selection of \(\lambda\) through minimization of an AIC-type criterion (Tibshirani and Taylor 2011): \[{\rm AIC}(\lambda) = -2\ell(\boldsymbol{\beta})+2{\rm enp}(\lambda),\] \({\rm enp}(\lambda) \dots\) effective number of parameters, estimated as the number of distinct groups for \(\lambda\).
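A sketch of the selection rule (Python; `enp` counts distinct rounded coefficient values, and all numbers below are toy inputs):

```python
import numpy as np

def enp(beta):
    # effective number of parameters: number of distinct coefficient groups
    return len(np.unique(beta.round(6)))

def select_lambda(lambdas, betas, logliks):
    aic = [-2.0 * ll + 2.0 * enp(b) for b, ll in zip(betas, logliks)]
    return lambdas[int(np.argmin(aic))]

lambdas = [0.1, 1.0, 10.0]
betas = [np.array([1.0, 0.9, -0.5]),   # little shrinkage: 3 groups
         np.array([0.8, 0.8, -0.4]),   # two coefficients fused: 2 groups
         np.array([0.0, 0.0, 0.0])]    # everything fused to zero: 1 group
logliks = [-50.0, -50.5, -80.0]        # fit degrades as lambda grows
print(select_lambda(lambdas, betas, logliks))  # -> 1.0
```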
Classical CV procedure: