07/03/2026
Scalable approximation of the transformation-free linear simplicial-simplicial regression via constrained iterative reweighted least squares




Simplicial-simplicial regression concerns statistical modeling scenarios in which both the predictors and the responses are vectors constrained to lie on the simplex. Fiksel et al. (2022) introduced a transformation-free linear regression framework for this setting, wherein the regression coefficients are estimated by minimizing the Kullback-Leibler divergence between the observed and fitted compositions, using an expectation-maximization (EM) algorithm for optimization. In this work, we reformulate the problem as a constrained logistic regression model, in line with the methodological perspective of Tsagris (2025), and we obtain parameter estimates via constrained iteratively reweighted least squares. Simulation results indicate that the proposed procedure substantially improves computational efficiency, yielding speed gains ranging from 6× to 326×, while providing estimates that closely approximate those obtained from the EM-based approach.

Simplicial, or compositional, data are non-negative multivariate vectors whose components convey only relative information. When the vectors are scaled to sum to 1, their sample space is the standard simplex. Such data arise in a wide range of scientific disciplines, and the extensive literature on their proper statistical treatment attests to their prevalence in practical applications. In econometrics, these are commonly referred to as multivariate fractional data (Mullahy, 2015; Murteira and Ramalho, 2016). For numerous examples of real-world applications involving simplicial data, see Tsagris and Stewart (2020).
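As a minimal illustration of the scaling step described above, the following sketch maps a non-negative vector onto the standard simplex and shows that rescaling the raw vector leaves the composition unchanged, which is what "relative information" means in practice. The function name `closure` and the toy numbers are assumptions chosen for illustration; they do not come from the paper.

```python
def closure(v):
    """Scale a non-negative vector so that its components sum to 1."""
    if any(c < 0 for c in v):
        raise ValueError("components must be non-negative")
    total = sum(v)
    if total <= 0:
        raise ValueError("at least one component must be positive")
    return [c / total for c in v]


# Toy data (made up): the same relative information at two different scales.
raw = [2.0, 6.0, 12.0]
scaled = [10.0 * c for c in raw]

comp = closure(raw)            # a point on the standard simplex
comp_scaled = closure(scaled)  # the same point, up to rounding
```

Both `comp` and `comp_scaled` sum to 1 and agree component by component, so the overall scale of the raw measurements carries no information once the data are treated as compositions.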

The frequent appearance of simplicial variables in regression settings has motivated the development of appropriate regression methodologies, leading to several recent methodological advances. Most contributions in this field focus either on the case of a simplicial response with real-valued predictors (simplicial–real regression) or on the converse setting with simplicial predictors and real-valued responses (real–simplicial regression). By contrast, the simplicial–simplicial regression setting, in which both the response and predictor variables are simplicial, has received comparatively limited attention. This constitutes the main focus of the present work.

Applications involving simplicial responses and simplicial predictors include, among others, Wang et al. (2013), who modeled the relationship between economic outputs and inputs in China, and Di Marzio et al. (2015), who predicted the composition of the moss layer using the composition of the O-horizon layer. Chen et al. (2017) investigated the association between age structure and consumption structure across economic regions. Filzmoser et al. (2018) studied differences in educational composition across countries. Aitchison (2003) and Alenazi (2019) examined relationships between alternative methods of estimating white blood cell type compositions. Chen et al. (2021) explored associations between chemical metabolites of Astragali Radix and plasma metabolites in rats following administration. Tsagris (2025) analyzed relationships between crop production and cultivated area in Greece, as well as voting proportion dynamics in Spain. Finally, Rios et al. (2025) investigated associations between high-dimensional multiomics simplicial datasets.

Most existing approaches to simplicial–simplicial regression apply transformations to both simplicial variables. For example, Hron et al. (2012), Wang et al. (2013), Chen et al. (2017), and Han and Yu (2022) employed log-ratio transformations for both predictors and responses followed by multivariate linear regression. Alenazi (2019) transformed simplicial predictors via the α-transformation (Tsagris et al., 2011), applied principal component analysis, and subsequently fitted a Kullback–Leibler divergence (KLD) regression (multinomial logit) model (Murteira and Ramalho, 2016). In contrast, Fiksel et al. (2022) proposed a transformation-free linear regression (TFLR) framework, in which the regression coefficients lie on the simplex and are estimated by minimizing the KLD between observed and fitted simplicial responses using an Expectation–Maximization (EM) algorithm. 
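To make the TFLR idea concrete, the sketch below assumes the fitted composition for observation i is the product x_i B, where each row of the coefficient matrix B lies on the simplex, so that every fitted value is a convex combination of the rows of B. It also evaluates the KLD objective and applies a generic EM-style multiplicative update obtained by treating the model as a mixture. This update is an illustrative assumption and is not claimed to be the exact implementation of Fiksel et al. (2022) or Tsagris (2025). The helper names (`fitted`, `kld`, `em_step`) and the toy data are hypothetical.

```python
import math


def fitted(X, B):
    """Fitted compositions: row i is x_i B, a convex combination of the rows of B."""
    p, d = len(B), len(B[0])
    return [[sum(x[j] * B[j][k] for j in range(p)) for k in range(d)] for x in X]


def kld(Y, M):
    """Sum over observations of KL(y_i || m_i); terms with y = 0 contribute 0."""
    return sum(
        y * math.log(y / m)
        for y_row, m_row in zip(Y, M)
        for y, m in zip(y_row, m_row)
        if y > 0
    )


def em_step(X, Y, B):
    """One multiplicative EM-style update (assumed form); rows of B stay on the simplex.

    Assumes B is strictly positive and every predictor component is positive
    for at least one observation, so that no division by zero occurs.
    """
    M = fitted(X, B)
    n, p, d = len(X), len(B), len(B[0])
    new_B = []
    for j in range(p):
        row = [
            B[j][k] * sum(X[i][j] * Y[i][k] / M[i][k] for i in range(n))
            for k in range(d)
        ]
        total = sum(row)
        new_B.append([r / total for r in row])
    return new_B


# Toy data (made up): 3 observations, 2 predictor parts, 3 response parts.
X = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
B_true = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
Y = fitted(X, B_true)

B = [[1.0 / 3.0] * 3 for _ in range(2)]  # uniform starting value
history = [kld(Y, fitted(X, B))]
for _ in range(200):
    B = em_step(X, Y, B)
    history.append(kld(Y, fitted(X, B)))
```

The list `history` records the objective after each update. Under the stated assumptions the values should not increase from one iteration to the next, and the number of iterations needed to stabilise gives a rough feel for why an EM scheme of this kind can be slow.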

A limitation of the EM-based estimation procedure for the TFLR model is its computational burden, as the EM algorithm is generally slow to converge. To alleviate this issue, we employ a constrained iteratively reweighted least squares (CIRLS) algorithm, incorporating simplex constraints on the regression coefficients. The implementation of the EM algorithm in Tsagris (2025) was shown to be approximately four times faster than that of Fiksel et al. (2022). Our simulation studies demonstrate that CIRLS achieves additional substantial computational gains, yielding speed improvements ranging from 2× to 255×, depending on the scenario, while producing estimates that closely approximate those obtained via the EM algorithm. These computational benefits are particularly relevant in contexts involving: (a) analysis of multiple datasets, (b) large-scale simulation studies, (c) high-dimensional compositions (Rios et al., 2025), and (d) permutation- or bootstrap-based inference procedures (Fiksel et al., 2022; Tsagris, 2025).
