31/03/2024
Constrained Least Squares Simplicial-Simplicial Regression

Simplicial-simplicial regression refers to the regression setting where both the response and the predictor variables lie within the simplex space, i.e. they are compositional. For this setting, constrained least squares, where the regression coefficients themselves lie within the simplex, is proposed. The model is transformation-free, but the adoption of a power transformation is straightforward; it can treat more than one compositional dataset as predictors and offers the possibility of weights among the simplicial predictors. Among the model’s advantages are its ability to treat zeros in a natural way and a highly computationally efficient algorithm to estimate its coefficients. Resampling-based hypothesis testing procedures are employed for inference, such as testing linear independence and the equality of the regression coefficients to some pre-specified values. The performance of the proposed technique and its comparison to an existing methodology that is of the same spirit takes place u

Topics: Statistics , Theory
Authors: Tsagris Michail

Compositional data are non-negative multivariate vectors whose variables (typically called components) convey only relative information. When the vectors are scaled to sum to 1, their sample space is the standard simplex. Examples of such data may be found in many different fields of study, and the extensive scientific literature on the proper analysis of this type of data is indicative of its prevalence in real-life applications. The widespread occurrence of this type of data in numerous scientific fields that involve predictors has created the need for valid regression models, which in turn has led to several developments in this area, many of them recent. Most of these regression models restrict attention to the case of a simplicial response (simplicial-real regression setting) or a simplicial predictor (real-simplicial regression setting). The case of simplicial-simplicial regression, where both sides of the equation contain compositional data, has not received much attention, and it is the main focus of this paper.
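The scaling step mentioned above can be illustrated with a minimal sketch. The function name `closure` is a hypothetical label chosen here, not something taken from the paper; the example only shows that two vectors carrying the same relative information map to the same point of the simplex.

```python
def closure(v):
    """Scale a non-negative vector so that its components sum to 1."""
    total = sum(v)
    if total <= 0 or any(x < 0 for x in v):
        raise ValueError("components must be non-negative with a positive total")
    return [x / total for x in v]

# Two vectors that differ only in overall scale...
a = closure([2.0, 6.0, 12.0])
b = closure([1.0, 3.0, 6.0])

# ...become the same composition, since only the ratios matter.
print(a, b)
```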

Most published papers on simplicial-simplicial regression involve transformations of both simplicial sides. Hron et al. (2012), Wang et al. (2013), Chen et al. (2017) and Han and Yu (2022) used a logratio transformation for both the response and predictor variables and fitted a multivariate linear regression model. Alenazi (2019) transformed the simplicial predictor using the α-transformation (Tsagris et al., 2011) followed by principal component analysis and then employed the Kullback-Leibler divergence regression (or multinomial logit) model (Murteira and Ramalho, 2016). The exception is Fiksel et al. (2022), who proposed a transformation-free linear regression (TFLR) model whose coefficients lie within the simplex and are estimated via minimization of the Kullback-Leibler divergence (KLD) between the observed and the fitted simplicial responses.
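For readers unfamiliar with log-ratio transformations, the sketch below shows the additive log-ratio (alr), one common member of that family. It is offered purely as an illustration: the cited papers may rely on other variants, and the function names are assumptions made here. The sketch also makes explicit that the transformation is undefined as soon as one component equals zero.

```python
import math

def alr(x):
    """Additive log-ratio: log of each component relative to the last one."""
    if any(v <= 0 for v in x):
        raise ValueError("log-ratios are undefined when a component is zero")
    return [math.log(v / x[-1]) for v in x[:-1]]

def alr_inverse(z):
    """Map real-valued alr coordinates back to a composition."""
    expz = [math.exp(v) for v in z] + [1.0]
    total = sum(expz)
    return [v / total for v in expz]

z = alr([0.2, 0.3, 0.5])      # unconstrained real coordinates
back = alr_inverse(z)         # recovers the original composition
print(z, back)

try:
    alr([0.0, 0.4, 0.6])      # a zero component cannot be transformed
except ValueError as err:
    print("failed:", err)
```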

An important issue in compositional data analysis is the presence of zeros, which prohibits the use of logarithmic transformations and hence the approaches of Hron et al. (2012), Wang et al. (2013) and Chen et al. (2017); most papers do not address this issue. The classical strategy is to replace the zero values by a small quantity (Aitchison, 2003). However, the approach of Alenazi (2019) handles the zero cases in a natural manner. This is not true in general for the TFLR model though. Tsagris et al. (2011) categorized the compositional data analysis approaches into two main categories, the raw data approach and the log-ratio approach. A perhaps better classification would be into the raw data and the transformation-based approaches. Moving along the lines of the raw data approach, the paper proposes the use of the same transformation-free linear regression model as in Fiksel et al. (2022), when both the response and the predictor variables are simplicial. However, the adoption of a power transformation in the simplicial response generalizes the model. The regression coefficients are estimated via simplicial constrained least squares (SCLS): as the name implies, least squares is the loss function, and the regression coefficients are constrained to lie on the simplex. This in turn implies that the expected value of the simplicial response can be expressed as a Markov transition from the simplicial predictor. The proposed SCLS model allows for more than one simplicial predictor, allows weights to be assigned to the simplicial predictors, and treats zero values naturally in both the simplicial response and the predictor variables. The assumption of linear independence between the simplicial variables, and hypotheses regarding the matrix of regression coefficients, can be tested using resampling techniques.
Evidently, the SCLS is similar in spirit to the TFLR, but they have different loss (or objective) functions. The TFLR model employs the Expectation-Maximization (EM) algorithm, whereas the SCLS model is based on quadratic programming and thus enjoys a very low computational cost.
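The constrained least squares idea can be sketched in a few lines under stated assumptions. It is assumed here that "coefficients on the simplex" means each row of the coefficient matrix B is non-negative and sums to 1, so that the fitted values X B are themselves compositions. The paper relies on quadratic programming; the sketch below swaps in plain projected gradient descent, a different and slower solver for the same type of problem, only because it needs nothing beyond the standard library. All function names are hypothetical.

```python
import random

def closure(v):
    total = sum(v)
    return [x / total for x in v]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def project_to_simplex(v):
    """Euclidean projection of a vector onto the standard simplex."""
    cumsum, theta = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumsum += u_k
        t = (cumsum - 1.0) / k
        if u_k - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]

def scls_fit(X, Y, n_iter=2000):
    """Minimise ||Y - X B||^2 over row-stochastic B by projected gradient."""
    p, D = len(X[0]), len(Y[0])
    Xt = [list(col) for col in zip(*X)]
    XtX, XtY = matmul(Xt, X), matmul(Xt, Y)
    # 1 / trace(X'X) never exceeds 1 / (largest eigenvalue), so the step is safe.
    step = 1.0 / sum(XtX[j][j] for j in range(p))
    B = [[1.0 / D] * D for _ in range(p)]      # start from uniform rows
    for _ in range(n_iter):
        G = matmul(XtX, B)                     # gradient is X'X B - X'Y
        B = [project_to_simplex([B[j][k] - step * (G[j][k] - XtY[j][k])
                                 for k in range(D)])
             for j in range(p)]
    return B

# Simulated compositional predictors, including one observation with a zero part.
random.seed(1)
X = [closure([random.expovariate(1.0) for _ in range(3)]) for _ in range(100)]
X[0] = [0.0, 0.4, 0.6]
B_true = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.25, 0.25, 0.5]]
Y = matmul(X, B_true)                          # noise-free simplicial responses

B_hat = scls_fit(X, Y)
fitted = matmul(X, B_hat)                      # each fitted row is a composition
for row in B_hat:
    print([round(b, 4) for b in row])
```

Because each row of B sums to 1, every fitted row is a mixture of the rows of B with the predictor composition as mixing weights, which is the Markov transition reading mentioned in the text. No logarithm is taken anywhere, so the zero in the first observation needs no special treatment.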

The problem of constrained least squares (CLS) with a univariate real response is not new. Liew (1976) and Wets (1991) studied the asymptotic properties of constrained regression and established the consistency of the regression coefficients, assuming the linear specification is correct. Wets (1991) specifically formalized the asymptotic properties of the regression coefficients for the case of M-estimators, of which least squares is a special case. More recently, James et al. (2019) proposed the constrained LASSO, a penalized version of the CLS. The current work differs from these in that it deals with the case of a constrained multivariate response.
