Abstract
R-squared (R²) and adjusted R-squared (R²adj) are sometimes viewed as statistics detached from any target parameter, and sometimes as estimators of the population multiple correlation. The latter interpretation is meaningful only if the explanatory variables are random. This article proposes an alternative perspective for the case where the x’s are fixed. A new parameter is defined in a fashion similar to the construction of R², but it relies on the true parameters rather than their estimates. (The parameter definition also includes the fixed x values.) This parameter is referred to as the “parametric” coefficient of determination and is denoted by ρ²*. The proposed ρ²* remains stable when irrelevant variables are removed (or added). The unadjusted R², in contrast, never decreases when variables, whether relevant or not, are added to the model (and never increases when they are removed). The value of the traditional R²adj may go up or down with added (or removed) variables, whether relevant or not. It is shown that the unadjusted R² overestimates ρ²*, while the traditional R²adj underestimates it. It is also shown that for simple linear regression the magnitude of the bias of R²adj can be as high as the bias of the unadjusted R² (while their signs are opposite). Asymptotic convergence in probability of R²adj to ρ²* is demonstrated. The effects of model parameters on the bias of R² and R²adj are characterized analytically and numerically. An alternative bi-adjusted estimator is presented and evaluated.
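The quantities discussed above can be sketched in Python using only the standard library. The R² and adjusted R² formulas are the standard ones for OLS with an intercept. `parametric_rho2` is a hypothetical helper implementing one plausible reading of the verbal definition of ρ²*: the variance of the true mean function over the fixed x values, divided by that variance plus the error variance σ². That form, and the choice of divisor n, are assumptions made here, not taken from the paper. The `simulate` helper and its settings (n = 20, a second fixed predictor whose true slope is zero, σ = 0.5) are likewise illustrative assumptions.

```python
import random
import statistics


def _center(v):
    """Return v minus its mean."""
    m = statistics.fmean(v)
    return [a - m for a in v]


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (m[r][k] - s) / m[r][r]
    return x


def r_squared(xs, y):
    """Unadjusted R^2 of an OLS fit with intercept; xs is a list of predictor columns."""
    cx = [_center(col) for col in xs]
    cy = _center(y)
    sxx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cx] for ci in cx]
    sxy = [sum(a * b for a, b in zip(ci, cy)) for ci in cx]
    beta = _solve(sxx, sxy)  # slope estimates from the centered normal equations
    sst = sum(v * v for v in cy)
    ssr = sum(b * s for b, s in zip(beta, sxy))  # regression sum of squares
    return ssr / sst


def adjusted_r2(r2, n, p):
    """Traditional adjusted R^2 for n observations and p predictors (plus intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def parametric_rho2(xs, beta, sigma2):
    """ASSUMED form of rho^2*: variance of the true mean over the fixed x's
    (divisor n), divided by that variance plus sigma^2. Not the paper's formula."""
    n = len(xs[0])
    mu = [sum(b * col[i] for b, col in zip(beta, xs)) for i in range(n)]
    theta = statistics.pvariance(mu)
    return theta / (theta + sigma2)


def simulate(n=20, reps=2000, seed=1):
    """Hypothetical experiment: fixed design, repeated draws of y, average R^2 and R^2adj."""
    rng = random.Random(seed)
    x1 = [i / (n - 1) for i in range(n)]       # relevant fixed predictor
    x2 = [rng.gauss(0, 1) for _ in range(n)]   # irrelevant fixed predictor (true slope 0)
    beta, sigma = [1.0, 0.0], 0.5
    rho2 = parametric_rho2([x1, x2], beta, sigma ** 2)
    r2s, adjs = [], []
    for _ in range(reps):
        y = [2.0 + beta[0] * a + beta[1] * b + rng.gauss(0, sigma)
             for a, b in zip(x1, x2)]
        r2 = r_squared([x1, x2], y)
        r2s.append(r2)
        adjs.append(adjusted_r2(r2, n, 2))
    return rho2, statistics.fmean(r2s), statistics.fmean(adjs)


if __name__ == "__main__":
    print(simulate())
```

Because the irrelevant predictor has a true coefficient of zero, the assumed ρ²* is identical whether or not that predictor is included, which mirrors the stability property described in the abstract. By contrast, R² from any single fit cannot fall when the predictor is added.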
Original language | English |
---|---|
Pages (from-to) | 112-119 |
Number of pages | 8 |
Journal | American Statistician |
Volume | 71 |
Issue number | 2 |
DOIs | |
State | Published - 3 Apr 2017 |
Keywords
- Adjusted R-squared
- Linear regression
- Multiple correlation shrinkage
ASJC Scopus subject areas
- Statistics and Probability
- General Mathematics
- Statistics, Probability and Uncertainty