In linear regression, R-squared (R²) is a statistical measure that indicates how much of the variation in the dependent variable is explained by the independent variable(s) in a regression model. It is also known as the coefficient of determination. R-squared evaluates the scatter of the data points around the fitted regression line and can take any value between 0 and 1. A higher R-squared value indicates that more of the variability is explained by the model.
To calculate R-squared, two quantities are used: the sum of squared residuals and the total sum of squares. The sum of squared residuals is the sum of the squared differences between each observed value and its predicted value, and the total sum of squares is the sum of the squared differences between each observed value and the mean of the observed values. The formula for R-squared is:
$$R^2 = 1 - \frac{\text{sum of squared residuals (SSR)}}{\text{total sum of squares (SST)}}$$
or equivalently
$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$
where $y_i$ are the observed values, $\hat{y}_i$ are the predicted values, and $\bar{y}$ is the mean of the observed values. SSR is the sum of the squared residuals, and SST is the total sum of squares. Expressed as a percentage, R-squared always lies between 0% and 100%.
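As a minimal sketch, the formula above can be computed directly in Python; the observed and predicted values below are made-up illustrative numbers, not data from this article:

```python
import numpy as np

# Illustrative data: observed values and a model's predictions (made-up numbers).
y = np.array([3.0, 5.0, 7.5, 9.0, 11.0])       # observed y_i
y_hat = np.array([2.8, 5.2, 7.0, 9.3, 11.2])   # predicted y-hat_i

ssr = np.sum((y - y_hat) ** 2)      # sum of squared residuals (SSR)
sst = np.sum((y - y.mean()) ** 2)   # total sum of squares (SST)

r_squared = 1 - ssr / sst
print(f"R-squared = {r_squared:.4f}")  # roughly 0.99 for this toy data
```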
The most common interpretation of R-squared is as a measure of how well the regression model explains the observed data. For example, an R-squared of 60% indicates that 60% of the variability observed in the target variable is explained by the regression model. However, a high R-squared is not by itself evidence of a good regression model. The quality of the statistical measure depends on many factors, such as the nature of the variables employed in the model, the units of measurement of the variables, and any data transformations applied.
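For instance, assuming scikit-learn is available, the `score()` method of a fitted `LinearRegression` model returns this same coefficient of determination; the data below is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic example: y is roughly linear in x plus noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0 + rng.normal(scale=2.0, size=50)

model = LinearRegression().fit(x, y)

# score() returns the coefficient of determination (R-squared) on the given data.
r2 = model.score(x, y)
print(f"R-squared = {r2:.2f}")
# An R-squared of 0.60 would mean the model explains about 60% of the
# variability observed in y.
```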