When classifying data with logistic classification, what is the upper bound of the likelihood in the maximum likelihood method? Is this value attainable?

When classifying data with logistic regression using the maximum likelihood method, the likelihood function represents the probability of observing the given data under the model parameters. The key points regarding the upper bound of this likelihood and its attainability are as follows:

Upper Bound of the Likelihood in Logistic Classification

  • The likelihood function for logistic regression is the product of the probabilities the model assigns to the observed class labels given the input features and parameters. Maximizing this likelihood is equivalent to maximizing the log-likelihood, which is computationally more convenient.
  • The maximum possible value of the log-likelihood for any model is attained by the saturated model, which fits the data perfectly. The saturated model sets the predicted probability for each data point equal to the observed proportion of its class labels, achieving the highest possible likelihood.
  • More concretely, for each data point $i$ with observed label $y_i$ and number of trials $n_i$, the log-likelihood contribution is maximized when the predicted probability $\pi(x_i)$ equals the empirical fraction $y_i / n_i$. Hence, the upper bound of the log-likelihood is:

$$\sum_i \left[\, y_i \log \frac{y_i}{n_i} + (n_i - y_i) \log\!\left(1 - \frac{y_i}{n_i}\right) \right]$$

This represents the maximum achievable log-likelihood for the data. In the common binary case, where every $n_i = 1$ and $y_i \in \{0, 1\}$, each term vanishes under the convention $0 \log 0 = 0$, so the log-likelihood is bounded above by 0 and the likelihood itself is bounded above by 1.
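To make this concrete, here is a minimal sketch (the toy data and all variable names are hypothetical) using numpy and scikit-learn: it fits a logistic model to non-separable binary data and compares the fitted log-likelihood against the saturated bound of 0.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, noisy labels, not linearly separable.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = (X[:, 0] + rng.normal(scale=1.0, size=100) > 0).astype(int)

# Note: scikit-learn applies L2 regularization by default (C=1.0), so this
# is a penalized MLE; the comparison with the bound still holds.
model = LogisticRegression().fit(X, y)
p = model.predict_proba(X)[:, 1]

# Bernoulli log-likelihood of the fitted model (n_i = 1 for every point).
log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Saturated bound for binary labels: with y_i in {0, 1} and n_i = 1, each
# term y_i*log(y_i) + (1 - y_i)*log(1 - y_i) is 0 (taking 0*log 0 = 0),
# so the log-likelihood is at most 0, i.e. the likelihood is at most 1.
print(f"fitted log-likelihood: {log_lik:.3f}")  # strictly below 0 here
print("saturated upper bound: 0.000")
```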

Is This Upper Bound Attainable in Logistic Regression?

  • In practice, the logistic regression model is parametric and constrained by its functional form (a sigmoid applied to a linear function of the inputs). It cannot in general match the empirical probabilities at every data point; for binary labels, a perfect match is possible only in the limit when the data are perfectly separable.
  • For linearly separable data, the maximum likelihood estimate (MLE) does not exist at any finite parameter value. The likelihood increases monotonically as the weight vector $w$ is scaled toward infinity, pushing the predicted probabilities toward 0 or 1; it approaches the upper bound asymptotically but is never attained at finite parameters (see the sketch after this list).
  • For non-separable data, the MLE is attained at a finite parameter vector, but the maximized likelihood is strictly less than the saturated model's upper bound, because the model cannot fit every point perfectly.
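The behavior on separable data can be demonstrated directly. In the hypothetical one-dimensional sketch below, scaling a separating weight $w$ upward drives every predicted probability toward 0 or 1, and the log-likelihood climbs monotonically toward its supremum of 0 without ever reaching it at finite $w$:

```python
import numpy as np

# Hypothetical linearly separable toy data in 1D: class 1 iff x > 0.
X = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def log_likelihood(w):
    """Numerically stable Bernoulli log-likelihood for p = sigmoid(w * x)."""
    z = w * X
    # log p = -log(1 + exp(-z)),  log(1 - p) = -log(1 + exp(z))
    return np.sum(-y * np.logaddexp(0.0, -z) - (1 - y) * np.logaddexp(0.0, z))

# Scaling the separating direction sharpens the predicted probabilities;
# the log-likelihood approaches 0 but never reaches it at any finite w.
for w in [1, 10, 100, 1000]:
    print(f"w = {w:5d}   log-likelihood = {log_likelihood(w):.6f}")
```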

Summary

| Aspect | Description |
|---|---|
| Upper bound of likelihood | Achieved by the saturated model, which matches predicted probabilities to empirical frequencies. |
| Attainability for separable data | Not attainable at finite parameters; the likelihood approaches the bound only as the weights go to infinity. |
| Attainability for non-separable data | The MLE is attained at finite parameters, but its likelihood stays strictly below the saturated bound. |

Therefore, the upper bound of the likelihood in logistic classification is the likelihood of the saturated model. For linearly separable data this bound is not attainable at any finite parameter value (it is only approached asymptotically as the weights diverge), while for non-separable data the MLE exists at finite parameters but its likelihood remains strictly below the saturated bound.
