In Eval(Family$initialize) : Non-integer #successes in a Binomial Glm!
Like @HoongOoi said, glm.fit with binomial family expects integer counts and throws a warning otherwise; if you want non-integer counts, utilize quasi-binomial. The rest of my answer compares these.
Quasi-binomial in R for glm.fit is exactly the same as binomial for the coefficient estimates (as mentioned in comments past @HongOoi) merely not for the standard errors (every bit mentioned in the comment by @nograpes).
Comparison of source code
A diff on the source code of stats::binomial and stats::quasibinomial shows the following changes:
- the text "binomial" becomes "quasibinomial"
- the aic function returns NA instead of the calculated AIC
and the following removals:
- setting outcomes to 0 when weights = 0
- check on integrality of weights
-
simfunfunction to simulate data
Only simfun could brand a difference, merely the source code of glm.fit shows no use of that function, different other fields in the object returned by stats::binomial such as mu.eta and link.
Minimal working example
The results from using quasibinomial or binomial are the same for the coefficients in this minimal working case:
library('MASS') library('stats') gen_data <- function(n=100, p=3) { prepare.seed(1) weights <- stats::rgamma(n=north, shape=rep(1, n), rate=rep(1, northward)) y <- stats::rbinom(n=n, size=i, prob=0.v) theta <- stats::rnorm(n=p, mean=0, sd=one) means <- colMeans(as.matrix(y) %*% theta) x <- MASS::mvrnorm(n=due north, ways, diag(ane, p, p)) return(list(10=ten, y=y, weights=weights, theta=theta)) } fit_glm <- function(family) { information <- gen_data() fit <- stats::glm.fit(x = information$x, y = data$y, weights = data$weights, family = family) return(fit) } fit1 <- fit_glm(family=stats::binomial(link = "logit")) fit2 <- fit_glm(family=stats::quasibinomial(link = "logit")) all(fit1$coefficients == fit2$coefficients) Comparing with the quasibinomial probability distribution
This thread suggests that the quasibinomial distribution is different from the binomial distribution with an additional parameter phi. Simply they hateful different things in statistics and in R.
First, no identify in the source code of quasibinomial mentions that boosted phi parameter.
2nd, a quasiprobability is similar to a probability, but non a proper one. In this case, one cannot compute the term (northward \cull k) when the numbers are non-integers, although 1 could with the Gamma function. This may be a problem for the definition of the probability distribution but is irrelevant for estimation, as the term (north choose k) practise not depend on the parameter and fall out of optimisation.
The log-likelihood estimator is:
The log-likelihood every bit a function of theta with the binomial family is:
where the constant is independent of the parameter theta, and then information technology falls out of optimisation.
Comparison of standard errors
The standard errors calculated past stats::summary.glm use a different dispersion value for the binomial and quasibinomial families, as mentioned in stats::summary.glm:
The dispersion of a GLM is not used in the fitting process, but it is needed to find standard errors. If
dispersionis not supplied orNULL, the dispersion is taken asonefor thebinomialandPoissonfamilies, and otherwise estimated past the rest Chisquared statistic (calculated from cases with not-zero weights) divided by the residual degrees of liberty....
cov.unscaled: the unscaled (dispersion = 1) estimated covariance matrix of the estimated coefficients.
cov.scaled: ditto, scaled bydispersion.
Using the the above minimal working example:
summary1 <- stats::summary.glm(fit1) summary2 <- stats::summary.glm(fit2) print("Equality of unscaled variance-covariance-matrix:") all(summary1$cov.unscaled == summary2$cov.unscaled) impress("Equality of variance-covariance matrix scaled by `dispersion`:") all(summary1$cov.scaled == summary2$cov.scaled) impress(summary1$coefficients) impress(summary2$coefficients) shows the same coefficients, same unscaled variance-covariance matrix, and dissimilar scaled variance-covariance matrices:
[1] "Equality of unscaled variance-covariance-matrix:" [one] TRUE [ane] "Equality of variance-covariance matrix scaled past `dispersion`:" [one] Imitation Estimate Std. Error z value Pr(>|z|) [1,] -0.3726848 0.1959110 -1.902317 0.05712978 [2,] 0.5887384 0.2721666 two.163155 0.03052930 [3,] 0.3161643 0.2352180 1.344133 0.17890528 Estimate Std. Error t value Pr(>|t|) [1,] -0.3726848 0.1886017 -1.976042 0.05099072 [2,] 0.5887384 0.2620122 two.246988 0.02690735 [3,] 0.3161643 0.2264421 one.396226 0.16583365 spencerthallusithe.blogspot.com
Source: https://stackoverflow.com/questions/12953045/warning-non-integer-successes-in-a-binomial-glm-survey-packages
0 Response to "In Eval(Family$initialize) : Non-integer #successes in a Binomial Glm!"
Post a Comment