5 Other families
5.1 Data set : Cell discolouration
Suppose that another methodology to evaluate the virulence would be to grow the cells in a larger plate and score cell death via imaging and proportion of the plate surface covered with discoloured cells. The data are shown in the graph below as the black dots. The red dots are simple averages.
to download the data: here
The data are proportions. Proportions (constrained on both sides of its range) can be treated via a beta error distribution.
There is a caveat for using this distribution: although the data are constrained by two limits (in this case 0 and 1) the data should not contain those limits. In the case of this computer vision that may not be an issue, but visual assessments can have total absence or complete discolouration. If so a transformation needs to be done that squeezes all data inward. See how you can do this in the chunk below.It is demonstrated on a data set of 100 proportions with 4 of them either 0 or 1. We show only the first 10 data points. The values around 0.5 do not change but the more extreme are shrunken toward 0.5, with the largest shift for the zeros and ones.
<- function (y) {
transformforbeta
if(all(y > 0 & y < 1 )) {
return(y)
else {
} <- length(y)
n return((y*(n-1) + 0.5)/n)
}
}
<- c(0,0,1,1,runif(96))
x round(x[1:10],3)
[1] 0.000 0.000 1.000 1.000 0.963 0.091 0.633 0.799 0.996 0.648
round(transformforbeta(x)[1:10],3)
[1] 0.005 0.005 0.995 0.995 0.959 0.095 0.632 0.796 0.991 0.646
Because the virulence data set does not contains zero or ones, the line of code below will have no impact.
<- ds.ip %>% mutate(discoloured=transformforbeta(discoloured)) ds.ip
The beta distribution has two parameters, but the parametrisation of beta distributions come in two forms. The two parameter are either two shape parameters (as used for instance by the rbeta() or dbeta() function in the R stats base package) or a mean (mu) and a second parameter called phi that lets the spread to vary. This phi has the behavior of a precision, i.e. the inverse of a variance. So the larger, the narrower the distribution. See the appendix for some examples. This mu-phi parametrisation is used by brms and inla, and hence we will use it here.
This parametrisation also brings about the similarity with fitting models based on a normal error distribution, which has also a mean and a variance. When fitting an anova we assume there is one common variance, and focus on the means. Only when the assumption of homoscedasticity (equal variance) cannot be supported, heteroscedastic models where both the means and variance can vary, albeit not necessarily along the same factors. For instance, in data sets with increasing variance for higher values, the mean can be estimate for the different treatments, whereas the variance is modelled according to the fitted values.
A similar reasoning is followed here for the beta regression. The standard solution could be separate mu’s and one common phi, but in some cases we can opt for varying phi albeit with a different model and in this situation even a different link family.
5.1.1 Beta regression with brm
Below the code for three beta regressions with brms. Model m6.brm is the base model for which phi is considered as common over all treatments (the default). Model m7.brm and model m8.brm are two alternative models in which phi is modelled either by treatment or by the random effect of week.
<- brm(data=ds.ip, discoloured ~ treatment + (1|week),
m6.brm family = Beta(link="logit",link_phi = "log"),
chains = 4,cores = 4, iter=3000, warmup = 1000)
Compiling Stan program...
Start sampling
Warning: There were 6 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
summary(m6.brm)
Warning: There were 6 divergent transitions after
warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Family: beta
Links: mu = logit; phi = identity
Formula: discoloured ~ treatment + (1 | week)
Data: ds.ip (Number of observations: 120)
Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
total post-warmup draws = 8000
Group-Level Effects:
~week (Number of levels: 5)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.15 0.14 0.01 0.52 1.00 2566 3511
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.52 0.22 -0.95 -0.11 1.00 2892 4017
treatmentT1 1.21 0.29 0.64 1.77 1.00 4315 4530
treatmentT2 1.42 0.28 0.87 2.00 1.00 3710 5287
treatmentT3 1.32 0.28 0.77 1.88 1.00 3769 5241
treatmentT4 0.78 0.28 0.23 1.35 1.00 3684 5012
treatmentT5 1.35 0.29 0.77 1.91 1.00 4269 5308
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi 4.21 0.50 3.29 5.24 1.00 5668 5141
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The output is very similar as what we had before. There is now extra output to show the estimate of phi (in this model a single unique estimate). In order to have interpretable numbers for the fixed effects from this fit we still need to backtransform (the inverse logit), exactly the same way as was shown for the binomial model earlier.
The phi parameter is already on the scale of the measurement. Hence no need for back transformation.
The output of the two alternative models looks a bit different and depends how the phi was modelled.
<- brm(data=ds.ip,
m7.brm bf(discoloured ~ treatment + (1|week),
~ treatment),
phi family = Beta(link="logit",link_phi = "log"),
chains = 4,cores = 4, iter=3000, warmup = 1000)
Compiling Stan program...
Start sampling
Warning: There were 18 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
summary(m7.brm)
Warning: There were 18 divergent transitions after
warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Family: beta
Links: mu = logit; phi = log
Formula: discoloured ~ treatment + (1 | week)
phi ~ treatment
Data: ds.ip (Number of observations: 120)
Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
total post-warmup draws = 8000
Group-Level Effects:
~week (Number of levels: 5)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.20 0.19 0.01 0.70 1.00 1679 848
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.49 0.24 -0.96 -0.02 1.00 2008 1783
phi_Intercept 1.23 0.29 0.63 1.79 1.00 2888 3575
treatmentT1 1.12 0.32 0.50 1.73 1.00 3702 5253
treatmentT2 1.31 0.32 0.68 1.93 1.00 3345 4845
treatmentT3 1.33 0.29 0.75 1.89 1.00 2900 4425
treatmentT4 0.74 0.30 0.15 1.33 1.00 3413 4960
treatmentT5 1.40 0.28 0.84 1.94 1.00 3036 4707
phi_treatmentT1 -0.08 0.41 -0.89 0.74 1.00 3741 4214
phi_treatmentT2 -0.07 0.42 -0.89 0.76 1.00 3561 4500
phi_treatmentT3 0.55 0.43 -0.30 1.41 1.00 3570 4215
phi_treatmentT4 0.20 0.41 -0.61 1.01 1.00 3867 4931
phi_treatmentT5 0.72 0.44 -0.13 1.57 1.00 3560 5010
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
<- brm(data=ds.ip,
m8.brm bf(discoloured ~ treatment + (1|week),
~ (1|week)),
phi family = Beta(link="logit",link_phi = "log"),
chains = 4,cores = 4, iter=3000, warmup = 1000)
Compiling Stan program...
Start sampling
Warning: There were 12 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
summary(m8.brm)
Warning: There were 12 divergent transitions after
warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Family: beta
Links: mu = logit; phi = log
Formula: discoloured ~ treatment + (1 | week)
phi ~ (1 | week)
Data: ds.ip (Number of observations: 120)
Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
total post-warmup draws = 8000
Group-Level Effects:
~week (Number of levels: 5)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.18 0.18 0.01 0.63 1.00 1431 2476
sd(phi_Intercept) 0.34 0.30 0.01 1.15 1.00 1449 807
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.57 0.23 -1.03 -0.13 1.00 2228 2821
phi_Intercept 1.46 0.23 1.03 1.98 1.01 1372 633
treatmentT1 1.30 0.30 0.72 1.92 1.00 3023 2516
treatmentT2 1.51 0.30 0.92 2.10 1.00 2763 3024
treatmentT3 1.40 0.30 0.82 1.99 1.00 3414 3920
treatmentT4 0.82 0.29 0.27 1.40 1.00 3206 2703
treatmentT5 1.43 0.30 0.85 2.01 1.00 2928 2893
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
When comparing the 3 beta models via the leave-one-out crossvalidaton, it turns out that there is no additional value to model phi in a more complex manner.
loo(m6.brm,m7.brm,m8.brm)
Warning: Found 1 observations with a pareto_k > 0.7 in model 'm8.brm'. It is
recommended to set 'moment_match = TRUE' in order to perform moment matching for
problematic observations.
Output of model 'm6.brm':
Computed from 8000 by 120 log-likelihood matrix
Estimate SE
elpd_loo 23.5 7.2
p_loo 8.4 1.4
looic -47.1 14.4
------
Monte Carlo SE of elpd_loo is 0.0.
All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.
Output of model 'm7.brm':
Computed from 8000 by 120 log-likelihood matrix
Estimate SE
elpd_loo 20.6 7.0
p_loo 14.2 2.2
looic -41.2 14.1
------
Monte Carlo SE of elpd_loo is 0.1.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 114 95.0% 1327
(0.5, 0.7] (ok) 6 5.0% 437
(0.7, 1] (bad) 0 0.0% <NA>
(1, Inf) (very bad) 0 0.0% <NA>
All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.
Output of model 'm8.brm':
Computed from 8000 by 120 log-likelihood matrix
Estimate SE
elpd_loo 22.5 7.7
p_loo 11.5 2.2
looic -44.9 15.5
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 118 98.3% 623
(0.5, 0.7] (ok) 1 0.8% 2841
(0.7, 1] (bad) 1 0.8% 124
(1, Inf) (very bad) 0 0.0% <NA>
See help('pareto-k-diagnostic') for details.
Model comparisons:
elpd_diff se_diff
m6.brm 0.0 0.0
m8.brm -1.1 1.4
m7.brm -2.9 2.3
5.1.2 Beta regression with inla
For the record, model m6 in an inla-version
<- inla(data=ds.ip, discoloured ~ treatment + f(week, model = "iid"),
m6.inla family="beta",
control.family = list(control.link=list(model="logit")),
control.compute=list(config=TRUE)
)
summary(m6.inla)
Call:
c("inla.core(formula = formula, family = family, contrasts = contrasts,
", " data = data, quantiles = quantiles, E = E, offset = offset, ", "
scale = scale, weights = weights, Ntrials = Ntrials, strata = strata,
", " lp.scale = lp.scale, link.covariates = link.covariates, verbose =
verbose, ", " lincomb = lincomb, selection = selection, control.compute
= control.compute, ", " control.predictor = control.predictor,
control.family = control.family, ", " control.inla = control.inla,
control.fixed = control.fixed, ", " control.mode = control.mode,
control.expert = control.expert, ", " control.hazard = control.hazard,
control.lincomb = control.lincomb, ", " control.update =
control.update, control.lp.scale = control.lp.scale, ", "
control.pardiso = control.pardiso, only.hyperparam = only.hyperparam,
", " inla.call = inla.call, inla.arg = inla.arg, num.threads =
num.threads, ", " blas.num.threads = blas.num.threads, keep = keep,
working.directory = working.directory, ", " silent = silent, inla.mode
= inla.mode, safe = FALSE, debug = debug, ", " .parent.frame =
.parent.frame)")
Time used:
Pre = 1.01, Running = 0.319, Post = 0.216, Total = 1.55
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) -0.504 0.197 -0.897 -0.502 -0.124 NA 0
treatmentT1 1.195 0.281 0.648 1.194 1.753 NA 0
treatmentT2 1.402 0.285 0.850 1.400 1.967 NA 0
treatmentT3 1.296 0.283 0.746 1.294 1.857 NA 0
treatmentT4 0.763 0.277 0.223 0.762 1.310 NA 0
treatmentT5 1.331 0.283 0.780 1.329 1.893 NA 0
Random effects:
Name Model
week IID model
Model hyperparameters:
mean sd 0.025quant
precision parameter for the beta observations 4.24 5.05e-01 3.32
Precision for week 16763.29 1.60e+04 1087.09
0.5quant 0.975quant mode
precision parameter for the beta observations 4.21 5.30 NA
Precision for week 11879.99 59276.52 NA
Marginal log-Likelihood: 3.16
is computed
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
5.1.3 Beta regression in lme4
Beta error distributions are not implemented in lme4. The alternative R packages glmmTMB does handle more diverse error distributions than lme4. The beta distribution is one of them. See the documentation for glmmTMB.
5.2 Data set: seedling height
In a sowing experiment various seed treatments are tested to see whether they can enhance earlier development of the seedling. Seeds are individually sown in soil plugs. After 3 weeks the height of the seedlings is measured and majority ranges between 8 and 16 cm at that time. The height will be used to compare the treatments on their plant stimulating effect.
However, not all seeds germinated or the sprout never or hardly appeared at the surface. About 5% will never become a normal healthy plant. Some of this could be genetic defects, but it may have been caused by the seed treatment as well.
Ignore these seedlings could create a bias. Including all the zeros or very small numbers will render the error distribution not-normal.
The experiment tested 4 different seed coats (T1 to T4), each with 120 seeds, sown in 6 complete blocks.
Below histograms of the seedling height in cm after 3 weeks by seed coat. Plant height smaller than 3 cm were set to 0, as they will die off anyhow. There seem to be a block effect as well on the number of not properly established seedlings (height = 0).
<- readRDS("_book/data/Seedlings.obj")
ds.sd %>% ggplot(aes(x=height)) +
ds.sd geom_histogram(binwidth=0.5) +
facet_wrap(~seedcoat)
%>% filter(height == 0) %>% group_by(Block) %>% count() ds.sd
# A tibble: 6 × 2
# Groups: Block [6]
Block n
<fct> <int>
1 B1 8
2 B2 7
3 B3 3
4 B4 4
5 B5 5
6 B6 7
Below we model this data as a hurdle model to take care of the zeros and a lognormal distribution for the continous non-zero part of the heights.
<- brm(data=ds.sd,
m8.brm bf(height ~ seedcoat + (1|Block),
~ seedcoat + (1|Block)
hu
),family = hurdle_lognormal(),
iter = 2500,warmup = 500,
control = list(adapt_delta = 0.95,max_treedepth = 10),
chains = 4,cores = 4)
Compiling Stan program...
Start sampling
Warning: There were 1 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
summary(m8.brm)
Warning: There were 1 divergent transitions after
warmup. Increasing adapt_delta above 0.95 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Family: hurdle_lognormal
Links: mu = identity; sigma = identity; hu = logit
Formula: height ~ seedcoat + (1 | Block)
hu ~ seedcoat + (1 | Block)
Data: ds.sd (Number of observations: 480)
Draws: 4 chains, each with iter = 2500; warmup = 500; thin = 1;
total post-warmup draws = 8000
Group-Level Effects:
~Block (Number of levels: 6)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.06 0.05 0.00 0.18 1.00 2424 3161
sd(hu_Intercept) 0.32 0.29 0.01 1.06 1.00 3041 3262
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.23 0.06 1.12 1.35 1.00 5275 4661
hu_Intercept -2.54 0.39 -3.32 -1.82 1.00 6544 5931
seedcoatT2 -0.16 0.07 -0.30 -0.02 1.00 7102 6171
seedcoatT3 -0.07 0.07 -0.21 0.07 1.00 6960 5698
seedcoatT4 0.30 0.07 0.16 0.43 1.00 7238 6329
hu_seedcoatT2 -0.29 0.53 -1.36 0.74 1.00 7334 5197
hu_seedcoatT3 0.49 0.45 -0.39 1.39 1.00 7403 6126
hu_seedcoatT4 -0.90 0.63 -2.23 0.29 1.00 7599 5558
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.53 0.02 0.50 0.57 1.00 10918 6365
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
<- ds.sd %>% select(seedcoat) %>% distinct()
nd <- bind_cols(nd,fitted(m8.brm,newdata=nd,summary=TRUE,scale="response",re_formula = NA))
fit.m8.brm fit.m8.brm
seedcoat Estimate Est.Error Q2.5 Q97.5
1 T1 3.654191 0.2415263 3.195151 4.140889
2 T2 3.170830 0.2020734 2.787290 3.586448
3 T3 3.244804 0.2285700 2.802152 3.710929
4 T4 5.155251 0.3121723 4.567153 5.791892
To check what the difference is with fitting a general linear model with
- all data ignoring the fact that there are many zeros
- data filtered heights > 0
<- lmer(data=ds.sd,
m8.lmer ~ seedcoat + (1|Block)
height )
boundary (singular) fit: see help('isSingular')
<- bind_cols(nd,pred.hu = fit.m8.brm$Estimate, pred.all = predict(m8.lmer,newdata=nd,re.form = NA)) fits.m8
<- lmer(data=ds.sd %>% filter(height >0),
m8bis.lmer ~ seedcoat + (1|Block)
height
)<- bind_cols(fits.m8,pred.wozero = predict(m8bis.lmer,newdata=nd,re.form = NA))
fits.m8 fits.m8
seedcoat pred.hu pred.all pred.wozero
1 T1 3.654191 3.492267 3.775557
2 T2 3.170830 3.152294 3.348147
3 T3 3.244804 3.221363 3.646893
4 T4 5.155251 4.736259 4.900647