4  More complex questions

4.1 Contrast based on main model

4.1.1 Exploit the Posterior sample of a brms-fit

First we come back to the brms fit.

To show that this is a summary describing a large sample, we will dig a little deeper. We retrieve the posterior sample for all the parameters composing this model.

postsamples <- brms::as_draws_df(m1.brm)
str(postsamples)
draws_df [8,000 × 17] (S3: draws_df/draws/tbl_df/tbl/data.frame)
 $ b_Intercept            : num [1:8000] -1.15 -1.06 -1.09 -2.62 -2.29 ...
 $ b_treatmentT1          : num [1:8000] -0.0322 -0.0206 0.0426 0.7192 0.2711 ...
 $ b_treatmentT2          : num [1:8000] 0.1003 -0.0481 0.0862 0.8712 0.359 ...
 $ b_treatmentT3          : num [1:8000] -0.0701 -0.0192 -0.2258 0.6996 0.479 ...
 $ b_treatmentT4          : num [1:8000] 0.493 0.492 0.355 1.293 1.043 ...
 $ b_treatmentT5          : num [1:8000] 0.5108 0.0698 0.4967 0.9935 0.741 ...
 $ sd_week__Intercept     : num [1:8000] 0.671 0.734 0.741 1.193 0.674 ...
 $ r_week[week1,Intercept]: num [1:8000] 0.143 0.175 -0.083 1.14 0.893 ...
 $ r_week[week2,Intercept]: num [1:8000] -0.352 -0.371 -0.128 0.724 0.716 ...
 $ r_week[week3,Intercept]: num [1:8000] -1.324 -0.917 -1.311 -0.417 -0.43 ...
 $ r_week[week4,Intercept]: num [1:8000] -0.7146 -0.575 -0.6478 0.0727 0.0645 ...
 $ r_week[week5,Intercept]: num [1:8000] -0.6706 -1.0757 -0.8417 -0.0184 0.0984 ...
 $ lprior                 : num [1:8000] -3.29 -3.3 -3.3 -3.63 -3.51 ...
 $ lp__                   : num [1:8000] -207 -207 -208 -203 -203 ...
 $ .chain                 : int [1:8000] 1 1 1 1 1 1 1 1 1 1 ...
 $ .iteration             : int [1:8000] 1 2 3 4 5 6 7 8 9 10 ...
 $ .draw                  : int [1:8000] 1 2 3 4 5 6 7 8 9 10 ...

The data frame consists of the expected 8,000 posterior draws for a large number of variables.

The variables starting with b_ are the fixed effects: one for the intercept (= the reference) and, for each of the treatments, one effect relative to that reference. The variable starting with sd_ is the standard deviation of the random effects, and the ones starting with r_ are the BLUPs (if you don’t know what a BLUP is, reread the text on LMM). The remaining variables are of no importance at this moment.
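As a quick illustration of working with these draws, the between-week variability can be summarized directly from the sd_ column; a minimal sketch, assuming the postsamples data frame created above:

```r
# Sketch: posterior summary of the random-effect standard deviation,
# computed directly from the 8,000 draws in `postsamples`
postsamples %>%
  dplyr::summarise(
    mean   = mean(sd_week__Intercept),
    median = median(sd_week__Intercept),
    low    = quantile(sd_week__Intercept, 0.025),
    up     = quantile(sd_week__Intercept, 0.975)
  )
```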

For instance, the sample for treatment T1 looks like this.

postsamples %>% ggplot(aes(x=b_treatmentT1)) + geom_histogram(bins=100)

The mean is around -1.6 and the bulk of the sample ranges between -2.7 and +1.4.

We can set aside the fixed-effect variables from this data frame and back-transform them.

However, before we can back-transform the samples, we need to turn the effects for T1 to T5 into absolute values (still on the logit scale) by adding the intercept to each of them.

postsamples <- postsamples %>% 
  select(starts_with("b_")) %>% 
  mutate(across(starts_with("b_tr"), ~ . + b_Intercept))
Warning: Dropping 'draws_df' class as required metadata was removed.

Then we back-transform:

postsamples.backtransformed <- postsamples %>% inv_logit()

And finally we summarize these samples, for instance by calculating the mean, the median, and the 0.025 and 0.975 quantiles, followed by some formatting for readability.

abs.m1.brm.mean <- apply(postsamples.backtransformed,2,mean)
abs.m1.brm.quants <- apply(postsamples.backtransformed,2,quantile,c(0.5,0.025,0.975)) %>% t()
dimnames(abs.m1.brm.quants)[[2]] <- c("median","low","up") 

abs.m1.brm <- cbind(mean = abs.m1.brm.mean, abs.m1.brm.quants) %>% 
  as_tibble(rownames = "treatment") %>% 
  mutate(treatment = case_when(treatment == "b_Intercept" ~ "reference",
                               TRUE ~ sub("b_treatment", "", treatment)))
abs.m1.brm
# A tibble: 6 × 5
  treatment  mean median    low    up
  <chr>     <dbl>  <dbl>  <dbl> <dbl>
1 reference 0.124  0.118 0.0567 0.231
2 T1        0.163  0.156 0.0759 0.291
3 T2        0.180  0.173 0.0848 0.315
4 T3        0.175  0.169 0.0826 0.312
5 T4        0.273  0.266 0.142  0.440
6 T5        0.217  0.210 0.106  0.370

This approach immediately gives us all we need: estimates of the mean and median, and credible intervals.

Compare this to preds.m1.brms as it was obtained via predict.

[To be complete: check also fitted and conditional.effects.]

We can follow the same approach to get any contrast. For instance, if we need the difference between T1 and T2, or between T4 and T5, we can simply do the following:

(postsamples.backtransformed[["b_treatmentT1"]] - postsamples.backtransformed[["b_treatmentT2"]]) %>% quantile(c(0.,0.025,0.975))  
         0%        2.5%       97.5% 
-0.19246662 -0.09067268  0.05151396 
(postsamples.backtransformed[["b_treatmentT4"]] - postsamples.backtransformed[["b_treatmentT5"]]) %>% quantile(c(0.,0.025,0.975))  
         0%        2.5%       97.5% 
-0.08942922 -0.02104247  0.13961437 

Following the same strategy we can also calculate p-value-like probabilities, for instance the probability that T1 leads to more cell infection than T2.

(postsamples.backtransformed[["b_treatmentT1"]] > postsamples.backtransformed[["b_treatmentT2"]]) %>% sum()/nrow(postsamples.backtransformed)  
[1] 0.310875

…or the probability that T4 results in 10 percentage points more infection than the reference.

(postsamples.backtransformed[["b_treatmentT4"]] > (postsamples.backtransformed[["b_Intercept"]] + 0.1))  %>% sum()/nrow(postsamples.backtransformed)   
[1] 0.855375
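Since R treats TRUE as 1 and FALSE as 0, these exceedance probabilities can be written a bit more compactly with mean(); a small sketch using the same objects as above:

```r
# mean() of a logical vector gives the proportion of TRUEs,
# i.e. the same probability as sum()/nrow() above
mean(postsamples.backtransformed[["b_treatmentT1"]] >
       postsamples.backtransformed[["b_treatmentT2"]])
```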

The approach of exploiting the posterior sample is handy if you have such a sample. The two other modelling strategies don’t work with posterior samples, yet we can produce a similar sample that can be used in a similar manner.

4.1.2 Contrasts with inla

INLA allows us to draw samples from the posterior distribution that was derived during modelling, via the function inla.posterior.sample. Below we ask for 5,000 samples.

m1.inla.postsamples <-  inla.posterior.sample(5000, result = m1.inla)

The object produced by this function is rather complex, so we wrote a helper function to extract the fixed effects.

extract.fe <- function(x, fe) {
  
  # retain the rows of the latent field holding the intercept
  # and the fixed effects of the factor of interest
  retain <- paste0("Intercept|", fe)
  
  ff <- function(a){
    lat <- a$latent
    lat[grepl(retain, rownames(lat)), ]
  }
  # one row per posterior sample; strip the factor name and the
  # ":1" suffix from the column names
  purrr::map_df(x, ff) %>% 
    rename_with(.fn = ~ sub(fe, "", sub(":1", "", .)))
}

m1.inla.postsamples.reformatted <- m1.inla.postsamples %>% extract.fe("treatment")

m1.inla.postsamples.reformatted
# A tibble: 5,000 × 6
   `(Intercept)`      T1       T2    T3    T4    T5
           <dbl>   <dbl>    <dbl> <dbl> <dbl> <dbl>
 1         -1.95  0.0368  0.390   0.106 0.849 0.550
 2         -1.71  0.295   0.558   0.130 0.968 0.530
 3         -1.92  0.410   0.370   0.216 1.07  0.523
 4         -2.09  0.335   0.752   0.667 0.971 0.790
 5         -2.02  0.489   0.359   0.676 1.31  0.762
 6         -2.40  0.381   0.602   1.10  1.26  0.971
 7         -1.78 -0.365  -0.00841 0.127 0.676 0.468
 8         -2.15  0.525   0.421   0.302 1.46  0.860
 9         -1.98  0.523   0.417   0.246 0.819 0.643
10         -2.01  0.478   0.610   0.387 1.17  0.926
# ℹ 4,990 more rows

These samples are effects on the modelling (logit) scale, hence you need to make them absolute and back-transform them.

m1.inla.postsamples.backtransformed <- m1.inla.postsamples.reformatted %>% 
  mutate(across(starts_with("T"),.fns = ~.+`(Intercept)`)) %>% 
  inv_logit()

This can be summarized as we did before:

abs.fe <- m1.inla.postsamples.backtransformed %>% apply(2,quantile,c(.5,.025,0.975))
abs.fe %>% t()
                  50%       2.5%     97.5%
(Intercept) 0.1177404 0.06876261 0.1855954
T1          0.1531552 0.09552408 0.2369887
T2          0.1698372 0.10688154 0.2569410
T3          0.1653649 0.10382939 0.2505279
T4          0.2620180 0.17477292 0.3681598
T5          0.2073325 0.13201566 0.3040280
effect.fe <- m1.inla.postsamples.backtransformed %>% 
  mutate(across(everything(), .fns = ~ . - `(Intercept)`)) %>% 
  apply(2, quantile, c(.5, .025, 0.975))
effect.fe %>% t()
                   50%         2.5%     97.5%
(Intercept) 0.00000000  0.000000000 0.0000000
T1          0.03506838 -0.025502931 0.1008312
T2          0.05088236 -0.007952413 0.1216959
T3          0.04615486 -0.012484487 0.1152201
T4          0.14238461  0.070665576 0.2269822
T5          0.08788686  0.024688792 0.1641485
(m1.inla.postsamples.backtransformed[["T1"]] > m1.inla.postsamples.backtransformed[["T2"]]) %>% sum()/nrow(m1.inla.postsamples.backtransformed)  
[1] 0.3052
(m1.inla.postsamples.backtransformed[["T4"]] > (m1.inla.postsamples.backtransformed[["(Intercept)"]] + 0.1))  %>% sum()/nrow(m1.inla.postsamples.backtransformed)   
[1] 0.8702

4.1.3 Contrasts with glmer

Again, a very similar sampling-based approach is recommended for the glmer model. In this case we do not have a posterior distribution; its role is taken up by bootstrap samples. See the [companion text](https://hw-appliedlinmixmodinr.netlify.app/extractresult#bootstrapping-for-calculating-confidence-intervals) for more explanation on bootstrapping lmer models.

nd <- data.frame(treatment = unique(ds.cd$treatment))
bootfun <- function(mod){
  # population-level predictions on the logit scale (random effects excluded)
  predict(mod, newdata = nd, re.form = NA)
}

bs <- bootMer(m1.glmer, bootfun, nsim = 500)

T2 vs T1

quantile(inv_logit(bs$t[,3]) - inv_logit(bs$t[,2]),c(0.5,0.025,0.975))
        50%        2.5%       97.5% 
 0.01632939 -0.05190485  0.07844690 

T4 vs intercept

quantile(inv_logit(bs$t[,5]) - inv_logit(bs$t[,1]),c(0.5,0.025,0.975))
       50%       2.5%      97.5% 
0.14190254 0.06942526 0.22853234 
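Rather than picking out individual contrasts, the whole bootstrap matrix can be back-transformed and summarized in one go, as we did for the posterior samples. A sketch, assuming bs and nd as above and the same vectorized inv_logit helper; the columns of bs$t follow the order of nd$treatment:

```r
# Back-transform every bootstrap column at once and summarize
# (column order of bs$t follows nd$treatment)
bs.back <- inv_logit(bs$t)
colnames(bs.back) <- as.character(nd$treatment)
t(apply(bs.back, 2, quantile, c(0.5, 0.025, 0.975)))
```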

4.2 Factorial experiments

The treatments T1 to T5 in the cell infection experiment are not just juxtaposed variations of the reference protocol. There is structure among them.

The variations are combinations of two exposure durations (normal and prolonged) and three products.

ds.cd %>% select(treatment,exposure,product) %>% distinct()
  treatment  exposure      product
1 reference    normal     standard
2        T1 prolonged     standard
3        T2    normal improvementA
4        T3 prolonged improvementA
5        T4    normal improvementB
6        T5 prolonged improvementB
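A quick cross-tabulation confirms that the six treatments, reference included, form a complete 2 × 3 factorial; a small sketch on the same columns:

```r
# Cross-tabulate exposure by product over the distinct treatments:
# every combination occurs exactly once, i.e. a complete 2 x 3 factorial
ds.cd %>% 
  distinct(treatment, exposure, product) %>% 
  with(table(exposure, product))
```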

Alternative models can be fitted to see whether the changes are caused by exposure, by product, or by an interaction between them. The full model contains both main factors and their interaction (m2 below). A simpler alternative is the model without the interaction but with the two main factors (m3). If m3 fits as well as m2, the interaction is unlikely to exist, or is unimportant compared to the main effects.

We can follow the same reasoning to see whether both main factors contribute or only one of them (models m4 and m5).

We have seen that brms provided the most reliable results, but it takes time to compute. If you are not in a hurry you can try the code below. The models can then be compared via leave-one-out cross-validation with loo.

notinahurry <- FALSE

if(notinahurry){
  m2.brm <- brm(data = ds.cd, infected|trials(ntot) ~ product*exposure + (1|week),
                family = binomial(link = "logit"),
                chains = 4, cores = 4, iter = 3000, warmup = 1000)
  m3.brm <- brm(data = ds.cd, infected|trials(ntot) ~ product + exposure + (1|week),
                family = binomial(link = "logit"),
                chains = 4, cores = 4, iter = 3000, warmup = 1000)
  m4.brm <- brm(data = ds.cd, infected|trials(ntot) ~ product + (1|week),
                family = binomial(link = "logit"),
                chains = 4, cores = 4, iter = 3000, warmup = 1000)
  m5.brm <- brm(data = ds.cd, infected|trials(ntot) ~ exposure + (1|week),
                family = binomial(link = "logit"),
                chains = 4, cores = 4, iter = 3000, warmup = 1000)
  
  loo(m1.brm, m2.brm, m3.brm, m4.brm, m5.brm)

}

If you do this, you will see that loo ranks the models from most preferred to least preferred: m4 > m3 > m1 > m2 > m5. This indicates that product is the only factor affecting infection in this experiment.

So here the faster inla comes in handy. Mind you that waic = TRUE has been added to the control.compute option list. This adds the WAIC, or Watanabe-Akaike information criterion, to the output. This criterion penalizes complexity: when two models fit similarly well, the simpler one is preferred. The smaller the WAIC, the better.

m2.inla <- inla(data=ds.cd, infected ~ product * exposure + f(week, model = "iid"), Ntrials= ntot, 
           family="binomial", 
           control.family = list(control.link=list(model="logit")),
           control.compute=list(config=TRUE, waic = TRUE)
            )

m3.inla <- inla(data=ds.cd, infected ~ product + exposure + f(week, model = "iid"), Ntrials= ntot, 
           family="binomial", 
           control.family = list(control.link=list(model="logit")),
           control.compute=list(config=TRUE, waic = TRUE)
            )

m4.inla <- inla(data=ds.cd, infected ~ product + f(week, model = "iid"), Ntrials= ntot, 
           family="binomial", 
           control.family = list(control.link=list(model="logit")),
           control.compute=list(config=TRUE, waic = TRUE)
            )

m5.inla <- inla(data=ds.cd, infected ~ exposure + f(week, model = "iid"), Ntrials= ntot, 
           family="binomial", 
           control.family = list(control.link=list(model="logit")),
           control.compute=list(config=TRUE, waic = TRUE)
            )

The WAIC stored in each model object can be accessed:

m2.inla$waic$waic
[1] 402.5353
m3.inla$waic$waic
[1] 401.9444
m4.inla$waic$waic
[1] 399.9716
m5.inla$waic$waic
[1] 415.2754

These results indicate a preference for m4, in agreement with the loo ranking of the brms fits.
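To see the comparison at a glance, the WAIC values can be collected and sorted; a sketch assuming the m*.inla fits above (m1.inla could be added as well if it was fitted with waic = TRUE):

```r
# Collect the WAIC of each candidate model and sort from best (smallest) up
waics <- c(m2 = m2.inla$waic$waic, m3 = m3.inla$waic$waic,
           m4 = m4.inla$waic$waic, m5 = m5.inla$waic$waic)
sort(waics)
```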