proof of concept for posterior_pit by avehtari · Pull Request #1857 · paul-buerkner/brms

avehtari · 2026-02-20T13:49:44Z

This is a proof-of-concept branch and far from ready to be merged, but it makes it easier to show a possible way to solve issue #1855.

To avoid duplication of code in brms, the idea is to use posterior_predict(), but add a new argument type with default value "r" referring to random draws. New type="p" would instead return CDFs / PITs.

PITs = posterior_predict(fit, type = "p")
PIT_post = colMeans(PITs)
PIT_loo = E_loo(PITs, loo(fit)$psis_object)$value

These PIT values are more useful than the ones based on ranks.
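To make the type = "p" idea concrete, here is a minimal base-R sketch that does not use brms. It assumes a Gaussian model without truncation. The names mu_draws, sigma_draws, and y are made-up stand-ins for the posterior draws and the data that brms would supply, and the LOO step via E_loo() is left out.

```r
set.seed(1)
n_draws <- 200
y <- c(-1.2, 0.3, 0.8, 1.9, -0.4)   # made-up observations
n_obs <- length(y)

# Hypothetical posterior draws: rows = draws, columns = observations
mu_draws <- matrix(rnorm(n_draws * n_obs, mean = 0, sd = 0.1), n_draws, n_obs)
# One sigma per draw; it is recycled down the rows of each column
sigma_draws <- 1 + abs(rnorm(n_draws, sd = 0.05))

# Predictive CDF of every draw, evaluated at the observed value
y_mat <- matrix(y, n_draws, n_obs, byrow = TRUE)
pits <- pnorm(y_mat, mean = mu_draws, sd = sigma_draws)

# Averaging over draws gives one posterior PIT value per observation
pit_post <- colMeans(pits)
```

Because the CDF is evaluated directly, each PIT value is a smooth function of the draws instead of a rank among a finite number of simulated outcomes.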

The proof of concept works with families "gaussian" and "student_t" without truncation. If the general idea is acceptable then support for the rest of the families and truncation can be added.

Adding here that natural other types are "d" for posterior predictive densities

ppd = posterior_predict(fit, type = "d")

and
"q" for quantiles that are more accurate than current rank based quantiles (used e.g. by predictive_interval())

ppqs = posterior_predict(fit, type = "q", probs = c(0.05, 0.95))
ppq = apply(ppqs, 1, mean)
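The "d" and "q" variants can be sketched in the same spirit with base R only. This assumes a Gaussian model; mu and sigma are hypothetical per-draw parameters, and the layout of the results (draws in rows) is an assumption of this sketch, not something the PR specifies.

```r
set.seed(2)
n_draws <- 100
y <- c(0.5, -1.0, 2.0)                # made-up observations
mu <- rnorm(n_draws, 0, 0.1)          # hypothetical draws of the mean
sigma <- rep(1, n_draws)              # hypothetical draws of the sd

# "d": density of each observation under each draw (draws x observations)
dens <- sapply(y, function(yi) dnorm(yi, mean = mu, sd = sigma))
ppd <- colMeans(dens)                 # posterior predictive density per observation

# "q": per-draw quantiles of the predictive distribution (draws x probs)
probs <- c(0.05, 0.95)
quants <- sapply(probs, function(p) qnorm(p, mean = mu, sd = sigma))
colnames(quants) <- paste0("q", probs)
```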

paul-buerkner

Thanks! I like the idea. Please find some comments on code structure and naming in my review.

paul-buerkner · 2026-02-23T08:32:09Z

  object, newdata = NULL, re_formula = NULL, re.form = NULL,
  transform = NULL, resp = NULL, negative_rt = FALSE,
-  ndraws = NULL, draw_ids = NULL, sort = FALSE, ntrys = 5,
+  ndraws = NULL, draw_ids = NULL, sort = FALSE, ntrys = 5, type = "r",


Can we think of a more informative name than "type" and more informative options than just "r" and "p"?

I was thinking that "r", "p", "q", and "d" would naturally refer to how the distribution functions are named in R. Looking at R documentation these could be also named "random", "probability", "quantile", and "density".
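As an illustration of this naming convention, the sketch below maps the long option names to R's one-letter prefixes. The function dist_fun() is hypothetical and not part of brms. A side effect of match.arg() is that the short forms "r", "p", "q", and "d" are also accepted through partial matching.

```r
# Hypothetical helper: pick the base-R distribution function for an output type
dist_fun <- function(output = c("random", "probability", "quantile", "density"),
                     dist = "norm") {
  output <- match.arg(output)
  prefix <- c(random = "r", probability = "p",
              quantile = "q", density = "d")[[output]]
  match.fun(paste0(prefix, dist))
}

p_half <- dist_fun("probability")(0)                 # calls pnorm(0)
q_pois <- dist_fun("quantile", "pois")(0.5, lambda = 3)  # calls qpois(0.5, lambda = 3)
p_short <- dist_fun("p")(0)                          # "p" partially matches "probability"
```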

I've not been able to come up with a better name than "type".

How about "kind"?

Similarly unspecific. Let's go with "type" for now until we have a better idea.

Alternative argument names could be output, return_type, or mode.

paul-buerkner · 2026-02-23T08:32:36Z

  prep <- prepare_predictions(
    object, newdata = newdata, re_formula = re_formula, resp = resp,
-    ndraws = ndraws, draw_ids = draw_ids, check_response = FALSE, ...
+    ndraws = ndraws, draw_ids = draw_ids, check_response = FALSE, type = type, ...


Why does prepare_predictions need to know the type?

It's likely that it doesn't, but I was struggling to see the structure of the code, and possibly put it in more places than needed

I don't think it does

paul-buerkner · 2026-02-23T08:34:41Z

-    mean = mu, sd = sigma,
-    lb = prep$data$lb[i], ub = prep$data$ub[i],
-    ntrys = ntrys
+  switch(type,


We should solve this with a better abstraction. Changing the code of each of the families using switch is unnecessary. Add another function layer that handles the two types once and can be called (instead of just rcontinuous) in the family-specific functions.

I don't understand. Can you explain more, or modify this PR to show it with an example?

Should we have a short zoom call about this?

yes. will write you on slack

How about something like the following? We have a helper function that prepares the switching and can be called within the corresponding distributions. Below is a (non-polished) example of a helper function and its call in the two example distributions (gaussian and student). (Note that I changed the argument type to output in the following code, just to get an idea of what it would look like.)

.predict_continuous_helper <- function(output, prep, i, dist, ntrys, ...) {
  lb <- prep$data$lb[i]
  ub <- prep$data$ub[i]
  if (output == "probability") {
    q <- prep$data$Y[i]
    return(pcontinuous(
      q = q, dist = dist, lb = lb, ub = ub,
      ntrys = ntrys, ndraws = prep$ndraws, ...
    ))
  } else if (output == "random") {
    return(rcontinuous(
      n = prep$ndraws, dist = dist, lb = lb, ub = ub,
      ntrys = ntrys, ...
    ))
  }
}

posterior_predict_gaussian <- function(i, prep, ntrys = 5, output = "random", ...) {
  mu <- get_dpar(prep, "mu", i = i)
  sigma <- get_dpar(prep, "sigma", i = i)
  sigma <- add_sigma_se(sigma, prep, i = i)
  .predict_continuous_helper(
    output = output, prep = prep, i = i, ntrys = ntrys,
    dist = "norm", mean = mu, sd = sigma
  )
}

posterior_predict_student <- function(i, prep, ntrys = 5, output = "random", ...) {
  nu <- get_dpar(prep, "nu", i = i)
  mu <- get_dpar(prep, "mu", i = i)
  sigma <- get_dpar(prep, "sigma", i = i)
  sigma <- add_sigma_se(sigma, prep, i = i)
  .predict_continuous_helper(
    output = output, prep = prep, i = i, ntrys = ntrys,
    dist = "student_t", df = nu, mu = mu, sigma = sigma
  )
}

paul-buerkner · 2026-02-23T08:35:08Z

+    pdist <- paste0("p", dist)
+    out <- do_call(pdist, c(list(n), args))
+  } else {
+    error("not implemented yet")


are you planning to implement this part too as part of this PR?

I can, if we go forward with the plan

Okay. Let's discuss the plan first and then move this one forward too

paul-buerkner · 2026-02-25T07:22:51Z

Yes, this is exactly what I meant!

florence-bockting · 2026-02-28T14:32:05Z

+    F_q <- do_call(pdist, c(list(q), args))
+    # include (lb - 1) to treat lb as inclusive 
+    # this is only relevant for discrete distributions
+    F_lb <- do_call(pdist, c(list(lb - 1), args))


Here I am a bit unsure how to treat this correctly. I would have treated the lower bound as exclusive but this would have been inconsistent with the rdiscrete function.
Thus, I set lb - 1 in order to ensure that the lower bound is inclusive but I was uncertain here.

It should be consistent with how Stan handles it. I am not 100% sure anymore how it does that. Can you double check to match the behavior?
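The effect of using lb - 1 can be seen in a small base-R sketch for a truncated Poisson. The function ptrunc_pois() is hypothetical, and treating both bounds as inclusive is an assumption of this sketch that would still need to be checked against Stan.

```r
# Hypothetical truncated Poisson CDF with inclusive bounds [lb, ub]
ptrunc_pois <- function(q, lambda, lb, ub = Inf) {
  F_lb <- ppois(lb - 1, lambda)   # mass strictly below lb, so lb itself is kept
  F_ub <- ppois(ub, lambda)       # mass up to and including ub
  (ppois(q, lambda) - F_lb) / (F_ub - F_lb)
}

below <- ptrunc_pois(1, lambda = 3, lb = 2)   # no mass below the lower bound
at_lb <- ptrunc_pois(2, lambda = 3, lb = 2)   # lb carries positive mass
```

With ppois(lb, lambda) in place of ppois(lb - 1, lambda), the value at lb would be 0, that is, the lower bound would be excluded from the support.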

florence-bockting · 2026-03-13T20:58:02Z

I have now updated the proof of concept for 4 functions:

posterior_predict_gaussian()
posterior_predict_student()
posterior_predict_binomial()
posterior_predict_poisson()

They include two additional new arguments:

output (which can currently take the values random or probability) and
randomized (which can be NULL or a boolean).

The current implementation follows the idea that we can call:

rpred <- posterior_predict(fit_norm)
- computes random draws from the normal predictive distribution
- output = "random" and randomized = NULL by default

ppred <- posterior_predict(fit_norm, output = "probability")
- computes the CDF of the normal predictive distribution
- randomized = NULL by default

rpred <- posterior_predict(fit_pois)
- computes random draws from the Poisson predictive distribution
- output = "random" and randomized = NULL by default

ppred <- posterior_predict(fit_pois, output = "probability")
- computes randomized PITs from the Poisson predictive distribution
- randomized = TRUE by default

ppred <- posterior_predict(fit_pois, output = "probability", randomized = FALSE)
- computes the non-randomized PITs from the Poisson predictive distribution, which are currently defined as the midpoints (F(x) + F(x - 1)) * 0.5
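For a discrete outcome, the variants discussed here differ only in which point of the interval [F(x - 1), F(x)] is returned. A minimal base-R sketch for a single Poisson rate follows. The values of x and lambda are made up, and this is an illustration of the definitions, not the code in the PR.

```r
set.seed(3)
x <- c(0, 2, 5)      # made-up observed counts
lambda <- 3          # made-up Poisson rate

F_upper <- ppois(x, lambda)       # F(x)
F_lower <- ppois(x - 1, lambda)   # F(x - 1), which is 0 for x = 0

# Randomized PIT: uniform draw between F(x - 1) and F(x)
pit_randomized <- runif(length(x), min = F_lower, max = F_upper)

# Non-randomized midpoint definition
pit_midpoint <- (F_upper + F_lower) * 0.5

# Plain CDF value, as returned by ppois()
pit_plain <- F_upper
```

Under a well-calibrated model the randomized version is uniformly distributed, which is what uniformity checks rely on, while the plain CDF value is a deterministic probability.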

avehtari · 2026-03-14T12:13:16Z

computes the non-randomized PITs from the poisson predictive, which are currently defined as the mid points (F(x) +F(x-1)) * 0.5

I think these should be just F(x) to follow what ppois etc. do. For the same reason, I think that it might be more logical to have non-randomized as the default

avehtari · 2026-03-15T16:18:09Z

Or we could have separate output options: probability (non-randomized F(x)) and pit (randomized for discrete distributions).

florence-bockting · 2026-03-16T06:50:29Z

Or we could have separate output options: probability (non-randomized F(x)) and pit (randomized for discrete distributions).

Okay, one option could be to let output take the values random, probability, density, quantile, and pit, where pit is only a valid option for discrete cases. The cons are that pit and probability overlap conceptually and that pit is only a valid choice for a subclass of inputs. The pro is that we have all alternatives in one function and need no additional argument (like randomized).

An alternative is to outsource the pit case entirely to an additional function such as posterior_pit(fit_pois) that applies only to discrete cases, while posterior_predict(fit_pois, output = "probability") supports the default implementations like ppois(). In my opinion, this option is relatively clean from a design point of view, as we avoid having cases that work and cases that don't. However, we would have to introduce an additional function.

paul-buerkner · 2026-03-16T07:06:28Z

I would argue for putting everything in one function. The code overhead is otherwise way too high. I like the probability + randomized argument approach. The only question is what the default for randomized should be. Aki, can you elaborate on which use cases you had in mind for which choice of randomized?

avehtari · 2026-03-16T07:55:41Z

Okay, one option could be to let output take the values random, probability, density, quantile, and pit, where pit is only a valid option for discrete cases.

In this case pit for continuous distributions would be valid, but the result would be the same as with probability.

- PITs (with randomization for discrete distributions) are needed for PIT and LOO-PIT uniformity checks (pp_check(..., type = "pit_ecdf|loo_pit_ecdf")).
- Probabilities (non-randomized) are needed for calibration plots (reliabilitydiag type plots), for which we have a PR in bayesplot: PPC Calibration plots stan-dev/bayesplot#352.
- Probabilities are useful in general for making predictions and for model visualization.
- If the output options are random, probability, density, quantile, and pit, we don't need a separate argument for randomization, and pit computes PIT values that are useful for uniformity tests in both the continuous and the discrete case.

paul-buerkner · 2026-03-16T08:00:04Z

I am fine with the options random, probability, density, quantile, and pit (or whatever subset we go with first), as long as it is all realized in the same function (posterior_predict). Otherwise the code overhead is too high. If we want, we can still have lightweight wrappers around posterior_predict with the output argument fixed.

florence-bockting · 2026-03-16T08:05:36Z

I am fine with the options random, probability, density, quantile, and pit (or whatever subset we go with first), as long as it is all realized in the same function (posterior_predict).

Alright. We definitely stick with one function; the idea of an extra function is out.
I will change the implementation so that posterior_predict takes output = (random, probability, density, quantile, pit), where pit is the randomized PIT for discrete distributions and the ordinary CDF for continuous distributions.
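The agreed design, one function with an output argument plus optional lightweight wrappers, can be illustrated with a toy stand-in. Both predict_sketch() and pit_sketch() are hypothetical names, the example covers a single Poisson rate only, and nothing here is the brms code.

```r
# Hypothetical stand-in for posterior_predict() restricted to one Poisson rate
predict_sketch <- function(y, lambda,
                           output = c("random", "probability", "pit")) {
  output <- match.arg(output)
  switch(output,
    random = rpois(length(y), lambda),
    probability = ppois(y, lambda),
    # randomized PIT: uniform between F(y - 1) and F(y)
    pit = runif(length(y), ppois(y - 1, lambda), ppois(y, lambda))
  )
}

# Lightweight wrapper with the output argument fixed
pit_sketch <- function(y, lambda) {
  predict_sketch(y, lambda, output = "pit")
}

set.seed(5)
u <- pit_sketch(c(1, 4), lambda = 3)
p <- predict_sketch(c(1, 4), lambda = 3, output = "probability")
```

Keeping the switch in one place means a wrapper adds only a single line, which is the low code overhead asked for above.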

florence-bockting · 2026-03-16T08:13:12Z

I'm fine with this. I usually prefer smaller PRs anyway.

avehtari · 2026-03-16T08:32:45Z

How is the log_lik method related to the envisioned density option?

log_lik was meant to compute the log likelihood, which is used to remove information from the posterior to get the leave-one-*-out posterior. log_lik as a function name is confusing when we want to compute a predictive density. There is a predictive diagnostic topic in my research plan that requires predictive densities, and having this functionality would make experiments easier. Having posterior_predict(..., output="density") would 1) be more logical when we want predictive densities, and 2) be a logical option to have in the same function as "random", "probability", and "quantile" to complete the usual quartet of distribution functions (r, p, q, d).

We could start with random, probability, and pit, and then worry about the others afterwards? This could simplify this PR and reviewing it.

I'm fine with it being added later, but wanted to keep it in the plans, so that it will be easier to implement later.
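To illustrate the relation between pointwise log-likelihood draws and a predictive density, the sketch below averages per-draw densities on the log scale. It assumes a simple Gaussian observation model in which the per-draw log likelihood of an observation equals its per-draw log predictive density. The names ll and log_mean_exp() are hypothetical, and no brms function is called.

```r
set.seed(4)
n_draws <- 500
y <- c(0.2, 1.5)                      # made-up observations
mu <- rnorm(n_draws, 0, 0.1)          # hypothetical posterior draws of the mean

# draws x observations matrix of pointwise log densities
ll <- sapply(y, function(yi) dnorm(yi, mean = mu, sd = 1, log = TRUE))

# Numerically stable log of the mean of exp(v)
log_mean_exp <- function(v) {
  m <- max(v)
  m + log(mean(exp(v - m)))
}

# Log posterior predictive density per observation
lpd <- apply(ll, 2, log_mean_exp)
```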

paul-buerkner · 2026-03-16T08:49:13Z

Okay, that's fine and sounds like a good plan. @florence-bockting please let me know when (now already?) you want my initial review on your draft PR. Just having support for a few families is sufficient for me to initially review it.

florence-bockting · 2026-03-16T10:51:33Z

Okay, that's fine and sounds like a good plan. @florence-bockting please let me know when (now already?) you want my initial review on your draft PR. Just having support for a few families is sufficient for me to initially review it.

Alright, go ahead @paul-buerkner

florence-bockting · 2026-04-08T07:11:53Z

Okay, that's fine and sounds like a good plan. @florence-bockting please let me know when (now already?) you want my initial review on your draft PR. Just having support for a few families is sufficient for me to initially review it.

Alright, go ahead @paul-buerkner

Just a quick check-in and reminder @paul-buerkner

paul-buerkner

Getting there, thank you! Here are my comments and change requests

paul-buerkner · 2026-04-13T06:43:48Z

Just leave it as is, I meant.

Florence Bockting wrote on Mon, 13 Apr 2026 at 08:42, commenting on R/posterior_predict.R:

@@ -92,12 +97,15 @@ posterior_predict.brmsfit <- function(
   contains_draws(object)
   object <- restructure(object)
   prep <- prepare_predictions(
-    object, newdata = newdata, re_formula = re_formula, resp = resp,
+    object,

Do you want every argument on a new line, or do you differentiate between required and optional arguments, thus:

prepare_predictions( object, newdata = newdata, re_formula = re_formula, resp = resp, ndraws = ndraws, draw_ids = draw_ids, check_response = FALSE, ... )

or

prepare_predictions( object, newdata = newdata, re_formula = re_formula, resp = resp, ndraws = ndraws, draw_ids = draw_ids, check_response = FALSE, ... )

avehtari · 2026-04-14T18:19:46Z

Would it be too much trouble to allow the argument lower.tail=FALSE for output="probability"? And if so, then maybe also log for density and log.p for probability and quantile?
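For reference, this is how these arguments behave in the base-R distribution functions that the output options are modeled on. The snippet uses only ppois(), dpois(), and qnorm() from the stats package, with made-up values.

```r
x <- 4
lambda <- 3

# Upper tail P(X > x) instead of P(X <= x)
upper <- ppois(x, lambda, lower.tail = FALSE)

# CDF and density on the log scale
log_cdf <- ppois(x, lambda, log.p = TRUE)
log_dens <- dpois(x, lambda, log = TRUE)

# Quantile function taking a log probability as input
z <- qnorm(log(0.975), log.p = TRUE)
```

Working on the log scale avoids underflow for very small probabilities, which is one reason these arguments exist in base R.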

proof of concept for posterior_pit

13427fe

paul-buerkner requested changes Feb 23, 2026

florence-bockting added 4 commits February 27, 2026 08:02

switch 'type' to 'output'; add wrapper function

1b046eb

add test for posterior_predict with output arg

5102613

fix 'probability' method for posterior_predict

bbfd28c

add test for posterior_predict_gaussian

780ad34

florence-bockting mentioned this pull request Feb 27, 2026

related to "proof of concept for posterior_pit" PR in brms avehtari/brms#1

Merged

5 tasks

florence-bockting and others added 10 commits February 27, 2026 14:39

fix setting of q

3106b93

add failing test if q is assigned wrongly

232cf1d

add cdf for truncated cont. distr

a898b24

add test for truncated posterior_predict_gaussian

94cb02a

add test for posterior_predict_student

18667c1

remove dot from predict_continuous_helper for consistency

2849ae3

support of posterior_predict for discrete distributions

f5012d8

add test for posterior_predict_binomial

456d4af

add posterior_predict_poisson with support of diff. output values

a6f2beb

add test for posterior_predict_poisson

41f5de0

florence-bockting reviewed Feb 28, 2026

Comment thread tests/testthat/tests.posterior_predict.R

Florence Bockting added 2 commits March 13, 2026 21:14

ignore agent skills

070016a

update posterior_predict() with output argument

ee40ef6

Florence Bockting added 2 commits March 16, 2026 12:43

update posterior_predict with outcome values probability, random, pit

6a0a2ce

simplify switch case

5a03d33

Florence Bockting added 2 commits March 23, 2026 08:58

adjust code style

7220a6d

refactor: remove unnessary wrapper

cad4b8e

paul-buerkner requested changes Apr 10, 2026

Florence Bockting added 12 commits April 13, 2026 11:48

chore: update .gitignore to include skills

462cc30

build(deps): remove truncnorm and dplyr from Suggests

6cdf33b

docs: update vignette for posterior_predict to use data.frame instead of tibble

ef2520f

tests: remove truncnorm dependency and explicit naming of default values

ed6df11

style,docs: undo style changes, adjust argument checking and naming in posterior_predict

45b6600

style: undo change in indentation style in docs example

2b364ac

feature: update beta-binomial with new posterior_predict functionality

db26e89

docs: add beta-binomial example to posterior_predict vignette

2fbcea7

feature: update negbinomial with new posterior_predict functionality

49f4d97

docs: add negbinomial example to posterior_predict vignette

12faea1

chore: add packages from Suggests to dependency install GitHub Action

c55c538

feature: update zero-inflated negbinomial with new posterior_predict functionality

5011361

Florence Bockting added 6 commits April 15, 2026 12:06

fix: pass q (quantile) as argument in posterior_predict

ccc9cfe

feature: add lower.tail and log.p to compute_cdf

81c4e22

fix: remove log.p and lower.tail from 'random'

a881609

fix: set ntrys as optional in predict_discrete_helper

35833f3

docs: add lower.tail and log.p documentation to posterior_predict

54cfbb0

feature: add output 'density', 'quantile' to selected distribution families

3607f7b
