proof of concept for posterior_pit#1857
proof of concept for posterior_pit#1857avehtari wants to merge 39 commits intopaul-buerkner:masterfrom
Conversation
paul-buerkner
left a comment
There was a problem hiding this comment.
Thanks! I like the idea. Please find some comments to code structure and naming in my review.
| object, newdata = NULL, re_formula = NULL, re.form = NULL, | ||
| transform = NULL, resp = NULL, negative_rt = FALSE, | ||
| ndraws = NULL, draw_ids = NULL, sort = FALSE, ntrys = 5, | ||
| ndraws = NULL, draw_ids = NULL, sort = FALSE, ntrys = 5, type = "r", |
There was a problem hiding this comment.
Can we think of a more informative name than "type" and more informative options that just "r" and "p"?
There was a problem hiding this comment.
I was thinking that "r", "p", "q", and "d" would naturally refer to how the distribution functions are named in R. Looking at R documentation these could be also named "random", "probability", "quantile", and "density".
I've not been able to come up with better name than "type"
There was a problem hiding this comment.
similarily unspecific. Let' s go with "type" for now until we have a better idea.
There was a problem hiding this comment.
Alternative argument names could be output, return_type, or mode.
| prep <- prepare_predictions( | ||
| object, newdata = newdata, re_formula = re_formula, resp = resp, | ||
| ndraws = ndraws, draw_ids = draw_ids, check_response = FALSE, ... | ||
| ndraws = ndraws, draw_ids = draw_ids, check_response = FALSE, type = type, ... |
There was a problem hiding this comment.
why does prepare predictions need to know the type?
There was a problem hiding this comment.
It's likely that it doesn't, but I was struggling to see the structure of the code, and possibly put it in more places than needed
| mean = mu, sd = sigma, | ||
| lb = prep$data$lb[i], ub = prep$data$ub[i], | ||
| ntrys = ntrys | ||
| switch(type, |
There was a problem hiding this comment.
whe should solve this with a better abstraction. Changing the code of each of the families using switch is unnecessary. Add another layer of a function that handles the two types once and can be called (instead of just rcontinous) in the family-specific functions
There was a problem hiding this comment.
I don't understand. Can explain more or modify this PR to show it with an example?
There was a problem hiding this comment.
Should we have a short zoom call about this?
There was a problem hiding this comment.
yes. will write you on slack
There was a problem hiding this comment.
How about something like the following? We have a helper function that prepares the switching and can be called within the corresponding distributions. In the following a (non-polished) example for a helper function and its function call in the two example distributions (gaussian and student). (Note I changed the argument type to output in the following code... just to get an idea how it would look like)
.predict_continuous_helper <- function(output, prep, i, dist, ntrys, ...) {
lb <- prep$data$lb[i]
ub <- prep$data$ub[i]
if (output == "probability") {
q <- prep$data$Y[i]
return(pcontinuous(
q = q, dist = dist, lb = lb, ub = ub, ntrys = ntrys,
ndraws = prep$ndraws, ...
))
} else if (output == "random") {
return(rcontinuous(
n = prep$ndraws, dist = dist, lb = lb, ub = ub, ntrys = ntrys, ...
))
}
}
posterior_predict_gaussian <- function(i, prep, ntrys = 5, output = "random", ...) {
mu <- get_dpar(prep, "mu", i = i)
sigma <- get_dpar(prep, "sigma", i = i)
sigma <- add_sigma_se(sigma, prep, i = i)
.predict_continuous_helper(
output = output, prep = prep, i = i, ntrys = ntrys,
dist = "norm", mean = mu, sd = sigma
)
}
posterior_predict_student <- function(i, prep, ntrys = 5, output = "random", ...) {
nu <- get_dpar(prep, "nu", i = i)
mu <- get_dpar(prep, "mu", i = i)
sigma <- get_dpar(prep, "sigma", i = i)
sigma <- add_sigma_se(sigma, prep, i = i)
.predict_continuous_helper(
output = output, prep = prep, i = i, ntrys = ntrys,
dist = "student_t", df = nu, mu = mu, sigma = sigma
)
}
| pdist <- paste0("p", dist) | ||
| out <- do_call(pdist, c(list(n), args)) | ||
| } else { | ||
| error("not implemented yet") |
There was a problem hiding this comment.
are you planning to implement this part too as part of this PR?
There was a problem hiding this comment.
I can, if we go forward with the plan
There was a problem hiding this comment.
Okay. Let's discuss the plan first and then move this one forward too
|
Yes this is exactly what I meant!
Florence Bockting ***@***.***> schrieb am Di. 24. Feb. 2026
um 20:06:
… ***@***.**** commented on this pull request.
------------------------------
In R/posterior_predict.R
<#1857 (comment)>:
> mu <- get_dpar(prep, "mu", i = i)
sigma <- get_dpar(prep, "sigma", i = i)
sigma <- add_sigma_se(sigma, prep, i = i)
- rcontinuous(
- n = prep$ndraws, dist = "norm",
- mean = mu, sd = sigma,
- lb = prep$data$lb[i], ub = prep$data$ub[i],
- ntrys = ntrys
+ switch(type,
How about something like the following? We have a helper function that
prepares the switching and can be called within the corresponding
distributions. In the following a (non-polished) example for a helper
function and its function call in the two example distributions (gaussian
and student). (Note I changed the argument type to output in the
following code... just to get an idea how it would look like)
.predict_continuous_helper <- function(output, prep, i, dist, ntrys, ...) {
lb <- prep$data$lb[i]
ub <- prep$data$ub[i]
if (output == "probability") {
q <- prep$data$Y[i]
return(pcontinuous(
q = q, dist = dist, lb = lb, ub = ub, ntrys = ntrys,
ndraws = prep$ndraws, ...
))
} else if (output == "random") {
return(rcontinuous(
n = prep$ndraws, dist = dist, lb = lb, ub = ub, ntrys = ntrys, ...
))
}
}
posterior_predict_gaussian <- function(i, prep, ntrys = 5, output = "random", ...) {
mu <- get_dpar(prep, "mu", i = i)
sigma <- get_dpar(prep, "sigma", i = i)
sigma <- add_sigma_se(sigma, prep, i = i)
.predict_continuous_helper(
output = output, prep = prep, i = i, ntrys = ntrys,
dist = "norm", mean = mu, sd = sigma
)
}
posterior_predict_student <- function(i, prep, ntrys = 5, output = "random", ...) {
nu <- get_dpar(prep, "nu", i = i)
mu <- get_dpar(prep, "mu", i = i)
sigma <- get_dpar(prep, "sigma", i = i)
sigma <- add_sigma_se(sigma, prep, i = i)
.predict_continuous_helper(
output = output, prep = prep, i = i, ntrys = ntrys,
dist = "student_t", df = nu, mu = mu, sigma = sigma
)
}
—
Reply to this email directly, view it on GitHub
<#1857 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADCW2AAA5QELHJGMCXTWVBL4NSOLVAVCNFSM6AAAAACV2RC4P6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTQNBZHE4TMMJZGE>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
| F_q <- do_call(pdist, c(list(q), args)) | ||
| # include (lb - 1) to treat lb as inclusive | ||
| # this is only relevant for discrete distributions | ||
| F_lb <- do_call(pdist, c(list(lb - 1), args)) |
There was a problem hiding this comment.
Here I am a bit unsure how to treat this correctly. I would have treated the lower bound as exclusive but this would have been inconsistent with the rdiscrete function.
Thus, I set lb - 1 in order to ensure that the lower bound is inclusive but I was uncertain here.
There was a problem hiding this comment.
It should be consistent with how Stan handles it. I am not 100% anymore how it does. Can you double check to match the behavior?
|
I have now updated the proof of concept for 4 functions:
They include two additional new arguments:
The current implementation follows the idea that we can call:
|
I think these should be just F(x) to follow what ppois etc. do. For the same reason, I think that it might be more logical to have non-randomized as the default |
|
Or we could have separately output options |
Okay, one option could be to have for An alternative is to outsource the |
|
I would argue putting everything in one function. The code overhead is
otherwise way too high. I like the probability + randomized argument
approach. The only question is what the default for randomized is? Aki can
you elaborate which use cases you had for much choice of randomized?
Florence Bockting ***@***.***> schrieb am Mo. 16. März 2026
um 07:50:
… *florence-bockting* left a comment (paul-buerkner/brms#1857)
<#1857 (comment)>
Or we could have separately output options probability (not randomized
F(x)) and pit randomized for discrete
Okay, one option could be to have for outcome the values random,
probability, density, quantile, and pit. Whereby pit is only a valid
option for discrete cases. Cons are that there is a conceptual overlap
between pit and probability and pit is only a valid choice for a subclass
of inputs. Pro is that we have all alternatives in one function and no
additional argument (like randomized).
An alternative is to outsource the pit case entirely and to have an
additional function such as posterior_pit(fit_pois) that applies only to
discrete cases and additionally, we have posterior_predict(fit_pois,
outcome = "probability") supporting the default implementations like
ppois(). In my opinion, this option is from a design point relatively
clean, as we don't have the problem that there are "cases" which work/don't
work. However, we would have to introduce an additional function.
—
Reply to this email directly, view it on GitHub
<#1857 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADCW2ACTEEEYGBYRD56TRK34Q6P4VAVCNFSM6AAAAACV2RC4P6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DANRVGQ3DMNZVHA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
In this case
|
|
I am fine with option random, probability, density, quantile, and pit (or whatever subset we go with first), as long all all is realized in the same function (posterior_predict). Otherwise code overhead is too high. If we want, we can still have lightweight wrappers around posterior_predict with output argument fixed. |
Alright. We definitely stick with one function - the idea of an extra function is out. |
|
I'm fine with this. I usually prefer smaller PRs anyway. |
I'm fine that it would be added later, but wanted to keep in the plans, so that it will be easier to implement later. |
|
Okay, that's fine and sounds like a good plan. @florence-bockting please let me know when (now already?) you want my initial review on your draft PR. Just having support for a few families is sufficient for me to initially review it. |
Alright, go ahead @paul-buerkner |
Just a quick check-in and reminder @paul-buerkner |
paul-buerkner
left a comment
There was a problem hiding this comment.
Getting there, thank you! Here are my comments and change requests
|
Just leave it as is I meant.
Florence Bockting ***@***.***> schrieb am Mo. 13. Apr. 2026
um 08:42:
… ***@***.**** commented on this pull request.
------------------------------
In R/posterior_predict.R
<#1857 (comment)>:
> @@ -92,12 +97,15 @@ posterior_predict.brmsfit <- function(
contains_draws(object)
object <- restructure(object)
prep <- prepare_predictions(
- object, newdata = newdata, re_formula = re_formula, resp = resp,
+ object,
Do you want every argument on a new line or do you make a differentiation
between required and optional arguments, thus:
prepare_predictions(
object,
newdata = newdata, re_formula = re_formula, resp = resp,
ndraws = ndraws, draw_ids = draw_ids, check_response = FALSE, ...
)
or
prepare_predictions(
object,
newdata = newdata,
re_formula = re_formula,
resp = resp,
ndraws = ndraws,
draw_ids = draw_ids,
check_response = FALSE,
...
)
—
Reply to this email directly, view it on GitHub
<#1857 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADCW2AHMN2BOUSYKZPT2ZE34VSD63AVCNFSM6AAAAACV2RC4P6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHM2DAOJWHAYTANJXGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
…n posterior_predict
|
Would it be too much trouble to allow argument |
This is a proof of concept branch and far from ready to be merged, but easier to show a possible way to solve issue #1855
To avoid duplication of code in brms, the idea is to use
posterior_predict(), but add a new argumenttypewith default value"r"referring to random draws. Newtype="p"would instead return CDFs / PITs.These PIT values are more useful than the ones based on ranks.
The proof of concept works with families "gaussian" and "student_t" without truncation. If the general idea is acceptable then support for the rest of the families and truncation can be added.
Adding here that natural other types are "d" for posterior predictive densities
and
"q" for quantiles that are more accurate than current rank based quantiles (used e.g. by predictive_interval())