Dependence aware tests for ppc_*_ecdf #428

florence-bockting wants to merge 157 commits into stan-dev:master from florence-bockting:dependence-aware-LOO-PIT

Conversation
As mentioned in the PR introduction text, the idea would be migration in multiple stages. Informing the user in the first stage is currently implemented as follows:
jgabry left a comment
Here are a few comments and questions. So far I've only looked at some of the code, so I may have more, but I thought I would give you these now so we could start discussing.
```r
test <- match.arg(test %||% "POT", choices = c("POT", "PRIT", "PIET"))
alpha <- 1 - prob
gamma <- gamma %||% 0
```
The doc says "If NULL, automatically determined based on p-value" but this just sets it to 0 if NULL. Is that intentional?
Doesn't this mean that right now by default we highlight every positive contribution point/segment whenever p < alpha? Maybe that's what we want, but that's more than what's documented, right? Or am I misunderstanding? (quite possibly!)
> The doc says "If NULL, automatically determined based on p-value" but this just sets it to 0 if NULL. Is that intentional?
Thanks, this was wrong in the documentation. I updated it: "If NULL (default), gamma is set to 0, and thus all suspicious points are flagged."
R/ppc-loo.R (outdated)

```r
linewidth <- linewidth %||% 0.3
color <- color %||% c(ecdf = "grey60", highlight = "red")
help_text <- help_text %||% TRUE
pareto_pit <- pareto_pit %||% is.null(pit) && test %in% c("POT", "PIET")
```
I think R will parse this as

```r
(pareto_pit %||% is.null(pit)) && (test %in% c("POT", "PIET"))
```

and not as

```r
pareto_pit %||% (is.null(pit) && test %in% c("POT", "PIET"))
```

So if we have

```r
pareto_pit = TRUE
pit = NULL
test = "PRIT"
```

then I think this will parse as TRUE && FALSE -> FALSE. Do we not want this instead?

```r
pareto_pit <- pareto_pit %||% (is.null(pit) && test %in% c("POT", "PIET"))
```

This is a bit confusing for me, so I could definitely be wrong!
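For anyone reading along, the precedence claim can be checked in a standalone R session. Here `%||%` is defined locally just to make the snippet self-contained (rlang and R >= 4.4.0 provide the same operator):

```r
# Null-coalescing operator, defined locally for a self-contained example.
`%||%` <- function(x, y) if (is.null(x)) y else x

pareto_pit <- TRUE
pit <- NULL
test <- "PRIT"

# Special %op% operators bind tighter than &&, so without parentheses this is
# (pareto_pit %||% is.null(pit)) && (test %in% c("POT", "PIET"))
pareto_pit %||% is.null(pit) && test %in% c("POT", "PIET")
#> FALSE   (TRUE && FALSE: the user-supplied TRUE is silently overridden)

# With explicit parentheses, the user-supplied value wins:
pareto_pit %||% (is.null(pit) && test %in% c("POT", "PIET"))
#> TRUE
```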
Thank you for catching this behavior! I fixed the notation and created corresponding unit tests.

Furthermore, I noticed that I need to catch the situation where a user provides both pareto_pit = TRUE and pit = pit (thus, pit is not NULL). I added an error with the user message:

> `pareto_pit = TRUE` cannot be used together with a non-`NULL` `pit` value. Set either `pareto_pit = FALSE` or `pit = NULL`.
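A minimal sketch of what such an input check could look like; the helper name is hypothetical, and only the error message is taken from the thread:

```r
# Hypothetical input check; the wrapper name is made up, only the error
# message comes from the discussion above.
check_pareto_pit_args <- function(pareto_pit, pit) {
  if (isTRUE(pareto_pit) && !is.null(pit)) {
    stop(
      "`pareto_pit = TRUE` cannot be used together with a non-`NULL` `pit` ",
      "value. Set either `pareto_pit = FALSE` or `pit = NULL`.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

check_pareto_pit_args(pareto_pit = FALSE, pit = runif(10))  # passes silently
```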
```r
y_label <- if (plot_diff) "ECDF difference" else "ECDF"

if (method == "correlated") {
  test_res <- posterior::uniformity_test(pit = pit, test = test)
```
This assumes that a pvalue and pointwise are always returned, but can't uniformity_test error in some cases? E.g.

```r
posterior::uniformity_test(c(0.2, 0.8), "POT")
posterior::uniformity_test(c(0.25, 0.5, 0.75), "POT")
posterior::uniformity_test(0.5, "POT")
```

How should we handle this? Or are we unlikely to encounter it in the wild, so to speak?
Regarding uniformity_test: "POT" and "PRIT" are based on a truncated Cauchy combination test. That is, they error when no p-values below 0.5 exist; I am not aware of any condition where they would return NA. "PIET" is based on an untruncated version of the CCT; it can return pvalue = NA under special conditions (which then leads to an error later on in ppc_*_ecdf).
In the following, I worked out some edge cases (I don't know how likely this situation would actually happen "in the wild"):

```r
# fails when test = "PIET" as pvalue = NA
ppc_loo_pit_ecdf(pit = c(0, 1, 0.5), method = "correlated", test = "PIET")

# works when test = "POT" or "PRIT", OR method = "independent"
ppc_loo_pit_ecdf(pit = c(0, 1, 0.5), method = "correlated", test = "POT")
ppc_loo_pit_ecdf(pit = c(0, 1, 0.5), method = "independent")

# fails when test = "POT" or "PRIT" as no p-values < 0.5 exist
ppc_loo_pit_ecdf(pit = c(0.8, 0.5, 0.3, 0.1), method = "correlated", test = "POT")

# works when test = "PIET", OR method = "independent"
ppc_loo_pit_ecdf(pit = c(0.8, 0.5, 0.3, 0.1), method = "correlated", test = "PIET")
ppc_loo_pit_ecdf(pit = c(0.8, 0.5, 0.3, 0.1), method = "independent")
```
Now, I am uncertain what a clean suggestion for the user would be from a methodological point of view (@avehtari):

- changing the test type (e.g., from POT to PIET or the other way around)
- allowing the user to change the truncation behavior of the test in case it fails
- something else
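For context, the untruncated Cauchy combination test idea mentioned above can be sketched as follows. This follows the standard CCT formula only; `cct_pvalue` is a hypothetical helper, not the actual posterior::uniformity_test implementation:

```r
# Illustrative sketch of an untruncated Cauchy combination test (CCT);
# hypothetical helper, NOT posterior::uniformity_test itself.
cct_pvalue <- function(p) {
  # Transform each p-value to a standard Cauchy quantile and average:
  # under uniformity the mean is again standard Cauchy distributed.
  stat <- mean(tan((0.5 - p) * pi))
  0.5 - atan(stat) / pi
}

# p-values of exactly 0 or 1 push tan((0.5 - p) * pi) to the limits of
# floating point, which is where NA/NaN edge cases like the ones above can
# originate; per the comment above, a truncated variant instead uses only
# p-values below 0.5 and must error when none exist.
cct_pvalue(c(0.01, 0.02, 0.03))
```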
Description

The current approach in ppc_loo_pit_ecdf and ppc_pit_ecdf assumes independence of LOO-PIT values, which is not valid (Marhunenda et al., 2005). The corresponding graphical test yields an envelope that is too wide, reducing the test's ability to reveal model miscalibration. Tesso & Vehtari (2026, see preprint) propose three testing procedures that can handle any dependent uniform values and provide an updated graphical representation that uses color coding to indicate influential regions or the most influential points of the ECDF. This PR implements the new development by adding the updated approach (method = "correlated") in addition to the previous approach (method = "independent").
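Assuming the argument names from this description, usage after the PR might look like the following sketch; `y`, `yrep`, and `lw` are assumed to be prepared as for the existing ppc_loo_pit_ecdf() interface (observations, posterior predictive draws, PSIS log weights):

```r
library(bayesplot)

# Sketch only: `y`, `yrep`, and `lw` are placeholders for data prepared as
# for the current ppc_loo_pit_ecdf().

# Previous behavior, assuming independent LOO-PIT values:
ppc_loo_pit_ecdf(y, yrep, lw = lw, method = "independent")

# New dependence-aware graphical test from this PR:
ppc_loo_pit_ecdf(y, yrep, lw = lw, method = "correlated", test = "POT")
```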
TODOs

- ppc_loo_pit_ecdf() function in ppc-loo.R: method argument
- ppc_pit_ecdf() and ppc_pit_ecdf_grouped() function in ppc-distributions.R: method argument