Skip to content

Dependence aware tests for ppc*ecdf#428

Open
florence-bockting wants to merge 157 commits intostan-dev:masterfrom
florence-bockting:dependence-aware-LOO-PIT
Open

Dependence aware tests for ppc*ecdf#428
florence-bockting wants to merge 157 commits intostan-dev:masterfrom
florence-bockting:dependence-aware-LOO-PIT

Conversation

@florence-bockting
Copy link
Copy Markdown

@florence-bockting florence-bockting commented Mar 4, 2026

Description

The current approach in ppc_loo_pit_ecdf and ppc_pit_ecdf assumes independence of LOO-PIT values which is not valid (Marhunenda et al., 2005). The corresponding graphical test yields an envelope that is too wide, reducing the test's ability to reveal model miscalibration.
Tesso & Vehtari (2026, see preprint) propose three testing procedures that can handle any dependent uniform values and provide an updated graphical representation that uses color coding to indicate influential regions or most influential points of the ECDF. This PR implements the new development, by adding the updated approach (method = "correlated") additionally to the previous approach (method = "independent").

TODOs

  • updated ppc_loo_pit_ecdf() function in ppc-loo.R
  • updated ppc_pit_ecdf() and ppc_pit_ecdf_grouped() function in ppc-distributions.R
  • add unittests and visual regression tests
  • update documentation
  • deprecation suggestion of old method
    • P1: old method (default) and new method available via method argument
    • P2: new method (default) and old method available via method argument
    • P3: old method is removed

@florence-bockting florence-bockting changed the title Dependence aware tests for ppc_loo_pit_ecdf Dependence aware tests for ppc_*_ecdf Apr 5, 2026
@florence-bockting florence-bockting changed the title Dependence aware tests for ppc_*_ecdf Dependence aware tests for ppc*ecdf Apr 5, 2026
@florence-bockting
Copy link
Copy Markdown
Author

As mentioned in the PR introduction text, the idea would be migration in multiple stages.
First stage is non-breaking behavior: That is users can simply use "original" code but are informed that this behavior will change in future. Future changes consider changing the default method and thus are breaking.

Informing the user in the first stage is currently implemented as follows:

# using the "original" code provides a message about the change
> ppc_loo_pit_ecdf(y, yrep, lw)
ℹ In the next major release, the default `method` will change to 'correlated'.
• To silence this message, explicitly set `method = 'independent'` or `method = 'correlated'`.

# explicit use of "independent" method provides information that this method is superseded
> ppc_loo_pit_ecdf(y, yrep, lw, method = "independent")
The 'independent' method is superseded by the 'correlated' method.

Copy link
Copy Markdown
Member

@jgabry jgabry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a few comments and questions. So far I've only looked at some of the code, so I may have more, but I thought I would give you these now so we could start discussing.


test <- match.arg(test %||% "POT", choices = c("POT", "PRIT", "PIET"))
alpha <- 1 - prob
gamma <- gamma %||% 0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc says "If NULL, automatically determined based on p-value" but this just sets it to 0 if NULL. Is that intentional?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean that right now by default we highlight every positive contribution point/segment whenever p < alpha? Maybe that's what we want, but that's more than what's documented, right? Or am I misunderstanding? (quite possibly!)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc says "If NULL, automatically determined based on p-value" but this just sets it to 0 if NULL. Is that intentional?

Thanks, this was wrong in the documentation. I updated it: "If NULL (default), gamma is set to 0, and thus all suspicious points are flagged."

R/ppc-loo.R Outdated
linewidth <- linewidth %||% 0.3
color <- color %||% c(ecdf = "grey60", highlight = "red")
help_text <- help_text %||% TRUE
pareto_pit <- pareto_pit %||% is.null(pit) && test %in% c("POT", "PIET")
Copy link
Copy Markdown
Member

@jgabry jgabry Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think R will parse this as

(pareto_pit %||% is.null(pit)) && (test %in% c("POT", "PIET"))

and not as

pareto_pit %||% (is.null(pit) && test %in% c("POT", "PIET"))

So if we have

pareto_pit = TRUE
pit = NULL
test = "PRIT"

then I think this will parse as TRUE && FALSE -> FALSE.

Do we not want this instead?

pareto_pit <- pareto_pit %||% (is.null(pit) && test %in% c("POT", "PIET"))

This is a bit confusing for me, so I could definitely be wrong!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching this behavior! I fixed the notation and created corresponding unittests.
Furthermore, I noticed that I need to catch the situation where a user provides both pareto_pit = TRUE and pit = pit (thus, pit is not NULL). I added an error with the user message:

`pareto_pit = TRUE` cannot be used together with a non-`NULL` `pit` value. Set either `pareto_pit = FALSE` or `pit = NULL`.

y_label <- if (plot_diff) "ECDF difference" else "ECDF"

if (method == "correlated") {
test_res <- posterior::uniformity_test(pit = pit, test = test)
Copy link
Copy Markdown
Member

@jgabry jgabry Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that a pvalue and pointwise are always returned, but can't uniformity_test error in some cases? E.g.

posterior::uniformity_test(c(0.2, 0.8), "POT")
posterior::uniformity_test(c(0.25, 0.5, 0.75), "POT")
posterior::uniformity_test(0.5, "POT")

How should we handle this? Or are we unlikely to encounter it in the wild, so to speak?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding uniformity_test : "POT" and "PRIT" are based on a truncated Cauchy combination test. That is, they error when no p-values below 0.5 exist; I am not aware of any condition where they would return NA. "PIET" is based on an untruncated version of CCT, it can return pvalue=NA under special conditions (which yields in an error later on in ppc_*_ecdf.

In the following, I worked out some edge cases (I don't know how likely this situation would actually happen "in the wild"):

# fails when test = "PIET" as pvalue=NA 
ppc_loo_pit_ecdf(pit = c(0, 1, 0.5), method = "correlated", test = "PIET")
# works when test="POT" or "PRIT" OR method="independent"
ppc_loo_pit_ecdf(pit = c(0, 1, 0.5), method = "correlated", test = "POT")
ppc_loo_pit_ecdf(pit = c(0, 1, 0.5), method = "independent")

# fails when test "POT" or "PRIT" as no pvalues < 0.5 exist
ppc_loo_pit_ecdf(pit = c(0.8, 0.5, 0.3, 0.1), method = "correlated", test = "POT")
# works when test="PIET" OR method="independent"
ppc_loo_pit_ecdf(pit = c(0.8, 0.5, 0.3, 0.1), method = "correlated", test = "PIET")
ppc_loo_pit_ecdf(pit = c(0.8, 0.5, 0.3, 0.1), method = "independent")

Now, I am uncertain what a clean suggestion for a user from a methodological point of view is (@avehtari):

  • changing the test type (e.g., from POT to PIET or the other way around)
  • allowing to change truncation behavior for the test in case of failing
  • something else

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants