Skip to content

Comments

new pareto_pit function#435

Open
avehtari wants to merge 5 commits intomasterfrom
feature-pareto_pit
Open

new pareto_pit function#435
avehtari wants to merge 5 commits intomasterfrom
feature-pareto_pit

Conversation

@avehtari
Copy link
Member

@avehtari avehtari commented Feb 18, 2026

Summary

This PR adds

  • new pgeneralized_pareto() distribution function for the generalized Pareto distribution, needed by pareto_pit()
  • adds tests also for the old qgeneralized_pareto() function
  • adds support for weights in gpdfit(), needed by pareto_pit()
  • new pareto_pit() function that behaves as pit() expect that in tails instead of using ECDF, uses CDF computed from fitted generalized Pareto distribution. This reduces variability and provides non-0 and non-1 PIT values also outside of the range of reference draws.

pareto_pit() is useful for stabilizing PIT uniformity checking, as current pit() can produce big variation whether there is single 0 or 1 or not.

The following figure illustrates how PIT values that are 0 or 1 from pit() function are replaced with a PIT value estimated with GPD fitted to the tail. Left plot is lower tail and right plot is upper tail. Log scale is used to better show the range of PIT values in tails, The dot with red circle has PIT value 0 from pit() and in log scale that is -Inf, but ggplot plots it next to the axis (and gives a warning).

EDIT: added dashed lines showing PIT values if 1, 2, or 3 draws are further in tail than y. In the left plot one pit() result matches 3. In right plot the predicted tail probability is much smaller than 1/4000.

image

The function is specifically useful when the draws x distribution is unbounded and has nicely behaving tails, which is common for many observation families. The function is useful if we happen to have only draws from the posterior predictive distribution, but in general it would be better to compute PIT values using the known parametric observation distribution CDF.

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses:

@github-actions
Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 1b96d28 is merged into master:

  • ✔️as_draws_array: 136ms -> 136ms [-1.06%, +1.88%]
  • ✔️as_draws_df: 71.5ms -> 72.6ms [-0.4%, +3.33%]
  • ✔️as_draws_list: 159ms -> 159ms [-1.19%, +1.6%]
  • ✔️as_draws_matrix: 31.2ms -> 30.8ms [-2.94%, +0.45%]
  • ✔️as_draws_rvars: 79.3ms -> 79.4ms [-0.97%, +1.11%]
  • ✔️summarise_draws_100_variables: 745ms -> 743ms [-0.66%, +0.18%]
  • ✔️summarise_draws_10_variables: 86ms -> 86.3ms [-0.13%, +0.7%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

raw_pit
}, FUN.VALUE = 1.0)

min_tail_prob <- 1/ndraws/1e4
Copy link
Collaborator

@florence-bockting florence-bockting Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor suggestion: min_tail_prob <- 1 / (ndraws * 1e4)

pgeneralized_pareto <- function(q, mu = 0, sigma = 1, k = 0, lower.tail = TRUE, log.p = FALSE) {
stopifnot(length(mu) == 1 && length(sigma) == 1 && length(k) == 1)
if (is.na(sigma) || sigma <= 0) {
return(rep(NaN, length(q)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about informing the user at this point that sigma needs to be positive for easier debugging?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an internal function, and used in places where we don't want to print messages

fit1 <- gpdfit(x)
fit2 <- gpdfit(x, weights = NULL)
expect_identical(fit1, fit2)
})
Copy link
Collaborator

@florence-bockting florence-bockting Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably add a check and corresponding error message for the case where length(weights) unequals length(x).

test_that("gpdfit with length(weights) differs from length(x)", {
  set.seed(42)
  x <- rexp(200)
  w <- rep(1, 150)
  expect_error(gpdfit(x, weights = w))
})

Perhapse same for non-positive weight values.

@florence-bockting
Copy link
Collaborator

florence-bockting commented Feb 19, 2026

I think roxygen2::roxygenise() wasn't running and therefore pareto_pit.Rd and pgeneralized_pareto.Rd are missing in the man folder (causing issues in the GitHub Actions).

Might actually be a good candidate for including this call in the GitHub Action Workflow?

@github-actions
Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 5e9d66f is merged into master:

  • ✔️as_draws_array: 132ms -> 132ms [-1.46%, +0.84%]
  • ✔️as_draws_df: 74.9ms -> 75.6ms [-0.85%, +2.71%]
  • ✔️as_draws_list: 169ms -> 168ms [-2.55%, +0.34%]
  • ✔️as_draws_matrix: 28.9ms -> 28.8ms [-1.19%, +0.58%]
  • ✔️as_draws_rvars: 80ms -> 80.8ms [-0.18%, +2.14%]
  • ✔️summarise_draws_100_variables: 725ms -> 724ms [-0.48%, +0.21%]
  • ✔️summarise_draws_10_variables: 85.9ms -> 85.5ms [-1.46%, +0.44%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@github-actions
Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if acbea6a is merged into master:

  • ✔️as_draws_array: 130ms -> 130ms [-0.47%, +1.6%]
  • ✔️as_draws_df: 67.9ms -> 67.9ms [-1.29%, +1.32%]
  • ✔️as_draws_list: 155ms -> 156ms [-0.29%, +1.47%]
  • ✔️as_draws_matrix: 28.1ms -> 28.1ms [-1.2%, +1.28%]
  • ✔️as_draws_rvars: 78.4ms -> 78.3ms [-1.11%, +0.92%]
  • ✔️summarise_draws_100_variables: 729ms -> 728ms [-0.52%, +0.13%]
  • ✔️summarise_draws_10_variables: 84.9ms -> 84.3ms [-1.71%, +0.29%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants