Conversation
This is how benchmark results would change (along with a 95% confidence interval in relative change) if 1b96d28 is merged into master:
raw_pit
}, FUN.VALUE = 1.0)

min_tail_prob <- 1/ndraws/1e4
minor suggestion: min_tail_prob <- 1 / (ndraws * 1e4)
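A quick check of the suggested form, as a minimal sketch: the two expressions are mathematically the same, and the parenthesized version only makes the order of operations explicit. The value `ndraws <- 4000` is an assumption chosen for illustration.

```r
# Assumed number of draws, for illustration only
ndraws <- 4000

# Form in the diff: two chained divisions
min_tail_prob_chained <- 1 / ndraws / 1e4

# Suggested form: a single division by the product
min_tail_prob_grouped <- 1 / (ndraws * 1e4)

# Compare with a floating-point tolerance rather than exact equality
same_value <- isTRUE(all.equal(min_tail_prob_chained, min_tail_prob_grouped))
```

Both forms give about 2.5e-08 for 4000 draws, so the change is purely about readability.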
pgeneralized_pareto <- function(q, mu = 0, sigma = 1, k = 0, lower.tail = TRUE, log.p = FALSE) {
stopifnot(length(mu) == 1 && length(sigma) == 1 && length(k) == 1)
if (is.na(sigma) || sigma <= 0) {
return(rep(NaN, length(q)))
How about informing the user at this point that sigma needs to be positive for easier debugging?
This is an internal function, and it is used in places where we don't want to print messages.
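One way to reconcile the two positions, sketched under assumptions: keep the internal function silent and put the informative error in a separate user-facing wrapper. Both function names below (`gpd_cdf_quiet`, `gpd_cdf_checked`) are hypothetical and are not part of the PR, and the CDF is a simplified version with `mu = 0`.

```r
# Hypothetical internal helper: returns NaN silently for invalid sigma,
# mirroring the behaviour discussed in this thread (mu fixed at 0).
gpd_cdf_quiet <- function(q, sigma = 1, k = 0) {
  if (is.na(sigma) || sigma <= 0) {
    return(rep(NaN, length(q)))
  }
  z <- pmax(q, 0) / sigma
  if (abs(k) < 1e-15) {
    # k = 0 is the exponential limit of the GPD
    1 - exp(-z)
  } else {
    # pmax() clamps values beyond the upper bound when k < 0
    1 - pmax(1 + k * z, 0)^(-1 / k)
  }
}

# Hypothetical user-facing wrapper: fails loudly before calling the helper
gpd_cdf_checked <- function(q, sigma = 1, k = 0) {
  if (is.na(sigma) || sigma <= 0) {
    stop("`sigma` must be positive.", call. = FALSE)
  }
  gpd_cdf_quiet(q, sigma = sigma, k = k)
}

quiet_result <- gpd_cdf_quiet(1, sigma = -1)
checked_result <- tryCatch(
  gpd_cdf_checked(1, sigma = -1),
  error = function(e) conditionMessage(e)
)
```

With this split, internal callers keep the silent `NaN`, while direct users get a message that points at `sigma`.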
fit1 <- gpdfit(x)
fit2 <- gpdfit(x, weights = NULL)
expect_identical(fit1, fit2)
})
Probably add a check and a corresponding error message for the case where length(weights) differs from length(x).
test_that("gpdfit with length(weights) differs from length(x)", {
set.seed(42)
x <- rexp(200)
w <- rep(1, 150)
expect_error(gpdfit(x, weights = w))
})
Perhaps the same for non-positive weight values.
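The two checks suggested above could look roughly like this. This is only a sketch: the helper name `check_weights` and the error messages are assumptions, and how such a check would be wired into `gpdfit()` is left open.

```r
# Hypothetical validation helper; name and messages are assumptions
check_weights <- function(x, weights = NULL) {
  # NULL means "no weights", which is always valid
  if (is.null(weights)) {
    return(invisible(NULL))
  }
  if (length(weights) != length(x)) {
    stop("`weights` must have the same length as `x`.", call. = FALSE)
  }
  if (anyNA(weights) || any(weights <= 0)) {
    stop("`weights` must be positive.", call. = FALSE)
  }
  invisible(weights)
}

set.seed(42)
x <- rexp(200)

# Valid weights pass through unchanged
ok <- check_weights(x, rep(1, 200))

# Length mismatch and non-positive values both raise an error
bad_len <- tryCatch(
  check_weights(x, rep(1, 150)),
  error = function(e) conditionMessage(e)
)
bad_val <- tryCatch(
  check_weights(x, c(0, rep(1, 199))),
  error = function(e) conditionMessage(e)
)
```

Distinct messages for the two failure modes would also let the tests assert on the message, not only on the fact that an error occurred.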
I think Might actually be a good candidate for including this call in the GitHub Action Workflow?
This is how benchmark results would change (along with a 95% confidence interval in relative change) if 5e9d66f is merged into master:
This is how benchmark results would change (along with a 95% confidence interval in relative change) if acbea6a is merged into master:
Summary
This PR adds
pareto_pit() is useful for stabilizing PIT uniformity checks, because results based on the current pit() can vary substantially depending on whether a single PIT value of exactly 0 or 1 is present.
The following figure illustrates how PIT values of 0 or 1 from the pit() function are replaced with PIT values estimated with a GPD fitted to the tail. The left plot shows the lower tail and the right plot shows the upper tail. A log scale is used to better show the range of PIT values in the tails. The dot with a red circle has PIT value 0 from pit(), which is -Inf on the log scale, but ggplot plots it next to the axis (and gives a warning).
EDIT: added dashed lines showing the PIT values that would result if 1, 2, or 3 draws were further in the tail than y. In the left plot, one pit() result matches the line for 3 draws. In the right plot, the predicted tail probability is much smaller than 1/4000.
The function is specifically useful when the distribution of the draws x is unbounded and has well-behaved tails, which is common for many observation families. It is useful when we only have draws from the posterior predictive distribution, but in general it is better to compute PIT values using the CDF of the known parametric observation distribution.
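To make the idea of replacing PIT values of exactly 0 or 1 concrete, here is a minimal base-R sketch. It is not the PR's pareto_pit() implementation: the function names are hypothetical, only the upper tail is handled, and the tail is modelled with an exponential distribution (the GPD with k = 0) fitted by its maximum-likelihood scale, as a simplified stand-in for a full GPD fit.

```r
# Empirical PIT: fraction of draws at or below the observation y
empirical_pit <- function(x, y) {
  mean(x <= y)
}

# Hypothetical upper-tail smoothing with an exponential tail
# (GPD with k = 0); a simplification, not the PR's method.
tail_pit_upper <- function(x, y, ndraws_tail = 100) {
  x <- sort(x)
  ndraws <- length(x)
  # Threshold: the largest draw that is not part of the tail
  u <- x[ndraws - ndraws_tail]
  if (y <= u) {
    # y is not in the tail, so keep the empirical value
    return(mean(x <= y))
  }
  exceedances <- x[(ndraws - ndraws_tail + 1):ndraws] - u
  # Maximum-likelihood estimate of the exponential scale
  sigma <- mean(exceedances)
  # P(draw > y) = P(draw > u) * P(exceedance > y - u)
  tail_prob <- (ndraws_tail / ndraws) * exp(-(y - u) / sigma)
  1 - tail_prob
}

set.seed(1)
x <- rnorm(4000)
y <- max(x) + 0.5  # observation beyond every draw

p_emp <- empirical_pit(x, y)   # exactly 1: no draw exceeds y
p_tail <- tail_pit_upper(x, y) # strictly below 1
```

The empirical value collapses to 1 as soon as y exceeds all draws, whereas the tail model still returns a value strictly inside (0, 1), which is what makes uniformity checks less sensitive to a single extreme observation.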
Copyright and Licensing
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses: