Sometimes I get asked for numbers at work in situations where I have next to no information. Here’s a good example from today:

A company is looking to buy protection against an event which we believe happens once in ten years. I do not know how we arrived at the 1-in-10-year number, or even when we came to believe in that number, or, as I think about it, whether it was actually us who believed in that number. In any case:

Year 1: Event happens
Year 2: Event happens
Year 3: Event happens

The question to me was: what do you think the probability of the event happening in year 4 is? There’s little to no math or calculation you can do here; there are only two pieces of information:

(a) At some point someone believed that the event happened once in 10 years.
(b) That event has happened in each of the last 3 years.

That’s all you know . . . so what’s your answer to the question?

First, I would want to thoroughly explore the possibility that there was some game here (a feedback interaction between human decisions and the occurrence of the event). I’m sure you’ve done that and we’re excluding that possibility.

Second, I would want to understand our loss function: what are the costs and benefits of getting the estimate wrong, either too high or too low; what choices do we have about how large an exposure to take to this risk; and whether this is one-shot or multi-year, with automatic renewal or with our option to renew, etc.

Analytically, a simple version of Bayesian updating seems helpful.
Here’s a good Math StackExchange post: unfair dice.

Assuming a beta distribution (to make things easier), I would start with Beta(1, 9) and Beta(0.5, 4.5) as priors, then use the data to update to Beta(4, 9) and Beta(3.5, 4.5) respectively. In expectation, these give probabilities of about 30% and 45%. However, the more valuable information is to look at how wide the PDFs of those posterior distributions are and acknowledge how uncertain we are about the estimate. Again, this would inform my decisions about the size of the risk exposure.
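
The conjugate update above is mechanical enough to sketch in a few lines of Python. This is a minimal illustration of the Beta–Bernoulli update with the two priors from the text (three observed events, zero non-events); the posterior standard deviations make the "how wide is the PDF" point concrete.

```python
# Conjugate Bayesian updating: Beta(a, b) prior + Bernoulli observations
# gives a Beta(a + successes, b + failures) posterior.

def update_beta(a, b, successes, failures):
    """Return the posterior Beta parameters after observing the data."""
    return a + successes, b + failures

def beta_mean(a, b):
    return a / (a + b)

def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5

# Event observed in each of the last 3 years: 3 "successes", 0 "failures".
for prior in [(1.0, 9.0), (0.5, 4.5)]:          # both priors have mean 10%
    post = update_beta(*prior, successes=3, failures=0)
    print(f"Beta{prior} -> Beta{post}: "
          f"mean={beta_mean(*post):.1%}, sd={beta_sd(*post):.1%}")
```

The posterior means come out near 31% and 44%, but the standard deviations are on the order of 12–17 percentage points, which is the real takeaway: the data barely pin the probability down.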

FWIW, my choice of illustrative parameters for the beta distribution was based on pure gut feel for a prior that would allow “reasonable” people to guess 10% at the start. I would love to hear from a more experienced practitioner about how they would choose the initial distribution (both its type and its parameters).

Very interesting. Is this a fair interpretation of what it would mean in your scenario?
(1) gather data on all risks that have been classified as 1 in 10 year events
(2) find the frequency distribution of those events
(3) estimate parameters for beta distribution to generate prior
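
Step 3 can be sketched with a method-of-moments fit: match the sample mean and variance of the historical frequencies to a Beta distribution. The frequencies below are made up purely for illustration; in practice they would come from the reference class assembled in step 1.

```python
# Method-of-moments fit of a Beta(a, b) prior to observed event frequencies.
# If m and v are the sample mean and variance, then
#   a = m * (m*(1-m)/v - 1),  b = (1-m) * (m*(1-m)/v - 1).

def fit_beta_mom(freqs):
    n = len(freqs)
    m = sum(freqs) / n
    v = sum((f - m) ** 2 for f in freqs) / (n - 1)   # sample variance
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

# Hypothetical annual frequencies for risks once classed as "1 in 10 years".
freqs = [0.05, 0.08, 0.10, 0.12, 0.20, 0.07, 0.15]
a, b = fit_beta_mom(freqs)
print(f"Prior: Beta({a:.2f}, {b:.2f}), prior mean {a / (a + b):.1%}")
```

By construction the fitted prior mean equals the sample mean of the reference-class frequencies, so the whole exercise hinges on step 1, as noted below.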

The interesting question is what gets put in the reference class for step 1.

I take it that something along these lines was your approach?

Being an engineer, I’d put a large tolerance on my answer. 🙂

You’ll like this paper:

http://varianceexplained.org/r/empirical_bayes_baseball/
