Measuring Severity in Statistical Inference
Severity (Mayo, 2018) is a principle of statistical hypothesis testing. It assesses a hypothesis test in relation to the claim it makes and the data on which the claim is based. Specifically, a claim C passes a severe test with the data at hand if, with high probability, the test would have found flaws in C were they present, and yet it does not. In this talk, I discuss how the concept of severity can be extended beyond frequentist hypothesis testing to general statistical inference tasks. Reflecting the Popperian notion of falsifiability, severity seeks to establish a stochastic version of modus tollens as a measure of the strength of probabilistic inference. Severity measures the extent to which the inference produced by an inferential strategy is warranted, in relation to the body of evidence at hand: if the currently available evidence leads a method to infer something about the world, would the method still have inferred it were that something not the case? I discuss the formulation of severity and its properties, and demonstrate its assessment and interpretation in examples from the frequentist and Bayesian traditions as well as beyond. A connection with the significance function (Fraser, 1991) and the confidence distribution (Xie & Singh, 2013) is drawn. These tools and connections may enable the assessment of severity in a wide range of modern applications that call for evidence-based scientific decision making.
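To make the idea concrete, here is a minimal sketch of a severity assessment in the simplest frequentist setting: a one-sided test on a Normal mean with known standard deviation, where after rejecting H0: mu <= mu0 we ask how severely the data probe the stronger claim mu > mu1. The function name and the specific numbers are illustrative, not from the talk.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard Normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def severity(xbar: float, mu1: float, sigma: float, n: int) -> float:
    """Severity for the claim mu > mu1 after observing sample mean xbar,
    in a one-sided Normal test with known sigma and sample size n.

    SEV(mu > mu1) = P(X̄ <= xbar; mu = mu1): the probability that the test
    would have produced a result no more favorable to the claim than the
    one observed, were the claim false at the boundary mu = mu1.
    """
    standard_error = sigma / sqrt(n)
    return normal_cdf((xbar - mu1) / standard_error)

# Illustrative numbers: xbar = 0.4, sigma = 1, n = 100 (standard error 0.1).
print(severity(0.4, 0.2, 1.0, 100))  # ~0.977: claim mu > 0.2 passes severely
print(severity(0.4, 0.3, 1.0, 100))  # ~0.841: moderate severity for mu > 0.3
print(severity(0.4, 0.5, 1.0, 100))  # ~0.159: mu > 0.5 is poorly probed
```

Note how severity decreases as the claim is made stronger (larger mu1) for the same data: the same observed mean warrants modest claims well and ambitious claims poorly, which is the discriminating behavior the abstract describes.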

The handout for this talk can be found here.