Q:

In class we’re told that measurements without uncertainties are meaningless. We don’t know how accurate they are. But what about the uncertainties we come up with? How accurate are they? Why don’t those numbers have their own uncertainties? Aren’t they just as meaningless?

- Joe (age 28)

University of Maryland, College Park, Maryland

A:

Joe-

You've put your finger on one of the central issues in philosophy. There's an infinite chain of uncertainties, just as you say. There are many attempts to deal with that question, so I'll just give you the one that makes the most sense to me, of the sort usually named "Bayesian", after an 18th-century minister who helped reframe the ideas of probability.

Basically, we have to confess at some point that we have some prior ideas about how the world is. If we have reasonable confidence about, say, the error in the error in the error, it often turns out that the answers to the questions we're interested in won't change much if we're off a little bit. Any of our ancestors who consistently held incorrect prior beliefs had fewer chances to leave descendants, so maybe that's why our sense of these things tends to work.

I had better give a more specific example. Say that you think that an opinion poll is equivalent to drawing 1000 answers absolutely randomly from a huge box filled with "yeses" and "nos". Then IF your assumption is correct, you can calculate precisely how likely you would be to get the result that you find, given some assumption about what the actual fraction (p) of "yeses" in the huge box is. Now it's plausible to say that you didn't know anything ahead of time about p, so the likelihood that p really has some value is just proportional to the likelihood that you'd get whatever result you found if p were the true value. You can then calculate the typical range of actual p's that are consistent with your observed value. That's essentially what pollsters do. Now your question would be something like "How do we know that the polling is like drawing answers randomly from a hat?" The answer is that actually we know it isn't quite like that, because there are all sorts of systematic errors affecting the probability of reaching different types of people. Then we have to go over old results to see whether there are some non-random errors. That may give some further uncertainties. Then we want to know "How well can you be sure that old results are relevant? People didn't used to have cell phones or answering machines." At some point you have to give up and just make some educated guesses. Of course, the errors here won't be more than 100%, unlike in some problems where the possible answers aren't even bounded.
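To make the flat-prior calculation concrete, here is a small sketch with made-up numbers (a hypothetical poll of 1000 answers, 540 of them "yes"). With a flat prior, the posterior for p after seeing y yeses out of n draws is a Beta(y+1, n-y+1) distribution, whose mean and standard deviation give the "typical range" of p's consistent with the observation:

```python
import math

# Hypothetical poll: 1000 random draws from the box, 540 "yes" answers.
n, yes = 1000, 540

# Flat prior on p  =>  posterior is Beta(yes + 1, n - yes + 1).
a, b = yes + 1, n - yes + 1

# Mean and standard deviation of a Beta(a, b) distribution.
mean = a / (a + b)
sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

print(f"p is roughly {mean:.3f} +/- {sd:.3f}")
```

The +/- range here is what a pollster would loosely call the "margin of error", and it already assumes the drawing-from-a-hat model is right.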

Mike W.

---------------------

As working particle physicists, we encounter this question every day, since (almost!) every number has an associated uncertainty, and we spend much of our time trying to ascertain just how big these uncertainties are.

The easiest uncertainties to evaluate are usually the statistical ones. With a large data sample, we can ask how the result would fluctuate if we repeated the experiment; well-known procedures exist for answering this, usually relying on the properties of the normal distribution, though many other distributions are also well studied. In Mike's yes/no polling example above, this statistical error would be delta_p = sqrt(p*(1-p)/n), where n is the number of cards drawn from the box. The interpretation gets more interesting if, say, only one card was drawn from the box. Ideally, you'd like to collect more data, but some experiments are just so expensive they cannot be repeated.
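The sqrt(p*(1-p)/n) formula can be checked by simulation: repeat the hypothetical 1000-card draw many times and compare the spread of the observed fractions to the formula (the true p here is an arbitrary choice for illustration):

```python
import math
import random

random.seed(1)
p_true, n = 0.6, 1000  # assumed true yes-fraction and sample size

# Repeat the "draw n cards" experiment many times and record the
# observed yes-fraction each time.
trials = [sum(random.random() < p_true for _ in range(n)) / n
          for _ in range(2000)]

# Spread (standard deviation) of the observed fractions...
mean = sum(trials) / len(trials)
spread = math.sqrt(sum((t - mean) ** 2 for t in trials) / len(trials))

# ...compared with the textbook statistical error.
formula = math.sqrt(p_true * (1 - p_true) / n)
print(f"simulated spread: {spread:.4f}, formula: {formula:.4f}")
```

The two numbers agree to well within the simulation's own statistical wiggle, which is the point: "statistical error" just summarizes how the result would fluctuate on repetition.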

Systematic errors can creep in even in the simple, idealized case if, say, the "yes" cards had a systematically different size or shape from the "no" cards. To correct for biases, secondary measurements should be attempted -- for example, sampling in a different way, or picking a different population. Often, systematic errors are assessed by doing a measurement in several different ways and comparing the results, but this isn't always the best approach (you can fool yourself into thinking your systematic error is too small if all of your different ways of doing the measurement share a common defect). Instead we must try as hard as we can to think about how the assumptions that went into interpreting the data could be violated.

That having been said, observations really don't have errors on them, but interpretations do. If we draw "yes" and "no" cards from a box and write down the totals, these numbers do not have errors associated with them. Only when we try to interpret their ratio as the fraction of "yeses" and "nos" in the rest of the box do we incur uncertainty. Measure the length of something with a ruler, and the observation that the object lines up with a particular mark on the ruler has no uncertainty (although it could be different when the measurement is repeated). Interpreting that observation as the object having a particular length, expressed as a number with dimensions, requires including our uncertainty about the calibration of the ruler, how well the zero-mark of the ruler was placed on the other end of the object, and such things as temperature and time variations in the calibrations and lengths.

Most measurements in physics have analogous uncertainties. We are constantly checking the calibrations of quantities which, while not the quantities we are interested in measuring at the end of the day, are nonetheless required for the measurement (in statistics jargon, these are called "nuisance parameters"). Ideally, a nuisance parameter will have been measured in another experiment with a well-defined uncertainty. The best case of all is when the auxiliary experiment has only statistical errors; then there is no real reason to question the errors. The presence of systematic errors brings Bayesian judgement into play ("How much do I believe this experiment's systematic uncertainty was evaluated correctly?"). If it's a really big deal, we should seek alternative determinations of the same nuisance parameter.

Another case of systematic errors having well-defined uncertainties is the following. In high-energy physics experiments, different models exist of how standard (uninteresting, already studied, though perhaps not perfectly) processes work. These standard, ordinary processes may produce, in high-energy particle collisions, events which look just like the more interesting ones in which a new particle is sought. A typical way to estimate the systematic uncertainty on the standard process rate (the "background" to a search for a new particle) is to compare different models by running Monte Carlo simulations of them. If these simulations do not have enough events in their distributions, then they suffer from statistical uncertainty. In principle, this can be made as small as we like: we just run our computer programs longer to get more precise estimates of how different the models are. But sometimes we run out of computers, money, or people to do the work, and settle for less. In fact, there is no real reason to push for infinitely precise estimates of systematic uncertainties if the statistical error on the systematic uncertainty is much less than the systematic uncertainty itself -- or, more problematically, is less than our belief that there could be other plausible models out there, beyond the ones we investigated, which could have produced yet different values for the nuisance parameter, and therefore for our measured parameter.

So -- when to stop? Often people just add the statistical uncertainty on a systematic uncertainty determination in quadrature to get the total contribution to the uncertainty from that unknown parameter.

In the end, almost no one reports uncertainties on uncertainties, but rather the uncertainties are inflated to cover our ignorance of the true spread.

Tom

*(published on 10/22/2007)*