Why Detecting a Missile Can Be Challenging: A Case for Hypothesis Testing


In 1983, the Soviet nuclear early warning system detected an intercontinental ballistic missile launched by the United States. The standard protocol was to respond with a nuclear counterattack, a move that could have pushed the world to the brink of nuclear war. Fortunately, Stanislav Petrov, the engineer on duty at the early warning command center, judged the warning to be a false alarm, and no further action was taken.


I was still a student when I first read about this story, and I always wondered how a detection system actually works and how it can fail. Consider this: if a radar picks up a signal, how can you determine whether it is coming from an enemy target, interference, or pure noise? One reasonable solution is to set a threshold value for the signal strength. If the signal is strong (above the threshold), it is less likely to result from noise or interference and more likely to come from a large enemy missile or aircraft. But what if the threshold is set too high, only detecting very large targets and missing smaller ones? Conversely, what if it is set too low, detecting everything as a threat? What is the best way to select this threshold? And is it at least possible to determine the probability of errors in both scenarios?

There are many more examples of this kind. How do you detect whether an incoming bit is a 1 or a 0 in a noisy communication channel? How do you know whether your diet really works and you lost weight when the scale gives slightly different readings each time? How do you determine whether a COVID vaccine is effective if some recipients still get infected? The common thread in these scenarios is the challenge of making decisions under uncertainty based on observations. This is exactly where hypothesis testing comes into play. It has applications across many fields of engineering and is deeply rooted in statistical signal processing and the classification methods of machine learning.

That covers our motivation. Now, let’s build an intuitive understanding of the theory and enhance it with some interactive exploration. Consider a binary detection system that outputs a signal strength of 1 if a target is present and -1 if it is not. You can also think of it as a binary communication channel where bit one is represented as 1 and bit zero as -1. However, I prefer the target analogy, so I’ll stick with the missile detection system example.

The world would be perfect for engineers if there were no noise involved: we would declare a target whenever we saw a 1 and no target whenever we saw a -1. Reality is different, however. Let’s add some Gaussian measurement noise with zero mean and a known variance to the signals. This is how we would model the system.
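In equation form, with x denoting the received reading and ω the additive noise term, the model described above is:

$$
\begin{aligned}
H_0 &: \; x = -1 + \omega \quad \text{(no target)} \\
H_a &: \; x = +1 + \omega \quad \text{(target present)}
\end{aligned}
$$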

H0 here is called the null hypothesis, and it represents the case where there is no target (-1). Ha is the alternative hypothesis, representing the presence of the target (1). Omega (ω) in this model is the Gaussian noise, defined as follows.
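With zero mean and a known variance σ², the noise is

$$
\omega \sim \mathcal{N}(0, \sigma^2).
$$

(The worked numbers later in the post, such as the 0.64 critical value and the 15.87% false alarm probability, are consistent with σ = 1.)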

The objective is to assess whether there is sufficient evidence to either accept or reject the null hypothesis. However, how much evidence is considered ‘enough’? Imagine you obtain a reading of 0.31 from the detector. Does this value fully convince you that there is a target, simply because it’s closer to 1 than to -1? Perhaps not. The spread of numbers could be quite wide. Now, how can we increase your confidence? For instance, would having more samples from the detector enhance your level of assurance? Consider the following additional samples:

[0.31, -0.83, -0.68, -1.38, 0.05, -1.59]

Now, based on your observations (evidence), you may be more confident that there is no target out there. As you collect more and more samples, the standard deviation of the sample mean will shrink, and its value will converge to the true signal level, since the noise has zero mean.
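As a quick sanity check, here is a small numpy sketch (the -1 signal level and σ = 1 are assumptions matching the example above) showing how the spread of the sample mean shrinks as more samples are averaged:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = -1.0   # H0: no target, the transmitted level is -1
sigma = 1.0     # noise standard deviation (assumed to be 1 here)

for n in [1, 5, 25, 100]:
    # Simulate 10,000 experiments, each averaging n noisy readings
    sample_means = (signal + sigma * rng.standard_normal((10_000, n))).mean(axis=1)
    print(f"n={n:3d}  avg of sample means={sample_means.mean():+.3f}  "
          f"std of sample mean={sample_means.std():.3f}  (theory: {sigma/np.sqrt(n):.3f})")
```

The standard deviation of the average falls as 1/√n, which is exactly the effect the interactive tool later in the post exploits.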

After gaining an intuitive understanding, we’ll now take a look into the mathematical background of hypothesis testing and detection. Hopefully, this will also eliminate any ambiguity surrounding the concept of ‘confidence’.

As you may have noticed, this is essentially a conditional probability problem. We are evaluating the probability of observing sample(s) at least as extreme as the one received, assuming the null hypothesis is true. This probability is commonly known as the p-value, and if it is very low, say 5% or less, we reject the null hypothesis in favor of the alternative. Rejecting the null does not prove the alternative is true; it only means the observed data would be unlikely if the null hypothesis held. The assessment can change as more data is collected and additional observations are made.

Now you may ask, “Why 5%? Who says that is low enough?” You are right, this is up to you. This value is known as the significance level (or error rate), and you can, for example, lower it to 1% to be 99% “confident” in your detection. The confidence level is simply the complement of the significance level, representing the degree of certainty you have in your results.

The significance level is essentially a probability (e.g., 5%, 10%), which we can use to determine a fixed critical value (or threshold): the point beyond which the probability of occurrence under H0 equals the significance level. For our H0 hypothesis, this value is approximately 0.64 for a 5% significance level (or 95% confidence), as shown below. Now that we’ve determined the critical value, we can compare it to the first sample we received, 0.31, to assess whether observing this sample under H0 would be improbable at our chosen level. Since 0.31 is less than 0.64, it does not fall in the right tail that makes up our 5% rejection region. Hence, we cannot reject the null hypothesis, and we conclude there is no target.

A critical value is a threshold in statistical hypothesis testing that determines the boundary at which the null hypothesis is rejected, corresponding to a specified significance level (e.g., 0.05) and indicating the point beyond which results are considered statistically significant.

Alternatively, we can directly calculate the probability of observing values greater than or equal to 0.31 under the H0 distribution, and compare this probability (p-value) with the significance level (5%). This probability corresponds to the red area under the curve shown below and is approximately 10%. While not highly likely, it is still not insignificant — especially since it exceeds our significance level of 5%. So, we cannot reject the null hypothesis (the same conclusion).

A p-value is the probability of obtaining a test result at least as extreme as the observed one, assuming the null hypothesis is true, and it helps determine whether the observed data is statistically significant.
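To make these numbers concrete, here is a minimal sketch using scipy, assuming (as in the figures) that H0 is Gaussian with mean -1 and standard deviation 1. It reproduces the 0.64 critical value and the roughly 10% p-value for the observation 0.31:

```python
from scipy.stats import norm

mu0, sigma = -1.0, 1.0      # H0: no target, assumed N(-1, 1)
alpha = 0.05                # significance level
x_obs = 0.31                # observed sample

# Critical value: the point with only alpha probability to its right under H0
critical_value = norm.ppf(1 - alpha, loc=mu0, scale=sigma)
print(f"critical value ≈ {critical_value:.3f}")   # ≈ 0.645

# p-value: probability of seeing a value >= x_obs under H0 (right tail)
p_value = norm.sf(x_obs, loc=mu0, scale=sigma)
print(f"p-value ≈ {p_value:.3f}")                 # ≈ 0.095

print("reject H0" if p_value < alpha else "fail to reject H0")
```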

Up to this point, we’ve focused solely on a single hypothesis (H0), assuming that calculating the probability of observing the sample under the “No target” assumption is sufficient. There’s nothing inherently wrong with this approach, and it could indeed be effective. However, we could improve our detection and decision-making by incorporating more information about the alternative scenario. In our case, we also know the underlying distribution of the alternative hypothesis (Ha) and we can certainly make use of this information. Instead of merely assessing how likely it is to observe a value under the H0 hypothesis, we can also evaluate how likely it is under the Ha hypothesis. This approach allows us to make a more informed decision.

Let’s now say that we believe there is actually a target (Ha). Then we ask the question: “What is the probability of seeing a value at least as extreme as 0.31, that is, less than or equal to 0.31, assuming there is a target (the Ha hypothesis)?” (Under Ha, “extreme” means far below the mean of 1, toward the no-target side.)

Computing how significant it is to see the received signal under the alternative hypothesis (Ha)

This time, the probability (represented by the blue area above) comes out to about 24.5%. This is higher than the roughly 10% we obtained under H0, indicating that the value 0.31 is more plausible under the alternative hypothesis than under the null. Given that each sample comes from one of the two hypotheses, we need to select the more likely hypothesis based on the available evidence. This is usually done by determining a decision rule (a threshold) that minimizes some measure of error: we claim (detect) a target if the incoming signal is above the threshold, and declare no target otherwise.
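Continuing the same sketch, the matching tail probability under Ha (assumed N(+1, 1)) comes out to about 24.5%:

```python
from scipy.stats import norm

x_obs = 0.31
p_h0 = norm.sf(x_obs, loc=-1, scale=1)   # right tail under H0, ≈ 0.095
p_ha = norm.cdf(x_obs, loc=+1, scale=1)  # left tail under Ha, ≈ 0.245

# The observation is less "surprising" under Ha than under H0,
# so Ha is the more plausible explanation for x_obs = 0.31.
print(f"under H0: {p_h0:.3f}, under Ha: {p_ha:.3f}")
```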

Let’s make it clear what we mean by error here. As you may guess, there are two error definitions when it comes to binary detection.

  • Detecting a target when there is none is referred to as a “false alarm.”
  • Failing to detect a target when it is actually present is known as “a miss.”

The definitions above are easy to understand and remember, but there is other terminology you will often hear.

A false alarm is known as a ‘False Positive (FP)’ in machine learning, as our prediction is wrong (False), indicating a target is present (Positive) when there is no target. This is also referred to as a ‘Type I Error.’

  • False alarm = False positive = Type-I Error

A miss is known as a ‘False Negative (FN)’ in machine learning because our prediction is wrong (False), indicating no target (Negative) when the target is actually present. This is also referred to as a ‘Type II Error.’ (An easy way to remember this: the ‘II’ hints at the two negatives, false and negative.)

  • Miss = False negative = Type-II Error

We separate these hypotheses by a decision rule, η, saying

  • if x ≥ η → Claim Ha hypothesis
  • if x < η → Claim H0 hypothesis

We can select the value of η by evaluating the Type-I and Type-II errors. As shown below, moving η to the right reduces the false alarm rate: we accept H0 and claim no target unless the received signal is sufficiently high (e.g., x > 1). However, this also increases the risk of missing actual targets (blue area), as more true targets are overlooked. Conversely, we can minimize the Type-II error by setting a low threshold, accepting Ha and claiming a target unless the received signal is very low (e.g., x < −1). However, this increases the chance of false alarms (red area), leading to unnecessary actions against nonexistent targets.
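Under the same Gaussian assumptions (means of -1 and +1, σ = 1), both error probabilities are just tail areas and can be evaluated for any threshold η. A minimal sketch:

```python
from scipy.stats import norm

def error_probabilities(eta, mu0=-1.0, mu1=1.0, sigma=1.0):
    """False alarm: area of H0 to the right of eta. Miss: area of Ha to its left."""
    p_false_alarm = norm.sf(eta, loc=mu0, scale=sigma)
    p_miss = norm.cdf(eta, loc=mu1, scale=sigma)
    return p_false_alarm, p_miss

for eta in [-1.0, 0.0, 1.0, 2.3]:
    pfa, pm = error_probabilities(eta)
    print(f"eta = {eta:+.1f}   false alarm = {pfa:.4f}   miss = {pm:.4f}")
```

Sliding η to the right drives the false alarm probability toward zero while the miss probability climbs toward one, and vice versa, which is exactly the trade-off described above.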

Binary detection is a decision-making process where a system determines whether a condition is true or false, typically classifying data into two categories, such as “target” or “no target.”

While it might seem that a miss has more severe consequences, a false alarm can also carry significant costs, making this decision more complex. A more systematic approach is needed to determine an optimal η that minimizes both types of errors while maximizing detection accuracy (e.g., using a likelihood ratio test).
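As a side note, for this symmetric Gaussian example the likelihood ratio test takes a particularly simple form (a sketch under the N(∓1, σ²) model assumed above). Comparing the ratio of the two densities to a threshold γ and taking the logarithm gives

$$
\ln\frac{p(x \mid H_a)}{p(x \mid H_0)}
= \frac{(x+1)^2 - (x-1)^2}{2\sigma^2}
= \frac{2x}{\sigma^2}
\;\underset{H_0}{\overset{H_a}{\gtrless}}\; \ln\gamma
\quad\Longleftrightarrow\quad
x \;\underset{H_0}{\overset{H_a}{\gtrless}}\; \frac{\sigma^2}{2}\ln\gamma .
$$

With equal priors and equal error costs (γ = 1), the optimal threshold is η = 0, exactly midway between the two signal levels, which is also where the interactive tool below starts.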

There are additional ways to reduce the uncertainty in the system and thereby minimize errors. The further apart the two probability distributions are, the clearer and more accurate the decisions become. One effective approach is to collect more samples and decide on their average: as the number of samples increases, the sample mean converges toward the true mean and its variance shrinks, which in turn decreases the likelihood of both false alarms and misses. Alternatively, the system can be designed with a larger separation between the two signal levels relative to the noise variance, which pulls the distributions apart and makes detection easier.

To experiment with all these concepts, play around with different parameters, and see how they affect the sample distributions, I developed an interactive environment for a simple binary detection example [link]. You can also access the source code from this link.

Interactive tool for hypothesis testing and detection

The sample distributions for H0 and Ha are shown at the top, with the detection threshold initially set to 0. The false alarm and miss error probabilities are represented by red and blue shaded areas under the curves, along with their respective values.

You can choose a distribution assuming either the presence or absence of a target and begin generating random samples from it. The detection rule is to calculate the sample mean and compare it to the threshold: if the sample mean is below the threshold, you declare ‘No Target’, otherwise, you declare ‘Target’.

Generating random samples from the selected distribution

Given that the false alarm probability is initially quite high (15.87%), you will likely see some samples fall on the right side of the threshold as you generate more. These samples, despite being generated under H0, will be classified as a target because they exceed the threshold, resulting in false alarms.

Demonstration of a false alarm

You can shift the threshold to the right (e.g., 2.3) to reduce the false alarm rate, but this will increase the miss error probability (blue area). With the new threshold, as you generate samples under H0, you’ll almost never declare a target, resulting in a false alarm rate of less than 1%. However, if you switch to the Ha distribution (assuming there is a target), you will miss most of the targets, as the same threshold will now lead to a higher probability of missed detections.

Setting a very high threshold to reduce false alarm probability ultimately increases the miss rate

One way to reduce error is by increasing the sample size (e.g., to 5) and using the sample mean for detection. This decreases the variance of both distributions, lowering the error probabilities to 1.25%. With this approach, you can now generate samples from either distribution and make an accurate detection 98.75% of the time.

Collecting more samples reduces the variance of distributions, thereby lowering the error probabilities
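You can check the 1.25% figure against the Gaussian model: averaging n samples divides the noise variance by n, so under H0 the sample mean is roughly N(-1, σ²/5). A quick check (again assuming σ = 1 and a threshold of 0):

```python
import numpy as np
from scipy.stats import norm

n, sigma, eta = 5, 1.0, 0.0
sigma_mean = sigma / np.sqrt(n)          # std of the average of n samples

p_false_alarm = norm.sf(eta, loc=-1, scale=sigma_mean)   # H0 average lands above eta
p_miss = norm.cdf(eta, loc=+1, scale=sigma_mean)         # Ha average lands below eta
print(f"false alarm ≈ {p_false_alarm:.4f}, miss ≈ {p_miss:.4f}")   # ≈ 0.0127 each
```

This comes out to roughly 1.3% per error type, in line with the 1.25% shown in the tool.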

Alternatively, the system can be designed with sufficient bias to further separate the distributions. While this may not always be feasible due to system noise, it can significantly reduce error probabilities, making even a single sample sufficient for accurate detection.

Separating two distributions effectively reduces the overlap between the distributions and the error rates

You can interactively experiment with different options and parameters to enhance your intuition about hypothesis testing and binary detection.

That’s all for now. I hope it’s now clearer how hypothesis testing is connected to detection problems that subtly impact our lives.
