Imagine a mad scientist and his lab assistant developing a powerful explosive using two newly discovered chemicals – Alpharium and Betamium.
In their pursuit, they come up with a research question to test:
“Which combination of Alpharium and Betamium will produce the most powerful explosion?”
So, they put together some tests combining Alpharium and Betamium at different ratios while carefully measuring the force generated when each compound explodes.
After the first test, they compare results and discover the mad scientist’s results are different from his assistant’s results.
Was this due to human error in the setup of the experiment? Was it due to environmental conditions? Was it because their gauges were calibrated differently?
They pondered over these questions and decided to try the experiment a few more times.
At the end of their testing, they once again compared notes and discovered that no two trials had produced exactly the same measurements, but the difference in their average findings for each treatment was minimal.
In a blog post for Mad Science Magazine, they shared their findings:
“Treatment A, a 3:1 combination of Alpharium and Betamium, produced a 36% higher average level of explosive force than Treatment B, an equal combination, N=43, p=.024, α=.05.”
A week later, the mad scientist and his assistant were surprised to find themselves answering more questions about the symbols used in reporting than the actual results themselves.
Results are only useful to the extent they are understood
“So, what exactly do those little symbols mean?”
This is a question our data sciences team frequently receives from readers of our research publications.
When you’re immersed in data on a daily basis, it becomes all too easy to forget that not everyone is fluent in statistical terminology.
In today’s MarketingExperiments Blog post, I wanted to share three statistical concepts every marketer should understand.
My goal here is not to give you a Ph.D. in statistics, but rather to demystify a few common symbols used in statistical reporting that you can use to aid your team’s next discussion of test results.
N= how many samples were taken
N means “number” and it tells you how many samples were taken.
Why is this important?
Because the fewer samples you take, the greater the likelihood your results could be skewed by random variation (randomness, in a mathematical sense).
This means our mad scientist and his assistant faced higher odds that their initial results were erroneous after collecting only two samples for each treatment, because they were working with a very small sample size.
However, once they collected 43 samples, their sample size was large enough for the statistics to tell them there was a significant difference between the two compounds with an acceptable level of certainty.
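To see why small samples are risky, here is a minimal sketch using hypothetical numbers (the true averages, the measurement spread, and the `sample_force` helper are all invented for illustration). With only two readings per treatment, the averages swing widely; with 43, they settle near the true values.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical force readings: suppose Treatment A truly averages 136 units
# and Treatment B truly averages 100, with the same measurement spread.
def sample_force(true_mean, n):
    """Simulate n noisy force measurements around a true mean."""
    return [random.gauss(true_mean, 30) for _ in range(n)]

for n in (2, 43):
    a_avg = statistics.mean(sample_force(136, n))
    b_avg = statistics.mean(sample_force(100, n))
    print(f"N={n:>2}: Treatment A avg={a_avg:6.1f}, Treatment B avg={b_avg:6.1f}")
```

Run it a few times with different seeds and the N=2 averages will bounce around far more than the N=43 averages.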
p= an estimate of how likely something is to happen
And, what about that p?
p (for probability) refers to the statistical concept of significance.
Any time someone calculates a statistic from a set of samples, they are making an assumption about the entire population or about all possible outcomes from the experiment.
The p-value is a calculation indicating the probability of observing results at least as far apart as those recorded in the experiment if there were, in fact, no real difference between the treatments. It is essentially the probability of a false positive result.
The smaller the N (number of samples taken), the less confident the statistics are about any possible patterns in the data and the larger the p-value becomes.
For example, a p-value of 0.3 would indicate that there is about a 30% chance that the observed differences between the results are actually due to random variation.
Very few scientists, however mad they might be, are willing to risk those odds.
In this case, however, our mad scientist and his assistant calculated that p=.024. This means there is only a 2.4% chance that the observed differences are really just an illusion.
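One intuitive way to estimate a p-value is a permutation test: if the two treatments were really identical, the group labels would be meaningless, so we can shuffle them and count how often chance alone produces a gap as large as the one observed. This sketch uses invented force readings, not the scientists’ actual data.

```python
import random
import statistics

random.seed(1)  # fixed seed for a repeatable estimate

# Hypothetical force readings (illustrative numbers only)
treatment_a = [140, 155, 132, 148, 160, 138, 151, 145]
treatment_b = [101, 118, 95, 110, 124, 99, 112, 105]

observed_gap = statistics.mean(treatment_a) - statistics.mean(treatment_b)

# Shuffle the pooled readings many times; count how often a random
# relabeling produces a gap at least as large as the observed one.
pooled = treatment_a + treatment_b
trials, extreme = 10_000, 0
for _ in range(trials):
    random.shuffle(pooled)
    gap = statistics.mean(pooled[:8]) - statistics.mean(pooled[8:])
    if abs(gap) >= abs(observed_gap):
        extreme += 1

p_estimate = extreme / trials
print(f"estimated p = {p_estimate:.4f}")
```

Because the two invented groups barely overlap, almost no random shuffle matches the observed gap, so the estimated p comes out very small.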
One thing worth mentioning here is that your statistical level of confidence is the complement of your p-value, that is, 1 minus p. (Hint: This really comes in handy when talking about statistical significance.)
This means the 2.4% chance of illusion is also a 97.6% chance you observed a real difference between the two treatments.
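The arithmetic behind that conversion is just one subtraction, shown here with the p-value from the blog post:

```python
p_value = 0.024  # the p-value reported in the blog post

# Level of confidence is the complement of p: 1 minus the p-value
confidence = 1 - p_value
print(f"p = {p_value} -> confidence = {confidence:.1%}")  # prints 97.6%
```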
α= the line in the sand that tells you if something is statistically significant
At some point in history, a few scientists decided to use .05 as the cutoff point for their p-values.
This means that if p was .05 or less (which is also called 95% level of confidence), then your findings were deemed to be “statistically significant” within the context of an industry standard.
This is where the alpha value, or α, also likely became a standard in reporting stats.
There is no special reason that I know of for using p=.05 or less as the line in the sand, but most research today continues to use that cutoff when determining whether results are significant. A few fields, such as nuclear physics, use a stricter cutoff because a 5% risk of error is unacceptable. In business, on the other hand, a 93% level of confidence (p=.07) might represent an acceptable level of risk for some companies. For any particular situation, α is determined by agreement among those involved.
So, what about the results for our mad scientist and his assistant?
They reported that their p-value (p=.024, or 2.4%) was less than the α value (α=.05, or 5%) they used for the experiment.
Remember how I explained that the 2.4% chance of a false positive also means there is a 97.6% chance that you observed a real difference between your treatments?
Well, what p=.024, α=.05 really means is that the results were significant and can be used toward future research, because their 97.6% level of confidence exceeds the industry-standard threshold of 95%.
Therefore, they concluded that the 36% difference they observed was in fact statistically significant.
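The significance decision itself reduces to a single comparison of p against α. This sketch wraps it in a hypothetical helper (the function name and default cutoff are my own, chosen to match the .05 convention described above):

```python
def is_significant(p_value, alpha=0.05):
    """True when p falls at or below the agreed-upon alpha cutoff."""
    return p_value <= alpha

print(is_significant(0.024))  # the scientists' result -> True
print(is_significant(0.3))    # the earlier hypothetical p -> False
```

If your team agrees on a different α, such as .07 in the business example above, you would simply pass that value in instead.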