The truth we seek usually lies buried in a haze of noisy data almost unrecognizable from the truth.
Research is the process of generating data, gleaning information from it, and then forging that information into consequential knowledge. None of these steps is straightforward, and each comes with a level of uncertainty that is not easily measured. The challenge is especially acute, however, for research in farm animal advocacy.
A large part of this difficulty is related to the concept of statistical power.
What is statistical power?
Suppose I want to find out whether a talk on the cruelty of factory farming inspires listeners to reduce their meat consumption. My hypothesis is that the talk will lead at least some listeners to eat less meat. So I choose a population and design an experiment to test the hypothesis and measure the size of the effect. The power of my test is the probability that the experiment will detect an effect if an effect of a given size truly exists. By convention, experiments are designed to have a power of at least 0.8, that is, at least an 80% chance of detecting the effect.
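To make the definition concrete, here is a minimal Monte Carlo sketch in Python; it does not describe any actual study. It assumes, hypothetically, that the outcome is weekly servings of meat, normally distributed with the same standard deviation in both groups, and that "detecting an effect" means a two-sided z-test at α = 0.05 rejects the null hypothesis of no difference. The names `simulate_power` and `_mean_and_variance` and all numbers are illustrative. Power is estimated as the share of simulated experiments, each generated with a true effect built in, that detect it.

```python
import random
from statistics import NormalDist


def _mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_power(effect, sd, n_per_group, alpha=0.05, n_sims=2000, seed=1):
    """Estimate power as the fraction of simulated experiments in which a
    two-sided z-test detects a true reduction of `effect` servings."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    detections = 0
    for _ in range(n_sims):
        # Hypothetical weekly meat servings: control averages 10,
        # listeners average 10 - effect, both with standard deviation sd.
        control = [rng.gauss(10.0, sd) for _ in range(n_per_group)]
        listeners = [rng.gauss(10.0 - effect, sd) for _ in range(n_per_group)]
        m_c, v_c = _mean_and_variance(control)
        m_l, v_l = _mean_and_variance(listeners)
        se = (v_c / n_per_group + v_l / n_per_group) ** 0.5
        if abs(m_c - m_l) / se > z_crit:
            detections += 1
    return detections / n_sims


# Illustrative: a 0.5-serving drop, SD of 4 servings, 200 people per group.
print(simulate_power(effect=0.5, sd=4.0, n_per_group=200))
```

With these made-up numbers the estimate comes out at roughly a quarter, far below the 0.8 convention. Setting `effect=0` instead yields a detection rate near α, which is the false-positive rate rather than power.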
Given a definition of what constitutes a successful detection of an effect (based on, say, a significance test), the power of a study depends primarily on four factors:
- The size of the effect: It should surprise no one that the larger the size of an effect, the easier it is to detect it.
- Measurement error: Most phenomena of interest do not present themselves to us in black and white for easy categorization or precise measurement. The greater the error in the measurements we make, the lower the power.
- Reliability of the metrics: Even with zero measurement error, there is usually substantial natural variability in the true value of the quantity being measured. Choosing measures that tend to yield the same answer under similar conditions improves the power of the experiment.
- Sample size: As expected, a larger sample increases power. If you toss a biased coin a hundred times, you are more likely to detect that it is biased than if you toss it only ten times.
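The coin example can be checked exactly. The Python sketch below computes the power of a two-sided exact binomial test of whether a coin is fair, using an equal-tailed rejection region at α = 0.05. The coin's bias (70% heads) and the function names are assumptions made for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def coin_test_power(n, p_true, alpha=0.05):
    """Power of a two-sided exact binomial test of H0: p = 0.5,
    with an equal-tailed rejection region of size at most alpha."""
    pmf0 = [binom_pmf(k, n, 0.5) for k in range(n + 1)]
    # Largest lo with P0(X <= lo) <= alpha/2 (lo = -1 means no lower region).
    lo, cum = -1, 0.0
    for k in range(n + 1):
        cum += pmf0[k]
        if cum > alpha / 2:
            break
        lo = k
    # Smallest hi with P0(X >= hi) <= alpha/2 (hi = n + 1 means no upper region).
    hi, cum = n + 1, 0.0
    for k in range(n, -1, -1):
        cum += pmf0[k]
        if cum > alpha / 2:
            break
        hi = k
    # Power: probability, under the true bias, of landing in the rejection region.
    return sum(binom_pmf(k, n, p_true) for k in range(n + 1) if k <= lo or k >= hi)


for n in (10, 100):
    print(n, round(coin_test_power(n, 0.7), 3))
```

With ten tosses the test detects this bias only about 15% of the time; with a hundred tosses, power rises above 0.95.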
Factors behind power in studies on farm animal advocacy
Lamentably, each of the above four primary factors that dictate statistical power presents a knotty challenge in the case of research in farm animal advocacy. This makes all of the ingredients in the process of such research—the identification of a hypothesis, the design of an experiment, the logistics of conducting a study, and the subsequent analysis of the observed data—more demanding of inventive attention and more arduous to execute correctly. Let's consider each of these factors:
The size of the effect of an intervention, judging by the exploratory studies conducted in farm animal advocacy so far, is regrettably small. Human nature and current societal norms, and perhaps other factors as well, conspire to keep these effects small, which leaves studies in farm animal advocacy especially prone to being underpowered. Small effects have a further consequence. Low power reduces the number of true effects a study detects but not its rate of false positives, so a larger share of statistically significant findings are false positives. As a result, even a finding that the data appear to support is more likely to be false than it would be in a well-powered study!
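The link between low power and false positives follows from Bayes' rule. In the Python sketch below, `prior` is an assumed share of tested hypotheses that are actually true (0.25 is purely illustrative), and `alpha` is the false-positive rate of the significance test. The positive predictive value is the probability that a significant result reflects a real effect.

```python
def positive_predictive_value(power, prior, alpha=0.05):
    """Probability that a statistically significant result reflects a true
    effect, given the test's power, its false-positive rate alpha, and the
    prior share of tested hypotheses that are actually true."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


for power in (0.8, 0.5, 0.2, 0.1):
    ppv = positive_predictive_value(power, prior=0.25)
    print(f"power={power:.1f}  PPV={ppv:.2f}")
```

Under these assumptions, the PPV falls from about 0.84 at a power of 0.8 to 0.57 at 0.2 and 0.40 at 0.1: at the lowest power, a significant finding is more likely to be false than true.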
Measurement error is an old problem in diet-related studies, and it plagues much of the research in farm animal advocacy. More often than not, studies of dietary behavior rely on self-reported survey data. But few people can accurately estimate how many servings of different categories of food they eat on average, or even what they ate in the last 24 or 48 hours. Study subjects may also complete surveys hastily and carelessly, motivated mainly by the chance to win the prize that typically accompanies the request to complete the survey. It is far from clear that the attention filters built into survey questions are effective enough at screening out invalid responses. Self-reported survey data also introduce a variety of biases, including selection bias and social desirability bias, which further distort the data. Because the effects we seek are small, these errors can compound one another and bury the signal deep in the noise.
All of these measurement errors are serious, and when they are ignored in a study's power calculations, both those calculations and the study itself become less credible.
Reliability of the metrics is another vexing issue in diet-based research in farm animal advocacy. Some diet studies rely on 24-hour or 48-hour recall data, but the reported data vary considerably from day to day. Few people eat exactly the same amount of each class of food every day, so a survey conducted on one day may produce different results from the same survey of the same subjects conducted on another day. Even in the absence of any measurement error, this inherent variance in the measure reduces the power of studies based on such data.
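A simple variance-components model, which is an assumption here rather than something drawn from a particular study, makes this concrete: each person has a stable usual intake (spread across people: `between_sd`) and deviates from it from day to day (`within_sd`). The reliability of a single day's recall, meaning the expected correlation between two days' reports from the same people, is then the share of total variance due to stable differences between people. The Python sketch below checks that formula by simulation; all names and numbers are hypothetical, intake is treated as normally distributed, and no reporting error is included.

```python
import random


def reliability(between_sd, within_sd, days=1):
    """Share of variance in a person's `days`-day average intake that
    reflects stable differences between people."""
    return between_sd**2 / (between_sd**2 + within_sd**2 / days)


def simulated_day_to_day_correlation(between_sd, within_sd, n_people=20000, seed=2):
    """Correlation between two single-day reports from the same simulated people."""
    rng = random.Random(seed)
    day1, day2 = [], []
    for _ in range(n_people):
        usual = rng.gauss(10.0, between_sd)              # hypothetical usual daily servings
        day1.append(usual + rng.gauss(0.0, within_sd))   # intake on the first survey day
        day2.append(usual + rng.gauss(0.0, within_sd))   # same person, another day
    m1, m2 = sum(day1) / n_people, sum(day2) / n_people
    cov = sum((a - m1) * (b - m2) for a, b in zip(day1, day2))
    var1 = sum((a - m1) ** 2 for a in day1)
    var2 = sum((b - m2) ** 2 for b in day2)
    return cov / (var1 * var2) ** 0.5


print(reliability(1.0, 1.5), simulated_day_to_day_correlation(1.0, 1.5))
print(reliability(1.0, 1.5, days=7))
```

With a between-person SD of 1 serving and a within-person SD of 1.5, a single day's recall has a reliability of about 0.31, and the simulated day-to-day correlation lands close to that; averaging seven days would raise it to about 0.76.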
Sample size is a familiar problem in research, but one that adds a particularly onerous cost to research in farm animal advocacy. Suppose I wish to discover the demographic patterns of vegetarians. Given that only about 2% of the general population is vegetarian, obtaining a random sample of 1,000 vegetarians requires randomly sampling about 50,000 people!
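The arithmetic generalizes. The Python sketch below combines the standard normal-approximation formula for the sample size per group in a two-sided, two-sample comparison of means, n ≈ 2(z_{1-α/2} + z_{1-β})² / d², where d is the standardized effect size and 1 - β is the target power, with the expected screening cost of reaching a subgroup of a given prevalence. The 2% prevalence comes from the text; the effect size of 0.1 and the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means with standardized effect size d (normal approximation)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d**2)


def people_to_screen(target_count, prevalence):
    """Expected number of randomly sampled people needed to yield
    target_count members of a subgroup with the given prevalence."""
    return math.ceil(target_count / prevalence)


print(people_to_screen(1000, 0.02))  # 50000, as in the text
per_arm = n_per_group(0.1)           # a small, hypothetical effect size
print(per_arm, people_to_screen(2 * per_arm, 0.02))  # 1570 157000
```

Under these assumptions, detecting a standardized effect of 0.1 with 80% power takes about 1,570 people per arm; if both arms had to consist of vegetarians found by random sampling, roughly 157,000 people would need to be screened.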
The peril of underpowered studies is, unfortunately, greater than is generally assumed. It is a common misconception that the worst an underpowered study can do is fail to detect a true effect. In fact, the lower a study's power, the noisier its estimate of the effect and the lower the probability that a detected effect reflects a true one. Worse still, when an underpowered study does detect an effect, it tends to overestimate the effect's magnitude.
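The exaggeration effect shows up clearly in a small simulation. The Python sketch below assumes each study's estimate is normally distributed around the true effect with a known standard error, keeps only the estimates that reach significance at α = 0.05, and reports how much those significant estimates overstate the true effect on average (a quantity sometimes called the exaggeration ratio). The function name and numbers are illustrative.

```python
import random
from statistics import NormalDist


def exaggeration_ratio(true_effect, se, alpha=0.05, n_sims=200_000, seed=3):
    """Simulate many studies of the same true effect. Return (power, ratio),
    where ratio is the mean |estimate| among significant studies divided
    by the true effect."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    estimates = (rng.gauss(true_effect, se) for _ in range(n_sims))
    significant = [abs(est) for est in estimates if abs(est) / se > z_crit]
    power = len(significant) / n_sims
    ratio = sum(significant) / len(significant) / true_effect
    return power, ratio


for true_effect in (2.8, 1.0):  # standard error fixed at 1
    power, ratio = exaggeration_ratio(true_effect, se=1.0)
    print(f"power={power:.2f}  significant estimates overstate the effect {ratio:.2f}x")
```

At about 80% power, significant estimates overstate the true effect by roughly 12%; at about 17% power, they overstate it by a factor of roughly 2.5.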
So, are there solutions?
The above, admittedly, presents a grim picture, and there are no perfect solutions. But there are methodological approaches that can mitigate these problems.
Given the small effects in metrics of interest, the brute-force solution, as always, is a much larger sample size. In a few cases, it may also be possible to switch to new metrics that show larger effects while bearing a monotonic relationship to effects on the metrics they replace. Measurement error is best reduced by relying less on self-reported survey data for patterns of meat consumption. The reliability of dietary measures improves when inferences rest less on individual consumption patterns and more on patterns aggregated over longer periods of time.
Because the findings of an underpowered study can be so unreliable, power calculations should account for all four of the factors above.
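A power calculation that includes all four factors might look like the Python sketch below. Its simplifying assumptions are illustrative rather than drawn from any particular study: intake is compared between two equal-sized groups with a two-sided z-test; each person's reported daily intake equals their usual intake plus independent day-to-day variation (reliability) plus independent, unbiased reporting error (measurement error); and each person's outcome is averaged over `days` days. Systematic distortions such as social desirability bias do not average away and are not captured here. The function name and all numbers are hypothetical.

```python
from statistics import NormalDist


def study_power(effect, between_sd, within_sd, error_sd, days, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of mean intake.

    effect      -- true difference in usual intake between groups (effect size)
    between_sd  -- spread of usual intake across people
    within_sd   -- day-to-day variation within a person (reliability)
    error_sd    -- random, unbiased reporting error per day (measurement error)
    days        -- number of days averaged per person
    n_per_group -- people per group (sample size)
    """
    norm = NormalDist()
    # Averaging over days shrinks day-to-day and reporting noise, not real differences.
    observed_sd = (between_sd**2 + (within_sd**2 + error_sd**2) / days) ** 0.5
    se_diff = observed_sd * (2 / n_per_group) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = effect / se_diff
    # Probability of rejecting in either tail when the true effect is present.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Hypothetical scenario: true effect of 0.3 servings/day, 300 people per group.
for days in (1, 7):
    p = study_power(effect=0.3, between_sd=1.0, within_sd=1.5, error_sd=1.0,
                    days=days, n_per_group=300)
    print(f"{days} day(s) averaged: power = {p:.2f}")
```

In this made-up scenario, power rises from about 0.43 with a single day's recall to about 0.86 when seven days are averaged, without recruiting anyone new.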
Humane League Labs is working on all of the above solutions as steps toward improving the power of our studies. In data-driven social science research, it is an under-appreciated fact that the truth we seek usually lies buried in a haze of noisy data almost unrecognizable from the truth. Each good scientific study clears away a bit of that haze, and it is only through a long process, both arduous and exciting, that we slowly begin to see approximations of the truth.