In the final two blogs of our series looking at a research manuscript, we will dig into the research itself. (In the first blog we looked at which sources to go to for reliable information, and in the second we looked at surface aspects of the paper.) As with our other blogs, the questions below can be used as a Medical Research Bullshit Detector: if you get one or more “no”s to these questions, there’s a fair chance something is amiss.
Study design can be thought of as the plans for the frame of a building. Just as the material and design of a building must be chosen to support the final structure, you need the right participants, outcomes and methods to answer your research question and support your conclusions. Just as you would choose different materials and frames to build a shack vs. a skyscraper, there are different study designs to find potential risk factors for cancer vs. to prove a specific compound cures cancer. The wrong study design can lead you to unsupportable (and dangerous) heights.
Whether or not it was done intentionally, the researchers behind Prevagen performed a sloppy study that found (at best) very weak support for Prevagen and over-sold the results to make it seem as if they completed a rigorous and meaningful clinical trial.1 To make the shortcomings of this study more obvious, we will pretend to be a researcher truly interested in determining whether or not Prevagen works and see how our ideal study design differs from that of the published trial.
1. Is the study testing a real scientific hypothesis?
Ideal: If we really wanted to find out if Prevagen helped people, and given the fears that older adults have about dementia and memory loss, we might test one of the following hypotheses: “Prevagen prevents memory loss in older adults” or “Prevagen improves memory in older adults with cognitive impairment” or “Prevagen slows (or reverses) the progression of Alzheimer’s dementia.” Having a strong and clear hypothesis is critical to a rigorous study because all subsequent decisions about the study design should come back to the question: “is this the best way to test our hypothesis?”
Prevagen: The closest thing to a hypothesis in this paper is: “the primary objective of the current study was to assess the effects of apoaequorin (Prevagen) on cognitive function.” Amongst scientists, we would refer to this type of study as a fishing expedition. In a fishing expedition, researchers are not testing a hypothesis; they are casting wide nets and seeing if they catch anything. Fishing expeditions can be useful at early stages of research development (e.g. if we were looking for compounds that might prevent dementia) but are not appropriate when trying to determine the clinical benefits of a supposedly well-developed therapy. The danger here (which we will see played out) is that researchers could claim that anything they catch is significant, even though the odds were in their favor that they would catch something simply by chance.
Answer #1: NO
2. Is the study testing their product in a well-defined and relevant population?
Ideal: Assuming we stick to one of our hypotheses, we might want to test Prevagen in older adults, perhaps over 65. We’d want our participants to have memory issues or a diagnosis of Alzheimer’s dementia to see if we find improvements. To be rigorous, we would do objective tests of thinking and memory to make sure that our participants truly had memory issues or Alzheimer’s and were not simply anxious people with excellent memories. Alternatively, and if we had more time and a larger budget, we might choose to test Prevagen in older adults with normal cognitive function (proven through testing) and follow them for years to see if those taking Prevagen develop memory impairments or dementia at a lower rate than those on placebo.
Prevagen: Again, the actual trial made very different decisions. They defined older adults as age 40-95, which offends me not only because it makes me an older adult, but because we know there are significant differences in brain aging between 40 and 60 that would make results hard to interpret (some studies would even make further distinctions between 60 and 80 or 90). They included people who “have concerns related to memory issues” and did not do any objective testing to distinguish people with normal memory who worry about dementia from people actually experiencing memory problems. More notably, they excluded people with a “history of neurological disease, dementia or related memory-impairment disorders.” As a skeptic, I could take this to mean that they had no confidence Prevagen could treat real, clinical memory issues or dementia. As a scientist, I take it to mean that there is NO EVIDENCE that Prevagen helps people with dementia or mild cognitive impairment (also known as predementia).
Answer #2: NO
3. Do the outcome measures match the research question including a prespecified primary outcome measure?
Ideal: For the ideal study, we have two choices. First, we could use as our outcome a clinical diagnosis (e.g. dementia) and see if Prevagen prevents people from getting it (or even better reverses it, so that people with dementia go back to normal). Alternatively, we could choose a cognitive test or a battery of tests to carefully measure memory, ideally choosing a test that may be predictive of risk for dementia. If choosing this latter path, we should clearly state before we get our results how we will define memory improvement (e.g. change in which specific test) and when (e.g. 30 days) is the best time to test it.
Prevagen: It is difficult to know exactly what the Prevagen researchers did here. They used CogState, a battery of computerized tests. They explicitly mention two of the tests from the battery (shopping list learning and delayed recall) but are a bit vague as to whether they had participants do other tests (there could be up to 16). They stated that “tasks used in this study included” the 2 reported but never specified what other tests were administered. This is an important point because if they used all 16, and each test was judged against the conventional 5% significance threshold, their chances of seeing at least one positive result just by chance are over 50%. If we additionally look at subgroups as the Prevagen researchers did (e.g. looking at people with high vs. low memory scores), we increase our chances of a positive result to over 90%.
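For readers who want to check the arithmetic behind those percentages, here is a minimal sketch in Python. It rests on assumptions that are ours, not the paper's: each test uses a 5% significance threshold, the tests are independent of one another, and the subgroup analysis triples the number of comparisons (whole group, high scorers, low scorers). The function name is made up for illustration.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_tests independent tests looks
    'significant' at threshold alpha when the product truly does nothing."""
    # Each test has a (1 - alpha) chance of correctly showing nothing,
    # so the chance that ALL of them show nothing is (1 - alpha) ** num_tests.
    return 1 - (1 - alpha) ** num_tests


# Assumption: all 16 tests in the battery were administered.
all_tests = chance_of_false_positive(16)

# Assumption: each test is analyzed three ways (everyone, high scorers,
# low scorers), giving 16 * 3 = 48 comparisons.
with_subgroups = chance_of_false_positive(16 * 3)

print(f"16 comparisons: {all_tests:.0%}")       # 56%
print(f"48 comparisons: {with_subgroups:.0%}")  # 91%
```

Real cognitive tests are correlated with each other, so the true figures would be somewhat lower than this independence-based estimate, but the basic point holds: the more comparisons you run, the more likely it is that one of them "works" by luck alone.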
This is an example of the “Texas Sharpshooter Fallacy.” In this analogy, the sharpshooter first shoots at the side of a barn and then draws a bullseye around his closest cluster of bullet holes. With no prespecified hypothesis or primary outcome, researchers can similarly collect a lot of data and then, after the fact, pick out only those results that align with the story they want to tell.
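The sharpshooter's trick can also be shown with a small simulation. The sketch below is hypothetical and is not a reconstruction of the Prevagen analysis: it imagines many studies of a product that does nothing, each measuring 16 outcomes, and then "draws the bullseye" afterwards by reporting only the best-looking outcome. It assumes independent outcomes and uses the fact that, when there is no real effect, a p-value is equally likely to fall anywhere between 0 and 1.

```python
import random


def sharpshooter_hit_rate(num_outcomes: int = 16,
                          alpha: float = 0.05,
                          num_studies: int = 20_000,
                          seed: int = 0) -> float:
    """Fraction of simulated no-effect studies in which the best of
    num_outcomes p-values falls below alpha."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_studies):
        # With no real effect, each outcome's p-value is uniform on [0, 1).
        best_p = min(rng.random() for _ in range(num_outcomes))
        # Draw the bullseye after shooting: keep only the best outcome.
        if best_p < alpha:
            hits += 1
    return hits / num_studies


# One prespecified outcome: a "hit" in roughly 5% of no-effect studies.
print(f"prespecified outcome: {sharpshooter_hit_rate(num_outcomes=1):.1%}")

# Best of 16 outcomes chosen afterwards: a "hit" in more than half of them.
print(f"best of 16 outcomes:  {sharpshooter_hit_rate(num_outcomes=16):.1%}")
```

The only difference between the two runs is whether the target was chosen before or after looking at the data, which is exactly why a prespecified primary outcome matters.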
On the positive side, they did state that, although they tested at 4 time points (8, 30, 60 and 90 days), the 90-day mark would be their primary measure.
Answer #3: NO
Take Home Points:
Proper study design is critical to allowing researchers the ability to answer their questions and draw sound conclusions.
Key aspects of study design include having a true hypothesis, choosing appropriate participants, and having a prespecified plan for how you will test your hypothesis.
When digging into a research study, it is critical to examine whether the study design matches the study intent and supports the study conclusions.
1. Moran DL, Underwood MY, Gabourie TA, Lerner KC. Effects of a Supplement Containing Apoaequorin on Verbal Learning in Older Adults in the Community. Adv Mind Body Med 2016;30:4-11.
Picture came from “https://www.bayesianspectacles.org/origin-of-the-texas-sharpshooter/” and features an illustration by Dirk-Jan Hoek (https://www.lambiek.net/artists/h/hoek_dirk_jan.htm).