Take That Subset Analysis And Shove It

Actually, the title of this article should be “Take That Non-Predefined Registrational Study Analysis and Shove It,” but that title was too long to fit in the space. Subset/subgroup analyses and other non-pre-specified analyses of clinical trial data are very useful tools in the biostatistician’s toolbox, but pharmaceutical companies highlight them far too often to mask poor data.

Let’s talk about why biostatisticians so often look unfavorably on non-predefined data analyses. As Richard Simon writes in PATIENT SUBSETS AND VARIATION IN THERAPEUTIC EFFICACY:

Suppose that we have randomly assigned treatments A and B and partition our patients into G mutually exclusive subsets. For each subset we perform a statistical significance test and declare a difference ‘statistically significant’ if the calculated significance level is α or smaller. We obtain one ‘significant’ difference using α = 0.05, but someone points out that even if the treatments are identical the probability of obtaining at least one significant result is 1 - (1 - α)^G.

Put another way, even if a drug has no clinical effect whatsoever, there is roughly a 2-in-5 (40%) chance that at least one of ten subset analyses in a study will show a statistically significant p-value of 0.05 or less, purely by chance: 1 - (1 - 0.05)^10 is about 0.40.
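As a quick check of that 40% figure, here is a minimal Python sketch of Simon's formula. The function name is my own invention, and the calculation assumes the G tests are independent of each other, as they would be for mutually exclusive patient subsets.

```python
def prob_at_least_one_false_positive(alpha: float, n_tests: int) -> float:
    """Chance that at least one of n_tests independent tests comes out
    'significant' at level alpha when the treatments are truly identical.

    This is the 1 - (1 - alpha)^G formula quoted from Simon.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Ten subset analyses, each tested at alpha = 0.05
ten_subsets = prob_at_least_one_false_positive(0.05, 10)
print(f"{ten_subsets:.1%}")  # prints 40.1%
```

Plugging in other values of alpha and G shows how quickly the false-positive risk grows as more subsets are examined.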

This applies to primary analyses as well. For example, if I tested my old smelly shoe as a cancer treatment and ran three clinical trials of this novel smelly shoe treatment versus placebo, then just by dumb luck about 14% of the time at least one of the three studies would show a statistically significant result (at the one-sided 0.05 level) on its primary endpoint, even though my old smelly shoe has no efficacy against any cancer.

                PROBABILITY OF SEEING AT LEAST ONE FALSE-POSITIVE RESULT
               IN X TESTS (OR STUDIES) OF TWO PLACEBOS AGAINST EACH OTHER

                                                         Significance level
                                                       1.0%       2.5%       5.0%
                                  # of tests
                                      5                4.90%     11.89%     22.62%
                                      4                3.94%      9.63%     18.55%
                                      3                2.97%      7.31%     14.26%
                                      2                1.99%      4.94%      9.75%
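The 14.26% entry in the table (three tests at the 5% level) can also be checked by brute force. The sketch below is a simulation, not anything from the original analysis. It assumes that when there is no true effect, each study's p-value is uniformly distributed between 0 and 1, and that the studies are independent. The function name, trial count, and seed are arbitrary choices of mine.

```python
import random


def simulate_false_positive_rate(alpha, n_studies, n_trials=200_000, seed=12345):
    """Estimate how often at least one of n_studies null studies is 'significant'.

    Each null study is modelled by drawing its p-value uniformly from [0, 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # A "hit" means at least one study fell below the significance level
        if any(rng.random() < alpha for _ in range(n_studies)):
            hits += 1
    return hits / n_trials


estimate = simulate_false_positive_rate(alpha=0.05, n_studies=3)
exact = 1 - (1 - 0.05) ** 3
print(f"simulated: {estimate:.4f}   formula: {exact:.4f}")
```

The simulated rate should land close to the formula's value, with the gap shrinking as the number of simulated trials grows.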

There are ways to minimize the odds of falsely accepting an incorrect subset or primary analysis. The best ways to prevent these kinds of incorrect conclusions, which statisticians call type 1 errors, are to:

CONFIRM RESULTS BY TRYING TO REPEAT THEM

This possibility of a study producing false-positive results just by chance is why the FDA almost always requires a statistically significant efficacy result to be repeated at least once with a second well-balanced randomized controlled study before it will approve a drug.

As FDA reviewers so often mention, the chance of falsely accepting an incorrect hypothesis in the drug’s favor from one clinical study at the 0.05 alpha level (two-sided test) is 1/40, or 2.5%, whereas the chance of falsely accepting a study endpoint that produced statistically significant results in two independent clinical studies is (1/40)*(1/40) = 1/1600, or 0.0625%.
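The arithmetic is easy to verify with exact fractions. This small sketch assumes the two studies are independent, so that their false-positive probabilities multiply; the variable names are mine.

```python
from fractions import Fraction

# One-sided share of a two-sided 0.05 test: 0.025 = 1/40
one_study = Fraction(1, 40)

# Two independent studies both giving a false positive in the drug's favor
two_studies = one_study * one_study

print(two_studies)                  # prints 1/1600
print(f"{float(two_studies):.4%}")  # prints 0.0625%
```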

PRE-SPECIFY A STUDY’S PRIMARY ENDPOINTS AND SUBSET ANALYSES

For any clinical trial there are hundreds of ways that its study data can be sliced, divided, and parceled out for subset and sensitivity analyses. To prevent drug companies from data-mining and self-selecting the subgroups, analyses, or endpoints that portray their compound in the best light, regulatory agencies like the FDA require them to pre-specify their study endpoints before undertaking registrational clinical studies, and often force drug developers to pay a statistical penalty for designating multiple primary endpoints when any one of them could be used to support a drug’s approval.
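The article does not say which statistical penalty is applied, and the choice varies by trial. One common and simple option is the Bonferroni correction, which divides the overall alpha by the number of primary endpoints. The sketch below uses it purely as an illustration; the function names are hypothetical, and the family-wise calculation assumes the endpoints are independent.

```python
def bonferroni_threshold(overall_alpha, n_endpoints):
    """Per-endpoint significance level under a Bonferroni correction."""
    return overall_alpha / n_endpoints


def familywise_error(per_test_alpha, n_tests):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1 - (1 - per_test_alpha) ** n_tests


n_endpoints = 3
per_endpoint_alpha = bonferroni_threshold(0.05, n_endpoints)

unadjusted = familywise_error(0.05, n_endpoints)              # each endpoint at 0.05
adjusted = familywise_error(per_endpoint_alpha, n_endpoints)  # each at 0.05 / 3

print(f"per-endpoint alpha: {per_endpoint_alpha:.4f}")
print(f"unadjusted family-wise error: {unadjusted:.2%}")
print(f"adjusted family-wise error:   {adjusted:.2%}")
```

With the correction, the overall chance of a false positive across the three endpoints stays just under 5% instead of climbing to about 14%.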

So the next time you hear a drug developer reporting that its compound failed in a phase 3 registrational study but showed promise in a subset analysis, be sure to take a good hard look at the data and remember that there is a very high chance that even a smelly shoe will generate some statistically significant results on some analyses versus placebo in a clinical trial. Non-predefined retrospective analyses are good for exploratory hypothesis generation, safety checks, and sensitivity testing, but will generally not work as a means to get most drugs approved for marketing. Don’t fall for the hype.

For further reading on this topic click on the links below:

www.emea.europa.eu/pdfs/human/ewp/090899en.pdf

www.pubmedcentral.nih.gov/picrender.fcgi?artid=1427603&blobtype=pdf
