Unfortunately, I do not believe it is 100% bulletproof. I believe an argument can be made that the trial data, as presented in the JAMA article and supplements, allow for the possibility that the drug *does* reach statistical significance on its primary outcome while the trial does not establish that fluvoxamine *reduces hospitalizations intrinsically*. Please note the "rules" of this argument:
1. I don't have access to any hidden information besides what's presented in the JAMA article and in the supplementary materials. For instance, the trial researchers may have all kinds of data that makes them feel the way they do. My argument is based 100% on the JAMA article.
2. I have corresponded with Dr. Boulware as well as Dr. Reiersen about forms of this argument, in the hopes that the new larger RCT could integrate outcomes that can settle these concerns.
Unfortunately, despite #2, the new trial began enrollment before I contacted Boulware and Reiersen on this matter, so its pre-registered outcomes do not reflect these concerns. Some of them are, however, planned as non-predefined analyses, which I think will be very valuable. There are ways to assuage these concerns with follow-up data, which I will detail after my argument below.
## How the trial can show statistically significant primary outcome but not prove fluvoxamine treats COVID-19
### Part 1: definition of clinical deterioration
First, let's look at the primary outcome of the RCT. The primary outcome is notably not "reduction in hospitalizations," but rather (taken from the preregistration statement on https://clinicaltrials.gov/ct2/show/NCT04342663 - similar but slightly different wording is in the JAMA paper):
"Clinical worsening is defined meeting both of the following: (1) presence of dyspnea and/or hospitalization for shortness of breath or pneumonia, plus (2) decrease in O2 saturation (<92%) on room air and/or supplemental oxygen requirement in order to keep O2 saturation >92%."
In the JAMA paper, this is reworded but the idea is the same:
"Clinical deterioration within 15 days of randomization defined by meeting both criteria of (1) shortness of breath or hospitalization for shortness of breath or pneumonia and (2) oxygen saturation less than 92% on room air or need for supplemental oxygen to achieve oxygen saturation of 92% or greater."
Now critically, we must pay attention to what "clinical deterioration" is defined as here. It is *not* "patient was admitted to the hospital for COVID-19," but rather:
"Clinical deterioration" = ("shortness of breath" OR "hospitalization for shortness of breath or pneumonia") AND ("decrease in O2 saturation (<92%) on room air and/or supplemental oxygen requirement in order to keep O2 saturation >92%")
Note the critical first OR:
("shortness of breath" OR "hospitalization for shortness of breath or pneumonia")
I have verified with Dr. Reiersen that this is indeed an "OR" - not an "AND". I.e. my interpretation here does not rely upon potential ambiguity in the language.
### Part 2: trial protocol RE: patient contact
This part of the argument is not strictly required to establish my claim that the trial is not bulletproof. But I believe it highlights another potential pitfall in the trial.
If we take a look at Supp 1 (trial protocol and statistical analysis plan) in the JAMA article, we find this:
"If a study participant develops a decrease in oxygen saturation to less than 90% on room air on >2 readings, persistent increase in respiratory rate to >30 breaths per minute, persistent increase in Heart Rate to > 120 beats per minute, alteration in mentation, or severe worsening in shortness of breath, the research staff will direct them to seek emergency medical care at the nearest emergency department. If none of the above conditions are met, but the research staff still feel that the participant is unwell, one of the physician investigators will evaluate the participant via phone/telehealth and direct them for further care if needed. When a participant is directed to seek emergency medical care, they will be instructed to use a mask if available, and to identify themselves to EMS or to the Emergency Department staff as having been diagnosed with COVID-19."
In particular, note the passages:
"..the research staff will direct them to seek emergency medical care..if none of the above conditions are met, but the research staff still feel that the participant is unwell, one of the physician investigators will evaluate the participant via phone/telehealth and direct them for further care if needed."
So let me sketch out what the JAMA paper and supplements establish in terms of how the trial was operated:
A. Patients are sent study materials and study drug. Patients self-report using a REDCap survey two times a day.
B. The exact phrasing of the survey is not included in the JAMA article or supplement. However, we can infer that the following details must have been asked during the 2x a day survey:
i) A shortness of breath ranked scale, e.g. a question that is probably like "How would you rank your shortness of breath on a scale of 0 - 10?"
ii) SpO2 reading numbers
iii) vital signs
iv) medication adherence
v) COVID-19 symptoms (likely a free form text box)
(Source: second to last paragraph under "Study Design" in the JAMA article)
So participants fill out this survey 2x a day. It asks them to measure their SpO2, report some vitals (like temperature and perhaps BP [unclear but not relevant]), report their ranked SOB score (0-10), and then describe their symptoms in a free-form way. *Then* the study staff monitor these reports. As detailed in the protocol, if study staff see any of the following, they contact the participant:
A. SpO2 < 90% more than twice
B. Fast respiratory rate
C. Heart rate > 120 BPM
D. Alteration in mentation - very open to interpretation, but extreme anxiety, confusion, etc would likely qualify
E. "Severe worsening in shortness of breath" (defined separately from the above)
F. Catch-all: "If none of the above conditions are met, but the research staff still feel that the participant is unwell, one of the physician investigators will evaluate the participant via phone/telehealth and direct them for further care if needed"
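As a rough sketch of how criteria A-F map onto the two actions described in the protocol passage, the logic can be written out as below. The field names and return labels are hypothetical (the paper does not publish the survey or the staff workflow), and "persistent" is reduced to a yes/no flag purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Summary of one participant's recent self-reports (hypothetical fields)."""
    low_spo2_readings: int                # A: count of room-air readings below 90%
    persistent_resp_rate_over_30: bool    # B
    persistent_heart_rate_over_120: bool  # C
    altered_mentation: bool               # D
    severe_sob_worsening: bool            # E
    staff_feel_unwell: bool               # F: the catch-all judgment call


def staff_action(r: Report) -> str:
    """Return the action the protocol text attaches to the report."""
    # Criteria A-E: participant is directed to emergency medical care.
    if (
        r.low_spo2_readings > 2
        or r.persistent_resp_rate_over_30
        or r.persistent_heart_rate_over_120
        or r.altered_mentation
        or r.severe_sob_worsening
    ):
        return "direct_to_emergency_care"
    # Criterion F: a physician investigator evaluates by phone/telehealth.
    if r.staff_feel_unwell:
        return "physician_phone_evaluation"
    return "no_contact"
```

Written this way, it is easy to see that only the first input is a device reading; the rest depend on what the participant reports or how staff interpret it.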
If any of A-F occur on the 2x a day surveys, the study staff will:
"..the research staff will direct them to seek emergency medical care at the nearest emergency department"
and
"..one of the physician investigators will evaluate the participant via phone/telehealth and direct them for further care if needed" - applies only to the catch-all (F) above.
#### Part 2A
So let's look at the defined "clinical deterioration" again:
"Clinical deterioration" = ("shortness of breath" OR "hospitalization for shortness of breath or pneumonia") AND ("decrease in O2 saturation (<92%) on room air and/or supplemental oxygen requirement in order to keep O2 saturation >92%")
Let's go over who is and isn't included in this definition:
A participant who has:
(SOB) AND (SpO2 < 92%)
qualifies. This participant **does not need to be admitted to the hospital**.
A participant who has:
(SpO2 < 92%) AND (hospitalization for shortness of breath or pneumonia)
qualifies.
A participant who has:
(SpO2 < 92%)
does NOT qualify as clinically deteriorated.
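The three cases above follow directly from the OR / AND structure of the definition. A minimal sketch, with argument names of my own choosing rather than anything taken from the trial materials:

```python
def clinical_deterioration(
    sob: bool,
    hospitalized_for_sob_or_pneumonia: bool,
    spo2_below_92: bool,
    needs_supplemental_o2: bool,
) -> bool:
    """Primary outcome as worded in the JAMA paper: (1) AND (2)."""
    respiratory_clause = sob or hospitalized_for_sob_or_pneumonia  # criterion (1)
    oxygen_clause = spo2_below_92 or needs_supplemental_o2         # criterion (2)
    return respiratory_clause and oxygen_clause


# SOB plus low SpO2, never admitted to the hospital
print(clinical_deterioration(True, False, True, False))   # True
# Low SpO2 plus hospitalization for SOB or pneumonia
print(clinical_deterioration(False, True, True, False))   # True
# Low SpO2 alone
print(clinical_deterioration(False, False, True, False))  # False
```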
#### Part 2B
Based on the trial protocol, we can also infer that the research staff directed participants to the hospital or to further medical evaluation based on the (A-F) criteria above. This is important because "clinical deterioration" is defined not entirely based on hard outcomes (e.g. SpO2, temperature), but includes subjective data such as shortness of breath.
Looking at the criteria for contact again:
A. SpO2 < 90% more than twice
B. Fast respiratory rate
C. Heart rate > 120 BPM
D. Alteration in mentation - very open to interpretation, but extreme anxiety, confusion, etc would likely qualify
E. "Severe worsening in shortness of breath" (defined separately from the above)
F. Catch-all: "If none of the above conditions are met, but the research staff still feel that the participant is unwell, one of the physician investigators will evaluate the participant via phone/telehealth and direct them for further care if needed"
It is worth noting that all of B-F can be influenced by a personal, subjective perception of anxiety / fear. I can provide citations if needed in a follow-up, but shortness of breath, respiratory rate, heart rate, and mentation are all highly influenced by the anxiety / fear response.
NOTE: I will discuss anxiety further below - I know what you're thinking! Anxiety was measured, etc etc - please hold that thought for now
So for instance, imagine the fluvoxamine group experienced a decrease in anxiety / fear response relative to the control group. *Then* there would be *less* patient contact under the researchers' contact criteria, and *then* less hospitalization / clinical deterioration.
#### Part 2C - lack of SpO2 hard data
If we look at Supp eFigure 3, it shows us a comparison of *baseline* SpO2 measurements between the treatment and the control group.
It does *not* show us a difference in SpO2 measurements *during* the trial between the groups.
So critically, no hard outcome in terms of SpO2 difference is shown. This is unfortunate.
I have suggested to Dr. Boulware and Dr. Reiersen some ideas for how to demonstrate, in the follow up trial, a potential difference in SpO2 between the groups.
If fluvoxamine is treating COVID and is keeping people out of the hospital by improving their condition, then we might expect some difference in SpO2 between the groups.
Note that I recognize that a *simple average of the SpO2 curves between the groups will not capture this difference due to low event rate*. So a different analysis would have to be performed. But this should not be too difficult.
E.g. you could do something like:
1. Difference in SpO2 curves of participants who experienced SpO2 < 92% between groups.
There are probably many other options here. Sadly this was not predefined so anything produced will be post hoc. But I still think it would be extremely valuable.
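To make option 1 concrete, here is one possible shape for such an analysis. It is only a sketch under my own assumptions: the per-participant readings are not published, the function name is made up, and summarizing each participant by their lowest reading is just one of several reasonable choices.

```python
from statistics import mean
from typing import Mapping, Optional, Sequence, Tuple


def nadir_summary(
    readings: Mapping[str, Sequence[float]],
    threshold: float = 92.0,
) -> Tuple[int, Optional[float]]:
    """Summarize one trial arm, restricted to participants who ever dipped below `threshold`.

    `readings` maps a participant id to that participant's SpO2 readings (%).
    Returns (number of such participants, mean of their lowest readings).
    """
    nadirs = [
        min(values)
        for values in readings.values()
        if len(values) > 0 and min(values) < threshold
    ]
    return len(nadirs), (mean(nadirs) if nadirs else None)
```

Running this once per arm and comparing the two results would show both how many participants desaturated in each group and how deep the dips were, which a simple average over all participants would wash out.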
### Part 3: anxiety connection
The authors were right to analyze potential anti-anxiety effects of fluvoxamine, and as I'm sure you thought as soon as I wrote the above - they produced Supplementary eFigure 2. I think this is great. However, given the study protocol (e.g. the contact criteria) and the way the numbers actually work out (detailed below), the effect on anxiety must be considered more seriously.
You may think eFigure 2 proves there was no anti-anxiety effect of the drug on the treatment group, but I do not believe this can be proven based on the figure alone. Here's why:
1. eFigure 2 displays a scatter plot of the self-assessed anxiety score of participants over time. Due to a low event rate, we would *not* expect a graph like this to prove a lack of anti-anxiety effect of the drug. The anxiety score of most participants, who were not very sick (low event rate), is low. An average comparison like this, in this population, is not adequate. For instance, take a look at the figure. Now imagine the graph only shows those who were very sick. You can easily imagine that the graph for those who were very sick could look different from the averaged graph drawn at the bottom of the figure. Here's an example:
https://twitter.com/__philipn__/status/1367981172106567680
If we look at those with the highest level of anxiety, you'll note that the # in the control group is clearly higher as the study progresses.
2. Despite claims that the anti-anxiety effects of SSRIs take 5-6 weeks to kick in, this is not supported by the literature on fluvoxamine:
A. In "Fluvoxamine treatment of generalized social anxiety disorder in Japan: a randomized double-blind, placebo-controlled study" (https://pubmed.ncbi.nlm.nih.gov/16573847/):
* A statistically significant ~35% improvement on an anxiety scale is shown **within a week** in those with generalized social anxiety disorder.
See Figure 8.
B. In "Fluvoxamine in the treatment of panic disorder: a multi-center, double-blind, placebo-controlled study in outpatients" (https://www.sciencedirect.com/science/article/abs/pii/S0165178101002657):
* Establishes that "significant improvement was evident as early as week 1 for some panic variables"
See Figure 1, which shows clear divergence of the panic curves within 1 week.
3. Not an RCT, but I found the following paper interesting: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC167189/ - "Violent acts associated with fluvoxamine treatment" (ignore the title). If you skim through the case reports, you'll note that the participants all suffered from shortness of breath as part of their anxiety / panic disorder, which was treated with fluvoxamine.
What then to make of these SSRI papers showing the effects take several weeks to kick in? Perhaps they are measuring the average effects of the SSRI across the patient population (e.g. "how long until everyone experiencing some statistically significant effect across all characteristics treated by SSRIs" versus "how long until a statistically significant effect is measured against anxiety/panic for ").
Based on the two RCTs above, fluvoxamine could have an anti-anxiety effect in those experiencing shortness of breath / panic / anxiety within 1 week. This means, given the fluvoxamine trial protocol, the contact criteria, etc, that this potential anti-anxiety effect must be factored in.
### Part 4: analysis of statistical significance in light of potential effect on anxiety
Based on the above potential anti-anxiety effect of the treatment drug, what can we make of the figures in the trial? Let's take a look:
"Clinical deterioration occurred in 0 of 80 patients in the fluvoxamine group and in 6 of 72 patients in the placebo group (absolute difference, 8.7% [95% CI, 1.8%-16.4%] from survival analysis; log-rank P = .009)"
So clearly the figure is statistically significant based on the primary outcome. But is it possible, based on what I've outlined above, that fluvoxamine can succeed in this RCT but *not* actually be treating COVID-19? I believe it's possible. Here's how:
The authors made the excellent decision to include details on every participant that experienced "clinical deterioration", as detailed in Supplementary eResults 1. Let's go through each one, based on what we know above, and see if the results are clearly statistically significant:
* PLACEBO - Participant 1 (P1): This looks like it could be an anti-anxiety effect. Participant had:
i) SpO2 < 92% AND SOB.
ii) Went to hospital, got better as soon as he was there.
iii) Was NOT admitted to the hospital.
So we can throw this participant into the "could have anxiety effect" bucket.
* PLACEBO - P2: There was no hospitalization in this participant. Participant had:
i) SpO2 < 92%
ii) SOB.
So we can throw this participant into the "could have anxiety effect" bucket.
* PLACEBO - P3: This patient was hospitalized for 21 days and was on a vent, so
I do not believe a good argument could be made that there was an anxiety effect
here. Though it cannot be ruled out completely, I will not add this participant
to the "could have anxiety effect" bucket.
* PLACEBO - P4: This participant is a little more questionable. She was hospitalized,
but the trial staff report she was hesitant to go to the hospital. It sounds like
the trial staff strongly encouraged her to go to the hospital, and without their
contact she would not have been hospitalized. She was hospitalized for 4 days and
received supplementary O2.
It seems plausible that this participant would have recovered at home. But she was
hospitalized. So let's add her to:
* "Could have anxiety effect" bucket (tentative)
* "Was hospitalized" bucket.
* PLACEBO - P5: This participant did not meet the trial's definition for clinical
deterioration on the basis of SpO2 or SOB. Instead she had nausea, vomiting, diarrhea
and a fever. You'll note that despite having pneumonia in her CT scan at the hospital,
she had no SOB and her SpO2 >= 92%.
* "Was hospitalized" bucket.
* PLACEBO - P6: Participant was not admitted to the hospital for COVID or pneumonia. Participant did not experience SpO2 < 92%, but instead experienced SOB. Participant was given supplemental O2 while hospitalized so is counted as COVID hospitalization. Participant experienced high SOB score and that is likely what caused him to go to the hospital. So in this case, I do not think we can rule out a potential anti-anxiety effect.
* "Could have anxiety effect" bucket
* "Was hospitalized" bucket.
Going over the above participants, we now have:
* "Could have anxiety effect": 3 (P1, P2, P6), plus 1 tentative (P4)
Looking at the figures now, we have:
* Placebo: "clinical-deterioration"-without-anxiety-effect: 6 - 3 (or 6 - 4) = 3 or 2, out of 72.
* Treatment: "clinical-deterioration"-without-anxiety-effect: 0 / 80
This is:
* Placebo: 3/72 or 2/72 = 4.17% or 2.78%
* Treatment: 0%
Given these figures (see https://clincalc.com/stats/samplesize.aspx), to have 80% power, the trial would need 366 participants for the 4.17% figure. For 2.78%, the study would need 554 participants.
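For readers who want to check these numbers without the web calculator, here is a sketch using a standard normal-approximation formula for comparing two proportions with equal group sizes. I am assuming a two-sided alpha of 0.05, and I am assuming (not asserting) that this is close to what the calculator does; the function name is mine.

```python
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Unrounded sample size per arm to detect p1 vs p2 (equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    pooled_term = z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
    separate_term = z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5
    return (pooled_term + separate_term) ** 2 / (p1 - p2) ** 2


# Placebo event rates after removing possible anxiety-related events, vs 0% on treatment
for events in (3, 2):
    n = n_per_group(events / 72, 0.0)
    print(f"{events}/72 vs 0: about {n:.1f} per group, {2 * n:.0f} in total")
```

Under these assumptions the totals come out close to the 366 and 554 quoted above (the exact integers depend on how each arm is rounded), which suggests those figures are totals across both arms rather than per-group sizes.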
An important missing piece of the paper / data is how many participants experienced SpO2 < 92% but no SOB and no hospitalization-for-COVID-or-pneumonia. Because the "clinical deterioration" definition is an "AND", and the second clause of the AND operation is influenced by anxiety, this is relevant.
### Part 5: Crux of argument
The crux of my argument is that:
1. Fluvoxamine may have an anti-anxiety effect during the trial period.
2. This anti-anxiety effect may affect both whether a participant meets the definition of "clinical deterioration" and the protocol followed by study staff in terms of participant contact.
3. The above makes it more likely that those in the treatment group will NOT experience clinical deterioration (via the "SOB OR hospitalization" clause of the definition).
4. The numbers provided by the study are not strong enough to show a statistically significant effect that can rule out an anti-anxiety effect of fluvoxamine, despite the trial authors' best efforts at addressing this question.
So it is not possible for the current fluvoxamine trial to establish statistical significance modulo possible anti-anxiety effects on need for hospitalization.
-----------
How can the new RCT help? I believe that hard endpoints such as SpO2 could be tracked. I also believe that other trials, such as in patient populations with higher event rates (e.g. higher risk patients) or in early hospitalized cases, would go a long way toward making this completely clear. It is also possible that the figures from the new RCT, even if it is terminated early for meeting its primary outcome with statistical significance (say ~200 participants at interim / early termination), can be combined with the previous RCT's numbers to reach significance modulo anxiety effects.
Best-
Philip Neustrom
https://twitter.com/__philipn__