There Are Only Three Possible Conclusions About Sample Size You Can Make for Your Discussion

Aug. 10, 2020, 10:37 a.m.

When it comes time to write the discussion of your paper, what should you say about the sample size? In many cases it is tempting to write the common phrase "as the sample size was small, clearly further research is needed." But was that the correct thing to write? In some cases, yes, but a small sample size is not always too small, and knowing what to conclude about sample size in the discussion can make a huge difference in the impact of your paper.

In this post I will talk about the three - and only three - possible conclusions you can make about sample size, and how to know which one is correct for your paper. As it turns out, choosing the correct conclusion is simple - just looking at the p-value and the confidence interval will tell you what to do.

For simplicity, this post uses a simple randomized parallel groups trial as its example, although the decision process is basically the same for any study that has p-values or confidence intervals.

How does this work? This is a simple process with two steps:

1. Look at the p-value
2. Look at the confidence interval

Looking at these two values will give one of three possible scenarios:

1. The p-value is small
2. The p-value is not small but the confidence interval is narrow
3. The p-value is not small and the confidence interval is wide

1. The p-value is Small

This is the most fortunate event. The study has a significant finding.

In the case of our example using a parallel groups trial, if the study p-value is small, then the confidence interval for the effect will not include zero, and the study thus demonstrates that a difference likely exists between the two groups.
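The link between a small p-value and a confidence interval that excludes zero can be checked numerically. The following is a minimal sketch, assuming a normal approximation and a two-sided test at the 5% level. The function name `p_value_and_ci` and the input numbers are hypothetical and are not taken from any study.

```python
from statistics import NormalDist

def p_value_and_ci(diff, se, level=0.95):
    """Two-sided p-value and confidence interval for a difference between
    two groups, using a normal approximation (an assumption of this sketch)."""
    std_normal = NormalDist()
    z = diff / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    return p, (diff - crit * se, diff + crit * se)

# Hypothetical inputs: the same observed difference with two standard errors.
for se in (1.0, 2.0):
    p, (low, high) = p_value_and_ci(diff=3.0, se=se)
    excludes_zero = low > 0 or high < 0
    print(f"se={se}: p={p:.3f}, CI=({low:.2f}, {high:.2f}), "
          f"excludes zero: {excludes_zero}")
```

With the smaller standard error the p-value falls below 0.05 and the interval excludes zero; with the larger one neither happens. Under these assumptions the two statements always agree, because both come from the same estimate and standard error.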

What can be concluded about the sample size? In this situation it would be wrong to conclude that the sample size is too small. In fact, the sample size was clearly large enough to find a difference between the groups. It is also wrong to conclude that the study lacked power. The study clearly had adequate power to detect a difference. In this situation the final conclusion in your discussion must be that the study size was adequate.

For an example of this situation we will look to a paper by Alba Ripol Gallardo evaluating the effect of a high or low resource setting on simulation performance. In this study, participants were randomized to the high resource setting (HRS) or low resource setting (LRS) group and their simulation performance was recorded on the 42 point Ottawa Global Rating Scale. Residents’ overall performance decreased between HRS and LRS (P = 0.0007; 95% CI for effect 1.7 to 4.0). In this case, as there is a clear difference between the groups, it would be wrong to conclude that using a larger sample size would be likely to change the results.

Significant p-values Make Conclusions Easy

It is important to note here that sample size is not a marker for generalizability. If your study was very focused on a specific population, it is certainly OK in the discussion to suggest that the study be repeated in different settings.

2. The p-value is Not Small but the Confidence Interval is Narrow

This also can be a very fortunate event. It is often an event which researchers fail to capitalize on.

In the example of a parallel groups trial, we would see a p-value too large to be significant. However, the confidence interval is narrow and centred around zero. How narrow is narrow? Simple: look at each limit of the confidence interval. If both the upper and lower limits of the confidence interval are too small to be of any real practical significance, then we will call it narrow.
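The "narrow" check described above can be written as a one-line comparison. This is only a sketch: the function name `is_narrow` and the parameter `min_important` (the smallest difference the researcher would consider practically significant) are hypothetical names, and the threshold itself is a subject-matter judgment rather than something the statistics provide.

```python
def is_narrow(ci_low, ci_high, min_important):
    """True when both confidence limits are smaller in magnitude than the
    smallest difference that would matter in practice."""
    return abs(ci_low) < min_important and abs(ci_high) < min_important

# Hypothetical numbers: differences under 5 points are treated as unimportant.
print(is_narrow(-1.5, 2.0, min_important=5.0))  # both limits are below 5
print(is_narrow(-1.5, 8.0, min_important=5.0))  # upper limit is above 5
```

The first call reports a narrow interval and the second does not, because one of its limits reaches a difference that could matter.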

What does this situation tell us? It tells us that we can be reasonably confident that the true difference between treatments is likely to be very small.

So what should you conclude in your discussion? You should conclude that the study found no difference between the two groups, and that if there is a true difference between the groups, it is likely to be small.

For an example of this situation, we look to a study of residency training in disaster medicine led by Professor Sandy Dong. In this study, residents were given a pre and post-test in disaster medicine rated on a 48 point scale. The mean difference between post and pre-test scores was 0.35/48. The p-value for the difference between pre and post-test scores was 0.77, and the 95% confidence interval for the effect was -2.2 to 3.0. This study suggests that we can be 95% confident that the true difference between pre and post-test scores was no more than 3 points out of 48. As the authors consider that a difference of 3/48 is unlikely to be of any practical significance, it is reasonable to conclude that no significant difference between pre and post-test scores exists. In this case repeating the study on a larger sample is unlikely to be of any benefit.
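To see how small these limits are, it can help to express each one as a share of the scale. The sketch below uses the confidence interval and the 48 point scale reported above. The threshold `MIN_IMPORTANT` is a hypothetical value chosen only for illustration; the study description gives no threshold beyond saying that 3/48 is unlikely to matter.

```python
SCALE_MAX = 48               # the test was scored out of 48
ci_low, ci_high = -2.2, 3.0  # reported 95% confidence interval
MIN_IMPORTANT = 4.8          # hypothetical threshold: 10% of the scale

for name, limit in (("lower", ci_low), ("upper", ci_high)):
    share = 100 * abs(limit) / SCALE_MAX
    print(f"{name} limit: {limit:+.1f} points ({share:.1f}% of the scale)")

# The interval is narrow when its largest limit stays under the threshold.
largest = max(abs(ci_low), abs(ci_high))
print("narrow:", largest < MIN_IMPORTANT)
```

Under this assumed threshold both limits fall well short of a difference that would matter, which matches the conclusion that a larger study is unlikely to help.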

3. The p-value is Not Small and the Confidence Interval is Wide

This is a common situation in small studies. In this case the confidence interval is wide and contains zero effect. What is wide? Again we look at the upper and lower limits of the confidence interval. If either of these values might be of practical significance, then the interval is wide.

In the case of a parallel groups trial, this means that the effect size may be zero, or it may in fact be important.

And the conclusion to write in your discussion? In this case you should conclude that the present study found no difference between the treatments, but, since the sample size was small, further studies with a larger sample size may be helpful to determine if a true effect exists.

As an example of this situation, we can look at a paper by Liva Christensen on the use of video instruction for donning and doffing of personal protective equipment. In this case participants were randomized to two groups, receiving either in-person or video teaching on personal protective equipment. They were then evaluated on a 100 point scale. There was no significant difference in the doffing score between the video group and the instructor group (95% confidence interval for effect: -7.6 to 18.0; P-value: 0.54). In this case the p-value is clearly large. Although the study found no significant difference between the groups, the confidence interval suggests that the true difference between groups may have been as high as 18/100. As 18/100 may have practical significance, it would be correct in this case to conclude that further studies with a larger sample size may help clarify the true difference.
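The same arithmetic shows why this interval counts as wide, and gives a rough idea of how much larger a repeat study might need to be. This is a sketch under stated assumptions: `MIN_IMPORTANT` and `TARGET_HALF_WIDTH` are hypothetical values, and the sample size estimate assumes that a repeat study would have similar variability, so that the interval width shrinks roughly with the square root of the sample size.

```python
ci_low, ci_high = -7.6, 18.0  # reported 95% CI on the 100 point scale
MIN_IMPORTANT = 10.0          # hypothetical smallest difference that matters
TARGET_HALF_WIDTH = 5.0       # hypothetical precision wanted from a repeat study

# Wide: at least one limit reaches a difference that could matter in practice.
is_wide = max(abs(ci_low), abs(ci_high)) >= MIN_IMPORTANT
half_width = (ci_high - ci_low) / 2

# CI width scales roughly with 1 / sqrt(n), so n scales with the squared ratio.
sample_multiplier = (half_width / TARGET_HALF_WIDTH) ** 2

print("wide:", is_wide)
print(f"half-width: {half_width:.1f} points")
print(f"approximate sample size multiplier: {sample_multiplier:.1f}")
```

The multiplier is only a back-of-the-envelope guide. A formal sample size calculation for the follow-up study would still be needed.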

What About Power?

I prefer not to mention the concept of power when discussing sample size in the discussion. In an upcoming blog post called "Can We Please Stop Talking About Power" I will explain further why I feel that the concept of power is useful for planning a study, but that confidence intervals are far more effective for interpreting it.

The Right Way to Interpret Sample Size

In this article you can see clearly that talking about sample size need not be difficult. In fact, the statistics of the study clearly indicate to the writer exactly what conclusions about sample size should be made in the discussion. Making rational and scientific conclusions about your sample size can super-charge your paper and increase your impact.
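The whole decision process can be sketched as one small function. The function name `sample_size_conclusion`, the 0.05 cut-off for a "small" p-value, and every `min_important` value below are assumptions made for illustration; the p-values and confidence intervals are the ones quoted in the three examples above.

```python
def sample_size_conclusion(p_value, ci_low, ci_high, min_important, alpha=0.05):
    """Return which of the three conclusions about sample size applies."""
    if p_value < alpha:
        return "adequate: a difference was detected"
    if max(abs(ci_low), abs(ci_high)) < min_important:
        return "adequate: any true difference is likely to be small"
    return "inconclusive: a larger study may help"

# (label, p-value, CI low, CI high, hypothetical min_important)
examples = [
    ("high vs low resource setting", 0.0007, 1.7, 4.0, 2.0),
    ("disaster medicine pre/post-test", 0.77, -2.2, 3.0, 4.8),
    ("video vs in-person PPE teaching", 0.54, -7.6, 18.0, 10.0),
]

for label, p, low, high, threshold in examples:
    print(f"{label}: {sample_size_conclusion(p, low, high, threshold)}")
```

Note that the threshold is never consulted when the p-value is small, which mirrors the order of the two steps: look at the p-value first, and only then at the confidence interval.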

If you are looking for a graphical summary of how to make conclusions about sample size, why not download the infographic?