When it comes time to write the discussion of your paper, what should you say about the sample size?
In many cases it is tempting to write the familiar phrase "as the sample size was small, clearly further
research is needed."
But is that the correct thing to write?
In some cases, yes, but a small sample is not always *too small*, and knowing what to conclude
about sample size in the discussion can make a huge difference in the impact of your paper.

In this post I will talk about the three - and only three - possible conclusions you can make about sample
size, and how to know which one is correct for your paper.
As it turns out, making the correct conclusion is simple - the
*p*-values and confidence intervals will tell you what to do.

For simplicity, in this example we will consider a simple randomized parallel groups trial,
although the decision process is essentially the same for any study that reports *p*-values or
confidence intervals.

How does this work? This is a simple process with two steps:

- Look at the *p*-value
- Look at the confidence interval

Looking at these two values will give one of three possible scenarios:

- The *p*-value is small
- The *p*-value is not small but the confidence interval is narrow
- The *p*-value is not small and the confidence interval is wide
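As a rough illustration, this decision process can be sketched as a small function. The `mcid` threshold (the minimal clinically important difference - the smallest effect that would matter in practice) is a judgment call, not a statistical quantity; the thresholds used below are assumptions for illustration, not values from the papers discussed later.

```python
def classify_result(p_value, ci_low, ci_high, mcid, alpha=0.05):
    """Classify a study result into one of the three scenarios.

    mcid is the minimal clinically important difference: the smallest
    effect that would matter in practice (a judgment call, not a
    statistical quantity).
    """
    if p_value < alpha:
        return "1: the p-value is small"
    # p is not small: is the interval narrow or wide?
    if max(abs(ci_low), abs(ci_high)) < mcid:
        return "2: p not small, but the confidence interval is narrow"
    return "3: p not small, and the confidence interval is wide"

# The three situations, using numbers that appear later in this post
# (the mcid values are assumed for illustration):
print(classify_result(0.0007, 1.7, 4.0, mcid=4.0))
print(classify_result(0.77, -2.2, 3.0, mcid=4.0))
print(classify_result(0.54, -7.6, 18.0, mcid=10.0))
```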

## 1. The p-value is Small

This is the most fortunate event. The study has a significant finding.

In the case of our example parallel groups trial, if the study *p*-value is small,
then the confidence interval for the effect will not include zero, and the study thus demonstrates
that a difference likely exists between the two groups.
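This duality between a small *p*-value and a confidence interval excluding zero can be sketched under a normal approximation. The effect estimate and standard error below are hypothetical; a real trial would typically use a t-distribution and the study's actual standard error.

```python
import math

def normal_p_and_ci(effect, se, z_crit=1.96):
    """Two-sided p-value and 95% confidence interval for an effect
    estimate, using a normal approximation."""
    z = effect / se
    # Standard normal CDF via the error function
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    ci = (effect - z_crit * se, effect + z_crit * se)
    return p, ci

# A hypothetical effect of 2.85 with standard error 0.59:
p, (lo, hi) = normal_p_and_ci(2.85, 0.59)
# p falls well below 0.05, and the interval (about 1.7 to 4.0)
# excludes zero - the two statements go together.
```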

What can be concluded about the sample size? In this situation it would be wrong to conclude that the sample size was too small; it was clearly large enough to find a difference between the groups. It is also wrong to conclude that the study lacked power: the study clearly had adequate power to detect a difference. In this situation the conclusion in your discussion should be that the sample size was adequate.

For an example of this situation we will look to a paper by Alba Ripol Gallardo evaluating the effect of a high or low resource setting on simulation performance. In this study, participants were randomized to a high resource (HRS) or low resource (LRS) group and their simulation performance was recorded on the 42 point Ottawa Global Rating Scale. Residents' overall performance decreased between HRS and LRS (*p* = 0.0007; 95% CI for the effect: 1.7 to 4.0). In this case there is a clear difference between the groups, and it would be wrong to conclude that a larger sample size would be likely to change the results.

An important note here: sample size is not a marker for generalizability. If your study was very focused on a specific population, it is certainly reasonable in the discussion to suggest that the study be repeated in different settings.

## 2. The p-value is Not Small but the Confidence Interval is Narrow

This can also be a very fortunate event, and one that researchers often fail to capitalize on.

In the example of a parallel groups trial, we would see a *p*-value that is not small.
However, the confidence interval is narrow and centred around zero.
How narrow is narrow?
Simple: look at each limit of the confidence interval.
If both the upper and lower limits of the confidence interval are too small to be of any real
practical significance, then we will call the interval narrow.

What does this situation tell us? It tells us that we can be reasonably confident that the true difference between treatments is likely to be very small.

So what should you conclude in your discussion? That the study found no difference between the two groups, and that if a true difference exists, it is likely to be small.

For an example of this situation, we look to a study of
residency training in disaster medicine
led by Professor Sandy Dong.
In this study, residents were given a pre- and post-test in disaster medicine rated on a 48 point
scale.
The mean difference between post- and pre-test scores was 0.35/48.
The *p*-value for the difference between pre- and post-test scores was 0.77,
and the 95% confidence interval for the effect was -2.2 to 3.0.
This study suggests that we can be 95% confident that the true difference between pre- and post-test
scores is less than 3 points out of 48.
As the authors consider a difference of 3/48 unlikely to be of any practical significance,
it is reasonable to conclude that no practically significant difference between pre- and post-test scores exists.
In this case, repeating the study with a larger sample is unlikely to be of any benefit.
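The "narrow interval" check applied above can be sketched as a one-line test. The 4-point threshold below is an assumption standing in for the authors' judgment that a difference of about 3/48 is not practically significant.

```python
def is_narrow(ci_low, ci_high, mcid):
    """True if both confidence limits are smaller in magnitude than the
    minimal clinically important difference (mcid)."""
    return max(abs(ci_low), abs(ci_high)) < mcid

# Dong et al.: 95% CI of -2.2 to 3.0, with effects below an assumed
# threshold of ~4/48 judged practically unimportant.
print(is_narrow(-2.2, 3.0, mcid=4.0))  # True: the interval is narrow
```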

## 3. The p-value is Not Small and the Confidence Interval is Wide

This is a common situation in small studies. In this case the confidence interval is wide and contains zero effect. What is wide? Again, we look at the upper and lower limits of the confidence interval. If either of these values might be of practical significance, then the interval is wide.

In the case of a parallel groups trial, this means that the effect size may be zero, or it may in fact be important.

And the conclusion to write in your discussion? In this case you should conclude that the present study found no difference between the treatments, but, since the sample size was small, further studies with a larger sample size may be helpful to determine if a true effect exists.

As an example of this situation, we can look at a paper by Liva Christensen on the use of
video instruction for donning and doffing
of personal protective equipment.
In this case participants were randomized to two groups, receiving either in-person or video
teaching on personal protective equipment.
They were then evaluated on a 100 point scale.
There was no significant difference in the doffing score between the video group and
the instructor group (95% confidence interval for the effect: -7.6 to 18.0; *p*-value: 0.54).
In this case the *p*-value is clearly large.
Although the study found no significant difference between the groups, the confidence interval
suggests that the true difference between groups may have been as high as 18/100.
As 18/100 may have practical significance, it would be correct in this case to conclude that
further studies with a larger sample size may help clarify the true difference.
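Why would a larger sample help here? Under a normal approximation, the width of a confidence interval shrinks roughly with the square root of the sample size. A rough sketch (the group sizes below are hypothetical, not from the paper, and the point estimate is assumed unchanged):

```python
import math

def scaled_halfwidth(halfwidth, n_old, n_new):
    """Approximate confidence-interval half-width after changing the
    sample size: width shrinks roughly with 1/sqrt(n)."""
    return halfwidth * math.sqrt(n_old / n_new)

# The interval -7.6 to 18.0 is centred at 5.2 with half-width 12.8.
# Quadrupling a hypothetical sample of 25 per group to 100 per group
# roughly halves the half-width, assuming the same point estimate:
new_hw = scaled_halfwidth(12.8, n_old=25, n_new=100)
centre = 5.2
print((round(centre - new_hw, 1), round(centre + new_hw, 1)))  # (-1.2, 11.6)
```

A narrower interval like this would make it much clearer whether the true effect is practically important.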

## What About Power?

I prefer not to mention the concept of power when discussing sample size in the discussion. In an upcoming blog post, "Can We Please Stop Talking About Power", I will explain further why I feel that the concept of power is useful for planning a study, but that confidence intervals are far more effective for interpreting one.

## The Right Way to Interpret Sample Size

As you can see, talking about sample size need not be difficult. In fact, the statistics of the study indicate to the writer exactly what conclusions about sample size should be made in the discussion. Making rational and scientific conclusions about your sample size can super-charge your paper and increase its impact.

If you are looking for a graphical summary of how to make conclusions about sample size, why not download the infographic?

Sign up for our mailing list and you will get monthly email updates and special offers.