**1. Introduction**

This document offers guidance on target sample sizes to organisations conducting public surveys as part of the Impact & Insight Toolkit.

The target sample sizes included are specific to the mandatory dimensions in Toolkit surveys. For that reason, the recommended sample sizes may differ from other sample size guidance, which relates to different survey questions.

**2. Why is sample size important?**

Normally it is too expensive or impractical to collect data from every audience member. Instead, you survey a sample of audience members. A sample is a representative portion of a population you are interested in – in this case, the total audience. You use data from your sample to estimate attributes of the audience as a whole.

A general sampling principle is to survey as many people as possible with the resources available to you. Why is this? If you sample a small group of people, it is possible that, by coincidence, only people who had very positive or very negative feelings about the event are surveyed. Data from this sample would give an inaccurate and unfair representation of the event. The more people who are surveyed, the smaller the chance that this will happen, and the more accurately data from your sample will represent how those who attended felt.

So the larger the sample size, the better – but there comes a point where increasing the sample size doesn’t do a great deal more to improve the robustness of the results. The art of sampling is to aim for a sample size that gives you a level of confidence in the results that you are happy with – without spending too much time or money.

**3. Choosing an appropriate sample size**

To choose an appropriate sample size, you need to consider the total audience size, the types of things you want to estimate and what ‘margin of error’ you are comfortable with around your estimates.

For example, suppose your total audience size is 500 and you want to estimate the mean average score awarded by audience members for the dimension ‘Captivation’. You can decide how many audience members to sample by deciding what margin of error to aim for. The margin of error is a statistical measure of how confident you are in your estimate of the mean Captivation score. It shows how close you think the average score of your sample is to the ‘true’ average score of all 500 people who experienced the work.

If your sample produces an estimated mean Captivation score of 70 with a margin of error of 10%, then it is likely that if you surveyed all 500 audience members you would get a ‘true’ mean Captivation score of between 63 and 77.

If your sample produces an estimated mean Captivation score of 70 with a margin of error of 5%, then it is likely that if you surveyed all 500 audience members you would get a ‘true’ mean Captivation score of between 66.5 and 73.5.
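The arithmetic behind the two examples above can be sketched in a few lines of Python. This is a minimal illustration, assuming (as in the examples) that the margin of error is expressed as a percentage of the estimated mean. The function name `score_range` is hypothetical and is not part of the Toolkit.

```python
def score_range(estimated_mean, margin_of_error_pct):
    """Return the (lower, upper) range implied by a percentage margin of error."""
    # The margin of error is taken relative to the estimated mean score.
    half_width = estimated_mean * margin_of_error_pct / 100
    return estimated_mean - half_width, estimated_mean + half_width


print(score_range(70, 10))  # (63.0, 77.0)
print(score_range(70, 5))   # (66.5, 73.5)
```

Halving the margin of error halves the width of the range around the estimated score.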

The table below shows the minimum sample sizes required for Impact & Insight Toolkit evaluations to achieve different margins of error for different total audience sizes. These numbers are indicative: the actual margin of error calculated from a real event is likely to differ from the figures provided here.

We recommend aiming for a 5% margin of error for most evaluations. However, if this is not feasible with the resources available to you then an 8% or 10% margin is still sufficient.

**4. Sample size for in-depth analysis**

The table above gives minimum sample sizes required to estimate the mean dimension scores of the total audience with different margins of error. However, you may want to carry out more detailed demographic analysis of dimension scores. For example, you may want to explore whether one’s gender affects the way the work is experienced.

In this case, you would calculate the mean average score awarded for Captivation by:

• All those who identify as female in your sample

• All those who identify as male in your sample

• All those who identify in another way in your sample

You would then test to see if the differences between the mean averages were significant.

You would want to achieve an appropriate margin of error around all your estimated mean Captivation scores. You would need sufficient and representative numbers of those of different genders in your sample – which would mean a much larger sample size overall.

If you are evaluating an event where you would like to analyse dimension scores by gender, age or any other demographic variables, please contact support@countingwhatcounts.co.uk to discuss what sample size to aim for.

**5. Evaluating small events**

If you are evaluating a small event with a total audience size of around 50, say, then it may be difficult to achieve the minimum sample size of 23 recommended in the table above. Achieving a sample size of 23 from a total audience of 50 would mean achieving a survey response rate of 46%, which may not be realistic.
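As a quick check of the figure above, the response rate needed to reach a target sample size is simply the target divided by the total audience. The helper name `required_response_rate` below is hypothetical and used only for illustration.

```python
def required_response_rate(target_sample_size, audience_size):
    """Share of the audience that must respond to reach the target sample size."""
    return target_sample_size / audience_size


print(f"{required_response_rate(23, 50):.0%}")  # 46%
```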

If you are evaluating a small event and cannot achieve the sample size recommended here, then you can still derive meaning from your evaluation results. For example, suppose you receive 10 completed surveys from a total audience of 50. You may not be able to estimate the average dimension scores of your total audience with much statistical confidence, but you can describe the experiences and characteristics of those 10 people, who make up a fifth of your audience – particularly if your survey included some open text questions to allow respondents to explain their thoughts and feelings about the event.

Please contact support@countingwhatcounts.co.uk if you would like help interpreting data collected through the evaluation of small events.

**Technical Appendix – Method to calculate margin of error**

The Sample Size Guidance shows how many people to sample from a given audience size to achieve a given margin of error around estimates of mean average dimension scores. To produce this guidance, it was necessary to calculate the margin of error that would be achieved by taking different sizes of sample from a given total audience size.

The margin of error associated with different sample sizes was calculated using an aggregate dataset containing all the scores awarded to all the dimensions by all the public survey respondents in the Toolkit project to date. This dataset contains many thousands of dimension scores.

A set of example audience sizes was used: 50, 100, 250, 500 and 1000. For a given audience size, and for a given dimension, a ‘dummy audience’ was created from the aggregate dataset. Samples of different sizes were taken from that dummy audience and the margin of error calculated each time. For each audience size, the margin of error was calculated for every sample size possible for that audience. So for an audience size of 50, samples of 1, 2, …, 49 people were taken.

This process was replicated many times to find the ‘typical’ margin of error achieved by taking a given sample size from a given audience size for a given dimension. This type of procedure, where data is resampled over and over again, is known as a ‘bootstrap method’.

The exact process to calculate the margin of error for a given audience size/sample size/dimension combination is:

1. Take a dummy audience from the set of all responses for that dimension using random sampling **with replacement**. Find the mean of this dummy audience; this is the dummy population mean.

2. Take a sample from this dummy audience using random sampling **without replacement** and find the mean of the sample.

3. Find the difference between the sample mean and the dummy population mean.

4. Repeat steps 2 and 3 a large number of times, recording the difference between the sample mean and the dummy population mean each time.

5. Convert the differences recorded above into margins of error by dividing each difference by the dummy population mean and taking the absolute value.

6. Remove the largest 5% of the margins of error from the list, reflecting a 95% confidence level, and take the largest remaining margin of error.

7. Repeat steps 1 to 6 a number of times (more than 50), generating a new dummy audience each time, and take the mean margin of error from all repeats.

The output of this process is a table of margins of error for each sample size for the selected audience size.
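The seven steps above can be sketched in Python using only the standard library. This is an illustrative sketch rather than the code used to produce the guidance: the function name, the parameter names and the default numbers of repeats are assumptions, and the scores in the usage example are synthetic, not Toolkit data. It also assumes the dummy population mean is never zero.

```python
import random
import statistics


def margin_of_error(all_scores, audience_size, sample_size,
                    n_samples=1000, n_audiences=51, confidence=0.95, rng=None):
    """Estimate the typical relative margin of error for one dimension."""
    rng = rng or random.Random()
    per_audience = []
    for _ in range(n_audiences):  # Step 7: repeat with a new dummy audience
        # Step 1: draw a dummy audience WITH replacement and find its mean.
        audience = rng.choices(all_scores, k=audience_size)
        population_mean = statistics.fmean(audience)
        errors = []
        for _ in range(n_samples):  # Step 4: repeat steps 2 and 3 many times
            # Step 2: draw a sample WITHOUT replacement and find its mean.
            sample = rng.sample(audience, sample_size)
            # Steps 3 and 5: difference from the dummy population mean,
            # expressed as an absolute proportion of that mean.
            difference = statistics.fmean(sample) - population_mean
            errors.append(abs(difference / population_mean))
        # Step 6: discard the largest 5% and keep the largest that remains.
        errors.sort()
        keep = max(1, int(len(errors) * confidence))
        per_audience.append(errors[keep - 1])
    # Step 7: average over all dummy audiences.
    return statistics.fmean(per_audience)


# Synthetic scores for illustration only (not Toolkit data).
rng = random.Random(0)
scores = [rng.randint(40, 100) for _ in range(5000)]
for n in (10, 25, 40):
    print(n, round(margin_of_error(scores, 50, n, n_samples=200, rng=rng), 3))
```

Running the function for every sample size from 1 up to one less than the audience size would give one row of margins of error per sample size, as described above.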

Public surveys carried out as part of the Toolkit project normally contain all six of the dimension questions shown in Table 1. The recommended sample size required to achieve a given margin of error was taken as the sample size at which that margin of error was achieved for all six dimensions.

For example, Table 2 shows how, for an audience size of 50, the sample size required to achieve a margin of error of 5% is 40. This is because the Challenge dimension only achieves a margin of error of 5% at that sample size.
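The selection rule described above, taking the smallest sample size at which every dimension meets the target margin of error, can be sketched as follows. The function name is hypothetical, and the margins in the example are made-up placeholder values chosen only to mirror the Challenge example; they are not the figures from Table 2.

```python
def recommended_sample_size(margins_by_dimension, target):
    """Smallest sample size at which every dimension meets the target margin.

    margins_by_dimension maps a dimension name to a dict of
    {sample_size: margin_of_error}. Returns None if no sample size qualifies.
    """
    # Only consider sample sizes for which every dimension has a value.
    common_sizes = set.intersection(
        *(set(margins) for margins in margins_by_dimension.values())
    )
    for n in sorted(common_sizes):
        if all(margins[n] <= target for margins in margins_by_dimension.values()):
            return n
    return None


# Placeholder margins for an audience of 50 (illustrative values only).
example = {
    "Captivation": {30: 0.060, 35: 0.045, 40: 0.030},
    "Challenge": {30: 0.080, 35: 0.060, 40: 0.050},
}
print(recommended_sample_size(example, target=0.05))  # 40
```

In this made-up example Captivation already meets the 5% target at a sample of 35, but the recommendation is 40 because Challenge only reaches 5% at that point.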

You can read more on the subject of sample size in our blog here.