Main

The United Kingdom has been severely affected by the COVID-19 pandemic, recording one of the highest confirmed death rates in the world in 2020. The NHS COVID-19 app for England and Wales was launched on 24 September 2020 to help to reduce the spread of the virus. The app has been downloaded on 21 million unique devices, out of a population of 34.3 million eligible people with compatible smartphones, and is regularly used by at least 16.5 million people. The main function of the app is digital contact tracing1,2,3,4,5,6 using the privacy-preserving Google Apple Exposure Notification system, which is embedded in the Android and iOS operating systems8,9, supplemented with custom Bluetooth processing algorithms10. App users are notified and instructed to quarantine if they have been in contact with another user who is later confirmed to have COVID-19, provided that the exposure had characteristics that exceed a risk threshold. Digital tracing is a novel public health measure with unknown epidemiological impact7. Other functions of the NHS app include providing locally appropriate information on COVID-19 prevention, checking into venues using a custom QR-code scanner (allowing later notification if users have visited risky venues), and a symptom checker linked to the booking of tests. For tests booked through the app, the test result triggers a set of actions automatically, including notification of the tested individual through the app and digital contact tracing for positive results (upon the user’s approval).

When installing the app, users enter their postcode district (the first half of the postcode), which enables analysis of geographical variation in app use. We aggregated data at the level of lower tier local authorities (LTLAs), of which there are 338 in England and Wales, to match case data. App uptake—the fraction of active users in the population—was variable between LTLAs (Fig. 1a, c), with an interquartile range of 24.2–32.4%. We defined three phases for the analysis, annotated in Fig. 1d: phase 0 before app launch, phase 1 from 1 October to early November 2020 (first version of the app) and phase 2 from early November to 31 December 2020 (improved version of the app). These are described in greater detail in Extended Data Table 1. Phases in the app precede phases in the resulting cases: there is a lag between changes in transmission rates and changes in confirmed cases, which we assumed to be 8 days. Other factors besides the app changed during these phases, including locally targeted control measures, a national lockdown and a surge in cases in December, mostly driven by the new SARS-CoV-2 variant B.1.1.7.

Fig. 1: Geographical variability of app uptake and cases of COVID-19.

a, c, Map (a) and histogram (c) of app uptake by LTLA. Colours in a indicate app uptake as shown in c. b, Cumulative cases of COVID-19 per 100,000 population over analysis phases 1 and 2. d, Seven-day rolling mean of daily cases of COVID-19 per 100,000 population. Each line represents an LTLA, coloured by app uptake as shown in c. Values for England and Wales are also shown. Black horizontal arrows indicate our analysis phases. In b, d, case numbers are for the whole LTLA population, not just app users.

Roughly 1.7 million notifications were sent as a result of 560,000 app users testing positive over the entire time period: a mean number of notifications per index case of 3.1. Seventy-two per cent of app-using index cases consented to digital tracing upon testing positive, resulting in a mean number of notifications per tracing event of 4.2. Numbers of notifications over time are shown in Extended Data Fig. 1b.
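The two means above can be checked from the rounded totals quoted in the text. The following is a minimal sketch; the variable names are ours, and because the inputs are rounded, the first ratio comes out slightly below the reported 3.1.

```python
# Rounded totals quoted in the text; exact counts are not given here.
notifications = 1_700_000
index_cases = 560_000
consent_fraction = 0.72

# Mean notifications per app-using index case (all index cases).
per_index_case = notifications / index_cases

# Only consenting index cases trigger a tracing event.
consenting_cases = index_cases * consent_fraction
per_tracing_event = notifications / consenting_cases

print(f"per index case: {per_index_case:.2f}")
print(f"consenting index cases: {consenting_cases:,.0f}")
print(f"per tracing event: {per_tracing_event:.2f}")
```

The number of consenting index cases computed here is the 'roughly 400,000' used later when comparing cases averted with the number of tracing events.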

We estimated the secondary attack rate (SAR) in individuals notified by the app; this is the probability that someone who is notified will report a positive test during the recommended quarantine or in the following two weeks. We estimated an SAR of 6.02% (confidence interval 5.96–6.09%), although sensitivity analyses suggest that the estimate is precise only to within roughly 5–7%. These results indicate that the app is functioning at a technical level, as also recently demonstrated for the Swiss and Spanish apps11,12.

To evaluate the epidemiological impact of the app, we first used a modelling approach. We estimated the number of cases averted with a model linking the number of notifications, the probability that notified individuals had COVID-19, the timing of notification relative to transmission, and the adherence to quarantine. Adherence to quarantine is critical but difficult to assess reliably. UK surveys found that only 11% of individuals in quarantine declared proper adherence to quarantine rules, but 65% of individuals intended to adhere to quarantine13, albeit imperfectly. Recent surveys found a high adherence to quarantine (greater than 80%)14, and this behaviour may be more representative of app users. We considered an intermediate scenario corresponding to 61% overall effectiveness of quarantine in preventing transmissions as our central estimate, leading to 284,000 cases averted. The estimated number of cases averted was higher in areas of high app uptake (Fig. 2). The slope of the regression in Fig. 2b indicates that the fraction of cases averted (among all cases observed or averted) increased by 0.8% for each 1% increase in app uptake (Table 1).

Fig. 2: The link between app use and cases averted in each LTLA.

a, b, Estimated number (a) and percentage (b) of cases averted in phases 1 and 2 combined versus number of app users. c, Unadjusted relationship between difference in app uptake and difference in number of cases per capita in phases 1 and 2 combined. In b, c, the blue line shows the least-squares fit of the y-axis variable to the x-axis variable, and the shaded grey area shows the associated 95% confidence interval.

Table 1 The estimated effect of the NHS COVID-19 app

We used a second approach to evaluate the epidemiological impact of the app, linking variation in app uptake between LTLAs with variation in cumulative cases. We addressed strong confounding factors with a stratified approach, only comparing LTLAs with similar socio-economic properties and geography. We used several different ways of grouping LTLAs into comparable units, with similar results; one method is described below (with full results in Extended Data Table 2) and the other methods are described in the Supplementary Information (their results are presented in Extended Data Fig. 2, Extended Data Tables 3, 4).

Increased app use is associated with more rural areas, less poverty and greater local gross domestic product (GDP) (Supplementary Table 4); we therefore adjusted for these measured confounding variables. Unmeasured confounders could include adherence to social distancing and face-mask use; since these factors affected transmission before app release, app uptake should have some correlation with case numbers even before app release (phase 0). To test this, we regressed phase 0 case numbers on several covariates, including later uptake of the app; app uptake was indeed associated with case numbers (pure confounding). To adjust for this confounding, we stratified LTLAs into quintiles on the basis of the number of cases in phase 0 and compared them only within these strata. This stratification removed the correlation between app uptake and pre-app cases, indicating that this at least partially adjusted for unmeasured confounders (Extended Data Table 2; details on confounding and placebo regression are in the Supplementary Information). Case numbers in an LTLA are also confounded by those in neighbouring LTLAs; we therefore compared only neighbouring (adjacent) LTLAs. We found that the difference in case numbers per capita between neighbouring LTLAs, matched by phase 0 case number quintile, was strongly and robustly associated with differences in app use, regardless of adjustment for other demographic confounders (Fig. 2, Table 1, Extended Data Table 2).

Disaggregating the effect by phase, we found that it was larger during phase 2 (Table 1). This is consistent with the increased number of notifications sent per index case implemented at the start of phase 2 (Extended Data Fig. 1b). Table 1 shows the estimated effect size replicated in different statistical analyses (described in the Supplementary Information).

We estimated the numbers of cases averted during phases 1 and 2 combined: 284,000 (108,000–450,000) using the modelling approach, and 594,000 (317,000–914,000) using the statistical approach. The ranges show a sensitivity analysis exploring 2.5–97.5% of the variability in modelling estimates, and a 95% confidence interval for the statistical one. These estimates are comparable to the number of app users who tested positive and consented for notifications to be sent: roughly 400,000. This suggests that on average, each confirmed COVID-19-positive individual who consented to notification of their contacts through the app prevented one new case; that is, the whole transmission chain following each such case was reduced by one individual. We translated these estimates to deaths averted during phases 1 and 2 using the case fatality rate observed for this period: 1.47% (Methods). This gave an estimate of 4,200 (1,600–6,600) deaths averted using the modelling approach, and 8,700 (4,700–13,500) using the statistical approach. For comparison, the total number of cases and deaths that actually occurred in this period were 1,892,000 and 32,500, respectively. Cases averted over this period are shown in Extended Data Fig. 3.
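The conversion from cases averted to deaths averted, and the comparison with the number of consenting index cases, is simple arithmetic. A minimal sketch using only the rounded figures quoted above (the function and variable names are ours) reproduces the reported values to within rounding of the inputs:

```python
# Figures quoted in the text: (central estimate, lower bound, upper bound).
cases_averted = {
    "modelling": (284_000, 108_000, 450_000),
    "statistical": (594_000, 317_000, 914_000),
}
case_fatality_rate = 0.0147       # crude CFR for phases 1 and 2 (Methods)
consenting_index_cases = 400_000  # approximate, as stated in the text


def deaths_averted(cases, cfr=case_fatality_rate):
    """Deaths averted = cases averted x crude case fatality rate."""
    return cases * cfr


for approach, estimates in cases_averted.items():
    central, low, high = (deaths_averted(c) for c in estimates)
    per_index = estimates[0] / consenting_index_cases
    print(f"{approach}: {central:,.0f} deaths averted ({low:,.0f}-{high:,.0f}); "
          f"{per_index:.2f} cases averted per consenting index case")
```

The per-index-case ratios fall on either side of one, which is the basis for the statement that each consenting index case prevented about one new case on average.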

Finally, we extrapolated the findings to explore different ways in which the app could be improved, by re-running scenarios with different parameters (Table 2). These are retrospective projections; however, the expected reductions in cases are relevant when considering forward projections.

Table 2 Scenarios for improvements

Discussion

Our analyses suggest that a large number of cases of COVID-19 were averted by contact tracing through the NHS app, with estimates ranging from approximately 100,000 to 900,000, depending on the details of the analysis. For comparison, there were 1.9 million actual cases of COVID-19 over the same period. Averted cases were concentrated in phase 2, during November and December 2020, after a major upgrade to the app’s risk-scoring function10. This finding is similar to previous results from modelling: using our individual-based model15, a 30% app uptake was estimated to avert approximately 1 infection for every 4 infections that arose4 during 4.5 months.

Although it is informative to estimate effects on quantities such as the time-varying reproduction number16, we did not pursue such an analysis here. The dynamics of the epidemic for individual LTLAs are difficult to interpret: the period of analysis coincided with staggered introductions of locally targeted restrictions, a short national lockdown, the Christmas holiday season and the emergence of the B.1.1.7 SARS-CoV-2 variant, which is more infectious and spread rapidly across the country17,18,19,20. Future work could perhaps model all of these effects in a single hierarchical model, permitting joint estimation of the app’s effects over LTLAs with linked drivers and dynamics. Our simpler approaches have the benefit of transparency, and we hypothesize that under negative-feedback dynamics (greater local spread triggering greater local control measures), appropriately constructed comparisons of total case counts over an appropriate period may reveal the underlying propensity for disease spread.

The main limitation of our analysis is that it is an observational study: no randomized or systematic experiment resulted in different rates of app uptake in different locations. Interpreting observational analyses requires particular care owing to the risk of confounding. We therefore used two approaches: mechanistically modelling the app’s function, and a statistical approach. Our statistical approach was stratified to focus only on differences between directly comparable areas, emulating how a cluster randomized trial would have been conducted21. Our placebo analysis suggested that our adjustment for confounders largely removed their effect; however, it is still possible that changes in app use over time and across geographies reflect changes in other interventions, and that our analysis incorrectly attributes the effects to the app. Such residual confounding, if present, would mean that our statistical estimate for cases averted is too high and thus our modelling estimate is more accurate. Conversely, there could be a genuine, albeit indirect effect of the app, whereby users maintain a greater distance from others than they otherwise would have done, being aware that the app monitors distance and could later advise quarantine. This would mean that our modelling estimate (derived solely from the app’s direct effect, proportional to the SAR) is too low, and that our statistical estimate is more accurate. On balance, an effect size between the two estimates seems most likely. We discuss the expected effects of further biases in the Supplementary Information.

The app is best understood as part of a system of non-pharmaceutical interventions, and not in isolation7. It is not a substitute for social distancing or face masks: control of the epidemic requires all available interventions to work together. Isolation and quarantine can only be effective when supported financially. All contact tracing requires identification of cases, and is therefore a follow-up to effective, widespread and rapid testing. The specific role of digital tracing is to speed up tracing, and to reach more people per index case. An advantage of the NHS app compared with other digital tracing apps is its full integration with testing: tests ordered through the app trigger actions automatically, without requiring the user to enter their results in the app. Further improvement could potentially be achieved with increased use of location-specific QR-code scanning: notifications were issued for 226 venue events designated as risky as of 20 January 2021. Backwards contact tracing22 could help to identify risky venue events. The COVID-19 response policy concerning the hospitality sector—restaurants, pubs and so on—required visitors to ‘check in’ to facilitate outbreak analysis when needed. Scanning the venue QR code with the app provided a more convenient way of doing this than writing the contact details of the individual, and hence individuals visiting such venues may have been more likely than average to use the app, giving a greater epidemiological effect than expected.

Digital tracing is not a substitute for manual tracing—both are valuable. We compare the two approaches in Supplementary Table 1. In summary, the SAR of 6% that we estimated for the app is similar to the SAR of 6.9% for manual contact tracing of close contacts during December 2020 and January 202123. The mean number of contacts traced per consenting index case was 4.2 for digital tracing, compared with 1.8 for manual tracing, and a larger fraction of these traced contacts is expected to be outside of the household of the index case for the app. Contacts outside the household have a smaller probability of having already been notified informally by the index case, and so obtain greater benefit of having been traced. This increased coverage and the speed of notification by the app (Extended Data Fig. 4) suggest that the effect of digital tracing was mostly additional to that of manual tracing. We confirmed this with an analysis that included adjustment for quality of manual tracing, which did not affect our conclusions.

The surest ways to increase the effectiveness of the digital tracing programme are to increase uptake of the app and to provide material support to individuals undergoing isolation and quarantine. Special efforts may be needed to reach underserved communities. It is well established that testing should be as rapid as possible to help to prevent transmission. This could perhaps be facilitated by point-of-care antigen tests and integration of self-testing with the app; however, this would need investigation to establish accuracy and usability. Widespread vaccination will eventually reduce the need for non-pharmaceutical interventions, but vaccination is unlikely to achieve global reach within the coming months, during which time improved non-pharmaceutical interventions could still prevent many infections24,25. Smartphone use is already global, and thus privacy-preserving contact-tracing apps should be further integrated into the public health toolkit.

Methods

No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Estimating app uptake

To monitor the safe function of the app and enable its evaluation, a limited amount of data are shared with a secure NHS server. Each active app sends a single data packet daily. The fields in these packets contain no sensitive or identifying information, and are approved and publicly listed by the Information Commissioner (https://www.gov.uk/government/publications/nhs-covid-19-app-privacy-information). The raw data fields we used are described in Supplementary Table 2; further variables derived from these are described in Supplementary Table 3. A schematic illustration of data gathering is shown in Extended Data Fig. 5. For the reported numbers of downloads, repeat downloads to the same phone are counted only once. The number of active users each day is defined as the number of data packets received by the NHS server; for a single representative value of this quantity, we took the mean over all days from 1 November to 11 December 2020 (earlier data were deemed less reliable). We note that there continue to be unexplained fluctuations in reported user numbers on Android phones. To estimate uptake within an LTLA, each postcode district was mapped to the LTLA in which the majority of its population reside, and we took the ratio (number of active users in postcode districts mapped to this LTLA)/(total population in postcode districts mapped to this LTLA). The population of England and Wales is 59.4 million, of whom 48.1 million are 16 or over and thus eligible to use the app (https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/lowersuperoutputareamidyearpopulationestimates). We assumed that England and Wales are representative of the UK, in which 82% of people aged 16 years and over have smartphones (OFCOM, personal communication), and that of smartphones in circulation, 87% support the Google Apple Exposure Notification system (Department of Health and Social Care, personal communication). The denominators for measuring uptake at the national level are therefore 59.4 million (total population) and 34.3 million (eligible population with compatible phones).
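The national denominators and the per-LTLA uptake ratio can be sketched as follows. The national figures are those quoted in the text; the postcode districts, LTLA labels and counts in the example call are hypothetical, and the function `ltla_uptake` is our own illustration rather than the code used in the analysis.

```python
# National denominators, using the figures quoted in the text.
population_total = 59.4e6     # England and Wales
population_16_plus = 48.1e6   # eligible by age
smartphone_fraction = 0.82    # aged 16+ with a smartphone (UK figure)
compatible_fraction = 0.87    # smartphones supporting exposure notification
active_users = 16.5e6         # regular users, from the main text

eligible_with_phone = population_16_plus * smartphone_fraction * compatible_fraction

print(f"eligible with compatible phone: {eligible_with_phone / 1e6:.1f} million")
print(f"uptake, total population: {active_users / population_total:.1%}")
print(f"uptake, eligible population: {active_users / eligible_with_phone:.1%}")


def ltla_uptake(active_by_district, population_by_district, district_to_ltla):
    """Uptake per LTLA: active users / population, summed over mapped districts."""
    users, people = {}, {}
    for district, ltla in district_to_ltla.items():
        users[ltla] = users.get(ltla, 0) + active_by_district[district]
        people[ltla] = people.get(ltla, 0) + population_by_district[district]
    return {ltla: users[ltla] / people[ltla] for ltla in users}


# HYPOTHETICAL districts and counts, for illustration only.
example = ltla_uptake(
    active_by_district={"AB1": 3_000, "AB2": 5_000, "CD1": 2_000},
    population_by_district={"AB1": 10_000, "AB2": 20_000, "CD1": 10_000},
    district_to_ltla={"AB1": "LTLA-A", "AB2": "LTLA-A", "CD1": "LTLA-B"},
)
print(example)
```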

Defining numbers of cases

The COVID-19 case numbers per day we used here are those reported at https://coronavirus.data.gov.uk/, by specimen date and LTLA. We obtained per-capita case numbers at the LTLA level by dividing by LTLA populations reported by ONS. These per capita case numbers by phase are shown in Extended Data Fig. 6. Testing has been available through the NHS Test and Trace system in all areas throughout the period, with a median delay of less than 2 days from booking a test to receiving the result. Testing capacity has mostly exceeded demand, except for two weeks in early September. We assumed that case ascertainment has been relatively constant over the period of analysis, an assumption qualitatively supported by the unbiased ONS and REACT studies26,27.
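Per-capita case numbers, and a seven-day rolling mean of the kind plotted in Fig. 1d, could be computed along these lines. This is a sketch only: the case counts and population are hypothetical, and the use of a trailing (rather than centred) window is our assumption.

```python
def per_100k(daily_cases, population):
    """Convert daily case counts to cases per 100,000 population."""
    return [100_000 * c / population for c in daily_cases]


def rolling_mean(values, window=7):
    """Trailing rolling mean; the first window-1 days have no value (None)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


# HYPOTHETICAL LTLA: ten days of case counts, population 200,000.
daily_cases = [20, 24, 18, 30, 26, 22, 28, 34, 30, 26]
rates = per_100k(daily_cases, population=200_000)
smoothed = rolling_mean(rates)
print(smoothed)
```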

Estimating the SAR

We focused on a period in December 2020 and January 2021 when the number of positive test results in app users could be disaggregated by whether the user had been recently notified or not. Even with these data, successive data packets sent by the same device are not linked to each other. This means that when a given number of notifications are sent on a particular day, the exact number of those individuals notified who later receive a positive test result is unknown, because of the lack of linkage over time. We therefore used a probabilistic model for how many positive test results we would expect among those recently notified, as a function of the number of notifications on previous days, of the estimated delay from notification to testing positive, and of the SAR. We estimated the SAR by maximising the likelihood of this model. In detail: let \(f_{NP}(t)\) be the probability that an individual notified on a given day then tests positive t days later (conditional on their testing positive at some later time, that is, the function is normalized to 1). Let \(N(t)\) be the number of individuals notified on day t, and \(I_N(t)\) the number of individuals reporting a positive test on day t having been notified recently (either they are currently in the quarantine period recommended by the app, or in the following 14 days). The number expected for the latter is \(\mathrm{SAR}\times \sum_{t' \le t} f_{NP}(t-t')\,N(t')\), and we maximised a Poisson likelihood for the number observed, \(I_N(t)\) (shown in Extended Data Fig. 1d), given the number expected, treating observations from different days as independent. The confidence interval was obtained by likelihood profiling; however, sensitivity analyses suggested greater uncertainty (see Supplementary Information). \(f_{NP}(t)\) was calculated as a convolution of the distributions for times from exposure to symptoms, from symptoms to testing positive, and from exposure to notification (Supplementary Information). Our SAR calculation used only data from iPhones, excluding Android phones, for more stable daily numbers of analysis packets.
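The likelihood described above can be sketched in a few lines. For a Poisson likelihood in which the expected count is the SAR multiplied by a known convolution, the maximum has a closed form (total observed positives divided by total convolved notifications), which the sketch uses. The delay distribution and notification counts below are hypothetical, the observed counts are simulated from a known SAR of 0.06, and edge effects at the start and end of the series are ignored; none of this is the code used in the analysis.

```python
import math


def expected_positives(sar, notifications, delay_pmf):
    """Expected positives per day: SAR * sum_{t' <= t} f_NP(t - t') * N(t')."""
    expected = []
    for t in range(len(notifications)):
        conv = sum(
            delay_pmf[t - s] * notifications[s]
            for s in range(t + 1)
            if t - s < len(delay_pmf)
        )
        expected.append(sar * conv)
    return expected


def poisson_log_likelihood(sar, observed, notifications, delay_pmf):
    """Log-likelihood treating each day's count as an independent Poisson draw."""
    total = 0.0
    for obs, mu in zip(observed, expected_positives(sar, notifications, delay_pmf)):
        if mu > 0:  # days with no exposure carry no information
            total += obs * math.log(mu) - mu - math.lgamma(obs + 1)
    return total


def estimate_sar(observed, notifications, delay_pmf):
    """Closed-form maximiser: total observed / total convolved notifications."""
    exposure = expected_positives(1.0, notifications, delay_pmf)
    informative = sum(o for o, e in zip(observed, exposure) if e > 0)
    return informative / sum(exposure)


# HYPOTHETICAL inputs: delay distribution f_NP (sums to 1) and daily notifications.
delay_pmf = [0.0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.1]
notifications = [1000, 1200, 1500, 1800, 2000, 2100, 2300, 2500, 2400, 2200]

# Simulated observations: rounded expectations under a known SAR of 0.06.
observed = [round(mu) for mu in expected_positives(0.06, notifications, delay_pmf)]

sar_hat = estimate_sar(observed, notifications, delay_pmf)
print(f"estimated SAR: {sar_hat:.4f}")
```

A profile-likelihood confidence interval could be obtained by evaluating `poisson_log_likelihood` over a grid of SAR values around `sar_hat`.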

Modelling cases averted based on notifications and SAR

The effect of notifications received at time t on cases averted can be modelled as the product of (i) the number of notifications, (ii) the secondary attack rate, that is, a conservative underestimate of the probability that notified individuals are actually infected, (iii) the expected fraction of transmissions preventable by strict quarantine of an infectious individual after a notification, (iv) the actual adherence to quarantine, and (v) the expected size of the full transmission chain that would be originated by the contact if not notified. Before each notification, the contact’s app sends a request for permission to the central NHS server. We estimated the total number of notifications per day on each operating system (OS; being either Android or iOS) from these requests. We estimated the number of notifications per LTLA from the number of partial days of quarantine (typically corresponding to the first day of quarantine, that is, the day of notification) per day, OS and LTLA, rescaling it by a time- and OS-dependent factor to match the number of notifications per day and OS. The geographical variability in notifications after summing over time is shown in Supplementary Fig. 1. The delay between last exposure and notification is assumed to follow a normal distribution, with time-dependent parameters estimated via least squares from the daily number of notifications and individuals in quarantine. The fraction of preventable transmissions is estimated from the delay distribution using the generation time distribution in ref. 28, with mean 5.5 days. For the effectiveness of quarantine in reducing transmission from traced contacts, we assumed as our central value that 45.5% of traced contacts quarantine perfectly (100% reduction in transmission), 31% of traced contacts quarantine imperfectly with 50% reduction in transmission, and 23.5% of traced contacts do not quarantine at all (0% reduction in transmission). This is equivalent to an average effectiveness of quarantine of 61%. Finally, the size of the epidemic chain triggered by a single case is computed assuming that local epidemics do not mix and that the extra cases do not affect the epidemic dynamic. See Supplementary Information for further details.
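The average effectiveness of quarantine follows from weighting each adherence group by its reduction in transmission, and the product structure of factors (i) to (v) can be written as a single function. In the sketch below, the adherence shares and the SAR are taken from the text; the preventable fraction, the chain size and the number of notifications are hypothetical placeholders, and the function name is ours.

```python
# Central scenario from the text: (share of traced contacts, reduction in transmission).
adherence_scenario = [
    (0.455, 1.0),  # quarantine perfectly
    (0.310, 0.5),  # quarantine imperfectly
    (0.235, 0.0),  # do not quarantine at all
]

quarantine_effectiveness = sum(share * cut for share, cut in adherence_scenario)
print(f"average effectiveness of quarantine: {quarantine_effectiveness:.0%}")


def cases_averted(notifications, sar, preventable_fraction, effectiveness, chain_size):
    """Product of factors (i)-(v) for notifications received at one time point."""
    return notifications * sar * preventable_fraction * effectiveness * chain_size


# SAR is the estimate reported in the text; notifications, preventable_fraction
# and chain_size are HYPOTHETICAL placeholders, not values from the analysis.
example = cases_averted(
    notifications=10_000,
    sar=0.06,
    preventable_fraction=0.5,
    effectiveness=quarantine_effectiveness,
    chain_size=2.0,
)
print(f"illustrative cases averted: {example:.0f}")
```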

Statistical analysis

The main statistical analysis compared statistics for each LTLA, labelled x, to those of the set comprising all of its ‘matched’ neighbours N(x) = {n1, n2, n3, …}. The matched neighbours N(x) were defined as other LTLAs that share a border with x and were in the same quintile for number of cases per capita in phase 0. Distributions showing the variability between LTLAs in the number of neighbours and number of matched neighbours are shown in Supplementary Fig. 2. Stratification into quintiles (as opposed to deciles and so on) was chosen to balance power and sufficient adjustment; no other possibility was tried, to guard against investigator bias.

Each statistic of interest was averaged over the matched neighbours, weighting by population size, to obtain the mean value in the matched neighbours of x. This was compared to the statistic for x. Linear regression was carried out using, for each statistic of interest, the difference between its value in x and in its matched neighbours N(x). The statistics we considered were: per capita number of cases in each phase; the fraction of the population using the app; a measure of rural/urban mix on a scale from 1 to 5, from the Office for National Statistics (ONS); a measure of local GDP per capita from the ONS, adjusted for rural/urban score; and a measure of the fraction of the population living in poverty before housing costs, from the ONS.
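The matching and differencing steps can be sketched as follows. The LTLA labels, borders, quintiles, populations and uptake values are hypothetical toy data, and the function names are ours; the sketch only illustrates the definitions of N(x) and of the population-weighted difference.

```python
def matched_neighbours(x, borders, quintile):
    """LTLAs bordering x that fall in the same phase 0 case quintile."""
    return [n for n in borders[x] if quintile[n] == quintile[x]]


def weighted_mean(values, weights):
    """Mean of values weighted by the given weights (here, population sizes)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def difference_from_neighbours(x, statistic, population, borders, quintile):
    """Statistic in x minus its population-weighted mean over N(x).

    Returns None when x has no matched neighbours, so that x is excluded.
    """
    nbrs = matched_neighbours(x, borders, quintile)
    if not nbrs:
        return None
    mean_nbrs = weighted_mean(
        [statistic[n] for n in nbrs], [population[n] for n in nbrs]
    )
    return statistic[x] - mean_nbrs


# HYPOTHETICAL toy data: four LTLAs, A bordering the other three.
borders = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
quintile = {"A": 2, "B": 2, "C": 2, "D": 4}
population = {"A": 100_000, "B": 300_000, "C": 100_000, "D": 50_000}
uptake = {"A": 30.0, "B": 26.0, "C": 22.0, "D": 35.0}

diff_a = difference_from_neighbours("A", uptake, population, borders, quintile)
diff_d = difference_from_neighbours("D", uptake, population, borders, quintile)
print(diff_a, diff_d)
```

In this toy example D borders A but sits in a different quintile, so it contributes to no comparison and is itself dropped from the regression.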

Our main regression was

log(cumulative cases per capita in x) – log(cumulative cases per capita in N(x)) =

beta_rural_urban × (rural/urban score of x − rural/urban score of N(x)) +

beta_gdp_band × (local GDP band of x − local GDP band of N(x)) +

beta_poverty × (per cent of the population living in poverty in x – per cent of the population living in poverty in N(x)) +

beta_users × (per cent of the population using the app in x − per cent of the population using the app in N(x)) +

epsilon_residual

where the data points for the regression (the different values of x) were the LTLAs with at least one matched neighbour. Cumulative cases were considered in each of the three phases separately or with phases 1 and 2 combined, as reported in our results. The values of the beta coefficients we estimated are shown in Extended Data Table 2. We used a logarithmic transform for the response variable in our regression, because cases are generated by an exponential process (transmission) and so the rate at which the number of cases varies with the dose of a treatment (that is, the extent of an intervention) is highly confounded with the absolute number of cases. A regression with quadratic effect of uptake and intercept at 0 produced very similar findings to the above regression with linear effect of uptake (not shown). We considered additional uncertainty in the regression due to redundancy in the differences approach, for example, in comparing both LTLA x with LTLA n and LTLA n with LTLA x, described in the bootstrapping section of Supplementary Information.
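As an illustration of the regression equation above, the sketch below fits the four beta coefficients by ordinary least squares without an intercept (the equation as written has none), solving the normal equations directly. The difference values are hypothetical, the response is generated without noise from assumed coefficients so that the fit can be checked, and the solver is a generic one rather than the software used in the analysis.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_no_intercept(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Columns: differences (x minus matched neighbours) in rural/urban score,
# GDP band, per cent in poverty, per cent using the app. HYPOTHETICAL values.
rows = [
    [1, 0, 2, 3],
    [0, 1, -1, -4],
    [-1, 2, 0, 5],
    [2, -1, 1, -2],
    [0, 0, 3, 1],
    [1, 1, 1, 1],
]
# ASSUMED coefficients used only to generate a noise-free synthetic response.
true_betas = [0.05, -0.02, 0.01, -0.023]
y = [sum(b * v for b, v in zip(true_betas, r)) for r in rows]

betas = fit_no_intercept(rows, y)
beta_rural_urban, beta_gdp_band, beta_poverty, beta_users = betas
print(f"beta_users = {beta_users:.4f}")
```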

Predictions for cases averted were found using the regression coefficient beta_users to linearly extrapolate log(cumulative cases per capita) for each LTLA to that expected for an uptake of 15% (or keeping case counts as they were, if uptake was already less than 15%). Here we assumed that there is negligible benefit of app uptake below 15% (though this is not expected to be the case in settings where usage is clustered into high-uptake communities29). The definition of beta_users in the regression equation above means it is the expected increase in log(cumulative cases per capita) associated with a one-percentage-point increase in app uptake, when keeping constant GDP, rural/urban mix, and level of poverty. Our central estimate of beta_users in this analysis was −0.023 for phases 1 and 2 combined; this means that an increase in uptake of p percentage points is expected to be associated with the cumulative number of cases per capita in phases 1 and 2 being multiplied by the factor \(e^{-0.023p}\). An increase of p = 1 percentage point in uptake therefore means a decrease of 2.3% in cases, as we reported above. We estimated the number of deaths averted by multiplying the number of cases averted by the crude case fatality rate.
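The extrapolation to 15% uptake can be written compactly. In this sketch the coefficient and threshold are those stated in the text, while the observed case count and uptake in the example are hypothetical; the function name is ours.

```python
import math

BETA_USERS = -0.023      # central estimate, phases 1 and 2 combined
UPTAKE_THRESHOLD = 15.0  # per cent; no benefit assumed below this


def cases_averted_by_app(observed_cases, uptake_percent,
                         beta=BETA_USERS, threshold=UPTAKE_THRESHOLD):
    """Cases averted relative to the counterfactual at threshold uptake.

    log(cases) is extrapolated linearly in uptake, so the counterfactual is
    observed_cases * exp(-beta * (uptake - threshold)).
    """
    if uptake_percent <= threshold:
        return 0.0  # case counts are kept as they were
    counterfactual = observed_cases * math.exp(-beta * (uptake_percent - threshold))
    return counterfactual - observed_cases


# Relative reduction in cases per one-percentage-point increase in uptake.
reduction_per_point = 1 - math.exp(BETA_USERS)
print(f"reduction per percentage point: {reduction_per_point:.1%}")

# HYPOTHETICAL LTLA: 5,000 observed cases at 30% uptake.
print(f"cases averted: {cases_averted_by_app(5_000, 30.0):,.0f}")
```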

Alternative regressions are described in Supplementary Information; their results are in Extended Data Tables 3 and 4, and Extended Data Fig. 2.

Case fatality rate

The case fatality rate was estimated as the ratio of total deaths (27,922) to cases (1,891,777) for phases 1 and 2 combined. To test for heterogeneity, it was also estimated by regressing local deaths on local cases, but no substantial heterogeneity was observed (not shown). It is a lower bound owing to right censoring of the time series of deaths.
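For reference, the ratio can be evaluated directly from the two totals given above (a minimal check; the variable names are ours). The unrounded value is approximately 1.476%, which the main text quotes as 1.47%.

```python
# Totals for phases 1 and 2 combined, as stated in the text.
deaths = 27_922
cases = 1_891_777

case_fatality_rate = deaths / cases
print(f"crude case fatality rate: {case_fatality_rate:.3%}")
```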

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.