Prediction of short-term atrial fibrillation risk using primary care electronic health records

Ramesh Nadarajah, Jianhua Wu, David Hogg, Keerthenan Raveendra, Yoko M. Nakao, Kazuhiro Nakao, Ronen Arbel, Moti Haim, Doron Zahger, John Parry, Chris Bates, Campbel Cowan, Chris P. Gale

    Research output: Contribution to journal › Article › peer-review



Objective: Atrial fibrillation (AF) screening by age achieves a low yield and misses younger individuals. We aimed to develop an algorithm in nationwide routinely collected primary care data to predict the risk of incident AF within 6 months (Future Innovations in Novel Detection of Atrial Fibrillation (FIND-AF)).
Methods: We used primary care electronic health record data from individuals aged ≥30 years without known AF in the UK Clinical Practice Research Datalink-GOLD dataset between 2 January 1998 and 30 November 2018, randomly divided into training (80%) and testing (20%) datasets. We trained a random forest classifier using age, sex, ethnicity and comorbidities. Prediction performance was evaluated in the testing dataset with internal bootstrap validation with 200 samples, and compared against the CHA2DS2-VASc (Congestive heart failure, Hypertension, Age ≥75 (2 points), Diabetes mellitus, Stroke/transient ischaemic attack/thromboembolism (2 points), Vascular disease, Age 65-74, Sex category) and C2HEST (Coronary artery disease/Chronic obstructive pulmonary disease (1 point each), Hypertension, Elderly (age ≥75, 2 points), Systolic heart failure, Thyroid disease (hyperthyroidism)) scores. Cox proportional hazards models with competing risk of death were fitted to compare incident longer-term AF between individuals at higher and lower FIND-AF-predicted risk.
Results: Of 2 081 139 individuals in the cohort, 7386 developed AF within 6 months. FIND-AF could be applied to all records. In the testing dataset (n=416 228), discrimination performance was strongest for FIND-AF (area under the receiver operating characteristic curve 0.824, 95% CI 0.814 to 0.834) compared with CHA2DS2-VASc (0.784, 0.773 to 0.794) and C2HEST (0.757, 0.744 to 0.770), and was robust by sex and ethnic group. The higher predicted risk cohort, compared with the lower predicted risk cohort, had a 20-fold higher 6-month incidence rate for AF and a higher long-term hazard for AF (HR 8.75, 95% CI 8.44 to 9.06).
Conclusions: FIND-AF, a machine learning algorithm applicable at scale in routinely collected primary care data, identifies people at higher risk of short-term AF.
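
The two comparator scores named in the abstract are simple additive point systems, so a short sketch may help make them concrete. The Python below is illustrative only and is not the authors' code. The `Patient` record and its field names are hypothetical. Three details are taken from the commonly published score definitions rather than from the abstract and should be treated as assumptions: diabetes scores 1 point (the D in CHA2DS2-VASc), female sex scores 1 point for the sex category, and systolic heart failure scores 2 points in C2HEST.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    age: int
    female: bool
    congestive_heart_failure: bool = False
    systolic_heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    stroke_tia_thromboembolism: bool = False
    vascular_disease: bool = False
    coronary_artery_disease: bool = False
    copd: bool = False
    hyperthyroidism: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """CHA2DS2-VASc points (range 0 to 9)."""
    score = 0
    score += 1 if p.congestive_heart_failure else 0
    score += 1 if p.hypertension else 0
    if p.age >= 75:
        score += 2          # age 75 or over: 2 points
    elif p.age >= 65:
        score += 1          # age 65-74: 1 point
    score += 1 if p.diabetes else 0            # assumption: standard definition
    score += 2 if p.stroke_tia_thromboembolism else 0
    score += 1 if p.vascular_disease else 0
    score += 1 if p.female else 0              # assumption: female sex scores 1
    return score


def c2hest(p: Patient) -> int:
    """C2HEST points (range 0 to 8)."""
    score = 0
    score += 1 if p.coronary_artery_disease else 0
    score += 1 if p.copd else 0
    score += 1 if p.hypertension else 0
    score += 2 if p.age >= 75 else 0           # elderly: 2 points
    score += 2 if p.systolic_heart_failure else 0  # assumption: 2 points
    score += 1 if p.hyperthyroidism else 0
    return score


if __name__ == "__main__":
    example = Patient(age=78, female=True, hypertension=True, copd=True)
    print("CHA2DS2-VASc:", cha2ds2_vasc(example))
    print("C2HEST:", c2hest(example))
```

Both functions return an integer point total. To compare discrimination as the abstract does, these totals would be ranked against observed 6-month AF outcomes, for example with an area under the receiver operating characteristic curve.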

    Original language: English
    Article number: heartjnl-2022-322076
    State: Accepted/In press - 1 Jan 2023


    • atrial fibrillation
    • biostatistics
    • electronic health records

    ASJC Scopus subject areas

    • Cardiology and Cardiovascular Medicine

