Governance Organization

TOH – The Ottawa Hospital Academic Medical Organization

Project Title

TOH-21-011 – Getting ahead of the curve: Predictive COVID-19 case identification using an iterative propensity score modelling and AI approach

Project Highlights

To date, the researchers have identified the data and defined variables in publicly available datasets that will contribute to creating the best predictive COVID-19 case identification model using acceptability ranking. Preliminary propensity score regression analysis has been conducted to create a predictive model using publicly available data. However, the level of granularity and predictive accuracy required could not be achieved using publicly available data alone. The researchers have therefore accessed individual-level population health data for Ontario housed at ICES, and have analyzed the health administrative data (HAD) alone to find the best predictive variables and models. An amendment to the ICES submission forms was submitted to integrate the identified and aggregated publicly available data with the individual-level population health data. This integration of health administrative and publicly available data is still pending. The researchers will move on to simulation and optimization of the model once analysis of the combined datasets is completed.

Key Findings

1. Predictive variables: For models trained without symptom data, variables that scored high in predictive value include age, gender (M or F), and the number of outpatient physician visits in the two years prior to the COVID test date. For models trained with symptom data (i.e., information about whether or not a patient was symptomatic at the time of COVID testing), variables with high predictive value include the presence or absence of symptoms, age, cough, fever, and having received a first dose of vaccine at least 14 days prior to the COVID-19 test (protective effect).

2. Preferred modeling approach: Of all the approaches tested (including classical logistic regression and AI approaches), the one that performed best was the AI Gradient Boosted Tree (GBT) approach. This approach will be used for future analyses, as it enables substantially better prediction of which patients can be expected to test positive for COVID based on a given set of characteristics. The availability of symptom data (i.e., whether or not a patient is experiencing symptoms) substantially increases the GBT model’s predictive ability.

Methods (To Date)

Modeling – Iterative estimation-validation approach:

1. Perform Exploratory Data Analysis (EDA) on the HAD as soon as it is made available, to better understand and visualize the correlations between the various features/variables. – Completed

2. Estimate, based on data at day t-x, the probability of being COVID-positive at day t; repeat daily until the model converges on the relevant/significant predictors, at t-x days, of getting COVID at day t (1). Perform sensitivity analyses, possibly testing all time windows from t-5 to t-21 days. – Completed

3. First, estimate the model using health administrative data only, as this data ‘universe’ contains the individual-level COVID test results needed to validate the model. – Completed

4. Once the best-fitting model has been identified, tag its output as the baseline results, then link publicly available social and mobility data (SMD), both aggregate and individual-level, with the individual-level HAD using, if available, name, date of birth, and location of residence (six-digit postal code and address). Rerun the model using this additional source of data and see whether the predictive ability improves. – Currently awaiting linkage of publicly available data with health administrative data

5. Select the model (set of variables) producing the best outcome prediction. – Pending

*Note: The next steps, simulation and optimization of the model, are pending the data integration at ICES and analysis of the integrated datasets. The simulation framework is currently under development with the data available.

References

1. Yanes-Lane M, Winters N, Fregonese F, Bastos M, Perlman-Arrow S, Campbell JR, Menzies D. Proportion of asymptomatic infection among COVID-19 positive persons and their transmission potential: A systematic review and meta-analysis. PLoS One. 2020;15(11):e0241536.
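
Illustrative sketch (not the project's code): the time-window sensitivity analysis in step 2 of the methods, where data at day t-x is used to predict COVID status at day t for windows from t-5 to t-21, could be organized roughly as below. Everything here is an assumption made for illustration: the cohort is synthetic, the function names (auc, make_synthetic_cohort, sweep_lags) are hypothetical, and a single-feature AUC stands in for the propensity score and GBT models that the project actually fits.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve: chance that a random positive case
    scores higher than a random negative case (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic_cohort(n_people=400, n_days=30, true_lag=7, seed=0):
    """Hypothetical data: each person has one daily signal, and the test
    result on the last day depends on the signal true_lag days earlier."""
    rng = random.Random(seed)
    cohort = []
    t = n_days - 1
    for _ in range(n_people):
        signal = [rng.gauss(0.0, 1.0) for _ in range(n_days)]
        p_positive = 0.8 if signal[t - true_lag] > 0 else 0.2
        outcome = 1 if rng.random() < p_positive else 0
        cohort.append((signal, outcome))
    return cohort


def sweep_lags(cohort, lags=range(5, 22)):
    """For each window x, score how well the signal at day t-x
    discriminates the outcome observed at day t."""
    t = len(cohort[0][0]) - 1
    labels = [outcome for _, outcome in cohort]
    results = {}
    for x in lags:
        scores = [signal[t - x] for signal, _ in cohort]
        results[x] = auc(scores, labels)
    return results


results = sweep_lags(make_synthetic_cohort())
best_lag = max(results, key=results.get)
for x in sorted(results):
    print(f"t-{x}: AUC = {results[x]:.3f}")
print("most predictive window: t-%d" % best_lag)
```

In a real analysis the inner scoring step would be replaced by fitting and validating the chosen model on the features available at day t-x, with discrimination assessed on held-out data rather than on the data used for fitting.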


Risks and Complications

Primary Project Lead for Contact

Dr. Lise Bjerre


©2021 IFPOC - The Innovation Fund Provincial Oversight Committee - created by Techna
