Improving Malaria Prediction Models
Linking satellite data on vegetation levels, nighttime lights, rainfall and temperature to malaria levels in South Asia to improve malaria prediction models.
In this project, a collaboration with Lahore University of Management Sciences in Pakistan, we developed a deep learning model to improve the prediction of malaria outbreaks in South Asia. The model combines vegetation levels, nighttime light pollution, rainfall and temperature data for specific locations and times to produce location- and time-specific predictions. Our study used data spanning 2000-2016 to build the model, and then tested it on 2017 outbreaks.
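The train-then-test setup described above amounts to a split by year rather than a random split. Here is a minimal sketch of that idea, assuming each record is a dictionary with a `year` field; the record layout, the field names and the `split_by_year` helper are hypothetical illustrations, not the project's actual code.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: one dictionary per district and year.
Record = Dict[str, object]


def split_by_year(
    records: List[Record],
    first_train_year: int = 2000,
    last_train_year: int = 2016,
    test_year: int = 2017,
) -> Tuple[List[Record], List[Record]]:
    """Split records so the model never sees the test year during training."""
    train = [
        r for r in records
        if first_train_year <= int(r["year"]) <= last_train_year
    ]
    test = [r for r in records if int(r["year"]) == test_year]
    return train, test


# Toy data: one placeholder record per year for a single made-up district.
records = [
    {"district": "example", "year": year, "incidence": 0.0}
    for year in range(2000, 2018)
]
train, test = split_by_year(records)
```

Holding out the final year mimics how the model would be used in practice: trained on the past, then asked about a period it has not seen.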
The abstract for this project has just been published in The Lancet Planetary Health - read it here
GIF shows malaria levels (red circles) correlated with temperature on the left (Dark Blue to Light Blue), and with rainfall on the right (Yellow to Dark Purple)
Currently, approximately 50% of the global population is at risk of malaria infection, particularly in Africa and South Asia. Outbreaks are also connected to extreme weather events, which are becoming more frequent as climate change continues. Tracking and predicting outbreaks is difficult because data is often extrapolated from small-scale household surveys, and because outbreaks are influenced by a broad range of factors.
To address this, we used a multi-dimensional long short-term memory (LSTM) model, which combines data from satellites or weather stations with historic malaria outbreaks to build a more accurate prediction model. Each input either influences malaria outbreaks, like the levels of vegetation, rainfall and temperature, or tells us something about the local population. The nighttime light pollution recorded by satellites was used in this study as a proxy for the socioeconomic status of regions, as lower nighttime light levels are associated with higher levels of poverty.
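An LSTM of this kind reads a sequence of multi-feature time steps and predicts a later value. The sketch below shows only the input preparation: turning one district's time series into (history, target) pairs. The feature order, the window length and the `make_windows` helper are assumptions for illustration; the study's actual preprocessing and time resolution are not described here.

```python
from typing import List, Sequence, Tuple

# Assumed column order for each time step (hypothetical, not from the study).
FEATURES = ("vegetation", "nightlights", "rainfall", "temperature", "malaria")

Sample = Tuple[List[List[float]], float]


def make_windows(series: Sequence[Sequence[float]], window: int) -> List[Sample]:
    """Build (history, target) pairs from one district's time series.

    Each history holds `window` consecutive time steps with all features;
    the target is the malaria value at the step right after the history.
    """
    target_col = FEATURES.index("malaria")
    samples: List[Sample] = []
    for start in range(len(series) - window):
        history = [list(row) for row in series[start:start + window]]
        target = series[start + window][target_col]
        samples.append((history, target))
    return samples


# Toy series with five time steps; all values are made up.
series = [
    [0.30, 1.0, 12.0, 24.0, 5.0],
    [0.35, 1.1, 40.0, 26.0, 6.0],
    [0.50, 1.1, 90.0, 29.0, 9.0],
    [0.55, 1.2, 60.0, 30.0, 14.0],
    [0.45, 1.2, 20.0, 27.0, 11.0],
]
samples = make_windows(series, window=3)
```

Each history is a window-by-features grid, which is the shape a sequence model expects for a single training example.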
GIF shows malaria levels (red circles) correlated with rainfall on the left (Dark Purple to Yellow), and with vegetation levels on the right (Light to Dark Green)
Our satellite nighttime light data came from DMSP OLS (2000-2013) and VIIRS (2014-2017), while vegetation, temperature and rainfall data were all derived from the ARENA project, which combined Demographic Health Data and geo-referenced environmental data.
When we tested our model, its predictions compared well with the true malaria incidence for the districts of Pakistan, India and Bangladesh, as shown in this figure, where purple districts show high accuracy, pink districts show overestimation of risk, and blue districts show underestimation. When our model was tested against existing deep learning models (particularly the model of Shi et al., 2015), it produced lower prediction error rates.
Map of Pakistan, India and Bangladesh showing the accuracy of our prediction model
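The color coding in the accuracy map can be thought of as a comparison between predicted and observed incidence for each district. The sketch below is one way to express that, assuming an absolute tolerance for what counts as accurate and using mean absolute error as the summary metric; the tolerance value, the metric and both helper functions are assumptions, since the exact thresholds and error measure used in the study are not given here.

```python
from typing import List, Tuple

# Map colors as described for the figure.
COLORS = {"accurate": "purple", "overestimated": "pink", "underestimated": "blue"}


def classify_district(predicted: float, observed: float, tolerance: float) -> str:
    """Label a district by how its prediction compares with observed incidence."""
    difference = predicted - observed
    if abs(difference) <= tolerance:
        return "accurate"
    return "overestimated" if difference > 0 else "underestimated"


def mean_absolute_error(pairs: List[Tuple[float, float]]) -> float:
    """Average absolute gap between predicted and observed values."""
    return sum(abs(p - o) for p, o in pairs) / len(pairs)


# Toy (predicted, observed) pairs for three made-up districts.
pairs = [(5.0, 5.2), (8.0, 5.0), (2.0, 5.0)]
labels = [classify_district(p, o, tolerance=0.5) for p, o in pairs]
map_colors = [COLORS[label] for label in labels]
mae = mean_absolute_error(pairs)
```

Comparing this kind of summary error across two models on the same districts is one simple way to check whether a new model reduces prediction error.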
Further details of this study, including a talk by one of the team, can be found here on the NeurIPS conference website, and we'll share other materials as they are published! This permalink will also be updated as the work is published.