5 Health Data Science Project Ideas
One of the best ways to become a data scientist is to have an exciting project that you can put on your resume and explain in interviews. Many types of data science projects can be done in healthcare. As shown in the diagram below, there are several kinds of data science projects in public health and population health. I will give examples of 5 different projects that you can do with health data, along with published studies that use these methods.
Machine Learning Model to Identify Populations at Risk of Opioid Addiction
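A supervised machine learning model can be trained on patient data, such as prescription records, to predict the risk of opioid addiction or overdose. For example, the study below evaluated several machine learning algorithms for predicting opioid overdose risk among Medicare beneficiaries with opioid prescriptions.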
Lo-Ciganic, Wei-Hsuan, et al. "Evaluation of machine-learning algorithms for predicting opioid overdose risk among Medicare beneficiaries with opioid prescriptions." JAMA Network Open 2.3 (2019): e190968.
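To show what "supervised" means here, below is a minimal sketch of a risk model, assuming a tabular dataset with one row per patient and a binary outcome label. The three features (daily morphine milligram equivalents divided by 100, number of prescribers, and a prior substance use disorder flag) are hypothetical choices, and the data is synthetic. The model is plain logistic regression fit with gradient descent in the standard library only; the cited study compared several algorithms, and in practice you would likely use a library such as scikit-learn.

```python
import math
import random

# Hypothetical feature order per patient:
# [daily MME / 100, number of prescribers, prior substance use disorder (0/1)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this patient: p(outcome) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_risk(w, b, x):
    """Predicted probability of the outcome for one patient."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def make_synthetic_data(n=200, seed=0):
    """Generate made-up patients whose true risk rises with every feature."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        mme = rng.uniform(0.0, 2.0)
        prescribers = rng.randint(1, 5)
        sud = 1 if rng.random() < 0.2 else 0
        logit = -4.0 + 1.5 * mme + 0.5 * prescribers + 2.0 * sud
        X.append([mme, prescribers, sud])
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    w, b = train_logistic_regression(X, y)
    print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
    print("high-risk profile:", round(predict_risk(w, b, [2.0, 5, 1]), 3))
    print("low-risk profile:", round(predict_risk(w, b, [0.1, 1, 0]), 3))
```

With real data you would also hold out a test set and report metrics such as the C-statistic before trusting the scores.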
Text Mining Twitter for Suicide Prevention
Public health researchers have been studying how data science can help prevent suicide. Using Twitter data, researchers have applied sentiment analysis and other text mining techniques to predict suicidal thoughts.
Kumar, E. R., and A. K. V. S. N. R. Rao. "Suicide Prediction in Twitter Data using Mining Techniques: A Survey." 2019 International Conference on Intelligent Sustainable Systems (ICISS), 2019, pp. 122-131, doi:10.1109/ISS1.2019.8907987.
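As a starting point for this kind of project, here is a minimal sketch of lexicon-based sentiment scoring, one of the simplest text mining approaches. The word list, the scores, the negation rule, and the `-3` cutoff are all hypothetical values chosen for illustration, and the example posts are made up. The cited survey covers a range of mining techniques; published work typically uses much larger lexicons or trained classifiers, and flagged posts would go to human reviewers rather than being treated as a prediction on their own.

```python
import re

# Hypothetical, tiny sentiment lexicon: negative words < 0, positive words > 0.
LEXICON = {
    "hopeless": -3, "alone": -2, "worthless": -3, "tired": -1, "sad": -2,
    "hope": 2, "better": 2, "happy": 3, "grateful": 3, "support": 2,
}

# Words that flip the sign of the word right after them ("not happy").
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum lexicon scores, flipping a word's score if a negation precedes it."""
    tokens = tokenize(text)
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        if value and i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    return score


def flag_for_review(posts, threshold=-3):
    """Return posts whose score is at or below the (hypothetical) threshold."""
    return [p for p in posts if sentiment_score(p) <= threshold]


if __name__ == "__main__":
    posts = [
        "I feel hopeless and alone tonight",
        "Grateful for the support from my friends",
        "Not happy today, just tired",
    ]
    for p in posts:
        print(sentiment_score(p), p)
    print(flag_for_review(posts))
```

The negation rule only looks one word back, which is a deliberate simplification; it handles "not happy" but not longer phrases like "not at all happy".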
Outbreak Detection Using Clustering
With COVID-19 and other infectious diseases, there has been increased use of machine learning to detect outbreaks. The chapter cited below reviews how artificial intelligence has been used for public health surveillance, including during the COVID-19 pandemic. Anomaly detection methods, as well as classifiers such as support vector machines, gradient boosting machines, and random forests, have been used to detect outbreaks. Read the chapter to learn more.
Zeng, Daniel, Zhidong Cao, and Daniel B. Neill. "Artificial intelligence-enabled public health surveillance: from local detection to global epidemic monitoring and control." Artificial Intelligence in Medicine. Academic Press, 2021. 437-453.
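A good first anomaly detection project is to flag unusual spikes in daily case counts. The sketch below compares each day with the mean and standard deviation of a trailing baseline window and raises an alert when the day's z-score exceeds a threshold. The 7-day baseline, the threshold of 3, the `min_sd` floor, and the case counts are assumptions for illustration; this is not a reproduction of any specific method from the cited chapter.

```python
from statistics import mean, stdev


def detect_outbreaks(counts, baseline_days=7, threshold=3.0, min_sd=1.0):
    """Flag days whose count is more than `threshold` SDs above the baseline.

    counts: daily case counts, oldest first.
    Returns a list of (day_index, z_score) tuples for flagged days.
    """
    alerts = []
    for day in range(baseline_days, len(counts)):
        baseline = counts[day - baseline_days:day]
        mu = mean(baseline)
        # Floor the SD so a flat baseline does not make tiny changes look huge.
        sd = max(stdev(baseline), min_sd)
        z = (counts[day] - mu) / sd
        if z > threshold:
            alerts.append((day, round(z, 2)))
    return alerts


if __name__ == "__main__":
    # Made-up daily counts with a spike on days 10 and 11.
    daily = [4, 5, 3, 6, 4, 5, 4, 5, 6, 4, 18, 25, 7, 5]
    for day, z in detect_outbreaks(daily):
        print(f"day {day}: z = {z}")
```

One design trade-off: once a spike enters the baseline window it inflates the standard deviation, which can hide a continuing outbreak. Some surveillance methods leave a gap of a few days between the baseline and the day being tested for this reason.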
Natural Language Processing to Analyze Common Topics in Maternal Health
Researchers have identified common maternal health topics by analyzing the text of posts in online pregnancy forums.
Wexler, Anna, et al. "Pregnancy and health in the age of the Internet: A content analysis of online “birth club” forums." PLOS ONE 15.4 (2020): e0230947.
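A simple automated first pass at finding common topics is to count how many posts mention each term. The sketch below does that with a small stopword list and Python's `collections.Counter`. The stopword list and the forum posts are made up for illustration. The cited study is a content analysis, and a fuller NLP project would typically move on to topic models such as LDA; this only shows the counting step.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list; real projects use a much longer one.
STOPWORDS = {
    "a", "an", "and", "the", "i", "my", "is", "it", "to", "of", "in", "for",
    "at", "with", "any", "about", "am", "so", "this", "has", "have", "be",
    "me", "on", "was",
}


def tokenize(text):
    """Lowercase, keep alphabetic words only, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_terms(posts, n=5):
    """Return the n terms that appear in the most posts (document frequency)."""
    counts = Counter()
    for post in posts:
        # set() so a term repeated within one post is only counted once.
        counts.update(set(tokenize(post)))
    return counts.most_common(n)


if __name__ == "__main__":
    posts = [
        "Is nausea normal at 8 weeks? The nausea is worst in the morning.",
        "Any tips for nausea and sleep during the first trimester?",
        "My doctor says sleep on my left side in the third trimester.",
        "Worried about sleep and back pain this trimester.",
    ]
    print(top_terms(posts, 3))
```

Counting posts rather than raw word occurrences keeps one long, repetitive post from dominating the results.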
Supervised Learning to Classify Cancer Subtypes
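There are many uses of data science in cancer and genetic epidemiology. For example, machine learning can classify tumors into cancer subtypes based on genomic data, as in the study below, which used miRNA data to classify kidney cancer subtypes.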
Muhamed Ali, Ali, et al. "A machine learning approach for classifying kidney cancer subtypes using miRNA genome data." Applied Sciences 8.12 (2018): 2422.
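To illustrate the classification step, here is a minimal sketch of a nearest-centroid classifier, one of the simplest supervised methods. It averages each subtype's expression profiles into a centroid and assigns a new sample to the closest one. The subtype names (`subtype_A` and so on), the three features, and the expression values are all hypothetical; this is not the model from the cited study. Real miRNA data has hundreds of features and needs normalization, feature selection, and cross-validation.

```python
import math


def train_centroids(samples, labels):
    """Average the expression vectors of each subtype into one centroid per class."""
    sums, counts = {}, {}
    for x, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(centroids, x):
    """Assign x to the subtype whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


if __name__ == "__main__":
    # Made-up log-expression values for three hypothetical miRNAs.
    samples = [
        [8.1, 2.0, 5.5], [7.9, 2.3, 5.1], [8.4, 1.8, 5.3],  # subtype_A
        [3.2, 7.5, 5.0], [2.9, 7.9, 4.8], [3.5, 7.2, 5.2],  # subtype_B
        [5.0, 4.9, 9.1], [5.3, 5.1, 8.8], [4.8, 4.7, 9.4],  # subtype_C
    ]
    labels = ["subtype_A"] * 3 + ["subtype_B"] * 3 + ["subtype_C"] * 3
    centroids = train_centroids(samples, labels)
    print(classify(centroids, [8.0, 2.1, 5.4]))
```

Nearest centroid is easy to interpret, since each centroid is an average subtype profile, which makes it a reasonable baseline before trying models like random forests or support vector machines.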
I have datasets posted here where you can try out some of these ideas.
If you have a favorite health data science project, leave a comment below.