A Q&A With Dr. Mona Issabakhsh, a Research Instructor at Georgetown University
Learn about how she applies machine learning to her tobacco control research
If you are a Health Data Scientist Manager, Researcher, or Director, let me know if you are interested in sharing the story of your career journey. You can email me at ahobby@healthdatasciencenewsletter.com.
Summary
Andrea Hobby interviewed Mona Issabakhsh, Research Instructor at Georgetown University, who uses machine learning for her tobacco control research.
Please provide an overview of the work that you're doing.
Yes, here is a little about my background, which is entirely in industrial engineering. When I finished my Ph.D. in 2021, I was interested in applying engineering to public health problems, so I looked for a faculty role in public health or medicine. I started working at Georgetown with a team focused on tobacco regulatory science. I'm a modeler there and currently analyze tobacco use data. More specifically, I research smoking cessation.
Can you talk in more detail about the machine-learning models you're building?
During my first year at Georgetown, I got to work on a pilot project. We wanted to see what factors contribute to smoking cessation among U.S. adults. There is a large data set called the Population Assessment of Tobacco and Health (PATH) Study. It contains information on tobacco use by both U.S. adults and youth. We developed a simple machine-learning model. This project had two objectives: to find the most important predictors of smoking cessation and to develop a predictive model for smoking cessation. For that, we followed current smokers for one year and developed binary classifiers to predict whether they quit smoking.
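Editor's note: the two objectives described above (ranking variables and predicting a yes/no outcome) can be illustrated with a minimal sketch. The interview does not say which classifier the team used, so plain logistic regression stands in here. The data is synthetic, and the variable names `cigs_per_day`, `quit_attempts`, and `noise` are hypothetical, not PATH Study variables.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic, standardized data (an assumption for illustration only):
# heavier smoking lowers the odds of quitting, past quit attempts raise them,
# and "noise" has no effect at all.
random.seed(0)
names = ["cigs_per_day", "quit_attempts", "noise"]
X, y = [], []
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in names]
    logit = -2.0 * x[0] + 1.0 * x[1] - 1.0
    y.append(1 if random.random() < sigmoid(logit) else 0)
    X.append(x)

w, b = fit_logistic(X, y)

# Objective 1: rank variables by the size of their fitted coefficient.
ranking = sorted(zip(names, w), key=lambda pair: abs(pair[1]), reverse=True)

# Objective 2: predict whether a (hypothetical) smoker quits within the year.
def predict(x, threshold=0.5):
    prob = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if prob >= threshold else 0

for name, coef in ranking:
    print(f"{name}: {coef:+.2f}")
```

Ranking by coefficient size is only meaningful here because the synthetic inputs share the same scale. With real survey data, other importance measures may be more appropriate.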
What are the advantages of using a machine learning model in this context, as opposed to biostatistics or more traditional statistics?
First of all, with machine learning you can include as many variables as you want in a model. Our original dataset included more than 1,700 variables, and we wanted to find the most important variables for predicting smoking cessation while considering as many variables as possible (to reduce selection bias). This was the most significant objective of our analysis, and machine learning made it possible. Another advantage of machine learning is that it can detect complex and nonlinear relationships that simpler models may miss.
What are some of the downsides you have had with the research?
Definitely, yes. One of the most significant disadvantages, in the view of most people in my field, is that a machine-learning algorithm feels like a black box. Many people can understand a simple regression model, but selling more advanced machine-learning models is not easy. It can also be challenging to validate the results. Additionally, you cannot develop a machine-learning model with a limited data set. Another problem I faced was that the smoking cessation rate was very low (only 7%), so developing a machine-learning predictive model was tough.
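Editor's note: a short worked example shows why a 7% cessation rate makes prediction hard. With a hypothetical cohort of 1,000 smokers, a model that always predicts "did not quit" scores 93% accuracy while identifying no quitters. The inverse-frequency class weights at the end are one common heuristic for rebalancing. The interview does not say how the team handled the imbalance.

```python
# Hypothetical cohort: 1,000 current smokers, 7% of whom quit within a year.
n_total = 1000
n_quit = 70
y_true = [1] * n_quit + [0] * (n_total - n_quit)

# A trivial "model" that predicts that nobody quits.
y_pred = [0] * n_total

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / n_total  # looks strong, but is misleading
recall = tp / (tp + fn)         # share of actual quitters the model finds

# Inverse-frequency class weights: n_samples / (n_classes * class_count).
# Each class then contributes the same total weight to the training loss.
weight_quit = n_total / (2 * n_quit)
weight_no_quit = n_total / (2 * (n_total - n_quit))

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
print(f"weights: quit={weight_quit:.2f}, no_quit={weight_no_quit:.2f}")
```

Metrics such as recall, precision, or area under the ROC curve are therefore usually more informative than accuracy for an outcome this rare.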
How do you ensure the quality of the data you're working with?
I spent most of my time on data cleaning because we were using survey data. We had to make sure that the input variables of the model were relevant and not correlated with one another. We also had to impute the missing data.
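Editor's note: the two cleaning steps mentioned above, imputing missing values and removing correlated inputs, might look like the sketch below. The interview does not name the imputation method or a correlation cutoff, so median imputation and a 0.9 threshold are assumptions, and the tiny `survey` table with its column names is made up for illustration.

```python
import math
import statistics

def impute_median(column):
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in column]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length columns."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)

def drop_correlated(columns, threshold=0.9):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Made-up survey responses; None marks a skipped question.
survey = {
    "age": [25, 40, None, 58, 33, 47],
    "cigs_per_day": [10, 20, 5, None, 15, 30],
    # Redundant with cigs_per_day (20 cigarettes per pack, 7 days per week).
    "packs_per_week": [3.5, 7.0, 1.75, 5.25, 5.25, 10.5],
}

cleaned = {name: impute_median(values) for name, values in survey.items()}
kept = drop_correlated(cleaned)
print(kept)
```

The filter keeps the first column of any highly correlated pair, so the column order decides which of two redundant variables survives.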
What is the future direction of the field?
The work of modelers is very important in tobacco research. It is becoming increasingly important to the FDA as it develops new policies, and the agency looks at some modelers' work in this field. Also, whenever a new tobacco product comes to the market, it's very important to predict what will happen to the market. How will tobacco users react to that new product? New products are introduced to the market every day. We, as modelers, need to think about how to model the uptake of those new products by users. People in this research field still don't trust machine learning. At the same time, they find it very interesting, so we will see more machine-learning papers using tobacco use data. However, we also need to develop models that are understandable and explainable to everybody. For instance, when disseminating results, we need to present them so that everyone can understand every part of the model and be more willing to accept it.