Student Engagement Activities

The GEO Health Community of Practice supports ongoing GEO youth engagement activities for integrating Earth observations in public health applications. The voices and actions of global youth can offer innovative perspectives that propel culturally appropriate, nature-based solutions to combat complex emerging health challenges. The One Health concept can guide youth engagement and encourage cross-cutting collaborations to fill knowledge gaps and drive action toward the 2030 Agenda for Sustainable Development.

Over the past four years, CoP teleconferences have showcased the robust data science projects from Rensselaer Polytechnic Institute (RPI), as part of the NASA-ORNL-RPI Student Engagement. Under this engagement framework, RPI faculty (Thilanka Munasinghe) and NASA and ORNL data scientists (Assaf Anyamba, Heidi Tubbs, Bhaskar Bishnoi) provided datasets, problem sets, and guidance to RPI data science students for their semester projects. Through these experiences, students had opportunities to learn about NASA data products and applications and to apply relevant data analysis techniques to real-world challenges arising from vector-borne diseases and other environmental health topics.

The CoP team would like to highlight key projects that have contributed to these cross-cutting collaborations that support youth and STEM engagement.

Scraping Unstructured Data to Explore the Relationship between Rainfall Anomalies and Vector-Borne Disease Outbreaks, as part of the 2021 IEEE International Conference on Big Data (Ethan Joseph, Thilanka Munasinghe, Heidi Tubbs, Bhaskar Bishnoi, Assaf Anyamba)

  • Vector-borne diseases, which account for over 17% of all infectious diseases, contribute to significant global morbidity and mortality rates. Tracking and predicting vector-borne disease transmission requires extracting data from weather and climate datasets as well as from unstructured disease reporting sources. This project, which aimed to understand the effect of global rainfall patterns on vector-borne disease transmission, developed a data extraction pipeline for outbreak reporting sources (ProMED-mail) utilizing transformer neural networks and incorporated global rainfall anomalies from the NASA Integrated Multi-satellitE Retrievals for the Global Precipitation Measurement (IMERG) dataset. Using ProMED-mail and IMERG data, the project found that vector-borne disease outbreaks are clustered toward the tropics and are frequently amplified during rainy seasons.
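As an illustration of what a rainfall anomaly is, the sketch below subtracts a long-term calendar-month mean from each monthly total. It is a minimal example under stated assumptions, not the project's pipeline: the function name `monthly_anomalies`, the tuple layout, and the rainfall values are all hypothetical, and real IMERG data would first need to be aggregated to monthly totals per location.

```python
from collections import defaultdict
from statistics import mean


def monthly_anomalies(records):
    """Compute rainfall anomalies for one location.

    records: list of (year, month, rainfall_mm) tuples (hypothetical layout).
    Returns {(year, month): anomaly_mm}, where the anomaly is the observed
    value minus the mean of all values for that calendar month.
    """
    # Group rainfall totals by calendar month to build a simple climatology.
    by_month = defaultdict(list)
    for _year, month, rain in records:
        by_month[month].append(rain)
    climatology = {month: mean(values) for month, values in by_month.items()}

    # Anomaly = observation minus the long-term mean for the same month.
    return {(year, month): rain - climatology[month]
            for year, month, rain in records}


# Made-up monthly totals (mm) for illustration only.
records = [
    (2019, 1, 100.0), (2020, 1, 140.0), (2021, 1, 120.0),
    (2019, 7, 10.0), (2020, 7, 30.0), (2021, 7, 20.0),
]
anomalies = monthly_anomalies(records)
print(anomalies[(2020, 1)])  # wetter-than-average January
print(anomalies[(2019, 7)])  # drier-than-average July
```

Positive values mark wetter-than-usual months and negative values drier ones, which is the kind of signal that can then be compared against the timing of reported outbreaks.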

Predicting Crimean-Congo Hemorrhagic Fever Outbreaks via Multivariate Time-Series Classification of Climate Data, as part of the Proceedings of the 6th International Conference on Medical and Health Informatics (Jonathan Harris, Thilanka Munasinghe, Heidi Tubbs, Assaf Anyamba)

  • Crimean-Congo hemorrhagic fever (CCHF), a tick-borne disease with a 3-50% fatality rate, is a high-priority disease among leading health organizations. Understanding the effects of climate patterns and their influence on CCHF transmission can help high-risk countries better prepare for outbreaks. Focusing on the climate variables of temperature and precipitation, this project proposed an approach that uses multivariate time-series classification to detect temporal climatic patterns in Pakistan. It drew on precipitation data from one of NASA's Integrated Multi-satellitE Retrievals for the Global Precipitation Measurement (IMERG) datasets, near-surface air temperature data from a Global Land Data Assimilation System dataset, and CCHF outbreak alert data from ProMED-mail. The project reported predicting CCHF outbreaks within Pakistan with 92% test accuracy.
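To show how a multivariate time-series classification problem can be framed, the sketch below turns aligned temperature and precipitation series into fixed-length windows, each labeled by whether an outbreak alert follows. This is an assumed framing for illustration, not the paper's method: the function `make_windows`, the window length, and all values are hypothetical, and the classifier itself is left out.

```python
def make_windows(temperature, precipitation, outbreak, window=3):
    """Build labeled multivariate windows from aligned series.

    Each sample holds the `window` time steps before step t as a list of
    (temperature, precipitation) pairs; its label is outbreak[t].
    """
    if not (len(temperature) == len(precipitation) == len(outbreak)):
        raise ValueError("all series must have the same length")

    samples, labels = [], []
    for t in range(window, len(outbreak)):
        # Pair the two climate variables step by step over the lookback span.
        sample = list(zip(temperature[t - window:t],
                          precipitation[t - window:t]))
        samples.append(sample)
        labels.append(outbreak[t])
    return samples, labels


# Made-up monthly values for illustration only.
temperature = [20, 22, 25, 28, 30]      # degrees C
precipitation = [5, 10, 40, 60, 20]     # mm
outbreak = [0, 0, 0, 1, 0]              # 1 = outbreak alert that month

samples, labels = make_windows(temperature, precipitation, outbreak)
for sample, label in zip(samples, labels):
    print(sample, "->", label)
```

Each resulting sample is a small two-variable sequence, which is the input shape a time-series classifier would be trained on.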

2021 Monthly Rice Production in Chinese Coastal Provinces, as part of the 2022 3rd International Conference on Big Data Analytics and Practices (Ajeet Parmar, Thilanka Munasinghe, Heidi Tubbs, Assaf Anyamba)

  • This paper focuses on predicting monthly rice yields for Chinese coastal provinces from rainfall and NDVI values. The work is motivated by the significant proportion of the global population that depends on rice as a staple food and by the need to ensure that rice production continues to meet global demand, which is linked to economic growth and global food security, in the face of climate change. Using precipitation and Normalized Difference Vegetation Index (NDVI) as the predictor variables, the project aimed to predict monthly rice production from April to October across six Chinese provinces (Liaoning, Jilin, Heilongjiang, Shanghai, Jiangsu, Zhejiang). Project findings concluded that specific predictor variables (like NDVI) and anomalies due to a myriad of factors are key to understanding real-world rice production yields.

Landslide Likelihood Prediction using Machine Learning Algorithms, as part of the 2022 IEEE International Conference on Big Data (Vasundhara Acharya, Anindita Ghosh, Inwon Kang, Thilanka Munasinghe, Binita Kc)

  • Electricity supplied by power plants, which keeps infrastructure systems operating, can be disrupted during natural hazards. Using integrated databases of explanatory variables (topographic, soil moisture, precipitation) and machine learning algorithms, this paper aimed to predict landslide likelihood in selected critical infrastructure locations in the northeastern United States. Project findings concluded that models using the Random Forest algorithm can enhance the precision of landslide likelihood predictions.

From Satellites to Fields: Machine Learning Applications for Prediction of Corn Production Using NDVI, Precipitation and Land Surface Temperature for Large Producer Countries, as part of the 2023 IEEE International Conference on Big Data (Saniya Nangia, Thilanka Munasinghe, Heidi Tubbs, Assaf Anyamba)

  • This project aims to determine the predictors of corn production in Iowa (United States), Heilongjiang (China), Mato Grosso (Brazil), Cordoba (Argentina), and Poltava Oblast (Ukraine). Focusing on precipitation and land surface temperature values from the Earth Engine tool and MODIS NDVI values from the NASA GIMMS Global Agricultural Monitoring System, various machine learning models are compared using their adjusted R² values. The findings reveal that corn production values in the five regions can best be predicted using lasso regression models, with adjusted R² values of 0.841, 0.933, 0.847, 0.854, and 0.860 respectively.
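For readers unfamiliar with the metric used to compare the models, adjusted R² penalizes ordinary R² for the number of predictors. The sketch below applies the standard formula; the function name and the sample inputs are hypothetical and are not taken from the paper.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Standard adjusted R-squared.

    adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    where n is the number of observations and p the number of predictors.
    """
    denominator = n_samples - n_predictors - 1
    if denominator <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / denominator


# Hypothetical example: R^2 of 0.90 from 51 observations and 3 predictors
# (for instance NDVI, precipitation, and land surface temperature).
value = adjusted_r2(0.90, n_samples=51, n_predictors=3)
print(round(value, 4))
```

Because the penalty grows with the number of predictors, the adjusted value lets models with different numbers of inputs be compared on a fairer footing than raw R².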

Integrating Climate Variable Data in Machine Learning Models for Predictive Analytics of Tomato Yields in California, as part of the 2023 IEEE International Conference on Big Data (Tianze Zhu, Junyi Wu, Tingyi Tan, Shuheng Wang, Thilanka Munasinghe, Heidi Tubbs, Assaf Anyamba)

  • The project aims to improve the accuracy of predicting tomato yield in California by integrating various climate variables into machine learning models. These variables include daytime and nighttime temperatures, precipitation, vegetation index, and evapotranspiration. By developing four distinct machine learning-based predictors, the project demonstrates that neural networks and linear regression models can achieve an average accuracy rate of 70% to 80% in predicting tomato yields. The research emphasizes the importance of incorporating a wide range of climate factors to create robust and versatile predictive models.