Air pollution, especially fine particulate matter (PM2.5), poses serious health and environmental threats worldwide. Traditional air quality prediction models often fail to capture the complex, dynamic nature of pollution because they rely on linear assumptions and lack contextual information such as meteorological or human-activity patterns. This work proposes an ensemble machine learning framework that combines environmental pollutant data with temporal features derived from timestamps to improve the accuracy, robustness, and interpretability of PM2.5 prediction. The dataset comprises hourly air quality measurements from Delhi: pollutant concentrations (CO, NO, NO₂, O₃, SO₂, PM₁₀, NH₃) and their timestamps. Preprocessing parses the timestamps into structured datetime features (hour, day, month, weekday), handles missing values with mean imputation, and applies standard normalization. Two ensemble models, a Random Forest Regressor (RF) and a Gradient Boosting Regressor (GBR), are trained to predict PM2.5 concentrations and evaluated using MAE, MSE, RMSE, and R². Random Forest achieved the better predictive performance, with an MAE of 9.386, an RMSE of 15.265, and an R² of 0.995, compared with an MAE of 11.794, an RMSE of 17.585, and an R² of 0.994 for Gradient Boosting. These results improve substantially on earlier baselines (e.g., MAE = 22.4), highlighting the impact of temporal features and comprehensive preprocessing. Feature importance analysis of the Gradient Boosting model identifies the key contributors to PM2.5 concentration, which improves explainability. Overall, the framework shows that combining environmental and temporal features with ensemble models enhances air quality prediction, and that Random Forest in particular is effective at modeling complex, non-linear pollution behavior. The added explainability supports data-driven decision-making for environmental policy and real-time public health interventions.
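The workflow summarized in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and pandas. It is not the authors' implementation: the column names (`date`, `pm2_5`, and the lowercase pollutant names), the synthetic data generator, the 80/20 split, and the hyperparameters are all assumptions made for illustration, and the numbers it produces have no relation to the results reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed column names; the real dataset's headers may differ.
POLLUTANTS = ["co", "no", "no2", "o3", "so2", "pm10", "nh3"]
TIME_FEATURES = ["hour", "day", "month", "weekday"]


def make_synthetic_frame(n_hours=2000, seed=0):
    """Build a synthetic stand-in for hourly data (NOT the Delhi dataset)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(
        rng.gamma(2.0, 20.0, size=(n_hours, len(POLLUTANTS))), columns=POLLUTANTS
    )
    stamps = pd.Timestamp("2021-01-01") + pd.to_timedelta(np.arange(n_hours), unit="h")
    df.insert(0, "date", stamps)
    hour = df["date"].dt.hour
    # Hypothetical target: depends on PM10, NO2, and a daily cycle, plus noise.
    df["pm2_5"] = (
        0.6 * df["pm10"]
        + 0.2 * df["no2"]
        + 15.0 * np.sin(2 * np.pi * hour / 24)
        + rng.normal(0.0, 5.0, n_hours)
    )
    # Blank out about 2% of pollutant readings so imputation has work to do.
    missing = rng.random((n_hours, len(POLLUTANTS))) < 0.02
    df[POLLUTANTS] = df[POLLUTANTS].mask(missing)
    return df


def add_time_features(df):
    """Parse the timestamp column into hour, day, month, and weekday."""
    out = df.copy()
    ts = pd.to_datetime(out["date"])
    out["hour"] = ts.dt.hour
    out["day"] = ts.dt.day
    out["month"] = ts.dt.month
    out["weekday"] = ts.dt.weekday
    return out


def fit_and_score(model, X_train, X_test, y_train, y_test):
    """Mean-impute, standardize, fit the model, and compute the four metrics."""
    pipe = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(), model)
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    scores = {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_test, pred),
    }
    return pipe, scores


frame = add_time_features(make_synthetic_frame())
features = POLLUTANTS + TIME_FEATURES
X_train, X_test, y_train, y_test = train_test_split(
    frame[features], frame["pm2_5"], test_size=0.2, random_state=42
)

models = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=42),
    "GBR": GradientBoostingRegressor(random_state=42),
}
results, fitted = {}, {}
for name, model in models.items():
    fitted[name], results[name] = fit_and_score(model, X_train, X_test, y_train, y_test)
    print(name, {k: round(v, 3) for k, v in results[name].items()})

# Impurity-based feature importances from the Gradient Boosting model.
importances = pd.Series(
    fitted["GBR"][-1].feature_importances_, index=features
).sort_values(ascending=False)
print(importances.head())
```

Wrapping imputation and scaling in a pipeline means both are fitted on the training split only, which avoids leaking test-set statistics into preprocessing. Tree ensembles do not strictly need standardized inputs; the scaler is included only because the abstract lists normalization as a preprocessing step.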
Keywords
Machine Learning, Air Quality Prediction, PM2.5 Forecasting, Ensemble Learning, Random Forest, Gradient Boosting, Temporal Features, Feature Importance.
CRediT Author Statement
The authors confirm contribution to the paper as follows:
Conceptualization: Karthikeyan T, Vivekanandan S J, Ashwini Barbadekar, Vasukidevi G, Muthukumar Subramanian and Reny Jose;
Methodology: Karthikeyan T, Vivekanandan S J and Ashwini Barbadekar;
Software: Vasukidevi G, Muthukumar Subramanian and Reny Jose;
Data Curation: Karthikeyan T, Vivekanandan S J and Ashwini Barbadekar;
Writing - Original Draft Preparation: Karthikeyan T, Vivekanandan S J, Ashwini Barbadekar, Vasukidevi G, Muthukumar Subramanian and Reny Jose;
Visualization: Karthikeyan T, Vivekanandan S J and Ashwini Barbadekar;
Investigation: Vasukidevi G, Muthukumar Subramanian and Reny Jose;
Supervision: Karthikeyan T, Vivekanandan S J and Ashwini Barbadekar;
Validation: Vasukidevi G, Muthukumar Subramanian and Reny Jose;
Writing - Reviewing and Editing: Karthikeyan T, Vivekanandan S J, Ashwini Barbadekar, Vasukidevi G, Muthukumar Subramanian and Reny Jose.
All authors reviewed the results and approved the final version of the manuscript.
Acknowledgements
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Availability of data and materials
Data sharing is not applicable to this article as no new data were created or analysed in this study.
Author information
Contributions
All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.
Corresponding author
Karthikeyan T
Department of Computer Science and Business Systems, Panimalar Engineering College, Chennai, Tamil Nadu, India.
Open Access. This article is licensed under a Creative Commons Attribution NoDerivs license, a more restrictive license that allows the material to be redistributed commercially or non-commercially but does not permit any changes to the original, i.e. no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/
Cite this article
Karthikeyan T, Vivekanandan S J, Ashwini Barbadekar, Vasukidevi G, Muthukumar Subramanian and Reny Jose, “Air Quality Prediction Using Ensemble Machine Learning Models with Environmental and Meteorological Features,” Journal of Machine and Computing, vol. 6, no. 1, pp. 150–164, 2026, doi: 10.53759/7669/jmc202606011.