Abstract
Accurately monitoring and managing energy consumption in buildings is critical for improving efficiency and reducing environmental impact. Non-Intrusive Load Monitoring (NILM) enables energy disaggregation from aggregate meter data, but disaggregating heating and cooling loads presents unique challenges due to their continuously variable nature and dependence on external weather conditions. This paper presents a structured methodology for preparing a dataset specifically designed for temperature-dependent load disaggregation in commercial and public buildings. The dataset includes energy consumption from multiple buildings across diverse climate zones, integrating sub-metered heating and cooling loads with weather data. A rigorous data cleaning and preprocessing pipeline was implemented to ensure consistency, including unit normalization, outlier detection, cross-meter validation, and alignment with meteorological data. The resulting dataset provides researchers with high-quality energy and weather time series, enabling the development and validation of NILM algorithms for HVAC disaggregation. By documenting the dataset preparation process, this work establishes best practices for handling real-world energy data, ensuring reliability and reproducibility. This paper serves as a reference for future dataset curation efforts in NILM and temperature-dependent load analysis.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.