Abstract
Technical Debt (TD) remains a critical challenge in software engineering, degrading maintainability and long-term quality. While traditional TD detection methods rely heavily on static analysis and manual inspection, recent advances in Large Language Models (LLMs) offer a compelling approach to automating and scaling this process. In this paper, we present DebtGuardian, the first open-source LLM-based framework for detecting TD directly from source code changes. DebtGuardian combines zero-shot and few-shot prompting strategies, supports both granular and batch-level detection, and employs Guardrails-AI to validate and standardize model outputs. To enhance robustness, it supports majority voting across multiple LLMs. We evaluate DebtGuardian on the MLCQ dataset using a range of state-of-the-art open-source LLMs specialized for code understanding, as well as general-purpose LLMs. The results demonstrate that granular prompting, code-specialized models, and larger context windows significantly improve TD detection performance. Majority voting boosts recall by 8.17%, demonstrating the clear benefit of model ensembling. We also conduct a detailed evaluation of line-level metrics and find that a 10-line threshold achieves the best balance between precision and tolerance for small discrepancies in predicted TD locations. DebtGuardian advances the field by offering a flexible, extensible, and empirically validated LLM-based solution for TD detection. Our framework paves the way for integrating AI-driven analysis into continuous integration pipelines, making TD management more scalable and accurate in modern software development workflows.