AI Technical Debt

Authors: Strategic Machines
Time to Retire That Debt
We’ve been working around the edges of our client projects trying to understand how GenAI apps can be tested so that quality outcomes are delivered consistently. There is a large body of academic research on the same topic, mostly aimed at the intractable problem of testing language models for performance. Our focus is one level higher: we assume the ‘platform’ works and have been testing the applications built on top of it.
CIOs know there are risks hidden in their digital assets: software bugs crash applications, open security breaches, and propagate processing errors at the speed of light. There are no ‘zero defect’ production systems. The basic strategy of every CIO is to ‘contain the chaos’ and ‘limit the liability’ by fielding enough maintenance capacity to address errors encountered in production within an acceptable timeframe. Of course, this is on top of the huge investments every executive already makes in QA personnel, processes and infrastructure. The collection of known and unknown software defects is labeled ‘technical debt’, an apt term given the consequences of not remediating the problem. As the debt mounts, it may become too much for an organization to manage successfully. The consequences of software defects are becoming even more severe with regulatory action, as evidenced by the recent EU directive extending product liability to defective software, not just hardware.
By some estimates, after 60+ years of software delivery, there is more than $1.5 trillion of exposure in technical debt across application portfolios (You read that correctly. ‘T’ is for trillion). Some software professionals worry that the rise of AI as an aid to software development will only make the problem worse. The reason is that anything that makes it easier for coders—especially less experienced ones—to write and ship software tends to produce more technical debt. Coders often prioritize getting features out the door over taking the time to optimize that code for performance and quality. We’ve noted in our own Lab that code snippets generated by AI sometimes magically perform well, and other times not so much. The defective snippets could easily be missed by a junior dev.
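To make that concrete, here is a hypothetical snippet of the kind an assistant might generate (the function and its name are our illustration, not from any specific model output): it compiles, reads cleanly, and passes a casual review, yet Python's mutable default argument means state silently leaks across calls.

```python
# Hypothetical AI-generated helper: collect items into a bucket list.
# The subtle defect: the default [] is created ONCE at function
# definition time, so every call that omits `bucket` shares one list.
def collect(item, bucket=[]):
    bucket.append(item)
    return bucket

first = collect("a")   # ["a"] -- looks fine in isolation
second = collect("b")  # a junior dev expects ["b"], but gets ["a", "b"]

# The idiomatic fix: default to None and allocate a fresh list per call.
def collect_fixed(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

A unit test that only exercises a single call would never surface the defect, which is exactly why this class of bug slips past a quick review of generated code.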
Since the 1960s, the discipline of software quality assurance has been defined, refined and studied at length, with insights published by industry luminaries such as Edsger Dijkstra and Richard Lipton. ‘Proving the correctness of the program’ is hard with legacy apps and languages, and harder still with AI. We’ve concluded that, given the state of the art of testing platforms, GenAI apps can only be moderated, not corroborated. Accordingly, we’ve made recommendations on practical steps to moderate them.
But there is a unique opportunity to use GenAI to help address technical debt. Deploy it to remediate apps and scour existing codebases for security risks, hidden defects and neglected upgrades. Bill Curtis, the chief scientist at CAST Software, noted that ‘Banking systems are loaded with Cobol, built ages ago, and not documented. The guy who built it is probably dead.’ He has a point. Rather than letting GenAI add to technical debt, use it to pay that debt down by moving codebases to modern architectures, platforms and languages. Doing so frees up resources and accelerates innovation. Given the state of the art with AI, this might be the most valuable investment a CIO could make.