Troubleshooting Legacy and Monolithic Apps in Production: Addressing the Challenges

Eran Kinsbruner
5 min read · Mar 15, 2023

Maintaining and debugging legacy systems is challenging. It requires specific skills and techniques that differ considerably from those used on codebases under active development. Developers find themselves working through code they don’t know well enough, either because they are unfamiliar with the codebase or because they lack experience with the technology. The original developers have usually left or moved on to other tasks, and the documentation is rarely good enough. Changes to the runtime environment, operating system, external services, and third-party software can create nontrivial maintenance work just to keep the legacy application running. Integration with newer systems can be hard to achieve.

Debugging old legacy code (Mehmet Ozkaya), particularly under the pressure of an ongoing incident, can be frustrating and time-consuming. However, such applications are still in use and remain of great value to their organizations. These organizations must embrace the complexity of maintaining legacy applications and find approaches and tools that alleviate those difficulties, allowing developers to debug quickly and effectively.

The Challenges in Debugging Legacy Applications

In legacy settings, debugging tools available to developers are often scarce. In general, there are two possible approaches:

  • Local: Running the code through a debugger on the local machine.
  • Remote: Using log statements to gain insight into the application in a live environment.

However, both approaches can be unsatisfactory due to the peculiarities of legacy environments. Let’s see why.

Figure: High-level monolithic architecture

Build Time

Legacy applications can have long build times, so debugging by logging carries a significant penalty: every new log statement means another rebuild. Slow builds reduce developers’ effectiveness and speed. When the wait is too long, we tend to lose focus and become more prone to distraction and context switching, which breaks the flow and wastes time. When the live application must be debugged remotely, the deployment process, which is often manual or slow, adds further overhead.

State Recovery

Legacy applications are often complex or monolithic: highly stateful, tightly coupled, and frequently keeping state in-process rather than delegating it to external services. Typical examples are integrated caches, internal data structures, connection pools, and thread pools. Because of these architectural features, reproducing a specific issue seen in production takes considerable time, since the application’s state must first be recovered.

A classic example is debugging database connection pools: the behavior of a connection pool at application startup can be very different from its behavior in a long-lived application under production workload, where the pool is more likely to misbehave. The time needed to recover the state can severely slow down the debug cycle.
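
The difference between a fresh and a long-lived pool can be illustrated with a minimal sketch. Everything here is hypothetical and exists only for illustration: `TinyPool`, `handle_request`, and `run_workload` are not taken from any real application or library, and the leak on the error path is just one assumed way a pool might degrade over time.

```python
import queue


class TinyPool:
    """Hypothetical fixed-size connection pool, for illustration only."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(f"conn-{i}")

    def acquire(self):
        # Raises queue.Empty when no idle connection is left.
        return self._idle.get_nowait()

    def release(self, conn):
        self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()


def handle_request(pool, fail):
    conn = pool.acquire()
    if fail:
        # Assumed bug: the rare error path returns without releasing.
        return "error"
    pool.release(conn)
    return "ok"


def run_workload(pool, requests, fail_every):
    """Serve requests; every `fail_every`-th one hits the error path.

    Returns the number of requests served before the pool ran dry.
    """
    served = 0
    for n in range(1, requests + 1):
        try:
            handle_request(pool, fail=(n % fail_every == 0))
        except queue.Empty:
            return served  # pool exhausted
        served += 1
    return served


if __name__ == "__main__":
    fresh = TinyPool(size=5)
    print(run_workload(fresh, requests=100, fail_every=1000), fresh.idle_count())
    aged = TinyPool(size=5)
    print(run_workload(aged, requests=10_000, fail_every=1000), aged.idle_count())
```

In this sketch, a short local run of 100 requests never reaches the faulty path and the pool looks healthy, while the same code stalls only after thousands of requests. Reproducing the fault means replaying enough workload to bring the pool into that state, which is the recovery cost described above.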

Dependencies

Legacy code usually requires complex dependencies to operate: cache engines, libraries, databases, middleware components, queues, etc. Setting up those services is complex, which complicates reproducing erroneous behavior in the developers’ local environment. Sometimes a dependency cannot be reproduced locally at all, as is often the case for complex databases. Connecting to a test database is then often necessary, provided such a database exists. Even when a test database is available, reproducing a bug sometimes requires access to actual production data, which is rarely granted due to security and compliance concerns.

Restart Time

Heavy monoliths can take a long time to start up, adding further delays to the development cycle. This happens, for example, in technology stacks that require restarting an entire application server, whether the application runs on a traditional application server or has been lifted and shifted to a container orchestrator.
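
To see how the overheads described in this section compound, here is a rough back-of-the-envelope sketch. The `CycleCost` class, the `total_wait` helper, and all the numbers are hypothetical placeholders rather than measurements; substitute figures from your own system.

```python
from dataclasses import dataclass


@dataclass
class CycleCost:
    """Minutes spent waiting in one log-and-redeploy debug iteration (assumed values)."""

    build_min: float
    deploy_min: float
    restart_min: float
    state_recovery_min: float

    def per_iteration(self):
        return (
            self.build_min
            + self.deploy_min
            + self.restart_min
            + self.state_recovery_min
        )


def total_wait(cost, iterations):
    """Total waiting time in minutes across all debug iterations."""
    return cost.per_iteration() * iterations


if __name__ == "__main__":
    # Hypothetical figures, chosen only to make the arithmetic concrete.
    cost = CycleCost(build_min=10, deploy_min=5, restart_min=3, state_recovery_min=12)
    print(cost.per_iteration())   # minutes of waiting per added log line
    print(total_wait(cost, 6))    # minutes of waiting over six iterations
```

With these assumed values, each added log statement costs 30 minutes of waiting, so an investigation that needs six rounds of logging spends three hours on waiting alone, before any actual analysis.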

Transform your Debugging Experience with Lightrun

Lightrun’s Developer Observability Platform represents a game-changer in debugging legacy applications. It allows the dynamic instrumentation of applications running remotely on a production server by adding logs, metrics, and virtual breakpoints without stopping the application. Avoiding restarts and debugging the live application remotely are key improvements in legacy settings.

The value provided by Lightrun in the context of legacy applications is significant: it reduces development time by avoiding application rebuilds, redeployments, and restarts, and by preserving the application state. This, in turn, reduces the time to restore the system after a fault and lightens the developer’s workload. The dynamic insights provided by Lightrun’s features are also a crucial aid when trying to make sense of an outdated and complex codebase.

Lightrun provides several features that empower developers of legacy applications. In particular:

  • Dynamic Logs allow developers to add new log lines anywhere in the codebase without writing new code or redeploying the application and without losing state.
  • Snapshots are virtual breakpoints that provide the familiar breakpoint functionality without stopping execution, so they can be used on the remote application.
  • Metrics can monitor live applications in real time and on demand. They can, for example, track the size of data structures over time, allowing developers to investigate bugs that can be reproduced only on the live system.

Figure: Lightrun Virtual Breakpoint (Snapshot) in action from within the IntelliJ IDE

Lightrun’s features enable developers to troubleshoot legacy software remotely, simplifying the debugging process. In particular, Lightrun’s ability to instrument a live application removes the need to rebuild and redeploy it, thus avoiding the delays caused by long build and deployment times. Furthermore, Lightrun helps preserve the application state (caches, database connection and thread pools, application data): the application keeps running and receiving input while developers collect logs, variable values, and metrics, which significantly improves the reproducibility of erroneous behavior.

One additional benefit unlocked by Lightrun’s ability to debug remotely is removing the need to build a complex local development environment complete with all the necessary dependencies (test data, databases, queues, middleware, and external services).

Finally, Lightrun avoids the time wasted restarting the application after adding logs and breakpoints; this is particularly useful for fat monoliths and applications based on slow-starting application servers.

Stop being afraid to touch legacy applications! Do it safely with Lightrun. Try it yourself on Lightrun’s playground.

Eran Kinsbruner

Global Head of Product Marketing at Lightrun, Best Selling Author, Patent holder, CMMI and FinOps Practitioner Certified (https://lightrun.com)