Key Metrics for Tracking Developer Productivity: What to Measure, How and Why

Eran Kinsbruner
FAUN — Developer Community 🐾
10 min read · Nov 8, 2023

Unlocking $300 Billion in Wasted Productivity

According to a 2018 Stripe report, developers spend an average of 17.3 hours per week on maintenance issues. The report puts the resulting loss at about $300 billion in developer productivity every year. This highlights a substantial inefficiency in developer workflows: a considerable share of developers' time goes to maintaining existing code rather than to innovation.

To navigate these complexities, organizations rely on effective KPIs and actionable metrics. Companies like Google and Microsoft invest heavily in understanding developer workflows, recognizing that even small improvements can compound into substantial productivity gains over many development cycles.

Google, the technology Goliath, values developer productivity metrics so highly that it acquired DORA (DevOps Research and Assessment), one of the most influential DevOps research programs, and now leads it. DORA has defined a set of metrics to gauge software delivery performance, and its acquisition by Google underscores the importance of performance measurement.

DORA is not the only research program aiming to solve the developer productivity equation. There are other methodologies and frameworks available, such as Agile, Lean, Value Stream Mapping, SPACE, Flow, and others. These frameworks offer different approaches to measuring developer productivity, each with its own set of metrics. But when confronted with this considerable number of options, how do you determine which metrics are the most suitable for your developer teams?

The answer lies in considering your specific goals, the maturity of your team, and the size of your organization. With that in mind, let’s dive into a list of 10 key metrics for tracking developer productivity to guide you in selecting what resonates most with your needs.

Understanding the 10 Essential Metrics

Lead Time for Changes (DORA)

What is it?

Lead time for changes refers to the duration from the moment a developer commits a change to the point it gets deployed in a production environment. This metric provides insight into the efficiency of the software delivery process, effectively showing how swiftly changes move through the different stages (dev, test, prod) and pipelines.

How do we measure it?

To quantify lead time for changes, track the number of days (or hours) from the time a commit is made until it’s successfully deployed in production. Tools like CI/CD platforms and version control systems can help automate this tracking.
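As a rough illustration, the sketch below computes lead time for changes from pairs of timestamps. It assumes you have already exported, for each change, the commit time from version control and the production deploy time from your CI/CD platform; the sample data and the `lead_times_hours` helper are hypothetical. Reporting the median rather than the mean is a design choice made here because a few slow changes can skew an average; it is not something the metric itself prescribes.

```python
from datetime import datetime
from statistics import median

# Hypothetical sample data: (commit_time, production_deploy_time) per change.
# In practice these would come from your VCS and CI/CD tooling.
changes = [
    (datetime(2023, 11, 1, 9, 0), datetime(2023, 11, 1, 15, 0)),
    (datetime(2023, 11, 2, 10, 0), datetime(2023, 11, 3, 10, 0)),
    (datetime(2023, 11, 3, 8, 0), datetime(2023, 11, 3, 20, 0)),
]


def lead_times_hours(pairs):
    """Return the commit-to-production lead time of each change, in hours."""
    return [(deployed - committed).total_seconds() / 3600 for committed, deployed in pairs]


hours = lead_times_hours(changes)
# The median is less sensitive to a handful of unusually slow changes.
print(f"Median lead time for changes: {median(hours):.1f} h")
```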

What are the potential benefits?

It’s all about operational efficiency and cost savings.

Reducing lead time for changes gives developers feedback more quickly, which lets them make adjustments right away.

Reducing lead time for changes not only boosts developer morale by showcasing their contributions sooner but also benefits end users through faster feature releases and bug fixes.

Change Failure Rate (DORA)

What is it?

The percentage of deployed changes that fail, requiring a subsequent fix or rollback.

How do we measure it?

By comparing the number of failed deployments to the total number of deployments over a specific period. For example, if you have 10 failed deployments out of 100 total deployments, your Change Failure Rate (CFR) is 10%.
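The calculation is a simple ratio. A minimal sketch, assuming you already know how many production deployments in the period required a fix or rollback (the `change_failure_rate` name is illustrative):

```python
def change_failure_rate(failed: int, total: int) -> float:
    """Percentage of production deployments that needed a fix or rollback."""
    if total <= 0:
        raise ValueError("total deployments must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return 100.0 * failed / total


# The example from the text: 10 failed deployments out of 100.
print(change_failure_rate(10, 100))
```

The hard part in practice is not the arithmetic but agreeing on what counts as a "failed" deployment, so it helps to define that consistently before tracking the trend.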

What are the potential benefits?

A low change failure rate suggests a robust development and testing process. By minimizing failures, teams reduce downtime, enhance user experience, and build trust in their release process. Fewer failures also increase user satisfaction and reduce the costs associated with hotfixes or rollbacks.

Deployment Frequency (DORA)

What is it?

The number of times a team deploys code to production within a given time frame. High-performing companies tend to deploy more frequently. Take Facebook: on Android alone, it runs between 50,000 and 60,000 builds a day (builds, not production deployments), and by applying continuous delivery techniques to its mobile stack, it has moved from four-week releases to two-week releases to one-week releases.

How do we measure it?

Track the number of deployments made over a specific period, such as daily, weekly, or monthly.
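For example, a weekly count can be derived from the deployment dates recorded by your CI/CD system. The sketch below groups hypothetical deployment dates by ISO calendar week; the data and the `deployments_per_week` helper are assumptions for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical production deployment dates exported from a CI/CD history.
deploys = [
    date(2023, 11, 6), date(2023, 11, 6), date(2023, 11, 8),
    date(2023, 11, 14), date(2023, 11, 20),
]


def deployments_per_week(dates):
    """Count deployments per (ISO year, ISO week) bucket."""
    return Counter(tuple(d.isocalendar()[:2]) for d in dates)


for (year, week), count in sorted(deployments_per_week(deploys).items()):
    print(f"{year}-W{week:02d}: {count} deployment(s)")
```

Weeks with no deployments simply do not appear in the counter, so fill them in explicitly with zero if you want to chart a continuous trend.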

What are the potential benefits?

Higher deployment frequency indicates a more agile and responsive engineering process. Frequent deployments allow teams to deliver features and fixes to end users more rapidly, leading to quicker feedback loops and better adaptability to market demands. This can result in improved customer satisfaction and competitive advantage.

Time to Restore Service (DORA)

What is it?

The duration required to recover from a service outage or a critical incident and restore the application to its normal operating condition.

How do we measure it?

Calculate the time from when an incident is reported to the moment your services are fully operational again.

What are the potential benefits?

A shorter time to restore service indicates how effectively a team diagnoses and resolves issues. This keeps disruption for end users to a minimum and maintains trust in the software. Additionally, by restoring service more efficiently after incidents, companies reduce potential revenue loss and protect their brand image.

Mean Time to Recover (DORA)

What is it?

Mean Time to Recover (MTTR) represents the average time it takes to restore a system or application to its normal function after experiencing a failure or outage.

How do we measure it?

Calculate MTTR by taking the total downtime during a specified period and dividing it by the number of incidents or outages during that same period.
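The sketch below applies that formula to a hypothetical list of incidents, each recorded as a (reported, restored) timestamp pair; each difference is the per-incident time to restore service, and MTTR is their average. It assumes incidents in the period do not overlap; overlapping outages would double-count downtime and would need to be merged first. The data and function name are illustrative.

```python
from datetime import datetime, timedelta

# Hypothetical incidents for one period: (reported_at, restored_at).
incidents = [
    (datetime(2023, 11, 1, 10, 0), datetime(2023, 11, 1, 10, 45)),
    (datetime(2023, 11, 9, 22, 0), datetime(2023, 11, 10, 0, 15)),
    (datetime(2023, 11, 20, 3, 30), datetime(2023, 11, 20, 3, 50)),
]


def mean_time_to_recover(events) -> timedelta:
    """Total downtime in the period divided by the number of incidents."""
    if not events:
        raise ValueError("no incidents in the period")
    # Each (restored - reported) is one incident's time to restore service.
    total_downtime = sum((restored - reported for reported, restored in events), timedelta())
    return total_downtime / len(events)


print(f"MTTR: {mean_time_to_recover(incidents)}")
```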

What are the potential benefits?

Understanding and optimizing MTTR ensures that when failures do occur, the system can quickly return to normal operation, minimizing disruption. A lower MTTR can lead to improved user satisfaction, reduced business impact, and increased confidence in the system’s resilience.

Cycle Time (Lean/Agile)

What is it?

Cycle time originates from lean manufacturing principles and has become a key metric for software engineering teams. This metric represents the duration from the moment work begins to when it’s completed and delivered.

Note that cycle time and lead time are two distinct metrics. Cycle time represents the average duration it takes to complete a cycle of work once work on it has started. Lead time, on the other hand, refers to the time elapsed between receiving an order and the delivery date, so it also includes any time a request waits before work on it begins.

How do we measure it?

Assess the average duration from the first commit on a work item (a common proxy for when work begins) to its release in production.
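One way to compute this, sketched below under stated assumptions: commits have already been linked to work items (for example, via ticket IDs in commit messages), and each item's production release time is known. The ticket IDs, data layout, and `cycle_time_days` helper are hypothetical. Unlike the per-commit lead time above, the clock here starts at the earliest commit on each work item.

```python
from datetime import datetime
from statistics import mean

# Hypothetical data: commit timestamps per work item plus its release time.
work_items = {
    "TICKET-1": {
        "commits": [datetime(2023, 11, 1, 9), datetime(2023, 11, 2, 14)],
        "released": datetime(2023, 11, 3, 9),
    },
    "TICKET-2": {
        "commits": [datetime(2023, 11, 6, 11)],
        "released": datetime(2023, 11, 7, 11),
    },
}


def cycle_time_days(item) -> float:
    """Days from the first commit on a work item to its production release."""
    first_commit = min(item["commits"])
    return (item["released"] - first_commit).total_seconds() / 86400


avg = mean(cycle_time_days(item) for item in work_items.values())
print(f"Average cycle time: {avg:.2f} days")
```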

What are the potential benefits?

Shorter cycle times lead to faster feedback, quicker value delivery, and more responsive processes. As a result, your company can adapt swiftly to market and user needs.

Uptime

What is it?

Uptime refers to the time a system or application is operational and available to users without downtime or interruptions. It's a commonly used metric for understanding the stability and reliability of software or a system, and it is closely tied to service-level agreements (SLAs).

How do we measure it?

It's often expressed as a percentage, with 100% meaning the system is always available. For instance, 99.99% uptime means the system was unavailable for only 0.01% of the time, which translates to about 52.6 minutes per year, while 99.999% uptime allows only about 5.26 minutes per year.
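These downtime budgets follow directly from the percentage. A small sketch, assuming a 365-day year (525,600 minutes); using 365.25 days would shift the results only slightly:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a 365-day year


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Yearly downtime, in minutes, permitted by a given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    print(f"{target}% uptime -> {allowed_downtime_minutes(target):.2f} min/year")
```

With these assumptions, 99.99% works out to roughly 52.56 minutes and 99.999% to roughly 5.26 minutes of downtime per year.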

What are the potential benefits?

Maintaining high uptime ensures trust and satisfaction. Frequent downtimes are harmful to user experience and can impact an organization’s reputation and revenue. Therefore, prioritizing uptime reduces the costs associated with outages.

Developer Velocity (Agile metric)

What is it?

Developer velocity gauges the rate at which a team completes work within a specific timeframe, typically during a sprint in Agile methodologies.

How do we measure it?

Velocity can be quantified in man-hours or, more commonly, in story points, which are assigned to each user story based on its complexity.
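In story-point terms, a sprint's velocity is usually the sum of points for stories actually completed in that sprint, and planning often uses an average over recent sprints. The sketch below assumes hypothetical sprint names and point values; only completed stories are included, since partially done work conventionally does not count.

```python
from statistics import mean

# Hypothetical story points of stories *completed* in each recent sprint.
completed_points_by_sprint = {
    "Sprint 21": [3, 5, 8, 2],
    "Sprint 22": [5, 5, 3],
    "Sprint 23": [8, 3, 2, 2, 1],
}


def velocity(points):
    """Velocity of one sprint: the sum of story points for completed stories."""
    return sum(points)


per_sprint = {name: velocity(points) for name, points in completed_points_by_sprint.items()}
avg_velocity = mean(per_sprint.values())
print(per_sprint, f"average: {avg_velocity:.1f}")
```

Because story points are estimated relative to a single team's own scale, these numbers are most useful for that team's planning rather than for comparing teams.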

What are the potential benefits?

By tracking Developer Velocity, teams can enhance their project and sprint planning, ensuring realistic commitments and foreseeing potential roadblocks or resource requirements. It also helps in setting clear expectations and optimizing team performance over time.

Work-in-progress (Agile metric)

What is it?

Work-in-progress (WIP) indicates the tasks a team has started but not yet completed.

How do we measure it?

It’s quantified by counting the number of tasks or user stories that are in progress, typically visualized on a Kanban board or Agile dashboard.
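A minimal sketch of that count from a board snapshot is shown below. The column names, task IDs, and the WIP limit are all hypothetical; counting per column rather than only in total is what makes it possible to see where tasks are piling up.

```python
from collections import Counter

# Hypothetical board snapshot: (task id, column). Column names are assumptions.
board = [
    ("T-1", "To Do"), ("T-2", "In Progress"), ("T-3", "In Review"),
    ("T-4", "In Progress"), ("T-5", "Done"), ("T-6", "In Progress"),
]
IN_PROGRESS_COLUMNS = {"In Progress", "In Review"}  # started but not finished
WIP_LIMIT = 3  # hypothetical limit chosen by the team


def wip_by_column(tasks):
    """Count tasks sitting in each in-progress column of the board."""
    return Counter(column for _, column in tasks if column in IN_PROGRESS_COLUMNS)


counts = wip_by_column(board)
total_wip = sum(counts.values())
if total_wip > WIP_LIMIT:
    print(f"WIP {total_wip} exceeds limit {WIP_LIMIT}: {dict(counts)}")
```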

What are the potential benefits?

Monitoring WIP helps pinpoint bottlenecks where tasks pile up, allowing teams to address and resolve them. Additionally, by keeping an eye on WIP, companies can identify potential sunk costs and ensure smoother workflows. However, this metric can be inconsistent and easily manipulated, so use it with caution.

Customer Satisfaction Score

What is it?

Customer Satisfaction Score (CSAT) is a metric that measures the contentment level of your customers regarding your products or services.

How do we measure it?

This metric is typically measured using surveys in which customers rate their satisfaction on a scale (e.g., 1 to 5 or 1 to 10). Open-ended questions can also be used: although their responses are hard to quantify, they can complement the scaled questions with more detailed feedback on participants' experiences.
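One widely used convention for scoring the scaled questions reports CSAT as the percentage of respondents who chose one of the top ratings (for example, 4 or 5 on a 1 to 5 scale). The sketch below assumes that convention; the threshold and scale are parameters, and some teams instead report the plain average rating.

```python
def csat_percent(ratings, scale_max=5, satisfied_threshold=4):
    """Share of respondents whose rating is at or above the 'satisfied' threshold."""
    if not ratings:
        raise ValueError("no survey responses")
    if any(not 1 <= r <= scale_max for r in ratings):
        raise ValueError("rating outside the survey scale")
    satisfied = sum(1 for r in ratings if r >= satisfied_threshold)
    return 100.0 * satisfied / len(ratings)


# Hypothetical responses on a 1 to 5 scale: three of five are 4 or higher.
print(csat_percent([5, 4, 3, 2, 5]))
```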

What are the potential benefits?

By leveraging CSAT, companies can build stronger customer relationships, pinpoint areas for improvement, and consistently meet or exceed customer expectations. High satisfaction scores tend to drive increased loyalty and, consequently, higher revenue.

Vanity Metrics — What not to Measure?

Vanity metrics are statistics that may look impressive at first glance but do not necessarily translate into meaningful business results. Such metrics are not actionable and do not drive real outcomes. Examples include the number of lines of code written, tasks checked off, commits made, and bugs fixed. These can be replaced, respectively, with more meaningful metrics such as code quality, value delivered, merge requests, and user satisfaction score.

Maximizing Developer Efficiency with Lightrun

Metrics serve as a compass for developer productivity, guiding teams toward more efficient coding and operational practices.

Indeed, developer productivity is essential for the growth and success of any software-based organization. The more efficient and focused developers are, the more value they can deliver. To begin your developer productivity journey, it is crucial to select the appropriate tools that can genuinely strengthen your development team’s capabilities. Lightrun developer observability stands out as one such platform.

InsideTracker, for example, reduced debugging time by 50% and saved hours previously spent on hotfixes and redeployments after adopting Lightrun. InsideTracker is a personalized health analysis and data-driven wellness guide, designed to help you live healthier, longer. The company's development team runs Java services in a cloud-based environment, with two Kubernetes clusters: one for production and another for development and testing.

Their app's services run in multiple pods that developers cannot access directly. As a result, troubleshooting incidents and adding logs and telemetry was time-consuming, delaying issue resolution for customers. InsideTracker developers had difficulty troubleshooting partner incidents involving API calls with unparsable responses, as well as parsing issues related to third-party data providers such as smartwatches and health-tracking solutions. To resolve production issues quickly, InsideTracker, as a health and wellness technology company, needed an efficient and secure troubleshooting solution for its services.

Lightrun helped InsideTracker developers troubleshoot apps in both pre-production and production environments. They were able to do this directly from their development IDEs, saving the engineering team hours of debugging time. Using Lightrun, over a dozen InsideTracker developers were able to add logs and snapshots without hotfixing, redeploying, or changing the app’s runtime state. All of this was done through a secure and private customer platform.

"By using Lightrun, our development team was able to figure out an extremely complex incident that was hard to parse and reproduce locally and with the standard debugging solutions. The team was able to use Lightrun to quickly add logs and snapshots around the area of the incident, reproduce and fix it." — Yan Dyshkalps, Director of Technology Research, Architecture and Infrastructure, InsideTracker

Indeed, Lightrun frees developers from time-consuming engineering tasks. Instead of spending hours going through logs, Lightrun allows developers to quickly find the information they need, exactly when they need it, directly from their IDE. The platform offers real-time debugging and performance insights in production environments, enabling developers to understand and resolve issues without disrupting the live application. With just a few mouse clicks, you can easily insert dynamic log lines, metrics, or snapshots into a live environment without the need to redeploy or restart your systems. By integrating Lightrun Actions into the developer’s IDE and CI/CD pipelines, Lightrun simplifies troubleshooting and ensures a smoother software delivery process.

According to Lightrun, this can improve developer productivity by up to 21%, potentially resulting in millions of dollars in value-added work. After all, developers are most effective when they can focus on innovating and creating impactful features rather than getting lost in logs. With Lightrun, developers can dedicate more time to what they enjoy (creating) and less time to debugging.

Start by creating an account here to get a 14-day free trial. You can also request a demo here. Alternatively, explore our Playground where you can experiment with Lightrun in a real, live app without any configuration required.

Global Head of Product Marketing at Lightrun, Best Selling Author, Patent holder, CMMI and FinOps Practitioner Certified (https://lightrun.com)