7 Terrible Metrics to Avoid When Evaluating Developer Performance
Introduction
As software development teams grow, engineering leaders are increasingly expected to produce tangible numbers that show how their teams are improving.
As developers, we're constantly trying to improve the way we work. But how do we know if we're actually making progress? One way is by tracking key performance indicators (KPIs), which help us measure and analyze our work. However, not all KPIs are created equal. In fact, some are downright misleading.
This ongoing search for software development metrics is a tale as old as time, and while there are many metrics out there, it’s important to understand which ones are flawed and should be avoided.
In this blog post, we'll take a look at seven of the worst metrics for evaluating developer performance and why you should avoid them.
Worst Metrics
1. Lines of Code (LoC)
First, one of the worst metrics to use when evaluating developer performance is the number of lines of code a developer writes. This metric is often treated as a measure of productivity, but it's actually a terrible indicator of ability. Just because a developer writes a lot of code doesn't mean it's good code: a developer can produce a large volume of code that is inefficient, poorly structured, or buggy. The metric also says nothing about the thought and effort behind the code, or about the value of the problems it solves.
LoC is probably the most well-known metric for evaluating developers, but it's also one of the worst. The problem with LoC is that it's extremely noisy. According to an analysis of 1 million open-source commits, about 70% of LoC is noise. And that's just the intrinsic noise: once you account for the fact that about 30% of all commits in open-source repos are eventually discarded, the noise level rises to around 80%.
But it gets even worse. LoC tends to spike when new features are being implemented, which can incentivize rapid code addition and lead to a codebase bogged down by tech debt. Additionally, the value of a line of code can vary greatly depending on the language it's written in. A line of CSS, for example, might take a fraction of the time to write compared to a line of Java, Python, or Ruby. As a result, the "most valuable" developer as measured by LoC might be the one adding the most CSS, whitespace, and third-party libraries.
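To make the noise concrete, here's a minimal sketch of what a naive LoC counter actually sees, assuming a local checkout with the `git` CLI on PATH. It tallies lines added per author from `git log --numstat`; nothing in it distinguishes a careful one-line fix from pasted whitespace or a vendored library, which is exactly the problem.

```python
# Sketch: count lines added per author from git history.
# Assumes it runs inside a git repository with `git` on PATH.
import subprocess
from collections import Counter

def lines_added_per_author(repo_path="."):
    # %ae = author email; --numstat emits "added<TAB>deleted<TAB>path" per file
    log = subprocess.run(
        ["git", "log", "--numstat", "--format=AUTHOR:%ae"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout

    totals = Counter()
    author = None
    for line in log.splitlines():
        if line.startswith("AUTHOR:"):
            author = line[len("AUTHOR:"):]
        elif line.strip() and author:
            added, _deleted, _path = line.split("\t", 2)
            if added != "-":  # "-" marks binary files
                totals[author] += int(added)
    return totals

if __name__ == "__main__":
    for author, added in lines_added_per_author().most_common():
        print(f"{added:>8}  {author}")
```

A commit that vendors a 5,000-line dependency tops this table; the one-line fix that ends a production outage barely registers.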
2. Commit Count
The second terrible metric to use when evaluating developer performance is the number of commits they make. Commit count is relatively easy to track, but it's not very useful. While it does have some advantages over LoC (it's not susceptible to noise from trivial line changes and can be a useful indicator of whether a developer is stuck), it lacks signal. In other words, it doesn't tell us much about the quality or impact of the work being done.
Commit frequency is typically used to reward teams that commit often and to flag teams that commit less. At face value it might seem like an acceptable metric, but it's trivially easy to game: just create more commits. And even when it isn't gamed, a rise in commits doesn't indicate more productivity, output, or value delivered.
For example, a developer who makes a lot of small, incremental commits might have a high commit count, but that doesn't necessarily mean they're doing the most valuable work. On the other hand, a developer who makes fewer, larger commits might be doing more impactful work, but they would have a lower commit count.
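Seeing exactly what this metric captures is one command away. A minimal sketch, assuming a local checkout with `git` on PATH:

```python
# Sketch: commits per author on the current branch.
# Splitting one change into ten commits multiplies this number by ten
# while the shipped code stays exactly the same.
import subprocess

out = subprocess.run(
    ["git", "shortlog", "-s", "-n", "-e", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout
print(out)
# Example output (counts only - no quality, no difficulty, no impact):
#    412  Alice <alice@example.com>
#    198  Bob <bob@example.com>
```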
3. Pull Request Count
Pull request count can give you a sense of release cadence and continuous delivery. However, it's a vanity metric: it doesn't account for the size or difficulty of pull requests, and it's easy to game. It encourages developers to create an excessive number of small pull requests just to inflate their numbers, which bloats the code review process and creates unnecessary overhead across the team.
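As a concrete illustration, the raw count is trivial to pull from GitHub's REST API. A minimal sketch (`owner`/`repo` are placeholders, unauthenticated requests are rate-limited, and only the first few pages are fetched):

```python
# Sketch: count merged pull requests per author via GitHub's REST API.
import requests
from collections import Counter

def merged_prs_per_author(owner, repo, pages=5):
    counts = Counter()
    for page in range(1, pages + 1):
        resp = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/pulls",
            params={"state": "closed", "per_page": 100, "page": page},
        )
        resp.raise_for_status()
        for pr in resp.json():
            if pr.get("merged_at"):  # closed-but-unmerged PRs don't count
                counts[pr["user"]["login"]] += 1
    return counts
```

Note what's missing: nothing in this count reflects PR size, review effort, or difficulty. Ten one-line PRs outscore one hard-won change.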
4. Velocity or Story Points
Story points and velocity are a common agile practice, and they can be a great tool when used to forecast delivery. Unfortunately, team velocity and story points are often misused as performance metrics. When you turn velocity from an estimation tool into a measure of software productivity or output, you end up rewarding teams based on points, which immediately jeopardizes the accuracy of estimations as developers are incentivized to inflate them.
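A toy example with made-up numbers shows the difference between the two uses:

```python
# Toy illustration (made-up numbers): velocity as a forecasting tool.
completed = [21, 24, 19, 23]                 # story points per past sprint
velocity = sum(completed) / len(completed)   # ~21.75 points/sprint

backlog = 130                                # points remaining
print(f"Forecast: ~{backlog / velocity:.1f} sprints")  # ~6 sprints

# The moment points become a performance score, teams inflate estimates.
# Double every estimate and "velocity" doubles too, while the actual
# delivery rate is unchanged - and the forecast above becomes meaningless.
inflated = [p * 2 for p in completed]
```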
5. Code Churn
Code churn is a measure of how much a codebase changes over time. While it might seem like a useful metric at first glance, it's quite noisy and doesn't provide much useful information. For example, a high code churn rate could be caused by several factors, such as refactoring, bug fixing, or the implementation of new features. It's hard to discern any meaningful signal from the noise.
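Here's a minimal sketch of one common formulation (lines added plus lines deleted over a time window), assuming a local git checkout; definitions of "churn" vary between tools. Note that nothing in the output distinguishes a refactor from a rewrite or a new feature:

```python
# Sketch: code churn over the last 30 days (lines added + deleted).
import subprocess

def churn(since="30 days ago", repo_path="."):
    # --format= suppresses commit headers, leaving only numstat lines
    log = subprocess.run(
        ["git", "log", f"--since={since}", "--numstat", "--format="],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    added = deleted = 0
    for line in log.splitlines():
        if line.strip():
            a, d, _path = line.split("\t", 2)
            if a != "-":  # skip binary files
                added, deleted = added + int(a), deleted + int(d)
    return added + deleted
```

A sweeping rename, a risky rewrite, and a week of genuine feature work can all produce exactly the same number.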
6. Test Coverage
Test coverage is a measure of how much of a codebase is covered by automated tests. While it's certainly important to have good test coverage, it's not a good metric for evaluating developer performance. That's because test coverage is influenced by several factors beyond an individual developer's control, such as the complexity of the code and the number of edge cases that need to be covered.
Additionally, test coverage is a lagging indicator. In other words, it tells us about the quality of the code that's been written, but it doesn't predict the quality of code that will be written in the future.
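Coverage also measures execution, not verification, which makes it easy to inflate. A deliberately bad (hypothetical) example: the test below brings `apply_discount` to 100% line coverage without asserting anything.

```python
# A function and a "test" that yields 100% line coverage while
# verifying nothing: coverage measures execution, not correctness.
def apply_discount(price: float, percent: float) -> float:
    return price * (1 - percent / 100)

def test_apply_discount():
    apply_discount(100.0, 10.0)  # runs every line, asserts nothing
```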
7. Impact
This is a newer metric used by many engineering 'intelligence' platforms, but it's far from intelligent. 'Impact scores' essentially boil down to lines of code with extra steps: they factor in the number of lines touched, new vs. existing code, and so on, all combined into a single score. A lot of companies attempt to use this metric, and in almost all cases developers hate it. Not only does it suffer from the same flaws as lines of code, but it's even harder to understand. Its biggest flaw is its name, which suggests to executives and managers that the score actually measures impact, and therefore how it should be used.
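Vendors keep the exact formulas proprietary, but a hypothetical score in this family looks something like the sketch below. Every weight and parameter name here is invented for illustration; the point is that the inputs are all flavors of lines of code.

```python
# Hypothetical "impact score" - invented weights, for illustration only.
# The ingredients vendors describe (lines touched, new vs. edited code,
# files changed) all reduce to flavors of lines of code.
def impact_score(new_lines, edited_lines, files_touched):
    return 1.5 * new_lines + 2.0 * edited_lines + 5.0 * files_touched

# Same pathology as raw LoC: pasting a big config file "outscores"
# a one-line fix that ends a production outage.
print(impact_score(new_lines=500, edited_lines=0, files_touched=1))  # 755.0
print(impact_score(new_lines=1, edited_lines=0, files_touched=1))    # 6.5
```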
Why are these metrics so commonly misused?
The desire for key performance indicators can lead us to measure the wrong things or use metrics in the wrong ways, even when we can see the flaws. The frustration of being a manager who cannot measure is powerful. Leaders in charge of thousands of engineers have no idea what's going on or whether their software development process is healthy. Open-source maintainers of the largest projects in the world have no insight into whether their communities are healthy and growing, or what the impact of their projects even is. These are all areas where software metrics would be genuinely useful, but the metrics in use today aren't good ways of providing that insight.
At foyer, we build tailored dashboards for engineering teams that surface the right metrics and ensure they're measured correctly.
By using the right metrics, you can more effectively measure and improve the performance of your team. If you want to learn about healthy patterns and the types of metrics worth using, book a demo with the foyer team; we've put together a smart analysis dashboard that can help you manage large engineering teams.