# Engineering Metrics

These metrics provide signals as to how well teams are adopting the
engineering practices and principles. Complementary to the
[DORA metrics][dora], these metrics are designed to be leading
indicators of how teams are delivering software.

Many of these metrics can be gathered using an OpenTelemetry collector
configured to run a [GitProvider receiver][gitprovider].

## Branch Metrics

Engineering Defaults: [Pair Programming][pp], [Small Batch Delivery][sbd], and
[Trunk Based Development][tbd]

***Branch Count*** measures the number of branches that exist within a
repository at a given point in time, excluding the default branch.

***Branch Age*** measures the time a branch has existed within a repository at
a given point in time, excluding the default branch.
15 | 21 |
|
16 | | -- _How to Measure:_ Count the number of open branches in a repo. |
17 | | -- _Example:_ If the `liatrio-otel-collector` repo has a main branch and 5 feature branches, then the count is 5. |
| 22 | +High branch counts and branch ages are forms of technical debt, introducing |
| 23 | +unnecessary risk through increased maintenance and cognitive overhead. High |
| 24 | +counts and ages may also signify: |
| 25 | + |
| 26 | +* The team is using GitFlow |
| 27 | +* The team is not pair programming |
| 28 | +* The team is not delivering in small batches |
| 29 | +* A high number of merge conflicts that must be resolved regularly |
| 30 | + |
| 31 | +Branch count and branch age should be reduced to a minimum based on team context |
| 32 | +and goals. These metrics have to be evaluated in context. For example, a large |
| 33 | +open source project may accept a much higher norm than a product team of eight |
| 34 | +engineers. |
| 35 | + |
| 36 | +The below chart shows targets towards the engineering defaults for branch count |
| 37 | +and branch age when taken in the context of an ideal product team: |
| 38 | + |
| 39 | +| | Risky | Mediocre | Better | Engineering Defaults | |
| 40 | +|:--------------------:|-------|----------|--------|----------------------| |
| 41 | +| Branch Count | 20+ | 10 - 20 | 5 - 10 | < 5 | |
| 42 | +| Branch Age (in days) | 10+ | 7 - 10 | 3 - 7 | < 3 | |
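
The bands in the chart above can be turned into an automated check. The sketch
below is illustrative, not part of any tool's API: the function name and the
choice to resolve boundary values toward the worse band are assumptions.

```python
def classify_branch_metric(value: float, bands: tuple[int, int, int]) -> str:
    """Map a branch metric onto the rating bands from the chart above.

    bands is (risky_floor, mediocre_floor, better_floor); values at or above
    a floor fall into that band, and anything below the last floor meets the
    engineering defaults.
    """
    risky, mediocre, better = bands
    if value >= risky:
        return "Risky"
    if value >= mediocre:
        return "Mediocre"
    if value >= better:
        return "Better"
    return "Engineering Defaults"


# Band floors taken directly from the chart above.
BRANCH_COUNT_BANDS = (20, 10, 5)
BRANCH_AGE_DAYS_BANDS = (10, 7, 3)
```

For example, `classify_branch_metric(4, BRANCH_COUNT_BANDS)` reports a repo
with four non-default branches as meeting the engineering defaults.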

***Branch Ahead By Commit Count*** measures the number of commits a downstream
branch is ahead of its upstream branch, typically the trunk. A high number of
"commits ahead" may indicate a need for smaller batch delivery.

***Branch Behind By Commit Count*** measures the number of commits a downstream
branch is behind its upstream branch, typically the trunk. A high number of
"commits behind" may indicate the branch has lived too long, adding extra
maintenance and cognitive overhead.
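
One way to gather both counts with plain `git` is
`git rev-list --left-right --count <upstream>...<branch>`, which prints two
tab-separated totals: commits only on the upstream side (behind) and commits
only on the branch side (ahead). A minimal sketch; the function names are
illustrative:

```python
import subprocess


def parse_left_right(output: str) -> tuple[int, int]:
    """Parse `git rev-list --left-right --count` output into (ahead, behind).

    The left count is commits unique to the upstream (how far the branch is
    behind); the right count is commits unique to the branch (how far ahead).
    """
    behind, ahead = (int(n) for n in output.split())
    return ahead, behind


def ahead_behind(upstream: str, branch: str, repo: str = ".") -> tuple[int, int]:
    """Run git in `repo` and return (ahead, behind) for branch vs upstream."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--left-right", "--count",
         f"{upstream}...{branch}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_left_right(out)
```

For example, output of `2\t5` means the branch is 5 commits ahead of and
2 commits behind its upstream.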

***Branch Lines Added*** measures the number of lines added to a downstream
branch when compared to its upstream branch, typically the trunk.

***Branch Lines Deleted*** measures the number of lines deleted from a
downstream branch when compared to its upstream branch, typically the trunk.

> Junior developers add code. Senior developers delete code.[^seniority]

The purpose of these metrics is simply to provide observable data points with
regard to the addition and deletion of code when comparing a branch to the
default trunk. They are purely contextual metrics that a team can leverage for
additional information during self-evaluation, and they can be correlated with
other metrics like Pull Request Age to provide additional insight into
cognitive overhead.
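
These two counts can be derived locally from `git diff --numstat`, which emits
one `<added>\t<deleted>\t<path>` line per file (with `-` for binary files). A
sketch of summing that output; the function name is illustrative:

```python
def lines_added_deleted(numstat: str) -> tuple[int, int]:
    """Sum (added, deleted) line counts from `git diff --numstat` output."""
    added = deleted = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        a, d, _path = line.split("\t", 2)
        if a == "-" or d == "-":
            continue  # binary file; git reports no line counts
        added += int(a)
        deleted += int(d)
    return added, deleted
```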

> These metrics can be gathered automatically from GitHub and GitLab through
> the [Liatrio OTEL Collector][lcol]. Check out the
> [Liatrio OTEL Demo Fork][demo] to see this metric collection in action.

## Number of Unique Contributors

***Unique Contributors*** measures the total count of unique contributors to a
repository over the course of its lifetime. This count will monotonically
increase over time.

Interpreting this metric is very contextual. An open source library that is
used within production code may warrant a different number of contributors
than a one-off proof-of-concept (POC) in an internal repository.

The below chart takes a view based on several common scenarios.

|  Impact  | Risky  | Hesitant | Desirable |
|:--------:|--------|----------|-----------|
| Critical | 1 - 20 | 21 - 50  | 51+       |
| High     | 1 - 10 | 11 - 25  | 26+       |
| Moderate | 1 - 5  | 6 - 20   | 21+       |
| Low      | 1 - 3  | 4 - 10   | 11+       |
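
As a sketch, the chart above can be encoded as a lookup. The band floors come
straight from the table; the function name and the choice to deduplicate
contributors by commit email are assumptions.

```python
# Lower bounds, per the chart above, for the "Hesitant" and "Desirable"
# contributor counts at each impact level.
CONTRIBUTOR_BANDS = {
    "Critical": (21, 51),
    "High": (11, 26),
    "Moderate": (6, 21),
    "Low": (4, 11),
}


def rate_contributors(impact: str, emails: list[str]) -> str:
    """Rate a repo's unique-contributor count for a given impact level."""
    unique = len(set(emails))  # deduplicate, e.g. by commit author email
    hesitant, desirable = CONTRIBUTOR_BANDS[impact]
    if unique >= desirable:
        return "Desirable"
    if unique >= hesitant:
        return "Hesitant"
    return "Risky"
```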

> These metrics can be gathered automatically from GitHub and GitLab through
> the [Liatrio OTEL Collector][lcol]. Check out the
> [Liatrio OTEL Demo Fork][demo] to see this metric collection in action.

## Code Coverage

***Code Coverage*** measures the percentage of code statements exercised
during unit test runs, assessing the amount of code logic invoked during unit
testing. Third-party tooling such as [CodeCov][codecov] can calculate it in
your automated CI/CD pipelines. If the code in question has 100 lines of code
and 50 of those are executed by unit tests, then the code coverage percentage
of this software is 50%.
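
The worked example above is just a ratio, sketched here for completeness (the
function name is illustrative):

```python
def coverage_percent(covered: int, total: int) -> float:
    """Percentage of statements exercised by tests; 0.0 for empty code."""
    if total == 0:
        return 0.0
    return 100.0 * covered / total
```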

[codecov]: https://app.codecov.io/gh/open-telemetry/opentelemetry-collector-contrib

Open O11y recommends having code coverage for any product that is going to
production.

## Code Quality

***Code Quality*** measures the quality of code across three tenets:

Security (Vulnerabilities): Security in the context of code quality refers to
the identification and mitigation of vulnerabilities that could be exploited
by attackers to compromise the system.

Reliability (Bugs): Reliability focuses on the software's ability to perform
its intended functions under specified conditions for a designated period.
Bugs, or errors in the code, can significantly impact the reliability of a
software application, leading to system crashes, incorrect outputs, or
performance issues.

Maintainability (Code Smells): Maintainability is about how easily software
can be understood, corrected, adapted, and enhanced. "Code smells" are
indicators of potential problems in the code that may hinder maintainability.
These can include issues like duplicated code, overly complex methods, or
classes with too many responsibilities. Addressing code smells through
refactoring and adhering to coding standards and best practices helps improve
the maintainability of the codebase, making it easier for developers to work
with and evolve the software over time.

Use third-party tooling, such as Coverity, that provides these metrics.

## Work Cycle Time

Engineering Defaults: [Small Batch Delivery][sbd]

Work cycle time calculates the time between a work item being started and
finished. For each work item, calculate the cycle time as:

$$
t = t_{finish} - t_{start}
$$

A team can then calculate the average cycle time for work items in a given
period of time. For example, a team may calculate the following cycle times
for four work items:

* $t_0 = 48$ hours
* $t_1 = 72$ hours
* $t_2 = 16$ hours
* $t_3 = 144$ hours

Then, the team can calculate the average as:

$$
\frac{1}{n}
\left(
  \sum_{i=0}^{n-1}
  t_i
\right)
= \frac{48 + 72 + 16 + 144}{4}
= 70
\text{ hours}
$$

In this example, the team may conclude that their average cycle time is very
large. As a result, the team agrees to write smaller work items to deliver in
smaller batches.
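
The arithmetic above is straightforward to automate. A minimal sketch, with
illustrative function names, assuming work item timestamps are available:

```python
from datetime import datetime


def cycle_time_hours(start: datetime, finish: datetime) -> float:
    """t = t_finish - t_start, expressed in hours."""
    return (finish - start).total_seconds() / 3600


def average_cycle_time(cycle_times: list[float]) -> float:
    """Average cycle time over a set of completed work items."""
    return sum(cycle_times) / len(cycle_times)
```

Running `average_cycle_time([48, 72, 16, 144])` reproduces the 70-hour
average from the example above.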

Team Retrospective Questions:

* Are work items blocked regularly?
* Do work items need to be split into smaller scope portions?

The below chart reflects general targets between large and small batch
delivery:

|                    | Large Batch | Mediocre | Decent | Small Batch |
|:------------------:|-------------|----------|--------|-------------|
| Average Cycle Time | Months      | Weeks    | Days   | Hours       |

> Important: We recommend first taking a look at the cycle time for branches ->
> pull requests -> deployment into production through the DORA metrics instead
> of relying on work cycle time.

## Repositories Count

The quantity of repositories managed by an organization or team is a critical
indicator of the extent of code they oversee. This metric is the base for all
other Source Control Management (SCM) metrics. A high number of repositories
for a small team may signify high cognitive overhead. You can correlate the
number of repositories a team owns to the number of repositories within the
organization.

However, it's crucial to recognize that this metric does not offer a
one-size-fits-all solution. Although it forms the basis for further analysis,
its significance can vary greatly. Like all metrics, it should be interpreted
within the broader context and aligned with the specific values and objectives
of the team.

## Pull Request Count and Age

Engineering Defaults: [Pair Programming][pp], [Small Batch Delivery][sbd], and
[Trunk Based Development][tbd]

***Pull Request Count*** measures the number of pull requests against the
default branch in a repository at a given point in time.

***Pull Request Age*** measures the time from when a pull request is opened to
when it is approved and merged.

These metrics help teams discover bottlenecks in the lifecycle of pull
requests. There are three main states of a pull request that can be measured:
`open`, `approved`, and `merged`. Ideally, a team processes a steady flow of
pull requests, which suggests a healthy, productive development process.

The [Git Provider Receiver][gitprovider] defines age for each of these states
as follows:

* ***open age***: the amount of time a pull request has been open
* ***approved age***: the amount of time it took for a pull request to go from
  open to approved
* ***merged age***: the amount of time it took for a pull request to go from
  open to merged

The below chart outlines target times for each of these metric states,
centered on the engineering defaults. Remember to evaluate these in the
context of things like team size, contributor count, and inner source vs open
source.

|                             | Risky | Mediocre | Better | Engineering Defaults |
|:---------------------------:|-------|----------|--------|----------------------|
| Pull Request Count          | 20+   | 10 - 20  | 5 - 10 | < 5                  |
| Pull Request Age - Open     | Weeks | Days     | Hours  | Minutes              |
| Pull Request Age - Approved | Weeks | Days     | Hours  | Minutes              |
| Pull Request Age - Merged   | Weeks | Days     | Hours  | Minutes              |

Team Retrospective Questions:

* Are pull requests simply being ignored?
* Is the team overwhelmed with external contributions?
* Are the merge requirements excessively difficult? Can automation help?
* Are team members pair programming enough?
* Is the team delivering in large batches?

Pair programming can reduce the time needed to review pull requests. When
pairing, a code review effectively occurs in real time during development.
Thus, the pairing partner is very familiar with the changes and is able to
very quickly review and approve a pull request.

Large batch deliveries increase the time needed to review a pull request. This
problem is discussed in detail [above](#branch-metrics).

Teams should also be concerned when these metrics are very low. This likely
indicates that teams aren't reviewing pull requests effectively. Additionally,
merging pull requests too quickly prevents other team members from reviewing
the code changes.

[^seniority]: Unknown source

[pp]: ../../engineering-defaults.md#pair-programming
[tbd]: ../../engineering-defaults.md#trunk-based-development
[sbd]: ../../engineering-defaults.md#small-batch-delivery
[demo]: https://github.com/liatrio/opentelemetry-demo/blob/main/docs/delivery.md
[lcol]: https://github.com/liatrio/liatrio-otel-collector/
[dora]: https://dora.dev/
[gitprovider]: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/gitproviderreceiver