Add documentation for architectural decisions by jeffkreeftmeijer · Pull Request #1034 · appsignal/appsignal-elixir

jeffkreeftmeijer · 2026-03-12T13:38:43Z

This PR adds documentation for architectural decisions and implementation details that were never formally written down. The documentation focuses on the "why" behind design choices, particularly around the ETS-based tracer registry, NIF integration, configuration system, instrumentation patterns, and error handling.

[skip changeset]

backlog-helper · 2026-03-12T13:39:58Z

✔️ All good!

New issue guide | Backlog management | Rules | Feedback

docs/configuration-system.md

docs/error-handling.md

docs/instrumentation-decorators.md

docs/nif-rust-integration.md

docs/span-lifecycle.md

tombruijn · 2026-03-17T10:03:11Z

docs/tracer-registry.md

@@ -0,0 +1,271 @@
+# Tracer Registry (ETS Table)
+
+The Appsignal Elixir integration uses an ETS table as a registry to track spans across processes.


Suggested change

The Appsignal Elixir integration uses an ETS table as a registry to track spans across processes.

The AppSignal Elixir integration uses an ETS table as a registry to track spans across processes.

tombruijn · 2026-03-17T10:06:19Z

docs/tracer-registry.md

+This happens every 60 seconds by default (`@sync_interval`).
+
+**Why?**
+This synchronization ensures the Monitor's internal state stays consistent with the actual Erlang process monitors, recovering from any potential state drift.


This is for clean up of no longer instrumented processes?

This document covers the design and implementation of the ETS-based span registry used by Appsignal.Tracer, including: - Why :duplicate_bag was chosen for the table type - How process monitoring ensures cleanup of orphaned spans - Performance optimizations that removed redundant running checks - The ignore pattern for opting processes out of tracing - Cross-process span passing via register_current/1 - Historical evolution of the implementation The documentation focuses on architectural decisions that weren't previously written down, particularly around the performance optimizations made in June 2023.

This document covers the architecture and implementation of the Native Implemented Function (NIF) layer that connects Elixir to the Rust-based AppSignal agent, including: - How the NIF is loaded and why it always returns :ok - Fallback implementations for testing without the agent - Erlang resource types for memory management - Atom creation optimization from May 2023 - Platform-specific build system configuration - Data encoding for complex nested structures - Error handling strategies The documentation explains the thin C extension layer that translates between Elixir and the Rust agent's C API, and why business logic is kept out of the C code.

This document covers the layered configuration system that merges settings from multiple sources, including: - Configuration source priority (defaults, system detection, application config, environment variables) - Type conversion for environment variables - Validation (only push_api_key is required) - Distinction between configured_as_active and active - Performance optimization to avoid duplicate ETS lookups - Handling empty configurations without crashing - Writing configuration to the NIF for the Rust agent - Feature flags for instrumentation toggles - Special handling for Oban error reporting modes - Log level determination and log file path caching The documentation explains design decisions like why most integrations default to enabled and why configuration sources are tracked separately for debugging purposes.

This document covers both functional and decorator-based instrumentation approaches, including: - Functional instrumentation with instrument/2 and instrument/3 - Root span instrumentation for background jobs - Closing current span fix for out-of-order span closing - Error reporting patterns (set_error vs send_error) - Decorator-based instrumentation with @decorate macros - Name generation from module/function/arity - Slashes to underscores change from August 2021 - Compile-time configuration for test fakes - Common instrumentation patterns The documentation explains the fix for Phoenix render functions that create spans out of order, and why decorator-generated names switched from slashes to underscores for OpenTelemetry compatibility.

This document covers the error backend architecture and why it's disabled by default, including: - Three ways errors get reported (manual, automatic, backend) - Error backend as a Logger gen_event behavior - Crash reason handling and PID extraction - Cowboy error filtering from June 2023 - Span lookup and creation logic - Why the backend was disabled by default in June 2024 - Problems with lack of context and duplicate reporting - When to enable the error backend - Stacktrace handling for edge cases - Process isolation in distributed systems - Migration guide for users relying on the backend The documentation explains that the error backend is a relic from before proper framework integrations existed, and why modern instrumentation approaches provide better context.

Covers the telemetry-based integration pattern used by Ecto, Oban, Finch, Tesla, and Absinthe integrations. Explains why most are enabled by default and how to add custom integrations.

Covers span creation (root vs child), operations (attributes, errors, sample data), closing, time handling, nil handling, and registry integration.

Explains how AppSignal uses the :telemetry library for event-driven instrumentation, the attachment pattern, event structure, and how to create custom telemetry handlers.

Covers test helpers, fake implementations (Tracer, Span, Nif, Monitor), testing instrumentation, framework integration testing, and common testing patterns and pitfalls.

@tag

ExUnit requires @describetag to be placed inside describe blocks, not before them. Moving @tag :skip_env_test_no_nif to @describetag inside the describe blocks fixes compilation errors. This fixes the following error: ** (RuntimeError) @describetag must be set inside describe/2 blocks

Logger.add_backend/1 and Logger.remove_backend/1 are deprecated in Elixir 1.15+, replaced by LoggerBackends.add/2 and LoggerBackends.remove/2 from the :logger_backends dependency. This commit: - Adds :logger_backends dependency for Elixir 1.15+ - Creates Appsignal.Utils.LoggerHandler module as a compatibility layer that uses LoggerBackends when available, falling back to the old Logger API for older Elixir versions - Updates code to use the compatibility layer instead of calling Logger methods directly This fixes deprecation warnings: warning: Logger.add_backend/1 is deprecated. Use LoggerBackends.add/2 from :logger_backends dependency warning: Logger.remove_backend/1 is deprecated. Use LoggerBackends.remove/2 from :logger_backends dependency

Logger.disable/1 and Logger.enable/1 are deprecated in favor of Logger.put_process_level/2 and Logger.delete_process_level/1. This commit updates the test helper function to use the new API. This fixes deprecation warnings: warning: Logger.disable/1 is deprecated. Use Logger.put_process_level(pid, :none) instead warning: Logger.enable/1 is deprecated. Use Logger.delete_process_level(pid) instead

The :ssl.cipher_suites/0 function is deprecated in favor of :ssl.cipher_suites/2. Since Elixir 1.11+ requires OTP 21+, we can remove the conditional code for older OTP versions and always use the newer API. This simplifies the code and fixes deprecation warnings: warning: :ssl.cipher_suites/0 is undefined or private, use cipher_suites/2,3 instead

This commit removes two minor compiler warnings: - Remove unused "require Logger" from MapFilter module - Prefix unused opts parameter with underscore in FakeDebounce test support module These warnings don't affect functionality but create noise in the test output.

jeffkreeftmeijer · 2026-03-25T12:48:12Z

Alright, this was a bit of a tryout to see if we can generate documentation based on the code and commit history of the project (as requested). I believed we could get pretty good results, as most of the decisions have been documented in the commit messages, so I instructed Claude to go over my commits to gain context and write documentation.

However, what was generated was mostly a description of the code as is. The result is so tightly coupled (including code samples and direct line locations in the code), that it'd surely become a maintenance burden if it was merged as is. Aside from that, it's very verbose in explaining parts of the code that aren't interesting, and it lacks or ignores context that would be. Instead of documenting why decisions were made, it mostly explains what the code does, which one can also find when reading the code.

I wasn't able to get through it yet (it's quite a bit of work), but I'm working on a patch on top of this to remove the cruft and add some valuable context.

This pull request is interesting as a case study. AI-generated documentation often appears valuable, but because it’s produced directly from the code, it typically ends up as little more than another representation of that same code. Also, the verbosity of the result dissuades anyone from actually reading it.

tombruijn · 2026-03-26T09:05:19Z

@jeffkreeftmeijer Thanks for your perspective. I had gotten the feeling it was approached that way. Personally I don't think this is mergeable. As discussed, can you take another pass at it to make it more human friendly and maintainable?

The documentation's general outline is solid, but it goes into code-level detail by including code snippets and referring directly to lines in the code base. This commit removes those, as well as somea redundant documentation files, to keep the documentation short and to the point.

jeffkreeftmeijer · 2026-03-27T10:54:47Z

I've updated the docs to remove most of the cruft, including some documentation files that only regurgitated the code, as well as the direct links to the code and most of the code snippets. I've also added a note about the decorators.

tombruijn · 2026-03-30T08:32:13Z

@jeffkreeftmeijer you may need to push your changes.

tombruijn · 2026-03-30T12:09:13Z

docs/configuration-system.md

+
+### 3. Application Configuration
+
+Configuration from `config/config.exs` or environment-specific config files:287-289):


Suggested change

Configuration from `config/config.exs` or environment-specific config files:287-289):

Configuration from `config/config.exs` or environment-specific config files:

tombruijn · 2026-03-30T12:10:29Z

docs/configuration-system.md

+
+### `configured_as_active?/0`
+
+Returns `true` if the `active` config option is set to `true` (lib/appsignal/config.ex:122-124):


Suggested change

Returns `true` if the `active` config option is set to `true` (lib/appsignal/config.ex:122-124):

Returns `true` if the `active` config option is set to `true`:

tombruijn · 2026-03-30T12:10:42Z

docs/configuration-system.md

+
+## Special Configuration: Oban Error Reporting
+
+The `report_oban_errors` option has three possible values (lib/appsignal/config.ex:223-249):


Suggested change

The `report_oban_errors` option has three possible values (lib/appsignal/config.ex:223-249):

The `report_oban_errors` option has three possible values:

tombruijn · 2026-03-30T15:56:51Z

I think this is better. Less noise. May improve upon this later.

backlog-helper · 2026-04-09T07:02:05Z

This Pull Request needs more reviews. @thijsc @lipskis - (More info)

This is a message from the daily scheduled checks.

New issue guide | Backlog management | Rules | Feedback

jeffkreeftmeijer requested review from lipskis, thijsc, tombruijn and unflxw March 12, 2026 13:38

jeffkreeftmeijer self-assigned this Mar 12, 2026

jeffkreeftmeijer added the writing Issues involving content production, like a blog post, documentation, homepage content, etc. label Mar 12, 2026