Skip to content

Add documentation for architectural decisions#1034

Open
jeffkreeftmeijer wants to merge 16 commits intomainfrom
docs
Open

Add documentation for architectural decisions#1034
jeffkreeftmeijer wants to merge 16 commits intomainfrom
docs

Conversation

@jeffkreeftmeijer
Copy link
Copy Markdown
Contributor

@jeffkreeftmeijer jeffkreeftmeijer commented Mar 12, 2026

This PR adds documentation for architectural decisions and implementation details that were never formally written down. The documentation focuses on the "why" behind design choices, particularly around the ETS-based tracer registry, NIF integration, configuration system, instrumentation patterns, and error handling.

[skip changeset]

@jeffkreeftmeijer jeffkreeftmeijer self-assigned this Mar 12, 2026
@jeffkreeftmeijer jeffkreeftmeijer added the writing Issues involving content production, like a blog post, documentation, homepage content, etc. label Mar 12, 2026
@backlog-helper
Copy link
Copy Markdown

backlog-helper bot commented Mar 12, 2026

✔️ All good!

New issue guide | Backlog management | Rules | Feedback

@backlog-helper

This comment has been minimized.

1 similar comment
@backlog-helper

This comment has been minimized.

@@ -0,0 +1,271 @@
# Tracer Registry (ETS Table)

The Appsignal Elixir integration uses an ETS table as a registry to track spans across processes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The Appsignal Elixir integration uses an ETS table as a registry to track spans across processes.
The AppSignal Elixir integration uses an ETS table as a registry to track spans across processes.

This happens every 60 seconds by default (`@sync_interval`).

**Why?**
This synchronization ensures the Monitor's internal state stays consistent with the actual Erlang process monitors, recovering from any potential state drift.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for clean up of no longer instrumented processes?

@unflxw unflxw removed their request for review March 17, 2026 10:36
@backlog-helper

This comment has been minimized.

1 similar comment
@backlog-helper

This comment has been minimized.

This document covers the design and implementation of the ETS-based span
registry used by Appsignal.Tracer, including:

- Why :duplicate_bag was chosen for the table type
- How process monitoring ensures cleanup of orphaned spans
- Performance optimizations that removed redundant running checks
- The ignore pattern for opting processes out of tracing
- Cross-process span passing via register_current/1
- Historical evolution of the implementation

The documentation focuses on architectural decisions that weren't
previously written down, particularly around the performance
optimizations made in June 2023.
This document covers the architecture and implementation of the Native
Implemented Function (NIF) layer that connects Elixir to the Rust-based
AppSignal agent, including:

- How the NIF is loaded and why it always returns :ok
- Fallback implementations for testing without the agent
- Erlang resource types for memory management
- Atom creation optimization from May 2023
- Platform-specific build system configuration
- Data encoding for complex nested structures
- Error handling strategies

The documentation explains the thin C extension layer that translates
between Elixir and the Rust agent's C API, and why business logic is
kept out of the C code.
This document covers the layered configuration system that merges
settings from multiple sources, including:

- Configuration source priority (defaults, system detection, application
  config, environment variables)
- Type conversion for environment variables
- Validation (only push_api_key is required)
- Distinction between configured_as_active and active
- Performance optimization to avoid duplicate ETS lookups
- Handling empty configurations without crashing
- Writing configuration to the NIF for the Rust agent
- Feature flags for instrumentation toggles
- Special handling for Oban error reporting modes
- Log level determination and log file path caching

The documentation explains design decisions like why most integrations
default to enabled and why configuration sources are tracked separately
for debugging purposes.
This document covers both functional and decorator-based instrumentation
approaches, including:

- Functional instrumentation with instrument/2 and instrument/3
- Root span instrumentation for background jobs
- Closing current span fix for out-of-order span closing
- Error reporting patterns (set_error vs send_error)
- Decorator-based instrumentation with @decorate macros
- Name generation from module/function/arity
- Slashes to underscores change from August 2021
- Compile-time configuration for test fakes
- Common instrumentation patterns

The documentation explains the fix for Phoenix render functions that
create spans out of order, and why decorator-generated names switched
from slashes to underscores for OpenTelemetry compatibility.
This document covers the error backend architecture and why it's
disabled by default, including:

- Three ways errors get reported (manual, automatic, backend)
- Error backend as a Logger gen_event behavior
- Crash reason handling and PID extraction
- Cowboy error filtering from June 2023
- Span lookup and creation logic
- Why the backend was disabled by default in June 2024
- Problems with lack of context and duplicate reporting
- When to enable the error backend
- Stacktrace handling for edge cases
- Process isolation in distributed systems
- Migration guide for users relying on the backend

The documentation explains that the error backend is a relic from
before proper framework integrations existed, and why modern
instrumentation approaches provide better context.
Covers the telemetry-based integration pattern used by Ecto, Oban,
Finch, Tesla, and Absinthe integrations. Explains why most are enabled
by default and how to add custom integrations.
Covers span creation (root vs child), operations (attributes, errors,
sample data), closing, time handling, nil handling, and registry
integration.
Explains how AppSignal uses the :telemetry library for event-driven
instrumentation, the attachment pattern, event structure, and how to
create custom telemetry handlers.
Covers test helpers, fake implementations (Tracer, Span, Nif, Monitor),
testing instrumentation, framework integration testing, and common
testing patterns and pitfalls.
ExUnit requires @describetag to be placed inside describe blocks, not
before them.
Moving @tag :skip_env_test_no_nif to @describetag inside the describe
blocks fixes compilation errors.

This fixes the following error:
** (RuntimeError) @describetag must be set inside describe/2 blocks
Logger.add_backend/1 and Logger.remove_backend/1 are deprecated in
Elixir 1.15+, replaced by LoggerBackends.add/2 and
LoggerBackends.remove/2 from the :logger_backends dependency.

This commit:
- Adds :logger_backends dependency for Elixir 1.15+
- Creates Appsignal.Utils.LoggerHandler module as a compatibility layer
  that uses LoggerBackends when available, falling back to the old
  Logger API for older Elixir versions
- Updates code to use the compatibility layer instead of calling Logger
  methods directly

This fixes deprecation warnings:
warning: Logger.add_backend/1 is deprecated.
Use LoggerBackends.add/2 from :logger_backends dependency

warning: Logger.remove_backend/1 is deprecated.
Use LoggerBackends.remove/2 from :logger_backends dependency
Logger.disable/1 and Logger.enable/1 are deprecated in favor of
Logger.put_process_level/2 and Logger.delete_process_level/1.

This commit updates the test helper function to use the new API.

This fixes deprecation warnings:
warning: Logger.disable/1 is deprecated.
Use Logger.put_process_level(pid, :none) instead

warning: Logger.enable/1 is deprecated.
Use Logger.delete_process_level(pid) instead
The :ssl.cipher_suites/0 function is deprecated in favor of
:ssl.cipher_suites/2.

Since Elixir 1.11+ requires OTP 21+, we can remove the conditional
code for older OTP versions and always use the newer API.

This simplifies the code and fixes deprecation warnings:
warning: :ssl.cipher_suites/0 is undefined or private,
use cipher_suites/2,3 instead
This commit removes two minor compiler warnings:

- Remove unused "require Logger" from MapFilter module
- Prefix unused opts parameter with underscore in FakeDebounce test
  support module

These warnings don't affect functionality but create noise in the test
output.
@backlog-helper

This comment has been minimized.

1 similar comment
@backlog-helper

This comment has been minimized.

@tombruijn tombruijn requested a review from unflxw March 23, 2026 11:03
@tombruijn tombruijn removed the request for review from unflxw March 23, 2026 11:04
@backlog-helper

This comment has been minimized.

1 similar comment
@backlog-helper

This comment has been minimized.

@jeffkreeftmeijer
Copy link
Copy Markdown
Contributor Author

Alright, this was a bit of a tryout to see if we can generate documentation based on the code and commit history of the project (as requested). I believed we could get pretty good results, as most of the decisions have been documented in the commit messages, so I instructed Claude to go over my commits to gain context and write documentation.

However, what was generated was mostly a description of the code as is. The result is so tightly coupled (including code samples and direct line locations in the code), that it'd surely become a maintenance burden if it was merged as is. Aside from that, it's very verbose in explaining parts of the code that aren't interesting, and it lacks or ignores context that would be. Instead of documenting why decisions were made, it mostly explains what the code does, which one can also find when reading the code.

I wasn't able to get through it yet (it's quite a bit of work), but I'm working on a patch on top of this to remove the cruft and add some valuable context.

This pull request is interesting as a case study. AI-generated documentation often appears valuable, but because it’s produced directly from the code, it typically ends up as little more than another representation of that same code. Also, the verbosity of the result dissuades anyone from actually reading it.

@backlog-helper

This comment has been minimized.

@tombruijn
Copy link
Copy Markdown
Member

tombruijn commented Mar 26, 2026

@jeffkreeftmeijer Thanks for your perspective. I had gotten the feeling it was approached that way. Personally I don't think this is mergeable. As discussed, can you take another pass at it to make it more human friendly and maintainable?

@backlog-helper

This comment has been minimized.

The documentation's general outline is solid, but it goes into
code-level detail by including code snippets and referring directly to
lines in the code base. This commit removes those, as well as somea
redundant documentation files, to keep the documentation short and
to the point.
@jeffkreeftmeijer
Copy link
Copy Markdown
Contributor Author

I've updated the docs to remove most of the cruft, including some documentation files that only regurgitated the code, as well as the direct links to the code and most of the code snippets. I've also added a note about the decorators.

@backlog-helper

This comment has been minimized.

@tombruijn
Copy link
Copy Markdown
Member

@jeffkreeftmeijer you may need to push your changes.


### 3. Application Configuration

Configuration from `config/config.exs` or environment-specific config files:287-289):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Configuration from `config/config.exs` or environment-specific config files:287-289):
Configuration from `config/config.exs` or environment-specific config files:


### `configured_as_active?/0`

Returns `true` if the `active` config option is set to `true` (lib/appsignal/config.ex:122-124):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Returns `true` if the `active` config option is set to `true` (lib/appsignal/config.ex:122-124):
Returns `true` if the `active` config option is set to `true`:


## Special Configuration: Oban Error Reporting

The `report_oban_errors` option has three possible values (lib/appsignal/config.ex:223-249):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The `report_oban_errors` option has three possible values (lib/appsignal/config.ex:223-249):
The `report_oban_errors` option has three possible values:

@tombruijn
Copy link
Copy Markdown
Member

I think this is better. Less noise. May improve upon this later.

@backlog-helper

This comment has been minimized.

7 similar comments
@backlog-helper

This comment has been minimized.

@backlog-helper

This comment has been minimized.

@backlog-helper

This comment has been minimized.

@backlog-helper

This comment has been minimized.

@backlog-helper

This comment has been minimized.

@backlog-helper

This comment has been minimized.

@backlog-helper
Copy link
Copy Markdown

backlog-helper bot commented Apr 9, 2026


This is a message from the daily scheduled checks.

New issue guide | Backlog management | Rules | Feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

writing Issues involving content production, like a blog post, documentation, homepage content, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants