Add documentation for architectural decisions#1034
Add documentation for architectural decisions#1034jeffkreeftmeijer wants to merge 16 commits intomainfrom
Conversation
|
✔️ All good! |
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
| @@ -0,0 +1,271 @@ | |||
| # Tracer Registry (ETS Table) | |||
|
|
|||
| The Appsignal Elixir integration uses an ETS table as a registry to track spans across processes. | |||
There was a problem hiding this comment.
| The Appsignal Elixir integration uses an ETS table as a registry to track spans across processes. | |
| The AppSignal Elixir integration uses an ETS table as a registry to track spans across processes. |
| This happens every 60 seconds by default (`@sync_interval`). | ||
|
|
||
| **Why?** | ||
| This synchronization ensures the Monitor's internal state stays consistent with the actual Erlang process monitors, recovering from any potential state drift. |
There was a problem hiding this comment.
This is for clean up of no longer instrumented processes?
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
This document covers the design and implementation of the ETS-based span registry used by Appsignal.Tracer, including: - Why :duplicate_bag was chosen for the table type - How process monitoring ensures cleanup of orphaned spans - Performance optimizations that removed redundant running checks - The ignore pattern for opting processes out of tracing - Cross-process span passing via register_current/1 - Historical evolution of the implementation The documentation focuses on architectural decisions that weren't previously written down, particularly around the performance optimizations made in June 2023.
This document covers the architecture and implementation of the Native Implemented Function (NIF) layer that connects Elixir to the Rust-based AppSignal agent, including: - How the NIF is loaded and why it always returns :ok - Fallback implementations for testing without the agent - Erlang resource types for memory management - Atom creation optimization from May 2023 - Platform-specific build system configuration - Data encoding for complex nested structures - Error handling strategies The documentation explains the thin C extension layer that translates between Elixir and the Rust agent's C API, and why business logic is kept out of the C code.
This document covers the layered configuration system that merges settings from multiple sources, including: - Configuration source priority (defaults, system detection, application config, environment variables) - Type conversion for environment variables - Validation (only push_api_key is required) - Distinction between configured_as_active and active - Performance optimization to avoid duplicate ETS lookups - Handling empty configurations without crashing - Writing configuration to the NIF for the Rust agent - Feature flags for instrumentation toggles - Special handling for Oban error reporting modes - Log level determination and log file path caching The documentation explains design decisions like why most integrations default to enabled and why configuration sources are tracked separately for debugging purposes.
This document covers both functional and decorator-based instrumentation approaches, including: - Functional instrumentation with instrument/2 and instrument/3 - Root span instrumentation for background jobs - Closing current span fix for out-of-order span closing - Error reporting patterns (set_error vs send_error) - Decorator-based instrumentation with @decorate macros - Name generation from module/function/arity - Slashes to underscores change from August 2021 - Compile-time configuration for test fakes - Common instrumentation patterns The documentation explains the fix for Phoenix render functions that create spans out of order, and why decorator-generated names switched from slashes to underscores for OpenTelemetry compatibility.
This document covers the error backend architecture and why it's disabled by default, including: - Three ways errors get reported (manual, automatic, backend) - Error backend as a Logger gen_event behavior - Crash reason handling and PID extraction - Cowboy error filtering from June 2023 - Span lookup and creation logic - Why the backend was disabled by default in June 2024 - Problems with lack of context and duplicate reporting - When to enable the error backend - Stacktrace handling for edge cases - Process isolation in distributed systems - Migration guide for users relying on the backend The documentation explains that the error backend is a relic from before proper framework integrations existed, and why modern instrumentation approaches provide better context.
Covers the telemetry-based integration pattern used by Ecto, Oban, Finch, Tesla, and Absinthe integrations. Explains why most are enabled by default and how to add custom integrations.
Covers span creation (root vs child), operations (attributes, errors, sample data), closing, time handling, nil handling, and registry integration.
Explains how AppSignal uses the :telemetry library for event-driven instrumentation, the attachment pattern, event structure, and how to create custom telemetry handlers.
Covers test helpers, fake implementations (Tracer, Span, Nif, Monitor), testing instrumentation, framework integration testing, and common testing patterns and pitfalls.
ExUnit requires @describetag to be placed inside describe blocks, not before them. Moving @tag :skip_env_test_no_nif to @describetag inside the describe blocks fixes compilation errors. This fixes the following error: ** (RuntimeError) @describetag must be set inside describe/2 blocks
Logger.add_backend/1 and Logger.remove_backend/1 are deprecated in Elixir 1.15+, replaced by LoggerBackends.add/2 and LoggerBackends.remove/2 from the :logger_backends dependency. This commit: - Adds :logger_backends dependency for Elixir 1.15+ - Creates Appsignal.Utils.LoggerHandler module as a compatibility layer that uses LoggerBackends when available, falling back to the old Logger API for older Elixir versions - Updates code to use the compatibility layer instead of calling Logger methods directly This fixes deprecation warnings: warning: Logger.add_backend/1 is deprecated. Use LoggerBackends.add/2 from :logger_backends dependency warning: Logger.remove_backend/1 is deprecated. Use LoggerBackends.remove/2 from :logger_backends dependency
Logger.disable/1 and Logger.enable/1 are deprecated in favor of Logger.put_process_level/2 and Logger.delete_process_level/1. This commit updates the test helper function to use the new API. This fixes deprecation warnings: warning: Logger.disable/1 is deprecated. Use Logger.put_process_level(pid, :none) instead warning: Logger.enable/1 is deprecated. Use Logger.delete_process_level(pid) instead
The :ssl.cipher_suites/0 function is deprecated in favor of :ssl.cipher_suites/2. Since Elixir 1.11+ requires OTP 21+, we can remove the conditional code for older OTP versions and always use the newer API. This simplifies the code and fixes deprecation warnings: warning: :ssl.cipher_suites/0 is undefined or private, use cipher_suites/2,3 instead
This commit removes two minor compiler warnings: - Remove unused "require Logger" from MapFilter module - Prefix unused opts parameter with underscore in FakeDebounce test support module These warnings don't affect functionality but create noise in the test output.
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
|
Alright, this was a bit of a tryout to see if we can generate documentation based on the code and commit history of the project (as requested). I believed we could get pretty good results, as most of the decisions have been documented in the commit messages, so I instructed Claude to go over my commits to gain context and write documentation. However, what was generated was mostly a description of the code as is. The result is so tightly coupled (including code samples and direct line locations in the code), that it'd surely become a maintenance burden if it was merged as is. Aside from that, it's very verbose in explaining parts of the code that aren't interesting, and it lacks or ignores context that would be. Instead of documenting why decisions were made, it mostly explains what the code does, which one can also find when reading the code. I wasn't able to get through it yet (it's quite a bit of work), but I'm working on a patch on top of this to remove the cruft and add some valuable context. This pull request is interesting as a case study. AI-generated documentation often appears valuable, but because it’s produced directly from the code, it typically ends up as little more than another representation of that same code. Also, the verbosity of the result dissuades anyone from actually reading it. |
This comment has been minimized.
This comment has been minimized.
|
@jeffkreeftmeijer Thanks for your perspective. I had gotten the feeling it was approached that way. Personally I don't think this is mergeable. As discussed, can you take another pass at it to make it more human friendly and maintainable? |
This comment has been minimized.
This comment has been minimized.
The documentation's general outline is solid, but it goes into code-level detail by including code snippets and referring directly to lines in the code base. This commit removes those, as well as somea redundant documentation files, to keep the documentation short and to the point.
|
I've updated the docs to remove most of the cruft, including some documentation files that only regurgitated the code, as well as the direct links to the code and most of the code snippets. I've also added a note about the decorators. |
This comment has been minimized.
This comment has been minimized.
|
@jeffkreeftmeijer you may need to push your changes. |
|
|
||
| ### 3. Application Configuration | ||
|
|
||
| Configuration from `config/config.exs` or environment-specific config files:287-289): |
There was a problem hiding this comment.
| Configuration from `config/config.exs` or environment-specific config files:287-289): | |
| Configuration from `config/config.exs` or environment-specific config files: |
|
|
||
| ### `configured_as_active?/0` | ||
|
|
||
| Returns `true` if the `active` config option is set to `true` (lib/appsignal/config.ex:122-124): |
There was a problem hiding this comment.
| Returns `true` if the `active` config option is set to `true` (lib/appsignal/config.ex:122-124): | |
| Returns `true` if the `active` config option is set to `true`: |
|
|
||
| ## Special Configuration: Oban Error Reporting | ||
|
|
||
| The `report_oban_errors` option has three possible values (lib/appsignal/config.ex:223-249): |
There was a problem hiding this comment.
| The `report_oban_errors` option has three possible values (lib/appsignal/config.ex:223-249): | |
| The `report_oban_errors` option has three possible values: |
|
I think this is better. Less noise. May improve upon this later. |
This comment has been minimized.
This comment has been minimized.
7 similar comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
This is a message from the daily scheduled checks. |
This PR adds documentation for architectural decisions and implementation details that were never formally written down. The documentation focuses on the "why" behind design choices, particularly around the ETS-based tracer registry, NIF integration, configuration system, instrumentation patterns, and error handling.
[skip changeset]