i#7854 wholesys traces: Invariant checks for hw cxt markers #7880

Merged
abhinav92003 merged 9 commits into master from i7854-hardware-marker-inv
May 5, 2026

Conversation

@abhinav92003 (Contributor) commented May 1, 2026

Augments the invariant checker to identify the TRACE_MARKER_TYPE_HARDWARE_EVENT and TRACE_MARKER_TYPE_HARDWARE_CONTEXT_RETURN markers.

This involves treating them similarly to the existing TRACE_MARKER_TYPE_KERNEL_EVENT and TRACE_MARKER_TYPE_KERNEL_XFER markers, and keeping track of the enclosed part of the trace in a separate context.

Also adds unit tests for the various hardware context marker related scenarios.

Issue: #7854
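
As a rough illustration of "keeping track of the enclosed part of the trace in a separate context", the sketch below keeps a stack of per-context state: an event marker pushes a fresh context and a context-return marker pops back to the interrupted one. Everything here (`rec_kind_t`, `rec_t`, `context_t`, `check_trace`) is hypothetical and is not the code in invariant_checker.cpp. The sketch also assumes straight-line code with no branches, so that continuity simply means "next PC equals previous PC plus size".

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified record kinds. HW_EVENT and HW_CONTEXT_RETURN stand in
// for TRACE_MARKER_TYPE_HARDWARE_EVENT and TRACE_MARKER_TYPE_HARDWARE_CONTEXT_RETURN.
enum class rec_kind_t { INSTR, HW_EVENT, HW_CONTEXT_RETURN };

struct rec_t {
    rec_kind_t kind;
    uint64_t pc;   // Instruction PC (unused for markers).
    uint64_t size; // Instruction size in bytes (unused for markers).
};

// State tracked separately for each context (assumption: fallthrough-only code).
struct context_t {
    bool has_prev = false; // Whether an instruction was seen in this context.
    uint64_t next_pc = 0;  // Expected PC of the next instruction.
};

// Returns an empty string if no violation is found, else a short description.
std::string
check_trace(const std::vector<rec_t> &trace)
{
    // The back of the stack is the context currently executing.
    std::vector<context_t> stack(1);
    for (const rec_t &rec : trace) {
        switch (rec.kind) {
        case rec_kind_t::HW_EVENT:
            // Entering a handler: start a fresh context and keep the old one.
            stack.emplace_back();
            break;
        case rec_kind_t::HW_CONTEXT_RETURN:
            if (stack.size() == 1)
                return "context return without matching event";
            // Leaving the handler: resume checking the interrupted context.
            stack.pop_back();
            break;
        case rec_kind_t::INSTR: {
            context_t &cur = stack.back();
            if (cur.has_prev && rec.pc != cur.next_pc)
                return "PC discontinuity";
            cur.has_prev = true;
            cur.next_pc = rec.pc + rec.size;
            break;
        }
        }
    }
    return "";
}
```

With this structure, instructions inside the handler never get compared against the interrupted code, and the first instruction after the return marker is checked against the saved state of the outer context.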
@abhinav92003 changed the title from "i#7854 wholesys traces: Invariant checks for hardware context markers" to "i#7854 wholesys traces: Invariant checks for hw cxt markers" on May 1, 2026
@abhinav92003 requested a review from derekbruening on May 3, 2026 23:48
Comment thread clients/drcachesim/common/trace_entry.h Outdated
Comment thread clients/drcachesim/common/trace_entry.h Outdated
Comment thread clients/drcachesim/tests/invariant_checker_test.cpp Outdated
Comment thread clients/drcachesim/tests/invariant_checker_test.cpp Outdated
gen_instr(TID_A, /*pc=*/1),
gen_marker(TID_A, TRACE_MARKER_TYPE_SYSCALL, 123),
gen_marker(TID_A, TRACE_MARKER_TYPE_SYSCALL_TRACE_START, 123),
gen_marker(TID_A, TRACE_MARKER_TYPE_HARDWARE_EVENT, 5),
Contributor

But shouldn't this jump from pc=1 to pc=5 be a discontinuity? How did it get to 5 with no branch or a size=4 instruction?

Contributor Author

Some syscalls have TCG discontinuity events with a from_pc like this one; I've seen it notably for nanosleep. Since TCG traces would show some apparent discontinuities simply from switching threads (we currently have no way to identify the tid), I assumed these are not a problem per se.

Contributor Author

Added a comment to say: "known discontinuity due to some whole-system event like possibly switching to a different thread"

Contributor Author

        // Control is expected to return to some PC other than the fallthrough
        // of the syscall. This is a known discontinuity at the syscall event itself
        // due to some whole-system event, possibly a switch to a different
        // thread.

Contributor

But if we can't detect discontinuities from switching threads (this example should probably change TID then?) how can we detect any other discontinuity since it could also switch threads?

Contributor

Wouldn't a software thread switch go through the kernel though? That should involve a timer interrupt for a preempt or a voluntary switch and go through context switch code. Not understanding this.

Contributor Author

How can we detect any other discontinuity since it could also switch threads?

Other discontinuities that cause an abrupt PC change are different and would show up as an error. This one in particular is at syscalls. My understanding is that, for some of these instances, TCG knows at the syscall itself that control won't be coming back to the syscall fallthrough.

Wouldn't a software thread switch go through the kernel though?

Correct, and I believe that's what this discontinuity at the syscall is, for voluntary switches.

Contributor

Not following this. Any switch, including a voluntary one, will run some kernel code to actually accomplish the switch. Yet here we have the PC changing suddenly from one instruction to the next in a whole-system trace. That is an error in the trace unless there was a trap or interrupt there and we're now running the handler. I don't see how that can possibly be a software thread switch.
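
The rule argued for in this thread can be written down as a small predicate: a PC change between consecutive instructions is explained only by fallthrough, by a control-transfer instruction, or by a trap or interrupt in between. The sketch below is a hedged illustration only; `prev_instr_t` and `pc_transition_explained` are hypothetical names, branch targets are not validated, and none of this is taken from invariant_checker.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the previous instruction in the current context.
struct prev_instr_t {
    uint64_t pc;
    uint64_t size;  // Size in bytes.
    bool is_branch; // Any control-transfer instruction.
};

// Returns true if reaching next_pc right after prev is explained by one of:
// plain fallthrough, a branch (target not validated in this sketch), or a trap
// or interrupt recorded between the two instructions.
inline bool
pc_transition_explained(const prev_instr_t &prev, uint64_t next_pc,
                        bool trap_or_interrupt_between)
{
    if (next_pc == prev.pc + prev.size)
        return true; // Fallthrough.
    if (prev.is_branch)
        return true; // Control transfer; a fuller check would compare targets.
    return trap_or_interrupt_between;
}
```

For the case questioned above, taking the instruction at pc=1 to be a non-branch with an assumed size of 1, `pc_transition_explained({ 1, 1, false }, 5, false)` is false, while a size of 4 or an intervening trap or interrupt would make it true.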

Comment thread clients/drcachesim/tools/invariant_checker.cpp
Comment thread clients/drcachesim/tools/invariant_checker.cpp Outdated
@abhinav92003 merged commit f4f488d into master on May 5, 2026
23 checks passed
@abhinav92003 deleted the i7854-hardware-marker-inv branch on May 5, 2026 13:19
abhinav92003 added a commit that referenced this pull request May 5, 2026
…7888)

This arg was renamed in #7880 but the definition in header wasn't.

Issue: #7854
2 participants