Fix access point cleanup on instance removal#10179
Fix access point cleanup on instance removal#10179openroad-ci wants to merge 8 commits intoThe-OpenROAD-Project:masterfrom
Conversation
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
|
I found this issue during new Resizer architecture testing. @eder-matheus I added you as a reviewer because I found that you edited the related codes a week ago. |
There was a problem hiding this comment.
Code Review
This pull request refactors access point management in dbInst and dbMPin to ensure idempotency and correctly handle duplicate references. Key changes include updating dbMPin::clearPinAccess to sort and deduplicate access points before destruction and replacing manual cleanup in dbInst with a centralized method call. New unit tests were added to verify these improvements. The review feedback suggests hardening the index check in dbMPin::clearPinAccess against negative values and using structured bindings with std::ranges::unique to enhance code readability.
| @@ -153,8 +155,11 @@ void dbMPin::clearPinAccess(const int pin_access_idx) | |||
| if (pin->aps_.size() <= pin_access_idx) { | |||
There was a problem hiding this comment.
The comparison between pin->aps_.size() (which is unsigned) and pin_access_idx (which is a signed int) can lead to unexpected behavior if pin_access_idx is negative. While it currently works "by accident" for -1 (as it promotes to a large unsigned value), it is safer and clearer to explicitly check for negative values or change the parameter type to uint32_t to match dbInst::getPinAccessIdx().
| if (pin->aps_.size() <= pin_access_idx) { | |
| if (pin_access_idx < 0 || pin->aps_.size() <= static_cast<uint32_t>(pin_access_idx)) { |
| dbVector<dbId<_dbAccessPoint>> aps; | ||
| aps.swap(pin->aps_[pin_access_idx]); | ||
| std::ranges::sort(aps); | ||
| aps.erase(std::ranges::unique(aps).begin(), aps.end()); |
|
clang-tidy review says "All clean, LGTM! 👍" |
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
| srcs = [ | ||
| "TestAccessPoint.cpp", | ||
| ], | ||
| features = ["-layering_check"], |
There was a problem hiding this comment.
The test added private include statements that Bazel disallows by default.
#include "../../src/db/dbAccessPoint.h"
#include "../../src/db/dbMPin.h"
-layering_check is added to avoid the Bazel issue.
|
@codex review |
|
Why do we have duplicate APs in the first place? |
|
Codex Review: Didn't find any major issues. Swish! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
osamahammad21
left a comment
There was a problem hiding this comment.
The only effective change of this PR is clearing the MPin's access points at the end of dbMPin::clearPinAccess
However, the other changes and the test case seem to be defensive, which I generally don't prefer.
dbInst::destroy() tried to remove AP back-references before ITerm deletion, but it used iterm->getAccessPoints(). That API derives APs from the instance's current pin-access index, so it can miss preferred APs stored in the ITerm aps_ map for other pin-access indices.
What do you mean by other pin-access indices? Each instance has one and only one pin access index, so its iterms.
Sort and de-duplicate the copied AP id list before destruction so duplicated AP bookkeeping cannot cause a double destroy.
There should never be duplicate APs. If there is, then this needs to be addressed.
| if (pin_access_idx < 0 | ||
| || pin->aps_.size() <= static_cast<std::size_t>(pin_access_idx)) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
I wonder what made you do this change? It is correct, but I wonder if there is any call that uses a negative pin_access_idx which would be more concerning.
Also, I think the more correct approach to make pin_access_idx argument a size_t or unsigned int instead.
There was a problem hiding this comment.
nvm, just found it's ai review triggered.
| } | ||
| } | ||
| } | ||
| iterm->clearPrefAccessPoints(); |
There was a problem hiding this comment.
Although this is a more efficient approach, the outcome theoretically should be exactly the same. The preferred access point of an iterm is always a subset of its total access points which is returned by getAccessPoints
There was a problem hiding this comment.
Yeah. It is a code reduction by using the existing API.
If you don't want this, I'll revert this change (not a bug fix).
Change 2: Less code. More intuitive to me.
- This is not the root-cause fix.
- IMO, this is more like a minor enhancement.
| dbVector<dbId<_dbAccessPoint>> aps; | ||
| aps.swap(pin->aps_[pin_access_idx]); | ||
| std::ranges::sort(aps); | ||
| const auto duplicate_aps = std::ranges::unique(aps); | ||
| aps.erase(duplicate_aps.begin(), duplicate_aps.end()); | ||
| for (const dbId<_dbAccessPoint>& ap : aps) { | ||
| odb::dbAccessPoint::destroy( | ||
| (odb::dbAccessPoint*) block->ap_tbl_->getPtr(ap)); | ||
| } |
There was a problem hiding this comment.
Theoretically, there shouldn't be duplicate aps. This looks a bit defensive. I would prefer if such a case exists a crash rather than passing unnoticed.
| dbVector<dbId<_dbAccessPoint>> aps; | |
| aps.swap(pin->aps_[pin_access_idx]); | |
| std::ranges::sort(aps); | |
| const auto duplicate_aps = std::ranges::unique(aps); | |
| aps.erase(duplicate_aps.begin(), duplicate_aps.end()); | |
| for (const dbId<_dbAccessPoint>& ap : aps) { | |
| odb::dbAccessPoint::destroy( | |
| (odb::dbAccessPoint*) block->ap_tbl_->getPtr(ap)); | |
| } | |
| auto& aps = pin->aps_[pin_access_idx]; | |
| for (const auto& ap : aps) { | |
| odb::dbAccessPoint::destroy( | |
| (odb::dbAccessPoint*) block->ap_tbl_->getPtr(ap)); | |
| } | |
| aps.clear(); |
There was a problem hiding this comment.
You're right. The deduplication is too defensive, which should be removed.
Regarding your suggested code, I think it has a vector iterator invalidation risk.
auto& aps = pin->aps_[pin_access_idx]; // Used reference
for (const auto& ap : aps) {
odb::dbAccessPoint::destroy( ... ); // destroy() updates `aps`, which invalidates the `aps` iteration
}
aps.clear();
To make it safe, a copy should be used: auto aps = pin->aps_[pin_access_idx]; // No &.
Or swap can be used to avoid the copy overhead.
dbVector<dbId<_dbAccessPoint>> aps;
aps.swap(pin->aps_[pin_access_idx]); // pin->aps_[pin_access_idx] will be empty.
for (const dbId<_dbAccessPoint>& ap : aps) {
odb::dbAccessPoint::destroy(
(odb::dbAccessPoint*) block->ap_tbl_->getPtr(ap));
}
Change 3: Use swap instead of copy.
- This is not a root-cause fix.
- This is just a minor change to avoid the small vector copy overhead.
@jhkim-pii Actually after checking the code, I found that erasing the mpin's aps list should be done by |
There should be no duplicate as Osama mentioned. The deduplication code is redundant. |
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
|
@codex review |
|
To use Codex here, create a Codex account and connect to github. |
|
|
||
| EXPECT_EQ(block_->findInst("buf1"), nullptr); | ||
| EXPECT_TRUE(ap_impl->iterms_.empty()); | ||
| } |
There was a problem hiding this comment.
Made a new test case that catches the problematic sequence.
- Execute pin access
- Do
swapMaster(BUF_X1 -> BUF_X4)
- The preferred AP of the old iterm
BUF_X1/Ais not cleared properly. The AP has a stale pointer to the old itermBUF_X1/A.
- Remove the buffer
- When the buffer is destroyed, the APs related to the buffer instance should be deleted.
- But during the AP deletion, the AP's stale iterm pointer (
BUF_X1/A) is accessed --> assert fail.
|
clang-tidy review says "All clean, LGTM! 👍" |
| for (const uint32_t iterm_id : inst->iterms_) { | ||
| dbITerm* iterm = (dbITerm*) block->iterm_tbl_->getPtr(iterm_id); | ||
| iterm->clearPrefAccessPoints(); | ||
| } |
There was a problem hiding this comment.
Change 1. Add preferred AP clean-up before swapping master.
- I think this is the root-cause fix.
- Without this, AP has a stale iterm pointer after
swapMaster.
|
@osamahammad21 My previous understanding about the problem was wrong. Problematic sequence
This PR has three changes. Change 1. Root-cause fix - Add preferred AP clean-up in |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ea37d99fa5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for (const uint32_t iterm_id : inst->iterms_) { | ||
| dbITerm* iterm = (dbITerm*) block->iterm_tbl_->getPtr(iterm_id); | ||
| iterm->clearPrefAccessPoints(); |
There was a problem hiding this comment.
Journal preferred AP cleanup in swapMaster
dbInst::swapMaster() now unconditionally clears each ITerm’s preferred access points before completing the swap, but this mutation is not captured as a journal field update. dbDatabase::undoEco() reverts swaps by calling swapMaster() again, so both the forward and undo paths clear AP metadata and the original preferred AP/back-reference state is never restored. In ECO trial flows that swap then undo (e.g., rejected sizing moves), this leaves the database observably different after undo and can break subsequent pin-access dependent logic.
Useful? React with 👍 / 👎.
Summary
Problem
nangate45/gcdcrashed in5_1_grtwhilerepair_timingtriggered parasitic updates, incremental global routing, and DRT dirty pin-access database updates.FLOW_VARIANT=legacy_mtandRSZ_POLICY=legacy_mt. The same4_cts.odb/.sdcinput passed withRSZ_POLICY=legacy, which ruled out an already-corrupt input database.legacy_mtwassizeup,buffer, so the multi-threaded legacy repair flow exposed stale OpenDB access point bookkeeping during repeated ECO pin-access updates.Symptom:
Call stack:
Root-cause
OpenDB access point cleanup did not fully maintain the bidirectional references among MPin pin-access slots, ITerm preferred access points, and AccessPoint ITerm back-references.
dbMPin::clearPinAccess()destroyed thedbAccessPointobjects stored inpin->aps_[pin_access_idx], but it did not clear the MPin AP id vector for that pin-access slot. A later update of the same pin-access index could revisit stale AP ids and try to destroy already-destroyed access points.dbInst::destroy()tried to remove AP back-references before ITerm deletion, but it usediterm->getAccessPoints(). That API derives APs from the instance's current pin-access index, so it can miss preferred APs stored in the ITermaps_map for other pin-access indices._dbAccessPoint::iterms_. Later,dbAccessPoint::destroy()iterated_ap->iterms_and dereferenced an already-destroyed ITerm throughblock->iterm_tbl_->getPtr(), hitting the OpenDB allocation assert.Solution
dbMPin::clearPinAccess(), move the AP ids out ofpin->aps_[pin_access_idx]before destroying the AP objects. This leaves the MPin slot empty immediately, prevents stale id reuse on repeated cleanup, and avoids iterator invalidation whiledbAccessPoint::destroy()updates MPin bookkeeping.dbInst::destroy(), replace the current-pin-access-index cleanup withiterm->clearPrefAccessPoints(). That method walks the ITerm's preferred AP map directly and removes this ITerm id from every referenced AP back-reference list before the ITerm object is destroyed.TestAccessPointregression coverage for idempotent MPin pin-access cleanup, duplicate AP ids in an MPin pin-access slot, and stale AP back-reference removal when an instance is destroyed.Testing
cmake --build build --target TestAccessPoint -j4./build/src/odb/test/cpp/TestAccessPoint --gtest_filter='*clear_mpin_access_points_is_idempotent:*clear_mpin_access_points_removes_duplicate_references:*destroy_inst_removes_access_point_back_references:*test_default'bazel test //src/odb/test/cpp:TestAccessPointThe new stale back-reference test fails on
origin/masterHEADa00a5750166bbefore the fix and passes with this branch.