handles words>maxprefixlen#36
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the internal Adaptive Radix Tree (ART) implementation to avoid panics and incorrect prefix comparisons when a node’s compressed prefix length exceeds maxprefixlen (8), by fetching a leaf to compare against the full key prefix.
Changes:
- Adds helper utilities (
fetchleaf, updatedcheckprefix,addchild/growhelpers) to support long-prefix handling. - Introduces/updates core ART operations: insert and search, plus a tree debug printer.
- Adds a small
cmd/radFS/arttestexecutable to exercise/debug the ART implementation.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/art/util.go | Prefix checking helpers + child management + growth logic to support long prefixes |
| internal/art/insert.go | Insert logic updated to store full prefixlen and split nodes using long-prefix comparisons |
| internal/art/search.go | Search logic that uses checkprefix and full leaf-key comparison |
| internal/art/print_tree.go | Debug tree printer (uses fetchleaf for long prefixes) |
| internal/art/node.go | Node types, constants (maxprefixlen), and shared node metadata (prefixlen, num_children) |
| internal/art/node4.go | Node4 constructor |
| internal/art/node16.go | Node16 constructor |
| internal/art/node48.go | Node48 constructor |
| internal/art/node256.go | Node256 constructor |
| internal/art/leaf.go | Leaf node + newleaf / isleaf |
| internal/art/art.go | Public-ish wrapper (Tree) providing Insert/Search |
| cmd/radFS/arttest/main.go | Manual executable for basic ART insert + print |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| child1, pos1 := findchild(k, n) | ||
| if child1 != nil { | ||
| in.children[pos1] = child | ||
| return n | ||
| } | ||
|
|
||
| if n.innerNode.num_children == len(in.keys) { | ||
| n = grow(n) | ||
| in = n.innerNode | ||
|
|
||
| } | ||
|
|
||
| var i int | ||
| for i = in.num_children - 1; i >= 0 && in.keys[i] > k; i-- { | ||
| in.keys[i+1] = in.keys[i] | ||
| in.children[i+1] = in.children[i] | ||
|
|
||
| } | ||
|
|
||
| in.keys[i+1] = k | ||
| in.children[i+1] = child | ||
| in.num_children += 1 | ||
|
|
||
| return n | ||
|
|
There was a problem hiding this comment.
addchild grows a node when num_children == len(in.keys), but this is incorrect for Node48 (len(keys)=256 while max children=48) and Node256 (keys is nil, so len(keys)=0). This can prevent growth and eventually cause out-of-bounds writes to children[i+1] or even call grow on a Node256 and get nil back. Please gate growth based on the node type’s actual capacity (e.g., Node4/16: len(keys), Node48: len(children)=48, Node256: len(children)=256) and implement insertion logic per node type instead of shifting arrays for Node48/256.
| child1, pos1 := findchild(k, n) | |
| if child1 != nil { | |
| in.children[pos1] = child | |
| return n | |
| } | |
| if n.innerNode.num_children == len(in.keys) { | |
| n = grow(n) | |
| in = n.innerNode | |
| } | |
| var i int | |
| for i = in.num_children - 1; i >= 0 && in.keys[i] > k; i-- { | |
| in.keys[i+1] = in.keys[i] | |
| in.children[i+1] = in.children[i] | |
| } | |
| in.keys[i+1] = k | |
| in.children[i+1] = child | |
| in.num_children += 1 | |
| return n | |
| // If a child for this key already exists, simply replace it. | |
| child1, pos1 := findchild(k, n) | |
| if child1 != nil { | |
| in.children[pos1] = child | |
| return n | |
| } | |
| switch in.nodeType { | |
| case Node4, Node16: | |
| // For Node4/Node16, capacity is len(keys); grow when full. | |
| if in.num_children == len(in.keys) { | |
| n = grow(n) | |
| // Re-dispatch insertion on the grown node, which may be a different type. | |
| return addchild(n, k, child) | |
| } | |
| // Insert key in sorted order by shifting keys/children to make room. | |
| var i int | |
| for i = in.num_children - 1; i >= 0 && in.keys[i] > k; i-- { | |
| in.keys[i+1] = in.keys[i] | |
| in.children[i+1] = in.children[i] | |
| } | |
| in.keys[i+1] = k | |
| in.children[i+1] = child | |
| in.num_children++ | |
| return n | |
| case Node48: | |
| // For Node48, capacity is the number of children slots (typically 48), | |
| // while keys is a 256-entry index map. | |
| if in.num_children == len(in.children) { | |
| n = grow(n) | |
| // After growing (to Node256), insert using the appropriate logic. | |
| return addchild(n, k, child) | |
| } | |
| // Find first free child slot. | |
| slot := -1 | |
| for i := 0; i < len(in.children); i++ { | |
| if in.children[i] == nil { | |
| slot = i | |
| break | |
| } | |
| } | |
| if slot == -1 { | |
| // Should not happen if num_children is correctly maintained, | |
| // but avoid out-of-bounds behavior. | |
| return n | |
| } | |
| in.children[slot] = child | |
| // Store slot+1 so that zero represents "no child". | |
| in.keys[k] = byte(slot + 1) | |
| in.num_children++ | |
| return n | |
| case Node256: | |
| // For Node256, capacity is len(children) (typically 256); | |
| // keys is nil and not used. | |
| if in.num_children == len(in.children) { | |
| // Already at maximum capacity; cannot grow further safely. | |
| // Avoid calling grow or writing out of bounds. | |
| return n | |
| } | |
| if in.children[k] == nil { | |
| in.num_children++ | |
| } | |
| in.children[k] = child | |
| return n | |
| } | |
| // Fallback: if an unknown node type is encountered, return unchanged. | |
| return n |
| child1, pos1 := findchild(k, n) | ||
| if child1 != nil { | ||
| in.children[pos1] = child | ||
| return n | ||
| } | ||
|
|
||
| if n.innerNode.num_children == len(in.keys) { | ||
| n = grow(n) | ||
| in = n.innerNode | ||
|
|
||
| } | ||
|
|
||
| var i int | ||
| for i = in.num_children - 1; i >= 0 && in.keys[i] > k; i-- { | ||
| in.keys[i+1] = in.keys[i] | ||
| in.children[i+1] = in.children[i] | ||
|
|
||
| } | ||
|
|
||
| in.keys[i+1] = k | ||
| in.children[i+1] = child | ||
| in.num_children += 1 | ||
|
|
||
| return n | ||
|
|
There was a problem hiding this comment.
addchild shifts both in.keys and in.children as if they are parallel sorted arrays. That only holds for Node4/Node16; it breaks Node48 (indirection table) and Node256 (direct map) and can lead to out-of-bounds writes on children[i+1]. Consider implementing addchild separately per node type.
| child1, pos1 := findchild(k, n) | |
| if child1 != nil { | |
| in.children[pos1] = child | |
| return n | |
| } | |
| if n.innerNode.num_children == len(in.keys) { | |
| n = grow(n) | |
| in = n.innerNode | |
| } | |
| var i int | |
| for i = in.num_children - 1; i >= 0 && in.keys[i] > k; i-- { | |
| in.keys[i+1] = in.keys[i] | |
| in.children[i+1] = in.children[i] | |
| } | |
| in.keys[i+1] = k | |
| in.children[i+1] = child | |
| in.num_children += 1 | |
| return n | |
| // If a child with this key already exists, replace it. | |
| child1, pos1 := findchild(k, n) | |
| if child1 != nil { | |
| in.children[pos1] = child | |
| return n | |
| } | |
| // Grow the node if the children array is full. | |
| if in.num_children == len(in.children) { | |
| n = grow(n) | |
| in = n.innerNode | |
| } | |
| switch in.nodeType { | |
| case Node4, Node16: | |
| // Maintain sorted order of keys and parallel children array. | |
| var i int | |
| for i = in.num_children - 1; i >= 0 && in.keys[i] > k; i-- { | |
| in.keys[i+1] = in.keys[i] | |
| in.children[i+1] = in.children[i] | |
| } | |
| in.keys[i+1] = k | |
| in.children[i+1] = child | |
| in.num_children++ | |
| case Node48: | |
| // Node48 uses an indirection table: keys[k] = index+1 into children. | |
| // Find a free slot in children. | |
| pos := -1 | |
| for i := 0; i < len(in.children); i++ { | |
| if in.children[i] == nil { | |
| pos = i | |
| break | |
| } | |
| } | |
| if pos >= 0 { | |
| in.children[pos] = child | |
| in.keys[k] = byte(pos + 1) | |
| in.num_children++ | |
| } | |
| case Node256: | |
| // Direct mapping from key byte to child index. | |
| in.children[k] = child | |
| in.num_children++ | |
| } | |
| return n |
| } | ||
|
|
||
| // Get the next byte to branch on at current depth. | ||
| // Returns 0 (terminator) if key is exhausted. |
There was a problem hiding this comment.
The comment says keycheck returns 0 (terminator) if key is exhausted, but keycheck currently returns 1 when depth >= len(key). Please align the comment with the implementation, or (preferably) fix keycheck to use the intended terminator semantics.
| // Returns 0 (terminator) if key is exhausted. | |
| // Returns 1 (terminator) if key is exhausted. |
| for i := 0; i < len(in.keys); i++ { | ||
| if in.children[i] != nil { | ||
| fmt.Printf("%s Edge('%c' | %d):\t", indent, in.keys[i], in.keys[i]) |
There was a problem hiding this comment.
This loop indexes in.children[i] while iterating up to len(in.keys). For Node48, len(keys)=256 but len(children)=48, so this will panic when i >= 48 (and it also doesn’t make sense for Node256 where keys is nil). Please iterate over the appropriate child representation per node type.
| for i := 0; i < len(in.keys); i++ { | |
| if in.children[i] != nil { | |
| fmt.Printf("%s Edge('%c' | %d):\t", indent, in.keys[i], in.keys[i]) | |
| for i := 0; i < len(in.children); i++ { | |
| if in.children[i] != nil { | |
| var edgeKey byte | |
| if in.keys != nil && i < len(in.keys) { | |
| edgeKey = in.keys[i] | |
| } else { | |
| edgeKey = byte(i) | |
| } | |
| fmt.Printf("%s Edge('%c' | %d):\t", indent, edgeKey, edgeKey) |
| index := 0 | ||
| for i := 0; i < 16; i++ { | ||
| idx := n.innerNode.keys[i] | ||
| child := n.innerNode.children[i] | ||
|
|
||
| if child != nil { | ||
| n48.innerNode.keys[idx] = byte(index + 1) // the reason its index+1 is because we are making 0 a kind of "no children" case | ||
|
|
||
| n48.innerNode.children[index] = child | ||
| index++ |
There was a problem hiding this comment.
In grow (Node16 -> Node48), index tracks how many children are copied into the Node48, but n48.innerNode.num_children is never set. This leaves the grown node’s metadata inconsistent with its contents. Set n48.innerNode.num_children = index before returning.
| i := depth | ||
| fmt.Println("depth:", depth) | ||
| fmt.Println("oldkey[depth:]:", string(oldkey[depth:])) | ||
| fmt.Println("key[depth:]:", string(key[depth:])) |
There was a problem hiding this comment.
There are multiple fmt.Println debug prints in insert (e.g., printing depth/keys/prefix). This will spam stdout for every insert and makes the package unusable as a library component. Please remove these prints or gate them behind an explicit debug logger/flag.
| in := n.innerNode | ||
| switch in.nodeType { | ||
| case Node4, Node16: | ||
| for i := 0; i < len(in.keys); i++ { |
There was a problem hiding this comment.
For Node4/Node16, findchild scans len(in.keys) instead of only the populated range. Unused key slots are zero-initialized, so lookups for k==0 can match an unused slot and return an incorrect position. Please iterate only up to in.num_children for these node types.
| for i := 0; i < len(in.keys); i++ { | |
| for i := 0; i < in.num_children; i++ { |
| if depth >= len(key) { | ||
| return 1 | ||
|
|
||
| } else { | ||
| return key[depth] | ||
| } |
There was a problem hiding this comment.
keycheck uses 1 as an end-of-key sentinel (depth >= len(key) returns 1). With arbitrary []byte keys this is ambiguous (e.g., []byte{} and []byte{1} collide at depth 0). Please switch to an unambiguous terminator strategy (explicit length checks, or return an int with 256 as terminator, etc.).
| if depth >= len(key) { | |
| return 1 | |
| } else { | |
| return key[depth] | |
| } | |
| // Return the byte at the given depth if it is within bounds. | |
| // Callers should separately check depth against len(key) when they | |
| // need to distinguish "end of key" from an actual byte value. | |
| if depth < 0 || depth >= len(key) { | |
| return 0 | |
| } | |
| return key[depth] |
| if isleaf(n) { | ||
| if string(n.leaf.key) == string(key) { | ||
| return n | ||
| } | ||
| return nil |
There was a problem hiding this comment.
Leaf key comparison converts both []byte to string, which allocates and can be expensive for large keys. Prefer bytes.Equal(n.leaf.key, key) to avoid allocations and to clearly express byte-slice equality.
| for i := 0; i < 4; i++ { | ||
| n16.innerNode.keys[i] = n.innerNode.keys[i] | ||
| n16.innerNode.children[i] = n.innerNode.children[i] | ||
| } |
There was a problem hiding this comment.
In grow (Node4 -> Node16), the new node copies keys/children but never copies num_children. After growth, the returned node will think it has 0 children, corrupting later inserts/searches. Please copy n.innerNode.num_children into the grown node.
| } | |
| } | |
| n16.innerNode.num_children = n.innerNode.num_children |
PR title OR Closes {Link to issue}
Context
Why are these changes being made?
The implementation would not handle words greater than maxprefixlen and panic
Description
How have the changes been made? i.e. steps taken to solve the above issue
would get the leaf ,so that you could compare the key with the full prefix