
Explain LLM Version #511

Open
hadia206 wants to merge 29 commits into main from Hadia/explain_llm



hadia206 (Contributor) commented Apr 20, 2026

Linked ticket

Closes #512

Type of change

  • Bug fix
  • New feature
  • Refactor
  • Docs / config

What changed and why?

Adds pydough.explain_llm(), a new exploration API that returns a structured description of a PyDough collection expression designed for LLM consumption.

Unlike explain(), which produces human-readable prose for interactive debugging, explain_llm returns a stable, machine-parseable payload (a JSON dict with format="json", or markdown with format="md") so that LLM judges can validate and self-correct generated PyDough code.

Key design decisions:

  • available_terms lives under a debug sub-key: scope information is preserved for human debugging but kept out of the main payload so judge prompts aren't distracted by fields irrelevant to correctness.

  • Implicit scoping is made explicit: PyDough's relationship-navigation scoping (e.g. COUNT(orders) inside customers.CALCULATE(...)) is correct but looks like a missing filter to a naive judge. The structured implicit_scope_note field and per-step notes surface this, so evaluators can distinguish "missing filter" from "implicitly scoped via relationship navigation."

  • Structured error taxonomy: 13 error categories (unrecognized_term, plural_in_calculate, bad_window_per, etc.) with error_type, details, and hint fields, so the LLM gets actionable guidance without parsing the raw exception message.

  • conditions as structured dicts: Where-step conditions expose operator, left, and right as parsed sub-dicts (not just strings), so a judge can inspect operands directly.
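For illustration, a judge-side check can read operands straight from such a condition dict instead of re-parsing condition text. This is a minimal sketch: the `conditions` key on the step, the `text` field inside `left`/`right`, and the `find_condition_triples` helper are assumptions, not the documented payload layout.

```python
from typing import Any


def find_condition_triples(step: dict[str, Any]) -> list[tuple[Any, Any, Any]]:
    """Collect (left, operator, right) triples from a Where step.

    Hypothetical layout: each condition is a dict such as
    {"operator": "==", "left": {"text": ...}, "right": {"text": ...}}.
    """
    triples: list[tuple[Any, Any, Any]] = []
    for cond in step.get("conditions", []):
        if not isinstance(cond, dict):
            continue  # ignore any condition that is not a structured dict
        left = cond.get("left", {}).get("text")
        right = cond.get("right", {}).get("text")
        triples.append((left, cond.get("operator"), right))
    return triples


where_step = {
    "type": "Where",
    "conditions": [
        {
            "operator": "==",
            "left": {"text": "region.name"},
            "right": {"text": "'ASIA'"},
        }
    ],
}
print(find_condition_triples(where_step))  # [('region.name', '==', "'ASIA'")]
```

The judge can then compare the right-hand operand against the expected literal directly, without splitting the string "region.name == 'ASIA'" itself.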

Output shape (success):

  • query_summary (one deterministic sentence)
  • steps (ordered operations)
  • schema (source collection, output columns + types, ordering, limit)

On error: always {"error": true, "message": ..., "error_type": ..., "details": ..., "hint": ..., "steps": [], "schema": null}.
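A consumer can branch on these two shapes before reading anything else. The sketch below uses only the key names listed in this description; the `classify_payload` helper is hypothetical, and it assumes success payloads either omit `"error"` or set it to `False`.

```python
from typing import Any

# Key names taken from the payload description above.
SUCCESS_KEYS = {"query_summary", "steps", "schema"}
ERROR_KEYS = {"error", "message", "error_type", "details", "hint", "steps", "schema"}


def classify_payload(payload: dict[str, Any]) -> str:
    """Return "error" or "success"; raise ValueError for any other shape."""
    if payload.get("error") is True:
        missing = ERROR_KEYS - set(payload)
        if missing:
            raise ValueError(f"error payload missing keys: {sorted(missing)}")
        # Error payloads always carry an empty step list and a null schema.
        if payload["steps"] != [] or payload["schema"] is not None:
            raise ValueError("error payload must have steps == [] and schema == None")
        return "error"
    missing = SUCCESS_KEYS - set(payload)
    if missing:
        raise ValueError(f"success payload missing keys: {sorted(missing)}")
    return "success"


error_payload = {
    "error": True,
    "message": "...",
    "error_type": "unrecognized_term",
    "details": {},
    "hint": "...",
    "steps": [],
    "schema": None,
}
print(classify_payload(error_payload))  # error
```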

Implementation:

  • New shared helpers in _common.py: describe_expression, describe_subcollection_arg, generate_step_notes, generate_query_summary, _cond_texts, _collation_entry
  • New pydough/exploration/explain_llm.py: qualification, step-walking, schema building, error classification, markdown rendering.

How I tested this?

  • 82 tests in tests/test_explain_llm.py
  • Local testing and CI
  • Testing with the LLM team

Notes for reviewers

"""
return (
    isinstance(node.ancestor_context, GlobalContext)
    and node.ancestor_context.ancestor_context is not None

Contributor Author

A plain root table access has GlobalContext as ancestor_context, but that GlobalContext's own ancestor_context is None. CROSS qualification inserts an extra intermediate GlobalContext whose ancestor_context IS set.
That nesting is the only reliable signal that distinguishes CROSS from a normal table access.
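The check can be exercised in isolation with stand-in classes. This sketch is hypothetical: `GlobalContext` and `TableAccess` below are minimal dataclasses that mimic only the `ancestor_context` attribute, not PyDough's real QDAG classes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GlobalContext:
    """Hypothetical stand-in for PyDough's GlobalContext node."""
    ancestor_context: object | None = None


@dataclass
class TableAccess:
    """Hypothetical stand-in for a collection access with an ancestor."""
    ancestor_context: object | None


def is_cross_access(node: TableAccess) -> bool:
    # A plain root access sits under a GlobalContext whose own ancestor is
    # None; CROSS inserts an intermediate GlobalContext that has an ancestor.
    return (
        isinstance(node.ancestor_context, GlobalContext)
        and node.ancestor_context.ancestor_context is not None
    )


plain = TableAccess(ancestor_context=GlobalContext())
crossed = TableAccess(
    ancestor_context=GlobalContext(ancestor_context=TableAccess(GlobalContext()))
)
print(is_cross_access(plain), is_cross_access(crossed))  # False True
```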

"""
text: str = expr.to_string()

match expr:

Contributor Author

ChildReferenceExpression and BackReferenceExpression are both subclasses of Reference, so they must be matched before case Reference(); otherwise they are caught by that generic branch instead of their own.

if detail.get("kind") != "Aggregation":
    continue
for arg in detail.get("args", []):
    implicit_note = arg.get("implicit_scope_note")

Contributor Author

Two distinct cases:

  1. A non-null implicit_scope_note means the collection IS correctly scoped via relationship navigation. This is a correct PyDough pattern, so emit an informational note, not a warning. For example: customers.CALCULATE(n=COUNT(orders))

  2. A null implicit_scope_note means the collection may be unscoped relative to a cross-product context; only then warn. For example, nations.CROSS(regions).CALCULATE(n=COUNT(orders)) is potentially wrong, since orders has no relationship to nations or regions. In this case access_path = [], so implicit_scope_note is null. If the CALCULATE doesn't filter on any of the CROSS-introduced terms, the COUNT aggregates all orders for every row, which is likely a bug.
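The two cases can be sketched as a small helper. `scope_message` and its message wording are hypothetical; the `" → "` path formatting mirrors the `path_str` line in the diff. The sketch also omits the extra condition from case 2 (checking whether the CALCULATE filters on CROSS-introduced terms).

```python
def scope_message(agg_text: str, access_path: list[str]) -> tuple[str, str]:
    """Return (severity, message) for one aggregation argument.

    A non-empty access_path means the collection is reached by relationship
    navigation, so it is correctly scoped per row: informational note.
    An empty access_path (e.g. under CROSS) means it may be unscoped: warning.
    """
    if access_path:
        path_str: str = " → ".join(f"'{p}'" for p in access_path)
        return ("note", f"{agg_text} is implicitly scoped per row via {path_str}.")
    return ("warning", f"{agg_text} may aggregate the whole collection for every row.")


print(scope_message("COUNT(orders)", ["orders"]))
print(scope_message("COUNT(orders)", []))
```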

hadia206 marked this pull request as ready for review June 12, 2026 18:05
hadia206 requested review from a team, john-sanchez31, juankx-bodo and knassre-bodo and removed request for a team June 12, 2026 19:03

john-sanchez31 left a comment


Great job Hadia! Please check my comments below before merging. Most are related to docstrings and type hints.

Comment thread documentation/usage.md
* `notes` — list of strings; always present, may be empty

The `schema` section (when `"error"` is `False`) includes:
* `source_collection` — the root table name, or `null` for graph-level expressions

Contributor

NIT:

Suggested change
* `source_collection` — the root table name, or `null` for graph-level expressions
* `source_collection` — the root collection name, or `null` for graph-level expressions

Comment thread documentation/usage.md
"query_summary": "Accesses 'nations', filtered to rows where region.name == 'ASIA', selecting key, name.",
"steps": [
{
"order": 1, "type": "GlobalContext",

Contributor

Suggested change
"order": 1, "type": "GlobalContext",
"order": 1,
"type": "GlobalContext",

The same applies to all steps.

    # customers.WHERE(...).orders). Get that chain via child instead.
    current = current.child
else:
    nxt = getattr(current, "preceding_context", None)

Contributor

Type hint

# Non-empty access_path → row-level scoping via relationship navigation.
implicit_scope_note: str | None = None
if access_path:
    path_str = " → ".join(f"'{p}'" for p in access_path)

Contributor

Suggested change
path_str = " → ".join(f"'{p}'" for p in access_path)
path_str: str = " → ".join(f"'{p}'" for p in access_path)


Args:
    `arg`: the collection arg from an ``ExpressionFunctionCall``.
    `parent`: the parent ``Calculate`` (or other child operator) that owns

Contributor

Suggested change
`parent`: the parent ``Calculate`` (or other child operator) that owns
`parent`: the parent ``CALCULATE`` (or other child operator) that owns

return "\n".join(lines)

# ------------------------------------------------------------------ #
# Key Facts — quick-reference block at the top so the judge sees the #

Contributor

NIT: I think we shouldn't mention the judge here. Maybe change this comment to a more general one?

# Key Facts — quick-reference block at the top so the judge sees the #
# most checkable facts before reading any steps. #
# ------------------------------------------------------------------ #
schema = result["schema"]

Contributor

type hints

body = _render_step_body(step)
lines.extend(body)

notes = step.get("notes", [])

Contributor

type hint


lines.append(f"- **Source collection:** {f'`{src}`' if src else '_(none)_'}")

output_cols = schema.get("output_columns", [])

Contributor

type hints

f"Expected a collection, but received an expression: "
f"{qualified.to_string()}. Did you mean to use explain_term?"
)
result = _error_payload(msg)

Contributor

type hint

Contributor

^ Let's add a pre-declaration near the top: result: dict

# ------------------------------------------------------------------ #
# 1. Subject #
# ------------------------------------------------------------------ #
cross_step = next((s for s in steps if s["type"] == "Cross"), None)

Contributor

same for table_step & user_step

detail = s.get("term_details", {}).get(tname, {})
if detail.get("kind") == "Aggregation":
    for arg_d in detail.get("args", []):
        cname = arg_d.get("name")

Contributor

type hint

# understands they filter a different level of the data.
top_conds: list[str] = []
sub_conds: list[str] = []
past_first_sub = False

Contributor

Suggested change
past_first_sub = False
past_first_sub: bool = False

detail = s.get("term_details", {}).get(name, {})
if detail.get("kind") != "Aggregation":
    continue
fn = detail.get("function", "AGG").lower()

Contributor

same for arg_name

# ------------------------------------------------------------------ #
topk_step = next((s for s in steps if s["type"] == "TopK"), None)
order_step = next((s for s in steps if s["type"] == "OrderBy"), None)
sort_step = topk_step or order_step

Contributor

for topk_step, order_step, sort_step, collation, by_str, suffix & summary

PyDoughUnqualifiedException,
)

msg = str(e)

Contributor

Suggested change
msg = str(e)
msg: str = str(e)

if "Did you mean" in msg:
    details: dict = {}
    # Extract the wrong term: "Unrecognized term of ...: 'TERM'."
    term_match = _re.search(r":\s*'([^']+)'", msg)

Contributor

Suggested change
term_match = _re.search(r":\s*'([^']+)'", msg)
term_match: Match[str] | None = _re.search(r":\s*'([^']+)'", msg)

if term_match:
    details["term"] = term_match.group(1)
# Extract suggestions: "Did you mean: a, b, c?"
sugg_match = _re.search(r"Did you mean:\s*([^?]+)\?", msg)

Contributor

Suggested change
sugg_match = _re.search(r"Did you mean:\s*([^?]+)\?", msg)
sugg_match: Match[str] | None = _re.search(r"Did you mean:\s*([^?]+)\?", msg)
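With the suggested annotations applied, the extraction can be tested on its own. A minimal sketch, assuming a message of the form shown in the code comments; `parse_unrecognized_term` and the `suggestions` key are hypothetical names, and `re.Match` stands in for the `Match[str]` annotation.

```python
import re


def parse_unrecognized_term(msg: str) -> dict[str, object]:
    """Extract the bad term and suggestions from a 'Did you mean' message."""
    details: dict[str, object] = {}
    # The wrong term: "Unrecognized term of ...: 'TERM'."
    term_match: re.Match[str] | None = re.search(r":\s*'([^']+)'", msg)
    if term_match:
        details["term"] = term_match.group(1)
    # The suggestions: "Did you mean: a, b, c?"
    sugg_match: re.Match[str] | None = re.search(r"Did you mean:\s*([^?]+)\?", msg)
    if sugg_match:
        details["suggestions"] = [s.strip() for s in sugg_match.group(1).split(",")]
    return details


sample = "Unrecognized term of TPCH.nations: 'nme'. Did you mean: name, key, comment?"
print(parse_unrecognized_term(sample))
```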

"""
if isinstance(e, str):
    message = e
    details: dict[str, object]

Contributor

What would be the value of details in this case?

knassre-bodo left a comment


This entire logic is quite thorough and interesting! I didn't go through every single detail of some of the middle functions, but I have left some comments on places where I think we can potentially iterate a bit further at the macro level.

My biggest wish after reading everything is something that I think would be conceptually tricky to figure out, but if you can do it, it would be amazing for future extensibility: could we find a way to move some aspects of this, particularly the logic that is extremely specific to each type of QDAG node, into the QDAG APIs? Those classes use a LOT of ABC logic, with extensive class hierarchies, so perhaps this could work by folding different methods/templates into some of the abstract base classes, then having the explain_llm logic case on the object ancestry (e.g. calculate, where, topk, orderby, and singular all inherit from AugmentingChildOperator, so explain_llm could case on whether a node is an instance of that class to resolve a lot of common logic; a lot of other things inherit from ChildAccess).

If you look into this and it seems horrifically impractical, we can disregard it for now, but if it is even somewhat viable I would encourage doing it. After all, we'll need to extend all of this again for EXPLODE, and I'd prefer it to be literally impossible to miss adding an implementation: if we did, the ABC would refuse to instantiate the class because of the unimplemented abstract methods.

Some areas that I think are particularly ripe for moving into the ABCs:

  • describe_expression
  • most/all of the _build_xxx_step can just be made into a single abstract method that the classes implement
  • Possibly _render_step_body?
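A minimal sketch of the enforcement this suggestion relies on: Python's `abc` module refuses to instantiate a subclass that leaves an `@abstractmethod` unimplemented. `ExplainableNode`, `describe_step`, `WhereNode`, and `ExplodeNode` are hypothetical names, not existing QDAG classes or methods.

```python
from abc import ABC, abstractmethod
from typing import Any


class ExplainableNode(ABC):
    """Hypothetical base class standing in for a QDAG ABC."""

    @abstractmethod
    def describe_step(self) -> dict[str, Any]:
        """Return the explain_llm step dict for this node."""


class WhereNode(ExplainableNode):
    def __init__(self, condition: str) -> None:
        self.condition = condition

    def describe_step(self) -> dict[str, Any]:
        return {"type": "Where", "conditions": [self.condition]}


class ExplodeNode(ExplainableNode):
    """A new operator that forgot to implement describe_step."""


print(WhereNode("region.name == 'ASIA'").describe_step())
try:
    ExplodeNode()
except TypeError as err:
    # Instantiation fails up front instead of silently producing no step.
    print("instantiation failed:", err)
```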

Besides that, the overhaul to the testing approach is probably the change I would most encourage you to consider. Actually being able to see the output will help us tell whether we are missing anything serious, and will make any glaring bugs jump out.

``kind`` tags, and explicit scoping notes so a model can self-correct without
parsing prose.

Output schema (success)::

Contributor

Should there be two colons here? I don't know the intended format.

}
}

Output schema (error)::

Contributor

Same here:

Comment thread tests/test_explain_llm.py
Comment on lines +57 to +68
@pytest.fixture
def tpch_session(get_sample_graph: graph_fetcher) -> PyDoughSession:
    """A PyDoughSession loaded with the TPCH graph (no DB connection needed)."""
    graph: GraphMetadata = get_sample_graph("TPCH")
    session = PyDoughSession()
    session.metadata = graph
    return session


@pytest.fixture
def tpch_graph(get_sample_graph: graph_fetcher) -> GraphMetadata:
    return get_sample_graph("TPCH")

Contributor

These can live in conftest.py and be session-scoped.

Comment thread tests/test_explain_llm.py
Comment on lines +79 to +80
def impl():
    return nations.CALCULATE(key, name)

Contributor

To make these tests easier to create/run, and potentially even parameterize, you could perhaps turn these into strings and have _run use pydough.from_string.

Comment thread tests/test_explain_llm.py
Comment on lines +37 to +49
def _run(
    impl: Callable[[], UnqualifiedNode],
    graph: GraphMetadata,
    session: PyDoughSession,
) -> dict:
    """Qualify ``impl`` under ``graph`` and call ``explain_llm``."""
    node: UnqualifiedNode = pydough.init_pydough_context(graph)(impl)()
    return cast(dict, pydough.explain_llm(node, session=session))


def _step(result: dict, order: int) -> dict:
    """Return the step with the given 1-based order."""
    return next(s for s in result["steps"] if s["order"] == order)

Contributor

Let's name these functions a bit more descriptively, also add arguments/returns to the docstrings.
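As one possible direction, the step lookup could get a more descriptive name, a full docstring, and a clearer failure than the bare `StopIteration` that `next()` raises when nothing matches. `find_step_by_order` is a hypothetical name.

```python
from typing import Any


def find_step_by_order(result: dict[str, Any], order: int) -> dict[str, Any]:
    """Return the step with the given 1-based order from an explain_llm result.

    Args:
        result: A successful explain_llm JSON payload containing "steps".
        order: The 1-based position of the step to fetch.

    Returns:
        The step dict whose "order" field equals ``order``.

    Raises:
        LookupError: If no step has the requested order.
    """
    for step in result["steps"]:
        if step["order"] == order:
            return step
    raise LookupError(f"no step with order={order}")
```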


Comment on lines +1169 to +1170
steps = _collect_steps(qualified)
schema = _build_schema(qualified)

Contributor

Type hints

Comment on lines +514 to +519
# Detect window functions (RANKING, PERCENTILE, etc.) in the
# condition. Their `per=` partition argument is resolved to SQL
# PARTITION BY during compilation and is NOT stored on the
# WindowCall QDAG node, so it cannot be shown in the condition
# text above. Alert the judge so it doesn't mis-read a
# per-partition rank as a global rank.

Contributor

Why only for WHERE? Technically, calculate/orderby/topk can also contain window functions inside their expression arguments.
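A sketch of the generalization: run the same window-function detection over the expressions of every step type, not only WHERE. `Expr`, its `is_window` flag, and the per-step `expressions` key are hypothetical stand-ins; the real QDAG represents these calls as `WindowCall` nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Expr:
    """Hypothetical expression node: a name plus nested operands."""
    name: str
    operands: list[Expr] = field(default_factory=list)
    is_window: bool = False


def contains_window(expr: Expr) -> bool:
    """Return True if expr or any nested operand is a window call."""
    return expr.is_window or any(contains_window(op) for op in expr.operands)


def steps_needing_window_note(steps: list[dict]) -> list[int]:
    """Orders of steps, of any type, whose expressions contain a window call."""
    return [
        step["order"]
        for step in steps
        if any(contains_window(e) for e in step.get("expressions", []))
    ]


steps = [
    {"order": 1, "type": "Calculate", "expressions": [Expr("RANKING", is_window=True)]},
    {"order": 2, "type": "Where", "expressions": [Expr("==", [Expr("r"), Expr("1")])]},
    {"order": 3, "type": "TopK", "expressions": [Expr("DESC", [Expr("PERCENTILE", is_window=True)])]},
]
print(steps_needing_window_note(steps))  # [1, 3]
```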

Comment on lines +270 to +271
Inside a ``CALCULATE``, aggregation arguments are represented as
``ChildReferenceCollection`` nodes that point to the parent's child list

Contributor

It's not just aggregations: it can also be singular sub-collections that are referenced to pull data into the current context (e.g. nations.CALCULATE(nation_name=name, region_name=region.name)).

Comment on lines +610 to +615
Clause order:
1. Subject — ``TableCollection`` / ``Cross`` / ``UserGeneratedCollection``
2. Filter — all ``Where`` step conditions joined with ``" and "``
3. Partition — ``PartitionBy`` keys
4. Compute — final ``Calculate`` step (refs + aggregations)
5. Limit/Order — ``TopK`` or ``OrderBy``

Contributor

How does this order interact with far more complex queries that have multiple layers of partitioning / stepping back into the children? I'm struggling to visualize this (may help to do so with my testing suggestions).
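A flat, single-pass version of this clause ordering is easy to sketch, and it makes the concern concrete: it handles only `TableCollection` subjects, keeps only the first `PartitionBy`, and reports only the last `Calculate`, so a query with several layers would lose which level each clause applies to. The step keys (`collection`, `conditions`, `keys`, `columns`, `limit`, `by`) and the `summarize_steps` name are assumptions for illustration.

```python
from typing import Any


def summarize_steps(steps: list[dict[str, Any]]) -> str:
    """One-sentence summary: subject, filter, partition, compute, limit."""
    clauses: list[str] = []
    subject = next((s for s in steps if s["type"] == "TableCollection"), None)
    if subject is not None:
        clauses.append(f"Accesses '{subject['collection']}'")
    # All Where conditions are flattened together, regardless of level.
    conds = [c for s in steps if s["type"] == "Where" for c in s["conditions"]]
    if conds:
        clauses.append("filtered to rows where " + " and ".join(conds))
    part = next((s for s in steps if s["type"] == "PartitionBy"), None)
    if part is not None:
        clauses.append("partitioned by " + ", ".join(part["keys"]))
    calcs = [s for s in steps if s["type"] == "Calculate"]
    if calcs:
        clauses.append("selecting " + ", ".join(calcs[-1]["columns"]))
    topk = next((s for s in steps if s["type"] == "TopK"), None)
    if topk is not None:
        clauses.append(f"keeping the top {topk['limit']} by {topk['by']}")
    return ", ".join(clauses) + "."


example = [
    {"type": "GlobalContext"},
    {"type": "TableCollection", "collection": "nations"},
    {"type": "Where", "conditions": ["region.name == 'ASIA'"]},
    {"type": "Calculate", "columns": ["key", "name"]},
]
print(summarize_steps(example))
```

On the usage.md example this reproduces the documented summary, "Accesses 'nations', filtered to rows where region.name == 'ASIA', selecting key, name."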


Development

Successfully merging this pull request may close these issues.

Create explain_llm API for LLM queries

4 participants