Explain LLM Version#511
Conversation
…into Hadia/explain_llm
| """ | ||
| return ( | ||
| isinstance(node.ancestor_context, GlobalContext) | ||
| and node.ancestor_context.ancestor_context is not None |
There was a problem hiding this comment.
A plain root table access has GlobalContext as ancestor_context, but that GlobalContext's own ancestor_context is None. CROSS qualification inserts an extra intermediate GlobalContext whose ancestor_context IS set.
That nesting is the only reliable signal that distinguishes CROSS from a normal table access.
| """ | ||
| text: str = expr.to_string() | ||
|
|
||
| match expr: |
There was a problem hiding this comment.
ChildReferenceExpression and BackReferenceExpression are both subclasses of Reference, they must be matched before case Reference() or they fall through to the wrong branch.
| if detail.get("kind") != "Aggregation": | ||
| continue | ||
| for arg in detail.get("args", []): | ||
| implicit_note = arg.get("implicit_scope_note") |
There was a problem hiding this comment.
Two distinct cases:
-
non-null
implicit_scope_notemeans the collection IS correctly scoped via relationship navigation (correct PyDough pattern, so emit an informational note, not a warning). e.g.customers.CALCULATE(n=COUNT(orders)) -
null means the collection may be unscoped relative to a cross-product context, only then warn. e.g.
nations.CROSS(regions).CALCULATE(n=COUNT(orders))this is potentially wrong sinceordershas no relationship tonationsorregions. In this caseaccess_path = []soimplicit_scope_noteisnull. If theCALCULATEdoesn't filter on any of the CROSS-introduced terms, theCOUNTaggregates allordersfor every row, which is likely a bug.
john-sanchez31
left a comment
There was a problem hiding this comment.
Great job Hadia! Please check my comments below before merging. Most are related to docstrings and type hints.
| * `notes` — list of strings; always present, may be empty | ||
|
|
||
| The `schema` section (when `"error"` is `False`) includes: | ||
| * `source_collection` — the root table name, or `null` for graph-level expressions |
There was a problem hiding this comment.
NIT:
| * `source_collection` — the root table name, or `null` for graph-level expressions | |
| * `source_collection` — the root collection name, or `null` for graph-level expressions |
| "query_summary": "Accesses 'nations', filtered to rows where region.name == 'ASIA', selecting key, name.", | ||
| "steps": [ | ||
| { | ||
| "order": 1, "type": "GlobalContext", |
There was a problem hiding this comment.
| "order": 1, "type": "GlobalContext", | |
| "order": 1, | |
| "type": "GlobalContext", |
Same applies to all steps
| # customers.WHERE(...).orders). Get that chain via child instead. | ||
| current = current.child | ||
| else: | ||
| nxt = getattr(current, "preceding_context", None) |
| # Non-empty access_path → row-level scoping via relationship navigation. | ||
| implicit_scope_note: str | None = None | ||
| if access_path: | ||
| path_str = " → ".join(f"'{p}'" for p in access_path) |
There was a problem hiding this comment.
| path_str = " → ".join(f"'{p}'" for p in access_path) | |
| path_str: str = " → ".join(f"'{p}'" for p in access_path) |
|
|
||
| Args: | ||
| `arg`: the collection arg from an ``ExpressionFunctionCall``. | ||
| `parent`: the parent ``Calculate`` (or other child operator) that owns |
There was a problem hiding this comment.
| `parent`: the parent ``Calculate`` (or other child operator) that owns | |
| `parent`: the parent ``CALCULATE`` (or other child operator) that owns |
| return "\n".join(lines) | ||
|
|
||
| # ------------------------------------------------------------------ # | ||
| # Key Facts — quick-reference block at the top so the judge sees the # |
There was a problem hiding this comment.
NIT: I think we shouldn't mention the judge here. Maybe change this comments for one more general?
| # Key Facts — quick-reference block at the top so the judge sees the # | ||
| # most checkable facts before reading any steps. # | ||
| # ------------------------------------------------------------------ # | ||
| schema = result["schema"] |
| body = _render_step_body(step) | ||
| lines.extend(body) | ||
|
|
||
| notes = step.get("notes", []) |
|
|
||
| lines.append(f"- **Source collection:** {f'`{src}`' if src else '_(none)_'}") | ||
|
|
||
| output_cols = schema.get("output_columns", []) |
| f"Expected a collection, but received an expression: " | ||
| f"{qualified.to_string()}. Did you mean to use explain_term?" | ||
| ) | ||
| result = _error_payload(msg) |
There was a problem hiding this comment.
^ Let's add a pre-declaration near the top: result: dict
| # ------------------------------------------------------------------ # | ||
| # 1. Subject # | ||
| # ------------------------------------------------------------------ # | ||
| cross_step = next((s for s in steps if s["type"] == "Cross"), None) |
There was a problem hiding this comment.
same for table_step & user_step
| detail = s.get("term_details", {}).get(tname, {}) | ||
| if detail.get("kind") == "Aggregation": | ||
| for arg_d in detail.get("args", []): | ||
| cname = arg_d.get("name") |
| # understands they filter a different level of the data. | ||
| top_conds: list[str] = [] | ||
| sub_conds: list[str] = [] | ||
| past_first_sub = False |
There was a problem hiding this comment.
| past_first_sub = False | |
| past_first_sub: bool = False |
| detail = s.get("term_details", {}).get(name, {}) | ||
| if detail.get("kind") != "Aggregation": | ||
| continue | ||
| fn = detail.get("function", "AGG").lower() |
| # ------------------------------------------------------------------ # | ||
| topk_step = next((s for s in steps if s["type"] == "TopK"), None) | ||
| order_step = next((s for s in steps if s["type"] == "OrderBy"), None) | ||
| sort_step = topk_step or order_step |
There was a problem hiding this comment.
for topk_step, order_step, sort_step, collation, by_str, suffix & summary
| PyDoughUnqualifiedException, | ||
| ) | ||
|
|
||
| msg = str(e) |
There was a problem hiding this comment.
| msg = str(e) | |
| msg: str = str(e) |
| if "Did you mean" in msg: | ||
| details: dict = {} | ||
| # Extract the wrong term: "Unrecognized term of ...: 'TERM'." | ||
| term_match = _re.search(r":\s*'([^']+)'", msg) |
There was a problem hiding this comment.
| term_match = _re.search(r":\s*'([^']+)'", msg) | |
| term_match: Match[str] | None = _re.search(r":\s*'([^']+)'", msg) |
| if term_match: | ||
| details["term"] = term_match.group(1) | ||
| # Extract suggestions: "Did you mean: a, b, c?" | ||
| sugg_match = _re.search(r"Did you mean:\s*([^?]+)\?", msg) |
There was a problem hiding this comment.
| sugg_match = _re.search(r"Did you mean:\s*([^?]+)\?", msg) | |
| sugg_match: Match[str] | None = _re.search(r"Did you mean:\s*([^?]+)\?", msg) |
| """ | ||
| if isinstance(e, str): | ||
| message = e | ||
| details: dict[str, object] |
There was a problem hiding this comment.
What would be the value of details in this case?
There was a problem hiding this comment.
This entire logic is quite thorough and interesting! I didn't go through every single detail of some of the middle functions, but I have left some comments on places where I think we can potentially iterate a bit further at the macro level.
My biggest wish after reading everything is something that I think would be tricky to conceptually figure out, but if you can do it would be amazing for extensibility in future: could we find a way to move some aspects of this, particularly stuff that is extremely specific to each type of QDAG node, into the QDAG APIs? Those classes use a LOT of ABC logic, with extensive class hierarchies, so perhaps there is a way to make this work by folding in different methods/templates to some of the abstract base classes, then having the explain_llm logic case on the object ancestry (e.g. calculate vs where vs topk vs orderby vs singular all inherit from AugmentingChildOperator, so we could have explain_llm case on whether it is an instance of that in order to resolve a lot of common logic, and a lot of other things inherit from ChildAccess).
If you look into this and it seems horrifically impractical, we can disregard for now, but if it is even somewhat viable I would encourage doing it. After all, we'll need to extend this all over again for EXPLODE, and I'd prefer if it was literally impossible to miss adding any implementations because if we did, the ABC would fail due to un-implemented methods.
Some areas that I think are particularly ripe for moving into the ABCs:
describe_expression- most/all of the
_build_xxx_stepcan just be made into a single abstract method that the classes implement - Possibly
_render_step_body?
Besides that, I think the overhaul to the testing approach is probably what I would consider the most. I think actually being able to see the output will help us tell if we are missing anything serious, or any glaring bugs jump out.
| ``kind`` tags, and explicit scoping notes so a model can self-correct without | ||
| parsing prose. | ||
|
|
||
| Output schema (success):: |
There was a problem hiding this comment.
Should there be two colons here? I don't know the intended format.
| } | ||
| } | ||
|
|
||
| Output schema (error):: |
| @pytest.fixture | ||
| def tpch_session(get_sample_graph: graph_fetcher) -> PyDoughSession: | ||
| """A PyDoughSession loaded with the TPCH graph (no DB connection needed).""" | ||
| graph: GraphMetadata = get_sample_graph("TPCH") | ||
| session = PyDoughSession() | ||
| session.metadata = graph | ||
| return session | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def tpch_graph(get_sample_graph: graph_fetcher) -> GraphMetadata: | ||
| return get_sample_graph("TPCH") |
There was a problem hiding this comment.
These can be in conftest, and be session-level
| def impl(): | ||
| return nations.CALCULATE(key, name) |
There was a problem hiding this comment.
To make these tests easier to create/run, and potentially even parameterize, you could perhaps turn these into strings and have _run use pydough.from_string.
| def _run( | ||
| impl: Callable[[], UnqualifiedNode], | ||
| graph: GraphMetadata, | ||
| session: PyDoughSession, | ||
| ) -> dict: | ||
| """Qualify ``impl`` under ``graph`` and call ``explain_llm``.""" | ||
| node: UnqualifiedNode = pydough.init_pydough_context(graph)(impl)() | ||
| return cast(dict, pydough.explain_llm(node, session=session)) | ||
|
|
||
|
|
||
| def _step(result: dict, order: int) -> dict: | ||
| """Return the step with the given 1-based order.""" | ||
| return next(s for s in result["steps"] if s["order"] == order) |
There was a problem hiding this comment.
Let's name these functions a bit more descriptively, also add arguments/returns to the docstrings.
| f"Expected a collection, but received an expression: " | ||
| f"{qualified.to_string()}. Did you mean to use explain_term?" | ||
| ) | ||
| result = _error_payload(msg) |
There was a problem hiding this comment.
^ Let's add a pre-declaration near the top: result: dict
| steps = _collect_steps(qualified) | ||
| schema = _build_schema(qualified) |
| # Detect window functions (RANKING, PERCENTILE, etc.) in the | ||
| # condition. Their `per=` partition argument is resolved to SQL | ||
| # PARTITION BY during compilation and is NOT stored on the | ||
| # WindowCall QDAG node, so it cannot be shown in the condition | ||
| # text above. Alert the judge so it doesn't mis-read a | ||
| # per-partition rank as a global rank. |
There was a problem hiding this comment.
Why only for WHERE? Technically, calculate/orderby/topk can also contain window functions inside their expression arguments.
| Inside a ``CALCULATE``, aggregation arguments are represented as | ||
| ``ChildReferenceCollection`` nodes that point to the parent's child list |
There was a problem hiding this comment.
It's not just aggregations: it can also be singular sub-collections that are referenced to pull data into the current context (e.g. nations.CALCULATE(nation_name=name, region_name=region.name)
| Clause order: | ||
| 1. Subject — ``TableCollection`` / ``Cross`` / ``UserGeneratedCollection`` | ||
| 2. Filter — all ``Where`` step conditions joined with ``" and "`` | ||
| 3. Partition — ``PartitionBy`` keys | ||
| 4. Compute — final ``Calculate`` step (refs + aggregations) | ||
| 5. Limit/Order — ``TopK`` or ``OrderBy`` |
There was a problem hiding this comment.
How does this order interact with far more complex queries that have multiple layers of partitioning / stepping back into the children? I'm struggling to visualize this (may help to do so with my testing suggestions).
Linked ticket
Closes #512
Type of change
What changed and why?
Adds
pydough.explain_llm(), a new exploration API that returns a structured description of a PyDough collection expression designed for LLM consumption.Unlike
explain()(human-readable prose for interactive debugging),explain_llmreturns a stable, machine-parseable payload, either a JSON dict (format="json") or markdown (format="md"), so LLM judges can validate and self-correct generated PyDough code.Key design decisions:
available_termslives under adebugsub-key: scope information is preserved for human debugging but kept out of the main payload so judge prompts aren't distracted by fields irrelevant to correctness.Implicit scoping is made explicit : PyDough's relationship-navigation scoping (e.g.
COUNT(orders)insidecustomers.CALCULATE(...)) is correct but looks like a missing filter to a naive judge. The structuredimplicit_scope_notefield and per-stepnotessurface this so evaluators can distinguish "missing filter" from "implicitly scoped via relationship navigation."Structured error taxonomy: 13 error categories (
unrecognized_term,plural_in_calculate,bad_window_per, etc.) witherror_type,details, andhintfields, so the LLM gets actionable guidance without parsing the raw exception message.conditionsas structured dicts — Where step conditions exposeoperator,left, andrightas parsed sub-dicts (not just strings), so a judge can inspect operands directly.Output shape (success):
query_summary(one deterministic sentence)steps(ordered operations)schema(source collection, output columns + types, ordering, limit).On error: always
{"error": true, "message": ..., "error_type": ..., "details": ..., "hint": ..., "steps": [], "schema": null}.Implementation:
_common.py:describe_expression,describe_subcollection_arg,generate_step_notes,generate_query_summary,_cond_texts,_collation_entrypydough/exploration/explain_llm.py: qualification, step-walking, schema building, error classification, markdown render.How I tested this?
tests/test_explain_llm.pyNotes for reviewers