feat(extraction): add Astro language support #585

Open: Detective-XH wants to merge 1 commit into colbymchenry:main from Detective-XH:feat/astro-language-support

Summary

  • Adds .astro file extraction via tree-sitter-astro (virchau13/tree-sitter-astro, ABI 14, wasm built with emscripten 5.0.7).
  • The grammar's frontmatter_js_block is raw text rather than a parsed TypeScript AST, so the extractor re-parses it through the TypeScript tree-sitter extractor and maps line-number offsets back to .astro coordinates — the same strategy as SvelteExtractor.
  • Template scanning emits references edges for PascalCase component tags (<MyComponent />) and calls edges for {expr(...)} invocations.
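The two steps above (re-parsing the frontmatter and scanning the template) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `splitFrontmatter`, `remapToAstro`, `scanTemplate`, `ExtractedNode`, and `TemplateEdge` are hypothetical, the frontmatter is located with plain string handling instead of the tree-sitter-astro `frontmatter_js_block` node, and the template is matched with regular expressions that would also fire inside strings or comments.

```typescript
// Hypothetical shapes; the real extractor's types are not shown in this PR.
interface ExtractedNode {
  name: string;
  startLine: number; // 1-based, relative to the text that was parsed
  endLine: number;
}

interface TemplateEdge {
  kind: "references" | "calls";
  target: string;
  line: number; // 1-based line in the .astro file
}

interface SplitResult {
  body: string;              // frontmatter text between the --- fences
  bodyStartLine: number;     // .astro line of the first frontmatter line
  template: string;          // everything after the closing fence
  templateStartLine: number; // .astro line of the first template line
}

// Split an .astro source into frontmatter body and template (assumption:
// the file starts with a `---` fence and the fence lines contain only `---`).
function splitFrontmatter(source: string): SplitResult | null {
  const lines = source.split("\n");
  if (lines.length === 0 || lines[0].trim() !== "---") return null;
  const close = lines.findIndex((l, i) => i > 0 && l.trim() === "---");
  if (close === -1) return null;
  return {
    body: lines.slice(1, close).join("\n"),
    bodyStartLine: 2,
    template: lines.slice(close + 1).join("\n"),
    templateStartLine: close + 2,
  };
}

// Shift nodes produced by parsing the frontmatter on its own back to
// .astro coordinates: frontmatter line 1 corresponds to bodyStartLine.
function remapToAstro(nodes: ExtractedNode[], bodyStartLine: number): ExtractedNode[] {
  const offset = bodyStartLine - 1;
  return nodes.map((n) => ({
    ...n,
    startLine: n.startLine + offset,
    endLine: n.endLine + offset,
  }));
}

// Convert a character index in `text` to a line number, given the line of index 0.
function lineAt(text: string, index: number, firstLine: number): number {
  let line = firstLine;
  for (let i = 0; i < index; i++) {
    if (text.charCodeAt(i) === 10) line++;
  }
  return line;
}

// Emit `references` edges for PascalCase opening tags and `calls` edges
// for `{name(` expressions. Tag edges are collected first, then call edges.
function scanTemplate(template: string, firstLine: number): TemplateEdge[] {
  const edges: TemplateEdge[] = [];
  const tagRe = /<([A-Z][A-Za-z0-9]*)\b/g;
  const callRe = /\{\s*([A-Za-z_$][\w$.]*)\s*\(/g;
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(template)) !== null) {
    edges.push({ kind: "references", target: m[1], line: lineAt(template, m.index, firstLine) });
  }
  while ((m = callRe.exec(template)) !== null) {
    edges.push({ kind: "calls", target: m[1], line: lineAt(template, m.index, firstLine) });
  }
  return edges;
}

// Usage on a small sample file.
const sample = [
  "---",
  "import Card from './Card.astro';",
  "const title = 'Hi';",
  "function fmt(s: string) { return s; }",
  "---",
  "<Card title={title} />",
  "<p>{fmt(title)}</p>",
].join("\n");

const fm = splitFrontmatter(sample);
// Pretend the TypeScript extractor reported `fmt` on frontmatter line 3.
const remapped = fm ? remapToAstro([{ name: "fmt", startLine: 3, endLine: 3 }], fm.bodyStartLine) : [];
const edges = fm ? scanTemplate(fm.template, fm.templateStartLine) : [];
```

In this sample, `fmt` is remapped from frontmatter line 3 to .astro line 4, and the template yields a references edge to `Card` on line 6 and a calls edge to `fmt` on line 7. Lowercase tags such as `<p>` and plain interpolations such as `{title}` produce no edges.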

Validation

Extraction — verify-extraction PASS on all three repos:

| Repo | Size | Files | Nodes | Edges |
| --- | --- | --- | --- | --- |
| satnaing/astro-paper | small | 60 | 502 | 1,228 |
| onwidget/astrowind | medium | 93 | 747 | 2,030 |
| withastro/docs | large | 180 | 965 | 1,968 |

Unit tests: 7/7 passing (describe('Astro Extraction') in __tests__/extraction.test.ts)

A/B benchmark (headless, Opus, n=1 per arm):

| Repo | Duration (with / without) | Read (with / without) | Tool calls (with / without) |
| --- | --- | --- | --- |
| astro-paper (small) | 53s / 106s (2× faster) | 3 / 6 | 8 / 11 |
| astrowind (medium) | 32s / 23s (trivial question†) | 1 / 1 | 4 / 2 |
| astro-docs (large) | 67s / 125s (46% faster) | 6 / 22 (−73%) | 11 / 46 (−76%) |

† The astrowind question ("how does Hero receive its props") resolves to a single Glob + Read in both arms — not representative of codegraph's value.

Known gaps (follow-up)

  • Astro Islands (client:load etc.): the extractor emits references edges only and does not traverse the child framework's component graph.
  • Content Collections: getCollection() type-level associations cannot be statically tracked.
  • Dynamic routing: [slug].astro does not produce a route node (this could follow the SvelteKit resolver pattern).
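As a rough illustration of what a follow-up for the dynamic routing gap might compute, the sketch below derives a route pattern from a page path under `src/pages`. It is not part of this PR. The helper name `routeFromPagePath` and the `:param` / `*rest` output notation are assumptions made for the example, and the SvelteKit resolver it would mirror is not shown here.

```typescript
// Hypothetical helper: map an Astro page file path to a route pattern.
// Assumes pages live under src/pages and use the .astro extension.
function routeFromPagePath(filePath: string): string | null {
  const m = /^src\/pages\/(.+)\.astro$/.exec(filePath);
  if (!m) return null; // not a page file

  const segments = m[1].split("/");
  // A trailing `index` maps to its parent directory.
  if (segments[segments.length - 1] === "index") segments.pop();

  const mapped = segments.map((seg) =>
    seg
      .replace(/\[\.\.\.([^\]]+)\]/g, "*$1") // [...path] -> *path (rest parameter)
      .replace(/\[([^\]]+)\]/g, ":$1")       // [slug]    -> :slug (named parameter)
  );
  return "/" + mapped.join("/");
}

const blogRoute = routeFromPagePath("src/pages/blog/[slug].astro");
const homeRoute = routeFromPagePath("src/pages/index.astro");
const docsRoute = routeFromPagePath("src/pages/docs/[...path].astro");
const notAPage = routeFromPagePath("src/components/Hero.astro");
```

Here `blogRoute` is `/blog/:slug`, `homeRoute` is `/`, `docsRoute` is `/docs/*path`, and `notAPage` is `null`. A real resolver would also need to handle endpoint files and any project-specific routing configuration, which this sketch ignores.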

Adds .astro file extraction via tree-sitter-astro (virchau13/tree-sitter-astro,
ABI 14). The grammar's frontmatter_js_block is raw text rather than a parsed
TypeScript AST, so the extractor follows the same re-parse strategy as
SvelteExtractor: the frontmatter content is passed through the TypeScript
tree-sitter extractor with line-number offsets mapped back to .astro coordinates.
Template scanning emits references edges for PascalCase component tags and calls
edges for {expr(...)} invocations.

Validated on three real repos:
- satnaing/astro-paper (60 files): 502 nodes, 1,228 edges — verify-extraction PASS
- onwidget/astrowind (93 files): 747 nodes, 2,030 edges — verify-extraction PASS
- withastro/docs (180 files): 965 nodes, 1,968 edges — verify-extraction PASS

A/B benchmark (headless, Opus, n=1 per arm):
- astro-paper: 53s vs 106s (2× faster), Read 3 vs 6
- astro-docs:  67s vs 125s (46% faster), tool calls 11 vs 46, Read 6 vs 22