refactor: rewrite & modernization - @polygonlabs/pos-sdk@1.0.0 #476
Draft
MaximusHaximus wants to merge 3 commits into
Draft
refactor: rewrite & modernization - @polygonlabs/pos-sdk@1.0.0 #476MaximusHaximus wants to merge 3 commits into
MaximusHaximus wants to merge 3 commits into
Conversation
| # is not blown by checkpoint waits. | ||
| POS_SDK_TEST_E2E_ENABLED: 'true' | ||
| steps: | ||
| - uses: 0xPolygon/pipelines/.github/actions/ci@main |
8a11db7 to
11801e5
Compare
Foundation for the @polygonlabs/pos-sdk 1.0 monorepo. None of these
touch package source — they wire up the workspace, the CI gates, and
the planning record so the rewrite can land in a clean per-package
commit on top of this one.
Workspace tooling
- tsconfig.json: project reference for packages/pos-sdk/tsconfig.build.json
- pnpm-workspace.yaml: adds publicHoistPattern @tsconfig/* so tsup's
load-tsconfig (a transitive dep that runs from
node_modules/.pnpm/...) can follow extends: "@tsconfig/node20" in
each package's strict tsconfig
- package.json (root): adds @tsconfig/node20 to devDependencies for
the same reason
- pnpm-lock.yaml: regenerated; carries the SDK's added runtime deps
(@polygonlabs/verror, p-limit, ethereum-cryptography, @ethereumjs/*)
plus the test-app's vite / playwright tooling. Lockfile lives at
workspace root; its content is consumed by all package commits in
the series.
- eslint.config.js: drops stale ignore patterns for
packages/maticjs/{build_helper,license.js,test} (those targets no
longer exist after the rename) plus the file-specific overrides
for src/utils/http_request.ts and src/index.ts that no longer
apply
CI workflows
- .github/workflows/ci-trigger.yml: forwards integration-test secrets
POS_SDK_TEST_{PARENT_RPC,CHILD_RPC,PRIVATE_KEY} via job.env so the
shared 0xPolygon/pipelines composite picks them up; updates the
build job's filter to @polygonlabs/pos-sdk and drops Node 18.x
from the matrix (the SDK declares engines.node >=20)
- .github/workflows/ci-nightly.yml (new): daily 03:00 UTC schedule
+ workflow_dispatch, sets POS_SDK_TEST_E2E_ENABLED=true so the
4h deposit-withdraw cycle test runs once a day without crowding
PR-driven CI
Plan / decision record
- plans/pos-sdk-1.0-rewrite.md: the 9-stage agent-executable plan
the rewrite was driven from. Kept on the branch as durable
reference so the design rationale isn't lost when the PR closes.
- PLAN.md, PLAN_BACKUP.md: historical strategic-plan documents
retained alongside the executable plan
Cleanup
- .changeset/old-dolls-prove.md: removes a stale empty changeset
left over from a prior interactive `pnpm exec changeset add`
session that was abandoned without a body. The pos-sdk changeset
lands with its package commit.
Renames @maticnetwork/maticjs to @polygonlabs/pos-sdk and replaces
every load-bearing piece of the legacy SDK's design with one that
holds up under multi-tenant, type-strict, observability-aware,
browser-bundleable use.
Why this is a clean break
The 0.x SDK accumulated a decade of abstraction debt that produced
real failure modes:
- The plugin model mutated module-level globals (utils.Web3Client = X),
making multi-tenant use unsafe — two POSClient instances against
different networks cross-contaminated each other in the same
process.
- The lazy ITransactionWriteResult shape conflated submitted and
confirmed: getTransactionHash() *initiated* a send rather than
observing one, and consumers consistently mis-used it. A caller
assuming idempotency caused a production outage by double-
broadcasting.
- The BaseToken → POSToken → ERC20 inheritance chain forced
non-token contracts (RootChainManager, GasSwapper) to extend a
class named BaseToken, blocking otherwise-trivial wrapper additions.
- ABIs were loaded at runtime from a single CDN — a single point of
failure for every running consumer.
- BaseBigNumber predated native bigint by half a decade; consumers
converted to it just so the SDK could convert back internally.
- 27 `: any` and 50 `as` casts on the public surface (5 of them
`as any` on the entire client config) gave consumer code no
compile-time signal of misuse.
What ships in 1.0
- POSClient.init({ network, parent, child }) is the only construction
path. ParentClientConfig is a tagged union — consumers pass their
existing viem / ethers-v5 / ethers-v6 client and tell the SDK what
kind it is. No global state, no plugin to register.
- Three internal Adapter implementations (viem, ethers v5, ethers v6)
abstract every read / write / estimate / receipt / keccak primitive
the bridge code needs. Adapters take type-only imports of their
parent client library so a static module load never crashes for
consumers using a different library; runtime value-imports for
ethers happen via dynamic `import('ethers')`. The Adapter is
deliberately not exported — it's an internal abstraction.
- TxResult = { hash, confirmed() }. Hash is populated synchronously
the moment the parent-chain RPC accepts the broadcast; confirmed()
waits for the receipt and is idempotent (the underlying wait is
memoised).
- Unsigned-tx surface via prepareXxx. Every public write has a
sibling prepareXxx (prepareApprove, prepareDeposit,
prepareCompleteWithdraw, prepareDepositEther, …) returning
{ to, data, value? } instead of broadcasting. Closes the smart-
wallet / batched-multicall / offline-sign gap that the legacy
option.returnTransaction flag served clumsily. Two distinct,
statically-typed methods replace the conditional-shape foot-gun.
- Bridge helpers exposed flat on POSClient: pos.buildExitPayload,
pos.buildExitPayloadOnIndex, pos.isCheckpointed, pos.isWithdrawn,
pos.isWithdrawnOnIndex, pos.getBlockProof, pos.getPredicateAddress.
The legacy SDK's pos.client.exitUtil flat-access was used by
consumers building exit payloads outside the standard token flows
(sync block events, custom bridge events, plasma exits); the new
surface mirrors that ergonomics without re-exposing internals.
- BaseToken hierarchy dismantled in favour of two composed services:
ContractCaller (transaction plumbing — adapter wiring, address
resolution, gas estimation, EIP-1559 guards) and POSBridgeHelpers
(predicates, exit-payload construction, isWithdrawn checks).
- Native `bigint` everywhere. Every public method on the token
classes accepts and returns `bigint`; conversion to BigNumber for
ethers v5 happens once at the v5 adapter boundary (.toBigInt()
inputs, BigNumber.from outputs).
- POSBridgeError extends VError (a TypeScript-first, browser-friendly
port of Joyent's canonical Node verror library) with a closed
28-code POSBridgeErrorCode union. Consumers get the standard
cause-chain helpers — findCauseByName / findCauseByType / info /
fullStack — matching Joyent's documented API. `code` and `info`
are own enumerable properties so any logger that walks them (pino,
winston, custom, @polygonlabs/logger) surfaces the discriminator
and structured payload directly. Codes match the legacy
ErrorHelper.throw(code, ...) keys verbatim so any consumer
dashboards or alert queries keyed off the old strings continue to
match. Sets this.name = 'POSBridgeError' for Sentry/APM
aggregation.
- ABIs vendored as `as const` TypeScript files in src/abi/, giving
viem-typed inference at every internal call site. The CDN
dependency for ABIs is gone.
- Contract addresses still fetched dynamically (the predicate /
RootChainManager addresses change occasionally and we don't want
to ship a new SDK each time), but cached in-process with stale-
while-revalidate semantics on a 1h TTL — long-running services
pick up redeployments without restart, individual calls serve the
cached value at near-zero cost. config.addresses override available
for staging / air-gapped, config.addressTTLMs tunes the freshness
vs. CDN-traffic trade-off.
- Structural Logger interface (pino-shaped). Consumers plug in their
own logger — pino, bunyan, @polygonlabs/logger, custom — without
the SDK taking a runtime dep. Default is noopLogger.
- sanitiseError strips RPC tokens (?token=... / &token=...) from
errors before they reach consumer logs, walking the cause chain
and preserving subclass prototypes.
- ETH deposits hoisted to top-level POSClient.depositEther /
depositEtherWithGas — they don't fit on parent.erc20(addr) because
ETH has no token contract. depositEtherWithGas is mainnet-only and
throws ONLY_ALLOWED_ON_MAINNET on Amoy.
- parent / child namespaces eliminate the easily-inverted isParent:
boolean parameter (`pos.parent.erc20(addr)` not
`pos.erc20(addr, true)`).
- Method renames: withdrawStart → startWithdraw, withdrawExit →
completeWithdraw, withdrawExitFaster → completeWithdrawFast,
etheriumSha3 → soliditySha3.
- p-limit replaces the legacy mapPromise — caps in-flight RPC fan-out
during proof construction, propagates first failure (not the legacy
swallow-into-result-wrapper).
- Tests: full pyramid. Unit (errors / sanitise / address-service /
abi-types / concurrency / prepare-tx, always run), integration on
Amoy + Sepolia with describe.skipIf(!HAS_CREDS) so the suite passes
locally and runs for real in CI when secrets are configured.
Adapter parity tests (same it() names across viem / ethers v5 /
ethers v6) catch translation bugs at PR time. End-to-end
deposit→checkpoint→withdraw cycle gated on POS_SDK_TEST_E2E_ENABLED
for nightly CI.
- README + MIGRATION + examples (viem.ts / ethers-v5.ts /
ethers-v6.ts) all rewritten for the new surface. MIGRATION
includes a comprehensive replacement table for every removed API
(signTypedData, etheriumSha3, encode, sendRPCRequest, …) with
per-library replacement code (viem / ethers v5 / ethers v6).
Legacy examples/{pos,utils_pos,config,package,.env}* and the
manual/ debug scratch directory are deleted — every API they
invoked is gone in 1.0.
Stats
Final source tree under packages/pos-sdk/: ~3,900 lines (down from
the legacy ~5,674). 11 dead-code utility files deleted. The
abstracts/, implementation/, and enums/ directories are gone
entirely. webpack 4 → tsup. Build emits ESM + CJS + DTS. The full
src/ now compiles under the strict base tsconfig (target es2023,
module nodenext, strict, erasableSyntaxOnly, verbatimModuleSyntax)
with zero `: any`, zero `as any`, and zero `console.*` calls.
Scope: PoS only
The legacy @maticnetwork/maticjs package shipped both the PoS bridge
and the zkEVM bridge surfaces. This rewrite covers only the PoS
side. The zkEVM chain is on a wind-down schedule; consumers using
ZkEvmClient stay on @maticnetwork/maticjs until the chain is shut
down, then drop both together. PoS-only and mixed consumers migrate
per packages/pos-sdk/MIGRATION.md.
…right
New private workspace package `@polygonlabs/pos-sdk-test-app` bundles
the SDK through Vite and exercises its public surface in a real browser
via Playwright. Catches any case where the SDK accidentally relies on
Node built-ins, polyfills, or non-browser-safe APIs that unit tests
under Node would miss.
Why this lives in its own package
The pos-sdk unit + integration suites run under Vitest (which is
Node), so a Buffer.from() call inside a transitive dep like
@ethereumjs/util doesn't surface as a failure there — it only breaks
when a consumer bundles for the browser. Catching that gap requires
actually loading the built dist/ output in a browser. Vitest has a
browser mode but it doesn't run integration tests against a real
chain, so the cleanest split is: the SDK package keeps its existing
Node test pyramid; this package owns the "does it bundle and run in a
browser" gate.
What the smoke test exercises
- POSClient.init({ network, parent, child, addresses }) — addresses
override skips the CDN fetch so no real RPC is needed
- pos.parent.erc20(addr).prepareApprove(amount) — the dynamic-import
encodeFunctionData path through the viem adapter
- new POSBridgeError(code, msg, info, { cause }) — VError construction
- sanitiseError(err) — RPC token redaction + cause-chain walk
- noopLogger calls — no-op logger isn't smuggling Node APIs
- ethereum-cryptography/keccak — the noble-hashes path the adapters use
Implementation notes
- Vite 6 (not 5) — workspace's pnpm trustPolicy: no-downgrade rejected
vite@5.4.21's provenance attestation. Vite 6 installs cleanly and
the API surface used here is unchanged.
- src/node-stubs/{events,buffer}.ts — throw-on-call stubs aliased via
vite.config.ts's resolve.alias. Without these, the SDK bundle fails
vite build because @ethereumjs/util / readable-stream named-import
from Node's events / buffer. Stubs preserve the runtime detection
signal — they explicitly are NOT polyfills. Every method call
throws, so any path that actually hits Buffer.X at runtime would
fail loudly in the browser console and the spec.
- The Playwright spec auto-skips with a clear remediation message
(`pnpm exec playwright install --with-deps chromium`) when the
chromium binary is absent — `pnpm -r run test` stays green on
machines without Playwright browsers installed.
Findings the smoke test surfaced
pos-sdk's dist transitively imports Node's events (via
@ethereumjs/util's asyncEventEmitter) and buffer (via readable-stream
/ safe-buffer). 60 Buffer references are inside @ethereumjs/util's
BufferUtil static class — behind method-call gates so the happy-path
public surface (POSClient.init with addresses override + prepareApprove
with spenderAddress) does NOT currently trip them. The browser test
passes, but reducing Node-built-in dependence further is a real
long-term hazard worth tracking.
Verification
- pnpm exec tsc --noEmit (in packages/test-app): exit 0
- pnpm exec vite build: exit 0; dist/ produced (~1.3 MB unminified
main bundle, ~270 KB gzipped)
- pnpm exec playwright test: 1 passed in chromium
- pnpm exec eslint packages/test-app/: 0 errors, 0 warnings
e410c24 to
a4870f3
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
@maticnetwork/maticjs3.9.x has accumulated a decade of abstraction debt that actively obstructs both consumers and maintainers. These are concrete failure modes, not cosmetic concerns:utils.Web3Client = X). Constructing twoPOSClientinstances against different networks in the same process — a routine pattern in indexers and multi-tenant API services — cross-contaminates them. We have hit this in production.ITransactionWriteResultshape conflates "submitted" and "confirmed".getTransactionHash()initiates a send rather than observing one, and consumers consistently mis-use it. A caller assuminggetTransactionHash()was idempotent caused a production outage by double-broadcasting a transaction.BaseToken → POSToken → ERC20inheritance chain forces every contract wrapper — includingRootChainManagerandGasSwapper, which are not tokens — to extend a class namedBaseToken. Adding a new contract wrapper requires inheriting bridge-specific predicate logic that does not apply to it. This has blocked otherwise-trivial feature work multiple times.static.polygon.technologymoves, every running consumer breaks. This is a single point of failure for the entire SDK.BaseBigNumberpredates nativebigintby half a decade and now adds boilerplate without value. Consumers convert from their library's native type intoBaseBigNumberonly to have the SDK convert it back internally.: anytypes and 50ascasts on the public surface, including 5as anycasts that erase the entire client config type. Consumer code has no compile-time signal that it is misusing the SDK.Scope: PoS only — zkEVM stays on the legacy package
@maticnetwork/maticjsshipped both the PoS bridge and the zkEVM bridge surfaces. This rewrite covers only the PoS side. The zkEVM chain is on a wind-down schedule; rewriting and shipping a@polygonlabs/zkevm-sdkwould put consumers through a migration twice — once now, once when the package is sunset alongside the chain. Skipping the intermediate step is cleaner:@polygonlabs/pos-sdkperMIGRATION.md.@maticnetwork/maticjsand drop the package when the chain is shut down.What we got
Correctness and determinism
utils.Web3Client = Xglobal mutation. MultiplePOSClientinstances against different networks coexist in one process without cross-contamination.TxResult = { hash, confirmed() }is observe-only — no method call has hidden side effects. The hash is populated synchronously the moment the RPC accepts the broadcast;confirmed()waits for the receipt and is idempotent.as constTypeScript files, so the SDK's compiled call sites and the bundled ABIs are guaranteed to match. The CDN dependency for ABIs is gone.RootChainManagerand predicate addresses change occasionally and we do not want to ship a new SDK each time), but theAddressFetcheruses stale-while-revalidate semantics on a 1h TTL — every individual call serves the cached value at near-zero cost, and the next call after the TTL window kicks off a background refresh.pos.parent.erc20(addr)andpos.child.erc20(addr)namespaces eliminate the easily-invertedisParent: booleanparameter.versionconfig field that let consumers pin to stale contract metadata is gone — there is one canonical ABI per release.Type safety
: anytypes eliminated. The four load-bearing ones (provider: any,abi: any) are now typed via theAdapterinterface andas constABIs respectively.ascasts pruned. The 5as anycasts on the legacyWeb3SideChainClientconfig disappeared entirely with the class.TYPE_AMOUNT(legacyBaseBigNumber | string | numberunion) replaced with nativebiginton 27+ method signatures.as constABIs give viem-typed inference at every internal call site — method names, argument shapes, and return types are compile-time-checked.POSBridgeError's 28-code discriminator union enables exhaustiveswitchnarrowing in consumer error handlers — TypeScript fails the consumer's build the moment a new failure mode is added without a correspondingcase.Capability — keeping the tools consumers actually rely on
A migration audit across
repositories/(portal, proof-generation-api, staking-ui, bridge-indexer, etc.) surfaced two real consumer use cases the initial rewrite cut too aggressively. Both are addressed in this PR:prepareXxxsibling (prepareApprove,prepareDeposit,prepareCompleteWithdraw, …) that returns{ to, data, value? }instead of broadcasting. Closes the gap for Safe / Sequence / account-abstraction bundlers, batched multicall flows, pre-flight inspection, and offline signing. Replaces the legacyoption.returnTransactionfoot-gun (same return type meaning two different things depending on a boolean) with two distinct, statically-typed methods.POSClient.proof-generation-api(and any other consumer building exit payloads outside the standard token flows — sync block events, custom bridge events, plasma exits) usedpos.client.exitUtil.{buildPayloadForExit, isCheckPointed, getBlockProof, …}directly. The 1.0 SDK exposes those aspos.buildExitPayload,pos.isCheckpointed,pos.getBlockProof,pos.isWithdrawn,pos.isWithdrawnOnIndex,pos.getPredicateAddress,pos.buildExitPayloadOnIndex. The token classes still wrap them for the 95% case.Errors extend
VErrorPOSBridgeErrorextendsVErrorfrom@polygonlabs/verror— a TypeScript-first, browser-friendly port of Joyent's canonical Nodeverrorlibrary. Consumers get the standard cause-chain helpers (findCauseByName,findCauseByType,info(err),fullStack(err)) matching Joyent's documented API, plus a typedcodediscriminator on top.infois an own enumerable property, so any logger that serializes own properties on Error instances picks up the structured payload. Zero runtime dependencies and ESM-only — bundles cleanly for both Node and browser.Performance and resource use
index.jsonfetch for addresses.Web3SideChainClientsingleton removed; per-instance state reduces memory pressure in multi-tenant deployments.${baseUrl}/${network}, not just network — survivesPOSClientinstance churn and stays multi-tenant-safe.p-limit(battle-tested 10 KB dep) replaces ~80 lines of custommapPromisewith cleaner concurrency semantics and proper first-failure rejection.Observability
POSBridgeErrorexposes a stable discriminatorcodeconsumers can route on (error.code === 'BURN_TX_NOT_CHECKPOINTED'). The 28 codes match the legacyErrorHelper.throw(code, ...)keys verbatim, so existing consumer dashboards and alert queries continue to fire after the rewrite.?token=.../&token=...) applied at the SDK boundary before errors reach consumer loggers — protects services using non-sanitising loggers from leaking RPC credentials.Loggerinterface (pino-shaped) lets consumers plug in their own structured logger (pino,bunyan, custom) without the SDK taking a runtime dep on any specific implementation. The default isnoopLogger— the SDK no longer writes to stdout.Browser compatibility
A new
@polygonlabs/pos-sdk-test-appworkspace package (packages/test-app/) bundles the SDK through Vite and exercises the public surface in a real browser via Playwright. Catches any case where the SDK accidentally relies on Node built-ins, polyfills, or non-browser-safe APIs that Vitest's Node-mode tests would miss. The Playwright spec runs in chromium and asserts the bundle loads cleanly, every public symbol is callable, and noconsole.erroroccurs.pnpm -r run testauto-skips the spec gracefully when Playwright browsers aren't installed.The smoke test surfaced a real long-term hazard worth tracking: pos-sdk's dist transitively imports Node's
events(via@ethereumjs/util'sasyncEventEmitter) andbuffer(viareadable-stream/safe-buffer) — 60Bufferreferences indist/, behind method-call gates so the happy path doesn't trip them.Testability
ContractCallerandPOSBridgeHelpersare independently mockable services because they were extracted as composition, not inheritance. Unit testing the bridge logic no longer requires instantiating the full client.Architectural cleanup
BaseToken → POSToken → ERC20chain dismantled in favour of two composed services:ContractCaller(transaction plumbing) andPOSBridgeHelpers(predicates, exit payloads). Adding a new contract wrapper now means writing a class that uses aContractCaller, not one that inherits 3 levels of bridge logic that does not apply.abstracts/,implementation/, andenums/directories are deleted entirely.utils/index.ts ↔ abstracts/index.ts ↔ subpathsbroken.target: es2023).Test plan
POS_SDK_TEST_PARENT_RPC,POS_SDK_TEST_CHILD_RPC,POS_SDK_TEST_PRIVATE_KEYso PR-trigger CI runs the full integration suite (without them, the integration tier skips cleanly viadescribe.skipIf(!HAS_CREDS)).github/workflows/ci-nightly.yml)changeset-release/masterPR appears after merge@polygonlabs/pos-sdkrelease, recover locally withpnpm exec changeset publishon the merge commitproof-generation-apito usepos.buildExitPayload,pos.isCheckpointed,pos.getBlockProofin place ofmaticClient.exitUtil.X(full mapping inMIGRATION.md)portalandstaking-uifor legacypos.client.parent.Xcalls (signTypedData,etheriumSha3,encode,sendRPCRequest) and migrate perMIGRATION.md's replacement tableZkEvmClientunderstand they keep@maticnetwork/maticjsinstalled alongside@polygonlabs/pos-sdk(or only) until the zkEVM chain is shut downtests/integration/exit-payload.test.ts(one per ERC-20/721/1155); recording procedure inpackages/pos-sdk/tests/README.mdtests/fixtures/networks.tsagainst currently-deployed mintable testnet tokens@maticnetwork/maticjson npm with a message pointing PoS users at@polygonlabs/pos-sdkand zkEVM users at the chain's wind-down notice@maticnetwork/maticjsperpackages/pos-sdk/MIGRATION.md