Releases: website-local/website-scrap-engine
0.9.0
BREAKING CHANGE — see Breaking Changes and Migration sections below.
Feature
- logger: make logger implementation configurable (#204). Replace hardcoded log4js with a pluggable `Logger` interface. Consumers provide a factory via `DownloadOptions.createLogger`. Default implementation writes to `console`. Built-in log4js adapter at `lib/logger/log4js-adapter.js` for backward compatibility.
- life-cycle: add statusChange listener hook (#102). New `statusChange` array on `ProcessingLifeCycle` allows consumers to observe resource progression through the pipeline. Default listener logs skipped/discarded resources and errors.
- life-cycle: add existingResource callback for local file handling (#150). Optional `existingResource` callback on `ProcessingLifeCycle` to decide what to do when a local file already exists (skip, overwrite, if-modified-since, skipSave).
- life-cycle: expose submit resource to init hook (#1131). `InitLifeCycleFunc` receives an optional `submit` callback to add URLs to the download queue during initialization.
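As a rough illustration of the `statusChange` hook described above, the sketch below models a life cycle that carries an array of listeners and notifies each one on a transition. Every type and name here (`Status`, `ResourceLike`, `StatusListener`, `LifeCycleLike`, `notifyStatusChange`) is a hypothetical stand-in invented for this example; the library's actual `StatusChangeFunc` signature and status values may differ.

```typescript
// Hypothetical stand-ins for illustration; not the library's real types.
type Status = 'queued' | 'downloaded' | 'skipped' | 'discarded' | 'error';

interface ResourceLike {
  url: string;
}

type StatusListener = (res: ResourceLike, from: Status, to: Status) => void;

interface LifeCycleLike {
  // Mirrors the idea of a required listener array; an empty array is valid.
  statusChange: StatusListener[];
}

// Call every registered listener, in order, for a single transition.
function notifyStatusChange(
  lifeCycle: LifeCycleLike,
  res: ResourceLike,
  from: Status,
  to: Status
): void {
  for (const listener of lifeCycle.statusChange) {
    listener(res, from, to);
  }
}

// A listener that records resources which were skipped or discarded.
const skipped: string[] = [];
const lifeCycle: LifeCycleLike = {
  statusChange: [
    (res, _from, to) => {
      if (to === 'skipped' || to === 'discarded') {
        skipped.push(res.url);
      }
    },
  ],
};

notifyStatusChange(lifeCycle, { url: 'https://example.com/a.css' }, 'queued', 'skipped');
notifyStatusChange(lifeCycle, { url: 'https://example.com/b.js' }, 'queued', 'downloaded');
```

Keeping the listeners in a plain array means consumers can append their own observers without replacing the default one.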
Fix
- download: enable warnForNonHtml by default, improve warning (#993). `warnForNonHtml` is now enabled by default. Warning message includes `res.type` for clarity.
Breaking Changes
- `DownloadOptions.configureLogger` replaced by `createLogger?: (options: StaticDownloadOptions) => Logger`. The default is `createDefaultLogger` (console-based).
- `log4js` moved from `dependencies` to `optionalDependencies`. Consumers who need file-based logging must `npm install log4js` and use the built-in adapter:

```js
import {createLog4jsLogger} from 'website-scrap-engine/lib/logger/log4js-adapter.js';

const options = {
  createLogger: (opts) => createLog4jsLogger(opts.localRoot, opts.logSubDir),
};
```

- Public `logger` namespace exports are typed as `CategoryLogger` instead of log4js `Logger`. Method signatures are compatible (`.trace()`, `.debug()`, `.info()`, `.warn()`, `.error()`, `.isTraceEnabled()`), but consumers using log4js-specific properties will need to update.
- Worker log message protocol: `WorkerLog.logger` (category string) replaced by `WorkerLog.logType` (`LogType` string). Affects custom worker implementations only.
- `ProcessingLifeCycle` gains a required `statusChange: StatusChangeFunc[]` field. Consumers building the life cycle from scratch must add `statusChange: []`.
- `warnForNonHtml` is now enabled by default (was opt-in).
New Exports
- `Logger` interface: the pluggable logger contract
- `LogType` type: discriminated union of log categories (`io.http.request`, `system.error`, etc.)
- `CategoryLogger` interface: the per-category proxy type (what `logger.error`, `logger.skip` etc. are)
- `createDefaultLogger()`: factory for the console-based default logger
- `logger.setLogger(instance)`: configure the logger instance at runtime
- `logger.getLogger()`: retrieve the current logger instance
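To make the relationship between a per-category proxy and a swappable logger instance more concrete, here is a minimal sketch of the general pattern. It does not use the library's real definitions: `SketchLogger`, `LogLevel`, `categoryProxy`, and the local `setLogger`/`getLogger` functions are assumptions written for this example only.

```typescript
// Hypothetical shapes for illustration only; not the library's definitions.
type LogLevel = 'trace' | 'debug' | 'info' | 'warn' | 'error';

interface SketchLogger {
  log(category: string, level: LogLevel, ...args: unknown[]): void;
}

// Console-based default, swapped out at runtime via setLogger().
let current: SketchLogger = {
  log: (category, level, ...args) => {
    console.log(`[${level}] ${category}`, ...args);
  },
};

function setLogger(instance: SketchLogger): void {
  current = instance;
}

function getLogger(): SketchLogger {
  return current;
}

// The proxy reads `current` on every call, so a later setLogger()
// takes effect without re-creating the proxy.
function categoryProxy(category: string) {
  return {
    info: (...args: unknown[]) => current.log(category, 'info', ...args),
    error: (...args: unknown[]) => current.log(category, 'error', ...args),
  };
}

const errorLog = categoryProxy('system.error');

// Swap in a capturing logger after the proxy already exists.
const captured: string[] = [];
setLogger({
  log: (category, level, ...args) => {
    captured.push(`${level} ${category} ${args.join(' ')}`);
  },
});

errorLog.error('boom');
```

Because the proxy holds no reference to a specific logger, code that imported it before the swap keeps working afterwards.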
Misc
- docs: rewrite README with usage examples, architecture details, and adapter helpers
- build(deps): bump picomatch, @typescript-eslint/eslint-plugin, @typescript-eslint/parser, handlebars, ts-jest
0.8.7
0.8.6
Warning
Do not use this release; use 0.8.7 instead.
Fix
- worker-pool: rewrite task dispatch with 2-pass water-fill algorithm for even load balancing
- worker-pool: reject pending tasks on dispose when maxLoad is set
- process-css: single-pass positional replacement to prevent corrupting already-replaced paths
- download-resource: fix inverted nonHtml detection for array content-type headers
- download-resource: pass missing `options` arg to requestForResource on retry
- download-resource: guard premature close retry with retryLimitExceeded check
- download-resource: wrap encodeURI(decodeURI()) in try-catch for malformed URLs
- download-resource: check Buffer bodies (not just strings) on incomplete HTML retry
- download-streaming-resource: apply computed backoff delay via setTimeout on retry
- options: fix inverted maxRetryAfter comparison
- save-html-to-disk: convert Date.parse milliseconds to seconds for fs.utimes
- save-resource-to-disk: convert Date.parse milliseconds to seconds for fs.utimes
- save-html-to-disk: escape single quotes in redirect HTML JS string literal
- read-or-copy-local-resource: create parent directory before copyFile for StreamingBinary
- worker: assign cloned error back so worker errors propagate to main thread
- worker-pool: only call takeLog for Log messages, not Complete messages
- adapters: widen parseHtml and getResourceBodyFromHtml type to accept Svg
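Two of the smaller fixes in the list above lend themselves to a short sketch: converting `Date.parse` output to seconds before calling `fs.utimes`, and guarding `encodeURI(decodeURI())` against malformed input. The helper names (`lastModifiedToSeconds`, `applyLastModified`, `normalizeUrl`) are hypothetical and written for this example; they are not the library's functions, and its actual fallback behaviour may differ.

```typescript
import { promises as fs } from 'node:fs';

// Date.parse returns milliseconds since the epoch (NaN if unparsable),
// while a numeric time passed to fs.utimes is interpreted as seconds.
function lastModifiedToSeconds(header: string): number | undefined {
  const ms = Date.parse(header);
  return Number.isNaN(ms) ? undefined : ms / 1000;
}

// Hypothetical helper: stamp a saved file with the server's Last-Modified.
async function applyLastModified(filePath: string, header: string): Promise<void> {
  const mtime = lastModifiedToSeconds(header);
  if (mtime === undefined) {
    return; // leave the file times untouched when the header is unusable
  }
  await fs.utimes(filePath, mtime, mtime);
}

// decodeURI throws URIError on malformed percent-escapes such as "%E0%A4%A".
// This sketch falls back to the raw input in that case (an assumption).
function normalizeUrl(url: string): string {
  try {
    return encodeURI(decodeURI(url));
  } catch {
    return url;
  }
}
```

Passing the millisecond value straight through would set a modification time tens of thousands of years in the future, which is why the division matters.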
Test
- redirect-html: test encoding and single-quote escaping
- download-streaming-resource: test isBytesAccepted, isSameRangeStart
- options: test calculateFastDelay retry limit, maxRetryAfter, non-retryable methods
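The single-quote escaping covered by the redirect-html test can be sketched as follows. This is an assumption about the general technique, not the library's code: `escapeForSingleQuotedJs` and `redirectHtml` are hypothetical names, and the HTML template is invented for the example.

```typescript
// Escape a value for use inside a single-quoted JavaScript string literal
// that is embedded in an inline <script> element.
function escapeForSingleQuotedJs(value: string): string {
  return value
    .replace(/\\/g, '\\\\') // backslashes first, so later escapes are not doubled
    .replace(/'/g, "\\'") // the quote that would otherwise end the literal
    .replace(/\r/g, '\\r')
    .replace(/\n/g, '\\n')
    .replace(/</g, '\\x3C'); // keeps "</script>" from closing the element early
}

// Hypothetical redirect page; the real template may look different.
function redirectHtml(target: string): string {
  const escaped = escapeForSingleQuotedJs(target);
  return (
    '<!DOCTYPE html><html><head><meta charset="utf-8">' +
    `<script>location.replace('${escaped}');</script>` +
    '</head></html>'
  );
}
```

For instance, `redirectHtml("/it's.html")` embeds the target as `'/it\'s.html'`, so the apostrophe no longer terminates the string.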
Misc
- npm: exclude undici from bundle
- npm: update dependencies
0.8.5
0.8.4
0.8.3
0.8.2
0.8.1
0.8.0
BREAKING
- Requires Node.js 18.17 or higher
- Ships as ES modules only; CommonJS is no longer supported (#995) (#218)
- build(deps-dev): bump typescript from 5.0.4 to 5.6.2 (#990)
- build(deps): bump cheerio from 1.0.0-rc.12 to 1.0.0 (#989)
- npm: upgrade to lockfile v3 (#437)
- migrate to got 13
- change importDefaultFromPath to async
Misc
- npm: update dependencies