Releases: website-local/website-scrap-engine

0.9.0

03 Apr 13:18

BREAKING CHANGE: see the Breaking Changes section below.

Feature

  • logger: make logger implementation configurable (#204) — Replace hardcoded log4js with a pluggable Logger interface. Consumers provide a factory via DownloadOptions.createLogger. Default implementation writes to console. Built-in log4js adapter at lib/logger/log4js-adapter.js for backward compatibility.
  • life-cycle: add statusChange listener hook (#102) — New statusChange array on ProcessingLifeCycle allows consumers to observe resource progression through the pipeline. Default listener logs skipped/discarded resources and errors.
  • life-cycle: add existingResource callback for local file handling (#150) — Optional existingResource callback on ProcessingLifeCycle to decide what to do when a local file already exists (skip, overwrite, if-modified-since, skipSave).
  • life-cycle: expose submit resource to init hook (#1131): InitLifeCycleFunc receives an optional submit callback to add URLs to the download queue during initialization.

Fix

  • download: enable warnForNonHtml by default, improve warning (#993): warnForNonHtml is now enabled by default. The warning message includes res.type for clarity.

Breaking Changes

  • DownloadOptions.configureLogger replaced by createLogger?: (options: StaticDownloadOptions) => Logger. The default is createDefaultLogger (console-based).
  • log4js moved from dependencies to optionalDependencies. Consumers who need file-based logging must npm install log4js and use the built-in adapter:
    import {createLog4jsLogger} from 'website-scrap-engine/lib/logger/log4js-adapter.js';
    const options = {
      createLogger: (opts) => createLog4jsLogger(opts.localRoot, opts.logSubDir),
    };
  • Public logger namespace exports are typed as CategoryLogger instead of log4js Logger. Method signatures are compatible (.trace(), .debug(), .info(), .warn(), .error(), .isTraceEnabled()), but consumers using log4js-specific properties will need to update.
  • Worker log message protocol: WorkerLog.logger (category string) replaced by WorkerLog.logType (LogType string). Affects custom worker implementations only.
  • ProcessingLifeCycle gains a required statusChange: StatusChangeFunc[] field. Consumers building the life cycle from scratch must add statusChange: [].
  • warnForNonHtml is now enabled by default (was opt-in).
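
The statusChange migration above can be sketched as follows. This is a minimal illustration only: LifeCycleLike and the untyped StatusChangeFunc are hypothetical local stand-ins, not the library's real definitions, and the listener arguments are assumptions because the notes do not spell out the real signature.

```typescript
// Hypothetical stand-ins for the library types named above; the real
// definitions in website-scrap-engine may differ.
type StatusChangeFunc = (...args: unknown[]) => void;

interface LifeCycleLike {
  init: unknown[];
  statusChange: StatusChangeFunc[]; // required as of 0.9.0
}

// A life cycle assembled by hand before 0.9.0 has no statusChange field.
const legacyLifeCycle = { init: [] as unknown[] };

// Migration step from the notes: add statusChange: [].
const migrated: LifeCycleLike = { ...legacyLifeCycle, statusChange: [] };

// Optionally register a listener. Its arguments are left untyped here
// because the real signature is not given in these notes.
const seen: unknown[][] = [];
migrated.statusChange.push((...args: unknown[]) => {
  seen.push(args);
});

// Simulate the pipeline notifying listeners (illustration only).
for (const listener of migrated.statusChange) {
  listener('https://example.com/', 'downloaded');
}
```

Consumers who start from the library's default life cycle presumably get the default listener already and only need this step when building the object from scratch.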

New Exports

  • Logger interface — the pluggable logger contract
  • LogType type — discriminated union of log categories (io.http.request, system.error, etc.)
  • CategoryLogger interface — the per-category proxy type (what logger.error, logger.skip etc. are)
  • createDefaultLogger() — factory for the console-based default logger
  • logger.setLogger(instance) — configure the logger instance at runtime
  • logger.getLogger() — retrieve the current logger instance
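
To illustrate what a CategoryLogger-compatible object might look like, here is a small in-memory sketch. The method names come from the Breaking Changes list; the argument types, the CategoryLoggerLike interface, and the createMemoryCategoryLogger helper are assumptions made for this example and are not part of the library.

```typescript
// Local stand-in for the CategoryLogger shape. Method names follow the
// release notes; the argument types are an assumption.
interface CategoryLoggerLike {
  trace(...args: unknown[]): void;
  debug(...args: unknown[]): void;
  info(...args: unknown[]): void;
  warn(...args: unknown[]): void;
  error(...args: unknown[]): void;
  isTraceEnabled(): boolean;
}

// Hypothetical helper: collects log lines in memory instead of printing,
// which can be handy in tests.
function createMemoryCategoryLogger(
  traceEnabled = false,
): CategoryLoggerLike & { lines: string[] } {
  const lines: string[] = [];
  const write = (level: string) => (...args: unknown[]): void => {
    lines.push(`[${level}] ${args.map(String).join(' ')}`);
  };
  return {
    lines,
    // Trace output is dropped unless explicitly enabled.
    trace: (...args: unknown[]): void => {
      if (traceEnabled) write('trace')(...args);
    },
    debug: write('debug'),
    info: write('info'),
    warn: write('warn'),
    error: write('error'),
    isTraceEnabled: () => traceEnabled,
  };
}

const log = createMemoryCategoryLogger();
log.trace('dropped because trace is disabled');
log.warn('non-html response', 'image/png');
```

How such an object is wired into the full Logger interface depends on the real contract, which should be checked against the package's type definitions.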

Misc

  • docs: rewrite README with usage examples, architecture details, and adapter helpers
  • build(deps): bump picomatch, @typescript-eslint/eslint-plugin, @typescript-eslint/parser, handlebars, ts-jest

0.8.7

22 Mar 04:37

Fix

  • npm: fix postinstall failure when installed as a dependency
  • npm: fix Node.js DEP0151 deprecation warning for ESM main field resolution

0.8.6

22 Mar 03:29

Pre-release

Warning

Do not use this release; use 0.8.7 instead.

Fix

  • worker-pool: rewrite task dispatch with 2-pass water-fill algorithm for even load balancing
  • worker-pool: reject pending tasks on dispose when maxLoad is set
  • process-css: single-pass positional replacement to prevent corrupting already-replaced paths
  • download-resource: fix inverted nonHtml detection for array content-type headers
  • download-resource: pass missing options arg to requestForResource on retry
  • download-resource: guard premature close retry with retryLimitExceeded check
  • download-resource: wrap encodeURI(decodeURI()) in try-catch for malformed URLs
  • download-resource: check Buffer bodies (not just strings) on incomplete HTML retry
  • download-streaming-resource: apply computed backoff delay via setTimeout on retry
  • options: fix inverted maxRetryAfter comparison
  • save-html-to-disk: convert Date.parse milliseconds to seconds for fs.utimes
  • save-resource-to-disk: convert Date.parse milliseconds to seconds for fs.utimes
  • save-html-to-disk: escape single quotes in redirect HTML JS string literal
  • read-or-copy-local-resource: create parent directory before copyFile for StreamingBinary
  • worker: assign cloned error back so worker errors propagate to main thread
  • worker-pool: only call takeLog for Log messages, not Complete messages
  • adapters: widen parseHtml and getResourceBodyFromHtml type to accept Svg

Test

  • redirect-html: test encoding and single-quote escaping
  • download-streaming-resource: test isBytesAccepted, isSameRangeStart
  • options: test calculateFastDelay retry limit, maxRetryAfter, non-retryable methods
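
For context on the single-quote escaping covered by the redirect-html fix and test, the sketch below shows one way to embed a path safely in a single-quoted JavaScript string literal. Both function names are hypothetical and the generated script is an assumption about what redirect HTML might contain; the library's real output may differ.

```typescript
// Hypothetical helper: escape a value for use inside a single-quoted
// JavaScript string literal.
function escapeForSingleQuotedJs(value: string): string {
  return value
    .replace(/\\/g, '\\\\') // backslashes first, so later escapes survive
    .replace(/'/g, "\\'") // then the quote character itself
    .replace(/\r/g, '\\r') // raw line breaks are invalid inside the literal
    .replace(/\n/g, '\\n');
}

// Hypothetical redirect snippet built from the escaped target.
function redirectScript(target: string): string {
  return `location.replace('${escapeForSingleQuotedJs(target)}');`;
}

const script = redirectScript("/it's here.html");
```

This sketch does not handle a literal `</script>` sequence inside the target, which would need separate treatment when the snippet is placed inside an HTML script element.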

Misc

  • npm: exclude undici from bundle
  • npm: update dependencies

0.8.5

24 Jan 06:20

Enhancement

Misc

  • npm: update dependencies

0.8.4

09 Aug 10:04

Enhancement

  • Upgraded to TypeScript 5.9

Test

  • tests: support TypeScript 5.9

Misc

  • npm: update dependencies

0.8.3

16 Mar 05:26

Fix

  • options: fix got options memory leak (#1112)
  • downloader: correctly set queue.concurrency (#1113)

0.8.2

14 Mar 12:17

Fix

  • downloader: fix use of options before init (#1110)

Misc

  • npm: update dependencies
  • options: deprecate waitForInitBeforeIdle

0.8.1

22 Feb 02:17

Enhancement

  • sources: support iframe srcdoc (#1081)
  • download-resource: add option to warn for non-html (Part of #993)

Test

  • tests: process-html (#1092)

0.8.0

02 Feb 11:42

BREAKING

Misc

  • npm: update dependencies

0.7.2

05 Oct 02:43

Note

  • This is the last version before the minimum supported Node.js version is raised

Misc

  • npm: update dependencies