All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Added support for scraping Instapaper's special "Liked" and "Archive" collections.
- Added support for configuring the output format via a configuration file, with an option to override it using a command-line argument.
- Added a DeepWiki badge to `README.md` to provide an additional channel for user support.
- Dependencies:
- Updated the `certifi` dependency.
- A new `--article-preview` flag (and its older alias `--add-article-preview`) to include the article preview text in the output.
- Configuration file support for the `add_instapaper_url` and `add_article_preview` options in the `[fields]` section of `config.toml`.
- Renamed `--add-instapaper-url` to `--read-url` for brevity. The old flag is kept for backward compatibility.
- Both `--read-url` and `--article-preview` now support `--no-` prefixes (e.g., `--no-read-url`) to override `true` values from the config file.
- A "Contributors" section in `README.md` to visually credit all project contributors.
- Developer Experience & Tooling:
- Added `ruff` linting and `mypy` static type checking to the CI pipeline to improve code quality.
- Integrated automated license compliance checks using `licensecheck` into the CI pipeline.
- Configured Dependabot to automatically update GitHub Actions on a weekly basis.
- Performance:
- Improved application startup time by deferring the import of the `json`, `sqlite3`, and `csv` modules until they are actually needed.
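Deferring an import means moving the `import` statement from module level into the code path that uses it. The module is then loaded only on first use, and later imports of the same module are cheap lookups in `sys.modules`. The following is a minimal sketch under assumed names: the `save_articles` function and the `id`/`title`/`url` fields are hypothetical, not the project's API.

```python
from typing import Any


def save_articles(articles: list[dict[str, Any]], fmt: str, path: str) -> None:
    """Write articles to `path`; each format's module is imported only when used."""
    if fmt == "csv":
        import csv  # deferred: loaded only for CSV output

        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["id", "title", "url"])
            writer.writeheader()
            writer.writerows(articles)
    elif fmt == "json":
        import json  # deferred: loaded only for JSON output

        with open(path, "w", encoding="utf-8") as f:
            json.dump(articles, f, indent=2, ensure_ascii=False)
    elif fmt == "sqlite":
        import sqlite3  # deferred: loaded only for SQLite output

        conn = sqlite3.connect(path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS articles "
                    "(id TEXT PRIMARY KEY, title TEXT, url TEXT)"
                )
                conn.executemany(
                    "INSERT OR REPLACE INTO articles (id, title, url) "
                    "VALUES (:id, :title, :url)",
                    articles,
                )
        finally:
            conn.close()
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
```

A run that only writes CSV therefore never pays the cost of importing `json` or `sqlite3` at startup.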
- Dependencies:
- Updated the `actions/checkout` and `actions/setup-python` GitHub Actions to v6.
- A new `--add-instapaper-url` command-line argument to include a full, clickable URL for each article in the output.
- Developer Experience & Tooling:
- Migrated development tools from `black` to `ruff` for formatting and linting, and integrated `pre-commit` hooks to automate code quality checks.
- Configured the `mypy` pre-commit hook to run only on the `src/` directory to improve performance.
- Testing:
- Added comprehensive tests for API and authentication error handling to improve robustness.
- Configured Codecov with new project and pull request coverage targets.
- Output & Export:
- The output filename extension is now automatically corrected based on the selected format (e.g., providing `--output my-file.txt --format csv` will result in `my-file.csv`).
- CSV output is now fully RFC 4180 compliant, with all fields quoted to improve compatibility with spreadsheet applications.
- SQLite output is optimized to use a generated column for `instapaper_url` on modern SQLite versions (>=3.31.0), with a fallback for older versions to ensure compatibility.
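The three Output & Export behaviors can be sketched with the standard library. Everything here is an illustrative assumption rather than the project's code: the `EXTENSIONS` mapping (including `.db` for SQLite), the helper names, and the `articles` table schema are hypothetical. `Path.with_suffix` performs the extension swap, `csv.QUOTE_ALL` quotes every field, and `sqlite3.sqlite_version_info` decides between a generated column and a plain one. The URL prefix is the `https://www.instapaper.com/read/<article_id>` pattern documented elsewhere in this changelog.

```python
import csv
import io
import sqlite3
from pathlib import Path

# Hypothetical format-to-extension mapping; the project's SQLite suffix may differ.
EXTENSIONS = {"csv": ".csv", "json": ".json", "sqlite": ".db"}
READ_URL_BASE = "https://www.instapaper.com/read/"


def corrected_output_path(output: str, fmt: str) -> Path:
    """Replace (or add) the suffix so it matches the chosen format."""
    return Path(output).with_suffix(EXTENSIONS[fmt])


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows with every field quoted and CRLF line endings (RFC 4180)."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["id", "title", "url"],
        quoting=csv.QUOTE_ALL, lineterminator="\r\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def create_articles_table(conn: sqlite3.Connection) -> bool:
    """Create the table; return True if instapaper_url is a generated column."""
    if sqlite3.sqlite_version_info >= (3, 31, 0):
        # Generated columns arrived in SQLite 3.31.0; VIRTUAL computes the value on read.
        conn.execute(
            "CREATE TABLE articles (id TEXT PRIMARY KEY, title TEXT, url TEXT, "
            "instapaper_url TEXT GENERATED ALWAYS AS "
            "('https://www.instapaper.com/read/' || id) VIRTUAL)"
        )
        return True
    # Fallback for older SQLite: an ordinary column filled in by the application.
    conn.execute(
        "CREATE TABLE articles (id TEXT PRIMARY KEY, title TEXT, url TEXT, instapaper_url TEXT)"
    )
    return False


def insert_article(conn: sqlite3.Connection, row: dict[str, str], generated: bool) -> None:
    if generated:
        # Generated columns cannot be written directly, so omit instapaper_url.
        conn.execute("INSERT INTO articles (id, title, url) VALUES (:id, :title, :url)", row)
    else:
        conn.execute(
            "INSERT INTO articles (id, title, url, instapaper_url) VALUES (?, ?, ?, ?)",
            (row["id"], row["title"], row["url"], READ_URL_BASE + row["id"]),
        )
```

With a generated column, the URL is derived from `id` rather than stored, so the two cannot drift apart. The fallback gives up that guarantee in exchange for compatibility with older SQLite.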
- Robustness & Error Handling:
- Improved CLI resilience by gracefully handling exceptions raised while saving the output file.
- Enhanced the API client's robustness in handling malformed HTML and network errors, particularly for rate-limiting (HTTP 429) scenarios.
- Internal Refactoring:
- Centralized the management of internal constants to improve code clarity and maintainability.
- Documentation:
- Updated project badges in `README.md` for clarity and correctness.
- Improved type safety and robustness across the codebase.
First official public release on PyPI.
- `pyproject.toml` for project configuration and dependency management.
- A `src` layout for the main application code.
- A `tests` directory for the test suite.
- A GitHub Actions workflow for CI/CD to automate linting, formatting, and testing.
- `pytest`, `pytest-cov`, and `requests-mock` for testing.
- `black` and `ruff` for code formatting and linting.
- Added support for JSON and SQLite output formats via the `--format` command-line argument.
- Added support for a custom output filename via the `--output` command-line argument.
- The project is now a standard Python package, installable with `pip`.
- The main script has been replaced by a command-line entry point (`instapaper-scraper`).
- Decomposed the original `scrape.py` into logical modules (`api`, `auth`, `cli`, `output`, `exceptions`).
- Migrated all tests from `unittest` to `pytest`, using fixtures and parametrization.
- Updated `README.md` to reflect the new project structure, installation, and usage.
- The default output format is now CSV, but users can choose between CSV, JSON, and SQLite.
- `requirements.txt`, in favor of `pyproject.toml`.
- The old `scrape.py` script.
- The old `unittest`-based test files.
- The 'page' number has been removed from the output data. Users can now open a specific article on Instapaper by appending the article's unique ID to the base URL: `https://www.instapaper.com/read/<article_id>`.
- Implemented session persistence with encryption to streamline authentication.
- Introduced the `ScraperStructureChanged` custom exception for better error handling when the HTML structure changes.
- Added comprehensive tests for error handling in `test_scrape_error_handling.py`.
- Implemented robust error handling with exponential backoff and retry logic for transient network errors (Fixes #27).
- Added handling for HTTP 429 (Too Many Requests) errors, respecting `Retry-After` headers.
- Improved HTML parsing to gracefully handle missing elements.
- Updated dependencies: `cryptography` to 44.0.1 and `certifi` to 2025.11.12.
- Updated `README.md` to reflect the new authentication flow and dependencies.
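Exponential backoff with `Retry-After` support can be sketched as follows. This is a minimal, hypothetical illustration rather than the project's client:

- `send` stands in for a real HTTP call returning `(status, headers)`.
- The set of retryable status codes is an assumption.
- The built-in `ConnectionError` stands in for whatever network exception the real client raises (`requests` has its own exception hierarchy).

A 429 with a parseable `Retry-After` value waits exactly that long. Other transient failures wait `base_delay * 2**attempt` plus a little jitter.

```python
import random
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape: (status_code, headers).
Response = Tuple[int, Mapping[str, str]]

# Assumed set of status codes worth retrying.
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}


def retry_after_seconds(value: Optional[str]) -> Optional[float]:
    """Parse Retry-After as delay-seconds or an HTTP-date; None if absent or invalid."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means "retry now".
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def fetch_with_retry(
    send: Callable[[], Response],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying transient failures up to `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            status, headers = send()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: surface the network error
        else:
            if status not in TRANSIENT_STATUSES or attempt == max_retries:
                return status, headers
            if status == 429:
                delay = retry_after_seconds(headers.get("Retry-After"))
                if delay is not None:
                    sleep(delay)  # honor the server's requested wait
                    continue
        # Exponential backoff: base, 2*base, 4*base, ... plus up to 10% jitter.
        backoff = base_delay * (2 ** attempt)
        sleep(backoff + random.uniform(0, backoff * 0.1))
    raise RuntimeError("unreachable: the last attempt always returns or raises")
```

Injecting `sleep` keeps the retry policy testable without real waiting. Note that a plain `dict` is case-sensitive, whereas real HTTP header lookups usually are not.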
- Implemented basic logging and login verification for better debugging and security.
- Renamed `scrape-transactions.py` to `scrape.py`, making it the main project file.
- Improved HTTP error handling and logging in the scraper.
- Refactored article data handling to use dictionaries for better data structure.
- Updated dependencies: `idna`, `requests`, `python-dotenv`, `soupsieve`.
- Updated documentation for the new modular architecture.
- Added a `LICENSE` file (GNU GPLv3).
- Adjusted the Dependabot configuration for grouped updates.
- Updated various dependencies to their latest versions.
- Added Dependabot and funding configuration files.
- Addressed an issue with handling non-200 status codes during scraping.
- Corrected a boolean conversion error.
- Implemented a new transactional pattern for scraping.
- Pinned the `guara` dependency to a specific version.
- Introduced support for scraping articles from specific Instapaper folders.
- Removed unused functions and cleaned up imports for a more efficient codebase.
- Added an example environment configuration file.
- Updated `README.md` to reflect new features like CSV export and folder mode.