feat: add CV upload functionality with parsing and storage #1

ABIvan-Tech wants to merge 1 commit into `main`
Conversation
Pull request overview
Adds a new “upload CV as PDF” flow to the bot, extracting text from uploaded PDFs and storing it on the user for use in subsequent chat generations.
Changes:
- Add `/upload_cv` command + document handler to download a PDF, extract text, store it on `User`, and trigger a generation.
- Introduce `pypdf` dependency and a small PDF text-extraction service.
- Add DB schema support (user columns + Alembic migration) and update CI/dev tooling configs.
Reviewed changes
Copilot reviewed 12 out of 14 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| `src/mitko/bot/handlers.py` | Adds `/upload_cv` and PDF document upload handling, rate limiting, persistence, and generation trigger. |
| `src/mitko/services/cv_parser.py` | New service to extract text from PDF bytes using `pypdf`. |
| `src/mitko/models/user.py` | Adds `cv_text` and `cv_uploaded_at` fields. |
| `alembic/versions/041_add_cv_fields_to_users.py` | Migration adding the new user columns. |
| `src/mitko/i18n/base.py` | Adds i18n dataclass structure for CV upload messages. |
| `src/mitko/i18n/en.py` / `src/mitko/i18n/ru.py` | Adds English/Russian strings for the CV upload flow. |
| `pyproject.toml` | Adds `pypdf` dependency and adjusts pytest config stanza placement. |
| `uv.lock` | Locks `pypdf` dependency. |
| `tests/conftest.py` | Reworks async DB fixture initialization and model imports for tests. |
| `src/mitko/agents/config.py` | Switches model objects to lazy-initialized cached getters. |
| `devops/docker-compose.pgvector.yml` | Adds a pgvector docker-compose file for local dev. |
| `.github/workflows/ci.yml` | Reworks CI setup steps (Python/pip-based install, lint/type/test commands). |
| `.gitignore` | Adds ignores for `.github/` and `.agent-memory/`. |
```diff
 strategy:
   matrix:
     python-version: [3.11]
 steps:
   - uses: actions/checkout@v4
-  - name: Set up uv
-    uses: astral-sh/setup-uv@v5
   - name: Set up Python
     uses: actions/setup-python@v4
     with:
-      python-version: "3.12"
+      python-version: ${{ matrix.python-version }}
```
CI is set to Python 3.11, but the project declares `requires-python = ">=3.12"` (and `.python-version` is 3.12). This will cause installs/tests to fail or diverge from the supported runtime. Update the matrix to 3.12 (and/or add 3.12 alongside 3.11 only if the project truly supports both).
```diff
 - name: Run tests
-  env:
-    DATABASE_URL: "sqlite+aiosqlite:///:memory:"
-    TELEGRAM_BOT_TOKEN: "test-token"
-    ADMIN_GROUP_ID: "-1001234567890"
-    OPENAI_API_KEY: "test-key"
-    ANTHROPIC_API_KEY: "test-key"
-  run: uv run pytest
+  run: uv run pytest -q
```
The previous workflow exported the env vars that `SETTINGS` needs at import time (`DATABASE_URL`, `TELEGRAM_BOT_TOKEN`, `ADMIN_GROUP_ID`, `OPENAI_API_KEY`, etc.). Those were removed, but `src/mitko/config.py` loads `SETTINGS` at import time and requires these variables, so tests/lint/typecheck will fail in CI unless you set them again (or change config loading for tests).
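The failure mode is easy to reproduce with a minimal stand-in; the class below is illustrative only, not the project's actual config code:

```python
class Settings:
    """Minimal stand-in for a config object validated eagerly at construction."""

    REQUIRED = ("DATABASE_URL", "TELEGRAM_BOT_TOKEN", "ADMIN_GROUP_ID", "OPENAI_API_KEY")

    def __init__(self, env: dict[str, str]) -> None:
        # Every required variable is read up front; one missing variable
        # aborts construction entirely.
        missing = [key for key in self.REQUIRED if key not in env]
        if missing:
            raise RuntimeError(f"missing required env vars: {missing}")
        self.database_url = env["DATABASE_URL"]

# A module-level `SETTINGS = Settings(dict(os.environ))` means that merely
# importing the module fails in CI when these vars are not exported --
# which is why lint and typecheck jobs break too, not just the test run.
```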
```python
class CVParserService:
    def __init__(self, session: AsyncSession) -> None:
        self.session = session
```
CVParserService stores an AsyncSession but extract_text() never uses it. If the session isn’t needed, drop it from the constructor (or make extract_text a @staticmethod/module function) to avoid implying DB side effects and simplify usage/testing.
Suggested change:

```diff
-        self.session = session
+        # The session is accepted for interface compatibility but not stored,
+        # as this service currently performs only pure PDF text extraction.
+        pass
```
```python
revision: str = 'c917afd58f41'
down_revision: str | None = '6991d7de6488'
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None


def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column('users', sa.Column('cv_text', sa.Text(), nullable=True))
    op.add_column(
        'users', sa.Column('cv_uploaded_at', sa.DateTime(timezone=True), nullable=True)
    )
    # ### end Alembic commands ###


def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_column('users', 'cv_uploaded_at')
    op.drop_column('users', 'cv_text')
```
This migration uses single quotes for revision/down_revision and table/column names, while the surrounding migrations use double quotes. For consistency (and to minimize noisy diffs in future autogenerated migrations), align quoting/style with the existing convention in alembic/versions/*.py.
Suggested change:

```diff
-revision: str = 'c917afd58f41'
-down_revision: str | None = '6991d7de6488'
+revision: str = "c917afd58f41"
+down_revision: str | None = "6991d7de6488"
 branch_labels: str | Sequence[str] | None = None
 depends_on: str | Sequence[str] | None = None


 def upgrade() -> None:
     # ### commands auto generated by Alembic - please adjust! ###
-    op.add_column('users', sa.Column('cv_text', sa.Text(), nullable=True))
+    op.add_column("users", sa.Column("cv_text", sa.Text(), nullable=True))
     op.add_column(
-        'users', sa.Column('cv_uploaded_at', sa.DateTime(timezone=True), nullable=True)
+        "users", sa.Column("cv_uploaded_at", sa.DateTime(timezone=True), nullable=True)
     )
     # ### end Alembic commands ###


 def downgrade() -> None:
     # ### commands auto generated by Alembic - please adjust! ###
-    op.drop_column('users', 'cv_uploaded_at')
-    op.drop_column('users', 'cv_text')
+    op.drop_column("users", "cv_uploaded_at")
+    op.drop_column("users", "cv_text")
```
```diff
 - name: Install dependencies
   run: |
     python -m pip install --upgrade pip
     pip install ".[dev]"

-- name: Lint code
-  run: uv run ruff check src/ tests/
+- name: Lint (ruff)
+  run: uv run ruff check src/ tests/ || true

-- name: Type check
-  run: uv run pyright
+- name: Type check (pyright)
+  run: uv run pyright || true

 - name: Run tests
-  env:
-    DATABASE_URL: "sqlite+aiosqlite:///:memory:"
-    TELEGRAM_BOT_TOKEN: "test-token"
-    ADMIN_GROUP_ID: "-1001234567890"
-    OPENAI_API_KEY: "test-key"
-    ANTHROPIC_API_KEY: "test-key"
-  run: uv run pytest
+  run: uv run pytest -q
```
The workflow installs dependencies with pip, but all subsequent commands use `uv run ...` even though uv is no longer installed (the previous `setup-uv` step was removed). Either restore uv installation (e.g., `astral-sh/setup-uv`) or run ruff, pyright, and pytest directly without `uv run`.
```python
_openai_model = None
_openai_responses_model = None
_anthropic_model = None
```
These module-level caches are untyped (_openai_model = None, etc.) but later assigned model instances. With pyright in strict mode this typically produces type errors (assigning OpenAIChatModel into a variable inferred as NoneType). Add explicit annotations like OpenAIChatModel | None (and same for the others) to keep type checking clean.
Suggested change:

```diff
-_openai_model = None
-_openai_responses_model = None
-_anthropic_model = None
+_openai_model: OpenAIChatModel | None = None
+_openai_responses_model: OpenAIResponsesModel | None = None
+_anthropic_model: AnthropicModel | None = None
```
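An alternative worth considering: `functools.lru_cache` gives the same lazy-once semantics with no module-level `None` caches to annotate, so the checker sees a precise return type. A sketch, with a stub class standing in for the real model type:

```python
from functools import lru_cache


class OpenAIChatModel:
    """Stub standing in for the project's real model class."""

    def __init__(self, name: str) -> None:
        self.name = name


@lru_cache(maxsize=1)
def get_openai_model() -> OpenAIChatModel:
    # Constructed on the first call only and cached thereafter; callers
    # never see an `OpenAIChatModel | None` that needs narrowing.
    return OpenAIChatModel("stub-model")
```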
```yaml
version: "3.8"
services:
  db:
    image: ankane/pgvector:latest
```
The compose file pins ankane/pgvector to latest, which can introduce breaking changes unexpectedly when the image updates. Prefer pinning to a specific version tag (or digest) so dev/test environments are reproducible.
Suggested change:

```diff
-    image: ankane/pgvector:latest
+    image: ankane/pgvector:pg16
```
```yaml
- name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    pip install ".[dev]"
```
CI installs `.[dev]` via pip, but test dependencies like `aiosqlite` appear to live under `[dependency-groups].dev` (uv-only) rather than `[project.optional-dependencies].dev`. Since `tests/conftest.py` uses `sqlite+aiosqlite`, pip-based CI may miss `aiosqlite` and fail. Either add those packages to the pip `dev` extra or switch CI back to `uv sync --extra dev` / dependency groups.
Suggested change:

```diff
-    pip install ".[dev]"
+    python -m pip install uv
+    uv sync --group dev
```
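For reference, the distinction the comment relies on looks roughly like this in `pyproject.toml` (the package lists here are illustrative, not the project's actual contents):

```toml
# Installed by `pip install ".[dev]"` and `uv sync --extra dev`:
[project.optional-dependencies]
dev = ["pytest", "ruff"]

# A PEP 735 dependency group: installed by `uv sync --group dev`,
# but NOT pulled in by `pip install ".[dev]"`:
[dependency-groups]
dev = ["aiosqlite", "pytest-asyncio"]
```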
```python
def pytest_configure(config: Any) -> None:
    """Initialize the test database at pytest startup."""
    global _engine, _connection
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    _engine, _connection = loop.run_until_complete(_init_db())


def pytest_sessionfinish(session: Any, exitstatus: int) -> None:
    """Clean up the database after all tests."""
    if _connection or _engine:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        if _connection:
            loop.run_until_complete(_connection.close())
        if _engine:
            loop.run_until_complete(_engine.dispose())
```
The DB engine/connection are created in pytest_configure() on a brand-new event loop and then reused in async tests that run under pytest-asyncio’s own loop. Async SQLAlchemy connections are bound to the loop they were created in, so this pattern is very likely to fail with “attached to a different event loop” errors. Prefer session-scoped async fixtures (and, if needed, a session-scoped event_loop fixture) to create/cleanup the engine/connection within the same loop pytest-asyncio uses.
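The failure mode can be reproduced with plain asyncio, using a toy stand-in for a loop-bound connection (real async DB drivers fail similarly, typically with an "attached to a different event loop" `RuntimeError`):

```python
import asyncio


class LoopBoundConnection:
    """Toy stand-in: remembers its creation loop, like async DB connections do."""

    def __init__(self) -> None:
        self._loop = asyncio.get_running_loop()

    async def execute(self) -> str:
        if asyncio.get_running_loop() is not self._loop:
            raise RuntimeError("attached to a different event loop")
        return "ok"


async def _create() -> LoopBoundConnection:
    return LoopBoundConnection()


# Create on loop A (as pytest_configure does on its fresh loop) ...
loop_a = asyncio.new_event_loop()
conn = loop_a.run_until_complete(_create())

# ... then use it on loop B (as a pytest-asyncio test would):
loop_b = asyncio.new_event_loop()
try:
    loop_b.run_until_complete(conn.execute())
    outcome = "ok"
except RuntimeError:
    outcome = "different loop"
loop_a.close()
loop_b.close()
```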
```diff
 @pytest.fixture(scope="session")
 def db_engine() -> AsyncEngine:
     """Provide the test database engine."""
     assert _engine is not None
     return _engine


 @pytest.fixture(scope="session")
-async def db_connection(
-    db_engine: AsyncEngine,
-) -> AsyncGenerator[AsyncConnection, None]:
-    """Session-scoped connection shared across all tests."""
-    async with db_engine.connect() as connection:
-        yield connection
+def db_connection() -> AsyncConnection:
+    """Provide the database connection for tests."""
+    assert _connection is not None
+    return _connection
```
db_connection() returns a session-scoped AsyncConnection that’s shared across all tests, but there’s no session-scoped event loop anymore (the previous event_loop fixture was removed). With asyncio_mode="auto", pytest-asyncio may create per-test loops, which makes sharing a single AsyncConnection unsafe. Consider restoring a session-scoped loop (and creating the engine/connection within it) or switching to per-test connections/sessions.
No description provided.