feat: add CV upload functionality with parsing and storage #1

ABIvan-Tech wants to merge 1 commit into `main`
Conversation
Pull request overview
Adds a new “upload CV as PDF” flow to the bot, extracting text from uploaded PDFs and storing it on the user for use in subsequent chat generations.
Changes:
- Add `/upload_cv` command + document handler to download a PDF, extract text, store it on `User`, and trigger a generation.
- Introduce `pypdf` dependency and a small PDF text-extraction service.
- Add DB schema support (user columns + Alembic migration) and update CI/dev tooling configs.
Reviewed changes
Copilot reviewed 12 out of 14 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| `src/mitko/bot/handlers.py` | Adds `/upload_cv` and PDF document upload handling, rate limiting, persistence, and generation trigger. |
| `src/mitko/services/cv_parser.py` | New service to extract text from PDF bytes using `pypdf`. |
| `src/mitko/models/user.py` | Adds `cv_text` and `cv_uploaded_at` fields. |
| `alembic/versions/041_add_cv_fields_to_users.py` | Migration adding the new user columns. |
| `src/mitko/i18n/base.py` | Adds i18n dataclass structure for CV upload messages. |
| `src/mitko/i18n/en.py` / `src/mitko/i18n/ru.py` | Adds English/Russian strings for the CV upload flow. |
| `pyproject.toml` | Adds `pypdf` dependency and adjusts pytest config stanza placement. |
| `uv.lock` | Locks `pypdf` dependency. |
| `tests/conftest.py` | Reworks async DB fixture initialization and model imports for tests. |
| `src/mitko/agents/config.py` | Switches model objects to lazy-initialized cached getters. |
| `devops/docker-compose.pgvector.yml` | Adds a pgvector docker-compose file for local dev. |
| `.github/workflows/ci.yml` | Reworks CI setup steps (Python/pip-based install, lint/type/test commands). |
| `.gitignore` | Adds ignores for `.github/` and `.agent-memory/`. |
```diff
 strategy:
   matrix:
     python-version: [3.11]
 steps:
   - uses: actions/checkout@v4
-  - name: Set up uv
-    uses: astral-sh/setup-uv@v5
   - name: Set up Python
     uses: actions/setup-python@v4
     with:
-      python-version: "3.12"
+      python-version: ${{ matrix.python-version }}
```
CI is set to Python 3.11, but the project declares `requires-python = ">=3.12"` (and `.python-version` is 3.12). This will cause installs/tests to fail or diverge from the supported runtime. Update the matrix to 3.12 (and/or add 3.12 alongside 3.11 only if the project truly supports both).
```diff
 - name: Run tests
-  env:
-    DATABASE_URL: "sqlite+aiosqlite:///:memory:"
-    TELEGRAM_BOT_TOKEN: "test-token"
-    ADMIN_GROUP_ID: "-1001234567890"
-    OPENAI_API_KEY: "test-key"
-    ANTHROPIC_API_KEY: "test-key"
-  run: uv run pytest
+  run: uv run pytest -q
```
The previous workflow exported the env vars that `SETTINGS` needs at import time (`DATABASE_URL`, `TELEGRAM_BOT_TOKEN`, `ADMIN_GROUP_ID`, `OPENAI_API_KEY`, etc.). Those were removed, but `src/mitko/config.py` loads `SETTINGS` at import time and requires these variables, so tests/lint/typecheck will fail in CI unless you set them again (or change config loading for tests).
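The failure mode is easy to reproduce with a minimal stand-in; the class below is illustrative only, not the project's actual config code:

```python
class Settings:
    """Minimal stand-in for a config object validated eagerly at construction."""

    REQUIRED = ("DATABASE_URL", "TELEGRAM_BOT_TOKEN", "ADMIN_GROUP_ID", "OPENAI_API_KEY")

    def __init__(self, env: dict[str, str]) -> None:
        # Every required variable is read up front; one missing variable
        # aborts construction entirely.
        missing = [key for key in self.REQUIRED if key not in env]
        if missing:
            raise RuntimeError(f"missing required env vars: {missing}")
        self.database_url = env["DATABASE_URL"]

# A module-level `SETTINGS = Settings(dict(os.environ))` means that merely
# importing the module fails in CI when these vars are not exported --
# which is why lint and typecheck jobs break too, not just the test run.
```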
```python
class CVParserService:
    def __init__(self, session: AsyncSession) -> None:
        self.session = session
```
CVParserService stores an AsyncSession but extract_text() never uses it. If the session isn’t needed, drop it from the constructor (or make extract_text a @staticmethod/module function) to avoid implying DB side effects and simplify usage/testing.
Suggested change:

```diff
-        self.session = session
+        # The session is accepted for interface compatibility but not stored,
+        # as this service currently performs only pure PDF text extraction.
+        pass
```
```python
revision: str = 'c917afd58f41'
down_revision: str | None = '6991d7de6488'
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None


def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column('users', sa.Column('cv_text', sa.Text(), nullable=True))
    op.add_column(
        'users', sa.Column('cv_uploaded_at', sa.DateTime(timezone=True), nullable=True)
    )
    # ### end Alembic commands ###


def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_column('users', 'cv_uploaded_at')
    op.drop_column('users', 'cv_text')
```
This migration uses single quotes for revision/down_revision and table/column names, while the surrounding migrations use double quotes. For consistency (and to minimize noisy diffs in future autogenerated migrations), align quoting/style with the existing convention in alembic/versions/*.py.
Suggested change:

```diff
-revision: str = 'c917afd58f41'
-down_revision: str | None = '6991d7de6488'
+revision: str = "c917afd58f41"
+down_revision: str | None = "6991d7de6488"
 branch_labels: str | Sequence[str] | None = None
 depends_on: str | Sequence[str] | None = None


 def upgrade() -> None:
     # ### commands auto generated by Alembic - please adjust! ###
-    op.add_column('users', sa.Column('cv_text', sa.Text(), nullable=True))
+    op.add_column("users", sa.Column("cv_text", sa.Text(), nullable=True))
     op.add_column(
-        'users', sa.Column('cv_uploaded_at', sa.DateTime(timezone=True), nullable=True)
+        "users", sa.Column("cv_uploaded_at", sa.DateTime(timezone=True), nullable=True)
     )
     # ### end Alembic commands ###


 def downgrade() -> None:
     # ### commands auto generated by Alembic - please adjust! ###
-    op.drop_column('users', 'cv_uploaded_at')
-    op.drop_column('users', 'cv_text')
+    op.drop_column("users", "cv_uploaded_at")
+    op.drop_column("users", "cv_text")
```
```diff
 - name: Install dependencies
   run: |
     python -m pip install --upgrade pip
     pip install ".[dev]"

-- name: Lint code
-  run: uv run ruff check src/ tests/
+- name: Lint (ruff)
+  run: uv run ruff check src/ tests/ || true

-- name: Type check
-  run: uv run pyright
+- name: Type check (pyright)
+  run: uv run pyright || true

 - name: Run tests
-  env:
-    DATABASE_URL: "sqlite+aiosqlite:///:memory:"
-    TELEGRAM_BOT_TOKEN: "test-token"
-    ADMIN_GROUP_ID: "-1001234567890"
-    OPENAI_API_KEY: "test-key"
-    ANTHROPIC_API_KEY: "test-key"
-  run: uv run pytest
+  run: uv run pytest -q
```
The workflow installs dependencies with pip, but all subsequent commands use `uv run ...` even though uv is no longer installed (the previous `setup-uv` step was removed). Either restore uv installation (e.g., `astral-sh/setup-uv`) or run ruff, pyright, and pytest directly without `uv run`.
```python
_openai_model = None
_openai_responses_model = None
_anthropic_model = None
```
These module-level caches are untyped (_openai_model = None, etc.) but later assigned model instances. With pyright in strict mode this typically produces type errors (assigning OpenAIChatModel into a variable inferred as NoneType). Add explicit annotations like OpenAIChatModel | None (and same for the others) to keep type checking clean.
Suggested change:

```diff
-_openai_model = None
-_openai_responses_model = None
-_anthropic_model = None
+_openai_model: OpenAIChatModel | None = None
+_openai_responses_model: OpenAIResponsesModel | None = None
+_anthropic_model: AnthropicModel | None = None
```
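An alternative worth considering: `functools.lru_cache` gives the same lazy-once semantics with no module-level `None` caches to annotate, so the checker sees a precise return type. A sketch, with a stub class standing in for the real model type:

```python
from functools import lru_cache


class OpenAIChatModel:
    """Stub standing in for the project's real model class."""

    def __init__(self, name: str) -> None:
        self.name = name


@lru_cache(maxsize=1)
def get_openai_model() -> OpenAIChatModel:
    # Constructed on the first call only and cached thereafter; callers
    # never see an `OpenAIChatModel | None` that needs narrowing.
    return OpenAIChatModel("stub-model")
```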
```yaml
version: "3.8"
services:
  db:
    image: ankane/pgvector:latest
```
The compose file pins ankane/pgvector to latest, which can introduce breaking changes unexpectedly when the image updates. Prefer pinning to a specific version tag (or digest) so dev/test environments are reproducible.
Suggested change:

```diff
-    image: ankane/pgvector:latest
+    image: ankane/pgvector:pg16
```
```yaml
- name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    pip install ".[dev]"
```
CI installs `.[dev]` via pip, but test dependencies like `aiosqlite` appear to live under `[dependency-groups].dev` (uv-only) rather than `[project.optional-dependencies].dev`. Since `tests/conftest.py` uses `sqlite+aiosqlite`, pip-based CI may miss `aiosqlite` and fail. Either add those packages to the pip `dev` extra or switch CI back to `uv sync --extra dev` / dependency groups.
Suggested change:

```diff
-    pip install ".[dev]"
+    python -m pip install uv
+    uv sync --group dev
```
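For reference, the distinction the comment relies on looks roughly like this in `pyproject.toml` (the package lists here are illustrative, not the project's actual contents):

```toml
# Installed by `pip install ".[dev]"` and `uv sync --extra dev`:
[project.optional-dependencies]
dev = ["pytest", "ruff"]

# A PEP 735 dependency group: installed by `uv sync --group dev`,
# but NOT pulled in by `pip install ".[dev]"`:
[dependency-groups]
dev = ["aiosqlite", "pytest-asyncio"]
```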
```python
def pytest_configure(config: Any) -> None:
    """Initialize the test database at pytest startup."""
    global _engine, _connection
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    _engine, _connection = loop.run_until_complete(_init_db())


def pytest_sessionfinish(session: Any, exitstatus: int) -> None:
    """Clean up the database after all tests."""
    if _connection or _engine:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        if _connection:
            loop.run_until_complete(_connection.close())
        if _engine:
            loop.run_until_complete(_engine.dispose())
```
The DB engine/connection are created in pytest_configure() on a brand-new event loop and then reused in async tests that run under pytest-asyncio’s own loop. Async SQLAlchemy connections are bound to the loop they were created in, so this pattern is very likely to fail with “attached to a different event loop” errors. Prefer session-scoped async fixtures (and, if needed, a session-scoped event_loop fixture) to create/cleanup the engine/connection within the same loop pytest-asyncio uses.
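The failure mode can be reproduced with plain asyncio, using a toy stand-in for a loop-bound connection (real async DB drivers fail similarly, typically with an "attached to a different event loop" `RuntimeError`):

```python
import asyncio


class LoopBoundConnection:
    """Toy stand-in: remembers its creation loop, like async DB connections do."""

    def __init__(self) -> None:
        self._loop = asyncio.get_running_loop()

    async def execute(self) -> str:
        if asyncio.get_running_loop() is not self._loop:
            raise RuntimeError("attached to a different event loop")
        return "ok"


async def _create() -> LoopBoundConnection:
    return LoopBoundConnection()


# Create on loop A (as pytest_configure does on its fresh loop) ...
loop_a = asyncio.new_event_loop()
conn = loop_a.run_until_complete(_create())

# ... then use it on loop B (as a pytest-asyncio test would):
loop_b = asyncio.new_event_loop()
try:
    loop_b.run_until_complete(conn.execute())
    outcome = "ok"
except RuntimeError:
    outcome = "different loop"
loop_a.close()
loop_b.close()
```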
```diff
 @pytest.fixture(scope="session")
 def db_engine() -> AsyncEngine:
     """Provide the test database engine."""
     assert _engine is not None
     return _engine


 @pytest.fixture(scope="session")
-async def db_connection(
-    db_engine: AsyncEngine,
-) -> AsyncGenerator[AsyncConnection, None]:
-    """Session-scoped connection shared across all tests."""
-    async with db_engine.connect() as connection:
-        yield connection
+def db_connection() -> AsyncConnection:
+    """Provide the database connection for tests."""
+    assert _connection is not None
+    return _connection
```
db_connection() returns a session-scoped AsyncConnection that’s shared across all tests, but there’s no session-scoped event loop anymore (the previous event_loop fixture was removed). With asyncio_mode="auto", pytest-asyncio may create per-test loops, which makes sharing a single AsyncConnection unsafe. Consider restoring a session-scoped loop (and creating the engine/connection within it) or switching to per-test connections/sessions.
No description provided.