Skip to content

feat: add CV upload functionality with parsing and storage#1

Closed
ABIvan-Tech wants to merge 1 commit intomainfrom
feat/description-of-changes
Closed

feat: add CV upload functionality with parsing and storage#1
ABIvan-Tech wants to merge 1 commit intomainfrom
feat/description-of-changes

Conversation

@ABIvan-Tech
Copy link
Copy Markdown
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings March 4, 2026 13:38
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new “upload CV as PDF” flow to the bot, extracting text from uploaded PDFs and storing it on the user for use in subsequent chat generations.

Changes:

  • Add /upload_cv command + document handler to download a PDF, extract text, store it on User, and trigger a generation.
  • Introduce pypdf dependency and a small PDF text-extraction service.
  • Add DB schema support (user columns + Alembic migration) and update CI/dev tooling configs.

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
src/mitko/bot/handlers.py Adds /upload_cv and PDF document upload handling, rate limiting, persistence, and generation trigger.
src/mitko/services/cv_parser.py New service to extract text from PDF bytes using pypdf.
src/mitko/models/user.py Adds cv_text and cv_uploaded_at fields.
alembic/versions/041_add_cv_fields_to_users.py Migration adding the new user columns.
src/mitko/i18n/base.py Adds i18n dataclass structure for CV upload messages.
src/mitko/i18n/en.py / src/mitko/i18n/ru.py Adds English/Russian strings for the CV upload flow.
pyproject.toml Adds pypdf dependency and adjusts pytest config stanza placement.
uv.lock Locks pypdf dependency.
tests/conftest.py Reworks async DB fixture initialization and model imports for tests.
src/mitko/agents/config.py Switches model objects to lazy-initialized cached getters.
devops/docker-compose.pgvector.yml Adds a pgvector docker-compose file for local dev.
.github/workflows/ci.yml Reworks CI setup steps (Python/pip-based install, lint/type/test commands).
.gitignore Adds ignores for .github/ and .agent-memory/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +11 to +20
strategy:
matrix:
python-version: [3.11]
steps:
- uses: actions/checkout@v4

- name: Set up uv
uses: astral-sh/setup-uv@v5
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.12"
python-version: ${{ matrix.python-version }}
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is set to Python 3.11, but the project declares requires-python = ">=3.12" (and .python-version is 3.12). This will cause installs/tests to fail or diverge from supported runtime. Update the matrix to 3.12 (and/or add 3.12 alongside 3.11 only if the project truly supports it).

Copilot uses AI. Check for mistakes.
Comment on lines 41 to +42
- name: Run tests
env:
DATABASE_URL: "sqlite+aiosqlite:///:memory:"
TELEGRAM_BOT_TOKEN: "test-token"
ADMIN_GROUP_ID: "-1001234567890"
OPENAI_API_KEY: "test-key"
ANTHROPIC_API_KEY: "test-key"
run: uv run pytest
run: uv run pytest -q
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous workflow exported required env vars for SETTINGS to import (DATABASE_URL, TELEGRAM_BOT_TOKEN, ADMIN_GROUP_ID, OPENAI_API_KEY, etc.). Those were removed, but src/mitko/config.py loads SETTINGS at import time and requires these variables, so tests/lint/typecheck will fail in CI unless you set them again (or change config loading for tests).

Copilot uses AI. Check for mistakes.

class CVParserService:
def __init__(self, session: AsyncSession) -> None:
self.session = session
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CVParserService stores an AsyncSession but extract_text() never uses it. If the session isn’t needed, drop it from the constructor (or make extract_text a @staticmethod/module function) to avoid implying DB side effects and simplify usage/testing.

Suggested change
self.session = session
# The session is accepted for interface compatibility but not stored,
# as this service currently performs only pure PDF text extraction.
pass

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +34
revision: str = 'c917afd58f41'
down_revision: str | None = '6991d7de6488'
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None


def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.add_column('users', sa.Column('cv_text', sa.Text(), nullable=True))
op.add_column(
'users', sa.Column('cv_uploaded_at', sa.DateTime(timezone=True), nullable=True)
)
# ### end Alembic commands ###


def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_column('users', 'cv_uploaded_at')
op.drop_column('users', 'cv_text')
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration uses single quotes for revision/down_revision and table/column names, while the surrounding migrations use double quotes. For consistency (and to minimize noisy diffs in future autogenerated migrations), align quoting/style with the existing convention in alembic/versions/*.py.

Suggested change
revision: str = 'c917afd58f41'
down_revision: str | None = '6991d7de6488'
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.add_column('users', sa.Column('cv_text', sa.Text(), nullable=True))
op.add_column(
'users', sa.Column('cv_uploaded_at', sa.DateTime(timezone=True), nullable=True)
)
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_column('users', 'cv_uploaded_at')
op.drop_column('users', 'cv_text')
revision: str = "c917afd58f41"
down_revision: str | None = "6991d7de6488"
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.add_column("users", sa.Column("cv_text", sa.Text(), nullable=True))
op.add_column(
"users", sa.Column("cv_uploaded_at", sa.DateTime(timezone=True), nullable=True)
)
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_column("users", "cv_uploaded_at")
op.drop_column("users", "cv_text")

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +43
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install ".[dev]"

- name: Lint code
run: uv run ruff check src/ tests/
- name: Lint (ruff)
run: uv run ruff check src/ tests/ || true

- name: Type check
run: uv run pyright
- name: Type check (pyright)
run: uv run pyright || true

- name: Run tests
env:
DATABASE_URL: "sqlite+aiosqlite:///:memory:"
TELEGRAM_BOT_TOKEN: "test-token"
ADMIN_GROUP_ID: "-1001234567890"
OPENAI_API_KEY: "test-key"
ANTHROPIC_API_KEY: "test-key"
run: uv run pytest
run: uv run pytest -q

Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow installs dependencies with pip, but all subsequent commands use uv run ... even though uv is no longer installed (the previous setup-uv step was removed). Either restore uv installation (e.g., astral-sh/setup-uv) or run ruff, pyright, and pytest directly without uv run.

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +11
_openai_model = None
_openai_responses_model = None
_anthropic_model = None
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These module-level caches are untyped (_openai_model = None, etc.) but later assigned model instances. With pyright in strict mode this typically produces type errors (assigning OpenAIChatModel into a variable inferred as NoneType). Add explicit annotations like OpenAIChatModel | None (and same for the others) to keep type checking clean.

Suggested change
_openai_model = None
_openai_responses_model = None
_anthropic_model = None
_openai_model: OpenAIChatModel | None = None
_openai_responses_model: OpenAIResponsesModel | None = None
_anthropic_model: AnthropicModel | None = None

Copilot uses AI. Check for mistakes.
version: "3.8"
services:
db:
image: ankane/pgvector:latest
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compose file pins ankane/pgvector to latest, which can introduce breaking changes unexpectedly when the image updates. Prefer pinning to a specific version tag (or digest) so dev/test environments are reproducible.

Suggested change
image: ankane/pgvector:latest
image: ankane/pgvector:pg16

Copilot uses AI. Check for mistakes.
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install ".[dev]"
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI installs .[dev] via pip, but test dependencies like aiosqlite appear to live under [dependency-groups].dev (uv-only) rather than [project.optional-dependencies].dev. Since tests/conftest.py uses sqlite+aiosqlite, pip-based CI may miss aiosqlite and fail. Either add those packages to the pip dev extra or switch CI back to uv sync --extra dev / dependency groups.

Suggested change
pip install ".[dev]"
python -m pip install uv
uv sync --group dev

Copilot uses AI. Check for mistakes.
Comment on lines +61 to +77
def pytest_configure(config: Any) -> None:
"""Initialize the test database at pytest startup."""
global _engine, _connection
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
_engine, _connection = loop.run_until_complete(_init_db())


def pytest_sessionfinish(session: Any, exitstatus: int) -> None:
"""Clean up the database after all tests."""
if _connection or _engine:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
if _connection:
loop.run_until_complete(_connection.close())
if _engine:
loop.run_until_complete(_engine.dispose())
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DB engine/connection are created in pytest_configure() on a brand-new event loop and then reused in async tests that run under pytest-asyncio’s own loop. Async SQLAlchemy connections are bound to the loop they were created in, so this pattern is very likely to fail with “attached to a different event loop” errors. Prefer session-scoped async fixtures (and, if needed, a session-scoped event_loop fixture) to create/cleanup the engine/connection within the same loop pytest-asyncio uses.

Copilot uses AI. Check for mistakes.
Comment on lines +80 to +91
@pytest.fixture(scope="session")
def db_engine() -> AsyncEngine:
"""Provide the test database engine."""
assert _engine is not None
return _engine


@pytest.fixture(scope="session")
async def db_connection(
db_engine: AsyncEngine,
) -> AsyncGenerator[AsyncConnection, None]:
"""Session-scoped connection shared across all tests."""
async with db_engine.connect() as connection:
yield connection
def db_connection() -> AsyncConnection:
"""Provide the database connection for tests."""
assert _connection is not None
return _connection
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db_connection() returns a session-scoped AsyncConnection that’s shared across all tests, but there’s no session-scoped event loop anymore (the previous event_loop fixture was removed). With asyncio_mode="auto", pytest-asyncio may create per-test loops, which makes sharing a single AsyncConnection unsafe. Consider restoring a session-scoped loop (and creating the engine/connection within it) or switching to per-test connections/sessions.

Copilot uses AI. Check for mistakes.
@ABIvan-Tech ABIvan-Tech closed this Mar 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants