Python web scraper #4
Open
FilippoMarletta wants to merge 70 commits into UNICT-DMI:master from FilippoMarletta:py_scraper
+3,073
−1
19d1b5c
chore: setup devcontainer and project structure
FilippoMarletta 4088a4b
feat: implement get_departments and models
FilippoMarletta f783e35
chore: add .gitattributes to enforce LF line endings
FilippoMarletta 171de79
build: add pytest-mock and pytest-cov to requirements.txt
FilippoMarletta a4d2683
feat: add get_courses function and update models for course data hand…
FilippoMarletta 3a8b6b3
feat: add tests for get_departments and get_courses functions
FilippoMarletta 288d224
feat: implement parse_course_name function and add corresponding tests
FilippoMarletta 54abfe1
chore: update .gitignore to include coverage and cache files
FilippoMarletta 519220d
build: update vscode extensions
FilippoMarletta db67ba1
fix: prevent pylance from crashing
FilippoMarletta 47f19f9
feat: add tests for get_activities and parse_insegnamento_data functions
FilippoMarletta a5ea68f
feat: add get_activities function, insegnamento dataclass and parsing…
FilippoMarletta 50068a3
feat: add SchedaOpis dataclass
FilippoMarletta f7d7893
feat: add function parse_scheda_opis and corresponding tests
FilippoMarletta 1c85d9c
feat: add get_questions function and update related models and transf…
FilippoMarletta de2e526
feat: implement scraper functionality with logging and data processing
FilippoMarletta 20696ad
refactor: use requests.Session to improve scraping speed
FilippoMarletta d73c5e1
feat: enhance API client with logging and timeout management
FilippoMarletta b08d43f
feat: extend SchedaOpis model with additional and previously missing fields
FilippoMarletta 99e4c04
build: add mysql-connector and python-dotenv dependencies and update c…
FilippoMarletta 29f402a
fix: update parse_course_name regex and rewrite parse_scheda_opis to …
FilippoMarletta de010a4
feat: implement database connection and CRUD operations for departmen…
FilippoMarletta 5afaabc
feat: enhance scraper functionality with concurrent processing and im…
FilippoMarletta b6290fb
fix: update parse_course_name regex for improved matching and add co…
FilippoMarletta 47502a5
fix: ensure professor names default to empty string if not present in…
FilippoMarletta 0092fd2
feat: add random sampling of activities and departments in debug mode…
FilippoMarletta 4778821
fix: update parse_course_name regex to support 'c.u.' and 'cu' format…
FilippoMarletta 3a85d1c
fix: add previously missing nome_modulo field to Insegnamento model a…
FilippoMarletta 6c833c6
refactor: streamline parse_scheda_opis_data function by removing unus…
FilippoMarletta 31b9759
fix: update mock API calls to use session.post and adjust test data f…
FilippoMarletta 0ff9519
fix: temporary solution to handle missing or invalid activity codes
FilippoMarletta e028436
chore: load DEBUG_MODE from .env for better configuration management
FilippoMarletta 689c257
feat: add additional case for alphanumeric activityCode
FilippoMarletta 5264dfc
feat: add unit tests for database.py functions
FilippoMarletta 60f3987
feat: enhance error handling in database insertion functions and add …
FilippoMarletta 0758720
chore: update parse_course_name to accept optional full_name and enha…
FilippoMarletta d98d523
build: update devcontainer settings for improved Python development e…
FilippoMarletta 349b1e1
fix: update process_activity and process_course function signatures t…
FilippoMarletta 35b8edf
fix: add missing space in postCreateCommand
FilippoMarletta 68ec559
feat: add CI workflow for Python testing
FilippoMarletta c92f995
fix: normalize case for "Non Frequentanti" in parse_scheda_opis_data …
FilippoMarletta 07bb0fc
fix: update mock_opis_json to include "Studenti Non Frequentanti" dat…
FilippoMarletta d221b1c
feat: add tests for database connection failure and handling inserts …
FilippoMarletta 69f14ce
chore: update checkout action version to v5 in CI workflow
FilippoMarletta 1a77886
chore: update checkout action version to v6 in CI workflow
FilippoMarletta 2dd88e1
chore: update setup-python action version to v6 in CI workflow
FilippoMarletta 90e8af1
fix: rename test execution step for clarity in CI workflow
FilippoMarletta d2822b9
ci: update CI workflow for linting, type checking and testing
FilippoMarletta d218dda
style: linting with black
FilippoMarletta 3e86721
ci: set minimum coverage threshold to 80%
FilippoMarletta e9396e3
docs: add README.md
FilippoMarletta dcd244e
chore: add .env.example
FilippoMarletta 89a476a
docs: update environment variables section
FilippoMarletta 07996c1
fix: add missing field in Insegnamento
FilippoMarletta d74f15f
feat: add assign_channels function
FilippoMarletta fd43c8b
test: add tests for assign_channels function and adapt previous tests…
FilippoMarletta c85fefe
fix: more general regex for parse_course_name
FilippoMarletta cb1a547
tests: add 2 test cases for test_parse_course_name
FilippoMarletta 8d8c150
feat: enriches debug mode with more customization
FilippoMarletta d2e0669
style: black linting
FilippoMarletta f6291b2
ci: add pylint, flake8, mypy and isort to CI pipeline and split require…
FilippoMarletta d565ebc
feat: add .pylintrc
FilippoMarletta cf7d43a
chore: add Makefile for quick linting checks
FilippoMarletta 10ababa
chore: update devcontainer to python 3.14 and postCreateCommand
FilippoMarletta 928bb93
style: fix some linting issues across multiple files
FilippoMarletta 3736359
refactor: extract _process_cluster_data and _process_graph_pie from p…
FilippoMarletta 5266bfa
Refactor: fix strict linting and typing issues
FilippoMarletta b519687
ci: update isort command to run with black profile
FilippoMarletta 03cea9c
refactor: add missing return types and make insert_schede_opis return…
FilippoMarletta 653b295
Merge branch 'master' into py_scraper
FilippoMarletta
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| name: CI | ||
| | ||
| on: | ||
|   push: | ||
|     branches: [ main, master, py_scraper ] | ||
|   pull_request: | ||
|     branches: [ main, master ] | ||
| | ||
| jobs: | ||
|   test: | ||
|     runs-on: ubuntu-latest | ||
| | ||
|     defaults: | ||
|       run: | ||
|         working-directory: ./python_scraper | ||
| | ||
|     steps: | ||
|       - name: Check out the code | ||
|         uses: actions/checkout@v6 | ||
| | ||
|       - name: Setup Python 3.14 | ||
|         uses: actions/setup-python@v6 | ||
|         with: | ||
|           python-version: "3.14" | ||
|           cache: "pip" | ||
| | ||
|       - name: Install dependencies | ||
|         run: | | ||
|           python -m pip install --upgrade pip | ||
|           if [ -f requirements.txt ]; then pip install -r requirements.txt; fi | ||
|           if [ -f requirements_dev.txt ]; then pip install -r requirements_dev.txt; fi | ||
|       - name: Type checking | ||
|         run: | | ||
|           pyright src tests | ||
|       - name: Run tests with coverage | ||
|         run: | | ||
|           pytest --cov=src --cov-report=term-missing --cov-fail-under=80 | ||
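The last step of the workflow fails the job when total coverage drops below 80%. As a rough illustration of that gate, here is a minimal Python sketch. The function names `coverage_percent` and `passes_gate` are hypothetical and not part of the PR, and the arithmetic is a simplified statement-count ratio, not the exact computation performed by pytest-cov.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Return the percentage of covered statements.

    Assumption of this sketch: an empty code base counts as fully covered.
    """
    if total == 0:
        return 100.0
    return 100.0 * covered / total


def passes_gate(covered: int, total: int, threshold: float = 80.0) -> bool:
    """Return True when coverage meets or exceeds the threshold."""
    return coverage_percent(covered, total) >= threshold


if __name__ == "__main__":
    # 412 of 500 statements covered: above the 80% threshold.
    print(passes_gate(412, 500))
    # 390 of 500 statements covered: below the threshold, the job would fail.
    print(passes_gate(390, 500))
```

The comparison is inclusive in this sketch, so a code base sitting at the threshold passes.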
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| { | ||
| "name": "OPIS Python Scraper", | ||
| "image": "mcr.microsoft.com/devcontainers/python:3.14", | ||
| "customizations": { | ||
| "vscode": { | ||
| "settings": { | ||
| "python.defaultInterpreterPath": "/usr/local/bin/python", | ||
| "python.languageServer": "Pylance", | ||
| "python.analysis.nodeExecutable": "auto", | ||
| "python.analysis.typeCheckingMode": "standard", | ||
| "python.analysis.autoImportCompletions": true, | ||
| "editor.defaultFormatter": "ms-python.black-formatter", | ||
| "editor.formatOnSave": true, | ||
| "python.formatting.provider": "none", | ||
| "black-formatter.importStrategy": "fromEnvironment", | ||
| "black-formatter.path": [ | ||
| "black" | ||
| ], | ||
| "python.analysis.exclude": [ | ||
| "**/__pycache__", | ||
| "**/.venv", | ||
| "**/node_modules", | ||
| "**/dist", | ||
| "**/build" | ||
| ], | ||
| "python.testing.pytestEnabled": true, | ||
| "python.testing.pytestArgs": [ | ||
| "." | ||
| ] | ||
| }, | ||
| "extensions": [ | ||
| "ms-python.python", | ||
| "ms-python.vscode-pylance", | ||
| "ms-python.debugpy", | ||
| "ms-python.black-formatter", | ||
| "njpwerner.autodocstring", | ||
| "KevinRose.vsc-python-indent", | ||
| "GitHub.copilot-chat" | ||
| ] | ||
| } | ||
| }, | ||
| "postCreateCommand": "sudo pip install --upgrade pip --root-user-action=ignore && pip install -r requirements.txt -r requirements_dev.txt", | ||
| "remoteUser": "vscode" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| DB_HOST=127.0.0.1 | ||
| DB_PORT=3306 | ||
| DB_DATABASE=opis_manager | ||
| DB_USERNAME=root | ||
| DB_PASSWORD= | ||
| | ||
| DEBUG_MODE=False | ||
| DEBUG_NUM_ACTIVITIES=5 | ||
| DEBUG_NUM_COURSES=1 | ||
| DEBUG_NUM_DEPARTMENTS=1 |
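To show how these variables might be consumed, here is a minimal sketch of a typed settings loader. It is not taken from the PR: the `Settings` dataclass and the `load_settings` and `_as_bool` helpers are hypothetical names, and the defaults simply mirror the example file above. The commit list mentions python-dotenv, so the project presumably populates `os.environ` from `.env` before reading values. This sketch only reads from a mapping so that it runs with the standard library alone.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(raw: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:  # hypothetical container, not part of the PR
    db_host: str
    db_port: int
    db_database: str
    db_username: str
    db_password: str
    debug_mode: bool
    debug_num_activities: int
    debug_num_courses: int
    debug_num_departments: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default)."""
    source = os.environ if env is None else env

    def get(key: str, default: str) -> str:
        # Strip whitespace so that values such as " 3306" still parse.
        return source.get(key, default).strip()

    return Settings(
        db_host=get("DB_HOST", "127.0.0.1"),
        db_port=int(get("DB_PORT", "3306")),
        db_database=get("DB_DATABASE", "opis_manager"),
        db_username=get("DB_USERNAME", "root"),
        db_password=get("DB_PASSWORD", ""),
        debug_mode=_as_bool(get("DEBUG_MODE", "False")),
        debug_num_activities=int(get("DEBUG_NUM_ACTIVITIES", "5")),
        debug_num_courses=int(get("DEBUG_NUM_COURSES", "1")),
        debug_num_departments=int(get("DEBUG_NUM_DEPARTMENTS", "1")),
    )


if __name__ == "__main__":
    settings = load_settings({"DB_PORT": " 3306", "DEBUG_MODE": "True"})
    print(settings.db_port, settings.debug_mode)
```

Parsing the values once at startup keeps type conversion errors (for example a non-numeric `DB_PORT`) close to the configuration, away from the scraping code.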
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| # Always use Linux-style (LF) line endings when committing and checking out. | ||
| * text=auto eol=lf |
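With `text=auto eol=lf`, Git normalizes files it detects as text to LF on commit and checks them out with LF. A quick way to spot files in a working tree that still contain CRLF is sketched below. The helper names `has_crlf` and `files_with_crlf` are hypothetical and not part of the PR.

```python
from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """Return True if the byte string contains a Windows-style line ending."""
    return b"\r\n" in data


def files_with_crlf(root: str, pattern: str = "*.py") -> list[str]:
    """List files under root matching pattern that contain CRLF."""
    return sorted(
        str(path)
        for path in Path(root).rglob(pattern)
        if path.is_file() and has_crlf(path.read_bytes())
    )


if __name__ == "__main__":
    # Scan the current directory; an empty list means no CRLF was found.
    print(files_with_crlf("."))
```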
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| # Byte-compiled / optimized / DLL files | ||
| __pycache__/ | ||
| *.py[cod] | ||
| *$py.class | ||
| | ||
| # C extensions | ||
| *.so | ||
| | ||
| # Distribution / packaging | ||
| .Python | ||
| build/ | ||
| develop-eggs/ | ||
| dist/ | ||
| downloads/ | ||
| eggs/ | ||
| .eggs/ | ||
| lib/ | ||
| lib64/ | ||
| parts/ | ||
| sdist/ | ||
| var/ | ||
| wheels/ | ||
| *.egg-info/ | ||
| .installed.cfg | ||
| *.egg | ||
| | ||
| # Virtual environments | ||
| venv/ | ||
| env/ | ||
| ENV/ | ||
| | ||
| .vscode/ | ||
| | ||
| .coverage | ||
| .pytest_cache/ | ||
| .mypy_cache/ | ||
| calc_cov/ | ||
| | ||
| .env |