A comprehensive Django REST API for managing job scrapers, companies, and job listings with real-time search capabilities.
Pe Viitor is a robust web service designed to automate job data collection from various websites and provide powerful search and filtering capabilities. The platform allows you to manage job scrapers, track companies, and serve job data through a modern REST API with real-time search powered by Apache Solr.
- Create, update, and delete company profiles
- Track company job statistics and historical data
- Link companies to their data sources and scrapers
- Company-specific job clearing and synchronization
- Add jobs programmatically via scrapers or manual input
- Edit job details including location, remote options, and publication status
- Publish/unpublish jobs with automatic Solr search index updates
- Filter jobs by company, location, remote work options, and publication status
- Advanced job search with infinite scroll pagination
- Support for Python and JavaScript scrapers
- Automated dependency installation (requirements.txt, package.json)
- Git repository integration for scraper code management
- Containerized scraper execution with Docker
- Scraper testing and validation framework
- Automatic updates from Git repositories
- Apache Solr-powered job search engine
- Fast, full-text search across job titles, companies, and locations
- Advanced filtering by multiple criteria
- Optimized for high-performance queries
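Solr's standard `/select` endpoint takes a full-text query (`q`) plus filter queries (`fq`) for the criteria listed above. As a rough sketch of how such a request could be composed — the core name (`jobs`) and field names here are illustrative assumptions, not taken from this project:

```python
from urllib.parse import urlencode

def build_solr_query(base_url, core, text, filters=None, start=0, rows=20):
    """Compose a Solr /select URL: q does scored full-text search, fq narrows results."""
    params = [("q", text), ("start", start), ("rows", rows), ("wt", "json")]
    for field, value in (filters or {}).items():
        # fq clauses filter without affecting relevance scoring, and Solr caches them
        params.append(("fq", f"{field}:{value}"))
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"

url = build_solr_query(
    "http://localhost:8983", "jobs", "python developer",
    filters={"city": "Bucharest", "remote": "true"},
)
```

Fetching `url` with any HTTP client returns a JSON document list, which is the shape the job search endpoints can paginate over.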
- Custom user authentication with email-based login
- JWT token authentication for API access
- Role-based permissions (superuser, company-specific access)
- User-company and user-scraper associations
- Real-time notifications via WebSockets (Django Channels)
- Newsletter subscription management
- Mobile API endpoints
- City/location management (orase module; "orase" is Romanian for "cities")
- Background task scheduling with APScheduler
- Backend: Django 4.2, Django REST Framework
- Database: MySQL/PostgreSQL with PyMySQL connector
- Search Engine: Apache Solr for real-time job search
- Cache/Message Broker: Redis for caching and WebSocket support
- Real-time Features: Django Channels with WebSocket support
- Authentication: JWT tokens with Django REST Framework SimpleJWT
- Task Scheduling: APScheduler for background jobs
- Containerization: Docker for scraper execution
- Image Processing: Pillow for company logos and images
- Python 3.9+
- MySQL or PostgreSQL database
- Apache Solr instance
- Redis server (for real-time features)
- Git
git clone https://github.com/peviitor-ro/scraper_Api.git
cd scraper_Api/scraper_Api

sudo apt update
sudo apt install -y python3-django python3-djangorestframework python3-pymysql \
python3-dotenv python3-requests python3-pil python3-redis \
python3-channels python3-channels-redis

pip install -r requirements.txt

Create a .env file in the project root:
# Database Configuration
DEBUG=True
DB_NAME=your_database_name
DB_USER=your_database_user
DB_PASSWORD=your_database_password
DB_HOST=localhost
DB_PORT=3306
# Solr Configuration
DATABASE_SOLR=http://localhost:8983
DATABASE_SOLR_USERNAME=your_solr_username
DATABASE_SOLR_PASSWORD=your_solr_password
# Email Configuration (for notifications)
EMAIL_HOST_USER=your_email@domain.com
EMAIL_HOST_PASSWORD=your_email_password
# Frontend URL (for CORS)
FRONTEND_URL=http://localhost:3000

# Run migrations
python manage.py migrate
# Create a superuser account
python manage.py createsuperuser

# Start the Django development server
python manage.py runserver
# The API will be available at http://localhost:8000

The API uses JWT token authentication. First, obtain a token:
curl -X POST http://localhost:8000/get_token \
-H "Content-Type: application/json" \
-d '{"email": "your_email@domain.com", "password": "your_password"}'

Use the token in subsequent requests:
curl -H "Authorization: Bearer your_jwt_token" http://localhost:8000/endpoint

curl -H "Authorization: Bearer your_jwt_token" \
"http://localhost:8000/companies/"

curl -X POST http://localhost:8000/companies/add/ \
-H "Authorization: Bearer your_jwt_token" \
-H "Content-Type: application/json" \
-d '{
"company": "Tech Corp",
"scname": "TechCorp",
"website": "https://techcorp.com",
"description": "Leading technology company"
}'

curl -X PUT http://localhost:8000/companies/update/ \
-H "Authorization: Bearer your_jwt_token" \
-H "Content-Type: application/json" \
-d '{
"id": 1,
"company": "Updated Tech Corp",
"website": "https://newtechcorp.com"
}'

# Get all published jobs
curl -H "Authorization: Bearer your_jwt_token" \
"http://localhost:8000/jobs/get/"
# Filter jobs by company
curl -H "Authorization: Bearer your_jwt_token" \
"http://localhost:8000/jobs/get/?company=1"
# Filter jobs by city and remote options
curl -H "Authorization: Bearer your_jwt_token" \
"http://localhost:8000/jobs/get/?city=Bucharest&remote=true"
# Pagination and sorting
curl -H "Authorization: Bearer your_jwt_token" \
"http://localhost:8000/jobs/get/?page=1&limit=20&sort=created_date"

curl -X POST http://localhost:8000/jobs/add/ \
-H "Authorization: Bearer your_jwt_token" \
-H "Content-Type: application/json" \
-d '{
"company": 1,
"job_title": "Python Developer",
"job_link": "https://company.com/jobs/python-dev",
"country": "Romania",
"city": "Bucharest, Cluj-Napoca",
"county": "Bucharest, Cluj",
"remote": "hybrid, full-remote"
}'

# Publish job (makes it searchable)
curl -X POST http://localhost:8000/jobs/publish/ \
-H "Authorization: Bearer your_jwt_token" \
-H "Content-Type: application/json" \
-d '{"job_id": 123}'

# Sync all jobs for a company with Solr search index
curl -X POST http://localhost:8000/jobs/sync/ \
-H "Authorization: Bearer your_jwt_token" \
-H "Content-Type: application/json" \
-d '{"company": 1}'

curl -X POST http://localhost:8000/scraper/add/ \
-H "Authorization: Bearer your_jwt_token" \
-H "Content-Type: application/json" \
-d '{"url": "https://github.com/username/job-scraper.git"}'

# List files in a scraper repository
curl -H "Authorization: Bearer your_jwt_token" \
"http://localhost:8000/scraper/your-repo-name/"

# Run a specific scraper file
curl -X POST http://localhost:8000/scraper/your-repo-name/ \
-H "Authorization: Bearer your_jwt_token" \
-H "Content-Type: application/json" \
-d '{"file": "scraper.py"}'
# Force run a scraper (ignore recent runs)
curl -X POST http://localhost:8000/scraper/your-repo-name/ \
-H "Authorization: Bearer your_jwt_token" \
-H "Content-Type: application/json" \
-d '{"file": "scraper.py", "force": "true"}'

# Pull latest changes from Git
curl -X POST http://localhost:8000/scraper/your-repo-name/ \
-H "Authorization: Bearer your_jwt_token" \
-H "Content-Type: application/json" \
-d '{"update": "true"}'

import requests
# Configuration
API_BASE = "http://localhost:8000"
TOKEN = "your_jwt_token"
HEADERS = {
"Authorization": f"Bearer {TOKEN}",
"Content-Type": "application/json"
}
# Get companies
response = requests.get(f"{API_BASE}/companies/", headers=HEADERS)
companies = response.json()
# Add a new job
job_data = {
"company": 1,
"job_title": "Senior Django Developer",
"job_link": "https://example.com/job/123",
"country": "Romania",
"city": "Bucharest",
"county": "Bucharest",
"remote": "hybrid"
}
response = requests.post(f"{API_BASE}/jobs/add/", json=job_data, headers=HEADERS)
# Search jobs with filters
params = {
"city": "Bucharest",
"remote": "true",
"company": 1,
"page": 1,
"limit": 20
}
response = requests.get(f"{API_BASE}/jobs/get/", params=params, headers=HEADERS)
jobs = response.json()
# Run a scraper
scraper_data = {"file": "companies/example_scraper.py"}
response = requests.post(
f"{API_BASE}/scraper/example-repo/",
json=scraper_data,
headers=HEADERS
)

scraper_Api/
├── scraper_Api/                  # Main Django project
│   ├── scraper_Api/              # Project settings and configuration
│   │   ├── settings.py           # Main settings
│   │   ├── test_settings.py      # Test-specific settings
│   │   ├── urls.py               # URL routing
│   │   └── wsgi.py/asgi.py       # WSGI/ASGI configuration
│   │
│   ├── company/                  # Company management app
│   │   ├── models.py             # Company, Source, DataSet models
│   │   ├── views.py              # Company CRUD operations
│   │   ├── serializers.py        # API serializers
│   │   └── urls.py               # Company endpoints
│   │
│   ├── jobs/                     # Job management app
│   │   ├── models.py             # Job model with Solr integration
│   │   ├── views.py              # Job CRUD, search, publish operations
│   │   ├── serializer.py         # Job serializers
│   │   └── urls.py               # Job endpoints
│   │
│   ├── scraper/                  # Scraper management app
│   │   ├── models.py             # Scraper model
│   │   ├── views.py              # Scraper execution and management
│   │   ├── utils/                # Scraper utilities
│   │   │   └── scraper.py        # Core scraper logic
│   │   └── urls.py               # Scraper endpoints
│   │
│   ├── users/                    # User management app
│   │   ├── models.py             # CustomUser model
│   │   ├── managers.py           # Custom user manager
│   │   ├── views.py              # Authentication endpoints
│   │   ├── middleware.py         # Rate limiting middleware
│   │   └── urls.py               # User/auth endpoints
│   │
│   ├── newsletter/               # Newsletter management
│   ├── mobile/                   # Mobile-specific endpoints
│   ├── notifications/            # Real-time notifications
│   ├── orase/                    # City/location management
│   ├── utils/                    # Shared utilities
│   │   └── pagination.py         # Custom pagination
│   │
│   ├── static/                   # Static files
│   ├── templates/                # Django templates
│   └── manage.py                 # Django management script
│
├── requirements.txt              # Python dependencies
├── LICENSE                       # MIT License
└── README.md                     # This file
- Company: Main company entity with name, website, description
- Source: Data source tracking for companies
- DataSet: Historical job count data for companies
- Job: Core job entity with title, link, location, company relationship
- Integrates with Solr search engine for real-time search
- Supports publish/unpublish workflow
- Automatic ID generation using MD5 hash of job link
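The README states that job IDs are derived from an MD5 hash of the job link. A minimal sketch of that scheme (whether the project uses the full hex digest or a truncation is an assumption here):

```python
import hashlib

def job_id(job_link: str) -> str:
    """Deterministic ID: the same link always maps to the same document ID,
    so re-scraping an already-known job updates rather than duplicates it."""
    return hashlib.md5(job_link.encode("utf-8")).hexdigest()

jid = job_id("https://company.com/jobs/python-dev")
```

Deriving the ID from the link means deduplication falls out of the ID scheme itself: no separate uniqueness check is needed when a scraper re-submits a job it has seen before.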
- CustomUser: Extends Django's AbstractBaseUser
- Email-based authentication
- Many-to-many relationships with companies and scrapers
- Automatic superuser permissions for all companies/scrapers
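Because login is email-based, consistent email normalization matters: Django's `BaseUserManager.normalize_email` lowercases only the domain part, so `user@Example.COM` and `user@example.com` resolve to the same account. A standalone re-implementation that mirrors that behaviour for illustration (the project presumably relies on Django's built-in):

```python
def normalize_email(email: str) -> str:
    """Lowercase the domain part of an email address, mirroring Django's
    BaseUserManager.normalize_email. The local part is left untouched,
    since it is case-sensitive per RFC 5321."""
    email = email or ""
    try:
        local, domain = email.strip().rsplit("@", 1)
    except ValueError:
        # No "@" present: return the stripped input unchanged
        return email.strip()
    return f"{local}@{domain.lower()}"
```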
- Scraper: Tracks scraper repositories and metadata
- Supports Python, JavaScript, and JMeter scripts
- Linked to users for access control
# Install development dependencies
pip install -r requirements.txt
# Set up pre-commit hooks (optional)
pre-commit install
# Run with debug mode
export DEBUG=True
python manage.py runserver
# Access Django admin
http://localhost:8000/admin/

# Build and run with Docker Compose
docker-compose up --build
# Run scrapers in containers
docker run -it your-scraper-image python scraper.py

DEBUG=True
DB_NAME=scraper_dev
DB_USER=dev_user
DB_PASSWORD=dev_password
DB_HOST=localhost
DATABASE_SOLR=http://localhost:8983

DEBUG=False
DB_NAME=scraper_prod
DB_USER=prod_user
DB_PASSWORD=secure_password
DB_HOST=your-db-host
DATABASE_SOLR=https://your-solr-host
EMAIL_HOST_USER=noreply@yourdomain.com
EMAIL_HOST_PASSWORD=secure_email_password

This project includes comprehensive automated tests for all Django apps to ensure code quality and prevent regressions.
# Navigate to the project directory
cd scraper_Api
# Set the test settings
export DJANGO_SETTINGS_MODULE=scraper_Api.test_settings
# Run all tests
python manage.py test
# Run tests with verbose output
python manage.py test --verbosity=2
# Run specific app tests
python manage.py test scraper
python manage.py test users
python manage.py test company
# Run specific test classes
python manage.py test scraper.tests.ScraperModelTest
python manage.py test users.tests.CustomUserModelTest

# Navigate to the project directory
cd scraper_Api
# Run all tests with pytest
python -m pytest
# Run with verbose output
python -m pytest -v
# Run specific tests
python -m pytest scraper/tests.py
python -m pytest users/tests.py -v

# Navigate to the project directory
cd scraper_Api
# Run tests with coverage measurement
export DJANGO_SETTINGS_MODULE=scraper_Api.test_settings
python -m coverage run --source='.' manage.py test
# Generate coverage report
python -m coverage report
# Generate HTML coverage report
python -m coverage html
# Open htmlcov/index.html in your browser
# Generate XML coverage report (for CI/CD)
python -m coverage xml

The project includes tests for:
- Models: Validation, relationships, constraints, and business logic
- Views: API endpoints, authentication, and response handling
- Forms: Data validation and processing
- Managers: Custom user management functionality
- Middleware: Rate limiting and request processing
- Background tasks: Newsletter sending and scheduled operations
- WebSocket consumers: Real-time notifications (when channels is available)
- Overall: 33% code coverage
- Models: 80%+ coverage on critical business logic
- User Management: 95% coverage
- Job Models: 81% coverage including Solr integration
Tests use a separate configuration (scraper_Api/test_settings.py) that:
- Uses SQLite in-memory database for speed
- Disables migrations for faster test execution
- Mocks external services (Solr, Redis, Email)
- Uses simplified authentication for testing
- Provides minimal URL configuration
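A hedged sketch of what a test settings module along those lines typically contains — the actual contents of scraper_Api/test_settings.py may differ:

```python
# Sketch of a Django test settings module; exact project contents may differ.
from scraper_Api.settings import *  # reuse the base configuration

# In-memory SQLite: fast, isolated, and discarded after the test run
DATABASES = {
    "default": {"ENGINE": "django.db.backends.sqlite3", "NAME": ":memory:"}
}

# Skip migrations entirely: tables are created directly from current models
class DisableMigrations:
    def __contains__(self, item):
        return True

    def __getitem__(self, item):
        return None

MIGRATION_MODULES = DisableMigrations()

# Capture outgoing mail in an in-memory outbox instead of hitting SMTP
EMAIL_BACKEND = "django.core.mail.backends.locmem.EmailBackend"
```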
We welcome contributions from the community! Here's how you can help improve Pe Viitor:
- Fork the repository on GitHub
- Clone your fork locally:
git clone https://github.com/your-username/scraper_Api.git
cd scraper_Api
- Create a virtual environment and install dependencies
- Create a new branch for your feature:
git checkout -b feature/your-feature-name
- Follow PEP 8 for Python code style
- Use meaningful variable and function names
- Add docstrings to all functions and classes
- Keep functions small and focused on a single responsibility
- Use type hints where appropriate
# Run code formatting (if available)
black .
isort .
# Run linting
flake8 .
pylint your_app/
# Check for security issues
bandit -r .

- Write tests for all new features and bug fixes
- Ensure test coverage doesn't decrease
- All tests must pass before submitting a PR
# Run tests before committing
python manage.py test --settings=scraper_Api.test_settings
# Check test coverage
python -m coverage run --source='.' manage.py test
python -m coverage report

- Update documentation if you're changing APIs or adding features
- Update the README.md if necessary
- Add tests for new functionality
- Ensure all tests pass and coverage is maintained
- Create a pull request with:
- Clear description of changes
- Link to any related issues
- Screenshots for UI changes
- Updated documentation
- Use the GitHub issue tracker
- Include detailed reproduction steps
- Provide environment details (OS, Python version, etc.)
- Include relevant error messages and logs
- Check existing issues first
- Provide clear use case and rationale
- Include mockups or examples if applicable
- Fix typos and improve clarity
- Add examples and tutorials
- Translate documentation
- Improve API documentation
- Fix bugs and implement features
- Improve performance
- Add tests and improve coverage
- Refactor and clean up code
- Use Django's built-in features (ORM, forms, admin)
- Follow Django naming conventions
- Use class-based views appropriately
- Implement proper error handling
- Follow RESTful principles
- Use appropriate HTTP status codes
- Implement consistent response formats
- Add proper validation and error messages
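The "consistent response format" guideline can be illustrated with a tiny envelope helper — this is a hypothetical example of the idea, not the response shape this project actually uses:

```python
def api_response(data=None, error=None, status=200):
    """Wrap every payload in the same envelope so clients always parse one shape."""
    return {
        "status": status,           # mirrors the HTTP status code
        "success": error is None,   # quick boolean check for callers
        "data": data,
        "error": error,
    }

ok = api_response(data={"id": 1, "company": "Tech Corp"})
bad = api_response(error="Company not found", status=404)
```

With one envelope, frontend code branches on `success` instead of guessing whether a given endpoint returns a bare list, an object, or an error string.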
- Write efficient queries
- Use database indexes appropriately
- Handle migrations carefully
- Document schema changes
# Fork and clone the repository
git clone https://github.com/your-username/scraper_Api.git
cd scraper_Api/scraper_Api
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -r requirements.txt
# (Optional) Install development requirements if the file exists
[ -f requirements-dev.txt ] && pip install -r requirements-dev.txt
# Set up pre-commit hooks
pre-commit install
# Create test database and run migrations
export DJANGO_SETTINGS_MODULE=scraper_Api.test_settings
python manage.py migrate
# Run tests to ensure everything works
python manage.py test

- Be respectful and inclusive
- Help others learn and grow
- Share knowledge and experiences
- Follow the code of conduct
This project is licensed under the MIT License - see the LICENSE file for details.
- ✅ Commercial use - Use for commercial purposes
- ✅ Distribution - Distribute the software
- ✅ Modification - Modify the source code
- ✅ Private use - Use privately
- ❗ License and copyright notice - Include license and copyright notice
- ❌ Liability - No warranty or liability
- ❌ Warranty - No warranty provided
MIT License
Copyright (c) 2023 peviitor
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Our team is composed of specialists and education enthusiasts who aim to make a significant contribution in the field of job market transparency and accessibility.
- Pe Viitor Team - Core development team
- Community Contributors - Thank you to all our contributors!
- Django and Django REST Framework communities
- Apache Solr community
- All contributors who have helped improve this project
- Website: peviitor.ro
- GitHub: github.com/peviitor-ro
- Issues: GitHub Issues
Made with ❤️ by the Pe Viitor team
We are dedicated to the continuous improvement and development of this project to provide the best resources for everyone interested in job market data and transparency.