AGENTS.md - Peviitor OpenCode AI Scrapers

Project Overview

This is a local development automation project for scraping job data from peviitor.ro platform, developed by Asociația Opportunități și Cariere (NGO). It contains:

PowerShell scripts for browser automation (Chrome DevTools)
Playwright tests for validating commands
Markdown documentation files
HTML documentation in docs/
Configuration for opencode and MCP servers

Project Structure

peviitor_opencode_AI_scrapers/
├── *.ps1                   # Root PowerShell scripts (start-chrome.ps1)
├── *.md                    # Root Markdown documentation
├── opencode.json           # OpenCode configuration
├── SCHEMAS.md             # Data schemas (Job, Company models)
├── AGENTS.md              # Agent instructions and guidelines
├── INSTRUCTIONS.md        # Project instructions
├── LICENSE                # MIT License
├── CONTRIBUTING.md       # Contribution guidelines
├── CODE_OF_CONDUCT.md    # Community code of conduct
├── SECURITY.md           # Security policy
├── README.md             # Project README
├── .github/              # GitHub templates
├── webscraper/            # Web scraping data
│   ├── websites.md        # Companies and their careers pages (DEPRECATED - now in Solr company core)
│   ├── *.md               # Company-specific scraping prompts
├── docs/                  # HTML documentation
│   ├── index.html
│   ├── setup.html
│   ├── schemas.html
│   ├── commands.html
│   ├── structure.html
│   ├── tests.html
│   └── README.md
├── tests/                 # Playwright tests
│   ├── *.test.ts
│   ├── README.md
│   ├── playwright.config.ts
│   └── package.json
└── .opencode/             # OpenCode commands
    ├── package.json
    └── commands/
        ├── scrape.md
        ├── add-website.md
        ├── update-solr.md
        ├── delete-solr.md
        ├── login-solr.md
        ├── docs-update.md
        ├── instructions.md
        └── remove-404.md

Commands

Running PowerShell Scripts

# Run a single script
powershell -ExecutionPolicy Bypass -File script.ps1

# Run inline command
powershell -Command "your-command-here"

Chrome DevTools Automation

Before using Chrome DevTools MCP, you must start Chrome with remote debugging:

# Run the start script
powershell -ExecutionPolicy Bypass -File start-chrome.ps1

# Or manually
Start-Process "C:\Program Files\Google\Chrome\Application\chrome.exe" -ArgumentList "--remote-debugging-port=9222"

Docker Commands

# Start a container
docker start peviitor-solr

# Check container status
docker ps --filter "name=peviitor-solr"

Solr Operations

# Access Solr API (requires credentials)
curl -u "$SOLR_USER:$SOLR_PASSWD" "https://solr.peviitor.ro/solr/job/select?q=*:*&rows=1"

# Ping Solr to verify it's running
curl -s https://solr.peviitor.ro/solr/admin/ping

Running Playwright Tests

cd tests
npx playwright test

Code Style Guidelines

PowerShell Scripts

Naming Conventions

Use PascalCase for function names: Get-CompanyData
Use camelCase for variable names: $companyList
Use singular nouns for function names: Get-Company not Get-Companies
Prefix functions with approved verbs: Get, Set, New, Remove, Start, Stop

File Organization

One function per script when possible, or group related functions
Place functions at the top of scripts
Place main execution at the bottom with if ($MyInvocation.InvocationName -ne '.') { ... }
Use .ps1 extension for all PowerShell scripts

Formatting

Indent with 4 spaces
Maximum line length: 120 characters
Use splatting for cmdlets with many parameters
Use explicit parameter declarations with types

function Get-CompanyData {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)]
        [string]$CompanyName,
        
        [Parameter(Mandatory = $false)]
        [int]$MaxResults = 10
    )
    
    # Function body
}

Error Handling

Use try/catch/finally blocks for error handling
Use -ErrorAction Stop for critical operations
Write errors to stream: Write-Error "Message"
Use descriptive error messages

try {
    $result = Invoke-RestMethod -Uri $url -ErrorAction Stop
}
catch {
    Write-Error "Failed to fetch data from $url : $_"
    throw
}

Comments

Use comment-based help for functions
Comment complex logic, not obvious code
Include parameter descriptions in help

<#
.SYNOPSIS
    Gets company data from Solr.
.PARAMETER CompanyName
    The name of the company to search for.
#>

Markdown Files

Formatting

Use ATX-style headers (# ## ###)
Use fenced code blocks with language identifiers
Use tables for structured data
Maximum line length: 120 characters

# Header

## Subheader

| Column 1 | Column 2 |
|----------|----------|
| Data 1   | Data 2   |

```powershell
code here


#### File Names
- Use kebab-case: `company-data.md` not `companyData.md`
- Use descriptive names: `setup-guide.md` not `readme.md`

## Environment Variables

No special environment variables are required for this project.

## Dependencies

- PowerShell 5.1+ or PowerShell 7+
- Google Chrome (for automation)
- Docker Desktop (for Solr container)
- Node.js 18+ (for Playwright tests)
- curl (optional, for API calls)

### Node.js Dependencies (for tests)
- playwright (installed via tests/package.json)

## Common Tasks

### Automating Browser Tasks

1. Start Chrome with debug port: `powershell -ExecutionPolicy Bypass -File start-chrome.ps1`
2. Get CDP endpoint: `Invoke-RestMethod -Uri 'http://127.0.0.1:9222/json'`
3. Connect via WebSocket for automation

### Working with Solr

1. Ensure Docker is running
2. Start Solr container: `docker start peviitor-solr`
3. Wait for Solr to be ready (port 8983)
4. Use credentials: `$SOLR_USER:$SOLR_PASSWD`

### Adding a New Company

1. Use the `/add-website` command to add a new company
2. The command will search for CUI using targetare.ro
3. Verify the data with the user before saving

### Scraping Jobs

1. Start Chrome: `powershell -ExecutionPolicy Bypass -File start-chrome.ps1`
2. Use the `/scrape` command with the company name
3. Check for company-specific scraping instructions in `webscraper/{company}.md`

### Creating New Scripts

1. Use the approved naming conventions
2. Add comment-based help
3. Include error handling
4. Test with `-WhatIf` flag if using destructive commands

## Custom Commands

Custom commands are defined in `.opencode/commands/` directory.

### Available Commands

| Command | Description |
|---------|-------------|
| `/scrape` | Scrape jobs from a company career page and push to Solr |
| `/add-website` | Add a new company to Solr company core with automatic research |
| `/update-solr` | Update Solr with new data |
| `/delete-solr` | Delete job documents from Solr by key |
| `/login-solr` | Login to Solr admin panel |
| `/docs-update` | Update project documentation based on all files |
| `/instructions` | Show project instructions |
| `/remove-404` | Remove jobs from Solr that return 404 errors |
| `/clean-project` | Clean temporary files and update documentation |

### Usage

/scrape EPAM

/add-website Softech

/update-solr

/delete-solr url:https://example.com/job

/login-solr

/docs-update


## Notes for Agents

- This project is primarily for local automation and documentation
- No CI/CD pipeline exists
- Scripts are run manually as needed
- PowerShell scripts are Windows-specific (using cmdlets like Start-Process, Invoke-RestMethod)
- **IMPORTANT**: For all OpenCode commands, ALWAYS use curl for Solr operations and URL checks - DO NOT use PowerShell in command instructions
- **IMPORTANT**: Before using Chrome DevTools MCP, always run `start-chrome.ps1` first to ensure Chrome is started with remote debugging enabled
- **IMPORTANT**: Always use the correct Solr delete-by-query format: `{"delete":{"query":"url:\"https://example.com/job\""}}` - do NOT use `_delete_:true` format
- Company-specific scraping instructions are stored in `webscraper/{company}.md` files
- Company data is stored in Solr company core - query using brand, company name, or CUI
- DON'T USE jobs CORE, USE job CORE in SOLR !!!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AGENTS.md - Peviitor OpenCode AI Scrapers

Project Overview

Project Structure

Commands

Running PowerShell Scripts

Chrome DevTools Automation

Docker Commands

Solr Operations

Running Playwright Tests

Code Style Guidelines

PowerShell Scripts

Naming Conventions

File Organization

Formatting

Error Handling

Comments

Markdown Files

Formatting

FilesExpand file tree

AGENTS.md

Latest commit

History

AGENTS.md

File metadata and controls

AGENTS.md - Peviitor OpenCode AI Scrapers

Project Overview

Project Structure

Commands

Running PowerShell Scripts

Chrome DevTools Automation

Docker Commands

Solr Operations

Running Playwright Tests

Code Style Guidelines

PowerShell Scripts

Naming Conventions

File Organization

Formatting

Error Handling

Comments

Markdown Files

Formatting