Merged
42 commits
c5e396c
Fixing range
marcleblanc2 Apr 2, 2025
f48bdf7
Switch customer to marc-scale-repo-converter build
marcleblanc2 Apr 2, 2025
44262d1
Shorten testing cycle
marcleblanc2 Apr 2, 2025
8e4b661
Add run time output for svn log and fetch commands, dedupe git config…
marcleblanc2 Apr 2, 2025
130cbdd
Increasing log level for customer
marcleblanc2 Apr 2, 2025
3a64b5d
Straighten out return_dict from process_dict
marcleblanc2 Apr 2, 2025
74ff9b4
Fix the git config file newline issue
marcleblanc2 Apr 2, 2025
aa85729
Fix run_time key error
marcleblanc2 Apr 2, 2025
5f5fcd4
Increasing retry for psutils, and ignoring VS Code workspace files
marcleblanc2 May 7, 2025
8867d1d
Creating AGENT.md for Amp
marcleblanc2 May 8, 2025
5a43f18
Rearranged repo structure to appease the AI gods
marcleblanc2 May 8, 2025
efcc622
Tidy up directory layout
marcleblanc2 May 8, 2025
76784e8
adding module name back in dir path
marcleblanc2 May 8, 2025
f91adbf
Fixing build
marcleblanc2 May 8, 2025
e4237dd
Copying over the first batch of functions from run.py to the new modules
marcleblanc2 May 8, 2025
25bdc68
Podman build works
marcleblanc2 May 31, 2025
eee64d2
Builds, runs an empty main loop
marcleblanc2 May 31, 2025
2423bef
Fixing imports
marcleblanc2 May 31, 2025
878ecad
Starting to get some SVN repo cloning action again
marcleblanc2 May 31, 2025
75d7dc0
Testing a few iterations
marcleblanc2 May 31, 2025
fd756b0
Fixing log level, undefined functions
marcleblanc2 May 31, 2025
7c1d7b0
Fix broken f string
marcleblanc2 May 31, 2025
32b8489
Adding notes, cleaning up unused run function
marcleblanc2 May 31, 2025
f6ebebf
Update GH Action build context
marcleblanc2 Jun 10, 2025
d103214
Add podman build file
marcleblanc2 Jun 10, 2025
4d7d421
Fix path and mount
marcleblanc2 Jun 10, 2025
d27e30d
Try this
marcleblanc2 Jun 10, 2025
033d7dd
Or this
marcleblanc2 Jun 10, 2025
7571432
Try without ARM
marcleblanc2 Jun 10, 2025
5075963
Fix path to requirements.txt
marcleblanc2 Jun 10, 2025
d0c8799
Fix file paths
marcleblanc2 Jun 10, 2025
6b01e4b
why are you failing
marcleblanc2 Jun 10, 2025
0e03bcb
Remove reference to file that's not in repo
marcleblanc2 Jun 10, 2025
6d9f85e
Update GHCR image path
marcleblanc2 Jun 10, 2025
9d7a0e2
Update MAX_CYCLES logic
marcleblanc2 Jun 10, 2025
67924c7
Try to add env file in GH action
marcleblanc2 Jun 10, 2025
2c9eae4
Copy the env file into the image
marcleblanc2 Jun 10, 2025
1d38266
Fix env vars
marcleblanc2 Jun 10, 2025
8844ea5
Update config for customer 1
marcleblanc2 Jun 10, 2025
6ea7366
fix file paths for customer1
marcleblanc2 Jun 10, 2025
f862163
Add exception handler for process finishing too quickly
marcleblanc2 Jun 10, 2025
60fa676
Add process args to debug message
marcleblanc2 Jun 10, 2025
@@ -1,12 +1,8 @@
-name: Build and Push Docker Image to GitHub Container Registry
+name: Docker build and push to GitHub Container Registry

 on:
-  workflow_dispatch: # Creates button in web UI to run the workflow manually, shouldn't be needed
-  push:
-    tags:
-      - '**'
-    branches:
-      - main
+  workflow_dispatch: # Creates button in web UI to run the workflow manually
+  push: # All pushes
   pull_request:
     types:
       - opened
@@ -47,9 +43,9 @@ jobs:
       - name: Build and push
         uses: docker/build-push-action@v6
         with:
-          context: repo-converter/build
+          context: .
           platforms: linux/amd64,linux/arm64
-          file: repo-converter/build/Dockerfile
+          file: build/Dockerfile
           push: true
           sbom: true
           tags: |
42 changes: 42 additions & 0 deletions .github/workflows/podman-build.yml
@@ -0,0 +1,42 @@
name: Podman build and push to GitHub Container Registry

on:
  workflow_dispatch: # Creates button in web UI to run the workflow manually
  push: # All pushes
  pull_request:
    types:
      - opened
      - reopened
      - edited

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set env vars
        run: |
          BUILD_BRANCH="$(git rev-parse --abbrev-ref HEAD)"
          BUILD_COMMIT="$(git rev-parse --short HEAD)"
          BUILD_DIRTY="$(git diff --quiet && echo 'False' || echo 'True')"
          BUILD_DATE="$(date -u +'%Y-%m-%d %H:%M:%S UTC')"
          BUILD_TAG="$(git tag --points-at HEAD)"
          ENV_FILE="build/.env"
          {
            echo "BUILD_BRANCH=${BUILD_BRANCH}"
            echo "BUILD_COMMIT=${BUILD_COMMIT}"
            echo "BUILD_DATE=${BUILD_DATE}"
            echo "BUILD_DIRTY=${BUILD_DIRTY}"
            echo "BUILD_TAG=${BUILD_TAG}"
          } > "$ENV_FILE"

      - name: Log in to container registry
        run: echo "${{ secrets.GITHUB_TOKEN }}" | podman login -u "${{ github.repository_owner }}" --password-stdin ghcr.io

      - name: Build and push image
        run: |
          podman build -f build/Dockerfile --format docker --platform linux/amd64 -t ghcr.io/sourcegraph/repo-converter:latest .
          podman push ghcr.io/sourcegraph/repo-converter:latest
46 changes: 23 additions & 23 deletions .gitignore
@@ -1,13 +1,13 @@
# Sourcegraph
*.code-workspace
config/cloud-agent-config.yaml
config/cloud-agent-service-account-key.json
config/config.yaml
config/service-account-key.json
config/repos-to-convert.yaml
src-serve-root
svn-repo-stats/*.csv
svn-repo-stats/repos.txt
svn-repo-stats/tmp-repo-metadata/
config/service-account-key.json
dev/stats/repos.txt
logs/
src-serve-root/

# Byte-compiled / optimized / DLL files
__pycache__/
@@ -18,23 +18,23 @@ __pycache__/
*.so

# Distribution / packaging
.eggs/
.installed.cfg
.Python
*.egg
*.egg-info/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
MANIFEST
parts/
sdist/
share/python-wheels/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
@@ -47,33 +47,33 @@ pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.cache
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.nox/
.pytest_cache/
.tox/
*.cover
*.py,cover
cover/
coverage.xml
htmlcov/
nosetests.xml

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
local_settings.py

# Flask stuff:
instance/
.webassets-cache
instance/

# Scrapy stuff:
.scrapy
@@ -132,11 +132,11 @@ celerybeat.pid
# Environments
.env
.venv
env.bak/
env/
venv/
ENV/
env.bak/
venv.bak/
venv/

# Spyder project settings
.spyderproject
1 change: 1 addition & 0 deletions .vscode/launch.json
@@ -1,4 +1,5 @@
{
// VSCode debugpy configuration
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
25 changes: 25 additions & 0 deletions AGENT.md
@@ -0,0 +1,25 @@
# Agent Guidelines for Implementation Bridges Codebase

## Build/Test Commands
- Build and start all containers: `cd repo-converter/build && docker compose up -d --build`
- View repo-converter logs: `cd repo-converter/build && ./build.sh logs`
- Update requirements: `cd repo-converter/build && pipreqs --force --mode gt .`
- Run single test: No tests in codebase

## Code Style Guidelines
- Python version: 3.13.2
- Imports: Standard libs first, then third-party libs with URLs in comments
- Variables: Snake case (e.g., `local_repo_path`)
- Error handling: Use try/except blocks with specific exception types
- Logging: Use the custom `log()` function with appropriate levels
- Functions: Snake case for function names. Add docstrings (not yet implemented but mentioned in TODOs)
- Security: Use `redact_password()` before logging, if the input contains a password
- Documentation: Use Python best practices for docstrings
- Environment variables: Set defaults with `os.environ.get("VAR", "default")`
- Comments: Use `#` for comments, and add lots of comments
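
The environment-variable, redaction, and logging guidelines above can be illustrated with a minimal sketch. The signatures of `redact_password()` and `log()` shown here are assumptions for illustration only (the real ones live in `run.py` and are not part of this diff), and `LOG_LEVEL` is a hypothetical variable name.

```python
import os
from typing import List, Optional

# Hypothetical variable name; shows the os.environ.get("VAR", "default") pattern.
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")


def redact_password(text: str, password: Optional[str]) -> str:
    """Replace every occurrence of the password in text (assumed signature)."""
    if not password:
        return text
    return text.replace(password, "REDACTED")


def log(message: str, level: str = LOG_LEVEL) -> str:
    """Stand-in for the project's custom log() function (assumed signature)."""
    line = f"{level}: {message}"
    print(line)
    return line


def describe_command(args: List[str], password: Optional[str] = None) -> str:
    """Build a log-safe description of a command line."""
    return redact_password(" ".join(args), password)


if __name__ == "__main__":
    # Redact before logging, as the Security guideline asks.
    log(describe_command(["svn", "log", "--password", "hunter2"], "hunter2"))
```

Redacting at the point where the message is built, rather than inside `log()`, keeps the logger free of any knowledge about secrets; the real codebase may make a different choice.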

The purpose of this script is to convert repos from Subversion to Git
`repo-converter/build/run.py` is the only file in this repo which runs application code
It runs in a Docker container, and is the entrypoint for the container

The usage of this product is described to users in `repo-converter/README.md`
128 changes: 128 additions & 0 deletions GAMEPLAN.md
@@ -0,0 +1,128 @@
# Refactoring Plan for repo-converter/build/run.py

## Overview

The current `run.py` script (1500+ lines) performs repository conversion from SVN to Git but has several issues:

- Very long file with monolithic structure
- Many functions with multiple responsibilities
- Complex error handling scattered throughout
- No object-oriented design despite handling complex data
- Incomplete features marked with TODOs
- Duplicate code patterns

## Phase 1: Code Organization

### 1.1 Create Package Structure

```
repo_converter/
├── __init__.py
├── main.py # Entry point, contains main() function
├── config/
│ ├── __init__.py
│ ├── environment.py # Environment variable handling
│ └── yaml_config.py # YAML configuration file parsing
├── core/
│ ├── __init__.py
│ └── logging.py # Custom logging functionality
├── utils/
│ ├── __init__.py
│ ├── process.py # Process management utilities
│ └── security.py # Password redaction functionality
└── repositories/
├── __init__.py
├── base.py # Base Repository class
├── svn.py # SVN repository handling
├── git.py # Git repository handling
└── tfs.py # TFS repository handling (future)
```

### 1.2 Implement Base Classes

- Create a `Repository` base class with common methods
- Implement `SVNRepository`, `GitRepository`, and `TFSRepository` subclasses
- Move configuration validation to each repository type
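
A rough sketch of what the base class and per-type validation could look like follows. The class names come from the plan; the `url` config key, the `required_keys` attribute, and the `clone_command()` method are hypothetical, and the use of `git svn clone` is an assumption about how the conversion is driven.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class Repository(ABC):
    """Common behaviour shared by all repository types (illustrative only)."""

    # Hypothetical: each subclass lists the config keys it cannot work without.
    required_keys: Tuple[str, ...] = ()

    def __init__(self, name: str, config: Dict[str, str], local_repo_path: str):
        self.name = name
        self.config = config
        self.local_repo_path = local_repo_path
        self.validate_config()

    def validate_config(self) -> None:
        """Fail early, with a clear message, if a required key is missing."""
        missing = [key for key in self.required_keys if key not in self.config]
        if missing:
            raise ValueError(f"{self.name}: missing config keys: {', '.join(missing)}")

    @abstractmethod
    def clone_command(self) -> List[str]:
        """Return the argument list used to create the local repo."""


class SVNRepository(Repository):
    required_keys = ("url",)

    def clone_command(self) -> List[str]:
        # Assumption: conversion is driven through git-svn.
        return ["git", "svn", "clone", self.config["url"], self.local_repo_path]


class GitRepository(Repository):
    required_keys = ("url",)

    def clone_command(self) -> List[str]:
        return ["git", "clone", self.config["url"], self.local_repo_path]
```

Keeping validation in the constructor means an invalid entry in the repos file is rejected before any subprocess is started.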

## Phase 2: Refactor Functionality

### 2.1 Configuration Management

- Move environment variable loading to `config/environment.py`
- Move YAML parsing to `config/yaml_config.py`
- Implement proper type validation with clear error messages
- Create unified configuration object merging both sources
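
One possible shape for the unified configuration object is sketched below. `MAX_CYCLES` appears in this PR's commit messages, but its type and default here are assumptions, as are `LOG_LEVEL`, the `Config` field names, and the `load_config()` helper. The YAML side is passed in as an already-parsed dict (for example the result of `yaml.safe_load`) so the sketch has no third-party dependency.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class Config:
    """Single read-only view over environment and YAML settings (illustrative)."""
    log_level: str
    max_cycles: int
    repos: Dict[str, Any]


def load_config(env: Mapping[str, str], yaml_data: Any) -> Config:
    """Merge both sources, validating types with clear error messages."""
    raw_cycles = env.get("MAX_CYCLES", "0")  # assumed default
    try:
        max_cycles = int(raw_cycles)
    except ValueError:
        raise ValueError(f"MAX_CYCLES must be an integer, got {raw_cycles!r}") from None

    if not isinstance(yaml_data, dict):
        raise TypeError(
            f"repos config must be a mapping, got {type(yaml_data).__name__}"
        )

    return Config(
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        max_cycles=max_cycles,
        repos=dict(yaml_data),
    )


if __name__ == "__main__":
    # In the application this would be called once with the real environment.
    print(load_config(os.environ, {"example-repo": {"type": "svn"}}))
```

Passing `env` in as a parameter, instead of reading `os.environ` inside the function, makes the loader straightforward to unit test.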

### 2.2 Logging Improvements

- Refactor `log()` function to use Python's logging more effectively
- Implement cleaner password redaction
- Add log rotation for long-running instances

### 2.3 Process Management

- Refactor subprocess handling into a cleaner utility class
- Improve zombie process detection and cleanup
- Implement better error handling for process management
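
As a sketch of the kind of utility this section describes, the helper below wraps `subprocess.run`, records the run time, and turns the two most common failure modes (timeout, executable not found) into a result object instead of an exception. The names `ProcessResult` and `run_process` are hypothetical and do not come from `run.py`; zombie detection with psutil is deliberately left out.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessResult:
    args: List[str]
    return_code: Optional[int]  # None if the process never finished or started
    output: str
    run_time_seconds: float

    @property
    def success(self) -> bool:
        return self.return_code == 0


def run_process(args: List[str], timeout: Optional[float] = None) -> ProcessResult:
    """Run a command, capturing combined stdout/stderr and wall-clock run time."""
    start = time.monotonic()
    return_code: Optional[int] = None
    try:
        completed = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout,
            check=False,
        )
        return_code = completed.returncode
        output = completed.stdout
    except subprocess.TimeoutExpired as exc:
        # Partial output may arrive as bytes even when text=True was requested.
        raw = exc.stdout or ""
        output = raw.decode(errors="replace") if isinstance(raw, bytes) else raw
        output += f"\n[timed out after {timeout} seconds]"
    except OSError as exc:
        output = f"[could not start process: {exc}]"
    return ProcessResult(list(args), return_code, output, time.monotonic() - start)


if __name__ == "__main__":
    result = run_process([sys.executable, "-c", "print('ok')"])
    print(result.success, result.output.strip(), f"{result.run_time_seconds:.3f}s")
```

Returning a typed result rather than a loosely keyed dict is one way to avoid the class of missing-key errors that several commits in this PR fix.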

## Phase 3: Repository Processing

### 3.1 SVN Repository Handling

- Break down `clone_svn_repo()` into smaller, focused methods
- Implement proper state management for create/update/running
- Add better error handling with specific error types
- Fix batch processing logic

### 3.2 Git Repository Handling

- Implement proper Git repository functionality
- Use GitPython more extensively

### 3.3 Process Concurrency

- Improve multiprocessing implementation
- Add proper resource limiting and queuing
- Implement better status tracking

## Phase 4: Testing & Documentation

### 4.1 Testing

- Add unit tests for each module
- Add integration tests for repository operations
- Implement mock objects for external dependencies
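
To make the mock-object item concrete, here is a small sketch using the standard library's `unittest.mock` to test a function that shells out to `svn` without needing an SVN server. The function `svn_log_head()` is hypothetical and exists only for this example; the URL is a placeholder.

```python
import subprocess
import unittest
from unittest import mock


def svn_log_head(repo_url: str) -> str:
    """Return the most recent log entry for an SVN repo (hypothetical helper)."""
    completed = subprocess.run(
        ["svn", "log", "--limit", "1", repo_url],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


class SvnLogHeadTest(unittest.TestCase):
    def test_returns_stdout_without_calling_svn(self):
        fake = subprocess.CompletedProcess(
            args=[], returncode=0, stdout="r42 | alice\n", stderr=""
        )
        # Patch subprocess.run so no real svn binary or network is required.
        with mock.patch("subprocess.run", return_value=fake) as run:
            self.assertEqual(
                svn_log_head("https://svn.example.com/repo"), "r42 | alice\n"
            )
        run.assert_called_once()
        self.assertEqual(run.call_args.args[0][:2], ["svn", "log"])


if __name__ == "__main__":
    unittest.main()
```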

### 4.2 Documentation

- Add proper docstrings to all classes and methods
- Create usage documentation
- Document configuration options

## Phase 5: Implement TODOs

After refactoring, implement the TODOs from the original file:

1. Config file improvements
2. SVN enhancements (timeouts, gitignore handling, etc.)
3. Git SSH clone functionality
4. Permissions improvements
5. Add fetch interval configuration
6. Process status improvements

## Implementation Strategy

1. Start with creating the directory structure and moving code
2. Refactor one component at a time, ensuring functionality is preserved
3. Add tests for each refactored component
4. Keep the original script working until the refactored version is complete
5. Implement a gradual migration strategy

## Benefits

- Improved maintainability through modular design
- Better error handling and recovery
- Clearer separation of concerns
- Easier implementation of new features
- More testable code structure
- Better documentation
1 change: 1 addition & 0 deletions LICENSE
@@ -0,0 +1 @@
TODO
4 changes: 2 additions & 2 deletions README.md
@@ -41,6 +41,6 @@ Docker compose also allows for easier upgrades, troubleshooting, monitoring, log
 - There are docker-compose.yaml and override files in a few different directories in this repo, separated by use case, so that each use case only needs to run `docker compose up -d` in one directory, and not fuss around with `-f` paths.
 - The only difference between the docker-compose-override.yaml files in host-ubuntu vs host-wsl is the src-serve-git container's name, which is how we get a separate `dnsName` for each.
 - If you're using the repo-converter:
-  - If you're using the pre-built images, `cd ./repo-converter && docker compose up -d`
-  - If you're building the Docker images, `cd ./repo-converter/build && docker compose up -d --build`
+  - If you're using the pre-built images, `cd ./deploy && docker compose up -d`
+  - If you're building the Docker images, `cd ./build && docker compose up -d --build`
   - Either of these will start all 3 containers: cloud-agent, src-serve-git, and the repo-converter
15 changes: 15 additions & 0 deletions build/.dockerignore
@@ -0,0 +1,15 @@
# Include any files or directories that you don't want to be copied to your
# container here (e.g., local build artifacts, temporary files, etc.).
#
# For more help, visit the .dockerignore file reference guide at
# https://docs.docker.com/go/build-context-dockerignore/


# Ignore everything
*

# Allow specific directories
!/src

# Allow specific files
!/build/.env