Commit 64d72d8

Refactor into module architecture for easier expansion and maintenance (#59)
* Tidy up directory layout
* Switched builds to Podman
* Add run time output for svn log and fetch commands
* Dedupe git config file
* Fix the git config file newline issue
* Increase retry for psutils
1 parent b820bd9 commit 64d72d8

60 files changed: 2651 additions and 664 deletions
File renamed without changes.
Lines changed: 5 additions & 9 deletions
@@ -1,12 +1,8 @@
-name: Build and Push Docker Image to GitHub Container Registry
+name: Docker build and push to GitHub Container Registry
 
 on:
-  workflow_dispatch: # Creates button in web UI to run the workflow manually, shouldn't be needed
-  push:
-    tags:
-      - '**'
-    branches:
-      - main
+  workflow_dispatch: # Creates button in web UI to run the workflow manually
+  push: # All pushes
   pull_request:
     types:
       - opened
@@ -47,9 +43,9 @@ jobs:
       - name: Build and push
         uses: docker/build-push-action@v6
         with:
-          context: repo-converter/build
+          context: .
           platforms: linux/amd64,linux/arm64
-          file: repo-converter/build/Dockerfile
+          file: build/Dockerfile
           push: true
           sbom: true
           tags: |

.github/workflows/podman-build.yml

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+name: Podman build and push to GitHub Container Registry
+
+on:
+  workflow_dispatch: # Creates button in web UI to run the workflow manually
+  push: # All pushes
+  pull_request:
+    types:
+      - opened
+      - reopened
+      - edited
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set env vars
+        run: |
+          BUILD_BRANCH="$(git rev-parse --abbrev-ref HEAD)"
+          BUILD_COMMIT="$(git rev-parse --short HEAD)"
+          BUILD_DIRTY="$(git diff --quiet && echo 'False' || echo 'True')"
+          BUILD_DATE="$(date -u +'%Y-%m-%d %H:%M:%S UTC')"
+          BUILD_TAG="$(git tag --points-at HEAD)"
+          ENV_FILE="build/.env"
+          {
+            echo "BUILD_BRANCH=${BUILD_BRANCH}"
+            echo "BUILD_COMMIT=${BUILD_COMMIT}"
+            echo "BUILD_DATE=${BUILD_DATE}"
+            echo "BUILD_DIRTY=${BUILD_DIRTY}"
+            echo "BUILD_TAG=${BUILD_TAG}"
+          } > "$ENV_FILE"
+
+      - name: Log in to container registry
+        run: echo "${{ secrets.GITHUB_TOKEN }}" | podman login -u "${{ github.repository_owner }}" --password-stdin ghcr.io
+
+      - name: Build and push image
+        run: |
+          podman build -f build/Dockerfile --format docker --platform linux/amd64 -t ghcr.io/sourcegraph/repo-converter:latest .
+          podman push ghcr.io/sourcegraph/repo-converter:latest
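
Review note: the `Set env vars` step writes plain `KEY=value` lines to `build/.env`. `BUILD_DATE` contains spaces, and `BUILD_TAG` is empty whenever no tag points at `HEAD`. The sketch below shows one way a consumer could read that file back with safe fallbacks. It is only an illustration: `parse_env_text`, `load_build_metadata`, and the `"unknown"` fallback are hypothetical names that do not appear in this commit.

```python
from pathlib import Path

# Keys written by the "Set env vars" workflow step.
BUILD_KEYS = ("BUILD_BRANCH", "BUILD_COMMIT", "BUILD_DATE", "BUILD_DIRTY", "BUILD_TAG")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, splitting on the first '=' only (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without a key (e.g. a stray second tag name).
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_build_metadata(env_file: str = "build/.env") -> dict[str, str]:
    """Return every BUILD_* key, using "unknown" for missing or empty values."""
    path = Path(env_file)
    parsed = parse_env_text(path.read_text()) if path.is_file() else {}
    return {key: parsed.get(key) or "unknown" for key in BUILD_KEYS}
```

Splitting on the first `=` only keeps values such as `2025-01-01 00:00:00 UTC` intact, and the `or "unknown"` fallback covers both a missing file and an empty `BUILD_TAG=` line.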

.gitignore

Lines changed: 23 additions & 23 deletions
@@ -1,13 +1,13 @@
 # Sourcegraph
+*.code-workspace
 config/cloud-agent-config.yaml
 config/cloud-agent-service-account-key.json
 config/config.yaml
-config/service-account-key.json
 config/repos-to-convert.yaml
-src-serve-root
-svn-repo-stats/*.csv
-svn-repo-stats/repos.txt
-svn-repo-stats/tmp-repo-metadata/
+config/service-account-key.json
+dev/stats/repos.txt
+logs/
+src-serve-root/
 
 # Byte-compiled / optimized / DLL files
 __pycache__/
@@ -18,23 +18,23 @@ __pycache__/
 *.so
 
 # Distribution / packaging
+.eggs/
+.installed.cfg
 .Python
+*.egg
+*.egg-info/
 develop-eggs/
 dist/
 downloads/
 eggs/
-.eggs/
 lib/
 lib64/
+MANIFEST
 parts/
 sdist/
+share/python-wheels/
 var/
 wheels/
-share/python-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-MANIFEST
 
 # PyInstaller
 # Usually these files are written by a python script from a template
@@ -47,33 +47,33 @@ pip-log.txt
 pip-delete-this-directory.txt
 
 # Unit test / coverage reports
-htmlcov/
-.tox/
-.nox/
+.cache
 .coverage
 .coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*.cover
-*.py,cover
 .hypothesis/
+.nox/
 .pytest_cache/
+.tox/
+*.cover
+*.py,cover
 cover/
+coverage.xml
+htmlcov/
+nosetests.xml
 
 # Translations
 *.mo
 *.pot
 
 # Django stuff:
 *.log
-local_settings.py
 db.sqlite3
 db.sqlite3-journal
+local_settings.py
 
 # Flask stuff:
-instance/
 .webassets-cache
+instance/
 
 # Scrapy stuff:
 .scrapy
@@ -132,11 +132,11 @@ celerybeat.pid
 # Environments
 .env
 .venv
+env.bak/
 env/
-venv/
 ENV/
-env.bak/
 venv.bak/
+venv/
 
 # Spyder project settings
 .spyderproject

.vscode/launch.json

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 {
+    // VSCode debugpy configuration
     // Use IntelliSense to learn about possible attributes.
     // Hover to view descriptions of existing attributes.
     // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387

AGENT.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Agent Guidelines for Implementation Bridges Codebase
+
+## Build/Test Commands
+- Build and start all containers: `cd repo-converter/build && docker compose up -d --build`
+- View repo-converter logs: `cd repo-converter/build && ./build.sh logs`
+- Update requirements: `cd repo-converter/build && pipreqs --force --mode gt .`
+- Run single test: No tests in codebase
+
+## Code Style Guidelines
+- Python version: 3.13.2
+- Imports: Standard libs first, then third-party libs with URLs in comments
+- Variables: Snake case (e.g., `local_repo_path`)
+- Error handling: Use try/except blocks with specific exception types
+- Logging: Use the custom `log()` function with appropriate levels
+- Functions: Snake case for function names. Add docstrings (not yet implemented but mentioned in TODOs)
+- Security: Use `redact_password()` before logging, if the input contains a password
+- Documentation: Use Python best practices for docstrings
+- Environment variables: Set defaults with `os.environ.get("VAR", "default")`
+- Comments: Use `#` for comments, and add lots of comments
+
+The purpose of this script is to convert repos from Subversion to Git
+`repo-converter/build/run.py` is the only file in this repo which runs application code
+It runs in a Docker container, and is the entrypoint for the container
+
+The usage of this product is described to users in `repo-converter/README.md`

GAMEPLAN.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
+# Refactoring Plan for repo-converter/build/run.py
+
+## Overview
+
+The current `run.py` script (1500+ lines) performs repository conversion from SVN to Git but has several issues:
+
+- Very long file with monolithic structure
+- Many functions with multiple responsibilities
+- Complex error handling scattered throughout
+- No object-oriented design despite handling complex data
+- Incomplete features marked with TODOs
+- Duplicate code patterns
+
+## Phase 1: Code Organization
+
+### 1.1 Create Package Structure
+
+```
+repo_converter/
+├── __init__.py
+├── main.py              # Entry point, contains main() function
+├── config/
+│   ├── __init__.py
+│   ├── environment.py   # Environment variable handling
+│   └── yaml_config.py   # YAML configuration file parsing
+├── core/
+│   ├── __init__.py
+│   └── logging.py       # Custom logging functionality
+├── utils/
+│   ├── __init__.py
+│   ├── process.py       # Process management utilities
+│   └── security.py      # Password redaction functionality
+└── repositories/
+    ├── __init__.py
+    ├── base.py          # Base Repository class
+    ├── svn.py           # SVN repository handling
+    ├── git.py           # Git repository handling
+    └── tfs.py           # TFS repository handling (future)
+```
+
+### 1.2 Implement Base Classes
+
+- Create a `Repository` base class with common methods
+- Implement `SVNRepository`, `GitRepository`, and `TFSRepository` subclasses
+- Move configuration validation to each repository type
+
+## Phase 2: Refactor Functionality
+
+### 2.1 Configuration Management
+
+- Move environment variable loading to `config/environment.py`
+- Move YAML parsing to `config/yaml_config.py`
+- Implement proper type validation with clear error messages
+- Create unified configuration object merging both sources
+
+### 2.2 Logging Improvements
+
+- Refactor `log()` function to use Python's logging more effectively
+- Implement cleaner password redaction
+- Add log rotation for long-running instances
+
+### 2.3 Process Management
+
+- Refactor subprocess handling into a cleaner utility class
+- Improve zombie process detection and cleanup
+- Implement better error handling for process management
+
+## Phase 3: Repository Processing
+
+### 3.1 SVN Repository Handling
+
+- Break down `clone_svn_repo()` into smaller, focused methods
+- Implement proper state management for create/update/running
+- Add better error handling with specific error types
+- Fix batch processing logic
+
+### 3.2 Git Repository Handling
+
+- Implement proper Git repository functionality
+- Use GitPython more extensively
+
+### 3.3 Process Concurrency
+
+- Improve multiprocessing implementation
+- Add proper resource limiting and queuing
+- Implement better status tracking
+
+## Phase 4: Testing & Documentation
+
+### 4.1 Testing
+
+- Add unit tests for each module
+- Add integration tests for repository operations
+- Implement mock objects for external dependencies
+
+### 4.2 Documentation
+
+- Add proper docstrings to all classes and methods
+- Create usage documentation
+- Document configuration options
+
+## Phase 5: Implement TODOs
+
+After refactoring, implement the TODOs from the original file:
+
+1. Config file improvements
+2. SVN enhancements (timeouts, gitignore handling, etc.)
+3. Git SSH clone functionality
+4. Permissions improvements
+5. Add fetch interval configuration
+6. Process status improvements
+
+## Implementation Strategy
+
+1. Start with creating the directory structure and moving code
+2. Refactor one component at a time, ensuring functionality is preserved
+3. Add tests for each refactored component
+4. Keep the original script working until the refactored version is complete
+5. Implement a gradual migration strategy
+
+## Benefits
+
+- Improved maintainability through modular design
+- Better error handling and recovery
+- Clearer separation of concerns
+- Easier implementation of new features
+- More testable code structure
+- Better documentation
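
Review note: section 1.2 of `GAMEPLAN.md` proposes a `Repository` base class with per-type configuration validation. The sketch below shows one possible shape for that design, not the implementation in this commit. The class names come from the plan, but the `required_keys` attribute, the `convert()` method, and the `svn_url` and `git_url` config keys are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Repository(ABC):
    """Shared behaviour for every repository type (the plan's repositories/base.py)."""

    # Config keys each subclass requires; the key names are hypothetical.
    required_keys: tuple[str, ...] = ()

    def __init__(self, name: str, config: dict):
        self.name = name
        self.config = config
        # Validation lives with the repository type, as the plan suggests.
        self.validate_config()

    def validate_config(self) -> None:
        """Raise ValueError listing any required keys missing from the config."""
        missing = [key for key in self.required_keys if key not in self.config]
        if missing:
            raise ValueError(f"{self.name}: missing config keys: {', '.join(missing)}")

    @abstractmethod
    def convert(self) -> str:
        """Return a description of the conversion step (placeholder for real work)."""


class SVNRepository(Repository):
    required_keys = ("svn_url",)  # hypothetical key name

    def convert(self) -> str:
        return f"git svn fetch from {self.config['svn_url']}"


class GitRepository(Repository):
    required_keys = ("git_url",)  # hypothetical key name

    def convert(self) -> str:
        return f"git fetch from {self.config['git_url']}"
```

With this shape, adding `TFSRepository` later would only need a new subclass declaring its own `required_keys` and `convert()`, without touching the shared validation code.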

LICENSE

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+TODO

README.md

Lines changed: 2 additions & 2 deletions
@@ -41,6 +41,6 @@ Docker compose also allows for easier upgrades, troubleshooting, monitoring, log
 - There are docker-compose.yaml and override files in a few different directories in this repo, separated by use case, so that each use case only needs to run `docker compose up -d` in one directory, and not fuss around with `-f` paths.
 - The only difference between the docker-compose-override.yaml files in host-ubuntu vs host-wsl is the src-serve-git container's name, which is how we get a separate `dnsName` for each.
 - If you're using the repo-converter:
-    - If you're using the pre-built images, `cd ./repo-converter && docker compose up -d`
-    - If you're building the Docker images, `cd ./repo-converter/build && docker compose up -d --build`
+    - If you're using the pre-built images, `cd ./deploy && docker compose up -d`
+    - If you're building the Docker images, `cd ./build && docker compose up -d --build`
     - Either of these will start all 3 containers: cloud-agent, src-serve-git, and the repo-converter

build/.dockerignore

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# Include any files or directories that you don't want to be copied to your
+# container here (e.g., local build artifacts, temporary files, etc.).
+#
+# For more help, visit the .dockerignore file reference guide at
+# https://docs.docker.com/go/build-context-dockerignore/
+
+
+# Ignore everything
+*
+
+# Allow specific directories
+!/src
+
+# Allow specific files
+!/build/.env
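
Review note: this `.dockerignore` uses an "ignore everything, then allow specific paths" layout. The sketch below is a simplified model of how such rules are evaluated, assuming the last matching pattern wins and a leading `!` re-includes a path. It approximates the documented behaviour and is not Docker's matcher: `**` and other edge cases are not handled, and `is_ignored` and `_matches` are hypothetical helper names.

```python
from fnmatch import fnmatchcase

# Patterns from build/.dockerignore, in file order.
PATTERNS = ["*", "!/src", "!/build/.env"]


def _matches(pattern: str, path: str) -> bool:
    """Compare component by component so '*' never crosses a '/'."""
    pattern_parts = pattern.strip("/").split("/")
    path_parts = path.split("/")
    return len(pattern_parts) == len(path_parts) and all(
        fnmatchcase(part, glob) for glob, part in zip(pattern_parts, path_parts)
    )


def is_ignored(path: str, patterns: list[str] = PATTERNS) -> bool:
    """Return True if the path would be left out of the build context in this model."""
    parts = path.split("/")
    # The path itself plus each parent directory, e.g. "src" and "src/main.py".
    candidates = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if any(_matches(glob, candidate) for candidate in candidates):
            # Later patterns override earlier ones.
            ignored = not negated
    return ignored
```

Under this model, `src/main.py` and `build/.env` stay in the build context, while `README.md` and other files under `build/` are excluded.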
