
Contributing to MCP Apache Spark History Server

Welcome to the MCP Apache Spark History Server project, and thank you for your interest in contributing! We'd love to accept your patches and contributions. This guide will help you get started; for community-wide Kubeflow contribution policies, please refer to Contributing to Kubeflow.

🚀 Quick Start for Contributors

📋 Prerequisites

  • 🐍 Python 3.12+
  • uv package manager
  • 🔥 Docker (for local testing with Spark History Server)
  • 📦 Node.js (for MCP Inspector testing)
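
The Python floor above can be sanity-checked from any interpreter. A minimal sketch (the `(3, 12)` minimum comes from the prerequisites list; the `meets_minimum` helper name is ours, not part of the project):

```python
import sys

MINIMUM = (3, 12)  # minimum Python version listed in the prerequisites


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True when the interpreter is at least the required version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "needs >= 3.12"
    print(f"Python {sys.version.split()[0]}: {status}")
```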

🛠️ Development Setup

  1. 🍴 Fork and clone the repository

     git clone https://github.com/YOUR_USERNAME/mcp-apache-spark-history-server.git
     cd mcp-apache-spark-history-server

  2. 📦 Install dependencies

     uv sync --group dev --frozen

  3. 🔧 Install pre-commit hooks

     uv run pre-commit install

  4. 🧪 Run tests to verify setup

     uv run pytest

🧪 Testing Your Changes

🔬 Local Testing with MCP Inspector

# Terminal 1: Start Spark History Server with sample data
./start_local_spark_history.sh

# Terminal 2: Test your changes
npx @modelcontextprotocol/inspector uv run -m spark_history_mcp.core.main
# Opens browser at http://localhost:6274 for interactive testing

✅ Run Full Test Suite

# Run all tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=. --cov-report=html

# Run specific test file
uv run pytest test_tools.py -v

🔍 Code Quality Checks

# Lint and format (runs automatically on commit)
uv run ruff check --fix
uv run ruff format

# Type checking
uv run mypy .

# Security scanning
uv run bandit -r . -f json -o bandit-report.json

📝 Contribution Guidelines

🎯 Areas for Contribution

🔧 High Priority

  • New MCP Tools: Additional Spark analysis tools
  • Performance Improvements: Optimize API calls and data processing
  • Error Handling: Better error messages and recovery
  • Documentation: Examples, tutorials, and guides

📊 Medium Priority

  • Testing: More comprehensive test coverage
  • Monitoring: Metrics and observability features
  • Configuration: More flexible configuration options
  • CI/CD: GitHub Actions improvements

💡 Ideas Welcome

  • AI Agent Examples: New integration patterns
  • Deployment: Additional deployment methods
  • Analytics: Advanced Spark job analysis tools

🔀 Pull Request Process

  1. 🌿 Create a feature branch

     git checkout -b feature/your-new-feature
     git checkout -b fix/bug-description
     git checkout -b docs/improve-readme

  2. 💻 Make your changes
  • Follow existing code style and patterns
  • Add tests for new functionality
  • Update documentation as needed
  • Ensure all pre-commit hooks pass
  3. ✅ Test thoroughly

     # Run full test suite
     uv run pytest

     # Test with MCP Inspector
     npx @modelcontextprotocol/inspector uv run -m spark_history_mcp.core.main

     # Test with real Spark data if possible

  4. 📤 Submit pull request
  • Use descriptive commit messages
  • Reference any related issues
  • Include screenshots for UI changes
  • Update CHANGELOG.md if applicable

💻 Code Style

We use Ruff for linting and formatting (automatically enforced by pre-commit):

  • Line length: 88 characters
  • Target: Python 3.12+
  • Import sorting: Automatic with Ruff
  • Type hints: Encouraged but not required for all functions
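
These settings typically live in pyproject.toml. A hedged sketch of what the Ruff section might look like given the bullets above (check the repository's actual pyproject.toml for the authoritative values):

```toml
[tool.ruff]
line-length = 88
target-version = "py312"

[tool.ruff.lint]
# "I" enables isort-style import sorting alongside the default rule set
extend-select = ["I"]
```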

🧪 Adding New MCP Tools

When adding new tools, follow this pattern:

from typing import Optional

@mcp.tool()
def your_new_tool(
    app_id: str,
    server: Optional[str] = None,
    # other parameters
) -> YourReturnType:
    """
    Brief description of what this tool does.

    Args:
        app_id: The Spark application ID
        server: Optional server name to use

    Returns:
        Description of return value
    """
    ctx = mcp.get_context()
    client = get_client_or_default(ctx, server)

    # Your implementation here
    return client.your_method(app_id)

Don't forget to add tests:

from unittest.mock import MagicMock, patch

@patch("tools.get_client_or_default")
def test_your_new_tool(self, mock_get_client):
    """Test your new tool functionality"""
    # Set up mocks
    mock_client = MagicMock()
    mock_client.your_method.return_value = expected_result
    mock_get_client.return_value = mock_client

    # Call the tool
    result = your_new_tool("spark-app-123")

    # Verify results
    self.assertEqual(result, expected_result)
    mock_client.your_method.assert_called_once_with("spark-app-123")

🐛 Reporting Issues

🔍 Bug Reports

Include:

  • Environment: Python version, OS, uv version
  • Steps to reproduce: Clear step-by-step instructions
  • Expected vs actual behavior: What should happen vs what happens
  • Logs: Relevant error messages or logs
  • Sample data: Spark application IDs that reproduce the issue (if possible)

💡 Feature Requests

Include:

  • Use case: Why is this feature needed?
  • Proposed solution: How should it work?
  • Alternatives: Other approaches considered
  • Examples: Sample usage or screenshots

📖 Documentation

📝 Types of Documentation

  • README.md: Main project overview and quick start
  • TESTING.md: Comprehensive testing guide
  • examples/integrations/: AI agent integration examples
  • Code comments: Inline documentation for complex logic

🎨 Documentation Style

  • Use emojis consistently for visual appeal
  • Include code examples for all features
  • Provide screenshots for UI elements
  • Keep language clear and beginner-friendly

🌟 Recognition

Contributors are recognized in:

  • GitHub Contributors section
  • Release notes for significant contributions
  • Project documentation for major features

📞 Getting Help

  • 🐛 Issues: GitHub Issues
  • 📚 Documentation: Check existing docs first

Code Reviews

All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult GitHub Help for more information on using pull requests.