5 changes: 5 additions & 0 deletions .github/dependabot.yml
@@ -197,6 +197,11 @@ updates:
    schedule:
      interval: "daily"

  - directory: "/framework/meltano"
    package-ecosystem: "pip"
    schedule:
      interval: "daily"

  - directory: "/framework/mcp"
    package-ecosystem: "pip"
    schedule:
78 changes: 78 additions & 0 deletions .github/workflows/framework-meltano.yml
@@ -0,0 +1,78 @@
name: Singer/Meltano

on:
  pull_request:
    paths:
    - '.github/workflows/framework-meltano.yml'
    - 'framework/meltano/**'
    - 'requirements.txt'
  push:
    branches: [ main ]
    paths:
    - '.github/workflows/framework-meltano.yml'
    - 'framework/meltano/**'
    - 'requirements.txt'

  # Allow job to be triggered manually.
  workflow_dispatch:

  # Run job each night after CrateDB nightly has been published.
  schedule:
    - cron: '0 3 * * *'

# Cancel in-progress jobs when pushing to the same branch.
concurrency:
  cancel-in-progress: true
  group: ${{ github.workflow }}-${{ github.ref }}

jobs:
  test:
    name: "
     Python: ${{ matrix.python-version }}
     CrateDB: ${{ matrix.cratedb-version }}
     on ${{ matrix.os }}"
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ 'ubuntu-latest' ]
        python-version: [
          '3.10',
          '3.14',
        ]
        cratedb-version: [ 'nightly' ]

    services:
      cratedb:
        image: crate/crate:${{ matrix.cratedb-version }}
        ports:
          - 4200:4200
          - 5432:5432
        options: >-
          --health-cmd "curl -f http://localhost:4200/ || exit 1"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 10

    steps:

      - name: Acquire sources
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
          cache: 'pip'
          cache-dependency-path: |
            requirements.txt
            framework/meltano/requirements.txt
            framework/meltano/requirements-dev.txt

      - name: Install uv
        uses: astral-sh/setup-uv@v7

      - name: Validate framework/meltano
        run: |
          uv run --with=pueblo ngr test framework/meltano
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,4 +1,6 @@
.DS_Store
.idea
.env
.venv*
__pycache__
_build
5 changes: 5 additions & 0 deletions framework/meltano/.gitignore
@@ -0,0 +1,5 @@
.meltano
*.json
*.singer
output
plugins
45 changes: 45 additions & 0 deletions framework/meltano/README.md
@@ -0,0 +1,45 @@
# Meltano Examples

Concise examples of working with [CrateDB] and [Meltano] to design and
run flexible ELT tasks. All recipes use [meltano-target-cratedb]
to load data into CrateDB.

## What's inside

- `file-to-cratedb`: Acquire data from a Singer file and load it into
  a CrateDB database table.

- `github-to-cratedb`: Acquire repository metadata from the GitHub API and
  load it, separated per entity, into 32 CrateDB database tables.

## Prerequisites

Before running the examples within the subdirectories, make sure to install
Meltano and its dependencies.

```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## Usage

Then explore the individual Meltano projects. Invoke them either from within
their directories or by using the `--cwd` option from the root folder.

```shell
meltano --cwd github-to-cratedb install
meltano --cwd github-to-cratedb run tap-github target-cratedb
```
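
The same pattern applies to each project directory. As a minimal sketch, the
following Python snippet assembles the `install` and `run` invocations for both
projects and prints them instead of executing them. The helper name
`meltano_commands` is hypothetical; the project and plugin names are the ones
used in this repository.

```python
import shlex


def meltano_commands(project_dir, extractor, loader):
    """Return the install and run invocations for one Meltano project."""
    base = ["meltano", "--cwd", project_dir]
    return [
        base + ["install"],
        base + ["run", extractor, loader],
    ]


# Project directories and plugin names as used in this README and its subprojects.
PIPELINES = [
    ("file-to-cratedb", "tap-singer-jsonl", "target-cratedb"),
    ("github-to-cratedb", "tap-github", "target-cratedb"),
]

for project_dir, extractor, loader in PIPELINES:
    for command in meltano_commands(project_dir, extractor, loader):
        # Print instead of executing, so the sketch has no side effects.
        print(shlex.join(command))
```

To actually run a pipeline, pass each command list to `subprocess.run(command, check=True)`.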

## Software Tests
```shell
pip install -r requirements-dev.txt
poe check
```


[CrateDB]: https://cratedb.com/product
[Meltano]: https://meltano.com/
[meltano-target-cratedb]: https://github.com/crate/meltano-target-cratedb
63 changes: 63 additions & 0 deletions framework/meltano/file-to-cratedb/README.md
@@ -0,0 +1,63 @@
# Use Meltano to import files into CrateDB

## About

Import data from a file in Singer format (JSONL) into CrateDB, using
[tap-singer-jsonl] and [meltano-target-cratedb].
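
For orientation, a Singer file is a sequence of JSON messages, one per line:
typically a `SCHEMA` message, followed by `RECORD` messages and optionally
`STATE` messages. The Python sketch below writes such a file. The stream name
`countries`, the two fields, and the helper names are illustrative assumptions,
not the contents of the sample file shipped with this project.

```python
import json
from pathlib import Path


def singer_messages():
    """Yield a minimal SCHEMA / RECORD / STATE sequence for a hypothetical stream."""
    yield {
        "type": "SCHEMA",
        "stream": "countries",
        "schema": {
            "type": "object",
            "properties": {
                "code": {"type": "string"},
                "name": {"type": "string"},
            },
        },
        "key_properties": ["code"],
    }
    yield {"type": "RECORD", "stream": "countries", "record": {"code": "AT", "name": "Austria"}}
    yield {"type": "RECORD", "stream": "countries", "record": {"code": "DE", "name": "Germany"}}
    yield {"type": "STATE", "value": {}}


def write_singer_file(path):
    """Write one JSON message per line and return the number of messages written."""
    lines = [json.dumps(message) for message in singer_messages()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

A file written this way, for example with `write_singer_file("example_countries.singer")`,
can be listed under `config.local.paths` like any other Singer file.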

## Configuration

### tap-singer-jsonl

Within the `extractors` section of `meltano.yml`, see the `config.local.paths`
setting of `tap-singer-jsonl` for how to configure JSONL files in Singer
format as pipeline sources.

### target-cratedb

Within the `loaders` section, at `target-cratedb`, adjust
`config.sqlalchemy_url` to match your database connectivity settings
as pipeline target.
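
To double-check which server and credentials a given `sqlalchemy_url` points
at, the URL can be taken apart with Python's standard library. This is only a
sketch: the helper name `describe_url` is hypothetical, the fallback port 4200
is an assumption based on the HTTP port exposed in the CI workflow, and the
snippet does not validate the URL against the SQLAlchemy dialect.

```python
from urllib.parse import urlsplit


def describe_url(url, default_port=4200):
    """Split a database URL into its connection components."""
    parts = urlsplit(url)
    return {
        "dialect": parts.scheme,
        "username": parts.username,
        # Report only whether a password is present, never its value.
        "has_password": parts.password is not None,
        "host": parts.hostname,
        # Assumption: fall back to 4200 when the URL carries no port.
        "port": parts.port or default_port,
    }


print(describe_url("crate://crate@localhost/"))
```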

## Usage

Install dependencies.
```shell
meltano install
```

Discover data schema.
```shell
meltano invoke tap-singer-jsonl --discover > catalog.json
```

Run the plugin standalone as a test drive.
```shell
meltano invoke tap-singer-jsonl --catalog catalog.json
```

Invoke data transfer to CrateDB database.
```shell
meltano run tap-singer-jsonl target-cratedb
```

## Screenshot

Enjoy the list of countries.
```shell
crash --command 'SELECT "code", "name", "capital", "emoji", "languages[1]" FROM "melty"."countries" ORDER BY "name" LIMIT 42;'
```

![image](https://github.com/crate/meltano-target-cratedb/assets/453543/fa7076cc-267e-446c-a4f3-aa1283778ace)
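
As an alternative to `crash`, the loaded table can be inspected from Python
through CrateDB's HTTP endpoint at `/_sql`. The sketch below assumes an
instance listening on `http://localhost:4200` that accepts unauthenticated
requests, and it selects only a subset of the columns queried above. The helper
names `build_sql_request` and `run_query` are hypothetical.

```python
import json
from urllib.request import Request, urlopen

STATEMENT = (
    'SELECT "code", "name", "capital" '
    'FROM "melty"."countries" ORDER BY "name" LIMIT 42'
)


def build_sql_request(stmt, base_url="http://localhost:4200"):
    """Build a POST request for CrateDB's HTTP SQL endpoint."""
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return Request(
        f"{base_url}/_sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(stmt, base_url="http://localhost:4200"):
    """Send the statement and return column names and result rows."""
    with urlopen(build_sql_request(stmt, base_url), timeout=10) as response:
        payload = json.load(response)
    return payload["cols"], payload["rows"]
```

With a running database, `cols, rows = run_query(STATEMENT)` returns the column
names and the rows as lists.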


## Development
In order to link the sandbox to a development installation of [meltano-target-cratedb],
configure the `pip_url` of the component like this:
```yaml
pip_url: --editable=/path/to/sources/meltano-target-cratedb
```


[meltano-target-cratedb]: https://github.com/crate/meltano-target-cratedb
[tap-singer-jsonl]: https://github.com/singer-contrib/tap-singer-jsonl
50 changes: 50 additions & 0 deletions framework/meltano/file-to-cratedb/meltano.yml
@@ -0,0 +1,50 @@
# A Meltano project is just a directory on your filesystem containing text-based files.
# At a minimum, a Meltano project must contain a project file named `meltano.yml`,
# which contains your project configuration, and tells Meltano that a particular
# directory is a Meltano project.
---
version: 1
default_environment: dev
send_anonymous_usage_stats: false
project_id: 57d4c2d8-e053-49da-8cc2-8472de112c69

environments:
- name: dev
- name: staging
- name: prod

plugins:

  # Configure data source (Singer Tap / Meltano Extractor).
  extractors:

  - name: tap-singer-jsonl
    variant: kgpayne
    pip_url: git+https://github.com/singer-contrib/tap-singer-jsonl@preview
    config:
      source: local
      add_record_metadata: false
      local:
        # Note: Configure Singer file(s) here.
        paths:
        - "tap_countries.singer"

  # Configure data sinks (Singer Target / Meltano Loader).
  loaders:

  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl

  - name: target-cratedb
    namespace: cratedb
    variant: cratedb
    # Acquire from PyPI.
    pip_url: meltano-target-cratedb
    # Acquire from GitHub.
    # pip_url: git+https://github.com/crate/meltano-target-cratedb.git@preview

    # Note: Configure your database server and credentials here.
    config:
      sqlalchemy_url: crate://crate@localhost/
      add_record_metadata: true
@@ -0,0 +1,85 @@
{
"plugin_type": "extractors",
"name": "tap-singer-jsonl",
"namespace": "tap_singer_jsonl",
"variant": "kgpayne",
"label": "Singer JSONL",
"docs": "https://hub.meltano.com/extractors/tap-singer-jsonl--kgpayne",
"repo": "https://github.com/kgpayne/tap-singer-jsonl",
"pip_url": "tap-singer-jsonl",
"executable": "tap-singer-jsonl",
"description": "Read Singer-formatted JSONL Files",
"logo_url": "https://hub.meltano.com/assets/logos/extractors/singer.png",
"capabilities": [
"discover"
],
"settings_group_validation": [
[
"local.folders"
],
[
"local.paths"
],
[
"source",
"s3.bucket"
],
[
"source",
"s3.paths"
]
],
"settings": [
{
"name": "source",
"kind": "string",
"value": "local",
"label": "Source",
"description": "The source configuration to use when reading `.singer.gz` files. Currently `local` and `s3` are supported."
},
{
"name": "add_record_metadata",
"kind": "boolean",
"value": true,
"label": "Add Record Metadata",
"description": "Whether to inject `_sdc_*` metadata columns."
},
{
"name": "local.folders",
"kind": "array",
"label": "Folders",
"description": "Array of directory paths to scan for `.singer.gz` files."
},
{
"name": "local.recursive",
"kind": "boolean",
"value": false,
"label": "Recursive",
"description": "Whether to scan directories recursively when discovering `.singer.gz` files."
},
{
"name": "local.paths",
"kind": "array",
"label": "Paths",
"description": "Array of file paths to singer-formatted files. **Note:** extension is ignored, and compression is inferred automatically by `smart_open`. Both `local.folders` and `local.paths` can be specified together."
},
{
"name": "s3.bucket",
"kind": "string",
"label": "Bucket",
"description": "S3 bucket name."
},
{
"name": "s3.prefix",
"kind": "string",
"label": "Prefix",
"description": "S3 key prefix. **Note:** key prefixes will be scanned recursively."
},
{
"name": "s3.paths",
"kind": "array",
"label": "Paths",
"description": "S3 file paths to singer-formatted files. **Note:** extension is ignored, and compression is inferred automatically by `smart_open`. Both `s3.prefix` and `s3.paths` can be specified together."
}
]
}
@@ -0,0 +1,34 @@
{
"plugin_type": "loaders",
"name": "target-jsonl",
"namespace": "target_jsonl",
"variant": "andyh1203",
"label": "JSON Lines (JSONL)",
"docs": "https://hub.meltano.com/loaders/target-jsonl--andyh1203",
"repo": "https://github.com/andyh1203/target-jsonl",
"pip_url": "target-jsonl",
"description": "JSONL loader",
"logo_url": "https://hub.meltano.com/assets/logos/loaders/jsonl.png",
"settings": [
{
"name": "destination_path",
"kind": "string",
"value": "output",
"label": "Destination Path",
"description": "Sets the destination path the JSONL files are written to, relative\nto the project root.\n\nThe directory needs to exist already, it will not be created\nautomatically.\n\nTo write JSONL files to the project root, set an empty string (`\"\"`).\n"
},
{
"name": "do_timestamp_file",
"kind": "boolean",
"value": false,
"label": "Include Timestamp in File Names",
"description": "Specifies if the files should get timestamped.\n\nBy default, the resulting file will not have a timestamp in the file name (i.e. `exchange_rate.jsonl`).\n\nIf this option gets set to `true`, the resulting file will have a timestamp associated with it (i.e. `exchange_rate-{timestamp}.jsonl`).\n"
},
{
"name": "custom_name",
"kind": "string",
"label": "Custom File Name Override",
"description": "Specifies a custom name for the filename, instead of the stream name.\n\nThe file name will be `{custom_name}-{timestamp}.jsonl`, if `do_timestamp_file` is `true`.\nOtherwise the file name will be `{custom_name}.jsonl`.\n\nIf custom name is not provided, the stream name will be used.\n"
}
]
}