provide simple implementation of one-level lineage optimized for parent jobs#2657
julienledem wants to merge 10 commits into main
…nt jobs Signed-off-by: Julien Le Dem <julien@apache.org>
lineageDao.getDirectLineageFromParent("foo", "bar");
oops, I will clean up that test
Signed-off-by: Julien Le Dem <julien@apache.org>
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2657      +/-   ##
============================================
+ Coverage     83.35%   84.08%   +0.72%
+ Complexity     1295     1080     -215
============================================
  Files           244      203      -41
  Lines          5948     5052     -896
  Branches        279      244      -35
============================================
- Hits           4958     4248     -710
+ Misses          844      684     -160
+ Partials        146      120      -26
Signed-off-by: Julien Le Dem <julien@apache.org>
Signed-off-by: Julien Le Dem <julien@apache.org>
pawel-big-lebowski left a comment
I like the feature. I left some questions in the comments, as I would like to better understand why we need separate DAO methods to support this API call.
api/src/main/java/marquez/db/mappers/SimpleLineageEdgeMapper.java
Outdated
…ith name Signed-off-by: Julien Le Dem <julien@apache.org>
Signed-off-by: Julien Le Dem <julien@apache.org>
Signed-off-by: Julien Le Dem <julien@apache.org>
Signed-off-by: Julien Le Dem <julien@apache.org>
1354701 to 53ff595
Signed-off-by: Julien Le Dem <julien@apache.org>
pawel-big-lebowski left a comment
Two minor comments added.
I think the SQL query and the tests are already fine.
public record DirectLineageEdge(
    JobId job1,
    String direction,
Why not use the existing IOType enum? It took me some time to understand what direction is.
Would it make sense to replace job1, job2 with job, upstreamJob?
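To make the suggestion concrete, here is a minimal sketch of what an enum-typed edge could look like. It is not the PR's code: the nested `IOType` and `JobId` types are hypothetical stand-ins for the existing Marquez classes (whose real values and fields may differ), and the record's field list is an assumption, since the full definition is not shown in the diff.

```java
public class DirectLineageEdgeSketch {
  // Hypothetical stand-in for the existing IOType enum; the real values may differ.
  public enum IOType { INPUT, OUTPUT }

  // Hypothetical stand-in for the JobId type used in the PR.
  public record JobId(String namespace, String name) {}

  // Assumed edge shape using the reviewer's proposed names (job, upstreamJob)
  // and an enum in place of the free-form String direction.
  public record DirectLineageEdge(JobId job, IOType ioType, JobId upstreamJob) {}

  // Converts a raw column value into the enum; unknown values fail fast
  // with IllegalArgumentException instead of flowing through as strings.
  public static IOType parseIoType(String raw) {
    return IOType.valueOf(raw.trim().toUpperCase());
  }

  // Short human-readable form of an edge, for logging or debugging.
  public static String describe(DirectLineageEdge edge) {
    return edge.job().name() + " " + edge.ioType();
  }

  public static void main(String[] args) {
    DirectLineageEdge edge =
        new DirectLineageEdge(
            new JobId("default", "order_analysis.load"),
            parseIoType("input"),
            new JobId("default", "ingest"));
    System.out.println(describe(edge));
  }
}
```

With an enum, a misspelled direction is rejected when the row is mapped rather than surfacing later as an unexplained string in the API response.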
@Consumes(APPLICATION_JSON)
@Produces(APPLICATION_JSON)
@Path("/lineage/direct")
public Response getDirectLineage(@QueryParam("parentJobNodeId") @NotNull NodeId parentJobNodeId) {
Please remember to update openapi.yaml and the changelog.
Hi! Do you know if you will continue to work on this PR? This feature seems quite interesting. Thanks for your reply!
Problem
The main lineage graph API focuses on individual jobs and is not easy to use when one wants to cover all the children of a parent job.
Solution
This new endpoint provides one non-recursive level of lineage for all the children of a given parent job.
This helps, for example, when someone wants to retrieve all the lineage of a given Airflow DAG.
It returns all of the DAG's children (tasks) and all the datasets they consume or produce, as well as the other tasks and DAGs producing and consuming those datasets.
Example:
GET /api/v1/lineage/simple?nodeId=job:default:order_analysis
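As a rough illustration of calling the endpoint from a client, the sketch below builds the request with the JDK's `java.net.http` API. The path and the `nodeId` parameter follow the example above; note that the reviewed code snippet uses `/lineage/direct` and `parentJobNodeId`, so both may change before merge. The base URL `http://localhost:5000` and the class and method names are assumptions made for illustration only.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class SimpleLineageRequest {
  // Builds the lineage URI for a parent job. The node ID has the form
  // job:<namespace>:<name>, as in the example in the PR description.
  public static URI lineageUri(String baseUrl, String namespace, String parentJob) {
    String nodeId = "job:" + namespace + ":" + parentJob;
    // Percent-encode the node ID so the ':' separators are safe in a query string.
    String encoded = URLEncoder.encode(nodeId, StandardCharsets.UTF_8);
    return URI.create(baseUrl + "/api/v1/lineage/simple?nodeId=" + encoded);
  }

  // Builds (but does not send) a GET request that asks for JSON.
  public static HttpRequest lineageRequest(URI uri) {
    return HttpRequest.newBuilder(uri).header("Accept", "application/json").GET().build();
  }

  public static void main(String[] args) {
    // Assumed base URL; adjust to wherever the API is actually served.
    URI uri = lineageUri("http://localhost:5000", "default", "order_analysis");
    HttpRequest request = lineageRequest(uri);
    System.out.println(request.method() + " " + request.uri());
  }
}
```

Sending the request is left out on purpose; passing it to `HttpClient.newHttpClient().send(...)` with a string body handler would return the JSON payload for whatever response shape the endpoint ends up with.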
Checklist
CHANGELOG.md (Depending on the change, this may not be necessary).
.sql database schema migration according to Flyway's naming convention (if relevant)