
"External relation" API #2801

@techninja1008

Description


Problem Statement

The design of SpiceDB requires relationships to be synced into SpiceDB through one of several possible methods. Implementing this sync correctly is a non-trivial engineering task, which makes adoption overly difficult in a number of cases and, in rare cases, impossible (for example, when the data isn't available in a form that permits bulk syncing in advance). A secondary downside is duplicated storage of data: once in the source system, and once in SpiceDB.

For these reasons, I'm proposing the addition of an "external relation" API for SpiceDB. This would take the form of a standardized network API (gRPC, although HTTP support is also likely worth considering) that SpiceDB acts as a client of. SpiceDB would then be configured so that specific relations (e.g. group#member) are marked as external. This would change the read path of SpiceDB (in practice, a specific subset of calls to QueryRelationships and ReverseQueryRelationships) so that queries on those relations are redirected to the configured external relation upstream. Ergonomics and the principle of least surprise likely dictate that relations configured this way should not be writable via the write side of the API. They would also necessarily be excluded from certain features, such as watching and bulk export.

Unordered set of use cases that this would enable:

  • Integration with APIs managed by 3rd parties/other teams in large organisations - a common pattern is an API that allows retrieval of the group memberships of a specific user, but not necessarily bulk retrieval (for a variety of reasons). Currently, it is impossible to integrate data from systems like this into SpiceDB. This proposal would enable integrating this data using a shim.
  • Direct application integration - for some types of application data (in particular, for applications that are event sourced) it could be simpler, and potentially even more performant, to integrate certain relations into SpiceDB via the interface proposed above. SpiceDB would still add a large amount of value as a caching and graph-traversal layer. This would be particularly relevant if the API allowed proper integration of the "Zookie"/Revision concept, where the application supports it.
  • Federation of SpiceDB instances - for a variety of reasons (separation of concerns, scalability and security) it can be beneficial for different teams within a large organisation to operate/own their own SpiceDB instances, particularly if they have mostly non-overlapping data models. Even in these cases, though, it's highly likely that a subset of the data model would be shared (for example, users and groups). Presently, a trade-off would have to be made between a single shared instance and multiple isolated instances with duplicated data. If, in addition to the above proposal, SpiceDB also implemented the upstream side of the API, then it would be possible to have a common "enterprise-wide" SpiceDB instance holding the common data model, and per-application/context SpiceDB instances that federate back to the common one for specific relations. Such external relations could even be represented as multi-step traversals in the common SpiceDB instance, if necessary/useful.

Solution Brainstorm

I've experimented a bit with a local copy of the codebase to implement the above. I ended up implementing it as another proxy datastore, which looks for a new query option (UseExternalRelationships). That flag is set on datastore queries from pretty much everywhere the query shape knows at least the resource type and relation in advance. If the flag is set, the proxy validates the shape of the query, and then checks whether it matches one of a list of configured "external" RelationReferences. If so, it dispatches the query to the remote external relation API backend.

It's mostly just a POC to check that the idea works and makes sense. There are a number of things that would need fixing/redoing to actually productionize it:

  • I used a thrown-together HTTP API. gRPC is almost certainly a must, and the API would need to be more thought out (e.g. my POC implementation directly exposes internals such as the ellipsis relation).
  • There is no consideration (yet) of how revisions play into the API. These would be important for at least some of the listed use cases (app integration, federation).
  • More consideration needs to be given to which parts of SpiceDB this does/doesn't touch. I simply went for "anything where the query shape fits". This is likely close to correct, but there are possibly some places where it's not right.
  • Consideration of how a relation is configured as external. In the POC, I simply have a CLI flag. There's an argument that there should be some sort of annotation in the schema, but that raises wider ecosystem questions (how would that behave in the playground?). Either way, the URL/config for the actual backend would have to go on the CLI, and since it's not strictly a schema concern, I think it makes slightly more sense to keep it out of the schema.
  • More thought about how configuring a relation as external impacts the write side. IMO, it makes the most sense if you can't write any relationships for relations that are external. It would also be nice if a relation could only be configured as external when no relationships have already been written under it, although I don't foresee any way to enforce this with a CLI flag. This would be a lot more feasible if it were a schema change instead (another factor in that consideration).
  • Is this even something that fits the vision of SpiceDB? (It does significantly depart from the Zanzibar paper, but IMO the benefits in versatility are worth it given the relatively small amount of technical work and change required. Being able to essentially bolt the read side of SpiceDB onto an arbitrary data source would be huge. I'm sure there are more use cases than the ones I've listed above.)
  • If the federation use case is desirable, slightly more thought should be given to what the upstream side of that would look like for SpiceDB, before committing to design choices that could make it more difficult than necessary.
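To make the CLI-flag approach and the proposed write-side restriction concrete, here is a small Go sketch. The flag value format resourceType#relation=upstreamURL and the names ParseExternalRelation, ValidateWrites and Write are assumptions for illustration, not existing SpiceDB flags or APIs. It only rejects writes to relations configured as external; as noted above, a CLI flag alone cannot check whether relationships were already written under such a relation.

```go
package main

import (
	"fmt"
	"strings"
)

// RelationReference identifies resourceType#relation (simplified, hypothetical type).
type RelationReference struct{ ResourceType, Relation string }

// ParseExternalRelation parses a hypothetical flag value of the form
// "resourceType#relation=upstreamURL". It splits on the first "=", so the
// URL itself may contain "=" (e.g. in a query string).
func ParseExternalRelation(v string) (RelationReference, string, error) {
	ref, upstream, ok := strings.Cut(v, "=")
	if !ok || upstream == "" {
		return RelationReference{}, "", fmt.Errorf("%q: expected resourceType#relation=upstreamURL", v)
	}
	typ, rel, ok := strings.Cut(ref, "#")
	if !ok || typ == "" || rel == "" {
		return RelationReference{}, "", fmt.Errorf("%q: expected resourceType#relation", ref)
	}
	return RelationReference{ResourceType: typ, Relation: rel}, upstream, nil
}

// Write is a simplified relationship write (hypothetical type).
type Write struct{ ResourceType, ResourceID, Relation, Subject string }

// ValidateWrites rejects any write touching an external relation, because
// those relationships live in the upstream system rather than in SpiceDB.
func ValidateWrites(external map[RelationReference]string, writes []Write) error {
	for _, w := range writes {
		ref := RelationReference{ResourceType: w.ResourceType, Relation: w.Relation}
		if upstream, ok := external[ref]; ok {
			return fmt.Errorf("cannot write %s:%s#%s@%s: relation is external (served by %s)",
				w.ResourceType, w.ResourceID, w.Relation, w.Subject, upstream)
		}
	}
	return nil
}

func main() {
	external := map[RelationReference]string{}
	for _, v := range []string{"group#member=https://groups.example.com/external-relations"} {
		ref, upstream, err := ParseExternalRelation(v)
		if err != nil {
			panic(err)
		}
		external[ref] = upstream
	}
	err := ValidateWrites(external, []Write{{"group", "eng", "member", "user:alice"}})
	fmt.Println(err)
}
```

Keeping the upstream URL in this map, rather than in the schema, matches the reasoning above that backend configuration is not strictly a schema concern.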

Thoughts?
