Potential performance improvement: "Fast forward content streams" #5264

@bwaidelich

TL;DR: Today we always copy all edges when bringing a workspace in sync, even if the workspace does not contain any changes. It might be a worthwhile performance improvement to only "patch" the existing content stream.

Imagine the following scenario, with this simplified graph:

graph TD
    root --> a
    root --> b
    a --> a1
    a --> a2
    b --> b1

    root --> a
    root --> b
    a --> a1
    a --> a2
    b --> b1
    
    linkStyle 5,6,7,8,9 stroke:blue;

(black = live content stream, blue = user content stream)

When node a is modified in the live workspace, a copy is created and the edges moved like this:

graph TD
    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    root --> a
    root --> b
    a --> a1
    a --> a2
    b --> b1

    linkStyle 0,2,3 stroke-width: 2px;
    linkStyle 5,6,7,8,9 stroke:blue;

(bold = new edges)

As a result, the user workspace is out of date and needs to be synced.
Today this is done by creating a new content stream, i.e. copying all edges from the target (live) workspace:

graph TD
    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    root --> a
    root --> b
    a --> a1
    a --> a2
    b --> b1

    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    linkStyle 5,6,7,8,9 stroke:blue;
    linkStyle 10,11,12,13,14 stroke:red, stroke-width: 2px;

(red = edges for the new user content stream)

In reality this means that loads of edges are now the same in all three content streams.

The resulting events are

%%{init: { 'gitGraph': {'showBranches': true, 'showCommitLabel':true,'mainBranchName': 'live'}} }%%
      gitGraph
        commit id:"Root WS Created"
        branch contentstream-u1
        commit id:"CS Forked"
        branch workspace-u
        commit id:"WS Created"
        checkout live
        commit id:"Node Created"
        branch contentstream-u2
        commit id:"CS Forked'"
        checkout workspace-u
        commit id:"WS Rebased"

Suggestion

Instead of starting a new content stream with the ContentStreamWasForked event, we could publish a new event (e.g. ContentStreamWasSynced, containing the new versionOfSourceContentStream) and then "only" add/remove the edges that are affected:

graph TD
    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    linkStyle 6,9 stroke:blue;
    linkStyle 5,7,8 stroke:blue, stroke-width: 2px;

And the resulting events:

%%{init: { 'gitGraph': {'showBranches': true, 'showCommitLabel':true,'mainBranchName': 'live'}} }%%
      gitGraph
        commit id:"Root WS Created"
        branch contentstream-u1
        commit id:"CS Forked"
        branch workspace-u
        commit id:"WS Created"
        checkout live
        commit id:"Node Created"
        checkout contentstream-u1
        commit id:"CS Synced"

Of course this can only work if the content stream itself does not contain any changes.
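To make the idea more concrete, here is a minimal sketch of the "patch" step, assuming edges can be modeled as plain (parent, child) pairs. The names `Edge` and `plan_fast_forward` are hypothetical and not part of the Neos codebase; the real content graph stores more per edge than this.

```python
from __future__ import annotations

from typing import NamedTuple


class Edge(NamedTuple):
    """Simplified hierarchy edge (hypothetical model, not the real schema)."""
    parent: str
    child: str


def plan_fast_forward(
    user_edges: set[Edge],
    live_edges: set[Edge],
    user_has_changes: bool,
) -> tuple[set[Edge], set[Edge]]:
    """Return (edges_to_add, edges_to_remove) for the user content stream."""
    if user_has_changes:
        # Fast-forwarding only works for content streams without own changes.
        raise ValueError("content stream contains changes, a regular rebase is needed")
    return live_edges - user_edges, user_edges - live_edges


# The graphs from the example above: node a was replaced by a' in live.
user = {Edge("root", "a"), Edge("root", "b"), Edge("a", "a1"),
        Edge("a", "a2"), Edge("b", "b1")}
live = {Edge("root", "a'"), Edge("root", "b"), Edge("a'", "a1"),
        Edge("a'", "a2"), Edge("b", "b1")}

to_add, to_remove = plan_fast_forward(user, live, user_has_changes=False)
synced = (user - to_remove) | to_add
```

In this example only three edges are added and three removed, while the two unaffected edges (root to b, b to b1) stay untouched.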

Considerations

Most or all places that currently react to ContentStreamWasForked events (currently the ContentGraph, ContentStream and AssetUsage projections) also need to handle the new ContentStreamWasSynced (or similar) event.
Furthermore, the WorkspaceWasRebased event is no longer published in these cases, so the Neos content cache flusher probably needs to handle the new event, too.

The most complex part is probably the actual performance optimization of creating only the missing edges. In a first implementation we could simplify this by always removing and re-creating all edges, which would allow the main performance gain to be implemented later in a non-breaking manner.
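A rough sketch of what such a simplified first implementation could look like in a projection, assuming an in-memory edge store keyed by content stream id. The event class mirrors the suggested ContentStreamWasSynced name; its fields, `EdgeProjection` and the handler name are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContentStreamWasSynced:
    """Hypothetical event, named after the suggestion in this issue."""
    content_stream_id: str
    source_content_stream_id: str
    version_of_source_content_stream: int


class EdgeProjection:
    """Toy projection: content stream id -> set of (parent, child) edges."""

    def __init__(self) -> None:
        self.edges: dict[str, set[tuple[str, str]]] = {}

    def when_content_stream_was_synced(self, event: ContentStreamWasSynced) -> None:
        # Naive first version: drop all edges and re-copy them from the source.
        # A later optimization could replace this body with a diff-based patch
        # without changing the event or the handler signature.
        source = self.edges[event.source_content_stream_id]
        self.edges[event.content_stream_id] = set(source)


projection = EdgeProjection()
projection.edges["live"] = {("root", "a'"), ("root", "b"), ("a'", "a1"),
                            ("a'", "a2"), ("b", "b1")}
projection.edges["u1"] = {("root", "a"), ("root", "b"), ("a", "a1"),
                          ("a", "a2"), ("b", "b1")}
projection.when_content_stream_was_synced(
    ContentStreamWasSynced("u1", "live", version_of_source_content_stream=1)
)
```

Because the event already exists in this version, the optimization only touches the handler body, not the event stream.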

Related: #4388
