
Ability to control order of multi-modal context along with prompts #352

@nabinchha

Description

Priority Level

Medium (Nice to have)

Is your feature request related to a problem? Please describe.

Currently, all images are placed before the text prompt, and there is no way to control where multi-modal context appears relative to the text.

Describe the solution you'd like

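The desire is to have finer control over the exact order of multi-modal context. For example:

Current behavior: <images> <text>
Desired: Page1 <image1> page2 <image2> <more text>

One way to express this is a mapping from placeholders to paths, with the prompt referencing those placeholders by name:
{"<image1>": path, "<image2>": path2}
"Page1 <image1> page2 <image2>"

Other orderings that should be possible:
"Frame 1 timestamp x" <video frame1>
<audio> <image> <video> <image> <image> <audio>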

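This control is necessary when working with interleaved multi-modal data, such as document pages with their images or video frames annotated with timestamps.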
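A minimal sketch of the requested behavior, assuming the placeholder-to-path mapping shown above. `Part` and `interleave` are hypothetical names, not part of any existing API. The function splits the prompt on the placeholder keys and returns an ordered list of text and media parts, which a client could then translate into whatever message format the target model expects.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Part:
    """One item of an interleaved prompt (hypothetical type, not a library API)."""
    kind: str   # "text", "image", "audio", "video", ...
    value: str  # literal text for "text" parts, otherwise a file path


def interleave(template: str, media: Dict[str, Tuple[str, str]]) -> List[Part]:
    """Split `template` on the placeholder keys of `media`, preserving order.

    `media` maps a placeholder string such as "<image1>" to (kind, path).
    Only keys present in `media` are treated as placeholders, so other
    angle-bracketed text in the prompt is left alone. A placeholder may
    appear more than once; unused keys are simply ignored.
    """
    if not media:
        return [Part("text", template.strip())] if template.strip() else []

    # Longest keys first so a key that is a prefix of another cannot win early.
    pattern = re.compile(
        "|".join(re.escape(key) for key in sorted(media, key=len, reverse=True))
    )

    parts: List[Part] = []
    pos = 0
    for match in pattern.finditer(template):
        # Text between the previous placeholder and this one (whitespace trimmed).
        text = template[pos:match.start()].strip()
        if text:
            parts.append(Part("text", text))
        kind, path = media[match.group(0)]
        parts.append(Part(kind, path))
        pos = match.end()

    # Any text after the last placeholder.
    tail = template[pos:].strip()
    if tail:
        parts.append(Part("text", tail))
    return parts


if __name__ == "__main__":
    for part in interleave(
        "Page1 <image1> page2 <image2> summarize both pages",
        {"<image1>": ("image", "page1.png"), "<image2>": ("image", "page2.png")},
    ):
        print(part.kind, part.value)
```

Running it prints `text Page1`, `image page1.png`, `text page2`, `image page2.png`, `text summarize both pages`, one per line. Because `kind` is a free-form string, the same approach would cover the timestamped video-frame and mixed audio/image/video orderings without special cases.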

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Assignees

No one assigned

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

    Issue actions