Open
Description
Priority Level
Medium (Nice to have)
Is your feature request related to a problem? Please describe.
Right now all images are packed ahead of the text prompt, so there is no way to control where multi-modal context appears relative to the text.
Describe the solution you'd like
I would like finer control over the exact order of multi-modal context relative to text, for example:
<images> <text>
Page1 <image1> page2 <image2> <more text>
{"<image1>": path, "<image2>": path2}
“Page1 <image1> page2 <image2>”
“Frame 1 timestamp x” <video frame1>
<audio> <image> <video> <image> <image> <audio>
This is necessary when working with multi-modal data.
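To make the request concrete, here is a minimal sketch of how a placeholder-based prompt could be split into an ordered sequence of text and media segments. It is not an existing API of this project: the function name `interleave`, the `("text", ...)` / `("media", ...)` segment tuples, and the placeholder pattern are all hypothetical, chosen only to mirror the `{"<image1>": path, ...}` mapping shown above.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical placeholder syntax: <image1>, <audio>, <video2>, ...
PLACEHOLDER = re.compile(r"<[A-Za-z]+\d*>")

def interleave(prompt: str, media: Dict[str, str]) -> List[Tuple[str, str]]:
    """Split a prompt into ordered ("text", str) and ("media", path) segments.

    Placeholders that are not keys of `media` are left in place as plain text.
    """
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in PLACEHOLDER.finditer(prompt):
        tag = match.group(0)
        if tag not in media:
            continue  # unknown tag: keep it as part of the surrounding text
        text = prompt[pos:match.start()].strip()
        if text:
            segments.append(("text", text))
        segments.append(("media", media[tag]))
        pos = match.end()
    tail = prompt[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments

if __name__ == "__main__":
    order = interleave(
        "Page1 <image1> page2 <image2> more text",
        {"<image1>": "page1.png", "<image2>": "page2.png"},
    )
    for kind, value in order:
        print(kind, value)
```

A downstream packer could then walk this list in order, rather than always placing every image before the text.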
Describe alternatives you've considered
No response
Additional context
No response