Ability to control order of multi-modal context along with prompts

### Priority Level

Medium (Nice to have)

### Is your feature request related to a problem? Please describe.

Currently, all images are packed before the text prompt. There is no way to control where multi-modal context appears relative to the text.

### Describe the solution you'd like

The request is for finer control over the exact order in which multi-modal context and text are interleaved, for example:

```
<images> <text>
Page1 <image1> page2 <image2> <more text>

{"<image1>": path, "<image2>": path2}
"Page1 <image1> page2 <image2>"

"Frame 1 timestamp x" <video frame1>
<audio> <image> <video> <image> <image> <audio>
```

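This is necessary when working with multi-modal data where the position of each item relative to the surrounding text matters, such as page-by-page documents or timestamped video frames.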
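
To make the placeholder idea above concrete, here is a minimal sketch of how a prompt plus a placeholder-to-path mapping could be turned into an ordered list of segments. Everything in it is an assumption for illustration: `Segment`, `interleave`, and the `<name>` placeholder syntax are hypothetical and not part of any existing API in this project.

```python
import re
from dataclasses import dataclass


# Hypothetical segment type, used only for this illustration.
@dataclass(frozen=True)
class Segment:
    kind: str   # "text" or "media"
    value: str  # literal text, or the path the placeholder maps to


# Assumed placeholder syntax: an identifier wrapped in angle brackets.
PLACEHOLDER = re.compile(r"<[A-Za-z_][A-Za-z0-9_]*>")


def interleave(prompt: str, media: dict[str, str]) -> list[Segment]:
    """Split `prompt` into text and media segments, preserving their order."""
    segments: list[Segment] = []
    cursor = 0
    for match in PLACEHOLDER.finditer(prompt):
        tag = match.group(0)
        if tag not in media:
            continue  # unknown tags stay part of the surrounding text
        text = prompt[cursor:match.start()].strip()
        if text:
            segments.append(Segment("text", text))
        segments.append(Segment("media", media[tag]))
        cursor = match.end()
    tail = prompt[cursor:].strip()
    if tail:
        segments.append(Segment("text", tail))
    return segments


if __name__ == "__main__":
    # Paths are placeholders for the example, not real files.
    mapping = {"<image1>": "page1.png", "<image2>": "page2.png"}
    for segment in interleave("Page1 <image1> page2 <image2>", mapping):
        print(segment.kind, segment.value)
```

A flat, ordered list of segments is one possible design: the downstream packing step could then walk it in order instead of placing all images first. The same shape would extend to audio and video entries by mapping their placeholders to paths as well.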



### Describe alternatives you've considered

_No response_

### Additional context

_No response_

Ability to control order of multi-modal context along with prompts #352

Priority Level

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Ability to control order of multi-modal context along with prompts #352

Description

Priority Level

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions