Live API for just quick transcription? Or is gemini-2.5-flash-lite the fastest possible? #1123

@lukaLLM

Description of the feature request:

Hi all,
I looked through the cookbooks and examples for a quick way to transcribe 10-15 s of audio (PCM, mono, 16 kHz). The Live API seems fast in conversation, so I thought I could get it to do just transcription. I tested it about 2-3 months ago. With the example below, it takes around 1 s (± 100-200 ms) to get the whole text back. I tried the Live API examples, but they either took the same time to return the transcription or raised errors, as the API seems to have changed a bit. Could anybody share an example that gets the transcription faster than the code below, or is this already the fastest way to do it?

response = await self.client.aio.models.generate_content(
    model='models/gemini-2.5-flash-lite',
    contents=[
        prompt,
        types.Part.from_bytes(
            data=processed_audio_bytes,
            mime_type='audio/wav',
        ),
    ],
)
transcribed_text = response.text
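
Since the snippet above sends `mime_type='audio/wav'`, the raw PCM presumably gets a WAV header before the request. A minimal sketch of that step using only the standard library, assuming 16-bit samples (the post states only mono and 16 kHz); `pcm_to_wav_bytes` is a hypothetical helper name, not part of any SDK:

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample; 2 = 16-bit (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()


if __name__ == "__main__":
    one_second_of_silence = b"\x00\x00" * 16000
    wav_bytes = pcm_to_wav_bytes(one_second_of_silence)
    print(len(wav_bytes))
```

Doing the wrap in memory avoids a round trip through a temporary file, which matters when the whole request budget is about one second.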

I also tried

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents=[
        file_upload,
        "Transcribe this audio exactly as spoken. Output only the text."
    ],
    config={
        "response_modalities": ["TEXT"], # We want text back, not audio
    }
)
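
To compare the two variants on equal terms, it may help to time each over several runs instead of relying on a single measurement. A rough sketch of such a harness; `measure_latency` and `fake_transcribe` are hypothetical names, and the placeholder coroutine stands in for the real request, which is not called here:

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable


async def measure_latency(call: Callable[[], Awaitable[str]],
                          runs: int = 5) -> list[float]:
    """Return wall-clock seconds for each of `runs` sequential awaits of `call`."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        await call()
        timings.append(time.perf_counter() - start)
    return timings


async def fake_transcribe() -> str:
    # Placeholder standing in for the real generate_content request.
    await asyncio.sleep(0.01)
    return "hello"


if __name__ == "__main__":
    timings = asyncio.run(measure_latency(fake_transcribe, runs=5))
    print(f"median {statistics.median(timings):.3f} s, "
          f"min {min(timings):.3f} s, max {max(timings):.3f} s")
```

Swapping `fake_transcribe` for a coroutine that wraps the actual call would give the median and spread for each approach. The first run often includes connection setup, so it is worth looking at it separately.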

What problem are you trying to solve with this feature?

No response

Any other information you'd like to share?

No response

Metadata

Labels

status:awaiting response (Awaiting a response from the author)
status:stale (Issue/PR is marked for closure due to inactivity)
type:feature request (New feature request/enhancement)
