[AI] Add hybrid inference support in `GenerativeModelSession` by andrewheard · Pull Request #16043 · firebase/firebase-ios-sdk

andrewheard · 2026-04-04T22:42:43Z

Started adding support for hybrid (on-device and cloud) inference. Internally, this is implemented as an array of fallback models: the session tries one model and moves on to the next if that one fails. It will be publicly exposed as "prefer cloud" or "prefer on-device", which only affects the order of the models in the array. The approach could be expanded to other fallback strategies in the future if desired (e.g., Vertex AI --> Gemini Dev API, Gemini 3.1 --> Gemini 2.5) to handle cases where backends or models are resource constrained.

Note: This PR adds the internal scaffolding for hybrid support but does not expose it publicly.
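
To make the fallback array idea concrete, here is a minimal sketch of how a preference could map to model order and how a request could walk that order. It is not the code in this PR: `InferencePreference`, `FallbackModel`, `StubModel`, `orderedModels`, and `respondWithFallback` are hypothetical names, the calls are synchronous for brevity, and the sketch ignores session history.

```swift
// Hypothetical names throughout; none of these types come from the PR.
enum InferencePreference {
  case preferCloud
  case preferOnDevice
}

struct ModelUnavailableError: Error {
  let modelName: String
}

struct NoModelsError: Error {}

protocol FallbackModel {
  var name: String { get }
  func respond(to prompt: String) throws -> String
}

// Stand-in model that either echoes the prompt or fails.
struct StubModel: FallbackModel {
  let name: String
  let isAvailable: Bool

  func respond(to prompt: String) throws -> String {
    guard isAvailable else { throw ModelUnavailableError(modelName: name) }
    return "\(name): \(prompt)"
  }
}

/// The preference only decides the order of the fallback array.
func orderedModels(
  cloud: any FallbackModel,
  onDevice: any FallbackModel,
  preference: InferencePreference
) -> [any FallbackModel] {
  switch preference {
  case .preferCloud: return [cloud, onDevice]
  case .preferOnDevice: return [onDevice, cloud]
  }
}

/// Tries each model in order and returns the first successful response.
/// If every model fails, the last error is rethrown.
func respondWithFallback(to prompt: String, models: [any FallbackModel]) throws -> String {
  var lastError: Error = NoModelsError()
  for model in models {
    do {
      return try model.respond(to: prompt)
    } catch {
      lastError = error
    }
  }
  throw lastError
}
```

With an unavailable cloud stub and an available on-device stub, `.preferCloud` still yields an answer from the on-device stub, because the failure on the first entry moves the loop on to the second.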

TODOs:

Add more integration tests and add unit tests
Add documentation
Add changelog entry
Cleanup

#no-changelog

gemini-code-assist · 2026-04-04T22:42:48Z

gemini-code-assist

Code Review

This pull request implements a hybrid model session architecture that enables fallback capabilities between multiple language models, specifically integrating Apple's Foundation Models alongside Gemini. It introduces a SessionManager to coordinate model selection and session state, refactors GenerativeModelSession to support this multi-model approach, and adds the necessary protocol abstractions and implementations. Review feedback identifies important improvement opportunities regarding resource management, specifically recommending explicit cancellation of detached tasks in streaming responses to prevent unnecessary background activity, and suggests correcting a misleading error message in the unary response path.

FirebaseAI/Sources/Extensions/Internal/LanguageModelSession+ModelSession.swift

FirebaseAI/Sources/Types/Internal/GeminiModelSession.swift

FirebaseAI/Sources/GenerativeModelSession.swift
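
The cancellation feedback in the summary above could be addressed along the lines of the sketch below. This is an illustration under assumptions, not the change made in this PR: `makeChunkStream` and its array of chunks are hypothetical stand-ins for a real model stream. Only `AsyncThrowingStream`, `Task.detached`, `Task.checkCancellation()`, and `onTermination` are standard library APIs.

```swift
/// Hypothetical helper: wraps a chunk producer in an AsyncThrowingStream and
/// cancels the detached producer task when the stream terminates.
func makeChunkStream(from chunks: [String]) -> AsyncThrowingStream<String, Error> {
  AsyncThrowingStream { continuation in
    let task = Task.detached {
      do {
        for chunk in chunks {
          // Stop producing as soon as the task has been cancelled.
          try Task.checkCancellation()
          continuation.yield(chunk)
        }
        continuation.finish()
      } catch {
        continuation.finish(throwing: error)
      }
    }

    // Runs when the stream finishes or the consumer stops iterating, so the
    // detached task does not keep working in the background.
    continuation.onTermination = { _ in
      task.cancel()
    }
  }
}
```

Because a detached task does not inherit cancellation from the caller, the `onTermination` handler is what ties the producer's lifetime to the consumer's.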

andrewheard · 2026-04-09T00:07:16Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a hybrid model session architecture, allowing for fallback mechanisms between different language models (e.g., Gemini and Apple's Foundation Models). It refactors GenerativeModelSession to use a SessionManager for handling multiple LanguageModel instances and introduces internal protocols like ModelSession and LanguageModel to abstract model-specific implementations. Feedback focuses on improving error handling for unsupported part types, reducing code duplication in response processing, and aligning property naming with Swift idioms.

gemini-code-assist · 2026-04-09T00:12:34Z

FirebaseAI/Sources/Types/Public/Part.swift

+            throw GenerativeModelSession.GenerationError.internalError(
+              GenerativeModelSession.GenerationError.Context(
+                debugDescription: """
+                Prompt data type "\(data)" is not supported by Foundation Models.
+                """
+              ),
+              underlyingError: NSError(domain: Constants.baseErrorDomain, code: 0)
+            )


You've noted with a TODO that a custom error type would be better here. I agree. Using a generic internalError with a generic NSError for a specific, known failure condition (unsupported part type) makes error handling less precise for consumers of the API.

I'd recommend adding a more specific case to GenerationError, for example:

    enum GenerationError: Error {
      // ...
      case unsupportedPartTypeForOnDeviceModel(Context)
    }

Then you could throw this more specific error, which would make it easier to handle this particular failure case programmatically.
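
As a rough illustration of why the specific case helps callers, the sketch below uses simplified stand-ins for `GenerationError` and `Context` (the real types live in FirebaseAI and have more cases and fields). `validate(partKind:)` and `isUnsupportedPartError(_:)` are hypothetical helpers invented for this example.

```swift
// Simplified stand-ins; not the real FirebaseAI definitions.
struct Context {
  let debugDescription: String
}

enum GenerationError: Error {
  case internalError(Context)
  case unsupportedPartTypeForOnDeviceModel(Context)
}

/// Hypothetical check that throws the specific case for unsupported parts.
func validate(partKind: String) throws {
  let supportedKinds: Set<String> = ["text"]
  guard supportedKinds.contains(partKind) else {
    throw GenerationError.unsupportedPartTypeForOnDeviceModel(
      Context(
        debugDescription: "Prompt data type \"\(partKind)\" is not supported by Foundation Models."
      )
    )
  }
}

/// Callers can match the specific case instead of inspecting an NSError.
func isUnsupportedPartError(_ error: Error) -> Bool {
  guard let generationError = error as? GenerationError else { return false }
  switch generationError {
  case .unsupportedPartTypeForOnDeviceModel: return true
  case .internalError: return false
  }
}
```

A caller can then branch on `isUnsupportedPartError(error)` in a `catch` block rather than parsing a debug description.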

gemini-code-assist · 2026-04-09T00:12:34Z

FirebaseAI/Sources/Extensions/Internal/LanguageModelSession+ModelSession.swift

+      // TODO: Extract common response handling code into a helper method.
+      let responseText: String
+      if schema == nil, case let .string(text) = response.rawContent.kind {
+        responseText = text
+      } else {
+        responseText = response.rawContent.jsonString
+      }
+
+      let generatedContent = response.rawContent.firebaseGeneratedContent
+      let modelContent = ModelContent(
+        role: "model",
+        parts: [InternalPart(.text(responseText), isThought: false, thoughtSignature: nil)]
+      )
+      let candidate = Candidate(
+        content: modelContent,
+        safetyRatings: [],
+        finishReason: nil,
+        citationMetadata: nil
+      )
+      let rawResponse = GenerateContentResponse(
+        candidates: [candidate],
+        modelVersion: SystemLanguageModel.modelName
+      )
+
+      return GenerativeModelSession.Response(
+        content: generatedContent,
+        rawContent: generatedContent,
+        rawResponse: rawResponse
+      )


There's significant code duplication between this block in respond and a similar block in streamResponse (lines 111-141). You've already noted this with a TODO. To improve maintainability, you could extract this logic into a private helper function.

Here's a suggestion for what that might look like:

    private func processFoundationModelsResponse(
      _ fmResponse: FoundationModels.LanguageModelSession.Response<FoundationModels.GeneratedContent>,
      schema: FirebaseAI.GenerationSchema?
    ) -> (generatedContent: FirebaseAI.GeneratedContent, rawResponse: GenerateContentResponse) {
      let responseText: String
      if schema == nil, case let .string(text) = fmResponse.rawContent.kind {
        responseText = text
      } else {
        responseText = fmResponse.rawContent.jsonString
      }

      let generatedContent = fmResponse.rawContent.firebaseGeneratedContent
      let modelContent = ModelContent(
        role: "model",
        parts: [InternalPart(.text(responseText), isThought: false, thoughtSignature: nil)]
      )
      let candidate = Candidate(
        content: modelContent,
        safetyRatings: [],
        finishReason: nil,
        citationMetadata: nil
      )
      let rawResponse = GenerateContentResponse(
        candidates: [candidate],
        modelVersion: SystemLanguageModel.modelName
      )

      return (generatedContent, rawResponse)
    }

You could then call this helper from both respond and streamResponse to build the response objects.

gemini-code-assist · 2026-04-09T00:12:34Z

FirebaseAI/Sources/GenerativeModelSession.swift

+      private(set) var _modelSessions = [Int: any ModelSession]()
+      private var _activeSessionIndex: Int?


The properties _modelSessions and _activeSessionIndex use an underscore prefix, which is not idiomatic in Swift for private properties. It's more common to see this in Objective-C or for backing properties for a computed property, which isn't the case here.

Consider renaming them to modelSessions and activeSessionIndex respectively to align better with Swift conventions. This would improve readability and adherence to Swift's style guide.

    private(set) var modelSessions = [Int: any ModelSession]()
    private var activeSessionIndex: Int?

Start hybrid inference support in GenerativeModelSession

7e0f4ee

andrewheard added the api: firebaseai label Apr 4, 2026

andrewheard added 23 commits April 5, 2026 11:58

Refactor for unit testing

08f8be6

Limit availability of ModelSession.streamResponse newer platforms

052868a

Add another @available(macOS 12.0, watchOS 8.0, *)

89a6bdd

Refactor for integration testing

707453e

Add workaround for SystemLanguageModel.isAvailable inaccuracy

104ef88

Replace fatalErrors with assertions and throwing errors

62939de

Simplify respond(to:schema:includeSchemaInPrompt:options:)

7e385e2

Add respondGenerable_fallbackOnFoundationModelsError integration test

7d2af99

Fix Xcode 26.1 build

791b2ae

Merge branch 'main' into ah/ai-hybrid-session

de37236

# Conflicts: # FirebaseAI/Sources/Protocols/Public/ToolRepresentable.swift

Remove undefined Swift compiler directive from integration tests

af5ff39

Implement streaming fallback for Gemini models

1a80e10

Merge branch 'main' into ah/ai-hybrid-session

3e4bf03

Fix streaming error handling bug

6c3f84d

Remove extraneous try and awaits

255de69

Fix Xcode 26.1 build

3b74eff

Add streaming support using LanguageModelSession

4d882c4

Merge branch 'main' into ah/ai-hybrid-session

c0313f0

Wrap modelSessions in a lock and restore Sendable conformance

a5edc19

Replace fatalError() and add TODOs

2194f47

Fix unit tests

1b2a5e0

Disallow session fallbacks until history propagation is implemented

4220a24

Throw on concurrent requests and add SessionManager

5520d31

gemini-code-assist bot reviewed Apr 8, 2026

andrewheard added 3 commits April 8, 2026 18:18

Handle Task cancellation

7d88acd

Fix error message

9dbfc4d

Another Task cancellation fix

8707fe4

Formatting fix

a9f3dac

gemini-code-assist bot reviewed Apr 9, 2026

andrewheard wants to merge 28 commits into main from ah/ai-hybrid-session
