
[AI] Add hybrid inference support in GenerativeModelSession#16043

Draft
andrewheard wants to merge 28 commits into main from ah/ai-hybrid-session

Conversation

Contributor

@andrewheard andrewheard commented Apr 4, 2026

Started adding support for hybrid (on-device and cloud) inference. Internally, this is implemented as an ordered array of fallback models: the session tries one model session and, if that fails, moves on to the next. It will be exposed publicly as "prefer cloud" or "prefer on-device", which only affects the order of the models in the array. The approach could be extended to other fallback strategies in the future if desired (e.g., Vertex AI --> Gemini Dev API, Gemini 3.1 --> Gemini 2.5) to handle cases where backends or models are resource constrained.

Note: This PR adds the internal scaffolding for hybrid support but does not expose it publicly.
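
For readers following along, here is a minimal sketch of the fallback ordering described above. It is not the code in this PR: `InferencePreference`, `FallbackModel`, `StubModel`, `orderedModels`, and `respondWithFallback` are hypothetical names, and the sketch is synchronous for brevity, whereas the real session API is presumably async.

```swift
// All names below are hypothetical stand-ins, not the PR's actual types.
enum InferencePreference {
  case preferCloud
  case preferOnDevice
}

protocol FallbackModel {
  var name: String { get }
  func respond(to prompt: String) throws -> String
}

struct UnavailableError: Error {}

/// Returns the models in the order they should be tried.
/// The preference only changes the order, not the set of models.
func orderedModels(
  cloud: any FallbackModel,
  onDevice: any FallbackModel,
  preference: InferencePreference
) -> [any FallbackModel] {
  switch preference {
  case .preferCloud: return [cloud, onDevice]
  case .preferOnDevice: return [onDevice, cloud]
  }
}

/// Tries each model in order and returns the first successful response.
/// If every model fails, throws the last error seen
/// (or `UnavailableError` when the array is empty).
func respondWithFallback(
  to prompt: String,
  using models: [any FallbackModel]
) throws -> String {
  var lastError: Error = UnavailableError()
  for model in models {
    do {
      return try model.respond(to: prompt)
    } catch {
      lastError = error
    }
  }
  throw lastError
}

/// Test double that either answers or throws, depending on `isAvailable`.
struct StubModel: FallbackModel {
  let name: String
  let isAvailable: Bool

  func respond(to prompt: String) throws -> String {
    guard isAvailable else { throw UnavailableError() }
    return "\(name): \(prompt)"
  }
}

let models = orderedModels(
  cloud: StubModel(name: "cloud", isAvailable: false),
  onDevice: StubModel(name: "on-device", isAvailable: true),
  preference: .preferCloud
)
let reply = try respondWithFallback(to: "Hello", using: models)
print(reply) // prints "on-device: Hello"
```

Because the preference is reduced to an array order, adding another strategy later (for example backend-to-backend fallback) would only need a new way to build the array; the loop that walks it stays the same.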

TODOs:

  • Add more integration tests and add unit tests
  • Add documentation
  • Add changelog entry
  • Cleanup

#no-changelog

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements a hybrid model session architecture that enables fallback capabilities between multiple language models, specifically integrating Apple's Foundation Models alongside Gemini. It introduces a SessionManager to coordinate model selection and session state, refactors GenerativeModelSession to support this multi-model approach, and adds the necessary protocol abstractions and implementations. Review feedback identifies important improvement opportunities regarding resource management, specifically recommending explicit cancellation of detached tasks in streaming responses to prevent unnecessary background activity, and suggests correcting a misleading error message in the unary response path.
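
The inline comment about task cancellation is not shown in this thread. As a general illustration of the pattern that summary refers to, the sketch below cancels the producing task when the consumer stops iterating. `makeChunkStream` and its `chunks` parameter are hypothetical; `AsyncThrowingStream` and `Continuation.onTermination` are standard library APIs.

```swift
/// Hypothetical helper: streams `chunks` from a background task and
/// cancels that task as soon as the stream terminates for any reason.
func makeChunkStream(chunks: [String]) -> AsyncThrowingStream<String, Error> {
  AsyncThrowingStream { continuation in
    let task = Task.detached {
      do {
        for chunk in chunks {
          // Stop producing once the consumer has gone away.
          try Task.checkCancellation()
          continuation.yield(chunk)
        }
        continuation.finish()
      } catch {
        continuation.finish(throwing: error)
      }
    }
    // Runs when the stream finishes or the consumer stops iterating.
    // Cancelling an already finished task is harmless.
    continuation.onTermination = { _ in task.cancel() }
  }
}

var received: [String] = []
for try await chunk in makeChunkStream(chunks: ["a", "b", "c"]) {
  received.append(chunk)
}
print(received)
```

Without the `onTermination` handler, a detached task keeps running after the caller abandons the stream, which is the unnecessary background activity the review summary mentions.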

@andrewheard
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a hybrid model session architecture, allowing for fallback mechanisms between different language models (e.g., Gemini and Apple's Foundation Models). It refactors GenerativeModelSession to use a SessionManager for handling multiple LanguageModel instances and introduces internal protocols like ModelSession and LanguageModel to abstract model-specific implementations. Feedback focuses on improving error handling for unsupported part types, reducing code duplication in response processing, and aligning property naming with Swift idioms.

Comment on lines +372 to +379
throw GenerativeModelSession.GenerationError.internalError(
  GenerativeModelSession.GenerationError.Context(
    debugDescription: """
    Prompt data type "\(data)" is not supported by Foundation Models.
    """
  ),
  underlyingError: NSError(domain: Constants.baseErrorDomain, code: 0)
)
Contributor


medium

You've noted with a TODO that a custom error type would be better here. I agree. Using a generic internalError with a generic NSError for a specific, known failure condition (unsupported part type) makes error handling less precise for consumers of the API.

I'd recommend adding a more specific case to GenerationError, for example:

enum GenerationError: Error {
  // ...
  case unsupportedPartTypeForOnDeviceModel(Context)
}

Then you could throw this more specific error, which would make it easier to handle this particular failure case programmatically.
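
A rough sketch of what that buys the caller, using a simplified stand-in for `GenerationError` rather than the SDK's real definition. `validatePart`, `describeFailure`, and the assumption that only `"text"` parts are supported on device are illustrative only.

```swift
// Simplified, hypothetical stand-in for GenerativeModelSession.GenerationError.
enum GenerationError: Error {
  struct Context {
    let debugDescription: String
  }

  case internalError(Context)
  case unsupportedPartTypeForOnDeviceModel(Context)
}

/// Throws the specific error for part types the on-device model cannot take.
/// Assumption for this sketch: only "text" parts are supported.
func validatePart(_ typeName: String) throws {
  let supported: Set<String> = ["text"]
  guard supported.contains(typeName) else {
    throw GenerationError.unsupportedPartTypeForOnDeviceModel(
      GenerationError.Context(
        debugDescription: "Prompt data type \"\(typeName)\" is not supported by Foundation Models."
      )
    )
  }
}

/// Callers can match the specific case instead of inspecting a generic NSError.
func describeFailure(ofPart typeName: String) -> String {
  do {
    try validatePart(typeName)
    return "ok"
  } catch GenerationError.unsupportedPartTypeForOnDeviceModel(let context) {
    return "fall back to cloud: \(context.debugDescription)"
  } catch {
    return "unexpected error: \(error)"
  }
}

print(describeFailure(ofPart: "text"))
print(describeFailure(ofPart: "inlineData"))
```

In a hybrid session, a dedicated case like this could also serve as the signal to skip the on-device model and move on to the next one in the fallback array.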

Comment on lines +45 to +73
// TODO: Extract common response handling code into a helper method.
let responseText: String
if schema == nil, case let .string(text) = response.rawContent.kind {
  responseText = text
} else {
  responseText = response.rawContent.jsonString
}

let generatedContent = response.rawContent.firebaseGeneratedContent
let modelContent = ModelContent(
  role: "model",
  parts: [InternalPart(.text(responseText), isThought: false, thoughtSignature: nil)]
)
let candidate = Candidate(
  content: modelContent,
  safetyRatings: [],
  finishReason: nil,
  citationMetadata: nil
)
let rawResponse = GenerateContentResponse(
  candidates: [candidate],
  modelVersion: SystemLanguageModel.modelName
)

return GenerativeModelSession.Response(
  content: generatedContent,
  rawContent: generatedContent,
  rawResponse: rawResponse
)
Contributor


low

There's significant code duplication between this block in respond and a similar block in streamResponse (lines 111-141). You've already noted this with a TODO. To improve maintainability, you could extract this logic into a private helper function.

Here's a suggestion for what that might look like:

private func processFoundationModelsResponse(
  _ fmResponse: FoundationModels.LanguageModelSession.Response<FoundationModels.GeneratedContent>,
  schema: FirebaseAI.GenerationSchema?
) -> (generatedContent: FirebaseAI.GeneratedContent, rawResponse: GenerateContentResponse) {
  let responseText: String
  if schema == nil, case let .string(text) = fmResponse.rawContent.kind {
    responseText = text
  } else {
    responseText = fmResponse.rawContent.jsonString
  }

  let generatedContent = fmResponse.rawContent.firebaseGeneratedContent
  let modelContent = ModelContent(
    role: "model",
    parts: [InternalPart(.text(responseText), isThought: false, thoughtSignature: nil)]
  )
  let candidate = Candidate(
    content: modelContent,
    safetyRatings: [],
    finishReason: nil,
    citationMetadata: nil
  )
  let rawResponse = GenerateContentResponse(
    candidates: [candidate],
    modelVersion: SystemLanguageModel.modelName
  )
  return (generatedContent, rawResponse)
}

You could then call this helper from both respond and streamResponse to build the response objects.

Comment on lines +441 to +442
private(set) var _modelSessions = [Int: any ModelSession]()
private var _activeSessionIndex: Int?
Contributor


low

The properties _modelSessions and _activeSessionIndex use an underscore prefix, which is not idiomatic in Swift for private properties. It's more common to see this in Objective-C or for backing properties for a computed property, which isn't the case here.

Consider renaming them to modelSessions and activeSessionIndex respectively to align better with Swift conventions. This would improve readability and adherence to Swift's style guide.

      private(set) var modelSessions = [Int: any ModelSession]()
      private var activeSessionIndex: Int?
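
For contrast, this is the backing-storage case mentioned above, where an underscore prefix is commonly seen in Swift. `Temperature` is a made-up type for illustration and has nothing to do with this PR.

```swift
struct Temperature {
  // Underscore-prefixed backing storage for the computed property below.
  private var _celsius: Double = 0

  // Public accessor that clamps writes to absolute zero.
  var celsius: Double {
    get { _celsius }
    set { _celsius = max(newValue, -273.15) }
  }
}

var temperature = Temperature()
temperature.celsius = -500
print(temperature.celsius) // prints "-273.15"
```

Here the underscore separates the stored value from the public accessor of the same name. `_modelSessions` and `_activeSessionIndex` have no such counterpart, so the prefix adds nothing.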
