
Add tts-finished state to simple_state sensor, continue_conversation switch and custom_responses_blueprint#22

Open
relust wants to merge 1 commit into jeffc:dev from relust:relusto1

Conversation


@relust relust commented Jan 23, 2025

Hello. Congratulations on what you've been able to do with this integration. To properly implement the Hassmic integration in the ViewAssist project, I made some changes to the code.

  1. Tts-finished state implementation
    One of the things needed to use this integration is knowing as precisely as possible when the TTS message actually finishes playing, because TTS_END marks when the message is sent to the player, not when it finishes playing. I took this implementation from the StreamAssist integration and looked for the best way to implement it in Hassmic. Practically, when the TTS_END event fires, the TTS URL is taken and mutagen.mp3 is used to calculate the effective duration of the message. Then I added a new tts-finished state to the SimpleState sensor (see the sketch below).
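A minimal sketch of that duration lookup, assuming the TTS URL is available when TTS_END fires (the helper name and the standalone aiohttp session are illustrative, not the exact hassmic code):

```python
import io

import aiohttp
from mutagen.mp3 import MP3


async def get_tts_duration(url: str) -> float:
    """Fetch the TTS audio and return its playback length in seconds."""
    # Sketch only: inside Home Assistant you would reuse the shared
    # client session rather than opening a new one per call.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            data = await resp.read()
    # mutagen parses the MP3 headers to get the effective duration.
    return MP3(io.BytesIO(data)).info.length
```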
  2. Correcting the publication order of the tts-speaking, tts-finished, and wake_word-listening/stt-listening statuses
    There is a problem, though: the wake_word-listening state fires before the message finishes playing, so the last published state ends up being tts-finished instead of wake_word-listening. That's why I modified the code to respect the publishing order tts-speaking, tts-finished, and at the end wake_word-listening or stt-listening, depending on how the continue conversation switch is set. Practically, I added a delay so that the wake_word-listening or stt-listening status is only published once the TTS message has finished playing (see the sketch below). This does not affect the functionality of the satellite in any way, because the simple_state sensor is made especially for the ViewAssist project. Wake word listening still starts immediately after TTS_END; the delay is only needed so the avatar keeps talking until playback of the message actually ends.
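A sketch of the delayed publication, assuming a `publish` callable that stands in for whatever updates the SimpleState sensor, plus the duration from the previous sketch:

```python
import asyncio
from typing import Callable


async def publish_final_states(
    publish: Callable[[str], None],
    tts_duration: float,
    continue_conversation: bool,
) -> None:
    """Hold back the last simple_state updates until playback really ends."""
    # TTS_END only means the audio was handed to the player, so wait out
    # the measured playback duration before publishing tts-finished.
    await asyncio.sleep(tts_duration)
    publish("tts-finished")
    # The final state depends on the continue conversation switch.
    publish("stt-listening" if continue_conversation else "wake_word-listening")
```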
  3. Continue conversation switch
    I also added a switch to enable continue conversation. Changes must also be made in the __init__.py file in the switch directory and in pipeline_manager.py, and continue_conversation.py must be added. To use continuous conversation in Hassmic, the switch chooses where the pipeline starts from: WAKE_WORD or STT (see the sketch after the snippet below). In the continue_conversation switch code you can set how many seconds to wait for the next command before switching back to wake word detection. From what I noticed, after 9 seconds it is no longer in the listening state and the pipeline resumes from the start:
```python
            # Wait up to 9 seconds for either voice activity or an error;
            # on timeout, the pipeline falls back to wake word detection.
            done, pending = await asyncio.wait(
                [vad_task, error_task],
                timeout=9,
                return_when=asyncio.FIRST_COMPLETED,
            )
```
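For the start-stage choice, a sketch using the assist pipeline's stage enum (the function and its wiring to the switch are illustrative, and the import path may vary by Home Assistant version):

```python
from homeassistant.components.assist_pipeline import PipelineStage


def choose_start_stage(continue_conversation: bool) -> PipelineStage:
    """Pick where the pipeline starts based on the switch state."""
    # With continue conversation on, skip wake word detection and go
    # straight to speech-to-text; otherwise start from the wake word.
    return PipelineStage.STT if continue_conversation else PipelineStage.WAKE_WORD
```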

Because stt-listening starts before the TTS message has finished playing, I made an automation in the continue_conversation switch code that mutes the microphone between the tts-speaking and tts-finished statuses of the simple_state sensor (see the sketch below). Because I failed to import the simple_state sensor internally in the continue_conversation switch, I chose to read it from Home Assistant instead, but I think it would work better if it were imported directly from the hassmic integration.
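A sketch of that mute logic using Home Assistant's state-tracking helper, reading the simple_state sensor through the state machine as described above (the entity IDs are examples):

```python
from homeassistant.core import Event, HomeAssistant, callback
from homeassistant.helpers.event import async_track_state_change_event

# Example entity ids; the real ones would come from configuration.
SIMPLE_STATE_ENTITY = "sensor.viewassist_office_simple_state"
MIC_SWITCH_ENTITY = "switch.viewassist_office_microphone"


def setup_mic_mute(hass: HomeAssistant) -> None:
    """Mute the mic while TTS plays so STT does not hear the answer."""

    @callback
    def _state_changed(event: Event) -> None:
        new_state = event.data["new_state"]
        if new_state is None:
            return
        if new_state.state == "tts-speaking":
            # TTS started: switch the microphone off.
            service = "turn_off"
        elif new_state.state == "tts-finished":
            # Playback really ended: switch the microphone back on.
            service = "turn_on"
        else:
            return
        hass.async_create_task(
            hass.services.async_call(
                "switch", service, {"entity_id": MIC_SWITCH_ENTITY}
            )
        )

    async_track_state_change_event(hass, [SIMPLE_STATE_ENTITY], _state_changed)
```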

  4. Conversation_id
    In pipeline_manager I set conversation_id="123456789", and it works fine with OpenAI for retaining the conversation history of the last 20 messages. Maybe we can come up with logic to generate random IDs for conversation_id but keep a specific ID in certain situations where we want to retain the conversation history (see the sketch after the snippet below). I believe that the satellite, as in our Hassmic case, should be the one that generates the conversation_id, and the Home Assistant pipeline should take the conversation_id from the satellite:
```python
                ),
                start_stage=start_stage,  # Uses dynamic logic for the starting stage
                device_id=self._device.id,
                conversation_id="123456789",  # Or get this value from a dynamic source
            )
```
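A sketch of the ID logic suggested above (the pinning parameter is hypothetical, not existing hassmic behavior):

```python
import uuid


def make_conversation_id(pinned_id: str | None = None) -> str:
    """Return a fixed conversation_id when history should be kept,
    otherwise a fresh random one for each conversation."""
    return pinned_id if pinned_id is not None else uuid.uuid4().hex
```

With a pinned ID the conversation agent keeps the prior turns; a random ID starts a clean conversation each time.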
  5. Custom responses blueprint
    Optionally, you can also use the custom_responses blueprint, which gives an answer after the wake word is detected (e.g. "yes, I am listening"), when you say "ok nabu" twice, or when the satellite is in continue_conversation mode and you don't realize it is actually waiting for a command and not the wake word:
```yaml
blueprint:
  name: Viewassist Custom Responses
  description: Automation for custom responses in Viewassist with selectable media content ID, media player, microphone, and custom wake words.
  domain: automation
  input:
    trigger_entity:
      name: Trigger Entity
      description: Simple state sensor of the satellite, e.g. sensor.viewassist_office_simple_state
      selector:
        entity:
          domain: sensor
    media_player:
      name: Media Player
      description: Select the media player to use
      selector:
        entity:
          domain: media_player
    microphone_switch:
      name: Microphone Switch
      description: Select the microphone switch to turn off and on
      selector:
        entity:
          domain: switch
    media_content_id:
      name: Media Content ID
      description: The media content ID to play (optional, with a default message)
      default: "media-source://tts/edge_tts?message={{ ['how can I help you', 'yes, i`m listening', 'how can assist you'] | random }}&language=en-US-ChristopherNeural"
      selector:
        text: {}
    custom_wake_words:
      name: Custom Wake Words
      description: Add custom wake words (e.g., 'ok nabu'). Add multiple values if needed.
      default: ["ok nabu", "ok naboo", "okay naboo"]
      selector:
        object: {}
    conversation_response:
      name: Set Conversation Response
      description: The response text for custom wake words
      default: "say please"
      selector:
        text: {}

trigger:
  - platform: state
    entity_id: !input "trigger_entity"
    to: wake_word-detected
    id: wake_word_detected
  - platform: conversation
    command: !input "custom_wake_words"
    id: custom_wake_word_detected

condition: []

action:
  - choose:
      - conditions:
          - condition: trigger
            id: wake_word_detected
        sequence:
          - service: switch.turn_off
            target:
              entity_id: !input "microphone_switch"
          - service: media_player.play_media
            target:
              entity_id: !input "media_player"
            data:
              media_content_id: !input "media_content_id"
              media_content_type: provider
              announce: true
          - wait_for_trigger:
              - platform: state
                entity_id: !input "media_player"
                to: idle
            timeout:
              hours: 0
              minutes: 0
              seconds: 1
              milliseconds: 500
          - service: switch.turn_on
            target:
              entity_id: !input "microphone_switch"
      - conditions:
          - condition: trigger
            id: custom_wake_word_detected
        sequence:
          - service: media_player.play_media
            target:
              entity_id: !input "media_player"
            data:
              media_content_id: !input "media_content_id"
              media_content_type: provider
              announce: true
          - set_conversation_response: !input "conversation_response"
            enabled: true

mode: single
```
