add a method to access Google Cloud voice API with credential file#2598

Open
thzjy wants to merge 2 commits intozhayujie:masterfrom
thzjy:add_method/google-cloud-voice

Conversation

thzjy (Contributor) commented May 28, 2025

No description provided.

safe1ine commented Feb 6, 2026

@MonkeyCode-AI please review this

MonkeyCode-AI is analyzing the task...

@MonkeyCode-AI MonkeyCode-AI left a comment

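I am the MonkeyCode AI coding assistant. You can summon me by mentioning @MonkeyCode-AI in a pull request on a GitHub repository.

For task execution details, see: https://monkeycode-ai.com/tasks/public?id=efa8295a-4056-437e-ba5b-9933898aba10

Code review results

The new Google Cloud STT/TTS adapter implements the basic functionality, but writing the credentials environment variable at module import time introduces security and deployment side effects, and reliability issues such as audio-encoding consistency and temporary-file cleanup need to be fixed before merging.

✨ Code highlights

  • Both STT and TTS are wrapped as methods of a Voice subclass, and results are returned to callers uniformly via Reply/ReplyType
  • AMR/MP3 input is converted to WAV before recognition, which provides basic format compatibility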
🚨 Critical: 1   ⚠️ Warning: 2   💡 Suggestion: 1

Comment on lines +14 to +15
cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json")
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path

Caution

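🚨 GOOGLE_APPLICATION_CREDENTIALS is set as soon as the module is imported, with a hard dependency on a credentials file inside the repository directory; this carries security and deployment risks and side effects

At module top level, the code builds the path to google-credentials.json and writes it into os.environ["GOOGLE_APPLICATION_CREDENTIALS"]: 1) importing the module causes a global side effect that affects other Google SDK clients/modules in the same process; 2) it depends on a credentials file existing in the code directory, which breaks easily in containers, production environments, and read-only file systems; 3) it encourages placing the key file inside the repository directory, with a high risk of credential leakage. A more reasonable approach is to use Application Default Credentials (ADC); if a key file must be specified, pass it explicitly through configuration or a parameter and create the clients with explicit credentials at initialization, rather than modifying a global environment variable.

Recommendation: remove the module-level environment variable write; have __init__ accept a key file path via environment variable or configuration and create the clients with service_account.Credentials.from_service_account_file; if none is provided, fall back to default credentials (ADC).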

Suggested change
cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json")
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path
import os
import time
import uuid
from google.cloud import speech
from google.cloud import texttospeech
from google.api_core.exceptions import GoogleAPIError
from google.oauth2 import service_account
from pydub import AudioSegment
from bridge.reply import Reply, ReplyType
from common.log import logger
from common.tmp_dir import TmpDir
from voice.voice import Voice
class GoogleVoice(Voice):
    def __init__(self, credentials_path: str | None = None):
        super().__init__()
        credentials_path = credentials_path or os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
        if credentials_path:
            credentials = service_account.Credentials.from_service_account_file(credentials_path)
            self.speech_client = speech.SpeechClient(credentials=credentials)
            self.tts_client = texttospeech.TextToSpeechClient(credentials=credentials)
        else:
            self.speech_client = speech.SpeechClient()
            self.tts_client = texttospeech.TextToSpeechClient()

)

# Run speech recognition
response = self.speech_client.recognize(config=config, audio=audio)

Warning

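⚠️ The recognition config is fixed at LINEAR16/16000, but the actual encoding of the input WAV may not match, which can cause recognition to fail or degrade

convert_audio_to_wav only sets the sample rate and channel count and exports a WAV file; it does not explicitly guarantee 16-bit PCM (LINEAR16) output. When the input is already a WAV file, it is sent to the API as LINEAR16/16000 without conversion; if the file actually uses a different sample rate or a compressed encoding, recognition will fail or lose quality.

Recommendation: convert every input to PCM16 mono 16000 Hz and keep the config consistent with the converted parameters; at minimum, run the conversion in the WAV branch as well.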
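
One way this recommendation might look in practice is sketched below. The helper names wav_matches_linear16 and normalize_to_linear16 are hypothetical and do not exist in the PR. The check uses only the standard-library wave module; the conversion assumes pydub (already imported by the PR) and, for non-WAV input, an ffmpeg install that pydub can find.

```python
import wave


def wav_matches_linear16(path, sample_rate=16000, channels=1):
    """Return True if `path` is an uncompressed 16-bit PCM WAV file
    with the expected sample rate and channel count."""
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getcomptype() == "NONE"        # uncompressed PCM
                and wav.getsampwidth() == 2        # 2 bytes = 16-bit samples
                and wav.getframerate() == sample_rate
                and wav.getnchannels() == channels
            )
    except (wave.Error, EOFError):
        # Not a WAV file, a compressed WAV, or a truncated/empty file.
        return False


def normalize_to_linear16(src_path, dst_path, sample_rate=16000):
    """Convert any pydub-readable audio file to PCM16 mono WAV.

    Hypothetical helper: pydub is imported lazily so the check above
    stays usable without it.
    """
    from pydub import AudioSegment

    audio = AudioSegment.from_file(src_path)
    audio = (
        audio.set_frame_rate(sample_rate)
        .set_channels(1)
        .set_sample_width(2)  # force 16-bit samples to match LINEAR16
    )
    audio.export(dst_path, format="wav")
    return dst_path
```

With helpers like these, the WAV branch could call wav_matches_linear16 first and run normalize_to_linear16 only when the check fails, so that the LINEAR16/16000 values in RecognitionConfig always describe the bytes actually sent.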

Comment on lines +49 to +93
try:
    file_ext = os.path.splitext(voice_file)[1].lower()
    if file_ext in [".amr", ".mp3"]:
        temp_wav_file = f"temp_audio_{uuid.uuid4().hex}.wav"
        voice_file = self.convert_audio_to_wav(voice_file, temp_wav_file)
        if not voice_file:
            logger.error("音频转换失败")
            return Reply(ReplyType.ERROR, "音频转换失败")
    elif file_ext != ".wav":
        logger.error("不支持的音频格式,仅支持 AMR、MP3 和 WAV")
        return Reply(ReplyType.ERROR, "不支持的音频格式,仅支持 AMR、MP3 和 WAV")

    with open(voice_file, "rb") as audio_file:
        audio_content = audio_file.read()

    # Configure the audio and recognition settings (Mandarin Chinese)
    audio = speech.RecognitionAudio(content=audio_content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="cmn-CN",
    )

    # Run speech recognition
    response = self.speech_client.recognize(config=config, audio=audio)

    # Extract the transcription result
    transcript = ""
    for result in response.results:
        transcript += result.alternatives[0].transcript + " "

    transcript = transcript.strip()
    if not transcript:
        logger.error("语音识别失败:无法理解音频内容")
        return Reply(ReplyType.ERROR, "抱歉,我听不懂")

    logger.info(f"[Google] voiceToText text={transcript} voice file name={voice_file}")
    reply = Reply(ReplyType.TEXT, transcript)

    # Clean up the temporary WAV file
    if file_ext in [".amr", ".mp3"] and os.path.exists(voice_file):
        os.remove(voice_file)

    return reply

Warning

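⚠️ Temporary WAV cleanup is unreliable: temporary files can leak on exception paths and early returns

The temporary WAV file is currently cleaned up only at the end of the success path, based on file_ext. If recognize raises an exception or the method returns early (for example, when transcript is empty), the temporary file is left behind. In addition, in the AMR/MP3 case the voice_file variable is overwritten with the temporary WAV path, and the cleanup logic then relies on file_ext, so other conversion scenarios are easily missed.

Recommendation: use try/finally to clean up temporary files in one place, and keep the temporary file path in a separate variable such as temp_wav_path so that the original voice_file is not overwritten.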
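
A minimal sketch of the try/finally pattern follows, written without the Google SDK so it can run on its own. The function transcribe_with_cleanup and its convert and recognize parameters are hypothetical stand-ins for convert_audio_to_wav and the speech_client.recognize call. Using tempfile.mkstemp instead of a file name relative to the working directory is an extra assumption, not something the PR does.

```python
import os
import tempfile


def transcribe_with_cleanup(voice_file, convert, recognize):
    """Run `recognize` on a WAV version of `voice_file`.

    `convert(src, dst)` writes a WAV file to `dst`; `recognize(path)`
    returns the transcript. Both are injected so the cleanup logic can
    be exercised without any cloud client.
    """
    temp_wav_path = None  # set only if this call created a temp file
    try:
        wav_path = voice_file
        if os.path.splitext(voice_file)[1].lower() != ".wav":
            fd, temp_wav_path = tempfile.mkstemp(suffix=".wav")
            os.close(fd)  # the converter reopens the path itself
            convert(voice_file, temp_wav_path)
            wav_path = temp_wav_path
        # Early returns and exceptions from here on still reach `finally`.
        return recognize(wav_path)
    finally:
        # The original voice_file is never touched; only our temp file goes.
        if temp_wav_path and os.path.exists(temp_wav_path):
            os.remove(temp_wav_path)
```

Because the temporary path lives in its own variable, the cleanup no longer depends on file_ext, and a caller-supplied WAV file is never deleted.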

@@ -0,0 +1 @@
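You need to create an authorized project in the Google Cloud console, assign IAM roles and permissions, download your own key file, name the key file google-credentials.json, and place it in this directory.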

Tip

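💡 Missing security guidance: placing the key file in the code directory risks leaking it

note.txt instructs users to place the key file in this directory under a fixed file name, which makes it easy to commit it to the repository or bake it into an image by accident, leaking the credentials.

Recommendation: add security guidance: provide credentials through an environment variable, Secret Manager, or a mounted file, and make sure the file is ignored by .gitignore; prefer ADC or workload identity.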
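
As an illustration only, the guidance could be backed by a small guard in code. The sketch below is not part of the PR: resolve_key_file and its package_dir parameter are hypothetical. It reads the key-file location from the environment, refuses a path inside the code directory, and returns None when nothing is configured so the caller can fall back to ADC.

```python
import os
from pathlib import Path


def resolve_key_file(env=None, package_dir=None):
    """Return the key-file path from the environment, or None for ADC.

    Raises ValueError if the configured path lies inside `package_dir`,
    where it could be committed or baked into an image by accident.
    """
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        return None  # nothing configured: let the client use ADC

    key_path = Path(raw).resolve()
    if package_dir is not None:
        # Path.is_relative_to requires Python 3.9 or newer.
        if key_path.is_relative_to(Path(package_dir).resolve()):
            raise ValueError(
                f"Refusing key file inside the code directory: {key_path}"
            )
    return key_path
```

A caller might pass package_dir=os.path.dirname(__file__) and hand a non-None result to service_account.Credentials.from_service_account_file, as in the suggested change earlier in this review.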

3 participants