add a method to access Google Cloud voice API with credential file#2598
thzjy wants to merge 2 commits into zhayujie:master
@MonkeyCode-AI please review this.
MonkeyCode-AI is analyzing the task...
MonkeyCode-AI
left a comment
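I am the MonkeyCode AI coding assistant. You can summon me by mentioning @MonkeyCode-AI in a PR in your GitHub repository.
For task execution details, see: https://monkeycode-ai.com/tasks/public?id=efa8295a-4056-437e-ba5b-9933898aba10
Code review results
The new Google Cloud STT/TTS adapter implements the basic functionality, but writing the credentials environment variable at module import introduces security and deployment side effects, and reliability issues such as audio-encoding consistency and temporary-file cleanup need to be fixed before merging.
✨ Code highlights
- STT and TTS are both wrapped as methods of a Voice subclass, and callers get a uniform return value through Reply/ReplyType
- AMR/MP3 input is converted to WAV before recognition, which shows a basic approach to format compatibility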
| 🚨 Critical | ⚠️ Warning | 💡 Suggestion |
|---|---|---|
| 1 | 2 | 1 |
| cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json") | ||
| os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path |
Caution
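🚨 Setting GOOGLE_APPLICATION_CREDENTIALS at module import, with a hard dependency on a credentials file inside the repository directory, creates security/deployment risks and side effects
At module top level the code builds the path to google-credentials.json and writes it to os.environ["GOOGLE_APPLICATION_CREDENTIALS"]. This has three problems: 1) importing the module produces a global side effect that affects other Google SDK clients and modules in the same process; 2) it hard-depends on a credentials file existing in the code directory, which easily breaks in containers, production environments, and read-only file systems; 3) it encourages placing the key file inside the repository directory, which carries a high risk of credential leakage. A more reasonable approach is to use Application Default Credentials (ADC). If a key file must be specified, pass it explicitly through configuration or a parameter and create the clients with explicit credentials during initialization, rather than changing a global environment variable.
Suggestion: Remove the module-level environment variable write. Let __init__ accept the key file path via an environment variable or configuration, and create the clients with service_account.Credentials.from_service_account_file; if no path is provided, fall back to default credentials (ADC).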
| cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json") | |
| os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path | |
| import os | |
| import time | |
| import uuid | |
| from google.cloud import speech | |
| from google.cloud import texttospeech | |
| from google.api_core.exceptions import GoogleAPIError | |
| from google.oauth2 import service_account | |
| from pydub import AudioSegment | |
| from bridge.reply import Reply, ReplyType | |
| from common.log import logger | |
| from common.tmp_dir import TmpDir | |
| from voice.voice import Voice | |
| class GoogleVoice(Voice): | |
|     def __init__(self, credentials_path: str | None = None): | |
|         super().__init__() | |
|         credentials_path = credentials_path or os.getenv("GOOGLE_APPLICATION_CREDENTIALS") | |
|         if credentials_path: | |
|             credentials = service_account.Credentials.from_service_account_file(credentials_path) | |
|             self.speech_client = speech.SpeechClient(credentials=credentials) | |
|             self.tts_client = texttospeech.TextToSpeechClient(credentials=credentials) | |
|         else: | |
|             self.speech_client = speech.SpeechClient() | |
|             self.tts_client = texttospeech.TextToSpeechClient() | |
| ) | ||
| # Run speech recognition | ||
| response = self.speech_client.recognize(config=config, audio=audio) |
Warning
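convert_audio_to_wav only sets the sample rate and channel count before exporting WAV; it does not explicitly guarantee 16-bit PCM (LINEAR16) output. When the input is already a WAV file, it is sent to the API as LINEAR16/16000 without any conversion, so a file with a different sample rate or a compressed encoding will cause recognition errors or degraded quality.
Suggestion: Convert every input to PCM16 mono 16000 Hz and keep the config consistent with the converted parameters; at minimum, run the conversion in the WAV branch as well.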
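As a rough illustration of the consistency check described above, the following stdlib-only sketch inspects a WAV header before the file is sent with LINEAR16/16000. The function name `is_linear16_wav` and its defaults are hypothetical and not part of the PR; the actual conversion would still be done by the PR's convert_audio_to_wav.

```python
import wave


def is_linear16_wav(path: str, sample_rate_hertz: int = 16000, channels: int = 1) -> bool:
    """Return True if `path` is uncompressed 16-bit PCM WAV with the expected rate and channels.

    Hypothetical helper: a False result means the file should be converted
    before being sent with encoding=LINEAR16 and the given sample rate.
    """
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getcomptype() == "NONE"          # uncompressed PCM
                and wav.getsampwidth() == 2          # 2 bytes per sample = 16-bit
                and wav.getframerate() == sample_rate_hertz
                and wav.getnchannels() == channels
            )
    except (wave.Error, EOFError):
        # Not a WAV file the wave module can parse (wrong container, compressed, truncated).
        return False
```

A caller could run this in the WAV branch and fall through to the conversion step whenever it returns False, so that the RecognitionConfig always matches what is actually uploaded.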
| try: | ||
|     file_ext = os.path.splitext(voice_file)[1].lower() | ||
|     if file_ext in [".amr", ".mp3"]: | ||
|         temp_wav_file = f"temp_audio_{uuid.uuid4().hex}.wav" | ||
|         voice_file = self.convert_audio_to_wav(voice_file, temp_wav_file) | ||
|         if not voice_file: | ||
|             logger.error("音频转换失败") | ||
|             return Reply(ReplyType.ERROR, "音频转换失败") | ||
|     elif file_ext != ".wav": | ||
|         logger.error("不支持的音频格式,仅支持 AMR、MP3 和 WAV") | ||
|         return Reply(ReplyType.ERROR, "不支持的音频格式,仅支持 AMR、MP3 和 WAV") | ||
|     with open(voice_file, "rb") as audio_file: | ||
|         audio_content = audio_file.read() | ||
|     # Configure the audio and recognition settings (Mandarin Chinese) | ||
|     audio = speech.RecognitionAudio(content=audio_content) | ||
|     config = speech.RecognitionConfig( | ||
|         encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16, | ||
|         sample_rate_hertz=16000, | ||
|         language_code="cmn-CN", | ||
|     ) | ||
|     # Run speech recognition | ||
|     response = self.speech_client.recognize(config=config, audio=audio) | ||
|     # Extract the transcription result | ||
|     transcript = "" | ||
|     for result in response.results: | ||
|         transcript += result.alternatives[0].transcript + " " | ||
|     transcript = transcript.strip() | ||
|     if not transcript: | ||
|         logger.error("语音识别失败:无法理解音频内容") | ||
|         return Reply(ReplyType.ERROR, "抱歉,我听不懂") | ||
|     logger.info(f"[Google] voiceToText text={transcript} voice file name={voice_file}") | ||
|     reply = Reply(ReplyType.TEXT, transcript) | ||
|     # Clean up the temporary WAV file | ||
|     if file_ext in [".amr", ".mp3"] and os.path.exists(voice_file): | ||
|         os.remove(voice_file) | ||
|     return reply | ||
Warning
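Currently the temporary WAV is cleaned up only at the end of the success path, based on file_ext. If recognize raises an exception or the function returns early (for example when transcript is empty), the temporary file is left behind. In addition, voice_file is overwritten with the temporary WAV path in the AMR/MP3 case while the cleanup logic still relies on file_ext, which makes it easy to miss other conversion scenarios.
Suggestion: Use try/finally to clean up temporary files in one place, and store the temporary file path in a separate variable, temp_wav_path, so that the original voice_file is not overwritten.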
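The try/finally pattern suggested here might look roughly like the sketch below. It is not the PR's code: `recognize_with_cleanup` is a hypothetical name, and the `convert` and `recognize` callables are assumed stand-ins for the PR's convert_audio_to_wav and the Speech API call, injected so the cleanup logic can be shown on its own.

```python
import os
import tempfile
import uuid
from typing import Callable, Optional


def recognize_with_cleanup(
    voice_file: str,
    convert: Callable[[str, str], None],
    recognize: Callable[[str], str],
    tmp_dir: Optional[str] = None,
) -> str:
    """Convert `voice_file` to a temporary WAV, recognize it, and always remove the temp file.

    `convert(src, dst)` and `recognize(wav_path)` are hypothetical callables
    standing in for the audio conversion and the API request.
    """
    tmp_dir = tmp_dir or tempfile.gettempdir()
    # Keep the temp path in its own variable so voice_file is never overwritten.
    temp_wav_path = os.path.join(tmp_dir, f"temp_audio_{uuid.uuid4().hex}.wav")
    try:
        convert(voice_file, temp_wav_path)
        return recognize(temp_wav_path)
    finally:
        # Runs on success, on early return, and when an exception propagates.
        if os.path.exists(temp_wav_path):
            os.remove(temp_wav_path)
```

Because the cleanup no longer depends on file_ext, the same wrapper also covers the case where WAV input is converted for consistency.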
| @@ -0,0 +1 @@ | |||
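| You need to create an authorized project in the Google Cloud console, assign IAM roles and permissions, download your own key file, name the key file google-credentials.json, and place it in this directory. | |||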
Tip
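💡 Missing security guidance: placing the key file in the code directory risks leaking it
note.txt instructs users to put the key file in this directory under a fixed file name, which makes it easy to commit the file to the repository or bake it into an image by mistake, leaking the credentials.
Suggestion: Add security guidance: provide credentials through an environment variable, Secret Manager, or a mounted volume, and make sure the file is ignored by .gitignore; using ADC or workload identity is recommended.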
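One way to express this guidance in code is a small resolver that takes the key file path from configuration or the environment and returns None when the caller should rely on ADC. This is only a sketch: `resolve_credentials_path` and its `source_dir` guard are hypothetical and do not exist in the PR or in any Google library.

```python
import os
from typing import Mapping, Optional


def resolve_credentials_path(
    configured_path: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
    source_dir: Optional[str] = None,
) -> Optional[str]:
    """Return an absolute key-file path, or None to signal "use ADC".

    Hypothetical helper. An explicit configured path takes priority over the
    GOOGLE_APPLICATION_CREDENTIALS entry in `environ`. If `source_dir` is given,
    key files located inside it are rejected to discourage keeping secrets
    next to the code.
    """
    path = configured_path or environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return None  # caller creates clients without explicit credentials (ADC)
    path = os.path.abspath(path)
    if source_dir is not None:
        root = os.path.abspath(source_dir)
        if path.startswith(root + os.sep):
            raise ValueError(f"credentials file must not live inside {root}")
    return path
```

The returned path can then be passed to the constructor's credentials_path parameter from the suggested change, while the key file itself is supplied by a mounted secret or a path outside the repository.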
No description provided.