A real-time audio transcriber server in Go, using an audio ML model and leveraging Protocol Buffers with gRPC.
TL;DR
It all began when I watched a coworker start a live stream on one of the biggest social media platforms;
after a few minutes, they got a violation warning notification.
As far as they knew, they had not violated the community rules,
since they also believed their video presentation was appropriate.
After a couple of attempts, the violation appeared again, but this time we realized that the violation notification appeared right after they said something.
Idea flow:
- client/end-user sends the audio -> server checks & processes the audio -> sends a response
- if the processed audio contains forbidden keywords -> do something (warn, error, etc.)
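The check step above can be sketched as a simple keyword scan over the transcribed text. This is a minimal illustration; the helper function and the keyword list are hypothetical, not this project's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// containsForbidden reports whether the transcript contains any of the
// forbidden keywords (case-insensitive), returning the first match.
// Hypothetical helper for illustration only.
func containsForbidden(transcript string, forbidden []string) (string, bool) {
	lower := strings.ToLower(transcript)
	for _, kw := range forbidden {
		if strings.Contains(lower, strings.ToLower(kw)) {
			return kw, true
		}
	}
	return "", false
}

func main() {
	forbidden := []string{"badword", "anotherbadword"}
	if kw, ok := containsForbidden("This sentence has a BadWord in it", forbidden); ok {
		fmt.Printf("warning: forbidden keyword detected: %q\n", kw)
	}
}
```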
This looks simple, but configuring, building, and testing the software requires some additional tooling:
- Go core compiler tools
- protoc
- protoc-gen-go:
  `go install google.golang.org/protobuf/cmd/protoc-gen-go@latest`
- protoc-gen-go-grpc:
  `go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest`
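With those plugins installed, the gRPC service is defined in a `.proto` file. As a rough sketch of what such a definition might look like for this use case (the message, field, and service names here are illustrative assumptions, not necessarily this repository's actual `.proto`):

```proto
syntax = "proto3";

package transcriber;

option go_package = "example/transcriber;transcriber";

// AudioChunk carries raw audio samples from the client.
message AudioChunk {
  bytes samples = 1;      // PCM audio data
  int32 sample_rate = 2;  // e.g. 16000
}

// Transcript is the server's response for a processed chunk.
message Transcript {
  string text = 1;
  bool contains_forbidden_keyword = 2;
}

// Transcriber streams audio in and transcripts out.
service Transcriber {
  rpc StreamAudio(stream AudioChunk) returns (stream Transcript);
}
```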
Run and tested on:
- go version go1.25.5 X:nodwarf5 linux/amd64
- the base English model is provided through git-lfs, see `./assets/models/ggml-base.en.bin`
- If `config.audio.json` & `config.grpc.json` don't exist, copy those files from `.json.template` to `.json`.
- You need to build and expose the whisper library and install the model:
  - after the installation, export the include path and library directory
  - you may need to export your `LD_LIBRARY_PATH` if you use a custom path, e.g.:
```sh
# when you define `-DCMAKE_INSTALL_PREFIX=~/` when building the whisper library,
# `cmake --install build/path` will add bin, lib, include, share dirs to the home directory
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$HOME/lib"
export C_INCLUDE_PATH="$C_INCLUDE_PATH:$HOME/include"
```
- Use `drun-audio_client.sh` to run the client & `drun-grpc_server.sh` for the gRPC server.
- Use a proper model and check the forbidden keywords; the base English model from ggml is still capable of detecting specific keywords. You may also adjust this as you need, see the whisper model field.
- Local system environment (your PC):
  - build and expose/install the whisper library
  - you can run `./dbuild.sh` to build this project
- Docker/Podman:
  - a container image build script is provided; run `./build-image-docker.sh` for Docker, or `./build-image-podman.sh` for Podman
  - a script to run the built image in a container is provided; run `./run-container-docker.sh` for Docker, or `./run-container-podman.sh` for Podman
  - if you want to change the model, you can substitute the `ggml-base.en.bin` string in the `Dockerfile` and create your own custom build/deployment
When all goes well, you should see something like the following in the log:

```
...
YYYY/MM/DD 14:23:43 server running on 0.0.0.0:20202
...
```

That log comes from `docker logs -f server-backend-audio_transcriber-go` or `podman logs -f server-backend-audio_transcriber-go`.
Design highlights:
- real-time design
- modular structure
- informative logging
- active buffer checking
- keyword-awareness check
- separate goroutines for send/receive
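The send/receive split can be illustrated with a generic duplex pattern. This is a simplified sketch using plain channels in place of the actual gRPC stream, so a slow receive never blocks a send (and vice versa):

```go
package main

import (
	"fmt"
	"sync"
)

// runDuplex drains outgoing chunks and incoming transcripts concurrently,
// one goroutine per direction, and returns how many of each were handled.
func runDuplex(outgoing <-chan []byte, incoming <-chan string) (sent, received int) {
	var wg sync.WaitGroup
	wg.Add(2)

	// sender goroutine: pushes audio chunks toward the server
	go func() {
		defer wg.Done()
		for chunk := range outgoing {
			fmt.Printf("sent %d bytes\n", len(chunk))
			sent++
		}
	}()

	// receiver goroutine: handles transcripts from the server
	go func() {
		defer wg.Done()
		for text := range incoming {
			fmt.Printf("received transcript: %q\n", text)
			received++
		}
	}()

	wg.Wait() // both directions finished
	return sent, received
}

func main() {
	outgoing := make(chan []byte, 2)
	incoming := make(chan string, 1)
	outgoing <- []byte{0x01, 0x02}
	outgoing <- []byte{0x03}
	incoming <- "hello world"
	close(outgoing)
	close(incoming)
	sent, received := runDuplex(outgoing, incoming)
	fmt.Printf("done: sent=%d received=%d\n", sent, received)
}
```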
Concurrency vs thread safety:
- parallelism uses a worker pool
- since the whisper library is not thread-safe, access to it is serialized with a mutex
Latency vs transcription accuracy:
- both audio_client & grpc_server collect ~1 second of audio before it is sent & processed
- this favors throughput over latency
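The ~1 second batching can be sketched as accumulating samples until one second's worth has arrived. The sample rate and chunk sizes here are assumptions for illustration, not the project's actual values:

```go
package main

import "fmt"

const sampleRate = 16000 // assumed 16 kHz mono audio

// batchOneSecond groups incoming sample chunks into ~1-second batches,
// flushing any trailing partial batch at the end.
func batchOneSecond(chunks [][]float32) [][]float32 {
	var batches [][]float32
	var current []float32
	for _, c := range chunks {
		current = append(current, c...)
		if len(current) >= sampleRate { // ~1 second collected
			batches = append(batches, current)
			current = nil
		}
	}
	if len(current) > 0 {
		batches = append(batches, current) // flush the remainder
	}
	return batches
}

func main() {
	// 35 chunks of 1000 samples each = 35000 samples total
	chunks := make([][]float32, 35)
	for i := range chunks {
		chunks[i] = make([]float32, 1000)
	}
	fmt.Println("batches:", len(batchOneSecond(chunks)))
}
```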
Responsiveness vs data loss:
- the audio channel and request queue use a limited buffer:
  - preferably drop rather than block
  - responsive under high load, but audio bursts could slow down the gRPC server
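The drop-rather-than-block behaviour maps directly onto Go's non-blocking channel send. A generic sketch, not the project's exact code:

```go
package main

import "fmt"

// tryEnqueue attempts a non-blocking send; when the buffer is full it
// drops the chunk instead of stalling the producer.
func tryEnqueue(ch chan []byte, chunk []byte) bool {
	select {
	case ch <- chunk:
		return true
	default:
		return false // buffer full: drop to stay responsive
	}
}

func main() {
	audio := make(chan []byte, 2) // limited buffer
	dropped := 0
	for i := 0; i < 5; i++ {
		if !tryEnqueue(audio, make([]byte, 320)) {
			dropped++
		}
	}
	fmt.Println("dropped:", dropped) // buffer of 2, 5 sends -> 3 dropped
}
```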
Below is a 2-hour stress test using a 12-thread CPU and 32 GB of RAM.
If you have any better options/approaches, I would love to read/see them - @prothegee



