backend audio transcriber go

a real-time audio transcription server in Go that uses an audio ML model and leverages Protocol Buffers with gRPC.


TL;DR

It all began when I watched a coworker start a live stream on one of the biggest social media platforms;
after a few minutes, they received a violation warning notification.

As far as they knew, they hadn't violated the community rules,
since they also believed their video presentation was appropriate.

After a couple of attempts, the violation appeared again, but this time we realized that the notification appeared after they said something.

quick preview



idea flow:

  1. client/end-user sends the audio -> server checks & processes the audio -> sends a response

  2. if the processed audio contains forbidden keywords -> do something (warn, err, etc.)

this looks simple, but as you know, configuring, building, and testing the software requires a few more things along the way (a minimal sketch of the flow is shown below)
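
to make the flow above a bit more concrete, here is a minimal sketch of a streaming handler that buffers roughly one second of audio, transcribes it, and flags forbidden keywords. the names used here (AudioChunk, Reply, Transcriber, Stream, handleStream) are illustrative assumptions only, not the generated protobuf/grpc code from this repo:

```go
package sketch

import "strings"

// AudioChunk is a hypothetical stand-in for the protobuf message carrying PCM samples.
type AudioChunk struct {
	Samples []float32
}

// Reply is a hypothetical response message.
type Reply struct {
	Text      string
	Forbidden bool
}

// Transcriber abstracts whatever whisper binding the server uses.
type Transcriber interface {
	Transcribe(samples []float32) (string, error)
}

// Stream abstracts the grpc bidirectional stream (Recv/Send pair).
type Stream interface {
	Recv() (*AudioChunk, error)
	Send(*Reply) error
}

// handleStream collects ~1 second of audio, transcribes it, and checks keywords.
func handleStream(stream Stream, t Transcriber, forbidden []string, sampleRate int) error {
	buf := make([]float32, 0, sampleRate)
	for {
		chunk, err := stream.Recv()
		if err != nil {
			return err // io.EOF when the client closes the stream
		}
		buf = append(buf, chunk.Samples...)
		if len(buf) < sampleRate { // wait until ~1 second of samples is buffered
			continue
		}
		text, err := t.Transcribe(buf)
		buf = buf[:0]
		if err != nil {
			return err
		}
		reply := &Reply{Text: text}
		lower := strings.ToLower(text)
		for _, kw := range forbidden {
			if strings.Contains(lower, kw) {
				reply.Forbidden = true // warn, err, etc.
				break
			}
		}
		if err := stream.Send(reply); err != nil {
			return err
		}
	}
}
```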



prerequisites



important

  1. run and tested with:

    • go version go1.25.5 X:nodwarf5 linux/amd64
    • the English base model is provided through git-lfs, see ./assets/models/ggml-base.en.bin
  2. if config.audio.json & config.grpc.json don't exist, copy the corresponding .json.template files to .json


  3. you need to build and install the whisper library and install the model:

    • after the installation, you need to export the include path and library directory
    • you may be required to export your LD_LIBRARY_PATH if you use a custom path, e.g.:
    # when you pass `-DCMAKE_INSTALL_PREFIX=~/` while building the whisper library,
    # `cmake --install build/path` adds the bin, lib, include, and share dirs to the home directory
    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$HOME/lib"
    export C_INCLUDE_PATH="$C_INCLUDE_PATH:$HOME/include"
  4. use drun-audio_client.sh to run the client & use drun-grpc_server.sh for the grpc server

  5. use a proper model and check the forbidden keywords; the base English model from ggml is still capable of detecting specific keywords. you may also adjust this as needed, see the whisper model field (a config-loading sketch follows this list)
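
for the config files mentioned above, the shape sketched below is an assumption for illustration only; the real field names live in the .json.template files in this repo. it just shows one way the model path and forbidden keywords could be loaded:

```go
package sketch

import (
	"encoding/json"
	"os"
)

// AudioConfig is a hypothetical shape for config.audio.json; the actual
// field names in this repo may differ.
type AudioConfig struct {
	Model             string   `json:"model"` // e.g. ./assets/models/ggml-base.en.bin
	SampleRate        int      `json:"sample_rate"`
	ForbiddenKeywords []string `json:"forbidden_keywords"`
}

// loadAudioConfig reads and unmarshals the JSON config from disk.
func loadAudioConfig(path string) (*AudioConfig, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg AudioConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```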



build

  1. local system environment/your pc:
    1. build and install the whisper library
    2. you can run ./dbuild.sh to build this project

--

  1. docker/podman:
    1. a container image build script is provided, you can run:
      • ./build-image-docker.sh for docker, or
      • ./build-image-podman.sh for podman
    2. a run script for the built image is also provided, you can run:
      • ./run-container-docker.sh for docker, or
      • ./run-container-podman.sh for podman
    3. if you want to change the model, substitute the ggml-base.en.bin string in the Dockerfile and create your own custom build/deployment

when all goes well, you should see something like the log below

...
YYYY/MM/DD 14:23:43 server running on 0.0.0.0:20202
...

that log comes from docker logs -f server-backend-audio_transcriber-go or podman logs -f server-backend-audio_transcriber-go



overview

key features

  • real-time design
  • modular structure
  • informative logging
  • active buffer checking
  • keywords awareness check
  • separate goroutines for send/receive


trade-off

  1. concurrency vs thread safety:

    • parallelism uses a worker pool
    • since the whisper library is not thread-safe, access to it is serialized with a mutex (see the sketch after this list)
  2. latency vs transcription accuracy:

    • both audio_client & grpc_server collect ~1 second of audio before it is sent & processed
    • this favors throughput over latency
  3. responsiveness vs data loss:

    • the audio and request channels use limited buffers:
      • preferably drop rather than block
      • responsive under high load, but an audio burst could slow the grpc server down
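
the two patterns referenced above can be sketched roughly as follows; guardedTranscriber and enqueueOrDrop are illustrative names, not the actual identifiers used in this repo:

```go
package sketch

import "sync"

// guardedTranscriber serializes calls into the whisper context with a mutex,
// since the underlying binding is not safe for concurrent use.
type guardedTranscriber struct {
	mu  sync.Mutex
	ctx interface{ Process(samples []float32) (string, error) }
}

func (g *guardedTranscriber) Transcribe(samples []float32) (string, error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.ctx.Process(samples)
}

// enqueueOrDrop pushes a chunk into a bounded channel; when the workers fall
// behind, the chunk is dropped so the receive goroutine never blocks.
func enqueueOrDrop(audioCh chan []float32, chunk []float32) (dropped bool) {
	select {
	case audioCh <- chunk:
		return false
	default:
		return true // buffer full: prefer dropping over blocking
	}
}
```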


stress test

below is a 2-hour stress test using a 12-thread CPU and 32GB of RAM

fig1 fig2 fig3



extra

if you have any better options/approaches, I would love to read/see them - @prothegee



end of readme
