The image size is currently roughly 6 GB, which isn't good, as we aren't actually running inference on any ML model locally.
Most of the image size is due to torch and torchaudio pulling in 4-5 GB of NVIDIA packages. Both can be replaced with lightweight libraries like pydub, wave, etc.
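
As a rough sketch of what the swap could look like, assuming the code only needs basic metadata from PCM WAV files (the helper name `wav_duration_seconds` is hypothetical, and the actual torchaudio usage in this repo may need more than this), the standard-library `wave` module is enough and adds nothing to the image:

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file using only the standard library.

    Hypothetical helper: stands in for whatever torchaudio call currently
    reads the audio length. Only uncompressed WAV is supported by `wave`.
    """
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
    return frame_count / float(sample_rate)
```

For formats other than WAV, pydub would be the likelier fit, though it depends on ffmpeg being available in the image.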