Merged
10 changes: 8 additions & 2 deletions Gemma/[Gemma_3]Gradio_LlamaCpp_Chatbot.ipynb
@@ -41,7 +41,7 @@
"\n",
"Author: Sitam Meur\n",
"\n",
-"* GitHub: [github.com/sitamgithub-MSIT](https://github.com/sitamgithub-MSIT/)\n",
+"* GitHub: [github.com/sitammeur](https://github.com/sitammeur/)\n",
"* X: [@sitammeur](https://x.com/sitammeur)\n",
"\n",
"Description: Google recently released Gemma 3 QAT—the [Quantization Aware Trained (QAT) Gemma 3](https://huggingface.co/collections/google/gemma-3-qat-67ee61ccacbf2be4195c265b) checkpoints. These models maintain similar quality to half precision while using three times less memory. This notebook demonstrates creating a user-friendly chat interface for the [gemma-3-1b-it-qat](https://huggingface.co/google/gemma-3-1b-it-qat-q4_0-gguf) text model using Llama.cpp (for inference) and Gradio (for user interface).\n",
@@ -128,7 +128,13 @@
},
"outputs": [],
"source": [
-"!pip install -q huggingface_hub scikit-build-core llama-cpp-python llama-cpp-agent gradio"
+"# Use the official pre-built wheels for llama-cpp-python with CPU support for compatibility\n",
+"!pip install -q \\\n",
+"  huggingface_hub \\\n",
+"  scikit-build-core \\\n",
+"  llama-cpp-python==0.3.9 --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu \\\n",
+"  \"llama-cpp-agent>=0.2.25\" \\\n",
+"  gradio==5.49.1"
Comment on lines +132 to +137
Contributor


medium

Pinning dependencies is a great practice for reproducibility. To improve clarity and security transparency, it's helpful to add a comment explaining the need for the `--extra-index-url`. This informs users that it's a trusted and necessary source for the specific package build.

# Use the official pre-built wheels for llama-cpp-python with CPU support for compatibility
!pip install -q \
  huggingface_hub \
  scikit-build-core \
  llama-cpp-python==0.3.9 --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu \
  "llama-cpp-agent>=0.2.25" \
  gradio==5.49.1

]
},
{
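The notebook description states that the QAT checkpoints keep similar quality to half precision while using roughly three times less memory. That claim can be sanity-checked with back-of-the-envelope arithmetic: fp16 stores 16 bits per weight, while Q4_0 stores 4-bit weights plus a per-block scale, which works out to roughly 4.5 bits per weight. The figures below (parameter count, effective bits per parameter) are approximations for illustration, not exact sizes of the published GGUF files.

```python
# Rough sanity check of the "three times less memory" claim for a
# Q4_0-quantized 1B-parameter model versus half precision (fp16).
# All numbers are approximations for illustration only.

def model_weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Approximate weight-storage size in bytes for a given quantization."""
    return n_params * bits_per_param / 8

n_params = 1e9  # gemma-3-1b-it: roughly 1 billion parameters (assumed)

fp16_gb = model_weight_bytes(n_params, 16) / 1e9    # 16 bits per weight
# Q4_0 packs 4-bit weights plus one fp16 scale per 32-weight block,
# i.e. about 4.5 effective bits per weight
q4_gb = model_weight_bytes(n_params, 4.5) / 1e9

print(f"fp16: ~{fp16_gb:.2f} GB, q4_0: ~{q4_gb:.2f} GB, "
      f"ratio: ~{fp16_gb / q4_gb:.1f}x")
# → fp16: ~2.00 GB, q4_0: ~0.56 GB, ratio: ~3.6x
```

The ~3.6x ratio is consistent with the "three times less memory" wording in the description; real on-disk and resident sizes will differ somewhat because of non-quantized tensors (embeddings, norms) and runtime buffers such as the KV cache.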