Infinite loop: "Checking if llama-server is done initializing" #2

@sintanial

Description

start.sh enters an infinite loop even though llama-server already exited with an error.

Logs show the model path is None, so the model fails to load and the server exits. Despite that, the script keeps printing:

Checking if llama-server is done initializing...

Env Variables

LLAMA_SERVER_CMD_ARGS=--ctx-size 4096 -ngl 999
LLAMA_CACHED_MODEL=bartowski/browser-use_bu-30b-a3b-preview-GGUF:Q5_K_M
LLAMA_CACHED_GGUF_PATH=browser-use_bu-30b-a3b-preview_q5_k_m.gguf
MAX_CONCURRENCY=8
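Judging from `-m None` in the logs, the script's cached-model lookup appears to return the literal string `None` and start.sh passes it straight to llama-server. A minimal sketch of a guard that would fail fast instead (assumed fix; `validate_model_path` is a hypothetical helper, not part of the actual start.sh):

```shell
# Hypothetical guard: succeed only for a non-empty path that is not the
# literal string "None" and actually exists on disk.
validate_model_path() {
    [ -n "$1" ] && [ "$1" != "None" ] && [ -f "$1" ]
}

# Sketch of how start.sh could use it before launching llama-server:
#   validate_model_path "$MODEL_PATH" || {
#       echo "start.sh: cached model path '$MODEL_PATH' is invalid; aborting" >&2
#       exit 1
#   }
```

With this check, the failure would surface at the "Finding cached model path" step instead of as a GGUF open error deep inside llama-server.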

Logs:

start.sh: Caching is enabled. Finding cached model path...
start.sh: Using cached model with arguments: -m None
start.sh: Stopping existing llama-server instances (if any)...
start.sh: No llama-server running
start.sh: Running /app/llama-server -m None --ctx-size 4096 -ngl 999 --port 3098
start.sh: Waiting for llama-server to start...
start.sh: Checking if llama-server is done initializing...
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX PRO 6000 Blackwell Server Edition, compute capability 12.0, VMM: yes
load_backend: loaded CUDA backend from /app/libggml-cuda.so
load_backend: loaded CPU backend from /app/libggml-cpu-zen4.so
main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
build: 8006 (4d3daf80f) with GNU 11.4.0 for Linux x86_64
system info: n_threads = 64, n_threads_batch = 64, total_threads = 128
system_info: n_threads = 64 (n_threads_batch = 64) / 128 | CUDA : ARCHS = 500,610,700,750,800,860,890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Running without SSL
init: using 127 threads for HTTP server
start: binding port with default address family
main: loading model
srv    load_model: loading model 'None'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
gguf_init_from_file: failed to open GGUF file 'None' (No such file or directory)
llama_model_load: error loading model: llama_model_loader: failed to load model from None
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
llama_params_fit: fitting params to free memory took 0.10 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA RTX PRO 6000 Blackwell Server Edition) (0000:83:00.0) - 96704 MiB free
gguf_init_from_file: failed to open GGUF file 'None' (No such file or directory)
llama_model_load: error loading model: llama_model_loader: failed to load model from None
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'None'
srv    load_model: failed to load model, 'None'
srv    operator(): operator(): cleaning up before exit...
main: exiting due to model loading error

start.sh: Checking if llama-server is done initializing...
start.sh: Checking if llama-server is done initializing...
start.sh: Checking if llama-server is done initializing...
(the same line repeats forever)
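The loop apparently only polls for successful initialization and never notices that the child process has already exited. A minimal sketch of a wait loop that cannot spin forever (an assumed fix, not the project's actual start.sh; `wait_for_server`, the PID/port arguments, and the use of llama-server's `/health` endpoint are my own framing):

```shell
# Poll until llama-server answers its /health endpoint, but give up
# immediately if the launched process has already died.
wait_for_server() {
    pid="$1"; port="$2"
    while :; do
        # kill -0 sends no signal; it only tests that the process exists
        kill -0 "$pid" 2>/dev/null || return 1
        # /health returns 200 once the model has finished loading
        curl -sf "http://127.0.0.1:${port}/health" >/dev/null && return 0
        echo "start.sh: Checking if llama-server is done initializing..."
        sleep 1
    done
}

# Usage sketch:
#   /app/llama-server -m "$MODEL_PATH" ... --port 3098 &
#   wait_for_server "$!" 3098 || { echo "llama-server exited during startup" >&2; exit 1; }
```

The `kill -0` check is what breaks the infinite loop in this report: once llama-server exits with "exiting due to model loading error", the next iteration returns failure instead of printing the checking message again.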
