Pull requests: ggml-org/llama.cpp
Pull requests list
llama: default to 4096 context size for cpu-only build
#19711
opened Feb 18, 2026 by
taronaeo
powerpc: add FP16 MMA path for Q4/Q8 matmul
ggml
changes relating to the ggml tensor library for machine learning
#19709
opened Feb 18, 2026 by
shalinib-ibm
Q5_K - Block Interleaving Implementation for x86 SIMD (AVX512/AVX2)
ggml
changes relating to the ggml tensor library for machine learning
#19707
opened Feb 18, 2026 by
Manogna-Sree
Q6_K - Block Interleaving Implementation for x86 SIMD (AVX512/AVX2)
ggml
changes relating to the ggml tensor library for machine learning
#19706
opened Feb 18, 2026 by
Manogna-Sree
common : fix gpt-oss Jinja error with content and thinking on tool-call messages
#19704
opened Feb 18, 2026 by
abhijitb11
ggml-webgpu: Add unary op (SQR, SQRT, SIN, COS) support.
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
#19700
opened Feb 18, 2026 by
yomaytk
New option GGML_CUDA_FORCE_CUBLAS_COMPUTE_32F to use fp32 as compute type in cuBLAS
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#19697
opened Feb 17, 2026 by
wallentri88
test(server): add multi-image and no-image vision API tests
examples
python
python script changes
server
#19691
opened Feb 17, 2026 by
jorgeutd
3 tasks done
model : Add tokenizer from LFM2.5-Audio-1.5B
model
Model specific
python
python script changes
#19687
opened Feb 17, 2026 by
tdakhran
CUDA: fix kernel selection logic for tile FA
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#19686
opened Feb 17, 2026 by
JohannesGaessler
Add Pylint workflow for Python code analysis
devops
improvements to build systems and github actions
#19671
opened Feb 16, 2026 by
kerrrang9214-tech
Draft
Add Kimi Linear to unified delta net
model
Model specific
#19668
opened Feb 16, 2026 by
ymcki
llama : use output_resolve_row() in get_logits_ith/get_embeddings_ith
#19663
opened Feb 16, 2026 by
danbev
models : dedup qwen35 graphs
model
Model specific
#19660
opened Feb 16, 2026 by
ggerganov
1 of 2 tasks
avx2: compute ksigns instead of loading from table
ggml
changes relating to the ggml tensor library for machine learning
#19657
opened Feb 16, 2026 by
dfriehs
common : fix Step-3.5-Flash format detection and thinking support
testing
Everything test related
#19635
opened Feb 15, 2026 by
jesseposner
Vulkan Scalar Flash Attention Refactor
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#19625
opened Feb 14, 2026 by
0cc4m
fix: GLM 4.5 streaming tool-call parsing + grammar error handling
examples
server
testing
Everything test related
#19612
opened Feb 14, 2026 by
Gunther-Schulz
llama : add group feature to split-mode to minimize GPU spread for running a model
examples
server
#19608
opened Feb 13, 2026 by
dan-and
metal: use mul_mv_ext for large n on non-simdgroup_mm GPUs
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
ggml
changes relating to the ggml tensor library for machine learning
#19600
opened Feb 13, 2026 by
ai-janitor
3 of 4 tasks
mtmd : chat : Fix extra \n between text and media marker
examples
server
#19595
opened Feb 13, 2026 by
tdakhran