Pull requests: pytorch/FBGEMM
Tune max segment length per cta in triton table batched embeddings, and expose the param via cli
cla signed
fb-exported
meta-exported
#5270
opened Dec 22, 2025 by
OmarPavel
Replace .data_ptr with .mutable_data_ptr or .const_data_ptr
cla signed
#5267
opened Dec 20, 2025 by
cyyever
Optimizations for index_select_scalar_cumsum_kernel on ROCm
cla signed
module: rocm
#5263
opened Dec 18, 2025 by
amd-wsung102
Refactor TBE benchmark reporter to use structured data config
cla signed
fb-exported
meta-exported
#5260
opened Dec 18, 2025 by
gchalump
Fix blackwell CUTLASS attention meta registration + actually test compile
cla signed
fb-exported
meta-exported
#5259
opened Dec 18, 2025 by
jbschlosser
Optimize benchmark index generation with std::sample()
cla signed
fb-exported
meta-exported
#5254
opened Dec 17, 2025 by
terdogan
Remove unused dedup_map and associated includes from benchmarks
cla signed
fb-exported
meta-exported
#5253
opened Dec 17, 2025 by
terdogan
Move the prefetched info to preallocated buffers
cla signed
fb-exported
meta-exported
#5251
opened Dec 17, 2025 by
chouxi
Enable direct MX4→BF16 dequantization to reduce memory (python side) (2/2)
cla signed
fb-exported
meta-exported
#5250
opened Dec 17, 2025 by
armandsauzay
Add aarch64 intrinsic-based dequantization to autovec routine
cla signed
fb-exported
meta-exported
#5249
opened Dec 17, 2025 by
Nicoshev
Choose _autovec version of GenerateEmbeddingSpMDMRowWiseSparse on AArch64
cla signed
fb-exported
meta-exported
#5247
opened Dec 17, 2025 by
MatzeB
Specialize more cases to improve EmbeddingSpMDMNBitBenchmark
cla signed
fb-exported
meta-exported
#5245
opened Dec 17, 2025 by
MatzeB
Add EmbeddingSpMDMNBitRowWiseSparse autovectorized variant
cla signed
fb-exported
meta-exported
#5244
opened Dec 17, 2025 by
MatzeB
Optimize group_index_select_or_add_2d_kernel on ROCm by adding a separate codepath for small embedding dimensions
cla signed
module: rocm
#5233
opened Dec 16, 2025 by
aryaman-gupta
support object cache in ssd l2 cache and add more unit tests
cla signed
fb-exported
meta-exported
#5228
opened Dec 16, 2025 by
zhaojuanmao
Optimizing 4-bit dequant to FP32 on AArch64 using vectorized intrinsics in EmbeddingSpMDMAutovec
cla signed
#5224
opened Dec 15, 2025 by
marma01
Update heuristic to support variant batch sizes
cla signed
fb-exported
meta-exported
#5211
opened Dec 10, 2025 by
zjing14
Use H100 runners for OSS CI
cla signed
fb-exported
meta-exported
#5205
opened Dec 9, 2025 by
q10
Modifying clear_all_staged_data to accommodate KV Tensor Deletion
cla signed
fb-exported
meta-exported
#5202
opened Dec 9, 2025 by
Raahul46
creating delete_rocksdb_checkpoint_dir function under KV Tensor
cla signed
fb-exported
meta-exported
#5201
opened Dec 9, 2025 by
Raahul46
Adding returnKVTensorMetaData flag to Staging Read Strategy
cla signed
fb-exported
meta-exported
#5200
opened Dec 9, 2025 by
Raahul46
Fix jagged_to_padded_dense autograd
cla signed
fb-exported
meta-exported
#5191
opened Dec 8, 2025 by
yunjiangster