pflash

Here are 2 public repositories matching this topic...

Luce-Org / lucebox-hub

Fast LLM speculative inference server for consumer hardware.

spark kernel cuda cuda-kernels luce poolside rtx3090 llama-cpp local-ai qwen speculative-decoding dflash megakernel speculative-prefill pflash lucebox

Updated Jul 1, 2026
C++

Tom1tk / mtp-pflash-turboquant-hip

MTP + PFlash speculative prefill + TurboQuant KV cache compression for llama.cpp: HIP/ROCm port for AMD RDNA3 (optimised for RX 7900 XTX 24 GB / gfx1100)

hip mtp rocm llama-cpp pflash speculative-decode

Updated Jun 5, 2026
C++
