Fast LLM speculative inference server for consumer hardware.
spark kernel cuda cuda-kernels luce poolside rtx3090 llama-cpp local-ai qwen speculative-decoding dflash megakernel speculative-prefill pflash lucebox
Updated Jul 1, 2026 - C++