Introduce fused GEMM + 1-NN primitive using cuTile#2249

Draft
divyegala wants to merge 11 commits into NVIDIA:main from divyegala:cutile-python-to-cpp

divyegala (Contributor) commented Jun 17, 2026

This PR adds infrastructure, built on top of the existing JIT LTO architecture, that generates kernels with cutile-python at build time and embeds them in the C++ library so that they are callable from C++.
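
For readers unfamiliar with the term in the PR title, here is a minimal CPU reference sketch of what a fused GEMM + 1-NN computation usually means. It assumes the common expanded squared-L2 formulation, `||x - y||^2 = ||x||^2 - 2 x·y + ||y||^2`, with the argmin folded into the same pass so the full distance matrix is never stored. The function name `fused_l2_1nn` and its signature are hypothetical and do not come from this PR, and the sketch does not use cuTile or reflect how the PR's kernel is written.

```python
from typing import List, Tuple


def fused_l2_1nn(
    queries: List[List[float]], centroids: List[List[float]]
) -> List[Tuple[int, float]]:
    """Return (index, squared L2 distance) of the nearest centroid per query.

    Hypothetical reference helper for illustration only; not the PR's API.
    """
    if not centroids:
        raise ValueError("need at least one centroid")
    # Squared norms ||y_j||^2 are computed once and reused for every query.
    centroid_norms = [sum(v * v for v in c) for c in centroids]
    results = []
    for q in queries:
        q_norm = sum(v * v for v in q)
        best_idx, best_dist = -1, float("inf")
        for j, c in enumerate(centroids):
            # GEMM term: one entry of the product X @ Y^T.
            dot = sum(a * b for a, b in zip(q, c))
            # Expanded squared distance, clamped at zero against round-off.
            dist = max(q_norm - 2.0 * dot + centroid_norms[j], 0.0)
            # Fused epilogue: keep only the running minimum, not the full row.
            if dist < best_dist:
                best_idx, best_dist = j, dist
        results.append((best_idx, best_dist))
    return results
```

Calling `fused_l2_1nn([[0, 0], [3, 4]], [[0, 1], [3, 3]])` yields one `(index, distance)` pair per query row. A GPU version would tile the dot products and reduce the minimum per tile, but the arithmetic is the same under the assumption stated above.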

copy-pr-bot (Bot) commented Jun 17, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

divyegala changed the title from "cuTile Python to CPP embedding example" to "Introduce fused GEMM + 1-NN primitive using cuTile" on Jun 24, 2026
Comment thread on cpp/tests/CMakeLists.txt (outdated)
  NAME CLUSTER_TEST
  PATH cluster/kmeans.cu cluster/kmeans_balanced.cu cluster/kmeans_find_k.cu cluster/linkage.cu
- cluster/connect_knn.cu cluster/spectral.cu
+ cluster/connect_knn.cu cluster/spectral.cu cluster/soa_unpack_trace.cu

cluster/soa_unpack_trace.cu appears to be uncommitted?

3 participants