JoeLin2333

LeetCUDA Public

Forked from xlite-dev/LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda
lectures Public

Forked from gpu-mode/lectures

Material for gpu-mode lectures

Jupyter Notebook
flash-attention Public

Forked from Dao-AILab/flash-attention

Fast and memory-efficient exact attention

Python
vllm Public

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python
FlashSparse Public

Forked from JinliangShi/FlashSparse

FlashSparse significantly reduces the computation redundancy for unstructured sparsity (for SpMM and SDDMM) on Tensor Cores through a Swap-and-Transpose mapping strategy. FlashSparse is accepted by…

Cuda
DTC-SpMM_ASPLOS24 Public

Forked from HPMLL/DTC-SpMM_ASPLOS24

C++