XMA (Accelerated Model Architectures)
XMA is a repository of fast kernels for model training. We plan to add many experimental and fun model architectures, with support for multiple accelerators such as NVIDIA and AMD GPUs, Google TPUs, and Amazon Trainium chips.
Installation
git clone https://github.com/open-lm-engine/accelerated-model-architectures
cd accelerated-model-architectures
pip install .
cd ..
Layers
| Layer | CUDA | Pallas | NKI | ROCm | Triton |
|---|---|---|---|---|---|
| GRU | ❌ | ❌ | ❌ | ❌ | ✅ |
| MoE | ✅ | ❌ | ❌ | ❌ | ✅ |
| RNN | ❌ | ❌ | ❌ | ❌ | ✅ |
Functional
| Function | CUDA | Pallas | NKI | ROCm | Triton |
|---|---|---|---|---|---|
| bmm | ❌ | ❌ | ❌ | ❌ | ✅ |
| continuous_count | ✅ | ❌ | ❌ | ❌ | ❌ |
| cross_entropy | ❌ | ❌ | ❌ | ❌ | ✅ |
| fused_linear_cross_entropy | ❌ | ❌ | ❌ | ❌ | ✅ |
| fused_residual_add_rmsnorm | ❌ | ❌ | ❌ | ❌ | ✅ |
| rmsnorm | ❌ | ❌ | ❌ | ❌ | ✅ |
| pack_sequence | ✅ | ❌ | ❌ | ❌ | ✅ |
| softmax | ❌ | ❌ | ❌ | ❌ | ✅ |
| swiglu | ✅ | ✅ | ✅ | ❌ | ✅ |
| swiglu_packed | ✅ | ✅ | ✅ | ❌ | ✅ |
| unpack_sequence | ✅ | ❌ | ❌ | ❌ | ✅ |
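
When deciding which backend to target, it can help to query the functional support matrix programmatically. The sketch below transcribes the table above into a plain Python dictionary. The names `BACKENDS`, `SUPPORT`, `backends_for`, and `functions_for` are hypothetical helpers written for illustration and are not part of XMA's API. The data is a snapshot of this table and may drift as kernels are added.

```python
# Backend columns, in the same order as the table above.
BACKENDS = ("CUDA", "Pallas", "NKI", "ROCm", "Triton")

# Hypothetical lookup table (not part of XMA): one row per function,
# transcribed by hand from the support matrix above.
SUPPORT = {
    "bmm": (False, False, False, False, True),
    "continuous_count": (True, False, False, False, False),
    "cross_entropy": (False, False, False, False, True),
    "fused_linear_cross_entropy": (False, False, False, False, True),
    "fused_residual_add_rmsnorm": (False, False, False, False, True),
    "rmsnorm": (False, False, False, False, True),
    "pack_sequence": (True, False, False, False, True),
    "softmax": (False, False, False, False, True),
    "swiglu": (True, True, True, False, True),
    "swiglu_packed": (True, True, True, False, True),
    "unpack_sequence": (True, False, False, False, True),
}


def backends_for(function_name):
    """Return the backends that the table marks as supported for a function."""
    flags = SUPPORT[function_name]
    return [backend for backend, ok in zip(BACKENDS, flags) if ok]


def functions_for(backend):
    """Return the functions that the table marks as supported on a backend."""
    column = BACKENDS.index(backend)
    return [name for name, flags in SUPPORT.items() if flags[column]]


if __name__ == "__main__":
    print(backends_for("swiglu"))
    print(functions_for("Pallas"))
```

For example, `backends_for("continuous_count")` shows that this function is currently CUDA-only, while `functions_for("ROCm")` returns an empty list because no function in the table has a ROCm kernel yet.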
Community
Join the Discord server if you are interested in LLM architecture or distributed training/inference research.