XMA (Accelerated Model Architectures)

XMA is a repository of fast kernels for model training. We plan to add many experimental and fun model architectures, with support for multiple accelerators: NVIDIA and AMD GPUs, Google TPUs, and Amazon Trainium.

Installation

git clone https://github.com/open-lm-engine/accelerated-model-architectures
cd accelerated-model-architectures
pip install .
cd ..
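If you plan to modify the kernels, an editable install (standard pip behaviour, not specific to this repo) keeps the checkout importable without reinstalling after every change:

pip install -e .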

Layers

| Layer | CUDA | Pallas | NKI | ROCm | Triton |
|-------|------|--------|-----|------|--------|
| GRU   |      |        |     |      |        |
| MoE   |      |        |     |      |        |
| RNN   |      |        |     |      |        |
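For reference, the GRU and RNN layers presumably implement the standard recurrences; the sketch below shows the conventional GRU cell in plain PyTorch. The function name and signature are illustrative only, not the XMA API, and the actual kernels fuse these operations.

import torch
import torch.nn.functional as F

def gru_cell_reference(x_t, h_prev, w_ih, w_hh, b_ih, b_hh):
    """Standard GRU cell (reference semantics only, not the XMA kernel API).

    x_t:    (batch, input_size)   current input
    h_prev: (batch, hidden_size)  previous hidden state
    w_ih:   (3 * hidden_size, input_size)  input-to-hidden weights
    w_hh:   (3 * hidden_size, hidden_size) hidden-to-hidden weights
    """
    gi = F.linear(x_t, w_ih, b_ih)      # input projections for r, z, n gates
    gh = F.linear(h_prev, w_hh, b_hh)   # hidden projections for r, z, n gates
    i_r, i_z, i_n = gi.chunk(3, dim=-1)
    h_r, h_z, h_n = gh.chunk(3, dim=-1)

    r = torch.sigmoid(i_r + h_r)        # reset gate
    z = torch.sigmoid(i_z + h_z)        # update gate
    n = torch.tanh(i_n + r * h_n)       # candidate hidden state
    return (1 - z) * n + z * h_prev     # new hidden state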

Functional

| Function                   | CUDA | Pallas | NKI | ROCm | Triton |
|----------------------------|------|--------|-----|------|--------|
| bmm                        |      |        |     |      |        |
| continuous_count           |      |        |     |      |        |
| cross_entropy              |      |        |     |      |        |
| fused_linear_cross_entropy |      |        |     |      |        |
| fused_residual_add_rmsnorm |      |        |     |      |        |
| rmsnorm                    |      |        |     |      |        |
| pack_sequence              |      |        |     |      |        |
| softmax                    |      |        |     |      |        |
| swiglu                     |      |        |     |      |        |
| swiglu_packed              |      |        |     |      |        |
| unpack_sequence            |      |        |     |      |        |
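To make the semantics of a few of the functional kernels concrete, here is a minimal unfused PyTorch sketch of rmsnorm, fused_residual_add_rmsnorm, and swiglu. The names, signatures, and return values below are assumptions for illustration (they follow the common definitions of these operations, not necessarily XMA's API); the actual kernels fuse these steps to avoid extra memory traffic.

import torch
import torch.nn.functional as F

def rmsnorm_reference(x, weight, eps=1e-6):
    # RMSNorm: scale features by the reciprocal root-mean-square, then by a
    # learned per-feature weight.
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * rms * weight

def fused_residual_add_rmsnorm_reference(x, residual, weight, eps=1e-6):
    # Residual add followed by RMSNorm. Returning the summed tensor as well is
    # an assumption; a fused kernel typically keeps it as the next residual.
    added = x + residual
    return rmsnorm_reference(added, weight, eps), added

def swiglu_reference(gate, up):
    # SwiGLU activation: SiLU(gate) elementwise-multiplied with the up projection.
    return F.silu(gate) * up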

Community

Join the Discord server if you are interested in LLM architecture or distributed training/inference research.