XMA (Accelerated Model Architectures)

XMA is a repository of fast kernels for model training. We plan to add many experimental and fun model architectures, with support for multiple accelerators: NVIDIA and AMD GPUs, Google TPUs, and Amazon Trainium.

Installation

git clone https://github.com/open-lm-engine/accelerated-model-architectures
cd accelerated-model-architectures
pip install .
cd ..
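If you plan to modify the kernels, an editable install (standard pip behaviour, not specific to this repo) keeps the checkout importable without reinstalling after every change:

pip install -e .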

Layers

| Layer | CUDA | Pallas | NKI | ROCm | Triton |
|-------|------|--------|-----|------|--------|
| GRU   |      |        |     |      |        |
| MoE   |      |        |     |      |        |
| RNN   |      |        |     |      |        |
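For reference, the GRU and RNN layers presumably implement the standard recurrences; the sketch below shows the conventional GRU cell in plain PyTorch. The function name and signature are illustrative only, not the XMA API, and the actual kernels fuse these operations.

import torch
import torch.nn.functional as F

def gru_cell_reference(x_t, h_prev, w_ih, w_hh, b_ih, b_hh):
    """Standard GRU cell (reference semantics only, not the XMA kernel API).

    x_t:    (batch, input_size)   current input
    h_prev: (batch, hidden_size)  previous hidden state
    w_ih:   (3 * hidden_size, input_size)  input-to-hidden weights
    w_hh:   (3 * hidden_size, hidden_size) hidden-to-hidden weights
    """
    gi = F.linear(x_t, w_ih, b_ih)      # input projections for r, z, n gates
    gh = F.linear(h_prev, w_hh, b_hh)   # hidden projections for r, z, n gates
    i_r, i_z, i_n = gi.chunk(3, dim=-1)
    h_r, h_z, h_n = gh.chunk(3, dim=-1)

    r = torch.sigmoid(i_r + h_r)        # reset gate
    z = torch.sigmoid(i_z + h_z)        # update gate
    n = torch.tanh(i_n + r * h_n)       # candidate hidden state
    return (1 - z) * n + z * h_prev     # new hidden state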

Functional

| Function                   | CUDA | Pallas | NKI | ROCm | Triton |
|----------------------------|------|--------|-----|------|--------|
| bmm                        |      |        |     |      |        |
| continuous_count           |      |        |     |      |        |
| cross_entropy              |      |        |     |      |        |
| fused_linear_cross_entropy |      |        |     |      |        |
| fused_residual_add_rmsnorm |      |        |     |      |        |
| rmsnorm                    |      |        |     |      |        |
| pack_sequence              |      |        |     |      |        |
| softmax                    |      |        |     |      |        |
| swiglu                     |      |        |     |      |        |
| swiglu_packed              |      |        |     |      |        |
| unpack_sequence            |      |        |     |      |        |
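To make the semantics of a few of the functional kernels concrete, here is a minimal unfused PyTorch sketch of rmsnorm, fused_residual_add_rmsnorm, and swiglu. The names, signatures, and return values below are assumptions for illustration (they follow the common definitions of these operations, not necessarily XMA's API); the actual kernels fuse these steps to avoid extra memory traffic.

import torch
import torch.nn.functional as F

def rmsnorm_reference(x, weight, eps=1e-6):
    # RMSNorm: scale features by the reciprocal root-mean-square, then by a
    # learned per-feature weight.
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * rms * weight

def fused_residual_add_rmsnorm_reference(x, residual, weight, eps=1e-6):
    # Residual add followed by RMSNorm. Returning the summed tensor as well is
    # an assumption; a fused kernel typically keeps it as the next residual.
    added = x + residual
    return rmsnorm_reference(added, weight, eps), added

def swiglu_reference(gate, up):
    # SwiGLU activation: SiLU(gate) elementwise-multiplied with the up projection.
    return F.silu(gate) * up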

Community

Join the Discord server if you are interested in LLM architecture or distributed training/inference research.