xma.layers.linear_attention.op
- linear_attention(query: Tensor, key: Tensor, value: Tensor, input_state: Tensor | None, attention_multiplier: float | None = None, cu_seqlens: Tensor | None = None, max_seqlen: int | None = None, CHUNK_SIZE: int = 64, use_fused_kernel_in_forward: bool | None = None, *, kernel_backend: KernelBackend | None = None) -> tuple[Tensor, Tensor]
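The signature suggests a causal linear-attention op that returns both the attention output and an updated recurrent state (the `tuple[Tensor, Tensor]` return), with `input_state` carrying state across calls. Below is a minimal NumPy sketch of the standard linear-attention recurrence such an op typically computes; it is an illustration under assumptions, not the library's implementation. The function name `linear_attention_reference` is hypothetical, the sketch uses unbatched `(seq_len, dim)` inputs, and it ignores the chunked processing implied by `CHUNK_SIZE`, the varlen packing implied by `cu_seqlens`/`max_seqlen`, and the kernel-selection arguments.

```python
import numpy as np

def linear_attention_reference(q, k, v, state=None, multiplier=None):
    """Hypothetical reference for causal linear attention.

    q, k, v: (seq_len, dim) arrays.
    state:   (dim, dim) running sum of outer(k_t, v_t), or None to start fresh.
    Returns (output, final_state), mirroring the op's two-tensor return.
    """
    seq_len, dim = q.shape
    if state is None:
        state = np.zeros((dim, dim), dtype=q.dtype)
    if multiplier is None:
        # Assumed default: the usual 1/sqrt(dim) attention scaling.
        multiplier = dim ** -0.5
    out = np.empty_like(q)
    for t in range(seq_len):
        # Causal recurrence: fold in the current key/value pair,
        # then read out with the current query.
        state = state + np.outer(k[t], v[t])
        out[t] = multiplier * (q[t] @ state)
    return out, state
```

Passing the returned state back in as `state` on the next call continues the recurrence, which is presumably the role of `input_state` in the fused op.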