Transformer Engine
A specialized library for accelerating Transformer models on NVIDIA GPUs using FP8 precision and graph-level optimizations.
Transformer Engine (TE) is an open-source library designed to maximize Transformer throughput on NVIDIA Hopper and Blackwell architectures. It pairs a C++ core with Python APIs to automate mixed-precision training, managing the transitions between FP8, BF16, and FP32. By integrating directly with frameworks such as PyTorch and JAX, TE handles the heavy lifting of tensor scaling and dispatches high-performance fused kernels (such as fused LayerNorm and attention). This lets developers reduce memory pressure and increase compute utilization without manually tuning for numerical stability.
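The tensor scaling mentioned above is the key idea behind FP8 training: because FP8 has a very narrow dynamic range, each tensor is rescaled so its absolute maximum lands near the top of that range before casting, and the scale factor is kept to recover the original magnitudes. The sketch below illustrates this per-tensor scaling in NumPy. It is a simplification, not TE's implementation: rounding to an integer grid stands in for a real E4M3 cast, and the function names (`fp8_scale_quantize`, `fp8_dequantize`) are hypothetical.

```python
import numpy as np

E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3


def fp8_scale_quantize(x):
    """Per-tensor scaling as used in FP8 training: map the tensor's
    absolute maximum onto the FP8 dynamic range, then quantize.
    Returns the quantized tensor and the scale needed to recover it.
    (Hypothetical helper for illustration, not a TE API.)"""
    amax = np.abs(x).max()
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    # Simulate the precision loss of an 8-bit float by rounding the
    # scaled values to a coarse grid. Real E4M3 has non-uniform spacing;
    # this is only a stand-in to show the role of the scale factor.
    x_fp8 = np.clip(np.round(x * scale), -E4M3_MAX, E4M3_MAX)
    return x_fp8, scale


def fp8_dequantize(x_fp8, scale):
    # Divide the scale back out to return to the original magnitude.
    return x_fp8 / scale


x = np.array([0.001, -0.5, 2.0, -3.7])
q, s = fp8_scale_quantize(x)
x_hat = fp8_dequantize(q, s)
```

In TE itself this bookkeeping is automated: modules track amax history and update scale factors across iterations, so user code only wraps its forward pass in the library's FP8 autocast context.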