Transformer Engine
A specialized library for accelerating Transformer models on NVIDIA GPUs using FP8 precision and graph-level optimizations.
Transformer Engine (TE) is an open-source library designed to maximize Transformer throughput on NVIDIA Hopper and Blackwell architectures. It pairs a C++ core with Python APIs to automate mixed-precision training, managing the transitions between FP8, BF16, and FP32. By integrating directly with frameworks such as PyTorch and JAX, TE handles the heavy lifting of tensor scaling and dispatches high-performance fused kernels (such as fused LayerNorm and attention). This lets developers reduce memory pressure and increase compute utilization without manually tuning for numerical stability.
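The tensor scaling mentioned above is the key idea behind FP8 training: because FP8 has a very narrow dynamic range, each tensor is rescaled so its absolute maximum lands near the top of that range before casting, and the scale factor is kept to recover the original magnitudes. The sketch below illustrates this per-tensor scaling in NumPy. It is a simplification, not TE's implementation: rounding to an integer grid stands in for a real E4M3 cast, and the function names (`fp8_scale_quantize`, `fp8_dequantize`) are hypothetical.

```python
import numpy as np

E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3


def fp8_scale_quantize(x):
    """Per-tensor scaling as used in FP8 training: map the tensor's
    absolute maximum onto the FP8 dynamic range, then quantize.
    Returns the quantized tensor and the scale needed to recover it.
    (Hypothetical helper for illustration, not a TE API.)"""
    amax = np.abs(x).max()
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    # Simulate the precision loss of an 8-bit float by rounding the
    # scaled values to a coarse grid. Real E4M3 has non-uniform spacing;
    # this is only a stand-in to show the role of the scale factor.
    x_fp8 = np.clip(np.round(x * scale), -E4M3_MAX, E4M3_MAX)
    return x_fp8, scale


def fp8_dequantize(x_fp8, scale):
    # Divide the scale back out to return to the original magnitude.
    return x_fp8 / scale


x = np.array([0.001, -0.5, 2.0, -3.7])
q, s = fp8_scale_quantize(x)
x_hat = fp8_dequantize(q, s)
```

In TE itself this bookkeeping is automated: modules track amax history and update scale factors across iterations, so user code only wraps its forward pass in the library's FP8 autocast context.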