Horovod Projects

Horovod

Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet that scales model training to hundreds of GPUs with minimal code changes.

Originally developed at Uber, Horovod uses the ring-allreduce algorithm, running over NVIDIA NCCL or MPI, to optimize communication between nodes. This removes the bottleneck of a centralized parameter server and lets users scale a single-GPU training script to a large cluster by adding only a few lines of Python. Organizations such as Alibaba and Amazon use it, with reported scaling efficiency that is close to linear (often above 90%) across thousands of accelerators.
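To make the ring-allreduce idea concrete, here is a minimal single-process sketch in plain Python. It is a simulation written for illustration, not Horovod's, NCCL's, or MPI's actual implementation: the function name `ring_allreduce` and the list-of-lists representation of per-worker buffers are assumptions made for this example. Each simulated worker only ever sends one chunk to its right-hand neighbor per step, which is why no central parameter server is needed.

```python
def ring_allreduce(buffers):
    """Simulate a sum ring-allreduce over per-worker lists of equal length.

    Hypothetical helper for illustration only; real frameworks run these
    steps in parallel across processes and devices.
    """
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all workers must hold buffers of the same length")

    # Work on copies so the caller's data is left untouched.
    data = [list(b) for b in buffers]
    if n == 1:
        return data

    # Split the index range into n contiguous chunks (some may be empty).
    bounds = [(c * length) // n for c in range(n + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1, scatter-reduce: after n - 1 steps, worker r holds the fully
    # summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all outgoing messages first so sends within a step
        # behave as if they happened simultaneously.
        msgs = []
        for rank in range(n):
            c = (rank - step) % n
            msgs.append((c, [data[rank][i] for i in chunk(c)]))
        for rank in range(n):
            dst = (rank + 1) % n  # right-hand neighbor on the ring
            c, payload = msgs[rank]
            for i, value in zip(chunk(c), payload):
                data[dst][i] += value

    # Phase 2, allgather: circulate the finished chunks around the ring so
    # every worker ends up with the complete summed buffer.
    for step in range(n - 1):
        msgs = []
        for rank in range(n):
            c = (rank + 1 - step) % n
            msgs.append((c, [data[rank][i] for i in chunk(c)]))
        for rank in range(n):
            dst = (rank + 1) % n
            c, payload = msgs[rank]
            for i, value in zip(chunk(c), payload):
                data[dst][i] = value

    return data


# Three simulated workers, each holding its own local gradient vector.
gradients = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [100, 200, 300, 400, 500, 600],
]
summed = ring_allreduce(gradients)
```

In this sketch every worker sends and receives 2 × (n − 1) chunks in total, each about 1/n of the buffer, so the data moved per worker stays roughly constant as workers are added. Dividing each summed entry by the number of workers would give the averaged gradient.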

https://github.com/horovod/horovod