- NVIDIA H100 is a high-performance GPU designed for data center and cloud applications, optimized for AI workloads
- It is based on the NVIDIA Hopper architecture; the SXM5 variant pairs 132 streaming multiprocessors (SMs) with 528 fourth-generation Tensor Cores, delivering a major generational leap in compute over the previous-generation A100 GPU
- With more than 3 TB/s of HBM3 memory bandwidth (on the SXM5 variant) and a PCIe Gen5 interface, it can efficiently handle large-scale data processing tasks
- Advanced features include second-generation Multi-Instance GPU (MIG) technology, fourth-generation NVLink, and enterprise-class reliability tools (a brief MIG configuration check is sketched below)
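MIG lets a single H100 be partitioned into multiple fully isolated GPU instances, each with its own memory, cache, and compute resources. As a minimal sketch, the following checks whether MIG mode is enabled using the NVML Python bindings (the pynvml module from the nvidia-ml-py package); device index 0 and an installed NVIDIA driver are assumptions here, and the query raises an error on GPUs without MIG support.

```python
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumption: GPU 0
    name = pynvml.nvmlDeviceGetName(handle)
    # nvmlDeviceGetMigMode returns (current, pending); 0 = disabled, 1 = enabled.
    current, pending = pynvml.nvmlDeviceGetMigMode(handle)
    print(f"{name}: MIG current={current}, pending={pending}")
finally:
    pynvml.nvmlShutdown()
```

In practice, MIG mode is typically toggled with the nvidia-smi tool (for example, `nvidia-smi -i 0 -mig 1`) before GPU instances are created.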
Securely accelerate workloads from enterprise to exascale
Transformational AI Training
The H100 features fourth-generation Tensor Cores and a Transformer Engine with FP8 precision, enabling it to train the GPT-3 (175B) model up to four times faster than its predecessor. This advanced technology is complemented by fourth-generation NVLink, offering 900 GB/s of GPU-to-GPU connectivity; NVIDIA Quantum-2 NDR InfiniBand networking, which accelerates GPU communication across nodes; PCIe Gen5; and NVIDIA Magnum IO™ software. Together, these elements ensure seamless scalability from small enterprise setups to extensive unified GPU clusters.
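To make the FP8 Transformer Engine concrete, here is a minimal training-step sketch using NVIDIA's Transformer Engine library for PyTorch, assuming the transformer_engine package is installed and an H100-class GPU is present; the layer sizes and optimizer are illustrative placeholders, not a prescribed recipe.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# DelayedScaling derives FP8 scaling factors from recent amax history;
# the HYBRID format uses E4M3 for activations/weights and E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

# te.Linear is a drop-in replacement for torch.nn.Linear with FP8 support.
model = te.Linear(1024, 1024, bias=True).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(32, 1024, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = model(x)  # the matmul executes in FP8 on H100 Tensor Cores

loss = y.float().sum()  # loss and backward stay outside the FP8 region
loss.backward()
optimizer.step()
```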
Deploying the H100 GPU at data center scale unlocks exceptional performance, paving the way for the next generation of exascale high-performance computing (HPC) and trillion-parameter AI, and making these powerful tools accessible to all researchers.
Experience NVIDIA AI and the NVIDIA H100 on NVIDIA LaunchPad
Real-time Deep Learning Inference
AI employs a diverse array of neural networks to address various business challenges. An effective AI inference accelerator must not only deliver top-tier performance but also possess the versatility to accelerate different types of networks.
The H100 strengthens NVIDIA's leading position in inference acceleration with enhancements that boost inference speed by up to 30x while delivering the lowest latency. Its fourth-generation Tensor Cores accelerate all precisions, including FP64, TF32, FP32, FP16, INT8, and the new FP8, reducing memory usage and increasing performance while preserving the accuracy of large language models (LLMs).
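As an illustrative sketch (not the H100's only inference path), the snippet below runs a placeholder model under PyTorch's autocast so that eligible matrix multiplies execute in FP16 on Tensor Cores; FP8 or INT8 inference would typically go through libraries such as Transformer Engine or TensorRT instead.

```python
import torch

# Placeholder model standing in for a real inference workload.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda().eval()

x = torch.randn(8, 1024, device="cuda")

# inference_mode drops autograd bookkeeping for lower latency; autocast
# routes eligible matmuls to reduced-precision Tensor Core kernels.
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)

print(y.dtype)  # torch.float16 for autocast-eligible outputs
```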