GPU Performance (Data Sheets) Quick Reference (2023)
Author: arthurchiao.github.io

Published at 2023-10-25 | Last Update 2023-10-25

This post provides a concise reference for the performance of popular GPU models from NVIDIA and Huawei/HiSilicon, primarily intended for personal use.



Naming convention of NVIDIA GPUs

The first letter of an NVIDIA GPU model name denotes its architecture:

  1. T for Turing;
  2. A for Ampere;
  3. V for Volta;
  4. H for Hopper (2022);
  5. L for Ada Lovelace.
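
As a small illustration of the convention, the lookup below maps a model name to its architecture by first letter. This is only a sketch: the names `ARCH_BY_PREFIX` and `architecture_of` are made up for this example, and the mapping covers only the five letters listed above.

```python
# First letter of the model name -> architecture, as listed above.
ARCH_BY_PREFIX = {
    "T": "Turing",
    "A": "Ampere",
    "V": "Volta",
    "H": "Hopper",
    "L": "Ada Lovelace",
}


def architecture_of(model: str) -> str:
    """Return the architecture implied by the first letter of `model`."""
    name = model.strip().upper()
    if not name:
        raise ValueError("empty model name")
    try:
        return ARCH_BY_PREFIX[name[0]]
    except KeyError:
        # Letters outside the list above are deliberately not guessed.
        raise ValueError(f"unknown prefix in model name: {model!r}") from None


# Models taken from the tables in this post.
for model in ("T4", "A10G", "V100", "H800"):
    print(model, "->", architecture_of(model))
```

Any model name starting with a letter outside this list raises a `ValueError` rather than returning a guess.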

| | T4 | A10 | A10G | A30 | V100 PCIe/SXM2 |
|---|---|---|---|---|---|
| Designed for | Data center workloads | (Desktop) Graphics-intensive workloads | Desktop | Desktop | Data center |
| Year | 2018 | 2020 | | | 2017 |
| Manufacturing | 12nm | 12nm | 12nm | | |
| Architecture | Turing | Ampere | Ampere | Ampere | Volta |
| Max Power | 70 watts | 150 watts | | 165 watts | 250/300 watts |
| GPU Mem | 16 GB GDDR6 | 24 GB GDDR6 | 48 GB GDDR6 | 24 GB HBM2 | 16/32 GB HBM2 |
| GPU Mem BW | 400 GB/s | 600 GB/s | | 933 GB/s | 900 GB/s |
| Interconnect | PCIe Gen3 32 GB/s | PCIe Gen4 66 GB/s | | PCIe Gen4 64 GB/s, NVLink 200 GB/s | PCIe Gen3 32 GB/s, NVLink 300 GB/s |
| FP32 | 8.1 TFLOPS | 31.2 TFLOPS | | 10.3 TFLOPS | 14/15.7 TFLOPS |
| BFLOAT16 TensorCore | | 125 TFLOPS | | 165 TFLOPS | |
| FP16 TensorCore | | 125 TFLOPS | | 165 TFLOPS | |
| INT8 TensorCore | | 250 TOPS | | 330 TOPS | |
| INT4 TensorCore | | | | 661 TOPS | |

Datasheets:

  1. T4
  2. A10
  3. A30
  4. V100-PCIe/V100-SXM2/V100S-PCIe

| | A800 (PCIe/SXM) | A100 (PCIe/SXM) | Huawei Ascend 910B | H800 (PCIe/SXM) | H100 (PCIe/SXM) |
|---|---|---|---|---|---|
| Year | 2022 | 2020 | 2023 | 2022 | 2022 |
| Manufacturing | 7nm | 7nm | 7+nm | 4nm | 4nm |
| Architecture | Ampere | Ampere | HUAWEI Da Vinci | Hopper | Hopper |
| Max Power | 300/400 watts | 300/400 watts | 400 watts | | 350/700 watts |
| GPU Mem | 80 GB HBM2e | 80 GB HBM2e | 64 GB HBM2e | 80 GB HBM3 | 80 GB HBM3 |
| GPU Mem BW | | 1935/2039 GB/s | | | 2/3.35 TB/s |
| Interconnect | NVLink 400 GB/s | PCIe Gen4 64 GB/s, NVLink 600 GB/s | HCCS 392 GB/s | NVLink 400 GB/s | PCIe Gen5 128 GB/s, NVLink 900 GB/s |
| FP32 | | 19.5 TFLOPS | | | 51/67 TFLOPS |
| TF32 (TensorFloat) | | 156/312 TFLOPS | | | 756/989 TFLOPS |
| BFLOAT16 TensorCore | | 156/312 TFLOPS | | | |
| FP16 TensorCore | | 312/624 TFLOPS | 320 TFLOPS | | 1513/1979 TFLOPS |
| FP8 TensorCore | Not supported | Not supported | | | 3026/3958 TFLOPS |
| INT8 TensorCore | | 624/1248 TOPS | 640 TOPS | | 3026/3958 TOPS |

H100 vs. A100 in short: 3x performance, 2x price.
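
The "3x" figure can be sanity-checked against the table above. The sketch below is a rough check, not a benchmark: it divides the H100 peak numbers by the A100 ones, metric by metric. The table does not say what the two slash-separated values in each cell stand for, so the code makes an assumption and pairs the second value from each cell. The names `a100`, `h100` and `speedups` are made up for this example.

```python
# Peak throughput figures copied from the table above.
# Assumption: the second value of each "x/y" cell is compared like for like.
a100 = {"FP32": 19.5, "TF32": 312, "FP16": 624, "INT8": 1248}
h100 = {"FP32": 67, "TF32": 989, "FP16": 1979, "INT8": 3958}


def speedups(new: dict, old: dict) -> dict:
    """Ratio new/old for every metric present in both dicts."""
    return {k: round(new[k] / old[k], 2) for k in old if k in new}


ratios = speedups(h100, a100)
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.2f}x")
```

Under that pairing every ratio lands a little above 3, in line with the summary. These are peak data sheet numbers, so they say nothing about achieved throughput on a real workload.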

Datasheets:

  1. A100
  2. H100
  3. Huawei Ascend-910

Source: https://arthurchiao.github.io/blog/gpu-data-sheets/