# **FINN: A Framework for Fast, Scalable Binarized Neural Network Inference on Reconfigurable Logic**

Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre and Kees Vissers

## **Binarized Neural Networks (BNNs)**

- Almost all arithmetic is performed using two values: {-1, +1}
- Trained via backprop on GPU, weights constrained during training
- Convolutional, fully-connected, pooling and batchnorm layers
- Competitive accuracy for image classification tasks
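The two-valued arithmetic above can be illustrated with a minimal sketch. It assumes deterministic sign binarization and clipping of the real-valued weights to [-1, +1] as the training constraint; both are common BNN choices but are assumptions here, and the helper names `binarize` and `clip` are hypothetical.

```python
def binarize(values):
    """Map each real value to -1 or +1 by its sign (zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def clip(values, lo=-1.0, hi=1.0):
    """Constrain real-valued weights to [lo, hi] (assumed training constraint)."""
    return [min(max(v, lo), hi) for v in values]


# Hypothetical real-valued weights as they might look during training.
real_weights = [0.7, -0.2, 0.0, -1.3]

constrained = clip(real_weights)      # real values kept for gradient updates
binary_weights = binarize(constrained)  # values used in the forward pass
print(constrained, binary_weights)
```

Only the binarized values are needed at inference time, which is what makes the hardware mapping in the next section possible.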

# FPGA Potential Performance on BNNs

- Multiplications  $\rightarrow$  XNOR, additions  $\rightarrow$  popcount
- FPGA peak for binary ops is *much* higher than FP32 or INT8
  - ZU19EG: 66 TOPS binary, 4 TOPS INT8, 0.3 TOPS FP32
- Keeping all weights on-chip greatly increases arithmetic intensity
  - Avoid power and performance cost of most off-chip accesses
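The XNOR-popcount substitution can be sketched in a few lines. The example below assumes the encoding bit 1 for +1 and bit 0 for -1 and packs each vector into a Python integer; the function names are hypothetical and the code is a software illustration, not the generated hardware.

```python
def encode(values):
    """Pack a list of -1/+1 values into an int: bit i is 1 for +1, 0 for -1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1, +1} vectors from their bit encodings."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: 1 where the two signs match
    matches = bin(agree).count("1")    # popcount
    # Each match contributes +1 and each mismatch -1 to the dot product.
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))
assert xnor_popcount_dot(encode(a), encode(b), len(a)) == reference
```

The final `2 * matches - n` step converts the count of matching bits back to a signed sum, so the multiply-accumulate reduces to bitwise logic and a counter.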



| Dataset (topology)        |           |           |
|---------------------------|-----------|-----------|
| MNIST                     | 99%       | 99%       |
| SVHN                      | 98%       | 97%       |
| CIFAR-10                  | 92%       | 90%       |
| ImageNet (AlexNet arch)   | 80% top-5 | 69% top-5 |
| ImageNet (ResNet-18 arch) | 89% top-5 | 73% top-5 |
| ImageNet (GoogLeNet arch) | 90% top-5 | 86% top-5 |
| ImageNet (DoReFa-Net)     | 56% top-1 | 50% top-1 |

# Generating BNN Inference Accelerators with FINN

### **Compute Arrays: SIMD & Multi-core**

- Build architecture for topology instead of compiling for fixed arch
- Streaming architecture generated by Vivado HLS

### **Top Level: Heterogeneous & Streaming**

- Compute resources heterogeneously allocated per-layer...
  - ...to balance the streaming pipeline (big & small layers)
  - ...to meet the user-specified FPS requirement (avoid waste)
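The per-layer allocation idea can be sketched as follows. This is a simplified model under stated assumptions, not FINN's actual allocation procedure: each layer is described only by a hypothetical count of binary operations per frame, "parallelism" is an abstract number of operations completed per clock cycle, and the 200 MHz clock is an assumed value. The 9000 FPS target is the `fix` scenario from the evaluation.

```python
def allocate_parallelism(ops_per_frame, target_fps, clock_hz):
    """Smallest per-layer parallelism that fits each layer in the cycle budget."""
    budget = clock_hz // target_fps  # cycles available per frame
    # Ceiling division: enough parallel ops so the layer finishes in the budget.
    return [-(-ops // budget) for ops in ops_per_frame]


def pipeline_fps(ops_per_frame, parallelism, clock_hz):
    """A streaming pipeline runs at the rate of its slowest layer."""
    cycles = [-(-ops // p) for ops, p in zip(ops_per_frame, parallelism)]
    return clock_hz / max(cycles)


layer_ops = [2_000_000, 500_000, 20_000]  # hypothetical big and small layers
clock_hz = 200_000_000                    # assumed clock frequency
target_fps = 9000

alloc = allocate_parallelism(layer_ops, target_fps, clock_hz)
fps = pipeline_fps(layer_ops, alloc, clock_hz)
print(alloc, round(fps))
assert fps >= target_fps
```

Large layers receive many parallel units and small layers few, so every stage takes roughly the same number of cycles per frame and no layer is provisioned beyond what the target rate needs.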

# Experimental Evaluation on ZC706

### **BNN Topologies & Scenarios**

- Three BNN topologies:
  - SFC: fully-connected, 95.8% on MNIST
  - LFC: fully-connected, 98.4% on MNIST
  - CNV: VGG16-like convolutional, 80.1% on CIFAR-10, 94.9% on SVHN

- Two use-case scenarios:
  - max: maximum FPS (e.g. datacenter)
  - fix: 9000 FPS (e.g. embedded)

| Name    | Throughput (FPS) | Latency (µs) | LUT   | BRAM  | $P_{\mathrm{chip}}$ (W) | $P_{\mathrm{wall}}$ (W) |
|---------|------------------|--------------|-------|-------|-------------------------|-------------------------|
| SFC-max | 12361 k          | 0.31         | 91131 | 4.5   | 7.3                     | 21.2                    |
| LFC-max | 1561 k           | 2.44         | 82988 | 396   | 8.8                     | 22.6                    |
| CNV-max | 21.9 k           | 283          | 46253 | 186   | 3.6                     | 11.7                    |
| SFC-fix | 12.2 k           | 240          | 5155  | 16    | 0.4                     | 8.1                     |
| LFC-fix | 12.2 k           | 282          | 5636  | 114.5 | 0.8                     | 7.9                     |
| CNV-fix | 11.6 k           | 550          | 29274 | 152.5 | 2.3                     | 10                      |

### **Achieved Performance vs Roofline**

[Figure: achieved performance vs roofline; x-axis: Ops:Byte]

### **Key Metrics**

- Up to 12.3 million MNIST images per sec
- 11.6 of 19.7 TOPS (68% of peak)
- Up to 21.9 thousand CIFAR-10 images per sec

# Conclusions

- Even mid-range FPGAs can perform trillions of binary operations per second, which can be harnessed for BNN inference
- Unprecedented image classification rates at <25 W power and <1 ms latency for MNIST and CIFAR-10 datasets
- Future work will focus on larger topologies (ImageNet), mixed precision and supporting off-chip parameters



