

# High-bandwidth data transfer on- and off-chip for HEP: modeling, design and verification

Tomasz Hemperek, Hans Krüger



- Introduction
- Modeling
- Design
- Verification
- Output links
- Examples
  - DHP (Belle 2)
  - RD53A (ATLAS/CMS)
  - VeloPix (LHCb)



### Moore's law in HEP (pixel detectors)



| Name             | D-OMEGA Ion   | LHC1         | FE-I3          | FE-I4          | RD53        |
|------------------|---------------|--------------|----------------|----------------|-------------|
| Year             | 1991          | ~1996        | ~2005          | ~2011          | 2017/2019   |
| Technology node  | 3 µm          | 1 µm         | 250 nm         | 130 nm         | 65 nm       |
| Chip size        | 8.3 x 6.6 mm² | 8 x 6.35 mm² | 10.8 x 7.6 mm² | 20.2 x 19 mm²  | 20 x 22 mm² |
| Pixel size       | 75 x 500 µm²  | 50 x 500 µm² | 50 x 400 µm²   | 50 x 250 µm²   | 50 x 50 µm² |
| Pixel array      | 16 x 63       | 16 x 127     | 18 x 160       | 80 x 336       | ~400 x 400  |
| Transistor count | ???           | 800k         | 3.5M           | 80M            | >500M       |

hemperek@uni-bonn.de

# **Memory Density**

UNIVERSITÄT BONN

| Technology node               | 130 nm        | 65 nm         | 28 nm          |
|-------------------------------|---------------|---------------|----------------|
| 6T SRAM cell (µm²)            | 2.4           | 0.52          | 0.127          |
| Bit size in memory (µm²)      | ~3.2          | ~0.7          | ~0.16          |
| 10-bit words in 100 x 100 µm² | ~310          | ~1430         | ~6250          |
| NRI/MPW* (per mm²)            | \$1000-\$2000 | \$3000-\$4000 | \$8000-\$10000 |
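The word counts in the table follow from dividing a 100 µm x 100 µm area by the footprint of a 10-bit word. A minimal sketch of that arithmetic, using the approximate bit sizes from the table (the function name and dictionary are illustrative only):

```python
def words_per_area(bit_area_um2, word_bits=10, side_um=100.0):
    """Number of words that fit into a side_um x side_um square.

    bit_area_um2 is the effective area of one bit inside a memory block
    (cell plus overhead), as listed in the table.
    """
    area_um2 = side_um * side_um
    return area_um2 / (word_bits * bit_area_um2)


# Approximate effective bit sizes from the table (um^2 per bit).
bit_size = {"130nm": 3.2, "65nm": 0.7, "28nm": 0.16}

for node, size in bit_size.items():
    print(f"{node}: ~{words_per_area(size):.0f} 10-bit words in 100x100 um")
```

Rounded to the precision of the slide, this gives the ~310, ~1430 and ~6250 words quoted above.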

### **Issues in pixel detector readout in HEP**



- Time stamping (40 MHz)
- Hit rate (< 3 GHz/cm<sup>2</sup>)
- Trigger rate (up to 4 MHz)
- Wait time for trigger (up to 35 µs)
- Cooling (+40 to -30 °C)
- Cables (up to 6 m to DAQ)
- Power delivery/support mass
- Data rates (> 2 Gbit/s/cm<sup>2</sup>)
- Resolution (< 15 µm)
- Radiation (> 5 MGy)
- SEU/SET





**Radiation levels:** 

- at 5 cm: ~15 MGy (2·10<sup>16</sup> n<sub>eq</sub>/cm<sup>2</sup>)
- at 25 cm: ~1 MGy (10<sup>15</sup> n<sub>eq</sub>/cm<sup>2</sup>)

\* Estimates for 10 years of operation

### **Hybrid Pixel evolution in HEP**


### Switch to big "D", little "A"

- **Traditional mixed-signal design:** physical hierarchy separates digital and analog
- **Modern mixed-signal design:** digital and analog distributed throughout the design

Same pattern for HEP


### **Evolution of array organization**





### **Traditional Design:**

- design 1 pixel
- step and repeat identical copies
- custom made digital

ex. FE-I3 (250nm)

#### More Recent:

- design few-pixel region
- step and repeat identical copies
- synthesized digital

ex. FE-I4 (130nm)

#### **Recent:**

- entire design synthesized hierarchically, with analog blocks as IP

ex. RD53A (65nm)


# **Modeling - Introduction**



### **Questions:**

- How to partition the pixel array (buffering and readout)?
- How wide should the buses be?
- How fast should the clocks be (domain crossing)?
- Which data format/encoding?
- How to compress the data?
- How big should the FIFOs be, and how many are needed?
- What data processing is needed?
- How much time to send and trigger?


## Modeling – python (C++...)






https://gist.github.com/themperek/31720b7a186618b17f489a3ad504638c



Core-column wait time, hit rate = 1 GHz/cm<sup>2</sup>, trigger = 4 MHz



Simple and good for exploration. Use UVM?
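As an illustration of this kind of exploratory model, the sketch below estimates how many hits wait for a trigger in one pixel region. It is not the model from the gist linked above. The hit rate, the 35 µs wait time and the 40 MHz clock come from the earlier slides; the region area (four 50 x 50 µm² pixels), the uncorrelated-hit assumption and all function names are assumptions made here.

```python
import random
from collections import Counter, deque


def simulate_latency_buffer(hit_rate_hz_cm2, region_area_cm2, latency_s,
                            n_bx, bx_period_s=25e-9, seed=1):
    """Monte Carlo of the number of hits waiting for a trigger in one region.

    Assumptions (not from the slides): hits are uncorrelated, at most one
    hit per region per bunch crossing (BX), and every hit is kept for
    exactly the trigger latency.
    Returns a Counter mapping buffer occupancy -> number of BXs.
    """
    rng = random.Random(seed)
    p_hit = hit_rate_hz_cm2 * region_area_cm2 * bx_period_s
    latency_bx = round(latency_s / bx_period_s)
    pending = deque()            # arrival BX of the hits still waiting
    occupancy = Counter()
    for bx in range(n_bx):
        # Drop hits whose latency window has expired.
        while pending and bx - pending[0] >= latency_bx:
            pending.popleft()
        if rng.random() < p_hit:
            pending.append(bx)
        occupancy[len(pending)] += 1
    return occupancy


def mean_occupancy(occupancy):
    total = sum(occupancy.values())
    return sum(k * v for k, v in occupancy.items()) / total


def depth_for_overflow(occupancy, max_fraction):
    """Smallest depth exceeded in at most max_fraction of the simulated BXs."""
    total = sum(occupancy.values())
    above = total
    for depth in range(max(occupancy) + 1):
        above -= occupancy[depth]        # BXs with occupancy > depth
        if above / total <= max_fraction:
            return depth
    return max(occupancy)


# 3 GHz/cm^2, 4 pixels of 50x50 um^2 (1e-4 cm^2), 35 us wait time.
occ = simulate_latency_buffer(3e9, 1e-4, 35e-6, n_bx=200_000)
print("mean occupancy:", mean_occupancy(occ))
print("depth exceeded in <0.1% of BXs:", depth_for_overflow(occ, 1e-3))
```

The mean should land near rate x latency (about 10.5 hits for these numbers), while the tail of the distribution is what drives the buffer depth.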


### **Design Flow – from digital perspective**




# The design (RD53A)







[Figure: RD53A chip, with the top pad row (debug) marked]

- Fully digital with analog IP
- Hierarchical
- Fully automated
- 1 day to re-spin and verify the whole chip

# **Verification Introduction**

- Verification takes more time than design.
- It has to start before, or together with, the design.
- Failing a \$1M chip (65 nm) is not a good idea.
- There is no way around it for complex digital chips.



Verification plan → the most important part (garbage in, garbage out)


### **Universal Verification Methodology (UVM)**



[Figure: UVM verification environment hierarchy (test, scenario, testbench, top, module)]


Formal verification ...

Very complex industrial standard

https://arxiv.org/pdf/1408.3232.pdf

**Example for RD53A:** 

## **Verification – DAQ Integration**




# Work organization/procedures



- Branching for new features
- Pull request review on request
- Features and bugs discussions via issues system (not email)
- Continuous integration
- Agile development vs waterfall

Repository tree (`master` branch, RD53):

| Name            | Last update       |
|-----------------|-------------------|
| doc             | 8 months ago      |
| scripts         | 15 days ago       |
| sim             | 9 days ago        |
| src             | 10 days ago       |
| .gitignore      | 3 months ago      |
| .gitlab-ci.yml  | about a month ago |
| CONTRIBUTING.md | 5 months ago      |
| README.md       | 3 months ago      |




### **Output Links**



The limitation is often the cables, not the transmitter.


## **Output Links – Line encoding**



### **Considerations:**

- Run length
- DC balance
- Hamming distance
- Support by DAQ
- Framing/Streaming
- Cables

### **Examples:**

**8b/10b:** Ethernet, Fibre Channel, high-speed video applications. Ex. implementation: Aurora 8B/10B from Xilinx

**64b/66b:** SONET and SDH telecommunication. Ex. implementation: Aurora 64B/66B from Xilinx
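Run length, DC balance and coding overhead can be measured on any bit stream. The sketch below is a toy model: it adds only the 2-bit sync header of a 64b/66b-style frame and omits the scrambler that a real link also applies. All function names are illustrative.

```python
from itertools import groupby


def max_run_length(bits):
    """Longest run of identical consecutive bits."""
    return max((sum(1 for _ in run) for _, run in groupby(bits)), default=0)


def disparity(bits):
    """Number of ones minus number of zeros (0 means DC balanced)."""
    ones = sum(bits)
    return ones - (len(bits) - ones)


def add_sync_headers(payload_bits, block=64, header=(0, 1)):
    """Prefix every `block` payload bits with a 2-bit header, as in 64b/66b.

    Only the framing is modelled; scrambling is left out on purpose.
    """
    out = []
    for i in range(0, len(payload_bits), block):
        out.extend(header)
        out.extend(payload_bits[i:i + block])
    return out


def coding_efficiency(payload_bits, line_bits):
    """Fraction of the line rate that carries payload."""
    return payload_bits / line_bits


payload = [0] * 256                       # worst case: constant data
framed = add_sync_headers(payload)
print("run length, raw payload:", max_run_length(payload))
print("run length, with sync headers:", max_run_length(framed))
print("efficiency 8b/10b:", coding_efficiency(8, 10))
print("efficiency 64b/66b:", coding_efficiency(64, 66))
```

The header alone bounds the run length but does nothing for DC balance, which is why 64b/66b relies on scrambling while 8b/10b pays a 20% overhead to control both.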






### **Output Links – Line Driver**


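[Figure: differential line driver schematics (inputs IN+/IN−, outputs OUT+/OUT−, transistors M1 to M4, resistors R<sub>U</sub>/R<sub>D</sub>, supply $V_{DD}$)]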

- Power: ~3-4 mA, speed: ~1 Gbit/s
- Power: ~20-30 mA, speed: >10 Gbit/s
- Power: lower than CML, speed: >10 Gbit/s

### **Pre-emphasis**




# **EXAMPLES**


### **DHP - BELLE 2 Pixel Module**







### **Examples – DHP**




### **DHP - Modeling/Verification**




[Figure: 20 m cables, pre-emphasis off vs. pre-emphasis on]

### ATLAS @ LHC






## **RD53A – ATLAS/CMS Prototype**




# 50µm X 50µm Pixel floorplan


1) 50% analog front end (AFE), 50% digital cells
2) The pixel matrix is built up of 8 x 8 pixel cores → 16 analog islands (quads) embedded in a flat, synthesized digital sea
3) A pixel core can be simulated at transistor level with an analog simulator
4) All cores (for each FE flavour) are identical → hierarchical verification

## **Pixel array logic organization**



### basic layout unit: 8x8 digital Pixel Core → synthesized as one digital circuit



- One Pixel Core contains multiple Pixel Regions (PR) and some additional arbitration and clock logic
- Pixel Regions share most of logic and trigger latency buffering

#### **Distributed Buffering Architecture (DBA):**

- Distributed ToT storage (in pixel)
- Integrated with the Lin and Diff FE

#### **Centralized Buffering Architecture (CBA):**

- Centralized ToT storage (in region)
- Integrated with the Sync FE (fast ToT)

### **RD53A - Centralized Buffer Architecture - 2x8**






### **RD53A – Output Link**




#### Configurable 3-tap pre-emphasis filter:



Cable limited. Fiber → radiation.

# The Low Power GBTX (LpGBTX)

- Low Power Dissipation and Small Footprint:
  - Target: 500 mW
- Bandwidth:
  - Low-Power mode
    - 2.56 Gb/s for the optical down link
    - 5.12 Gb/s for the optical up link
  - High-Speed mode:
    - 2.56 Gb/s for the optical down link
    - 10.24 Gb/s for the optical up link




### **VeloPix - LHCb**







- Vertex detector surrounding collision region
  - In vacuum
  - Close to the beam: 5.1 mm
- From silicon strips to pixels
- New R/O chip VeloPix, derived from Timepix3
- In total 624 ASICs, ~41 Mpixels
- Trigger-less readout (~2.9 Tbits/s)
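The detector-level numbers can be cross-checked with simple arithmetic. The sketch below uses only figures quoted on the VeloPix slides (624 ASICs, 256 x 256 pixels per ASIC, ~2.9 Tbit/s); the function names are illustrative. Note that it yields an average only, and assuming uniform occupancy across ASICs would be a simplification.

```python
# Figures quoted on the slides.
N_ASICS = 624
PIXELS_PER_ASIC = 256 * 256
TOTAL_RATE_BPS = 2.9e12          # trigger-less readout, whole detector


def total_pixels(n_asics=N_ASICS, pixels_per_asic=PIXELS_PER_ASIC):
    """Total pixel count of the detector."""
    return n_asics * pixels_per_asic


def average_rate_per_asic(total_rate_bps=TOTAL_RATE_BPS, n_asics=N_ASICS):
    """Mean output data rate per ASIC in bit/s."""
    return total_rate_bps / n_asics


print(f"pixels: {total_pixels() / 1e6:.1f} M")
print(f"average rate per ASIC: {average_rate_per_asic() / 1e9:.2f} Gbit/s")
```

This reproduces the ~41 Mpixels and gives an average of roughly 4.6 Gbit/s per ASIC.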



### **Examples – VeloPix**



- Pixel matrix:
  - 256 x 256 pixels
  - 128 x 64 super pixels (2x4 pixels each)
  - @40MHz
- Packet-based architecture:
  - 8 pixels/packet + 9 bit time stamp → 30% reduction in data rate
- Data-driven readout:
  - 20 Mpackets/s/double column
- 40, 80, 160 and 320 MHz TMR clock domains in the periphery
- 1 to 4 configurable serializers (GWT)
- Similar to the GBT frame




### **Periphery data path**





### VeloPix - GWT


## **Example possibilities**





- Integrate storage and data processing in single pixels:
  - pattern recognition
  - histogramming (in pixel spectral analysis)
  - conversion to photons -> compression
  - clustering and subpixel counting (COG)
  - infinite\* dynamic range

- Move from digitally assisted analog design to analog-assisted digital design
- Chips are more complex, with a lot of memory and data processing
- New tools for design
- A different type of verification (mistakes are very expensive)
- High-speed serial communication
- Many opportunities in exploring small feature sizes






### **Design Flow – from analog perspective**




### SAR ADC IN 65nm - Layout





Only external sample signal needed!



Layout is not area-optimal. Possible decap under the DAC?

### **Pre-emphasis**


### **CML Cable Driver Implementation (RD53A)**



Configurable 3-tap pre-emphasis filter



TAP configuration

- INV\_TAP[2:1]
- EN\_TAP[2:1]

CML output configuration

- EN
- TAP0\_BIAS[9:0]
- TAP1\_BIAS[9:0]
- TAP2\_BIAS[9:0]
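A behavioural sketch of a 3-tap pre-emphasis filter, loosely following the register names above. The mapping is an assumption, not the documented RD53A behaviour: tap 0 drives the current bit, taps 1 and 2 drive copies delayed by one and two bit periods, `bias` stands in for the relative TAPn\_BIAS strengths, and `en_tap`/`inv_tap` mimic EN\_TAP[2:1] and INV\_TAP[2:1].

```python
def cml_driver_output(bits, bias=(1.0, 0.25, 0.0),
                      en_tap=(True, False), inv_tap=(True, False)):
    """Return y[n] = w0*x[n] + w1*x[n-1] + w2*x[n-2] for x in {-1, +1}.

    bias    : relative strength of tap 0, 1 and 2 (hypothetical units)
    en_tap  : enable flags for tap 1 and tap 2
    inv_tap : invert flags for tap 1 and tap 2
    """
    symbols = [1 if b else -1 for b in bits]
    weights = [bias[0]]
    for k in (1, 2):
        w = bias[k] if en_tap[k - 1] else 0.0
        weights.append(-w if inv_tap[k - 1] else w)

    out = []
    for n in range(len(symbols)):
        y = 0.0
        for k, w in enumerate(weights):
            if n - k >= 0:               # samples before the start count as 0
                y += w * symbols[n - k]
        out.append(y)
    return out


bits = [0, 0, 1, 1, 1, 0]
print(cml_driver_output(bits))
```

With an inverted, enabled tap 1, the first bit after each transition is driven harder than the repeated bits that follow, which pre-compensates the high-frequency loss of the cable.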

### **CML Cable Driver Implementation (RD53A)**







- pre- and post-tap active (DEL\_POST = 3, DEL\_PRE = 0)
- IN\_MAIN bias = 3, 4, 5 mA

### Modeling – UVM/Verilog







A lot of work. Can be reused for verification.

http://ieeexplore.ieee.org/document/8069646/


### **Verification Plan**



Most important part.


### Waterfall vs Agile



Agile development interpreted in the waterfall model

### Changing and unclear specifications?


### **Verification - Formal**





    // SystemVerilog assertion: every request must be granted 1 to 3 cycles later
    property p_arb;
      @(posedge clk) req |=> ##[0:2] gnt;
    endproperty

    assert property (p_arb);
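The same property can be checked offline on a recorded trace. The sketch below is a plain-Python equivalent of `req |=> ##[0:2] gnt`, sampled once per clock cycle: a request at cycle t needs a grant at cycle t+1, t+2 or t+3. The function name and the trace format (one value per cycle) are assumptions.

```python
def check_req_gnt(req, gnt, min_delay=1, max_delay=3):
    """Return the cycles whose request was not granted in time.

    req, gnt : sequences of 0/1 values sampled at each rising clock edge.
    Requests whose window runs past the end of the trace are left
    undecided and are not reported as failures.
    """
    failures = []
    for t, requested in enumerate(req):
        if not requested:
            continue
        window = gnt[t + min_delay:t + max_delay + 1]
        if any(window):
            continue                      # granted inside the window
        if t + max_delay < len(gnt):
            failures.append(t)            # full window seen, no grant
    return failures


req = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
gnt = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(check_req_gnt(req, gnt))
```

A simulation-time check like this only covers the stimulus that was applied, whereas a formal tool attempts to prove the property for all reachable states.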

Single Event Upsets ...

### **Design Flow**



- Design in "big A small D" methodology
- Blocks designed and verified individually
- Full chip digital and mixed-signal verification
- Work synchronization with integrated Revision Control System
- Big chip = many difficulties with software and PDK!



| Stage                 | Digital (or shared)                                      | Analog              |
|-----------------------|----------------------------------------------------------|---------------------|
| System-Level Design   | Functional Design and Verification                       |                     |
|                       | Chip Planning                                            |                     |
| Block-Level Design    | RTL Design and Verification                              | Design and Analysis |
|                       | Synthesis and Verification                               | Circuit Simulation  |
|                       | Place and Route                                          | Custom Layout       |
|                       | DRC, LVS, RCX                                            |                     |
| Chip Assembly         | Chip Assembly                                            |                     |
| Physical Verification | Full Chip Physical Verification, Extraction, and Analysis |                    |
| System Verification   | Full Chip System-Level Verification: Analog, Digital, RF |                     |