# Design of an 8-Channel 40 GS/s 20 mW/Ch Waveform Sampling ASIC in 65 nm CMOS

Jinseo Park<sup>a,∗</sup>, Evan Angelico<sup>f</sup>, Andrew Arzac<sup>a</sup>, Davide Braga<sup>b</sup>, Ahan Datta<sup>a</sup>, Troy England<sup>b</sup>, Camden Ertley<sup>d</sup>, Farah Fahim<sup>b</sup>, Henry J. Frisch<sup>a</sup>, Mary Heintz<sup>a</sup>, Eric Oberla<sup>c</sup>, Nathaniel J. Pastika<sup>b</sup>, Hector D. Rico-Aniles<sup>e</sup>, Paul M. Rubinov<sup>b</sup>, Xiaoran Wang<sup>b</sup>, Y.M. Richmond Yeung<sup>a</sup>, Tom N. Zimmerman<sup>b</sup>

> *<sup>a</sup>Enrico Fermi Institute, the University of Chicago, 933 East 56th Street, 60637, Chicago, IL, USA <sup>b</sup>Fermi National Accelerator Laboratory, 60510, Batavia, IL, USA <sup>c</sup>Kavli Institute for Cosmological Physics, the University of Chicago, 5640 South Ellis Avenue, 60637, Chicago, IL, USA <sup>d</sup>Southwest Research Institute, 6220 Culebra Road, 78238, San Antonio, TX, USA <sup>e</sup>North Central College, 30 N. Brainard Street, 60540, Naperville, IL, USA <sup>f</sup>Stanford University, 450 Jane Stanford Way, 94305, Stanford, CA, USA*

## Abstract

A timing resolution of 1 ps is the entry point to signature-based searches relying on secondary/tertiary vertices and particle identification. We describe PSEC5, an 8-channel 40 GS/s waveform-sampling ASIC in the TSMC 65 nm process targeting 1 ps resolution at 20 mW power per channel. Each channel consists of four fast switched capacitor arrays (SCAs) and one slow SCA, allowing ps time resolution combined with a long effective buffer. Each fast SCA is 1.6 ns long and has a nominal sampling rate of 40 GS/s. The slow SCA is 204.8 ns long and samples at 5 GS/s. Recording of the analog data for each channel is triggered by a fast discriminator capable of triggering multiple times during the window of the slow SCA. To achieve a large dynamic range, low leakage, and high bandwidth, the SCA sampling switches are implemented as 2.5 V nMOSFETs controlled by 1.2 V shift registers. Stored analog data are digitized by an external ADC at 10 bits or better.

Specifications on operational parameters include a 4 GHz analog bandwidth and a dead time of 20 microseconds, corresponding to a 50 kHz readout rate, determined by the choice of the external ADC. PSEC5 has been submitted for fabrication.

*Keywords:* Waveform-sampling, ADC, Picosecond, ASIC, 4 GHz bandwidth, 65 nm CMOS

## 1. Introduction

A timing resolution of 1 ps is the entry point to signature-based searches relying on secondary/tertiary vertices and particle identification. In addition, multiple-hit capability and a long time buffer are desirable. Low power consumption per channel is an essential requirement for large, fast electronics systems.

With a bandwidth of 4 GHz and a sampling rate of 40 GS/s, the time resolution for an incoming pulse is predicted to be better than 1 ps. The chip architecture is designed to provide a high sampling rate with a long buffer as well as multi-hit capability.

The preliminary design of PSEC5 is in the TSMC 65 nm process. Simulations predict a maximum power consumption during sampling of roughly 20 mW/Ch. Each channel consists of four fast switched capacitor arrays (SCAs) and one slow SCA. This initial version of the chip uses external ADCs, which determine the readout rate: there are 1280 sampling capacitors per channel, read out serially once per event. With one 65 MHz ADC per channel, this gives roughly 20 $\mu$s of dead time per event.
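As a quick check of the dead-time estimate, the arithmetic can be sketched in Python. The sketch assumes one ADC conversion per stored sample at the quoted 65 MHz rate and ignores any settling or control overhead; the variable names are illustrative and not taken from the design.

```python
# Dead-time estimate for the serial readout of one channel.
# Assumption: one ADC conversion per sampling capacitor, no extra overhead.
FAST_SCAS = 4
SAMPLES_PER_FAST_SCA = 64
SLOW_SCA_SAMPLES = 1024
ADC_RATE_HZ = 65e6

samples_per_channel = FAST_SCAS * SAMPLES_PER_FAST_SCA + SLOW_SCA_SAMPLES
dead_time_s = samples_per_channel / ADC_RATE_HZ
max_event_rate_hz = 1.0 / dead_time_s

print(samples_per_channel)                    # 1280
print(f"{dead_time_s * 1e6:.1f} us")          # 19.7 us
print(f"{max_event_rate_hz / 1e3:.1f} kHz")   # 50.8 kHz
```

Under these assumptions the result agrees with the quoted figures of roughly 20 $\mu$s of dead time and a 50 kHz readout rate.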

The organization of the paper is as follows. We present the fast and slow switched capacitor array (SCA) columns in section [2](#page-1-0). We describe the source followers used to achieve a higher analog bandwidth in section [3](#page-2-0). The layout details are elaborated on in section [4](#page-2-1). Power consumption and source follower simulation results are shown in section [5](#page-3-0).

| Parameter                  | Value               |
|----------------------------|---------------------|
| Process                    | 65 nm TSMC          |
| Signal-to-Noise Ratio      | 1000                |
| Sampling Rate, fast (slow) | 40 GS/s (5 GS/s)    |
| Buffer Length, fast (slow) | 6.4 ns (204.8 ns)   |
| Analog Bandwidth           | 4 GHz               |
| Channels                   | 8                   |
| Area                       | 2.4 mm<sup>2</sup>  |

Table 1: PSEC5 Specifications.

The block diagram is shown in Fig. [1](#page-1-1). Each channel includes a discriminator that generates fast triggers on the rising edges of the signal. There are four fast banks (interleaved SCAs), which are triggered either externally or by the discriminators. Each fast bank has four SCA columns holding 16 samples each, for a total of 64 samples per bank. The nominal 40 GS/s of a fast bank is achieved by interleaving the four 10 GS/s SCA columns.
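The interleaving can be illustrated with a short sketch. It assumes the four columns of a bank are clocked in simple round-robin order with 25 ps phase offsets, which the text does not specify; `sample_location` is a hypothetical helper, not part of the design.

```python
# Round-robin interleaving of four 10 GS/s columns into one 40 GS/s bank.
# Assumption: column k samples at a phase offset of k * 25 ps; the actual
# column ordering on the chip is not stated in the text.
COLUMNS_PER_BANK = 4
SAMPLES_PER_COLUMN = 16
BANK_PERIOD_PS = 25       # 1 / (40 GS/s)
COLUMN_PERIOD_PS = 100    # 1 / (10 GS/s)


def sample_location(n):
    """Map the n-th sample of a fast bank to (column, cell, time in ps)."""
    if not 0 <= n < COLUMNS_PER_BANK * SAMPLES_PER_COLUMN:
        raise ValueError("sample index out of range")
    column = n % COLUMNS_PER_BANK
    cell = n // COLUMNS_PER_BANK
    time_ps = cell * COLUMN_PERIOD_PS + column * BANK_PERIOD_PS
    return column, cell, time_ps


for n in (0, 1, 4, 63):
    print(n, sample_location(n))
```

Each column only ever sees a 100 ps sampling period, while consecutive samples of the bank are spaced 25 ps apart.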

<sup>∗</sup>Corresponding author. *Email address:* truewis@uchicago.edu (Jinseo Park)



<span id="page-1-1"></span>Figure 1: Functional Block Diagram.

## <span id="page-1-0"></span>2. Switched Capacitor Array

Switched capacitor arrays (SCA) consist of sampling capacitors which sequentially sample the voltage of the signal line.

### *2.1. Fast and Slow SCA*

There are four fast SCAs per channel, each 64 samples long. Together they provide four 1.6 ns sampling windows, which can be configured either to run separately, capturing multiple rising edges of the signal, or to run back to back as a single longer SCA. The slow SCA is 1024 samples long (204.8 ns) and starts sampling before the fast SCAs. The positions within the slow bank at which the fast buffers are triggered are timestamped. This architecture prevents cumulative timing errors while achieving a long sampling window.
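The window lengths follow directly from the SCA depths and sampling rates, as the sketch below shows. The helper `slow_index_to_ns` is hypothetical: it assumes a trigger position is recorded as a slow-SCA cell index, which is one plausible reading of the timestamping and not a documented format.

```python
# Sampling windows implied by the SCA depths and nominal sampling rates.
FAST_RATE_GSPS = 40
SLOW_RATE_GSPS = 5
FAST_SAMPLES = 64
SLOW_SAMPLES = 1024
FAST_SCAS = 4

fast_window_ns = FAST_SAMPLES / FAST_RATE_GSPS    # one fast SCA: 1.6 ns
fast_total_ns = FAST_SCAS * fast_window_ns        # four chained: 6.4 ns
slow_window_ns = SLOW_SAMPLES / SLOW_RATE_GSPS    # slow SCA: 204.8 ns
slow_step_ps = 1000 / SLOW_RATE_GSPS              # slow cell spacing: 200 ps


def slow_index_to_ns(index):
    """Time of a slow-SCA cell relative to the start of the slow window.

    Assumption: trigger positions are stored as slow-cell indices.
    """
    if not 0 <= index < SLOW_SAMPLES:
        raise ValueError("slow cell index out of range")
    return index * slow_step_ps / 1000


print(fast_window_ns, fast_total_ns, slow_window_ns)
```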

### *2.2. Voltage Level Shifter*

To keep the on-resistance ($R_{\text{on}}$) of the sampling switch low, its gate voltage should be kept high. A 2.5 V I/O nMOS is used as the switch because the $R_{\text{on}}$ of a minimum-length 2.5 V I/O device is significantly lower than that of a minimum-length 1.2 V core device of the same width, when their gates are driven at 2.5 V and 1.2 V, respectively. In addition, the input voltage range can be restricted to the bottom half of the drain voltage range of the 2.5 V I/O nMOS, obviating the need for a complementary switch. In exchange, this design requires a 1.2 V to 2.5 V voltage level shifter (VLS, Fig. [2](#page-2-2)) for each of the sampling switches. The limiting factor in voltage uncertainty is the on-to-off transition time of the sampling switch ($t_{\text{off}}$); the off-to-on transition time ($t_{\text{on}}$) is less important. We adjusted the device widths of the VLS to prioritize reducing $t_{\text{off}}$ and achieved 15 ps (best case) to 22 ps (worst case).



<span id="page-2-2"></span>Figure 2: Schematic of the voltage level shifter. All I/O pMOS devices (MP2, MP3, MP4, MP5) are minimum size to reduce $t_{\text{off}}$ and power consumption.

### *2.3. Shift Register*

To achieve a nominal 40 GS/s sampling rate for the interleaved SCA, each fast SCA column must operate at 10 GS/s. Delivering a 10 GHz clock to a wide area of the chip, however, results in high power consumption. Instead, we used an existing design [\[1\]](#page-3-1) of a dual edge-triggered flip-flop (Fig. [3](#page-2-3)) so that we only have to deliver a 5 GHz clock, halving the clock power consumption.
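To first order, the saving follows from the standard expression for the dynamic power of a clock network. The estimate below assumes that the clock-network capacitance $C_{\text{clk}}$ and the supply voltage $V_{DD}$ are the same in both cases, which ignores any additional clock load presented by the dual edge-triggered flip-flops; both symbols are introduced here for illustration only.

$$P_{\text{clk}} \approx C_{\text{clk}}\, V_{DD}^{2}\, f_{\text{clk}}, \qquad \frac{P_{\text{clk}}(5\ \text{GHz})}{P_{\text{clk}}(10\ \text{GHz})} \approx \frac{1}{2}.$$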



<span id="page-2-3"></span>Figure 3: Schematic of a dual edge-triggered flip-flop.

## <span id="page-2-0"></span>3. Signal Paths

### *3.1. Chip Entry*

Wire bonds and electrostatic discharge (ESD) protection devices are the limiting factors of the analog bandwidth.

We employ a capacitively coupled input to control the DC bias of the signal input (Fig. [4](#page-2-4)), which affects the analog bandwidth of the sampling switches. The lower the DC bias, the higher the analog bandwidth. Both the input and output source followers have limited voltage ranges, however, so we add back a DC bias of approximately 600 mV after the signal input.



<span id="page-2-4"></span>Figure 4: Equivalent schematic of the wire bond and ESD. R1 represents the 50 $\Omega$ input. C1, C3, R2, and L1 are components placed on the board for capacitive coupling. L2 and C2 are the combined parasitic inductance and capacitance of the wire bond and ESD.

### *3.2. Main Signal Path*

Because of the input inductance, delivering the signal directly to the sampling capacitors results in a relatively low analog bandwidth (below 2 GHz). We therefore use a single-transistor source follower with a 5 GHz bandwidth (Fig. [5](#page-2-5)). This exceeds the typical 4 GHz bandwidth of the wire bond used in the quad-flat no-leads (QFN) package. The fast SCA columns are interleaved, so the switching noise from one column may corrupt the capacitor voltage of another; we use a dedicated source follower per fast SCA column to isolate this noise. The entire slow bank receives the signal from a separate 2 GHz source follower, since it samples at 5 GS/s and does not require a 4 GHz bandwidth.



<span id="page-2-5"></span>Figure 5: Input Source Follower.

### *3.3. Readout*

Since the sampling metal-oxide-metal (MOM) capacitors are small (35 fF), charge leakage before readout is a concern. We placed two stages of source followers that deliver the signal from the capacitor to the chip output without charge leakage.

## <span id="page-2-1"></span>4. Physical Layout

The design has been laid out in a 2.4 mm $\times$ 1 mm rectangle, and post-layout simulations to evaluate the performance are ongoing.

Both fast and slow SCA columns are laid out as $25.5\,\mu\text{m} \times 200\,\mu\text{m}$ blocks (Fig. [6](#page-3-2)) with a dedicated clock source, where each column consists of 64 samples. There are 16 slow SCA columns and 4 fast SCA columns per channel, stacked horizontally. The slow SCA columns therefore take four times the area of the fast SCA columns.



<span id="page-3-2"></span>Figure 6: Layout of a slow (Left) and fast (Right) SCA column.

### *4.1. Clock Gating*

Unlike the fast SCA columns, only one slow SCA column is active at any given time during sampling. To reduce clock power consumption, we designed a clock gate (Fig. [7](#page-3-3)) so that at most two slow SCA columns receive the clock at any given time. The clock gate also doubles as a clock divider, as the slow columns operate at half the frequency of the fast columns.
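A toy model gives a feel for the saving. It assumes that the active column and the next column in sequence are the two that receive the clock, and that the slow SCA wraps around as a circular buffer; neither detail is stated in the text, and `clocked_columns` is a hypothetical helper rather than a description of the gate logic.

```python
# Illustrative model of slow-column clock gating.
# Assumptions: the active column and the next one in sequence are clocked,
# and the slow SCA wraps around after the last column. The actual enable
# logic of the clock gate is not described in the text.
SLOW_COLUMNS = 16
SAMPLES_PER_COLUMN = 64


def clocked_columns(sample_index):
    """Return the set of slow columns clocked while a given sample is taken."""
    active = (sample_index // SAMPLES_PER_COLUMN) % SLOW_COLUMNS
    upcoming = (active + 1) % SLOW_COLUMNS
    return {active, upcoming}


# Upper bound on the fraction of slow columns that are clocked at once.
clocked_fraction = 2 / SLOW_COLUMNS

print(clocked_columns(0), clocked_columns(1023), clocked_fraction)
```

In this model at most 2 of the 16 slow columns, or 12.5%, load the clock network at any moment.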



<span id="page-3-3"></span>Figure 7: Schematic (Top) and Layout (Bottom) of the clock gate/divider.

## <span id="page-3-0"></span>5. Simulation Results

### *5.1. Input Voltage Range*

The simulation of the complete signal path, consisting of the input source follower, the capacitor, and the output source follower, is shown in Fig. [8](#page-3-4). The output remains approximately linear for input voltages from 300 mV to 1.1 V. Calibration software may be used to record the curve and reconstruct the input voltage from the output.
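One simple way such calibration software could work is to invert a recorded transfer curve by piecewise-linear interpolation. The sketch below is a minimal illustration under that assumption; the calibration points are made-up placeholder values, not simulation or measurement results, and `reconstruct_input_mv` is a hypothetical helper.

```python
from bisect import bisect_right

# Hypothetical calibration points (input mV, output mV). These are made-up
# placeholders standing in for a recorded transfer curve.
CAL_IN_MV = [300, 500, 700, 900, 1100]
CAL_OUT_MV = [120, 290, 455, 610, 750]


def reconstruct_input_mv(out_mv):
    """Invert a monotonic transfer curve by piecewise-linear interpolation."""
    if not CAL_OUT_MV[0] <= out_mv <= CAL_OUT_MV[-1]:
        raise ValueError("output outside the calibrated range")
    # Index of the calibration segment that contains out_mv.
    i = min(bisect_right(CAL_OUT_MV, out_mv), len(CAL_OUT_MV) - 1)
    x0, x1 = CAL_OUT_MV[i - 1], CAL_OUT_MV[i]
    y0, y1 = CAL_IN_MV[i - 1], CAL_IN_MV[i]
    return y0 + (out_mv - x0) * (y1 - y0) / (x1 - x0)


print(reconstruct_input_mv(372.5))
```

The lookup only requires the curve to be monotonic over the calibrated range, so it also handles the mild nonlinearity near the ends of the input range.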



<span id="page-3-4"></span>Figure 8: Voltage plot of the source follower chain.

### *5.2. Power Consumption*

The average power consumption in various process corners is simulated and shown in Table [2](#page-3-5). The input source follower is turned off during readout, further reducing the power consumption to below 1 mW/Ch.

|                               | <b>Worst Case</b> | <b>Best Case</b> |
|-------------------------------|-------------------|------------------|
| Input Source Follower [mW/Ch] | 9.2               | 4.0              |
| SCA (Sampling) [mW/Ch]        | 16.6              | 13.9             |

<span id="page-3-5"></span>Table 2: Power consumption.

### *5.3. Source Followers*

Our simulations confirm that the input source followers for the fast SCA columns preserve the signal up to 4.9 GHz. The readout source followers limit voltage drop in the sampling capacitors to less than 1 mV.
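To put the 1 mV figure in perspective, the net leakage current it would correspond to can be estimated from $I = C\,\Delta V / \Delta t$. The sketch assumes a constant leakage current and a worst-case hold time equal to the roughly 20 $\mu$s readout dead time; the hold time is an assumption made here, not a stated specification.

```python
# Leakage budget implied by a droop limit on one sampling capacitor.
# Assumption: worst-case hold time equals the ~20 us readout dead time.
CAP_F = 35e-15        # sampling capacitor, 35 fF
MAX_DROOP_V = 1e-3    # voltage drop limit, 1 mV
HOLD_TIME_S = 20e-6   # assumed hold time before the cell is read out

# I = C * dV / dt for a constant leakage current.
max_leakage_a = CAP_F * MAX_DROOP_V / HOLD_TIME_S

print(f"{max_leakage_a * 1e12:.2f} pA")
```

Under these assumptions the net current drawn from a sampling capacitor would have to stay below about 1.75 pA.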

## 6. Conclusion and Current Status

All of the blocks in Figure [1](#page-1-1) have been laid out. Currently, we are running post-layout simulations to estimate the voltage and time uncertainty. The design has been submitted for fabrication as of September 2024.

Plans for future work include physical verification and integration. Additionally, new iterations of this design that preserve the core architecture are being developed with high radiation and high event-rate environments in mind.

## References

<span id="page-3-1"></span>[1] J. Yuan, C. Svensson, New TSPC latches and flipflops minimizing delay and power, in: 1996 Symposium on VLSI Circuits. Digest of Technical Papers, 1996, pp. 160–161. doi:[10.1109/VLSIC.1996.507754](http://dx.doi.org/10.1109/VLSIC.1996.507754).