# The PRO1 ASIC for Fast Wilkinson Encoding

L. L. Ruckman\* and G. S. Varner

Department of Physics and Astronomy, University of Hawaii, 2505 Correa Road, Honolulu HI 96822, USA

E-mail: ruckman@hawaii.edu

ABSTRACT: Wilkinson conversion of stored samples in large Switched Capacitor Array (SCA) ASICs, such as those used for high-speed waveform sampling, has many benefits: compactness, no missing output codes, low power requirements, and robustness. However, such analog-to-digital conversions are relatively slow, limited by the encoder clock speed. By repeating the same fast sampling technique used by the SCA, combined with a fast priority encoder, significantly faster conversion is demonstrated for a prototype ASIC designated PRO1. For 8-10 bits of resolution, this technique is compact and requires far fewer system resources.

<sup>\*</sup>Corresponding author.

## Contents

- 1. Background
- 2. Architectural Details
- 3. Readout Test System
- 4. Test Results
  - 4.1 Sampling speed
  - 4.2 Temperature Dependence
  - 4.3 Timing Performance
- 5. ADC Implementation
- 6. Future Directions
- 7. Summary
- 8. Acknowledgements

## 1. Background

Precision timing and amplitude instrumentation of large arrays of photodetector elements for future applications in collider and astroparticle physics has been enabled by the proliferation of high-performance, low-cost waveform sampling devices [1, 2, 3, 4]. To expand this technique in a cost-effective manner to systems consisting of 0.1-1 million channels, certain features could be quite useful. In particular, next-generation TeV gamma and Super B-factory detector applications require trigger rates of tens of kHz while providing multi-buffer capability. This requirement places a premium on analog conversion performance.

We present the results of an ASIC developed for the flash encoding of photodetector signals, a number of methods of which have been evaluated [5]. This concept is an outgrowth of earlier work [6] and is illustrated in Fig. 1 part a) where the leading and trailing edge times are used to determine the timing and Time-Over-Threshold (TOT) for an analog waveform. A high level threshold crossing may be used to improve the intrinsic time determination error due to amplitude dependence ("time walk"). Part b) of the figure illustrates the flash time encoding of the digital output of a ramp (Wilkinson) comparator.



**Figure 1.** Flash encoding concept: a) is the analog waveform recording with leading edge LL and trailing edge TL for TOT and high level HL for coarse Time Walk Correction; b) simplified scheme for fast transition timing edge encoding.



**Figure 2.** A block diagram of the PRO1 readout, where only the Low Level (LL) threshold crossing output is considered. A compact cascade implementation for the priority encoder limited the settling time, and can be improved.

## 2. Architectural Details

The Photodetector Read Out version 1 (PRO1) ASIC was developed to evaluate the use of waveform sampling in conjunction with threshold crossing encoding to provide flash, although coarse, determination of signal pulse parameters. Relatively slow risetime signals, combined with channel-to-channel comparator threshold spreads, resulted in limited performance using this technique, at least for compact arrays and precision timing (sub-100 ps resolution) applications. However, for the encoding of fast Wilkinson comparator outputs, the recording concept illustrated in Fig. 1b) shows promise to improve upon the limitations of the high-speed Gray Code Counter (GCC) scheme usually employed [7]. In this fast Wilkinson technique, a ramp is started coincident in time with the propagation of a write-pointer strobe across the sampling array. The comparator output transition time is captured in analog form, and each sample is evaluated with a low-power comparator. The power required to operate each 8-bit sampling row is about 9 mW, and could be lowered further by disabling the comparator bias when the conversion cycle is completed.

A priority encoder determines the location of the first threshold crossing cell. This signal flow is illustrated in Fig. 2. Typical effective times between samples in the array of hundreds of ps (many GSa/s effective) are common in $0.25$-$0.35\,\mu$m CMOS processes [1, 4]. Obtaining similar GHz digital counter rates in a companion FPGA is either difficult or very power and routing resource intensive.
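The encoding step can be illustrated with a short software model. This is only a sketch under stated assumptions, not the ASIC's cascade logic: the function names, the ramp slope, and the sample voltage are hypothetical, while the 256 cells per row and the 400 ps nominal time step are taken from Table 2.

```python
def wilkinson_capture(sample_mv, ramp_mv_per_ns, time_step_ps=400, n_cells=256):
    """Model one sampling row: cell i latches 1 once the ramp has reached the sample.

    The ramp is assumed to start at 0 mV together with the write-pointer strobe
    (hypothetical, idealized linear ramp).
    """
    bits = []
    for i in range(n_cells):
        ramp_mv = ramp_mv_per_ns * i * time_step_ps / 1000.0
        bits.append(1 if ramp_mv >= sample_mv else 0)
    return bits


def priority_encode(comparator_bits):
    """Return the index of the first cell that recorded a crossing, or None."""
    for index, bit in enumerate(comparator_bits):
        if bit:
            return index
    return None


# Hypothetical example: 302 mV stored sample, 10 mV/ns ramp.
bits = wilkinson_capture(sample_mv=302.0, ramp_mv_per_ns=10.0)
code = priority_encode(bits)  # index of the first cell where the ramp has passed 302 mV
```

With 256 cells per row, the returned index fits in the 8-bit encoding output listed in Table 2.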

Table 1 shows the system requirements for a Xilinx FPGA functioning as a Time-to-Digital Converter (TDC) using a high-speed digital reference clock and a GCC. A properly pipelined, dual-clock-phase GCC was used when simulating the number of FPGA logic slices, flip-flops, and Look-Up Tables (LUTs) required. Here, a properly pipelined GCC means a counter that counts in Gray code using an array of pipelined flip-flops. The PRO1 specifications relevant to its application as a TDC are listed in Table 2.

| # of TDC CHs | GCC width (bits) | Logic Slices | Flip flops | LUTs |
|--------------|------------------|--------------|------------|------|
| 8            | 8                | 279          | 204        | 487  |
| 8            | 10               | 339          | 252        | 599  |
| 8            | 12               | 400          | 300        | 709  |
| 8            | 14               | 460          | 348        | 819  |
| 8            | 16               | 520          | 396        | 927  |
| 16           | 8                | 545          | 332        | 958  |
| 16           | 10               | 655          | 412        | 1166 |
| 16           | 12               | 764          | 492        | 1372 |
| 16           | 14               | 875          | 572        | 1578 |
| 16           | 16               | 983          | 652        | 1782 |
| 32           | 8                | 967          | 572        | 1742 |
| 32           | 10               | 1179         | 716        | 2142 |
| 32           | 12               | 1390         | 860        | 2540 |
| 32           | 14               | 1601         | 1004       | 2938 |
| 32           | 16               | 1813         | 1148       | 3334 |

**Table 1.** Programmable logic system requirements for a Xilinx Virtex or Spartan FPGA functioning as a TDC using a high speed digital reference clock and a GCC.
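For readers unfamiliar with Gray code counting, the following sketch shows the standard binary-reflected Gray code conversion in software. It illustrates only the code itself, in which successive counts differ in a single bit; it is not the pipelined FPGA implementation used to produce Table 1, and the function names are hypothetical.

```python
def binary_to_gray(value):
    """Convert a non-negative integer to its binary-reflected Gray code."""
    return value ^ (value >> 1)


def gray_to_binary(gray):
    """Invert binary_to_gray by XOR-ing together all right shifts of the code."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


# Successive 8-bit counts differ in exactly one bit, so a latch that fires
# asynchronously with respect to the counter clock is wrong by at most one count.
single_bit_steps = all(
    bin(binary_to_gray(i) ^ binary_to_gray(i + 1)).count("1") == 1
    for i in range(255)
)
```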

Fig. 3 shows a photograph of the PRO1 bare die. As noted earlier, additional circuitry exists for prototyping other functionality. For the measurements reported here, only a single channel and storage row are considered, and only the Low Level (LL) comparator output thereof. When only that functionality is implemented, the density clearly can and will be increased, though a constraint is the need to reduce the settling time of the select logic tree employed. This logic can take as long as the write-pointer propagation time across the array to settle (hundreds of ns).

**Table 2.** Relevant sampling specifications for the PRO1 ASIC when used as a flash TDC. Measurements from a single channel and a single storage row, consisting of 256 samples, are presented.

| Parameter         | Value     | Unit              |
|-------------------|-----------|-------------------|
| Sampling Rate     | 1.0 - 2.7 | GSa/s             |
| Full range        | 95-250    | ns                |
| Nominal Time Step | 400       | ps (2.5 GSa/s)    |
| Number of inputs  | 4         | channels per PRO1 |
| Sample rows       | 4         | per channel       |
| Encoding output   | 8         | bits              |



**Figure 3.** A bare die photograph of the PRO1 ASIC. The die is 3.21 mm by 3.03 mm and is fabricated in the TSMC $0.25\,\mu$m process.

## 3. Readout Test System

A printed circuit board was fabricated to evaluate PRO1 performance; a photograph is shown in Fig. 4. The three main components on this circuit board are a packaged PRO1 ASIC, an FPGA, and a Universal Serial Bus (USB) interface chip. External communication is via USB 2.0, using the Cypress CY7C68013-56PVC USB microcontroller, which controls the data exchanged between the FPGA and a computer. The FPGA, a Xilinx XC3S200, controls the digital logic and timing for the PRO1 readout. RAM banks internal to the FPGA buffer the digitized values while the data are being transferred into the USB data stream. A basic software tool was developed to send commands to the FPGA and record PRO1 data via the USB 2.0 interface.

## 4. Test Results

Using the readout system described in the previous section, a number of the basic performance parameters of the PRO1 ASIC were evaluated. Because timing performance is such a critical feature of this ASIC functioning as a TDC for Wilkinson conversion, each parameter is described in detail in the following subsections.

Ver.1.2 2008/11/03



**Figure 4.** Photograph of the PRO1 evaluation circuit board.

### 4.1 Sampling speed

Determination of the sampling speed is made by measuring the time interval between insertion of the timing strobe and appearance of the output pulse from the last cell of the row, minus pad buffer delays. The sampling speed is calculated by taking the number of cells in a row and dividing it by the propagation time for a given control voltage setting. A plot of the sampling speed versus control voltage (ROVDD) is shown in Fig. 5, where it is seen that sampling rates from below 0.3 GSa/s to above 4.5 GSa/s are possible.
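The calculation described above amounts to a single division. The sketch below spells it out; the function name and the example propagation and pad delay values are hypothetical and chosen only for illustration, while the 256 cells per row come from Table 2.

```python
def sampling_rate_gsa_per_s(n_cells, propagation_time_ns, pad_delay_ns=0.0):
    """Sampling rate in GSa/s: cells per row divided by the strobe propagation time.

    propagation_time_ns is the measured strobe-in to last-cell-out interval,
    and pad_delay_ns is the pad buffer delay to subtract from it.
    """
    corrected_ns = propagation_time_ns - pad_delay_ns
    if corrected_ns <= 0:
        raise ValueError("corrected propagation time must be positive")
    return n_cells / corrected_ns  # samples per ns equals GSa/s


# Hypothetical measurement: 100 ns raw interval, 4 ns of pad buffer delay.
rate = sampling_rate_gsa_per_s(n_cells=256, propagation_time_ns=100.0, pad_delay_ns=4.0)
time_step_ps = 1000.0 / rate  # average time between adjacent cells
```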

### 4.2 Temperature Dependence

One potential disadvantage of this voltage-controlled delay technique is that the circuit is temperature dependent. This dependence is seen in Fig. 6 and is roughly $0.3\%/^{\circ}\mathrm{C}$ around room temperature, which matches the expectation from SPICE simulation. For many applications this variation would not be significant, and it can be calibrated out with an external reference clock [4].
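To give a feel for the size of this effect, the sketch below estimates the drift magnitude from the quoted coefficient and shows one simple way a reference clock could be used to re-derive the time step. It assumes a linear dependence near room temperature, does not assume a sign for the drift, and is not the calibration procedure of Ref. [4]; the function names and example numbers are hypothetical.

```python
TEMP_COEFF_PER_C = 0.003  # magnitude of about 0.3 %/degC quoted in the text


def drift_magnitude_ps(time_step_ps, delta_t_c, n_cells=1):
    """Magnitude of the accumulated timing shift over n_cells (linear approximation)."""
    return abs(delta_t_c) * TEMP_COEFF_PER_C * time_step_ps * n_cells


def time_step_from_reference(ref_period_ps, cells_per_period):
    """Re-derive the time step from a known reference period measured in cells."""
    if cells_per_period <= 0:
        raise ValueError("cells_per_period must be positive")
    return ref_period_ps / cells_per_period


# Hypothetical example: a 10 degC change with a 373 ps time step.
per_cell_shift = drift_magnitude_ps(373.0, 10.0)          # shift of one time step
full_row_shift = drift_magnitude_ps(373.0, 10.0, 256)     # shift across 256 cells
step = time_step_from_reference(ref_period_ps=10000.0, cells_per_period=25)
```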

**Figure 5.** Sampling rate as a function of the ROVDD control voltage, where extended operation (2.5V) is possible.



**Figure 6.** Temperature dependence of the sampling rate.

### 4.3 Timing Performance

The PRO1 timing performance was evaluated using the test setup shown in Fig. 7. A synchronization pulse from the evaluation circuit board goes through an RF splitter. One copy of the sync pulse is fed back, with a fixed cable delay, into the evaluation board to create the sampling strobe. The other copy is used to trigger an Avtech AVMP-2-C-P-EPIA pulser. The pulser output was tuned to a 500 mV amplitude with a 0.5 ns rise time and a 10 ns full width at half maximum (FWHM) duration. The discriminator on the PRO1 ASIC was set to trigger on the pulse's rising edge at 250 mV. The pulser output was fed into the RF input of the PRO1 ASIC and scanned across the sampling window using a variable delay module.



**Figure 7.** Schematic of the PRO1 timing measurement.



**Figure 8.** Plot showing the linear response of the PRO1 output with respect to a fixed time displacement.



**Figure 9.** Plot of the residual structure obtained by subtracting the linear fit from the data points.

By scanning the sampling window with this test setup, the linear response of the ASIC's digital output versus the time difference was obtained, as shown in Fig. 8. Each data point in Fig. 8 is the average PRO1 output for 10k events. The slope of the linear fit gives an average time step of 373 ps between sampling cells with ROVDD tied to the ASIC's VDD. The residuals after subtracting the linear fit from the data points are shown in Fig. 9. The structure in the residual plot can be used to create a bin-by-bin correction to improve the PRO1 timing. The projection of all residual events is shown in Fig. 10 and has an RMS timing jitter of about 673 ps. Applying a bin-by-bin timing calibration to the PRO1 digital output reduces the RMS timing jitter to 163 ps, as shown in Fig. 11. This TDC performance, after applying the calibration corrections, is not far from the ideal binary interpolation limit of $\frac{1}{\sqrt{12}}$ of the time step (108 ps). The additional timing error is attributed to temperature drift of the sampling speed and to storage-cell-dependent comparator threshold dispersion.



**Figure 10.** Histogram showing the timing jitter with no calibrations.



**Figure 11.** Histogram showing the timing jitter with calibrations.
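The analysis chain described in this subsection (linear fit, residuals, bin-by-bin correction, and comparison with the quantization limit) can be sketched as follows. This is an illustrative reconstruction under assumptions, not the authors' analysis code: time is fitted as a function of output code so that the slope is the time step, the function names are hypothetical, and the four-point data set is synthetic.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def build_correction_table(codes, times_ps, slope, intercept):
    """Mean residual (known time minus fitted time) for each output code."""
    sums, counts = {}, {}
    for code, t in zip(codes, times_ps):
        residual = t - (slope * code + intercept)
        sums[code] = sums.get(code, 0.0) + residual
        counts[code] = counts.get(code, 0) + 1
    return {code: sums[code] / counts[code] for code in sums}


def corrected_time_ps(code, slope, intercept, table):
    """Fitted time plus the per-code correction (zero for codes never calibrated)."""
    return slope * code + intercept + table.get(code, 0.0)


def quantization_rms_ps(time_step_ps):
    """RMS error of an ideal uniform quantizer with the given step."""
    return time_step_ps / sqrt(12)


# Synthetic calibration scan: known delays with a small alternating pattern.
codes = [0, 1, 2, 3]
times_ps = [5.0, 395.0, 805.0, 1195.0]
slope, intercept = linear_fit(codes, times_ps)
table = build_correction_table(codes, times_ps, slope, intercept)
limit_ps = quantization_rms_ps(373.0)  # limit for the measured 373 ps step
```

For the 373 ps step quoted above, `quantization_rms_ps` reproduces the 108 ps figure after rounding.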

## 5. ADC Implementation

When this circuit is used for Wilkinson conversion, a calibration must be performed to remove systematic errors. The calibration is done by applying a fixed DC voltage to the analog input of the Wilkinson ADC. By stepping through different fixed DC voltages within the ADC's dynamic range, the average PRO1 output is mapped as a function of voltage. From these calibration measurements, a look-up table can be generated to convert the PRO1 output into voltage for a Wilkinson conversion application.
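A minimal sketch of such a look-up table is given below. The text specifies only that the average output is mapped against the applied DC voltage; the linear interpolation between calibration points, the clamping outside the calibrated range, the function names, and the example values are assumptions made for illustration.

```python
from bisect import bisect_left


def build_voltage_lut(calibration_points):
    """Build a sorted list of (mean_code, voltage_mv) pairs.

    calibration_points is a list of (applied_voltage_mv, [output codes]) entries,
    one per DC voltage step.
    """
    return sorted((sum(codes) / len(codes), voltage_mv)
                  for voltage_mv, codes in calibration_points)


def code_to_voltage(code, lut):
    """Convert an output code to mV by linear interpolation, clamped at the ends."""
    means = [mean for mean, _ in lut]
    if code <= means[0]:
        return lut[0][1]
    if code >= means[-1]:
        return lut[-1][1]
    hi = bisect_left(means, code)  # first calibration point at or above the code
    (c0, v0), (c1, v1) = lut[hi - 1], lut[hi]
    return v0 + (v1 - v0) * (code - c0) / (c1 - c0)


# Hypothetical scan: three DC voltages, two recorded events each.
points = [(100.0, [24, 26]), (200.0, [50, 50]), (300.0, [74, 76])]
lut = build_voltage_lut(points)
voltage = code_to_voltage(60, lut)
```

In practice each calibration point would average many events per voltage step, as was done for the timing scan.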

Time interpolation by digital delay lines for boosted Wilkinson conversion has been demonstrated to achieve TDC resolution as low as 20 ps [8]. While the PRO1 TDC resolution is roughly 20 times coarser and slower than the digital delay line method, the PRO1 method is useful in applications with a fast readout duty cycle that use deep SCA storage for buffering during the readout deadtime. Since the PRO1 TDC method does not require an external clock reference, the analog voltage stored on the SCA sees reduced coupling of switching noise, owing to the absence of a reference TDC clock.

## 6. Future Directions

Demonstration of the efficacy of this technique has led to its adoption in the second generation of the large, buffered analog storage device for precision timing ASIC (BLAB2). It will also be featured in a device intended for continuous monitoring of turn-by-turn x-ray emission of high luminosity electron storage rings (STURM). In both cases the low power and faster conversion speed improve the density and reduce processing speed and overall readout system overhead, essential for future mega-channel readout systems.

## 7. Summary

A first-generation fast Wilkinson encoder CMOS device has been studied in a 0.25 $\mu$m process. This architecture is optimized to reduce deadtime and power consumption while operating at an effective multi-GHz digital counter rate for fast Wilkinson conversion. The demonstrated low power and high timing resolution make this architecture ideal for integrating a data collection FPGA with an SCA waveform sampling ASIC, while reducing the amount of FPGA resources needed.

## 8. Acknowledgements

Testing was supported in part by Department of Energy Advanced Detector Research Award # DE-FG02-06ER41424.

## References

- [1] S. Kleinfelder, IEEE Trans. Nucl. Sci. 50 (2003) 955.
- [2] S. Ritt, Nucl. Instr. Meth. A518 (2004) 470.
- [3] E. Delagnes et al., Nucl. Instr. Meth. A567 (2006) 21.
- [4] G.S. Varner, L.L. Ruckman, J.W. Nam, R.J. Nichol, J. Cao, P.W. Gorham, M. Wilcox, "The large analog bandwidth recorder and digitizer with ordered readout (LABRADOR) ASIC," Nucl. Instr. Meth. A583 (2007) 447.
- [5] G.S. Varner, L.L. Ruckman, J. Schwiening and J. Vavra, "Compact, low-power and precision timing photodetector readout," Proc. of Science, PD07:026 (2008).
- [6] G. Varner, "The Modern FPGA as Discrim., TDC and ADC", J. Instr. 1 (2006) P07001.
- [7] L. Ruckman, G. Varner and A. Wong, "The first version Buffered Large Analog Bandwidth (BLAB1) ASIC for high luminosity collider and extensive radio neutrino detectors," Nucl. Instr. Meth. A591 (2008) 534.
- [8] T. Fusayasu, "A Fast Integrating ADC Using Precise Time-to-Digital Conversion," IEEE Trans. Nucl. Sci. 54 (2007) 1735.