We propose a fine-pitch, highly scalable, heterogeneous integration platform called the Silicon-Interconnect Fabric (Si-IF), in which dielets are assembled with fine-pitch interconnects (≤ 10 μm) at short inter-dielet spacings (≤ 100 μm) using a direct metal-metal Thermal Compression Bonding (TCB) process. As a result, short links on the Si-IF (≤ 500 μm) are used for inter-dielet communication, reducing the latency to ≤ 35 ps. We experimentally demonstrate that the measured insertion loss in these short Si-IF links (≤ 500 μm) is ≤ 2 dB for frequencies up to 30 GHz. We also show that assemblies on Si-IF have 10–40X lower parasitic inductance and 7–35X lower parasitic capacitance compared to assemblies on interposers and PCBs. We propose the Simple Universal Parallel intERface for chips (SuperCHIPS) protocol for data transfer, which efficiently utilizes the Si-IF to achieve data-rates of ≥ 10 Gbps/link at an energy/bit of ≤ 0.04 pJ/b. Further, the aggregate bandwidth/mm is ≥ 8 Tbps/mm. This corresponds to an improvement of 120–300X in bandwidth/mm and a reduction of 100–500X in energy/bit compared to conventional systems.

Today, conventional system integration technologies predominantly use packaged dies that are assembled on a PCB substrate using solder-based interconnects. These assemblies have low interconnect densities (0.4 – 1 mm pitch) due to solder extrusion and bridging [1]. Therefore, the data-bandwidth of these systems is limited by the interconnect pitch, which necessitates the use of power-hungry serialization and deserialization (SerDes) circuits. High interconnect densities and finely spaced dielet assemblies are essential to increase the data-bandwidth and reduce the energy and latency of data transfer. Recently, technologies such as interposers were developed to act as redistribution layers that integrate a few dies with fine-pitch wires. However, interposers achieve only moderate interconnect densities (> 50 μm pitch) [3] and are limited in size. Further, interposers must ultimately be assembled on a PCB, which adds a level to the packaging hierarchy and inflates the overall packaging cost.

We are developing a highly scalable, package-less, fine-pitch heterogeneous integration platform called the Silicon-Interconnect Fabric (Si-IF), in which unpackaged dielets are assembled on a silicon substrate with fine-pitch interconnects (10 μm) at small inter-dielet spacings (≤ 100 μm) [1,2]. Correspondingly, we demonstrated an interconnect density of > 4 × 10⁶ cm⁻² that allows for a large number of parallel short links (≤ 500 μm) between dielets. These short links have parasitics comparable to those of on-chip wires. Therefore, simple inverter-based drivers can be used for signal transfer with the Simple Universal Parallel intERface for chips (SuperCHIPS) protocol [2]. As a result, this fine-pitch integration scheme on Si-IF achieves significant improvements in data bandwidth (> 120X) and dramatic reductions in energy/bit (> 100X) compared to interposers and PCB-based assemblies [2]. Si-IF-based integration therefore approaches System-on-Chip (SoC)-like performance while maintaining technology heterogeneity.

We demonstrated the 10 μm pitch interconnects (Au-capped Cu pillars, Ø = 5 μm) using solderless direct metal-metal (Au-Au) Thermal Compression Bonding (TCB) with low specific contact resistance (< 1 Ω·μm²) in [1]. We also showed the 100 μm inter-dielet spacing and an alignment accuracy of ≤ 1 μm in [1]. The Si-IF assembly with the integration of multiple dielets (4, 9, 16 & 25 mm²) on a 100 mm diameter Si-wafer is illustrated in Fig. 1. A total of 371 dielets were bonded to the Si-IF by TCB using the fine-pitch (10 μm) interconnects at 100 μm inter-dielet spacing. The effective silicon area is > 3100 mm², demonstrating Si-IF as a scalable heterogeneous integration platform. The assembly was passivated using Parylene C. In this paper, we present the high-frequency electrical characterization of the fine-pitch interconnects and the short Si-IF links.

We designed and fabricated daisy chain test structures to characterize the signal transfer between dielets communicating on the Si-IF. The Si-IF consists of communication links with three varied parameters: (a) link length (585 and 125 μm), (b) width (5 and 2 μm), and (c) wiring pitch (10 and 4 μm). The thickness of the links is 2 μm. The links are terminated using the fine-pitch (10 μm) Au-capped Cu pillars. The dielets have metal pads that are connected to the Si-IF links to form a daisy chain. Measuring the RF characteristics of these short links is challenging because of the difficulty in probing and in de-embedding the probe and pad parasitics. Therefore, to obtain reliable measurements, several short links were cascaded in a daisy chain to form a long link (> 2.5 mm) between the two ports. The schematic of the cascaded structure and its cross-section are shown in Fig. 2(a). The characteristics of the actual device under test (DUT), which is the short link segment, were later extracted using de-embedding techniques [8]. The links were configured as Ground-Signal-Ground (GSG) to achieve the best insertion loss measurements. The fabricated Si-IF is shown in Fig. 2(b). The dielets were precision aligned (≤ 1 μm) and assembled on the Si-IF using TCB, as shown in Fig. 2(c). This assembly ensures that the parasitics of the bonded interconnects are also included in the measurements, which therefore represent the operating condition of the dielets.
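The idea of recovering one short segment from a cascaded chain can be illustrated with transfer (T) parameters, which multiply under cascading. The sketch below is not the procedure used in this work; it assumes that the chain consists of n identical segments, that probe and fan-out parasitics have already been removed, and that the chain is electrically short enough for the principal n-th root to be the correct branch. The function names and the example S-matrix are hypothetical.

```python
import numpy as np

def s_to_t(s):
    """Convert a 2x2 S-matrix to a T-matrix.

    Convention: [b1, a1]^T = T @ [a2, b2]^T, so cascaded two-ports multiply.
    """
    s11, s12, s21, s22 = s[0, 0], s[0, 1], s[1, 0], s[1, 1]
    det = s11 * s22 - s12 * s21
    return np.array([[-det / s21, s11 / s21],
                     [-s22 / s21, 1.0 / s21]], dtype=complex)

def t_to_s(t):
    """Inverse of s_to_t for the same convention."""
    t11, t12, t21, t22 = t[0, 0], t[0, 1], t[1, 0], t[1, 1]
    det = t11 * t22 - t12 * t21
    return np.array([[t12 / t22, det / t22],
                     [1.0 / t22, -t21 / t22]], dtype=complex)

def cascade(s_segment, n):
    """S-matrix of n identical segments connected in series."""
    return t_to_s(np.linalg.matrix_power(s_to_t(s_segment), n))

def extract_segment(s_chain, n):
    """Recover one segment as the principal n-th root of the chain T-matrix."""
    eigvals, eigvecs = np.linalg.eig(s_to_t(s_chain))
    t_segment = eigvecs @ np.diag(eigvals ** (1.0 / n)) @ np.linalg.inv(eigvecs)
    return t_to_s(t_segment)

# Hypothetical single-segment S-matrix at one frequency (illustration only).
s21 = 0.95 * np.exp(-0.10j)
segment = np.array([[0.05 + 0.02j, s21],
                    [s21, 0.05 + 0.02j]])

chain = cascade(segment, 5)            # what a 5-segment daisy chain would present
recovered = extract_segment(chain, 5)  # de-embedded single segment
print(np.allclose(recovered, segment))
```

In practice the extraction would be repeated at every measured frequency point, and the branch of the root would need to be tracked once the chain approaches half a wavelength.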

A. Insertion Loss Measurements

Two-port S-parameter measurements were performed for frequencies from 50 MHz to 30 GHz. De-embedding techniques were used to eliminate the parasitics introduced by the probes and fan-out wires, and the characteristics of a single link segment were extracted from the cascaded structure. The measured insertion loss (S21) of the 585 μm and 125 μm link segments after de-embedding is shown in Figs. 3 and 4, respectively. The insertion loss for 585 μm is < 2 dB for frequencies up to 30 GHz for various wire widths and pitches. The insertion loss for 125 μm is < 0.7 dB in the same frequency range. This loss includes losses in the Si-IF wires, the bonded interconnects (Cu pillars), and the pads on the dielets. The measurements show good agreement with the simulated values. The insertion loss is significantly lower than that of existing interposer technologies [4]. It is observed that the transfer characteristics of these short Si-IF links have only a single pole. This establishes the RC-like behavior of short links on Si-IF (≤ 500 μm) compared to the RLC-like behavior of long links on interposer (> 3 mm) and PCB (> 10 mm). Therefore, there are no resonances or signal reflections, which reduces inter-symbol interference.

B. Parasitics Extraction

The measured S-parameters were used to extract the parasitics in the Si-IF links using an RLGC transmission line model. The extracted parasitics are shown in Fig. 5. The extracted values include the parasitics of the interconnects and pads amortized across the length of the wires. The measured DC resistance of the links is 4.6 mΩ/μm, the capacitance is 0.20 fF/μm, and the inductance is 0.42 pH/μm. The measured parasitics are in reasonable agreement with the simulated models. We observe that the Si-IF trace parasitics are comparable to the on-chip global wiring parasitics in a 65–90 nm technology node [4]. The extracted resistance and capacitance of the interconnects are 75 mΩ/pillar and 4 fF/pillar, respectively.
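To see what the per-unit-length values above imply for a whole link, the following sketch evaluates a uniform RLGC line in the forward direction: lossless characteristic impedance, time of flight, and insertion loss between two matched ports. It is an illustration under stated assumptions, not the extraction flow used here: the shunt conductance G is taken as zero (it is not reported), the port reference impedance is assumed to be 50 Ω, and the DC resistance is used at all frequencies, so skin effect is ignored and the loss is likely underestimated.

```python
import numpy as np

# Per-micrometre values reported in the text.
R_PER_UM = 4.6e-3    # ohm/um (DC resistance)
L_PER_UM = 0.42e-12  # H/um
C_PER_UM = 0.20e-15  # F/um
G_PER_UM = 0.0       # S/um, assumed negligible (not reported)
Z_REF = 50.0         # ohm, assumed port reference impedance

def lossless_line_figures(length_um):
    """Return (characteristic impedance in ohm, time of flight in s), ignoring R and G."""
    z_c = np.sqrt(L_PER_UM / C_PER_UM)
    delay = length_um * np.sqrt(L_PER_UM * C_PER_UM)
    return z_c, delay

def insertion_loss_db(length_um, freq_hz):
    """Insertion loss, -20*log10|S21|, of a uniform RLGC line between Z_REF ports."""
    w = 2.0 * np.pi * np.asarray(freq_hz, dtype=float)
    z_series = R_PER_UM + 1j * w * L_PER_UM  # series impedance per um
    y_shunt = G_PER_UM + 1j * w * C_PER_UM   # shunt admittance per um
    gamma_l = np.sqrt(z_series * y_shunt) * length_um
    z_c = np.sqrt(z_series / y_shunt)
    # ABCD parameters of the line, then conversion to S21.
    a = np.cosh(gamma_l)
    b = z_c * np.sinh(gamma_l)
    c = np.sinh(gamma_l) / z_c
    d = a
    s21 = 2.0 / (a + b / Z_REF + c * Z_REF + d)
    return -20.0 * np.log10(np.abs(s21))

freqs = np.linspace(50e6, 30e9, 600)  # the measured frequency range
z_c, delay_500 = lossless_line_figures(500.0)
loss_585 = insertion_loss_db(585.0, freqs)
loss_125 = insertion_loss_db(125.0, freqs)
print(f"Zc = {z_c:.1f} ohm, 500 um time of flight = {delay_500 * 1e12:.2f} ps")
print(f"max modelled loss: 585 um = {loss_585.max():.2f} dB, 125 um = {loss_125.max():.2f} dB")
```

With these inputs the lossless characteristic impedance comes out near 46 Ω and the time of flight over 500 μm is roughly 4.6 ps, a small part of the ≤ 35 ps link latency, which also covers the driver and receiver. The modelled loss stays below the measured 2 dB bound across the band.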

C. Parasitics Comparison

A comparison of the total parasitics in Si-IF, interposer, and PCB-based assemblies is shown in Table I. The values presented include the total parasitics of the traces, interconnects, and packages, which together form the total parasitic load on the driver. The package parasitics are applicable only to PCB substrates. The major difference between interposer and Si-IF links is the length of the traces and the interconnect pitch. In addition, the capacitance due to Electro-Static Discharge (ESD) protection, which can add more than 0.1 pF of parasitic capacitance, is not included for Si-IF assemblies for the reasons given in [2]. From the table, we notice that the Si-IF assemblies have 40–200X lower parasitic inductance and 15–80X lower capacitance than PCB assemblies. Furthermore, compared to interposer assemblies, the Si-IF assemblies have 10–50X lower parasitic inductance and 7–35X lower parasitic capacitance. The two major reasons for the significant reduction of the parasitics in Si-IF assemblies are (1) package-less integration and (2) short link lengths (≤ 500 μm). Therefore, the low link parasitics and RC-like behavior of the short Si-IF links reduce the complexity of the transceivers, and simple inverters can be used as drivers for data transfer between dielets. This highlights the efficiency of using Si-IF for heterogeneous system integration, which comes close to that of a monolithic SoC.

A. Transceiver Circuit

We performed circuit simulations with tapered buffer drivers designed in TSMC 16nm technology, shown in Fig. 6(a). The measured S-parameters of the Si-IF links were imported into the circuit simulator. The wire length was assumed to be 500 μm (worst-case scenario). A 10 Gbps pseudo random bit stream (PRBS) was presented as an input to the driver and the receiver output was analyzed. The rise and fall times were assumed to be 5% (10 ps) of the Unit Interval (UI). The input waveform and the waveforms at the input and output of the receiver are shown in Fig. 6(b). The transmitted waveform across the link at the input of the receiver shows a full VDD swing (0.9 V) and < 20 ps rise/fall time. The overall latency from the input of the driver to the output of the receiver was 27.5 ps. Further, the average energy/bit for the PRBS data was 0.03 pJ/b.
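A back-of-the-envelope estimate helps put the simulated energy/bit in context. The sketch below is not the circuit simulation described above; it is a first-order model that charges only the link capacitance (0.20 fF/μm, which already includes the amortized pillar and pad parasitics) to a full 0.9 V swing, and assumes random NRZ data in which a 0 → 1 transition occurs on one bit in four. Driver, receiver, and short-circuit energy are ignored, and all function names are illustrative.

```python
# Values taken from the text.
C_LINK_PER_UM = 0.20e-15  # F/um, includes amortized pillar and pad parasitics
LINK_LENGTH_UM = 500.0    # um, worst-case link length
VDD = 0.9                 # V, full-swing signalling
DATA_RATE_BPS = 10e9      # bit/s per link

def unit_interval_s(data_rate_bps):
    """Duration of one bit in seconds."""
    return 1.0 / data_rate_bps

def link_capacitance_f(length_um):
    """Total link capacitance in farads."""
    return C_LINK_PER_UM * length_um

def energy_per_bit_j(c_load_f, vdd, rising_probability=0.25):
    """Average supply energy per bit for random NRZ data.

    Each 0 -> 1 transition draws C*V^2 from the supply; for random data
    such a transition occurs with probability 1/4 per bit (assumption).
    """
    return rising_probability * c_load_f * vdd ** 2

ui = unit_interval_s(DATA_RATE_BPS)
c_link = link_capacitance_f(LINK_LENGTH_UM)
e_bit = energy_per_bit_j(c_link, VDD)
print(f"UI = {ui * 1e12:.0f} ps, C_link = {c_link * 1e15:.0f} fF, "
      f"wire-only energy = {e_bit * 1e12:.4f} pJ/b")
```

Under these assumptions the wire alone accounts for about 0.02 pJ/b, which is consistent with the simulated 0.03 pJ/b once the driver and receiver are included.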

B. SuperCHIPS streaming protocol

To effectively utilize the fine-pitch Si-IF technology, we propose the short-range (500 μm) SuperCHIPS protocol [2], which uses the large number of parallel short links and simple inverter drivers to achieve low energy/bit (≤ 0.04 pJ/b), low latency (≤ 35 ps), and high bandwidth/mm of die edge (≥ 8 Tbps/mm) for communication between dielets. The data can be transferred either asynchronously or using synchronous sourced clocking. The schematic of a typical I/O cell is shown in Fig. 7(a).

In asynchronous mode, the data-rate per link can be as high as 10 Gbps and the maximum data-bandwidth is 2 Tbps/mm for a single wiring layer (4 μm wiring pitch). The Si-IF technology allows for 4 layers of wiring, thereby increasing the bandwidth to 8 Tbps/mm. Furthermore, a reduction of wiring pitch to 2 μm increases the bandwidth by 2.5X. In the synchronous mode of transfer, the data-rate per link is up to 4 Gbps because of the difficulty in clock generation and synchronization. Therefore, the data bandwidth is 2.5X lower. The energy/bit is ≤ 0.04 pJ/b in asynchronous data transfer and ≤ 0.2 pJ/b in synchronous data transfer. Overall, the SuperCHIPS protocol corresponds to a 120–300X improvement in data-bandwidth/mm and a 100–500X reduction in energy/bit when compared to a PCB-based integration scheme. Further, compared to interposer assemblies, the improvement in data-bandwidth/mm is 20–55X and the reduction in energy/bit is 20–100X. The comparison of bandwidth/mm and energy/bit for different technologies is shown in Fig. 7(b).
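The bandwidth figures above follow from simple edge-density arithmetic, sketched below. The raw value assumes every wiring track on a layer carries data, which is an upper bound rather than a claim about the actual layout; the text does not say how the tracks are allocated, so the utilization printed here is only the ratio of the stated 2 Tbps/mm to that bound. The function names are illustrative.

```python
# Values taken from the text.
WIRING_PITCH_UM = 4.0        # um
ASYNC_RATE_GBPS = 10.0       # Gbps per link, asynchronous mode
SYNC_RATE_GBPS = 4.0         # Gbps per link, synchronous mode
STATED_PER_LAYER_TBPS = 2.0  # Tbps/mm for a single wiring layer
WIRING_LAYERS = 4

def tracks_per_mm(pitch_um):
    """Number of wiring tracks that fit along 1 mm of die edge."""
    return 1000.0 / pitch_um

def raw_bandwidth_tbps_per_mm(pitch_um, rate_gbps):
    """Upper bound: every track on one layer carries data at rate_gbps."""
    return tracks_per_mm(pitch_um) * rate_gbps / 1000.0

raw_per_layer = raw_bandwidth_tbps_per_mm(WIRING_PITCH_UM, ASYNC_RATE_GBPS)
utilization = STATED_PER_LAYER_TBPS / raw_per_layer
async_total = STATED_PER_LAYER_TBPS * WIRING_LAYERS
sync_total = async_total * SYNC_RATE_GBPS / ASYNC_RATE_GBPS

print(f"raw bound per layer: {raw_per_layer:.2f} Tbps/mm")
print(f"stated / raw: {utilization:.0%}")
print(f"4 layers, asynchronous: {async_total:.1f} Tbps/mm")
print(f"4 layers, synchronous: {sync_total:.1f} Tbps/mm")
```

The 2.5X gap between the two modes is the ratio of the per-link data-rates (10 Gbps versus 4 Gbps).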

We successfully demonstrated a package-less, fine-pitch, scalable heterogeneous integration platform, the Silicon-Interconnect Fabric. We show that the measured insertion loss of the short Si-IF links (≤ 500 μm) is ≤ 2 dB for frequencies up to 30 GHz. We show that the links in Si-IF assemblies have 10–40X lower parasitic inductance and 7–35X lower parasitic capacitance compared to interposer and PCB-based assemblies. Using circuit simulations of the SuperCHIPS protocol, we demonstrate that the energy/bit is ≤ 0.04 pJ/b and the latency is ≤ 35 ps. The aggregate data-bandwidth/mm is ≥ 8 Tbps/mm.

This work was supported in part by DARPA, ONR, UC-MRPI, the UCLA CHIPS Consortium and the Semiconductor Research Corporation (SRC). We thank Prof. Sudhakar Pamarti, Prof. Dejan Markovic, Dr. Boris Vaisband, Randall Irvin, Saptadeep Pal, Uneeb Rathore, and Sumeet Nagi for valuable discussions.

[1] A. A. Bajwa, et al., “Heterogeneous Integration at Fine Pitch (≤10 μm) using Thermal Compression Bonding,” ECTC, 2017.
[2] S. Jangam, et al., “Latency, Bandwidth and Power Benefits of the SuperCHIPS Integration Scheme,” 2017 IEEE 67th ECTC, 2017.
[3] M. O'Connor, “Highlights of the High-Bandwidth Memory (HBM) Standard,” Memory Forum Workshop NVIDIA, June 2014.
[4] K. Cho, et al., “Signal and power integrity design of 2.5D HBM (High bandwidth memory module) on SI interposer,” Pan Pacific Microelectronics Symposium (Pan Pacific), Jan. 2016, pp. 1–5.
[5] M. A. Karim, et al., “Power comparison of 2D, 3D and 2.5D interconnect solutions and power optimization of interposer interconnects,” ECTC, 2013.
[6] R. Navid, et al., “A 40 Gb/s Serial Link Transceiver in 28 nm CMOS Technology,” in JSSC, 2015.
[7] Graphics Double Data Rate 6 (GDDR6) SGRAM Standard, JEDEC, JESD250.
[8] J. Frei, et al., “Multiport S-Parameter and T-Parameter Conversion With Symmetry Extension,” in IEEE Transactions on Microwave Theory and Techniques, vol. 56, no. 11, pp. 2493–2504, Nov. 2008.