In this study, chiplet design and heterogeneous integration packaging will be investigated, especially (a) chip partition and heterogeneous integration driven by cost and technology optimization, Fig. 1(a); (b) chip split and heterogeneous integration driven by cost and yield, Fig. 1(b); (c) multiple system and heterogeneous integration with thin-film layers directly on top of a build-up package substrate, Fig. 1(c); (d) multiple system and heterogeneous integration with an organic interposer on top of a build-up package substrate, Fig. 1(d); and (e) multiple system and heterogeneous integration with a through-silicon via (TSV) interposer on top of a build-up package substrate, Fig. 1(e). The integrations in Figs. 1(c)–1(e) are driven by form factor and performance. Emphasis is placed on their advantages and disadvantages, design, materials, process, and examples. Some recommendations will also be provided.
In 2011, Xilinx asked Taiwan Semiconductor Manufacturing Company (TSMC) to fabricate its field-programmable gate array (FPGA) system-on-chip (SoC) with the 28 nm process technology. Because of the large chip size, the yield was very poor. Xilinx then redesigned and split the large FPGA into four smaller chiplets, as shown in Fig. 2, and TSMC manufactured the chiplets at high yield (with the 28 nm process technology) and packaged them with its chip-on-wafer-on-substrate (CoWoS) technology. CoWoS is a 2.5D IC integration in which the TSV-interposer is the key structure (substrate) that lets the four chiplets communicate vertically and, mainly, laterally. The minimum pitch of the four redistribution layers (RDLs) on the TSV-interposer is 0.4 μm, Fig. 3. On Oct. 20, 2013, Xilinx and TSMC jointly announced the production release of the Virtex-7 HT family with the 28 nm process technology, which the pair claimed was the industry's first chiplet design and heterogeneous integration package in production. Since then, a few high-volume manufacturing (HVM) products have used chiplet design and heterogeneous integration packaging technology; they will be discussed in this study. SoC will be briefly mentioned first.
System-on-chip integrates ICs with different functions, such as the CPU (central processing unit), GPU (graphics processing unit), memory, etc., into a single chip for the system or subsystem. The most famous SoCs are Apple's application processors (APs), which are shown schematically in Fig. 4 for A10 through A15. The number of transistors versus year for various feature sizes (process technologies) is shown in Fig. 5. The power of Moore's law is evident: the number of transistors and functionalities increases as the feature size is reduced. Unfortunately, the end of Moore's law is fast approaching, and it is more and more difficult and costly to reduce the feature size (to do the scaling) to make the SoC.
3 Chiplets Design and Heterogeneous Integration Packaging
According to International Business Strategies, Fig. 6 shows the advanced design cost versus feature size through 5 nm. It can be seen that it will take more than $500 × 10⁶ just to design at the 5 nm feature size. Developing the 5 nm process technology to high manufacturing yield will take another $1 × 10⁹.
Figure 7 shows the plots of yield (percent of good dies) per wafer versus chip size for a monolithic design and for two-, three-, and four-chiplet designs. In general, the larger the chip size, the lower the semiconductor manufacturing yield. For example, a 360 mm² monolithic die has a yield of 15%, while a four-chiplet design (each chiplet 99 mm²) more than doubles the yield to 37%. The four-chiplet design incurs a ∼10% area penalty (36 mm² extra, for a combined silicon area of 396 mm²), but the significant improvement in yield translates directly to lower cost.
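The cost argument above can be made concrete with a short calculation. The sketch below is illustrative only: it assumes that the quoted 37% is the yield of each individual 99 mm² chiplet, that chiplets are tested before assembly (known-good dies), and that packaging and assembly costs are ignored. The function name `silicon_per_good_unit` is hypothetical and is not taken from the cited source.

```python
def silicon_per_good_unit(die_area_mm2: float, yield_fraction: float,
                          dies_per_unit: int = 1) -> float:
    """Average wafer area (mm^2) consumed per good product.

    Assumption: every die is tested before assembly, so a failed die
    wastes only its own area and never a whole assembled product.
    """
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    if dies_per_unit < 1:
        raise ValueError("dies_per_unit must be at least 1")
    return dies_per_unit * die_area_mm2 / yield_fraction


# Numbers quoted in the text for Fig. 7.
monolithic = silicon_per_good_unit(360.0, 0.15)        # one 360 mm^2 die
four_chiplet = silicon_per_good_unit(99.0, 0.37, 4)    # four 99 mm^2 chiplets
area_penalty = (4 * 99.0 - 360.0) / 360.0              # extra silicon area

print(f"monolithic:   {monolithic:.0f} mm^2 of wafer per good product")
print(f"four-chiplet: {four_chiplet:.0f} mm^2 of wafer per good product")
print(f"area penalty: {area_penalty:.0%}")
print(f"silicon used relative to monolithic: {four_chiplet / monolithic:.2f}")
```

Under these assumptions the four-chiplet design consumes less than half the wafer area per good product, even after the area penalty is included. A fuller cost model would add the assembly yield and the packaging cost.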
3.1 Chip Partition.
In chip partition and heterogeneous integration (driven by cost and technology optimization), Fig. 1(a), the SoC, containing, e.g., the logic and I/Os, is partitioned by function into chiplets: logic and I/O. These chiplets can be stacked (integrated) by the front-end CoW (chip-on-wafer) or WoW (wafer-on-wafer) methods and then assembled (integrated) on the same substrate of a single package by using heterogeneous integration techniques [1–38]. It should be emphasized that the front-end integration of chiplets can yield a smaller package area and better electrical performance but is optional. This integration is the focus of Secs. 4–8.
3.2 Chip Split.
In chip split and heterogeneous integration (driven by cost and yield), Fig. 1(b), the SoC, such as logic, is split into smaller chiplets, such as logic1, logic2, and logic3. These chiplets can be stacked (integrated) by the front-end CoW or WoW methods and then assembled on the same substrate of a single package by using heterogeneous integration techniques [1–38]. Again, the front-end integration of chiplets is optional. This integration is the focus of Secs. 4–8.
3.3 Multiple System and Heterogeneous Integration.
Besides chip partition and chip split, there is another group of chiplet design and heterogeneous integration packaging, which is called multiple system and heterogeneous integration, as shown in Figs. 1(c)–1(e). This group of integrations is driven by form factor and performance and will be discussed in Sec. 9.
4 Advanced Micro Devices Chiplets Design and Heterogeneous Integration Packaging
4.1 UCSB/Advanced Micro Devices Future Chiplets Design and Heterogeneous Integration Packaging.
In 2017, the University of California at Santa Barbara (UCSB) and Advanced Micro Devices (AMD) published a paper on “Cost-Effective Design of Scalable High-Performance Systems Using Active and Passive Interposers.” It shows AMD's future chiplet design and heterogeneous integration packaging, which will be three-dimensional (3D) IC integration, as shown in Fig. 8; i.e., the chiplets are stacked on top of another chiplet, such as logic, which serves as the so-called active TSV-interposer.
4.2 Advanced Micro Devices Extreme-Performance Yield Computing.
In mid-2019, AMD introduced the second-generation EPYC (extreme-performance yield computing) 7002 series, code-named Rome, which doubled the number of cores to 64. The second-generation EPYC is a new breed of server processors that sets a higher standard for data centers. The Rome server product uses a 9–2–9 package (Fig. 9) for signal connectivity, with four layers above the package core for signal routing. One of the signaling layers (the others are similar) is shown in Fig. 10, along with the physical positions of the CCDs (CPU compute dies), the IOD (I/O die), and the main external DRAM (dynamic random-access memory) and SerDes interfaces.
For high-performance server and desktop processors, the I/O requirements are very heavy. Analog devices and bump pitches for I/Os benefit very little from leading-edge technology, which is very costly. One solution is to partition the SoC into chiplets, reserving the expensive leading-edge silicon for the CPU cores while leaving the I/Os and memory interfaces in n − 1 generation silicon. Because AMD committed to keeping the EPYC package size and pin-out unchanged, close silicon/package codesign was needed as the number of dies increased from four in the first-generation EPYC to nine in the second-generation EPYC.
The second-generation EPYC chiplet performance versus cost is shown in Fig. 11. AMD revealed that, on TSMC's 7 nm process technology, the cost to manufacture a 16-core monolithic die is more than double that of a multichiplet CPU. It can be seen from Fig. 11 that: (a) the lower the core count, the lower the saving; (b) chiplets allow higher core counts and performance than are possible with a monolithic design; (c) costs are lower at all core count/performance points in the product line; (d) cost scales down with performance by depopulating chiplets; and (e) using the 14 nm process technology for the IOD reduces the fixed cost. AMD also optimized the cost structure and improved die yields by using much smaller chiplets. AMD used TSMC's expensive 7 nm process technology for the core cache dies and moved the DRAM and PCIe logic to a 14 nm I/O die fabricated by GlobalFoundries.
4.3 Advanced Micro Devices Three-Dimensional V-Cache.
During IEEE/ISSCC 2022 and IEEE/ECTC 2022, AMD introduced its 3D V-Cache chiplet design and heterogeneous integration packaging, which is shown schematically in Fig. 12. The key components of this structure are a bottom compute die, a top static random-access memory (SRAM) die, and structural dies that balance the structure and provide a thermal path for heat dissipation from the bottom compute die to the heat sink (Fig. 13). The bottom die (81 mm²) is the “Zen 3” CPU, which is fabricated with TSMC's 7 nm process technology. The top die (41 mm²) is the extended L3 die, which is also fabricated with TSMC's 7 nm process technology. The bottom die, with TSVs, is face-down with C4 (controlled collapse chip connection) bumps. The top die is also face-down and is bonded face-to-back to the bottom die by Cu–Cu hybrid bonding, as shown in Fig. 14. Figure 15 shows the bonding process and the bonded interface, which is fabricated with TSMC's SoIC (system on integrated chips) technology. The minimum Cu–Cu hybrid bonding pitch is 9 μm.
5 Intel's Chiplets Design and Heterogeneous Integration Packaging
5.1 Intel's Foveros Technology.
Figure 16 shows Intel's Foveros technology (announced in Dec. 2018). It can be seen that the TSV interposer contains CMOS (complementary metal-oxide-semiconductor) devices (an active interposer), just like a chip, and is face-to-face thermocompression bonded with the chiplets or SoC.
Figures 17(a) and 17(b) show another Intel technology, announced during SEMICON West in July 2019, called omni-directional interconnect (ODI). For ODI Type 1, Fig. 17(a), the active TSV interposers (chips) are underneath the big chip, such as an SoC, and are face-to-face thermocompression bonded with the SoC. For ODI Type 2, Fig. 17(b), the active TSV interposer (bridge, i.e., chip) is underneath and connects the chiplets/SoCs. ODI Type 3 is a special case of Type 2, in which the active interposer (or the base logic chip) connects the SoCs/chiplets. Intel also announced the management data input/output (MDIO) for the die-to-die interface, which is used to replace the current advanced interface bus (AIB). All these heterogeneous integrations on TSV interposers are for extremely high performance, and the active TSV interposers are for even higher performance [18–20].
5.2 Intel's Lakefield.
In July 2020, Intel shipped its mobile (notebook) processor “Lakefield,” which is based on its Foveros technology (Type 3 of ODI). The SoC is partitioned (e.g., CPU, GPU, LPDDR4, etc.) and split (e.g., the CPU is split into one big CPU and four smaller CPUs) into chiplets, as shown in Figs. 18 and 19. These chiplets are then face-to-face bonded (stacked) on an active TSV-interposer (a large 22FFL base chip) with a CoW (chip-on-wafer) process. The interconnect between the chiplets and the logic base chip is the microbump (Cu pillar + SnAg solder cap), as shown in Figs. 18 and 19. The interconnect between the base chip and the package substrate is the C4 bump, and that between the package substrate and the PCB is the solder ball. The final package format is a PoP (package-on-package) (12 mm × 12 mm × 1 mm), as shown in Fig. 18. The chiplet design and heterogeneous integration packaging is in the bottom package, and the upper package houses the memories with wire bonding technology.
The chiplets are fabricated with Intel's 10 nm process technology and the base chip with its 22 nm process technology. Since the chiplets are smaller and not all the chips use the 10 nm process technology, the overall yield should be higher, which translates to lower cost.
It should be noted that this is the very first HVM (high volume manufacturing) of 3D chiplets integration. Also, this is the very first HVM of processors for mobile products such as the notebook by 3D IC integration.
5.3 Intel's Foveros-Direct.
During Intel Architecture Day (Aug. 13, 2020), Intel announced Cu–Cu hybrid bonding for its Foveros technology. At the IEEE Hot Chips Conference (Aug. 2021), Intel called it Foveros-Direct and demonstrated that, with bumpless hybrid bonding, the pitch can go down to 10 μm instead of the 50 μm used in Lakefield, as shown in Fig. 20.
5.4 Intel's Ponte Vecchio.
Another of Intel's chiplet design and heterogeneous integration packaging technologies is the Ponte Vecchio GPU, or the “spaceship of a GPU” [11,17], which should be the largest design, with the most chiplets, to date, Figs. 21–25. The Ponte Vecchio GPU makes use of several key technologies to power 47 functional chiplets and 16 thermal dies based on different process nodes and architectures. While the GPU primarily makes use of Intel's 7 nm extreme ultraviolet lithography (EUV) process node for the eight RAMBO (random access bandwidth-optimized SRAM) tiles, Intel will also be producing some Xe-HPC compute dies through external fabs (such as TSMC, with its 5 nm node, for the 16 compute tiles). To be precise (Table 1), there are 63 tiles: 47 functional chiplets, consisting of 16 Xe-HPCs (internal/external), eight Rambos (internal), two Xe-Bases (internal), 11 EMIBs (internal), two Xe-Links (external), and eight HBMs (external), plus 16 thermal dies. The maximum top-die (chiplet) size is 41 mm²; the base die size is 650 mm²; the die-to-die pitch is 36 μm; the package layers are 11–2–11; there are 4468 package pins; and the package size is 77.5 mm × 62.5 mm (Table 1). The power envelope is 600 W. A close-up of the EMIB is shown in Figs. 23 and 24.
|Parameter|Value|
|---|---|
|Integration|Foveros + EMIB|
|Power envelope|600 W|
|Transistor count|> 100 B|
|Total tiles|63 (47 functional + 16 thermal tiles)|
|Package form factor|77.5 mm × 62.5 mm (4844 mm²)|
|IO|4 × 16 90G SerDes, 1 × 16 PCIe Gen5|
|Total silicon|3100 mm² Si|
|Silicon footprint|2330 mm² Si footprint|
|Package layers|11–2–11 (24-layer)|
|2.5D count|11 2.5D connections|
|Resistance|0.15 mΩ Rpath/tile|
|Package pins|4468 pins|
|Package cavity|186 mm² × four cavities|
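As a quick consistency check on the Ponte Vecchio figures quoted above, the tile counts and the package area can be tallied directly. This is a minimal sketch that uses only numbers stated in the text and in Table 1; the variable names are illustrative.

```python
# Functional tile counts as listed in the text for Ponte Vecchio.
functional_tiles = {
    "Xe-HPC compute": 16,
    "Rambo": 8,
    "Xe-Base": 2,
    "EMIB": 11,
    "Xe-Link": 2,
    "HBM": 8,
}
thermal_tiles = 16

functional_total = sum(functional_tiles.values())
total_tiles = functional_total + thermal_tiles

# Package outline from Table 1 (77.5 mm x 62.5 mm).
package_area_mm2 = 77.5 * 62.5

print(f"functional tiles: {functional_total}")        # 47
print(f"total tiles:      {total_tiles}")             # 63
print(f"package area:     {package_area_mm2:.2f} mm^2")
```

The tally reproduces the 47 functional tiles and 63 total tiles in Table 1, and the package outline rounds to the tabulated 4844 mm².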
The thermal management of a structure with a 600 W power envelope is a challenge. Intel's strategies are (Fig. 25): (a) using thick interconnect layers in the base and compute tiles to act as lateral heat spreaders, (b) using a high microbump density over potential hotspots to compensate for reduced thermal spreading in a thin-die stack, and (c) using a high array density of power TSVs to reduce the C4 bump temperature. In addition, the compute tile thickness is increased to 160 μm to improve the thermal mass for turbo performance. Furthermore, 16 additional thermal shield dies are stacked over the exposed base die area to conduct heat. Backside metallization with solder thermal interface material (TIM) is applied on all the top dies. The TIM eliminates air gaps caused by different die stack heights to reduce thermal resistance.
5.5 Intel's Roadmap.
Intel's roadmap of chiplet design and heterogeneous integration packaging in terms of interconnect density versus power efficiency is shown in Fig. 26. It can be seen that Cu–Cu hybrid bonding with < 10 μm pad pitch, > 10,000/mm² pad density, and < 0.05 pJ/bit power is Intel's near-term goal.
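The link between pad pitch and pad density quoted in the roadmap follows from simple geometry. The sketch below assumes a fully populated square-grid array, which is an idealization, and the function name `pads_per_mm2` is hypothetical. The pitches are the ones mentioned in this study.

```python
def pads_per_mm2(pitch_um: float) -> float:
    """Pads per mm^2 for a fully populated square grid at the given pitch (um)."""
    if pitch_um <= 0:
        raise ValueError("pitch_um must be positive")
    pads_per_mm = 1000.0 / pitch_um   # pads along 1 mm in one direction
    return pads_per_mm ** 2


examples = [
    ("Lakefield microbump", 50.0),
    ("Ponte Vecchio die-to-die", 36.0),
    ("Foveros-Direct hybrid bond", 10.0),
    ("AMD 3D V-Cache hybrid bond", 9.0),
]
for label, pitch in examples:
    print(f"{label:28s} {pitch:5.1f} um -> {pads_per_mm2(pitch):8.0f} pads/mm^2")
```

Under this assumption a 50 μm pitch gives 400 pads/mm² and a 10 μm pitch gives 10,000 pads/mm², so a pad density above 10,000/mm² requires a pitch below 10 μm, consistent with the roadmap targets.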
6 TSMC's Chiplets Design and Heterogeneous Integration Packaging
6.1 TSMC's System on Integrated Chips.
TSMC has been working on chiplet design and heterogeneous integration packaging for a few years [22–28]. At the TSMC Annual Technology Symposium (Aug. 25, 2020), TSMC announced its 3DFabric (3D fabrication) technology for mobile, high-performance computing, automotive, and IoT (internet of things) applications. The core technology of 3DFabric is SoIC (system on integrated chips), which was announced during the TSMC Annual Technology Symposium (May 1, 2018) in Santa Clara, CA. 3DFabric provides chiplet heterogeneous integrations that are fully integrated from front end to back end. The application-specific platform leverages TSMC's advanced wafer technology, open innovation platform design ecosystem, and 3DFabric for fast improvements and time-to-market.
The front-end 3D hybrid bonding (stacking) technology, SoIC, with CoW and WoW provides flexible chip-level chiplet design and integration (Fig. 27). Compared with the conventional microbump flip chip technology, hybrid bonding SoIC has many advantages, e.g., better electrical performance, Fig. 28(a), higher density, Fig. 28(b), better thermal performance, and less energy spent per bit of data, as shown in Fig. 29.
6.2 TSMC's CoWoS With System on Integrated Chips.
In 3D back-end package integration, CoWoS's increased envelope and enriched technology content offer exceptionally high computing performance and high memory bandwidth to meet HPC needs in clouds, data centers, and high-end servers, as shown in Fig. 30(a). Figure 15 shows one of AMD's products fabricated with TSMC's SoIC technology.
6.3 TSMC's Integrated Fan-Out Package-on-Package With System on Integrated Chips.
In another 3D backend package integration, InFO (integrated fan-out) derivative technology offers memory-to-logic, logic-to-logic, PoP (package-on-package), etc. applications as shown in Fig. 30(b).
6.4 TSMC's Roadmap.
TSMC's chiplet design and heterogeneous integration packaging roadmap is shown in Fig. 31. It can be seen that CoWoS and InFO PoP are already in HVM, and CoWoS with SoIC and InFO PoP with SoIC are ramping up to HVM.
7 Advantages and Disadvantages of Chiplets Design and Heterogeneous Integration Packaging
The key advantages of chiplet heterogeneous integrations (chip partition and chip splitting) compared with SoCs are yield improvement (lower cost) during manufacturing, shorter time-to-market, and cost reduction during design. Figure 7 shows the plots of yield (percent of good dies) per wafer versus chip size for a monolithic design and for 2-, 3-, and 4-chiplet designs. It can be seen that the smaller the chip size, the higher the semiconductor manufacturing yield. The significant improvement in yield translates directly to lower costs. Also, chip partitioning shortens the time-to-market. Furthermore, chiplets with CPU cores can reduce silicon design and manufacturing costs. Finally, there is also a thermal benefit to using chiplets, as the chips are spread out across the package.
The disadvantages of chiplet heterogeneous integration are: (1) additional package area due to chip partition and chip splitting, (2) the chiplets interfaces (bridges) increase packaging costs, (3) more complexity and packaging design effort, and (4) past methodologies are less suitable for chiplets. Thus, the challenges (opportunities) for packaging technologists are to reduce the size of packages and provide high-density, high-performance, and low-cost chiplets interfaces – bridges.
8 Lateral Communication Between Chiplets (Bridge)
In the past, lateral communications in chiplet design and heterogeneous integration packaging were provided by a TSV-interposer or a build-up organic package substrate with fine metal line width, spacing, and thickness (L/S/H). For example, Figs. 2 and 3 show the Virtex-7 HT family with a TSV-interposer shipped by Xilinx in 2013. The TSV-interposer is known to have a very high cost. On the other hand, Fig. 9 shows AMD's second-generation EPYC server processors [8,9], the 7002 series, shipped in mid-2019. The EPYC is a two-dimensional (2D) IC integration technology, i.e., all the chiplets are side-by-side on a 9–2–9 build-up package substrate. The 20-layer fine metal L/S/H organic substrate is not cheap.
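The layer counts quoted in this study follow directly from the stack-up notation: a 9–2–9 substrate has 20 layers and an 11–2–11 substrate has 24. The helper below is a minimal sketch that assumes the notation lists build-up layers above the core, core layers, and build-up layers below the core; the name `total_layers` is hypothetical.

```python
def total_layers(stackup: str) -> int:
    """Return the total metal layer count for a stack-up such as '9-2-9'.

    Assumed reading of the notation: build-up layers above the core,
    core layers, build-up layers below the core.
    """
    normalized = stackup.replace("\u2013", "-")  # accept en dashes as separators
    parts = normalized.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected three fields, got {stackup!r}")
    top, core, bottom = (int(part) for part in parts)
    return top + core + bottom


print(total_layers("9-2-9"))     # 20, the EPYC build-up substrate
print(total_layers("11-2-11"))   # 24, the Ponte Vecchio package
```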
It should be noted that lateral communications (RDLs) between chiplets require fine-metal L/S/H only in a very small, local area of the chiplets. There is no reason to use the whole TSV-interposer or the whole organic build-up package substrate to support the lateral communication between chiplets. Therefore, the concept of using a small-area bridge with fine-metal L/S/H RDLs to connect the chiplets for lateral communication (to reduce cost and enhance performance) has been proposed for chiplet design and heterogeneous integration packaging and is a very hot topic today. There are at least two different groups of bridges, namely, rigid bridges and flexible bridges. Only rigid bridges will be discussed in this study.
A rigid bridge consists of the RDLs and the substrate. Most rigid bridges are made with a silicon substrate, and the RDLs are fabricated on a silicon wafer. Some rigid bridges even have TSVs. Today, most of the products and publications with bridges use rigid bridges. There are at least three groups of rigid bridges, namely, (a) rigid bridges embedded in or on a build-up package substrate, Secs. 8.1 and 8.2, (b) rigid bridges embedded in fan-out EMC (epoxy molding compound) with RDLs, Sec. 8.3, and (c) hybrid bonding bridges, Sec. 8.4.
8.1 Intel's Embedded Multidie Interconnect Bridge.
The most famous rigid bridge is Intel's EMIB (embedded multidie interconnect bridge) [39–42]. Figure 32 shows one of Intel's EMIB patents. It can be seen that the EMIB die is embedded in the cavity of a build-up package substrate, which supports the chiplets.
For EMIB, there are at least three important tasks, Figs. 32 and 33, namely: (a) wafer bumping of two different kinds of bumps on the chiplet wafer (there are no bumps on the bridge); (b) embedding the bridge in the cavity of a build-up substrate and then laminating the top surface of the substrate; and (c) bonding the chiplets on the substrate with the embedded bridge.
8.1.1 Solder Bumps for Embedded Multidie Interconnect Bridge.
It can be seen from Fig. 32 that there are two kinds of bumps on the chiplet, namely, the C4 (controlled collapse chip connection) bumps and the C2 (chip connection or copper-pillar with solder-cap micro) bumps. Thus, wafer bumping of the chiplets wafer poses a challenge, but Intel has already taken care of this issue.
8.1.2 Fabrication of Embedded Multidie Interconnect Bridge Substrate.
There are two major tasks in fabricating the organic package substrate with EMIB (Fig. 33). One is to make the EMIB, and the other is to make the substrate with EMIB. To make the EMIB, one must first build the RDLs (including the contact pads) on a Si-wafer. The way to make the RDLs depends on the L/S/H of the conductive wiring of the RDLs. Finally, attach the non-RDL side of the Si-wafer to a die-attach film, and then singulate the Si-wafer.
To make the substrate with an EMIB, first place the singulated EMIB with the die-attach film on top of the Cu foil in the cavity of the substrate, Fig. 33(a). Next, laminate a dielectric film on the whole organic package substrate, drill holes (vias) in the dielectric film, and fill them by Cu plating to make connections to the contact pads of the EMIB. Continue Cu plating to make the lateral connections of the substrate, as shown in Fig. 33(b). Then, laminate another dielectric film on the whole substrate, drill the dielectric, and Cu plate to fill the holes and make the contact pads, Fig. 33(c). (Smaller pads on a finer pitch are for C2 bumps, while larger pads on a coarser pitch are for C4 bumps.) The organic package substrate with an EMIB is then ready for bonding of the chips, as shown in Fig. 33(d).
Today, the minimum metal L/S/H is 2 μm/2 μm/2 μm, and the bridge size ranges from 2 mm × 2 mm to 8 mm × 8 mm, but most bridges are 5 mm × 5 mm or smaller. The dielectric layer thickness is 2 μm. Usually, there are ≤ 4 RDLs. One of the challenges of the EMIB technology is to fabricate the organic build-up package substrate with cavities for the silicon bridges and then laminate (with pressure and temperature) another build-up layer on top (to meet the substrate surface flatness requirement) for bonding of the chiplets (with both C2 and C4 bumps). Intel and its suppliers are working toward high-yield manufacturing of the substrate.
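To give a feel for what a 2 μm/2 μm line/space means for lateral communication, the number of parallel wires that can cross a unit length of die edge can be estimated. This is a rough upper bound, not a figure from the cited work: it assumes straight parallel routing and that every RDL carries signals, whereas real bridges also reserve metal for power, ground, and pads. The function name `wires_per_mm` is hypothetical.

```python
def wires_per_mm(line_um: float, space_um: float, routing_layers: int = 1) -> float:
    """Upper bound on parallel wires crossing 1 mm of die edge.

    Assumes straight wires at a pitch of (line + space) on each layer,
    with every layer used purely for signal routing.
    """
    if line_um <= 0 or space_um <= 0:
        raise ValueError("line_um and space_um must be positive")
    if routing_layers < 1:
        raise ValueError("routing_layers must be at least 1")
    wire_pitch_um = line_um + space_um
    return routing_layers * 1000.0 / wire_pitch_um


print(wires_per_mm(2.0, 2.0))       # 250.0 wires per mm on a single layer
print(wires_per_mm(2.0, 2.0, 4))    # 1000.0 wires per mm with four RDLs
```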
8.1.3 Bonding Challenges for Embedded Multidie Interconnect Bridge.
Intel published a paper at IEEE/ECTC 2021 that pointed out the following bonding challenges of chiplets:
Die bonding process.
Die attach film material design.
Via-to-die-pad overlay alignment.
Integrated process considerations.
8.1.4 Intel's Products with Embedded Multidie Interconnect Bridge.
Figure 34 shows Intel's processor (Kaby Lake, 2017) that combines its high-performance x86 cores with AMD's Radeon Graphics in the same processor package by using Intel's own EMIB as well as HBM. Unfortunately, Intel canceled all the Kaby Lake-G products in October 2019.
Figure 35 shows the Agilex FPGA (field-programmable gate array) module. It can be seen that the FPGA and other chips are attached on top of a build-up package substrate with EMIBs that have fine-metal L/S/H RDLs. The TSV interposer is not needed. Figure 24 shows Intel's spaceship of a GPU (Ponte Vecchio), which has 11 EMIBs.
8.2 IBM's Direct Bonded Heterogeneous Integration.
During IEEE/ECTC 2021 and 2022, IBM presented seven papers on “Direct Bonded Heterogeneous Integration (DBHi) Si Bridge” [43–50], Fig. 36. The major differences between Intel's EMIB and IBM's DBHi are as follows:
For Intel's EMIB, there are two different (C4 and C2) bumps on the chiplets (and there are no bumps on the bridge), Figs. 32 and 35, while for IBM's DBHi, there are C4 bumps on the chiplets and C2 bumps on the bridge, Fig. 36.
For Intel's EMIB, the bridge is embedded in the cavity of a build-up substrate with a die-attach material and then laminated with another build-up layer on top. Therefore, the substrate fabrication is very complicated as mentioned in Sec. 8.1. For IBM's DBHi, the substrate is just a regular build-up substrate with a cavity on top as shown in Fig. 37(b).
8.2.1 Solder Bumps for Direct Bonded Heterogeneous Integration.
As shown in Fig. 37(a), there are C2 bumps on the bridge. However, there are C4 bumps and Cu pads on the chiplets of the same wafer. Thus, wafer bumping poses a challenge. IBM uses a double lithography process to resolve this issue. The first lithography is used to make the UBM and metal pads, and the second lithography is used to make the C4 bumps by the injection molded solder (IMS) method.
8.2.2 DBHi Bonding Assembly.
The bonding assembly process of DBHi is very simple, Fig. 38. First, apply the nonconductive paste (NCP) on Chip 1. Then, bond the Chip 1 and the bridge with thermal compression bonding (TCB). After bonding, the NCP becomes the underfill between Chip 1 and the bridge. Then, apply NCP on the bridge and bond Chip 2 and the bridge with TCB. Those steps are followed by placing the module (Chip 1 + bridge + Chip 2) on the organic substrate with a cavity and then going through the standard flip-chip reflow assembly process.
The stage temperature, bonding force, and bond-head temperature versus time during bonding are shown in Fig. 39. It can be seen that: (a) the bonding stage temperature (T1) is low and held constant throughout, and (b) the bond-head temperature goes through three stages: (i) in the first stage, the temperature (T2) is higher than T1 and is used to melt and flow the NCP; (ii) in the second stage, the temperature (T3 = 2T1) is the highest and is used to reflow the solder; and (iii) in the final stage, the temperature (T4) is lower than T2 and higher than T1 and is used to solidify the solder joints. The underfill under the bridge is optional. Figure 36 shows the demonstration by IBM. If the bridge is very thin, e.g., 50 μm, and the C2 bump is very short, e.g., 30 μm, then the cavity of the package substrate is not needed if the C4 solder bump height is > 85 μm.
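The cavity condition at the end of the preceding paragraph is a stack-height comparison, which can be written as a small check. This is a sketch under stated assumptions: the 5 μm clearance is inferred from the example numbers (50 μm + 30 μm versus 85 μm) rather than given as a design rule, and the function and parameter names are hypothetical.

```python
def cavity_needed(bridge_thickness_um: float, c2_bump_height_um: float,
                  c4_bump_height_um: float, clearance_um: float = 5.0) -> bool:
    """Return True if the package substrate needs a cavity for the bridge.

    The bridge hangs below the chiplets by its own thickness plus the C2
    bump height. No cavity is needed when the C4 bump height exceeds that
    stack plus an assumed clearance (5 um, inferred from the example).
    """
    hanging_stack_um = bridge_thickness_um + c2_bump_height_um
    return c4_bump_height_um <= hanging_stack_um + clearance_um


# Example values from the text: 50 um bridge and 30 um C2 bump.
print(cavity_needed(50.0, 30.0, c4_bump_height_um=90.0))   # False: C4 bump is tall enough
print(cavity_needed(50.0, 30.0, c4_bump_height_um=70.0))   # True: bridge would hit the substrate
```

In practice the relevant C4 dimension is the collapsed joint height after reflow, so a real design check would use that value rather than the as-bumped height.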
8.2.3 Direct Bonded Heterogeneous Integration Challenges.
The challenges in IBM's DBHi are:
Handling and bonding of a portion of the tiny rigid bridge on a portion of the large chiplet with very fine-pitch pads.
Dealing with a situation in which there is more than one rigid bridge on a chiplet.
Dealing with a situation in which there are more than two chiplets on a package substrate.
8.3 Bridge Embedded in Fan-Out Epoxy Molding Compound With Redistribution Layers.
Intel's and IBM's rigid bridges are either embedded in or are on an organic package substrate. There is another class of rigid bridge, which is embedded in the fan-out EMC and connected to the fan-out RDL-substrate.
8.3.1 Applied Materials' Bridge-First and Face-Up Process.
8.3.2 Unimicron's Bridge-First and Face-Down Process.
8.3.3 IME's Bridge-Last Process.
On May 25, 2021, IME obtained U.S. patent 11,018,080, in which the bridge is embedded in the EMC and connected to the RDL-substrate by the chip (bridge)-last or RDL-first fan-out process (Fig. 42).
8.3.4 Publications by TSMC, ASE, Amkor, SPIL, IME, and Universal Chiplet Interconnect Express.
In the past couple of years, there have been many publications on rigid bridges embedded in fan-out EMC with RDLs. For example, on Aug. 25, 2020, during TSMC's Annual Technology Symposium, the company announced its integrated fan-out local silicon interconnect (InFO_LSI) and chip-on-wafer-on-substrate local silicon interconnect (CoWoS®_LSI) (Fig. 43). During IEEE/ECTC (June 2021), at least four papers were published on the application of fan-out packaging technology to embed the rigid bridge in the EMC with RDLs so that the chiplets can perform lateral communications. All four of these papers discuss very similar technologies. ASE embedded the bridge in the EMC using the fan-out packaging method and called it stacked Si bridge fan-out chip-on-substrate (sFOCoS) (Fig. 44). Siliconware Precision Industries Co., Ltd. (SPIL) called its similar technology fan-out embedded bridge (FO-EB) (Fig. 45). Amkor referred to its comparable technology as S-Connect fan-out interposer (Fig. 46). IME presented its bridge and called it embedded fine interconnect (EFI) (Fig. 47). During IEEE/ECTC (June 2022), IBM published six papers [44–50] in this area.
An important consortium concerned with bridge technology is the Universal Chiplet Interconnect Express® (UCIe®), Fig. 48. According to the consortium's website, the organization addresses customer requests for a more customizable, package-level integration—combining best-in-class die-to-die interconnect and protocol connections from an interoperable, multivendor ecosystem. This new open industry standard establishes a universal interconnect at the package level. The UCIe® board of directors and leadership (promoters) include founding members ASE, AMD, Arm, Google Cloud, Intel Corporation, Meta, Microsoft Corporation, Qualcomm Incorporated, Samsung Electronics, and TSMC, along with newly elected members, Alibaba and NVIDIA.
Intel published the UCIe® 1.0 specification, which provides a complete standardized die-to-die interconnect with physical layer, protocol stack, software model, and compliance testing. Figure 49 shows examples of standard packaging and advanced packaging with chiplet design and heterogeneous integration. It can be seen that there are three different kinds of bridges for advanced packaging: (1) bridge embedded in organic package substrate; (2) bridge embedded in Si-interposer; and (3) bridge embedded in fan-out EMC with RDLs.
8.4 Hybrid Bonding Bridge.
Unimicron proposed the use of Cu–Cu hybrid bonding for the bridge between chiplets in chiplet design and heterogeneous integration packaging (Fig. 50). The advantages of this structure are: (1) higher density, (2) better performance, and (3) an ordinary package substrate. There are at least two options: one is with C4 bumps on the package substrate, and the other is with C4 bumps on the chiplet wafer.
8.4.1 C4 Bump on Package Substrate.
Figure 51 shows the process flow of a hybrid bonding bridge with C4 bumps on the package substrate. For the bridge wafer, the processing starts with chemical vapor deposition (CVD) of a dielectric material such as SiO2, which is then planarized by an optimized chemical mechanical polishing (CMP) process to obtain the required Cu dishing. Then, a protective coating layer is applied on the wafer surfaces to keep off particles and contaminants that could cause interface voids during the subsequent bonding process, and the bridge wafer is diced into individual chips (still on the blue tape of the wafer). These steps are followed by activating the bonding surface with plasma and hydration processes for better hydrophilicity and a higher density of hydroxyl groups on the bonding surface. For the chiplet wafer, the same steps are repeated: CVD for the SiO2, CMP for the Cu dishing, and plasma and hydration for the activation of the bonding surface. Then, the individual bridge chips are picked and placed on the chiplet wafer, and SiO2-to-SiO2 bonding is performed at room temperature. These steps are followed by annealing to achieve covalent bonding between the oxide layers and metallic bonding, with diffusion of Cu atoms, between the Cu–Cu contacts. For the package substrate, solder paste is stencil printed on the substrate and then reflowed into C4 solder bumps. For the final assembly, the bridge + chiplets module is picked and placed on the package substrate, and then the C4 bumps are reflowed.
8.4.2 C4 Bump on Chiplet Wafer.
Figure 52 shows the process flow of the hybrid bonding bridge with C4 bumps on the chiplet wafer. It can be seen that, compared with the case of C4 bumps on the package substrate, the process steps for the bridge wafer and the chiplet wafer are the same up to the bridge-to-chiplet wafer bonding step. After that, the C4 bumps are fabricated by wafer bumping on the chiplet wafer. Then, the chiplet wafer is diced into individual modules (bridge + chiplets with C4 bumps). The final assembly is accomplished by picking and placing the individual module on the package substrate and reflowing the C4 solder bumps.
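The two process flows of Secs. 8.4.1 and 8.4.2 can be written down as ordered step lists to make the point of divergence explicit. The following Python sketch is only an illustration: the step labels are paraphrased from the text, the names `SHARED_STEPS`, `FLOW_C4_ON_SUBSTRATE`, `FLOW_C4_ON_CHIPLET_WAFER`, `common_prefix`, and `divergent_steps` are hypothetical, and the annealing step is assumed to belong to the shared bonding sequence, which the text does not state explicitly.

```python
# Step labels paraphrase Secs. 8.4.1 and 8.4.2; every name here is illustrative.
# Assumption: annealing is counted as part of the shared bonding sequence.
SHARED_STEPS = [
    "bridge wafer: CVD of SiO2",
    "bridge wafer: CMP (Cu dishing)",
    "bridge wafer: protective coating and dicing",
    "bridge wafer: plasma and hydration activation",
    "chiplet wafer: CVD of SiO2",
    "chiplet wafer: CMP (Cu dishing)",
    "chiplet wafer: plasma and hydration activation",
    "pick and place bridge chips on chiplet wafer; room-temperature SiO2-to-SiO2 bonding",
    "anneal (oxide-to-oxide and Cu-Cu bonding)",
]

# Option of Sec. 8.4.1: C4 bumps are formed on the package substrate.
FLOW_C4_ON_SUBSTRATE = SHARED_STEPS + [
    "package substrate: stencil print solder paste",
    "package substrate: reflow paste into C4 bumps",
    "pick and place bridge + chiplets module on package substrate",
    "reflow C4 bumps",
]

# Option of Sec. 8.4.2: C4 bumps are formed on the chiplet wafer.
FLOW_C4_ON_CHIPLET_WAFER = SHARED_STEPS + [
    "chiplet wafer: wafer bumping (C4 bumps)",
    "chiplet wafer: dice into bridge + chiplets modules",
    "pick and place bridge + chiplets module on package substrate",
    "reflow C4 bumps",
]


def common_prefix(flow_a, flow_b):
    """Return the leading steps that both flows share, in order."""
    count = 0
    for step_a, step_b in zip(flow_a, flow_b):
        if step_a != step_b:
            break
        count += 1
    return flow_a[:count]


def divergent_steps(flow_a, flow_b):
    """Return the steps of each flow that follow the shared prefix."""
    count = len(common_prefix(flow_a, flow_b))
    return flow_a[count:], flow_b[count:]


if __name__ == "__main__":
    only_substrate, only_wafer = divergent_steps(
        FLOW_C4_ON_SUBSTRATE, FLOW_C4_ON_CHIPLET_WAFER
    )
    print("Shared steps:", len(common_prefix(FLOW_C4_ON_SUBSTRATE, FLOW_C4_ON_CHIPLET_WAFER)))
    print("C4 on package substrate, remaining steps:", only_substrate)
    print("C4 on chiplet wafer, remaining steps:", only_wafer)
```

In this representation the two options differ only in where the C4 bumps are formed (stencil printing on the substrate versus wafer bumping on the chiplet wafer) and in the explicit dicing of the chiplet wafer in the second option.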
9 Multiple System and Heterogeneous Integration
Besides chip partition and chip split, there is another group of chiplet design and heterogeneous integration packaging called multiple system and heterogeneous integration, which is driven by formfactor and performance [60–62]. There are at least five different kinds of multiple system and heterogeneous integration, as shown in Fig. 53, which will be discussed briefly in Secs. 9.1–9.5.
9.1 Multiple System and Heterogeneous Integration on Package Substrate (2D IC Integration).
Figure 53(a) shows a multiple system and heterogeneous integration on a package substrate (2D IC integration). It can be seen that the multiple system is supported by a high-density package substrate. An example is shown in Fig. 54, where the heterogeneous integration of three chips is supported by a fine metal L/S (2 μm/2 μm) RDL-substrate. Figure 55(a) shows TSMC's patent on antenna-in-package (AiP). Figure 55(b) shows Unimicron's patent on the heterogeneous integration of the RF chip and the baseband chip on a fan-out RDL-substrate, together with the patch antenna.
9.2 Multiple System and Heterogeneous Integration on Package Substrate With Thin-Film Layers (2.1D IC Integration).
Figure 53(b) shows a multiple system and heterogeneous integration on a package substrate with thin-film layers. An example is shown in Fig. 56, where the 2 μm/2 μm (L/S) thin-film layers are built directly on top of the build-up package substrate [66,67]. However, because of the poor flatness of the build-up package substrate, the yield loss in manufacturing the thin-film layers is very large.
A new method is shown in Figs. 57 and 58, where the thin-film layers on a glass temporary carrier, Fig. 57(a), and the build-up package substrate, Fig. 57(b), are built separately and then combined with either PID (photo-imageable dielectric) or ABF (Ajinomoto build-up film), Fig. 58. Vias are drilled through the thin-film layers, stopping at the pads of the build-up package substrate, and are then filled (plated) with Cu.
9.3 Multiple System and Heterogeneous Integration on Package Substrate With TSV-Less Interposer (2.3D IC Integration).
Figure 53(c) shows a multiple system and heterogeneous integration on a hybrid package substrate with a TSV-less interposer (or organic interposer) [60–62]. The structure consists of a multiple system on a hybrid substrate which is a combination of a build-up package substrate [or high-density interconnect (HDI)], solder joints with underfill [69–73], and a fine metal L/S RDL-substrate (or organic interposer). The organic interposer of the hybrid substrate can be fabricated by the fan-out chip-first packaging process [74–79] or the chip-last (or RDL-first) process [80–108].
For a hybrid substrate with an organic interposer fabricated by the fan-out chip-last packaging process [80–108], the organic interposer and the build-up package substrate are fabricated separately. Then, they are combined by one of two different assembly processes. One is to first bond the chips on the organic interposer, apply underfill and epoxy molding compound (EMC) molding, and then assemble the module (chips + organic interposer) on the build-up package substrate [80–101]. For example, Fig. 59 shows Samsung's multiple system and heterogeneous integration on a package substrate with a TSV-less interposer (or organic interposer) [84,85], Fig. 60 shows ASE's [93–95], and Fig. 61 shows TSMC's [86–89].
The other assembly process is first to combine the organic interposer and the build-up package substrate into a hybrid substrate, either through solder joints that are enhanced with underfill [102–106] or through an interconnect-layer [107,108]. Then, the combined substrate is tested to make sure it is a known-good hybrid substrate. Finally, the chips are bonded on the known-good hybrid substrate. In this case, the yield loss of the hybrid substrate, especially of the organic interposer, is smaller and easier to control. Also, there is very little chance of losing the known-good dies. Furthermore, the logistics are simpler: after receiving the known-good hybrid substrate from the substrate houses, the OSAT (outsourced semiconductor assembly and test) houses just bond the chips/HBMs on the known-good hybrid substrate.
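The argument that the known-good hybrid substrate protects the known-good dies can be illustrated with a simple yield calculation. The sketch below is a rough model under stated assumptions, not a result from the cited references: the step yields are hypothetical numbers, the steps are assumed to fail independently, and in the first assembly process the organic interposer is assumed not to be screened before the chips are bonded. The function and variable names are invented for this illustration.

```python
def die_loss_chips_on_interposer_first(y_interposer, y_chip_bond, y_module_attach):
    """Fraction of known-good dies lost in the first assembly process.

    The chips are bonded to the organic interposer before the module is attached
    to the build-up package substrate, so a bad interposer, a bad chip bond, or
    a bad module attach all scrap the dies (independent steps assumed).
    """
    return 1.0 - y_interposer * y_chip_bond * y_module_attach


def die_loss_known_good_hybrid_substrate(y_chip_bond):
    """Fraction of known-good dies lost in the second assembly process.

    The hybrid substrate is built and tested first, so the dies are exposed
    only to the yield of the chip-bonding step.
    """
    return 1.0 - y_chip_bond


# Hypothetical step yields, chosen only for illustration.
Y_INTERPOSER = 0.90      # organic interposer fabrication
Y_CHIP_BOND = 0.99       # chip-to-interposer bonding, underfill, molding
Y_MODULE_ATTACH = 0.98   # interposer-to-build-up-substrate assembly

loss_first = die_loss_chips_on_interposer_first(Y_INTERPOSER, Y_CHIP_BOND, Y_MODULE_ATTACH)
loss_known_good = die_loss_known_good_hybrid_substrate(Y_CHIP_BOND)

if __name__ == "__main__":
    print(f"Die loss, chips bonded on interposer first: {loss_first:.1%}")
    print(f"Die loss, known-good hybrid substrate:      {loss_known_good:.1%}")
```

With these assumed yields, roughly 13% of the known-good dies would be lost in the first process and about 1% in the second, because the interposer and substrate-assembly losses are removed before any die is committed. The actual values depend entirely on the real step yields and on how much screening is done in each flow.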
For example, Fig. 62 shows the heterogeneous integration of two chips on a hybrid substrate with an organic interposer made from PID [102–104], Fig. 63 shows the heterogeneous integration of two chips on a hybrid substrate with an organic interposer made from ABF [105,106], and Fig. 64 shows the heterogeneous integration of three chips on a hybrid substrate made with an interconnect-layer [107,108]. It has been found that (a) the metal lines in the organic interposer made from ABF are flatter than those made from PID, and (b) the solder bumps and underfill can be replaced by an interconnect-layer made of prepreg with vias filled with conductive paste. Figure 65 shows the heterogeneous integration of the EIC (electrical integrated circuits) and PIC (photonic integrated circuits) on a hybrid substrate.
9.4 Multiple System and Heterogeneous Integration on Package Substrate With Passive TSV-Interposer (2.5D IC Integration).
Figure 53(d) shows a multiple system and heterogeneous integration on a package substrate with a passive TSV-interposer [110–233]. 2.5D IC integration uses a through-silicon via (TSV)-interposer to support the SoC and memories such as high-bandwidth memory (HBM); the interposer is then attached to a package substrate, Fig. 53(d). The TSV-interposer consists only of TSVs and RDLs and is therefore called a passive TSV-interposer.
The first papers published on 2.5D IC integration were by Leti [110,111] at ECTC 2005 and 2006. On Oct. 20, 2013, Xilinx and TSMC jointly announced the production release of the Virtex-7 HT family, which the pair claimed was the industry's first 2.5D IC integration in production. Since then, AMD shipped their Radeon R9 Fury X GPU, Nvidia shipped their Pascal 100 GPU, Graphcore shipped their intelligence processing unit (IPU) processor (Fig. 66), Fujitsu shipped their A64FX CPU (Fig. 67), etc.
9.5 Multiple System and Heterogeneous Integration on Package Substrate With Active TSV-Interposer (3D IC Integration).
Figure 53(e) shows a multiple system and heterogeneous integration on a package substrate with an active TSV-interposer [10,18–20,30,31,233], which contains CMOS devices in addition to TSVs and RDLs. Examples include Intel's Foveros (Figs. 16–19) [18–20], Leti/STMicroelectronics' INTACT (Fig. 68) [30,31], and the heterogeneous integration of EIC on PIC with TSVs (Fig. 69).
Figure 70 shows schematically a large-body-sized glass-based interposer for high-performance computing by Georgia Institute of Technology (GIT). It can be seen that (a) the glass interposer with through-glass vias (TGVs) supports chiplets as well as active routers and passive components, and (b) there are RDLs on both the topside and the bottom-side of the active interposer. Also, the electrical performance (insertion loss per unit length) of different traces on the glass interposer is better than that on silicon. A cross section of the sample is shown in the middle of Fig. 70. It can be seen that a 100 μm-thick die embedded in the glass cavity is connected to the chiplet (not shown) on top of the TGV-interposer with RDLs.
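The five structures of Secs. 9.1–9.5 differ mainly in what sits between the chips and the build-up package substrate, which can be collected into a small lookup table. The Python sketch below only restates the section headings and Fig. 53 panels; the class `IntegrationKind`, the list `KINDS`, and the helper `kind_by_label` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationKind:
    """One kind of multiple system and heterogeneous integration (Sec. 9)."""
    label: str              # "2D", "2.1D", ...
    section: str            # section of this study that discusses it
    figure: str             # panel of Fig. 53
    support_structure: str  # what carries the chips above the package substrate


# Entries restate the headings of Secs. 9.1-9.5; nothing here is new data.
KINDS = [
    IntegrationKind("2D", "9.1", "Fig. 53(a)", "high-density package substrate only"),
    IntegrationKind("2.1D", "9.2", "Fig. 53(b)", "thin-film layers on the package substrate"),
    IntegrationKind("2.3D", "9.3", "Fig. 53(c)", "TSV-less (organic) interposer"),
    IntegrationKind("2.5D", "9.4", "Fig. 53(d)", "passive TSV-interposer"),
    IntegrationKind("3D", "9.5", "Fig. 53(e)", "active TSV-interposer"),
]


def kind_by_label(label):
    """Look up an integration kind by its label, e.g. '2.5D'."""
    for kind in KINDS:
        if kind.label == label:
            return kind
    raise KeyError(f"unknown integration label: {label}")


if __name__ == "__main__":
    for kind in KINDS:
        print(f"{kind.label:>4}  Sec. {kind.section}  {kind.figure}  {kind.support_structure}")
```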
10 Potential Research Topics
10.1 Interconnection Technology Between Chiplets and Bridge.
For chip partition and chip split, the interface (bridge) between chiplets is one of the most important elements in chiplet design and heterogeneous integration packaging. Currently, the most used interconnect technology between the bridge and chiplets is microbump (Cu-pillar + solder cap) as shown in Fig. 71. A potential research topic is: “what is the interconnect technology between the bridge and chiplets, so the system will achieve better performance, higher density, simpler package substrate, and lower cost?”
10.2 Structural Design and Material Selection of Multiple System and Heterogeneous Integration With Very Large Package Substrate.
The package substrate for multiple system and heterogeneous integration packaging is getting larger and larger. For example, Fig. 72(a) shows the one (85 mm × 85 mm) by Samsung and Fig. 72(b) shows the one (91 mm × 91 mm) by MediaTek. Assembly issues such as warpage and stretched or open solder joints exist. Thus, optimal structural design and material selection are of utmost importance.
10.3 Frontend Hybrid Bonding of Chiplets Before Heterogeneous Integration Packaging.
As mentioned in Secs. 3.1 and 3.2 and Figs. 1(a) and 1(b), frontend integration of some of the chiplets (before package heterogeneous integration) can yield a smaller package size and better performance. Thus, it is a very good R&D topic. Figure 73 shows an example of Cu–Cu hybrid bonding between some chiplets before they are attached to the organic interposer or the TSV-interposer.
11 Summary and Recommendations
Some important results and recommendations are summarized as follows:
SoCs with chip scaling are here and will be here to stay. However, only a handful of companies, such as Apple, Samsung, Intel, AMD, Nvidia, Huawei, and Google, can afford them at finer feature sizes (advanced nodes). Chiplet design and heterogeneous integration packaging provide alternatives (options) to SoCs, especially for advanced nodes, which most companies cannot afford.
Chiplet design such as chip partition and chip split is driven by semiconductor manufacturing yield and cost. Examples such as those given by Xilinx/TSMC, AMD/TSMC, and Intel have been presented.
Chiplet design and heterogeneous integration packaging is a burden (and an opportunity) for packaging: it (a) increases package size and package complexity, (b) increases packaging efforts such as bridge design, fabrication, and assembly, and (c) increases packaging cost.
In general, the semiconductor cost is a few times the packaging cost; therefore, the savings that can be achieved with chiplet design and heterogeneous integration packaging are worth pursuing.
The interface (bridge) is the most important element of chiplet design and heterogeneous integration packaging. Bridges embedded in (a) the package substrate, such as those given by Intel and IBM, and (b) fan-out EMC with RDLs, such as those given by Applied Materials, TSMC, Unimicron, ASE, Amkor, SPIL, and IME, have been briefly presented.
A new interconnect between the bridge and the chiplets with hybrid bonding technology has been proposed. Its advantages are higher density (finer pitch), better performance, fewer process steps, simpler package substrate, and lower cost.
Multiple systems and heterogeneous integration such as 2D, 2.1D, 2.3D, 2.5D, and 3D IC integration are driven by formfactor and performance. Examples such as those given by TSMC, Shinko, Samsung, ASE, Graphcore, Fujitsu, Leti, STMicroelectronics, and Unimicron have been presented.
One of the trends in chiplet design and heterogeneous integration packaging is to develop a new interconnect method between the bridge and the chiplets such that higher performance, finer pitch, higher density, simpler package substrate, and lower cost can be achieved.
One of the trends in chiplet design and heterogeneous integration packaging is to optimize the structural design and material selection of multiple system and heterogeneous integration structures with very large package substrates (∼100 mm × 100 mm).
One of the trends in chiplet design and heterogeneous integration packaging is to have frontend integration of some of the chiplets by Cu–Cu hybrid bonding before heterogeneous integration packaging. This will lead to a smaller package size, better performance, and fewer bridges.
To promote and popularize chiplet design and heterogeneous integration packaging, standards are necessary. DARPA CHIPS and UCIe are heading in the right direction.
EDA (electronic design automation) tools for automating system splitting and partitioning and design are desperately needed for complex chiplet design and heterogeneous integration packaging.
Acknowledgment
The author would like to thank the authors of all the papers cited in this study for their contributions to chiplet design and heterogeneous integration packaging.
Data Availability Statement
Data were provided by a third party listed in the Acknowledgment section.