Design for testability (DFT) embeds testability features into an integrated circuit (IC) during the design phase, while silicon bring-up covers the initial evaluation and debugging of manufactured chips. Streamlining these sequential processes minimizes design cycles and shortens time-to-market (TTM) for advanced artificial intelligence (AI) accelerators.
This article explores the complexities of AI chip design and outlines strategies for optimizing DFT and silicon bring-up. Key approaches include hierarchical and core group-level DFT implementations, addressing gate-level DFT limitations, and reducing interdependency between DFT and automated test equipment (ATE). These methodologies enable faster, more efficient workflows for designing AI accelerators.
Understanding AI design complexity
AI and machine learning (ML) applications in data centers and at the intelligent edge drive demand for high-performance accelerators. Advanced driver-assistance systems (ADAS), medical research, and Industry 4.0 infrastructure rely on AI accelerators to process vast amounts of data and enable real-time decision-making. Evolving large language models (LLMs), a subset of generative AI (GenAI) models such as ChatGPT, Claude, Gemini, and Grok, require massive parallel processing to handle increasing query complexity and workloads. Similarly, deep reinforcement learning (DRL) tasks depend on high-performance AI accelerators for training, inference, and incremental learning.
AI accelerators span various compute architectures, including graphics processing units (GPUs), central processing units (CPUs), field-programmable gate arrays (FPGAs), neural processing units (NPUs), and application-specific integrated circuits (ASICs). Many of these accelerators implement sophisticated designs with billions of logic gates, thousands — or even millions — of cores, and gigabytes of memory.
Increasing design complexity continues to challenge TTM goals, prompting semiconductor engineering teams to optimize DFT and silicon bring-up workflows by:
- Leveraging the uniformity of specific AI chip architectures.
- Shifting DFT processes left, to earlier stages of the design cycle.
- Eliminating interdependency and iterative cycles between DFT and ATE.
Accelerating DFT with hierarchical methodologies
Hierarchical methodologies enable efficient DFT by structuring design processes across multiple levels of the chip architecture. These methodologies are crucial for high-performance AI chips with many replicated (identical) cores. Hierarchical DFT implementations can deliver up to 10x faster automated test pattern generation (ATPG), achieve a 2x reduction in pattern count, and significantly accelerate silicon bring-up, debugging, and characterization.
A hierarchical approach allows engineering teams to execute all DFT tasks — such as test insertion, test pattern generation, and verification — on a single primary reference core. The finalized, signed-off core is then automatically replicated to complete the chip-level DFT implementation. Some electronic design automation (EDA) tools support DFT sign-off at any hierarchical level, from individual cores to the full chip. This capability streamlines verification, minimizes test data volume, and reduces design cycles.
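As a rough illustration of this reuse model, the Python sketch below maps one signed-off reference core's patterns onto every replicated instance. The CorePattern class, retarget function, instance paths, and pattern counts are assumptions made for the example, not the API or data model of any EDA tool.

```python
# Minimal sketch of core-level pattern reuse in a hierarchical DFT flow.
# Names, paths, and counts are hypothetical; real flows retarget patterns
# inside the EDA toolchain rather than in user Python code.
from dataclasses import dataclass

@dataclass
class CorePattern:
    name: str        # e.g., "stuck_at_001"
    scan_data: str   # placeholder for the compressed scan payload

def retarget(core_patterns, instances):
    """Reuse one set of reference-core patterns for every core instance."""
    return {inst: list(core_patterns) for inst in instances}

# ATPG runs once on the reference core...
reference_patterns = [CorePattern("stuck_at_001", "..."),
                      CorePattern("transition_001", "...")]

# ...and the results are applied to all replicated instances in the chip.
chip_instances = [f"supertile0/core{i}" for i in range(64)]
chip_patterns = retarget(reference_patterns, chip_instances)
print(f"{len(chip_patterns)} instances reuse {len(reference_patterns)} reference patterns")
```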
In AI chip DFT hierarchies, a core is instantiated multiple times within a block (supertile), with these blocks further replicated at the chip level (Figure 1). To further optimize hierarchical DFT and accelerate TTM, EDA tools can leverage packetized scan data architectures with streaming scan networks (SSNs). This configuration connects top-level blocks to a shared SSN bus that efficiently delivers packetized scan data. Configuration data is distributed via an IJTAG network, while embedded IP generates DFT signals locally, enabling each block to execute tasks independently.
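A simplified way to picture this delivery scheme is the sketch below, which models blocks with different scan payload sizes sharing one fixed-width bus. The bus width, block names, and payload numbers are assumptions for illustration and do not describe the actual SSN packet format.

```python
# Conceptual model of packetized scan delivery over a shared bus (not the
# real SSN protocol). Each block pulls its own slot from the packet stream
# and shifts its chains locally, so blocks proceed independently.
BUS_WIDTH_BITS = 32   # bits carried per bus cycle (assumed)

# Scan payload bits each block consumes per shift cycle (assumed values).
block_payload = {"supertile0": 10, "supertile1": 10, "supertile2": 12}

def bus_cycles(payload_per_shift, shift_cycles, bus_width=BUS_WIDTH_BITS):
    """Bus cycles needed to stream the combined payload for one test."""
    total_bits = sum(payload_per_shift.values()) * shift_cycles
    return -(-total_bits // bus_width)   # ceiling division

print(bus_cycles(block_payload, shift_cycles=500), "bus cycles for 500 shift cycles")
```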
Balancing DFT complexity in AI chip design
Determining the optimal DFT hierarchy for AI chips involves balancing area overhead against ATPG runtime. Implementing DFT at the individual core level increases area overhead through duplicated logic, such as isolation wrappers, compression modules, and memory built-in self-test (BIST) controllers. Conversely, chip-level DFT typically leads to extended ATPG runtimes, higher memory requirements, and layout challenges.
Defining core groups or super cores provides a balanced solution (Figure 2) by consolidating DFT logic at an intermediate level. This approach streamlines test pattern delivery, reduces routing complexity, and improves the trade-off between development time and area overhead. Memory BIST controllers can also be shared across multiple cores to further reduce area overhead, optimizing memory access buses and interfaces while mitigating routing and timing challenges: a single memory BIST controller can manage multiple memories across several cores, improving resource utilization and simplifying the design.
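The short sketch below makes this trade-off concrete with purely hypothetical numbers; the partition counts, per-partition area, and ATPG hours are invented for illustration. Per-core insertion multiplies duplicated DFT logic, chip-level insertion stretches ATPG runtime, and core groups sit in between.

```python
# Hypothetical comparison of DFT area overhead vs. ATPG runtime at three
# hierarchy levels for a 256-core design. All numbers are placeholders.
def dft_tradeoff(partitions, overhead_mm2_each, atpg_hours_each):
    """Total duplicated DFT area and ATPG runtime for one partitioning.

    Identical partitions need ATPG only once, so runtime tracks partition
    size, while wrapper/compression/BIST logic repeats per partition.
    """
    return partitions * overhead_mm2_each, atpg_hours_each

configs = {
    "per-core DFT":   (256, 0.05, 0.5),   # most duplicated logic, fastest ATPG
    "core-group DFT": (16,  0.40, 4.0),   # balanced middle ground
    "chip-level DFT": (1,   3.00, 48.0),  # least area, longest ATPG
}
for label, cfg in configs.items():
    area, hours = dft_tradeoff(*cfg)
    print(f"{label:15s} DFT area ~ {area:5.1f} mm^2, ATPG ~ {hours:4.1f} h")
```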
Semiconductor companies can apply these hierarchical methodologies and supporting EDA tools to AI chips with 2.5D, 3D, and 5.5D chiplet-based multi-die architectures. These chips maintain a streamlined architecture by leveraging the IEEE 1838 standard and flexible parallel ports (FPP), with SSN seamlessly extending to support multi-die implementations. After manufacturing, AI design teams can use hierarchical failure diagnosis EDA tools to identify and resolve defects, ensuring high-quality yield and reliable production.
Overcoming limitations of gate-level DFT
DFT logic can be inserted during or after synthesis at the gate-level design stage. For AI accelerator design, this approach introduces two significant limitations:
- Gate-level designs are considerably larger than their register-transfer level (RTL) descriptions, which extends simulation and debugging; RTL compile and regression-debug runs are approximately 4x and 20x faster, respectively, than their gate-level counterparts.
- Changes to DFT logic or configuration require a full synthesis iteration before verification, further delaying the design process.
Repeated synthesis, simulation, and debugging cycles introduce delays for large, complex AI chips. Advanced EDA tools mitigate this by enabling DFT logic insertion at the RTL stage, where changes can be verified and debugged without synthesis. This approach supports early I/O and floor planning, streamlines development, and shortens design cycles.
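As a back-of-the-envelope illustration of why this matters, the sketch below applies the approximate 4x compile and 20x regression-debug speedups to one hypothetical DFT change; the absolute hours are invented for the example.

```python
# Rough arithmetic on one DFT-change iteration, using the ~4x compile and
# ~20x regression-debug speedups cited above. Hours are hypothetical.
synth_hours        = 12.0   # full synthesis run (assumed)
gate_compile_hours = 8.0    # gate-level simulation compile (assumed)
gate_regress_hours = 40.0   # gate-level regression debug (assumed)

rtl_compile_hours = gate_compile_hours / 4    # ~4x faster at RTL
rtl_regress_hours = gate_regress_hours / 20   # ~20x faster at RTL

# A gate-level DFT change pays for synthesis before every verification pass;
# an RTL-stage change is verified and debugged directly.
gate_iteration = synth_hours + gate_compile_hours + gate_regress_hours
rtl_iteration  = rtl_compile_hours + rtl_regress_hours
print(f"gate-level DFT iteration: {gate_iteration:.1f} h")
print(f"RTL-stage DFT iteration:  {rtl_iteration:.1f} h")
```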
RTL-level DFT insertion improves testability by identifying and addressing issues earlier in the design flow. Automated testability checks and fixes improve test quality, helping designers achieve higher accuracy in less time. Resolving testability issues at RTL also optimizes ATPG, enabling faster and more efficient workflows.
Reducing DFT-ATE domain interdependency
Many design workflows involve iterative exchanges between DFT and ATE domains. These iterations — spanning test pattern debugging, performance characterization, test parameter optimization, and test scheduling — rely on collaboration between DFT and test engineers, extending the bring-up process and increasing error risks.
Evaluating intellectual property (IP) during silicon bring-up for AI chips further complicates EDA workflows (Figure 3), with steep learning curves and high error rates that extend cycle times. Advanced EDA tools enable DFT engineers to independently manage the silicon bring-up process while allowing test teams to diagnose flop-level and net-level issues. These tools improve efficiency by supporting silicon bring-up in bench-top environments without costly ATE, accelerating test pattern failure diagnosis, and reducing design cycles.
Conclusion
AI and ML applications in data centers and at the intelligent edge drive demand for faster, more efficient accelerators. These accelerators feature sophisticated designs with billions of logic gates and thousands — or even millions — of cores. To minimize design cycles and shorten TTM, engineering teams employ strategies such as hierarchical and core group-level DFT implementations, balancing DFT complexity, addressing gate-level DFT limitations, and reducing DFT-ATE domain interdependency. Together, these methodologies streamline workflows and drive the efficient development of high-performance AI chips.