Source author record

Yibo Lin

Yibo Lin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Distributed, Parallel, and Cluster Computing Hardware Architecture physics.app-ph Artificial Intelligence

Catalog footprint

What is connected

9works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

ATSim3.5D: A Multiscale Thermal Simulator for 3.5D-IC Systems based on Nonlinear Multigrid Method

To resolve the rising temperatures in 3.5D-ICs, a thermal-aware design flow becomes increasingly crucial, necessitating an accurate and efficient thermal simulation tool. However, previous tools struggle to handle the unique heterogeneous multiscale structures in 3.5D-ICs and the nonlinear thermal effects caused by high temperatures. In this work, we present a multiscale thermal simulator for 3.5D-ICs. We propose a hybrid tree structure to generate multilevel grids and capture the multiscale features and employ the nonlinear multigrid method for quick solving. Compared to ANSYS Icepak, it exhibits high accuracy (mean absolute relative error <1%, max error $<\SI{2}{\degreeCelsius}$), and efficiency ($80\times$ acceleration), delivering a powerful means to evaluate and refine thermal designs.

preprint2026arXiv

ATSim3D: Towards Accurate Thermal Simulator for Heterogeneous 3D-IC Systems Considering Nonlinear Leakage and Conductivity

Thermal simulation plays a fundamental role in the thermal design of integrated circuits, especially 3D ICs. Current simulators require significant runtime for high-resolution simulation, and dismiss the complex nonlinear thermal effects, such as nonlinear thermal conductivity and leakage power. To address these issues, we propose ATSim3D, a thermal simulator for simulating the steady-state temperature profile of nonlinear and heterogeneous 3D IC systems. We utilize the global-local approach, combining a compact thermal model at the global level, and a finite volume method at the local level. We tackle the nonlinear effects with Kirchhoff transformation and iteration. ATSim3D enables local-level parallelization that helps achieve an average speedup of 40x compared to COMSOL, with a relative error <3% and a state-of-the-art resolution of 4096 x 4096, holding promise for enhancing thermal-aware design in 3D ICs.

preprint2026arXiv

DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams

System-level diagrams encode the architectural blueprint of chip design, specifying module functions, dataflows, and interface protocols. However, non-standardized symbols and the scarcity of structured training data hinder existing multimodal large language models (MLLMs) from recognizing these diagrams. To address this gap, we introduce DiagramNet, the first multimodal dataset for system-level diagrams, comprising 10,977 connection annotations and 15,515 chain-of-thought QA pairs across four tasks: Listing, Localization, Connection, and Circuit QA. Building on this dataset, we propose a progressive training pipeline together with a decoupled multi-agent workflow that decomposes complex visual reasoning into Perception, Reasoning, and Knowledge stages. On the DiagramNet benchmark, integrating our 3B-parameter model with the proposed workflow surpasses the 2025 EDA Elite Challenge winner and outperforms GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro by over 2x in end-to-end evaluation. Notably, the workflow generalizes beyond our model, boosting Task 1 performance by 128.7x for Gemini-2.5-Pro and 12.4x for GPT-5. Furthermore, with only 60 images for detector adaptation, the method transfers effectively to AMSBench, achieving zero-shot connectivity reasoning on par with GPT-5 and Claude-Sonnet-4 while surpassing the AMS state-of-the-art method Netlistify.

preprint2026arXiv

GAP-LA: GPU-Accelerated Performance-Driven Layer Assignment

Layer assignment is critical for global routing of VLSI circuits. It converts 2D routing paths into 3D routing solutions by determining the proper metal layer for each routing segments to minimize congestion and via count. As different layers have different unit resistance and capacitance, layer assignment also has significant impacts to timing and power. With growing design complexity, it becomes increasingly challenging to simultaneously optimize timing, power, and congestion efficiently. Existing studies are mostly limited to a subset of objectives. In this paper, we propose a GPU-accelerated performance-driven layer assignment framework, GAP-LA, for holistic optimization the aforementioned objectives. Experimental results demonstrate that we can achieve 0.3%-9.9% better worst negative slack (WNS) and 2.0%-5.4% better total negative slack (TNS) while maintaining power and congestion with competitive runtime compared with ISPD 2025 contest winners, especially on designs with up to 12 millions of nets.

preprint2022arXiv

CircuitNet: An Open-Source Dataset for Machine Learning Applications in Electronic Design Automation (EDA)

The electronic design automation (EDA) community has been actively exploring machine learning (ML) for very large-scale integrated computer-aided design (VLSI CAD). Many studies explored learning-based techniques for cross-stage prediction tasks in the design flow to achieve faster design convergence. Although building ML models usually requires a large amount of data, most studies can only generate small internal datasets for validation because of the lack of large public datasets. In this essay, we present the first open-source dataset called CircuitNet for ML tasks in VLSI CAD.

preprint2022arXiv

Concurrent CPU-GPU Task Programming using Modern C++

In this paper, we introduce Heteroflow, a new C++ library to help developers quickly write parallel CPU-GPU programs using task dependency graphs. Heteroflow leverages the power of modern C++ and task-based approaches to enable efficient implementations of heterogeneous decomposition strategies. Our new CPU-GPU programming model allows users to express a problem in a way that adapts to effective separation of concerns and expertise encapsulation. Compared with existing libraries, Heteroflow is more cost-efficient in performance scaling, programming productivity, and solution generality. We have evaluated Heteroflow on two real applications in VLSI design automation and demonstrated the performance scalability across different CPU-GPU numbers and problem sizes. At a particular example of VLSI timing analysis with million-scale tasking, Heteroflow achieved 7.7x runtime speed-up (99 vs 13 minutes) over a baseline on a machine of 40 CPU cores and 4 GPUs.

preprint2022arXiv

LHNN: Lattice Hypergraph Neural Network for VLSI Congestion Prediction

Precise congestion prediction from a placement solution plays a crucial role in circuit placement. This work proposes the lattice hypergraph (LH-graph), a novel graph formulation for circuits, which preserves netlist data during the whole learning process, and enables the congestion information propagated geometrically and topologically. Based on the formulation, we further developed a heterogeneous graph neural network architecture LHNN, jointing the routing demand regression to support the congestion spot classification. LHNN constantly achieves more than 35% improvements compared with U-nets and Pix2Pix on the F1 score. We expect our work shall highlight essential procedures using machine learning for congestion prediction.

preprint2022arXiv

Pipeflow: An Efficient Task-Parallel Pipeline Programming Framework using Modern C++

Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient for algorithms that only exploit task parallelism in pipeline. As a result, we introduce a new task-parallel pipeline programming framework called Pipeflow. Pipeflow does not design yet another data abstraction but focuses on the pipeline scheduling itself, enabling more efficient implementation of task-parallel pipeline algorithms than existing frameworks. We have evaluated Pipeflow on both micro-benchmarks and real-world applications. As an example, Pipeflow outperforms oneTBB 24% and 10% faster in a VLSI placement and a timing analysis workloads that adopt pipeline parallelism to speed up runtimes, respectively.

preprint2022arXiv

Towards Machine Learning for Placement and Routing in Chip Design: a Methodological Overview

Placement and routing are two indispensable and challenging (NP-hard) tasks in modern chip design flows. Compared with traditional solvers using heuristics or expert-well-designed algorithms, machine learning has shown promising prospects by its data-driven nature, which can be of less reliance on knowledge and priors, and potentially more scalable by its advanced computational paradigms (e.g. deep networks with GPU acceleration). This survey starts with the introduction of basics of placement and routing, with a brief description on classic learning-free solvers. Then we present detailed review on recent advance in machine learning for placement and routing. Finally we discuss the challenges and opportunities for future research.

Yibo Lin

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

ATSim3.5D: A Multiscale Thermal Simulator for 3.5D-IC Systems based on Nonlinear Multigrid Method

ATSim3D: Towards Accurate Thermal Simulator for Heterogeneous 3D-IC Systems Considering Nonlinear Leakage and Conductivity

DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams

GAP-LA: GPU-Accelerated Performance-Driven Layer Assignment

CircuitNet: An Open-Source Dataset for Machine Learning Applications in Electronic Design Automation (EDA)

Concurrent CPU-GPU Task Programming using Modern C++

LHNN: Lattice Hypergraph Neural Network for VLSI Congestion Prediction

Pipeflow: An Efficient Task-Parallel Pipeline Programming Framework using Modern C++

Towards Machine Learning for Placement and Routing in Chip Design: a Methodological Overview