Source author record

Ru Huang

Ru Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; physics.app-ph; Distributed, Parallel, and Cluster Computing; Hardware Architecture; Social and Information Networks

Catalog footprint

What is connected

6 works

5 topics

4 close collaborators

preprint · 2026 · arXiv

ATSim3.5D: A Multiscale Thermal Simulator for 3.5D-IC Systems based on Nonlinear Multigrid Method

To resolve the rising temperatures in 3.5D-ICs, a thermal-aware design flow becomes increasingly crucial, necessitating an accurate and efficient thermal simulation tool. However, previous tools struggle to handle the unique heterogeneous multiscale structures in 3.5D-ICs and the nonlinear thermal effects caused by high temperatures. In this work, we present a multiscale thermal simulator for 3.5D-ICs. We propose a hybrid tree structure to generate multilevel grids and capture the multiscale features and employ the nonlinear multigrid method for quick solving. Compared to ANSYS Icepak, it exhibits high accuracy (mean absolute relative error <1%, max error <2 °C) and efficiency (80× acceleration), delivering a powerful means to evaluate and refine thermal designs.

preprint · 2026 · arXiv

ATSim3D: Towards Accurate Thermal Simulator for Heterogeneous 3D-IC Systems Considering Nonlinear Leakage and Conductivity

Thermal simulation plays a fundamental role in the thermal design of integrated circuits, especially 3D ICs. Current simulators require significant runtime for high-resolution simulation and neglect complex nonlinear thermal effects, such as nonlinear thermal conductivity and leakage power. To address these issues, we propose ATSim3D, a thermal simulator for simulating the steady-state temperature profile of nonlinear and heterogeneous 3D IC systems. We utilize the global-local approach, combining a compact thermal model at the global level and a finite volume method at the local level. We tackle the nonlinear effects with Kirchhoff transformation and iteration. ATSim3D enables local-level parallelization that helps achieve an average speedup of 40x compared to COMSOL, with a relative error <3% and a state-of-the-art resolution of 4096 x 4096, holding promise for enhancing thermal-aware design in 3D ICs.
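The Kirchhoff transformation mentioned in this abstract can be illustrated with a minimal sketch. It is not ATSim3D's implementation: the linear conductivity model k(T) = k0 * (1 + beta * (T - T0)), the 1D rod with fixed end temperatures, and all function names are assumptions made for illustration. The transformed variable theta(T) = T0 + (1/k0) * integral from T0 to T of k(s) ds turns the nonlinear steady-state conduction equation into a linear one in theta, which for a source-free rod has a straight-line solution.

```python
import math

# Assumed conductivity model (illustrative only): k(T) = k0 * (1 + beta * (T - T0)).
# k0 cancels in the source-free rod below, so it does not appear as a parameter.

def kirchhoff(T, T0, beta):
    """Map temperature T to the Kirchhoff variable theta for the assumed k(T)."""
    d = T - T0
    return T0 + d + 0.5 * beta * d * d

def inverse_kirchhoff(theta, T0, beta):
    """Map theta back to temperature (valid while k(T) stays positive)."""
    if beta == 0.0:
        return theta
    return T0 + (math.sqrt(1.0 + 2.0 * beta * (theta - T0)) - 1.0) / beta

def solve_rod(T_left, T_right, n, T0=300.0, beta=-1e-3):
    """Steady-state temperatures at n evenly spaced points of a source-free rod.

    In theta the problem is linear, so theta is interpolated linearly between
    the transformed boundary values and then mapped back to temperature.
    """
    th_l = kirchhoff(T_left, T0, beta)
    th_r = kirchhoff(T_right, T0, beta)
    return [
        inverse_kirchhoff(th_l + (th_r - th_l) * i / (n - 1), T0, beta)
        for i in range(n)
    ]

profile = solve_rod(400.0, 300.0, 101)
```

With a negative beta (conductivity falling as temperature rises), the recovered profile is no longer a straight line in T, while the heat flux k(T) dT/dx stays constant along the rod. Temperature-dependent sources such as leakage power are not covered by this sketch and would still need an iteration of the kind the abstract mentions.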

preprint · 2022 · arXiv

CircuitNet: An Open-Source Dataset for Machine Learning Applications in Electronic Design Automation (EDA)

The electronic design automation (EDA) community has been actively exploring machine learning (ML) for very large-scale integrated computer-aided design (VLSI CAD). Many studies explored learning-based techniques for cross-stage prediction tasks in the design flow to achieve faster design convergence. Although building ML models usually requires a large amount of data, most studies can only generate small internal datasets for validation because of the lack of large public datasets. In this paper, we present the first open-source dataset called CircuitNet for ML tasks in VLSI CAD.

preprint · 2021 · arXiv

Generating a Doppelganger Graph: Resembling but Distinct

Deep generative models, since their inception, have become increasingly capable of generating novel and perceptually realistic signals (e.g., images and sound waves). With the emergence of deep models for graph-structured data, there is natural interest in extending these generative models to graphs. Successful extensions were seen recently in the case of learning from a collection of graphs (e.g., protein data banks), but learning from a single graph has been largely underexplored. The latter case, however, is important in practice. For example, graphs in financial and healthcare systems contain so much confidential information that their public accessibility is nearly impossible, but open science in these fields can only advance when similar data are available for benchmarking. In this work, we propose an approach to generating a doppelganger graph that resembles a given one in many graph properties but nonetheless can hardly be used to reverse engineer the original one, in the sense of a near zero edge overlap. The approach is an orchestration of graph representation learning, generative adversarial networks, and graph realization algorithms. Through comparison with several graph generative models (either parameterized by neural networks or not), we demonstrate that our result barely reproduces the given graph but closely matches its properties. We further show that downstream tasks, such as node classification, on the generated graphs reach similar performance to the use of the original ones.
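The two criteria named in this abstract, matching graph properties and near zero edge overlap, can be made concrete with a small sketch. The definitions below are assumptions for illustration (overlap as the fraction of original edges that reappear, and the degree sequence as the one property compared); the paper's exact metrics may differ, and the toy graphs are hypothetical.

```python
def normalize(edges):
    """Treat edges as undirected: store each one as a sorted tuple in a set."""
    return {tuple(sorted(e)) for e in edges}

def edge_overlap(original, generated):
    """Fraction of the original graph's edges that also occur in the generated graph."""
    orig = normalize(original)
    if not orig:
        raise ValueError("original graph has no edges")
    return len(orig & normalize(generated)) / len(orig)

def degree_sequence(edges):
    """Node degrees sorted in descending order."""
    degree = {}
    for u, v in normalize(edges):
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sorted(degree.values(), reverse=True)

# Toy example: two different 6-cycles on the same six nodes.
original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
doppelganger = [(0, 2), (2, 5), (5, 3), (3, 1), (1, 4), (4, 0)]

overlap = edge_overlap(original, doppelganger)
same_degrees = degree_sequence(original) == degree_sequence(doppelganger)
```

Here the second cycle shares no edge with the first yet has the same degree sequence, which is the kind of outcome the abstract describes: similar properties without reproducing the original edges.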

preprint · 2020 · arXiv

DaSGD: Squeezing SGD Parallelization Performance in Distributed Training Using Delayed Averaging

State-of-the-art deep learning algorithms rely on distributed training systems to tackle the increasing sizes of models and training data sets. The minibatch stochastic gradient descent (SGD) algorithm requires workers to halt forward/back propagations, to wait for gradients aggregated from all workers, and to receive weight updates before the next batch of tasks. This synchronous execution model exposes the overheads of gradient/weight communication among a large number of workers in a distributed training system. We propose a new SGD algorithm, DaSGD (Local SGD with Delayed Averaging), which parallelizes SGD and forward/back propagations to hide 100% of the communication overhead. By adjusting the gradient update scheme, this algorithm uses hardware resources more efficiently and reduces the reliance on low-latency and high-throughput interconnects. The theoretical analysis and the experimental results show that its convergence rate is O(1/sqrt(K)), the same as SGD. The performance evaluation demonstrates that it enables a linear performance scale-up with the cluster size.
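The idea of local SGD with delayed averaging can be sketched in a single-process simulation. This is not the DaSGD algorithm as specified in the paper: the correction rule (replace a worker's stale snapshot with the snapshot average while keeping the local progress made in the meantime), the scalar quadratic objectives, and every name and parameter below are assumptions for illustration only.

```python
def delayed_averaging(targets, lr=0.1, local_steps=4, delay=2, rounds=50):
    """Simulate workers that keep training while an average is 'in flight'.

    Worker i minimizes f_i(w) = 0.5 * (w - targets[i]) ** 2 with plain
    gradient steps. Every `local_steps` steps the workers snapshot their
    weights and start averaging; the result arrives `delay` steps later.
    """
    if not 0 <= delay < local_steps:
        raise ValueError("delay must satisfy 0 <= delay < local_steps")
    workers = [0.0 for _ in targets]
    pending = None  # (snapshots, snapshot average, step at which it arrives)
    for step in range(1, rounds * local_steps + 1):
        # Local gradient step; no communication is needed here.
        workers = [w - lr * (w - c) for w, c in zip(workers, targets)]
        if step % local_steps == 0:
            snaps = list(workers)
            pending = (snaps, sum(snaps) / len(snaps), step + delay)
        if pending is not None and step == pending[2]:
            snaps, avg, _ = pending
            # Assumed correction: swap the stale snapshot for the average,
            # keeping the updates made while the average was in flight.
            workers = [avg + (w - s) for w, s in zip(workers, snaps)]
            pending = None
    return workers

final = delayed_averaging([1.0, 2.0, 6.0])
mean_weight = sum(final) / len(final)
```

In this toy setting the mean of the workers' weights approaches the minimizer of the average objective (3.0 for the targets above), and setting `delay=0` reduces the scheme to ordinary local SGD with immediate averaging. The sketch only models when the average is applied; it says nothing about the paper's convergence analysis or measured speedups.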

preprint · 2014 · arXiv

Self-Aligned Double Patterning Friendly Configuration for Standard Cell Library Considering Placement

Self-aligned double patterning (SADP) has become a promising technique to push the pattern resolution limit to the sub-22nm technology node. Although SADP provides good overlay controllability, it encounters many challenges in physical design stages to obtain conflict-free layout decomposition. In this paper, we study the impact on placement of different standard cell layout decomposition strategies. We propose an SADP-friendly standard cell configuration which provides pre-coloring results for standard cells. These configurations are brought into the placement stage to help ensure layout decomposability and save the extra effort of solving conflicts in later stages.