Source author record

Xuan Yang

Xuan Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision Distributed, Parallel, and Cluster Computing astro-ph.HE Computation and Language eess.IV Hardware Architecture Neural and Evolutionary Computing physics.med-ph physics.soc-ph Social and Information Networks Software Engineering Tissues and Organs

Catalog footprint

What is connected

10works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Building Digital Twins of Different Human Organs for Personalized Healthcare

Digital twins are virtual replicas of physical entities and are poised to transform personalized medicine through the real-time simulation and prediction of human physiology. Translating this paradigm from engineering to biomedicine requires overcoming profound challenges, including anatomical variability, multi-scale biological processes, and the integration of multi-physics phenomena. This survey systematically reviews methodologies for building digital twins of human organs, structured around a pipeline decoupled into anatomical twinning (capturing patient-specific geometry and structure) and functional twinning (simulating multi-scale physiology from cellular to organ-level function). We categorize approaches both by organ-specific properties and by technical paradigm, with particular emphasis on multi-scale and multi-physics integration. A key focus is the role of artificial intelligence (AI), especially physics-informed AI, in enhancing model fidelity, scalability, and personalization. Furthermore, we discuss the critical challenges of clinical validation and translational pathways. This study not only charts a roadmap for overcoming current bottlenecks in single-organ twins but also outlines the promising, albeit ambitious, future of interconnected multi-organ digital twins for whole-body precision healthcare.

preprint2026arXiv

NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

Recent advancements in generative AI facilitate large-scale synthetic data generation for model evaluation. However, without targeted approaches, these datasets often lack the sociotechnical nuance required for sensitive domains. We introduce NodeSynth, an evidence-grounded methodology that generates socially relevant synthetic queries by leveraging a fine-tuned taxonomy generator (TaG) anchored in real-world evidence. Evaluated against four mainstream LLMs (e.g., Claude 4.5 Haiku), NodeSynth elicited failure rates up to five times higher than human-authored benchmarks. Ablation studies confirm that our granular taxonomic expansion significantly drives these failure rates, while independent validation reveals critical deficiencies in prominent guard models (e.g., Llama-Guard-3). We open-source our end-to-end research prototype and datasets to enable scalable, high-stakes model evaluation and targeted safety interventions (https://github.com/google-research/nodesynth).

preprint2022arXiv

On Label Granularity and Object Localization

Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation we introduce iNatLoc500, a new large-scale fine-grained benchmark dataset for WSOL. Surprisingly, we find that choosing the right training label granularity provides a much larger performance boost than choosing the best WSOL algorithm. We also show that changing the label granularity can significantly improve data efficiency.

preprint2022arXiv

When Does Contrastive Visual Representation Learning Work?

Recent self-supervised representation learning techniques have largely closed the gap between supervised and unsupervised learning on ImageNet classification. While the particulars of pretraining on ImageNet are now relatively well understood, the field still lacks widely accepted best practices for replicating this success on other datasets. As a first step in this direction, we study contrastive self-supervised learning on four diverse large-scale datasets. By looking through the lenses of data quantity, data domain, data quality, and task granularity, we provide new insights into the necessary conditions for successful self-supervised learning. Our key findings include observations such as: (i) the benefit of additional pretraining data beyond 500k images is modest, (ii) adding pretraining images from another domain does not lead to more general representations, (iii) corrupted pretraining images have a disparate impact on supervised and self-supervised pretraining, and (iv) contrastive learning lags far behind supervised learning on fine-grained visual classification tasks.

preprint2022arXiv

Who is next: rising star prediction via diffusion of user interest in social networks

Finding items with potential to increase sales is of great importance in online market. In this paper, we propose to study this novel and practical problem: rising star prediction. We call these potential items Rising Star, which implies their ability to rise from low-turnover items to best-sellers in the future. Rising stars can be used to help with unfair recommendation in e-commerce platform, balance supply and demand to benefit the retailers and allocate marketing resources rationally. Although the study of rising star can bring great benefits, it also poses challenges to us. The sales trend of rising star fluctuates sharply in the short-term and exhibits more contingency caused by some external events (e.g., COVID-19 caused increasing purchase of the face mask) than other items, which cannot be solved by existing sales prediction methods. To address above challenges, in this paper, we observe that the presence of rising stars is closely correlated with the early diffusion of user interest in social networks, which is validated in the case of Taocode (an intermediary that diffuses user interest in Taobao). Thus, we propose a novel framework, RiseNet, to incorporate the user interest diffusion process with the item dynamic features to effectively predict rising stars. Specifically, we adopt a coupled mechanism to capture the dynamic interplay between items and user interest, and a special designed GNN based framework to quantify user interest. Our experimental results on large-scale real-world datasets provided by Taobao demonstrate the effectiveness of our proposed framework.

preprint2020arXiv

Constraining Einstein's Equivalence Principle With Multi-Wavelength Observations of Polarized Blazars

In this paper, we present a novel method to test the Einstein's Equivalence Principle (EEP) using (simultaneous) multi-wavelength radio observations of polarized blazars. We analyze simultaneous multi-wavelength polarization observations of 3C 279 at 22, 43, and 86 GHz obtained by two antennas of the Korean VLBI Network. We obtained 15 groups of polarization data, and applied the Metropolis-Hastings Markov Chain (MHMC) to simulate the parameters when considering the EEP effect and the simplest form of Faraday rotation (single external Faraday screen). The final results show the constraint of the parameterized post-Newtonian (PPN) parameter $γ$ discrepancy as $Δγ_{p} = (1.91\pm0.34)\times10^{-20}$. However, the single external Faraday screen is an oversimplification for blazars because there are numerous observations show complex Faraday rotation behavior for blazars due to internal/external Faraday dispersion, beam depolarization, etc. The value $Δγ_{p}$ results of this paper can only be considered as upper limits. Only if all other effects are revealed and considered, should the result be taken as a direct measurement of the violation of the EEP.

preprint2020arXiv

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

We show that DNN accelerator micro-architectures and their program mappings represent specific choices of loop order and hardware parallelism for computing the seven nested loops of DNNs, which enables us to create a formal taxonomy of all existing dense DNN accelerators. Surprisingly, the loop transformations needed to create these hardware variants can be precisely and concisely represented by Halide's scheduling language. By modifying the Halide compiler to generate hardware, we create a system that can fairly compare these prior accelerators. As long as proper loop blocking schemes are used, and the hardware can support mapping replicated loops, many different hardware dataflows yield similar energy efficiency with good performance. This is because the loop blocking can ensure that most data references stay on-chip with good locality and the processing units have high resource utilization. How resources are allocated, especially in the memory system, has a large impact on energy and performance. By optimizing hardware resource allocation while keeping throughput constant, we achieve up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.

preprint2016arXiv

A Systematic Approach to Blocking Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are the state of the art solution for many computer vision problems, and many researchers have explored optimized implementations. Most implementations heuristically block the computation to deal with the large data sizes and high data reuse of CNNs. This paper explores how to block CNN computations for memory locality by creating an analytical model for CNN-like loop nests. Using this model we automatically derive optimized blockings for common networks that improve the energy efficiency of custom hardware implementations by up to an order of magnitude. Compared to traditional CNN CPU implementations based on highly-tuned, hand-optimized BLAS libraries,our x86 programs implementing the optimal blocking reduce the number of memory accesses by up to 90%.

preprint2016arXiv

FPMax: a 106GFLOPS/W at 217GFLOPS/mm2 Single-Precision FPU, and a 43.7GFLOPS/W at 74.6GFLOPS/mm2 Double-Precision FPU, in 28nm UTBB FDSOI

FPMax implements four FPUs optimized for latency or throughput workloads in two precisions, fabricated in 28nm UTBB FDSOI. Each unit's parameters, e.g pipeline stages, booth encoding etc., were optimized to yield 1.42ns latency at 110GLOPS/W (SP) and 1.39ns latency at 36GFLOPS/W (DP). At 100% activity, body-bias control improves the energy efficiency by about 20%; at 10% activity this saving is almost 2x. Keywords: FPU, energy efficiency, hardware generator, SOI

preprint2016arXiv

Programming Heterogeneous Systems from an Image Processing DSL

Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating, "programming,"and integrating this hardware into a hardware/software system is difficult. We address this problem by extending the image processing language, Halide, so users can specify which portions of their applications should become hardware accelerators, and then we provide a compiler that uses this code to automatically create the accelerator along with the "glue" code needed for the user's application to access this hardware. Starting with Halide not only provides a very high-level functional description of the hardware, but also allows our compiler to generate the complete software program including the sequential part of the workload, which accesses the hardware for acceleration. Our system also provides high-level semantics to explore different mappings of applications to a heterogeneous system, with the added flexibility of being able to map at various throughput rates. We demonstrate our approach by mapping applications to a Xilinx Zynq system. Using its FPGA with two low-power ARM cores, our design achieves up to 6x higher performance and 8x lower energy compared to the quad-core ARM CPU on an NVIDIA Tegra K1, and 3.5x higher performance with 12x lower energy compared to the K1's 192-core GPU.

Xuan Yang

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

Building Digital Twins of Different Human Organs for Personalized Healthcare

NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

On Label Granularity and Object Localization

When Does Contrastive Visual Representation Learning Work?

Who is next: rising star prediction via diffusion of user interest in social networks

Constraining Einstein's Equivalence Principle With Multi-Wavelength Observations of Polarized Blazars

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

A Systematic Approach to Blocking Convolutional Neural Networks

FPMax: a 106GFLOPS/W at 217GFLOPS/mm2 Single-Precision FPU, and a 43.7GFLOPS/W at 74.6GFLOPS/mm2 Double-Precision FPU, in 28nm UTBB FDSOI

Programming Heterogeneous Systems from an Image Processing DSL