Source author record

Boyang Zhang

Boyang Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Machine Learning, physics.optics, Artificial Intelligence, quant-ph, Hardware Architecture, math.OC, Methodology, physics.med-ph

Catalog footprint

10 works, 8 topics, 4 close collaborators

preprint, 2026, arXiv

A Qualitative Test-Risk Mechanism for Scaling Behavior in Normalized Residual Networks

The scaling behavior, in which test performance often improves as model size and data increase, is a central empirical phenomenon in modern deep learning, yet its theoretical basis remains incomplete. In this paper, we study depth expansion in normalized residual networks: starting from a trained model in an old hypothesis class, we insert a new residual block at an intermediate layer and ask when such an expansion can yield a provable improvement in test risk. We develop a unified framework that decomposes this question into representational gain, optimization gain, and generalization transfer. First, under a first-order descent condition near zero initialization, we prove that the expanded hypothesis class contains an auxiliary jumpboard model with strictly smaller population risk than the original model. Second, under norm control tailored to post-normalized residual architectures, we establish a norm-based Rademacher complexity bound for the expanded model class. These ingredients lead to two complementary test-risk guarantees: one route passes through population risk and is tighter when a positive population margin is available, while the other works directly at the train/test level, avoids Hoeffding transfer, and is more robust in degenerate regimes. Together, these results provide a theorem-driven mechanism under which residual depth expansion can improve test performance in normalized residual networks. More broadly, they suggest that scaling is inherently joint: depth creates new improving directions, width enhances the finite-sample observability of weak signals, and data determines whether the statistical cost of expansion can be controlled.

preprint, 2026, arXiv

MoE-DisCo: Low Economy Cost Training Mixture-of-Experts Models

Training large-scale Mixture-of-Experts (MoE) models typically requires high-memory, high-bandwidth GPUs (e.g., A100), and their high cost has become a major barrier to large-model training. In contrast, affordable hardware is low-cost but constrained by memory capacity and bandwidth, making it unsuitable for direct LLM training. To address this, we propose MoE-DisCo (Mixture-of-Experts with Disentangled Clustering and Coordination), a staged training framework. MoE-DisCo decomposes the MoE model into multiple dense submodels, each consisting of a shared backbone and a single expert, and partitions the training data into subsets using unsupervised clustering. Each submodel is trained independently and in parallel on its assigned data subset using low-cost devices, without any inter-device communication. Subsequently, all experts are integrated into a complete MoE model and fine-tuned globally for a short period on high-memory, high-bandwidth GPUs. Experiments show that our method matches or even surpasses full-parameter training in performance across multiple downstream tasks, loss function, and perplexity (PPL), while reducing training cost by 47.6 percent to 69.5 percent on Qwen1.5-MoE-2.7B and Llama-MoE-3.5B across different datasets.

preprint, 2026, arXiv

Proximal-Based Generative Modeling for Bayesian Inverse Problems

Score-based diffusion models demonstrate superior performance in generative tasks but encounter fundamental bottlenecks in inverse problems due to the analytical intractability of the time-dependent likelihood score. To bridge this gap, we propose a novel proximal-based generative modeling (PGM) framework that rigorously circumvents explicit likelihood evaluation. Our framework is built upon a theoretical equivalence between Gaussian convolution in diffusion processes and Moreau-Yosida regularization in nonsmooth optimization. This enables a new sampling mechanism driven by the proposed Moreau score, which admits a closed-form expression via proximal operators. Moreover, we introduce Moreau score matching to learn the proximal operators that rely solely on samples drawn from the prior distribution. Theoretically, PGM eliminates the early-stopping bias inherent in the score-based diffusion model and achieves non-asymptotic convergence. Experiments demonstrate that PGM significantly surpasses state-of-the-art methods in reconstruction quality and sampling time.
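The Moreau-Yosida regularization mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's PGM sampler or its Moreau score; it only shows the standard identity that the gradient of the Moreau envelope is available in closed form through the proximal operator, here for the assumed example function f(x) = |x|. The function names and the choice of f are illustrative assumptions.

```python
import numpy as np

def prox_abs(x, lam):
    """Proximal operator of f(x) = |x| with parameter lam (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_envelope_abs(x, lam):
    """Moreau envelope of |x|: min over y of |y| + (x - y)^2 / (2 * lam)."""
    p = prox_abs(x, lam)  # the minimizer y is the proximal point
    return np.abs(p) + (x - p) ** 2 / (2.0 * lam)

def moreau_envelope_grad(x, lam):
    """Gradient of the envelope via the identity (x - prox(x)) / lam."""
    return (x - prox_abs(x, lam)) / lam

x = np.array([-2.0, -0.3, 0.0, 0.4, 3.0])
lam = 0.5
print(moreau_envelope_abs(x, lam))
print(moreau_envelope_grad(x, lam))
```

For this choice of f the envelope is the Huber function, so the nonsmooth |x| is replaced by a smooth surrogate whose gradient needs only a proximal evaluation.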

preprint, 2024, arXiv

Contrastive linear regression

Contrastive dimension reduction methods have been developed for case-control study data to identify variation that is enriched in the foreground (case) data X relative to the background (control) data Y. Here, we develop contrastive regression for the setting when there is a response variable r associated with each foreground observation. This situation occurs frequently when, for example, the unaffected controls do not have a disease grade or intervention dosage but the affected cases have a disease grade or intervention dosage, as in autism severity, solid tumor stages, polyp sizes, or warfarin dosages. Our contrastive regression model captures shared low-dimensional variation between the predictors in the case and control groups, and then explains the case-specific response variables through the variance that remains in the predictors after shared variation is removed. We show that, in one single-nucleus RNA sequencing dataset on autism severity in postmortem brain samples from donors with and without autism and in another single-cell RNA sequencing dataset on cellular differentiation in chronic rhinosinusitis with and without nasal polyps, our contrastive linear regression performs feature ranking and identifies biologically informative predictors associated with response that cannot be identified using other approaches.

preprint, 2016, arXiv

Quantum coherence of steered states

Lying at the heart of quantum mechanics, coherence has recently been studied as a key resource in quantum information theory. Quantum steering, a fundamental notion originally considered by Schrödinger, has also recently received much attention. When Alice and Bob share a correlated quantum system, Alice can perform a local measurement to 'steer' Bob's reduced state. We introduce the maximal steered coherence as a measure describing the extent to which steering can remotely create coherence; more precisely, we find the maximal coherence of Bob's steered state in the eigenbasis of his original reduced state, where maximization is performed over all positive-operator valued measurements for Alice. We prove that maximal steered coherence vanishes for quantum-classical states whilst reaching a maximum for pure entangled states with full Schmidt rank. Although invariant under local unitary operations, maximal steered coherence may be increased when Bob performs a channel. For a two-qubit state we find that Bob's channel can increase maximal steered coherence if and only if it is neither unital nor semi-classical, which coincides with the condition for increasing discord. Our results show that the power of steering for coherence generation, though related to discord, is distinct from existing measures of quantum correlation.

preprint, 2014, arXiv

Layout decomposition for triple patterning lithography

As minimum feature size and pitch spacing further decrease, triple patterning lithography (TPL) is a possible 193nm extension along the paradigm of double patterning lithography (DPL). However, there is very little study on TPL layout decomposition. In this paper, we show that TPL layout decomposition is a more difficult problem than that for DPL. We then propose a general integer linear programming (ILP) formulation for TPL layout decomposition which can simultaneously minimize conflict and stitch numbers. Since ILP has very poor scalability, we propose three acceleration techniques without sacrificing solution quality: independent component computation, layout graph simplification, and bridge computation. For very dense layouts, even with these speedup techniques, the ILP formulation may still be too slow. Therefore, we propose a novel vector programming formulation for TPL decomposition, and solve it through effective semidefinite programming (SDP) approximation. Experimental results show that the ILP with acceleration techniques can reduce runtime by 82% compared to the baseline ILP. Using the SDP-based algorithm, the runtime can be further reduced by 42% with some tradeoff in the stitch number (reduced by 7%) and the conflict (9% more). However, for very dense layouts, the SDP-based algorithm can achieve 140x speed-up even compared with the accelerated ILP.
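The conflict-minimization objective in this abstract can be sketched on a toy example. The sketch below is an assumption-laden illustration, not the paper's ILP or SDP formulation: it models the layout as a conflict graph, assigns each feature to one of the masks, and counts conflicts as edges whose endpoints share a mask. Exhaustive search stands in for the solver, stitches are ignored, and the function name and graph are hypothetical.

```python
from itertools import product

def min_conflict_assignment(num_features, conflict_edges, num_masks=3):
    """Exhaustively search mask assignments and return (fewest conflicts, assignment).

    A conflict is an edge whose two features land on the same mask.
    Brute force is only feasible for tiny graphs; it replaces the ILP here.
    """
    best_count, best_assignment = None, None
    for assignment in product(range(num_masks), repeat=num_features):
        count = sum(1 for u, v in conflict_edges if assignment[u] == assignment[v])
        if best_count is None or count < best_count:
            best_count, best_assignment = count, assignment
    return best_count, best_assignment

# Four mutually conflicting features (a complete graph on 4 nodes).
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(min_conflict_assignment(4, k4_edges, num_masks=3))  # triple patterning
print(min_conflict_assignment(4, k4_edges, num_masks=2))  # double patterning
```

The search space grows as the number of masks raised to the number of features, which is one way to see why the abstract turns to acceleration techniques and an SDP approximation for dense layouts.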

preprint, 2013, arXiv

Multi-spectral near perfect metamaterial absorbers using spatially multiplexed plasmon resonance metal square structures

Near perfect infrared light absorption at multi-spectral wavelengths has been experimentally demonstrated by using multiplexed metal square plasmon resonance structures. Optical power absorption over 95% has been observed in dual-band metamaterial absorbers at two separate wavelengths and optical power absorption over 92.5% has been observed in triple-band metamaterial absorbers at three separate wavelengths. The peak absorption wavelengths are primarily determined by the sizes of the metal squares in the multiplexed structures. Electrical field distributions in the middle of the dielectric spacer layer were calculated at the peak absorption wavelengths. It is shown that strong light absorption corresponds to the local quadrupole plasmon resonance modes in the metamaterial structures.

preprint, 2012, arXiv

A wide-band perfect light absorber at mid-wave infrared using multiplexed metal structures

We experimentally demonstrate a wide band near perfect light absorber in the mid-wave infrared region using multiplexed plasmonic metal structures. The wide band near perfect light absorber is made of two different size gold metal squares multiplexed on a thin dielectric spacing layer on the top of a thick metal layer in each unit cell. We also fabricate regular non-multiplexed structure perfect light absorbers. The multiplexed structure IR absorber absorbs above 98% incident light over a much wider spectral band than the regular non-multiplexed structure perfect light absorbers in the mid-wave IR region.

preprint, 2012, arXiv

Long Lived NMR Signal in Bone

Solids and rigid tissues such as bone, ligaments, and tendons, typically appear dark in magnetic resonance imaging (MRI), which is due to the extremely short-lived proton nuclear magnetic resonance (NMR) signals. This short lifetime is due to strong dipolar interactions between immobilized proton spins, which render it challenging to detect these signals with sufficient resolution and sensitivity. Here we show the possibility of exciting long-lived signals in cortical bone tissue with a signature consistent with that of bound water signals. Contrary to long-standing belief, it is further shown that dipolar coupling networks are an integral requirement for the excitation of these long-lived signals. The use of these signals could enhance the ability to visualize rigid tissues and solid samples with high sensitivity, resolution, and specificity via MRI.

preprint, 2012, arXiv

Wideband Optical Filters with Small Gap Coupled Subwavelength Metal Structures

In this letter, we show that the bandwidth of optical band-stop filters made of subwavelength metal structures can be significantly increased by the strong plasmonic near-field coupling through the corners of the periodic metal squares. The effect of small gap coupling on the spectral bandwidth is investigated by varying the gap size between the metal squares. An equivalent transmission line model is used to fit the transmission and reflection spectra of the metal filters. The transmission line model can characterize well the metal structures with the gap size larger than the near-field decay length. However, it fails to model the transmission and reflection spectra when the gap size reaches the decay range of the near-field in the small gaps.
