Source author record

Bowen Zheng

Bowen Zheng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Computer Vision physics.optics Information Theory math.IT math.NA Numerical Analysis Robotics

Catalog footprint

What is connected

11works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

A New Construction Structure on MISO Coded Caching with Linear Subpacketization: Half-Sum Disjoint Packing

In the $(L,K,M,N)$ cache-aided multiple-input single-output (MISO) broadcast channel (BC) system, the server is equipped with $L$ antennas and communicates with $K$ single-antenna users through a wireless broadcast channel where the server has a library containing $N$ files, and each user is equipped with a cache of size $M$ files. Under the constraints of uncoded placement and one-shot linear delivery strategies, many schemes achieve the maximum sum Degree-of-Freedom (sum-DoF). However, for general parameters $L$, $M$, and $N$, their subpacketizations increase exponentially with the number of users. We aim to design a MISO coded caching scheme that achieves a large sum-DoF with low subpacketization $F$. An interesting combinatorial structure, called the multiple-antenna placement delivery array (MAPDA), can be used to generate MISO coded caching schemes under these two strategies; moreover, all existing schemes with these strategies can be represented by the corresponding MAPDAs. In this paper, we study the case with $F=K$ (i.e., $F$ grows linearly with $K$) by investigating MAPDAs. Specifically, based on the framework of Latin squares, we transform the design of MAPDA with $F=K$ into the construction of a combinatorial structure called the $L$-half-sum disjoint packing (HSDP). It is worth noting that a $1$-HSDP is exactly the concept of NHSDP, which is used to generate the shared-link coded caching scheme with $F=K$. By constructing $L$-HSDPs, we obtain a class of new schemes with $F=K$. Finally, theoretical and numerical analyses show that our $L$-HSDP schemes significantly reduce subpacketization compared to existing schemes with exponential subpacketization, while only slightly sacrificing sum-DoF, and achieve both a higher sum-DoF and lower subpacketization than the existing schemes with linear subpacketization.

preprint2026arXiv

Autoregressive Visual Generation Needs a Prologue

In this work, we propose Prologue, an approach to bridging the reconstruction-generation gap in autoregressive (AR) image generation. Instead of modifying visual tokens to satisfy both reconstruction and generation, Prologue generates a small set of prologue tokens prepended to the visual token sequence. These prologue tokens are trained exclusively with the AR cross-entropy (CE) loss, while visual tokens remain dedicated to reconstruction. This decoupled design lets us optimize generation through the AR model's true distribution without affecting reconstruction quality, which we further formalize from an ELBO perspective. On ImageNet 256x256, Prologue-Base reduces gFID from 21.01 to 10.75 without classifier-free guidance while keeping reconstruction almost unchanged; Prologue-Large reaches a competitive rFID of 0.99 and gFID of 1.46 using a standard AR model without auxiliary semantic supervision. Interestingly, driven only by AR gradients, prologue tokens exhibit emergent semantic structure: linear probing on 16 prologue tokens reaches 35.88% Top-1, far above the 23.71% of the first 16 tokens from a standard tokenizer; resampling with fixed prologue tokens preserves a similar high-level semantic layout. Our results suggest a new direction: generation quality can be improved by introducing a separate learned generative representation while leaving the original representation intact.

preprint2026arXiv

Learning Discrete Autoregressive Priors with Wasserstein Gradient Flow

Discrete image tokenizers are commonly trained in two stages: first for reconstruction, and then with a prior model fitted to the frozen token sequences. This decoupling leaves the tokenizer unaware of the model that will later generate its tokens. As a result, the learned tokens may preserve image information well but still be difficult for an autoregressive (AR) prior to predict from left to right. We analyze this mismatch using Tripartite Variational Consistency (TVC), which decomposes latent-variable learning into three consistency conditions: conditional-likelihood consistency, prior consistency, and posterior consistency. TVC shows that two-stage training preserves the reconstruction side but leaves prior consistency outside the tokenizer objective: the overall token distribution is fixed before the AR prior participates in training. Motivated by this view, we add a distribution-level prior-matching signal during tokenizer training, while keeping the reconstruction objective unchanged. We optimize this signal with a Wasserstein-gradient-flow update. For hard categorical tokens, the update reduces to a token-level contrast between an auxiliary AR model that tracks the tokenizer's current token distribution and the target AR prior. It requires only forward passes through the two AR models and does not backpropagate through either of them. The resulting tokenizer, wAR-Tok, reduces AR loss and improves generation FID on CIFAR-10 and ImageNet at comparable reconstruction quality.

preprint2026arXiv

Machine Learning-Augmented Acceleration of Iterative Ptychographic Reconstruction

Iterative ptychographic reconstruction algorithms are widely used for coherent diffractive imaging but can exhibit slow convergence under realistic experimental conditions. We propose a machine learning-augmented approach that accelerates iterative ptychographic reconstruction by introducing a learned fast-forward operator applied during reconstruction. Following an initial warm-up using standard iterations, the fast-forward operator advances the reconstruction toward a more converged state, after which conventional iterative updates are resumed. This strategy preserves the physical consistency and flexibility of established ptychographic solvers while reducing the number of iterations required for convergence. The model is trained on diverse ptychographic datasets and evaluated on experimental data acquired in a different year, demonstrating robustness and temporal generalization. Compared with conventional iterative solvers, the machine learning-augmented method achieves comparable reconstruction quality while converging faster in terms of Poisson negative log-likelihood, yielding over a two-fold reduction in wall-clock time. The approach has been integrated into an existing reconstruction pipeline and deployed in production at a synchrotron beamline, demonstrating practicality for real-time experimental operation.

preprint2026arXiv

Neural Green's Function Accelerated Iterative Methods for Solving Indefinite Boundary Value Problems

Neural operators, which learn mappings between the function spaces, have been applied to solve boundary value problems in various ways, including learning mappings from the space of the forcing terms to the space of the solutions with the substantial requirements of data pairs. In this work, we present a data-free neural operator integrated with physics, which learns the Green kernel directly. Our method proceeds in three steps: 1. The governing equations for the Green's function are reformulated into an interface problem, where the delta Dirac function is removed; 2. The interface problem is embedded in a lifted space of higher-dimension to handle the jump in the derivative, but still solved on a two-dimensional surface without additional sampling cost; 3. Deep neural networks are employed to address the curse of dimensionality caused by this lifting operation. The approximate Green's function obtained through our approach is then used to construct preconditioners for the linear systems allowed by its mathematical properties. Furthermore, the spectral bias of it revealed through both theoretical analysis and numerical validation contrasts with the smoothing effects of traditional iterative solvers, which motivates us to propose a hybrid iterative method that combines these two solvers. Numerical experiments demonstrate the effectiveness of our approximate Green's function in accelerating iterative methods, proving fast convergence for solving indefinite problems even involving discontinuous coefficients.

preprint2026arXiv

Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation

Most discrete visual tokenizers rely on a default design: every position in the sequence shares the same codebook. Researchers try to scale the codebook size $K$ to get better reconstruction performance. Such a constant-codebook design hits a fundamental information-theoretic limit. We observe that the per-position conditional entropy of the training set decays so quickly along the sequence that, after a few positions, the conditional distribution becomes essentially deterministic. On ImageNet with $K=16384$, this happens within only 2 out of 256 positions, turning the remaining 254 into a memorization problem. We call this phenomenon the Entropy Cliff and formalize it with a simple expression: $t^{*} = \lceil \log_2 N / \log_2 K \rceil$. Interestingly, this phenomenon is not observed in language, as its natural structure keeps the effective entropy per position well below the codebook capacity. To address this, we propose Variable Codebook Size Quantization (VCQ), where the codebook size $K_t$ grows monotonically along the sequence from $K_{\min}=2$ to $K_{\max}$, leaving the loss function, parameter count, and AR training procedure unchanged. With a vanilla autoregressive Transformer and standard next-token prediction, a base version of VCQ reduces gFID w/o CFG from 27.98 to 14.80 on ImageNet $256\times256$ over the baseline. Scaled up, it reaches gFID 1.71 with 684M autoregressive parameters, without any extra training techniques such as semantic regularization or causal alignment. The extreme information bottleneck at $K_{\min}=2$ naturally induces a coarse-to-fine semantic hierarchy: a linear probe on only the first 10 tokens reaches 43.8% top-1 accuracy on ImageNet, compared to 27.1% for uniform codebooks. Ultimately, these results show that what matters is not only the total capacity of the codebook, but also how that capacity is distributed and organized.

preprint2022arXiv

Physics-Aware Safety-Assured Design of Hierarchical Neural Network based Planner

Neural networks have shown great promises in planning, control, and general decision making for learning-enabled cyber-physical systems (LE-CPSs), especially in improving performance under complex scenarios. However, it is very challenging to formally analyze the behavior of neural network based planners for ensuring system safety, which significantly impedes their applications in safety-critical domains such as autonomous driving. In this work, we propose a hierarchical neural network based planner that analyzes the underlying physical scenarios of the system and learns a system-level behavior planning scheme with multiple scenario-specific motion-planning strategies. We then develop an efficient verification method that incorporates overapproximation of the system state reachable set and novel partition and union techniques for formally ensuring system safety under our physics-aware planner. With theoretical analysis, we show that considering the different physical scenarios and building a hierarchical planner based on such analysis may improve system safety and verifiability. We also empirically demonstrate the effectiveness of our approach and its advantage over other baselines in practical case studies of unprotected left turn and highway merging, two common challenging safety-critical tasks in autonomous driving.

preprint2021arXiv

Deep Convolutional Neural Networks to Predict Mutual Coupling Effects in Metasurfaces

Metasurfaces have provided a novel and promising platform for the realization of compact and large-scale optical devices. The conventional metasurface design approach assumes periodic boundary conditions for each element, which is inaccurate in most cases since the near-field coupling effects between elements will change when surrounded by non-identical structures. In this paper, we propose a deep learning approach to predict the actual electromagnetic (EM) responses of each target meta-atom placed in a large array with near-field coupling effects taken into account. The predicting neural network takes the physical specifications of the target meta-atom and its neighbors as input, and calculates its phase and amplitude in milliseconds. This approach can be applied to explain metasurfaces' performance deterioration caused by mutual coupling and further used to optimize their efficiencies once combined with optimization algorithms. To demonstrate the efficacy of this methodology, we obtain large improvements in efficiency for a beam deflector and a metalens over the conventional design approach. Moreover, we show the correlations between a metasurface's performance and its design errors caused by mutual coupling are not bound to certain specifications (materials, shapes, etc.). As such, we envision that this approach can be readily applied to explore the mutual coupling effects and improve the performance of various metasurface designs.

preprint2020arXiv

A Freeform Dielectric Metasurface Modeling Approach Based on Deep Neural Networks

Metasurfaces have shown promising potentials in shaping optical wavefronts while remaining compact compared to bulky geometric optics devices. Design of meta-atoms, the fundamental building blocks of metasurfaces, relies on trial-and-error method to achieve target electromagnetic responses. This process includes the characterization of an enormous amount of different meta-atom designs with different physical and geometric parameters, which normally demands huge computational resources. In this paper, a deep learning-based metasurface/meta-atom modeling approach is introduced to significantly reduce the characterization time while maintaining accuracy. Based on a convolutional neural network (CNN) structure, the proposed deep learning network is able to model meta-atoms with free-form 2D patterns and different lattice sizes, material refractive indexes and thicknesses. Moreover, the presented approach features the capability to predict meta-atoms' wide spectrum responses in the timescale of milliseconds, which makes it attractive for applications such as fast meta-atom/metasurface on-demand designs and optimizations.

preprint2020arXiv

A Topological Nomenclature for 3D Shape Analysis in Connectomics

One of the essential tasks in connectomics is the morphology analysis of neurons and organelles like mitochondria to shed light on their biological properties. However, these biological objects often have tangled parts or complex branching patterns, which make it hard to abstract, categorize, and manipulate their morphology. In this paper, we develop a novel topological nomenclature system to name these objects like the appellation for chemical compounds to promote neuroscience analysis based on their skeletal structures. We first convert the volumetric representation into the topology-preserving reduced graph to untangle the objects. Next, we develop nomenclature rules for pyramidal neurons and mitochondria from the reduced graph and finally learn the feature embedding for shape manipulation. In ablation studies, we quantitatively show that graphs generated by our proposed method align with the perception of experts. On 3D shape retrieval and decomposition tasks, we qualitatively demonstrate that the encoded topological nomenclature features achieve better results than state-of-the-art shape descriptors. To advance neuroscience, we will release a 3D segmentation dataset of mitochondria and pyramidal neurons reconstructed from a 100um cube electron microscopy volume with their reduced graph and topological nomenclature annotations. Code is publicly available at https://github.com/donglaiw/ibexHelper.

preprint2020arXiv

Multifunctional Metasurface Design with a Generative Adversarial Network

Metasurfaces have enabled precise electromagnetic wave manipulation with strong potential to obtain unprecedented functionalities and multifunctional behavior in flat optical devices. These advantages in precision and functionality come at the cost of tremendous difficulty in finding individual meta-atom structures based on specific requirements (commonly formulated in terms of electromagnetic responses), which makes the design of multifunctional metasurfaces a key challenge in this field. In this paper, we present a Generative Adversarial Networks (GAN) that can tackle this problem and generate meta-atom/metasurface designs to meet multifunctional design goals. Unlike conventional trial-and-error or iterative optimization design methods, this new methodology produces on-demand free-form structures involving only a single design iteration. More importantly, the network structure and the robust training process are independent of the complexity of design objectives, making this approach ideal for multifunctional device design. Additionally, the ability of the network to generate distinct classes of structures with similar electromagnetic responses but different physical features could provide added latitude to accommodate other considerations such as fabrication constraints and tolerances. We demonstrate the network's ability to produce a variety of multifunctional metasurface designs by presenting a bifocal metalens, a polarization-multiplexed beam deflector, a polarization-multiplexed metalens and a polarization-independent metalens.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint