Source author record

Yiping Lu

Yiping Lu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning math.NA quant-ph Computer Vision math.PR Artificial Intelligence Computation math.OC Numerical Analysis q-fin.MF

Catalog footprint

What is connected

9 works

10 topics

4 close collaborators

preprint 2026 arXiv

Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures

Diffusion-based generative models increasingly rely on inference-time guidance, adding a drift term or reweighting mixture of experts, to improve sample quality on task-specific objectives. However, most existing techniques require repeated score or gradient evaluations, introducing bias, high computational overhead, or both. We introduce \texttt{URGE}, Unbiased Resampling via Girsanov Estimation, a derivative-free inference-time scaling algorithm that performs path-wise importance reweighting via a Girsanov change of measure. Instead of computing gradient-based particle weights as in previous work, \texttt{URGE} attaches a simple multiplicative weight to each simulated trajectory and periodically resamples. No score, no Hessian, and no PDE evaluation is required. We establish an equivalence between path-wise and particle-wise SMC: the Girsanov path weight admits a backward conditional expectation that recovers the previous particle-level weights, guaranteeing that both schemes produce the same unbiased terminal law. Empirically, \texttt{URGE} outperforms existing inference-time guidance baselines on synthetic tests and diffusion-model benchmarks, achieving better generation quality, while being significantly simpler to implement and fully gradient-free.
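The weight-and-resample loop described in this abstract can be illustrated with a generic sequential Monte Carlo sketch. This is not the URGE algorithm: the Gaussian random-walk reference dynamics, the function names, and the user-supplied `log_increment` (standing in for a per-step log-weight such as a Girsanov term) are all assumptions made for illustration, and multinomial resampling is one of several possible resampling schemes.

```python
import math
import random


def normalize(log_weights):
    """Convert log-weights to normalized weights (max-shifted for stability)."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [x / total for x in w]


def resample(paths, log_weights, rng):
    """Multinomial resampling: draw whole paths in proportion to their weights,
    then reset every log-weight to 0 (uniform weights)."""
    weights = normalize(log_weights)
    chosen = rng.choices(paths, weights=weights, k=len(paths))
    return [list(p) for p in chosen], [0.0] * len(paths)


def run_smc(n_particles, n_steps, log_increment, resample_every, seed=0):
    """Simulate paths under a toy reference process, accumulate a multiplicative
    weight per path (in log space), and resample every `resample_every` steps."""
    rng = random.Random(seed)
    paths = [[0.0] for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    for step in range(1, n_steps + 1):
        for i, path in enumerate(paths):
            x_next = path[-1] + rng.gauss(0.0, 1.0)  # hypothetical reference step
            log_w[i] += log_increment(path[-1], x_next)  # no gradients needed
            path.append(x_next)
        if step % resample_every == 0:
            paths, log_w = resample(paths, log_w, rng)
    return paths, normalize(log_w)
```

For example, `run_smc(50, 20, lambda a, b: -0.1 * b * b, resample_every=5)` returns 50 paths of 21 points each together with their normalized weights; the quadratic `log_increment` is a made-up reward that favors paths staying near zero.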

preprint 2026 arXiv

SURGE: Approximation-free Training Free Particle Filter for Diffusion Surrogate

Diffusion-based generative models increasingly rely on inference-time guidance, adding a drift term or reweighting mixture of experts, to improve sample quality on task-specific objectives. However, most existing techniques require repeated score or gradient evaluations, introducing bias, high computational overhead, or both. We introduce \texttt{URGE}, Unbiased Resampling via Girsanov Estimation, a derivative-free inference-time scaling algorithm that performs path-wise importance reweighting via a Girsanov change of measure. Instead of computing gradient-based particle weights in previous work, \texttt{URGE} attaches a simple multiplicative weight to each simulated trajectory and periodically resamples. No score, no Hessian, and no PDE evaluation is required. We establish an equivalence between path-wise and particle-wise SMC: the Girsanov path weight admits a backward conditional expectation that recovers the previous particle-level weights, guaranteeing that both schemes produce the same unbiased terminal law. Empirically, \texttt{URGE} outperforms existing inference-time guidance baselines on synthetic tests and diffusion-model benchmarks, achieving better generation quality, while being significantly simpler to implement and fully gradient-free.

preprint 2022 arXiv

Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks

Adversarial examples, which are usually generated for specific inputs with a specific model, are ubiquitous for neural networks. In this paper we unveil a surprising property of adversarial noises when they are put together, i.e., adversarial noises crafted by one-step gradient methods are linearly separable if equipped with the corresponding labels. We theoretically prove this property for a two-layer network with randomly initialized entries and the neural tangent kernel setup where the parameters are not far from initialization. The proof idea is to show the label information can be efficiently backpropagated to the input while keeping the linear separability. Our theory and experimental evidence further show that the linear classifier trained with the adversarial noises of the training data can well classify the adversarial noises of the test data, indicating that adversarial noises actually inject a distributional perturbation to the original data distribution. Furthermore, we empirically demonstrate that the adversarial noises may become less linearly separable when the above conditions are compromised while they are still much easier to classify than original features.

preprint 2022 arXiv

An Unconstrained Layer-Peeled Perspective on Neural Collapse

Neural collapse is a highly symmetric geometric pattern of neural networks that emerges during the terminal phase of training, with profound implications on the generalization performance and robustness of the trained networks. To understand how the last-layer features and classifiers exhibit this recently discovered implicit bias, in this paper, we introduce a surrogate model called the unconstrained layer-peeled model (ULPM). We prove that gradient flow on this model converges to critical points of a minimum-norm separation problem exhibiting neural collapse in its global minimizer. Moreover, we show that the ULPM with the cross-entropy loss has a benign global landscape for its loss function, which allows us to prove that all the critical points are strict saddle points except the global minimizers that exhibit the neural collapse phenomenon. Empirically, we show that our results also hold during the training of neural networks in real-world tasks when explicit regularization or weight decay is not used.

preprint 2020 arXiv

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth

Training deep neural networks with stochastic gradient descent (SGD) can often achieve zero training loss on real-world tasks although the optimization landscape is known to be highly non-convex. To understand the success of SGD for training deep neural networks, this work presents a mean-field analysis of deep residual networks, based on a line of works that interpret the continuum limit of the deep residual network as an ordinary differential equation when the network capacity tends to infinity. Specifically, we propose a new continuum limit of deep residual networks, which enjoys a good landscape in the sense that every local minimizer is global. This characterization enables us to derive the first global convergence result for multilayer neural networks in the mean-field regime. Furthermore, without assuming the convexity of the loss landscape, our proof relies on a zero-loss assumption at the global minimizer that can be achieved when the model shares a universal approximation property. Key to our result is the observation that a deep residual network resembles a shallow network ensemble, i.e. a two-layer network. We bound the difference between the shallow network and our ResNet model via the adjoint sensitivity method, which enables us to apply existing mean-field analyses of two-layer networks to deep networks. Furthermore, we propose several novel training schemes based on the new continuous model, including one training procedure that switches the order of the residual blocks and results in strong empirical performance on the benchmark datasets.

preprint 2020 arXiv

Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations

In our work, we bridge deep neural network design with numerical differential equations. We show that many effective networks, such as ResNet, PolyNet, FractalNet and RevNet, can be interpreted as different numerical discretizations of differential equations. This finding brings us a brand new perspective on the design of effective deep architectures. We can take advantage of the rich knowledge in numerical analysis to guide us in designing new and potentially more effective deep networks. As an example, we propose a linear multi-step architecture (LM-architecture) which is inspired by the linear multi-step method for solving ordinary differential equations. The LM-architecture is an effective structure that can be used on any ResNet-like network. In particular, we demonstrate that LM-ResNet and LM-ResNeXt (i.e. the networks obtained by applying the LM-architecture on ResNet and ResNeXt respectively) can achieve noticeably higher accuracy than ResNet and ResNeXt on both CIFAR and ImageNet with comparable numbers of trainable parameters. In particular, on both CIFAR and ImageNet, LM-ResNet/LM-ResNeXt can significantly compress ($>50$\%) the original networks while maintaining a similar performance. This can be explained mathematically using the concept of modified equation from numerical analysis. Last but not least, we also establish a connection between stochastic control and noise injection in the training process which helps to improve generalization of the networks. Furthermore, by relating stochastic training strategy with stochastic dynamic system, we can easily apply stochastic training to the networks with the LM-architecture. As an example, we introduce stochastic depth to LM-ResNet and achieve significant improvement over the original LM-ResNet on CIFAR10.
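The contrast between a residual step and a linear multi-step step can be made concrete with a small sketch. The abstract does not state the update rule, so the two-step form used here, x_{n+1} = (1 - k_n) x_n + k_n x_{n-1} + f_n(x_n), is an assumption, as are the function name `lm_forward`, the scalar states, and the choice to start with x_{-1} = x_0 so that the first block reduces to a plain residual step.

```python
def lm_forward(x0, blocks, ks):
    """Apply a stack of blocks with a two-step (linear multi-step style) update.

    x0     : initial state (a float here; a tensor in a real network)
    blocks : list of callables f_n, the residual branches
    ks     : list of mixing coefficients k_n, one per block (assumed form)
    """
    if len(blocks) != len(ks):
        raise ValueError("need one coefficient per block")
    prev, cur = x0, x0  # assumption: x_{-1} = x_0, so block 0 is a plain residual step
    for f, k in zip(blocks, ks):
        # k = 0 recovers the ResNet update x_{n+1} = x_n + f_n(x_n)
        nxt = (1.0 - k) * cur + k * prev + f(cur)
        prev, cur = cur, nxt
    return cur
```

With every coefficient set to zero the loop is an ordinary residual stack, which makes the extra dependence on the previous state easy to isolate: `lm_forward(1.0, [lambda x: 0.5 * x] * 3, [0.0] * 3)` applies three plain residual steps.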

preprint 2020 arXiv

Direct estimation of minimum gate fidelity

With the current interest in building quantum computers, there is a strong need for accurate and efficient characterization of the noise in quantum gate implementations. A key measure of the performance of a quantum gate is the minimum gate fidelity, i.e., the fidelity of the gate, minimized over all input states. Conventionally, the minimum fidelity is estimated by first accurately reconstructing the full gate process matrix using the experimental procedure of quantum process tomography (QPT). Then, a numerical minimization is carried out to find the minimum fidelity. QPT is, however, well known to be costly, and it might appear that we can do better, if the goal is only to estimate one single number. In this work, we propose a hybrid numerical-experimental scheme that employs a numerical gradient-free minimization (GFM) and an experimental target-fidelity estimation procedure to directly estimate the minimum fidelity without reconstructing the process matrix. We compare this to an alternative scheme, referred to as QPT fidelity estimation, that does use QPT, but directly employs the minimum gate fidelity as the termination criterion. Both approaches can thus be considered as direct estimation schemes. General theoretical bounds suggest a significant resource savings for the GFM scheme over QPT fidelity estimation; numerical simulations for specific classes of noise, however, show that both schemes have similar performance, reminding us of the need for caution when using general bounds for specific examples. The GFM scheme, however, presents potential for future improvements in resource cost, with the development of even more efficient GFM algorithms.

preprint 2016 arXiv

Minimum Number of Copies in the Measurement of Multi-Photon Entanglement

Multi-photon entanglement has been successfully realized by experimental groups. As the photon number increases, several problems arise, such as a greater number of copies, longer measurement time, and errors in fidelity. In this paper, we present a new scheme based on Lagrange multipliers and feedback that saves measurement copies in multi-photon experiments and five percent of the measuring time, while guaranteeing an acceptable fidelity error. All results are supported by data from an eight-photon experiment. Furthermore, the same approach is applied in a simulation of ten-photon entanglement, where 22.45 percent of copies are saved; the optimized copy distribution gives a better estimate of fidelity than the average copy distribution.

preprint 2014 arXiv

Density matrix and fidelity estimation of multiphoton entanglement via phaselift

Experiments on multi-photon entanglement have been carried out by several groups, including Pan's group (Ref.[2],[3],[5]). Increasing the number of photons causes a dramatic increase in the dimension of the measurement matrix, which results in a great consumption of time in the measurements. From a practical view, we wish to gain the most information through as few measurements as possible for multi-photon entanglement. Low rank matrix recovery (LRMR) provides such a possibility to resolve all the issues of the measurement matrix based on less data. In this paper, we verify whether LRMR works for six qubits and eight photons in comparison to the data given by Pan's group, i.e. we input a fraction of the data to calculate all of the others. Through exploring their density matrix, fidelity and visibility, we find that the results remain consistent with the data provided by Pan's group, which allows us to confirm that LRMR can simplify experimental measurements for more photons. In particular, we find that very limited data would also give excellent support to the experiment for fidelity when low rank, pure state, sparse or position information are utilized. Our analytical calculations confirm that LRMR would generalize to multi-photon state entanglement.