Researcher profile

Bin Dong

Bin Dong contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
18 works
0 followers
15 topics
4 close collaborators

Published work

18 published items

preprint · 2026 · arXiv

Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning

Geometric problem solving, as a typical multimodal reasoning problem, has attracted much attention and made great progress recently; however, most works focus on plane geometry and usually fail on solid geometry because of 3D spatial diagrams and complex reasoning. To bridge this gap, we introduce Hilbert-Geo, the first unified formal language framework for solid geometry, including an extensive predicate library and a dedicated theorem bank. Based on this framework, we propose a Parse2Reason method consisting of two steps: parsing, then reasoning. In the parsing step, we use conditional description language (CDL), a formalized language composed of predicates specifically designed to construct geometric conditions, to represent both the problem description (natural text) and the solid diagrams (visual image). In the reasoning step, we leverage the formal CDL and the theorem bank to perform relational inference and algebraic computation, generating strictly correct, verifiable, and human-readable reasoning processes. Notably, Hilbert-Geo is also applicable to plane geometry. To advance geometric reasoning, we curate two expert-annotated datasets, SolidFGeo2k and PlaneFGeo3k, which are furnished with geometric formal language annotations, solutions and answers. Extensive experiments show that our method achieves state-of-the-art (SOTA) performance of 77.3% on SolidFGeo2k and 84.1% on MathVerse-Solid (a small subset of MathVerse dedicated to solid geometry), substantially outperforming leading MLLMs such as Gemini-2.5-pro (54.2% on SolidFGeo2k) and GPT-5 (62.9% on MathVerse-Solid). In addition, our method achieves SOTA accuracy of 80.2% on PlaneFGeo3k, demonstrating the generality of Hilbert-Geo in geometric reasoning. Our code and datasets will be publicly available.

preprint · 2024 · arXiv

Analysis of a wavelet frame based two-scale model for enhanced edges

Image restoration is a class of important tasks that emerges from a wide range of scientific disciplines. It has been noticed that most practical images can be modeled as a composition from a sparse singularity set (edges) where the image contents or their gradients change drastically, and cartoon chunks in which a high degree of regularity is dominant. Enhancing edges while promoting regularity elsewhere has been an important criterion for successful restoration in many image classes. In this article, we present a wavelet frame based image restoration model that captures potential edges and facilitates the restoration procedure by a dedicated treatment both of singularity and of cartoon. Moreover, its geometric robustness is enhanced by exploiting subtle inter-scale information available in the coarse image. To substantiate our intuition, we prove that this model converges to one variant of the celebrated Mumford-Shah model when adequate asymptotic specifications are given.

preprint · 2022 · arXiv

A Note on Machine Learning Approach for Computational Imaging

Computational imaging has been playing a vital role in the development of natural sciences. Advances in sensory, information, and computer technologies have further extended the scope of influence of imaging, making digital images an essential component of our daily lives. For the past three decades, we have witnessed phenomenal developments of mathematical and machine learning methods in computational imaging. In this note, we will review some of the recent developments of the machine learning approach for computational imaging and discuss its differences and relations to the mathematical approach. We will demonstrate how we may combine the wisdom from both approaches, discuss the merits and potentials of such a combination and present some of the new computational and theoretical challenges it brings about.

preprint · 2022 · arXiv

A scalable deep learning approach for solving high-dimensional dynamic optimal transport

The dynamic formulation of optimal transport has attracted growing interest in scientific computing and machine learning, and its computation requires solving a PDE-constrained optimization problem. Classical approaches based on Eulerian discretization suffer from the curse of dimensionality, which arises from the approximation of the high-dimensional velocity field. In this work, we propose a deep learning based method to solve dynamic optimal transport in high-dimensional space. Our method contains three main ingredients: a carefully designed representation of the velocity field, the discretization of the PDE constraint along the characteristics, and the computation of high-dimensional integrals by the Monte Carlo method in each time step. Specifically, in the representation of the velocity field, we apply the classical nodal basis functions in time and deep neural networks in the space domain, with H1-norm regularization. This technique promotes the regularity of the velocity field in both time and space, so that the discretization along the characteristics remains stable during the training process. Extensive numerical examples have been conducted to test the proposed method. Compared to other solvers of optimal transport, our method gives more accurate results in high-dimensional cases and scales very well with respect to dimension. Finally, we extend our method to more complicated cases such as the crowd motion problem.

preprint · 2022 · arXiv

Learning invariance preserving moment closure model for Boltzmann-BGK equation

As one of the main governing equations in kinetic theory, the Boltzmann equation is widely used in aerospace, microscopic flow, and related areas, where its high-resolution simulation is crucial. However, due to the high dimensionality of the Boltzmann equation, high-resolution simulations are often difficult to achieve numerically. The moment method, first proposed by Grad, is among the popular numerical methods for achieving efficient high-resolution simulations. The governing equations in the moment method are derived by taking moments on both sides of the Boltzmann equation, which effectively reduces the dimensionality of the problem. However, one of the main challenges is that this leads to an unclosed moment system, and a closure is needed to obtain a closed moment system. Designing closures for moment systems is truly an art and has been a significant research field in kinetic theory. Besides traditional human-designed closures, machine learning-based approaches have attracted much attention lately (Han et al.; Huang et al.). In this work, we propose a machine learning-based method to derive a moment closure model for the Boltzmann-BGK equation. In particular, the closure relation is approximated by a carefully designed deep neural network that possesses desirable physical invariances, i.e., the Galilean invariance, reflecting invariance, and scaling invariance, which are inherited from the original Boltzmann-BGK equation and play an important role in the correct simulation of the Boltzmann equation. Numerical simulations on 1D-1D examples, including smooth and discontinuous initial condition problems, the Sod shock tube problem, and shock structure problems, and on 1D-3D examples, including smooth and discontinuous problems, demonstrate the satisfactory numerical performance of the proposed invariance preserving neural closure method.

preprint · 2022 · arXiv

MOTR: End-to-End Multiple-Object Tracking with Transformer

Temporal modeling of objects is a key challenge in multiple object tracking (MOT). Existing methods track by associating detections through motion-based and appearance-based similarity heuristics. The post-processing nature of association prevents end-to-end exploitation of temporal variations in video sequences. In this paper, we propose MOTR, which extends DETR and introduces the track query to model the tracked instances in the entire video. The track query is transferred and updated frame by frame to perform iterative prediction over time. We propose tracklet-aware label assignment to train track queries and newborn object queries. We further propose a temporal aggregation network and a collective average loss to enhance temporal relation modeling. Experimental results on DanceTrack show that MOTR significantly outperforms the state-of-the-art method, ByteTrack, by 6.5% on the HOTA metric. On MOT17, MOTR outperforms our concurrent works, TrackFormer and TransTrack, on association performance. MOTR can serve as a stronger baseline for future research on temporal modeling and Transformer-based trackers. Code is available at https://github.com/megvii-research/MOTR.

preprint · 2022 · arXiv

Region-Aware Metric Learning for Open World Semantic Segmentation via Meta-Channel Aggregation

As one of the most challenging and practical segmentation tasks, open-world semantic segmentation requires the model to segment the anomaly regions in the images and incrementally learn to segment out-of-distribution (OOD) objects, especially under a few-shot condition. The current state-of-the-art (SOTA) method, Deep Metric Learning Network (DMLNet), relies on pixel-level metric learning, with which the identification of similar regions having different semantics is difficult. Therefore, we propose a method called region-aware metric learning (RAML), which first separates the regions of the images and generates region-aware features for further metric learning. RAML improves the integrity of the segmented anomaly regions. Moreover, we propose a novel meta-channel aggregation (MCA) module to further separate anomaly regions, forming high-quality sub-region candidates and thereby improving the model performance for OOD objects. To evaluate the proposed RAML, we have conducted extensive experiments and ablation studies on Lost And Found and Road Anomaly datasets for anomaly segmentation and the CityScapes dataset for incremental few-shot learning. The results show that the proposed RAML achieves SOTA performance in both stages of open world segmentation. Our code and appendix are available at https://github.com/czifan/RAML.

preprint · 2022 · arXiv

Trained Model in Supervised Deep Learning is a Conditional Risk Minimizer

We proved that a trained model in supervised deep learning minimizes the conditional risk for each input (Theorem 2.1). This property provided insights into the behavior of trained models and established a connection between supervised and unsupervised learning in some cases. In addition, when the labels are intractable but can be written as a conditional risk minimizer, we proved an equivalent form of the original supervised learning problem with accessible labels (Theorem 2.2). We demonstrated that many existing works, such as Noise2Score, Noise2Noise and score function estimation, can be explained by our theorem. Moreover, we derived a property of the classification problem with noisy labels using Theorem 2.1 and validated it using the MNIST dataset. Furthermore, we proposed a method to estimate uncertainty in image super-resolution based on Theorem 2.2 and validated it using the ImageNet dataset. Our code is available on GitHub.
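The conditional-risk view can be illustrated with a toy sketch (not the authors' code). Under squared loss, the predictor that minimizes the empirical risk separately for each distinct input is the per-input mean of the observed labels, which is one way to see why training on zero-mean noisy targets, as in Noise2Noise, can still recover the clean signal. The toy data, the discrete input set, and the function names `fit_conditional_mean` and `empirical_risk` are all assumptions made for this example.

```python
import numpy as np


def fit_conditional_mean(xs, ys):
    """Return {x: mean of labels observed at x}, the per-input squared-loss minimizer."""
    return {int(x): float(np.mean(ys[xs == x])) for x in np.unique(xs)}


def empirical_risk(predictor, xs, ys):
    """Mean squared error of a lookup-table predictor on (xs, ys)."""
    preds = np.array([predictor[int(x)] for x in xs])
    return float(np.mean((preds - ys) ** 2))


# Toy data (hypothetical): three discrete inputs, clean target x**2,
# and zero-mean Gaussian label noise.
rng = np.random.default_rng(0)
xs = rng.integers(0, 3, size=30_000)
ys = xs.astype(float) ** 2 + rng.normal(0.0, 1.0, size=xs.shape)

predictor = fit_conditional_mean(xs, ys)
baseline_risk = empirical_risk(predictor, xs, ys)

# Moving any single table entry away from the conditional mean raises the risk.
shifted = dict(predictor)
shifted[1] += 0.5
shifted_risk = empirical_risk(shifted, xs, ys)
```

The lookup table stands in for a sufficiently expressive trained model; with a real network the statement holds only to the extent that training reaches the minimizer.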

preprint · 2021 · arXiv

A Practical Layer-Parallel Training Algorithm for Residual Networks

Gradient-based algorithms for training ResNets typically require a forward pass of the input data, followed by back-propagating the objective gradient to update parameters, which is time-consuming for deep ResNets. To break the dependencies between modules in both the forward and backward modes, auxiliary-variable methods such as the penalty and augmented Lagrangian (AL) approaches have attracted much interest lately due to their ability to exploit layer-wise parallelism. However, we observe that large communication overhead and the lack of data augmentation are two key challenges of these methods, which may lead to a low speedup ratio and an accuracy drop across multiple compute devices. Inspired by the optimal control formulation of ResNets, we propose a novel serial-parallel hybrid training strategy to enable the use of data augmentation, together with downsampling filters to reduce the communication cost. The proposed strategy first trains the network parameters by solving a succession of independent sub-problems in parallel and then corrects the network parameters through a full serial forward-backward propagation of data. Such a strategy can be applied to most of the existing layer-parallel training methods using auxiliary variables. As an example, we validate the proposed strategy using penalty and AL methods on ResNet and WideResNet across the MNIST, CIFAR-10 and CIFAR-100 datasets, achieving significant speedup over the traditional layer-serial training methods while maintaining comparable accuracy.

preprint · 2021 · arXiv

Enhancing Certified Robustness via Smoothed Weighted Ensembling

Randomized smoothing has achieved state-of-the-art certified robustness against $\ell_2$-norm adversarial attacks. However, how to find the optimal base classifier for randomized smoothing is not wholly resolved. In this work, we employ a Smoothed WEighted ENsembling (SWEEN) scheme to improve the performance of randomized smoothed classifiers. We show the ensembling generality that SWEEN can help achieve optimal certified robustness. Furthermore, theoretical analysis proves that the optimal SWEEN model can be obtained from training under mild assumptions. We also develop an adaptive prediction algorithm to reduce the prediction and certification cost of SWEEN models. Extensive experiments show that SWEEN models outperform the upper envelope of their corresponding candidate models by a large margin. Moreover, SWEEN models constructed using a few small models can achieve comparable performance to a single large model with a notable reduction in training time.
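A rough sketch of the two ingredients named in this abstract, weighted ensembling and randomized smoothing, is given below. It is not the SWEEN implementation: the convex weighting of soft outputs, the two linear base models, and the names `weighted_ensemble` and `smoothed_predict` are assumptions chosen for illustration. The vote share it returns is a raw Monte Carlo estimate; an actual certificate would additionally need a statistical lower confidence bound on that probability.

```python
import numpy as np


def weighted_ensemble(models, weights):
    """Combine soft classifiers (x -> class probabilities) with normalized weights."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return lambda x: sum(w * m(x) for w, m in zip(weights, models))


def smoothed_predict(soft_classifier, x, sigma, n_samples, rng):
    """Majority vote of the classifier over Gaussian perturbations of x."""
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    votes = np.array([np.argmax(soft_classifier(x + eps)) for eps in noise])
    counts = np.bincount(votes)
    top = int(np.argmax(counts))
    return top, float(counts[top] / n_samples)


def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


# Two hypothetical base models on 2-D inputs with two classes.
model_a = lambda x: softmax(np.array([x[0], -x[0]]))
model_b = lambda x: softmax(np.array([x[0] + x[1], -x[0] - x[1]]))
ensemble = weighted_ensemble([model_a, model_b], weights=[0.7, 0.3])

rng = np.random.default_rng(0)
label, vote_share = smoothed_predict(
    ensemble, np.array([2.0, 0.5]), sigma=0.5, n_samples=2000, rng=rng
)
```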

preprint · 2021 · arXiv

NPTC-net: Narrow-Band Parallel Transport Convolutional Neural Network on Point Clouds

Convolution plays a crucial role in various applications in signal and image processing, analysis, and recognition. It is also the main building block of convolutional neural networks (CNNs). Designing appropriate convolutional neural networks on manifold-structured point clouds can inherit and empower recent advances of CNNs for analyzing and processing point cloud data. However, one of the major challenges is to define a proper way to "sweep" filters through the point cloud as a natural generalization of the planar convolution and to reflect the point cloud's geometry at the same time. In this paper, we consider generalizing convolution by adapting parallel transport on the point cloud. Inspired by a triangulated surface-based method [Stefan C. Schonsheck, Bin Dong, and Rongjie Lai, arXiv:1805.07857], we propose the Narrow-Band Parallel Transport Convolution (NPTC) using a specifically defined connection on a voxel-based narrow-band approximation of point cloud data. With that, we further propose a deep convolutional neural network based on NPTC (called NPTC-net) for point cloud classification and segmentation. Comprehensive experiments show that the proposed NPTC-net achieves similar or better results than current state-of-the-art methods on point cloud classification and segmentation.

preprint · 2020 · arXiv

Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations

In our work, we bridge deep neural network design with numerical differential equations. We show that many effective networks, such as ResNet, PolyNet, FractalNet and RevNet, can be interpreted as different numerical discretizations of differential equations. This finding brings us a brand new perspective on the design of effective deep architectures. We can take advantage of the rich knowledge in numerical analysis to guide us in designing new and potentially more effective deep networks. As an example, we propose a linear multi-step architecture (LM-architecture), which is inspired by the linear multi-step method for solving ordinary differential equations. The LM-architecture is an effective structure that can be used on any ResNet-like network. In particular, we demonstrate that LM-ResNet and LM-ResNeXt (i.e. the networks obtained by applying the LM-architecture to ResNet and ResNeXt, respectively) can achieve noticeably higher accuracy than ResNet and ResNeXt on both CIFAR and ImageNet with comparable numbers of trainable parameters. In particular, on both CIFAR and ImageNet, LM-ResNet/LM-ResNeXt can significantly compress (>50%) the original networks while maintaining similar performance. This can be explained mathematically using the concept of the modified equation from numerical analysis. Last but not least, we also establish a connection between stochastic control and noise injection in the training process, which helps to improve the generalization of the networks. Furthermore, by relating stochastic training strategies to stochastic dynamic systems, we can easily apply stochastic training to networks with the LM-architecture. As an example, we introduce stochastic depth to LM-ResNet and achieve significant improvement over the original LM-ResNet on CIFAR-10.
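The link between a residual block and a numerical time step can be sketched as follows. A ResNet block resembles a forward Euler step, `x_{n+1} = x_n + f(x_n)`, while a linear two-step scheme also mixes in the previous state. The specific two-step form used here, with one scalar `k` per block, and the toy residual branches are assumptions for illustration rather than a reproduction of the paper's architecture; in a real network `k` would be a trainable parameter.

```python
import numpy as np


def resnet_step(x, f):
    """Forward-Euler-like residual update: x_{n+1} = x_n + f(x_n)."""
    return x + f(x)


def lm_step(x_curr, x_prev, f, k):
    """Two-step update: x_{n+1} = (1 - k) * x_n + k * x_{n-1} + f(x_n)."""
    return (1.0 - k) * x_curr + k * x_prev + f(x_curr)


def run_lm_network(x0, blocks, ks):
    """Apply a stack of residual branches using the two-step update.

    The first block uses a plain residual step because no earlier state exists.
    """
    x_prev, x_curr = x0, resnet_step(x0, blocks[0])
    for f, k in zip(blocks[1:], ks):
        x_prev, x_curr = x_curr, lm_step(x_curr, x_prev, f, k)
    return x_curr


# Hypothetical residual branches: small fixed linear maps followed by tanh.
rng = np.random.default_rng(0)
weights = [0.1 * rng.standard_normal((4, 4)) for _ in range(3)]
blocks = [lambda x, W=W: np.tanh(W @ x) for W in weights]
x0 = rng.standard_normal(4)

out_lm = run_lm_network(x0, blocks, ks=[0.3, 0.3])
out_res = run_lm_network(x0, blocks, ks=[0.0, 0.0])  # k = 0 recovers plain residual steps
```

Setting every `k` to zero reduces the stack to ordinary residual updates, which makes the two-step form a strict generalization in this sketch.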

preprint · 2020 · arXiv

Blind Adversarial Training: Balance Accuracy and Robustness

Adversarial training (AT) aims to improve the robustness of deep learning models by mixing clean data and adversarial examples (AEs). Most existing AT approaches can be grouped into restricted and unrestricted approaches. Restricted AT requires a prescribed uniform budget to constrain the magnitude of the AE perturbations during training, with the obtained results showing high sensitivity to the budget. On the other hand, unrestricted AT uses unconstrained AEs, resulting in the use of AEs located beyond the decision boundary; these overestimated AEs significantly lower the accuracy on clean data. These limitations mean that the existing AT approaches have difficulty in obtaining a comprehensively robust model with high accuracy and robustness when confronting attacks with varying strengths. Considering this problem, this paper proposes a novel AT approach named blind adversarial training (BAT) to better balance the accuracy and robustness. The main idea of this approach is to use a cutoff-scale strategy to adaptively estimate a nonuniform budget to modify the AEs used in the training, ensuring that the strengths of the AEs are dynamically located in a reasonable range and ultimately improving the overall robustness of the AT model. The experimental results obtained using BAT for training classification models on several benchmarks demonstrate the competitive performance of this method.

preprint · 2020 · arXiv

MetaInv-Net: Meta Inversion Network for Sparse View CT Image Reconstruction

X-ray Computed Tomography (CT) is widely used in clinical applications such as diagnosis and image-guided interventions. In this paper, we propose a new deep learning based model for CT image reconstruction with the backbone network architecture built by unrolling an iterative algorithm. However, unlike the existing strategy of including as many data-adaptive components in the unrolled dynamics model as possible, we find that it is enough to learn only the parts where traditional designs mostly rely on intuition and experience. More specifically, we propose to learn an initializer for the conjugate gradient (CG) algorithm that is involved in one of the subproblems of the backbone model. Other components, such as image priors and hyperparameters, are kept as in the original design. Since a hypernetwork is introduced to infer the initialization of the CG module, the proposed model can be regarded as a kind of meta-learning model. Therefore, we call the proposed model the meta-inversion network (MetaInv-Net). The proposed MetaInv-Net can be designed with far fewer trainable parameters while still achieving image reconstruction performance superior to some state-of-the-art deep models in CT imaging. In simulated and real data experiments, MetaInv-Net performs very well and can be generalized beyond the training setting, i.e., to other scanning settings, noise levels, and data sets.

preprint · 2020 · arXiv

Quantization of electromagnetic modes and angular momentum on plasmonic nanowires

A quantum theory of surface plasmons is very important for studying the interactions between light and different metal nanostructures in nanoplasmonics. In this work, the surface plasmon polaritons (SPPs) on nanowires and their orbital and spin angular momentum are investigated using the canonical quantization method. The results show that the SPPs on a nanowire carry both orbital and spin angular momentum during propagation. The result is then applied to a plasmonic nanowire waveguide to show agreement with the theory. The study will be helpful for nanowire-based plasmonic interactions and quantum-information-based optical circuits in the future.

preprint · 2020 · arXiv

RODE-Net: Learning Ordinary Differential Equations with Randomness from Data

Random ordinary differential equations (RODEs), i.e. ODEs with random parameters, are often used to model complex dynamics. Most existing methods for identifying unknown governing RODEs from observed data rely on strong prior knowledge. Extracting the governing equations from data with less prior knowledge remains a great challenge. In this paper, we propose a deep neural network, called RODE-Net, to tackle this challenge by fitting a symbolic expression of the differential equation and the distribution of parameters simultaneously. To train the RODE-Net, we first estimate the parameters of the unknown RODE using the symbolic networks \cite{long2019pde} by solving a set of deterministic inverse problems based on the measured data, and use a generative adversarial network (GAN) to estimate the true distribution of the RODE's parameters. Then, we use the trained GAN as a regularization to further improve the estimation of the ODE's parameters. The two steps are performed alternately. Numerical results show that the proposed RODE-Net can estimate the distribution of model parameters well using simulated data and can make reliable predictions. It is worth noting that the GAN serves as a data-driven regularization in RODE-Net and is more effective than the $\ell_1$-based regularization that is often used in system identification.

preprint · 2020 · arXiv

Transferred Discrepancy: Quantifying the Difference Between Representations

Understanding what information neural networks capture is an essential problem in deep learning, and studying whether different models capture similar features is an initial step to achieve this goal. Previous works sought to define metrics over the feature matrices to measure the difference between two models. However, different metrics sometimes lead to contradictory conclusions, and there has been no consensus on which metric is suitable to use in practice. In this work, we propose a novel metric that goes beyond previous approaches. Recall that one of the most practical scenarios of using the learned representations is to apply them to downstream tasks. We argue that we should design the metric based on a similar principle. For that, we introduce the transferred discrepancy (TD), a new metric that defines the difference between two representations based on their downstream-task performance. Through an asymptotic analysis, we show how TD correlates with downstream tasks and the necessity to define metrics in such a task-dependent fashion. In particular, we also show that under specific conditions, the TD metric is closely related to previous metrics. Our experiments show that TD can provide fine-grained information for varied downstream tasks, and for the models trained from different initializations, the learned features are not the same in terms of downstream-task predictions. We find that TD may also be used to evaluate the effectiveness of different training strategies. For example, we demonstrate that the models trained with proper data augmentations that improve the generalization capture more similar features in terms of TD, while those with data augmentations that hurt the generalization will not. This suggests a training strategy that leads to more robust representation also trains models that generalize better.

preprint · 2019 · arXiv

Hybrid Integrated Photonics Using Bulk Acoustic Resonators

Microwave frequency acousto-optic modulation is realized by exciting high overtone bulk acoustic wave resonances (HBAR resonances) in the photonic stack. These confined mechanical stress waves exhibit vertically transmitting, high quality factor (Q) acoustic Fabry-Perot resonances that extend into the gigahertz domain, and offer stress-optical interaction with the optical modes of the microresonator. Although HBARs are ubiquitously used in modern communication, and often exploited in superconducting circuits, this is the first time they have been incorporated on a photonic circuit based chip. The electro-acousto-optical interaction observed within the optical modes exhibits high actuation linearity, low actuation power and negligible crosstalk. Using the electro-acousto-optic interaction, fast optical resonance tuning is achieved with sub-nanosecond transduction time. By removing the silicon backreflection, broadband acoustic modulation at 4.1 and 8.7 GHz is realized with a 3 dB bandwidth of 250 MHz each. The novel hybrid HBAR nanophotonic platform demonstrated here, which allows on-chip integration of micron-scale acoustic and photonic resonators, can find immediate applications in tunable microwave photonics, high bandwidth soliton microcomb stabilization, compact opto-electronic oscillators, and microwave-to-optical conversion schemes. Moreover, the hybrid platform allows implementation of momentum biasing, which enables the realization of on-chip non-reciprocal devices such as isolators or circulators, as well as topological photonic bandstructures.