Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
33 works
0 followers
19 topics
4 close collaborators

Published work

33 published item(s)

preprint · 2026 · arXiv

FACTOR: Counterfactual Training-Free Test-Time Adaptation for Open-Vocabulary Object Detection

Open-vocabulary object detection often fails under distribution shifts, as it can be misled by spurious correlations between non-causal visual attributes (e.g., brightness, texture) and object categories. Existing test-time adaptation (TTA) methods either depend on costly online optimization or perform global calibration, overlooking the attribute-specific nature of these failures. To address this, we propose FACTOR (counterFACtual training-free Test-time adaptation for Open-vocabulaRy object detection), a lightweight framework grounded in counterfactual reasoning. By perturbing test images along non-causal attributes and comparing region-level predictions between original and counterfactual views, FACTOR quantifies attribute sensitivity, semantic relevance, and prediction variation to selectively suppress attribute-dependent predictions, all without parameter updates. Experiments on PASCAL-C, COCO-C, and FoggyCityscapes show that FACTOR consistently outperforms prior TTA methods, demonstrating that explicit counterfactual reasoning effectively improves robustness under distribution shifts.

preprint · 2026 · arXiv

Observation of spin-valley locked nodal lines in a quasi-2D altermagnet

The interplay among quantum degrees of freedom (spin, orbital and momentum) has emerged as a fertile ground for realizing magnetic quantum states with transformative potential for electronic and spintronic technologies. Prominent examples include ferromagnetic Weyl semimetals and antiferromagnetic axion insulators. Recently, altermagnets (AMs) have been identified as a distinct spin-splitting class of collinear antiferromagnets (AFMs), characterized by crystal symmetry that connects magnetic sublattices in real space and enforces C-paired spin-momentum locking in reciprocal space. These materials combine the nonrelativistic spin polarization of ferromagnets (FMs) with the vanishing net magnetization of AFMs, making them highly promising for spintronic applications. Furthermore, their nontrivial spin-momentum-locked spin texture provides an additional degree of freedom for realizing novel quantum phases. In this work, we report the discovery of a new type of spin-valley-locked nodal line phase in the layered AM Rb-intercalated V$_2$Te$_2$O. By combining high-resolution spin- and angle-resolved photoemission spectroscopy with first-principles calculations, we observe the coexistence of both spinless and spinful nodal lines near the Fermi level. Remarkably, the spinful nodal lines exhibit uniform spin polarization within each valley while displaying opposite spin polarizations across symmetry-paired valleys, a unique feature we term spin-valley-locked nodal lines, which is exclusive to AMs. Direct measurements of out-of-plane band dispersion using a side-cleaving technique reveal the two-dimensional nature of these nodal lines. Our findings not only unveil a previously unexplored topological phase in AMs, in which valley-locked spin acts as an additional quantum character, but also establish RbV$_2$Te$_2$O as a promising platform for spintronics, valleytronics, and moiré-engineered quantum devices.

preprint · 2026 · arXiv

Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection

Open-vocabulary object detection with vision-language models (VLMs) such as Grounding DINO suffers from performance degradation under test-time distribution shifts, primarily due to semantic misalignment between text embeddings and shifted visual embeddings of region proposals. Recent test-time adaptive object detection methods for VLM-based detectors either rely on costly backpropagation or bypass semantic misalignment via external memory; none directly and efficiently aligns text and vision in a training-free manner. To address this, we propose Reward-Guided Semantic Evolution (RGSE), a training-free framework that directly refines the text embeddings at test time. Inspired by evolutionary search, RGSE treats text embedding adaptation as a semantic search process: it perturbs text embeddings as candidate variants, evaluates them via cosine similarity with current and historical high-confidence visual proposals as a reward signal, and fuses them into a refined embedding through reward-weighted averaging. Without any backpropagation, RGSE achieves state-of-the-art performance across multiple detection benchmarks while adding minimal computational overhead. Our code will be open-sourced upon publication.
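
The perturb, score and fuse loop described in this abstract can be illustrated with a short sketch. This is not the RGSE implementation: the Gaussian perturbation, the softmax weighting of rewards, and all names and defaults (`refine_text_embedding`, `fuse_candidates`, `sigma`, `temperature`) are assumptions chosen for illustration, and embeddings are plain Python lists.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0.0 else list(v)


def fuse_candidates(candidates, visual_embs, temperature=0.05):
    """Reward-weighted average of candidate text embeddings.

    Reward = mean cosine similarity to the visual proposal embeddings.
    The softmax weighting is an assumption, not taken from the paper.
    """
    rewards = [
        sum(cosine(c, v) for v in visual_embs) / len(visual_embs)
        for c in candidates
    ]
    top = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - top) / temperature) for r in rewards]
    total = sum(weights)
    dim = len(candidates[0])
    fused = [
        sum(w * c[i] for w, c in zip(weights, candidates)) / total
        for i in range(dim)
    ]
    return normalize(fused)


def refine_text_embedding(text_emb, visual_embs, num_candidates=32,
                          sigma=0.1, temperature=0.05, seed=0):
    """Perturb the text embedding, score the variants, and fuse them."""
    rng = random.Random(seed)
    candidates = [list(text_emb)]  # keep the unperturbed embedding as a candidate
    candidates += [
        [x + rng.gauss(0.0, sigma) for x in text_emb]
        for _ in range(num_candidates)
    ]
    return fuse_candidates(candidates, visual_embs, temperature)
```

In this sketch `visual_embs` would hold both current and historical high-confidence proposal embeddings; no gradients are computed at any point.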

preprint · 2026 · arXiv

Unified Source-Free Domain Adaptation

In the pursuit of transferring a source model to a target domain without access to the source training data, Source-Free Domain Adaptation (SFDA) has been extensively explored across various scenarios, including Closed-set, Open-set, Partial-set, and Generalized settings. Existing methods, which focus on specific scenarios, not only address a limited subset of challenges but also require prior knowledge of the target domain, significantly limiting their practical utility and deployability. In light of these considerations, we introduce a more practical yet challenging problem, termed unified SFDA, which incorporates all specific scenarios in a unified manner. In this paper, we propose a novel approach, latent Causal factors discovery for unified SFDA (CausalDA). In contrast to previous alternatives that emphasize learning the statistical description of reality, we formulate CausalDA from a causality perspective. The objective is to uncover potential causality between latent variables and model decisions, enhancing the reliability and robustness of the learned model against domain shifts. To integrate extensive world knowledge, we leverage a pre-trained vision-language model such as CLIP. This aids in the formation and discovery of latent causal factors in the absence of supervision in the variation of distribution and semantics, coupled with a newly designed information bottleneck with theoretical guarantees. Extensive experiments demonstrate that CausalDA can achieve new state-of-the-art results in distinct SFDA settings, as well as source-free out-of-distribution generalization. Our code and data are available at https://github.com/tntek/CausalDA.

preprint · 2025 · arXiv

Sub-Ensemble Correlations as a Covariance Geometry

Conventional practice in spatially resolved detection in diffusion-coupled thermal atomic vapors implicitly treats localized responses as mutually independent. In this study, however, it is shown that observable correlations are governed by the intrinsic spatiotemporal covariance of a global spin-fluctuation field, such that spatial separation specifies only overlapping statistical projections rather than independent physical components. A unified field-theoretic description is established in which sub-ensembles are defined as measurement-induced statistical projections of a single stochastic field. Within this formulation, sub-ensemble correlations are determined by the covariance operator, inducing a natural geometry in which statistical independence corresponds to orthogonality of the measurement functionals. For collective spin fluctuations described by a diffusion-relaxation Ornstein-Uhlenbeck stochastic field, the covariance spectrum admits only a finite set of fluctuation modes in a bounded domain, imposing an intrinsic, field-level limit on the number of statistically distinguishable sub-ensembles. The loss of sub-ensemble independence is formalized through the notion of spatial sampling overlap, which quantifies the unavoidable statistical coupling arising from shared access to common low-order fluctuation modes. While multi-channel atomic magnetometry provides a concrete physical setting in which these constraints become explicit, the framework applies generically to diffusion-coupled stochastic fields.

preprint · 2022 · arXiv

AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular DIV2K dataset as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.

preprint · 2022 · arXiv

Centroid Approximation for Bootstrap: Improving Particle Quality at Inference

Bootstrap is a principled and powerful frequentist statistical tool for uncertainty quantification. Unfortunately, standard bootstrap methods are computationally intensive because they need to draw a large i.i.d. bootstrap sample to approximate the ideal bootstrap distribution; this largely hinders their application in large-scale machine learning, especially deep learning problems. In this work, we propose an efficient method to explicitly optimize a small set of high-quality "centroid" points to better approximate the ideal bootstrap distribution. We achieve this by minimizing a simple objective function that is asymptotically equivalent to the Wasserstein distance to the ideal bootstrap distribution. This allows us to provide an accurate estimation of uncertainty with a small number of bootstrap centroids, outperforming the naive i.i.d. sampling approach. Empirically, we show that our method can boost the performance of bootstrap in a variety of applications.

preprint · 2022 · arXiv

Developing a Vehicle Re-routing Algorithm using Connected Vehicle (CV) Technology

Vehicle Ad-hoc Networks (VANETs) act as the core of vehicular communications and provide the fundamental wireless communication architecture to support both vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication. By leveraging only communication technologies, Connected Vehicles (CVs) can therefore navigate through the dynamic road network. Although such vehicles are still in their infancy, they are expected to have a significant impact on safety and mobility, such as reducing non-recurrent congestion in case of a vehicle breakdown or other roadway incidents. To evaluate their impacts, this research examines the benefits of having CVs when a vehicle breakdown occurs by developing an intelligent proactive re-routing algorithm. Due to a lack of real-world data, this paper adopts an integrated simulation framework consisting of a V2X communication simulator (OMNET++) and a microscopic traffic simulator (SUMO). The developed algorithm works as follows: when a vehicle breaks down within a live traffic lane, the system detects the breakdown, immediately generates warning messages and transmits them to approaching vehicles. Based on the real-time notification, informed vehicles proactively re-route to alternative roads to avoid the breakdown zone. Two scenarios were developed, in which a breakdown occurs within and outside a junction, for both V2X-enabled and V2X-disabled systems. Results show that the V2X-enabled CV re-routing mechanism can improve traffic efficiency by reducing congestion and enhance traffic safety by smoothing accelerations and decelerations of affected vehicles, with low infrastructure costs. The algorithm would be useful to highway agencies (Department for Transport) and vehicle manufacturers in introducing CVs onto existing road networks.
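
The re-routing step in this abstract (a vehicle receives a breakdown warning and plans around the affected road) can be sketched as a shortest-path search that skips the blocked edge. The sketch below is illustrative only and is not the authors' algorithm or their OMNET++/SUMO setup: the road network is a hypothetical dictionary of travel times, and `shortest_route` and `reroute_on_warning` are assumed names.

```python
import heapq


def shortest_route(graph, start, goal, blocked=frozenset()):
    """Dijkstra over graph {node: {neighbor: travel_time}}, skipping blocked edges.

    Returns the node list from start to goal, or None if goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, {}).items():
            if (node, nxt) in blocked:
                continue  # road segment closed by the breakdown
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def reroute_on_warning(graph, route, position, broken_edge):
    """Re-plan from the current node if the remaining route uses the broken edge."""
    remaining = route[route.index(position):]
    if broken_edge not in set(zip(remaining, remaining[1:])):
        return remaining  # warning does not affect this vehicle
    new_path = shortest_route(graph, position, route[-1], blocked={broken_edge})
    # Keep the old route if no alternative exists (assumed fallback behavior).
    return new_path if new_path is not None else remaining


# Hypothetical four-node network with travel times in arbitrary units.
roads = {"A": {"B": 1, "C": 2}, "B": {"D": 1}, "C": {"D": 2}, "D": {}}
detour = reroute_on_warning(roads, ["A", "B", "D"], "A", ("B", "D"))
```

Here a vehicle at `A` that planned `A, B, D` switches to the longer path through `C` once the `B` to `D` segment is reported as blocked.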

preprint · 2022 · arXiv

Diffusion-based Molecule Generation with Informative Prior Bridges

AI-based molecule generation provides a promising approach to a large area of biomedical sciences and engineering, such as antibody design, hydrolase engineering, or vaccine development. Because molecules are governed by physical laws, a key challenge is to incorporate prior information into the training procedure to generate high-quality and realistic molecules. We propose a simple and novel approach to steer the training of diffusion-based generative models with physical and statistical prior information. This is achieved by constructing physically informed diffusion bridges, stochastic processes that are guaranteed to yield a given observation at the fixed terminal time. We develop a Lyapunov-function-based method to construct and determine bridges, and propose a number of informative prior bridges for both high-quality molecule generation and uniformity-promoted 3D point cloud generation. With comprehensive experiments, we show that our method provides a powerful approach to the 3D generation task, yielding molecule structures with better quality and stability scores and more uniformly distributed, high-quality point clouds.

preprint · 2022 · arXiv

Dirac nodal lines in the quasi-one-dimensional ternary telluride TaPtTe$_5$

A Dirac nodal-line phase, as a quantum state of topological materials, usually occurs in three-dimensional or at least two-dimensional materials with sufficient symmetry operations to protect the Dirac band crossings. Here, we report a combined theoretical and experimental study on the electronic structure of the quasi-one-dimensional ternary telluride TaPtTe$_5$, which is corroborated as being in a robust nodal-line phase with fourfold degeneracy. Our angle-resolved photoemission spectroscopy measurements show that two pairs of linearly dispersive Dirac-like bands exist in a very large energy window, which extends from a binding energy of $\sim$0.75 eV up to and across the Fermi level. The crossing points are at the boundary of the Brillouin zone and form Dirac-like nodal lines. Using first-principles calculations, we demonstrate the existence of nodal surfaces on the $k_y = \pm\pi$ plane in the absence of spin-orbit coupling (SOC), which are protected by nonsymmorphic symmetry in TaPtTe$_5$. When SOC is included, the nodal surfaces are broken into several nodal lines. By theoretical analysis, we conclude that the nodal lines along $Y$-$T$ and the ones connecting the $R$ points are non-trivial and protected by nonsymmorphic symmetry against SOC.

preprint · 2022 · arXiv

Future Gradient Descent for Adapting the Temporal Shifting Data Distribution in Online Recommendation Systems

One of the key challenges of learning an online recommendation model is the temporal domain shift, which causes a mismatch between the training and testing data distributions and hence domain generalization error. To overcome this, we propose to learn a meta future gradient generator that forecasts the gradient information of the future data distribution for training, so that the recommendation model can be trained as if we were able to look ahead at the future of its deployment. Compared with Batch Update, a widely used paradigm, our theory suggests that the proposed algorithm achieves a smaller temporal domain generalization error, measured by a gradient variation term in a local regret. We demonstrate the empirical advantage by comparing with various representative baselines.

preprint · 2022 · arXiv

Let us Build Bridges: Understanding and Extending Diffusion Generative Models

Diffusion-based generative models have achieved promising results recently, but raise an array of open questions in terms of conceptual understanding, theoretical analysis, algorithm improvement and extensions to discrete, structured, non-Euclidean domains. This work re-examines the overall framework in order to gain better theoretical understanding and develop algorithmic extensions for data from arbitrary domains. By viewing diffusion models as latent variable models with unobserved diffusion trajectories and applying maximum likelihood estimation (MLE) with latent trajectories imputed from an auxiliary distribution, we show that both the model construction and the imputation of latent trajectories amount to constructing diffusion bridge processes that achieve deterministic values and constraints at the end point, for which we provide a systematic study and a suite of tools. Leveraging our framework, we present 1) a first theoretical error analysis for learning diffusion generation models, and 2) a simple and unified approach to learning on data from different discrete and constrained domains. Experiments show that our methods perform superbly on generating images, semantic segments and 3D point clouds.

preprint · 2022 · arXiv

Multi-Class 3D Object Detection with Single-Class Supervision

While multi-class 3D detectors are needed in many robotics applications, training them with fully labeled datasets can be expensive in labeling cost. An alternative approach is to have targeted single-class labels on disjoint data samples. In this paper, we are interested in training a multi-class 3D object detection model, while using these single-class labeled data. We begin by detailing the unique stance of our "Single-Class Supervision" (SCS) setting with respect to related concepts such as partial supervision and semi supervision. Then, based on the case study of training the multi-class version of Range Sparse Net (RSN), we adapt a spectrum of algorithms -- from supervised learning to pseudo-labeling -- to fully exploit the properties of our SCS setting, and perform extensive ablation studies to identify the most effective algorithm and practice. Empirical experiments on the Waymo Open Dataset show that proper training under SCS can approach or match full supervision training while saving labeling costs.

preprint · 2022 · arXiv

NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results

This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video. In this challenge, we proposed the LDV 2.0 dataset, which includes the LDV dataset (240 videos) and 95 additional videos. This challenge includes three tracks. Track 1 aims at enhancing the videos compressed by HEVC at a fixed QP. Track 2 and Track 3 target both the super-resolution and quality enhancement of HEVC compressed video. They require x2 and x4 super-resolution, respectively. In total, the three tracks attracted more than 600 registrations. In the test phase, 8 teams, 8 teams and 12 teams submitted the final results to Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution and quality enhancement of compressed video. The proposed LDV 2.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge (including open-sourced codes) is at https://github.com/RenYang-home/NTIRE22_VEnh_SR.

preprint · 2022 · arXiv

Pareto Navigation Gradient Descent: a First-Order Algorithm for Optimization in Pareto Set

Many modern machine learning applications, such as multi-task learning, require finding optimal model parameters to trade off multiple objective functions that may conflict with each other. The notion of the Pareto set allows us to focus on the set of (often infinitely many) models that cannot be strictly improved. But it does not provide an actionable procedure for picking one or a few special models to return to practical users. In this paper, we consider optimization in Pareto set (OPT-in-Pareto), the problem of finding Pareto models that optimize an extra reference criterion function within the Pareto set. This function can either encode a specific preference from the users, or represent a generic diversity measure for obtaining a set of diversified Pareto models that are representative of the whole Pareto set. Unfortunately, despite being a highly useful framework, efficient algorithms for OPT-in-Pareto have been largely missing, especially for large-scale, non-convex, and non-linear objectives in deep learning. A naive approach is to apply Riemannian manifold gradient descent on the Pareto set, which incurs a high computational cost due to the need for eigen-calculation of Hessian matrices. We propose a first-order algorithm that approximately solves OPT-in-Pareto using only gradient information, with both high practical efficiency and a theoretically guaranteed convergence property. Empirically, we demonstrate that our method works efficiently for a variety of challenging multi-task-related problems.

preprint · 2022 · arXiv

The scope for AI-augmented interpretation of building blueprints in commercial and industrial property insurance

This report, commissioned by the WTW research network, investigates the use of AI in property risk assessment. It (i) reviews existing work on risk assessment in commercial and industrial properties and automated information extraction from building blueprints; and (ii) presents an exploratory 'proof-of-concept solution' exploring the feasibility of using machine learning for the automated extraction of information from building blueprints to support insurance risk assessment.

preprint · 2021 · arXiv

Dirac Nodal Lines and Nodal Loops in a Topological Kagome Superconductor CsV$_3$Sb$_5$

The intertwining of charge order, superconductivity and band topology has promoted the AV$_3$Sb$_5$ (A=K, Rb, Cs) family of materials to the center of attention in condensed matter physics. Underlying mysterious macroscopic properties such as the giant anomalous Hall conductivity (AHC) and chiral charge density wave is their nontrivial band topology. While numerous experimental and theoretical works have investigated the nontrivial band structure, and especially the van Hove singularities, the exact topological phase of this family remains to be clarified. In this work, we identify CsV$_3$Sb$_5$ as a Dirac nodal line semimetal based on the observation of multiple Dirac nodal lines and loops close to the Fermi level. Combining photoemission spectroscopy and density functional theory, we identify two groups of Dirac nodal lines along the $k_z$ direction and one group of Dirac nodal loops in the A-H-L plane. These nodal loops are located at the Fermi level within the instrumental resolution limit. Importantly, our first-principles analyses indicate that these nodal loops may be a crucial source of the mysterious giant AHC observed. Our results not only provide a clear picture for categorizing the band structure topology of this family of materials, but also suggest the dominant role of topological nodal loops in shaping their transport behavior.

preprint · 2021 · arXiv

High-resolution ARPES endstation for in-situ electronic structure investigations at SSRF

Angle-resolved photoemission spectroscopy (ARPES) is one of the most powerful experimental techniques in condensed matter physics. Synchrotron ARPES, which uses photons with high flux and continuously tunable energy, has become particularly important. However, an excellent synchrotron ARPES system must have features such as a small beam spot, super-high energy resolution, and a user-friendly operation interface. A synchrotron beamline and an endstation (BL03U) were designed and constructed at the Shanghai Synchrotron Radiation Facility. The beam spot size at the sample position is 7.5 (V) $\mu$m $\times$ 67 (H) $\mu$m, and the fundamental photon range is 7-165 eV; the ARPES system enables photoemission with an energy resolution of 2.67 meV at 21.2 eV. In addition, the ARPES system of this endstation is equipped with a six-axis cryogenic sample manipulator (the lowest temperature is 7 K) and is integrated with an oxide molecular beam epitaxy system and a scanning tunneling microscope, which can provide an advanced platform for in-situ characterization of the fine electronic structure of condensed matter.

preprint · 2021 · arXiv

Overlap-Minimization Scheduling Strategy for Data Transmission in VANET

The vehicular ad-hoc network (VANET) based on dedicated short-range communication (DSRC) is a distributed communication system in which all nodes share the wireless channel using the carrier sense multiple access with collision avoidance (CSMA/CA) protocol. However, the contention and backoff mechanisms of CSMA/CA often introduce additional delays and data packet collisions, so the QoS requirements on delay and packet delivery ratio (PDR) may be hard to meet. Moreover, because security information is distributed in broadcast mode, the sender cannot know whether the receivers have received the information successfully. The same problem exists in no-acknowledgement (non-ACK) transmissions in VANET. Therefore, the probability of packet collisions should be considered in broadcast or non-ACK working modes. This paper presents a connection-level scheduling algorithm overlaid on CSMA/CA to schedule the start-sending time of each transmission. By converting the objective of reducing collision probability into minimizing the overlap of the connections' transmission durations, the probability of backoff activation can be greatly decreased, which in turn decreases the delay and the probability of packet collisions. Numerical simulations have been conducted on our unified platform containing SUMO, Veins and Omnet++. The results show that the proposed algorithm can effectively improve the PDR and reduce packet collisions in VANET.
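
The idea of choosing start-sending times so that transmission durations overlap as little as possible can be shown with a small greedy sketch. It is an assumption-laden illustration rather than the paper's scheduler: time is discretized into integer units, every transmission must fit inside one scheduling period, longer transmissions are placed first, and `schedule_starts` and `total_overlap` are hypothetical names.

```python
def overlap(a_start, a_len, b_start, b_len):
    """Length of the intersection of two transmission intervals."""
    return max(0, min(a_start + a_len, b_start + b_len) - max(a_start, b_start))


def schedule_starts(durations, period, step=1):
    """Greedily pick a start time per connection to minimize added overlap.

    durations: integer transmission lengths; period: integer scheduling window.
    Returns the start times in the same order as `durations`.
    """
    # Place longer transmissions first (a heuristic assumed for this sketch).
    order = sorted(range(len(durations)), key=lambda i: -durations[i])
    starts = [0] * len(durations)
    placed = []  # (start, duration) of connections scheduled so far
    for i in order:
        d = durations[i]
        if d > period:
            raise ValueError("transmission does not fit in the period")
        best_start, best_cost = 0, None
        for s in range(0, period - d + 1, step):
            cost = sum(overlap(s, d, ps, pd) for ps, pd in placed)
            if best_cost is None or cost < best_cost:
                best_start, best_cost = s, cost  # ties keep the earliest slot
        starts[i] = best_start
        placed.append((best_start, d))
    return starts


def total_overlap(starts, durations):
    """Sum of pairwise overlaps, the quantity the schedule tries to keep small."""
    n = len(starts)
    return sum(
        overlap(starts[i], durations[i], starts[j], durations[j])
        for i in range(n) for j in range(i + 1, n)
    )


# Three connections sharing a 10-unit period can be packed without overlap.
starts = schedule_starts([3, 3, 4], period=10)
```

When the durations cannot all fit side by side, as with two 6-unit transmissions in a 10-unit period, the greedy choice pushes the second one as late as possible and leaves a residual overlap of 2 units.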

preprint · 2021 · arXiv

Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision

We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers; this is in contrast to typical quantization methods that approximate each weight using a single low-precision number. Computationally, we construct the multipoint quantization with an efficient greedy selection procedure, and adaptively decide the number of low-precision points on each quantized weight vector based on the error of its output. This allows us to achieve higher precision levels for important weights that greatly influence the outputs, yielding an 'effect of mixed precision' but without physical mixed-precision implementations (which require specialized hardware accelerators). Empirically, our method can be implemented by common operands, bringing almost no memory and computation overhead. We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and that it can be generalized to more challenging tasks like PASCAL VOC object detection.
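
To make the "linear combination of multiple low-bit vectors" concrete, the sketch below greedily quantizes the current residual and fits one scalar coefficient per point. It is a simplified stand-in, not the paper's procedure: the residual-based greedy step, the symmetric integer grid, the stopping rule on weight error (the abstract uses output error), and the names `multipoint_quantize` and `reconstruct` are all assumptions.

```python
import math


def quantize_to_grid(vec, bits):
    """Round a vector onto a symmetric signed integer grid with `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(x) for x in vec)
    if peak == 0.0:
        return [0] * len(vec)
    scale = peak / qmax
    return [max(-qmax, min(qmax, round(x / scale))) for x in vec]


def multipoint_quantize(weights, bits=2, max_points=4, tol=1e-3):
    """Approximate `weights` as sum_k a_k * q_k with low-bit integer vectors q_k.

    Each step quantizes the residual and picks the least-squares coefficient
    a_k = <r, q_k> / <q_k, q_k>, so the residual norm never increases.
    """
    residual = list(weights)
    points = []
    for _ in range(max_points):
        err = math.sqrt(sum(r * r for r in residual))
        if err <= tol:
            break  # stop adding points once the error is small enough
        q = quantize_to_grid(residual, bits)
        qq = sum(x * x for x in q)
        if qq == 0:
            break
        a = sum(r * x for r, x in zip(residual, q)) / qq
        points.append((a, q))
        residual = [r - a * x for r, x in zip(residual, q)]
    return points


def reconstruct(points, size):
    """Rebuild the approximate weight vector from (coefficient, vector) pairs."""
    out = [0.0] * size
    for a, q in points:
        out = [o + a * x for o, x in zip(out, q)]
    return out


def l2_error(weights, points):
    approx = reconstruct(points, len(weights))
    return math.sqrt(sum((w - v) ** 2 for w, v in zip(weights, approx)))


w = [0.9, -0.5, 0.1, 0.3]
one_point = multipoint_quantize(w, bits=2, max_points=1, tol=0.0)
three_points = multipoint_quantize(w, bits=2, max_points=3, tol=0.0)
```

With `bits=2` every point is a ternary vector, and using more points for a weight vector lowers its approximation error, which is the "mixed precision" effect the abstract describes.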

preprint · 2021 · arXiv

QoS-aware Link Scheduling Strategy for Data Transmission in SDVN

The vehicular ad-hoc network (VANET) based on dedicated short-range communication (DSRC) is a distributed communication system in which all nodes share the wireless channel using the carrier sense multiple access with collision avoidance (CSMA/CA) protocol. However, the backoff mechanism of CSMA/CA during channel contention can cause uncertain transmission delay and prevent applications from reaching a given quality of service (QoS). Moreover, there remains a possibility of harmful data packet collisions, especially for broadcast or non-acknowledgement (NACK) transmissions. The original contributions of this paper are summarized as follows: (1) We model the packet collision probability of broadcast or NACK transmission in VANET using combinatorial theory and investigate the potential influence of the miss my packets (MMP) problem. (2) Based on the software-defined vehicular network (SDVN) framework and QoS requirements, a novel link-level scheduling strategy, which determines the start-sending time for each connection, is proposed to maximize the packet delivery ratio (PDR). Here, maximizing PDR is converted into minimizing the overlap among transmission durations. (3) A transmission scheduling greedy search (TSGS) algorithm is proposed to reduce computational complexity. Extensive simulations were run on Veins, a unified platform combining SUMO and OMNET++. The results show that the proposed algorithm can improve the PDR by at least 15%, enhance collision-avoidance performance by almost 40%, and reduce the MMP ratio by about 3% compared with random transmission, while meeting the QoS requirement.

preprint · 2021 · arXiv

Stein Neural Sampler

We propose two novel samplers to generate high-quality samples from a given (un-normalized) probability density. Motivated by the success of generative adversarial networks, we construct our samplers using deep neural networks that transform a reference distribution to the target distribution. Training schemes are developed to minimize two variations of the Stein discrepancy, which is designed to work with un-normalized densities. Once trained, our samplers are able to generate samples instantaneously. We show that the proposed methods are theoretically sound and experience fewer convergence issues compared with traditional sampling approaches according to our empirical studies.

preprint · 2020 · arXiv

Color-wise Attention Network for Low-light Image Enhancement

The absence of nearby light sources while capturing an image degrades the visibility and quality of the captured image, making computer vision tasks difficult. In this paper, a color-wise attention network (CWAN) based on convolutional neural networks is proposed for low-light image enhancement. Motivated by how the human visual system looks at dark images, CWAN learns an end-to-end mapping between low-light and enhanced images while searching for any useful color cues in the low-light image to aid the color enhancement process. Once these regions are identified, CWAN focuses its attention mainly on synthesizing these local regions, as well as the global image. Both quantitative and qualitative experiments on challenging datasets demonstrate the advantages of our method in comparison with state-of-the-art methods.

preprint · 2020 · arXiv

Disentanglement Then Reconstruction: Learning Compact Features for Unsupervised Domain Adaptation

Recent works in domain adaptation typically learn domain-invariant features with adversarial methods to mitigate the gap between the source and target domains. However, category information is not sufficiently used, so the learned domain-invariant features are not discriminative enough. We propose a new domain adaptation method based on prototype construction, which is akin to capturing data cluster centers. Specifically, it consists of two parts: disentanglement and reconstruction. First, the domain-specific features and domain-invariant features are disentangled from the original features. At the same time, the domain prototypes and class prototypes of both domains are estimated. Then, a reconstructor is trained by reconstructing the original features from the disentangled domain-invariant and domain-specific features. With this reconstructor, we can construct prototypes for the original features from the corresponding class prototypes and domain prototypes. Finally, the feature extraction network is forced to extract features close to these prototypes. Our contribution lies in the technical use of the reconstructor to obtain the original feature prototypes, which helps to learn compact and discriminative features. As far as we know, this idea is proposed for the first time. Experimental results on several public datasets confirm the state-of-the-art performance of our method.

preprint2020arXiv

Extended Stochastic Gradient MCMC for Large-Scale Bayesian Variable Selection

Stochastic gradient Markov chain Monte Carlo (MCMC) algorithms have received much attention in Bayesian computing for big data problems, but they are only applicable to a small class of problems for which the parameter space has a fixed dimension and the log-posterior density is differentiable with respect to the parameters. This paper proposes an extended stochastic gradient MCMC algorithm which, by introducing appropriate latent variables, can be applied to more general large-scale Bayesian computing problems, such as those involving dimension jumping and missing data. Numerical studies show that the proposed algorithm is highly scalable and much more efficient than traditional MCMC algorithms. The proposed algorithm substantially alleviates the difficulty of applying Bayesian methods in big data computing.
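For orientation, the fixed-dimension, differentiable setting that the abstract describes as the limit of standard stochastic gradient MCMC can be illustrated with plain stochastic gradient Langevin dynamics (SGLD). The sketch below is not the extended algorithm from the paper. It assumes a toy model (unit-variance Gaussian data with a flat prior on the mean), and the function name, step size and batch size are all illustrative choices.

```python
import math
import random

def sgld_gaussian_mean(data, n_steps=2000, batch_size=10, step=1e-3, seed=0):
    """Plain SGLD for the mean of unit-variance Gaussian data (flat prior assumed)."""
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Minibatch drawn with replacement; rescaled to estimate the full-data gradient.
        batch = [rng.choice(data) for _ in range(batch_size)]
        grad = (n_total / batch_size) * sum(x - theta for x in batch)
        # Langevin update: half-step along the gradient plus Gaussian noise of variance `step`.
        theta += 0.5 * step * grad + rng.gauss(0.0, math.sqrt(step))
        samples.append(theta)
    return samples

data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(100)]
samples = sgld_gaussian_mean(data)
# Discard the first half as burn-in and average the rest.
posterior_mean_estimate = sum(samples[1000:]) / 1000
```

The update needs a gradient of the log-posterior and a parameter vector of fixed size, which is exactly the restriction the abstract says the latent-variable extension is meant to lift.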

preprint2020arXiv

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks

For deploying a deep learning model into production, it needs to be both accurate and compact to meet the latency and memory constraints. This usually results in a network that is deep (to ensure performance) and yet thin (to improve computational efficiency). In this paper, we propose an efficient method to train a deep thin network with a theoretic guarantee. Our method is motivated by model compression. It consists of three stages. First, we sufficiently widen the deep thin network and train it until convergence. Then, we use this well-trained deep wide network to warm up (or initialize) the original deep thin network. This is achieved by layerwise imitation, that is, forcing the thin network to mimic the intermediate outputs of the wide network from layer to layer. Finally, we further fine tune this already well-initialized deep thin network. The theoretical guarantee is established by using the neural mean field analysis. It demonstrates the advantage of our layerwise imitation approach over backpropagation. We also conduct large-scale empirical experiments to validate the proposed method. By training with our method, ResNet50 can outperform ResNet101, and BERT Base can be comparable with BERT Large, when ResNet101 and BERT Large are trained under the standard training procedures as in the literature.

preprint2020arXiv

Joint COCO and Mapillary Workshop at ICCV 2019 Keypoint Detection Challenge Track Technical Report: Distribution-Aware Coordinate Representation for Human Pose Estimation

In this paper, we focus on the coordinate representation in human pose estimation. While being the standard choice, heatmap based representation has not been systematically investigated. We found that the process of coordinate decoding (i.e. transforming the predicted heatmaps to the coordinates) is surprisingly significant for human pose estimation performance, which nevertheless was not recognised before. In light of the discovered importance, we further probe the design limitations of the standard coordinate decoding method and propose a principled distribution-aware decoding method. Meanwhile, we improve the standard coordinate encoding process (i.e. transforming ground-truth coordinates to heatmaps) by generating accurate heatmap distributions for unbiased model training. Taking them together, we formulate a novel Distribution-Aware coordinate Representation for Keypoint (DARK) method. Serving as a model-agnostic plug-in, DARK significantly improves the performance of a variety of state-of-the-art human pose estimation models. Extensive experiments show that DARK yields the best results on COCO keypoint detection challenge, validating the usefulness and effectiveness of our novel coordinate representation idea. The project page containing more details is at https://ilovepose.github.io/coco
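The role of coordinate decoding can be illustrated in one dimension. The sketch below assumes the predicted heatmap is roughly Gaussian around its peak and refines the integer argmax with a second-order Taylor expansion of the log-heatmap. It is a simplified illustration of distribution-aware decoding, not the authors' implementation; the helper names and the 1D setting are assumptions.

```python
import math

def gaussian_heatmap(center, length, sigma=2.0):
    """Sample an unnormalised 1D Gaussian heatmap at integer positions."""
    return [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(length)]

def decode_argmax(heatmap):
    """Baseline decoding: index of the largest heatmap value."""
    return max(range(len(heatmap)), key=heatmap.__getitem__)

def decode_taylor(heatmap, eps=1e-12):
    """Refine the argmax using finite-difference derivatives of the log-heatmap."""
    m = decode_argmax(heatmap)
    if m == 0 or m == len(heatmap) - 1:
        return float(m)  # no neighbours on both sides, fall back to argmax
    log_h = [math.log(max(v, eps)) for v in heatmap[m - 1:m + 2]]
    d1 = (log_h[2] - log_h[0]) / 2.0          # first derivative at m
    d2 = log_h[2] - 2.0 * log_h[1] + log_h[0]  # second derivative at m
    if d2 >= 0:
        return float(m)  # not a proper peak, keep the argmax
    return m - d1 / d2

heatmap = gaussian_heatmap(center=10.3, length=32)
coarse = decode_argmax(heatmap)
refined = decode_taylor(heatmap)
```

For an exactly Gaussian heatmap the log is quadratic, so the refinement recovers the sub-pixel centre, while the plain argmax is limited to the pixel grid.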

preprint2020arXiv

Learning Various Length Dependence by Dual Recurrent Neural Networks

Recurrent neural networks (RNNs) are widely used as a memory model for sequence-related problems. Many variants of RNN have been proposed to solve the gradient problems of training RNNs and process long sequences. Although some classical models have been proposed, capturing long-term dependence while responding to short-term changes remains a challenge. To address this problem, we propose a new model named Dual Recurrent Neural Networks (DuRNN). The DuRNN consists of two parts to learn the short-term dependence and progressively learn the long-term dependence. The first part is a recurrent neural network with constrained full recurrent connections to deal with short-term dependence in the sequence and generate short-term memory. The second part is a recurrent neural network with independent recurrent connections, which helps to learn long-term dependence and generate long-term memory. A selection mechanism is added between the two parts to help transfer the needed long-term information to the independent neurons. Multiple modules can be stacked to form a multi-layer model for better performance. Our contributions are: 1) a new recurrent model developed based on the divide-and-conquer strategy to learn long and short-term dependence separately, and 2) a selection mechanism to enhance the separating and learning of different temporal scales of dependence. Both theoretical analysis and extensive experiments are conducted to validate the performance of our model, and we also conduct simple visualization experiments and ablation analyses for model interpretability. Experimental results indicate that the proposed DuRNN model can handle not only very long sequences (over 5000 time steps), but also short sequences very well. Compared with many state-of-the-art RNN models, our model has demonstrated efficient and better performance.

preprint2020arXiv

MaxUp: A Simple Way to Improve Generalization of Neural Network Training

We propose MaxUp, an embarrassingly simple, highly effective technique for improving the generalization performance of machine learning models, especially deep neural networks. The idea is to generate a set of augmented data with some random perturbations or transforms and minimize the maximum, or worst-case, loss over the augmented data. By doing so, we implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve the generalization performance. For example, in the case of Gaussian perturbation, MaxUp is asymptotically equivalent to using the gradient norm of the loss as a penalty to encourage smoothness. We test MaxUp on a range of tasks, including image classification, language modeling, and adversarial certification, on which MaxUp consistently outperforms the existing best baseline methods, without introducing substantial computational overhead. In particular, we improve ImageNet classification from the state-of-the-art top-1 accuracy of 85.5% without extra data to 85.8%. Code will be released soon.
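The worst-case objective described above can be sketched for a single training example as follows. This is a minimal illustration assuming Gaussian input perturbations and a toy linear model with squared-error loss. The names `maxup_loss` and `squared_error`, the weights, and the values of `m` and `sigma` are hypothetical and not taken from the paper.

```python
import random

def squared_error(x, y, w=(1.0, -2.0)):
    """Toy loss: squared error of a fixed linear model (weights are illustrative)."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return (pred - y) ** 2

def maxup_loss(loss_fn, x, y, m=4, sigma=0.1, rng=None):
    """Worst-case loss over m randomly perturbed copies of one example."""
    if rng is None:
        rng = random.Random(0)
    worst = float("-inf")
    for _ in range(m):
        # Gaussian perturbation of every input coordinate.
        x_aug = [xi + rng.gauss(0.0, sigma) for xi in x]
        worst = max(worst, loss_fn(x_aug, y))
    return worst

x, y = [1.0, 2.0], 0.5
clean_loss = squared_error(x, y)
loss_m4 = maxup_loss(squared_error, x, y, m=4, rng=random.Random(0))
loss_m8 = maxup_loss(squared_error, x, y, m=8, rng=random.Random(0))
```

In training, this per-example maximum would replace the ordinary loss inside the batch average. With the same random seed, increasing `m` can only keep or raise the value, since the first `m` perturbations are shared.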

preprint2020arXiv

SAFER: A Structure-free Approach for Certified Robustness to Adversarial Word Substitutions

State-of-the-art NLP models can often be fooled by human-unaware transformations such as synonymous word substitution. For security reasons, it is of critical importance to develop models with certified robustness that can provably guarantee that the prediction cannot be altered by any possible synonymous word substitution. In this work, we propose a certified robust method based on a new randomized smoothing technique, which constructs a stochastic ensemble by applying random word substitutions on the input sentences, and leverages the statistical properties of the ensemble to provably certify the robustness. Our method is simple and structure-free in that it only requires black-box queries of the model outputs, and hence can be applied to any pre-trained model (such as BERT) and any type of model (word-level or subword-level). Our method significantly outperforms recent state-of-the-art methods for certified robustness on both IMDB and Amazon text classification tasks. To the best of our knowledge, we are the first work to achieve certified robustness on large systems such as BERT with practically meaningful certified accuracy.
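The stochastic ensemble described above can be sketched with black-box queries only. The example below shows just the voting step. It does not implement the certification bound, and the synonym table, the toy classifier and the sample count are hypothetical stand-ins rather than anything from the paper.

```python
import random
from collections import Counter

# Hypothetical synonym sets; each word's set includes the word itself.
SYNONYMS = {
    "good": ["good", "great", "fine"],
    "movie": ["movie", "film"],
}

def toy_classify(words):
    """Stand-in black-box model: positive if any positive word appears."""
    return "pos" if any(w in {"good", "great", "fine"} for w in words) else "neg"

def smoothed_predict(classify, words, synonyms, n_samples=200, rng=None):
    """Majority vote of the classifier over random synonym substitutions."""
    if rng is None:
        rng = random.Random(0)
    votes = Counter()
    for _ in range(n_samples):
        # Replace each word by a uniformly sampled member of its synonym set.
        perturbed = [rng.choice(synonyms.get(w, [w])) for w in words]
        votes[classify(perturbed)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples

label, vote_share = smoothed_predict(toy_classify, ["the", "movie", "was", "good"], SYNONYMS)
```

Only `classify(perturbed)` touches the model, which is what makes the approach independent of model structure. A certificate would additionally need a statistical bound on the vote share, which is omitted here.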

preprint2020arXiv

TDMP-Reliable Target Driven and Mobility Prediction based Routing Protocol in Complex VANET

Vehicle-to-everything (V2X) communication in the vehicular ad hoc network (VANET), an infrastructure-free mechanism, has emerged as a crucial component in the advanced Intelligent Transport System (ITS) for special information transmission and inter-vehicular communications. One of the main research challenges in VANET is the design and implementation of network routing protocols that enable V2X communication with reliable end-to-end connectivity and efficient packet transmission. The constantly changing nature of road transport vehicles poses a significant threat to VANET with respect to the accuracy and reliability of packet delivery. Therefore, position-based routing protocols tend to be the predominant method in VANET, as they cope effectively with rapid changes in vehicle movements. However, existing routing protocols have some limitations, such as (i) inaccuracy in highly dynamic network topologies, (ii) defective link-state estimation, and (iii) poor movement prediction in heterogeneous road layouts. In this paper, a target-driven and mobility prediction (TDMP) based routing protocol is therefore developed for the high-speed mobility and dynamic topology of vehicles, fluctuating traffic flow and diverse road layouts in VANET. The primary idea in TDMP is that the destination target of a driver is included in the mobility prediction to assist the implementation of the routing protocol. Compared to existing geographic routing protocols, which mainly forward the packet greedily to the next hop based on its current position and partial road layout, TDMP is developed to enhance packet transmission by considering the estimated inter-vehicle link status and by dynamically predicting vehicle positions under fluctuating mobility and the global road layout.

preprint2020arXiv

Unsupervised Feature Selection via Multi-step Markov Transition Probability

Feature selection is a widely used dimension reduction technique to select feature subsets because of its interpretability. Many methods have been proposed and have achieved good results, most of which are mainly concerned with the relationships between adjacent data points. However, the possible associations between data pairs that may not be adjacent are always neglected. Different from previous methods, we propose a novel and very simple approach for unsupervised feature selection, named MMFS (Multi-step Markov transition probability for Feature Selection). The idea is to use multi-step Markov transition probability to describe the relation between any data pair. Two ways, from the positive and negative viewpoints respectively, are employed to keep the data structure after feature selection. From the positive viewpoint, the maximum transition probability that can be reached in a certain number of steps is used to describe the relation between two points. Then, the features which can keep the compact data structure are selected. From the negative viewpoint, the minimum transition probability that can be reached in a certain number of steps is used to describe the relation between two points. In this case, the features that least maintain the loose data structure are selected. The two ways can also be combined. Thus three algorithms are proposed. Our main contributions are a novel feature selection approach which uses multi-step transition probability to characterize the data structure, and three algorithms proposed from the positive and negative aspects for keeping the data structure. The performance of our approach is compared with the state-of-the-art methods on eight real-world data sets, and the experimental results show that the proposed MMFS is effective in unsupervised feature selection.
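The multi-step relation between data points can be sketched as follows: given a one-step transition matrix, take matrix powers up to a chosen number of steps and keep, for each pair, the maximum (positive viewpoint) or minimum (negative viewpoint) probability reached. This is an illustrative reading of the abstract only. How the transition matrix is built from data and how features are then scored are not covered, and the function names are assumptions.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def multi_step_relations(P, steps):
    """Element-wise max and min of P, P^2, ..., P^steps."""
    power = [row[:] for row in P]
    max_rel = [row[:] for row in P]
    min_rel = [row[:] for row in P]
    for _ in range(steps - 1):
        power = mat_mul(power, P)
        for i, row in enumerate(power):
            for j, value in enumerate(row):
                max_rel[i][j] = max(max_rel[i][j], value)
                min_rel[i][j] = min(min_rel[i][j], value)
    return max_rel, min_rel

# Toy chain of three points: 0 and 2 are only connected through 1.
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
max_rel, min_rel = multi_step_relations(P, steps=2)
```

In this toy chain, points 0 and 2 have zero one-step transition probability but a two-step probability of 0.5, so the maximum relation captures the association between non-adjacent points that a one-step view would miss.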

preprint2019arXiv

Metalens With Artificial Focus Pattern

Metalenses, one of the most popular applications of emerging optical metasurfaces, have attracted widespread interest recently. With nanostructures fully controlling phase, polarization and transmission, metalenses have achieved performance comparable to that of commercial objective lenses. While recent studies have successfully reproduced traditional focusing behavior with metalenses, in this work we show that, instead of focusing light to a point, a metasurface can also shape the focus into a flexibly designed pattern, which opens up further possibilities. We propose a new mechanism that generalizes the guiding principles of conventional point-focused metalenses so that a metalens concentrates light into an artificial focus pattern. As examples, we demonstrate the engineering of metalenses with artificial focus patterns by creating line- and ring-shaped foci as 'drawing tools'. Metalenses with 'U'- and 'M'-shaped foci are characterized as proof of concept. These metalenses are fabricated from a single layer of silicon-based material through a CMOS-compatible nanofabrication process. The mechanism for generating artificial focus patterns can be applied to a plethora of future on-chip optical devices, with applications ranging from beam engineering to next-generation nanolithography.