Source author record

Chang Xu

Chang Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Catalog footprint

What is connected

81 works

27 topics

4 close collaborators


preprint, 2026, arXiv

Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation

Vision-language-action (VLA) models typically rely on large-scale real-world videos, whereas simulated data, despite being inexpensive and highly parallelizable to collect, often suffers from a substantial visual domain gap and limited environmental diversity, resulting in weak real-world generalization. We present an efficient video augmentation framework that converts simulated VLA videos into realistic training videos while preserving task semantics and action trajectories. Our pipeline extracts structured conditions from simulation via video semantic segmentation and video captioning, rewrites captions to diversify environments, and uses a conditional video transfer model to synthesize realistic videos. To make augmentation practical at scale, we introduce a diffusion feature-reuse mechanism that reuses video tokens across adjacent timesteps to accelerate generation, and a coreset sampling strategy that identifies a compact, non-redundant subset for augmentation under limited computation. Extensive experiments on Robotwin 2.0, LIBERO, LIBERO-Plus, and a real robotic platform demonstrate consistent improvements. For example, our method improves RDT-1B by 8% on Robotwin 2.0, and boosts $π_0$ by 5.1% on the more challenging LIBERO-Plus benchmark. Code is available at: https://github.com/nanfangxiansheng/Seeing-Realism-from-Simulation.

preprint, 2022, arXiv

$q$-Supercongruences on triple and quadruple sums

Inspired by the recent work of El Bachraoui, we present some new $q$-supercongruences on triple and quadruple sums of basic hypergeometric series. In particular, we give a $q$-supercongruence modulo the fifth power of a cyclotomic polynomial, which is a $q$-analogue of the quadruple sum of Van Hamme's supercongruence (G.2).
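For readers new to the area, the notation behind statements such as "modulo the fifth power of a cyclotomic polynomial" can be summarized as follows. These are the standard textbook definitions, not formulas quoted from the abstract, and $\zeta$ is introduced here only for illustration:

$$
(a;q)_n = \prod_{k=0}^{n-1} \left(1 - a q^{k}\right), \qquad
[n]_q = \frac{1-q^{n}}{1-q} = 1 + q + \cdots + q^{n-1}, \qquad
\Phi_n(q) = \prod_{\substack{1 \le k \le n \\ \gcd(k,n)=1}} \left(q - \zeta^{k}\right), \quad \zeta = e^{2\pi i/n}.
$$

A $q$-supercongruence modulo $\Phi_n(q)^r$ asserts that the difference of the two sides, written as a reduced rational function in $q$, has a numerator divisible by $\Phi_n(q)^r$. Since $\Phi_p(1) = p$ for a prime $p$, letting $q \to 1$ typically recovers an ordinary congruence modulo $p^r$.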

preprint, 2022, arXiv

A $q$-supercongruence modulo the fourth power of a cyclotomic polynomial

In this paper, a new $q$-supercongruence with two free parameters modulo the fourth power of a cyclotomic polynomial is obtained. Our main auxiliary tools are Watson's $_8ϕ_7$ transformation formula for basic hypergeometric series, the 'creative microscoping' method recently introduced by Guo and Zudilin, and the Chinese remainder theorem for coprime polynomials. By taking suitable parameter substitutions in the established $q$-supercongruence, some nice congruences involving the Bernoulli numbers are derived.

preprint, 2022, arXiv

A Normalized Gaussian Wasserstein Distance for Tiny Object Detection

Detecting tiny objects is a very challenging problem since a tiny object only contains a few pixels in size. We demonstrate that state-of-the-art detectors do not produce satisfactory results on tiny objects due to the lack of appearance information. Our key observation is that Intersection over Union (IoU) based metrics such as IoU itself and its extensions are very sensitive to the location deviation of the tiny objects, and drastically deteriorate the detection performance when used in anchor-based detectors. To alleviate this, we propose a new evaluation metric using Wasserstein distance for tiny object detection. Specifically, we first model the bounding boxes as 2D Gaussian distributions and then propose a new metric dubbed Normalized Wasserstein Distance (NWD) to compute the similarity between them by their corresponding Gaussian distributions. The proposed NWD metric can be easily embedded into the assignment, non-maximum suppression, and loss function of any anchor-based detector to replace the commonly used IoU metric. We evaluate our metric on a new dataset for tiny object detection (AI-TOD) in which the average object size is much smaller than existing object detection datasets. Extensive experiments show that, when equipped with NWD metric, our approach yields performance that is 6.7 AP points higher than a standard fine-tuning baseline, and 6.0 AP points higher than state-of-the-art competitors. Codes are available at: https://github.com/jwwangchn/NWD.
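The idea of comparing boxes through their Gaussian models can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each `(cx, cy, w, h)` box is modeled as a Gaussian with mean `(cx, cy)` and covariance `diag(w²/4, h²/4)`, for which the squared 2-Wasserstein distance reduces to a squared Euclidean distance, and it assumes an exponential normalization with a constant `c`. The function names and the value `c=10.0` are hypothetical choices made for this example.

```python
import math


def iou(box_a, box_b):
    """Plain IoU for two axis-aligned boxes given as (cx, cy, w, h)."""
    ax1, ax2 = box_a[0] - box_a[2] / 2, box_a[0] + box_a[2] / 2
    ay1, ay2 = box_a[1] - box_a[3] / 2, box_a[1] + box_a[3] / 2
    bx1, bx2 = box_b[0] - box_b[2] / 2, box_b[0] + box_b[2] / 2
    by1, by2 = box_b[1] - box_b[3] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union


def nwd(box_a, box_b, c=10.0):
    """Normalized Wasserstein-style similarity in (0, 1].

    c is an assumed normalizing constant (illustrative value only).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two axis-aligned Gaussians:
    # squared distance of the means plus squared distance of the half-sizes.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


tiny = (10.0, 10.0, 6.0, 6.0)
shifted_4px = (14.0, 10.0, 6.0, 6.0)
shifted_6px = (16.0, 10.0, 6.0, 6.0)

for other in (shifted_4px, shifted_6px):
    print(f"IoU={iou(tiny, other):.3f}  NWD={nwd(tiny, other):.3f}")
```

For a 6x6 box, a 6-pixel shift already drives IoU to zero, while the Gaussian-based similarity decays smoothly and still distinguishes near misses from far misses. That smooth behavior is the property the abstract relies on when it proposes replacing IoU in assignment, non-maximum suppression, and the loss.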

preprint, 2022, arXiv

An Image Patch is a Wave: Phase-Aware Vision MLP

In the field of computer vision, recent works show that a pure MLP architecture mainly stacked by fully-connected layers can achieve competing performance with CNN and transformer. An input image of vision MLP is usually split into multiple tokens (patches), while the existing MLP models directly aggregate them with fixed weights, neglecting the varying semantic information of tokens from different images. To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase. Amplitude is the original feature and the phase term is a complex value changing according to the semantic contents of input images. Introducing the phase term can dynamically modulate the relationship between tokens and fixed weights in MLP. Based on the wave-like token representation, we establish a novel Wave-MLP architecture for vision tasks. Extensive experiments demonstrate that the proposed Wave-MLP is superior to the state-of-the-art MLP architectures on various vision tasks such as image classification, object detection and semantic segmentation. The source code is available at https://github.com/huawei-noah/CV-Backbones/tree/master/wavemlp_pytorch and https://gitee.com/mindspore/models/tree/master/research/cv/wave_mlp.
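Why a phase term changes how tokens aggregate can be seen from a toy superposition. The sketch below is only an illustration of the wave picture, assuming each token is a single complex number `amplitude * exp(i * phase)`; the helper names are hypothetical, and the actual Wave-MLP layers (learned phase estimation, fixed-weight token mixing) are not reproduced here.

```python
import cmath
import math


def wave(amplitude, phase):
    """Represent a token as amplitude * (cos(phase) + i*sin(phase))."""
    return cmath.rect(amplitude, phase)


def aggregate(tokens):
    """Sum wave-like tokens with equal weights and return the magnitude."""
    total = sum(wave(a, p) for a, p in tokens)
    return abs(total)


# Two tokens with identical amplitude but different phase relationships.
in_phase = aggregate([(1.0, 0.0), (1.0, 0.0)])           # reinforce each other
quadrature = aggregate([(1.0, 0.0), (1.0, math.pi / 2)])  # partially combine
opposite = aggregate([(1.0, 0.0), (1.0, math.pi)])       # cancel out

print(in_phase, quadrature, opposite)
```

With the same amplitudes and the same (equal) mixing weights, the aggregated magnitude ranges from full reinforcement to near-total cancellation depending only on the phase difference. This is the sense in which a content-dependent phase can modulate an otherwise fixed aggregation.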

preprint, 2022, arXiv

CMT: Convolutional Neural Networks Meet Vision Transformers

Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image. However, there are still gaps in both performance and computational cost between transformers and existing convolutional neural networks (CNNs). In this paper, we aim to address this issue and develop a network that can outperform not only the canonical transformers, but also the high-performance convolutional models. We propose a new transformer based hybrid network by taking advantage of transformers to capture long-range dependencies, and of CNNs to model local features. Furthermore, we scale it to obtain a family of models, called CMTs, obtaining much better accuracy and efficiency than previous convolution and transformer based models. In particular, our CMT-S achieves 83.5% top-1 accuracy on ImageNet, while being 14x and 2x smaller on FLOPs than the existing DeiT and EfficientNet, respectively. The proposed CMT-S also generalizes well on CIFAR10 (99.2%), CIFAR100 (91.7%), Flowers (98.7%), and other challenging vision datasets such as COCO (44.3% mAP), with considerably less computational cost.

preprint, 2022, arXiv

Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark

Tiny object detection (TOD) in aerial images is challenging since a tiny object only contains a few pixels. State-of-the-art object detectors do not provide satisfactory results on tiny objects due to the lack of supervision from discriminative features. Our key observation is that the Intersection over Union (IoU) metric and its extensions are very sensitive to the location deviation of the tiny objects, which drastically deteriorates the quality of label assignment when used in anchor-based detectors. To tackle this problem, we propose a new evaluation metric dubbed Normalized Wasserstein Distance (NWD) and a new RanKing-based Assigning (RKA) strategy for tiny object detection. The proposed NWD-RKA strategy can be easily embedded into all kinds of anchor-based detectors to replace the standard IoU threshold-based one, significantly improving label assignment and providing sufficient supervision information for network training. Tested on four datasets, NWD-RKA can consistently improve tiny object detection performance by a large margin. Besides, observing prominent noisy labels in the Tiny Object Detection in Aerial Images (AI-TOD) dataset, we are motivated to meticulously relabel it and release AI-TOD-v2 and its corresponding benchmark. In AI-TOD-v2, the missing annotation and location error problems are considerably mitigated, facilitating more reliable training and validation processes. Embedding NWD-RKA into DetectoRS, the detection performance achieves 4.3 AP points improvement over state-of-the-art competitors on AI-TOD-v2. Datasets, codes, and more visualizations are available at: https://chasel-tsui.github.io/AI-TOD-v2/

preprint, 2022, arXiv

DyRep: Bootstrapping Training with Dynamic Re-parameterization

Structural re-parameterization (Rep) methods achieve noticeable improvements on simple VGG-style networks. Despite their prevalence, current Rep methods simply re-parameterize all operations into an augmented network, including those that rarely contribute to the model's performance. As such, the price to pay is an expensive computational overhead to manipulate these unnecessary behaviors. To eliminate the above caveats, we aim to bootstrap the training with minimal cost by devising a dynamic re-parameterization (DyRep) method, which encodes the Rep technique into a training process that dynamically evolves the network structure. Concretely, our proposal adaptively finds the operations which contribute most to the loss in the network, and applies Rep to enhance their representational capacity. Besides, to suppress the noisy and redundant operations introduced by Rep, we devise a de-parameterization technique for a more compact re-parameterization. In this regard, DyRep is more efficient than Rep since it smoothly evolves the given network instead of constructing an over-parameterized network. Experimental results demonstrate its effectiveness, e.g., DyRep improves the accuracy of ResNet-18 by $2.04\%$ on ImageNet and reduces runtime by $22\%$ over the baseline. Code is available at: https://github.com/hunto/DyRep.

preprint, 2022, arXiv

GhostNets on Heterogeneous Devices via Cheap Operations

Deploying convolutional neural networks (CNNs) on mobile devices is difficult due to the limited memory and computation resources. We aim to design efficient neural networks for heterogeneous devices including CPU and GPU, by exploiting the redundancy in feature maps, which has rarely been investigated in neural architecture design. For CPU-like devices, we propose a novel CPU-efficient Ghost (C-Ghost) module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that could fully reveal information underlying intrinsic features. The proposed C-Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. C-Ghost bottlenecks are designed to stack C-Ghost modules, and then the lightweight C-GhostNet can be easily established. We further consider efficient networks for GPU devices. Without involving too many GPU-inefficient operations (e.g., depth-wise convolution) in a building stage, we propose to utilize the stage-wise feature redundancy to formulate a GPU-efficient Ghost (G-Ghost) stage structure. The features in a stage are split into two parts, where the first part is processed using the original block with fewer output channels for generating intrinsic features, and the other part is generated using cheap operations by exploiting stage-wise redundancy. Experiments conducted on benchmarks demonstrate the effectiveness of the proposed C-Ghost module and the G-Ghost stage. C-GhostNet and G-GhostNet can achieve the optimal trade-off of accuracy and latency for CPU and GPU, respectively. Code is available at https://github.com/huawei-noah/CV-Backbones.

preprint, 2022, arXiv

GreedyNASv2: Greedier Search with a Greedy Path Filter

Training a good supernet in one-shot NAS methods is difficult since the search space is usually considerably huge (e.g., $13^{21}$). In order to enhance the supernet's evaluation ability, one greedy strategy is to sample good paths, and let the supernet lean towards the good ones and ease its evaluation burden as a result. However, in practice the search can still be quite inefficient since the identification of good paths is not accurate enough and sampled paths still scatter around the whole search space. In this paper, we leverage an explicit path filter to capture the characteristics of paths and directly filter out the weak ones, so that the search can be implemented on the shrunk space more greedily and efficiently. Concretely, based on the fact that good paths are far fewer than weak ones in the space, we argue that the label of "weak paths" will be more confident and reliable than that of "good paths" in multi-path sampling. We thus cast the training of the path filter in the positive and unlabeled (PU) learning paradigm, and also encourage a path embedding as a better path/operation representation to enhance the identification capacity of the learned filter. By dint of this embedding, we can further shrink the search space by aggregating similar operations with similar embeddings, and the search can be more efficient and accurate. Extensive experiments validate the effectiveness of the proposed method GreedyNASv2. For example, our obtained GreedyNASv2-L achieves $81.1\%$ Top-1 accuracy on the ImageNet dataset, significantly outperforming the strong ResNet-50 baselines.

preprint, 2022, arXiv

Inertia of two-qutrit entanglement witnesses

Entanglement witnesses (EWs) are a fundamental tool for the detection of entanglement. We investigate the inertias of bipartite EWs constructed by the partial transpose of NPT states. Furthermore, we determine most of the inertias of the partial transpose of the two-qutrit bipartite NPT states. As an application, we extend our results to high dimensional states.

preprint, 2022, arXiv

Learning Spatiotemporal Frequency-Transformer for Low-Quality Video Super-Resolution

Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos. Existing VSR techniques usually recover HR frames by extracting pertinent textures from nearby frames with known degradation processes. Despite significant progress, grand challenges remain in effectively extracting and transmitting high-quality textures from highly degraded low-quality sequences affected by blur, additive noise, and compression artifacts. In this work, a novel Frequency-Transformer (FTVSR) is proposed for handling low-quality videos; it carries out self-attention in a combined space-time-frequency domain. First, video frames are split into patches and each patch is transformed into spectral maps in which each channel represents a frequency band. This permits fine-grained self-attention on each frequency band, so that real visual texture can be distinguished from artifacts. Second, a novel dual frequency attention (DFA) mechanism is proposed to capture the global frequency relations and local frequency relations, which can handle different complicated degradation processes in real-world scenarios. Third, we explore different self-attention schemes for video processing in the frequency domain and discover that a "divided attention", which conducts joint space-frequency attention before applying temporal-frequency attention, leads to the best video enhancement quality. Extensive experiments on three widely-used VSR datasets show that FTVSR outperforms state-of-the-art methods on different low-quality videos with clear visual margins. Code and pre-trained models are available at https://github.com/researchmm/FTVSR.

preprint, 2022, arXiv

LightViT: Towards Light-Weight Convolution-Free Vision Transformers

Vision transformers (ViTs) are usually considered to be less light-weight than convolutional neural networks (CNNs) due to the lack of inductive bias. Recent works thus resort to convolutions as a plug-and-play module and embed them in various ViT counterparts. In this paper, we argue that the convolutional kernels perform information aggregation to connect all tokens; however, they would be actually unnecessary for light-weight ViTs if this explicit aggregation could function in a more homogeneous way. Inspired by this, we present LightViT as a new family of light-weight ViTs to achieve better accuracy-efficiency balance upon the pure transformer blocks without convolution. Concretely, we introduce a global yet efficient aggregation scheme into both self-attention and feed-forward network (FFN) of ViTs, where additional learnable tokens are introduced to capture global dependencies; and bi-dimensional channel and spatial attentions are imposed over token embeddings. Experiments show that our model achieves significant improvements on image classification, object detection, and semantic segmentation tasks. For example, our LightViT-T achieves 78.7% accuracy on ImageNet with only 0.7G FLOPs, outperforming PVTv2-B0 by 8.2% while 11% faster on GPU. Code is available at https://github.com/hunto/LightViT.

preprint, 2022, arXiv

New q-supercongruences from the Bailey transformation

Inspired by the recent work of Guo, we establish some new q-supercongruences, including q-analogues of some Ramanujan-type supercongruences, by using the Bailey transformation formula and the 'creative microscoping' method recently introduced by Guo and Zudilin.

preprint, 2022, arXiv

Patch Slimming for Efficient Vision Transformers

This paper studies the efficiency problem for visual transformers by excavating redundant calculation in given networks. The recent transformer architecture has demonstrated its effectiveness for achieving excellent performance on a series of computer vision tasks. However, similar to that of convolutional neural networks, the huge computational cost of vision transformers is still a severe issue. Considering that the attention mechanism aggregates different patches layer-by-layer, we present a novel patch slimming approach that discards useless patches in a top-down paradigm. We first identify the effective patches in the last layer and then use them to guide the patch selection process of previous layers. For each layer, the impact of a patch on the final output feature is approximated and patches with less impact will be removed. Experimental results on benchmark datasets demonstrate that the proposed method can significantly reduce the computational costs of vision transformers without affecting their performances. For example, over 45% FLOPs of the ViT-Ti model can be reduced with only 0.2% top-1 accuracy drop on the ImageNet dataset.

preprint, 2022, arXiv

Relational Surrogate Loss Learning

Evaluation metrics in machine learning can rarely be used directly as loss functions, as they may be non-differentiable and non-decomposable, e.g., average precision and F1 score. This paper aims to address this problem by revisiting surrogate loss learning, where a deep neural network is employed to approximate the evaluation metrics. Instead of pursuing an exact recovery of the evaluation metric through a deep neural network, we recall the purpose of these evaluation metrics, which is to distinguish whether one model is better or worse than another. In this paper, we show that directly maintaining the relation of models between surrogate losses and metrics suffices, and propose a rank correlation-based optimization method to maximize this relation and learn surrogate losses. Compared to previous works, our method is much easier to optimize and enjoys significant efficiency and performance gains. Extensive experiments show that our method achieves improvements on various tasks including image classification and neural machine translation, and even outperforms state-of-the-art methods on human pose estimation and machine reading comprehension tasks. Code is available at: https://github.com/hunto/ReLoss.
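The notion of "maintaining the relation" between a surrogate loss and a metric can be made concrete with a rank correlation. The sketch below computes Spearman's coefficient in plain Python; choosing Spearman, the helper names, and the sample numbers are all assumptions made for illustration. Training a surrogate as described in the abstract would also need a differentiable approximation of the ranking step, which this example does not attempt.

```python
def ranks(values):
    """Return the 0-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for equal-length sequences without ties."""
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Hypothetical values for four models: a surrogate loss and the true metric.
surrogate_loss = [0.9, 0.5, 0.7, 0.2]
accuracy = [0.60, 0.80, 0.70, 0.95]

rho = spearman(surrogate_loss, accuracy)
print(rho)
```

Here the surrogate orders the four models exactly opposite to their accuracy, so the coefficient is -1: minimizing the surrogate would select the same model as maximizing the metric, even though the surrogate's values are not a numerical approximation of the metric.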

preprint, 2022, arXiv

Searching for Network Width with Bilaterally Coupled Network

Searching for a more compact network width has recently served as an effective way of channel pruning for the deployment of convolutional neural networks (CNNs) under hardware constraints. To carry out the search, a one-shot supernet is usually leveraged to efficiently evaluate the performance with respect to different network widths. However, current methods mainly follow a unilaterally augmented (UA) principle for the evaluation of each width, which induces training unfairness among channels in the supernet. In this paper, we introduce a new supernet called Bilaterally Coupled Network (BCNet) to address this issue. In BCNet, each channel is fairly trained and responsible for the same amount of network widths, thus each network width can be evaluated more accurately. Besides, we propose to reduce the redundant search space and present BCNetV2 as the enhanced supernet to ensure rigorous training fairness over channels. Furthermore, we leverage a stochastic complementary strategy for training the BCNet, and propose a prior initial population sampling method to boost the performance of the evolutionary search. We also propose the first open-source width benchmark on macro structures, named Channel-Bench-Macro, for better comparison of width search algorithms. Extensive experiments on the benchmark CIFAR-10 and ImageNet datasets indicate that our method can achieve state-of-the-art or competing performance over other baseline methods. Moreover, our method turns out to further boost the performance of NAS models by refining their network widths. For example, with the same FLOPs budget, our obtained EfficientNet-B0 achieves 77.53% Top-1 accuracy on the ImageNet dataset, surpassing the performance of the original setting by 0.65%.

preprint, 2022, arXiv

SimMatch: Semi-supervised Learning with Similarity Matching

Learning with few labeled data has been a longstanding problem in the computer vision and machine learning research community. In this paper, we introduce a new semi-supervised learning framework, SimMatch, which simultaneously considers semantic similarity and instance similarity. In SimMatch, consistency regularization is applied at both the semantic level and the instance level. The different augmented views of the same instance are encouraged to have the same class prediction and a similar similarity relationship with respect to other instances. Next, we instantiate a labeled memory buffer to fully leverage the ground truth labels at the instance level and bridge the gaps between the semantic and instance similarities. Finally, we propose the unfolding and aggregation operations, which allow these two similarities to be isomorphically transformed into each other. In this way, the semantic and instance pseudo-labels can be mutually propagated to generate more high-quality and reliable matching targets. Extensive experimental results demonstrate that SimMatch improves the performance of semi-supervised learning tasks across different benchmark datasets and different settings. Notably, with 400 epochs of training, SimMatch achieves 67.2% and 74.4% Top-1 accuracy with 1% and 10% labeled examples on ImageNet, which significantly outperforms the baseline methods and is better than previous semi-supervised learning frameworks. Code and pre-trained models are available at https://github.com/KyleZheng1997/simmatch.

preprint, 2022, arXiv

Some new results about $q$-trinomial coefficients

In this paper, we present several new congruences on the $q$-trinomial coefficients introduced by Andrews and Baxter. A new congruence on sums of central $q$-binomial coefficients is also established.

preprint, 2022, arXiv

Strain-dependent structural and electronic reconstructions in long-wavelength WS$_{2}$ moiré superlattices

In long-wavelength moiré superlattices of stacked transition metal dichalcogenides (TMDs), structural reconstruction ubiquitously occurs, which has been reported to significantly affect their electronic properties. However, a complete microscopic understanding of the interplay between the lattice reconstruction and the alteration of electronic properties, and of their further response to external perturbations in the reconstructed TMD moiré superlattice, is still lacking. Here, using scanning tunneling microscopy (STM) and scanning tunneling spectroscopy (STS) combined with first-principles calculations, we study the strain-dependent structural reconstruction and its correlated electronic reconstruction in a long-wavelength H-type WS$_{2}$ moiré superlattice at the nanometer scale. We observe that long-wavelength WS$_{2}$ moiré superlattices experiencing strong atomic reconstruction transform into a hexagonal array of screw dislocations separating large H-stacked domains. Both the geometry and the moiré wavelength of the moiré superlattice are dramatically tuned by external intralayer heterostrain in our experiment. Remarkably, the STS measurements further demonstrate that the location of the K point in the conduction band is modulated sensitively by strain-induced lattice deformation at the nanometer scale in this system, with the maximum energy shift reaching up to 300 meV. Our results highlight that intralayer strain plays a vital role in determining structural and electronic properties in TMD moiré superlattices.

preprint, 2022, arXiv

Subtle Contact Nuances in the Delivery of Human-to-Human Touch Distinguish Emotional Sentiment

We routinely communicate distinct social and emotional sentiments through nuanced touch. For example, we might gently hold another's arm to offer a sense of calm, yet intensively hold another's arm to express excitement or anxiety. As this example indicates, distinct sentiments may be shaped by the subtlety in one's touch delivery. This work investigates how slight distinctions in skin-to-skin contact influence both the recognition of cued emotional messages (e.g., anger, sympathy) and the rating of emotional content (i.e., arousal, valence). By self-selecting preferred gestures (e.g., holding, stroking), touchers convey distinct messages by touching the receiver's forearm. Skin-to-skin contact attributes (e.g., velocity, depth, area) are optically tracked in high resolution. Contact is then examined within gesture, between messages. The results indicate touchers subtly, but significantly, vary contact attributes of a gesture to communicate distinct messages, which are recognizable by receivers. This tuning also correlates with receivers' arousal and valence. For instance, arousal increases with velocity for stroking, and depth for holding. Moreover, as shown here with human-to-human touch, valence is tied with velocity, which is the same trend as reported with brushes. The findings indicate that subtle nuance in skin-to-skin contact is important in conveying social messages and inducing emotions.

preprint, 2021, arXiv

A Targeted Attack on Black-Box Neural Machine Translation with Parallel Data Poisoning

As modern neural machine translation (NMT) systems have been widely deployed, their security vulnerabilities require close scrutiny. Most recently, NMT systems have been found vulnerable to targeted attacks which cause them to produce specific, unsolicited, and even harmful translations. These attacks are usually exploited in a white-box setting, where adversarial inputs causing targeted translations are discovered for a known target system. However, this approach is less viable when the target system is black-box and unknown to the adversary (e.g., secured commercial systems). In this paper, we show that targeted attacks on black-box NMT systems are feasible, based on poisoning a small fraction of their parallel training data. We show that this attack can be realised practically via targeted corruption of web documents crawled to form the system's training data. We then analyse the effectiveness of the targeted poisoning in two common NMT training scenarios: the from-scratch training and the pre-train & fine-tune paradigm. Our results are alarming: even on the state-of-the-art systems trained with massive parallel data (tens of millions), the attacks are still successful (over 50% success rate) under surprisingly low poisoning budgets (e.g., 0.006%). Lastly, we discuss potential defences to counter such attacks.

preprint, 2021, arXiv

Burst Eddy Current Testing with a Diamond Magnetometry

In this work, a burst eddy current testing technique based on a diamond nitrogen vacancy (NV) center magnetometer with the Hahn echo (HE) sequence is demonstrated. With the confocal experiment apparatus, the HE-based NV magnetometer attained a magnetic sensitivity of $4.3 ~ \mathrm{nT} / \sqrt{\mathrm{Hz}}$ and a volume-normalized sensitivity of $3.6 ~ \mathrm{pT} / \sqrt{\mathrm{Hz} \cdot \mathrm{mm}^{-3}}$, which are 5 times better than the existing method under the same conditions. Based on the proposed magnetometer configuration, a burst eddy current (BEC) testing prototype achieves a minimum detectable sample size smaller than $300~\mu\mathrm{m}$ and a measurement accuracy of $9.85~\mu\mathrm{m}$, and is employed to image different metallic specimens and detect layered internal structures. Because our prototype offers high sensitivity, it has various potential applications in the fields of deformation monitoring, security screening, and quality control. Moreover, its biocompatibility and promising nanoscale resolution pave the way for electromagnetic testing in the field of biomaterials.

preprint, 2021, arXiv

Hero: On the Chaos When PATH Meets Modules

Ever since its first release in 2009, the Go programming language (Golang) has been well received by software communities. A major reason for its success is the powerful support of library-based development, where a Golang project can be conveniently built on top of other projects by referencing them as libraries. As Golang evolves, it recommends the use of a new library-referencing mode to overcome the limitations of the original one. While these two library modes are incompatible, both are supported by the Golang ecosystem. The heterogeneous use of library-referencing modes across Golang projects has caused numerous dependency management (DM) issues, incurring reference inconsistencies and even build failures. Motivated by the problem, we conducted an empirical study to characterize the DM issues, understand their root causes, and examine their fixing solutions. Based on our findings, we developed Hero, an automated technique to detect DM issues and suggest proper fixing solutions. We applied Hero to 19,000 popular Golang projects. The results showed that Hero achieved a high detection rate of 98.5% on a DM issue benchmark and found 2,422 new DM issues in 2,356 popular Golang projects. We reported 280 issues, among which 181 (64.6%) issues have been confirmed, and 160 of them (88.4%) have been fixed or are under fixing. Almost all the fixes have adopted our fixing suggestions.

preprint, 2021, arXiv

LocalDrop: A Hybrid Regularization for Deep Neural Networks

In neural networks, developing regularization algorithms to settle overfitting is one of the major study areas. We propose a new approach for the regularization of neural networks by the local Rademacher complexity called LocalDrop. A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs), including drop rates and weight matrices, has been developed based on the proposed upper bound of the local Rademacher complexity by the strict mathematical deduction. The analyses of dropout in FCNs and DropBlock in CNNs with keep rate matrices in different layers are also included in the complexity analyses. With the new regularization function, we establish a two-stage procedure to obtain the optimal keep rate matrix and weight matrix to realize the whole training model. Extensive experiments have been conducted to demonstrate the effectiveness of LocalDrop in different models by comparing it with several algorithms and the effects of different hyperparameters on the final performances.

preprint2021arXiv

Locally Free Weight Sharing for Network Width Search

Searching for network width is an effective way to slim deep neural networks with hardware budgets. With this aim, a one-shot supernet is usually leveraged as a performance evaluator to rank the performance w.r.t. different width. Nevertheless, current methods mainly follow a manually fixed weight sharing pattern, which is limited to distinguish the performance gap of different width. In this paper, to better evaluate each width, we propose a locally free weight sharing strategy (CafeNet) accordingly. In CafeNet, weights are more freely shared, and each width is jointly indicated by its base channels and free channels, where free channels are supposed to loCAte FrEely in a local zone to better represent each width. Besides, we propose to further reduce the search space by leveraging our introduced FLOPs-sensitive bins. As a result, our CafeNet can be trained stochastically and get optimized within a min-min strategy. Extensive experiments on ImageNet, CIFAR-10, CelebA and MS COCO dataset have verified our superiority comparing to other state-of-the-art baselines. For example, our method can further boost the benchmark NAS network EfficientNet-B0 by 0.41% via searching its width more delicately.

preprint2021arXiv

REST: Relational Event-driven Stock Trend Forecasting

Stock trend forecasting, aiming at predicting the stock future trends, is crucial for investors to seek maximized profits from the stock market. Many event-driven methods utilized the events extracted from news, social media, and discussion board to forecast the stock trend in recent years. However, existing event-driven methods have two main shortcomings: 1) overlooking the influence of event information differentiated by the stock-dependent properties; 2) neglecting the effect of event information from other related stocks. In this paper, we propose a relational event-driven stock trend forecasting (REST) framework, which can address the shortcoming of existing methods. To remedy the first shortcoming, we propose to model the stock context and learn the effect of event information on the stocks under different contexts. To address the second shortcoming, we construct a stock graph and design a new propagation layer to propagate the effect of event information from related stocks. The experimental studies on the real-world data demonstrate the efficiency of our REST framework. The results of investment simulation show that our framework can achieve a higher return of investment than baselines.

preprint2021arXiv

SCOP: Scientific Control for Reliable Neural Network Pruning

This paper proposes a reliable neural network pruning algorithm by setting up a scientific control. Existing pruning methods have developed various hypotheses to approximate the importance of filters to the network and then execute filter pruning accordingly. To increase the reliability of the results, we prefer to have a more rigorous research design by including a scientific control group as an essential part to minimize the effect of all factors except the association between the filter and expected network output. Acting as a control group, knockoff feature is generated to mimic the feature map produced by the network filter, but they are conditionally independent of the example label given the real feature map. We theoretically suggest that the knockoff condition can be approximately preserved given the information propagation of network layers. Besides the real feature map on an intermediate layer, the corresponding knockoff feature is brought in as another auxiliary input signal for the subsequent layers. Redundant filters can be discovered in the adversarial process of different features. Through experiments, we demonstrate the superiority of the proposed algorithm over state-of-the-art methods. For example, our method can reduce 57.8% parameters and 60.2% FLOPs of ResNet-101 with only 0.01% top-1 accuracy loss on ImageNet. The code is available at https://github.com/huawei-noah/Pruning/tree/master/SCOP_NeurIPS2020.

preprint2021arXiv

The role of crosslinking density in surface stress and surface energy of soft solids

Surface stress and surface energy are two fundamental parameters that determine the surface properties of any materials. While it is commonly believed that the surface stress and surface energy of liquids are identical, the relationship between the two parameters in soft polymeric gels remains debatable. In this work, we measured the surface stress and surface energy of soft silicone gels with varying weight ratios of crosslinkers in soft wetting experiments. Above a critical density, $k_0$, the surface stress was found to increase significantly with crosslinking density while the surface energy remained unchanged. In this regime, we can estimate a non-zero surface elastic modulus that also increases with the ratio of crosslinkers. By comparing the surface mechanics of the soft gels with their bulk rheology, the surface properties near the critical density $k_0$ were found to be closely related to the underlying percolation transition of the polymer networks.

81 published item(s)

Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation

$q$-Supercongruences on triple and quadruple sums

A $q$-supercongruence modulo the fourth power of a cyclotomic polynomial

A Normalized Gaussian Wasserstein Distance for Tiny Object Detection

An Image Patch is a Wave: Phase-Aware Vision MLP

CMT: Convolutional Neural Networks Meet Vision Transformers

Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark

DyRep: Bootstrapping Training with Dynamic Re-parameterization

GhostNets on Heterogeneous Devices via Cheap Operations

GreedyNASv2: Greedier Search with a Greedy Path Filter

Inertia of two-qutrit entanglement witnesses

Learning Spatiotemporal Frequency-Transformer for Low-Quality Video Super-Resolution

LightViT: Towards Light-Weight Convolution-Free Vision Transformers

New q-supercongruences from the Bailey transformation

Patch Slimming for Efficient Vision Transformers

Relational Surrogate Loss Learning

Searching for Network Width with Bilaterally Coupled Network

SimMatch: Semi-supervised Learning with Similarity Matching

Some new results about $q$-trinomial coefficients

Strain-dependent structural and electronic reconstructions in long-wavelength WS$_{2}$ moiré superlattices

Subtle Contact Nuances in the Delivery of Human-to-Human Touch Distinguish Emotional Sentiment

A Targeted Attack on Black-Box Neural Machine Translation with Parallel Data Poisoning

Burst Eddy Current Testing with a Diamond Magnetometry

Hero: On the Chaos When PATH Meets Modules

LocalDrop: A Hybrid Regularization for Deep Neural Networks

Locally Free Weight Sharing for Network Width Search

REST: Relational Event-driven Stock Trend Forecasting

SCOP: Scientific Control for Reliable Neural Network Pruning

The role of crosslinking density in surface stress and surface energy of soft solids

An Improved Wrist Kinematic Model for Human-Robot Interaction

Approximated Bilinear Modules for Temporal Modeling

Automatic low-bit hybrid quantization of neural networks through meta learning

Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks

Bilinear Graph Networks for Visual Question Answering

CARS: Continuous Evolution for Efficient Neural Architecture Search

DAN: Dual-View Representation Learning for Adapting Stance Classifiers to New Domains

DeepMnemonic: Password Mnemonic Generation via Deep Attentive Encoder-Decoder Model

Discernible Image Compression

Distilling portable Generative Adversarial Networks for Image Translation

Effects of density-dependent scenarios of in-medium nucleon-nucleon interactions in heavy-ion collisions

Evolution of clustering structure through the momentum distributions in $^{8-10}$Be isotopes

GhostNet: More Features from Cheap Operations

High-momentum components in the $^4$He nucleus caused by inter-nucleon correlations

Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection

Learning Disentangled Representations with Latent Variation Predictability

On Positive-Unlabeled Classification in GAN

Operational Calibration: Debugging Confidence Errors for DNNs in the Field

Searching for Low-Bit Weights in Quantized Neural Networks

STRIP: A Defence Against Trojan Attacks on Deep Neural Networks

Will Dependency Conflicts Affect My Program's Semantics?

Alpha Decay to Doubly Magic Core in Quartetting Wave Function Approach

Data-Free Learning of Student Networks

Efficient Residual Dense Block Search for Image Super-Resolution

A new approach for calculating nuclear symmetry energy

DroidLeaks: Benchmarking Resource Leak Bugs for Android Applications

Parts for the Whole: The DCT Norm for Extreme Visual Recovery

Streaming Label Learning for Modeling Labels on the Fly

Streaming View Learning

Theoretical study on nuclear structure by the multiple Coulomb scattering and magnetic scattering of relativistic electrons

Alpha Decay Width of $^{212}$Po from a quartetting wave function approach

Bound clusters on top of doubly magic nuclei

Convex hulls of random walks and their scaling limits

Investigation of ${}^9$Be from nonlocalized clustering concept

Local Rademacher Complexity for Multi-label Learning

Probing isospin- and momentum-dependent nuclear effective interactions in neutron-rich matter

Sequential Click Prediction for Sponsored Search with Recurrent Neural Networks

A Survey on Multi-view Learning

Convex hulls of planar random walks with drift

Extracting the nuclear symmetry potential and energy from neutron-nucleus scattering data

Nonlocalized cluster dynamics and nuclear molecular structure

Nonlocalized Clustering: A New Concept in Nuclear Cluster Structure Physics

Relationship between the symmetry energy and the single-nucleon potential in isospin-asymmetric nucleonic matter

Delineating effects of tensor force on the density dependence of nuclear symmetry energy

Single-nucleon potential decomposition of the nuclear symmetry energy