Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
30 works
0 followers
17 topics
4 close collaborators

Published work

30 published item(s)

preprint 2026 arXiv

What Breaks Knowledge Graph based RAG? Benchmarking and Empirical Insights into Reasoning under Incomplete Knowledge

Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) is an increasingly explored approach for combining the reasoning capabilities of large language models with the structured evidence of knowledge graphs. However, current evaluation practices fall short: existing benchmarks often include questions that can be directly answered using existing triples in the KG, making it unclear whether models perform reasoning or simply retrieve answers directly. Moreover, inconsistent evaluation metrics and lenient answer matching criteria further obscure meaningful comparisons. In this work, we introduce a general method for constructing benchmarks and present BRINK (Benchmark for Reasoning under Incomplete Knowledge) to systematically assess KG-RAG methods under knowledge incompleteness. Our empirical results show that current KG-RAG methods have limited reasoning ability under missing knowledge, often rely on internal memorization, and exhibit varying degrees of generalization depending on their design.

preprint 2022 arXiv

$\rm ^{83}Rb$/$\rm ^{83m}Kr$ production and cross-section measurement with 3.4 MeV and 20 MeV proton beams

$\rm ^{83m}Kr$, with a short lifetime, is an ideal calibration source for liquid xenon or liquid argon detectors. The $\rm ^{83m}Kr$ isomer can be generated through the decay of $\rm ^{83} Rb$ isotope which is usually produced by proton beams bombarding natural krypton atoms. In this paper, we report a successful production of $\rm ^{83}Rb/^{83m}Kr$ with a proton beam energy of 3.4 MeV, and the first measurement of the production rate with such low energy proton beams. Another production attempt is performed using the newly available 20 MeV proton beam in China, and the measured production rate is consistent with previous measurements. The produced $\rm ^{83m}Kr$ source has been successfully injected into the PandaX-II liquid xenon detector, yielding enough statistics for detector calibration.

preprint 2022 arXiv

AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail Problems

It is well known that deep learning models are vulnerable to adversarial examples. Existing studies of adversarial training have made great progress against this challenge. As a typical trait, they often assume that the class distribution is overall balanced. However, long-tail datasets are ubiquitous in a wide spectrum of applications, where the number of head-class instances is larger than the number of tail-class instances. Under such a scenario, AUC is a much more reasonable metric than accuracy since it is insensitive to the class distribution. Motivated by this, we present an early trial to explore adversarial training methods to optimize AUC. The main challenge is that the positive and negative examples are tightly coupled in the objective function. As a direct result, one cannot generate adversarial examples without a full scan of the dataset. To address this issue, based on a concavity regularization scheme, we reformulate the AUC optimization problem as a saddle point problem, where the objective becomes an instance-wise function. This leads to an end-to-end training protocol. Furthermore, we provide a convergence guarantee of the proposed algorithm. Our analysis differs from the existing studies since the algorithm is asked to generate adversarial examples by calculating the gradient of a min-max problem. Finally, the extensive experimental results show the performance and robustness of our algorithm on three long-tail datasets.

preprint 2022 arXiv

BERTMap: A BERT-based Ontology Alignment System

Ontology alignment (a.k.a. ontology matching (OM)) plays a critical role in knowledge integration. Owing to the success of machine learning in many domains, it has been applied in OM. However, the existing methods, which often adopt ad-hoc feature engineering or non-contextual word embeddings, have not yet outperformed rule-based systems, especially in an unsupervised setting. In this paper, we propose a novel OM system named BERTMap which can support both unsupervised and semi-supervised settings. It first predicts mappings using a classifier based on fine-tuning the contextual embedding model BERT on text semantics corpora extracted from ontologies, and then refines the mappings through extension and repair by utilizing the ontology structure and logic. Our evaluation with three alignment tasks on biomedical ontologies demonstrates that BERTMap can often perform better than the leading OM systems LogMap and AML.

preprint 2022 arXiv

Beyond ImageNet Attack: Towards Crafting Adversarial Examples for Black-box Domains

Adversarial examples have posed a severe threat to deep neural networks due to their transferable nature. Currently, various works have made great efforts to enhance the cross-model transferability, and they mostly assume that the substitute model is trained in the same domain as the target model. However, in reality, the relevant information of the deployed model is unlikely to leak. Hence, it is vital to build a more practical black-box threat model to overcome this limitation and evaluate the vulnerability of deployed models. In this paper, with only the knowledge of the ImageNet domain, we propose a Beyond ImageNet Attack (BIA) to investigate the transferability towards black-box domains (unknown classification tasks). Specifically, we leverage a generative model to learn the adversarial function for disrupting low-level features of input images. Based on this framework, we further propose two variants to narrow the gap between the source and target domains from the data and model perspectives, respectively. Extensive experiments on coarse-grained and fine-grained domains demonstrate the effectiveness of our proposed methods. Notably, our methods outperform state-of-the-art approaches by up to 7.71% (towards coarse-grained domains) and 25.91% (towards fine-grained domains) on average. Our code is available at https://github.com/qilong-zhang/Beyond-ImageNet-Attack.

preprint 2022 arXiv

Cosmological-model-independent tests of cosmic distance duality relation with Type Ia supernovae and radio quasars

In this paper, we investigate the possible deviations of the cosmic distance duality relation (CDDR) using the combination of the largest SNe Ia (Pantheon) and compact radio quasar (QSO) samples through two model-independent approaches. The deviation of CDDR is written as $D_L(z)/D_A(z)(1+z)^{-2}=η(z)$ and $η(z)=e^{τ(z)/2}$, with the parameterizations of $F_1$ ($τ(z) = 2ε_1 z$) and $F_2$ ($τ(z) = (1+z)^{2ε_2}-1$). Furthermore, in order to compare the two resulting distances, two cosmological-model-independent methods, i.e., the nearby SNe Ia method and the GP method, are employed to match the two distinct data sets at the same redshift. Our findings indicate that, compared with the results obtained in the literature, there is an improvement in precision when the latest SNe Ia and QSO samples are used. Specifically, in the framework of the nearby SNe Ia method, the CDDR would be constrained at the precision of $Δε_{1} = 0.013$ in Model $F_1$ and $Δε_{2}=0.018$ in Model $F_2$. Regarding the GP method, one observes that a larger data size would produce more stringent constraints on the CDDR parameters. Therefore, accompanied by further developments in cosmological observations and the analysis methods, our analysis provides an insight into the evidence for unaccounted opacity sources at an earlier stage of the universe, or at the very least the new physics involved.
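The two parameterizations quoted in this abstract can be evaluated directly. The snippet below is only a minimal sketch of those formulas; the function names (`tau_f1`, `tau_f2`, `eta`, `eta_from_distances`) are hypothetical and do not come from the paper, and no fitting to supernova or quasar data is attempted.

```python
import math


def tau_f1(z, eps1):
    """Parameterization F1 from the abstract: tau(z) = 2 * eps1 * z."""
    return 2.0 * eps1 * z


def tau_f2(z, eps2):
    """Parameterization F2 from the abstract: tau(z) = (1 + z)**(2 * eps2) - 1."""
    return (1.0 + z) ** (2.0 * eps2) - 1.0


def eta(tau):
    """eta(z) = exp(tau(z) / 2); eta = 1 means the CDDR holds exactly."""
    return math.exp(tau / 2.0)


def eta_from_distances(d_l, d_a, z):
    """eta(z) = D_L(z) / (D_A(z) * (1 + z)**2), computed from two distances."""
    return d_l / (d_a * (1.0 + z) ** 2)


if __name__ == "__main__":
    # Illustrative values only: eps = 0 recovers eta = 1 in both models.
    for z in (0.1, 0.5, 1.0, 2.0):
        print(z, eta(tau_f1(z, 0.013)), eta(tau_f2(z, 0.018)))
```

With both ε parameters set to zero, τ vanishes and η(z) = 1 at every redshift, which is the standard relation the paper tests against.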

preprint 2022 arXiv

Cross-Technology Communication for the Internet of Things: A Survey

The ever-developing Internet of Things (IoT) brings the prosperity of wireless sensing and control applications. In many scenarios, different wireless technologies coexist in the shared frequency medium as well as the physical space. Such wireless coexistence may lead to serious cross-technology interference (CTI) problems, e.g. channel competition, signal collision, throughput degradation. Compared with traditional methods like interference avoidance, tolerance, and concurrency mechanism, direct and timely information exchange among heterogeneous devices is therefore a fundamental requirement to ensure the usability, inter-operability, and reliability of the IoT. Under this circumstance, Cross-Technology Communication (CTC) technique thus becomes a hot topic in both academic and industrial fields, which aims at directly exchanging data among heterogeneous devices that follow different standards. This paper comprehensively summarizes the CTC techniques and reveals that the key challenge for CTC lies in the heterogeneity of IoT devices, including the incompatibility of technical standards and the asymmetry of connection capability. Based on the above finding, we present a taxonomy of the existing CTC works (packet-level CTCs and physical-level CTCs) and compare the existing CTC techniques in terms of throughput, reliability, hardware modification, and concurrency.

preprint 2022 arXiv

D^2ETR: Decoder-Only DETR with Computationally Efficient Cross-Scale Attention

DETR is the first fully end-to-end detector that predicts a final set of predictions without post-processing. However, it suffers from problems such as low performance and slow convergence. A series of works aim to tackle these issues in different ways, but the computational cost is still high due to the sophisticated encoder-decoder architecture. To alleviate this issue, we propose a decoder-only detector called D^2ETR. In the absence of an encoder, the decoder directly attends to the fine-fused feature maps generated by the Transformer backbone with a novel computationally efficient cross-scale attention module. D^2ETR demonstrates low computational complexity and high detection accuracy in evaluations on the COCO benchmark, outperforming DETR and its variants.

preprint 2022 arXiv

Diverse Instance Discovery: Vision-Transformer for Instance-Aware Multi-Label Image Recognition

Previous works on multi-label image recognition (MLIR) usually use CNNs as a starting point for research. In this paper, we take pure Vision Transformer (ViT) as the research base and make full use of the advantages of Transformer with long-range dependency modeling to circumvent the disadvantages of CNNs limited to local receptive field. However, for multi-label images containing multiple objects from different categories, scales, and spatial relations, it is not optimal to use global information alone. Our goal is to leverage ViT's patch tokens and self-attention mechanism to mine rich instances in multi-label images, named diverse instance discovery (DiD). To this end, we propose a semantic category-aware module and a spatial relationship-aware module, respectively, and then combine the two by a re-constraint strategy to obtain instance-aware attention maps. Finally, we propose a weakly supervised object localization-based approach to extract multi-scale local features, to form a multi-view pipeline. Our method requires only weakly supervised information at the label level, no additional knowledge injection or other strongly supervised information is required. Experiments on three benchmark datasets show that our method significantly outperforms previous works and achieves state-of-the-art results under fair experimental comparisons.

preprint 2022 arXiv

High precision measurement of cosmic curvature: from gravitational waves and cosmic chronometer

Although the spatial curvature has been measured with very high precision, it still suffers from the well known cosmic curvature tension. In this paper, we propose an improved method to determine the cosmic curvature, by using the simulated data of binary neutron star mergers observed by the second generation space-based DECi-hertz Interferometer Gravitational-wave Observatory (DECIGO). By applying the Hubble parameter observations of cosmic chronometers to the DECIGO standard sirens, we explore different possibilities of making measurements of the cosmic curvature referring to a distant past: one is to reconstruct the Hubble parameters through the Gaussian process without the influence of hypothetical models, and the other is deriving constraints on $Ω_K$ in the framework of non-flat $Λ$ cold dark matter model. It is shown that in the improved method DECIGO could provide a reliable and stringent constraint on the cosmic curvature ($Ω_{K} = -0.007\pm0.016$), while we could only expect the zero cosmic curvature to be established at the precision of $ΔΩ_K=0.12$ in the second model-dependent method. Therefore, our results indicate that in the framework of methodology proposed in this paper, the increasing number of well-measured standard sirens in DECIGO could significantly reduce the bias of estimations for cosmic curvature. Such constraint is also comparable to the precision of Planck 2018 results with the newest cosmic microwave background (CMB) observations ($ΔΩ_{K} \approx 0.018$), based on the concordance $Λ$CDM model.
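To illustrate how $Ω_K$ enters a distance measurement in the model-dependent case, here is a minimal sketch of the textbook luminosity distance in a non-flat ΛCDM model. It assumes matter, curvature and a cosmological constant only (radiation neglected) and uses a simple trapezoid rule; the function names and the sample values of `h0` and `omega_m` are illustrative and are not taken from the paper, which works with simulated DECIGO standard sirens and cosmic chronometer data.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def e_of_z(z, omega_m, omega_k):
    """Dimensionless Hubble rate E(z) = H(z) / H0 for non-flat LambdaCDM."""
    omega_l = 1.0 - omega_m - omega_k  # closure relation, radiation neglected
    return math.sqrt(omega_m * (1 + z) ** 3 + omega_k * (1 + z) ** 2 + omega_l)


def luminosity_distance(z, h0, omega_m, omega_k, steps=2000):
    """Luminosity distance in Mpc for h0 given in km/s/Mpc."""
    # Trapezoid rule for chi = integral_0^z dz' / E(z').
    dz = z / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight / e_of_z(i * dz, omega_m, omega_k)
    chi = total * dz

    # Curvature correction: sinh for open (omega_k > 0), sin for closed.
    if omega_k > 0:
        s = math.sqrt(omega_k)
        d_m = math.sinh(s * chi) / s
    elif omega_k < 0:
        s = math.sqrt(-omega_k)
        d_m = math.sin(s * chi) / s
    else:
        d_m = chi
    return (1 + z) * (C_KM_S / h0) * d_m


if __name__ == "__main__":
    # Illustrative parameters only.
    for ok in (-0.007, 0.0, 0.007):
        print(ok, luminosity_distance(1.0, 70.0, 0.3, ok))
```

A standard siren supplies the luminosity distance directly, so comparing it with a distance rebuilt from H(z) data is one way a curvature term of this kind can be constrained.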

preprint 2022 arXiv

Optimizing Two-way Partial AUC with an End-to-end Framework

The Area Under the ROC Curve (AUC) is a crucial metric for machine learning, which evaluates the average performance over all possible True Positive Rates (TPRs) and False Positive Rates (FPRs). Based on the knowledge that a skillful classifier should simultaneously embrace a high TPR and a low FPR, we turn to study a more general variant called Two-way Partial AUC (TPAUC), where only the region with $\mathsf{TPR} \ge α, \mathsf{FPR} \le β$ is included in the area. Moreover, recent work shows that the TPAUC is essentially inconsistent with the existing Partial AUC metrics where only the FPR range is restricted, opening a new problem to seek solutions to leverage high TPAUC. Motivated by this, we present the first trial in this paper to optimize this new metric. The critical challenge along this course lies in the difficulty of performing gradient-based optimization with end-to-end stochastic training, even with a proper choice of surrogate loss. To address this issue, we propose a generic framework to construct surrogate optimization problems, which supports efficient end-to-end training with deep learning. Moreover, our theoretical analyses show that: 1) the objective function of the surrogate problems will achieve an upper bound of the original problem under mild conditions, and 2) optimizing the surrogate problems leads to good generalization performance in terms of TPAUC with a high probability. Finally, empirical studies over several benchmark datasets speak to the efficacy of our framework.
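The region described in this abstract can be measured on an empirical ROC curve. The sketch below computes the unnormalized area of the part of the ROC curve with TPR ≥ α and FPR ≤ β; it is a geometric illustration only, not the surrogate objective or estimator proposed in the paper. It assumes there are no tied scores and truncates β times the number of negatives to an integer; the function name `two_way_partial_auc` is hypothetical.

```python
import numpy as np


def two_way_partial_auc(pos_scores, neg_scores, alpha, beta):
    """Unnormalized ROC area restricted to TPR >= alpha and FPR <= beta.

    Assumes no tied scores. Each negative contributes one horizontal ROC
    step of width 1 / n_neg at a height equal to the TPR reached just
    before the threshold passes that negative.
    """
    pos = np.asarray(pos_scores, dtype=float)
    # Highest-scoring (hardest) negatives come first along the FPR axis.
    neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]
    k = int(np.floor(beta * len(neg)))  # steps that lie inside FPR <= beta

    area = 0.0
    for s in neg[:k]:
        tpr = np.mean(pos > s)  # fraction of positives ranked above this negative
        area += max(tpr - alpha, 0.0) / len(neg)  # keep only the part above alpha
    return float(area)


if __name__ == "__main__":
    pos = [0.9, 0.8, 0.7, 0.6]
    neg = [0.5, 0.4, 0.3, 0.2]
    print(two_way_partial_auc(pos, neg, alpha=0.5, beta=0.5))
```

Setting α = 0 and β = 1 recovers the ordinary AUC, which shows why the two-way variant is described as a more general metric.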

preprint 2022 arXiv

Reading-strategy Inspired Visual Representation Learning for Text-to-Video Retrieval

This paper aims for the task of text-to-video retrieval, where given a query in the form of a natural-language sentence, it is asked to retrieve videos which are semantically relevant to the given query, from a great number of unlabeled videos. The success of this task depends on cross-modal representation learning that projects both videos and sentences into common spaces for semantic similarity computation. In this work, we concentrate on video representation learning, an essential component for text-to-video retrieval. Inspired by the reading strategy of humans, we propose a Reading-strategy Inspired Visual Representation Learning (RIVRL) to represent videos, which consists of two branches: a previewing branch and an intensive-reading branch. The previewing branch is designed to briefly capture the overview information of videos, while the intensive-reading branch is designed to obtain more in-depth information. Moreover, the intensive-reading branch is aware of the video overview captured by the previewing branch. Such holistic information is found to be useful for the intensive-reading branch to extract more fine-grained features. Extensive experiments on three datasets are conducted, where our model RIVRL achieves a new state-of-the-art on TGIF and VATEX. Moreover, on MSR-VTT, our model using two video features shows comparable performance to the state-of-the-art using seven video features and even outperforms models pre-trained on the large-scale HowTo100M dataset.

preprint 2022 arXiv

RMGN: A Regional Mask Guided Network for Parser-free Virtual Try-on

Virtual try-on (VTON) aims at fitting target clothes to reference person images, which is widely adopted in e-commerce. Existing VTON approaches can be narrowly categorized into Parser-Based (PB) and Parser-Free (PF) by whether they rely on the parser information to mask the persons' clothes and synthesize try-on images. Although abandoning parser information has improved the applicability of PF methods, the ability to synthesize details has also been sacrificed. As a result, the distraction from the original cloth may persist in synthesized images, especially in complicated postures and high-resolution applications. To address the aforementioned issue, we propose a novel PF method named Regional Mask Guided Network (RMGN). More specifically, a regional mask is proposed to explicitly fuse the features of target clothes and reference persons so that the persisting distraction can be eliminated. A posture awareness loss and a multi-level feature extractor are further proposed to handle the complicated postures and synthesize high-resolution images. Extensive experiments demonstrate that our proposed RMGN outperforms both state-of-the-art PB and PF methods. Ablation studies further verify the effectiveness of modules in RMGN.

preprint 2022 arXiv

Towards Robust Vision Transformer

Recent advances on Vision Transformer (ViT) and its improved variants have shown that self-attention-based networks surpass traditional Convolutional Neural Networks (CNNs) in most vision tasks. However, existing ViTs focus on the standard accuracy and computation cost, lacking the investigation of the intrinsic influence on model robustness and generalization. In this work, we conduct systematic evaluation on components of ViTs in terms of their impact on robustness to adversarial examples, common corruptions and distribution shifts. We find some components can be harmful to robustness. By using and combining robust components as building blocks of ViTs, we propose Robust Vision Transformer (RVT), which is a new vision transformer and has superior performance with strong robustness. We further propose two new plug-and-play techniques called position-aware attention scaling and patch-wise augmentation to augment our RVT, which we abbreviate as RVT*. The experimental results on ImageNet and six robustness benchmarks show the advanced robustness and generalization ability of RVT compared with previous ViTs and state-of-the-art CNNs. Furthermore, RVT-S* also achieves Top-1 rank on multiple robustness leaderboards including ImageNet-C and ImageNet-Sketch. The code will be available at https://github.com/alibaba/easyrobust.

preprint 2021 arXiv

Fabrication and cold test of prototype of spatially periodic radio frequency quadrupole focusing linac

A 325 MHz aluminum prototype of a spatially periodic rf quadrupole focusing linac was developed at the Institute of Modern Physics, Chinese Academy of Sciences, as a promising candidate for the front end of a high-current linac. It consists of an alternating series of crossbar H-type drift tubes and rf quadrupole sections. Owing to its special geometry, cavity fabrication is a major hurdle for its engineering development and application. In this paper, we report the detailed mechanical design of this structure and describe its fabrication process, including machining, assembly, and inspection. The field distribution was measured by the bead-pull technique. The results show that the field errors of both the accelerating and focusing fields are within an acceptable range. A tuning scheme for this new structure is proposed and verified. The cold test process and results are presented in detail. The development of this prototype provides valuable guidance for the application of the spatially periodic rf quadrupole structure.

preprint 2021 arXiv

Hierarchical Similarity Learning for Language-based Product Image Retrieval

This paper aims for the language-based product image retrieval task. The majority of previous works have made significant progress by designing network structure, similarity measurement, and loss function. However, they typically perform vision-text matching at certain granularity regardless of the intrinsic multiple granularities of images. In this paper, we focus on the cross-modal similarity measurement, and propose a novel Hierarchical Similarity Learning (HSL) network. HSL first learns multi-level representations of input data by stacked encoders, and object-granularity similarity and image-granularity similarity are computed at each level. All the similarities are combined as the final hierarchical cross-modal similarity. Experiments on a large-scale product retrieval dataset demonstrate the effectiveness of our proposed method. Code and data are available at https://github.com/liufh1/hsl.

preprint 2021 arXiv

Investigation of Factors Affecting Vertical Sag of Stretched Wire

To study vertical sag requirements and factors affecting the stretched wire alignment method, the vertical sag equation is first derived theoretically. Subsequently, the factors influencing the vertical sag, such as the hanging weight or tension, span length, temperature change, elastic deformation, and the Earth's rotation, are summarized, and their validity is verified through actual measurements. Finally, the essential factors affecting vertical sag, the specific strength and length, are discussed. It is believed that the vertical sag of a stretched wire is proportional to the square of the length and inversely proportional to the specific strength of the material.
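The scaling stated at the end of this abstract can be reproduced from the standard parabolic approximation for a tightly stretched wire. The derivation below is a sketch under that textbook assumption, not the equation derived in the paper; the symbols ($s$ for mid-span sag, $w$ for weight per unit length, $T$ for tension, $A$ for cross-section, $\rho$ for density, $\sigma$ for tensile stress, $L$ for span length) are introduced here for illustration.

```latex
\begin{align}
  s &= \frac{w L^{2}}{8 T}
      && \text{(parabolic approximation, } s \ll L\text{)} \\
  w &= \rho g A, \qquad T = \sigma A
      && \text{(uniform wire under stress } \sigma\text{)} \\
  s &= \frac{\rho g A L^{2}}{8 \sigma A}
     = \frac{g L^{2}}{8\,(\sigma/\rho)}
\end{align}
```

The cross-section cancels, so for a wire loaded to a given stress the sag depends only on $L^{2}$ and on the ratio $\sigma/\rho$, which equals the specific strength when the wire is loaded to its strength limit.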

preprint 2021 arXiv

Self-Supervised Learning For Few-Shot Image Classification

Few-shot image classification aims to classify unseen classes with limited labelled samples. Recent works benefit from the meta-learning process with episodic tasks and can quickly adapt to new classes from training to testing. Due to the limited number of samples for each task, the initial embedding network for meta-learning becomes an essential component and can largely affect the performance in practice. To this end, most of the existing methods rely heavily on an efficient embedding network. Due to the limited labelled data, the scale of the embedding network is constrained under a supervised learning (SL) manner, which becomes a bottleneck of few-shot learning methods. In this paper, we propose to train a more generalized embedding network with self-supervised learning (SSL), which can provide robust representations for downstream tasks by learning from the data itself. We evaluate our work by extensive comparisons with previous baseline methods on two few-shot classification datasets (i.e., MiniImageNet and CUB) and achieve better performance than the baselines. Tests on four datasets in cross-domain few-shot learning classification show that the proposed method achieves state-of-the-art results and further prove the robustness of the proposed model. Our code is available at https://github.com/phecy/SSL-FEW-SHOT.

preprint 2020 arXiv

An Asynchronous Computability Theorem for Fair Adversaries

This paper proposes a simple topological characterization of a large class of fair adversarial models via affine tasks: sub-complexes of the second iteration of the standard chromatic subdivision. We show that the task computability of a model in the class is precisely captured by iterations of the corresponding affine task. Fair adversaries include, but are not restricted to, the models of wait-freedom, t-resilience, and $k$-concurrency. Our results generalize and improve all previously derived topological characterizations of the ability of a model to solve distributed tasks.

preprint 2020 arXiv

Charge density wave and superconductivity in the family of telluride chalcogenides Zn1-xCuxIr2-yN(N = Al, Ti, Rh)yTe4

The interplay between superconductivity and charge density wave (CDW)/metal-to-insulator transition (MIT) has long been of interest in condensed matter physics. Here we systematically study the charge density wave and superconductivity properties in the solid solutions Zn1-xCuxIr2-yN(N = Al, Ti, Rh)yTe4. Resistivity, magnetic susceptibility and specific heat measurements indicate that the CDW state is suppressed immediately, while the behavior of the superconducting critical temperature (Tc) differs from system to system. In the Al- and Ti-substitution cases, Tc increases as y increases and reaches a maximum of around 2.75 K and 2.84 K, respectively, at y = 0.075, followed by a decrease of Tc before the chemical phase boundary is reached at y = 0.2. In contrast, Tc decreases monotonically as the Rh-doping content y increases, and superconductivity disappears above y = 0.3 for measuring temperatures down to 2 K. Surprisingly, in the Zn1-xCuxIr2Te4 solid solution, Tc rises as x increases and reaches a maximum value of 2.82 K at x = 0.5, and superconductivity subsequently survives over the whole doping range of 0.00 - 0.9, with Tc changing only slightly at higher doping content; this differs from the observation that zinc doping quickly suppresses superconductivity in the high-Tc cuprate superconductors. The specific heat anomalies at the superconducting transitions for the representative optimally doped samples are all slightly higher than the BCS value of 1.43 and indicate bulk superconductivity in these compounds. Finally, the phase diagrams of the CDW transition temperature (TCDW) and superconducting transition temperature (Tc) vs. x/y content for Zn1-xCuxIr2-yN(N = Al, Ti, Rh)yTe4 have been established and compared, which offers a good opportunity to study the competition between CDW and superconductivity in the telluride chalcogenides.

preprint 2020 arXiv

Energy-Efficient On-Chip Networks through Profiled Hybrid Switching

Virtual channel flow control is the de facto choice for modern networks-on-chip to allow better utilization of the link bandwidth through buffering and packet switching, which are also the sources of a large power footprint and long per-hop latency. On the other hand, bandwidth can be plentiful for parallel workloads under virtual channel flow control. Thus, dated but simpler flow controls such as circuit switching can be utilized to improve the energy efficiency of modern networks-on-chip. In this paper, we propose to utilize part of the link bandwidth under circuit switching so that part of the traffic can be transmitted bufferlessly without routing. Our evaluations reveal that this proposal leads to a reduction of energy per flit by up to 32% while also providing very competitive latency per flit when compared to networks under virtual channel flow control.

preprint 2020 arXiv

Fine-Grained Fashion Similarity Learning by Attribute-Specific Embedding Network

This paper strives to learn fine-grained fashion similarity. In this similarity paradigm, one should pay more attention to the similarity in terms of a specific design/attribute among fashion items, which has potential values in many fashion related applications such as fashion copyright protection. To this end, we propose an Attribute-Specific Embedding Network (ASEN) to jointly learn multiple attribute-specific embeddings in an end-to-end manner, thus measure the fine-grained similarity in the corresponding space. With two attention modules, i.e., Attribute-aware Spatial Attention and Attribute-aware Channel Attention, ASEN is able to locate the related regions and capture the essential patterns under the guidance of the specified attribute, thus make the learned attribute-specific embeddings better reflect the fine-grained similarity. Extensive experiments on four fashion-related datasets show the effectiveness of ASEN for fine-grained fashion similarity learning and its potential for fashion reranking.

preprint 2020 arXiv

GAP++: Learning to generate target-conditioned adversarial examples

Adversarial examples are perturbed inputs which can pose a serious threat to machine learning models. Finding these perturbations is such a hard task that we can only use iterative methods to search for them. For computational efficiency, recent works use adversarial generative networks to directly model the distribution of either universal or image-dependent perturbations. However, these methods generate perturbations that rely only on input images. In this work, we propose a more general-purpose framework which infers target-conditioned perturbations dependent on both input image and target label. Different from previous single-target attack models, our model can conduct target-conditioned attacks by learning the relations between the attack target and the semantics in the image. Using extensive experiments on the datasets of MNIST and CIFAR10, we show that our method achieves superior performance with single target attack models and obtains high fooling rates with small perturbation norms.

preprint 2020 arXiv

Machine Learning Empowered Beam Management for Intelligent Reflecting Surface Assisted MmWave Networks

Recently, intelligent reflecting surface (IRS) assisted mmWave networks have been emerging, which bear the potential to address the blockage issue of millimeter wave (mmWave) communication in a more cost-effective way. In particular, an IRS is built from passive and programmable electromagnetic elements that can manipulate the mmWave propagation channel into a more favorable condition that is free of blockage via judicious joint BS-IRS transmission design. However, the coexistence of IRSs and mmWave BSs complicates the network architecture, and thus poses great challenges for efficient beam management (BM), which is one critical prerequisite for high-performance mmWave networks. In this paper, we systematically evaluate the key issues and challenges of BM for IRS-assisted mmWave networks to bring insights into the future network design. Specifically, we carefully classify and discuss the extensibility and limitations of the existing BM of conventional mmWave towards the IRS-assisted new paradigm. Moreover, we propose a novel machine learning empowered BM framework for IRS-assisted networks with representative showcases, which processes environmental and mobility awareness to achieve highly efficient BM with significantly reduced system overhead. Finally, some interesting future directions are also suggested to inspire further research.

preprint 2020 arXiv

Progressive Relation Learning for Group Activity Recognition

Group activities usually involve spatiotemporal dynamics among many interactive individuals, while only a few participants at several key frames essentially define the activity. Therefore, effectively modeling the group-relevant and suppressing the irrelevant actions (and interactions) are vital for group activity recognition. In this paper, we propose a novel method based on deep reinforcement learning to progressively refine the low-level features and high-level relations of group activities. Firstly, we construct a semantic relation graph (SRG) to explicitly model the relations among persons. Then, two agents adopting policy according to two Markov decision processes are applied to progressively refine the SRG. Specifically, one feature-distilling (FD) agent in the discrete action space refines the low-level spatio-temporal features by distilling the most informative frames. Another relation-gating (RG) agent in continuous action space adjusts the high-level semantic graph to pay more attention to group-relevant relations. The SRG, FD agent, and RG agent are optimized alternately to mutually boost the performance of each other. Extensive experiments on two widely used benchmarks demonstrate the effectiveness and superiority of the proposed approach.

preprint2020arXiv

Robust Generative Adversarial Network

Generative adversarial networks (GANs) are powerful generative models, but usually suffer from instability and generalization problems, which may lead to poor generations. Most existing works focus on stabilizing the training of the discriminator while ignoring the generalization properties. In this work, we aim to improve the generalization capability of GANs by promoting local robustness within a small neighborhood of the training samples. We also prove that robustness in a small neighborhood of the training set can lead to better generalization. In particular, we design a robust optimization framework where the generator and discriminator compete with each other in a worst-case setting within a small Wasserstein ball. The generator tries to map the worst input distribution (rather than the Gaussian distribution used in most GANs) to the real data distribution, while the discriminator attempts to distinguish the real and fake distributions with the worst perturbation. We prove that our robust method can obtain a tighter generalization upper bound than traditional GANs under mild assumptions, ensuring a theoretical superiority of RGAN over GANs. A series of experiments on the CIFAR-10, STL-10 and CelebA datasets indicates that our proposed robust framework can improve on five baseline GAN models substantially and consistently.

preprint2020arXiv

Self-supervised Adversarial Training

Recent work has demonstrated that neural networks are vulnerable to adversarial examples. To escape from this predicament, many works try to harden the model in various ways; among them, adversarial training is an effective approach that learns robust feature representations so as to resist adversarial attacks. Meanwhile, self-supervised learning aims to learn robust and semantic embeddings from the data itself. With these views, we introduce self-supervised learning to defend against adversarial examples in this paper. Specifically, the self-supervised representation coupled with k-Nearest Neighbour is proposed for classification. To further strengthen the defense ability, self-supervised adversarial training is proposed, which maximizes the mutual information between the representations of original examples and the corresponding adversarial examples. Experimental results show that the self-supervised representation outperforms its supervised version with respect to robustness, and that self-supervised adversarial training can further improve the defense ability efficiently.
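As a rough illustration of the "representation coupled with k-Nearest Neighbour" idea in this abstract, the sketch below classifies a query embedding by majority vote among its nearest stored embeddings. It is a minimal sketch, not the authors' implementation: the cosine similarity measure, the function names `cosine` and `knn_predict`, the toy two-dimensional embeddings, and the labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, bank, labels, k=3):
    """Majority vote over the k stored embeddings most similar to the query."""
    ranked = sorted(range(len(bank)),
                    key=lambda i: cosine(query, bank[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical embeddings standing in for the output of a frozen encoder.
bank = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = ["cat", "cat", "dog", "dog"]
prediction = knn_predict([0.95, 0.15], bank, labels, k=3)
```

In this setup the encoder is never fine-tuned with labels; only the stored embeddings and their labels are consulted at prediction time.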

preprint2020arXiv

Semantic Regularization: Improve Few-shot Image Classification by Reducing Meta Shift

Few-shot image classification requires the classifier to robustly cope with unseen classes even if there are only a few samples for each class. Recent advances benefit from the meta-learning process, where episodic tasks are formed to train a model that can adapt to class change. However, these tasks are independent of each other, and existing works mainly rely on the limited samples of an individual support set in a single meta task. This strategy leads to severe meta shift issues across multiple tasks, meaning the learned prototypes or class descriptors are not stable because each task only involves its own support set. To avoid this problem, we propose a concise Semantic Regularization Network to learn a common semantic space under the framework of meta-learning. In this space, all class descriptors can be regularized by the learned semantic basis, which can effectively solve the meta shift problem. The key is to train a class encoder and decoder structure that can encode the sample embedding features into the semantic domain with a trained semantic basis, and generate a more stable and general class descriptor from the decoder. We evaluate our work by extensive comparisons with previous methods on three benchmark datasets (MiniImageNet, TieredImageNet, and CUB). The results show that the semantic regularization module improves performance by 4%-7% over the baseline method, and achieves competitive results against the current state-of-the-art models.

preprint2020arXiv

Sharp Multiple Instance Learning for DeepFake Video Detection

With the rapid development of facial manipulation techniques, face forgery has received considerable attention in the multimedia and computer vision communities due to security concerns. Existing methods are mostly designed for single-frame detection trained with precise image-level labels, or for video-level prediction that only models inter-frame inconsistency, leaving potentially high risks from DeepFake attackers. In this paper, we introduce a new problem of partial face attack in DeepFake video, where only video-level labels are provided but not all the faces in the fake videos are manipulated. We address this problem with a multiple instance learning framework, treating faces and the input video as instances and bag, respectively. A sharp MIL (S-MIL) is proposed, which builds a direct mapping from instance embeddings to bag prediction, rather than from instance embeddings to instance prediction and then to bag prediction as in traditional MIL. Theoretical analysis proves that the gradient vanishing in traditional MIL is relieved in S-MIL. To generate instances that can accurately incorporate the partially manipulated faces, a spatial-temporal encoded instance is designed to fully model the intra-frame and inter-frame inconsistency, which further helps to improve detection performance. We also construct a new dataset, FFPMS, for partially attacked DeepFake video detection, which can benefit the evaluation of different methods at both frame and video levels. Experiments on FFPMS and the widely used DFDC dataset verify that S-MIL is superior to its counterparts for partially attacked DeepFake video detection. In addition, S-MIL can also be adapted to traditional DeepFake image detection tasks and achieves state-of-the-art performance on single-frame datasets.
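The contrast this abstract draws between the two MIL pipelines can be sketched as follows. This is an illustrative sketch only and not the S-MIL formulation from the paper: the linear scorer `w`, the max pooling used for the traditional path, and the softmax-weighted aggregation used for the direct path are all assumptions chosen to make the two data flows concrete.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score(w, h):
    """Linear score of one instance embedding h under weights w."""
    return sum(a * b for a, b in zip(w, h))


def traditional_mil(w, bag):
    """Embedding -> per-instance probability -> bag probability (max pooling)."""
    return max(sigmoid(score(w, h)) for h in bag)


def direct_mil(w, bag):
    """Embedding -> bag probability in one step (assumed aggregation).

    Instance scores are combined with softmax weights before a single
    sigmoid, so no per-instance probability is ever formed.
    """
    scores = [score(w, h) for h in bag]
    top = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sigmoid(sum(a * s for a, s in zip(weights, scores)))


# Hypothetical 2-D face embeddings: one bag with only real faces,
# one bag where a single face is manipulated.
w = [1.0, -1.0]
real_bag = [[0.0, 2.0], [0.0, 1.5]]
partial_fake_bag = [[0.0, 2.0], [3.0, 0.0]]
real_score = direct_mil(w, real_bag)
fake_score = direct_mil(w, partial_fake_bag)
```

With these toy values, the single manipulated face dominates the weighted sum, so the partially attacked bag scores above 0.5 while the all-real bag scores below it.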

preprint2019arXiv

NbSeTe - A New Layered Transition Metal Dichalcogenide Superconductor

Transition metal dichalcogenides (TMDCs) usually exhibit layered polytypic structures due to weak interlayer coupling. 2H-NbSe2 is one of the most widely studied members of the pristine TMDC family due to its high superconducting transition temperature (Tc = 7.3 K) and the occurrence of charge-density wave (CDW) order below 33 K. The coexistence of CDW with superconductivity poses an intriguing open question about the relationship between Fermi surface nesting and Cooper pairing. Past studies of this issue have mostly focused on doping 2H-NbSe2 with 3d transition metals without significantly changing its crystal structure. Here we replaced Se with Te in 2H-NbSe2 in order to design a new 1T polytype layered TMDC, NbSeTe, which adopts a trigonal structure with space group P-3m1. We successfully grew large, high-quality single crystals of 1T-NbSeTe via the vapor transport method using I2 as the transport agent. Temperature-dependent resistivity and specific heat data revealed a bulk Tc of 1.3 K, which is the first observation of superconductivity in the pure 1T-NbSeTe phase. This compound enlarges the family of superconducting TMDCs and provides an opportunity to study the interplay between CDW and superconductivity in the trigonal structure.