Source author record

Kai Chen

Kai Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

109 works

32 topics

4 close collaborators

preprint · 2026 · arXiv

Decompose to Understand, Fuse to Detect: Frequency-Decoupled Anomaly Detection for Encrypted Network Traffic

Network traffic anomaly detection represents a critical cybersecurity task, yet widespread encryption makes this task increasingly challenging. In response, image-based methods that model traffic as visual patterns have emerged as the dominant approach. However, this work pioneers the identification of a pervasive "full-frequency" characteristic and an associated limitation termed "spectral mismatch" within this paradigm. Specifically, while encrypted traffic exhibits prominent high-frequency components, mainstream reconstruction methods demonstrate an inherent bias toward learning low-frequency information. This fundamental mismatch results in incomplete representations that consequently degrade anomaly detection performance. To address this challenge, we propose FreeUp, a novel frequency-decoupled framework designed explicitly for encrypted traffic analysis. FreeUp decomposes traffic data into distinct low- and high-frequency bands, processing them through separate, dedicated branches along with a customized training strategy that ensures stable and independent frequency-specific learning. Furthermore, recognizing that simple reconstruction error proves inadequate for evaluating dual-branch architectures, we introduce an uncertainty-inspired fusion scoring mechanism. This mechanism quantifies the reconstruction uncertainty of the frequency-specific branches and dynamically integrates their outputs, yielding a more comprehensive and reliable anomaly score. Extensive experiments across multiple benchmarks demonstrate that FreeUp consistently outperforms state-of-the-art baselines. The code is available at https://github.com/ikun0124/FreeUp.
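The low/high band split mentioned in this abstract can be illustrated with a minimal sketch. It assumes a radial FFT mask and a hypothetical `cutoff` parameter; the abstract does not say how FreeUp actually separates the bands, so this shows one possible decomposition rather than the authors' method.

```python
import numpy as np

def split_frequency_bands(image, cutoff=0.25):
    """Split a 2-D array into low- and high-frequency parts.

    `cutoff` is a hypothetical parameter: the radius of the low-pass
    mask in cycles per sample (the Nyquist frequency is 0.5).
    """
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(cols)[None, :]   # horizontal frequencies
    low_mask = np.sqrt(fx**2 + fy**2) <= cutoff
    spectrum = np.fft.fft2(image)
    # Keep only the bins inside the mask and return to the pixel domain.
    low = np.fft.ifft2(spectrum * low_mask).real
    # The high band is defined as the residual, so low + high == image.
    high = image - low
    return low, high
```

Defining the high band as the residual keeps the two bands complementary, so each branch of a dual-branch model would see a disjoint part of the signal.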

preprint · 2026 · arXiv

EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields

Pretrained video diffusion models provide powerful spatiotemporal generative priors, making them a natural foundation for robotic world models. While recent world-action models jointly optimize future videos and actions, they predominantly treat video generation as an auxiliary representation for policy learning. Consequently, they insufficiently explore the inverse problem: leveraging action signals to guide video synthesis, thereby often failing to preserve precise robot spatial geometry and fine-grained robot-object interaction dynamics in the generated rollouts. To bridge this gap, we present EA-WM, an Event-Aware Generative World Model that effectively closes the loop between kinematic control and visual perception. Rather than injecting joint or end-effector actions as abstract, low-dimensional tokens, EA-WM projects actions and kinematic states directly into the target camera view as Structured Kinematic-to-Visual Action Fields. To fully exploit this geometrically grounded representation, we introduce event-aware bidirectional fusion blocks that modulate cross-branch attention, capturing object state changes and interaction dynamics. Evaluated on the comprehensive WorldArena benchmark, EA-WM achieves state-of-the-art performance, outperforming existing baselines by a significant margin.

preprint · 2024 · arXiv

STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering

Recently we have witnessed the rapid development of video question answering models. However, most models can only handle simple videos in terms of temporal reasoning, and their performance tends to drop when answering temporal-reasoning questions on long and informative videos. To tackle this problem we propose STAIR, a Spatial-Temporal Reasoning model with Auditable Intermediate Results for video question answering. STAIR is a neural module network, which contains a program generator to decompose a given question into a hierarchical combination of several sub-tasks, and a set of lightweight neural modules to complete each of these sub-tasks. Though neural module networks are already widely studied on image-text tasks, applying them to videos is a non-trivial task, as reasoning on videos requires different abilities. In this paper, we define a set of basic video-text sub-tasks for video question answering and design a set of lightweight modules to complete them. Different from most prior works, modules of STAIR return intermediate outputs specific to their intentions instead of always returning attention maps, which makes it easier to interpret and collaborate with pre-trained models. We also introduce intermediate supervision to make these intermediate outputs more accurate. We conduct extensive experiments on several video question answering datasets under various settings to show STAIR's performance, explainability, compatibility with pre-trained models, and applicability when program annotations are not available. Code: https://github.com/yellow-binary-tree/STAIR

preprint · 2023 · arXiv

Boosting Neural Networks to Decompile Optimized Binaries

Decompilation aims to transform a low-level program language (LPL) (e.g., binary file) into its functionally-equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.

preprint · 2022 · arXiv

A Novel Multi-Agent Scheduling Mechanism for Adaptation of Production Plans in Case of Supply Chain Disruptions

Manufacturing companies typically use sophisticated production planning systems optimizing production steps, often delivering near-optimal solutions. As a downside of delivering a near-optimal schedule, planning systems have high computational demands resulting in hours of computation. Under normal circumstances this is not an issue if there is enough buffer time before implementation of the schedule (e.g. at night for the next day). However, in case of unexpected disruptions such as delayed part deliveries or defectively manufactured goods, the planned schedule may become invalid and swift replanning becomes necessary. Such immediate replanning is unsuited for existing optimal planners due to the computational requirements. This paper proposes a novel solution that can effectively and efficiently perform replanning in case of different types of disruptions using an existing plan. The approach is based on the idea of adhering to the existing schedule as much as possible, adapting it based on limited local changes. For that purpose an agent-based scheduling mechanism has been devised, in which agents represent materials and production sites and use local optimization techniques and negotiations to generate an adapted (sufficient, but non-optimal) schedule. The approach has been evaluated using real production data from Huawei, showing that efficient schedules are produced in a short time. The system has been implemented as a proof of concept and is currently being reimplemented and transferred to a production system based on the Jadex agent platform.

preprint · 2022 · arXiv

A two-stage full-band speech enhancement model with effective spectral compression mapping

The direct expansion of deep neural network (DNN) based wide-band speech enhancement (SE) to full-band processing faces the challenge of low frequency resolution in the low-frequency range, which is highly likely to degrade the model's performance. In this paper, we propose a learnable spectral compression mapping (SCM) to effectively compress the high-frequency components so that they can be processed in a more efficient manner. By doing so, the model can pay more attention to the low and middle frequency ranges, where most of the speech power is concentrated. Instead of suppressing noise in a single network structure, we first estimate a spectral magnitude mask, converting the speech to a high signal-to-noise ratio (SNR) state, and then utilize a subsequent model to further optimize the real and imaginary mask of the pre-enhanced signal. We conduct comprehensive experiments to validate the efficacy of the proposed method.

preprint · 2022 · arXiv

APRNet: Attention-based Pixel-wise Rendering Network for Photo-Realistic Text Image Generation

Style-guided text image generation tries to synthesize a text image by imitating a reference image's appearance while keeping the text content unaltered. The text image appearance includes many aspects. In this paper, we focus on transferring the style image's background and foreground color patterns to the content image to generate photo-realistic text images. To achieve this goal, we propose 1) a content-style cross attention based pixel sampling approach to roughly mimic the style text image's background; 2) a pixel-wise style modulation technique to transfer varying color patterns of the style image to the content image spatial-adaptively; 3) a cross attention based multi-scale style fusion approach to solve the text foreground misalignment issue between style and content images; 4) an image patch shuffling strategy to create style, content and ground truth image tuples for training. Experimental results on Chinese handwriting text image synthesis with SCUT-HCCDoc and CASIA-OLHWDB datasets demonstrate that the proposed method can improve the quality of synthetic text images and make them more photo-realistic.

preprint · 2022 · arXiv

Attacking Video Recognition Models with Bullet-Screen Comments

Recent research has demonstrated that Deep Neural Networks (DNNs) are vulnerable to adversarial patches which introduce perceptible but localized changes to the input. Nevertheless, existing approaches have focused on generating adversarial patches on images; their counterparts in videos have been less explored. Compared with images, attacking videos is much more challenging as it needs to consider not only spatial cues but also temporal cues. To close this gap, we introduce a novel adversarial attack in this paper, the bullet-screen comment (BSC) attack, which attacks video recognition models with BSCs. Specifically, adversarial BSCs are generated with a Reinforcement Learning (RL) framework, where the environment is set as the target model and the agent plays the role of selecting the position and transparency of each BSC. By continuously querying the target models and receiving feedback, the agent gradually adjusts its selection strategies in order to achieve a high fooling rate with non-overlapping BSCs. As BSCs can be regarded as a kind of meaningful patch, adding them to a clean video will not affect people's understanding of the video content, nor will it arouse people's suspicion. We conduct extensive experiments to verify the effectiveness of the proposed method. On both UCF-101 and HMDB-51 datasets, our BSC attack method can achieve about 90% fooling rate when attacking three mainstream video recognition models, while only occluding <8% areas in the video. Our code is available at https://github.com/kay-ck/BSC-attack.

preprint · 2022 · arXiv

Dense Siamese Network for Dense Unsupervised Learning

This paper presents Dense Siamese Network (DenseSiam), a simple unsupervised learning framework for dense prediction tasks. It learns visual representations by maximizing the similarity between two views of one image with two types of consistency, i.e., pixel consistency and region consistency. Concretely, DenseSiam first maximizes the pixel level spatial consistency according to the exact location correspondence in the overlapped area. It also extracts a batch of region embeddings that correspond to some sub-regions in the overlapped area to be contrasted for region consistency. In contrast to previous methods that require negative pixel pairs, momentum encoders or heuristic masks, DenseSiam benefits from the simple Siamese network and optimizes the consistency of different granularities. It also proves that the simple location correspondence and interacted region embeddings are effective enough to learn the similarity. We apply DenseSiam on ImageNet and obtain competitive improvements on various downstream tasks. We also show that only with some extra task-specific losses, the simple framework can directly conduct dense prediction tasks. On an existing unsupervised semantic segmentation benchmark, it surpasses state-of-the-art segmentation methods by 2.1 mIoU with 28% training costs. Code and models are released at https://github.com/ZwwWayne/DenseSiam.

preprint · 2022 · arXiv

Double-pass multiple-plate continuum for high temporal contrast nonlinear pulse compression

We propose a new architecture, double-pass multiple-plate continuum (DPMPC), for nonlinear pulse compression. In addition to a smaller footprint, a double-pass configuration is designed to achieve substantial bandwidth broadening without incurring noticeable higher-order dispersion, thus improving the temporal contrast over that of the traditional single-pass geometry when only quadratic spectral phase can be compensated. In our proof-of-concept experiment, a 187 μJ, 190 fs Yb-based laser pulse is compressed to 20 fs with high throughput (75%), high Strehl ratio (0.76) and excellent beam homogeneity by using DPMPC. The subsequently generated octave-spanning spectrum exhibits a significantly raised blue tail compared with that driven by pulses from a single-pass counterpart.

preprint · 2022 · arXiv

DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training

Bulk synchronous parallel (BSP) is the de-facto paradigm for distributed DNN training in today's production clusters. However, due to the global synchronization nature, its performance can be significantly influenced by network bottlenecks caused by either static topology heterogeneity or dynamic bandwidth contentions. Existing solutions, either system-level optimizations strengthening BSP (e.g., Ring or Hierarchical All-reduce) or algorithmic optimizations replacing BSP (e.g., ASP or SSP, which relax the global barriers), do not completely solve the problem, as they may still suffer from communication inefficiency or risk convergence inaccuracy. In this paper, we present a novel divide-and-shuffle synchronization (DS-Sync) to realize communication efficiency without sacrificing convergence accuracy for distributed DNN training. At its heart, by taking into account the network bottlenecks, DS-Sync improves communication efficiency by dividing workers into non-overlapping groups to synchronize independently in a bottleneck-free manner. Meanwhile, it maintains convergence accuracy by iteratively shuffling workers among different groups to ensure a global consensus. We theoretically prove that DS-Sync converges properly in non-convex and smooth conditions like DNN. We further implement DS-Sync and integrate it with PyTorch, and our testbed experiments show that DS-Sync can achieve up to 94% improvements on the end-to-end training time over existing solutions while maintaining the same accuracy.
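The divide-and-shuffle idea in this abstract can be sketched as a toy simulation. The sketch assumes each worker holds a single scalar, that groups synchronize by plain averaging, and that regrouping is a uniform random shuffle; none of these details come from the abstract, and the function names are hypothetical. It only shows why repeated regrouping can reach a global consensus without a global barrier.

```python
import random

def divide_and_shuffle_round(values, group_size, rng):
    """One round: shuffle workers, split them into disjoint groups,
    and replace each worker's value with its group's average."""
    order = list(range(len(values)))
    rng.shuffle(order)                      # assumed: uniform random regrouping
    updated = list(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            updated[i] = mean
    return updated

def simulate(values, group_size=2, rounds=30, seed=0):
    """Run several rounds and return the workers' final values."""
    rng = random.Random(seed)
    for _ in range(rounds):
        values = divide_and_shuffle_round(values, group_size, rng)
    return values
```

Averaging inside a group never changes the sum over all workers, so the global mean is preserved while the spread between workers shrinks from round to round.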

preprint · 2022 · arXiv

Efficient Data-Plane Memory Scheduling for In-Network Aggregation

As the scale of distributed training grows, communication becomes a bottleneck. To accelerate the communication, recent works introduce In-Network Aggregation (INA), which moves the gradients summation into network middle-boxes, e.g., programmable switches, to reduce the traffic volume. However, switch memory is scarce compared to the volume of gradients transmitted in distributed training. Although the literature applies methods like pool-based streaming or dynamic sharing to tackle the mismatch, switch memory is still a potential performance bottleneck. Furthermore, we observe the under-utilization of switch memory due to the synchronization requirement for aggregator deallocation in recent works. To improve the switch memory utilization, we propose ESA, an Efficient Switch Memory Scheduler for In-Network Aggregation. At its core, ESA enforces the preemptive aggregator allocation primitive and introduces priority scheduling at the data-plane, which improves the switch memory utilization and average job completion time (JCT). Experiments show that ESA can improve the average JCT by up to 1.35×.

preprint · 2022 · arXiv

Experimental Demonstration of Quantum Pseudotelepathy

Quantum pseudotelepathy is a strong form of nonlocality. Different from the conventional non-local games where quantum strategies win statistically, e.g., the Clauser-Horne-Shimony-Holt game, quantum pseudotelepathy in principle allows quantum players to win with probability 1. In this work, we report a faithful experimental demonstration of quantum pseudotelepathy via playing the non-local version of the Mermin-Peres magic square game, where Alice and Bob cooperatively fill in a 3 by 3 magic square. We adopt the hyperentanglement scheme and prepare photon pairs entangled in both the polarization and the orbital angular momentum degrees of freedom, such that the experiment is carried out in a resource-efficient manner. Under the locality and fair-sampling assumptions, our results show that quantum players can simultaneously win all the queries over any classical strategy.
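The classical limit of the magic square game can be checked by brute force. The sketch below assumes the common convention in which Alice fills a row with ±1 entries whose product is +1, Bob fills a column with ±1 entries whose product is -1, and they win when their entries agree on the shared cell; the abstract does not state a convention, and the function name is hypothetical.

```python
from itertools import product

# Assumed convention: row answers multiply to +1, column answers to -1.
ROW_ANSWERS = [t for t in product((1, -1), repeat=3) if t[0] * t[1] * t[2] == 1]
COL_ANSWERS = [t for t in product((1, -1), repeat=3) if t[0] * t[1] * t[2] == -1]

def best_classical_score():
    """Return the largest number of the 9 (row, column) queries that any
    deterministic classical strategy wins."""
    best = 0
    for alice in product(ROW_ANSWERS, repeat=3):    # alice[r]: answer to row r
        for bob in product(COL_ANSWERS, repeat=3):  # bob[c]: answer to column c
            wins = sum(alice[r][c] == bob[c][r]
                       for r in range(3) for c in range(3))
            best = max(best, wins)
    return best

print(best_classical_score())  # prints 8
```

No deterministic strategy wins all nine queries, because the product of all entries would have to be +1 for Alice's table and -1 for Bob's. Randomized classical strategies are mixtures of deterministic ones, so they cannot exceed a winning probability of 8/9 under uniformly chosen queries, which is the gap a quantum strategy closes.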

preprint · 2022 · arXiv

Few-Shot Object Detection via Association and DIscrimination

Object detection has achieved substantial progress in the last decade. However, detecting novel classes with only a few samples remains challenging, since deep learning in a low-data regime usually leads to a degraded feature space. Existing works employ a holistic fine-tuning paradigm to tackle this problem, where the model is first pre-trained on all base classes with abundant samples, and then it is used to carve the novel class feature space. Nonetheless, this paradigm is still imperfect. During fine-tuning, a novel class may implicitly leverage the knowledge of multiple base classes to construct its feature space, which induces a scattered feature space, hence violating the inter-class separability. To overcome these obstacles, we propose a two-step fine-tuning framework, Few-shot object detection via Association and DIscrimination (FADI), which builds up a discriminative feature space for each novel class with two integral steps. 1) In the association step, in contrast to implicitly leveraging multiple base classes, we construct a compact novel class feature space via explicitly imitating a specific base class feature space. Specifically, we associate each novel class with a base class according to their semantic similarity. After that, the feature space of a novel class can readily imitate the well-trained feature space of the associated base class. 2) In the discrimination step, to ensure the separability between the novel classes and associated base classes, we disentangle the classification branches for base and novel classes. To further enlarge the inter-class separability between all classes, a set-specialized margin loss is imposed. Extensive experiments on Pascal VOC and MS-COCO datasets demonstrate FADI achieves new SOTA performance, significantly improving the baseline in any shot/split by +18.7. Notably, the advantage is most pronounced in extremely few-shot scenarios.

preprint · 2022 · arXiv

GCFSR: a Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors

Face image super resolution (face hallucination) usually relies on facial priors to restore realistic details and preserve identity information. Recent advances can achieve impressive results with the help of GAN prior. They either design complicated modules to modify the fixed GAN prior or adopt complex training strategies to finetune the generator. In this work, we propose a generative and controllable face SR framework, called GCFSR, which can reconstruct images with faithful identity information without any additional priors. Generally, GCFSR has an encoder-generator architecture. Two modules called style modulation and feature modulation are designed for the multi-factor SR task. The style modulation aims to generate realistic face details and the feature modulation dynamically fuses the multi-level encoded features and the generated ones conditioned on the upscaling factor. The simple and elegant architecture can be trained from scratch in an end-to-end manner. For small upscaling factors (<=8), GCFSR can produce surprisingly good results with only adversarial loss. After adding L1 and perceptual losses, GCFSR can outperform state-of-the-art methods for large upscaling factors (16, 32, 64). During the test phase, we can modulate the generative strength via feature modulation by changing the conditional upscaling factor continuously to achieve various generative effects.

preprint · 2022 · arXiv

Group R-CNN for Weakly Semi-supervised Object Detection with Points

We study the problem of weakly semi-supervised object detection with points (WSSOD-P), where the training data is combined by a small set of fully annotated images with bounding boxes and a large set of weakly-labeled images with only a single point annotated for each instance. The core of this task is to train a point-to-box regressor on well-labeled images that can be used to predict credible bounding boxes for each point annotation. We challenge the prior belief that existing CNN-based detectors are not compatible with this task. Based on the classic R-CNN architecture, we propose an effective point-to-box regressor: Group R-CNN. Group R-CNN first uses instance-level proposal grouping to generate a group of proposals for each point annotation and thus can obtain a high recall rate. To better distinguish different instances and improve precision, we propose instance-level proposal assignment to replace the vanilla assignment strategy adopted in the original R-CNN methods. As naive instance-level assignment brings converging difficulty, we propose instance-aware representation learning which consists of instance-aware feature enhancement and instance-aware parameter generation to overcome this issue. Comprehensive experiments on the MS-COCO benchmark demonstrate the effectiveness of our method. Specifically, Group R-CNN significantly outperforms the prior method Point DETR by 3.9 mAP with 5% well-labeled images, which is the most challenging scenario. The source code can be found at https://github.com/jshilong/GroupRCNN

preprint · 2022 · arXiv

Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain

With the broad application of deep neural networks (DNNs), backdoor attacks have gradually attracted attention. Backdoor attacks are insidious, and poisoned models perform well on benign samples and are only triggered when given specific inputs, which cause the neural network to produce incorrect outputs. The state-of-the-art backdoor attack work is implemented by data poisoning, i.e., the attacker injects poisoned samples into the dataset, and the models trained with that dataset are infected with the backdoor. However, most of the triggers used in the current study are fixed patterns patched on a small fraction of an image and are often clearly mislabeled, which is easily detected by humans or defense methods such as Neural Cleanse and SentiNet. Also, it's difficult to be learned by DNNs without mislabeling, as they may ignore small patterns. In this paper, we propose a generalized backdoor attack method based on the frequency domain, which can implement backdoor implantation without mislabeling and accessing the training process. It is invisible to human beings and able to evade the commonly used defense methods. We evaluate our approach in the no-label and clean-label cases on three datasets (CIFAR-10, STL-10, and GTSRB) with two popular scenarios (self-supervised learning and supervised learning). The results show our approach can achieve a high attack success rate (above 90%) on all the tasks without significant performance degradation on main tasks. Also, we evaluate the bypass performance of our approach for different kinds of defenses, including the detection of training data (i.e., Activation Clustering), the preprocessing of inputs (i.e., Filtering), the detection of inputs (i.e., SentiNet), and the detection of models (i.e., Neural Cleanse). The experimental results demonstrate that our approach shows excellent robustness to such defenses.

preprint · 2022 · arXiv

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

Referring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image. One of the key challenges behind this task is leveraging the referring expression for highlighting relevant positions in the image. A paradigm for tackling this problem is to leverage a powerful vision-language ("cross-modal") decoder to fuse features independently extracted from a vision encoder and a language encoder. Recent methods have made remarkable advancements in this paradigm by exploiting Transformers as cross-modal decoders, concurrent to the Transformer's overwhelming success in many other vision-language tasks. Adopting a different approach in this work, we show that significantly better cross-modal alignments can be achieved through the early fusion of linguistic and visual features in intermediate layers of a vision Transformer encoder network. By conducting cross-modal feature fusion in the visual feature encoding stage, we can leverage the well-proven correlation modeling power of a Transformer encoder for excavating helpful multi-modal context. This way, accurate segmentation results are readily harvested with a light-weight mask predictor. Without bells and whistles, our method surpasses the previous state-of-the-art methods on RefCOCO, RefCOCO+, and G-Ref by large margins.

preprint · 2022 · arXiv

MMRotate: A Rotated Object Detection Benchmark using PyTorch

We present an open-source toolbox, named MMRotate, which provides a coherent algorithm framework of training, inferring, and evaluation for the popular rotated object detection algorithm based on deep learning. MMRotate implements 18 state-of-the-art algorithms and supports the three most frequently used angle definition methods. To facilitate future research and industrial applications of rotated object detection-related problems, we also provide a large number of trained models and detailed benchmarks to give insights into the performance of rotated object detection. MMRotate is publicly released at https://github.com/open-mmlab/mmrotate.

preprint · 2022 · arXiv

Neural-iLQR: A Learning-Aided Shooting Method for Trajectory Optimization

Iterative linear quadratic regulator (iLQR) has gained wide popularity in addressing trajectory optimization problems with nonlinear system models. However, as a model-based shooting method, it relies heavily on an accurate system model to update the optimal control actions and the trajectory determined with forward integration, thus becoming vulnerable to inevitable model inaccuracies. Recently, substantial research efforts in learning-based methods for optimal control problems have been progressing significantly in addressing unknown system models, particularly when the system has complex interactions with the environment. Yet a deep neural network is normally required to fit substantial scale of sampling data. In this work, we present Neural-iLQR, a learning-aided shooting method over the unconstrained control space, in which a neural network with a simple structure is used to represent the local system model. In this framework, the trajectory optimization task is achieved with simultaneous refinement of the optimal policy and the neural network iteratively, without relying on the prior knowledge of the system model. Through comprehensive evaluations on two illustrative control tasks, the proposed method is shown to outperform the conventional iLQR significantly in the presence of inaccuracies in system models.

preprint · 2022 · arXiv

No Free Lunch Theorem for Security and Utility in Federated Learning

In a federated learning scenario where multiple parties jointly learn a model from their respective data, there exist two conflicting goals for the choice of appropriate algorithms. On one hand, private and sensitive training data must be kept secure as much as possible in the presence of semi-honest partners, while on the other hand, a certain amount of information has to be exchanged among different parties for the sake of learning utility. Such a challenge calls for the privacy-preserving federated learning solution, which maximizes the utility of the learned model and maintains a provable privacy guarantee of participating parties' private data. This article illustrates a general framework that a) formulates the trade-off between privacy loss and utility loss from a unified information-theoretic point of view, and b) delineates quantitative bounds of privacy-utility trade-off when different protection mechanisms including Randomization, Sparsity, and Homomorphic Encryption are used. It was shown that in general there is no free lunch for the privacy-utility trade-off and one has to trade the preserving of privacy with a certain degree of degraded utility. The quantitative analysis illustrated in this article may serve as the guidance for the design of practical federated learning algorithms.

preprint · 2022 · arXiv

Non-Hermitian $C_{NH} = 2$ Chern insulator protected by generalized rotational symmetry

We propose a non-Hermitian topological system protected by the generalized rotational symmetry which invokes rotation in space and Hermitian conjugation. The system, described by the tight-binding model with nonreciprocal hopping, is found to host two pairs of in-gap edge modes in the gapped topological phase and is characterized by the non-Hermitian (NH) Chern number $C_{NH}=2$. The quantization of the non-Hermitian Chern number is shown to be protected by the generalized rotational symmetry $\hat{H}^{+}=\hat{U}\hat{H}\hat{U}^{+}$ of the system. Our finding paves the way towards novel non-Hermitian topological systems characterized by large values of topological invariants and hosting multiple in-gap edge states, which can be used for topologically resilient multiplexing.

preprint · 2022 · arXiv

NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results

This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video. In this challenge, we proposed the LDV 2.0 dataset, which includes the LDV dataset (240 videos) and 95 additional videos. The challenge includes three tracks. Track 1 aims at enhancing videos compressed by HEVC at a fixed QP. Track 2 and Track 3 target both the super-resolution and quality enhancement of HEVC compressed video, requiring x2 and x4 super-resolution, respectively. In total, the three tracks attracted more than 600 registrations. In the test phase, 8 teams, 8 teams and 12 teams submitted final results to Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state of the art of super-resolution and quality enhancement of compressed video. The proposed LDV 2.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge (including open-sourced codes) is at https://github.com/RenYang-home/NTIRE22_VEnh_SR.

preprint2022arXiv

OCSampler: Compressing Videos to One Clip with Single-step Sampling

In this paper, we propose a framework named OCSampler to explore a compact yet effective video representation with one short clip for efficient video recognition. Recent works prefer to formulate frame sampling as a sequential decision task by selecting frames one by one according to their importance, while we present a new paradigm of learning instance-specific video condensation policies to select informative frames for representing the entire video in a single step. Our basic motivation is that the efficient video recognition task lies in processing a whole sequence at once rather than picking frames sequentially. Accordingly, these policies are derived from a lightweight skim network together with a simple yet effective policy network within one step. Moreover, we extend the proposed method with a frame number budget, enabling the framework to produce correct predictions with high confidence using as few frames as possible. Experiments on four benchmarks, i.e., ActivityNet, Mini-Kinetics, FCVID, and Mini-Sports1M, demonstrate the effectiveness of our OCSampler over previous methods in terms of accuracy, theoretical computational expense, and actual inference speed. We also evaluate its generalization power across different classifiers, sampled frames, and search spaces. In particular, we achieve 76.9% mAP and 21.7 GFLOPs on ActivityNet with an impressive throughput: 123.9 Videos/s on a single TITAN Xp GPU.

preprint2022arXiv

PPA: Preference Profiling Attack Against Federated Learning

Federated learning (FL) trains a global model across a number of decentralized users, each with a local dataset. Compared to traditional centralized learning, FL does not require direct access to local datasets and thus aims to mitigate data privacy concerns. However, data privacy leakage in FL still exists due to inference attacks, including membership inference, property inference, and data inversion. In this work, we propose a new type of privacy inference attack, coined Preference Profiling Attack (PPA), that accurately profiles the private preferences of a local user, e.g., most liked (disliked) items from the client's online shopping and most common expressions from the user's selfies. In general, PPA can profile top-k (i.e., k = 1, 2, 3 and k = 1 in particular) preferences contingent on the local client (user)'s characteristics. Our key insight is that the gradient variation of a local user's model has a distinguishable sensitivity to the sample proportion of a given class, especially the majority (minority) class. By observing a user model's gradient sensitivity to a class, PPA can profile the sample proportion of the class in the user's local dataset, and thus the user's preference for the class is exposed. The inherent statistical heterogeneity of FL further facilitates PPA. We have extensively evaluated PPA's effectiveness using four datasets (MNIST, CIFAR10, RAF-DB and Products-10K). Our results show that PPA achieves 90% and 98% top-1 attack accuracy on MNIST and CIFAR10, respectively. More importantly, in real-world commercial scenarios of shopping (i.e., Products-10K) and social network (i.e., RAF-DB), PPA gains a top-1 attack accuracy of 78% in the former case to infer the most ordered items (i.e., as a commercial competitor), and 88% in the latter case to infer a victim user's most frequent facial expressions, e.g., disgusted.

preprint2022arXiv

Practical and Secure Federated Recommendation with Personalized Masks

Federated recommendation addresses the data silo and privacy problems together for recommender systems. Current federated recommender systems mainly utilize cryptographic or obfuscation methods to protect the original ratings from leakage. However, the former comes with extra communication and computation costs, and the latter damages model accuracy. Neither of them can simultaneously satisfy the real-time feedback and accurate personalization requirements of recommender systems. In this paper, we propose federated masked matrix factorization (FedMMF) to protect data privacy in federated recommender systems without sacrificing efficiency and effectiveness. In more detail, we introduce the new idea of a personalized mask generated only from local data and apply it in FedMMF. On the one hand, the personalized mask offers protection for participants' private data without effectiveness loss. On the other hand, combined with the adaptive secure aggregation protocol, the personalized mask can further improve efficiency. Theoretically, we provide a security analysis for the personalized mask. Empirically, we also show the superiority of the designed model on different real-world data sets.

preprint2022arXiv

Practical Lossless Federated Singular Vector Decomposition over Billion-Scale Data

With the enactment of privacy-preserving regulations, e.g., GDPR, federated SVD is proposed to enable SVD-based applications over different data sources without revealing the original data. However, many SVD-based applications cannot be well supported by existing federated SVD solutions. The crux is that these solutions, adopting either differential privacy (DP) or homomorphic encryption (HE), suffer from accuracy loss caused by unremovable noise or degraded efficiency due to inflated data. In this paper, we propose FedSVD, a practical lossless federated SVD method over billion-scale data, which can simultaneously achieve lossless accuracy and high efficiency. At the heart of FedSVD is a lossless matrix masking scheme delicately designed for SVD: 1) While adopting the masks to protect private data, FedSVD completely removes them from the final results of SVD to achieve lossless accuracy; and 2) As the masks do not inflate the data, FedSVD avoids extra computation and communication overhead during the factorization to maintain high efficiency. Experiments with real-world datasets show that FedSVD is over 10000 times faster than the HE-based method and has 10 orders of magnitude smaller error than the DP-based solution on SVD tasks. We further build and evaluate FedSVD over three real-world applications: principal components analysis (PCA), linear regression (LR), and latent semantic analysis (LSA), to show its superior performance in practice. On federated LR tasks, compared with two state-of-the-art solutions: FATE and SecureML, FedSVD-LR is 100 times faster than SecureML and 10 times faster than FATE.

preprint2022arXiv

PYSKL: Towards Good Practices for Skeleton Action Recognition

We present PYSKL: an open-source toolbox for skeleton-based action recognition based on PyTorch. The toolbox supports a wide variety of skeleton action recognition algorithms, including approaches based on GCN and CNN. In contrast to existing open-source skeleton action recognition projects that include only one or two algorithms, PYSKL implements six different algorithms under a unified framework with both the latest and original good practices to ease the comparison of efficacy and efficiency. We also provide an original GCN-based skeleton action recognition model named ST-GCN++, which achieves competitive recognition performance without any complicated attention schemes, serving as a strong baseline. Meanwhile, PYSKL supports the training and testing of nine skeleton-based action recognition benchmarks and achieves state-of-the-art recognition performance on eight of them. To facilitate future research on skeleton action recognition, we also provide a large number of trained models and detailed benchmark results to give some insights. PYSKL is released at https://github.com/kennymckormick/pyskl and is actively maintained. We will update this report when we add new features or benchmarks. The current version corresponds to PYSKL v0.2.

preprint2022arXiv

Revisiting Skeleton-based Action Recognition

Human skeleton, as a compact representation of human action, has received increasing attention in recent years. Many skeleton-based action recognition methods adopt graph convolutional networks (GCN) to extract features on top of human skeletons. Despite the positive results shown in previous works, GCN-based methods are subject to limitations in robustness, interoperability, and scalability. In this work, we propose PoseC3D, a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons. Compared to GCN-based methods, PoseC3D is more effective in learning spatiotemporal features, more robust against pose estimation noises, and generalizes better in cross-dataset settings. Also, PoseC3D can handle multiple-person scenarios without additional computation cost, and its features can be easily integrated with other modalities at early fusion stages, which provides a great design space to further boost the performance. On four challenging datasets, PoseC3D consistently obtains superior performance, when used alone on skeletons and in combination with the RGB modality.

preprint2022arXiv

ROMA: Cross-Domain Region Similarity Matching for Unpaired Nighttime Infrared to Daytime Visible Video Translation

Infrared cameras are often used to enhance night vision, since visible-light cameras perform poorly without sufficient illumination. However, infrared data has inadequate color contrast and representation ability owing to its intrinsic heat-related imaging principle. This makes it arduous for humans to capture and analyze information, and hinders its application. Although the domain gap between unpaired nighttime infrared and daytime visible videos is even larger than that between paired videos captured at the same time, establishing an effective translation mapping will greatly contribute to various fields. In this case, the structural knowledge within nighttime infrared videos and the semantic information contained in the translated daytime visible pairs could be utilized simultaneously. To this end, we propose a tailored framework, ROMA, coupled with our cRoss-domain regiOn siMilarity mAtching technique for bridging the huge gaps. Specifically, ROMA can efficiently translate unpaired nighttime infrared videos into fine-grained daytime visible ones while maintaining spatiotemporal consistency by matching cross-domain region similarity. Furthermore, we design a multiscale region-wise discriminator to distinguish the details of synthesized visible results from real references. Extensive experiments and evaluations for specific applications indicate that ROMA outperforms the state-of-the-art methods. Moreover, we provide a new and challenging dataset, named InfraredCity, to encourage further research on unpaired nighttime infrared to daytime visible video translation. In particular, it consists of 9 long video clips covering City, Highway and Monitor scenarios. The clips can be split into 603,142 frames in total, 20 times more than the recently released daytime infrared-to-visible dataset IRVI.

preprint2022arXiv

RotateQVS: Representing Temporal Information as Rotations in Quaternion Vector Space for Temporal Knowledge Graph Completion

Temporal factors are tied to the growth of facts in realistic applications, such as the progress of diseases and the development of political situations; therefore, research on Temporal Knowledge Graphs (TKG) attracts much attention. In TKG, relation patterns inherent with temporality need to be studied for representation learning and reasoning across temporal facts. However, existing methods can hardly model temporal relation patterns or capture the intrinsic connections between relations as they evolve over time, and they lack interpretability. In this paper, we propose a novel temporal modeling method which represents temporal entities as Rotations in Quaternion Vector Space (RotateQVS) and relations as complex vectors in Hamilton's quaternion space. We demonstrate theoretically that our method can model key patterns of relations in TKG, such as symmetry, asymmetry, and inverse, and can further capture time-evolved relations. Empirically, we show that our method can boost the performance of link prediction tasks over four temporal knowledge graph benchmarks.
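
The idea of representing time as a rotation in quaternion space can be illustrated with a minimal sketch. It assumes, purely for illustration, that a timestamp is encoded as a unit quaternion and an entity as a 3D vector (a pure quaternion), and that the time-specific entity is obtained by conjugation; the abstract does not give the exact formula, and the function names below are hypothetical.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def qconj(q):
    """Conjugate of a quaternion; equals the inverse for unit quaternions."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def time_quaternion(axis, angle):
    """Unit quaternion for a rotation by `angle` radians about `axis` (assumed encoding of a timestamp)."""
    norm = math.sqrt(sum(c * c for c in axis))
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)

def rotate_entity(entity_xyz, tau):
    """Rotate a 3D entity vector by the unit quaternion tau: tau * e * conj(tau)."""
    pure = (0.0, entity_xyz[0], entity_xyz[1], entity_xyz[2])
    _, x, y, z = qmul(qmul(tau, pure), qconj(tau))
    return (x, y, z)

# Example: a quarter turn about the z-axis maps the x-axis onto the y-axis.
tau = time_quaternion((0.0, 0.0, 1.0), math.pi / 2)
rotated = rotate_entity((1.0, 0.0, 0.0), tau)
```

Because conjugation by a unit quaternion preserves vector length, the same entity at different timestamps keeps its norm and differs only in orientation, which is one reason a rotation is a natural choice for encoding time.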

preprint2022arXiv

Secure Forward Aggregation for Vertical Federated Neural Networks

Vertical federated learning (VFL) is attracting much attention because it enables cross-silo data cooperation in a privacy-preserving manner. While most research works in VFL focus on linear and tree models, deep models (e.g., neural networks) are not well studied in VFL. In this paper, we focus on SplitNN, a well-known neural network framework in VFL, and identify a trade-off between data security and model performance in SplitNN. Briefly, SplitNN trains the model by exchanging gradients and transformed data. On the one hand, SplitNN suffers from a loss of model performance since multiple parties jointly train the model using transformed data instead of raw data, and a large amount of low-level feature information is discarded. On the other hand, a naive solution of increasing the model performance through aggregating at lower layers in SplitNN (i.e., the data is less transformed and more low-level features are preserved) makes raw data vulnerable to inference attacks. To mitigate the above trade-off, we propose a new neural network protocol in VFL called Security Forward Aggregation (SFA). It changes the way of aggregating the transformed data and adopts removable masks to protect the raw data. Experiment results show that networks with SFA achieve both data security and high model performance.
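
How a mask can protect individual contributions and still be removable at aggregation time can be sketched with generic pairwise additive masking. This is not the SFA protocol itself, whose details the abstract does not give; the function names, the integer encoding of party outputs, and the mask range are assumptions made for illustration.

```python
import random

def pairwise_masks(num_parties, dim, seed=0):
    """Build one mask per party such that all masks sum to the zero vector.

    Each pair (i, j) with i < j shares a random vector; party i adds it
    and party j subtracts it, so it cancels when everything is summed.
    """
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_parties)]
    for i in range(num_parties):
        for j in range(i + 1, num_parties):
            shared = [rng.randrange(-10**6, 10**6) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += shared[k]
                masks[j][k] -= shared[k]
    return masks

def apply_mask(vector, mask):
    """Element-wise sum of a party's output and its mask."""
    return [v + m for v, m in zip(vector, mask)]

def aggregate(vectors):
    """Element-wise sum over all parties' (masked) vectors."""
    return [sum(column) for column in zip(*vectors)]

# Three parties, each holding a 2-dimensional integer-encoded output.
outputs = [[1, 2], [3, 4], [5, 6]]
masks = pairwise_masks(num_parties=3, dim=2)
masked = [apply_mask(o, m) for o, m in zip(outputs, masks)]
total = aggregate(masked)  # equals the sum of the unmasked outputs
```

The aggregator sees only the masked vectors, yet the sum is exact because the masks cancel. Integer arithmetic is used here so that the cancellation holds without rounding error.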

preprint2022arXiv

Semi-blind source separation using convolutive transfer function for nonlinear acoustic echo cancellation

The recently proposed semi-blind source separation (SBSS) method for nonlinear acoustic echo cancellation (NAEC) outperforms adaptive NAEC in attenuating the nonlinear acoustic echo. However, the multiplicative transfer function (MTF) approximation makes it unsuitable for real-time applications especially in highly reverberant environments, and the natural gradient makes it hard to balance well between fast convergence speed and stability. In this paper, we propose two more effective SBSS methods based on auxiliary-function-based independent vector analysis (AuxIVA) and independent low-rank matrix analysis (ILRMA). The convolutive transfer function (CTF) approximation is used instead of MTF so that a long impulse response can be modeled with a short latency. The optimization schemes used in AuxIVA and ILRMA are carefully regularized according to the constrained demixing matrix of NAEC. Experimental results validate significantly better echo cancellation performance of the proposed methods.

preprint2022arXiv

Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin Picking

In this paper, we propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. Given a bin-picking scenario, we establish a photo-realistic simulator to synthesize abundant virtual data, and use this to train an initial pose estimation network. This network then takes the role of a teacher model, which generates pose predictions for unlabeled real data. With these predictions, we further design a comprehensive adaptive selection scheme to distinguish reliable results, and leverage them as pseudo labels to update a student model for pose estimation on real data. To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-label real data using the refined teacher model. We evaluate our method on a public benchmark and our newly-released dataset, achieving an ADD(-S) improvement of 11.49% and 22.62% respectively. Our method is also able to improve robotic bin-picking success by 19.54%, demonstrating the potential of iterative sim-to-real solutions for robotic applications.

preprint2022arXiv

Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing

Self-supervised learning (SSL), especially contrastive methods, has attracted attention recently as it learns effective transferable representations without semantic annotations. A common practice for self-supervised pre-training is to use as much data as possible. For a specific downstream task, however, involving irrelevant data in pre-training may degrade the downstream performance, as observed in our extensive experiments. On the other hand, for existing SSL methods, it is burdensome and infeasible to use different downstream-task-customized datasets in pre-training for different tasks. To address this issue, we propose a novel SSL paradigm called Scalable Dynamic Routing (SDR), which can be trained once and deployed efficiently to different downstream tasks with task-customized pre-trained models. Specifically, we construct the SDRnet with various sub-nets and train each sub-net with only one subset of the data by data-aware progressive training. When a downstream task arrives, we route among all the pre-trained sub-nets to get the best one along with its corresponding weights. Experiment results show that our SDR can train 256 sub-nets on ImageNet simultaneously, which provides better transfer performance than a unified model trained on the full ImageNet, achieving state-of-the-art (SOTA) averaged accuracy over 11 downstream classification tasks and AP on the PASCAL VOC detection task.

preprint2022arXiv

Towards Robust Part-aware Instance Segmentation for Industrial Bin Picking

Industrial bin picking is a challenging task that requires accurate and robust segmentation of individual object instances. Particularly, industrial objects can have irregular shapes, that is, thin and concave, whereas in bin-picking scenarios, objects are often closely packed with strong occlusion. To address these challenges, we formulate a novel part-aware instance segmentation pipeline. The key idea is to decompose industrial objects into correlated approximate convex parts and enhance the object-level segmentation with part-level segmentation. We design a part-aware network to predict part masks and part-to-part offsets, followed by a part aggregation module to assemble the recognized parts into instances. To guide the network learning, we also propose an automatic label decoupling scheme to generate ground-truth part-level labels from instance-level labels. Finally, we contribute the first instance segmentation dataset, which contains a variety of industrial objects that are thin and have non-trivial shapes. Extensive experimental results on various industrial objects demonstrate that our method can achieve the best segmentation results compared with the state-of-the-art approaches.

preprint2022arXiv

TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition

Recognizing transformation types applied to a video clip (RecogTrans) is a long-established paradigm for self-supervised video representation learning, which achieves much inferior performance compared to instance discrimination approaches (InstDisc) in recent works. However, based on a thorough comparison of representative RecogTrans and InstDisc methods, we observe the great potential of RecogTrans on both semantic-related and temporal-related downstream tasks. Based on hard-label classification, existing RecogTrans approaches suffer from noisy supervision signals in pre-training. To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation. TransRank provides accurate supervision signals by recognizing transformations relatively, consistently outperforming the classification-based formulation. Meanwhile, the unified framework can be instantiated with an arbitrary set of temporal or spatial transformations, demonstrating good generality. With a ranking-based formulation and several empirical practices, we achieve competitive performance on video retrieval and action recognition. Under the same setting, TransRank surpasses the previous state-of-the-art method by 6.4% on UCF101 and 8.3% on HMDB51 for action recognition (Top1 Acc), and improves video retrieval on UCF101 by 20.4% (R@1). The promising results validate that RecogTrans is still a paradigm worth exploring for video self-supervised learning. Codes will be released at https://github.com/kennymckormick/TransRank.

preprint2022arXiv

What Are Expected Queries in End-to-End Object Detection?

End-to-end object detection has progressed rapidly since the emergence of DETR. DETRs use a set of sparse queries that replace the dense candidate boxes in most traditional detectors. In comparison, sparse queries cannot guarantee as high a recall as dense priors. However, making queries dense is not trivial in current frameworks: it suffers not only from heavy computational cost but also from difficult optimization. As both sparse and dense queries are imperfect, what are expected queries in end-to-end object detection? This paper shows that the expected queries should be Dense Distinct Queries (DDQ). Concretely, we introduce dense priors back to the framework to generate dense queries. A duplicate query removal pre-process is applied to these queries so that they are distinguishable from each other. The dense distinct queries are then iteratively processed to obtain final sparse outputs. We show that DDQ is stronger, more robust, and converges faster. It obtains 44.5 AP on the MS COCO detection dataset with only 12 epochs. DDQ is also robust as it outperforms previous methods on both object detection and instance segmentation tasks on various datasets. DDQ blends advantages from traditional dense priors and recent end-to-end detectors. We hope it can serve as a new baseline and inspire researchers to revisit the complementarity between traditional methods and end-to-end detectors. The source code is publicly available at https://github.com/jshilong/DDQ.
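
The duplicate query removal step can be pictured with a small sketch. It assumes that each dense query carries a box and a score and that two queries count as duplicates when their boxes overlap heavily, as in greedy non-maximum suppression; the abstract does not specify the criterion, so the threshold and function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def select_distinct(boxes, scores, iou_threshold=0.7):
    """Greedily keep the highest-scoring queries whose boxes are not near-duplicates.

    Returns the indices of the kept queries, in descending score order.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep query i only if it does not overlap too much with any kept query.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = select_distinct(boxes, scores, iou_threshold=0.5)
```

With the 0.5 threshold used in the example, the second box is dropped as a duplicate of the first, leaving a set of queries that are dense but distinguishable from each other.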

preprint2021arXiv

Cylindrical vector beams reveal radiationless anapole condition in a resonant state

The nonscattering optical anapole condition corresponds to the excitation of radiationless field distributions in open resonators, which offers new degrees of freedom for tailoring light-matter interaction. Conventional mechanisms for achieving such a condition rely on sophisticated manipulation of electromagnetic multipolar moments of all orders to guarantee superpositions of vanished moment strengths at the same wavelength. In contrast, here we report on the excitation of an optical radiationless anapole hidden in a resonant state of a Si nanoparticle using a tightly focused radially polarized (RP) beam. The coexistence of the magnetic resonant state and the anapole condition at the same wavelength further enables the triggering of the resonant state by a tightly focused azimuthally polarized (AP) beam whose corresponding electric multipole coefficient could be zero. As a result, high-contrast inter-transition between the radiationless anapole condition and ideal magnetic resonant scattering can be achieved experimentally in the visible spectrum. The proposed mechanism is general and can be realized in different types of nanostructures. Our results showcase that the unique combination of structured light and structured Mie resonances could provide new degrees of freedom for tailoring light-matter interaction, which might shed new light on functional meta-optics.

preprint2021arXiv

Exploring the Generalizability of Spatio-Temporal Traffic Prediction: Meta-Modeling and an Analytic Framework

The Spatio-Temporal Traffic Prediction (STTP) problem is a classical problem with plenty of prior research efforts that benefit from traditional statistical learning and recent deep learning approaches. While STTP can refer to many real-world problems, most existing studies focus on quite specific applications, such as the prediction of taxi demand, ridesharing orders, traffic speed, and so on. This hinders STTP research, as the approaches designed for different applications are hardly comparable, and thus how an application-driven approach can be generalized to other scenarios is unclear. To fill this gap, this paper makes three efforts: (i) we propose an analytic framework, called STAnalytic, to qualitatively investigate STTP approaches regarding their design considerations on various spatial and temporal factors, aiming to make different application-driven approaches comparable; (ii) we design a spatio-temporal meta-model, called STMeta, which can flexibly integrate generalizable temporal and spatial knowledge identified by STAnalytic; (iii) we build an STTP benchmark platform including ten real-life datasets with five scenarios to quantitatively measure the generalizability of STTP approaches. In particular, we implement STMeta with different deep learning techniques, and STMeta demonstrates better generalizability than state-of-the-art approaches by achieving lower prediction error on average across all the datasets.

preprint2021arXiv

Extracting Quantitative Dielectric Properties from Pump-Probe Spectroscopy

Optical pump-probe spectroscopy is a powerful tool for the study of non-equilibrium electronic dynamics and finds wide applications across a range of fields, from physics and chemistry to material science and biology. However, a shortcoming of conventional pump-probe spectroscopy is that photoinduced changes in transmission, reflection and scattering can simultaneously contribute to the measured differential spectra, leading to ambiguities in assigning the origin of spectral signatures and ruling out quantitative interpretation of the spectra. Ideally, these methods would measure the underlying dielectric function (or the complex refractive index) which would then directly provide quantitative information on the transient excited state dynamics free of these ambiguities. Here we present and test a model independent route to transform differential transmission or reflection spectra, measured via conventional optical pump-probe spectroscopy, to changes in the quantitative transient dielectric function. We benchmark this method against changes in the real refractive index measured using time-resolved Frequency Domain Interferometry in prototypical inorganic and organic semiconductor films. Our methodology can be applied to existing and future pump-probe data sets, allowing for an unambiguous and quantitative characterisation of the transient photoexcited spectra of materials. This in turn will accelerate the adoption of pump-probe spectroscopy as a facile and robust materials characterisation and screening tool.

preprint2021arXiv

Possible multi-orbital ground state in CeCu$_2$Si$_2$

The crystal-field ground state wave function of CeCu$_2$Si$_2$ has been investigated with linear polarized $M$-edge x-ray absorption spectroscopy from 250mK to 250K, thus covering the superconducting ($T_{\text{c}}$=0.6K), the Kondo ($T_{\text{K}}\approx$20K) as well as the Curie-Weiss regime. The comparison with full-multiplet calculations shows that the temperature dependence of the experimental linear dichroism is well explained with a $\Gamma_7^{(1)}$ crystal-field ground-state and the thermal population of excited states at around 30meV. The crystal-field scheme does not change throughout the entire temperature range, thus making the scenario of orbital switching unlikely. Spectroscopic evidence for the presence of the Ce 4$f^0$ configuration in the ground state is consistent with the possibility for a multi-orbital character of the ground state. We estimate from the Kondo temperature and crystal-field splitting energies that several percent of the higher lying $\Gamma_6$ state and $\Gamma_7^{(2)}$ crystal-field states are mixed into the primarily $\Gamma_7^{(1)}$ ground state. This estimate is also supported by re-normalized band-structure calculations that use the experimentally determined crystal-field scheme.

preprint2021arXiv

Semantics-Recovering Decompilation through Neural Machine Translation

Decompilation transforms low-level program languages (PL) (e.g., binary code) into high-level PLs (e.g., C/C++). It has been widely used when analysts perform security analysis on software (systems) whose source code is unavailable, such as vulnerability search and malware analysis. However, current decompilation tools usually need lots of experts' efforts, even for years, to generate the rules for decompilation, which also requires long-term maintenance as the syntax of high-level PL or low-level PL changes. Also, an ideal decompiler should concisely generate high-level PL with similar functionality to the source low-level PL and semantic information (e.g., meaningful variable names), just like human-written code. Unfortunately, existing manually-defined rule-based decompilation techniques only functionally restore the low-level PL to a similar high-level PL and are still powerless to recover semantic information. In this paper, we propose a novel neural decompilation approach to translate low-level PL into accurate and user-friendly high-level PL, effectively improving its readability and understandability. Furthermore, we implement the proposed approaches called SEAM. Evaluations on four real-world applications show that SEAM has an average accuracy of 94.41%, which is much better than prior neural machine translation (NMT) models. Finally, we evaluate the effectiveness of semantic information recovery through a questionnaire survey, and the average accuracy is 92.64%, which is comparable or superior to the state-of-the-art compilers.

preprint2021arXiv

SEPAL: Towards a Large-scale Analysis of SEAndroid Policy Customization

To investigate the status quo of SEAndroid policy customization, we propose SEPAL, a universal tool to automatically retrieve and examine the customized policy rules. SEPAL applies NLP techniques and trains a wide&deep model to quickly and precisely predict whether a rule is unregulated or not. Our evaluation shows SEPAL is effective, practical and scalable. We verify that SEPAL outperforms the state-of-the-art approach (i.e., EASEAndroid) by 15% in accuracy rate on average. In our experiments, SEPAL successfully identifies 7,111 unregulated policy rules with a low false positive rate from 595,236 customized rules (extracted from 774 Android firmware images of 72 manufacturers). We further discover that the policy customization problem is getting worse in newer Android versions (e.g., around 8% for Android 7 and nearly 20% for Android 9), even though more and more efforts are being made. Then, we conduct a deep study and discuss why the unregulated rules are introduced and how they can compromise user devices. Last, we report some unregulated rules to seven vendors, and so far four of them have confirmed our findings.

preprint2020arXiv

A3Ident: A Two-phased Approach to Identify the Leading Authors of Android Apps

Authorship identification is the process of identifying and classifying authors from given code. Authorship identification can be used in a wide range of software domains, e.g., code authorship disputes, plagiarism detection, and exposure of attackers' identities. Besides the inherent challenges from legacy software development, framework programming and the crowdsourcing mode in Android significantly raise the difficulty of authorship identification. More specifically, widespread third-party libraries and inherited components (e.g., classes, methods, and variables) dilute the primary code within the entire Android app and blur the boundaries of code written by different authors. However, prior research has not addressed these challenges well. To this end, we design a two-phased approach to attribute the primary code of an Android app to the specific developer. In the first phase, we put forward three types of strategies to identify the relationships between Java packages in an app, which consist of context, semantic and structural relationships. A package aggregation algorithm is developed to cluster all packages that are highly likely to have been written by the same authors. In the second phase, we develop three types of features to capture authors' coding habits and code stylometry. Based on that, we generate fingerprints for an author from the Android apps they developed and employ several machine learning algorithms for authorship classification. We evaluate our approach on three datasets that contain 15,666 apps from 257 distinct developers and achieve a 92.5% accuracy rate on average. Additionally, we test it on 2,900 obfuscated apps, and our approach can classify apps with an accuracy rate of 80.4%.

preprint2020arXiv

All-optical nonreciprocity due to valley polarization in transition metal dichalcogenides

Nonreciprocity and nonreciprocal optical devices play a vital role in modern photonic technologies by enforcing one-way propagation of light. Most nonreciprocal devices today are made from a special class of low-loss ferrites that exhibit a magneto-optical response in the presence of an external static magnetic field. While breaking transmission symmetry, ferrites fail to satisfy the need for miniaturization of photonic circuitry because their nonreciprocal response is weak at optical wavelengths, and they are not easy to integrate into on-chip photonic systems. These challenges led to the emergence of magnetic-free approaches relying on breaking time-reversal symmetry, e.g., with nonlinear effects modulating the optical system in time. Here, we demonstrate an all-optical approach to nonreciprocity based on the nonlinear valley-selective response in transition metal dichalcogenides (TMDs). This approach overcomes the limitations of magnetic materials and does not require an external magnetic field. We provide experimental evidence of photoinduced nonreciprocity in a monolayer WS2 pumped by circularly polarized light. Nonreciprocity stems from valley-selective exciton-exciton interactions, giving rise to nonlinear circular dichroism controlled by circularly polarized pump fields. Our experimental results reveal a significant effect even at room temperature, despite considerable intervalley scattering, showing potential for practical applications in magnetic-free nonreciprocal platforms. As an example, we propose a device scheme to realize an optical isolator based on a pass-through silicon nitride (SiN) ring resonator integrating the optically biased TMD monolayer.

preprint2020arXiv

Anapole mediated giant photothermal nonlinearity in nanostructured silicon

Featuring a plethora of electric and magnetic Mie resonances, high-index dielectric nanostructures offer a versatile platform to concentrate light-matter interactions at the nanoscale. By integrating unique features of far-field scattering control and near-field concentration from radiationless anapole states, here, we demonstrate a giant photothermal nonlinearity in single subwavelength-sized silicon nanodisks. The nanoscale energy concentration and consequent near-field enhancements mediated by the anapole mode yield a reversible nonlinear scattering with a large modulation depth and a broad dynamic range, unveiling a record-high nonlinear index change up to 0.5 at mild incident light intensities on the order of MW/cm$^2$. The observed photothermal nonlinearity is three orders of magnitude stronger than that of unstructured bulk silicon, and nearly one order of magnitude stronger than that obtained through the radiative electric dipolar mode. Such nonlinear scattering can empower distinctive point spread functions in confocal reflectance imaging, offering the potential for far-field localization of nanostructured Si with an accuracy approaching 40 nm. Our findings shed new light on active silicon photonics based on optical anapoles.

preprint2020arXiv

Confidential Attestation: Efficient in-Enclave Verification of Privacy Policy Compliance

A trusted execution environment (TEE) such as Intel Software Guard Extensions (SGX) runs a remote attestation to prove to a data owner the integrity of the initial state of an enclave, including the program to operate on her data. For this purpose, the data-processing program is supposed to be open to the owner, so its functionality can be evaluated before trust can be established. However, there are increasingly application scenarios in which the program itself needs to be protected, so its compliance with the privacy policies expected by the data owner should be verified without exposing its code. To this end, this paper presents CAT, a new model for TEE-based confidential attestation. Our model is inspired by Proof-Carrying Code, where a code generator produces a proof together with the code and a code consumer verifies the proof against the code for its compliance with security policies. Given that the conventional solutions do not work well under the resource-limited and TCB-frugal TEE, we propose a new design in which an untrusted out-enclave generator analyzes the source code of a program when compiling it into binary, and a trusted in-enclave consumer efficiently verifies the correctness of the instrumentation and the presence of other protection before running the binary. Our design strategically moves most of the workload to the code generator, which is responsible for producing well-formatted and easy-to-check code, while keeping the consumer simple. Also, the whole consumer can be made public and verified through a conventional attestation. We implemented this model on Intel SGX and demonstrate that it adds only a very small amount to the TCB. We also thoroughly evaluated its performance on micro- and macro-benchmarks and real-world applications, showing that the new design only incurs a small overhead when enforcing several categories of security policies.

preprint2020arXiv

Cross Architectural Power Modelling

Existing power modelling research focuses on the model rather than the process for developing models. An automated power modelling process that can be deployed on different processors for developing power models with high accuracy is developed. For this, (i) an automated hardware performance counter selection method that selects counters best correlated to power on both ARM and Intel processors, (ii) a noise filter based on clustering that can reduce the mean error in power models, and (iii) a two stage power model that surmounts challenges in using existing power models across multiple architectures are proposed and developed. The key results are: (i) the automated hardware performance counter selection method achieves comparable selection to the manual method reported in the literature, (ii) the noise filter reduces the mean error in power models by up to 55%, and (iii) the two stage power model can predict dynamic power with less than 8% error on both ARM and Intel processors, which is an improvement over classic models.

preprint2020arXiv

Domain-specific Communication Optimization for Distributed DNN Training

Communication overhead poses an important obstacle to distributed DNN training and has drawn increasing attention in recent years. Despite continuous efforts, prior solutions such as gradient compression/reduction, compute/communication overlapping and layer-wise flow scheduling are still coarse-grained and insufficient for efficient distributed training, especially when the network is under pressure. We present DLCP, a novel solution exploiting the domain-specific properties of deep learning to optimize the communication overhead of DNN training in a fine-grained manner. At its heart, DLCP comprises several key innovations beyond prior work: e.g., it exploits the bounded loss tolerance of SGD-based training to improve tail communication latency, which cannot be avoided purely through gradient compression. It then performs fine-grained packet-level prioritization and dropping, as opposed to flow-level scheduling, based on layers and magnitudes of gradients to further speed up model convergence without affecting accuracy. In addition, it leverages inter-packet order-independency to perform per-packet load balancing without causing classical re-ordering issues. DLCP works with both Parameter Server and collective communication routines. We have implemented DLCP with commodity switches, integrated it with various training frameworks including TensorFlow, MXNet and PyTorch, and deployed it in our small-scale testbed with 10 Nvidia V100 GPUs. Our testbed experiments and large-scale simulations show that DLCP delivers up to 84.3% additional training acceleration over the best existing solutions.

preprint2020arXiv

Feature Pyramid Grids

Feature pyramid networks have been widely adopted in the object detection literature to improve feature representations for better handling of variations in scale. In this paper, we present Feature Pyramid Grids (FPG), a deep multi-pathway feature pyramid that represents the feature scale-space as a regular grid of parallel bottom-up pathways which are fused by multi-directional lateral connections. FPG improves on single-pathway feature pyramid networks, significantly increasing performance at similar computation cost and highlighting the importance of deep pyramid representations. Beyond having a general and uniform structure, in contrast to the complicated structures found with neural architecture search, it also compares favorably against such approaches without relying on search. We hope that FPG, with its uniform and effective nature, can serve as a strong component for future work in object recognition.

preprint2020arXiv

FPGA-Based Hardware Accelerator of Homomorphic Encryption for Efficient Federated Learning

With the increasing awareness of privacy protection and the data fragmentation problem, federated learning has been emerging as a new paradigm of machine learning. Federated learning tends to utilize various privacy-preserving mechanisms to protect the transferred intermediate data, among which homomorphic encryption strikes a balance between security and ease of utilization. However, the complicated operations and large operands impose significant overhead on federated learning. Maintaining accuracy and security more efficiently has been a key problem of federated learning. In this work, we investigate a hardware solution and design an FPGA-based homomorphic encryption framework, aiming to accelerate the training phase in federated learning. The root complexity lies in searching for a compact architecture for the core operation of homomorphic encryption that suits federated learning's requirements for high encryption throughput and flexible configuration. Our framework implements the representative Paillier homomorphic cryptosystem with high-level synthesis for flexibility and portability, with careful optimization of the modular multiplication operation in terms of processing clock cycles, resource usage and clock frequency. Our accelerator achieves a near-optimal execution clock cycle count, with better DSP efficiency than existing designs, and reduces the encryption time by up to 71% during the training process of various federated learning models.
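
For readers unfamiliar with Paillier, the following is a minimal software sketch of the textbook scheme, not the paper's FPGA design. It assumes the common simplification g = n + 1 and uses toy primes that are far too small for real use; the function names are hypothetical and it needs Python 3.8+ for the modular inverse via `pow`. Every operation reduces to modular multiplication or exponentiation modulo n², which is the operation the abstract says the accelerator optimizes.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two distinct primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because this sketch fixes g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt m (0 <= m < n) as c = g^m * r^n mod n^2 with g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, priv, c: int) -> int:
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts mod n^2 adds the underlying plaintexts mod n."""
    return (c1 * c2) % (n * n)


n, priv = keygen(1009, 1013)          # toy primes, assumed for the example
c1, c2 = encrypt(n, 123), encrypt(n, 456)
total = decrypt(n, priv, add_encrypted(n, c1, c2))
print(total)
```

The additive property shown by `add_encrypted` is what lets a federated learning server aggregate encrypted intermediate values without seeing them.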

preprint2020arXiv

Gliding vertex on the horizontal bounding box for multi-oriented object detection

Object detection has recently experienced substantial progress. Yet, the widely adopted horizontal bounding box representation is not appropriate for ubiquitous oriented objects such as objects in aerial images and scene texts. In this paper, we propose a simple yet effective framework to detect multi-oriented objects. Instead of directly regressing the four vertices, we glide the vertex of the horizontal bounding box on each corresponding side to accurately describe a multi-oriented object. Specifically, we regress four length ratios characterizing the relative gliding offset on each corresponding side. This may facilitate the offset learning and avoid the confusion issue of sequential label points for oriented objects. To further remedy the confusion issue for nearly horizontal objects, we also introduce an obliquity factor based on the area ratio between the object and its horizontal bounding box, guiding the selection of horizontal or oriented detection for each object. We add these five extra target variables to the regression head of Faster R-CNN, which requires negligible extra computation time. Extensive experimental results demonstrate that, without bells and whistles, the proposed method achieves superior performance on multiple multi-oriented object detection benchmarks, including object detection in aerial images, scene text detection, and pedestrian detection in fisheye images.
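
The five extra targets described above can be sketched as follows. This is a rough illustration rather than the authors' code: it assumes image coordinates (y grows downward), a quadrilateral whose vertices are given in the order top, right, bottom, left with each vertex lying on the matching side of the horizontal box, and offsets measured clockwise from each side's starting corner. The function names and the corner convention are assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(pts: List[Point]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def encode(quad: List[Point]):
    """Return (horizontal box, four gliding ratios, obliquity factor) for a quad."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    x, y = min(xs), min(ys)
    w, h = max(xs) - x, max(ys) - y
    top, right, bottom, left = quad
    ratios = (
        (top[0] - x) / w,         # glide along the top side from the top-left corner
        (right[1] - y) / h,       # glide along the right side from the top-right corner
        (x + w - bottom[0]) / w,  # glide along the bottom side from the bottom-right corner
        (y + h - left[1]) / h,    # glide along the left side from the bottom-left corner
    )
    obliquity = polygon_area(quad) / (w * h)  # area ratio: object vs. horizontal box
    return (x, y, w, h), ratios, obliquity


def decode(box, ratios) -> List[Point]:
    """Invert encode(): place each vertex on its side of the horizontal box."""
    x, y, w, h = box
    a1, a2, a3, a4 = ratios
    return [
        (x + a1 * w, y),
        (x + w, y + a2 * h),
        (x + w - a3 * w, y + h),
        (x, y + h - a4 * h),
    ]


quad = [(1.0, 0.0), (4.0, 1.0), (3.0, 4.0), (0.0, 3.0)]  # a rotated square
box, ratios, obliquity = encode(quad)
print(box, ratios, obliquity)
```

An obliquity factor close to 1 means the object nearly fills its horizontal box, which is the case where the abstract suggests falling back to horizontal detection; the threshold for that choice is not given in the abstract and is left out here.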

preprint2020arXiv

Neural Data-to-Text Generation with Dynamic Content Planning

Neural data-to-text generation models have achieved significant advancement in recent years. However, these models have two shortcomings: the generated texts tend to miss some vital information, and they often contain descriptions that are not consistent with the structured input data. To alleviate these problems, we propose a Neural data-to-text generation model with Dynamic content Planning, abbreviated as NDP. The NDP can utilize the previously generated text to dynamically select the appropriate entry from the given structured data. We further design a reconstruction mechanism with a novel objective function that can reconstruct the whole entry of the used data sequentially from the hidden states of the decoder, which improves the accuracy of the generated text. Empirical results show that the NDP achieves superior performance over the state of the art on the ROTOWIRE dataset in terms of relation generation (RG), content selection (CS), content ordering (CO) and BLEU metrics. The human evaluation result shows that the texts generated by the proposed NDP are better than the corresponding ones generated by NCP most of the time. Using the proposed reconstruction mechanism, the fidelity of the generated text can be further improved significantly.

preprint2020arXiv

Nonlinear Residual Echo Suppression Based on Multi-stream Conv-TasNet

Acoustic echo cannot be entirely removed by linear adaptive filters due to the nonlinear relationship between the echo and far-end signal. Usually a post processing module is required to further suppress the echo. In this paper, we propose a residual echo suppression method based on the modification of fully convolutional time-domain audio separation network (Conv-TasNet). Both the residual signal of the linear acoustic echo cancellation system, and the output of the adaptive filter are adopted to form multiple streams for the Conv-TasNet, resulting in more effective echo suppression while keeping a lower latency of the whole system. Simulation results validate the efficacy of the proposed method in both single-talk and double-talk situations.

preprint2020arXiv

Performance of CMOS pixel sensor prototypes in ams H35 and aH18 technology for the ATLAS ITk upgrade

Pixel sensors based on commercial high-voltage CMOS processes are an exciting technology that is considered as an option for the outer layer of the ATLAS inner tracker upgrade at the High Luminosity LHC. Here, charged particles are detected using deep n-wells as sensor diodes with the depleted region extending into the silicon bulk. Both analog and digital readout electronics can be added to achieve different levels of integration up to a fully monolithic sensor. Small scale prototypes using the ams CMOS technology have previously demonstrated that it can achieve the required radiation tolerance of $10^{15}~\text{n}_\text{eq}/\text{cm}^2$ and detection efficiencies above $99.5~\%$. Recently, large area prototypes, comparable in size to a full sensor, have been produced that include most features required towards a final design: the H35DEMO prototype produced in ams H35 technology that supports both external and integrated readout and the monolithic ATLASPix1 pre-production design produced in ams aH18 technology. Both chips are based on large fill-factor pixel designs, but differ in readout structure. Performance results for H35DEMO with capacitively-coupled external readout and first results for the monolithic ATLASPix1 are shown.

preprint2020arXiv

Phase-Matching Quantum Cryptographic Conferencing

Quantum cryptographic conferencing (QCC) holds promise for distributing information-theoretic secure keys among multiple users over long distances. Limited by the fragility of the Greenberger-Horne-Zeilinger (GHZ) state, QCC networks based on directly distributing GHZ states over long distances still face major challenges. Two other potential approaches are measurement-device-independent QCC and conference key agreement with single-photon interference, which were proposed based on the post-selection of GHZ states and the post-selection of the W state, respectively. However, implementations of the former protocol are still heavily constrained by the transmission rate $η$ of optical channels and the complexity of the setups for post-selecting GHZ states. Meanwhile, the latter protocol cannot be cast as a measurement-device-independent prepare-and-measure scheme. Combining the idea of post-selecting the GHZ state with recently proposed twin-field quantum key distribution protocols, we report a QCC protocol based on weak coherent state interference, named phase-matching quantum cryptographic conferencing, which is immune to all detector side-channel attacks. The proposed protocol can improve the key generation rate from $\mathrm{O}(η^N)$ to $\mathrm{O}(η^{N-1})$ compared with the measurement-device-independent QCC protocols. Meanwhile, it can be easily scaled up to multiple parties due to its simple setup.
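
To make the stated scaling concrete, the two rates can be compared directly. This is only an order-of-magnitude illustration: constant prefactors are ignored, and the values $\eta = 10^{-2}$ and $N = 3$ are assumed for the example rather than taken from the paper.

$$
\frac{\mathrm{O}(\eta^{N-1})}{\mathrm{O}(\eta^{N})} \sim \frac{1}{\eta},
\qquad
\eta = 10^{-2},\ N = 3:\quad \eta^{N} = 10^{-6},\quad \eta^{N-1} = 10^{-4}.
$$

Under these assumptions the improvement is a factor of about $1/\eta = 100$, and it grows as the channel becomes lossier.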

preprint2020arXiv

Revisiting the Effect of f-Functions in Predicting the Right Reaction Mechanism for Hypervalent Iodine Reagents

To understand the effect of f-functions in predicting the right reaction mechanism for hypervalent iodine reagents, we adopt the Ahlrichs basis set family def2-SVP and def2-TZVP to revisit the potential energy surfaces of IBX-mediated oxidation and Togni I's isomerisation. Our results further prove that f-functions (in either Pople, Dunning, or Ahlrichs basis set series) are indispensable to predict the correct rate-determining step of hypervalent iodine reagents. The f-functions have a significant impact on the predicted reaction barriers for processes involving the I-X (X = O, OH, CF$_3$, etc.) bond cleavage and formation, e.g. in the reductive elimination step or the hypervalent twist step. We furthermore explore two hypervalent twist modes that account for the different influences of f-functions for IBX and Togni I. Our findings may be helpful for theoretical chemists to appropriately study the reaction mechanism of hypervalent iodine reagents.

preprint2020arXiv

Room-temperature ferrimagnetism of anti-site-disordered Ca2MnOsO6

Room-temperature ferrimagnetism was discovered for the anti-site-disordered perovskite Ca2MnOsO6 with Tc = 305 K. Ca2MnOsO6 crystallizes into an orthorhombic structure with a space group of Pnma, in which Mn and Os share the oxygen-coordinated-octahedral site at an equal ratio without a noticeable ordered arrangement. The material is electrically semiconducting with variable-range-hopping behavior. X-ray absorption spectroscopy confirmed the trivalent state of the Mn and the pentavalent state of the Os. X-ray magnetic circular dichroism spectroscopy reveals that the Mn and Os magnetic moments are aligned antiferromagnetically, thereby classifying the material as a ferrimagnet which is in accordance with band structure calculations. It is intriguing that the magnetic signal of the Os is very weak, and that the observed total magnetic moment is primarily due to the Mn. The Tc = 305 K is the second highest in the material category of so-called disordered ferromagnets such as CaRu1-xMnxO3, SrRu1-xCrxO3, and CaIr1-xMnxO3, and hence, may support the development of spintronic oxides with relaxed requirements concerning the anti-site disorder of the magnetic ions.

preprint2020arXiv

Side-Aware Boundary Localization for More Precise Object Detection

Current object detection frameworks mainly rely on bounding box regression to localize objects. Despite the remarkable progress in recent years, the precision of bounding box regression remains unsatisfactory, hence limiting performance in object detection. We observe that precise localization requires careful placement of each side of the bounding box. However, the mainstream approach, which focuses on predicting centers and sizes, is not the most effective way to accomplish this task, especially when there exist displacements with large variance between the anchors and the targets. In this paper, we propose an alternative approach, named Side-Aware Boundary Localization (SABL), where each side of the bounding box is localized separately with a dedicated network branch. To tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket. We test the proposed method on both two-stage and single-stage detection frameworks. Replacing the standard bounding box regression branch with the proposed design leads to significant improvements on Faster R-CNN, RetinaNet, and Cascade R-CNN, by 3.0%, 1.7%, and 0.9%, respectively. Code is available at https://github.com/open-mmlab/mmdetection.

preprint2020arXiv

The Clock and Control System for the ATLAS Liquid Argon Calorimeter Phase-I Upgrade

A Liquid-argon Trigger Digitizer Board (LTDB) is being developed to upgrade the ATLAS Liquid Argon Calorimeter Phase-I trigger electronics. The LTDB located at the front end needs to obtain the clock signals and be configured and monitored remotely from the back end. A clock and control system is being developed for the LTDB and the major functions of the system have been evaluated. The design and evaluation of the clock and control system are presented in this paper.

preprint2020arXiv

U-net Based Direct-path Dominance Test for Robust Direction-of-arrival Estimation

It has been noted that identifying the time-frequency bins dominated by the contribution from the direct propagation of the target speaker can significantly improve the robustness of direction-of-arrival estimation. However, the correct extraction of the direct-path sound is challenging, especially in adverse environments. In this paper, a U-net based direct-path dominance test method is proposed. Exploiting the efficient segmentation capability of the U-net architecture, the direct-path information can be effectively retrieved from a dedicated multi-task neural network. Moreover, the training and inference of the neural network only need the input of a single microphone, circumventing the problem of array-structure dependence faced by common end-to-end deep-learning-based methods. Simulations demonstrate that significantly higher estimation accuracy can be achieved in highly reverberant and low signal-to-noise ratio environments.

preprint2020arXiv

Unified Approach to Witness Nonentanglement-Breaking Quantum Channels

The ability of quantum devices to preserve or distribute entanglement is essential in employing quantum technologies. Such ability is described and guaranteed by the nonentanglement-breaking (nonEB) feature of participating quantum channels. For quantum information applications relying on entanglement, the certification of the nonEB feature is thus indispensable in designing, testing, and benchmarking quantum devices. Here, we develop a direct and operational approach for the certification of nonEB quantum channels. By utilizing the prepare-and-measure test, we derive a necessary and sufficient condition for witnessing nonEB channels, which is applicable in almost all experimental scenarios. The approach not only unifies and simplifies existing methods in the standard scenario and the measurement-device-independent scenario, but also goes further allowing for certifying the nonEB feature in the semi-device-independent scenario.

preprint2020arXiv

Vanishing Point Guided Natural Image Stitching

Recently, work on improving the naturalness of stitched images has gained increasing attention. Previous methods suffer from severe projective distortion and unnatural rotation, especially when the number of involved images is large or the images cover a very wide field of view. In this paper, we propose a novel natural image stitching method, which takes into account the guidance of vanishing points to tackle these failures. Inspired by the observation that mutually orthogonal vanishing points in a Manhattan world can provide useful orientation clues, we design a scheme to effectively estimate a prior of image similarity. Given this estimated prior as global similarity constraints, we feed it into a popular mesh deformation framework to achieve impressive natural stitching performance. Compared with other existing methods, including APAP, SPHP, AANAP, and GSP, our method achieves state-of-the-art performance in both quantitative and qualitative experiments on natural image stitching.

preprint2020arXiv

Zipper Stack: Shadow Stacks Without Shadow

Return-Oriented Programming (ROP) is a typical attack technique that exploits return addresses to abuse existing code repeatedly. Most of the current return address protection mechanisms (also known as Backward-Edge Control-Flow Integrity) work only in limited threat models. For example, they assume that the attacker cannot break memory isolation, or that the attacker has no knowledge of a secret key or random values. This paper presents a novel, lightweight mechanism for protecting return addresses, Zipper Stack, which authenticates all return addresses by a chain structure using cryptographic message authentication codes (MACs). This innovative design can defend against the most powerful attackers, who have full control over the program's memory and even know the secret key of the MAC function. This threat model is stronger than the one used in related work. At the same time, it incurs low performance overhead. We implemented Zipper Stack by extending the RISC-V instruction set architecture, and the evaluation on FPGA shows that the performance overhead of Zipper Stack is only 1.86%. Thus, we think Zipper Stack is suitable for actual deployment.
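
The chain structure can be pictured with a small software model. This is a sketch under stated assumptions, not the paper's RISC-V design: it assumes the newest tag is held in a protected register that the attacker cannot write, while return addresses and older tags sit in attacker-writable memory. HMAC-SHA256 stands in for whatever MAC the hardware uses, and the class and method names are hypothetical.

```python
import hashlib
import hmac


class ZipperStackSketch:
    """Toy model of a MAC chain over return addresses (illustration only)."""

    def __init__(self, key: bytes):
        self._key = key
        self._top = b"\x00" * 32  # assumed protected register holding the newest tag
        self.memory = []          # attacker-writable stack of [return_address, previous_tag]

    def _mac(self, ret_addr: int, prev_tag: bytes) -> bytes:
        # Each tag binds a return address to the tag that preceded it.
        msg = ret_addr.to_bytes(8, "little") + prev_tag
        return hmac.new(self._key, msg, hashlib.sha256).digest()

    def call(self, ret_addr: int) -> None:
        # Spill the current tag to memory next to the return address,
        # then chain both into a new tag kept in the register.
        self.memory.append([ret_addr, self._top])
        self._top = self._mac(ret_addr, self._top)

    def ret(self) -> int:
        # Recompute the tag from memory and compare it with the register.
        ret_addr, prev_tag = self.memory.pop()
        if not hmac.compare_digest(self._mac(ret_addr, prev_tag), self._top):
            raise RuntimeError("return address chain verification failed")
        self._top = prev_tag
        return ret_addr


stack = ZipperStackSketch(b"demo-key")
stack.call(0x1000)
stack.call(0x2000)
print(hex(stack.ret()))        # the untampered frame verifies

stack.memory[-1][0] = 0xDEAD   # attacker overwrites the saved return address
try:
    stack.ret()
except RuntimeError as exc:
    print("blocked:", exc)
```

Because every tag depends on all earlier frames, changing any saved address or spilled tag in memory makes the recomputed value disagree with the register, which is the sense in which the chain "zips" the stack together in this model.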

preprint2019arXiv

An Experimentally Verified Approach to non-Entanglement-Breaking Channel Certification

Ensuring the non-entanglement-breaking (non-EB) property of quantum channels is crucial for the effective distribution and storage of quantum states. A practical method for direct and accurate certification of the non-EB feature is therefore highly desirable. Here, we propose and verify a realistic-source-based, measurement-device-independent certification of non-EB channels. Our method is resilient to the effects of experimental conditions on the certification, such as multiphotons and imperfect state preparation, and can be implemented with an information-incomplete set. We achieve good agreement between experimental outcomes and theoretical predictions, which is validated by the expected results of the ideal semi-quantum signaling game, and accurately certify the non-EB channels. Furthermore, our approach is highly robust to noise. Therefore, the proposed approach can be expected to play a significant role in the design and evaluation of realistic quantum channels.

preprint2019arXiv

Quantifying the Performance of Federated Transfer Learning

The scarcity of data and isolated data islands encourage different organizations to share data with each other to train machine learning models. However, there are increasing concerns about data privacy and security, which urge people to seek a solution like Federated Transfer Learning (FTL) to share training data without violating data privacy. FTL leverages transfer learning techniques to utilize data from different sources for training, while achieving data privacy protection without significant accuracy loss. However, the benefits come at the cost of extra computation and communication, resulting in efficiency problems. In order to efficiently deploy and scale up FTL solutions in practice, we need a deep understanding of how the infrastructure affects the efficiency of FTL. Our paper tries to answer this question by quantitatively measuring a real-world FTL implementation, FATE, on Google Cloud. According to the results of carefully designed experiments, we verified that the following bottlenecks can be further optimized: 1) Inter-process communication is the major bottleneck; 2) Data encryption adds considerable computation overhead; 3) The Internet networking condition strongly affects performance when the model is large.

preprint2019arXiv

Secure Federated Matrix Factorization

To protect user privacy and meet legal regulations, federated (machine) learning has attracted great interest in recent years. The key principle of federated learning is training a machine learning model without needing to know each user's personal raw private data. In this paper, we propose a secure matrix factorization framework under the federated learning setting, called FedMF. First, we design a user-level distributed matrix factorization framework where the model can be learned when each user only uploads the gradient information (instead of the raw preference data) to the server. While gradient information seems secure, we prove that it could still leak users' raw data. To this end, we enhance the distributed matrix factorization framework with homomorphic encryption. We implement the prototype of FedMF and test it with a real movie rating dataset. Results verify the feasibility of FedMF. We also discuss the challenges of applying FedMF in practice for future research.

preprint2016arXiv

A CNN Based Scene Chinese Text Recognition Algorithm With Synthetic Data Engine

Scene text recognition plays an important role in many computer vision applications. The small size of publicly available scene text datasets is the main challenge when training a text recognition CNN model. In this paper, we propose a CNN based Chinese text recognition algorithm. To enlarge the dataset for training the CNN model, we design a synthetic data engine for Chinese scene character generation, which generates representative character images according to the font usage frequency of Chinese texts. As Chinese text is more complex, the English text recognition CNN architecture is modified for Chinese text. To ensure that the small natural character dataset and the large artificial character dataset are comparable in training, the CNN model is trained progressively. The proposed Chinese text recognition algorithm is evaluated with two Chinese text datasets. The algorithm achieves better recognition accuracy than the baseline methods.

preprint2016arXiv

A Novel Scene Text Detection Algorithm Based On Convolutional Neural Network

Candidate text region extraction plays a critical role in convolutional neural network (CNN) based text detection from natural images. In this paper, we propose a CNN based scene text detection algorithm with a new text region extractor. The candidate text region extractor, called I-MSER, is based on Maximally Stable Extremal Regions (MSER) and improves the independence and completeness of the extracted candidate text regions. The design of I-MSER is motivated by the observation that text MSERs have high similarity and are close to each other. The independence of candidate text regions obtained by I-MSER is guaranteed by selecting the most representative regions from an MSER tree, which is generated according to the spatial overlapping relationship among the MSERs. A multi-layer CNN model is trained to score the confidence value of the regions extracted by I-MSER for text detection. The new text detection algorithm based on I-MSER is evaluated with the widely used ICDAR 2011 and 2013 datasets and shows improved detection performance compared to the existing algorithms.

preprint2016arXiv

Context-aware System Service Call-oriented Symbolic Execution of Android Framework with Application to Exploit Generation

Android Framework is a layer of software that exists in every Android system managing resources of all Android apps. A vulnerability in Android Framework can lead to severe hacks, such as destroying user data and leaking private information. With tens of millions of Android devices unpatched due to Android fragmentation, vulnerabilities in Android Framework certainly attract attackers to exploit them. So far, enormous manual effort is needed to craft such exploits. To our knowledge, no research has been done on automatic generation of exploits that take advantage of Android Framework vulnerabilities. We make a first step towards this goal by applying symbolic execution of Android Framework to finding bugs and generating exploits. Several challenges have been raised by the task. (1) The information of an app flows to Android Framework in multiple intricate steps, making it difficult to identify symbolic inputs. (2) Android Framework has a complex initialization phase, which exacerbates the state space explosion problem. (3) A straightforward design that builds the symbolic executor as a layer inside the Android system will not work well: not only does the implementation have to ensure the compatibility with the Android system, but it needs to be maintained whenever Android gets updated. We present novel ideas and techniques to resolve the challenges, and have built the first system for symbolic execution of Android Framework. It fundamentally changes the state of the art in exploit generation on the Android system, and has been applied to constructing new techniques for finding vulnerabilities.

preprint2016arXiv

Convolutional Regression for Visual Tracking

Recently, discriminatively learned correlation filters (DCF) have drawn much attention in the visual object tracking community. The success of DCF is potentially attributed to the fact that a large number of samples are utilized to train the ridge regression model and predict the location of the object. To solve the regression problem in an efficient way, these samples are all generated by circularly shifting a search patch. However, these synthetic samples also induce some negative effects which weaken the robustness of DCF based trackers. In this paper, we propose a Convolutional Regression framework for visual tracking (CRT). Instead of learning the linear regression model in a closed form, we try to solve the regression problem by optimizing a one-channel-output convolution layer with Gradient Descent (GD). In particular, the receptive field size of the convolution layer is set to the size of the object. Contrary to DCF, it is possible to incorporate all "real" samples clipped from the whole image. A critical issue of the GD approach is that most of the convolutional samples are negative and the contribution of positive samples will be suppressed. To address this problem, we propose a novel Automatic Hard Negative Mining method to eliminate easy negatives and enhance positives. Extensive experiments are conducted on a widely-used benchmark with 100 sequences. The results show that the proposed algorithm achieves outstanding performance and outperforms almost all the existing DCF based algorithms.

preprint2016arXiv

Development of an ADC Radiation Tolerance Characterization System for the Upgrade of the ATLAS LAr Calorimeter

The ATLAS LAr calorimeter will undergo its Phase-I upgrade during the long shutdown (LS2) in 2018, when a new LAr Trigger Digitizer Board (LTDB) will be designed and installed. Several commercial-off-the-shelf (COTS) multichannel high-speed ADCs have been selected as possible backups of the radiation tolerant ADC ASICs for LTDB. In order to evaluate the radiation tolerance of these backup commercial ADCs, we developed an ADC radiation tolerance characterization system, which includes the ADC boards, data acquisition (DAQ) board, signal generator, external power supplies and a host computer. The ADC board is custom designed for different ADCs and has ADC driver and clock distribution circuits integrated on board. The Xilinx ZC706 FPGA development board is used as the DAQ board. The data from the ADC are routed to the FPGA through the FMC (FPGA Mezzanine Card) connector, de-serialized and monitored by the FPGA, and then transmitted to the host computer through Gigabit Ethernet. A software program has been developed in Python, and all the commands are sent to the DAQ board through Gigabit Ethernet by this program. Two ADC boards have been designed, for the TI ADS52J90 and ADI AD9249 respectively. TID tests of both ADCs have been performed at BNL, and an SEE test for the ADS52J90 has been performed at Massachusetts General Hospital (MGH). Test results have been analyzed and presented. The test results demonstrate that our test system is very versatile and works well for the radiation tolerance characterization of commercial multi-channel high-speed ADCs for the upgrade of the ATLAS LAr calorimeter. It is also applicable to other collider physics experiments where radiation tolerance is required.

preprint2016arXiv

Electromechanically Tunable Suspended Optical Nano-antenna

Coupling mechanical degrees of freedom with plasmonic resonances has potential applications in optomechanics, sensing, and active plasmonics. Here we demonstrate a suspended two-wire plasmonic nano-antenna acting like a nano-electrometer. The antenna wires are supported and electrically connected via thin leads without disturbing the antenna resonance. As a voltage is applied, equal charges are induced on both antenna wires. The resulting equilibrium between the repulsive Coulomb force and the restoring elastic bending force enables us to precisely control the gap size. As a result the resonance wavelength and the field enhancement of the suspended optical nano-antenna (SONA) can be reversibly tuned. Our experiments highlight the potential to realize large bandwidth optical nanoelectromechanical systems (NEMS).

preprint2016arXiv

Four paradoxes about the special theory of relativity

Various paradoxes about the relativity theory have been developed since the birth of this theory. Each paradox reflects doubts about the relativity theory, and the resolution of each paradox demonstrates the correctness of the theory once again. In this paper, four paradoxes about the special theory of relativity are brought forward: the displacement paradox, the electromagnetic transformation paradox, the Doppler paradox and the magnetic force paradox. We hope some researchers can reasonably explain these paradoxes, so that knowledge of the relativity theory is further enriched.

preprint2016arXiv

Highly Efficient Quantum Key Distribution Immune to All Detector Attacks

Vulnerabilities and imperfections of single-photon detectors have been shown to compromise security for quantum key distribution (QKD). The measurement-device-independent QKD (MDI-QKD) appears to be the most appealing solution to these issues. However, in practice one faces severe obstacles such as a significantly lower key generation rate, difficult two-photon interference, and remote synchronization. In this letter, we propose a highly efficient and simple quantum key distribution scheme to remove all of these drawbacks. Our proposal can be implemented with only small modifications over the standard decoy BB84 system. Remarkably, it enjoys both the advantages of a high key generation rate (being almost two orders of magnitude higher than that based on conventional MDI-QKD) comparable to the normal decoy system, and security against any detector side channel attacks. Most favorably, one can achieve complete Bell state measurements by resorting to single-photon interference, which significantly reduces experimental costs. Our approach enables high-speed and efficient secure communication, particularly in real-life metropolitan and intercity QKD networks, free from attacks on arbitrary detector side channels.

preprint2016arXiv

Non-local games and optimal steering at the boundary of the quantum set

The boundary between classical and quantum correlations is well characterised by linear constraints called Bell inequalities. It is much harder to characterise the boundary of the quantum set itself in the space of no-signaling correlations. For the points on the quantum boundary that violate maximally some Bell inequalities, Oppenheim and Wehner [Science 330, 1072 (2010)] pointed out a complex property: the optimal measurements of Alice steer Bob's local state to the eigenstate of an effective operator corresponding to its maximal eigenvalue. This effective operator is the linear combination of Bob's local operators induced by the coefficients of the Bell inequality, and it can be interpreted as defining a fine-grained uncertainty relation. It is natural to ask whether the same property holds for other points on the quantum boundary, using the Bell expression that defines the tangent hyperplane at each point. We prove that this is indeed the case for a large set of points, including some that were believed to provide counterexamples. The price to pay is to acknowledge that the Oppenheim-Wehner criterion does not respect equivalence under the no-signaling constraint: for each point, one has to look for specific forms of writing the Bell expressions.

preprint2016arXiv

Once for All: a Two-flow Convolutional Neural Network for Visual Tracking

One of the main challenges of visual object tracking comes from the arbitrary appearance of objects. Most existing algorithms try to resolve this problem as an object-specific task, i.e., the model is trained to regenerate or classify a specific object. As a result, the model needs to be initialized and retrained for different objects. In this paper, we propose a more generic approach utilizing a novel two-flow convolutional neural network (named YCNN). The YCNN takes two inputs (one is an object image patch, the other is a search image patch), then outputs a response map which predicts how likely the object is to appear at a specific location. Unlike object-specific approaches, the YCNN is trained to measure the similarity between two image patches. Thus it is not confined to any specific object. Furthermore, the network can be trained end-to-end to extract both shallow and deep convolutional features which are dedicated to visual tracking. Once properly trained, the YCNN can be applied to track all kinds of objects without further training and updating. Benefiting from the once-for-all model, our algorithm is able to run at a very high speed of 45 frames per second. The experiments on 51 sequences also show that our algorithm achieves outstanding performance.

preprint2015arXiv

A Decision-Aided Parallel SC-List Decoder for Polar Codes

In this paper, we propose a decision-aided scheme for parallel SC-List decoding of polar codes. At the parallel SC-List decoder, each survival path is extended based on multiple information bits, therefore the number of split paths becomes very large and the sorting to find the top L paths becomes very complex. We propose a decision-aided scheme to reduce the number of split paths and thus reduce the sorting complexity.

preprint2015arXiv

Capacity-Achieving Rateless Polar Codes

A rateless coding scheme transmits incrementally more and more coded bits over an unknown channel until all the information bits are decoded reliably by the receiver. We propose a new rateless coding scheme based on polar codes, and we show that this scheme is capacity-achieving, i.e. its information rate is as good as the best code specifically designed for the unknown channel. Previous rateless coding schemes are designed for specific classes of channels such as AWGN channels, binary erasure channels, etc. but the proposed rateless coding scheme is capacity-achieving for broad classes of channels as long as they are ordered via degradation. Moreover, it inherits the conceptual and computational simplicity of polar codes.

preprint2015arXiv

Comment on "Correlation between Bulk Thermodynamic Measurements and the Low-Temperature-Resistance Plateau in SmB6"

The low-temperature-resistivity plateau observed in $\rm SmB_6$ single crystals, which is due to surface, not bulk, conduction, has been confirmed from electrical transport measurements. Recently, the correlation between bulk thermodynamic measurements and the low-temperature-resistance plateau in $\rm SmB_6$ has been investigated and a change in Sm valence at the surface has been obtained from x-ray absorption spectroscopy and x-ray magnetic circular dichroism. Here we show that the statements of the report are not supported by the results from x-ray absorption spectroscopy and x-ray magnetic circular dichroism.

preprint2015arXiv

Evaluation of commercial ADC radiation tolerance for accelerator experiments

Electronic components used in high energy physics experiments are subjected to a radiation background composed of high energy hadrons, mesons and photons. These particles can induce permanent and transient effects that affect normal device operation. Ionizing dose and displacement damage can cause chronic damage that disables the device permanently. Transient effects, or single event effects, are in general recoverable, with time intervals that depend on the nature of the failure. The magnitude of these effects is technology dependent, with feature size being one of the key parameters. Analog to digital converters are components that are frequently used in detector front end electronics, generally placed as close as possible to the sensing elements to maximize signal fidelity. We report on radiation effects tests conducted on 17 commercially available analog to digital converters and extensive single event effect measurements on specific twelve and fourteen bit ADCs that presented high tolerance to ionizing dose. Mitigation strategies for single event effects (SEE) are discussed for their use in the large hadron collider environment.

preprint2015arXiv

Genuine High-Order Einstein-Podolsky-Rosen Steering

Einstein-Podolsky-Rosen (EPR) steering demonstrates that two parties share entanglement even if the measurement devices of one party are untrusted. Here, going beyond this bipartite concept, we develop a novel formalism to explore a large class of EPR steering from generic multipartite quantum systems of arbitrarily high dimensionality and degrees of freedom, such as graph states and hyperentangled systems. All of these quantum characteristics of genuine high-order EPR steering can be efficiently certified with few measurement settings in experiments. We faithfully demonstrate for the first time such generality by experimentally showing genuine four-partite EPR steering and applications to universal one-way quantum computing. Our formalism provides a new insight into the intermediate type of genuine multipartite Bell non-locality and potential applications to quantum of untrusted measurement devices.

preprint2015arXiv

Reduce the Complexity of List Decoding of Polar Codes by Tree-Pruning

Polar codes under cyclic redundancy check aided successive cancellation list (CA-SCL) decoding can outperform the turbo codes and the LDPC codes when code lengths are configured to be several kilobits. In order to reduce the decoding complexity, a novel tree-pruning scheme for the SCL/CA-SCL decoding algorithms is proposed in this paper. In each step of the decoding procedure, the candidate paths with metrics less than a threshold are dropped directly to avoid unnecessary computations for the path searching on their descendant branches. Given a candidate path, an upper bound of the path metric of its descendants is proposed to determine whether the pruning of this candidate path would affect frame error rate (FER) performance. By utilizing this upper bounding technique and introducing a dynamic threshold, the proposed scheme deletes as many redundant candidate paths as possible while keeping the performance deterioration within a tolerable region, and is thus much more efficient than the existing pruning scheme. With only a negligible loss of FER performance, the computational complexity of the proposed pruned decoding scheme is only about 40% of the standard algorithm in the low signal-to-noise ratio (SNR) region (where the FER under CA-SCL decoding is between about 0.1 and 0.001), and it can be very close to that of the successive cancellation (SC) decoder in the moderate and high SNR regions.
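The pruning step described above can be illustrated with a small sketch. The abstract does not give the threshold rule or the upper bound on descendant metrics, so the rule used here (drop any path whose metric falls more than `margin` below the current best) is an assumption made for illustration, and `prune_and_select` is a hypothetical name.

```python
def prune_and_select(candidates, list_size, margin):
    """Keep at most `list_size` candidate paths after threshold pruning.

    candidates: list of (path_bits, metric) pairs, where a larger metric
    (for example a log-probability) means a more likely path.
    margin: assumed gap below the best metric that a path may have
    before it is dropped; this stands in for the paper's dynamic threshold.
    """
    if not candidates:
        return []
    best = max(metric for _, metric in candidates)
    threshold = best - margin
    # Paths below the threshold are dropped before sorting, so their
    # descendants are never extended or sorted in later steps.
    survivors = [(bits, metric) for bits, metric in candidates if metric >= threshold]
    survivors.sort(key=lambda item: item[1], reverse=True)
    return survivors[:list_size]
```

With `candidates = [((0,), -0.1), ((1,), -5.0), ((0, 1), -0.3)]`, a list size of 4 and a margin of 1.0, the path with metric -5.0 is removed even though the list is not full, which is where the complexity saving comes from.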

preprint2015arXiv

RepNet: Cutting Tail Latency in Data Center Networks with Flow Replication

Data center networks need to provide low latency, especially at the tail, as demanded by many interactive applications. To improve tail latency, existing approaches require modifications to switch hardware and/or end-host operating systems, making them difficult to deploy. We present the design, implementation, and evaluation of RepNet, an application layer transport that can be deployed today. RepNet exploits the fact that only a few paths among many are congested at any moment in the network, and applies simple flow replication to mice flows to opportunistically use the less congested path. RepNet has two designs for flow replication: (1) RepSYN, which only replicates SYN packets and uses the first connection that finishes TCP handshaking for data transmission, and (2) RepFlow, which replicates the entire mice flow. We implement RepNet on node.js, one of the most commonly used platforms for networked interactive applications. node's single threaded event-loop and non-blocking I/O make flow replication highly efficient. Performance evaluation on a real network testbed and in Mininet reveals that RepNet is able to reduce the tail latency of mice flows, as well as application completion times, by more than 50%.
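The RepSYN idea of racing several connection attempts and keeping the first to finish can be sketched as below. This is not the node.js implementation: it is a Python `asyncio` simulation in which `connect` only sleeps for a given handshake delay, and the names `connect` and `first_to_connect` are hypothetical.

```python
import asyncio

async def connect(path_name, handshake_delay):
    """Stand-in for a TCP handshake over one network path (simulated with sleep)."""
    await asyncio.sleep(handshake_delay)
    return path_name

async def first_to_connect(attempts):
    """Start all connection attempts and return the one that completes first.

    attempts: list of (path_name, simulated_handshake_delay_seconds).
    """
    tasks = [asyncio.create_task(connect(name, delay)) for name, delay in attempts]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # The slower replicas are abandoned once one handshake has finished.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()
```

For example, `asyncio.run(first_to_connect([("congested", 0.05), ("idle", 0.01)]))` selects the `"idle"` path, since its simulated handshake completes first.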

preprint2015arXiv

Unauthorized Cross-App Resource Access on MAC OS X and iOS

On modern operating systems, applications under the same user are separated from each other, for the purpose of protecting them against malware and compromised programs. Given the complexity of today's OSes, less clear is whether such isolation is effective against different kinds of cross-app resource access attacks (called XARA in our research). To better understand the problem, on the less-studied Apple platforms, we conducted a systematic security analysis on MAC OS X and iOS. Our research leads to the discovery of a series of high-impact security weaknesses, which enable a sandboxed malicious app, approved by the Apple Stores, to gain unauthorized access to other apps' sensitive data. More specifically, we found that the inter-app interaction services, including the keychain, WebSocket and NSConnection on OS X and URL Scheme on the MAC OS and iOS, can all be exploited by the malware to steal such confidential information as the passwords for iCloud, email and bank, and the secret token of Evernote. Further, the design of the app sandbox on OS X was found to be vulnerable, exposing an app's private directory to the sandboxed malware that hijacks its Apple Bundle ID. As a result, sensitive user data, like the notes and user contacts under Evernote and photos under WeChat, have all been disclosed. Fundamentally, these problems are caused by the lack of app-to-app and app-to-OS authentications. To better understand their impacts, we developed a scanner that automatically analyzes the binaries of MAC OS and iOS apps to determine whether proper protection is missing in their code. Running it on hundreds of binaries, we confirmed the pervasiveness of the weaknesses among high-impact Apple apps. Since the issues may not be easily fixed, we built a simple program that detects exploit attempts on OS X, helping protect vulnerable apps before the problems can be fully addressed.

preprint2014arXiv

Development of COTS ADC SEE Test System for the ATLAS LAr Calorimeter Upgrade

Radiation-tolerant, high speed, high density and low power commercial off-the-shelf (COTS) analog-to-digital converters (ADCs) are planned to be used in the upgrade to the Liquid Argon (LAr) calorimeter front end (FE) trigger readout electronics. Total ionization dose (TID) and single event effect (SEE) are two important radiation effects which need to be characterized on COTS ADCs. In our initial TID test, Texas Instruments (TI) ADS5272 was identified to be the top performer after screening a total of 17 COTS ADCs from different manufacturers with dynamic range and sampling rate meeting the requirements of the FE electronics. Another interesting feature of ADS5272 is its latency of 6.5 clock cycles, which is the shortest among the 17 candidates. Based on the TID performance, we have designed a SEE evaluation system for ADS5272, which allows us to further assess its radiation tolerance. In this paper, we present a detailed design of the ADS5272 SEE evaluation system and show the effectiveness of this system while evaluating ADS5272 SEE characteristics in multiple irradiation tests. According to TID and SEE test results, ADS5272 was chosen to be implemented in the full-size LAr Trigger Digitizer Board (LTDB) demonstrator, which will be installed on the ATLAS calorimeter during the 2014 Long Shutdown 1 (LS1).

preprint2014arXiv

Polar Coded HARQ Scheme with Chase Combining

A hybrid automatic repeat request scheme with Chase combining (HARQ-CC) of polar codes is proposed. The existing analysis tools of the underlying rate-compatible punctured polar (RCPP) codes for additive white Gaussian noise (AWGN) channels are extended to Rayleigh fading channels. Then, an approximation bound of the throughput efficiency for the polar coded HARQ-CC scheme is derived. Utilizing this bound, the parameter configurations of the proposed scheme can be optimized. Simulation results show that the proposed HARQ-CC scheme under low-complexity SC decoding is only about 1.0 dB away from the existing schemes with incremental redundancy (HARQ-IR). Compared with the polar coded HARQ-IR scheme, the proposed HARQ-CC scheme requires fewer retransmissions and has the advantage of good compatibility with other communication techniques.

preprint2014arXiv

Space-Time Polar Coded Modulation

The polar codes are proven to be capacity-achieving and are shown to have equivalent or even better finite-length performance than the turbo/LDPC codes under some improved decoding algorithms over the additive white Gaussian noise (AWGN) channels. Polar coding is based on the so-called channel polarization phenomenon induced by a transform over the underlying binary-input channel. The channel polarization is found to be universal in many signal processing problems and has been applied to the coded modulation schemes. In this paper, the channel polarization is further extended to the multiple antenna transmission following a multilevel coding principle. The multiple-input multiple-output (MIMO) channel under quadrature amplitude modulation (QAM) is transformed into a series of synthesized binary-input channels under a three-stage channel transform. Based on this generalized channel polarization, the proposed space-time polar coded modulation (STPCM) scheme allows a joint optimization of the binary polar coding, modulation and MIMO transmission. In addition, a practical solution of polar code construction over the fading channels is also provided, where the fading channels are approximated by an AWGN channel which shares the same capacity with the original. The simulations over the MIMO channel with uncorrelated Rayleigh fast fading show that the proposed STPCM scheme can outperform the bit-interleaved turbo coded scheme in all the simulated cases, where the latter is adopted in many existing communication systems.

preprint2013arXiv

A Hybrid ARQ Scheme Based on Polar Codes

A hybrid automatic repeat request (HARQ) scheme based on a novel class of rate-compatible polar (RCP) codes is proposed. The RCP codes are constructed by performing punctures and repetitions on the conventional polar codes. Simulation results over binary-input additive white Gaussian noise channels (BAWGNCs) show that, using a low-complexity successive cancellation (SC) decoder, the proposed HARQ scheme performs as well as the existing schemes based on turbo codes and low-density parity-check (LDPC) codes. The proposed transmission scheme is only about 1.0-1.5 dB away from the channel capacity with an information block length of 1024 bits.

preprint2013arXiv

A real-time QKD system based on FPGA

A real-time Quantum Key Distribution System is developed in this paper. In the system, based on the features of the Field Programmable Gate Array (FPGA), the secure key extraction control and algorithm have been optimally designed to perform sifting, error correction and privacy amplification altogether in real time. In the QKD experiment, an information synchronization mechanism and a high-speed classical data channel are designed to ensure the steady operation of the system. Decoy state and a synchronous laser light source are used in the system, while the length of optical fiber between Alice and Bob is 20 km. With a photon repetition frequency of 20 MHz, the final key rate could reach 17 kbps. Smooth and robust operation is verified with a 6-hour continuous test and with an encrypted voice communication test.
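Of the three post-processing stages named above, sifting is the simplest to illustrate. The sketch below assumes a BB84-style protocol in which Alice and Bob each pick one of two bases per pulse and keep only the positions where the bases agree; the abstract does not spell out the protocol details, and the function names `sift` and `qber` are hypothetical. Lost pulses, error correction and privacy amplification are not modelled.

```python
def sift(alice_bits, alice_bases, bob_bits, bob_bases):
    """Keep only the positions where Alice and Bob used the same basis."""
    keep = [i for i, (a, b) in enumerate(zip(alice_bases, bob_bases)) if a == b]
    alice_key = [alice_bits[i] for i in keep]
    bob_key = [bob_bits[i] for i in keep]
    return alice_key, bob_key

def qber(alice_key, bob_key):
    """Fraction of sifted positions where the two keys disagree."""
    if not alice_key:
        return 0.0
    errors = sum(a != b for a, b in zip(alice_key, bob_key))
    return errors / len(alice_key)
```

In a real system only the basis choices are exchanged over the classical channel; the error rate is then estimated from a disclosed sample of the sifted key before error correction.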

preprint2013arXiv

Direct and full-scale experimental verifications towards ground-satellite quantum key distribution

Quantum key distribution (QKD) provides the only intrinsically unconditional secure method for communication based on the principles of quantum mechanics. Compared with fiber-based demonstrations, free-space links could provide the most appealing solution for much larger distances. Despite significant efforts, so far all realizations rely on stationary sites. Justifications are therefore extremely crucial for applications via a typical Low Earth Orbit Satellite (LEOS). To achieve direct and full-scale verifications, we demonstrate here three independent experiments with a decoy-state QKD system overcoming all the demanding conditions. The system is operated on a moving platform through a turntable, on a floating platform through a hot-air balloon, and over a huge loss channel, respectively, for substantiating performance under rapid motion, attitude change, vibration, random movement of satellites and in the high-loss regime. The experiments cover expanded ranges for all the leading parameters of LEOS. Our results pave the way towards ground-satellite QKD and a global quantum communication network.

preprint2013arXiv

Distributed Representations of Words and Phrases and their Compositionality

The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
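Two of the extensions mentioned above, subsampling of frequent words and phrase detection, reduce to short formulas. The abstract itself does not state them; the forms below are the ones commonly cited for this paper (discard probability 1 - sqrt(t / f(w)) for a word with relative frequency f(w), and a bigram score of (count(ab) - delta) / (count(a) * count(b))). The function names and default values are illustrative assumptions.

```python
import math

def discard_probability(freq, t=1e-5):
    """Probability of dropping one occurrence of a word during training.

    freq: relative frequency of the word in the corpus (0 < freq <= 1).
    t: assumed subsampling threshold; words rarer than t are never dropped.
    """
    return max(0.0, 1.0 - math.sqrt(t / freq))

def phrase_score(count_ab, count_a, count_b, delta=5):
    """Score a bigram "a b" as a candidate phrase.

    delta is an assumed discounting constant that keeps very rare
    word pairs from scoring highly.
    """
    return (count_ab - delta) / (count_a * count_b)
```

A word that makes up 0.1% of the corpus would be dropped about 90% of the time under these defaults, while bigrams whose score exceeds a chosen threshold are merged into single tokens such as "Air Canada".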

preprint2013arXiv

Efficient Estimation of Word Representations in Vector Space

We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

preprint2013arXiv

Improved Successive Cancellation Decoding of Polar Codes

As improved versions of successive cancellation (SC) decoding algorithm, successive cancellation list (SCL) decoding and successive cancellation stack (SCS) decoding are used to improve the finite-length performance of polar codes. Unified descriptions of SC, SCL and SCS decoding algorithms are given as path searching procedures on the code tree of polar codes. Combining the ideas of SCL and SCS, a new decoding algorithm named successive cancellation hybrid (SCH) is proposed, which can achieve a better trade-off between computational complexity and space complexity. Further, to reduce the complexity, a pruning technique is proposed to avoid unnecessary path searching operations. Performance and complexity analysis based on simulations show that, with proper configurations, all the three improved successive cancellation (ISC) decoding algorithms can have a performance very close to that of maximum-likelihood (ML) decoding with acceptable complexity. Moreover, with the help of the proposed pruning technique, the complexities of ISC decoders can be very close to that of SC decoder in the moderate and high signal-to-noise ratio (SNR) regime.

preprint2013arXiv

Low-Complexity Sphere Decoding of Polar Codes based on Optimum Path Metric

Sphere decoding (SD) of polar codes is an efficient method to achieve the error performance of maximum likelihood (ML) decoding. But the complexity of the conventional sphere decoder is still high, where the candidates in a target sphere are enumerated and the radius is decreased gradually until no available candidate is in the sphere. In order to reduce the complexity of SD, a stack SD (SSD) algorithm with an efficient enumeration is proposed in this paper. Based on a novel path metric, SSD can effectively narrow the search range when enumerating the candidates within a sphere. The proposed metric follows an exact ML rule and makes full use of the whole received sequence. Furthermore, another very simple metric is provided as an approximation of the ML metric in the high signal-to-noise ratio regime. For short polar codes, simulation results over the additive white Gaussian noise channels show that the complexity of SSD based on the proposed metrics is up to 100 times lower than that of the conventional SD.

preprint2013arXiv

Polar Coded Modulation with Optimal Constellation Labeling

A practical $2^m$-ary polar coded modulation (PCM) scheme with optimal constellation labeling is proposed. To efficiently find the optimal labeling rule, the search space is reduced by exploiting the symmetry properties of the channels. Simulation results show that the proposed PCM scheme can outperform the bit-interleaved turbo coded modulation scheme used in WCDMA (Wideband Code Division Multiple Access) mobile communication systems by up to 1.5 dB.

preprint2011arXiv

Experimental demonstration of counterfactual quantum communication

Based on the principles of quantum mechanics, quantum cryptography provides an intriguing way to establish secret keys between remote parties, generally relying on the actual transmission of signal particles. Surprisingly, an even more striking method, named `counterfactual quantum cryptography', was recently proposed by Noh; it enables key distribution in which the particles carrying secret information are seemingly not transmitted through the quantum channel. Here we give a faithful experimental implementation of the scheme with an on-table realization. Furthermore, we report a demonstration over a 1 km fiber operating at telecom wavelength to verify its feasibility for extension to long distances. In both cases, high visibilities of better than 98% are maintained with active stabilization of the interferometers, while a quantum bit error rate of around 5.5% is attained after the 1 km channel.

preprint2011arXiv

Observation of new intrinsic properties of VO2

A VO2 single crystal exhibiting a first-order metal-insulator transition (MIT) at 67.2 degrees C and an insulator-insulator transition (IIT) at ~49.7 degrees C is grown. From synchrotron-based x-ray microdiffraction analysis, the IIT shows a structural phase transition (SPT) from the monoclinic M2 phase to the M1 phase, while the MIT shows a transition from the M1 phase to the rutile R phase. The IIT exhibits a percolative SPT, while the MIT shows an abrupt transition with a width of < 0.02 degrees C, supporting Mott's prediction. The MIT occurs non-percolatively with a sharp boundary between the R and M1 phases. The MIT onset temperature shows significant variation.

preprint2010arXiv

Experimental demonstration of a heralded entanglement source

The heralded generation of entangled states is a long-standing goal in quantum information processing, because it is indispensable for a number of quantum protocols. Polarization entangled photon pairs are usually generated through spontaneous parametric down-conversion, but the emission is probabilistic. Their applications are generally accompanied by post-selection and destructive photon detection. Here, we report a source of entanglement generated in an event-ready manner by conditioned detection of auxiliary photons. This scheme benefits from the stable and robust properties of spontaneous parametric down-conversion and requires only modest experimental efforts. It is flexible and allows the preparation efficiency to be significantly improved by using beamsplitters with different transmission ratios. We have achieved a fidelity better than 87% and a state preparation efficiency of 45% for the source. This could offer promise in essential photonics-based quantum information tasks, and particularly in enabling optical quantum computing by reducing dramatically the computational overhead.

preprint2010arXiv

Metropolitan all-pass and inter-city quantum communication network

We have demonstrated a metropolitan all-pass quantum communication network in field fiber for four nodes. Any two of the nodes can be connected in the network to perform quantum key distribution (QKD). An optical switching module is presented that enables arbitrary 2-connectivity among output ports. Integrated QKD terminals have been developed, which can operate as a transmitter, a receiver, or both at the same time. Furthermore, an additional link in another city with 60 km of fiber (up to 130 km) is seamlessly integrated into this network based on a trusted relay architecture. On all the links, we have implemented the decoy-state protocol. All of the necessary electrical hardware, synchronization, feedback control, network software, and execution of the QKD protocols are custom designed, which allows completely automatic and stable operation. Our system was put into operation in Hefei in August 2009, and publicly demonstrated during an evaluation conference on quantum networks organized by the Chinese Academy of Sciences on August 29, 2009. Real-time voice telephony with one-time pad encoding between any two of the five nodes (four all-pass nodes plus one additional node through the relay) was successfully established in the network within 60 km.

preprint2010arXiv

Verifying Genuine High-Order Entanglement

High-order entanglement embedded in multipartite multilevel quantum systems (qudits) with many degrees of freedom (DOFs) plays an important role in quantum foundation and quantum engineering. Verifying high-order entanglement without the restriction of system complexity is a critical need in any experiments on general entanglement. Here, we introduce a scheme to efficiently detect genuine high-order entanglement, such as states close to genuine qudit Bell, Greenberger-Horne-Zeilinger, and cluster states as well as multilevel multi-DOF hyperentanglement. All of them can be identified with two local measurement settings per DOF regardless of the qudit or DOF number. The proposed verifications together with further utilities such as fidelity estimation could pave the way for experiments by reducing dramatically the measurement overhead.

preprint2009arXiv

Experimental Determination of Entanglement for Arbitrary Pure States

We present a way of experimentally determining the concurrence in terms of the expectation values of local observables for arbitrary multipartite pure states. Instead of the joint measurements on two copies of a state used in the experiment for two-qubit systems [S. P. Walborn et al., Nature (London) 440, 20 (2006)], we need only one copy of the state in every measurement for multipartite systems of arbitrary dimension, avoiding the preparation of twin states or an imperfect copy of the state.

preprint2007arXiv

Concurrence-based entanglement measure for Werner States

We give explicit expressions for entanglement measures of Werner states in arbitrary dimensions in terms of concurrence and tangle. We show that an optimal ensemble decomposition for a joint density matrix of a Werner state can achieve the minimum average concurrence and tangle simultaneously. Furthermore, the same decomposition also attains entanglement of formation for Werner states.

preprint2007arXiv

Experimental realization of one-way quantum computing with two-photon four-qubit cluster states

We report an experimental realization of one-way quantum computing on a two-photon four-qubit cluster state. This is accomplished by developing a two-photon cluster state source entangled in both polarization and spatial modes. With this special source, we implemented a highly efficient Grover's search algorithm and high-fidelity two-qubit quantum gates. Our experiment demonstrates that such cluster states could serve as an ideal source and a building block for rapid and precise optical quantum computation.

preprint2006arXiv

Decoy state quantum key distribution with two-way classical post-processing

Decoy states have recently been proposed as a useful method for substantially improving the performance of quantum key distribution protocols when a coherent state source is used. Previously, data post-processing schemes based on one-way classical communications were considered for use with decoy states. In this paper, we develop two data post-processing schemes for the decoy-state method using two-way classical communications. Our numerical simulation results (using parameters from a specific QKD experiment as an example) show that our scheme is able to extend the maximal secure distance from 142 km (using only one-way classical communications with decoy states) to 181 km. The second scheme is able to achieve a 10% greater key generation rate in the whole regime of distances.

preprint2001arXiv

The Dn Ruijsenaars-Schneider model

The Lax pair of the Ruijsenaars-Schneider model with an interaction potential of trigonometric type based on the Dn Lie algebra is presented. We give a general form for the Lax pair and prove partial results for small n. Liouville integrability of the corresponding system follows from a series of involutive Hamiltonians generated by the characteristic polynomial of the Lax matrix. The rational case appears as a natural degeneration, and the nonrelativistic limit leads exactly to the well-known Calogero-Moser system associated with the Dn Lie algebra.

preprint2001arXiv

The Lax pairs for elliptic C_n and BC_n Ruijsenaars-Schneider models and their spectral curves

We study the elliptic C_n and BC_n Ruijsenaars-Schneider models, which are elliptic generalizations of the system given in hep-th/0006004. The Lax pairs for these models are constructed by the Hamiltonian reduction technique. We show that the spectral curves can be parameterized by the involutive integrals of motion for these models. Taking the nonrelativistic limit and the scaling limit, we verify that they lead to the systems of Calogero-Moser and Toda types.

preprint2000arXiv

Integrability of the $C_{n}$ and $BC_{n}$ Ruijsenaars-Schneider models

We study the $C_{n}$ and $BC_{n}$ Ruijsenaars-Schneider (RS) models with interaction potentials of trigonometric and rational types. The Lax pairs for these models are constructed and the involutive Hamiltonians are also given. Taking the nonrelativistic limit, we also obtain the Lax pairs for the corresponding Calogero-Moser systems.


109 published item(s)

Decompose to Understand, Fuse to Detect: Frequency-Decoupled Anomaly Detection for Encrypted Network Traffic

EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields

STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering

Boosting Neural Networks to Decompile Optimized Binaries

A Novel Multi-Agent Scheduling Mechanism for Adaptation of Production Plans in Case of Supply Chain Disruptions

A two-stage full-band speech enhancement model with effective spectral compression mapping

APRNet: Attention-based Pixel-wise Rendering Network for Photo-Realistic Text Image Generation

Attacking Video Recognition Models with Bullet-Screen Comments

Dense Siamese Network for Dense Unsupervised Learning

Double-pass multiple-plate continuum for high temporal contrast nonlinear pulse compression

DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training

Efficient Data-Plane Memory Scheduling for In-Network Aggregation

Experimental Demonstration of Quantum Pseudotelepathy

Few-Shot Object Detection via Association and DIscrimination

GCFSR: a Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors

Group R-CNN for Weakly Semi-supervised Object Detection with Points

Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

MMRotate: A Rotated Object Detection Benchmark using PyTorch

Neural-iLQR: A Learning-Aided Shooting Method for Trajectory Optimization

No Free Lunch Theorem for Security and Utility in Federated Learning

Non-Hermitian $C_{NH} = 2$ Chern insulator protected by generalized rotational symmetry

NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results

OCSampler: Compressing Videos to One Clip with Single-step Sampling

PPA: Preference Profiling Attack Against Federated Learning

Practical and Secure Federated Recommendation with Personalized Masks

Practical Lossless Federated Singular Vector Decomposition over Billion-Scale Data

PYSKL: Towards Good Practices for Skeleton Action Recognition

Revisiting Skeleton-based Action Recognition

ROMA: Cross-Domain Region Similarity Matching for Unpaired Nighttime Infrared to Daytime Visible Video Translation

RotateQVS: Representing Temporal Information as Rotations in Quaternion Vector Space for Temporal Knowledge Graph Completion

Secure Forward Aggregation for Vertical Federated Neural Networks

Semi-blind source separation using convolutive transfer function for nonlinear acoustic echo cancellation

Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin Picking

Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing

Towards Robust Part-aware Instance Segmentation for Industrial Bin Picking

TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition

What Are Expected Queries in End-to-End Object Detection?

Cylindrical vector beams reveal radiationless anapole condition in a resonant state

Exploring the Generalizability of Spatio-Temporal Traffic Prediction: Meta-Modeling and an Analytic Framework

Extracting Quantitative Dielectric Properties from Pump-Probe Spectroscopy

Possible multi-orbital ground state in CeCu$_2$Si$_2$

Semantics-Recovering Decompilation through Neural Machine Translation

SEPAL: Towards a Large-scale Analysis of SEAndroid Policy Customization

A3Ident: A Two-phased Approach to Identify the Leading Authors of Android Apps

All-optical nonreciprocity due to valley polarization in transition metal dichalcogenides

Anapole mediated giant photothermal nonlinearity in nanostructured silicon

Confidential Attestation: Efficient in-Enclave Verification of Privacy Policy Compliance

Cross Architectural Power Modelling

Domain-specific Communication Optimization for Distributed DNN Training

Feature Pyramid Grids

FPGA-Based Hardware Accelerator of Homomorphic Encryption for Efficient Federated Learning

Gliding vertex on the horizontal bounding box for multi-oriented object detection

Neural Data-to-Text Generation with Dynamic Content Planning

Nonlinear Residual Echo Suppression Based on Multi-stream Conv-TasNet

Performance of CMOS pixel sensor prototypes in ams H35 and aH18 technology for the ATLAS ITk upgrade

Phase-Matching Quantum Cryptographic Conferencing

Revisiting the Effect of f-Functions in Predicting the Right Reaction Mechanism for Hypervalent Iodine Reagents

Room-temperature ferrimagnetism of anti-site-disordered Ca2MnOsO6

Side-Aware Boundary Localization for More Precise Object Detection

The Clock and Control System for the ATLAS Liquid Argon Calorimeter Phase-I Upgrade

U-net Based Direct-path Dominance Test for Robust Direction-of-arrival Estimation

Unified Approach to Witness Nonentanglement-Breaking Quantum Channels

Vanishing Point Guided Natural Image Stitching

Zipper Stack: Shadow Stacks Without Shadow

An Experimentally Verified Approach to non-Entanglement-Breaking Channel Certification

Quantifying the Performance of Federated Transfer Learning

Secure Federated Matrix Factorization

A CNN Based Scene Chinese Text Recognition Algorithm With Synthetic Data Engine

A Novel Scene Text Detection Algorithm Based On Convolutional Neural Network

Context-aware System Service Call-oriented Symbolic Execution of Android Framework with Application to Exploit Generation

Convolutional Regression for Visual Tracking

Development of an ADC Radiation Tolerance Characterization System for the Upgrade of the ATLAS LAr Calorimeter

Electromechanically Tunable Suspended Optical Nano-antenna