Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
44works
0followers
27topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

44 published item(s)

preprint2026arXiv

Cross-modal Retrieval Models for Stripped Binary Analysis

Retrieving binary code via natural language queries is a pivotal capability for downstream tasks in the software security domain, such as vulnerability detection and malware analysis. However, it is challenging to identify binary functions semantically relevant to the user query from thousands of candidates, as the absence of symbolic information distinguishes this task from source code retrieval. In this paper, we introduce, BinSeek, a two-stage cross-modal retrieval framework for stripped binary code analysis. It consists of two models: BinSeek-Embedding is trained on large-scale dataset to learn the semantic relevance of the binary code and the natural language description, furthermore, BinSeek-Reranker learns to carefully judge the relevance of the candidate code to the description with context augmentation. To this end, we built an LLM-based data synthesis pipeline to automate training construction, also deriving a domain benchmark for future research. Our evaluation results show that BinSeek achieved the state-of-the-art performance, surpassing the the same scale models by 31.42% in Rec@3 and 27.17% in MRR@3, as well as leading the advanced general-purpose models that have 16 times larger parameters.

preprint2026arXiv

SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation

Automatically generating interactive 3D indoor scenes from natural language is crucial for virtual reality, gaming, and embodied AI. However, existing LLM-based approaches often suffer from spatial errors and collisions, in part because common scene representations-raw coordinates or verbose code-are difficult for models to reason about 3D spatial relationships and physical constraints. We propose SpatialGrammar, a domain-specific language that represents gravity-aligned indoor layouts as BEV grid placements with deterministic compilation to valid 3D geometry, enabling verifiable constraint checking. Building on this representation, we develop (1) SG-Agent, a closed-loop system that uses compiler feedback to iteratively refine scenes and enforce collision constraints, and (2) SG-Mini, a 104M-parameter model trained entirely on compiler-validated synthetic data. Across 159 test scenes spanning five scenarios of different complexity, SG-Agent improves spatial fidelity and physical plausibility over prior methods, while SG-Mini performs competitively against larger LLM-based baselines on single-shot generation scenarios.

preprint2026arXiv

VCG-Bench: Towards A Unified Visual-Centric Benchmark for Structured Generation and Editing

Despite the rapid advancements in Vision-Language Models (VLMs), a critical gap remains in their ability to handle structured, controllable diagrammatic tasks essential for professional workflows. Existing methods predominantly rely on pixel-based synthesis, which operates in probabilistic pixel spaces and is inherently limited in editability and fidelity. Instead, we propose a new Diagram-as-Code paradigm with symbolic logic that leverages mxGraph Extensible Markup Language (XML) for precise diagram generation and editing. We present VCG-Bench, a unified benchmark for visual-centric \texttt{mxGraph} tasks. VCG-Bench comprises: (1) a taxonomized dataset of 1,449 diverse diagrams spanning 6 domains and 15 sub-domains, (2) a paradigm definition that integrates Generation (Vision-to-Code) and Editability (Code-to-Code), (3) a Tailored Evaluation Protocol employing multi-dimensional metrics such as \texttt{mxGraph} Execution Success Rate, Style Consistency Score (SCS), etc. Experimental results highlight the challenges faced by current State-of-the-Art (SOTA) VLMs in structured fidelity and instruction compliance, reflecting their vision and reasoning capabilities.

preprint2022arXiv

Adaptive Target-Condition Neural Network: DNN-Aided Load Balancing for Hybrid LiFi and WiFi Networks

Load balancing (LB) is a challenging issue in the hybrid light fidelity (LiFi) and wireless fidelity (WiFi) networks (HLWNets), due to the nature of heterogeneous access points (APs). Machine learning has the potential to provide a complexity-friendly LB solution with near-optimal network performance, at the cost of a training process. The state-of-the-art (SOTA) learning-aided LB methods, however, need retraining when the network environment (especially the number of users) changes, significantly limiting its practicability. In this paper, a novel deep neural network (DNN) structure named adaptive target-condition neural network (A-TCNN) is proposed, which conducts AP selection for one target user upon the condition of other users. Also, an adaptive mechanism is developed to map a smaller number of users to a larger number through splitting their data rate requirements, without affecting the AP selection result for the target user. This enables the proposed method to handle different numbers of users without the need for retraining. Results show that A-TCNN achieves a network throughput very close to that of the testing dataset, with a gap less than 3%. It is also proven that A-TCNN can obtain a network throughput comparable to two SOTA benchmarks, while reducing the runtime by up to three orders of magnitude.

preprint2022arXiv

Amplification of quantum signals by the non-Hermitian skin effect

The non-Hermitian skin effect (NHSE) is a phenomenon whereby certain non-Hermitian lattice Hamiltonians, particularly those with nonreciprocal couplings, can host an extensive number of eigenmodes condensed to the boundary, called skin modes. Although the NHSE has mostly been studied in the classical regime, we show that it can also manifest in quantum systems containing boson number nonconserving processes arising from uniform parametric driving. We study lattices of coupled nonlinear resonators that can function as reciprocal quantum amplifiers. A one-dimensional chain exhibiting NHSE can perform strong photon amplification, aided by the skin modes, that scales exponentially with the chain length and outperforms alternative lattice configurations that lack the NHSE. We show also that two-dimensional nonlinear lattices can perform directional photon amplification between different lattice corners, due to the two-dimensional NHSE.

preprint2022arXiv

An Efficient Framework for Few-shot Skeleton-based Temporal Action Segmentation

Temporal action segmentation (TAS) aims to classify and locate actions in the long untrimmed action sequence. With the success of deep learning, many deep models for action segmentation have emerged. However, few-shot TAS is still a challenging problem. This study proposes an efficient framework for the few-shot skeleton-based TAS, including a data augmentation method and an improved model. The data augmentation approach based on motion interpolation is presented here to solve the problem of insufficient data, and can increase the number of samples significantly by synthesizing action sequences. Besides, we concatenate a Connectionist Temporal Classification (CTC) layer with a network designed for skeleton-based TAS to obtain an optimized model. Leveraging CTC can enhance the temporal alignment between prediction and ground truth and further improve the segment-wise metrics of segmentation results. Extensive experiments on both public and self-constructed datasets, including two small-scale datasets and one large-scale dataset, show the effectiveness of two proposed methods in improving the performance of the few-shot skeleton-based TAS task.

preprint2022arXiv

Automatic dataset generation for specific object detection

In the past decade, object detection tasks are defined mostly by large public datasets. However, building object detection datasets is not scalable due to inefficient image collecting and labeling. Furthermore, most labels are still in the form of bounding boxes, which provide much less information than the real human visual system. In this paper, we present a method to synthesize object-in-scene images, which can preserve the objects' detailed features without bringing irrelevant information. In brief, given a set of images containing a target object, our algorithm first trains a model to find an approximate center of the object as an anchor, then makes an outline regression to estimate its boundary, and finally blends the object into a new scene. Our result shows that in the synthesized image, the boundaries of objects blend very well with the background. Experiments also show that SOTA segmentation models work well with our synthesized data.

preprint2022arXiv

Complementary Calibration: Boosting General Continual Learning with Collaborative Distillation and Self-Supervision

General Continual Learning (GCL) aims at learning from non independent and identically distributed stream data without catastrophic forgetting of the old tasks that don't rely on task boundaries during both training and testing stages. We reveal that the relation and feature deviations are crucial problems for catastrophic forgetting, in which relation deviation refers to the deficiency of the relationship among all classes in knowledge distillation, and feature deviation refers to indiscriminative feature representations. To this end, we propose a Complementary Calibration (CoCa) framework by mining the complementary model's outputs and features to alleviate the two deviations in the process of GCL. Specifically, we propose a new collaborative distillation approach for addressing the relation deviation. It distills model's outputs by utilizing ensemble dark knowledge of new model's outputs and reserved outputs, which maintains the performance of old tasks as well as balancing the relationship among all classes. Furthermore, we explore a collaborative self-supervision idea to leverage pretext tasks and supervised contrastive learning for addressing the feature deviation problem by learning complete and discriminative features for all classes. Extensive experiments on four popular datasets show that our CoCa framework achieves superior performance against state-of-the-art methods. Code is available at https://github.com/lijincm/CoCa.

preprint2022arXiv

Compositions and parities of complete mappings and of orthomorphisms

We determine the permutation groups $P_{\mathrm{comp}}(\mathbb{F}_q),P_{\mathrm{orth}}(\mathbb{F}_q)\leq\operatorname{Sym}(\mathbb{F}_q)$ generated by the complete mappings, respectively the orthomorphisms, of the finite field $\mathbb{F}_q$ -- both are equal to $\operatorname{Sym}(\mathbb{F}_q)$ unless $q\in\{2,3,4,5,8\}$. More generally, denote by $P_{\mathrm{comp}}(G)$, respectively $P_{\mathrm{orth}}(G)$, the subgroup of $\operatorname{Sym}(G)$ generated by the complete mappings, respectively the orthomorphisms, of the group $G$. Using recent results of Eberhard-Manners-Mrazović and Müyesser-Pokrovskiy, we show that for each large enough finite group $G$ that has a complete mapping (i.e., whose Sylow $2$-subgroups are trivial or noncyclic), $P_{\mathrm{comp}}(G)=\operatorname{Sym}(G)$ and $P_{\mathrm{orth}}(G)\geq\operatorname{Alt}(G)$. We also prove that $P_{\mathrm{orth}}(G)=\operatorname{Sym}(G)$ for every large enough finite solvable group $G$ that has a complete mapping. Proving these results requires us to study the parities of complete mappings and of orthomorphisms. Some connections with known results in cryptography and with parity types of Latin squares are also discussed.

preprint2022arXiv

Deep Learning-Assisted Co-registration of Full-Spectral Autofluorescence Lifetime Microscopic Images with H&E-Stained Histology Images

Autofluorescence lifetime images reveal unique characteristics of endogenous fluorescence in biological samples. Comprehensive understanding and clinical diagnosis rely on co-registration with the gold standard, histology images, which is extremely challenging due to the difference of both images. Here, we show an unsupervised image-to-image translation network that significantly improves the success of the co-registration using a conventional optimisation-based regression network, applicable to autofluorescence lifetime images at different emission wavelengths. A preliminary blind comparison by experienced researchers shows the superiority of our method on co-registration. The results also indicate that the approach is applicable to various image formats, like fluorescence intensity images. With the registration, stitching outcomes illustrate the distinct differences of the spectral lifetime across an unstained tissue, enabling macro-level rapid visual identification of lung cancer and cellular-level characterisation of cell variants and common types. The approach could be effortlessly extended to lifetime images beyond this range and other staining technologies.

preprint2022arXiv

Disentangled Representation Learning for Text-Video Retrieval

Cross-modality interaction is a critical component in Text-Video Retrieval (TVR), yet there has been little examination of how different influencing factors for computing interaction affect performance. This paper first studies the interaction paradigm in depth, where we find that its computation can be split into two terms, the interaction contents at different granularity and the matching function to distinguish pairs with the same semantics. We also observe that the single-vector representation and implicit intensive function substantially hinder the optimization. Based on these findings, we propose a disentangled framework to capture a sequential and hierarchical representation. Firstly, considering the natural sequential structure in both text and video inputs, a Weighted Token-wise Interaction (WTI) module is performed to decouple the content and adaptively exploit the pair-wise correlations. This interaction can form a better disentangled manifold for sequential inputs. Secondly, we introduce a Channel DeCorrelation Regularization (CDCR) to minimize the redundancy between the components of the compared vectors, which facilitate learning a hierarchical representation. We demonstrate the effectiveness of the disentangled representation on various benchmarks, e.g., surpassing CLIP4Clip largely by +2.9%, +3.1%, +7.9%, +2.3%, +2.8% and +6.5% R@1 on the MSR-VTT, MSVD, VATEX, LSMDC, AcitivityNet, and DiDeMo, respectively.

preprint2022arXiv

EASNet: Searching Elastic and Accurate Network Architecture for Stereo Matching

Recent advanced studies have spent considerable human efforts on optimizing network architectures for stereo matching but hardly achieved both high accuracy and fast inference speed. To ease the workload in network design, neural architecture search (NAS) has been applied with great success to various sparse prediction tasks, such as image classification and object detection. However, existing NAS studies on the dense prediction task, especially stereo matching, still cannot be efficiently and effectively deployed on devices of different computing capabilities. To this end, we propose to train an elastic and accurate network for stereo matching (EASNet) that supports various 3D architectural settings on devices with different computing capabilities. Given the deployment latency constraint on the target device, we can quickly extract a sub-network from the full EASNet without additional training while the accuracy of the sub-network can still be maintained. Extensive experiments show that our EASNet outperforms both state-of-the-art human-designed and NAS-based architectures on Scene Flow and MPI Sintel datasets in terms of model accuracy and inference speed. Particularly, deployed on an inference GPU, EASNet achieves a new SOTA 0.73 EPE on the Scene Flow dataset with 100 ms, which is 4.5$\times$ faster than LEAStereo with a better quality model.

preprint2022arXiv

fETSmcs: Feature-based ETS model component selection

The well-developed ETS (ExponenTial Smoothing or Error, Trend, Seasonality) method incorporating a family of exponential smoothing models in state space representation has been widely used for automatic forecasting. The existing ETS method uses information criteria for model selection by choosing an optimal model with the smallest information criterion among all models fitted to a given time series. The ETS method under such a model selection scheme suffers from computational complexity when applied to large-scale time series data. To tackle this issue, we propose an efficient approach for ETS model selection by training classifiers on simulated data to predict appropriate model component forms for a given time series. We provide a simulation study to show the model selection ability of the proposed approach on simulated data. We evaluate our approach on the widely used forecasting competition data set M4, in terms of both point forecasts and prediction intervals. To demonstrate the practical value of our method, we showcase the performance improvements from our approach on a monthly hospital data set.

preprint2022arXiv

Learning a Structured Latent Space for Unsupervised Point Cloud Completion

Unsupervised point cloud completion aims at estimating the corresponding complete point cloud of a partial point cloud in an unpaired manner. It is a crucial but challenging problem since there is no paired partial-complete supervision that can be exploited directly. In this work, we propose a novel framework, which learns a unified and structured latent space that encoding both partial and complete point clouds. Specifically, we map a series of related partial point clouds into multiple complete shape and occlusion code pairs and fuse the codes to obtain their representations in the unified latent space. To enforce the learning of such a structured latent space, the proposed method adopts a series of constraints including structured ranking regularization, latent code swapping constraint, and distribution supervision on the related partial point clouds. By establishing such a unified and structured latent space, better partial-complete geometry consistency and shape completion accuracy can be achieved. Extensive experiments show that our proposed method consistently outperforms state-of-the-art unsupervised methods on both synthetic ShapeNet and real-world KITTI, ScanNet, and Matterport3D datasets.

preprint2022arXiv

RCL: Recurrent Continuous Localization for Temporal Action Detection

Temporal representation is the cornerstone of modern action detection techniques. State-of-the-art methods mostly rely on a dense anchoring scheme, where anchors are sampled uniformly over the temporal domain with a discretized grid, and then regress the accurate boundaries. In this paper, we revisit this foundational stage and introduce Recurrent Continuous Localization (RCL), which learns a fully continuous anchoring representation. Specifically, the proposed representation builds upon an explicit model conditioned with video embeddings and temporal coordinates, which ensure the capability of detecting segments with arbitrary length. To optimize the continuous representation, we develop an effective scale-invariant sampling strategy and recurrently refine the prediction in subsequent iterations. Our continuous anchoring scheme is fully differentiable, allowing to be seamlessly integrated into existing detectors, e.g., BMN and G-TAD. Extensive experiments on two benchmarks demonstrate that our continuous representation steadily surpasses other discretized counterparts by ~2% mAP. As a result, RCL achieves 52.92% mAP@0.5 on THUMOS14 and 37.65% mAP on ActivtiyNet v1.3, outperforming all existing single-model detectors.

preprint2022arXiv

Real Robot Challenge: A Robotics Competition in the Cloud

Dexterous manipulation remains an open problem in robotics. To coordinate efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at MPI for Intelligent Systems and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able to control the platforms remotely by submitting code that is executed automatically, akin to a computational cluster. Using this setup, i) we host robotics competitions, where teams from anywhere in the world access our platforms to tackle challenging tasks ii) we publish the datasets collected during these competitions (consisting of hundreds of robot hours), and iii) we give researchers access to these platforms for their own projects.

preprint2022arXiv

SiamMask: A Framework for Fast Online Object Tracking and Segmentation

In this paper we introduce SiamMask, a framework to perform both visual object tracking and video object segmentation, in real-time, with the same simple method. We improve the offline training procedure of popular fully-convolutional Siamese approaches by augmenting their losses with a binary segmentation task. Once the offline training is completed, SiamMask only requires a single bounding box for initialization and can simultaneously carry out visual object tracking and segmentation at high frame-rates. Moreover, we show that it is possible to extend the framework to handle multiple object tracking and segmentation by simply re-using the multi-task model in a cascaded fashion. Experimental results show that our approach has high processing efficiency, at around 55 frames per second. It yields real-time state-of-the-art results on visual-object tracking benchmarks, while at the same time demonstrating competitive performance at a high speed for video object segmentation benchmarks.

preprint2022arXiv

Solving the Real Robot Challenge using Deep Reinforcement Learning

This paper details our winning submission to Phase 1 of the 2021 Real Robot Challenge; a challenge in which a three-fingered robot must carry a cube along specified goal trajectories. To solve Phase 1, we use a pure reinforcement learning approach which requires minimal expert knowledge of the robotic system, or of robotic grasping in general. A sparse, goal-based reward is employed in conjunction with Hindsight Experience Replay to teach the control policy to move the cube to the desired x and y coordinates of the goal. Simultaneously, a dense distance-based reward is employed to teach the policy to lift the cube to the z coordinate (the height component) of the goal. The policy is trained in simulation with domain randomisation before being transferred to the real robot for evaluation. Although performance tends to worsen after this transfer, our best policy can successfully lift the real cube along goal trajectories via an effective pinching grasp. Our approach outperforms all other submissions, including those leveraging more traditional robotic control techniques, and is the first pure learning-based method to solve this challenge.

preprint2022arXiv

Spatial Transformer Network with Transfer Learning for Small-scale Fine-grained Skeleton-based Tai Chi Action Recognition

Human action recognition is a quite hugely investigated area where most remarkable action recognition networks usually use large-scale coarse-grained action datasets of daily human actions as inputs to state the superiority of their networks. We intend to recognize our small-scale fine-grained Tai Chi action dataset using neural networks and propose a transfer-learning method using NTU RGB+D dataset to pre-train our network. More specifically, the proposed method first uses a large-scale NTU RGB+D dataset to pre-train the Transformer-based network for action recognition to extract common features among human motion. Then we freeze the network weights except for the fully connected (FC) layer and take our Tai Chi actions as inputs only to train the initialized FC weights. Experimental results show that our general model pipeline can reach a high accuracy of small-scale fine-grained Tai Chi action recognition with even few inputs and demonstrate that our method achieves the state-of-the-art performance compared with previous Tai Chi action recognition methods.

preprint2022arXiv

Topological phenomena at topological defects

There are two prominent applications of the mathematical concept of topology to the physics of materials: band topology, which classifies different topological insulators and semimetals, and topological defects that represent immutable deviations of a solid lattice from its ideal crystalline form. While these two classes of topological phenomena have generally been treated as separate topics, recent experimental advancements have begun to probe their intricate and surprising interactions, in real materials as well as synthetic metamaterials. Topological lattice defects in topological materials offer a platform to explore a diverse range of novel phenomena, such as topological pumping via topological defects, embedded topological phases, synthetic dimensions, and non-Hermitian skin effects. In this Perspective, we survey the developments in this rapidly moving field, and give an outlook of its impact on materials science and applications.

preprint2021arXiv

Accurate Visual-Inertial SLAM by Feature Re-identification

We propose a novel feature re-identification method for real-time visual-inertial SLAM. The front-end module of the state-of-the-art visual-inertial SLAM methods (e.g. visual feature extraction and matching schemes) relies on feature tracks across image frames, which are easily broken in challenging scenarios, resulting in insufficient visual measurement and accumulated error in pose estimation. In this paper, we propose an efficient drift-less SLAM method by re-identifying existing features from a spatial-temporal sensitive sub-global map. The re-identified features over a long time span serve as augmented visual measurements and are incorporated into the optimization module which can gradually decrease the accumulative error in the long run, and further build a drift-less global map in the system. Extensive experiments show that our feature re-identification method is both effective and efficient. Specifically, when combining the feature re-identification with the state-of-the-art SLAM method [11], our method achieves 67.3% and 87.5% absolute translation error reduction with only a small additional computational cost on two public SLAM benchmark DBs: EuRoC and TUM-VI respectively.

preprint2021arXiv

Anti-UAV: A Large Multi-Modal Benchmark for UAV Tracking

Unmanned Aerial Vehicle (UAV) offers lots of applications in both commerce and recreation. With this, monitoring the operation status of UAVs is crucially important. In this work, we consider the task of tracking UAVs, providing rich information such as location and trajectory. To facilitate research on this topic, we propose a dataset, Anti-UAV, with more than 300 video pairs containing over 580k manually annotated bounding boxes. The releasing of such a large-scale dataset could be a useful initial step in research of tracking UAVs. Furthermore, the advancement of addressing research challenges in Anti-UAV can help the design of anti-UAV systems, leading to better surveillance of UAVs. Besides, a novel approach named dual-flow semantic consistency (DFSC) is proposed for UAV tracking. Modulated by the semantic flow across video sequences, the tracker learns more robust class-level semantic information and obtains more discriminative instance-level features. Experimental results demonstrate that Anti-UAV is very challenging, and the proposed method can effectively improve the tracker's performance. The Anti-UAV benchmark and the code of the proposed approach will be publicly available at https://github.com/ucas-vg/Anti-UAV.

preprint2021arXiv

EDNet: Efficient Disparity Estimation with Cost Volume Combination and Attention-based Spatial Residual

Existing state-of-the-art disparity estimation works mostly leverage the 4D concatenation volume and construct a very deep 3D convolution neural network (CNN) for disparity regression, which is inefficient due to the high memory consumption and slow inference speed. In this paper, we propose a network named EDNet for efficient disparity estimation. Firstly, we construct a combined volume which incorporates contextual information from the squeezed concatenation volume and feature similarity measurement from the correlation volume. The combined volume can be next aggregated by 2D convolutions which are faster and require less memory than 3D convolutions. Secondly, we propose an attention-based spatial residual module to generate attention-aware residual features. The attention mechanism is applied to provide intuitive spatial evidence about inaccurate regions with the help of error maps at multiple scales and thus improve the residual learning efficiency. Extensive experiments on the Scene Flow and KITTI datasets show that EDNet outperforms the previous 3D CNN based works and achieves state-of-the-art performance with significantly faster speed and less memory consumption.

preprint2021arXiv

Fashion Focus: Multi-modal Retrieval System for Video Commodity Localization in E-commerce

Nowadays, live-stream and short video shopping in E-commerce have grown exponentially. However, the sellers are required to manually match images of the selling products to the timestamp of exhibition in the untrimmed video, resulting in a complicated process. To solve the problem, we present an innovative demonstration of multi-modal retrieval system called "Fashion Focus", which enables to exactly localize the product images in the online video as the focuses. Different modality contributes to the community localization, including visual content, linguistic features and interaction context are jointly investigated via presented multi-modal learning. Our system employs two procedures for analysis, including video content structuring and multi-modal retrieval, to automatically achieve accurate video-to-shop matching. Fashion Focus presents a unified framework that can orientate the consumers towards relevant product exhibitions during watching videos and help the sellers to effectively deliver the products over search and recommendation.

preprint2021arXiv

Force-guided High-precision Grasping Control of Fragile and Deformable Objects using sEMG-based Force Prediction

Regulating contact forces with high precision is crucial for grasping and manipulating fragile or deformable objects. We aim to utilize the dexterity of human hands to regulate the contact forces for robotic hands and exploit human sensory-motor synergies in a wearable and non-invasive way. We extracted force information from the electric activities of skeletal muscles during their voluntary contractions through surface electromyography (sEMG). We built a regression model based on a Neural Network to predict the gripping force from the preprocessed sEMG signals and achieved high accuracy (R2 = 0.982). Based on the force command predicted from human muscles, we developed a force-guided control framework, where force control was realized via an admittance controller that tracked the predicted gripping force reference to grasp delicate and deformable objects. We demonstrated the effectiveness of the proposed method on a set of representative fragile and deformable objects from daily life, all of which were successfully grasped without any damage or deformation.

preprint2021arXiv

Generalized cyclotomic mappings: Switching between polynomial, cyclotomic, and wreath product form

This paper is concerned with so-called index $d$ generalized cyclotomic mappings of a finite field $\mathbb{F}_q$, which are functions $\mathbb{F}_q\rightarrow\mathbb{F}_q$ that agree with a suitable monomial function $x\mapsto ax^r$ on each coset of the index $d$ subgroup of $\mathbb{F}_q^{\ast}$. We discuss two important rewriting procedures in the context of generalized cyclotomic mappings and present applications thereof that concern index $d$ generalized cyclotomic permutations of $\mathbb{F}_q$ and pertain to cycle structures, the classification of $(q-1)$-cycles and involutions, as well as inversion.

preprint2021arXiv

Multi-cell NOMA: Coherent Reconfigurable Intelligent Surfaces Model With Stochastic Geometry

Reconfigurable intelligent surfaces (RISs) become promising for enhancing non-orthogonal multiple access (NOMA) systems, i.e., enhancing the channel quality and altering the SIC orders. Invoked by stochastic geometry methods, we investigate the downlink coverage performance of RIS-aided multi-cell NOMA networks. We first derive the RIS-aided channel model, concluding the direct and reflecting links. The analytical results demonstrate that the RIS-aided channel model can be closely modeled as a Gamma distribution. Additionally, interference from other cells is analyzed. Lastly, we derive closed-form coverage probability expressions for the paired NOMA users. Numerical results indicate that 1) although the interference from other cells is enhanced via the RISs, the performance of the RIS-aided user still enhances since the channel quality is strengthened more obviously; and 2) the SIC order can be altered by employing the RISs since the RISs improve the channel quality of the aided user.

preprint2020arXiv

Communication Contention Aware Scheduling of Multiple Deep Learning Training Jobs

Distributed Deep Learning (DDL) has rapidly grown its popularity since it helps boost the training performance on high-performance GPU clusters. Efficient job scheduling is indispensable to maximize the overall performance of the cluster when training multiple jobs simultaneously. However, existing schedulers do not consider the communication contention of multiple communication tasks from different distributed training jobs, which could deteriorate the system performance and prolong the job completion time. In this paper, we first establish a new DDL job scheduling framework which organizes DDL jobs as Directed Acyclic Graphs (DAGs) and considers communication contention between nodes. We then propose an efficient algorithm, LWF-$κ$, to balance the GPU utilization and consolidate the allocated GPUs for each job. When scheduling those communication tasks, we observe that neither avoiding all the contention nor blindly accepting them is optimal to minimize the job completion time. We thus propose a provable algorithm, AdaDUAL, to efficiently schedule those communication tasks. Based on AdaDUAL, we finally propose Ada-SRSF for the DDL job scheduling problem. Simulations on a 64-GPU cluster connected with 10 Gbps Ethernet show that LWF-$κ$ achieves up to $1.59\times$ improvement over the classical first-fit algorithms. More importantly, Ada-SRSF reduces the average job completion time by $20.1\%$ and $36.7\%$, as compared to the SRSF(1) scheme (avoiding all the contention) and the SRSF(2) scheme (blindly accepting all of two-way communication contention) respectively.

preprint2020arXiv

Data Poisoning Attacks on Federated Machine Learning

Federated machine learning which enables resource constrained node devices (e.g., mobile phones and IoT devices) to learn a shared model while keeping the training data local, can provide privacy, security and economic benefits by designing an effective communication protocol. However, the communication protocol amongst different nodes could be exploited by attackers to launch data poisoning attacks, which has been demonstrated as a big threat to most machine learning models. In this paper, we attempt to explore the vulnerability of federated machine learning. More specifically, we focus on attacking a federated multi-task learning framework, which is a federated learning framework via adopting a general multi-task learning framework to handle statistical challenges. We formulate the problem of computing optimal poisoning attacks on federated multi-task learning as a bilevel program that is adaptive to arbitrary choice of target nodes and source attacking nodes. Then we propose a novel systems-aware optimization method, ATTack on Federated Learning (AT2FL), which is efficiency to derive the implicit gradients for poisoned data, and further compute optimal attack strategies in the federated machine learning. Our work is an earlier study that considers issues of data poisoning attack for federated learning. To the end, experimental results on real-world datasets show that federated multi-task learning model is very sensitive to poisoning attacks, when the attackers either directly poison the target nodes or indirectly poison the related nodes by exploiting the communication protocol.

preprint2020arXiv

DyNet: Dynamic Convolution for Accelerating Convolutional Neural Networks

Convolution operator is the core of convolutional neural networks (CNNs) and occupies the most computation cost. To make CNNs more efficient, many methods have been proposed to either design lightweight networks or compress models. Although some efficient network structures have been proposed, such as MobileNet or ShuffleNet, we find that there still exists redundant information between convolution kernels. To address this issue, we propose a novel dynamic convolution method to adaptively generate convolution kernels based on image contents. To demonstrate the effectiveness, we apply dynamic convolution on multiple state-of-the-art CNNs. On one hand, we can reduce the computation cost remarkably while maintaining the performance. For ShuffleNetV2/MobileNetV2/ResNet18/ResNet50, DyNet can reduce 37.0/54.7/67.2/71.3% FLOPs without loss of accuracy. On the other hand, the performance can be largely boosted if the computation cost is maintained. Based on the architecture MobileNetV3-Small/Large, DyNet achieves 70.3/77.1% Top-1 accuracy on ImageNet with an improvement of 2.9/1.9%. To verify the scalability, we also apply DyNet on segmentation task, the results show that DyNet can reduce 69.3% FLOPs while maintaining Mean IoU on segmentation task.

preprint2020arXiv

Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format

Multiplication of a sparse matrix to a dense matrix (SpDM) is widely used in many areas like scientific computing and machine learning. However, existing works under-look the performance optimization of SpDM on modern many-core architectures like GPUs. The storage data structures help sparse matrices store in a memory-saving format, but they bring difficulties in optimizing the performance of SpDM on modern GPUs due to irregular data access of the sparse structure, which results in lower resource utilization and poorer performance. In this paper, we refer to the roofline performance model of GPUs to design an efficient SpDM algorithm called GCOOSpDM, in which we exploit coalescent global memory access, fast shared memory reuse and more operations per byte of global memory traffic. Experiments are evaluated on three Nvidia GPUs (i.e., GTX 980, GTX Titan X Pascal and Tesla P100) with CUDA-8.0 using a large number of matrices including a public dataset and randomly generated matrices. Experimental results show that GCOOSpDM achieves 1.5-8$\times$ speedup over Nvidia's library cuSPARSE in many matrices. We also analyze instruction-level operations on a particular GPU to understand the performance gap between GCOOSpDM and cuSPARSE. The profiled instructions confirm that cuSPARSE spends a lot of time on slow memory access (including DRAM access and L2 cache access), while GCOOSpDM transfers such slow memory access to faster shared memory, which mainly contributes to the performance gain. Results also show that GCOOSpDM would outperform the dense algorithm (cuBLAS) with lower sparsity than cuSPARSE on GPUs.

preprint2020arXiv

FADNet: A Fast and Accurate Network for Disparity Estimation

Deep neural networks (DNNs) have achieved great success in the area of computer vision. The disparity estimation problem tends to be addressed by DNNs which achieve much better prediction accuracy in stereo matching than traditional hand-crafted feature based methods. On one hand, however, the designed DNNs require significant memory and computation resources to accurately predict the disparity, especially for those 3D convolution based networks, which makes it difficult for deployment in real-time applications. On the other hand, existing computation-efficient networks lack expression capability in large-scale datasets so that they cannot make an accurate prediction in many scenarios. To this end, we propose an efficient and accurate deep network for disparity estimation named FADNet with three main features: 1) It exploits efficient 2D based correlation layers with stacked blocks to preserve fast computation; 2) It combines the residual structures to make the deeper model easier to learn; 3) It contains multi-scale predictions so as to exploit a multi-scale weight scheduling training technique to improve the accuracy. We conduct experiments to demonstrate the effectiveness of FADNet on two popular datasets, Scene Flow and KITTI 2015. Experimental results show that FADNet achieves state-of-the-art prediction accuracy, and runs at a significant order of magnitude faster speed than existing 3D models. The codes of FADNet are available at https://github.com/HKBU-HPML/FADNet.

preprint2020arXiv

Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees

To reduce the long training time of large deep neural network (DNN) models, distributed synchronous stochastic gradient descent (S-SGD) is commonly used on a cluster of workers. However, the speedup brought by multiple workers is limited by the communication overhead. Two approaches, namely pipelining and gradient sparsification, have been separately proposed to alleviate the impact of communication overheads. Yet, the gradient sparsification methods can only initiate the communication after the backpropagation, and hence miss the pipelining opportunity. In this paper, we propose a new distributed optimization method named LAGS-SGD, which combines S-SGD with a novel layer-wise adaptive gradient sparsification (LAGS) scheme. In LAGS-SGD, every worker selects a small set of "significant" gradients from each layer independently whose size can be adaptive to the communication-to-computation ratio of that layer. The layer-wise nature of LAGS-SGD opens the opportunity of overlapping communications with computations, while the adaptive nature of LAGS-SGD makes it flexible to control the communication time. We prove that LAGS-SGD has convergence guarantees and it has the same order of convergence rate as vanilla S-SGD under a weak analytical assumption. Extensive experiments are conducted to verify the analytical assumption and the convergence performance of LAGS-SGD. Experimental results on a 16-GPU cluster show that LAGS-SGD outperforms the original S-SGD and existing sparsified S-SGD without losing obvious model accuracy.

preprint2020arXiv

Multi-layer Representation Fusion for Neural Machine Translation

Neural machine translation systems require a number of stacked layers for deep models. But the prediction depends on the sentence representation of the top-most layer with no access to low-level representations. This makes it more difficult to train the model and poses a risk of information loss to prediction. In this paper, we propose a multi-layer representation fusion (MLRF) approach to fusing stacked layers. In particular, we design three fusion functions to learn a better representation from the stack. Experimental results show that our approach yields improvements of 0.92 and 0.56 BLEU points over the strong Transformer baseline on IWSLT German-English and NIST Chinese-English MT tasks respectively. The result is new state-of-the-art in German-English translation.

preprint2020arXiv

Neural Machine Translation with Joint Representation

Though early successes of Statistical Machine Translation (SMT) systems are attributed in part to the explicit modelling of the interaction between any two source and target units, e.g., alignment, the recent Neural Machine Translation (NMT) systems resort to the attention which partially encodes the interaction for efficiency. In this paper, we employ Joint Representation that fully accounts for each possible interaction. We sidestep the inefficiency issue by refining representations with the proposed efficient attention operation. The resulting Reformer models offer a new Sequence-to- Sequence modelling paradigm besides the Encoder-Decoder framework and outperform the Transformer baseline in either the small scale IWSLT14 German-English, English-German and IWSLT15 Vietnamese-English or the large scale NIST12 Chinese-English translation tasks by about 1 BLEU point.We also propose a systematic model scaling approach, allowing the Reformer model to beat the state-of-the-art Transformer in IWSLT14 German-English and NIST12 Chinese-English with about 50% fewer parameters. The code is publicly available at https://github.com/lyy1994/reformer.

preprint2020arXiv

Observation of photonic antichiral edge states

Chiral edge states are a hallmark feature of two-dimensional topological materials. Such states must propagate along the edges of the bulk either clockwise or counterclockwise, and thus produce oppositely propagating edge states along the two parallel edges of a strip sample. However, recent theories have predicted a counterintuitive picture, where the two edge states at the two parallel strip edges can propagate in the same direction; these anomalous topological edge states are named as antichiral edge states. Here we report the experimental observation of antichiral edge states in a gyromagnetic photonic crystal. The crystal consists of gyromagnetic cylinders in a honeycomb lattice, with the two triangular sublattices magnetically biased in opposite directions. With microwave measurement, unique properties of antichiral edge states have been observed directly, which include the titled dispersion, the chiral-like robust propagation in samples with certain shapes, and the scattering into backward bulk states at certain terminations. These results extend and supplement the current understanding of chiral edge states.

preprint2020arXiv

Observation of Protected Photonic Edge States Induced By Real-Space Topological Lattice Defects

Topological defects (TDs) in crystal lattices are elementary lattice imperfections that cannot be removed by local perturbations, due to their real space topology. We show that adding TDs into a valley photonic crystal generates a lattice disclination that acts like a domain wall and hosts topological edge states. The disclination functions as a freeform waveguide connecting a pair of TDs of opposite topological charge. This interplay between the real-space topology of lattice defects and band topology provides a novel scheme to implement large-scale photonic structures with complex arrangements of robust topological waveguides and resonators.

preprint2020arXiv

On Inverses of Permutation Polynomials of Small Degree over Finite Fields

Permutation polynomials (PPs) and their inverses have applications in cryptography, coding theory and combinatorial design theory. In this paper, we make a brief summary of the inverses of PPs of finite fields, and give the inverses of all PPs of degree $\leq 6$ over finite fields $\mathbb{F}_{q}$ for all $q$ and the inverses of all PPs of degree $7$ over $\mathbb{F}_{2^n}$. The explicit inverse of a class of fifth degree PPs is the main result, which is obtained by using Lucas' theorem, some congruences of binomial coefficients, and a known formula for the inverses of PPs of finite fields.

preprint2020arXiv

Rethinking Performance Estimation in Neural Architecture Search

Neural architecture search (NAS) remains a challenging problem, which is attributed to the indispensable and time-consuming component of performance estimation (PE). In this paper, we provide a novel yet systematic rethinking of PE in a resource constrained regime, termed budgeted PE (BPE), which precisely and effectively estimates the performance of an architecture sampled from an architecture space. Since searching an optimal BPE is extremely time-consuming as it requires to train a large number of networks for evaluation, we propose a Minimum Importance Pruning (MIP) approach. Given a dataset and a BPE search space, MIP estimates the importance of hyper-parameters using random forest and subsequently prunes the minimum one from the next iteration. In this way, MIP effectively prunes less important hyper-parameters to allocate more computational resource on more important ones, thus achieving an effective exploration. By combining BPE with various search algorithms including reinforcement learning, evolution algorithm, random search, and differentiable architecture search, we achieve 1, 000x of NAS speed up with a negligible performance drop comparing to the SOTA

preprint2019arXiv

Exceptional Cones in 4D Parameter Space

The notion of synthetic dimensions has expanded the realm of topological physics to four dimensional (4D) space lately. In this work, non-Hermiticity is used as a synthetic parameter in PT-symmetric photonic crystals to study the topological physics in 4D non-Hermitian synthetic parameter space. We realize a 3D exceptional hypersurface (EHS) in such 4D parameter space, and the degeneracy points emerge due to the symmetry of synthetic parameters. We further demonstrate the existence of exceptional degenerate points (EDPs) on the EHS that originates from the chirality of exceptional points (EPs), and the exceptional surface near EDPs behaves like a Dirac cone. We further show that a very narrow reflection plateau can be found near these EDPs, and their sensitivity towards the PT-symmetry breaking environmental perturbation can make these degeneracy points useful in optical sensing and many other nonlinear and quantum optical applications.

preprint2019arXiv

Non-Hermitian Dirac Cones

Non-Hermitian systems, which contain gain or loss, commonly host exceptional point degeneracies rather than the diabolic points found in Hermitian systems. We present a class of non-Hermitian lattice models with symmetry-stabilized diabolic points, such as Dirac or Weyl points. They exhibit non-Hermiticity-induced phenomena previously existing in the Hermitian regime, including topological phase transitions, Landau levels induced by pseudo-magnetic fields, and Fermi arc surface states. These behaviors are controllable via gain and loss, with promising applications in tunable active topological devices.

preprint2019arXiv

Wall Crossing Structures and Application to SU(3) Seiberg-Witten Integrable system

We apply the wall crossing structure formalism of Kontsevich and Soibelman to Seiberg-Witten integrable systems associated to pure $SU(3)$. This gives an algorithm for computing the Donaldson-Thomas invariants, which correspond to BPS degeneracy of the corresponding BPS states in physics. The main ingredients of this algorithm are the use of split attractor flows and Kontsevich Soibelman wall crossing formulas. Besides the known BPS spectrum in pure $SU(3)$ case, we obtain new family of BPS states with BPS-invariants equal to 2.