Source author record

Li Liu

Li Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

80works

43topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Robust Alignment: Harmonizing Clean Accuracy and Adversarial Robustness in Adversarial Training

Adversarial Training (AT) is one of the most effective methods for developing robust deep neural networks (DNNs). However, AT faces a trade-off problem between clean accuracy and adversarial robustness. In this work, we reveal a surprising phenomenon for the first time: Varying input perturbation intensities for training samples near decision boundaries in AT have minimal impact on model robustness. This finding directly exposes the inconsistency between accuracy and robustness score fluctuations, leading us to identify the misalignment between input and latent spaces as a critical driver of the robustness-accuracy trade-off. To mitigate this misalignment for harmonizing accuracy and robustness, we define Robust Alignment as a new AT target, encouraging the model perception to change with input perturbations provided the final label prediction remains unchanged, which can be achieved via two novel ideas. First, we suggest a reduced and fixed perturbation intensity for those boundary samples, which facilitates the model to utilize the perturbations as learnable patterns, instead of noises that complicate decision boundaries meaninglessly. Second, we propose a Domain Interpolation Consistency Adversarial Regularization (DICAR), based on rigorous theoretical derivations, which explicitly introduces semantic alignment between input and latent spaces into AT. Based on these two ideas, we end up with a new Robust Alignment Adversarial Training (RAAT) method, effectively harmonizing accuracy and robustness. Extensive experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet with ResNet-18, PreActResNet-18, and WideResNet-28-10 demonstrate the effectiveness of RAAT in improving the trade-off beyond four common baselines and a total of 14 related state-of-the-art (SOTA) works.

preprint2024arXiv

Attacks in Adversarial Machine Learning: A Systematic Survey from the Life-cycle Perspective

Adversarial machine learning (AML) studies the adversarial phenomenon of machine learning, which may make inconsistent or unexpected predictions with humans. Some paradigms have been recently developed to explore this adversarial phenomenon occurring at different stages of a machine learning system, such as backdoor attack occurring at the pre-training, in-training and inference stage; weight attack occurring at the post-training, deployment and inference stage; adversarial attack occurring at the inference stage. However, although these adversarial paradigms share a common goal, their developments are almost independent, and there is still no big picture of AML. In this work, we aim to provide a unified perspective to the AML community to systematically review the overall progress of this field. We firstly provide a general definition about AML, and then propose a unified mathematical framework to covering existing attack paradigms. According to the proposed unified framework, we build a full taxonomy to systematically categorize and review existing representative methods for each paradigm. Besides, using this unified framework, it is easy to figure out the connections and differences among different attack paradigms, which may inspire future researchers to develop more advanced attack paradigms. Finally, to facilitate the viewing of the built taxonomy and the related literature in adversarial machine learning, we further provide a website, \ie, \url{http://adversarial-ml.com}, where the taxonomies and literature will be continuously updated.

preprint2023arXiv

BS3D: Building-scale 3D Reconstruction from RGB-D Images

Various datasets have been proposed for simultaneous localization and mapping (SLAM) and related problems. Existing datasets often include small environments, have incomplete ground truth, or lack important sensor data, such as depth and infrared images. We propose an easy-to-use framework for acquiring building-scale 3D reconstruction using a consumer depth camera. Unlike complex and expensive acquisition setups, our system enables crowd-sourcing, which can greatly benefit data-hungry algorithms. Compared to similar systems, we utilize raw depth maps for odometry computation and loop closure refinement which results in better reconstructions. We acquire a building-scale 3D dataset (BS3D) and demonstrate its value by training an improved monocular depth estimation model. As a unique experiment, we benchmark visual-inertial odometry methods using both color and active infrared images.

preprint2022arXiv

A weighted first-order formulation for solving anisotropic diffusion equations with deep neural networks

In this paper, a new weighted first-order formulation is proposed for solving the anisotropic diffusion equations with deep neural networks. For many numerical schemes, the accurate approximation of anisotropic heat flux is crucial for the overall accuracy. In this work, the heat flux is firstly decomposed into two components along the two eigenvectors of the diffusion tensor, thus the anisotropic heat flux approximation is converted into the approximation of two isotropic components. Moreover, to handle the possible jump of the diffusion tensor across the interface, the weighted first-order formulation is obtained by multiplying this first-order formulation by a weighted function. By the decaying property of the weighted function, the weighted first-order formulation is always well-defined in the pointwise way. Finally, the weighted first-order formulation is solved with deep neural network approximation. Compared to the neural network approximation with the original second-order elliptic formulation, the proposed method can significantly improve the accuracy, especially for the discontinuous anisotropic diffusion problems.

preprint2022arXiv

Acoustic-to-articulatory Inversion based on Speech Decomposition and Auxiliary Feature

Acoustic-to-articulatory inversion (AAI) is to obtain the movement of articulators from speech signals. Until now, achieving a speaker-independent AAI remains a challenge given the limited data. Besides, most current works only use audio speech as input, causing an inevitable performance bottleneck. To solve these problems, firstly, we pre-train a speech decomposition network to decompose audio speech into speaker embedding and content embedding as the new personalized speech features to adapt to the speaker-independent case. Secondly, to further improve the AAI, we propose a novel auxiliary feature network to estimate the lip auxiliary features from the above personalized speech features. Experimental results on three public datasets show that, compared with the state-of-the-art only using the audio speech feature, the proposed method reduces the average RMSE by 0.25 and increases the average correlation coefficient by 2.0% in the speaker-dependent case. More importantly, the average RMSE decreases by 0.29 and the average correlation coefficient increases by 5.0% in the speaker-independent case.

preprint2022arXiv

Advanced Conditional Variational Autoencoders (A-CVAE): Towards interpreting open-domain conversation generation via disentangling latent feature representation

Currently end-to-end deep learning based open-domain dialogue systems remain black box models, making it easy to generate irrelevant contents with data-driven models. Specifically, latent variables are highly entangled with different semantics in the latent space due to the lack of priori knowledge to guide the training. To address this problem, this paper proposes to harness the generative model with a priori knowledge through a cognitive approach involving mesoscopic scale feature disentanglement. Particularly, the model integrates the macro-level guided-category knowledge and micro-level open-domain dialogue data for the training, leveraging the priori knowledge into the latent space, which enables the model to disentangle the latent variables within the mesoscopic scale. Besides, we propose a new metric for open-domain dialogues, which can objectively evaluate the interpretability of the latent space distribution. Finally, we validate our model on different datasets and experimentally demonstrate that our model is able to generate higher quality and more interpretable dialogues than other models.

preprint2022arXiv

Anti-Forgery: Towards a Stealthy and Robust DeepFake Disruption Attack via Adversarial Perceptual-aware Perturbations

DeepFake is becoming a real risk to society and brings potential threats to both individual privacy and political security due to the DeepFaked multimedia are realistic and convincing. However, the popular DeepFake passive detection is an ex-post forensics countermeasure and failed in blocking the disinformation spreading in advance. To address this limitation, researchers study the proactive defense techniques by adding adversarial noises into the source data to disrupt the DeepFake manipulation. However, the existing studies on proactive DeepFake defense via injecting adversarial noises are not robust, which could be easily bypassed by employing simple image reconstruction revealed in a recent study MagDR. In this paper, we investigate the vulnerability of the existing forgery techniques and propose a novel \emph{anti-forgery} technique that helps users protect the shared facial images from attackers who are capable of applying the popular forgery techniques. Our proposed method generates perceptual-aware perturbations in an incessant manner which is vastly different from the prior studies by adding adversarial noises that is sparse. Experimental results reveal that our perceptual-aware perturbations are robust to diverse image transformations, especially the competitive evasion technique, MagDR via image reconstruction. Our findings potentially open up a new research direction towards thorough understanding and investigation of perceptual-aware adversarial attack for protecting facial images against DeepFakes in a proactive and robust manner. We open-source our tool to foster future research. Code is available at https://github.com/AbstractTeen/AntiForgery/.

preprint2022arXiv

Attentional Feature Refinement and Alignment Network for Aircraft Detection in SAR Imagery

Aircraft detection in Synthetic Aperture Radar (SAR) imagery is a challenging task in SAR Automatic Target Recognition (SAR ATR) areas due to aircraft's extremely discrete appearance, obvious intraclass variation, small size and serious background's interference. In this paper, a single-shot detector namely Attentional Feature Refinement and Alignment Network (AFRAN) is proposed for detecting aircraft in SAR images with competitive accuracy and speed. Specifically, three significant components including Attention Feature Fusion Module (AFFM), Deformable Lateral Connection Module (DLCM) and Anchor-guided Detection Module (ADM), are carefully designed in our method for refining and aligning informative characteristics of aircraft. To represent characteristics of aircraft with less interference, low-level textural and high-level semantic features of aircraft are fused and refined in AFFM throughly. The alignment between aircraft's discrete back-scatting points and convolutional sampling spots is promoted in DLCM. Eventually, the locations of aircraft are predicted precisely in ADM based on aligned features revised by refined anchors. To evaluate the performance of our method, a self-built SAR aircraft sliced dataset and a large scene SAR image are collected. Extensive quantitative and qualitative experiments with detailed analysis illustrate the effectiveness of the three proposed components. Furthermore, the topmost detection accuracy and competitive speed are achieved by our method compared with other domain-specific,e.g., DAPN, PADN, and general CNN-based methods,e.g., FPN, Cascade R-CNN, SSD, RefineDet and RPDet.

preprint2022arXiv

Constant gap between conventional strategies and those based on C*-dynamics for self-embezzlement

We consider a bipartite transformation that we call self-embezzlement and use it to prove a constant gap between the capabilities of two models of quantum information: the conventional model, where bipartite systems are represented by tensor products of Hilbert spaces; and a natural model of quantum information processing for abstract states on C*-algebras, where joint systems are represented by tensor products of C*-algebras. We call this the C*-circuit model and show that it is a special case of the commuting-operator model (in that it can be translated into such a model). For the conventional model, we show that there exists a constant $ε_0 > 0$ such that self-embezzlement cannot be achieved with precision parameter less than $ε_0$ (i.e., the fidelity cannot be greater than $1 - ε_0$); whereas, in the C*-circuit model -- as well as in a commuting-operator model -- the precision can be $0$ (i.e., fidelity~$1$).

preprint2022arXiv

Decoupling Makes Weakly Supervised Local Feature Better

Weakly supervised learning can help local feature methods to overcome the obstacle of acquiring a large-scale dataset with densely labeled correspondences. However, since weak supervision cannot distinguish the losses caused by the detection and description steps, directly conducting weakly supervised learning within a joint describe-then-detect pipeline suffers limited performance. In this paper, we propose a decoupled describe-then-detect pipeline tailored for weakly supervised local feature learning. Within our pipeline, the detection step is decoupled from the description step and postponed until discriminative and robust descriptors are learned. In addition, we introduce a line-to-window search strategy to explicitly use the camera pose information for better descriptor learning. Extensive experiments show that our method, namely PoSFeat (Camera Pose Supervised Feature), outperforms previous fully and weakly supervised methods and achieves state-of-the-art performance on a wide range of downstream tasks.

preprint2022arXiv

Efficient Video Transformers with Spatial-Temporal Token Selection

Video transformers have achieved impressive results on major video recognition benchmarks, which however suffer from high computational cost. In this paper, we present STTS, a token selection framework that dynamically selects a few informative tokens in both temporal and spatial dimensions conditioned on input video samples. Specifically, we formulate token selection as a ranking problem, which estimates the importance of each token through a lightweight scorer network and only those with top scores will be used for downstream evaluation. In the temporal dimension, we keep the frames that are most relevant to the action categories, while in the spatial dimension, we identify the most discriminative region in feature maps without affecting the spatial context used in a hierarchical way in most video transformers. Since the decision of token selection is non-differentiable, we employ a perturbed-maximum based differentiable Top-K operator for end-to-end training. We mainly conduct extensive experiments on Kinetics-400 with a recently introduced video transformer backbone, MViT. Our framework achieves similar results while requiring 20% less computation. We also demonstrate our approach is generic for different transformer architectures and video datasets. Code is available at https://github.com/wangjk666/STTS.

preprint2022arXiv

Exploring Inter-Channel Correlation for Diversity-preserved KnowledgeDistillation

Knowledge Distillation has shown very promising abil-ity in transferring learned representation from the largermodel (teacher) to the smaller one (student).Despitemany efforts, prior methods ignore the important role ofretaining inter-channel correlation of features, leading tothe lack of capturing intrinsic distribution of the featurespace and sufficient diversity properties of features in theteacher network.To solve the issue, we propose thenovel Inter-Channel Correlation for Knowledge Distillation(ICKD), with which the diversity and homology of the fea-ture space of the student network can align with that ofthe teacher network. The correlation between these twochannels is interpreted as diversity if they are irrelevantto each other, otherwise homology. Then the student isrequired to mimic the correlation within its own embed-ding space. In addition, we introduce the grid-level inter-channel correlation, making it capable of dense predictiontasks. Extensive experiments on two vision tasks, includ-ing ImageNet classification and Pascal VOC segmentation,demonstrate the superiority of our ICKD, which consis-tently outperforms many existing methods, advancing thestate-of-the-art in the fields of Knowledge Distillation. Toour knowledge, we are the first method based on knowl-edge distillation boosts ResNet18 beyond 72% Top-1 ac-curacy on ImageNet classification. Code is available at:https://github.com/ADLab-AutoDrive/ICKD.

preprint2022arXiv

Graph Signal Processing for Heterogeneous Change Detection Part I: Vertex Domain Filtering

This paper provides a new strategy for the Heterogeneous Change Detection (HCD) problem: solving HCD from the perspective of Graph Signal Processing (GSP). We construct a graph for each image to capture the structure information, and treat each image as the graph signal. In this way, we convert the HCD into a GSP problem: a comparison of the responses of the two signals on different systems defined on the two graphs, which attempts to find structural differences (Part I) and signal differences (Part II) due to the changes between heterogeneous images. In this first part, we analyze the HCD with GSP from the vertex domain. We first show that for the unchanged images, their structures are consistent, and then the outputs of the same signal on systems defined on the two graphs are similar. However, once a region has changed, the local structure of the image changes, i.e., the connectivity of the vertex containing this region changes. Then, we can compare the output signals of the same input graph signal passing through filters defined on the two graphs to detect changes. We design different filters from the vertex domain, which can flexibly explore the high-order neighborhood information hidden in original graphs. We also analyze the detrimental effects of changing regions on the change detection results from the viewpoint of signal propagation. Experiments conducted on seven real data sets show the effectiveness of the vertex domain filtering based HCD method.

preprint2022arXiv

Graph Signal Processing for Heterogeneous Change Detection Part II: Spectral Domain Analysis

This is the second part of the paper that provides a new strategy for the heterogeneous change detection (HCD) problem, that is, solving HCD from the perspective of graph signal processing (GSP). We construct a graph to represent the structure of each image, and treat each image as a graph signal defined on the graph. In this way, we can convert the HCD problem into a comparison of responses of signals on systems defined on the graphs. In the part I, the changes are measured by comparing the structure difference between the graphs from the vertex domain. In this part II, we analyze the GSP for HCD from the spectral domain. We first analyze the spectral properties of the different images on the same graph, and show that their spectra exhibit commonalities and dissimilarities. Specially, it is the change that leads to the dissimilarities of their spectra. Then, we propose a regression model for the HCD, which decomposes the source signal into the regressed signal and changed signal, and requires the regressed signal have the same spectral property as the target signal on the same graph. With the help of graph spectral analysis, the proposed regression model is flexible and scalable. Experiments conducted on seven real data sets show the effectiveness of the proposed method.

preprint2022arXiv

HQANN: Efficient and Robust Similarity Search for Hybrid Queries with Structured and Unstructured Constraints

The in-memory approximate nearest neighbor search (ANNS) algorithms have achieved great success for fast high-recall query processing, but are extremely inefficient when handling hybrid queries with unstructured (i.e., feature vectors) and structured (i.e., related attributes) constraints. In this paper, we present HQANN, a simple yet highly efficient hybrid query processing framework which can be easily embedded into existing proximity graph-based ANNS algorithms. We guarantee both low latency and high recall by leveraging navigation sense among attributes and fusing vector similarity search with attribute filtering. Experimental results on both public and in-house datasets demonstrate that HQANN is 10x faster than the state-of-the-art hybrid ANNS solutions to reach the same recall quality and its performance is hardly affected by the complexity of attributes. It can reach 99\% recall@10 in just around 50 microseconds On GLOVE-1.2M with thousands of attribute constraints.

preprint2022arXiv

Investigate the Essence of Long-Tailed Recognition from a Unified Perspective

As the data scale grows, deep recognition models often suffer from long-tailed data distributions due to the heavy imbalanced sample number across categories. Indeed, real-world data usually exhibit some similarity relation among different categories (e.g., pigeons and sparrows), called category similarity in this work. It is doubly difficult when the imbalance occurs between such categories with similar appearances. However, existing solutions mainly focus on the sample number to re-balance data distribution. In this work, we systematically investigate the essence of the long-tailed problem from a unified perspective. Specifically, we demonstrate that long-tailed recognition suffers from both sample number and category similarity. Intuitively, using a toy example, we first show that sample number is not the unique influence factor for performance dropping of long-tailed recognition. Theoretically, we demonstrate that (1) category similarity, as an inevitable factor, would also influence the model learning under long-tailed distribution via similar samples, (2) using more discriminative representation methods (e.g., self-supervised learning) for similarity reduction, the classifier bias can be further alleviated with greatly improved performance. Extensive experiments on several long-tailed datasets verify the rationality of our theoretical analysis, and show that based on existing state-of-the-arts (SOTAs), the performance could be further improved by similarity reduction. Our investigations highlight the essence behind the long-tailed problem, and claim several feasible directions for future work.

preprint2022arXiv

Median Pixel Difference Convolutional Network for Robust Face Recognition

Face recognition is one of the most active tasks in computer vision and has been widely used in the real world. With great advances made in convolutional neural networks (CNN), lots of face recognition algorithms have achieved high accuracy on various face datasets. However, existing face recognition algorithms based on CNNs are vulnerable to noise. Noise corrupted image patterns could lead to false activations, significantly decreasing face recognition accuracy in noisy situations. To equip CNNs with built-in robustness to noise of different levels, we proposed a Median Pixel Difference Convolutional Network (MeDiNet) by replacing some traditional convolutional layers with the proposed novel Median Pixel Difference Convolutional Layer (MeDiConv) layer. The proposed MeDiNet integrates the idea of traditional multiscale median filtering with deep CNNs. The MeDiNet is tested on the four face datasets (LFW, CA-LFW, CP-LFW, and YTF) with versatile settings on blur kernels, noise intensities, scales, and JPEG quality factors. Extensive experiments show that our MeDiNet can effectively remove noisy pixels in the feature map and suppress the negative impact of noise, leading to achieving limited accuracy loss under these practical noises compared with the standard CNN under clean conditions.

preprint2022arXiv

Noncollinear Antiferromagnetic Spintronics

Antiferromagnetic spintronics is one of the leading candidates for next-generation electronics. Among abundant antiferromagnets, noncollinear antiferromagnets are promising for achieving practical applications due to coexisting ferromagnetic and antiferromagnetic merits. In this perspective, we briefly review the recent progress in the emerging noncollinear antiferromagnetic spintronics from fundamental physics to device applications. Current challenges and future research directions for this field are also discussed.

preprint2022arXiv

Residual-guided Personalized Speech Synthesis based on Face Image

Previous works derive personalized speech features by training the model on a large dataset composed of his/her audio sounds. It was reported that face information has a strong link with the speech sound. Thus in this work, we innovatively extract personalized speech features from human faces to synthesize personalized speech using neural vocoder. A Face-based Residual Personalized Speech Synthesis Model (FR-PSS) containing a speech encoder, a speech synthesizer and a face encoder is designed for PSS. In this model, by designing two speech priors, a residual-guided strategy is introduced to guide the face feature to approach the true speech feature in the training. Moreover, considering the error of feature's absolute values and their directional bias, we formulate a novel tri-item loss function for face encoder. Experimental results show that the speech synthesized by our model is comparable to the personalized speech synthesized by training a large amount of audio data in previous works.

preprint2022arXiv

Rethinking Few-Shot Class-Incremental Learning with Open-Set Hypothesis in Hyperbolic Geometry

Few-Shot Class-Incremental Learning (FSCIL) aims at incrementally learning novel classes from a few labeled samples by avoiding the overfitting and catastrophic forgetting simultaneously. The current protocol of FSCIL is built by mimicking the general class-incremental learning setting, while it is not totally appropriate due to the different data configuration, i.e., novel classes are all in the limited data regime. In this paper, we rethink the configuration of FSCIL with the open-set hypothesis by reserving the possibility in the first session for incoming categories. To assign better performances on both close-set and open-set recognition to the model, Hyperbolic Reciprocal Point Learning module (Hyper-RPL) is built on Reciprocal Point Learning (RPL) with hyperbolic neural networks. Besides, for learning novel categories from limited labeled data, we incorporate a hyperbolic metric learning (Hyper-Metric) module into the distillation-based framework to alleviate the overfitting issue and better handle the trade-off issue between the preservation of old knowledge and the acquisition of new knowledge. The comprehensive assessments of the proposed configuration and modules on three benchmark datasets are executed to validate the effectiveness concerning three evaluation indicators.

preprint2022arXiv

Several classes of optimal ternary cyclic codes

Cyclic codes have efficient encoding and decoding algorithms over finite fields, so that they have practical applications in communication systems, consumer electronics and data storage systems. The objective of this paper is to give eight new classes of optimal ternary cyclic codes with parameters $[3^m-1,3^m-1-2m,4]$, according to a result on the non-existence of solutions to a certain equation over $F_{3^m}$. It is worth noticing that some recent conclusions on such optimal ternary cyclic codes are some special cases of our work. More importantly, three of the nine open problems proposed by Ding and Helleseth in [8] are solved completely. In addition, another one among the nine open problems is also promoted.

preprint2022arXiv

SFS: Smart OS Scheduling for Serverless Functions

Serverless computing enables a new way of building and scaling cloud applications by allowing developers to write fine-grained serverless or cloud functions. The execution duration of a cloud function is typically short-ranging from a few milliseconds to hundreds of seconds. However, due to resource contentions caused by public clouds' deep consolidation, the function execution duration may get significantly prolonged and fail to accurately account for the function's true resource usage. We observe that the function duration can be highly unpredictable with huge amplification of more than 50x for an open-source FaaS platform (OpenLambda). Our experiments show that the OS scheduling policy of cloud functions' host server can have a crucial impact on performance. The default Linux scheduler, CFS (Completely Fair Scheduler), being oblivious to workloads, frequently context-switches short functions, causing a turnaround time that is much longer than their service time. We propose SFS (Smart Function Scheduler),which works entirely in the user space and carefully orchestrates existing Linux FIFO and CFS schedulers to approximate Shortest Remaining Time First (SRTF). SFS uses two-level scheduling that seamlessly combines a new FILTER policy with Linux CFS, to trade off increased duration of long functions for significant performance improvement for short functions. We implement {\proj} in the Linux user space and port it to OpenLambda. Evaluation results show that SFS significantly improves short functions' duration with a small impact on relatively longer functions, compared to CFS.

preprint2022arXiv

WebUAV-3M: A Benchmark for Unveiling the Power of Million-Scale Deep UAV Tracking

Unmanned aerial vehicle (UAV) tracking is of great significance for a wide range of applications, such as delivery and agriculture. Previous benchmarks in this area mainly focused on small-scale tracking problems while ignoring the amounts of data, types of data modalities, diversities of target categories and scenarios, and evaluation protocols involved, greatly hiding the massive power of deep UAV tracking. In this work, we propose WebUAV-3M, the largest public UAV tracking benchmark to date, to facilitate both the development and evaluation of deep UAV trackers. WebUAV-3M contains over 3.3 million frames across 4,500 videos and offers 223 highly diverse target categories. Each video is densely annotated with bounding boxes by an efficient and scalable semiautomatic target annotation (SATA) pipeline. Importantly, to take advantage of the complementary superiority of language and audio, we enrich WebUAV-3M by innovatively providing both natural language specifications and audio descriptions. We believe that such additions will greatly boost future research in terms of exploring language features and audio cues for multimodal UAV tracking. In addition, a fine-grained UAV tracking-under-scenario constraint (UTUSC) evaluation protocol and seven challenging scenario subtest sets are constructed to enable the community to develop, adapt and evaluate various types of advanced trackers. We provide extensive evaluations and detailed analyses of 43 representative trackers and envision future research directions in the field of deep UAV tracking and beyond. The dataset, toolkits and baseline results are available at \url{https://github.com/983632847/WebUAV-3M}.

preprint2022arXiv

WiVelo: Fine-grained Walking Velocity Estimation for Wi-Fi Passive Tracking

Passive human tracking via Wi-Fi has been researched broadly in the past decade. Besides straight-forward anchor point localization, velocity is another vital sign adopted by the existing approaches to infer user trajectory. However, state-of-the-art Wi-Fi velocity estimation relies on Doppler-Frequency-Shift (DFS) which suffers from the inevitable signal noise incurring unbounded velocity errors, further degrading the tracking accuracy. In this paper, we present WiVelo\footnote{Code\&datasets are available at \textit{https://github.com/liecn/WiVelo\_SECON22}} that explores new spatial-temporal signal correlation features observed from different antennas to achieve accurate velocity estimation. First, we use subcarrier shift distribution (SSD) extracted from channel state information (CSI) to define two correlation features for direction and speed estimation, separately. Then, we design a mesh model calculated by the antennas' locations to enable a fine-grained velocity estimation with bounded direction error. Finally, with the continuously estimated velocity, we develop an end-to-end trajectory recovery algorithm to mitigate velocity outliers with the property of walking velocity continuity. We implement WiVelo on commodity Wi-Fi hardware and extensively evaluate its tracking accuracy in various environments. The experimental results show our median and 90\% tracking errors are 0.47~m and 1.06~m, which are half and a quarter of state-of-the-arts.

preprint2022arXiv

WL-Align: Weisfeiler-Lehman Relabeling for Aligning Users across Networks via Regularized Representation Learning

Aligning users across networks using graph representation learning has been found effective where the alignment is accomplished in a low-dimensional embedding space. Yet, achieving highly precise alignment is still challenging, especially when nodes with long-range connectivity to the labeled anchors are encountered. To alleviate this limitation, we purposefully designed WL-Align which adopts a regularized representation learning framework to learn distinctive node representations. It extends the Weisfeiler-Lehman Isormorphism Test and learns the alignment in alternating phases of "across-network Weisfeiler-Lehman relabeling" and "proximity-preserving representation learning". The across-network Weisfeiler-Lehman relabeling is achieved through iterating the anchor-based label propagation and a similarity-based hashing to exploit the known anchors' connectivity to different nodes in an efficient and robust manner. The representation learning module preserves the second-order proximity within individual networks and is regularized by the across-network Weisfeiler-Lehman hash labels. Extensive experiments on real-world and synthetic datasets have demonstrated that our proposed WL-Align outperforms the state-of-the-art methods, achieving significant performance improvements in the "exact matching" scenario. Data and code of WL-Align are available at https://github.com/ChenPengGang/WLAlignCode.

preprint2021arXiv

Deep Learning for Scene Classification: A Survey

Scene classification, aiming at classifying a scene image to one of the predefined scene categories by comprehending the entire image, is a longstanding, fundamental and challenging problem in computer vision. The rise of large-scale datasets, which constitute the corresponding dense sampling of diverse real-world scenes, and the renaissance of deep learning techniques, which learn powerful feature representations directly from big raw data, have been bringing remarkable progress in the field of scene representation and classification. To help researchers master needed advances in this field, the goal of this paper is to provide a comprehensive survey of recent achievements in scene classification using deep learning. More than 200 major publications are included in this survey covering different aspects of scene classification, including challenges, benchmark datasets, taxonomy, and quantitative performance comparisons of the reviewed methods. In retrospect of what has been achieved so far, this paper is also concluded with a list of promising research opportunities.

preprint2021arXiv

How to Train Your Agent to Read and Write

Reading and writing research papers is one of the most privileged abilities that a qualified researcher should master. However, it is difficult for new researchers (\eg{students}) to fully {grasp} this ability. It would be fascinating if we could train an intelligent agent to help people read and summarize papers, and perhaps even discover and exploit the potential knowledge clues to write novel papers. Although there have been existing works focusing on summarizing (\emph{i.e.}, reading) the knowledge in a given text or generating (\emph{i.e.}, writing) a text based on the given knowledge, the ability of simultaneously reading and writing is still under development. Typically, this requires an agent to fully understand the knowledge from the given text materials and generate correct and fluent novel paragraphs, which is very challenging in practice. In this paper, we propose a Deep ReAder-Writer (DRAW) network, which consists of a \textit{Reader} that can extract knowledge graphs (KGs) from input paragraphs and discover potential knowledge, a graph-to-text \textit{Writer} that generates a novel paragraph, and a \textit{Reviewer} that reviews the generated paragraph from three different aspects. Extensive experiments show that our DRAW network outperforms considered baselines and several state-of-the-art methods on AGENDA and M-AGENDA datasets. Our code and supplementary are released at https://github.com/menggehe/DRAW.

preprint2021arXiv

Semi-Supervised Active Learning for COVID-19 Lung Ultrasound Multi-symptom Classification

Ultrasound (US) is a non-invasive yet effective medical diagnostic imaging technique for the COVID-19 global pandemic. However, due to complex feature behaviors and expensive annotations of US images, it is difficult to apply Artificial Intelligence (AI) assisting approaches for lung's multi-symptom (multi-label) classification. To overcome these difficulties, we propose a novel semi-supervised Two-Stream Active Learning (TSAL) method to model complicated features and reduce labeling costs in an iterative procedure. The core component of TSAL is the multi-label learning mechanism, in which label correlations information is used to design multi-label margin (MLM) strategy and confidence validation for automatically selecting informative samples and confident labels. On this basis, a multi-symptom multi-label (MSML) classification network is proposed to learn discriminative features of lung symptoms, and a human-machine interaction is exploited to confirm the final annotations that are used to fine-tune MSML with progressively labeled data. Moreover, a novel lung US dataset named COVID19-LUSMS is built, currently containing 71 clinical patients with 6,836 images sampled from 678 videos. Experimental evaluations show that TSAL using only 20% data can achieve superior performance to the baseline and the state-of-the-art. Qualitatively, visualization of both attention map and sample distribution confirms the good consistency with the clinic knowledge.

Li Liu

What is connected

Connect this record

See the researcher in context

Building this map preview

80 published item(s)

Robust Alignment: Harmonizing Clean Accuracy and Adversarial Robustness in Adversarial Training

Attacks in Adversarial Machine Learning: A Systematic Survey from the Life-cycle Perspective

BS3D: Building-scale 3D Reconstruction from RGB-D Images

A weighted first-order formulation for solving anisotropic diffusion equations with deep neural networks

Acoustic-to-articulatory Inversion based on Speech Decomposition and Auxiliary Feature

Advanced Conditional Variational Autoencoders (A-CVAE): Towards interpreting open-domain conversation generation via disentangling latent feature representation

Anti-Forgery: Towards a Stealthy and Robust DeepFake Disruption Attack via Adversarial Perceptual-aware Perturbations

Attentional Feature Refinement and Alignment Network for Aircraft Detection in SAR Imagery

Constant gap between conventional strategies and those based on C*-dynamics for self-embezzlement

Decoupling Makes Weakly Supervised Local Feature Better

Efficient Video Transformers with Spatial-Temporal Token Selection

Exploring Inter-Channel Correlation for Diversity-preserved KnowledgeDistillation

Graph Signal Processing for Heterogeneous Change Detection Part I: Vertex Domain Filtering

Graph Signal Processing for Heterogeneous Change Detection Part II: Spectral Domain Analysis

HQANN: Efficient and Robust Similarity Search for Hybrid Queries with Structured and Unstructured Constraints

Investigate the Essence of Long-Tailed Recognition from a Unified Perspective

Median Pixel Difference Convolutional Network for Robust Face Recognition

Noncollinear Antiferromagnetic Spintronics

Residual-guided Personalized Speech Synthesis based on Face Image

Rethinking Few-Shot Class-Incremental Learning with Open-Set Hypothesis in Hyperbolic Geometry

Several classes of optimal ternary cyclic codes

SFS: Smart OS Scheduling for Serverless Functions

WebUAV-3M: A Benchmark for Unveiling the Power of Million-Scale Deep UAV Tracking

WiVelo: Fine-grained Walking Velocity Estimation for Wi-Fi Passive Tracking

WL-Align: Weisfeiler-Lehman Relabeling for Aligning Users across Networks via Regularized Representation Learning

Deep Learning for Scene Classification: A Survey

How to Train Your Agent to Read and Write

Semi-Supervised Active Learning for COVID-19 Lung Ultrasound Multi-symptom Classification

An Investigation into the Stochasticity of Batch Whitening

Attention-based Residual Speech Portrait Model for Speech to Face Generation

Auto-Encoding Twin-Bottleneck Hashing

Combinatorial Laser Molecular Beam Epitaxy System Integrated with Specialized Low-temperature Scanning Tunneling Microscopy

Controllable Orthogonalization in Training DNNs

Data Dissemination Using Interest Tree in Socially Aware Networking

Deep Learning for 3D Point Clouds: A Survey

Deep Video Super-Resolution using HR Optical Flow Estimation

Dynamic Group Convolution for Accelerating Convolutional Neural Networks

FTBNN: Rethinking Non-linearity for 1-bit CNNs and Going Beyond

Improved Residual Networks for Image and Video Recognition

JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image

Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs

Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog

NLH: A Blind Pixel-level Non-local Method for Real-world Image Denoising

Nonconvex Nonsmooth Low-Rank Minimization for Generalized Image Compressed Sensing via Group Sparse Representation

PIS: A Multi-dimensional Routing Protocol for Socially-aware Networking

Predicting Long-Term Skeletal Motions by a Spatio-Temporal Hierarchical Recurrent Network

Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition

Re-synchronization using the Hand Preceding Model for Multi-modal Fusion in Automatic Continuous Cued Speech Recognition

Ro-SOS: Metric Expression Network (MEnet) for Robust Salient Object Segmentation

Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues

STAR: A Structure and Texture Aware Retinex Model

Temporal Self-Ensembling Teacher for Semi-Supervised Object Detection

The First Round Result from the TianQin-1 Satellite

The TianQin project: current progress on science and technology

User Popularity-based Packet Scheduling for Congestion Control in Ad-hoc Social Networks

Practical quantum key distribution with non-phase-randomized coherent states

RANet: Ranking Attention Network for Fast Video Object Segmentation

A Class of Two-Weight and Three-Weight Linear Codes and Their Duals

DAP3D-Net: Where, What and How Actions Occur in Videos?

DAVE: A Unified Framework for Fast Vehicle Detection and Annotation

First-principles modeling of electromagnetic scattering by discrete and discretely heterogeneous random media

Near-linear constructions of exact unitary 2-designs

Repeated-root constacyclic codes of length $3lp^{s}$ and their dual codes

Decomposition of Three-Dimensional Steady Non-isentropic Compressible Euler System and Stability of Spherically Symmetric Subsonic Flows and Transonic Shocks under Multidimensional Perturbations

Distributed Data Association in Smart Camera Networks via Dual Decomposition

Evolutionary and topological properties of gene modules and driver mutations in a leukemia gene regulatory network

Kernelized Multiview Projection

Network assisted analysis to reveal the genetic basis of autism

Projection Bank: From High-dimensional Data to Medium-length Binary Codes

Distributed Bayesian inference for consistent labeling of tracked objects in non-overlapping camera networks

Experimental observation of optical differentiation and optical Hilbert transformation using a single SOI microdisk chip

Measurement of X-ray Photon Energy and Arrival Time Using a Silicon Drift Detector

Socially-Aware Networking: A Survey

Phases of the infinite U Hubbard model