Source author record

Ming Liu

Ming Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

89works

43topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes

Can linearly decodable failure signals in LLM hidden states be leveraged to correct those failures? We investigate this classification-correction gap via Overthinking (OT)--a stable behavioral regime (Jaccard >= 0.81, 94% inter-annotator agreement) in medical QA where models answer correctly under resampling yet fail in extended chain-of-thought. OT is linearly decodable at 71.6% balanced accuracy (p < 10^{-16}). Yet five families of fixed linear steering (29 configurations, n=1,273) all yield Delta ~= 0, with identical null results cross-architecture (Qwen2.5-7B) and cross-domain (MMLU-STEM). Three convergent lines of evidence suggest representational entanglement: the OT direction has 85-88% overlap with task-critical computation (specificity ratio <= 0.152); non-targeted shared-direction steering damages accuracy (-12.1pp); and LEACE concept erasure damages accuracy (-3.6pp, p=0.01), while 10 random erasures produce Delta=+0.3pp. The per-instance probe-steering correlation is r=-0.002 (p=0.97). Positively, the same probe enables selective abstention (held-out AUROC=0.610, exceeding all five uncertainty baselines, p=0.009): decodable failure structure supports post-generation reliability estimation even when the fixed linear steering family cannot exploit it for correction.

preprint2026arXiv

In-Context Fixation: When Demonstrated Labels Override Semantics in Few-Shot Classification

While random demonstration labels barely hurt in-context learning (Min et al., 2022), we show that homogeneous labels--even semantically valid ones--collapse accuracy to <=12% across six models (Pythia, Llama, Qwen; 0.8B--8B) and four tasks. The trigger is label-slot content: the model treats tokens occupying the label position as an exhaustive answer vocabulary, with homogeneity as the maximally collapsed case. A novel set-level fixation finding confirms this: when demonstrations carry varied nonsense tokens from {foo,bar,vex,nit,orb}, the model places 42--67% of probability on the demonstrated set while P(dog) remains below 0.2%. This is inconsistent with latent-concept Bayesian accounts (Xie et al., 2022) and reveals that ICL output is constrained vocabulary retrieval--the model binds its output to the demonstrated token inventory regardless of semantic plausibility. The effect generalizes to 4-way classification (0% accuracy across three models, 1B--8B) and multi-token verbalizers ("very positive"), where we decompose fixation into format-level (template adoption) and content-level (polarity override) components that are experimentally dissociable. Mechanistically, per-item paired activation patching on Pythia-1B recovers 98.4% of the gap (95% CI [84%, 112%]), localizing fixation to a layer-7-centered circuit (rank 2/560, 99.8th percentile; 4-fold CV mean 103%). Cross-architecture logit lens on Llama-3.2-1B replicates the encode-then-override trajectory with causal confirmation (top-5 layers: 89% recovery).

preprint2026arXiv

More Is Not Always Better: Cross-Component Interference in LLM Agent Scaffolding

LLM agent systems are built by stacking scaffolding components (planning, tools, memory, self-reflection, retrieval) assuming more is better. We study cross-component interference (CCI): degradation when components interact destructively. We run a full factorial experiment over all 2^5=32 subsets of five components on HotpotQA and GSM8K with Llama-3.1-8B/70B (96 conditions, up to 10 seeds). The All-In system is consistently suboptimal: on HotpotQA, a single-tool agent surpasses All-In by 32% (F1 0.233 vs 0.177, p=0.023); on GSM8K, a 3-component subset beats All-In by 79% (0.43 vs 0.24, p=0.010). Optimal component count is task-dependent (k*=1-4) and scale-sensitive: at 70B, combinations that hurt at 8B provide gains, though All-In still trails the best subset. We fit a main-effects regression (R^2=0.916, adj-R^2=0.899, LOOCV=0.872), compute exact Shapley values, and find 183/325 submodularity violations (56.3%), showing greedy selection is unreliable. A three-body synergy among Tool Use, Self-Reflection, and Retrieval (INT_3=+0.175, 95% CI [+0.003,+0.351]) is reported as exploratory. CCI replicates across model families (Qwen2.5) and is robust to prompt paraphrasing. Our findings suggest maximally-equipped agent defaults should be replaced by task-specific subset selection via interaction-aware analysis.

preprint2024arXiv

A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problem. But the performance has been found usually limited due to heavy reliance on environmental assumption. In this paper, we propose a unified multichannel far-field speech recognition system that combines the neural beamforming and transformer-based Listen, Spell, Attend (LAS) speech recognition system, which extends the end-to-end speech recognition system further to include speech enhancement. Such framework is then jointly trained to optimize the final objective of interest. Specifically, factored complex linear projection (fCLP) has been adopted to form the neural beamforming. Several pooling strategies to combine look directions are then compared in order to find the optimal approach. Moreover, information of the source direction is also integrated in the beamforming to explore the usefulness of source direction as a prior, which is usually available especially in multi-modality scenario. Experiments on different microphone array geometry are conducted to evaluate the robustness against spacing variance of microphone array. Large in-house databases are used to evaluate the effectiveness of the proposed framework and the proposed method achieve 19.26\% improvement when compared with a strong baseline.

preprint2024arXiv

Hi-Map: Hierarchical Factorized Radiance Field for High-Fidelity Monocular Dense Mapping

In this paper, we introduce Hi-Map, a novel monocular dense mapping approach based on Neural Radiance Field (NeRF). Hi-Map is exceptional in its capacity to achieve efficient and high-fidelity mapping using only posed RGB inputs. Our method eliminates the need for external depth priors derived from e.g., a depth estimation model. Our key idea is to represent the scene as a hierarchical feature grid that encodes the radiance and then factorizes it into feature planes and vectors. As such, the scene representation becomes simpler and more generalizable for fast and smooth convergence on new observations. This allows for efficient computation while alleviating noise patterns by reducing the complexity of the scene representation. Buttressed by the hierarchical factorized representation, we leverage the Sign Distance Field (SDF) as a proxy of rendering for inferring the volume density, demonstrating high mapping fidelity. Moreover, we introduce a dual-path encoding strategy to strengthen the photometric cues and further boost the mapping quality, especially for the distant and textureless regions. Extensive experiments demonstrate our method's superiority in geometric and textural accuracy over the state-of-the-art NeRF-based monocular mapping methods.

preprint2023arXiv

An Efficient Approach to the Online Multi-Agent Path Finding Problem by Using Sustainable Information

Multi-agent path finding (MAPF) is the problem of moving agents to the goal vertex without collision. In the online MAPF problem, new agents may be added to the environment at any time, and the current agents have no information about future agents. The inability of existing online methods to reuse previous planning contexts results in redundant computation and reduces algorithm efficiency. Hence, we propose a three-level approach to solve online MAPF utilizing sustainable information, which can decrease its redundant calculations. The high-level solver, the Sustainable Replan algorithm (SR), manages the planning context and simulates the environment. The middle-level solver, the Sustainable Conflict-Based Search algorithm (SCBS), builds a conflict tree and maintains the planning context. The low-level solver, the Sustainable Reverse Safe Interval Path Planning algorithm (SRSIPP), is an efficient single-agent solver that uses previous planning context to reduce duplicate calculations. Experiments show that our proposed method has significant improvement in terms of computational efficiency. In one of the test scenarios, our algorithm can be 1.48 times faster than SOTA on average under different agent number settings.

preprint2022arXiv

A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration

Generative adversarial networks (GANs) have drawn enormous attention due to the simple yet effective training mechanism and superior image generation quality. With the ability to generate photo-realistic high-resolution (e.g., $1024\times1024$) images, recent GAN models have greatly narrowed the gaps between the generated images and the real ones. Therefore, many recent works show emerging interest to take advantage of pre-trained GAN models by exploiting the well-disentangled latent space and the learned GAN priors. In this paper, we briefly review recent progress on leveraging pre-trained large-scale GAN models from three aspects, i.e., 1) the training of large-scale generative adversarial networks, 2) exploring and understanding the pre-trained GAN models, and 3) leveraging these models for subsequent tasks like image restoration and editing. More information about relevant methods and repositories can be found at https://github.com/csmliu/pretrained-GANs.

preprint2022arXiv

Adaptive Network Combination for Single-Image Reflection Removal: A Domain Generalization Perspective

Recently, multiple synthetic and real-world datasets have been built to facilitate the training of deep single image reflection removal (SIRR) models. Meanwhile, diverse testing sets are also provided with different types of reflection and scenes. However, the non-negligible domain gaps between training and testing sets make it difficult to learn deep models generalizing well to testing images. The diversity of reflections and scenes further makes it a mission impossible to learn a single model being effective to all testing sets and real-world reflections. In this paper, we tackle these issues by learning SIRR models from a domain generalization perspective. Particularly, for each source set, a specific SIRR model is trained to serve as a domain expert of relevant reflection types. For a given reflection-contaminated image, we present a reflection type-aware weighting (RTAW) module to predict expert-wise weights. RTAW can then be incorporated with adaptive network combination (AdaNEC) for handling different reflection types and scenes, i.e., generalizing to unknown domains. Two representative AdaNEC methods, i.e., output fusion (OF) and network interpolation (NI), are provided by considering both adaptation levels and efficiency. For images from one source set, we train RTAW to only predict expert-wise weights of other domain experts for improving generalization ability, while the weights of all experts are predicted and employed during testing. An in-domain expert (IDE) loss is presented for training RTAW. Extensive experiments show the appealing performance gain of our AdaNEC on different state-of-the-art SIRR networks. Source code and pre-trained models will available at https://github.com/csmliu/AdaNEC.

preprint2022arXiv

An Adaptive Control Algorithm for Quadruped Locomotion with Proprioceptive Linear Legs

Quadruped robots manifest great potential to traverse rough terrains with payload. Numerous traditional control methods for legged dynamic locomotion are model-based and exhibit high sensitivity to model uncertainties and payload variations. Therefore, high-performance model parameter estimation becomes indispensable. However, the inertia parameters of payload are usually unknown and dynamically changing when the quadruped robot is deployed in versatile tasks. To address this problem, online identification of the inertia parameters and the Center of Mass (CoM) position of the payload for the quadruped robots draw an increasing interest. This study presents an adaptive controller based on the online payload identification for the high payload capacity (the ratio between payload and robot's self-weight) quadruped locomotion. We name it as Adaptive Controller for Quadruped Locomotion (ACQL), which consists of a recursive update law and a control law. ACQL estimates the external forces and torques induced by the payload online. The estimation is incorporated in inverse-dynamics-based Quadratic Programming (QP) to realize a trotting gait. As such, the tracking accuracy of the robot's CoM and orientation trajectories are improved. The proposed method, ACQL, is verified in a real quadruped robot platform. Experiments prove the estimation efficacy for the payload weighing from 20 $kg$ to 75 $kg$ and loaded at different locations of the robot's torso.

preprint2022arXiv

An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics

Long documents such as academic articles and business reports have been the standard format to detail out important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short and concise texts to encapsulate the most important information would thus be significant in aiding the reader's comprehension. Recently, with the advent of neural architectures, significant research efforts have been made to advance automatic text summarization systems, and numerous studies on the challenges of extending these systems to the long document domain have emerged. In this survey, we provide a comprehensive overview of the research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature within the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study on the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of the summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.

preprint2022arXiv

Channel Self-Supervision for Online Knowledge Distillation

Recently, researchers have shown an increased interest in the online knowledge distillation. Adopting an one-stage and end-to-end training fashion, online knowledge distillation uses aggregated intermediated predictions of multiple peer models for training. However, the absence of a powerful teacher model may result in the homogeneity problem between group peers, affecting the effectiveness of group distillation adversely. In this paper, we propose a novel online knowledge distillation method, \textbf{C}hannel \textbf{S}elf-\textbf{S}upervision for Online Knowledge Distillation (CSS), which structures diversity in terms of input, target, and network to alleviate the homogenization problem. Specifically, we construct a dual-network multi-branch structure and enhance inter-branch diversity through self-supervised learning, adopting the feature-level transformation and augmenting the corresponding labels. Meanwhile, the dual network structure has a larger space of independent parameters to resist the homogenization problem during distillation. Extensive quantitative experiments on CIFAR-100 illustrate that our method provides greater diversity than OKDDip and we also give pretty performance improvement, even over the state-of-the-art such as PCL. The results on three fine-grained datasets (StanfordDogs, StanfordCars, CUB-200-211) also show the significant generalization capability of our approach.

preprint2022arXiv

csBoundary: City-scale Road-boundary Detection in Aerial Images for High-definition Maps

High-Definition (HD) maps can provide precise geometric and semantic information of static traffic environments for autonomous driving. Road-boundary is one of the most important information contained in HD maps since it distinguishes between road areas and off-road areas, which can guide vehicles to drive within road areas. But it is labor-intensive to annotate road boundaries for HD maps at the city scale. To enable automatic HD map annotation, current work uses semantic segmentation or iterative graph growing for road-boundary detection. However, the former could not ensure topological correctness since it works at the pixel level, while the latter suffers from inefficiency and drifting issues. To provide a solution to the aforementioned problems, in this letter, we propose a novel system termed csBoundary to automatically detect road boundaries at the city scale for HD map annotation. Our network takes as input an aerial image patch, and directly infers the continuous road-boundary graph (i.e., vertices and edges) from this image. To generate the city-scale road-boundary graph, we stitch the obtained graphs from all the image patches. Our csBoundary is evaluated and compared on a public benchmark dataset. The results demonstrate our superiority. The accompanied demonstration video is available at our project page \url{https://sites.google.com/view/csboundary/}.

preprint2022arXiv

DarkQuest: A dark sector upgrade to SpinQuest at the 120 GeV Fermilab Main Injector

Expanding the mass range and techniques by which we search for dark matter is an important part of the worldwide particle physics program. Accelerator-based searches for dark matter and dark sector particles are a uniquely compelling part of this program as a way to both create and detect dark matter in the laboratory and explore the dark sector by searching for mediators and excited dark matter particles. This paper focuses on developing the DarkQuest experimental concept and gives an outlook on related enhancements collectively referred to as LongQuest. DarkQuest is a proton fixed-target experiment with leading sensitivity to an array of visible dark sector signatures in the MeV-GeV mass range. Because it builds off of existing accelerator and detector infrastructure, it offers a powerful but modest-cost experimental initiative that can be realized on a short timescale.

preprint2022arXiv

DQ-GAT: Towards Safe and Efficient Autonomous Driving with Deep Q-Learning and Graph Attention Networks

Autonomous driving in multi-agent dynamic traffic scenarios is challenging: the behaviors of road users are uncertain and are hard to model explicitly, and the ego-vehicle should apply complicated negotiation skills with them, such as yielding, merging and taking turns, to achieve both safe and efficient driving in various settings. Traditional planning methods are largely rule-based and scale poorly in these complex dynamic scenarios, often leading to reactive or even overly conservative behaviors. Therefore, they require tedious human efforts to maintain workability. Recently, deep learning-based methods have shown promising results with better generalization capability but less hand engineering efforts. However, they are either implemented with supervised imitation learning (IL), which suffers from dataset bias and distribution mismatch issues, or are trained with deep reinforcement learning (DRL) but focus on one specific traffic scenario. In this work, we propose DQ-GAT to achieve scalable and proactive autonomous driving, where graph attention-based networks are used to implicitly model interactions, and deep Q-learning is employed to train the network end-to-end in an unsupervised manner. Extensive experiments in a high-fidelity driving simulator show that our method achieves higher success rates than previous learning-based methods and a traditional rule-based method, and better trades off safety and efficiency in both seen and unseen scenarios. Moreover, qualitative results on a trajectory dataset indicate that our learned policy can be transferred to the real world for practical applications with real-time speeds. Demonstration videos are available at https://caipeide.github.io/dq-gat/.

preprint2022arXiv

FusionPortable: A Multi-Sensor Campus-Scene Dataset for Evaluation of Localization and Mapping Accuracy on Diverse Platforms

Combining multiple sensors enables a robot to maximize its perceptual awareness of environments and enhance its robustness to external disturbance, crucial to robotic navigation. This paper proposes the FusionPortable benchmark, a complete multi-sensor dataset with a diverse set of sequences for mobile robots. This paper presents three contributions. We first advance a portable and versatile multi-sensor suite that offers rich sensory measurements: 10Hz LiDAR point clouds, 20Hz stereo frame images, high-rate and asynchronous events from stereo event cameras, 200Hz inertial readings from an IMU, and 10Hz GPS signal. Sensors are already temporally synchronized in hardware. This device is lightweight, self-contained, and has plug-and-play support for mobile robots. Second, we construct a dataset by collecting 17 sequences that cover a variety of environments on the campus by exploiting multiple robot platforms for data collection. Some sequences are challenging to existing SLAM algorithms. Third, we provide ground truth for the decouple localization and mapping performance evaluation. We additionally evaluate state-of-the-art SLAM approaches and identify their limitations. The dataset, consisting of raw sensor easurements, ground truth, calibration data, and evaluated algorithms, will be released: https://ram-lab.com/file/site/multi-sensor-dataset.

preprint2022arXiv

HIT at SemEval-2022 Task 2: Pre-trained Language Model for Idioms Detection

The same multi-word expressions may have different meanings in different sentences. They can be mainly divided into two categories, which are literal meaning and idiomatic meaning. Non-contextual-based methods perform poorly on this problem, and we need contextual embedding to understand the idiomatic meaning of multi-word expressions correctly. We use a pre-trained language model, which can provide a context-aware sentence embedding, to detect whether multi-word expression in the sentence is idiomatic usage.

preprint2022arXiv

Leveraging Cross Feedback of User and Item Embeddings with Attention for Variational Autoencoder based Collaborative Filtering

Matrix factorization (MF) has been widely applied to collaborative filtering in recommendation systems. Its Bayesian variants can derive posterior distributions of user and item embeddings, and are more robust to sparse ratings. However, the Bayesian methods are restricted by their update rules for the posterior parameters due to the conjugacy of the priors and the likelihood. Variational autoencoders (VAE) can address this issue by capturing complex mappings between the posterior parameters and the data. However, current research on VAEs for collaborative filtering only considers the mappings based on the explicit data information while the implicit embedding information is overlooked. In this paper, we first derive evidence lower bounds (ELBO) for Bayesian MF models from two viewpoints: user-oriented and item-oriented. Based on the ELBOs, we propose a VAE-based Bayesian MF framework. It leverages not only the data but also the embedding information to approximate the user-item joint distribution. As suggested by the ELBOs, the approximation is iterative with cross feedback of user and item embeddings into each other's encoders. More specifically, user embeddings sampled at the previous iteration are fed to the item-side encoders to estimate the posterior parameters for the item embeddings at the current iteration, and vice versa. The estimation also attends to the cross-fed embeddings to further exploit useful information. The decoder then reconstructs the data via the matrix factorization over the currently re-sampled user and item embeddings.

preprint2022arXiv

Open-world Semantic Segmentation for LIDAR Point Clouds

Current methods for LIDAR semantic segmentation are not robust enough for real-world applications, e.g., autonomous driving, since it is closed-set and static. The closed-set assumption makes the network only able to output labels of trained classes, even for objects never seen before, while a static network cannot update its knowledge base according to what it has seen. Therefore, in this work, we propose the open-world semantic segmentation task for LIDAR point clouds, which aims to 1) identify both old and novel classes using open-set semantic segmentation, and 2) gradually incorporate novel objects into the existing knowledge base using incremental learning without forgetting old classes. For this purpose, we propose a REdundAncy cLassifier (REAL) framework to provide a general architecture for both the open-set semantic segmentation and incremental learning problems. The experimental results show that REAL can simultaneously achieves state-of-the-art performance in the open-set semantic segmentation task on the SemanticKITTI and nuScenes datasets, and alleviate the catastrophic forgetting problem with a large margin during incremental learning.

preprint2022arXiv

Pothole Detection Based on Disparity Transformation and Road Surface Modeling

Pothole detection is one of the most important tasks for road maintenance. Computer vision approaches are generally based on either 2D road image analysis or 3D road surface modeling. However, these two categories are always used independently. Furthermore, the pothole detection accuracy is still far from satisfactory. Therefore, in this paper, we present a robust pothole detection algorithm that is both accurate and computationally efficient. A dense disparity map is first transformed to better distinguish between damaged and undamaged road areas. To achieve greater disparity transformation efficiency, golden section search and dynamic programming are utilized to estimate the transformation parameters. Otsu's thresholding method is then used to extract potential undamaged road areas from the transformed disparity map. The disparities in the extracted areas are modeled by a quadratic surface using least squares fitting. To improve disparity map modeling robustness, the surface normal is also integrated into the surface modeling process. Furthermore, random sample consensus is utilized to reduce the effects caused by outliers. By comparing the difference between the actual and modeled disparity maps, the potholes can be detected accurately. Finally, the point clouds of the detected potholes are extracted from the reconstructed 3D road surface. The experimental results show that the successful detection accuracy of the proposed system is around 98.7% and the overall pixel-level accuracy is approximately 99.6%.

preprint2022arXiv

RNGDet: Road Network Graph Detection by Transformer in Aerial Images

Road network graphs provide critical information for autonomous-vehicle applications, such as drivable areas that can be used for motion planning algorithms. To find road network graphs, manually annotation is usually inefficient and labor-intensive. Automatically detecting road network graphs could alleviate this issue, but existing works still have some limitations. For example, segmentation-based approaches could not ensure satisfactory topology correctness, and graph-based approaches could not present precise enough detection results. To provide a solution to these problems, we propose a novel approach based on transformer and imitation learning in this paper. In view of that high-resolution aerial images could be easily accessed all over the world nowadays, we make use of aerial images in our approach. Taken as input an aerial image, our approach iteratively generates road network graphs vertex-by-vertex. Our approach can handle complicated intersection points with various numbers of incident road segments. We evaluate our approach on a publicly available dataset. The superiority of our approach is demonstrated through the comparative experiments. Our work is accompanied with a demonstration video which is available at \url{https://tonyxuqaq.github.io/projects/RNGDet/}.

preprint2022arXiv

SCADS: A Scalable Approach Using Spark in Cloud for Host-based Intrusion Detection System with System Calls

Following the current big data trend, the scale of real-time system call traces generated by Linux applications in a contemporary data center may increase excessively. Due to the deficiency of scalability, it is challenging for traditional host-based intrusion detection systems deployed on every single host to collect, maintain, and manipulate those large-scale accumulated system call traces. It is inflexible to build data mining models on one physical host that has static computing capability and limited storage capacity. To address this issue, we propose SCADS, a corresponding solution using Apache Spark in the Google cloud environment. A set of Spark algorithms are developed to achieve the computational scalability. The experiment results demonstrate that the efficiency of intrusion detection can be enhanced, which indicates that the proposed method can apply to the design of next-generation host-based intrusion detection systems with system calls.

preprint2022arXiv

Selective Output Smoothing Regularization: Regularize Neural Networks by Softening Output Distributions

In this paper, we propose Selective Output Smoothing Regularization, a novel regularization method for training the Convolutional Neural Networks (CNNs). Inspired by the diverse effects on training from different samples, Selective Output Smoothing Regularization improves the performance by encouraging the model to produce equal logits on incorrect classes when dealing with samples that the model classifies correctly and over-confidently. This plug-and-play regularization method can be conveniently incorporated into almost any CNN-based project without extra hassle. Extensive experiments have shown that Selective Output Smoothing Regularization consistently achieves significant improvement in image classification benchmarks, such as CIFAR-100, Tiny ImageNet, ImageNet, and CUB-200-2011. Particularly, our method obtains 77.30% accuracy on ImageNet with ResNet-50, which gains 1.1% than baseline (76.2%). We also empirically demonstrate the ability of our method to make further improvements when combining with other widely used regularization techniques. On Pascal detection, using the SOSR-trained ImageNet classifier as the pretrained model leads to better detection performances.

preprint2022arXiv

Temporally Resolution Decrement: Utilizing the Shape Consistency for Higher Computational Efficiency

Image resolution that has close relations with accuracy and computational cost plays a pivotal role in network training. In this paper, we observe that the reduced image retains relatively complete shape semantics but loses extensive texture information. Inspired by the consistency of the shape semantics as well as the fragility of the texture information, we propose a novel training strategy named Temporally Resolution Decrement. Wherein, we randomly reduce the training images to a smaller resolution in the time domain. During the alternate training with the reduced images and the original images, the unstable texture information in the images results in a weaker correlation between the texture-related patterns and the correct label, naturally enforcing the model to rely more on shape properties that are robust and conform to the human decision rule. Surprisingly, our approach greatly improves both the training and inference efficiency of convolutional neural networks. On ImageNet classification, using only 33\% calculation quantity (randomly reducing the training image to 112$\times$112 within 90\% epochs) can still improve ResNet-50 from 76.32\% to 77.71\%. Superimposed with the strong training procedure of ResNet-50 on ImageNet, our method achieves 80.42\% top-1 accuracy with saving 37.5\% calculation overhead. To the best of our knowledge this is the highest ImageNet single-crop accuracy on ResNet-50 under 224$\times$224 without extra data or distillation.

preprint2022arXiv

Trotter-Kato Approximations of Impulsive Neutral SPDEs in Hilbert Spaces

This paper studies a class of impulsive neutral stochastic partial differential equations in real Hilbert spaces. The main goal here is to consider the Trotter-Kato approximations of mild solutions of such equations in the $p$th-mean ($p\geq2$). As an application, a classical limit theorem on the dependence of such equations on a parameter is obtained. The novelty of this paper is that the combination of this approximating system and such equations has not been considered before.

preprint2022arXiv

VEM$^2$L: A Plug-and-play Framework for Fusing Text and Structure Knowledge on Sparse Knowledge Graph Completion

Knowledge Graph Completion (KGC) aims to reason over known facts and infer missing links but achieves weak performances on those sparse Knowledge Graphs (KGs). Recent works introduce text information as auxiliary features or apply graph densification to alleviate this challenge, but suffer from problems of ineffectively incorporating structure features and injecting noisy triples. In this paper, we solve the sparse KGC from these two motivations simultaneously and handle their respective drawbacks further, and propose a plug-and-play unified framework VEM$^2$L over sparse KGs. The basic idea of VEM$^2$L is to motivate a text-based KGC model and a structure-based KGC model to learn with each other to fuse respective knowledge into unity. To exploit text and structure features together in depth, we partition knowledge within models into two nonoverlapping parts: expressiveness ability on the training set and generalization ability upon unobserved queries. For the former, we motivate these two text-based and structure-based models to learn from each other on the training sets. And for the generalization ability, we propose a novel knowledge fusion strategy derived by the Variational EM (VEM) algorithm, during which we also apply a graph densification operation to alleviate the sparse graph problem further. Our graph densification is derived by VEM algorithm. Due to the convergence of EM algorithm, we guarantee the increase of likelihood function theoretically with less being impacted by noisy injected triples heavily. By combining these two fusion methods and graph densification, we propose the VEM$^2$L framework finally. Both detailed theoretical evidence, as well as qualitative experiments, demonstrates the effectiveness of our proposed framework.

preprint2022arXiv

White Paper Assistance: A Step Forward Beyond the Shortcut Learning

The promising performances of CNNs often overshadow the need to examine whether they are doing in the way we are actually interested. We show through experiments that even over-parameterized models would still solve a dataset by recklessly leveraging spurious correlations, or so-called 'shortcuts'. To combat with this unintended propensity, we borrow the idea of printer test page and propose a novel approach called White Paper Assistance. Our proposed method involves the white paper to detect the extent to which the model has preference for certain characterized patterns and alleviates it by forcing the model to make a random guess on the white paper. We show the consistent accuracy improvements that are manifest in various architectures, datasets and combinations with other techniques. Experiments have also demonstrated the versatility of our approach on fine-grained recognition, imbalanced classification and robustness to corruptions.

preprint2022arXiv

Why-So-Deep: Towards Boosting Previously Trained Models for Visual Place Recognition

Deep learning-based image retrieval techniques for the loop closure detection demonstrate satisfactory performance. However, it is still challenging to achieve high-level performance based on previously trained models in different geographical regions. This paper addresses the problem of their deployment with simultaneous localization and mapping (SLAM) systems in the new environment. The general baseline approach uses additional information, such as GPS, sequential keyframes tracking, and re-training the whole environment to enhance the recall rate. We propose a novel approach for improving image retrieval based on previously trained models. We present an intelligent method, MAQBOOL, to amplify the power of pre-trained models for better image recall and its application to real-time multiagent SLAM systems. We achieve comparable image retrieval results at a low descriptor dimension (512-D), compared to the high descriptor dimension (4096-D) of state-of-the-art methods. We use spatial information to improve the recall rate in image retrieval on pre-trained models.

preprint2021arXiv

6 nm super-resolution optical transmission and scattering spectroscopic imaging of carbon nanotubes using a nanometer-scale white light source

Optical hyperspectral imaging based on absorption and scattering of photons at the visible and adjacent frequencies denotes one of the most informative and inclusive characterization methods in material research. Unfortunately, restricted by the diffraction limit of light, it is unable to resolve the nanoscale inhomogeneity in light-matter interactions, which is diagnostic of the local modulation in material structure and properties. Moreover, many nanomaterials have highly anisotropic optical properties that are outstandingly appealing yet hard to characterize through conventional optical methods. Therefore, there has been a pressing demand in the diverse fields including electronics, photonics, physics, and materials science to extend the optical hyperspectral imaging into the nanometer length scale. In this work, we report a super-resolution hyperspectral imaging technique that simultaneously measures optical absorption and scattering spectra with the illumination from a tungsten-halogen lamp. We demonstrated sub-5 nm spatial resolution in both visible and near-infrared wavelengths (415 to 980 nm) for the hyperspectral imaging of strained single-walled carbon nanotubes (SWNT) and reconstructed true-color images to reveal the longitudinal and transverse optical transition-induced light absorption and scattering in the SWNTs. This is the first time transverse optical absorption in SWNTs were clearly observed experimentally. The new technique provides rich near-field spectroscopic information that had made it possible to analyze the spatial modulation of band-structure along a single SWNT induced through strain engineering.

preprint2021arXiv

Autonomous Navigation through intersections with Graph ConvolutionalNetworks and Conditional Imitation Learning for Self-driving Cars

In autonomous driving, navigation through unsignaled intersections with many traffic participants moving around is a challenging task. To provide a solution to this problem, we propose a novel branched network G-CIL for the navigation policy learning. Specifically, we firstly represent such dynamic environments as graph-structured data and propose an effective strategy for edge definition to aggregate surrounding information for the ego-vehicle. Then graph convolutional neural networks are used as the perception module to capture global and geometric features from the environment. To generate safe and efficient navigation policy, we further incorporate it with conditional imitation learning algorithm, to learn driving behaviors directly from expert demonstrations. Our proposed network is capable of handling a varying number of surrounding vehicles and generating optimal control actions (e.g., steering angle and throttle) according to the given high-level commands (e.g., turn left towards the global goal). Evaluations on unsignaled intersections with various traffic densities demonstrate that our end-to-end trainable neural network outperforms the baselines with higher success rate and shorter navigation time.

preprint2021arXiv

AVGCN: Trajectory Prediction using Graph Convolutional Networks Guided by Human Attention

Pedestrian trajectory prediction is a critical yet challenging task, especially for crowded scenes. We suggest that introducing an attention mechanism to infer the importance of different neighbors is critical for accurate trajectory prediction in scenes with varying crowd size. In this work, we propose a novel method, AVGCN, for trajectory prediction utilizing graph convolutional networks (GCN) based on human attention (A denotes attention, V denotes visual field constraints). First, we train an attention network that estimates the importance of neighboring pedestrians, using gaze data collected as subjects perform a bird's eye view crowd navigation task. Then, we incorporate the learned attention weights modulated by constraints on the pedestrian's visual field into a trajectory prediction network that uses a GCN to aggregate information from neighbors efficiently. AVGCN also considers the stochastic nature of pedestrian trajectories by taking advantage of variational trajectory prediction. Our approach achieves state-of-the-art performance on several trajectory prediction benchmarks, and the lowest average prediction error over all considered benchmarks.

preprint2021arXiv

Echo state graph neural networks with analogue random resistor arrays

Recent years have witnessed an unprecedented surge of interest, from social networks to drug discovery, in learning representations of graph-structured data. However, graph neural networks, the machine learning models for handling graph-structured data, face significant challenges when running on conventional digital hardware, including von Neumann bottleneck incurred by physically separated memory and processing units, slowdown of Moore's law due to transistor scaling limit, and expensive training cost. Here we present a novel hardware-software co-design, the random resistor array-based echo state graph neural network, which addresses these challenges. The random resistor arrays not only harness low-cost, nanoscale and stackable resistors for highly efficient in-memory computing using simple physical laws, but also leverage the intrinsic stochasticity of dielectric breakdown to implement random projections in hardware for an echo state network that effectively minimizes the training cost thanks to its fixed and random weights. The system demonstrates state-of-the-art performance on both graph classification using the MUTAG and COLLAB datasets and node classification using the CORA dataset, achieving 34.2x, 93.2x, and 570.4x improvement of energy efficiency and 98.27%, 99.46%, and 95.12% reduction of training cost compared to conventional graph learning on digital hardware, respectively, which may pave the way for the next generation AI system for graph learning.

preprint2021arXiv

Ground-aware Monocular 3D Object Detection for Autonomous Driving

Estimating the 3D position and orientation of objects in the environment with a single RGB camera is a critical and challenging task for low-cost urban autonomous driving and mobile robots. Most of the existing algorithms are based on the geometric constraints in 2D-3D correspondence, which stems from generic 6D object pose estimation. We first identify how the ground plane provides additional clues in depth reasoning in 3D detection in driving scenes. Based on this observation, we then improve the processing of 3D anchors and introduce a novel neural network module to fully utilize such application-specific priors in the framework of deep learning. Finally, we introduce an efficient neural network embedded with the proposed module for 3D object detection. We further verify the power of the proposed module with a neural network designed for monocular depth prediction. The two proposed networks achieve state-of-the-art performances on the KITTI 3D object detection and depth prediction benchmarks, respectively. The code will be published in https://www.github.com/Owen-Liuyuxuan/visualDet3D

preprint2021arXiv

The Role of the Hercules Autonomous Vehicle During the COVID-19 Pandemic: An Autonomous Logistic Vehicle for Contactless Goods Transportation

Since early 2020, the coronavirus disease 2019 (COVID-19) has spread rapidly across the world. As at the date of writing this article, the disease has been globally reported in 223 countries and regions, infected over 108 million people and caused over 2.4 million deaths (https://covid19.who.int/, accessed on Feb. 17, 2021). Avoiding person-to-person transmission is an effective approach to control and prevent the pandemic. However, many daily activities, such as transporting goods in our daily life, inevitably involve person-to-person contact. Using an autonomous logistic vehicle to achieve contact-less goods transportation could alleviate this issue. For example, it can reduce the risk of virus transmission between the driver and customers. Moreover, many countries have imposed tough lockdown measures to reduce the virus transmission (e.g., retail, catering) during the pandemic, which causes inconveniences for human daily life. Autonomous vehicle can deliver the goods bought by humans, so that humans can get the goods without going out. These demands motivate us to develop an autonomous vehicle, named as Hercules, for contact-less goods transportation during the COVID-19 pandemic. The vehicle is evaluated through real-world delivering tasks under various traffic conditions.

preprint2021arXiv

Trotter-Kato approximations of semilinear stochastic evolution equations in Hilbert spaces

Motivated by the work of T.E. Govindan in [5,8,9], this paper is concerned with a more general semilinear stochastic evolution equation. The difference between the equations considered in this paper and the previous one is that it makes some changes to the nonlinear function in random integral, which also depends on the probability distribution of stochastic process at that time. First, this paper considers the existence and uniqueness of mild solutions for such equations. Furthermore, Trotter-Kato approximation system is introduced for the mild solutions, and the weak convergence of induced probability measures and zeroth-order approximations are obtained. Then we consider the classical limit theorem about the parameter dependence of this kind of equations. Finally, an example of stochastic partial differential equation is given to illustrate our results.

preprint2021arXiv

Two-Stage Single Image Reflection Removal with Reflection-Aware Guidance

Removing undesired reflection from an image captured through a glass surface is a very challenging problem with many practical application scenarios. For improving reflection removal, cascaded deep models have been usually adopted to estimate the transmission in a progressive manner. However, most existing methods are still limited in exploiting the result in prior stage for guiding transmission estimation. In this paper, we present a novel two-stage network with reflection-aware guidance (RAGNet) for single image reflection removal (SIRR). To be specific, the reflection layer is firstly estimated due to that it generally is much simpler and is relatively easier to estimate. Reflectionaware guidance (RAG) module is then elaborated for better exploiting the estimated reflection in predicting transmission layer. By incorporating feature maps from the estimated reflection and observation, RAG can be used (i) to mitigate the effect of reflection from the observation, and (ii) to generate mask in partial convolution for mitigating the effect of deviating from linear combination hypothesis. A dedicated mask loss is further presented for reconciling the contributions of encoder and decoder features. Experiments on five commonly used datasets demonstrate the quantitative and qualitative superiority of our RAGNet in comparison to the state-of-the-art SIRR methods. The source code and pre-trained model are available at https://github.com/liyucs/RAGNet.

Ming Liu

What is connected

Connect this record

See the researcher in context

Building this map preview

89 published item(s)

Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes

In-Context Fixation: When Demonstrated Labels Override Semantics in Few-Shot Classification

More Is Not Always Better: Cross-Component Interference in LLM Agent Scaffolding

A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

Hi-Map: Hierarchical Factorized Radiance Field for High-Fidelity Monocular Dense Mapping

An Efficient Approach to the Online Multi-Agent Path Finding Problem by Using Sustainable Information

A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration

Adaptive Network Combination for Single-Image Reflection Removal: A Domain Generalization Perspective

An Adaptive Control Algorithm for Quadruped Locomotion with Proprioceptive Linear Legs

An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics

Channel Self-Supervision for Online Knowledge Distillation

csBoundary: City-scale Road-boundary Detection in Aerial Images for High-definition Maps

DarkQuest: A dark sector upgrade to SpinQuest at the 120 GeV Fermilab Main Injector

DQ-GAT: Towards Safe and Efficient Autonomous Driving with Deep Q-Learning and Graph Attention Networks

FusionPortable: A Multi-Sensor Campus-Scene Dataset for Evaluation of Localization and Mapping Accuracy on Diverse Platforms

HIT at SemEval-2022 Task 2: Pre-trained Language Model for Idioms Detection

Leveraging Cross Feedback of User and Item Embeddings with Attention for Variational Autoencoder based Collaborative Filtering

Open-world Semantic Segmentation for LIDAR Point Clouds

Pothole Detection Based on Disparity Transformation and Road Surface Modeling

RNGDet: Road Network Graph Detection by Transformer in Aerial Images

SCADS: A Scalable Approach Using Spark in Cloud for Host-based Intrusion Detection System with System Calls

Selective Output Smoothing Regularization: Regularize Neural Networks by Softening Output Distributions

Temporally Resolution Decrement: Utilizing the Shape Consistency for Higher Computational Efficiency

Trotter-Kato Approximations of Impulsive Neutral SPDEs in Hilbert Spaces

VEM$^2$L: A Plug-and-play Framework for Fusing Text and Structure Knowledge on Sparse Knowledge Graph Completion

White Paper Assistance: A Step Forward Beyond the Shortcut Learning

Why-So-Deep: Towards Boosting Previously Trained Models for Visual Place Recognition

6 nm super-resolution optical transmission and scattering spectroscopic imaging of carbon nanotubes using a nanometer-scale white light source

Autonomous Navigation through intersections with Graph ConvolutionalNetworks and Conditional Imitation Learning for Self-driving Cars

AVGCN: Trajectory Prediction using Graph Convolutional Networks Guided by Human Attention

Echo state graph neural networks with analogue random resistor arrays

Ground-aware Monocular 3D Object Detection for Autonomous Driving

The Role of the Hercules Autonomous Vehicle During the COVID-19 Pandemic: An Autonomous Logistic Vehicle for Contactless Goods Transportation

Trotter-Kato approximations of semilinear stochastic evolution equations in Hilbert spaces

Two-Stage Single Image Reflection Removal with Reflection-Aware Guidance

CoMoGCN: Coherent Motion Aware Trajectory Prediction with Graph Representation

COVID-19 Docking Server: A meta server for docking small molecules, peptides and antibodies against potential targets of COVID-19

Deep Adaptive Inference Networks for Single Image Super-Resolution

Deep Learning Detection of Inaccurate Smart Electricity Meters: A Case Study

Domain evolution in bended freestanding BaTiO3 ultrathin films: a phase-field simulation

High-speed Autonomous Drifting with Deep Reinforcement Learning

Isomorphism between the $R$-Matrix and Drinfeld Presentations of Quantum Affine Algebra: Types $B$ and $D$

Isomorphism between the $R$-matrix and Drinfeld presentations of Yangian in types $B$, $C$ and $D$

Isomorphism between the R-matrix and Drinfeld presentations of quantum affine algebra: type C

Learning Flow-based Feature Warping for Face Frontalization with Illumination Inconsistent Supervision

LINS: A Lidar-Inertial State Estimator for Robust and Efficient Navigation

Metric Monocular Localization Using Signed Distance Fields

MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search

Monocular Direct Sparse Localization in a Prior 3D Surfel Map

Neural SLAM: Learning to Explore with External Memory

PointTrackNet: An End-to-End Network For 3-D Object Detection and Tracking From Point Clouds

Rethinking Road Surface 3D Reconstruction and Pothole Detection: From Perspective Transformation to Disparity Map Segmentation

SNE-RoadSeg: Incorporating Surface Normal Information into Semantic Segmentation for Accurate Freespace Detection

SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression

The Hille-Yosida generation theorem for almost surely bounded $C_{0}$--semigroups of continuous module homomorphisms$^1$

Unpaired Learning of Deep Image Denoising

Yangian doubles of classical types and their vertex representations

Containing rumors spreading on correlated multiplex networks

One-dimensional van der Waals heterostructures

Dark Sectors 2016 Workshop: Community Report

Rota-Baxter operators on Witt and Virasoro algebras

Towards Cognitive Exploration through Deep Reinforcement Learning for Mobile Robots

Beam optical design for high energy electron radiography experiment study based on THU LINAC

Control of magnetic relaxation by electric-field-induced ferroelectric phase transition and inhomogeneous domain switching

A Fast Decodable Full-Rate STBC with High Coding Gain for 4x2 MIMO Systems

Achieving Low-Complexity Maximum-Likelihood Detection for the 3D MIMO Code

Observation of Piezoelectricity in Monolayer Molybdenum Disulfide

R-matrix realization of two-parameter quantum group U_{r,s}(gl_n)

Reduced-complexity maximum-likelihood decoding for 3D MIMO code

Charge carrier relaxation model in hopping system

Charge transfer and interfacial magnetism in (LaNiO3)n/(LaMnO3)2 superlattices

Comment on "Validity of Einstein Relation in Disordered Organic Semiconductors"

Generalized Bell states and principal realization of the Yangian Y(sl_N)

Isomorphism between two realizations of the Yangian Y(so_3)