Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
92works
0followers
47topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

92 published item(s)

preprint2026arXiv

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, GUIs. GLM-5V-Turbo is built around this objective: multimodal perception is integrated as a core component of reasoning, planning, tool use, and execution, rather than as an auxiliary interface to a language model. This report summarizes the main improvements behind GLM-5V-Turbo across model design, multimodal training, reinforcement learning, toolchain expansion, and integration with agent frameworks. These developments lead to strong performance in multimodal coding, visual tool use, and framework-based agentic tasks, while preserving competitive text-only coding capability. More importantly, our development process offers practical insights for building multimodal agents, highlighting the central role of multimodal perception, hierarchical optimization, and reliable end-to-end verification.

preprint2026arXiv

Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training

Reinforcement learning with verifiable rewards improves LLM reasoning but often induces overthinking, where models generate unnecessarily long reasoning traces. Existing methods mainly rely on length penalties or early-exit strategies; however, the former may degrade accuracy and induce underthinking, whereas the latter assumes that substantial portions of reasoning traces can be safely truncated. To obtain a compression signal without these limitations, we revisit the training dynamics of existing compression methods. We observe that the length--accuracy correlation is initially negative but continually increases during compression, indicating that shorter responses are initially more likely to be correct but gradually lose this property as the policy moves toward underthinking. Based on this observation, we formalize overthinking: a negative correlation indicates an overthinking regime, while a positive one indicates underthinking. When overthinking, the shortest correct responses are shorter than the group-average response length in expectation, making them natural compression targets already present in on-policy rollouts. We therefore propose \emph{Implicit Compression Regularization} (ICR), an on-policy regularization method whose compression signal comes from a virtual shorter distribution induced by the shortest correct responses in rollout groups, guiding the policy toward concise yet correct trajectories. Training dynamics show that ICR maintains a better length--accuracy correlation during compression, indicating that short responses remain better aligned with correctness instead of drifting toward underthinking. Experiments on three reasoning backbones and multiple mathematical and knowledge-intensive benchmarks show that ICR consistently shortens responses while preserving or improving accuracy, achieving a stronger accuracy--length Pareto frontier.

preprint2025arXiv

MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments

Among existing online mobile-use benchmarks, AndroidWorld has emerged as the dominant benchmark due to its reproducible environment and deterministic evaluation; however, recent agents achieving over 90% success rates indicate its saturation and motivate the need for a more challenging benchmark. In addition, its environment lacks key application categories, such as e-commerce and enterprise communication, and does not reflect realistic mobile-use scenarios characterized by vague user instructions and hybrid tool usage. We introduce MobileWorld, a substantially more challenging benchmark designed to reflect real-world usage through 201 tasks across 20 applications. MobileWorld derives its difficulty from an emphasis on long-horizon, cross-application workflows, requiring nearly twice as many completion steps on average (27.8 vs. 14.3) and featuring a significantly higher proportion of multi-app tasks (62.2% vs. 9.5%) than AndroidWorld. To overcome the limitations of existing environments, MobileWorld achieves a balance between production-grade utility and reproducible evaluation by utilizing open-source alternatives to industry standards (e.g., Mattermost for Slack). This approach enables a fully observable and controlled environment through source code modification and direct backend database access for precise verification. MobileWorld also introduces novel task categories, including agent-user interaction and Model Context Protocol (MCP)-augmented tasks, for evaluating agents in user-aware, hybrid-tool scenarios. To facilitate evaluation, we develop a planner-executor agentic framework with extended action spaces to support user interactions and MCP calls. Our results reveal a sharp performance drop compared to AndroidWorld, with the best agentic framework and end-to-end model achieving 51.7% and 20.9% success rates, respectively, highlighting ample headroom for future research.

preprint2025arXiv

PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies

A significant challenge for robot learning research is our ability to accurately measure and compare the performance of robot policies. Benchmarking in robotics is historically challenging due to the stochasticity, reproducibility, and time-consuming nature of real-world rollouts. This challenge is exacerbated for recent generalist policies, which has to be evaluated across a wide variety of scenes and tasks. Evaluation in simulation offers a scalable complement to real world evaluations, but the visual and physical domain gap between existing simulation benchmarks and the real world has made them an unreliable signal for policy improvement. Furthermore, building realistic and diverse simulated environments has traditionally required significant human effort and expertise. To bridge the gap, we introduce Policy Evaluation and Environment Reconstruction in Simulation (PolaRiS), a scalable real-to-sim framework for high-fidelity simulated robot evaluation. PolaRiS utilizes neural reconstruction methods to turn short video scans of real-world scenes into interactive simulation environments. Additionally, we develop a simple simulation data co-training recipe that bridges remaining real-to-sim gaps and enables zero-shot evaluation in unseen simulation environments. Through extensive paired evaluations between simulation and the real world, we demonstrate that PolaRiS evaluations provide a much stronger correlation to real world generalist policy performance than existing simulated benchmarks. Its simplicity also enables rapid creation of diverse simulated environments. As such, this work takes a step towards distributed and democratized evaluation for the next generation of robotic foundation models.

preprint2024arXiv

Specific Emitter Identification Based on Joint Variational Mode Decomposition

Specific emitter identification (SEI) technology is significant in device administration scenarios, such as self-organized networking and spectrum management, owing to its high security. For nonlinear and non-stationary electromagnetic signals, SEI often employs variational modal decomposition (VMD) to decompose the signal in order to effectively characterize the distinct device fingerprint. However, the trade-off of VMD between the robustness to noise and the ability to preserve signal information has not been investigated in the current literature. Moreover, the existing VMD algorithm does not utilize the stability of the intrinsic distortion of emitters within a certain temporal span, consequently constraining its practical applicability in SEI. In this paper, we propose a joint variational modal decomposition (JVMD) algorithm, which is an improved version of VMD by simultaneously implementing modal decomposition on multi-frame signals. The consistency of multi-frame signals in terms of the central frequencies and the inherent modal functions (IMFs) is exploited, which effectively highlights the distinctive characteristics among emitters and reduces noise. Additionally, the complexity of JVMD is analyzed, which is proven to be more computational-friendly than VMD. Simulations of both modal decomposition and SEI that involve real-world datasets are presented to illustrate that when compared with VMD, the JVMD algorithm improves the accuracy of device classification and the robustness towards noise.

preprint2023arXiv

Finding meaningful and workable applied mathematics problems in science

In this short review, I will summarize my research experience in three fields in applied mathematics: mathematical biology, applied probability, and applied discrete mathematics. Specifically, I will show how each project was initiated, and what wrong approaches were applied. Such details are important in learning how to do research, but they cannot be read out from research papers. I wish that students and junior researchers in applied mathematics could learn a lesson from this summary.

preprint2023arXiv

One RING to Rule Them All: Radon Sinogram for Place Recognition, Orientation and Translation Estimation

LiDAR-based global localization is a fundamental problem for mobile robots. It consists of two stages, place recognition and pose estimation, which yields the current orientation and translation, using only the current scan as query and a database of map scans. Inspired by the definition of a recognized place, we consider that a good global localization solution should keep the pose estimation accuracy with a lower place density. Following this idea, we propose a novel framework towards sparse place-based global localization, which utilizes a unified and learning-free representation, Radon sinogram (RING), for all sub-tasks. Based on the theoretical derivation, a translation invariant descriptor and an orientation invariant metric are proposed for place recognition, achieving certifiable robustness against arbitrary orientation and large translation between query and map scan. In addition, we also utilize the property of RING to propose a global convergent solver for both orientation and translation estimation, arriving at global localization. Evaluation of the proposed RING based framework validates the feasibility and demonstrates a superior performance even under a lower place density.

preprint2023arXiv

Optimal subsampling algorithm for composite quantile regression with distributed data

For massive data stored at multiple machines, we propose a distributed subsampling procedure for the composite quantile regression. By establishing the consistency and asymptotic normality of the composite quantile regression estimator from a general subsampling algorithm, we derive the optimal subsampling probabilities and the optimal allocation sizes under the L-optimality criteria. A two-step algorithm to approximate the optimal subsampling procedure is developed. The proposed methods are illustrated through numerical experiments on simulated and real datasets.

preprint2023arXiv

Super-Resolution Harmonic Retrieval of Non-Circular Signals

This paper proposes a super-resolution harmonic retrieval method for uncorrelated strictly non-circular signals, whose covariance and pseudo-covariance present Toeplitz and Hankel structures, respectively. Accordingly, the augmented covariance matrix constructed by the covariance and pseudo-covariance matrices is not only low rank but also jointly Toeplitz-Hankel structured. To efficiently exploit such a desired structure for high estimation accuracy, we develop a low-rank Toeplitz-Hankel covariance reconstruction (LRTHCR) solution employed over the augmented covariance matrix. Further, we design a fitting error constraint to flexibly implement the LRTHCR algorithm without knowing the noise statistics. In addition, performance analysis is provided for the proposed LRTHCR in practical settings. Simulation results reveal that the LRTHCR outperforms the benchmark methods in terms of lower estimation errors.

preprint2022arXiv

Academic Resource Text Level Multi-label Classification based on Attention

Hierarchical multi-label academic text classification (HMTC) is to assign academic texts into a hierarchically structured labeling system. We propose an attention-based hierarchical multi-label classification algorithm of academic texts (AHMCA) by integrating features such as text, keywords, and hierarchical structure, the academic documents are classified into the most relevant categories. We utilize word2vec and BiLSTM to obtain embedding and latent vector representations of text, keywords, and hierarchies. We use hierarchical attention mechanism to capture the associations between keywords, label hierarchies, and text word vectors to generate hierarchical-specific document embedding vectors to replace the original text embeddings in HMCN-F. The experimental results on the academic text dataset demonstrate the effectiveness of the AHMCA algorithm.

preprint2022arXiv

Accurate Portraits of Scientific Resources and Knowledge Service Components

With the advent of the cloud computing era, the cost of creating, capturing and managing information has gradually decreased. The amount of data in the Internet is also showing explosive growth, and more and more scientific and technological resources are uploaded to the network. Different from news and social media data ubiquitous in the Internet, the main body of scientific and technological resources is composed of academic-style resources or entities such as papers, patents, authors, and research institutions. There is a rich relationship network between resources, from which a large amount of cutting-edge scientific and technological information can be mined. There are a large number of management and classification standards for existing scientific and technological resources, but these standards are difficult to completely cover all entities and associations of scientific and technological resources, and cannot accurately extract important information contained in scientific and technological resources. How to construct a complete and accurate representation of scientific and technological resources from structured and unstructured reports and texts in the network, and how to tap the potential value of scientific and technological resources is an urgent problem. The solution is to construct accurate portraits of scientific and technological resources in combination with knowledge graph related technologies.

preprint2022arXiv

Adaptive Algorithm for Quantum Amplitude Estimation

Quantum amplitude estimation is a key sub-routine of a number of quantum algorithms with various applications. We propose an adaptive algorithm for interval estimation of amplitudes. The quantum part of the algorithm is based only on Grover's algorithm. The key ingredient is the introduction of an adjustment factor, which adjusts the amplitude of good states such that the amplitude after the adjustment, and the original amplitude, can be estimated without ambiguity in the subsequent step. We show with numerical studies that the proposed algorithm uses a similar number of quantum queries to achieve the same level of precision $ε$ compared to state-of-the-art algorithms, but the classical part, i.e., the non-quantum part, has substantially lower computational complexity. We rigorously prove that the number of oracle queries achieves $O(1/ε)$, i.e., a quadratic speedup over classical Monte Carlo sampling, and the computational complexity of the classical part achieves $O(\log(1/ε))$, both up to a double-logarithmic factor.

preprint2022arXiv

Algorithms for determining transposons in gene sequences

Some genes can change their relative locations in a genome. Thus for different individuals of the same species, the orders of genes might be different. Such jumping genes are called transposons. A practical problem is to determine transposons in given gene sequences. Through an intuitive rule, we transform the biological problem of determining transposons into a rigorous mathematical problem of determining the longest common subsequence. Depending on whether the gene sequence is linear (each sequence has a fixed head and tail) or circular (we can choose any gene as the head, and the previous one is the tail), and whether genes have multiple copies, we classify the problem of determining transposons into four scenarios: (1) linear sequences without duplicated genes; (2) circular sequences without duplicated genes; (3) linear sequences with duplicated genes; (4) circular sequences with duplicated genes. With the help of graph theory, we design fast algorithms for different scenarios. We also derive some results that might be of theoretical interests in combinatorics.

preprint2022arXiv

Bayesian Optimization Based Trustworthiness Model for Multi-robot Bounding Overwatch

In multi-robot system (MRS) bounding overwatch, it is crucial to determine which point to choose for overwatch at each step and whether the robots' positions are trustworthy so that the overwatch can be performed effectively. In this paper, we develop a Bayesian optimization based computational trustworthiness model (CTM) for the MRS to select overwatch points. The CTM can provide real-time trustworthiness evaluation for the MRS on the overwatch points by referring to the robots' situational awareness information, such as traversability and line of sight. The evaluation can quantify each robot's trustworthiness in protecting its robot team members during the bounding overwatch. The trustworthiness evaluation can generate a dynamic cost map for each robot in the workspace and help obtain the most trustworthy bounding overwatch path. Our proposed Bayesian based CTM and motion planning can reduce the number of explorations for the workspace in data collection and improve the CTM learning efficiency. It also enables the MRS to deal with the dynamic and uncertain environments for the multi-robot bounding overwatch task. A robot simulation is implemented in ROS Gazebo to demonstrate the effectiveness of the proposed framework.

preprint2022arXiv

BEV-SGD: Best Effort Voting SGD for Analog Aggregation Based Federated Learning against Byzantine Attackers

As a promising distributed learning technology, analog aggregation based federated learning over the air (FLOA) provides high communication efficiency and privacy provisioning under the edge computing paradigm. When all edge devices (workers) simultaneously upload their local updates to the parameter server (PS) through commonly shared time-frequency resources, the PS obtains the averaged update only rather than the individual local ones. While such a concurrent transmission and aggregation scheme reduces the latency and communication costs, it unfortunately renders FLOA vulnerable to Byzantine attacks. Aiming at Byzantine-resilient FLOA, this paper starts from analyzing the channel inversion (CI) mechanism that is widely used for power control in FLOA. Our theoretical analysis indicates that although CI can achieve good learning performance in the benign scenarios, it fails to work well with limited defensive capability against Byzantine attacks. Then, we propose a novel scheme called the best effort voting (BEV) power control policy that is integrated with stochastic gradient descent (SGD). Our BEV-SGD enhances the robustness of FLOA to Byzantine attacks, by allowing all the workers to send their local updates at their maximum transmit power. Under worst-case attacks, we derive the expected convergence rates of FLOA with CI and BEV power control policies, respectively. The rate comparison reveals that our BEV-SGD outperforms its counterpart with CI in terms of better convergence behavior, which is verified by experimental simulations.

preprint2022arXiv

Collaborative Knowledge Graph Fusion by Exploiting the Open Corpus

To alleviate the challenges of building Knowledge Graphs (KG) from scratch, a more general task is to enrich a KG using triples from an open corpus, where the obtained triples contain noisy entities and relations. It is challenging to enrich a KG with newly harvested triples while maintaining the quality of the knowledge representation. This paper proposes a system to refine a KG using information harvested from an additional corpus. To this end, we formulate our task as two coupled sub-tasks, namely join event extraction (JEE) and knowledge graph fusion (KGF). We then propose a Collaborative Knowledge Graph Fusion Framework to allow our sub-tasks to mutually assist one another in an alternating manner. More concretely, the explorer carries out the JEE supervised by both the ground-truth annotation and an existing KG provided by the supervisor. The supervisor then evaluates the triples extracted by the explorer and enriches the KG with those that are highly ranked. To implement this evaluation, we further propose a Translated Relation Alignment Scoring Mechanism to align and translate the extracted triples to the prior KG. Experiments verify that this collaboration can both improve the performance of the JEE and the KGF.

preprint2022arXiv

Context-Based MEC Platform for Augmented-Reality Services in 5G Networks

Augmented reality (AR) has drawn great attention in recent years. However, current AR devices have drawbacks, e.g., weak computation ability and large power consumption. To solve the problem, mobile edge computing (MEC) can be introduced as a key technology to offload data and computation from AR devices to MEC servers via 5th Generation Mobile Communication Technology (5G) networks. To this end, a context-based MEC platform for AR services in 5G networks is proposed in this paper. On the platform, MEC is employed as a data processing center while AR devices are simplified as universal input/output devices, which overcomes their limitations and achieves better user experience. Moreover, the proof-of-concept (PoC) hardware prototype of the platform, and two typical use cases providing AR services of navigation and face recognition respectively are implemented to demonstrate the feasibility and effectiveness of the platform. Finally, the performance of the platform is also numerically evaluated, and the results validate the system design and agree well with the design expectations.

preprint2022arXiv

Deep Random Vortex Method for Simulation and Inference of Navier-Stokes Equations

Navier-Stokes equations are significant partial differential equations that describe the motion of fluids such as liquids and air. Due to the importance of Navier-Stokes equations, the development on efficient numerical schemes is important for both science and engineer. Recently, with the development of AI techniques, several approaches have been designed to integrate deep neural networks in simulating and inferring the fluid dynamics governed by incompressible Navier-Stokes equations, which can accelerate the simulation or inferring process in a mesh-free and differentiable way. In this paper, we point out that the capability of existing deep Navier-Stokes informed methods is limited to handle non-smooth or fractional equations, which are two critical situations in reality. To this end, we propose the \emph{Deep Random Vortex Method} (DRVM), which combines the neural network with a random vortex dynamics system equivalent to the Navier-Stokes equation. Specifically, the random vortex dynamics motivates a Monte Carlo based loss function for training the neural network, which avoids the calculation of derivatives through auto-differentiation. Therefore, DRVM not only can efficiently solve Navier-Stokes equations involving rough path, non-differentiable initial conditions and fractional operators, but also inherits the mesh-free and differentiable benefits of the deep-learning-based solver. We conduct experiments on the Cauchy problem, parametric solver learning, and the inverse problem of both 2-d and 3-d incompressible Navier-Stokes equations. The proposed method achieves accurate results for simulation and inference of Navier-Stokes equations. Especially for the cases that include singular initial conditions, DRVM significantly outperforms existing PINN method.

preprint2022arXiv

Depth-Independent Depth Completion via Least Square Estimation

The depth completion task aims to complete a per-pixel dense depth map from a sparse depth map. In this paper, we propose an efficient least square based depth-independent method to complete the sparse depth map utilizing the RGB image and the sparse depth map in two independent stages. In this way can we decouple the neural network and the sparse depth input, so that when some features of the sparse depth map change, such as the sparsity, our method can still produce a promising result. Moreover, due to the positional encoding and linear procession in our pipeline, we can easily produce a super-resolution dense depth map of high quality. We also test the generalization of our method on different datasets compared to some state-of-the-art algorithms. Experiments on the benchmark show that our method produces competitive performance.

preprint2022arXiv

DXQ-Net: Differentiable LiDAR-Camera Extrinsic Calibration Using Quality-aware Flow

Accurate LiDAR-camera extrinsic calibration is a precondition for many multi-sensor systems in mobile robots. Most calibration methods rely on laborious manual operations and calibration targets. While working online, the calibration methods should be able to extract information from the environment to construct the cross-modal data association. Convolutional neural networks (CNNs) have powerful feature extraction ability and have been used for calibration. However, most of the past methods solve the extrinsic as a regression task, without considering the geometric constraints involved. In this paper, we propose a novel end-to-end extrinsic calibration method named DXQ-Net, using a differentiable pose estimation module for generalization. We formulate a probabilistic model for LiDAR-camera calibration flow, yielding a prediction of uncertainty to measure the quality of LiDAR-camera data association. Testing experiments illustrate that our method achieves a competitive with other methods for the translation component and state-of-the-art performance for the rotation component. Generalization experiments illustrate that the generalization performance of our method is significantly better than other deep learning-based methods.

preprint2022arXiv

Efficient Distance-Optimal Tethered Path Planning in Planar Environments: The Workspace Convexity

The main contribution of this paper is the proof of the convexity of the omni-directional tethered robot workspace (namely, the set of all tether-length-admissible robot configurations), as well as a set of distance-optimal tethered path planning algorithms that leverage the workspace convexity. The workspace is proven to be topologically a simply-connected subset and geometrically a convex subset of the set of all configurations. As a direct result, the tether-length-admissible optimal path between two configurations is proven exactly the untethered collision-free locally shortest path in the homotopy specified by the concatenation of the tether curve of the given configurations, which can be simply constructed by performing an untethered path shortening process in the 2D environment instead of a path searching process in the pre-calculated workspace. The convexity is an intrinsic property to the tethered robot kinematics, thus has universal impacts on all high-level distance-optimal tethered path planning tasks: The most time-consuming workspace pre-calculation (WP) process is replaced with a goal configuration pre-calculation (GCP) process, and the homotopy-aware path searching process is replaced with untethered path shortening processes. Motivated by the workspace convexity, efficient algorithms to solve the following problems are naturally proposed: (a) The optimal tethered reconfiguration (TR) planning problem is solved by a locally untethered path shortening (UPS) process, (b) The classic optimal tethered path (TP) planning problem (from a starting configuration to a goal location whereby the target tether state is not assigned) is solved by a GCP process and $n$ UPS processes, where $n$ is the number of tether-length-admissible configurations that visit the goal location, (c) The optimal tethered motion to visit a sequence of multiple goal locations, referred to as

preprint2022arXiv

Efficient Object Manipulation to an Arbitrary Goal Pose: Learning-based Anytime Prioritized Planning

We focus on the task of object manipulation to an arbitrary goal pose, in which a robot is supposed to pick an assigned object to place at the goal position with a specific orientation. However, limited by the execution space of the manipulator with gripper, one-step picking, moving and releasing might be failed, where a reorientation object pose is required as a transition. In this paper, we propose a learning-driven anytime prioritized search-based solver to find a feasible solution with low path cost in a short time. In our work, the problem is formulated as a hierarchical learning problem, with the high level finding a reorientation object pose, and the low level planning paths between adjacent grasps. We learn an offline-training path cost estimator to predict approximate path planning costs, which serve as pseudo rewards to allow for pre-training the high-level planner without interacting with the simulator. To deal with the problem of distribution mismatch of the cost net and the actual execution cost space, a refined training stage is conducted with simulation interaction. A series of experiments carried out in simulation and real world indicate that our system can achieve better performances in the object manipulation task with less time and less cost.

preprint2022arXiv

Efficient Search of the k Shortest Non-Homotopic Paths by Eliminating Non-k-Optimal Topologies

An efficient algorithm to solve the $k$ shortest non-homotopic path planning ($k$-SNPP) problem in a 2D environment is proposed in this paper. Motivated by accelerating the inefficient exploration of the homotopy-augmented space of the 2D environment, our fundamental idea is to identify the non-$k$-optimal path topologies as early as possible and terminate the pathfinding along them. This is a non-trivial practice because it has to be done at an intermediate state of the path planning process when locally shortest paths have not been fully constructed. In other words, the paths to be compared have not rendezvoused at the goal location, which makes the homotopy theory, modelling the spatial relationship among the paths having the same endpoint, not applicable. This paper is the first work that develops a systematic distance-based topology simplification mechanism to solve the $k$-SNPP task, whose core contribution is to assert the distance-based order of non-homotopic locally shortest paths before constructing them. If the order can be predicted, then those path topologies having more than $k$ better topologies are proven free of the desired $k$ paths and thus can be safely discarded during the path planning process. To this end, a hierarchical topological tree is proposed as an implementation of the mechanism, whose nodes are proven to expand in non-homotopic directions and edges (collision-free path segments) are proven locally shortest. With efficient criteria that observe the order relations between partly constructed locally shortest paths being imparted into the tree, the tree nodes that expand in non-$k$-optimal topologies will not be expanded. As a result, the computational time for solving the $k$-SNPP problem is reduced by near two orders of magnitude.

preprint2022arXiv

Enhancing the Erdős-Lovász Tihany Conjecture for line graphs of multigraphs

In this paper, we prove an enhanced version of the Erdős-Lovász Tihany Conjecture for line graphs of multigraphs. That is, for every graph $G$ whose chromatic number $χ(G)$ is more than its clique number $ω(G)$ and for nonnegative integer $\ell$, any two integers $s,t \geq 3.5\ell+2$ with $s+t = χ(G)+1$, there is a partition $(S,T)$ of the vertex set $V(G)$ such that $χ(G[S])\geq s$ and $χ(G[T])\geq t+\ell$. In particular, when $\ell=1$, we can obtain the same result just for any $s,t\geq4$. The Erdős-Lovász Tihany conjecture is a special case when $\ell=0$.

preprint2022arXiv

EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation

We introduce EventNarrative, a knowledge graph-to-text dataset from publicly available open-world knowledge graphs. Given the recent advances in event-driven Information Extraction (IE), and that prior research on graph-to-text only focused on entity-driven KGs, this paper focuses on event-centric data. However, our data generation system can still be adapted to other other types of KG data. Existing large-scale datasets in the graph-to-text area are non-parallel, meaning there is a large disconnect between the KGs and text. The datasets that have a paired KG and text, are small scale and manually generated or generated without a rich ontology, making the corresponding graphs sparse. Furthermore, these datasets contain many unlinked entities between their KG and text pairs. EventNarrative consists of approximately 230,000 graphs and their corresponding natural language text, 6 times larger than the current largest parallel dataset. It makes use of a rich ontology, all of the KGs entities are linked to the text, and our manual annotations confirm a high data quality. Our aim is two-fold: help break new ground in event-centric research where data is lacking, and to give researchers a well-defined, large-scale dataset in order to better evaluate existing and future knowledge graph-to-text models. We also evaluate two types of baseline on EventNarrative: a graph-to-text specific model and two state-of-the-art language models, which previous work has shown to be adaptable to the knowledge graph-to-text domain.

preprint2022arXiv

Feature-based Style Randomization for Domain Generalization

As a recent noticeable topic, domain generalization (DG) aims to first learn a generic model on multiple source domains and then directly generalize to an arbitrary unseen target domain without any additional adaption. In previous DG models, by generating virtual data to supplement observed source domains, the data augmentation based methods have shown its effectiveness. To simulate the possible unseen domains, most of them enrich the diversity of original data via image-level style transformation. However, we argue that the potential styles are hard to be exhaustively illustrated and fully augmented due to the limited referred styles, leading the diversity could not be always guaranteed. Unlike image-level augmentation, we in this paper develop a simple yet effective feature-based style randomization module to achieve feature-level augmentation, which can produce random styles via integrating random noise into the original style. Compared with existing image-level augmentation, our feature-level augmentation favors a more goal-oriented and sample-diverse way. Furthermore, to sufficiently explore the efficacy of the proposed module, we design a novel progressive training strategy to enable all parameters of the network to be fully trained. Extensive experiments on three standard benchmark datasets, i.e., PACS, VLCS and Office-Home, highlight the superiority of our method compared to the state-of-the-art methods.

preprint2022arXiv

FEJ-VIRO: A Consistent First-Estimate Jacobian Visual-Inertial-Ranging Odometry

In recent years, Visual-Inertial Odometry (VIO) has achieved many significant progresses. However, VIO methods suffer from localization drift over long trajectories. In this paper, we propose a First-Estimates Jacobian Visual-Inertial-Ranging Odometry (FEJ-VIRO) to reduce the localization drifts of VIO by incorporating ultra-wideband (UWB) ranging measurements into the VIO framework \textit{consistently}. Considering that the initial positions of UWB anchors are usually unavailable, we propose a long-short window structure to initialize the UWB anchors' positions as well as the covariance for state augmentation. After initialization, the FEJ-VIRO estimates the UWB anchors' positions simultaneously along with the robot poses. We further analyze the observability of the visual-inertial-ranging estimators and proved that there are \textit{four} unobservable directions in the ideal case, while one of them vanishes in the actual case due to the gain of spurious information. Based on these analyses, we leverage the FEJ technique to enforce the unobservable directions, hence reducing inconsistency of the estimator. Finally, we validate our analysis and evaluate the proposed FEJ-VIRO with both simulation and real-world experiments.

preprint2022arXiv

HDMapNet: An Online HD Map Construction and Evaluation Framework

Constructing HD semantic maps is a central component of autonomous driving. However, traditional pipelines require a vast amount of human efforts and resources in annotating and maintaining the semantics in the map, which limits its scalability. In this paper, we introduce the problem of HD semantic map learning, which dynamically constructs the local semantics based on onboard sensor observations. Meanwhile, we introduce a semantic map learning method, dubbed HDMapNet. HDMapNet encodes image features from surrounding cameras and/or point clouds from LiDAR, and predicts vectorized map elements in the bird's-eye view. We benchmark HDMapNet on nuScenes dataset and show that in all settings, it performs better than baseline methods. Of note, our camera-LiDAR fusion-based HDMapNet outperforms existing methods by more than 50% in all metrics. In addition, we develop semantic-level and instance-level metrics to evaluate the map learning performance. Finally, we showcase our method is capable of predicting a locally consistent map. By introducing the method and metrics, we invite the community to study this novel map learning problem.

preprint2022arXiv

Inference on autoregulation in gene expression

Some genes can promote or repress their own expressions, which is called autoregulation. Although gene regulation is a central topic in biology, autoregulation is much less studied. In general, it is extremely difficult to determine the existence of autoregulation with direct biochemical approaches. Nevertheless, some papers have observed that certain types of autoregulations are linked to noise levels in gene expression. We generalize these results by two propositions on discrete-state continuous-time Markov chains. These two propositions form a simple but robust method to infer the existence of autoregulation from gene expression data. This method only needs to compare the mean and variance of the gene expression level. Compared to other methods for inferring autoregulation, our method only requires non-interventional one-time data, and does not need to estimate parameters. Besides, our method has few restrictions on the model. We apply this method to four groups of experimental data and find some genes that might have autoregulation. Some inferred autoregulations have been verified by experiments or other theoretical works.

preprint2022arXiv

Inference on the structure of gene regulatory networks

In this paper, we conduct theoretical analyses on inferring the structure of gene regulatory networks. Depending on the experimental method and data type, the inference problem is classified into 20 different scenarios. For each scenario, we discuss the problem that with enough data, under what assumptions, what can be inferred about the structure. For scenarios that have been covered in the literature, we provide a brief review. For scenarios that have not been covered in literature, if the structure can be inferred, we propose new mathematical inference methods and evaluate them on simulated data. Otherwise, we prove that the structure cannot be inferred.

preprint2022arXiv

Kinematic Motion Retargeting via Neural Latent Optimization for Learning Sign Language

Motion retargeting from a human demonstration to a robot is an effective way to reduce the professional requirements and workload of robot programming, but faces the challenges resulting from the differences between humans and robots. Traditional optimization-based methods are time-consuming and rely heavily on good initialization, while recent studies using feedforward neural networks suffer from poor generalization to unseen motions. Moreover, they neglect the topological information in human skeletons and robot structures. In this paper, we propose a novel neural latent optimization approach to address these problems. Latent optimization utilizes a decoder to establish a mapping between the latent space and the robot motion space. Afterward, the retargeting results that satisfy robot constraints can be obtained by searching for the optimal latent vector. Alongside with latent optimization, neural initialization exploits an encoder to provide a better initialization for faster and better convergence of optimization. Both the human skeleton and the robot structure are modeled as graphs to make better use of topological information. We perform experiments on retargeting Chinese sign language, which involves two arms and two hands, with additional requirements on the relative relationships among joints. Experiments include retargeting various human demonstrations to YuMi, NAO, and Pepper in the simulation environment and to YuMi in the real-world environment. Both efficiency and accuracy of the proposed method are verified.

preprint2022arXiv

Learning to Fill the Seam by Vision: Sub-millimeter Peg-in-hole on Unseen Shapes in Real World

In the peg insertion task, human pays attention to the seam between the peg and the hole and tries to fill it continuously with visual feedback. By imitating the human behavior, we design architectures with position and orientation estimators based on the seam representation for pose alignment, which proves to be general to the unseen peg geometries. By putting the estimators into the closed-loop control with reinforcement learning, we further achieve a higher or comparable success rate, efficiency, and robustness compared with the baseline methods. The policy is trained totally in simulation without any manual intervention. To achieve sim-to-real, a learnable segmentation module with automatic data collecting and labeling can be easily trained to decouple the perception and the policy, which helps the model trained in simulation quickly adapt to the real world with negligible effort. Results are presented in simulation and on a physical robot. Code, videos, and supplemental material are available at https://github.com/xieliang555/SFN.git

preprint2022arXiv

Map-based Visual-Inertial Localization: Consistency and Complexity

Drift-free localization is essential for autonomous vehicles. In this paper, we address the problem by proposing a filter-based framework, which integrates the visual-inertial odometry and the measurements of the features in the pre-built map. In this framework, the transformation between the odometry frame and the map frame is augmented into the state and estimated on the fly. Besides, we maintain only the keyframe poses in the map and employ Schmidt extended Kalman filter to update the state partially, so that the uncertainty of the map information can be consistently considered with low computational cost. Moreover, we theoretically demonstrate that the ever-changing linearization points of the estimated state can introduce spurious information to the augmented system and make the original four-dimensional unobservable subspace vanish, leading to inconsistent estimation in practice. To relieve this problem, we employ first-estimate Jacobian (FEJ) to maintain the correct observability properties of the augmented system. Furthermore, we introduce an observability-constrained updating method to compensate for the significant accumulated error after the long-term absence (can be 3 minutes and 1 km) of map-based measurements. Through simulations, the consistent estimation of our proposed algorithm is validated. Through real-world experiments, we demonstrate that our proposed algorithm runs successfully on four kinds of datasets with the lower computational cost (20% time-saving) and the better estimation accuracy (45% trajectory error reduction) compared with the baseline algorithm VINS-Fusion, whereas VINS-Fusion fails to give bounded localization performance on three of four datasets because of its inconsistent estimation.

preprint2022arXiv

MBGDT:Robust Mini-Batch Gradient Descent

In high dimensions, most machine learning method perform fragile even there are a little outliers. To address this, we hope to introduce a new method with the base learner, such as Bayesian regression or stochastic gradient descent to solve the problem of the vulnerability in the model. Because the mini-batch gradient descent allows for a more robust convergence than the batch gradient descent, we work a method with the mini-batch gradient descent, called Mini-Batch Gradient Descent with Trimming (MBGDT). Our method show state-of-art performance and have greater robustness than several baselines when we apply our method in designed dataset.

preprint2022arXiv

Neural Operator with Regularity Structure for Modeling Dynamics Driven by SPDEs

Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics. Neural Operators, generations of neural networks with capability of learning maps between infinite-dimensional spaces, are strong tools for solving parametric PDEs. However, they lack the ability to modeling SPDEs which usually have poor regularity due to the driving noise. As the theory of regularity structure has achieved great successes in analyzing SPDEs and provides the concept model feature vectors that well-approximate SPDEs' solutions, we propose the Neural Operator with Regularity Structure (NORS) which incorporates the feature vectors for modeling dynamics driven by SPDEs. We conduct experiments on various of SPDEs including the dynamic Phi41 model and the 2d stochastic Navier-Stokes equation, and the results demonstrate that the NORS is resolution-invariant, efficient, and achieves one order of magnitude lower error with a modest amount of data.

preprint2022arXiv

Observation of one-dimensional Dirac fermions in silicon nanoribbons

Dirac materials, which feature Dirac cones in the reciprocal space, have been one of the hottest topics in condensed matter physics in the past decade. To date, 2D and 3D Dirac Fermions have been extensively studied, while their 1D counterparts are rare. Recently, Si nanoribbons (SiNRs), which are composed of alternating pentagonal Si rings, have attracted intensive attention. However, the electronic structure and topological properties of SiNRs are still elusive. Here, by angle-resolved photoemission spectroscopy, scanning tunneling microscopy/spectroscopy measurements, first-principles calculations, and tight-binding model analysis, we demonstrate the existence of 1D Dirac Fermions in SiNRs. Our theoretical analysis shows that the Dirac cones derive from the armchairlike Si chain in the center of the nanoribbon and can be described by the Su-Schrieffer-Heeger model. These results establish SiNRs as a platform for studying the novel physical properties in 1D Dirac materials.

preprint2022arXiv

Policy Gradient Method For Robust Reinforcement Learning

This paper develops the first policy gradient method with global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. Robust reinforcement learning is to learn a policy robust to model mismatch between simulator and real environment. We first develop the robust policy (sub-)gradient, which is applicable for any differentiable parametric policy class. We show that the proposed robust policy gradient method converges to the global optimum asymptotically under direct policy parameterization. We further develop a smoothed robust policy gradient method and show that to achieve an $ε$-global optimum, the complexity is $\mathcal O(ε^{-3})$. We then extend our methodology to the general model-free setting and design the robust actor-critic method with differentiable parametric policy class and value function. We further characterize its asymptotic convergence and sample complexity under the tabular setting. Finally, we provide simulation results to demonstrate the robustness of our methods.

preprint2022arXiv

Predicting Stock Price Movement after Disclosure of Corporate Annual Reports: A Case Study of 2021 China CSI 300 Stocks

In the current stock market, computer science and technology are more and more widely used to analyse stocks. Not same as most related machine learning stock price prediction work, this work study the predicting the tendency of the stock price on the second day right after the disclosure of the companies' annual reports. We use a variety of different models, including decision tree, logistic regression, random forest, neural network, prototypical networks. We use two sets of financial indicators (key and expanded) to conduct experiments, these financial indicators are obtained from the EastMoney website disclosed by companies, and finally we find that these models are not well behaved to predict the tendency. In addition, we also filter stocks with ROE greater than 0.15 and net cash ratio greater than 0.9. We conclude that according to the financial indicators based on the just-released annual report of the company, the predictability of the stock price movement on the second day after disclosure is weak, with maximum accuracy about 59.6% and maximum precision about 0.56 on our test set by the random forest classifier, and the stock filtering does not improve the performance. And random forests perform best in general among all these models which conforms to some work's findings.

preprint2022arXiv

QC-ODKLA: Quantized and Communication-Censored Online Decentralized Kernel Learning via Linearized ADMM

This paper focuses on online kernel learning over a decentralized network. Each agent in the network receives continuous streaming data locally and works collaboratively to learn a nonlinear prediction function that is globally optimal in the reproducing kernel Hilbert space with respect to the total instantaneous costs of all agents. In order to circumvent the curse of dimensionality issue in traditional online kernel learning, we utilize random feature (RF) mapping to convert the non-parametric kernel learning problem into a fixed-length parametric one in the RF space. We then propose a novel learning framework named Online Decentralized Kernel learning via Linearized ADMM (ODKLA) to efficiently solve the online decentralized kernel learning problem. To further improve the communication efficiency, we add the quantization and censoring strategies in the communication stage and develop the Quantized and Communication-censored ODKLA (QC-ODKLA) algorithm. We theoretically prove that both ODKLA and QC-ODKLA can achieve the optimal sublinear regret $\mathcal{O}(\sqrt{T})$ over $T$ time slots. Through numerical experiments, we evaluate the learning effectiveness, communication, and computation efficiencies of the proposed methods.

preprint2022arXiv

Recycling of Perovskite Substrate

The use of water-soluble sacrificial layer of Sr$_3$Al$_2$O$_6$ has tremendously boosted the research on freestanding functional oxide thin films, especially thanks to its ultimate capability to produce high-quality epitaxial perovskite thin films. However, the costly single-crystalline substrates, e.g. SrTiO$_3$, were generally discarded after obtaining the freestanding thin films. Here, we demonstrate that the SrTiO$_3$ substrates can be recycled to fabricate La$_{0.7}$Sr$_{0.3}$MnO$_3$ films with nearly identical structural and electrical properties. After attaining freestanding thin films, the residues on SrTiO$_3$ can be removed by 80 \degree C hot water soaking and rinsing treatments. Consequently, the surface of SrTiO$_3$ reverted to its original step-and-terrace structure.

preprint2022arXiv

Robust Constrained Reinforcement Learning

Constrained reinforcement learning is to maximize the expected reward subject to constraints on utilities/costs. However, the training environment may not be the same as the test one, due to, e.g., modeling error, adversarial attack, non-stationarity, resulting in severe performance degradation and more importantly constraint violation. We propose a framework of robust constrained reinforcement learning under model uncertainty, where the MDP is not fixed but lies in some uncertainty set, the goal is to guarantee that constraints on utilities/costs are satisfied for all MDPs in the uncertainty set, and to maximize the worst-case reward performance over the uncertainty set. We design a robust primal-dual approach, and further theoretically develop guarantee on its convergence, complexity and robust feasibility. We then investigate a concrete example of $δ$-contamination uncertainty set, design an online and model-free algorithm and theoretically characterize its sample complexity.

preprint2022arXiv

Searching for multiple populations in star clusters using the China Space Station Telescope

Multiple stellar populations (MPs) in most star clusters older than 2 Gyr, as seen by lots of spectroscopic and photometric studies, have led to a significant challenge to the traditional view of star formation. In this field, space-based instruments, in particular the Hubble Space Telescope (HST), have made a breakthrough as they significantly improved the efficiency of detecting MPs in crowding stellar fields by images. The China Space Station Telescope (CSST) and the HST are sensitive to a similar wavelength interval, but it covers a field of view which is about 5-8 times wider than that of HST. One of its instruments, the Multi-Channel Imager (MCI), will have multiple filters covering a wide wavelength range from NUV to NIR, making the CSST a potentially powerful tool for studying MPs in clusters. In this work, we evaluate the efficiency of the designed filters for the MCI/CSST in revealing MPs in different color-magnitude diagrams (CMDs). We find that CMDs made with MCI/CSST photometry in appropriate UV filters are powerful tools to disentangle stellar populations with different abundances of He, C, N, O and Mg. On the contrary, the traditional CMDs are blind to multiple populations in globular clusters (GCs). We show that CSST has the potential of being the spearhead instrument for investigating MPs in GCs in the next decades.

preprint2022arXiv

Tac3D: A Novel Vision-based Tactile Sensor for Measuring Forces Distribution and Estimating Friction Coefficient Distribution

The importance of force perception in interacting with the environment was proven years ago. However, it is still a challenge to measure the contact force distribution accurately in real-time. In order to break through this predicament, we propose a new vision-based tactile sensor, the Tac3D sensor, for measuring the three-dimensional contact surface shape and contact force distribution. In this work, virtual binocular vision is first applied to the tactile sensor, which allows the Tac3D sensor to measure the three-dimensional tactile information in a simple and efficient way and has the advantages of simple structure, low computational costs, and inexpensive. Then, we used contact surface shape and force distribution to estimate the friction coefficient distribution in contact region. Further, combined with the global position of the tactile sensor, the 3D model of the object with friction coefficient distribution is reconstructed. These reconstruction experiments not only demonstrate the excellent performance of the Tac3D sensor but also imply the possibility to optimize the action planning in grasping based on the friction coefficient distribution of the object.

preprint2022arXiv

TomoSAR-ALISTA: Efficient TomoSAR Imaging via Deep Unfolded Network

Synthetic aperture radar (SAR) tomography (TomoSAR) has attracted remarkable interest for its ability in achieving three-dimensional reconstruction along the elevation direction from multiple observations. In recent years, compressed sensing (CS) technique has been introduced into TomoSAR considering for its super-resolution ability with limited samples. Whereas, the CS-based methods suffer from several drawbacks, including weak noise resistance, high computational complexity and complex parameter fine-tuning. Among the different CS algorithms, iterative soft-thresholding algorithm (ISTA) is widely used as a robust reconstruction approach, however, the parameters in the ISTA algorithm are manually chosen, which usually requires a time-consuming fine-tuning process to achieve the best performance. Aiming at efficient TomoSAR imaging, a novel sparse unfolding network named analytic learned ISTA (ALISTA) is proposed towards the TomoSAR imaging problem in this paper, and the key parameters of ISTA are learned from training data via deep learning to avoid complex parameter fine-tuning and significantly relieves the training burden. In addition, experiments verify that it is feasible to use traditional CS algorithms as training labels, which provides a tangible supervised training method to achieve better 3D reconstruction performance even in the absence of labeled data in real applications.

preprint2022arXiv

Toward Consistent and Efficient Map-based Visual-inertial Localization: Theory Framework and Filter Design

This paper focuses on designing a consistent and efficient filter for map-based visual-inertial localization. First, we propose a new Lie group with its algebra, based on which a novel invariant extended Kalman filter (invariant EKF) is designed. We theoretically prove that, when we do not consider the uncertainty of the map information, the proposed invariant EKF can naturally maintain the correct observability properties of the system. To consider the uncertainty of the map information, we introduce a Schmidt filter. With the Schmidt filter, the uncertainty of the map information can be taken into consideration to avoid over-confident estimation while the computation cost only increases linearly with the size of the map keyframes. In addition, we introduce an easily implemented observability-constrained technique because directly combining the invariant EKF with the Schmidt filter cannot maintain the correct observability properties of the system that considers the uncertainty of the map information. Finally, we validate our proposed system's high consistency, accuracy, and efficiency via extensive simulations and real-world experiments.

preprint2022arXiv

Towards Two-view 6D Object Pose Estimation: A Comparative Study on Fusion Strategy

Current RGB-based 6D object pose estimation methods have achieved noticeable performance on datasets and real world applications. However, predicting 6D pose from single 2D image features is susceptible to disturbance from changing of environment and textureless or resemblant object surfaces. Hence, RGB-based methods generally achieve less competitive results than RGBD-based methods, which deploy both image features and 3D structure features. To narrow down this performance gap, this paper proposes a framework for 6D object pose estimation that learns implicit 3D information from 2 RGB images. Combining the learned 3D information and 2D image features, we establish more stable correspondence between the scene and the object models. To seek for the methods best utilizing 3D information from RGB inputs, we conduct an investigation on three different approaches, including Early- Fusion, Mid-Fusion, and Late-Fusion. We ascertain the Mid- Fusion approach is the best approach to restore the most precise 3D keypoints useful for object pose estimation. The experiments show that our method outperforms state-of-the-art RGB-based methods, and achieves comparable results with RGBD-based methods.

preprint2022arXiv

Translation Invariant Global Estimation of Heading Angle Using Sinogram of LiDAR Point Cloud

Global point cloud registration is an essential module for localization, of which the main difficulty exists in estimating the rotation globally without initial value. With the aid of gravity alignment, the degree of freedom in point cloud registration could be reduced to 4DoF, in which only the heading angle is required for rotation estimation. In this paper, we propose a fast and accurate global heading angle estimation method for gravity-aligned point clouds. Our key idea is that we generate a translation invariant representation based on Radon Transform, allowing us to solve the decoupled heading angle globally with circular cross-correlation. Besides, for heading angle estimation between point clouds with different distributions, we implement this heading angle estimator as a differentiable module to train a feature extraction network end- to-end. The experimental results validate the effectiveness of the proposed method in heading angle estimation and show better performance compared with other methods.

preprint2022arXiv

Trust-based Symbolic Motion Planning for Multi-robot Bounding Overwatch

Multi-robot bounding overwatch requires timely coordination of robot team members. Symbolic motion planning (SMP) can provide provably correct solutions for robot motion planning with high-level temporal logic task requirements. This paper aims to develop a framework for safe and reliable SMP of multi-robot systems (MRS) to satisfy complex bounding overwatch tasks constrained by temporal logics. A decentralized SMP framework is first presented, which guarantees both correctness and parallel execution of the complex bounding overwatch tasks by the MRS. A computational trust model is then constructed by referring to the traversability and line of sight of robots in the terrain. The trust model predicts the trustworthiness of each robot team's potential behavior in executing a task plan. The most trustworthy task and motion plan is explored with a Dijkstra searching strategy to guarantee the reliability of MRS bounding overwatch. A robot simulation is implemented in ROS Gazebo to demonstrate the effectiveness of the proposed framework.

preprint2022arXiv

Two Metrics on Rooted Unordered Trees with Labels

The early development of a zygote can be mathematically described by a developmental tree. To compare developmental trees of different species, we need to define distances on trees. If children cells after a division are not distinguishable, developmental trees are represented by the space $\mathcal{T}$ of rooted trees with possibly repeated labels, where all vertices are unordered. If children cells after a division are partially distinguishable, developmental trees are represented by the space $\mathcal{P}$ of rooted trees with possibly repeated labels, where vertices can be ordered or unordered. On $\mathcal{T}$, the space of rooted unordered trees with possibly repeated labels, we define two metrics: the best-match metric and the left-regular metric, which show some advantages over existing methods. On $\mathcal{P}$, the space of rooted labeled trees with ordered or unordered vertices, there is no metric, and we define a semimetric, which is a variant of the best-match metric. To compute the best-match distance between two trees, the expected time complexity and worst-case time complexity are both $\mathcal{O}(n^2)$, where $n$ is the tree size. To compute the left-regular distance between two trees, the expected time complexity is $\mathcal{O}(n)$, and the worst-case time complexity is $\mathcal{O}(n\log n)$. For rooted labeled trees with (fully/partially) unordered vertices, we define metrics (semimetric) that have fast algorithms to compute and have advantages over existing methods. Such trees also appear outside of developmental biology, and such metrics can be applied to other types of trees which have more extensive applications, especially in molecular biology.

preprint2022arXiv

Writing Style Aware Document-level Event Extraction

Event extraction, the technology that aims to automatically get the structural information from documents, has attracted more and more attention in many fields. Most existing works discuss this issue with the token-level multi-label classification framework by distinguishing the tokens as different roles while ignoring the writing styles of documents. The writing style is a special way of content organizing for documents and it is relative fixed in documents with a special field (e.g. financial, medical documents, etc.). We argue that the writing style contains important clues for judging the roles for tokens and the ignorance of such patterns might lead to the performance degradation for the existing works. To this end, we model the writing style in documents as a distribution of argument roles, i.e., Role-Rank Distribution, and propose an event extraction model with the Role-Rank Distribution based Supervision Mechanism to capture this pattern through the supervised training process of an event extraction task. We compare our model with state-of-the-art methods on several real-world datasets. The empirical results show that our approach outperforms other alternatives with the captured patterns. This verifies the writing style contains valuable information that could improve the performance of the event extraction task.

preprint2021arXiv

Deep Samplable Observation Model for Global Localization and Kidnapping

Global localization and kidnapping are two challenging problems in robot localization. The popular method, Monte Carlo Localization (MCL) addresses the problem by iteratively updating a set of particles with a "sampling-weighting" loop. Sampling is decisive to the performance of MCL [1]. However, traditional MCL can only sample from a uniform distribution over the state space. Although variants of MCL propose different sampling models, they fail to provide an accurate distribution or generalize across scenes. To better deal with these problems, we present a distribution proposal model, named Deep Samplable Observation Model (DSOM). DSOM takes a map and a 2D laser scan as inputs and outputs a conditional multimodal probability distribution of the pose, making the samples more focusing on the regions with higher likelihood. With such samples, the convergence is expected to be more effective and efficient. Considering that the learning-based sampling model may fail to capture the true pose sometimes, we furthermore propose the Adaptive Mixture MCL (AdaM MCL), which deploys a trusty mechanism to adaptively select updating mode for each particle to tolerate this situation. Equipped with DSOM, AdaM MCL can achieve more accurate estimation, faster convergence and better scalability compared to previous methods in both synthetic and real scenes. Even in real environments with long-term changing, AdaM MCL is able to localize the robot using DSOM trained only by simulation observations from a SLAM map or a blueprint map.

preprint2021arXiv

DiSCO: Differentiable Scan Context with Orientation

Global localization is essential for robot navigation, of which the first step is to retrieve a query from the map database. This problem is called place recognition. In recent years, LiDAR scan based place recognition has drawn attention as it is robust against the appearance change. In this paper, we propose a LiDAR-based place recognition method, named Differentiable Scan Context with Orientation (DiSCO), which simultaneously finds the scan at a similar place and estimates their relative orientation. The orientation can further be used as the initial value for the down-stream local optimal metric pose estimation, improving the pose estimation especially when a large orientation between the current scan and retrieved scan exists. Our key idea is to transform the feature into the frequency domain. We utilize the magnitude of the spectrum as the place signature, which is theoretically rotation-invariant. In addition, based on the differentiable phase correlation, we can efficiently estimate the global optimal relative orientation using the spectrum. With such structural constraints, the network can be learned in an end-to-end manner, and the backbone is fully shared by the two tasks, achieving interpretability and light weight. Finally, DiSCO is validated on three datasets with long-term outdoor conditions, showing better performance than the compared methods.

preprint2021arXiv

Dynamic Movement Primitive based Motion Retargeting for Dual-Arm Sign Language Motions

We aim to develop an efficient programming method for equipping service robots with the skill of performing sign language motions. This paper addresses the problem of transferring complex dual-arm sign language motions characterized by the coordination among arms and hands from human to robot, which is seldom considered in previous studies of motion retargeting techniques. In this paper, we propose a novel motion retargeting method that leverages graph optimization and Dynamic Movement Primitives (DMPs) for this problem. We employ DMPs in a leader-follower manner to parameterize the original trajectories while preserving motion rhythm and relative movements between human body parts, and adopt a three-step optimization procedure to find deformed trajectories for robot motion planning while ensuring feasibility for robot execution. Experimental results of several Chinese Sign Language (CSL) motions have been successfully performed on ABB's YuMi dual-arm collaborative robot (14-DOF) with two 6-DOF Inspire-Robotics' multi-fingered hands, a system with 26 DOFs in total.

preprint2021arXiv

Imitation Learning of Hierarchical Driving Model: from Continuous Intention to Continuous Trajectory

One of the challenges to reduce the gap between the machine and the human level driving is how to endow the system with the learning capacity to deal with the coupled complexity of environments, intentions, and dynamics. In this paper, we propose a hierarchical driving model with explicit model of continuous intention and continuous dynamics, which decouples the complexity in the observation-to-action reasoning in the human driving data. Specifically, the continuous intention module takes the route planning map obtained by GPS and IMU, perception from a RGB camera and LiDAR as input to generate a potential map encoded with obstacles and intentions being expressed as grid based potentials. Then, the potential map is regarded as a condition, together with the current dynamics, to generate a continuous trajectory as output by a continuous function approximator network, whose derivatives can be used for supervision without additional parameters. Finally, we validate our method on both datasets and simulator, demonstrating that our method has higher prediction accuracy of displacement and velocity and generates smoother trajectories. The method is also deployed on the real vehicle with loop latency, validating its effectiveness. To the best of our knowledge, this is the first work to produce the driving trajectory using a continuous function approximator network.

preprint2021arXiv

Joint Optimization of Communications and Federated Learning Over the Air

Federated learning (FL) is an attractive paradigm for making use of rich distributed data while protecting data privacy. Nonetheless, nonideal communication links and limited transmission resources may hinder the implementation of fast and accurate FL. In this paper, we study joint optimization of communications and FL based on analog aggregation transmission in realistic wireless networks. We first derive closed-form expressions for the expected convergence rate of FL over the air, which theoretically quantify the impact of analog aggregation on FL. Based on the analytical results, we develop a joint optimization model for accurate FL implementation, which allows a parameter server to select a subset of workers and determine an appropriate power scaling factor. Since the practical setting of FL over the air encounters unobservable parameters, we reformulate the joint optimization of worker selection and power allocation using controlled approximation. Finally, we efficiently solve the resulting mixed-integer programming problem via a simple yet optimal finite-set search method by reducing the search space. Simulation results show that the proposed solutions developed for realistic wireless analog channels outperform a benchmark method, and achieve comparable performance of the ideal case where FL is implemented over noise-free wireless channels.

preprint2021arXiv

Learn to Differ: Sim2Real Small Defection Segmentation Network

Recent studies on deep-learning-based small defection segmentation approaches are trained in specific settings and tend to be limited by fixed context. Throughout the training, the network inevitably learns the representation of the background of the training data before figuring out the defection. They underperform in the inference stage once the context changed and can only be solved by training in every new setting. This eventually leads to the limitation in practical robotic applications where contexts keep varying. To cope with this, instead of training a network context by context and hoping it to generalize, why not stop misleading it with any limited context and start training it with pure simulation? In this paper, we propose the network SSDS that learns a way of distinguishing small defections between two images regardless of the context, so that the network can be trained once for all. A small defection detection layer utilizing the pose sensitivity of phase correlation between images is introduced and is followed by an outlier masking layer. The network is trained on randomly generated simulated data with simple shapes and is generalized across the real world. Finally, SSDS is validated on real-world collected data and demonstrates the ability that even when trained in cheap simulation, SSDS can still find small defections in the real world showing the effectiveness and its potential for practical applications.

preprint2021arXiv

Modeling and Leveraging Analytic Focus During Exploratory Visual Analysis

Visual analytics systems enable highly interactive exploratory data analysis. Across a range of fields, these technologies have been successfully employed to help users learn from complex data. However, these same exploratory visualization techniques make it easy for users to discover spurious findings. This paper proposes new methods to monitor a user's analytic focus during visual analysis of structured datasets and use it to surface relevant articles that contextualize the visualized findings. Motivated by interactive analyses of electronic health data, this paper introduces a formal model of analytic focus, a computational approach to dynamically update the focus model at the time of user interaction, and a prototype application that leverages this model to surface relevant medical publications to users during visual analysis of a large corpus of medical records. Evaluation results with 24 users show that the modeling approach has high levels of accuracy and is able to surface highly relevant medical abstracts.

preprint2021arXiv

RaLL: End-to-end Radar Localization on Lidar Map Using Differentiable Measurement Model

Compared to the onboard camera and laser scanner, radar sensor provides lighting and weather invariant sensing, which is naturally suitable for long-term localization under adverse conditions. However, radar data is sparse and noisy, resulting in challenges for radar mapping. On the other hand, the most popular available map currently is built by lidar. In this paper, we propose an end-to-end deep learning framework for Radar Localization on Lidar Map (RaLL) to bridge the gap, which not only achieves the robust radar localization but also exploits the mature lidar mapping technique, thus reducing the cost of radar mapping. We first embed both sensor modals into a common feature space by a neural network. Then multiple offsets are added to the map modal for exhaustive similarity evaluation against the current radar modal, yielding the regression of the current pose. Finally, we apply this differentiable measurement model to a Kalman Filter (KF) to learn the whole sequential localization process in an end-to-end manner. \textit{The whole learning system is differentiable with the network based measurement model at the front-end and KF at the back-end.} To validate the feasibility and effectiveness, we employ multi-session multi-scene datasets collected from the real world, and the results demonstrate that our proposed system achieves superior performance over $90km$ driving, even in generalization scenarios where the model training is in UK, while testing in South Korea. We also release the source code publicly.

preprint2021arXiv

REDE: End-to-end Object 6D Pose Robust Estimation Using Differentiable Outliers Elimination

Object 6D pose estimation is a fundamental task in many applications. Conventional methods solve the task by detecting and matching the keypoints, then estimating the pose. Recent efforts bringing deep learning into the problem mainly overcome the vulnerability of conventional methods to environmental variation due to the hand-crafted feature design. However, these methods cannot achieve end-to-end learning and good interpretability at the same time. In this paper, we propose REDE, a novel end-to-end object pose estimator using RGB-D data, which utilizes network for keypoint regression, and a differentiable geometric pose estimator for pose error back-propagation. Besides, to achieve better robustness when outlier keypoint prediction occurs, we further propose a differentiable outliers elimination method that regresses the candidate result and the confidence simultaneously. Via confidence weighted aggregation of multiple candidates, we can reduce the effect from the outliers in the final estimation. Finally, following the conventional method, we apply a learnable refinement process to further improve the estimation. The experimental results on three benchmark datasets show that REDE slightly outperforms the state-of-the-art approaches and is more robust to object occlusion.

preprint2021arXiv

The collision frequencies in the plasmas with the power-law q-distributions in nonextensive statistics

We study the collision frequencies of particles in the weakly and highly ionized plasmas with the power-law q-distributions in nonextensive statistics. We derive the average collision frequencies of neutral-neutral particle, electron-neutral particle, ion-neutral particle, electron-electron, ion-ion and electron-ion, respectively, in the q-distributed plasmas. We show that the average collision frequencies depend strongly on the q-parameter in a complex form and thus their properties are significantly different from that in Maxwell-distributed plasmas. These new average collision frequencies are important for us to study accurately the transport property in the complex plasmas with non-Maxwell/power-law velocity distributions.

preprint2021arXiv

Using Prior Knowledge to Guide BERT's Attention in Semantic Textual Matching Tasks

We study the problem of incorporating prior knowledge into a deep Transformer-based model,i.e.,Bidirectional Encoder Representations from Transformers (BERT), to enhance its performance on semantic textual matching tasks. By probing and analyzing what BERT has already known when solving this task, we obtain better understanding of what task-specific knowledge BERT needs the most and where it is most needed. The analysis further motivates us to take a different approach than most existing works. Instead of using prior knowledge to create a new training task for fine-tuning BERT, we directly inject knowledge into BERT's multi-head attention mechanism. This leads us to a simple yet effective approach that enjoys fast training stage as it saves the model from training on additional data or tasks other than the main task. Extensive experiments demonstrate that the proposed knowledge-enhanced BERT is able to consistently improve semantic textual matching performance over the original BERT model, and the performance benefit is most salient when training data is scarce.

preprint2020arXiv

2-Entity RANSAC for robust visual localization in changing environment

Visual localization has attracted considerable attention due to its low-cost and stable sensor, which is desired in many applications, such as autonomous driving, inspection robots and unmanned aerial vehicles. However, current visual localization methods still struggle with environmental changes across weathers and seasons, as there is significant appearance variation between the map and the query image. The crucial challenge in this situation is that the percentage of outliers, i.e. incorrect feature matches, is high. In this paper, we derive minimal closed form solutions for 3D-2D localization with the aid of inertial measurements, using only 2 pairs of point matches or 1 pair of point match and 1 pair of line match. These solutions are further utilized in the proposed 2-entity RANSAC, which is more robust to outliers as both line and point features can be used simultaneously and the number of matches required for pose calculation is reduced. Furthermore, we introduce three feature sampling strategies with different advantages, enabling an automatic selection mechanism. With the mechanism, our 2-entity RANSAC can be adaptive to the environments with different distribution of feature types in different segments. Finally, we evaluate the method on both synthetic and real-world datasets, validating its performance and effectiveness in inter-session scenarios.

preprint2020arXiv

A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision

The excellent performance of modern deep neural networks (DNNs) comes at an often prohibitive training cost, limiting the rapid development of DNN innovations and raising various environmental concerns. To reduce the dominant data movement cost of training, process in-memory (PIM) has emerged as a promising solution as it alleviates the need to access DNN weights. However, state-of-the-art PIM DNN training accelerators employ either analog/mixed signal computing which has limited precision or digital computing based on a memory technology that supports limited logic functions and thus requires complicated procedure to realize floating point computation. In this paper, we propose a spin orbit torque magnetic random access memory (SOT-MRAM) based digital PIM accelerator that supports floating point precision. Specifically, this new accelerator features an innovative (1) SOT-MRAM cell, (2) full addition design, and (3) floating point computation. Experiment results show that the proposed SOT-MRAM PIM based DNN training accelerator can achieve 3.3$\times$, 1.8$\times$, and 2.5$\times$ improvement in terms of energy, latency, and area, respectively, compared with a state-of-the-art PIM based DNN training accelerator.

preprint2020arXiv

A Novel Cascade Binary Tagging Framework for Relational Triple Extraction

Extracting relational triples from unstructured text is crucial for large-scale knowledge graph construction. However, few existing works excel in solving the overlapping triple problem where multiple relational triples in the same sentence share the same entities. In this work, we introduce a fresh perspective to revisit the relational triple extraction task and propose a novel cascade binary tagging framework (CasRel) derived from a principled problem formulation. Instead of treating relations as discrete labels as in previous works, our new framework models relations as functions that map subjects to objects in a sentence, which naturally handles the overlapping problem. Experiments show that the CasRel framework already outperforms state-of-the-art methods even when its encoder module uses a randomly initialized BERT encoder, showing the power of the new tagging framework. It enjoys further performance boost when employing a pre-trained BERT encoder, outperforming the strongest baseline by 17.5 and 30.2 absolute gain in F1-score on two public datasets NYT and WebNLG, respectively. In-depth analysis on different scenarios of overlapping triples shows that the method delivers consistent performance gain across all these scenarios. The source code and data are released online.

preprint2020arXiv

Attachment and antibiotic response of early-stage biofilms studied using resonant hyperspectral imaging

Many bacterial species readily develop biofilms that act as a protective matrix against external challenge, e.g. from antimicrobial treatment. Therefore, biofilms are often responsible for persistent and recurring infections. Established methods for studying biofilms are either destructive or they focus on the biofilm surface. A non-destructive method that is sensitive to the underside of the biofilm is highly desirable, as it allows studying the penetration of antibiotics through the film. Here, we demonstrate that the high surface sensitivity of resonant hyperspectral imaging provides this capability. The method allows us to monitor the early stages of Escherichia coli biofilm formation, cell attachment and microcolony formation, in-situ and in real time. We study the response of the biofilm to a number of different antibiotics and verify our observations using confocal microscopy. Based on this ability to closely monitor the surface-bound cells, resonant hyperspectral imaging gives new insights into the antimicrobial resistance of biofilms.

preprint2020arXiv

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs

Recent breakthroughs in Deep Neural Networks (DNNs) have fueled a growing demand for DNN chips. However, designing DNN chips is non-trivial because: (1) mainstream DNNs have millions of parameters and operations; (2) the large design space due to the numerous design choices of dataflows, processing elements, memory hierarchy, etc.; and (3) an algorithm/hardware co-design is needed to allow the same DNN functionality to have a different decomposition, which would require different hardware IPs to meet the application specifications. Therefore, DNN chips take a long time to design and require cross-disciplinary experts. To enable fast and effective DNN chip design, we propose AutoDNNchip - a DNN chip generator that can automatically generate both FPGA- and ASIC-based DNN chip implementation given DNNs from machine learning frameworks (e.g., PyTorch) for a designated application and dataset. Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator&#39;s energy, throughput, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balancing, etc.), optimize chip design via the Chip Predictor, and then generate optimized synthesizable RTL to achieve the target design metrics. Experimental results show that our Chip Predictor&#39;s predicted performance differs from real-measured ones by < 10% when validated using 15 DNN models and 4 platforms (edge-FPGA/TPU/GPU and ASIC). Furthermore, accelerators generated by our AutoDNNchip can achieve better (up to 3.86X improvement) performance than that of expert-crafted state-of-the-art accelerators.

preprint2020arXiv

Constructing Basis Path Set by Eliminating Path Dependency

The way the basis path set works in neural network remains mysterious, and the generalization of newly appeared G-SGD algorithm to more practical network is hindered. The Basis Path Set Searching problem is formulated from the perspective of graph theory, to find the basis path set in a regular complicated neural network. Our paper aims to discover the underlying cause of path dependency between two independent substructures. Algorithm DEAH is designed to solve the Basis Path Set Searching problem by eliminating such path dependency. The path subdivision chain is proposed to effectively eliminate the path dependency inside the chain and between chains. The theoretical proofs and analysis of polynomial time complexity are presented. The paper therefore provides one methodology to find the basis path set in a more general neural network, which offers theoretical and algorithmic support for the application of G-SGD algorithm in more practical scenarios.

preprint2020arXiv

DDSL: Efficient Subgraph Listing on Distributed and Dynamic Graphs

Subgraph listing is a fundamental problem in graph theory and has wide applications in areas like sociology, chemistry, and social networks. Modern graphs can usually be large-scale as well as highly dynamic, which challenges the efficiency of existing subgraph listing algorithms. Recent works have shown the benefits of partitioning and processing big graphs in a distributed system, however, there is only few work targets subgraph listing on dynamic graphs in a distributed environment. In this paper, we propose an efficient approach, called Distributed and Dynamic Subgraph Listing (DDSL), which can incrementally update the results instead of running from scratch. DDSL follows a general distributed join framework. In this framework, we use a Neighbor-Preserved storage for data graphs, which takes bounded extra space and supports dynamic updating. After that, we propose a comprehensive cost model to estimate the I/O cost of listing subgraphs. Then based on this cost model, we develop an algorithm to find the optimal join tree for a given pattern. To handle dynamic graphs, we propose an efficient left-deep join algorithm to incrementally update the join results. Extensive experiments are conducted on real-world datasets. The results show that DDSL outperforms existing methods in dealing with both static dynamic graphs in terms of the responding time.

preprint2020arXiv

Direct Speech-to-image Translation

Direct speech-to-image translation without text is an interesting and useful topic due to the potential applications in human-computer interaction, art creation, computer-aided design. etc. Not to mention that many languages have no writing form. However, as far as we know, it has not been well-studied how to translate the speech signals into images directly and how well they can be translated. In this paper, we attempt to translate the speech signals into the image signals without the transcription stage. Specifically, a speech encoder is designed to represent the input speech signals as an embedding feature, and it is trained with a pretrained image encoder using teacher-student learning to obtain better generalization ability on new classes. Subsequently, a stacked generative adversarial network is used to synthesize high-quality images conditioned on the embedding feature. Experimental results on both synthesized and real data show that our proposed method is effective to translate the raw speech signals into images without the middle text representation. Ablation study gives more insights about our method.

preprint2020arXiv

Effect of magnetic field on transports of charged particles in the weakly ionized plasma with power-law q-distributions in nonextensive statistics

By using the generalized Boltzmann equation of transport in nonextensive statistics, we study transport properties of the diffusion flux and the heat flux of charged particles in the weakly ionized plasma with the power-law q-distributions under the magnetic field. We derive the tensor expressions of diffusion coefficient, thermal diffusion coefficient, mobility and thermal conductivity of electrons and ions in the q-distributed plasma under magnetic field. We show that the tensors of the diffusion coefficient, the thermal diffusion coefficient and the thermal conductivity are strongly depend on the q-parameters in nonextensive statistics, and so they are generally not the same as those in the magnetized plasma with a Maxwell distribution.

preprint2020arXiv

Enhancing the Erdős-Lovász Tihany Conjecture for graphs with independence number two

Let $s\ge2$ and $t\ge2$ be integers. A graph $G$ is $(s,t)$-\emph{splittable} if $V(G)$ can be partitioned into two sets $S$ and $T$ such that $χ(G[S])\geq s$ and $χ(G[T])\geq t$. The well-known Erdős-Lovász Tihany Conjecture from 1968 states that every graph $G$ whose chromatic number $χ(G)=s+t-1$ is more than its clique number $ω(G)$ is $(s,t)$-splittable. In this paper, we prove an enhanced version of the Erdős-Lovász Tihany Conjecture for graphs with independence number two. That is, for every graph $G$ with $χ(G)=s+t-1>ω(G)+1$ is $(s,t+1)$-splittable. There are examples showing that this result is best possible.

preprint2020arXiv

Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise

Greedy-GQ is an off-policy two timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis for the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our finite-sample analysis provides theoretical justification for choosing stepsizes for this two timescale algorithm for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analyses of two timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of more practical interest. Specifically, in contrast to existing finite-sample analyses for two timescale methods, e.g., GTD, GTD2 and TDC, where their objective functions are convex, the objective function of the Greedy-GQ algorithm is non-convex. Moreover, the Greedy-GQ algorithm is also not a linear two-timescale stochastic approximation algorithm. Our techniques in this paper provide a general framework for finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.

preprint2020arXiv

Generative Temporal Link Prediction via Self-tokenized Sequence Modeling

We formalize networks with evolving structures as temporal networks and propose a generative link prediction model, Generative Link Sequence Modeling (GLSM), to predict future links for temporal networks. GLSM captures the temporal link formation patterns from the observed links with a sequence modeling framework and has the ability to generate the emerging links by inferring from the probability distribution on the potential future links. To avoid overfitting caused by treating each link as a unique token, we propose a self-tokenization mechanism to transform each raw link in the network to an abstract aggregation token automatically. The self-tokenization is seamlessly integrated into the sequence modeling framework, which allows the proposed GLSM model to have the generalization capability to discover link formation patterns beyond raw link sequences. We compare GLSM with the existing state-of-art methods on five real-world datasets. The experimental results demonstrate that GLSM obtains future positive links effectively in a generative fashion while achieving the best performance (2-10\% improvements on AUC) among other alternatives.

preprint2020arXiv

Globally optimal consensus maximization for robust visual inertial localization in point and line map

Map based visual inertial localization is a crucial step to reduce the drift in state estimation of mobile robots. The underlying problem for localization is to estimate the pose from a set of 3D-2D feature correspondences, of which the main challenge is the presence of outliers, especially in changing environment. In this paper, we propose a robust solution based on efficient global optimization of the consensus maximization problem, which is insensitive to high percentage of outliers. We first introduce translation invariant measurements (TIMs) for both points and lines to decouple the consensus maximization problem into rotation and translation subproblems, allowing for a two-stage solver with reduced solution dimensions. Then we show that (i) the rotation can be calculated by minimizing TIMs using only 1-dimensional branch-and-bound (BnB), (ii) the translation can be found by running 1-dimensional search for three times with prioritized progressive voting. Compared with the popular randomized solver, our solver achieves deterministic global convergence without depending on an initial value. While compared with existing BnB based methods, ours is exponentially faster. Finally, by evaluating the performance on both simulation and real-world datasets, our approach gives accurate pose even when there are 90\% outliers (only 2 inliers).

preprint2020arXiv

Learning hierarchical behavior and motion planning for autonomous driving

Learning-based driving solution, a new branch for autonomous driving, is expected to simplify the modeling of driving by learning the underlying mechanisms from data. To improve the tactical decision-making for learning-based driving solution, we introduce hierarchical behavior and motion planning (HBMP) to explicitly model the behavior in learning-based solution. Due to the coupled action space of behavior and motion, it is challenging to solve HBMP problem using reinforcement learning (RL) for long-horizon driving tasks. We transform HBMP problem by integrating a classical sampling-based motion planner, of which the optimal cost is regarded as the rewards for high-level behavior learning. As a result, this formulation reduces action space and diversifies the rewards without losing the optimality of HBMP. In addition, we propose a sharable representation for input sensory data across simulation platforms and real-world environment, so that models trained in a fast event-based simulator, SUMO, can be used to initialize and accelerate the RL training in a dynamics based simulator, CARLA. Experimental results demonstrate the effectiveness of the method. Besides, the model is successfully transferred to the real-world, validating the generalization capability.

preprint2020arXiv

Learning to fool the speaker recognition

Due to the widespread deployment of fingerprint/face/speaker recognition systems, attacking deep learning based biometric systems has drawn more and more attention. Previous research mainly studied the attack to the vision-based system, such as fingerprint and face recognition. While the attack for speaker recognition has not been investigated yet, although it has been widely used in our daily life. In this paper, we attempt to fool the state-of-the-art speaker recognition model and present \textit{speaker recognition attacker}, a lightweight model to fool the deep speaker recognition model by adding imperceptible perturbations onto the raw speech waveform. We find that the speaker recognition system is also vulnerable to the attack, and we achieve a high success rate on the non-targeted attack. Besides, we also present an effective method to optimize the speaker recognition attacker to obtain a trade-off between the attack success rate with the perceptual quality. Experiments on the TIMIT dataset show that we can achieve a sentence error rate of $99.2\%$ with an average SNR $57.2\text{dB}$ and PESQ 4.2 with speed rather faster than real-time.

preprint2020arXiv

Low Complexity Trellis-Coded Quantization in Versatile Video Coding

The forthcoming Versatile Video Coding (VVC) standard adopts the trellis-coded quantization, which leverages the delicate trellis graph to map the quantization candidates within one block into the optimal path. Despite the high compression efficiency, the complex trellis search with soft decision quantization may hinder the applications due to high complexity and low throughput capacity. To reduce the complexity, in this paper, we propose a low complexity trellis-coded quantization scheme in a scientifically sound way with theoretical modeling of the rate and distortion. As such, the trellis departure point can be adaptively adjusted, and unnecessarily visited branches are accordingly pruned, leading to the shrink of total trellis stages and simplification of transition branches. Extensive experimental results on the VVC test model show that the proposed scheme is effective in reducing the encoding complexity by 11% and 5% with all intra and random access configurations, respectively, at the cost of only 0.11% and 0.05% BD-Rate increase. Meanwhile, on average 24% and 27% quantization time savings can be achieved under all intra and random access configurations. Due to the excellent performance, the VVC test model has adopted one implementation of the proposed scheme.

preprint2020arXiv

Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product

Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product recommendations, and product retrieval. While in the real world, the attribute values of a product are usually incomplete and vary over time, which greatly hinders the practical applications. In this paper, we propose a multimodal method to jointly predict product attributes and extract values from textual product descriptions with the help of the product images. We argue that product attributes and values are highly correlated, e.g., it will be easier to extract the values on condition that the product attributes are given. Thus, we jointly model the attribute prediction and value extraction tasks from multiple aspects towards the interactions between attributes and values. Moreover, product images have distinct effects on our tasks for different product attributes and values. Thus, we selectively draw useful visual information from product images to enhance our model. We annotate a multimodal product attribute value dataset that contains 87,194 instances, and the experimental results on this dataset demonstrate that explicitly modeling the relationship between attributes and values facilitates our method to establish the correspondence between them, and selectively utilizing visual product information is necessary for the task. Our code and dataset will be released to the public.

preprint2020arXiv

Multiple populations in low mass globular clusters: Palomar 13

Since the discovery of chemically peculiar stars in globular clusters in the last century, the study of multiple populations has become increasingly important, given that chemical inhomogeneity is found in almost all globular clusters. Despite various proposed theories attempting to explain this phenomenon, fitting all the observational evidence in globular clusters with one single theory remains notoriously difficult and currently unsuccessful. In order to improve existing models and motivate new ones, we are observing globular clusters at critical conditions, e.g., metal-rich end, metal-poor end, and low mass end. In this paper, we present our first attempt to investigate multiple populations in low mass globular clusters. We obtained low-resolution spectra around 4000 A of 30 members of the globular cluster Palomar 13 using OSIRIS/Multi-object spectrograph mounted at the Gran Telescopio Canarias. The membership of red giant branch stars is confirmed by the latest proper motions from Gaia DR2 and literature velocities. After comparing the measured CN and CH spectral indices with those of the stellar models, we found a clear sign of nitrogen variation among the red giant branch stars. Palomar 13 may be the lowest mass globular cluster showing multiple populations.

preprint2020arXiv

Multiple stellar populations at less evolved stages: detection of chemical variations among main-sequence dwarfs in NGC 1978

Multiple stellar populations (MPs) with different chemical compositions are not exclusive features of old GCs (older than 10 Gyr). Indeed, recent studies reveal that younger clusters ($\sim$2--6 Gyr-old) in the Magellanic Clouds also exhibit star-to-star chemical variations among evolved stars. However, whether MPs are present among less evolved dwarfs of these intermediate-age clusters is still unclear. In this work, we search for chemical variations among GK-type dwarfs in the $\sim$2 Gyr-old cluster NGC 1978, which is the youngest cluster with MPs. We exploit deep ultraviolet and visual observations from the Hubble Space Telescope to constrain the nitrogen (N) and oxygen (O) variations among MS stars. To do this, we compare appropriate photometric diagrams that are sensitive to N and O with synthetic diagrams of simple stellar populations and MPs. We conclude that the G- and K-type MS stars in NGC\,1978 host MPs. Our statistical analysis shows that the fraction of N-rich stars ranges from $\sim$40\% to $\sim$80\%, depending on the detailed distributions of nitrogen and oxygen.

preprint2020arXiv

NPENAS: Neural Predictor Guided Evolution for Neural Architecture Search

Neural architecture search (NAS) is a promising method for automatically design neural architectures. NAS adopts a search strategy to explore the predefined search space to find outstanding performance architecture with the minimum searching costs. Bayesian optimization and evolutionary algorithms are two commonly used search strategies, but they suffer from computationally expensive, challenge to implement or inefficient exploration ability. In this paper, we propose a neural predictor guided evolutionary algorithm to enhance the exploration ability of EA for NAS (NPENAS) and design two kinds of neural predictors. The first predictor is defined from Bayesian optimization and we propose a graph-based uncertainty estimation network as a surrogate model that is easy to implement and computationally efficient. The second predictor is a graph-based neural network that directly outputs the performance prediction of the input neural architecture. The NPENAS using the two neural predictors are denoted as NPENAS-BO and NPENAS-NP respectively. In addition, we introduce a new random architecture sampling method to overcome the drawbacks of the existing sampling method. Extensive experiments demonstrate the superiority of NPENAS. Quantitative results on three NAS search spaces indicate that both NPENAS-BO and NPENAS-NP outperform most existing NAS algorithms, with NPENAS-BO achieving state-of-the-art performance on NASBench-201 and NPENAS-NP on NASBench-101 and DARTS, respectively.

preprint2020arXiv

Pillar-based Object Detection for Autonomous Driving

We present a simple and flexible object detection framework optimized for autonomous driving. Building on the observation that point clouds in this application are extremely sparse, we propose a practical pillar-based approach to fix the imbalance issue caused by anchors. In particular, our algorithm incorporates a cylindrical projection into multi-view feature learning, predicts bounding box parameters per pillar rather than per point or per anchor, and includes an aligned pillar-to-point projection module to improve the final prediction. Our anchor-free approach avoids hyperparameter search associated with past methods, simplifying 3D object detection while significantly improving upon state-of-the-art.

preprint2020arXiv

Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?

The focus of recent meta-learning research has been on the development of learning algorithms that can quickly adapt to test time tasks with limited data and low computational cost. Few-shot learning is widely used as one of the standard benchmarks in meta-learning. In this work, we show that a simple baseline: learning a supervised or self-supervised representation on the meta-training set, followed by training a linear classifier on top of this representation, outperforms state-of-the-art few-shot learning methods. An additional boost can be achieved through the use of self-distillation. This demonstrates that using a good learned embedding model can be more effective than sophisticated meta-learning algorithms. We believe that our findings motivate a rethinking of few-shot image classification benchmarks and the associated role of meta-learning algorithms. Code is available at: http://github.com/WangYueFt/rfs/.

preprint2020arXiv

SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation

We present SmartExchange, an algorithm-hardware co-design framework to trade higher-cost memory storage/access for lower-cost computation, for energy-efficient inference of deep neural networks (DNNs). We develop a novel algorithm to enforce a specially favorable DNN weight structure, where each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all power-of-2. To our best knowledge, this algorithm is the first formulation that integrates three mainstream model compression ideas: sparsification or pruning, decomposition, and quantization, into one unified framework. The resulting sparse and readily-quantized DNN thus enjoys greatly reduced energy consumption in data movement as well as weight storage. On top of that, we further design a dedicated accelerator to fully utilize the SmartExchange-enforced weights to improve both energy efficiency and latency performance. Extensive experiments show that 1) on the algorithm level, SmartExchange outperforms state-of-the-art compression techniques, including merely sparsification or pruning, decomposition, and quantization, in various ablation studies based on nine DNN models and four datasets; and 2) on the hardware level, the proposed SmartExchange based accelerator can improve the energy efficiency by up to 6.7$\times$ and the speedup by up to 19.2$\times$ over four state-of-the-art DNN accelerators, when benchmarked on seven DNN models (including four standard DNNs, two compact DNN models, and one segmentation model) and three datasets.

preprint2020arXiv

Universal Adversarial Perturbations Generative Network for Speaker Recognition

Attacking deep learning based biometric systems has drawn more and more attention with the wide deployment of fingerprint/face/speaker recognition systems, given the fact that the neural networks are vulnerable to the adversarial examples, which have been intentionally perturbed to remain almost imperceptible for human. In this paper, we demonstrated the existence of the universal adversarial perturbations~(UAPs) for the speaker recognition systems. We proposed a generative network to learn the mapping from the low-dimensional normal distribution to the UAPs subspace, then synthesize the UAPs to perturbe any input signals to spoof the well-trained speaker recognition model with high probability. Experimental results on TIMIT and LibriSpeech datasets demonstrate the effectiveness of our model.

preprint2020arXiv

User-generated Video Quality Assessment: A Subjective and Objective Study

Recently, we have observed an exponential increase of user-generated content (UGC) videos. The distinguished characteristic of UGC videos originates from the video production and delivery chain, as they are usually acquired and processed by non-professional users before uploading to the hosting platforms for sharing. As such, these videos usually undergo multiple distortion stages that may affect visual quality before ultimately being viewed. Inspired by the increasing consensus that the optimization of the video coding and processing shall be fully driven by the perceptual quality, in this paper, we propose to study the quality of the UGC videos from both objective and subjective perspectives. We first construct a UGC video quality assessment (VQA) database, aiming to provide useful guidance for the UGC video coding and processing in the hosting platform. The database contains source UGC videos uploaded to the platform and their transcoded versions that are ultimately enjoyed by end-users, along with their subjective scores. Furthermore, we develop an objective quality assessment algorithm that automatically evaluates the quality of the transcoded videos based on the corrupted reference, which is in accordance with the application scenarios of UGC video sharing in the hosting platforms. The information from the corrupted reference is well leveraged and the quality is predicted based on the inferred quality maps with deep neural networks (DNN). Experimental results show that the proposed method yields superior performance. Both subjective and objective evaluations of the UGC videos also shed lights on the design of perceptual UGC video coding.

preprint2020arXiv

When does the onset of multiple stellar populations in star clusters occur-III: No evidence of significant chemical variations in main-sequence stars of NGC 419

Recent studies have revealed that the onset age for the presence of multiple stellar populations (MPs) in star clusters seems to correspond to the disappearance of the extended main-sequence turnoff (eMSTO) in young clusters, a pattern associated with stellar rotations. A speculative suggestion is that MPs might be caused by the magnetic brake, a stellar evolutionary effect linked to the rotation. In this work, we use the young massive cluster NGC 419 as a testbed. We examined if its magnetically baked MS stars would exhibit MPs. Using the deep ultraviolet and visible images observed through the Hubble Space Telescope, combined with a specific color index that is sensitive to the nitrogen (N) abundance, we examined if its late G- and K-type MS stars are affected by N variation. Our analysis reports that the morphology of its GK-type MS is most likely an SSP, and only a negligible probability that indicates a N variation up to 0.4 dex is present. We conclude that there is no significant N variation among its GK-type MS stars. The absence of a significant chemical variation among the late-type MS stars indicates that MPs might not be a specific pattern of magnetically braked stars.

preprint2020arXiv

When Relation Networks meet GANs: Relation GANs with Triplet Loss

Though recent research has achieved remarkable progress in generating realistic images with generative adversarial networks (GANs), the lack of training stability is still a lingering concern of most GANs, especially on high-resolution inputs and complex datasets. Since the randomly generated distribution can hardly overlap with the real distribution, training GANs often suffers from the gradient vanishing problem. A number of approaches have been proposed to address this issue by constraining the discriminator&#39;s capabilities using empirical techniques, like weight clipping, gradient penalty, spectral normalization etc. In this paper, we provide a more principled approach as an alternative solution to this issue. Instead of training the discriminator to distinguish real and fake input samples, we investigate the relationship between paired samples by training the discriminator to separate paired samples from the same distribution and those from different distributions. To this end, we explore a relation network architecture for the discriminator and design a triplet loss which performs better generalization and stability. Extensive experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on variable vision tasks including unconditional and conditional image generation and image translation.

preprint2018arXiv

Efficient Two-Dimensional Line Spectrum Estimation Based on Decoupled Atomic Norm Minimization

This paper presents an efficient optimization technique for gridless {2-D} line spectrum estimation, named decoupled atomic norm minimization (D-ANM). The framework of atomic norm minimization (ANM) is considered, which has been successfully applied in 1-D problems to allow super-resolution frequency estimation for correlated sources even when the number of snapshots is highly limited. The state-of-the-art 2-D ANM approach vectorizes the 2-D measurements to their 1-D equivalence, which incurs huge computational cost and may become too costly for practical applications. We develop a novel decoupled approach of 2-D ANM via semi-definite programming (SDP), which introduces a new matrix-form atom set to naturally decouple the joint observations in both dimensions without loss of optimality. Accordingly, the original large-scale 2-D problem is equivalently reformulated via two decoupled one-level Toeplitz matrices, which can be solved by simple 1-D frequency estimation with pairing. Compared with the conventional vectorized approach, the proposed D-ANM technique reduces the computational complexity by several orders of magnitude with respect to the problem size. It also retains the benefits of ANM in terms of precise signal recovery, small number of required measurements, and robustness to source correlation. The complexity benefits are particularly attractive for large-scale antenna systems such as massive MIMO, radar signal processing and radio astronomy.

preprint2018arXiv

Low-complexity optimization for Two-Dimensional Direction-of-arrival Estimation via Decoupled Atomic Norm Minimization

This paper presents an efficient optimization technique for super-resolution two-dimensional (2D) direction of arrival (DOA) estimation by introducing a new formulation of atomic norm minimization (ANM). ANM allows gridless angle estimation for correlated sources even when the number of snapshots is far less than the antenna size, yet it incurs huge computational cost in 2D processing. This paper introduces a novel formulation of ANM via semi-definite programming, which expresses the original high-dimensional problem by two decoupled Toeplitz matrices in one dimension, followed by 1D angle estimation with automatic angle pairing. Compared with the state-of-the-art 2D ANM, the proposed technique reduces the computational complexity by several orders of magnitude with respect to the antenna size, while retaining the benefits of ANM in terms of super-resolution performance with use of a small number of measurements, and robustness to source correlation and noise. The complexity benefits are particularly attractive for large-scale antenna systems such as massive MIMO and radio astronomy.