Source author record

Xiao Liu

Xiao Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

96 works

43 topics

4 close collaborators

preprint, 2026, arXiv

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, and GUIs. GLM-5V-Turbo is built around this objective: multimodal perception is integrated as a core component of reasoning, planning, tool use, and execution, rather than as an auxiliary interface to a language model. This report summarizes the main improvements behind GLM-5V-Turbo across model design, multimodal training, reinforcement learning, toolchain expansion, and integration with agent frameworks. These developments lead to strong performance in multimodal coding, visual tool use, and framework-based agentic tasks, while preserving competitive text-only coding capability. More importantly, our development process offers practical insights for building multimodal agents, highlighting the central role of multimodal perception, hierarchical optimization, and reliable end-to-end verification.

preprint, 2024, arXiv

Language-free Compositional Action Generation via Decoupling Refinement

Composing simple elements into complex concepts is crucial yet challenging, especially for 3D action generation. Existing methods largely rely on extensive natural language annotations to discern composable latent semantics, a process that is often costly and labor-intensive. In this study, we introduce a novel framework to generate compositional actions without reliance on language auxiliaries. Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement. Action Coupling utilizes an energy model to extract the attention masks of each sub-action, subsequently integrating two actions using these attentions to generate pseudo-training examples. Then, we employ a conditional generative model, CVAE, to learn a latent space, facilitating diverse generation. Finally, we propose Decoupling Refinement, which leverages a self-supervised pre-trained model, MAE, to ensure semantic consistency between the sub-actions and compositional actions. This refinement process involves rendering generated 3D actions into 2D space, decoupling these images into two sub-segments, using the MAE model to restore the complete image from sub-segments, and constraining the recovered images to match images rendered from raw sub-actions. Due to the lack of existing datasets containing both sub-actions and compositional actions, we created two new datasets, named HumanAct-C and UESTC-C, and present a corresponding evaluation metric. Both qualitative and quantitative assessments are conducted to show our efficacy.

preprint, 2023, arXiv

Capillary gravity water waves linearized at monotone shear flows: eigenvalues and inviscid damping

This paper is concerned with the eigenvalues and linear inviscid damping of the 2D capillary gravity water waves of finite depth $x_2\in(-h,0)$ linearized at a monotone shear flow $U(x_2)$. Unlike the linearized Euler equation in a fixed channel where eigenvalues exist only in low horizontal wave number $k$, we first prove the linearized capillary gravity wave has two branches of eigenvalues $-ikc^\pm(k)$, where the wave speeds $c^\pm(k)=O(\sqrt{|k|})$ for $|k|\gg1$ have the same asymptotics as those of the linear irrotational capillary gravity waves. Under the additional assumption of $U''\ne0$, we obtain the complete continuation of these two branches, which are all the eigenvalues in this (and some other) case(s). In particular, $-ikc^-(k)$ could bifurcate into unstable eigenvalues at $c^-(k)=U(-h)$. The bifurcation of unstable eigenvalues from inflection values of $U$ is also proved. Assuming no singular modes, i.e. no embedded eigenvalues for any wave number $k$, linear solutions $(v(t,x),\eta(t,x_1))$ are studied in both periodic-in-$x_1$ and $x_1\in\mathbb{R}$ cases, where $v$ is the velocity and $\eta$ the surface profile. Solutions can be split into $(v^p,\eta^p)$ and $(v^c,\eta^c)$ whose $k$-th Fourier modes in $x_1$ correspond to the eigenvalues and the continuous spectra of wave number $k$, respectively. The component $(v^p,\eta^p)$ is governed by a (possibly unstable) dispersion relation given by the eigenvalues, which are simply $k\to-ikc^\pm(k)$ in the case of $x_1\in\mathbb{R}$. The other component $(v^c,\eta^c)$ satisfies the inviscid damping as fast as $|v_1^c|_{L_x^2},|\eta^c|_{L_x^2}=O(|t|^{-1})$ and $|v_2^c|_{L_x^2}=O(t^{-2})$ as $|t|\gg1$. Additional decay of $tv_1^c,t^2v_2^c$ in $L_x^2L_t^q$, $q\in(2,\infty]$, is obtained after leading asymptotic terms are removed, which are in the forms of $t$-dependent translations in $x_1$ of certain functions of $x$.
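
For orientation, the $O(\sqrt{|k|})$ growth quoted for the irrotational case can be read off the classical dispersion relation for linear capillary gravity waves over finite depth with no background flow. The symbols $g$ (gravitational acceleration) and $\sigma$ (surface tension coefficient divided by density) do not appear in the abstract and are introduced here only as assumptions for illustration; the paper's own normalization may differ.

$$
c_{\mathrm{irr}}^\pm(k) = \pm\sqrt{\left(\frac{g}{k}+\sigma k\right)\tanh(kh)} \;\sim\; \pm\sqrt{\sigma|k|}, \qquad |k|\gg1 .
$$

The expression under the root is even in $k$, and since $\tanh(kh)\to\pm1$ while $g/k\to0$ as $|k|\to\infty$, the surface tension term dominates and gives the $\sqrt{|k|}$ scaling.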

preprint, 2023, arXiv

Imperceptible Adversarial Attack via Invertible Neural Networks

Adding perturbations by utilizing auxiliary gradient information or discarding existing details of the benign images are two common approaches for generating adversarial examples. Though visual imperceptibility is the desired property of adversarial examples, conventional adversarial attacks still generate traceable adversarial perturbations. In this paper, we introduce a novel Adversarial Attack via Invertible Neural Networks (AdvINN) method to produce robust and imperceptible adversarial examples. Specifically, AdvINN fully takes advantage of the information preservation property of Invertible Neural Networks and thereby generates adversarial examples by simultaneously adding class-specific semantic information of the target class and dropping discriminant information of the original class. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet-1K demonstrate that the proposed AdvINN method can produce less perceptible adversarial images than the state-of-the-art methods and that AdvINN yields more robust adversarial examples with high confidence compared to other adversarial attacks.

preprint, 2022, arXiv

3D Intracranial Aneurysm Classification and Segmentation via Unsupervised Dual-branch Learning

Intracranial aneurysms are common nowadays and how to detect them intelligently is of great significance in digital health. While most existing deep learning research has focused on medical images in a supervised way, we introduce an unsupervised method for the detection of intracranial aneurysms based on 3D point cloud data. In particular, our method consists of two stages: unsupervised pre-training and downstream tasks. As for the former, the main idea is to pair each point cloud with its jittered counterpart and maximise their correspondence. Then we design a dual-branch contrastive network with an encoder for each branch and a subsequent common projection head. As for the latter, we design simple networks for supervised classification and segmentation training. Experiments on the public dataset (IntrA) show that our unsupervised method achieves comparable or even better performance than some state-of-the-art supervised techniques, and it is most prominent in the detection of aneurysmal vessels. Experiments on ModelNet40 also show that our method achieves an accuracy of 90.79%, which outperforms existing state-of-the-art unsupervised models.

preprint, 2022, arXiv

A Tree-Structured Multi-Task Model Recommender

Tree-structured multi-task architectures have been employed to jointly tackle multiple vision tasks in the context of multi-task learning (MTL). The major challenge is to determine where to branch out for each task given a backbone model to optimize for both task accuracy and computation efficiency. To address the challenge, this paper proposes a recommender that, given a set of tasks and a convolutional neural network-based backbone model, automatically suggests tree-structured multi-task architectures that could achieve a high task performance while meeting a user-specified computation budget without performing model training. Extensive evaluations on popular MTL benchmarks show that the recommended architectures could achieve competitive task accuracy and computation efficiency compared with state-of-the-art MTL methods. Our tree-structured multi-task model recommender is open-sourced and available at https://github.com/zhanglijun95/TreeMTL.

preprint, 2022, arXiv

Automatic Facial Paralysis Estimation with Facial Action Units

Facial palsy is unilateral facial nerve weakness or paralysis of rapid onset with unknown causes. Automatically estimating facial palsy severity can be helpful for the diagnosis and treatment of people suffering from it across the world. In this work, we develop and experiment with a novel model for estimating facial palsy severity. For this, an effective Facial Action Units (AU) detection technique is incorporated into our model, where AUs refer to a unique set of facial muscle movements used to describe almost every anatomically possible facial expression. In this paper, we propose a novel Adaptive Local-Global Relational Network (ALGRNet) for facial AU detection and use it to classify facial paralysis severity. ALGRNet consists of three novel structures: (i) an adaptive region learning module that learns the adaptive muscle regions based on the detected landmarks; (ii) a skip-BiLSTM that models the latent relationships among local AUs; and (iii) a feature fusion and refining module that investigates the complementarity between the local and global face. Quantitative results on two AU benchmarks, i.e., BP4D and DISFA, demonstrate our ALGRNet can achieve promising AU detection accuracy. We further demonstrate the effectiveness of its application to facial paralysis estimation by migrating ALGRNet to a facial paralysis dataset collected and annotated by medical professionals.

preprint, 2022, arXiv

BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion

We utilize an offline reinforcement learning (RL) model for sequential targeted promotion in the presence of budget constraints in a real-world business environment. In our application, the mobile app aims to boost customer retention by sending cash bonuses to customers and to control the costs of such cash bonuses during each time period. To achieve the multi-task goal, we propose the Budget Constrained Reinforcement Learning for Sequential Promotion (BCRLSP) framework to determine the value of cash bonuses to be sent to users. We first find the target policy and the associated Q-values that maximize the user retention rate using an RL model. A linear programming (LP) model is then added to satisfy the constraints of promotion costs. We solve the LP problem by maximizing the Q-values of actions learned from the RL model given the budget constraints. During deployment, we combine the offline RL model with the LP model to generate a robust policy under the budget constraints. Using both online and offline experiments, we demonstrate the efficacy of our approach by showing that BCRLSP achieves a higher long-term customer retention rate and a lower cost than various baselines. Taking advantage of the near real-time cost control method, the proposed framework can easily adapt to data with a noisy behavioral policy and/or meet flexible budget constraints.

preprint, 2022, arXiv

Convolutional Non-homogeneous Poisson Process with Application to Wildfire Risk Quantification for Power Delivery Networks

The current projection shows that much of the continental U.S. will have significantly hotter and drier days in the following decades, leading to more wildfire hazards that threaten the safety of the power grid. Unfortunately, the U.S. power industry is not well prepared and still predominantly relies on empirical fire indices which do not consider the full spectrum of dynamic environmental factors. This paper proposes a new spatio-temporal point process model, Convolutional Non-homogeneous Poisson Process (cNHPP), to quantify wildfire risks for power delivery networks. The proposed model captures both the current short-term and cumulative long-term effects of covariates on wildfire risks, and the spatio-temporal dependency among different segments of the power delivery network. The computation and interpretation of the intensity function are thoroughly investigated, and the connection between cNHPP and Recurrent Neural Network is also discussed. We apply the proposed approach to estimate wildfire risks on major transmission lines in California, utilizing historical fire data, meteorological and vegetation data obtained from the National Oceanic and Atmospheric Administration and National Aeronautics and Space Administration. Comparison studies are performed to show the applicability and predictive capability of the proposed approach. Useful insights are obtained that potentially enhance power grid resilience against wildfires.

preprint, 2022, arXiv

Developing NOMA to Next Generation Multiple Access (NGMA): Future Vision and Research Opportunities

As a prominent member of the next generation multiple access (NGMA) family, non-orthogonal multiple access (NOMA) has been recognized as a promising multiple access candidate for the sixth-generation (6G) networks. This article focuses on applying NOMA in 6G networks, with an emphasis on proposing the so-called "One Basic Principle plus Four New" concept. Starting with the basic NOMA principle, the importance of successive interference cancellation (SIC) becomes evident. In particular, the advantages and drawbacks of both the channel state information based SIC and quality-of-service based SIC are discussed. Then, the application of NOMA to meet the new 6G performance requirements, especially for massive connectivity, is explored. Furthermore, the integration of NOMA with new physical layer techniques is considered, followed by introducing new application scenarios for NOMA towards 6G. Finally, the application of machine learning in NOMA networks is investigated, ushering in the machine learning empowered NGMA era.

preprint, 2022, arXiv

Gibbs Phenomenon Suppression in PDE-Based Statistical Spatio-Temporal Models

A class of physics-informed spatio-temporal models has recently been proposed for modeling spatio-temporal processes governed by advection-diffusion equations. The central idea is to approximate the process by a truncated Fourier series and let the governing physics determine the dynamics of the spectral coefficients. However, because many spatio-temporal processes in real applications are non-periodic with boundary discontinuities, the well-known Gibbs phenomenon and ripple artifact almost always exist in the outputs generated by such models due to truncation of the Fourier series. Hence, the key contribution of this paper is to propose a physics-informed spatio-temporal modeling approach that significantly suppresses the Gibbs phenomenon when modeling spatio-temporal advection-diffusion processes. The proposed approach starts with a data flipping procedure for the process respectively along the horizontal and vertical directions (as if we were unfolding a piece of paper that has been folded twice along the two directions). Because the flipped process becomes spatially periodic and has a complete waveform without any boundary discontinuities, the Gibbs phenomenon disappears even if the Fourier series is truncated. Then, for the flipped process and given the Partial Differential Equation (PDE) that governs the process, this paper extends an existing PDE-based spatio-temporal model by obtaining the new temporal dynamics of the spectral coefficients, while maintaining the physical interpretation of the flipped process. Numerical investigations based on a real dataset have been performed to demonstrate the advantages of the proposed approach. It is found that the proposed approach effectively suppresses the Gibbs Phenomenon and significantly reduces the ripple artifact in modeling spatio-temporal advection-diffusion processes. Computer code is available on GitHub.
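
The data flipping step described above can be sketched as a mirror extension of a 2D grid. This is a minimal illustration, assuming the "unfolding" is a plain reflection along each axis; the function name `flip_extend` and the list-of-lists layout are hypothetical and not taken from the paper or its code.

```python
def flip_extend(grid):
    """Mirror a 2D grid horizontally and then vertically.

    The result is twice as large in each direction, like a sheet of paper
    unfolded twice. Its first and last rows match, and its first and last
    columns match, so a periodic wrap-around has no boundary jump.
    """
    # Unfold horizontally: each row followed by its own mirror image.
    top = [row + row[::-1] for row in grid]
    # Unfold vertically: the widened block followed by its mirror image.
    return top + top[::-1]


example = [[1, 2],
           [3, 4]]
extended = flip_extend(example)
for row in extended:
    print(row)
```

Because the extended field takes the same values on opposite edges, its periodic continuation is continuous, which is the property the abstract relies on when truncating the Fourier series.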

preprint, 2022, arXiv

GLM: General Language Model Pretraining with Autoregressive Blank Infilling

There have been various types of pretraining architectures including autoencoding models (e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5). However, none of the pretraining frameworks performs the best for all tasks of three main categories including natural language understanding (NLU), unconditional generation, and conditional generation. We propose a General Language Model (GLM) based on autoregressive blank infilling to address this challenge. GLM improves blank filling pretraining by adding 2D positional encodings and allowing an arbitrary order to predict spans, which results in performance gains over BERT and T5 on NLU tasks. Meanwhile, GLM can be pretrained for different types of tasks by varying the number and lengths of blanks. On a wide range of tasks across NLU, conditional and unconditional generation, GLM outperforms BERT, T5, and GPT given the same model sizes and data, and achieves the best performance from a single pretrained model with 1.25x the parameters of BERT Large, demonstrating its generalizability to different downstream tasks.

preprint, 2022, arXiv

GraphMAE: Self-Supervised Masked Graph Autoencoders

Self-supervised learning (SSL) has been extensively explored in recent years. Particularly, generative SSL has seen emerging success in natural language processing and other AI fields, such as the wide adoption of BERT and GPT. Despite this, contrastive learning, which heavily relies on structural data augmentation and complicated training strategies, has been the dominant approach in graph SSL, while the progress of generative SSL on graphs, especially graph autoencoders (GAEs), has thus far not reached the potential as promised in other fields. In this paper, we identify and examine the issues that negatively impact the development of GAEs, including their reconstruction objective, training robustness, and error metric. We present a masked graph autoencoder GraphMAE that mitigates these issues for generative self-supervised graph pretraining. Instead of reconstructing graph structures, we propose to focus on feature reconstruction with both a masking strategy and scaled cosine error that benefit the robust training of GraphMAE. We conduct extensive experiments on 21 public datasets for three different graph learning tasks. The results show that GraphMAE, a simple graph autoencoder with careful designs, can consistently outperform both contrastive and generative state-of-the-art baselines. This study provides an understanding of graph autoencoders and demonstrates the potential of generative self-supervised pre-training on graphs.
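
As a rough illustration of the scaled cosine error mentioned above, the sketch below averages $(1-\cos(x_i,z_i))^\gamma$ over pairs of original and reconstructed feature vectors. The function name, the plain-list inputs, and the default `gamma=2.0` are assumptions for illustration; the abstract does not give the exact form or hyperparameters, and the sketch assumes no feature vector is all zeros.

```python
import math


def scaled_cosine_error(originals, reconstructions, gamma=2.0):
    """Mean of (1 - cosine similarity) ** gamma over paired feature vectors.

    A larger gamma shrinks the contribution of pairs that are already well
    reconstructed, so training focuses on the harder ones.
    Assumes every vector has a non-zero norm.
    """
    total = 0.0
    for x, z in zip(originals, reconstructions):
        dot = sum(a * b for a, b in zip(x, z))
        norm_x = math.sqrt(sum(a * a for a in x))
        norm_z = math.sqrt(sum(b * b for b in z))
        cosine = dot / (norm_x * norm_z)
        # Clamp at zero to guard against tiny negative values from rounding.
        total += max(0.0, 1.0 - cosine) ** gamma
    return total / len(originals)


# One perfectly aligned pair (error 0) and one orthogonal pair (error 1).
features = [[1.0, 0.0], [1.0, 0.0]]
decoded = [[2.0, 0.0], [0.0, 3.0]]
print(scaled_cosine_error(features, decoded))
```

Because cosine similarity ignores vector length, the first pair contributes no error even though the reconstruction has twice the magnitude.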

preprint, 2022, arXiv

HSIC-InfoGAN: Learning Unsupervised Disentangled Representations by Maximising Approximated Mutual Information

Learning disentangled representations requires either supervision or the introduction of specific model designs and learning constraints as biases. InfoGAN is a popular disentanglement framework that learns unsupervised disentangled representations by maximising the mutual information between latent representations and their corresponding generated images. Maximisation of mutual information is achieved by introducing an auxiliary network and training with a latent regression loss. In this short exploratory paper, we study the use of the Hilbert-Schmidt Independence Criterion (HSIC) to approximate mutual information between latent representation and image, termed HSIC-InfoGAN. Directly optimising the HSIC loss avoids the need for an additional auxiliary network. We qualitatively compare the level of disentanglement in each model, suggest a strategy to tune the hyperparameters of HSIC-InfoGAN, and discuss the potential of HSIC-InfoGAN for medical applications.
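
To make the HSIC quantity concrete, here is a small sketch of the standard biased empirical estimator, $\mathrm{tr}(KHLH)/(n-1)^2$, with Gaussian kernels on scalar samples. It is not the paper's implementation: the function names, the scalar inputs, and the fixed `bandwidth` are assumptions for illustration, whereas the paper works with latent codes and images.

```python
import math


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel over scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - 11^T / n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC between paired scalar samples xs and ys."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    # tr(K H L H) equals the elementwise sum of (H K H) * L for symmetric L.
    trace = sum(k_centered[i][j] * l_gram[i][j]
                for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


samples = [0.0, 1.0, 2.0, 3.0]
print(hsic(samples, samples))          # dependence of a variable on itself
print(hsic(samples, [5.0] * 4))        # a constant carries no information
```

The estimate is positive when the second variable is the first one and is zero up to rounding when the second variable is constant, which matches the role of HSIC as a dependence measure that needs no auxiliary network.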

preprint, 2022, arXiv

Instability and spectrum of the linearized two-phase fluids interface problem at shear flows

This paper is concerned with the 2-dim two-phase interface Euler equation linearized at a pair of monotone shear flows in both fluids. We extend Howard's Semicircle Theorem and study the eigenvalue distribution of the linearized Euler system. Under certain conditions, there are exactly two eigenvalues for each fixed wave number $k\in \mathbb{R}$ in the whole complex plane. We provide sufficient conditions for spectral instability arising from some boundary values of the shear flow velocity. A typical model is the ocean-air system in which the density ratio of the fluids is sufficiently small. We give a complete picture of eigenvalue distribution for a certain class of shear flows in the ocean-air system.

preprint, 2022, arXiv

Keeping Minimal Experience to Achieve Efficient Interpretable Policy Distillation

Although deep reinforcement learning has become a universal solution for complex control tasks, its real-world applicability is still limited because of the lack of security guarantees for policies. To address this problem, we propose Boundary Characterization via the Minimum Experience Retention (BCMER), an end-to-end Interpretable Policy Distillation (IPD) framework. Unlike previous IPD approaches, BCMER distinguishes the importance of experiences and keeps a minimal but critical experience pool with almost no loss of policy similarity. Specifically, the proposed BCMER contains two basic steps. Firstly, we propose a novel multidimensional hyperspheres intersection (MHI) approach to divide experience points into boundary points and internal points, and reserve the crucial boundary points. Secondly, we develop a nearest-neighbor-based model to generate robust and interpretable decision rules based on the boundary points. Extensive experiments show that the proposed BCMER is able to reduce the amount of experience to 1.4% to 19.1% (when the count of the naive experiences is 10k) and maintain high IPD performance. In general, the proposed BCMER is more suitable for the experience storage limited regime because it discovers the critical experience and eliminates redundant experience.

preprint, 2022, arXiv

Learning Disentangled Representations in the Imaging Domain

Disentangled representation learning has been proposed as an approach to learning general representations even in the absence of, or with limited, supervision. A good general representation can be fine-tuned for new target tasks using modest amounts of data, or used directly in unseen domains achieving remarkable performance in the corresponding task. This alleviation of the data and annotation requirements offers tantalising prospects for applications in computer vision and healthcare. In this tutorial paper, we motivate the need for disentangled representations, revisit key concepts, and describe practical building blocks and criteria for learning such representations. We survey applications in medical imaging emphasising choices made in exemplar key works, and then discuss links to computer vision applications. We conclude by presenting limitations, challenges, and opportunities.

preprint, 2022, arXiv

Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries

Knowledge graph (KG) embeddings have been a mainstream approach for reasoning over incomplete KGs. However, limited by their inherently shallow and static architectures, they can hardly deal with the rising focus on complex logical queries, which comprise logical operators, imputed edges, multiple source entities, and unknown intermediate entities. In this work, we present the Knowledge Graph Transformer (kgTransformer) with masked pre-training and fine-tuning strategies. We design a KG triple transformation method to enable Transformer to handle KGs, which is further strengthened by the Mixture-of-Experts (MoE) sparse activation. We then formulate the complex logical queries as masked prediction and introduce a two-stage masked pre-training strategy to improve transferability and generalizability. Extensive experiments on two benchmarks demonstrate that kgTransformer can consistently outperform both KG embedding-based baselines and advanced encoders on nine in-domain and out-of-domain reasoning tasks. Additionally, kgTransformer can reason with explainability via providing the full reasoning paths to interpret given answers.

preprint, 2022, arXiv

MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment

Large-scale Bundle Adjustment (BA) requires massive memory and computation resources which are difficult for existing BA libraries to provide. In this paper, we propose MegBA, a GPU-based distributed BA library. MegBA can provide massive aggregated memory by automatically partitioning large BA problems and assigning the solvers of sub-problems to parallel nodes. The parallel solvers adopt distributed Preconditioned Conjugate Gradient and distributed Schur Elimination, so that an effective solution, which can match the precision of those computed by a single node, can be efficiently computed. To accelerate BA computation, we implement end-to-end BA computation using high-performance primitives available on commodity GPUs. MegBA exposes easy-to-use APIs that are compatible with existing popular BA libraries. Experiments show that MegBA can significantly outperform state-of-the-art BA libraries: Ceres (41.45$\times$), RootBA (64.576$\times$) and DeepLM (6.769$\times$) in several large-scale BA benchmarks. The code of MegBA is available at https://github.com/MegviiRobot/MegBA.

preprint, 2022, arXiv

MegLoc: A Robust and Accurate Visual Localization Pipeline

In this paper, we present a visual localization pipeline, namely MegLoc, for robust and accurate 6-DoF pose estimation under varying scenarios, including indoor and outdoor scenes, different times of day, different seasons of the year, and even across years. MegLoc achieves state-of-the-art results on a range of challenging datasets, including winning the Outdoor and Indoor Visual Localization Challenge of ICCV 2021 Workshop on Long-term Visual Localization under Changing Conditions, as well as the Re-localization Challenge for Autonomous Driving of ICCV 2021 Workshop on Map-based Localization for Autonomous Driving.

preprint, 2022, arXiv

Multiloop corrections for collider processes using auxiliary mass flow

With a key improvement, the auxiliary mass flow method is now able to compute Feynman integrals encountered in cutting-edge collider processes. We have successfully applied it to compute some integrals involved in two-loop electroweak corrections to $e^+e^-\to HZ$, two-loop QCD corrections to $3j$, $W/Z/H+2j$, $t\bar{t}H$ and $4j$ production at hadron colliders, and three-loop QCD corrections to $t\bar{t}$ production at hadron colliders, all of which are crucial for precision frontier in collider physics in the following decade. Our results are important building blocks and benchmarks for future studies of these processes.

preprint, 2022, arXiv

NeuraHealth: An Automated Screening Pipeline to Detect Undiagnosed Cognitive Impairment in Electronic Health Records with Deep Learning and Natural Language Processing

Dementia-related cognitive impairment (CI) is a neurodegenerative disorder, affecting over 55 million people worldwide and growing rapidly at the rate of one new case every 3 seconds. 75% of cases go undiagnosed globally, with up to 90% in low-and-middle-income countries, leading to an estimated annual worldwide cost of USD 1.3 trillion, forecasted to reach 2.8 trillion by 2030. With no cure, a recurring failure of clinical trials, and a lack of early diagnosis, the mortality rate is 100%. Information in electronic health records (EHR) can provide vital clues for early detection of CI, but a manual review by experts is tedious and error prone. Several computational methods have been proposed; however, they lack an enhanced understanding of the linguistic context in complex language structures of EHR. Therefore, I propose a novel and more accurate framework, NeuraHealth, to identify patients who had no earlier diagnosis. In NeuraHealth, using patient EHR from Mass General Brigham BioBank, I fine-tuned a bi-directional attention-based deep learning natural language processing model to classify sequences. The sequence predictions were used to generate structured features as input for a patient level regularized logistic regression model. This two-step framework creates high dimensionality, outperforming all existing state-of-the-art computational methods as well as clinical methods. Further, I integrate the models into a real-world product, a web app, to create an automated EHR screening pipeline for scalable and high-speed discovery of undetected CI in EHR, making early diagnosis viable in medical facilities and in regions with scarce health services.

preprint, 2022, arXiv

On Curating Responsible and Representative Healthcare Video Recommendations for Patient Education and Health Literacy: An Augmented Intelligence Approach

Studies suggest that one in three US adults use the Internet to diagnose or learn about a health concern. However, such access to health information online could exacerbate the disparities in health information availability and use. Health information seeking behavior (HISB) refers to the ways in which individuals seek information about their health, risks, illnesses, and health-protective behaviors. For patients engaging in searches for health information on digital media platforms, health literacy divides can be exacerbated both by their own lack of knowledge and by algorithmic recommendations, with results that disproportionately impact disadvantaged populations, minorities, and low health literacy users. This study reports on an exploratory investigation of the above challenges by examining whether responsible and representative recommendations can be generated using advanced analytic methods applied to a large corpus of videos and their metadata on a chronic condition (diabetes) from the YouTube social media platform. The paper focusses on biases associated with demographic characters of actors using videos on diabetes that were retrieved and curated for multiple criteria such as encoded medical content and their understandability to address patient education and population health literacy needs. This approach offers an immense opportunity for innovation in human-in-the-loop, augmented-intelligence, bias-aware and responsible algorithmic recommendations by combining the perspectives of health professionals and patients into a scalable and generalizable machine learning framework for patient empowerment and improved health outcomes.

preprint, 2022, arXiv

Online non-convex learning for river pollution source identification

In this paper, novel gradient-based online learning algorithms are developed to investigate an important environmental application: real-time river pollution source identification, which aims at estimating the released mass, location, and time of a river pollution source based on downstream sensor data monitoring the pollution concentration. The pollution is assumed to be instantaneously released once. The problem can be formulated as a non-convex loss minimization problem in statistical learning, and our online algorithms have vectorized and adaptive step sizes to ensure high estimation accuracy in three dimensions which have different magnitudes. In order to keep the algorithm from getting stuck at the saddle points of the non-convex loss, an escaping-from-saddle-points module and a multi-start setting are derived to further improve the estimation accuracy by searching for the global minimizer of the loss functions. This can be shown theoretically and experimentally as the $O(N)$ local regret of the algorithms and the high probability cumulative regret bound $O(N)$ under a particular error bound condition in loss functions. A real-life river pollution source identification example shows the superior performance of our algorithms compared to existing methods in terms of estimation accuracy. Managerial insights for the decision maker to use the algorithms are also provided.

preprint2022arXiv

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage at training. However, in the context of NLU, prior work reveals that prompt tuning does not perform well for normal-sized pretrained models. We also find that existing methods of prompt tuning cannot handle hard sequence labeling tasks, indicating a lack of universality. We present a novel empirical finding that properly optimized prompt tuning can be universally effective across a wide range of model scales and NLU tasks. It matches the performance of finetuning while having only 0.1%-3% tuned parameters. Our method P-Tuning v2 is an implementation of Deep Prompt Tuning \cite{li2021prefix,qin2021learning} optimized and adapted for NLU. Given the universality and simplicity of P-Tuning v2, we believe it can serve as an alternative to finetuning and a strong baseline for future research. Our code and data are released at https://github.com/THUDM/P-tuning-v2.

preprint2022arXiv

Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers

Prompt tuning attempts to update few task-specific parameters in pre-trained models. It has achieved comparable performance to fine-tuning of the full parameter set on both language understanding and generation tasks. In this work, we study the problem of prompt tuning for neural text retrievers. We introduce parameter-efficient prompt tuning for text retrieval across in-domain, cross-domain, and cross-topic settings. Through an extensive analysis, we show that the strategy can mitigate the two issues -- parameter-inefficiency and weak generalizability -- faced by fine-tuning based retrieval methods. Notably, it can significantly improve the out-of-domain zero-shot generalization of the retrieval models. By updating only 0.1% of the model parameters, the prompt tuning strategy can help retrieval models achieve better generalization performance than traditional methods in which all parameters are updated. Finally, to facilitate research on retrievers' cross-topic generalizability, we curate and release an academic retrieval dataset with 18K query-results pairs in 87 topics, making it the largest topic-specific one to date.

preprint2022arXiv

Physics-Informed Statistical Modeling for Wildfire Aerosols Process Using Multi-Source Geostationary Satellite Remote-Sensing Data Streams

Increasingly frequent wildfires significantly affect solar energy production as the atmospheric aerosols generated by wildfires diminish the incoming solar radiation to the earth. Atmospheric aerosols are measured by Aerosol Optical Depth (AOD), and AOD data streams can be retrieved and monitored by geostationary satellites. However, multi-source remote-sensing data streams often present heterogeneous characteristics, including different data missing rates, measurement errors, systematic biases, and so on. To accurately estimate and predict the underlying AOD propagation process, there exist practical needs and theoretical interests to propose a physics-informed statistical approach for modeling wildfire AOD propagation by simultaneously utilizing, or fusing, multi-source heterogeneous satellite remote-sensing data streams. Leveraging a spectral approach, the proposed approach integrates multi-source satellite data streams with a fundamental advection-diffusion equation that governs the AOD propagation process. A bias correction process is included in the statistical model to account for the bias of the physics model and the truncation error of the Fourier series. The proposed approach is applied to California wildfires AOD data streams obtained from the National Oceanic and Atmospheric Administration. Comprehensive numerical examples are provided to demonstrate the predictive capabilities and model interpretability of the proposed approach. Computer code has been made available on GitHub.

preprint2022arXiv

Privacy-Preserving Personalized Fitness Recommender System (P3FitRec): A Multi-level Deep Learning Approach

Recommender systems have been successfully used in many domains with the help of machine learning algorithms. However, such applications tend to use multi-dimensional user data, which has raised widespread concerns about the breach of users' privacy. Meanwhile, wearable technologies have enabled users to collect fitness-related data through embedded sensors to monitor their conditions or achieve personalized fitness goals. In this paper, we propose a novel privacy-aware personalized fitness recommender system. We introduce a multi-level deep learning framework that learns important features from a large-scale real fitness dataset that is collected from wearable IoT devices to derive intelligent fitness recommendations. Unlike most existing approaches, our approach achieves personalization by inferring the fitness characteristics of users from sensory data, thus minimizing the need for explicitly collecting user identity or biometric information, such as name, age, height, and weight. In particular, our proposed models and algorithms predict (a) personalized exercise distance recommendations to help users achieve target calories, (b) personalized speed sequence recommendations to adjust exercise speed given the nature of the exercise and the chosen route, and (c) personalized heart rate sequences to inform the user of the potential health status for future exercises. Our experimental evaluation on a real-world Fitbit dataset demonstrated high accuracy in predicting exercise distance, speed sequence, and heart rate sequence compared to similar studies. Furthermore, our approach is novel compared to existing studies as it does not require collecting and using users' sensitive information, and thus it preserves the users' privacy.

preprint2022arXiv

Regression Trees on Grassmann Manifold for Adapting Reduced-Order Models

Low dimensional and computationally less expensive Reduced-Order Models (ROMs) have been widely used to capture the dominant behaviors of high-dimensional systems. A ROM can be obtained, using the well-known Proper Orthogonal Decomposition (POD), by projecting the full-order model to a subspace spanned by modal basis modes which are learned from experimental, simulated or observational data, i.e., training data. However, the optimal basis can change with the parameter settings. When a ROM, constructed using the POD basis obtained from training data, is applied to new parameter settings, the model often lacks robustness against the change of parameters in design, control, and other real-time operation problems. This paper proposes to use regression trees on Grassmann Manifold to learn the mapping between parameters and POD bases that span the low-dimensional subspaces onto which full-order models are projected. Motivated by the fact that a subspace spanned by a POD basis can be viewed as a point in the Grassmann manifold, we propose to grow a tree by repeatedly splitting the tree node to maximize the Riemannian distance between the two subspaces spanned by the predicted POD bases on the left and right daughter nodes. Five numerical examples are presented to comprehensively demonstrate the performance of the proposed method, and compare the proposed tree-based method to the existing interpolation method for POD basis and the use of global POD basis. The results show that the proposed tree-based method is capable of establishing the mapping between parameters and POD bases, and thus adapt ROMs for new parameters.

preprint2022arXiv

Rethinking Hard-Parameter Sharing in Multi-Domain Learning

Hard parameter sharing in multi-domain learning (MDL) allows domains to share some of the model parameters to reduce storage cost while improving prediction accuracy. One common sharing practice is to share the bottom layers of a deep neural network among domains while using separate top layers for each domain. In this work, we revisit this common practice via an empirical study on image classification tasks from a diverse set of visual domains and make two surprising observations. (1) Using separate bottom-layer parameters could achieve significantly better performance than the common practice and this phenomenon holds with different experimental settings. (2) A multi-domain model with a small proportion of domain-specific parameters from bottom layers can achieve competitive performance with independent models trained on each domain separately. Our observations suggest that people adopt the new strategy of using separate bottom-layer parameters as a stronger baseline for model design in MDL.

preprint2022arXiv

Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis

Semantic image synthesis is a challenging task with many practical applications. Although remarkable progress has been made in semantic image synthesis with spatially-adaptive normalization, existing methods normalize the feature activations under coarse-level guidance (e.g., semantic class). However, different parts of a semantic object (e.g., wheel and window of a car) are quite different in structures and textures, making blurry synthesis results usually inevitable due to the lack of fine-grained guidance. In this paper, we propose a novel normalization module, termed REtrieval-based Spatially AdaptIve normaLization (RESAIL), for introducing pixel-level fine-grained guidance to the normalization architecture. Specifically, we first present a retrieval paradigm by finding a content patch of the same semantic class from the training set with the most similar shape to each test semantic mask. Then, RESAIL is presented to use the retrieved patch for guiding the feature normalization of the corresponding region, and can provide pixel-level fine-grained guidance, thereby greatly mitigating blurry synthesis results. Moreover, distorted ground-truth images are also utilized as alternatives to retrieval-based guidance for feature normalization, further benefiting model training and improving visual quality of generated images. Experiments on several challenging datasets show that our RESAIL performs favorably against state-of-the-art methods in terms of quantitative metrics, visual quality, and subjective evaluation. The source code and pre-trained models will be publicly available.

preprint2022arXiv

SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs

Entity alignment, aiming to identify equivalent entities across different knowledge graphs (KGs), is a fundamental problem for constructing Web-scale KGs. Over the course of its development, the label supervision has been considered necessary for accurate alignments. Inspired by the recent progress of self-supervised learning, we explore the extent to which we can get rid of supervision for entity alignment. Commonly, the label information (positive entity pairs) is used to supervise the process of pulling the aligned entities in each positive pair closer. However, our theoretical analysis suggests that the learning of entity alignment can actually benefit more from pushing unlabeled negative pairs far away from each other than pulling labeled positive pairs close. By leveraging this discovery, we develop the self-supervised learning objective for entity alignment. We present SelfKG with efficient strategies to optimize this objective for aligning entities without label supervision. Extensive experiments on benchmark datasets demonstrate that SelfKG without supervision can match or achieve comparable results with state-of-the-art supervised baselines. The performance of SelfKG suggests that self-supervised learning offers great potential for entity alignment in KGs. The code and data are available at https://github.com/THUDM/SelfKG.

preprint2022arXiv

Syntax-Aware Network for Handwritten Mathematical Expression Recognition

Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications. Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture. However, these methods adhere to the paradigm that the prediction is made "from one character to another", which inevitably yields prediction errors due to the complicated structures of mathematical expressions or crabbed handwritings. In this paper, we propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network. Specifically, we present a set of grammar rules for converting the LaTeX markup sequence of each expression into a parsing tree; then, we model the markup sequence prediction as a tree traverse process with a deep neural network. In this way, the proposed method can effectively describe the syntax context of expressions, alleviating the structure prediction errors of HMER. Experiments on three benchmark datasets demonstrate that our method achieves better recognition performance than prior arts. To further validate the effectiveness of our method, we create a large-scale dataset consisting of 100k handwritten mathematical expression images acquired from ten thousand writers. The source code, new dataset, and pre-trained models of this work will be publicly available.

preprint2022arXiv

Temperature effects on the structure and mechanical properties of vapor deposited a-SiO2

Amorphous silica (a-SiO2) exhibits unique thermo-mechanical behaviors that set it apart from other glasses. However, there is still limited understanding of how this mechanical behavior is related to the atomic structure and to the preparation conditions of a-SiO2. Here, we used electron beam (e-beam) physical vapor deposition (PVD) to prepare a series of a-SiO2 films grown at different substrate temperatures and then combined molecular simulations with Positronium Annihilation Lifetime Spectroscopy and nanoindentation experiments to establish relations among processing, structure, and mechanical response of the films. Specifically, we found that increase in the growth temperature leads to increase in the elastic moduli and hardness of the films. The relative porosity in the films also increases while the a-SiO2 network itself becomes denser, resulting in an overall increase in density despite increased porosity. In addition, we found that the a-SiO2 films exhibit the same anomalous temperature dependence of elastic modulus as bulk a-SiO2. However, the rate of increase in the elastic modulus with the measurement temperature was found to depend on the density of the a-SiO2 network and therefore on the growth temperature. Our findings provide new insights into the influence of the atomic network structure on the anomalous thermomechanical behavior of a-SiO2 and in turn guidance to control the mechanical properties of a-SiO2 films.

preprint2022arXiv

Things not Written in Text: Exploring Spatial Commonsense from Visual Signals

Spatial commonsense, the knowledge about spatial position and relationship between objects (like the relative size of a lion and a girl, and the position of a boy relative to a bicycle when cycling), is an important part of commonsense knowledge. Although pretrained language models (PLMs) succeed in many NLP tasks, they are shown to be ineffective in spatial commonsense reasoning. Starting from the observation that images are more likely to exhibit spatial commonsense than texts, we explore whether models with visual signals learn more spatial commonsense than text-based PLMs. We propose a spatial commonsense benchmark that focuses on the relative scales of objects, and the positional relationship between people and objects under different actions. We probe PLMs and models with visual signals, including vision-language pretrained models and image synthesis models, on this benchmark, and find that image synthesis models are more capable of learning accurate and consistent spatial knowledge than other models. The spatial knowledge from image synthesis models also helps in natural language understanding tasks that require spatial commonsense.

preprint2022arXiv

Unbox the Blackbox: Predict and Interpret YouTube Viewership Using Deep Learning

Predicting video viewership is a top priority for content creators and video-sharing sites. Content creators live on such predictions to maximize influences and minimize budgets. Video-sharing sites rely on this prediction to promote credible videos and curb violative videos. Although deep learning champions viewership prediction, it lacks interpretability, which is fundamental to increasing the adoption of predictive models and prescribing measurements to improve viewership. Following the design-science paradigm, we propose a novel interpretable IT system, Precise Wide and Deep Learning (PrecWD), to precisely interpret viewership prediction. Improving upon state-of-the-art frameworks, PrecWD offers precise feature effects and designs an unstructured component. PrecWD outperforms benchmarks in two contexts: health video viewership prediction and misinformation viewership prediction. A user study confirms the superior interpretability of PrecWD. This study contributes to IS design theory with generalizable design principles and an interpretable predictive framework. Our findings provide implications to improve video viewership and credibility.

preprint2022arXiv

vMFNet: Compositionality Meets Domain-generalised Segmentation

Training medical image segmentation models usually requires a large amount of labeled data. By contrast, humans can quickly learn to accurately recognise anatomy of interest from medical (e.g. MRI and CT) images with some limited guidance. Such recognition ability can easily generalise to new images from different clinical centres. This rapid and generalisable learning ability is mostly due to the compositional structure of image patterns in the human brain, which is less incorporated in medical image segmentation. In this paper, we model the compositional components (i.e. patterns) of human anatomy as learnable von-Mises-Fisher (vMF) kernels, which are robust to images collected from different domains (e.g. clinical centres). The image features can be decomposed to (or composed by) the components with the composing operations, i.e. the vMF likelihoods. The vMF likelihoods tell how likely each anatomical part is at each position of the image. Hence, the segmentation mask can be predicted based on the vMF likelihoods. Moreover, with a reconstruction module, unlabeled data can also be used to learn the vMF kernels and likelihoods by recombining them to reconstruct the input image. Extensive experiments show that the proposed vMFNet achieves improved generalisation performance on two benchmarks, especially when annotations are limited. Code is publicly available at: https://github.com/vios-s/vMFNet.

preprint2022arXiv

What is Healthy? Generative Counterfactual Diffusion for Lesion Localization

Reducing the requirement for densely annotated masks in medical image segmentation is important due to cost constraints. In this paper, we consider the problem of inferring pixel-level predictions of brain lesions by only using image-level labels for training. By leveraging recent advances in generative diffusion probabilistic models (DPM), we synthesize counterfactuals of "How would a patient appear if X pathology was not present?". The difference image between the observed patient state and the healthy counterfactual can be used for inferring the location of pathology. We generate counterfactuals that correspond to the minimal change of the input such that it is transformed to healthy domain. This requires training with healthy and unhealthy data in DPMs. We improve on previous counterfactual DPMs by manipulating the generation process with implicit guidance along with attention conditioning instead of using classifiers. Code is available at https://github.com/vios-s/Diff-SCM.

preprint2022arXiv

Why patient data cannot be easily forgotten?

Rights provisioned within data protection regulations permit patients to request that knowledge about their information be eliminated by data holders. With the advent of AI learned on data, one can imagine that such rights can extend to requests for forgetting knowledge of patients' data within AI models. However, forgetting patients' imaging data from AI models is still an under-explored problem. In this paper, we study the influence of patient data on model performance and formulate two hypotheses for a patient's data: either they are common and similar to other patients or form edge cases, i.e. unique and rare cases. We show that it is not possible to easily forget patient data. We propose a targeted forgetting approach to perform patient-wise forgetting. Extensive experiments on the benchmark Automated Cardiac Diagnosis Challenge dataset showcase the improved performance of the proposed targeted forgetting approach as opposed to a state-of-the-art method.

preprint2021arXiv

A Novel Graph-based Computation Offloading Strategy for Workflow Applications in Mobile Edge Computing

With the fast development of mobile edge computing (MEC), there is an increasing demand for running complex applications on the edge. These complex applications can be represented as workflows where task dependencies are explicitly specified. To achieve better Quality of Service (QoS), for instance, faster response time and lower energy consumption, computation offloading is widely used in the MEC environment. However, many existing computation offloading strategies only focus on independent computation tasks but overlook the task dependencies. Meanwhile, most of these strategies are based on search algorithms such as particle swarm optimization (PSO), genetic algorithm (GA) which are often time-consuming and hence not suitable for many delay-sensitive complex applications in MEC. Therefore, a highly efficient graph-based strategy was proposed in our recent work but it can only deal with simple workflow applications with linear (namely sequential) structure. For solving these problems, a novel graph-based strategy is proposed for workflow applications in MEC. Specifically, this strategy can deal with complex workflow applications with nonlinear (viz. parallel, selective and iterative) structures. Meanwhile, the offloading decision plan with the lowest energy consumption of the end-device under the deadline constraint can be found by using the graph-based partition technique. We have comprehensively evaluated our strategy using both a real-world case study on a MEC based UAV (Unmanned Aerial Vehicle) delivery system and extensive simulation experiments on the FogWorkflowSim platform for MEC based workflow applications. The evaluation results successfully demonstrate the effectiveness of our proposed strategy and its overall better performance than other representative strategies.

preprint2021arXiv

Artificial Intelligence Driven UAV-NOMA-MEC in Next Generation Wireless Networks

Driven by the unprecedented high throughput and low latency requirements in next-generation wireless networks, this paper introduces an artificial intelligence (AI) enabled framework in which unmanned aerial vehicles (UAVs) use non-orthogonal multiple access (NOMA) and mobile edge computing (MEC) techniques to service terrestrial mobile users (MUs). The proposed framework enables the terrestrial MUs to offload their computational tasks simultaneously, intelligently, and flexibly, thus enhancing their connectivity as well as reducing their transmission latency and their energy consumption. To this end, the fundamentals of this framework are first introduced. Then, a number of communication and AI techniques are proposed to improve the quality of experience of terrestrial MUs. In particular, federated learning and reinforcement learning are introduced for intelligent task offloading and computing resource allocation. For each learning technique, motivations, challenges, and representative results are introduced. Finally, several key technical challenges and open research issues of the proposed framework are summarized.

preprint2021arXiv

Artificial Intelligence Enhanced Rapid and Efficient Diagnosis of Mycoplasma Pneumoniae Pneumonia in Children Patients

Artificial intelligence methods have been increasingly turning into a potentially powerful tool in the diagnosis and management of diseases. In this study, we utilized logistic regression (LR), decision tree (DT), gradient boosted decision tree (GBDT), support vector machine (SVM), and multilayer perceptron (MLP) as machine learning models to rapidly diagnose the mycoplasma pneumoniae pneumonia (MPP) in children patients. The classification task was carried out after applying the preprocessing procedure to the MPP dataset. The most efficient results are obtained by GBDT. It provides the best performance with an accuracy of 93.7%. In contrast to standard raw feature weighting, the feature importance takes the underlying correlation structure of the features into account. The most crucial feature of GBDT is the "pulmonary infiltrates range" with a score of 0.5925, followed by "cough" (0.0953) and "pleural effusion" (0.0492). We publicly share our full implementation with the dataset and trained models at https://github.com/zhenguonie/2021_AI4MPP.

preprint2021arXiv

Boost-S: Gradient Boosted Trees for Spatial Data and Its Application to FDG-PET Imaging Data

Boosting Trees are one of the most successful statistical learning approaches that involve sequentially growing an ensemble of simple regression trees (i.e., "weak learners"). However, gradient boosted trees are not yet available for spatially correlated data. This paper proposes a new gradient Boosted Trees algorithm for Spatial Data (Boost-S) with covariate information. Boost-S integrates the spatial correlation structure into the classical framework of gradient boosted trees. Each tree is grown by solving a regularized optimization problem, where the objective function involves two penalty terms on tree complexity and takes into account the underlying spatial correlation. A computationally-efficient algorithm is proposed to obtain the ensemble trees. The proposed Boost-S is applied to the spatially-correlated FDG-PET (fluorodeoxyglucose-positron emission tomography) imaging data collected during cancer chemoradiotherapy. Our numerical investigations successfully demonstrate the advantages of the proposed Boost-S over existing approaches for this particular application.

preprint2021arXiv

Diversity in immunogenomics: the value and the challenge

With the advent of high-throughput sequencing technologies, the fields of immunogenomics and adaptive immune receptor repertoire research are facing both opportunities and challenges. Adaptive immune receptor repertoire sequencing (AIRR-seq) has become an increasingly important tool to characterize T and B cell responses in settings of interest. However, the majority of AIRR-seq studies conducted so far were performed in individuals of European ancestry, restricting the ability to identify variation in human adaptive immune responses across populations and limiting their applications. As AIRR-seq studies depend on the ability to assign VDJ sequence reads to the correct germline gene segments, efforts to characterize the genomic loci that encode adaptive immune receptor genes in different populations are urgently needed. The availability of comprehensive germline gene databases and further applications of AIRR-seq studies to individuals of non-European ancestry will substantially enhance our understanding of human adaptive immune responses, promote the development of effective diagnostics and treatments, and eventually advance precision medicine.

preprint2021arXiv

EdgeWorkflowReal: An Edge Computing based Workflow Execution Engine for Smart Systems

Current cloud-based smart systems suffer from weaknesses such as high response latency, limited network bandwidth and the restricted computing power of smart end devices which seriously affect the system's QoS (Quality of Service). Recently, given its advantages of low latency, high bandwidth and location awareness, edge computing has become a promising solution for smart systems. However, the development of edge computing based smart systems is a very challenging job for software developers who do not have the skills for the creation of edge computing environments. The management of edge computing resources and computing tasks is also very challenging. Workflow technology has been widely used in smart systems to automate task and resource management, but there does not yet exist a real-world deployable edge computing based workflow execution engine. To fill this gap, we present EdgeWorkflowReal, an edge computing based workflow execution engine for smart systems. EdgeWorkflowReal supports: 1) automatic creation of a real edge computing environment according to user settings; 2) visualized modelling of edge workflow applications; and 3) automatic deployment, monitoring and performance evaluation of edge workflow applications in a smart system.

preprint2021arXiv

GraphGallery: A Platform for Fast Benchmarking and Easy Development of Graph Neural Networks Based Intelligent Software

Graph Neural Networks (GNNs) have recently been shown to be powerful tools for representing and analyzing graph data. GNNs are playing an increasingly critical role in software engineering, including program analysis, type inference, and code representation. In this paper, we introduce GraphGallery, a platform for fast benchmarking and easy development of GNN-based software. GraphGallery is an easy-to-use platform that allows developers to automatically deploy GNNs even with less domain-specific knowledge. It offers a set of implementations of common GNN models based on mainstream deep learning frameworks. In addition, existing GNN toolboxes such as PyG and DGL can be easily incorporated into the platform. Experiments demonstrate the reliability of implementations and superiority in fast coding. The official source code of GraphGallery is available at https://github.com/EdisonLeeeee/GraphGallery and a demo video can be found at https://youtu.be/mv7Zs1YeaYo.

preprint2021arXiv

Improved Signed Distance Function for 2D Real-time SLAM and Accurate Localization

Accurate mapping and localization are very important for many industrial robotics applications. In this paper, we propose an improved Signed Distance Function (SDF) for both 2D SLAM and pure localization to improve the accuracy of mapping and localization. To achieve this goal, firstly we improved the back-end mapping to build a more accurate SDF map by extending the update range and building free space, etc. Secondly, to get more accurate pose estimation for the front-end, we proposed a new iterative registration method to align the current scan to the SDF submap by removing random outliers of laser scanners. Thirdly, we merged all the SDF submaps to produce an integrated SDF map for highly accurate pure localization. Experimental results show that based on the merged SDF map, a localization accuracy of a few millimeters (5mm) can be achieved globally within the map. We believe that this method is important for mobile robots working in scenarios where high localization accuracy matters.

preprint2021arXiv

Leveraging Planar Regularities for Point Line Visual-Inertial Odometry

With a monocular Visual-Inertial Odometry (VIO) system, a 3D point cloud and camera motion can be estimated simultaneously. Because pure sparse 3D points provide a structureless representation of the environment, generating a 3D mesh from sparse points can further model the environment topology and produce dense mapping. To improve the accuracy of 3D mesh generation and localization, we propose a tightly-coupled monocular VIO system, PLP-VIO, which exploits point features and line features as well as plane regularities. The co-planarity constraints are used to leverage additional structure information for more accurate estimation of 3D points and spatial lines in the state estimator. To detect planes and the 3D mesh robustly, we combine line features with point features in the detection method. The effectiveness of the proposed method is verified on both synthetic data and public datasets and is compared with other state-of-the-art algorithms.

preprint2021arXiv

Locality-aware Channel-wise Dropout for Occluded Face Recognition

Face recognition remains a challenging task in unconstrained scenarios, especially when faces are partially occluded. To improve the robustness against occlusion, augmenting the training images with artificial occlusions has proved to be a useful approach. However, these artificial occlusions are commonly generated by adding a black rectangle or several object templates including sunglasses, scarves and phones, which cannot simulate realistic occlusions well. In this paper, based on the argument that the occlusion essentially damages a group of neurons, we propose a novel and elegant occlusion-simulation method via dropping the activations of a group of neurons in some elaborately selected channel. Specifically, we first employ a spatial regularization to encourage each feature channel to respond to local and different face regions. In this way, the activations affected by an occlusion in a local region are more likely to be located in a single feature channel. Then, the locality-aware channel-wise dropout (LCD) is designed to simulate the occlusion by dropping out the entire feature channel. Furthermore, by randomly dropping out several feature channels, our method can simulate the occlusion of a larger area. The proposed LCD can encourage its succeeding layers to minimize the intra-class feature variance caused by occlusions, thus leading to improved robustness against occlusion. In addition, we design an auxiliary spatial attention module by learning a channel-wise attention vector to reweight the feature channels, which improves the contributions of non-occluded regions. Extensive experiments on various benchmarks show that the proposed method outperforms state-of-the-art methods with a remarkable improvement.

preprint2021arXiv

Reconfigurable Intelligent Surface (RIS) Aided Multi-User Networks: Interplay Between NOMA and RIS

This article focuses on the exploitation of reconfigurable intelligent surfaces (RISs) in multi-user networks employing orthogonal multiple access (OMA) or non-orthogonal multiple access (NOMA), with an emphasis on investigating the interplay between NOMA and RIS. Depending on whether the RIS reflection coefficients can be adjusted only once or multiple times during one transmission, we distinguish between static and dynamic RIS configurations. In particular, the capacity region of RIS aided single-antenna NOMA networks is characterized and compared with the OMA rate region from an information-theoretic perspective, revealing that the dynamic RIS configuration is capacity-achieving. Then, the impact of the RIS deployment location on the performance of different multiple access schemes is investigated, which reveals that asymmetric and symmetric deployment strategies are preferable for NOMA and OMA, respectively. Furthermore, for RIS aided multiple-antenna NOMA networks, three novel joint active and passive beamformer designs are proposed based on both beamformer based and cluster based strategies. Finally, open research problems for RIS-NOMA networks are highlighted.

preprint2021arXiv

Understanding WeChat User Preferences and "Wow" Diffusion

WeChat is the largest social instant messaging platform in China, with 1.1 billion monthly active users. "Top Stories" is a novel friend-enhanced recommendation engine in WeChat, in which users can read articles based on both their own preferences and those of their friends. Specifically, when a user reads an article by opening it, the "click" behavior is private. Moreover, if the user clicks the "wow" button, (only) her/his direct connections will be aware of this action/preference. Based on the unique WeChat data, we aim to understand user preferences and "wow" diffusion in Top Stories at different levels. We have made some interesting discoveries. For instance, the "wow" probability of one user is negatively correlated with the number of connected components that are formed by her/his active friends, but the click probability is the opposite. We further study to what extent users' "wow" and click behavior can be predicted from their social connections. To address this problem, we present a hierarchical graph representation learning based model, DiffuseGNN, which is capable of capturing the structure-based social observations discovered above. Our experiments show that the proposed method can significantly improve the prediction performance compared with alternative methods.
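The structural feature mentioned in the abstract, the number of connected components formed by a user's active friends, can be computed with a small union-find. This is a minimal sketch under assumed inputs: the names count_components, active_friends and friendships are hypothetical, and the data is a toy example rather than WeChat data.

```python
def count_components(active_friends, friendships):
    """Count connected components in the subgraph induced by active_friends.

    active_friends: iterable of friend ids who were active (e.g. clicked "wow").
    friendships: iterable of (a, b) pairs; edges touching inactive ids are ignored.
    """
    active = set(active_friends)
    parent = {f: f for f in active}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in friendships:
        if a in active and b in active:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    return len({find(f) for f in active})


friends = ["ann", "bo", "cy", "di", "ed"]
edges = [("ann", "bo"), ("bo", "cy"), ("di", "zed")]
print(count_components(friends, edges))  # ann-bo-cy, di, ed
```

In the toy data, "zed" is not an active friend, so the edge to "di" does not merge anything.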

preprint2021arXiv

Unsupervised Neural Rendering for Image Hazing

Image hazing aims to render a hazy image from a given clean one, which could be applied to a variety of practical applications such as gaming, filming, photographic filtering, and image dehazing. To generate plausible haze, we study two less-touched but challenging problems in hazy image rendering, namely, i) how to estimate the transmission map from a single image without auxiliary information, and ii) how to adaptively learn the airlight from exemplars, i.e., unpaired real hazy images. To this end, we propose a neural rendering method for image hazing, dubbed HazeGEN. To be specific, HazeGEN is a knowledge-driven neural network which estimates the transmission map by leveraging a new prior, i.e., there exists structural similarity (e.g., contour and luminance) between the transmission map and the input clean image. To adaptively learn the airlight, we build a neural module based on another new prior, i.e., the rendered hazy image and the exemplar are similar in the airlight distribution. To the best of our knowledge, this could be the first attempt at deep rendering of hazy images in an unsupervised fashion. Compared with existing haze generation methods, HazeGEN renders the hazy images in an unsupervised, learnable, and controllable manner, thus avoiding the labor-intensive efforts in paired data collection and the domain-shift issue in haze generation. Extensive experiments show the promising performance of our method compared with some baselines in both qualitative and quantitative comparisons. The code will be released on GitHub after acceptance.
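The abstract does not spell out the rendering equation. The sketch below assumes the standard atmospheric scattering model, I = J * t + A * (1 - t), where J is the clean image, t the transmission map and A the airlight. HazeGEN estimates t and A with neural modules; here they are simply passed in as given values, and the function name render_haze is hypothetical.

```python
def render_haze(clean, transmission, airlight):
    """Apply I = J * t + A * (1 - t) to every pixel.

    clean: 2D list of intensities in [0, 1] (the clean image J).
    transmission: 2D list of the same shape with values in [0, 1] (t).
    airlight: scalar global airlight A (an assumption; it could vary per channel).
    """
    return [
        [j * t + airlight * (1.0 - t) for j, t in zip(row_j, row_t)]
        for row_j, row_t in zip(clean, transmission)
    ]


clean = [[0.2, 0.8]]
transmission = [[1.0, 0.0]]  # first pixel haze-free, second fully hazed
print(render_haze(clean, transmission, airlight=0.9))
```

With t = 1 the pixel keeps its clean value, and with t = 0 it is replaced by the airlight.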

preprint2020arXiv

An end-to-end CNN framework for polarimetric vision tasks based on polarization-parameter-constructing network

Pixel-wise operations between polarimetric images are important for processing polarization information. Because convolutional neural networks (CNNs) lack such operations, they cannot fully utilize polarization information. In this paper, a novel end-to-end CNN framework for polarization vision tasks is proposed, which enables the networks to take full advantage of polarimetric images. The framework consists of two sub-networks: a polarization-parameter-constructing network (PPCN) and a task network. PPCN implements pixel-wise operations between images in CNN form with 1x1 convolution kernels. It takes raw polarimetric images as input and outputs polarization-parametric images to the task network so as to complete a vision task. By training together, the PPCN can learn to provide the most suitable polarization-parametric images for the task network and the dataset. Taking Faster R-CNN as the task network, the experimental results show that, compared with existing methods, the proposed framework achieves much higher mean average precision (mAP) in the object detection task.
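A 1x1 convolution is a per-pixel linear combination of input channels, which is why it can express pixel-wise operations between polarimetric images. The sketch below shows this with fixed weights that produce the linear Stokes parameters from four polarizer-angle images (0, 45, 90 and 135 degrees). The four-angle input layout is an assumption, and the PPCN described above learns its weights rather than using these fixed ones.

```python
def conv1x1(pixels, weights):
    """Apply a 1x1 convolution without bias.

    pixels: list of per-pixel channel vectors, e.g. [I0, I45, I90, I135].
    weights: one row of input-channel weights per output channel.
    """
    return [
        [sum(w * c for w, c in zip(row, px)) for row in weights]
        for px in pixels
    ]


# Fixed example weights (standard Stokes definitions, not learned values).
STOKES_WEIGHTS = [
    [0.5, 0.5, 0.5, 0.5],    # S0: total intensity
    [1.0, 0.0, -1.0, 0.0],   # S1 = I0 - I90
    [0.0, 1.0, 0.0, -1.0],   # S2 = I45 - I135
]

# One pixel of fully horizontally polarized light.
print(conv1x1([[1.0, 0.5, 0.0, 0.5]], STOKES_WEIGHTS))
```

Nonlinear quantities such as the degree of linear polarization need more than a single linear layer, so they are left out of this sketch.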

preprint2020arXiv

Artificial Intelligence Aided Next-Generation Networks Relying on UAVs

Artificial intelligence (AI) assisted unmanned aerial vehicle (UAV) aided next-generation networking is proposed for dynamic environments. In the AI-enabled UAV-aided wireless networks (UAWN), multiple UAVs are employed as aerial base stations, which are capable of rapidly adapting to the dynamic environment by collecting information about the users' position and tele-traffic demands, learning from the environment and acting upon the feedback received from the users. Moreover, AI enables the interaction amongst a swarm of UAVs for cooperative optimization of the system. As a benefit of the AI framework, several challenges of conventional UAWN may be circumvented, leading to enhanced network performance, improved reliability and agile adaptivity. As a further benefit, dynamic trajectory design and resource allocation are demonstrated. Finally, potential research challenges and opportunities are discussed.

preprint2020arXiv

Calculation of Feynman loop integration and phase-space integration via auxiliary mass flow

We extend the auxiliary-mass-flow (AMF) method originally developed for Feynman loop integration to calculate integrals involving also phase-space integration. Flow of the auxiliary mass from the boundary ($\infty$) to the physical point ($0^+$) is obtained by numerically solving differential equations with respect to the auxiliary mass. For problems with two or more kinematical invariants, the AMF method can be combined with the traditional differential equation method by providing systematic boundary conditions and a highly nontrivial self-consistency check. The method is described in detail with a pedagogical example of $e^+e^-\rightarrow γ^* \rightarrow t\bar{t}+X$ at NNLO. We show that the AMF method can systematically and efficiently calculate integrals to high precision.

preprint2020arXiv

Complete reduction of integrals in two-loop five-light-parton scattering amplitudes

We reduce all the most complicated Feynman integrals in two-loop five-light-parton scattering amplitudes to basic master integrals, while other integrals can be reduced even more easily. Our results are expressed as systems of linear relations in block-triangular form, which is very efficient for numerical calculations. Our results are crucial for complete next-to-next-to-leading order quantum chromodynamics calculations for three-jet, photon, and/or hadron production at hadron colliders. To determine the block-triangular relations, we develop an efficient and general method, which may provide a practical solution to the bottleneck problem of reducing multiloop multiscale integrals.

preprint2020arXiv

Dialogue State Induction Using Neural Latent Variable Models

Dialogue state modules are a useful component in a task-oriented dialogue system. Traditional methods find dialogue states by manually labeling training corpora, upon which neural models are trained. However, the labeling process can be costly, slow, error-prone, and more importantly, cannot cover the vast range of domains in real-world dialogues for customer service. We propose the task of dialogue state induction, building two neural latent variable models that mine dialogue states automatically from unlabeled customer service dialogue records. Results show that the models can effectively find meaningful slots. In addition, equipped with induced dialogue states, a state-of-the-art dialogue system gives better performance compared with not using a dialogue state module.

preprint2020arXiv

Disentangled Representations for Domain-generalized Cardiac Segmentation

Robust cardiac image segmentation is still an open challenge due to the inability of the existing methods to achieve satisfactory performance on unseen data of different domains. Since the acquisition and annotation of medical data are costly and time-consuming, recent work focuses on domain adaptation and generalization to bridge the gap between data from different populations and scanners. In this paper, we propose two data augmentation methods that focus on improving the domain adaptation and generalization abilities of state-of-the-art cardiac segmentation models. In particular, our "Resolution Augmentation" method generates more diverse data by rescaling images to different resolutions within a range spanning different scanner protocols. Subsequently, our "Factor-based Augmentation" method generates more diverse data by projecting the original samples onto disentangled latent spaces, and combining the learned anatomy and modality factors from different domains. Our extensive experiments demonstrate the importance of efficient adaptation between seen and unseen domains, as well as model generalization ability, to robust cardiac image segmentation.

preprint2020arXiv

Have you forgotten? A method to assess if machine learning models have forgotten data

In the era of deep learning, aggregation of data from several sources is a common approach to ensuring data diversity. Let us consider a scenario where several providers contribute data to a consortium for the joint development of a classification model (hereafter the target model), but now one of the providers decides to leave. This provider requests not only that their data (hereafter the query dataset) be removed from the databases but also that the model "forgets" their data. In this paper, for the first time, we want to address the challenging question of whether data have been forgotten by a model. We assume knowledge of the query dataset and the distribution of a model's output. We establish statistical methods that compare the target's outputs with outputs of models trained with different datasets. We evaluate our approach on several benchmark datasets (MNIST, CIFAR-10 and SVHN) and on a cardiac pathology diagnosis task using data from the Automated Cardiac Diagnosis Challenge (ACDC). We hope to encourage studies on what information a model retains and inspire extensions in more complex settings.
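One simple way to compare two sets of model outputs is a two-sample Kolmogorov-Smirnov statistic, the largest gap between their empirical distribution functions. The abstract does not say which statistic the authors use, so the choice of test here is an assumption, and the score lists are made-up toy values.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic for non-empty samples.

    Returns the maximum absolute difference between the two empirical CDFs,
    a value in [0, 1]; 0 means the samples have identical distributions.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical confidence scores on the query dataset from two models.
target_outputs = [0.91, 0.88, 0.95, 0.90]
reference_outputs = [0.55, 0.61, 0.93, 0.58]
print(ks_statistic(target_outputs, reference_outputs))
```

A large statistic would suggest that the target model behaves differently on the query data than a model that never saw it, though a real analysis would also need a significance threshold.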

preprint2020arXiv

Hyperfine Structure and Coherent Dynamics of Rare Earth Spins Explored with Electron-Nuclear Double Resonance at Sub-Kelvin Temperatures

An experimental platform of ultralow-temperature pulsed ENDOR (electron-nuclear double resonance) spectroscopy is constructed for the bulk materials. Coherent property of the coupled electron and nuclear spins of the rare-earth (RE) dopants in a crystal (143Nd3+:Y2SiO5) is investigated from 100 mK to 6 K. At the lowest working temperatures, two-pulse-echo coherence time exceeding 2 ms and 40 ms are achieved for the electron and nuclear spins, while the electronic Zeeman and hyperfine population lifetimes are more than 15 s and 10 min. With the aid of the near-unity electron spin polarization at 100 mK, the complete hyperfine level structure with 16 energy levels is measured using ENDOR technique without the assistance of the reconstructed spin Hamiltonian. These results demonstrate the suitability of the deeply cooled paramagnetic RE-doped solids for memory components aimed for quantum communication and quantum computation. The developed experimental platform is expected to be a powerful tool for paramagnetic materials from various research fields.

preprint2020arXiv

M^3VSNet: Unsupervised Multi-metric Multi-view Stereo Network

Current multi-view stereo (MVS) methods based on supervised learning networks perform impressively compared with traditional MVS methods. However, the ground-truth depth maps for training are hard to obtain and cover only limited kinds of scenarios. In this paper, we propose a novel unsupervised multi-metric MVS network, named M^3VSNet, for dense point cloud reconstruction without any supervision. To improve the robustness and completeness of point cloud reconstruction, we propose a novel multi-metric loss function that combines pixel-wise and feature-wise loss functions to learn the inherent constraints from different perspectives of matching correspondences. Besides, we also incorporate the normal-depth consistency in the 3D point cloud format to improve the accuracy and continuity of the estimated depth maps. Experimental results show that M^3VSNet establishes the state of the art among unsupervised methods, achieves performance comparable with the previous supervised MVSNet on the DTU dataset, and demonstrates powerful generalization ability on the Tanks and Temples benchmark with effective improvement. Our code is available at https://github.com/whubaichuan/M3VSNet

preprint2020arXiv

Neighborhood Matching Network for Entity Alignment

Structural heterogeneity between knowledge graphs is an outstanding challenge for entity alignment. This paper presents Neighborhood Matching Network (NMN), a novel entity alignment framework for tackling the structural heterogeneity challenge. NMN estimates the similarities between entities to capture both the topological structure and the neighborhood difference. It provides two innovative components for better learning representations for entity alignment. It first uses a novel graph sampling method to distill a discriminative neighborhood for each entity. It then adopts a cross-graph neighborhood matching module to jointly encode the neighborhood difference for a given entity pair. Such strategies allow NMN to effectively construct matching-oriented entity representations while ignoring noisy neighbors that have a negative impact on the alignment task. Extensive experiments performed on three entity alignment datasets show that NMN can estimate the neighborhood similarity well in tougher cases and significantly outperforms 12 previous state-of-the-art methods.

preprint2020arXiv

Predictions of 2019-nCoV Transmission Ending via Comprehensive Methods

Since the SARS outbreak in 2003, a lot of predictive epidemiological models have been proposed. At the end of 2019, a novel coronavirus, termed as 2019-nCoV, has broken out and is propagating in China and the world. Here we propose a multi-model ordinary differential equation set neural network (MMODEs-NN) and model-free methods to predict the interprovincial transmissions in mainland China, especially those from Hubei Province. Compared with the previously proposed epidemiological models, the proposed network can simulate the transportations with the ODEs activation method, while the model-free methods based on the sigmoid function, Gaussian function, and Poisson distribution are linear and fast to generate reasonable predictions. According to the numerical experiments and the realities, the special policies for controlling the disease are successful in some provinces, and the transmission of the epidemic, whose outbreak time is close to the beginning of China Spring Festival travel rush, is more likely to decelerate before February 18 and to end before April 2020. The proposed mathematical and artificial intelligence methods can give consistent and reasonable predictions of the 2019-nCoV ending. We anticipate our work to be a starting point for comprehensive prediction researches of the 2019-nCoV.
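One common reading of a sigmoid-based, model-free predictor is a logistic curve fitted to cumulative case counts. The abstract does not give the functional form the authors use, so the sketch below is an assumption, and every parameter value in it is made up for illustration rather than taken from the paper.

```python
import math


def logistic_cumulative(day, final_size, growth_rate, midpoint_day):
    """Cumulative count on a given day under a logistic (sigmoid) curve."""
    return final_size / (1.0 + math.exp(-growth_rate * (day - midpoint_day)))


def daily_increase(day, final_size, growth_rate, midpoint_day):
    """New cases on a given day: the difference of consecutive cumulative values."""
    today = logistic_cumulative(day, final_size, growth_rate, midpoint_day)
    yesterday = logistic_cumulative(day - 1, final_size, growth_rate, midpoint_day)
    return today - yesterday


# Hypothetical parameters: 1000 final cases, inflection on day 20.
for day in (10, 20, 30, 40):
    print(day, round(logistic_cumulative(day, 1000.0, 0.3, 20), 1))
```

Under this curve the daily increase peaks around the midpoint day and shrinks afterwards, which is the sense in which such a fit yields an estimated deceleration and end date.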

preprint2020arXiv

Reliable coherent optical memory based on a laser-written waveguide

$\mathrm {^{151}Eu^{3+}}$-doped yttrium silicate ($\mathrm {^{151}Eu^{3+}:Y_2SiO_5}$ ) crystal is a unique material that possesses hyperfine states with coherence time up to 6 h. Many efforts have been devoted to the development of this material as optical quantum memories based on the bulk crystals, but integrable structures (such as optical waveguides) that can promote $\mathrm {^{151}Eu^{3+}:Y_2SiO_5}$-based quantum memories to practical applications, have not been demonstrated so far. Here we report the fabrication of type 2 waveguides in a $\mathrm {^{151}Eu^{3+}:Y_2SiO_5}$ crystal using femtosecond-laser micromachining. The resulting waveguides are compatible with single-mode fibers and have the smallest insertion loss of $4.95\ dB$. On-demand light storage is demonstrated in a waveguide by employing the spin-wave atomic frequency comb (AFC) scheme and the revival of silenced echo (ROSE) scheme. We implement a series of interference experiments based on these two schemes to characterize the storage fidelity. Interference visibility of the readout pulse is $0.99\pm 0.03$ for the spin-wave AFC scheme and $0.97\pm 0.02$ for the ROSE scheme, demonstrating the reliability of the integrated optical memory.

preprint2020arXiv

RIS Enhanced Massive Non-orthogonal Multiple Access Networks: Deployment and Passive Beamforming Design

A novel framework is proposed for the deployment and passive beamforming design of a reconfigurable intelligent surface (RIS) with the aid of non-orthogonal multiple access (NOMA) technology. The problem of joint deployment, phase shift design, as well as power allocation is formulated for maximizing the energy efficiency while considering users' particular data requirements. To tackle this pertinent problem, machine learning approaches are adopted in two steps. Firstly, a novel long short-term memory (LSTM) based echo state network (ESN) algorithm is proposed to predict users' tele-traffic demand by leveraging a real dataset. Secondly, a decaying double deep Q-network (D3QN) based position-acquisition and phase-control algorithm is proposed to solve the joint problem of deployment and design of the RIS. In the proposed algorithm, the base station, which controls the RIS by a controller, acts as an agent. The agent periodically observes the state of the RIS-enhanced system for attaining the optimal deployment and design policies of the RIS by learning from its mistakes and the feedback of users. Additionally, it is proved that the proposed D3QN based deployment and design algorithm is capable of converging under mild conditions. Simulation results are provided for illustrating that the proposed LSTM-based ESN algorithm is capable of striking a tradeoff between the prediction accuracy and computational complexity. Finally, it is demonstrated that the proposed D3QN based algorithm outperforms the benchmarks, while the NOMA-enhanced RIS system is capable of achieving higher energy efficiency than the orthogonal multiple access (OMA) enabled RIS system.

preprint2020arXiv

Statistical Modeling for Spatio-Temporal Data from Stochastic Convection-Diffusion Processes

This paper proposes a physical-statistical modeling approach for spatio-temporal data arising from a class of stochastic convection-diffusion processes. Such processes are widely found in scientific and engineering applications where fundamental physics imposes critical constraints on how data can be modeled and how models should be interpreted. The idea of spectrum decomposition is employed to approximate a physical spatio-temporal process by the linear combination of spatial basis functions and a multivariate random process of spectral coefficients. Unlike existing approaches assuming spatially- and temporally-invariant convection-diffusion, this paper considers a more general scenario with spatially-varying convection-diffusion and nonzero-mean source-sink. As a result, the temporal dynamics of spectral coefficients is coupled with each other, which can be interpreted as the non-linear energy redistribution across multiple scales from the perspective of physics. Because of the spatially-varying convection-diffusion, the space-time covariance is non-stationary in space. The theoretical results are integrated into a hierarchical dynamical spatio-temporal model. The connection is established between the proposed model and the existing models based on Integro-Difference Equations. Computational efficiency and scalability are also investigated to make the proposed approach practical. The advantages of the proposed methodology are demonstrated by numerical examples, a case study, and comprehensive comparison studies. Computer code is available on GitHub.

preprint2020arXiv

Studying Product Competition Using Representation Learning

Studying competition and market structure at the product level instead of brand level can provide firms with insights on cannibalization and product line optimization. However, it is computationally challenging to analyze product-level competition for the millions of products available on e-commerce platforms. We introduce Product2Vec, a method based on the representation learning algorithm Word2Vec, to study product-level competition, when the number of products is large. The proposed model takes shopping baskets as inputs and, for every product, generates a low-dimensional embedding that preserves important product information. In order for the product embeddings to be useful for firm strategic decision making, we leverage economic theories and causal inference to propose two modifications to Word2Vec. First of all, we create two measures, complementarity and exchangeability, that allow us to determine whether product pairs are complements or substitutes. Second, we combine these vectors with random utility-based choice models to forecast demand. To accurately estimate price elasticities, i.e., how demand responds to changes in price, we modify Word2Vec by removing the influence of price from the product vectors. We show that, compared with state-of-the-art models, our approach is faster, and can produce more accurate demand forecasts and price elasticities.
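Word2Vec-style training consumes (target, context) pairs, so treating shopping baskets as inputs means turning each basket into such pairs. The sketch below assumes that every other product in the same basket counts as context; the abstract does not state how the authors build pairs, and the function name basket_pairs and the product names are hypothetical.

```python
from itertools import permutations


def basket_pairs(baskets):
    """Turn shopping baskets into ordered (target, context) product pairs.

    Each basket is an iterable of product ids. Duplicates within a basket
    are collapsed, and single-item baskets contribute no pairs.
    """
    pairs = []
    for basket in baskets:
        items = sorted(set(basket))
        pairs.extend(permutations(items, 2))
    return pairs


baskets = [["milk", "bread"], ["milk"], ["tea", "milk", "tea"]]
for pair in basket_pairs(baskets):
    print(pair)
```

Unlike sentences, baskets have no word order, which is why the sketch uses all within-basket pairs instead of a sliding window.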

preprint2020arXiv

TAL EmotioNet Challenge 2020 Rethinking the Model Chosen Problem in Multi-Task Learning

This paper introduces our approach to the EmotioNet Challenge 2020. We pose the AU recognition problem as a multi-task learning problem, where the non-rigid facial muscle motion (mainly the first 17 AUs) and the rigid head motion (the last 6 AUs) are modeled separately. The co-occurrence of the expression features and the head pose features is explored. We observe that different AUs converge at different speeds. By choosing the optimal checkpoint for each AU, the recognition results are improved. We are able to obtain a final score of 0.746 on the validation set and 0.7306 on the test set of the challenge.
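Choosing the optimal checkpoint for each AU can be sketched as a per-column argmax over validation scores. The checkpoint names and scores below are made up, and the selection metric is an assumption since the abstract does not name it.

```python
def best_checkpoint_per_au(scores):
    """Pick, for each AU, the checkpoint with the highest validation score.

    scores: dict mapping checkpoint name -> list of per-AU scores, where all
    lists have the same length. Ties go to the checkpoint listed first.
    """
    names = list(scores)
    num_aus = len(scores[names[0]])
    return [max(names, key=lambda name: scores[name][au]) for au in range(num_aus)]


# Hypothetical validation scores for two AUs at two checkpoints.
validation = {
    "epoch_3": [0.70, 0.55],
    "epoch_7": [0.65, 0.60],
}
print(best_checkpoint_per_au(validation))
```

At inference time each AU's prediction would then be taken from its own selected checkpoint, at the cost of running several checkpoints.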

preprint2020arXiv

TP-LSD: Tri-Points Based Line Segment Detector

This paper proposes a novel deep convolutional model, Tri-Points Based Line Segment Detector (TP-LSD), to detect line segments in an image at real-time speed. The previous related methods typically use the two-step strategy, relying on either heuristic post-process or extra classifier. To realize one-step detection with a faster and more compact model, we introduce the tri-points representation, converting the line segment detection to the end-to-end prediction of a root-point and two endpoints for each line segment. TP-LSD has two branches: tri-points extraction branch and line segmentation branch. The former predicts the heat map of root-points and the two displacement maps of endpoints. The latter segments the pixels on straight lines out from background. Moreover, the line segmentation map is reused in the first branch as structural prior. We propose an additional novel evaluation metric and evaluate our method on Wireframe and YorkUrban datasets, demonstrating not only the competitive accuracy compared to the most recent methods, but also the real-time run speed up to 78 FPS with the $320\times 320$ input.
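The tri-points representation describes a line segment by a root point and two endpoint displacements, so decoding a segment amounts to adding each displacement to the root. The sketch below assumes displacements are pixel offsets measured from the root point; where exactly the root sits on the segment is not stated in the abstract, and the function names are hypothetical.

```python
import math


def decode_segment(root, disp_start, disp_end):
    """Recover both endpoints from a root point and two displacement vectors."""
    rx, ry = root
    start = (rx + disp_start[0], ry + disp_start[1])
    end = (rx + disp_end[0], ry + disp_end[1])
    return start, end


def segment_length(start, end):
    """Euclidean length of the decoded segment."""
    return math.hypot(end[0] - start[0], end[1] - start[1])


start, end = decode_segment(root=(10, 20), disp_start=(-4, -3), disp_end=(4, 3))
print(start, end, segment_length(start, end))
```

In a detector, the root would come from a peak in the root-point heat map and the two displacements would be read from the displacement maps at that location.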

preprint2016arXiv

A Cost-Effective Strategy for Storing Scientific Datasets with Multiple Service Providers in the Cloud

Cloud computing provides scientists with a platform for deploying computation- and data-intensive applications without infrastructure investment. With excessive cloud resources and a decision support system, large generated data sets can be flexibly (1) stored locally in the current cloud, (2) deleted and regenerated whenever reused, or (3) transferred to a cheaper cloud service for storage. However, due to the pay-for-use model, the total application cost largely depends on the usage of computation, storage and bandwidth resources; hence, cutting the cost of cloud-based data storage becomes a big concern for deploying scientific applications in the cloud. In this paper, we propose a novel strategy that can cost-effectively store large generated data sets with multiple cloud service providers. The strategy is based on a novel algorithm that finds the trade-off among computation, storage and bandwidth costs in the cloud, which are three key factors for the cost of data storage. Both general (random) simulations conducted with popular cloud service providers' pricing models and three specific case studies on real-world scientific applications show that the proposed storage strategy is highly cost-effective and practical for run-time utilization in the cloud.
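The three storage options trade storage, computation and bandwidth costs against each other. The sketch below compares them for a single data set under a deliberately simplified monthly cost model; it is not the paper's algorithm, and every parameter name and price is a hypothetical placeholder.

```python
def cheapest_option(size_gb, uses_per_month, local_price, remote_price,
                    bandwidth_price, regen_cost):
    """Compare monthly costs of the three options for one data set.

    local_price / remote_price: storage price per GB per month.
    bandwidth_price: transfer price per GB, paid on each use of remote data.
    regen_cost: computation cost of regenerating the data set once.
    Returns (name of the cheapest option, dict of all three costs).
    """
    costs = {
        "store_local": size_gb * local_price,
        "regenerate": regen_cost * uses_per_month,
        "store_remote": size_gb * remote_price
                        + size_gb * bandwidth_price * uses_per_month,
    }
    return min(costs, key=costs.get), costs


best, costs = cheapest_option(size_gb=100, uses_per_month=2, local_price=0.1,
                              remote_price=0.02, bandwidth_price=0.05,
                              regen_cost=3.0)
print(best, costs)
```

Rarely used data tends to favor deletion and regeneration, while frequently used data favors keeping a local copy; a real strategy would also account for dependencies between data sets.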

preprint2016arXiv

A Polyhedral Study of the Static Probabilistic Lot-Sizing Problem

We study the polyhedral structure of the static probabilistic lot-sizing problem and propose valid inequalities that integrate information from the chance constraint and the binary setup variables. We prove that the proposed inequalities subsume existing inequalities for this problem, and they are facet-defining under certain conditions. In addition, we show that they give the convex hull description of a related stochastic lot-sizing problem. We propose a new formulation that exploits the simple recourse structure, which significantly reduces the number of variables and constraints of the deterministic equivalent program. This reformulation can be applied to general chance-constrained programs with simple recourse. The computational results show that the proposed inequalities and the new formulation are effective for the static probabilistic lot-sizing problems.

preprint2016arXiv

A Spatio-Temporal Modeling Approach for Weather Radar Reflectivity Data and Its Applications in Tropical Southeast Asia

Weather radar echoes, correlated in both space and time, are the most important input data for short-term precipitation forecasts. Motivated by real datasets, this paper is concerned with the spatio-temporal modeling of two-dimensional radar reflectivity fields from a sequence of radar images. Under a Lagrangian integration scheme, we model the radar reflectivity data by a spatio-temporal conditional autoregressive process which is driven by two hidden sub-processes. The first sub-process is the dynamic velocity field which determines the motion of the weather system, while the second sub-process governs the growth or decay of the strength of radar reflectivity. The proposed method is demonstrated, and compared with existing methods, using real radar data collected from tropical Southeast Asia. Note that, since tropical storms are known to be highly chaotic and extremely difficult to predict, we only focus on the modeling of reflectivity data within a short period of time and consider the short-term prediction problem based on the proposed model. This is often referred to as the nowcasting problem in the meteorology community.

preprint2016arXiv

Deep Embedding for Spatial Role Labeling

This paper introduces the visually informed embedding of word (VIEW), a continuous vector representation for a word extracted from a deep neural model trained using the Microsoft COCO data set to forecast the spatial arrangements between visual objects, given a textual description. The model is composed of a deep multilayer perceptron (MLP) stacked on the top of a Long Short Term Memory (LSTM) network, the latter being preceded by an embedding layer. The VIEW is applied to transferring multimodal background knowledge to Spatial Role Labeling (SpRL) algorithms, which recognize spatial relations between objects mentioned in the text. This work also contributes with a new method to select complementary features and a fine-tuning method for MLP that improves the $F1$ measure in classifying the words into spatial roles. The VIEW is evaluated with the Task 3 of SemEval-2013 benchmark data set, SpaceEval.

preprint2016arXiv

Elastic measurements of amorphous silicon films at mK temperatures

The low temperature properties of glass are distinct from those of crystals due to the presence of poorly understood low-energy excitations. The tunneling model proposes that these are atoms tunneling between nearby equilibria, forming tunneling two level systems (TLSs). This model is rather successful, but it does not explain the remarkably universal value of the mechanical dissipation $Q^{-1}$ near 1 kelvin. The only known exceptions to this universality are the $Q^{-1}$ of certain thin films of amorphous silicon, carbon and germanium. Recently, it was found that $Q^{-1}$ of amorphous silicon (a-Si) films can be reduced by two orders of magnitude by increasing the temperature of the substrate during deposition. According to the tunneling model, the reduction in $Q^{-1}$ at 1 kelvin implies a reduction in $P_{0}γ^{2}$, where $P_{0}$ is the density of TLSs and $γ$ is their coupling to phonons. In this preliminary report, we demonstrate elastic measurements of a-Si films down to 20 mK. This will allow us, in future work, to determine whether $P_{0}$ or $γ$ is responsible for the reduction in $Q^{-1}$ with deposition temperature.
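For reference, the link between $Q^{-1}$ and $P_{0}γ^{2}$ stated above follows from the standard tunneling-model expression for the internal-friction plateau. The abstract does not write it out; the mass density $\rho$ and sound velocity $v$ are symbols introduced here and do not appear in the original text.

```latex
Q^{-1} \approx \frac{\pi}{2}\,\frac{P_{0}\,\gamma^{2}}{\rho\, v^{2}}
```

Since $\rho$ and $v$ change little with deposition conditions compared with the two orders of magnitude seen in $Q^{-1}$, a drop of that size points to a drop in the product $P_{0}γ^{2}$, without saying which factor is responsible.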

preprint2016arXiv

Learning Contextual Dependencies with Convolutional Hierarchical Recurrent Neural Networks

Existing deep convolutional neural networks (CNNs) have shown great success in image classification. CNNs mainly consist of convolutional and pooling layers, both of which are performed on local image areas without considering the dependencies among different image regions. However, such dependencies are very important for generating explicit image representations. In contrast, recurrent neural networks (RNNs) are well known for their ability to encode contextual information in sequential data, and they require only a limited number of network parameters. General RNNs can hardly be applied directly to non-sequential data. Thus, we propose hierarchical RNNs (HRNNs). In HRNNs, each RNN layer focuses on modeling spatial dependencies among image regions from the same scale but different locations, while the cross-scale RNN connections model scale dependencies among regions from the same location but different scales. Specifically, we propose two recurrent neural network models: 1) the hierarchical simple recurrent network (HSRN), which is fast and has low computational cost; and 2) the hierarchical long-short term memory recurrent network (HLSTM), which performs better than HSRN at the price of higher computational cost. In this manuscript, we integrate CNNs with HRNNs and develop end-to-end convolutional hierarchical recurrent neural networks (C-HRNNs). C-HRNNs not only make use of the representation power of CNNs, but also efficiently encode spatial and scale dependencies among different image regions. On four of the most challenging object/scene image classification benchmarks, our C-HRNNs achieve state-of-the-art results on Places 205, SUN 397, and MIT indoor, and competitive results on ILSVRC 2012.

preprint2016arXiv

Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition

A key challenge in fine-grained recognition is how to find and represent discriminative local regions. Recent attention models are capable of learning discriminative region localizers only from category labels with reinforcement learning. However, not utilizing any explicit part information, they are not able to accurately find multiple distinctive regions. In this work, we introduce an attribute-guided attention localization scheme where the local region localizers are learned under the guidance of part attribute descriptions. By designing a novel reward strategy, we are able to learn to locate regions that are spatially and semantically distinctive with reinforcement learning algorithm. The attribute labeling requirement of the scheme is more amenable than the accurate part location annotation required by traditional part-based fine-grained recognition methods. Experimental results on the CUB-200-2011 dataset demonstrate the superiority of the proposed scheme on both fine-grained recognition and attribute recognition.

preprint2016arXiv

Research on Life Characteristics of Internet Based on Network Motifs

From biosystems to complex systems, the study of life has always been an important area. Inspired by hyper-cycle theory about the evolution of non-living systems, we study the metabolism, self-replication and mutation behavior of the Internet based on the node entities, connection relationships and functional subgraphs (motifs) of its network topology. First, a framework of complex network evolution is proposed to analyze the birth and death phenomena of the Internet topology from January 1998 to August 2013. We then identify the Internet's metabolism behavior at the level of nodes, motifs and global topology, i.e., a newly born node is first simply added to the Internet and subsequently takes part in local reconstruction activities. Meanwhile, nodes and motifs also die. During local reconstruction, although the Internet system replicates motifs repeatedly through adding or removing actions, the system characteristics and global structure are not destroyed. Statistics on the motif M3, which is a fully connected subgraph, show that its metabolism is a fluctuating process that causes mutation of the Internet. Furthermore, we find that mutation is an instinctive reaction of the Internet when it is influenced by its internal or external environment, such as the Internet bubble, the rise of social networks and the financial crisis. The metabolism, self-replication and mutation behaviors of the Internet indicate its life characteristics as a complex artificial life form. Our work may inspire the study of life-like phenomena in other complex systems from the perspective of topological structure.

preprint2015arXiv

On Differentially Private Online Collaborative Recommendation Systems

In collaborative recommendation systems, privacy may be compromised, as users' opinions are used to generate recommendations for others. In this paper, we consider an online collaborative recommendation system, and we measure users' privacy in terms of the standard differential privacy. We give the first quantitative analysis of the trade-offs between recommendation quality and users' privacy in such a system by showing a lower bound on the best achievable privacy for any non-trivial algorithm, and proposing a near-optimal algorithm. From our results, we find that there is actually little trade-off between recommendation quality and privacy for any non-trivial algorithm. Our results also identify the key parameters that determine the best achievable privacy.

preprint2015arXiv

Quantum Storage of Three-Dimensional Orbital-Angular-Momentum Entanglement in a Crystal

Here we present the quantum storage of three-dimensional orbital-angular-momentum photonic entanglement in a rare-earth-ion-doped crystal. The properties of the entanglement and the storage process are confirmed by the violation of the Bell-type inequality generalized to three dimensions after storage ($S=2.152\pm0.033$). The fidelity of the memory process is $0.993\pm0.002$, as determined through complete quantum process tomography in three dimensions. An assessment of the visibility of the stored weak coherent pulses in higher-dimensional spaces demonstrates that the memory is highly reliable for 51 spatial modes. These results pave the way towards the construction of high-dimensional and multiplexed quantum repeaters based on solid-state devices. The multimode capacity of rare-earth-based optical processors goes beyond the temporal and spectral degrees of freedom, which might provide a useful tool for photonic information processing.

preprint2015arXiv

Storage of multiple single-photon pulses emitted from a quantum dot in a solid-state quantum memory

Quantum repeaters are critical components for distributing entanglement over long distances in the presence of unavoidable optical losses during transmission. Stimulated by the Duan-Lukin-Cirac-Zoller protocol, many improved quantum-repeater protocols based on quantum memories have been proposed, which commonly focus on the entanglement-distribution rate. Among these protocols, the elimination of multi-photons (multi-photon-pairs) and the use of multimode quantum memory have been shown to greatly improve the entanglement-distribution rate. Here, we demonstrate the storage of deterministic single photons emitted from a quantum dot in a polarization-maintaining solid-state quantum memory; in addition, multi-temporal-mode memory with $1$, $20$ and $100$ narrow single-photon pulses is also demonstrated. Multi-photons are eliminated, and each pulse contains at most one photon. Moreover, the solid-state properties of both sub-systems make this configuration more stable and easier to scale. Our work will be helpful in the construction of efficient quantum repeaters based on all-solid-state devices.

preprint2015arXiv

The Dirac Operator on Regular Metric Trees

A metric tree is a tree whose edges are viewed as line segments of positive length. The Dirac operator on such a tree is the operator which operates on each edge, complemented by the matching conditions at the vertices which were given by Bolte and Harrison \cite{BolteHarrison2003}. The spectrum of the Dirac operator can be quite different, reflecting the geometry of the tree. We discuss a special case of trees, namely the so-called regular trees. They possess a rich group of symmetries. This allows one to construct an orthogonal decomposition of the space $L^2(Γ)$ which reduces the Dirac operator. Based upon this decomposition, a detailed spectral analysis of the Dirac operator on regular metric trees is possible.

preprint2015arXiv

Two intervals Rényi entanglement entropy of compact free boson on torus

We compute the $N=2$ Rényi entanglement entropy of two intervals at equal time in a circle, for the theory of a 2d compact complex free scalar at finite temperature. This is carried out by performing functional integral on a genus 3 ramified cover of the torus, wherein the quantum part of the integral is captured by the four point function of twist fields on the worldsheet torus, and the classical piece is given by summing over winding modes of the genus 3 surface onto the target space torus. The final result is given in terms of a product of theta function and certain multi-dimensional theta function. We demonstrate the T-duality invariance of the result. We also study its low temperature limit. In the case in which the size of the intervals and of their separation are much smaller than the whole system, our result is in exact agreement with the known result for two intervals on an infinite system at zero temperature \cite{eeoftwo}. In the case in which the separation between the two intervals is much smaller than the interval length, the leading thermal corrections take the same universal form as proposed in \cite{Cardy:2014jwa,Chen:2015cna} for Rényi entanglement entropy of a single interval.

preprint2014arXiv

A Robust Point Sets Matching Method

Point set matching is very important in computer vision, feature extraction, fingerprint matching, motion estimation and so on. This paper proposes a robust point set matching method. We present an iterative algorithm that is robust to noise. First, we calculate all transformations between pairs of points. Then a similarity matrix is computed to measure the possibility that two transformations are both true. We iteratively update the matching score matrix using the similarity matrix. By applying a graph matching algorithm, we obtain the matching result. Experimental results obtained by our approach show robustness to outliers and jitter.

preprint2014arXiv

An Empirical Study on Software Defect Prediction with a Simplified Metric Set

Software defect prediction plays a crucial role in estimating the most defect-prone components of software, and a large number of studies have pursued improving prediction accuracy within a project or across projects. However, the rules for making an appropriate decision between within- and cross-project defect prediction when available historical data are insufficient remain unclear. The objective of this work is to validate the feasibility of the predictor built with a simplified metric set for software defect prediction in different scenarios, and to investigate practical guidelines for the choice of training data, classifier and metric subset of a given project. First, based on six typical classifiers, we constructed three types of predictors using the size of software metric set in three scenarios. Then, we validated the acceptable performance of the predictor based on Top-k metrics in terms of statistical methods. Finally, we attempted to minimize the Top-k metric subset by removing redundant metrics, and we tested the stability of such a minimum metric subset with one-way ANOVA tests. The experimental results indicate that (1) the choice of training data should depend on the specific requirement of prediction accuracy; (2) the predictor built with a simplified metric set works well and is very useful in case limited resources are supplied; (3) simple classifiers (e.g., Naive Bayes) also tend to perform well when using a simplified metric set for defect prediction; and (4) in several cases, the minimum metric subset can be identified to facilitate the procedure of general defect prediction with acceptable loss of prediction precision in practice. The guideline for choosing a suitable simplified metric set in different scenarios is presented in Table 12.

preprint2013arXiv

A Survey on Gas Sensing Technology

Sensing technology has been widely investigated and utilized for gas detection. Due to the different applicability and inherent limitations of different gas sensing technologies, researchers have been working on different scenarios with enhanced gas sensor calibration. This paper reviews the descriptions, evaluation, comparison and recent developments in existing gas sensing technologies. A classification of sensing technologies is given, based on the variation of electrical and other properties. Detailed introduction to sensing methods based on electrical variation is discussed through further classification according to sensing materials, including metal oxide semiconductors, polymers, carbon nanotubes, and moisture absorbing materials. Methods based on other kinds of variations such as optical, calorimetric, acoustic and gas-chromatographic, are presented in a general way. Several suggestions related to future development are also discussed. Furthermore, this paper focuses on sensitivity and selectivity for performance indicators to compare different sensing technologies, analyzes the factors that influence these two indicators, and lists several corresponding improved approaches.

preprint2013arXiv

Dynamical behaviors of FRW Universe containing a positive/negative potential scalar field in loop quantum cosmology

The dynamical behaviors of an FRW Universe containing a positive/negative potential scalar field in the loop quantum cosmology scenario are discussed. The method of phase-plane analysis is used to investigate the stability of the Universe. It is found that the stability properties in this situation are quite different from the classical cosmology case. For a positive potential scalar field coupled with a barotropic fluid, the cosmological autonomous system has five fixed points, and one of them is stable if the adiabatic index $γ$ satisfies $0<γ<2$. This leads to the fact that the Universe has just one bounce point instead of the singularity, which lies in the quantum dominated area and is caused by the quantum geometry effect. There are four fixed points if one considers a scalar field with a negative potential, but none of them is stable. Therefore, the Universe has two kinds of bounce points: one is caused by the quantum geometry effect and the other is caused by the negative potential, so the Universe may enter a classical re-collapse after the quantum bounce. This hints that the spatially flat FRW Universe containing a negative potential scalar field is cyclic.

preprint2013arXiv

Focusing Singularity in a Derivative Nonlinear Schrödinger Equation

We present a numerical study of a derivative nonlinear Schrödinger equation with a general power nonlinearity, $|ψ|^{2σ}ψ_x$. In the $L^2$-supercritical regime, $σ>1$, our simulations indicate that there is a finite time singularity. We obtain a precise description of the local structure of the solution in terms of blowup rate and asymptotic profile, in a form similar to that of the nonlinear Schrödinger equation with supercritical power law nonlinearity.

preprint2013arXiv

Path Integral of Bianchi I models in Loop Quantum Cosmology

A path integral formulation of the Bianchi I models containing a massless scalar field in loop quantum cosmology is constructed. Following the strategy used in the homogeneous and isotropic case, the calculation is extended to the simplest non-isotropic models according to the $\bar{μ}$ and $\bar{μ}^{\prime}$ schemes. It is proved from the path integral angle that the quantum dynamics lacks the full invariance with respect to fiducial cell scaling in the $\bar{μ}$ scheme, but it does not in the $\bar{μ}^{\prime}$ scheme. The investigation affirms the equivalence of the canonical approach and the path integral approach in loop quantum cosmology.

preprint2012arXiv

Stability of Solitary Waves for a Generalized Derivative Nonlinear Schrödinger Equation

We consider a derivative nonlinear Schrödinger equation with a general nonlinearity. This equation has a two parameter family of solitary wave solutions. We prove orbital stability/instability results that depend on the strength of the nonlinearity and, in some instances, their velocity. We illustrate these results with numerical simulations.

preprint2011arXiv

The Leading Behaviour of The Ground-State Energy of Heavy Ions According to Brown and Ravenhall

In this article we prove the absence of relativistic effects in leading order for the ground-state energy according to Brown-Ravenhall operator. We obtain this asymptotic result for negative ions and for systems with the number of electrons proportional to the nuclear charge. In the case of neutral atoms the analogous result was obtained earlier by Cassanas and Siedentop [4].

preprint2010arXiv

Experimental realization of a broadband illusion optics device

We experimentally demonstrate the first metamaterial "illusion optics" device - an "invisible gateway" by using a transmission-line medium. The device contains an open channel that can block electromagnetic waves at a particular frequency range. We also demonstrate that such a device can work in a broad frequency range.

preprint2010arXiv

Shocks and Universal Statistics in (1+1)-Dimensional Relativistic Turbulence

We propose that statistical averages in relativistic turbulence exhibit universal properties. We consider analytically the velocity and temperature differences structure functions in the (1+1)-dimensional relativistic turbulence in which shock waves provide the main contribution to the structure functions in the inertial range. We study shock scattering, demonstrate the stability of the shock waves, and calculate the anomalous exponents. We comment on the possibility of finite time blowup singularities.

preprint2009arXiv

Invisibility Cloaks Modeled by Anisotropic Metamaterials Based on Inductor-capacitor Networks

Based on transformation optics, a novel transmission-line (TL) approach to realize invisibility cloaking using planar anisotropic metamaterials (MTMs) is proposed. The two-dimensional cylindrical cloaks are modeled based on inductor-capacitor (L-C) MTM networks. The three elements of the constitutive parameters are all allowed to be spatially inhomogeneous, which leads to the full parameter realization of a cylindrical cloak. As an example, a cloak working in the VHF band is modeled and its invisibility behavior is demonstrated based on the solution of the node voltage distribution. Due to the non-resonant properties of the L-C elements, the broadband characteristic of the proposed cloaks is also evident.

preprint2008arXiv

Gravity Duals of Lifshitz-like Fixed Points

We find candidate macroscopic gravity duals for scale-invariant but non-Lorentz invariant fixed points, which do not have particle number as a conserved quantity. We compute two-point correlation functions which exhibit novel behavior relative to their AdS counterparts, and find holographic renormalization group flows to conformal field theories. Our theories are characterized by a dynamical critical exponent $z$, which governs the anisotropy between spatial and temporal scaling $t \to λ^z t$, $x \to λx$; we focus on the case with $z=2$. Such theories describe multicritical points in certain magnetic materials and liquid crystals, and have been shown to arise at quantum critical points in toy models of the cuprate superconductors. This work can be considered a small step towards making useful dual descriptions of such critical points.

preprint2004arXiv

Comments on Cosmic Censorship in AdS/CFT

Recently Hertog, Horowitz, and Maeda (HHM) (hep-th/0310054) have proposed that cosmic censorship can be violated in the AdS/CFT context. They argue that for certain initial data there is insufficient energy available to make a black hole whose horizon is big enough to cloak the singularity that forms. We have investigated this proposal in the models HHM discuss and have thus far been unable to find initial data that provably satisfy this criterion, despite our development of an improved lower bound on the size of the singular region. This is consistent with recent numerical results (hep-th/0402109). For certain initial data, the energies of our configurations are not far above the lower bound on the requisite black hole mass, and so it is possible that in the exact time development naked singularities do form. We go on to argue that the finite radius cut-off AdS_5 situation discussed by HHM displays instabilities when the full 10D theory is considered. We propose an AdS_3 example that may well be free of this instability.

96 published item(s)

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

Language-free Compositional Action Generation via Decoupling Refinement

Capillary gravity water waves linearized at monotone shear flows: eigenvalues and inviscid damping

Imperceptible Adversarial Attack via Invertible Neural Networks

3D Intracranial Aneurysm Classification and Segmentation via Unsupervised Dual-branch Learning

A Tree-Structured Multi-Task Model Recommender

Automatic Facial Paralysis Estimation with Facial Action Units

BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion

Convolutional Non-homogeneous Poisson Process with Application to Wildfire Risk Quantification for Power Delivery Networks

Developing NOMA to Next Generation Multiple Access (NGMA): Future Vision and Research Opportunities

Gibbs Phenomenon Suppression in PDE-Based Statistical Spatio-Temporal Models

GLM: General Language Model Pretraining with Autoregressive Blank Infilling

GraphMAE: Self-Supervised Masked Graph Autoencoders

HSIC-InfoGAN: Learning Unsupervised Disentangled Representations by Maximising Approximated Mutual Information

Instability and spectrum of the linearized two-phase fluids interface problem at shear flows

Keeping Minimal Experience to Achieve Efficient Interpretable Policy Distillation

Learning Disentangled Representations in the Imaging Domain

Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries

MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment

MegLoc: A Robust and Accurate Visual Localization Pipeline

Multiloop corrections for collider processes using auxiliary mass flow

NeuraHealth: An Automated Screening Pipeline to Detect Undiagnosed Cognitive Impairment in Electronic Health Records with Deep Learning and Natural Language Processing

On Curating Responsible and Representative Healthcare Video Recommendations for Patient Education and Health Literacy: An Augmented Intelligence Approach

Online non-convex learning for river pollution source identification

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers

Physics-Informed Statistical Modeling for Wildfire Aerosols Process Using Multi-Source Geostationary Satellite Remote-Sensing Data Streams

Privacy-Preserving Personalized Fitness Recommender System (P3FitRec): A Multi-level Deep Learning Approach

Regression Trees on Grassmann Manifold for Adapting Reduced-Order Models

Rethinking Hard-Parameter Sharing in Multi-Domain Learning

Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis

SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs

Syntax-Aware Network for Handwritten Mathematical Expression Recognition

Temperature effects on the structure and mechanical properties of vapor deposited a-SiO2

Things not Written in Text: Exploring Spatial Commonsense from Visual Signals

Unbox the Blackbox: Predict and Interpret YouTube Viewership Using Deep Learning

vMFNet: Compositionality Meets Domain-generalised Segmentation

What is Healthy? Generative Counterfactual Diffusion for Lesion Localization

Why patient data cannot be easily forgotten?

A Novel Graph-based Computation Offloading Strategy for Workflow Applications in Mobile Edge Computing

Artificial Intelligence Driven UAV-NOMA-MEC in Next Generation Wireless Networks

Artificial Intelligence Enhanced Rapid and Efficient Diagnosis of Mycoplasma Pneumoniae Pneumonia in Children Patients

Boost-S: Gradient Boosted Trees for Spatial Data and Its Application to FDG-PET Imaging Data

Diversity in immunogenomics: the value and the challenge

EdgeWorkflowReal: An Edge Computing based Workflow Execution Engine for Smart Systems

GraphGallery: A Platform for Fast Benchmarking and Easy Development of Graph Neural Networks Based Intelligent Software

Improved Signed Distance Function for 2D Real-time SLAM and Accurate Localization

Leveraging Planar Regularities for Point Line Visual-Inertial Odometry

Locality-aware Channel-wise Dropout for Occluded Face Recognition

Reconfigurable Intelligent Surface (RIS) Aided Multi-User Networks: Interplay Between NOMA and RIS

Understanding WeChat User Preferences and "Wow" Diffusion

Unsupervised Neural Rendering for Image Hazing

An end-to-end CNN framework for polarimetric vision tasks based on polarization-parameter-constructing network

Artificial Intelligence Aided Next-Generation Networks Relying on UAVs

Calculation of Feynman loop integration and phase-space integration via auxiliary mass flow

Complete reduction of integrals in two-loop five-light-parton scattering amplitudes

Dialogue State Induction Using Neural Latent Variable Models

Disentangled Representations for Domain-generalized Cardiac Segmentation

Have you forgotten? A method to assess if machine learning models have forgotten data

Hyperfine Structure and Coherent Dynamics of Rare Earth Spins Explored with Electron-Nuclear Double Resonance at Sub-Kelvin Temperatures

M^3VSNet: Unsupervised Multi-metric Multi-view Stereo Network

Neighborhood Matching Network for Entity Alignment

Predictions of 2019-nCoV Transmission Ending via Comprehensive Methods

Reliable coherent optical memory based on a laser-written waveguide

RIS Enhanced Massive Non-orthogonal Multiple Access Networks: Deployment and Passive Beamforming Design

Statistical Modeling for Spatio-Temporal Data from Stochastic Convection-Diffusion Processes

Studying Product Competition Using Representation Learning

TAL EmotioNet Challenge 2020 Rethinking the Model Chosen Problem in Multi-Task Learning

TP-LSD: Tri-Points Based Line Segment Detector

A Cost-Effective Strategy for Storing Scientific Datasets with Multiple Service Providers in the Cloud

A Polyhedral Study of the Static Probabilistic Lot-Sizing Problem

A Spatio-Temporal Modeling Approach for Weather Radar Reflectivity Data and Its Applications in Tropical Southeast Asia

Deep Embedding for Spatial Role Labeling