Source author record

Jian Tang

Jian Tang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

88 works

29 topics

4 close collaborators

preprint · 2026 · arXiv

FIS-DiT: Breaking the Few-Step Video Inference Barrier via Training-Free Frame Interleaved Sparsity

While the overall inference latency of Video Diffusion Transformers (DiTs) can be substantially reduced through model distillation, per-step inference latency remains a critical bottleneck. Existing acceleration paradigms primarily exploit redundancy across the denoising trajectory; however, we identify a limitation where these step-wise strategies encounter diminishing returns in few-step regimes. In such scenarios, the scarcity of temporal states prevents effective feature reuse or predictive modeling, creating a formidable barrier to further acceleration. To overcome this, we propose Frame Interleaved Sparsity DiT (FIS-DiT), a training-free and operator-agnostic framework that shifts the optimization focus from the temporal trajectory to the latent frame dimension. Our approach is motivated by an intrinsic duality within this dimension: the existence of frame-wise sparsity that permits reduced computation, coupled with a structural consistency where each frame position remains equally vital to the global spatiotemporal context. Leveraging this insight, we implement Frame Interleaved Sparsity (FIS) as an execution strategy that manipulates frame subsets across the model hierarchy, refreshing all latent positions without requiring full-scale block computation. Empirical evaluations on Wan 2.2 and HunyuanVideo 1.5 demonstrate that FIS-DiT consistently achieves 2.11--2.41$\times$ speedup with negligible degradation across VBench-Q and CLIP metrics, providing a scalable and robust pathway toward real-time high-definition video generation.

preprint · 2026 · arXiv

Pelican-Unified 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action

We present Pelican-Unified 1.0, the first embodied foundation model trained according to the principle of unification. Pelican-Unified 1.0 uses a single VLM as a unified understanding module, mapping scenes, instructions, visual contexts, and action histories into a shared semantic space. The same VLM also serves as a unified reasoning module, autoregressively producing task-, action-, and future-oriented chains of thought in a single forward pass and projecting the final hidden state into a dense latent variable. A Unified Future Generator (UFG) then conditions on this latent variable and jointly generates future videos and future actions through two modality-specific output heads within the same denoising process. The language, video, and action losses are all backpropagated into the shared representation, enabling the model to jointly optimize understanding, reasoning, imagination, and action during training, rather than training three isolated expert systems. Experiments demonstrate that unification does not imply compromise. With a single checkpoint, Pelican-Unified 1.0 achieves strong performance across all three capabilities: 64.7 on eight VLM benchmarks, the best among comparable-scale models; 66.03 on WorldArena, ranking first; and 93.5 on RoboTwin, the second-best average among compared action methods. These results show that the unified paradigm succeeds in preserving specialist strength while bringing understanding, reasoning, imagination, and action into one model.

preprint · 2026 · arXiv

Visualization of Tunable Electronic Structure of Monolayer TaIrTe$_4$

Monolayer TaIrTe$_4$ has emerged as an attractive material platform to study intriguing phenomena related to topology and strong electron correlations. Recently, strong interactions have been demonstrated to induce strain and dielectric screening tunable topological phases such as quantum spin Hall insulator (QSHI), trivial insulator, higher-order topological insulator, and metallic phase, in the ground state of monolayer TaIrTe$_4$. Moreover, charge dosing has been demonstrated to convert the QSHI into a dual QSHI state. Although the band structure of monolayer TaIrTe$_4$ is central to interpreting its topological phases in transport experiments, direct experimental access to its intrinsic electronic structure has so far remained elusive. Here we report direct measurements of the monolayer TaIrTe$_4$ band structure using spatially resolved micro-angle-resolved photoemission spectroscopy (microARPES) with micrometre-scale resolution. The observed dispersions show quantitative agreement with density functional theory calculations using the Heyd-Scuseria-Ernzerhof hybrid functional, establishing the insulating ground state and revealing no evidence for strong electronic correlations. We further uncover a pronounced electron-hole asymmetry in the doping response. Whereas hole doping is readily induced by electrostatic gating, attempts to introduce electrons via gating or alkali metal deposition do not yield a rigid upward shift of the Fermi level. Fractional charge calculations demonstrate that added electrons instead drive band renormalization and shrink the band gap. Taken together, our experimental and theoretical results identify the microscopic mechanism by which induced charges reshape the band topology of monolayer TaIrTe$_4$, showing that doping can fundamentally alter the electronic structure beyond the rigid band behaviour that is typically assumed.

preprint · 2026 · arXiv

Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test

As world models gain momentum in Embodied AI, an increasing number of works explore using video foundation models as predictive world models for downstream embodied tasks like 3D prediction or interactive generation. However, before exploring these downstream tasks, video foundation models still have two critical questions unanswered: (1) whether their generative generalization is sufficient to maintain perceptual fidelity in the eyes of human observers, and (2) whether they are robust enough to serve as a universal prior for real-world embodied agents. To provide a standardized framework for answering these questions, we introduce the Embodied Turing Test benchmark: WoW-World-Eval (Wow,wo,val). Building upon 609 robot manipulation data, Wow-wo-val examines five core abilities, including perception, planning, prediction, generalization, and execution. We propose a comprehensive evaluation protocol with 22 metrics to assess the models' generation ability, which achieves a high Pearson Correlation between the overall score and human preference (>0.93) and establishes a reliable foundation for the Human Turing Test. On Wow-wo-val, models achieve only 17.27 on long-horizon planning and at best 68.02 on physical consistency, indicating limited spatiotemporal consistency and physical reasoning. For the Inverse Dynamic Model Turing Test, we first use an IDM to evaluate the video foundation models' execution accuracy in the real world. However, most models collapse to $\approx$ 0% success, while WoW maintains a 40.74% success rate. These findings point to a noticeable gap between the generated videos and the real world, highlighting the urgency and necessity of benchmarking World Model in Embodied AI.

preprint · 2025 · arXiv

Real-world Reinforcement Learning from Suboptimal Interventions

Real-world reinforcement learning (RL) offers a promising approach to training precise and dexterous robotic manipulation policies in an online manner, enabling robots to learn from their own experience while gradually reducing human labor. However, prior real-world RL methods often assume that human interventions are optimal across the entire state space, overlooking the fact that even expert operators cannot consistently provide optimal actions in all states or completely avoid mistakes. Indiscriminately mixing intervention data with robot-collected data inherits the sample inefficiency of RL, while purely imitating intervention data can ultimately degrade the final performance achievable by RL. The question of how to leverage potentially suboptimal and noisy human interventions to accelerate learning without being constrained by them thus remains open. To address this challenge, we propose SiLRI, a state-wise Lagrangian reinforcement learning algorithm for real-world robot manipulation tasks. Specifically, we formulate the online manipulation problem as a constrained RL optimization, where the constraint bound at each state is determined by the uncertainty of human interventions. We then introduce a state-wise Lagrange multiplier and solve the problem via a min-max optimization, jointly optimizing the policy and the Lagrange multiplier to reach a saddle point. Built upon a human-as-copilot teleoperation system, our algorithm is evaluated through real-world experiments on diverse manipulation tasks. Experimental results show that SiLRI effectively exploits human suboptimal interventions, reducing the time required to reach a 90% success rate by at least 50% compared with the state-of-the-art RL method HIL-SERL, and achieving a 100% success rate on long-horizon manipulation tasks where other RL methods struggle to succeed. Project website: https://silri-rl.github.io/.

preprint · 2023 · arXiv

Iterative Graph Self-Distillation

Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs. To address this, we propose a method called Iterative Graph Self-Distillation (IGSD) which learns graph-level representation in an unsupervised manner through instance discrimination using a self-supervised contrastive learning approach. IGSD involves a teacher-student distillation process that uses graph diffusion augmentations and constructs the teacher model using an exponential moving average of the student model. The intuition behind IGSD is to predict the teacher network representation of the graph pairs under different augmented views. As a natural extension, we also apply IGSD to semi-supervised scenarios by jointly regularizing the network with both supervised and self-supervised contrastive loss. Finally, we show that finetuning the IGSD-trained models with self-training can further improve the graph representation power. Empirically, we achieve significant and consistent performance gain on various graph datasets in both unsupervised and semi-supervised settings, which well validates the superiority of IGSD.
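The exponential-moving-average teacher mentioned in this abstract can be illustrated with a minimal Python sketch. The function name `ema_update`, the flat list-of-floats parameter representation and the decay value are assumptions made for illustration; the abstract does not state the update rule or hyperparameters that IGSD actually uses.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], decay: float) -> List[float]:
    """Return teacher parameters moved toward the student by one EMA step.

    Hypothetical helper: each teacher parameter becomes
    decay * teacher + (1 - decay) * student.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of parameters")
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    teacher = [1.0, 0.0]
    student = [0.0, 1.0]
    # decay=0.75 is an arbitrary illustrative value, not taken from the paper.
    print(ema_update(teacher, student, decay=0.75))
```

A decay close to 1 makes the teacher change slowly, so it acts as a smoothed average of recent student states rather than a copy of the latest one.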

preprint · 2023 · arXiv

Optimization of muonium yield in perforated silica aerogel

Muonium consists of a positive muon bound to an orbital electron, and its spontaneous conversion to antimuonium would be a clear indication of new physics beyond the Standard Model of particle physics. One of the most important aspects of the muonium-to-antimuonium conversion experiment (MACE) is to increase the muonium yield in vacuum to challenge the latest limit obtained in 1999. This study focuses on a simulation of muonium formation and diffusion in perforated silica aerogel. The independent simulation results can be well validated by experimental data. By optimizing the target geometry, we find a maximum muonium emission efficiency of $7.92(2)\%$ and a maximum vacuum yield of $1.134(2)\%$ with a typical surface muon beam, indicating a 2.6 times and a 2.1 times enhancement, respectively. Our results will pave the way for muonium experiments.

preprint · 2023 · arXiv

Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution

Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning. Transformer models have been adopted for their high prediction capacity, which comes from the computationally expensive self-attention mechanism. Though one could lower the complexity of Transformers by inducing sparsity in point-wise self-attention for LTTF, the limited information utilization prevents the model from exploring the complex dependencies comprehensively. To this end, we propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects: (i) an encoder-decoder architecture with linear complexity that does not sacrifice information utilization is proposed on top of sliding-window attention and the Stationary and Instant Recurrent Network (SIRN); (ii) a module derived from the normalizing flow is devised to further improve the information utilization by inferring the outputs with the latent variables in SIRN directly; (iii) the inter-series correlation and temporal dynamics in time-series data are modeled explicitly to fuel the downstream self-attention mechanism. Extensive experiments on seven real-world datasets demonstrate that Conformer outperforms the state-of-the-art methods on LTTF and generates reliable prediction results with uncertainty quantification.

preprint · 2022 · arXiv

A Roadmap for Big Model

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and in the application of BMs to many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. For each topic, we summarize the current studies and propose some future research directions. At the end of this paper, we conclude with a more general view of the further development of BMs.

preprint · 2022 · arXiv

Balanced Datasets for IoT IDS

As the Internet of Things (IoT) continues to grow, cyberattacks are becoming increasingly common. The security of IoT networks relies heavily on intrusion detection systems (IDSs). Developing an IDS that is accurate and efficient is a challenging task, and it is made harder by the absence of balanced datasets for training and testing the proposed IDS. In this study, four commonly used datasets are visualized and analyzed. The study also proposes a sampling algorithm that generates a sample representative of the original dataset, as well as an algorithm that generates a balanced dataset. Researchers can use this paper as a starting point when investigating cybersecurity and machine learning. The proposed sampling algorithms proved reliable in generating representative and balanced samples from the NSL-KDD, UNSW-NB15, BotNetIoT-01, and BoTIoT datasets.

preprint · 2022 · arXiv

Constraints on cosmic-ray boosted DM in CDEX-10

Dark matter (DM) direct detection experiments have been setting strong limits on the DM-nucleon scattering cross section at DM masses above a few GeV, but leave a large parameter space unexplored in the low mass region. DM is likely to be scattered and boosted by relativistic cosmic rays in the expanding universe if it can generate nuclear recoils in direct detection experiments to offer observable signals. Since low energy threshold detectors using Germanium have provided good constraints on ordinary halo GeV-scale DM, it is necessary to re-analyze 102.8 kg$\times$day data in the CDEX-10 experiment assuming that DM is boosted by cosmic rays. For the DM mass range 1 keV $< m_\chi <$ 1 MeV and the effective distance within 1 kpc, we reach an almost flat floor limit at $8.32\times10^{-30}$ cm$^2$ on the spin-independent DM-nucleon scattering cross section at a 90% confidence level. The CDEX-10 result is able to close the gap unambiguously in the parameter space between the MiniBooNE and XENON1T constraints, which was partially hindered by the Earth attenuation effect. We also quantitatively calculate the expected neutrino floor for CRBDM searches in future direct detection experiments using Germanium.

preprint · 2022 · arXiv

Continual Few-Shot Learning with Adversarial Class Storage

Humans have a remarkable ability to quickly and effectively learn new concepts in a continuous manner without forgetting old knowledge. Though deep learning has made tremendous successes on various computer vision tasks, it faces challenges for achieving such human-level intelligence. In this paper, we define a new problem called continual few-shot learning, in which tasks arrive sequentially and each task is associated with a few training samples. We propose Continual Meta-Learner (CML) to solve this problem. CML integrates metric-based classification and a memory-based mechanism along with adversarial learning into a meta-learning framework, which leads to the desirable properties: 1) it can quickly and effectively learn to handle a new task; 2) it overcomes catastrophic forgetting; 3) it is model-agnostic. We conduct extensive experiments on two image datasets, MiniImageNet and CIFAR100. Experimental results show that CML delivers state-of-the-art performance in terms of classification accuracy on few-shot learning tasks without catastrophic forgetting.

preprint · 2022 · arXiv

DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search

The convolutional neural network has achieved great success in fulfilling computer vision tasks despite large computation overhead against efficient deployment. Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure, such that the pruned network can be easily deployed in practice. However, existing structured pruning methods require hand-crafted rules which may lead to tremendous pruning space. In this paper, we introduce Differentiable Annealing Indicator Search (DAIS) that leverages the strength of neural architecture search in the channel pruning and automatically searches for the effective pruned model with given constraints on computation overhead. Specifically, DAIS relaxes the binarized channel indicators to be continuous and then jointly learns both indicators and model parameters via bi-level optimization. To bridge the non-negligible discrepancy between the continuous model and the target binarized model, DAIS proposes an annealing-based procedure to steer the indicator convergence towards binarized states. Moreover, DAIS designs various regularizations based on a priori structural knowledge to control the pruning sparsity and to improve model performance. Experimental results show that DAIS outperforms state-of-the-art pruning methods on CIFAR-10, CIFAR-100, and ImageNet.

preprint · 2022 · arXiv

Ekar: An Explainable Method for Knowledge Aware Recommendation

This paper studies recommender systems with knowledge graphs, which can effectively address the problems of data sparsity and cold start. Recently, a variety of methods have been developed for this problem, which generally try to learn effective representations of users and items and then match items to users according to their representations. Though these methods have been shown quite effective, they lack good explanations, which are critical to recommender systems. In this paper, we take a different route and propose generating recommendations by finding meaningful paths from users to items. Specifically, we formulate the problem as a sequential decision process, where the target user is defined as the initial state, and the edges on the graphs are defined as actions. We shape the rewards according to existing state-of-the-art methods and then train a policy function with policy gradient methods. Experimental results on three real-world datasets show that our proposed method not only provides effective recommendations but also offers good explanations.

preprint · 2022 · arXiv

Feasibility study of an accelerator neutrino experiment in China

Future accelerator neutrino experiments will provide a powerful tool to measure standard oscillation parameters and search for new physics. In this context, we discuss the prospects of building an accelerator neutrino experiment in China. The feasibility of such facilities is investigated by evaluating their sensitivity to the standard mixing parameters. As an example, we consider an SPPC-based neutrino beamline and a CJPL-based neutrino detector with a 1736 km baseline length. We find that this setup can significantly improve the precision on $\delta_{CP}$, $\theta_{23}$ and $\Delta m_{31}^2$.

preprint · 2022 · arXiv

Generative Coarse-Graining of Molecular Conformations

Coarse-graining (CG) of molecular simulations simplifies the particle representation by grouping selected atoms into pseudo-beads and drastically accelerates simulation. However, such CG procedure induces information losses, which makes accurate backmapping, i.e., restoring fine-grained (FG) coordinates from CG coordinates, a long-standing challenge. Inspired by the recent progress in generative models and equivariant networks, we propose a novel model that rigorously embeds the vital probabilistic nature and geometric consistency requirements of the backmapping transformation. Our model encodes the FG uncertainties into an invariant latent space and decodes them back to FG geometries via equivariant convolutions. To standardize the evaluation of this domain, we provide three comprehensive benchmarks based on molecular dynamics trajectories. Experiments show that our approach always recovers more realistic structures and outperforms existing data-driven methods with a significant margin.

preprint · 2022 · arXiv

GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation

Predicting molecular conformations from molecular graphs is a fundamental problem in cheminformatics and drug discovery. Recently, significant progress has been achieved with machine learning approaches, especially with deep generative models. Inspired by the diffusion process in classical non-equilibrium thermodynamics where heated particles will diffuse from original states to a noise distribution, in this paper, we propose a novel generative model named GeoDiff for molecular conformation prediction. GeoDiff treats each atom as a particle and learns to directly reverse the diffusion process (i.e., transforming from a noise distribution to stable conformations) as a Markov chain. Modeling such a generation process is however very challenging as the likelihood of conformations should be roto-translational invariant. We theoretically show that Markov chains evolving with equivariant Markov kernels can induce an invariant distribution by design, and further propose building blocks for the Markov kernels to preserve the desirable equivariance property. The whole framework can be efficiently trained in an end-to-end fashion by optimizing a weighted variational lower bound to the (conditional) likelihood. Experiments on multiple benchmarks show that GeoDiff is superior or comparable to existing state-of-the-art approaches, especially on large molecules.

preprint · 2022 · arXiv

HIRL: A General Framework for Hierarchical Image Representation Learning

Learning self-supervised image representations has been broadly studied to boost various visual understanding tasks. Existing methods typically learn a single level of image semantics like pairwise semantic similarity or image clustering patterns. However, these methods can hardly capture multiple levels of semantic information that naturally exists in an image dataset, e.g., the semantic hierarchy of "Persian cat to cat to mammal" encoded in an image database for species. It is thus unknown whether an arbitrary image self-supervised learning (SSL) approach can benefit from learning such hierarchical semantics. To answer this question, we propose a general framework for Hierarchical Image Representation Learning (HIRL). This framework aims to learn multiple semantic representations for each image, and these representations are structured to encode image semantics from fine-grained to coarse-grained. Based on a probabilistic factorization, HIRL learns the most fine-grained semantics by an off-the-shelf image SSL approach and learns multiple coarse-grained semantics by a novel semantic path discrimination scheme. We adopt six representative image SSL methods as baselines and study how they perform under HIRL. By rigorous fair comparison, performance gain is observed on all the six methods for diverse downstream tasks, which, for the first time, verifies the general effectiveness of learning hierarchical image semantics. All source code and model weights are available at https://github.com/hirl-team/HIRL

preprint · 2022 · arXiv

Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs

Adoption of recently developed methods from machine learning has given rise to the creation of drug-discovery knowledge graphs (KGs) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, is promising as it provides a more intuitive representation and is suitable for inference tasks such as predicting missing links. One common application is to produce ranked lists of genes for a given disease, where the rank is based on the perceived likelihood of association between the gene and the disease. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, KGs can be biased either directly due to the underlying data sources that are integrated or due to modeling choices in the construction of the graph, one consequence of which is that certain entities can become topologically overrepresented. We demonstrate the effect of these inherent structural imbalances, which result in densely connected entities being highly ranked no matter the context. We provide support for this observation across different datasets, models and predictive tasks. Further, we present various graph perturbation experiments which lend further support to the observation that KGE models can be more influenced by the frequency of entities than by any biological information encoded within the relations. Our results highlight the importance of data modeling choices, and emphasize the need for practitioners to be mindful of these issues when interpreting model outputs and during KG composition.

preprint · 2022 · arXiv

Mass Testing and Characterization of 20-inch PMTs for JUNO

The main goal of the JUNO experiment is to determine the neutrino mass ordering using a 20 kt liquid-scintillator detector. Its key feature is an excellent energy resolution of at least 3% at 1 MeV, for which its instruments need to meet a certain quality and thus have to be fully characterized. More than 20,000 20-inch PMTs have been received and assessed by JUNO after a detailed testing program which began in 2017 and lasted about four years. Based on this mass characterization and a set of specific requirements, the good quality of all accepted PMTs could be ascertained. This paper presents the testing procedure and the purpose-built testing systems, as well as the statistical characteristics of all 20-inch PMTs intended for use in the JUNO experiment, covering more than fifteen performance parameters including the photocathode uniformity. This constitutes the largest sample of 20-inch PMTs ever produced and studied in detail to date, i.e. 15,000 of the newly developed 20-inch MCP-PMTs from Northern Night Vision Technology Co. (NNVT) and 5,000 dynode PMTs from Hamamatsu Photonics K.K. (HPK).

preprint · 2022 · arXiv

Muon Collider Physics Summary

The prospect of designing muon colliders with high energy and luminosity, which is being investigated by the International Muon Collider Collaboration, has triggered a growing interest in their physics reach. We present a concise summary of the potential of muon colliders to explore new physics, leveraging the unique possibility of combining high available energy with very precise measurements.

preprint · 2022 · arXiv

Neural Bellman-Ford Networks: A General Graph Neural Network Framework for Link Prediction

Link prediction is a fundamental task on graphs. Inspired by traditional path-based methods, in this paper we propose a general and flexible representation learning framework based on paths for link prediction. Specifically, we define the representation of a pair of nodes as the generalized sum of all path representations, with each path representation as the generalized product of the edge representations in the path. Motivated by the Bellman-Ford algorithm for solving the shortest path problem, we show that the proposed path formulation can be efficiently solved by the generalized Bellman-Ford algorithm. To further improve the capacity of the path formulation, we propose the Neural Bellman-Ford Network (NBFNet), a general graph neural network framework that solves the path formulation with learned operators in the generalized Bellman-Ford algorithm. The NBFNet parameterizes the generalized Bellman-Ford algorithm with 3 neural components, namely INDICATOR, MESSAGE and AGGREGATE functions, which correspond to the boundary condition, multiplication operator, and summation operator respectively. The NBFNet is very general, covers many traditional path-based methods, and can be applied to both homogeneous graphs and multi-relational graphs (e.g., knowledge graphs) in both transductive and inductive settings. Experiments on both homogeneous graphs and knowledge graphs show that the proposed NBFNet outperforms existing methods by a large margin in both transductive and inductive settings, achieving new state-of-the-art results.
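The path formulation in this abstract (a generalized sum over paths of a generalized product over edges) can be illustrated with the classical, non-neural generalized Bellman-Ford iteration. The sketch below is in Python and is not the NBFNet implementation: the function name `generalized_bellman_ford`, its parameters and the toy graph are all hypothetical, and the fixed `add`/`mul` operators stand in for the learned AGGREGATE and MESSAGE functions.

```python
import math
from typing import Callable, List, Tuple

Edge = Tuple[int, int, float]  # (source node, target node, edge value)


def generalized_bellman_ford(
    num_nodes: int,
    edges: List[Edge],
    source: int,
    add: Callable[[float, float], float],
    mul: Callable[[float, float], float],
    zero: float,
    one: float,
    num_iters: int,
) -> List[float]:
    """Run num_iters rounds of Bellman-Ford with user-supplied operators.

    `zero` is the identity of `add` and `one` is the identity of `mul`.
    """
    # Boundary condition: the source holds `one`, every other node holds `zero`.
    boundary = [one if v == source else zero for v in range(num_nodes)]
    h = list(boundary)
    for _ in range(num_iters):
        new_h = list(boundary)
        for u, v, w in edges:
            # Extend paths ending at u by edge (u, v), then aggregate into v.
            new_h[v] = add(new_h[v], mul(h[u], w))
        h = new_h
    return h


if __name__ == "__main__":
    # Toy directed graph: 0 -> 1 -> 2 plus a direct edge 0 -> 2.
    weighted = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)]
    # (min, +) recovers shortest-path distances from node 0.
    print(generalized_bellman_ford(3, weighted, 0, min, lambda a, b: a + b, math.inf, 0.0, 2))
    # (+, *) with unit edge values counts paths from node 0.
    unit = [(u, v, 1.0) for u, v, _ in weighted]
    print(generalized_bellman_ford(3, unit, 0, lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0, 2))
```

Swapping the operator pair changes which traditional path-based quantity is computed while the iteration itself stays the same. This is the sense in which a single framework can cover several such methods.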

preprint · 2022 · arXiv

Neural Structured Prediction for Inductive Node Classification

This paper studies node classification in the inductive setting, i.e., aiming to learn a model on labeled training graphs and generalize it to infer node labels on unlabeled test graphs. This problem has been extensively studied with graph neural networks (GNNs) by learning effective node representations, as well as traditional structured prediction methods for modeling the structured output of node labels, e.g., conditional random fields (CRFs). In this paper, we present a new approach called the Structured Proxy Network (SPN), which combines the advantages of both worlds. SPN defines flexible potential functions of CRFs with GNNs. However, learning such a model is nontrivial as it involves optimizing a maximin game with high-cost inference. Inspired by the underlying connection between joint and marginal distributions defined by Markov networks, we propose to solve an approximate version of the optimization problem as a proxy, which yields a near-optimal solution, making learning more efficient. Extensive experiments on two settings show that our approach outperforms many competitive baselines.

preprint · 2022 · arXiv

Neural-Symbolic Models for Logical Queries on Knowledge Graphs

Answering complex first-order logic (FOL) queries on knowledge graphs is a fundamental task for multi-hop reasoning. Traditional symbolic methods traverse a complete knowledge graph to extract the answers, which provides good interpretation for each step. Recent neural methods learn geometric embeddings for complex queries. These methods can generalize to incomplete knowledge graphs, but their reasoning process is hard to interpret. In this paper, we propose Graph Neural Network Query Executor (GNN-QE), a neural-symbolic model that enjoys the advantages of both worlds. GNN-QE decomposes a complex FOL query into relation projections and logical operations over fuzzy sets, which provides interpretability for intermediate variables. To reason about the missing links, GNN-QE adapts a graph neural network from knowledge graph completion to execute the relation projections, and models the logical operations with product fuzzy logic. Experiments on 3 datasets show that GNN-QE significantly improves over previous state-of-the-art models in answering FOL queries. Meanwhile, GNN-QE can predict the number of answers without explicit supervision, and provide visualizations for intermediate variables.
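The logical operations over fuzzy sets mentioned in this abstract can be illustrated with the standard product fuzzy logic definitions. This is a minimal Python sketch, not GNN-QE code: a fuzzy set is represented here as a plain list of membership degrees in [0, 1], one per entity, and the function names are hypothetical.

```python
from typing import List

FuzzySet = List[float]  # membership degree in [0, 1] for each entity


def fuzzy_and(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Conjunction under the product t-norm: x * y."""
    return [x * y for x, y in zip(a, b)]


def fuzzy_or(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Disjunction under the probabilistic sum: x + y - x * y."""
    return [x + y - x * y for x, y in zip(a, b)]


def fuzzy_not(a: FuzzySet) -> FuzzySet:
    """Negation: 1 - x."""
    return [1.0 - x for x in a]


if __name__ == "__main__":
    # Illustrative membership degrees for three entities.
    a = [0.5, 1.0, 0.0]
    b = [0.5, 0.5, 1.0]
    print(fuzzy_and(a, b))
    print(fuzzy_or(a, b))
    print(fuzzy_not(a))
```

Because each intermediate result is itself a vector of per-entity membership degrees, it can be inspected directly, which is one way to read the abstract's claim about interpretable intermediate variables.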

preprint2022arXiv

Non-minimal Lorentz invariance violation in light of muon anomalous magnetic moment and long-baseline neutrino oscillation data

In light of the increasing hints of new physics at the muon $g-2$ and neutrino oscillation experiments, we consider the recently observed tension in the long-baseline neutrino oscillation experiments as a potential indication of Lorentz invariance violation. For this purpose, the latest data from T2K and NO$ν$A are analysed in the presence of non-minimal Lorentz invariance violation. Indeed, we find that isotropic violation in dimensions $D =$ 4, 5 and 6 can alleviate the tension in neutrino oscillation data by 0.4$-$2.4$σ$ CL significance, with the isotropic coefficient $γ^{(5)}_{ττ} =$ 3.58$\times$10$^{-32}$GeV$^{-1}$ yielding the best fit. At the same time, the anomalous muon $g-2$ result can be reproduced with an additional non-isotropic violation of $d^{zt} =$ -1.7$\times$10$^{-25}$. The analysis highlights the possibility of simultaneous relaxation of experimental tensions with Lorentz invariance violation of mixed nature.

preprint2022arXiv

Pre-training Molecular Graph Representation with 3D Geometry

Molecular graph representation learning is a fundamental problem in modern drug and material discovery. Molecular graphs are typically modeled by their 2D topological structures, but it has been recently discovered that 3D geometric information plays a more vital role in predicting molecular functionalities. However, the lack of 3D information in real-world scenarios has significantly impeded the learning of geometric graph representation. To cope with this challenge, we propose the Graph Multi-View Pre-training (GraphMVP) framework where self-supervised learning (SSL) is performed by leveraging the correspondence and consistency between 2D topological structures and 3D geometric views. GraphMVP effectively learns a 2D molecular graph encoder that is enhanced by richer and more discriminative 3D geometry. We further provide theoretical insights to justify the effectiveness of GraphMVP. Finally, comprehensive experiments show that GraphMVP can consistently outperform existing graph SSL methods.

preprint2022arXiv

Precision measurements and tau neutrino physics in a future accelerator neutrino experiment

We investigate prospects of building a future accelerator-based neutrino oscillation experiment in China, including site selection, beam optimization and tau neutrino physics aspects. CP violation, non-unitary mixing and non-standard neutrino interactions are discussed. We simulate neutrino beam setups based on muon and beta decay techniques and compare Chinese laboratory sites by their expected sensitivities. A case study on Super Proton-Proton Collider and China JinPing Laboratory is also presented. It is shown that the muon-decay-based beam setup can measure the Dirac CP phase with a precision of about 14.2$^\circ$ at 1$\,σ$ CL, whereas non-unitarity can be probed down to $|α_{i j}| \lesssim$ 0.37 ($i \neq j =$ 1, 2, 3) and non-standard interactions to $|ε^m_{\ell \ell'}| \lesssim$ 0.11 ($\ell \neq \ell' = e$, $μ$, $τ$) at 90% CL, respectively.

preprint2022arXiv

Promising Technologies and R&D Directions for the Future Muon Collider Detectors

Among the post-LHC generation of particle accelerators, the muon collider represents a unique machine with capability to provide very high energy leptonic collisions and to open the path to a vast and mostly unexplored physics programme. However, on the experimental side, such great physics potential is accompanied by unprecedented technological challenges, due to the fact that muons are unstable particles. Their decay products interact with the machine elements and produce an intense flux of background particles that eventually reach the detector and may degrade its performance. In this paper, we present technologies that have a potential to match the challenging specifications of a muon collider detector and outline a path forward for the future R&D efforts.

preprint2022arXiv

RGB-Depth Fusion GAN for Indoor Depth Completion

The raw depth image captured by the indoor depth sensor usually has an extensive range of missing depth values due to inherent limitations such as the inability to perceive transparent objects and limited distance range. The incomplete depth map burdens many downstream vision tasks, and a rising number of depth completion methods have been proposed to alleviate this issue. While most existing methods can generate accurate dense depth maps from sparse and uniformly sampled depth maps, they are not suitable for complementing the large contiguous regions of missing depth values, which is common and critical. In this paper, we design a novel two-branch end-to-end fusion network, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map. The first branch employs an encoder-decoder structure to regress the local dense depth values from the raw depth map, with the help of local guidance information extracted from the RGB image. In the other branch, we propose an RGB-depth fusion GAN to transfer the RGB image to the fine-grained textured depth map. We adopt adaptive fusion modules named W-AdaIN to propagate the features across the two branches, and we append a confidence fusion head to fuse the two outputs of the branches for the final depth map. Extensive experiments on NYU-Depth V2 and SUN RGB-D demonstrate that our proposed method clearly improves the depth completion performance, especially in a more realistic setting of indoor environments with the help of the pseudo depth map.

preprint2022arXiv

Robotic Grasping from Classical to Modern: A Survey

Robotic Grasping has always been an active topic in robotics since grasping is one of the fundamental but most challenging skills of robots. It demands the coordination of robotic perception, planning, and control for robustness and intelligence. However, current solutions are still far behind humans, especially when confronting unstructured scenarios. In this paper, we survey the advances of robotic grasping, starting from the classical formulations and solutions to the modern ones. By reviewing the history of robotic grasping, we want to provide a complete view of this community, and perhaps inspire the combination and fusion of different ideas, which we think would be helpful to touch and explore the essence of robotic grasping problems. In detail, we firstly give an overview of the analytic methods for robotic grasping. After that, we provide a discussion on the recent state-of-the-art data-driven grasping approaches rising in recent years. With the development of computer vision, semantic grasping is being widely investigated and can be the basis of intelligent manipulation and skill learning for autonomous robotic systems in the future. Therefore, in our survey, we also briefly review the recent progress in this topic. Finally, we discuss the open problems and the future research directions that may be important for the human-level robustness, autonomy, and intelligence of robots.

preprint2022arXiv

Simulated Detector Performance at the Muon Collider

In this paper we report on the current status of studies on the expected performance for a detector designed to operate in a muon collider environment. Beam-induced backgrounds (BIB) represent the main challenge in the design of the detector and the event reconstruction algorithms. The current detector design aims to show that satisfactory performance can be achieved, while further optimizations are expected to significantly improve the overall performance. We present the characterization of the expected beam-induced background, describe the detector design and software used for detailed event simulations taking into account BIB effects. The expected performance of charged-particle reconstruction, jets, electrons, photons and muons is discussed, including an initial study on heavy-flavor jet tagging. A simple method to measure the delivered luminosity is also described. Overall, the proposed design and reconstruction algorithms can successfully reconstruct the high transverse-momentum objects needed to carry out a broad physics program.

preprint2022arXiv

Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering

Recent works on knowledge base question answering (KBQA) retrieve subgraphs for easier reasoning. A desired subgraph is crucial as a small one may exclude the answer but a large one might introduce more noise. However, the existing retrieval is either heuristic or interwoven with the reasoning, causing reasoning on the partial subgraphs, which increases the reasoning bias when the intermediate supervision is missing. This paper proposes a trainable subgraph retriever (SR) decoupled from the subsequent reasoning process, which enables a plug-and-play framework to enhance any subgraph-oriented KBQA model. Extensive experiments demonstrate SR achieves significantly better retrieval and QA performance than existing retrieval methods. Via weakly supervised pre-training as well as the end-to-end fine-tuning, SR achieves new state-of-the-art performance when combined with NSM, a subgraph-oriented reasoner, for embedding-based KBQA methods.

preprint2022arXiv

Temperature-linear Resistivity in Twisted Double Bilayer Graphene

We report an experimental study of carrier density (n), displacement field (D) and twist angle (θ) dependence of temperature (T)-linear resistivity in twisted double bilayer graphene (TDBG). For a large twist angle (θ>1.5°) where correlated insulating states are absent, we observe a T-linear resistivity (with a slope of the order of ~10Ω/K) over a wide range of carrier density, and its slope decreases with increasing n, in semi-quantitative agreement with the acoustic phonon scattering model. The slope of T-linear resistivity is non-monotonically dependent on the displacement field with a single peak structure. For a device with θ~1.23° at which correlated states emerge, the slope of T-linear resistivity is found to be maximum (~100Ω/K) at the boundary of the halo structure where phase transition occurs, with signatures of continuous phase transition, Planckian dissipation, and the diverging effective mass; these observations are in line with quantum critical behaviors, which might be due to the symmetry-breaking instability at the critical points. Our results shed new light on correlated physics in TDBG and other twisted moiré systems.

preprint2022arXiv

The physics case of a 3 TeV muon collider stage

In the path towards a muon collider with center of mass energy of 10 TeV or more, a stage at 3 TeV emerges as an appealing option. Reviewing the physics potential of such a muon collider is the main purpose of this document. In order to outline the progression of the physics performances across the stages, a few sensitivity projections for higher energy are also presented. There are many opportunities for probing new physics at a 3 TeV muon collider. Some of them are in common with the extensively documented physics case of the CLIC 3 TeV energy stage, and include measuring the Higgs trilinear coupling and testing the possible composite nature of the Higgs boson and of the top quark at the 20 TeV scale. Other opportunities are unique to a 3 TeV muon collider, and stem from the fact that muons are collided rather than electrons. This is exemplified by studying the potential to explore the microscopic origin of the current $g$-2 and $B$-physics anomalies, which are both related to muons.

preprint2022arXiv

TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery

Machine learning has huge potential to revolutionize the field of drug discovery and has attracted increasing attention in recent years. However, the lack of domain knowledge (e.g., which tasks to work on), standard benchmarks and data preprocessing pipelines is the main obstacle for machine learning researchers working in this domain. To facilitate the progress of machine learning for drug discovery, we develop TorchDrug, a powerful and flexible machine learning platform for drug discovery built on top of PyTorch. TorchDrug benchmarks a variety of important tasks in drug discovery, including molecular property prediction, pretrained molecular representations, de novo molecular design and optimization, retrosynthesis prediction, and biomedical knowledge graph reasoning. State-of-the-art techniques based on geometric deep learning (or graph machine learning), deep generative models, reinforcement learning and knowledge graph reasoning are implemented for these tasks. TorchDrug features a hierarchical interface that facilitates customization from both novices and experts in this domain. Tutorials, benchmark results and documentation are available at https://torchdrug.ai. Code is released under Apache License 2.0.

preprint2022arXiv

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

Recently, generating natural language explanations has shown very promising results in not only offering interpretable explanations but also providing additional information and supervision for prediction. However, existing approaches usually require a large set of human annotated explanations for training, while collecting a large set of explanations is not only time consuming but also expensive. In this paper, we develop a general framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training. Our framework treats natural language explanations as latent variables that model the underlying reasoning process of a neural model. We develop a variational EM framework for optimization where an explanation generation module and an explanation-augmented prediction module are alternately optimized and mutually enhance each other. Moreover, we further propose an explanation-based self-training method under this framework for semi-supervised learning. It alternates between assigning pseudo-labels to unlabeled data and generating new explanations to iteratively improve each other. Experiments on two natural language understanding tasks demonstrate that our framework can not only make effective predictions in both supervised and semi-supervised settings, but also generate good natural language explanations.

preprint2022arXiv

Tyger: Task-Type-Generic Active Learning for Molecular Property Prediction

How to accurately predict the properties of molecules is an essential problem in AI-driven drug discovery, which generally requires a large amount of annotation for training deep learning models. Annotating molecules, however, is quite costly because it requires lab experiments conducted by experts. To reduce annotation cost, deep Active Learning (AL) methods are developed to select only the most representative and informative data for annotating. However, existing best deep AL methods are mostly developed for a single type of learning task (e.g., single-label classification), and hence may not perform well in molecular property prediction that involves various task types. In this paper, we propose a Task-type-generic active learning framework (termed Tyger) that is able to handle different types of learning tasks in a unified manner. The key is to learn a chemically-meaningful embedding space and perform active selection fully based on the embeddings, instead of relying on task-type-specific heuristics (e.g., class-wise prediction probability) as done in existing works. Specifically, for learning the embedding space, we instantiate a querying module that learns to translate molecule graphs into corresponding SMILES strings. Furthermore, to ensure that samples selected from the space are both representative and informative, we propose to shape the embedding space by two learning objectives, one based on domain knowledge and the other leveraging feedback from the task learner (i.e., model that performs the learning task at hand). We conduct extensive experiments on benchmark datasets of different task types. Experimental results show that Tyger consistently achieves high AL performance on molecular property prediction, outperforming baselines by a large margin. We also perform ablative experiments to verify the effectiveness of each component in Tyger.

preprint2022arXiv

Weakly-supervised Temporal Path Representation Learning with Contrastive Curriculum Learning -- Extended Version

In step with the digitalization of transportation, we are witnessing a growing range of path-based smart-city applications, e.g., travel-time estimation and travel path ranking. A temporal path (TP) that includes temporal information, e.g., departure time, into the path is fundamental to enable such applications. In this setting, it is essential to learn generic temporal path representations (TPRs) that consider spatial and temporal correlations simultaneously and that can be used in different applications, i.e., downstream tasks. Existing methods fail to achieve the goal since (i) supervised methods require large amounts of task-specific labels when training and thus fail to generalize the obtained TPRs to other tasks; (ii) though unsupervised methods can learn generic representations, they disregard the temporal aspect, leading to sub-optimal results. To contend with the limitations of existing solutions, we propose a Weakly-Supervised Contrastive (WSC) learning model. We first propose a temporal path encoder that encodes both the spatial and temporal information of a temporal path into a TPR. To train the encoder, we introduce weak labels that are easy and inexpensive to obtain and are relevant to different tasks, e.g., temporal labels indicating peak vs. off-peak hours from departure times. Based on the weak labels, we construct meaningful positive and negative temporal path samples by considering both spatial and temporal information, which facilitates training the encoder using contrastive learning by pulling closer the positive samples' representations while pushing away the negative samples' representations. To better guide contrastive learning, we propose a learning strategy based on Curriculum Learning such that the learning proceeds from easy to hard training instances. Experimental studies verify the effectiveness of the proposed method.

preprint2021arXiv

Exploring SMEFT Induced Non-Standard Interactions from COHERENT to Neutrino Oscillations

We investigate the prospects of next-generation neutrino oscillation experiments DUNE, T2HK and JUNO including TAO within Standard Model Effective Field Theory (SMEFT). We also re-interpret COHERENT data in this framework. Considering both charged and neutral current neutrino Non-Standard Interactions (NSIs), we analyse dimension-6 SMEFT operators and derive lower bounds to UV scale $Λ$. The most powerful probe is obtained on ${\cal O}_{{ledq}_{1211}}$ with $Λ\gtrsim$ 450 TeV due to the electron neutrino sample in T2HK near detector. We find DUNE and JUNO to be complementary to T2HK in exploring different subsets of SMEFT operators at about 25 TeV. We conclude that near detectors play a significant role in each experiment. We also find COHERENT with CsI and LAr targets to be sensitive to new physics up to $\sim$900 GeV.

preprint2021arXiv

Isospin competitions and valley polarized correlated insulators in twisted double bilayer graphene

New phases of matter usually emerge when a given symmetry breaks spontaneously, which can involve charge, spin, and valley degrees of freedom. Here, we report an observation of new correlated insulators evolved from spin polarized states to valley polarized states in AB-BA stacked twisted double bilayer graphene (TDBG). The transition of the isospin polarization is a result of the competition between spin and valley, driven by the displacement field (D). At a high field |D| > 0.7 V/nm, we observe valley polarized correlated insulators with a big Zeeman g factor of ~10, both at v = 2 in the moiré conduction band and more surprisingly at v = -2 in the moiré valence band. At a medium field |D| < 0.6 V/nm, by contrast, it is a conventional spin polarized correlated insulator at v = 2 and a featureless metal at v = -2. Moreover, we observe a valley polarized Chern insulator with C = 2 emanating at v = 2 in the electron side and a valley polarized Fermi surface around v = -2 in the hole side. The valley Chern insulator with C = 2 is evident from a well quantized Hall conductance plateau at 2e^2/h and correspondingly a vanishing longitudinal component. The valley polarized Fermi surface is topologically trivial with C = 0, and it shows a series of quantized Landau levels with v_LL = 0, 1, 2, 3, 4 and others. These observations are in good agreement with our band and topology calculations. Our results demonstrate a feasible way to realize isospin control and to obtain new phases of matter in TDBG by the displacement field, and might benefit other twisted or non-twisted multilayer systems.

preprint2021arXiv

JUNO Physics and Detector

The Jiangmen Underground Neutrino Observatory (JUNO) is a 20 kton LS detector at 700-m underground. An excellent energy resolution and a large fiducial volume offer exciting opportunities for addressing many important topics in neutrino and astro-particle physics. With 6 years of data, the neutrino mass ordering can be determined at 3-4 sigma and three oscillation parameters can be measured to a precision of 0.6% or better by detecting reactor antineutrinos. With 10 years of data, DSNB could be observed at 3-sigma; a lower limit of the proton lifetime of 8.34e33 years (90% C.L.) can be set by searching for p->nu_bar K^+; detection of solar neutrinos would shed new light on the solar metallicity problem and examine the vacuum-matter transition region. A core-collapse supernova at 10 kpc would lead to ~5000 IBD and ~2000 (300) all-flavor neutrino-proton (electron) scattering events. Geo-neutrinos can be detected with a rate of ~400 events/year. We also summarize the final design of the JUNO detector and the key R&D achievements. All 20-inch PMTs have been tested. The average photon detection efficiency is 28.9% for the 15,000 MCP PMTs and 28.1% for the 5,000 dynode PMTs, higher than the JUNO requirement of 27%. Together with the >20 m attenuation length of LS, we expect a yield of 1345 p.e. per MeV and an effective energy resolution of $3.02\%/\sqrt{E (\mathrm{MeV})}$ in simulations. The underwater electronics is designed to have a loss rate <0.5% in 6 years. With degassing membranes and a micro-bubble system, the radon concentration in the 35-kton water pool could be lowered to <10 mBq/m^3. Acrylic panels of radiopurity <0.5 ppt U/Th are produced. The 20-kton LS will be purified onsite. Singles in the fiducial volume can be controlled to ~10 Hz. The JUNO experiment also features a double calorimeter system with 25,600 3-inch PMTs, a LS testing facility OSIRIS, and a near detector TAO.
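As a small worked example of the quoted energy resolution, the sketch below evaluates 3.02%/sqrt(E) at a given energy in MeV. The second function is an assumption added for illustration only: it estimates the resolution from photoelectron counting statistics alone, using the quoted yield of 1345 p.e. per MeV, and is not the collaboration's resolution model. Function names are hypothetical.

```python
import math

def energy_resolution(e_mev, a=0.0302):
    """Relative resolution sigma/E = a / sqrt(E), with E in MeV (a = 3.02%)."""
    return a / math.sqrt(e_mev)

def photostatistics_only(e_mev, pe_per_mev=1345.0):
    """Illustrative estimate from Poisson counting alone: 1 / sqrt(N_pe)."""
    return 1.0 / math.sqrt(pe_per_mev * e_mev)

for energy in (1.0, 4.0):
    print(f"E = {energy} MeV: "
          f"quoted {100 * energy_resolution(energy):.2f}%, "
          f"counting-only {100 * photostatistics_only(energy):.2f}%")
```

At 1 MeV the counting-only estimate is about 2.73%, below the quoted effective 3.02%, which is consistent with the effective figure also absorbing other contributions.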

preprint2021arXiv

Non-autoregressive electron flow generation for reaction prediction

Reaction prediction is a fundamental problem in computational chemistry. Existing approaches typically generate a chemical reaction by sampling tokens or graph edits sequentially, conditioning on previously generated outputs. These autoregressive generating methods impose an arbitrary ordering of outputs and prevent parallel decoding during inference. We devise a novel decoder that avoids such sequential generating and predicts the reaction in a Non-Autoregressive manner. Inspired by physical-chemistry insights, we represent edge edits in a molecule graph as electron flows, which can then be predicted in parallel. To capture the uncertainty of reactions, we introduce latent variables to generate multi-modal outputs. Following previous works, we evaluate our model on the USPTO MIT dataset. Our model achieves an order of magnitude lower inference latency, with state-of-the-art top-1 accuracy and comparable performance on Top-K sampling.

preprint2021arXiv

Towards Generalized Implementation of Wasserstein Distance in GANs

Wasserstein GANs (WGANs), built upon the Kantorovich-Rubinstein (KR) duality of Wasserstein distance, are among the most theoretically sound GAN models. However, in practice they do not always outperform other variants of GANs. This is mostly due to the imperfect implementation of the Lipschitz condition required by the KR duality. Extensive work has been done in the community with different implementations of the Lipschitz constraint, which, however, remains hard to satisfy perfectly in practice. In this paper, we argue that the strong Lipschitz constraint might be unnecessary for optimization. Instead, we take a step back and try to relax the Lipschitz constraint. Theoretically, we first demonstrate a more general dual form of the Wasserstein distance called the Sobolev duality, which relaxes the Lipschitz constraint but still maintains the favorable gradient property of the Wasserstein distance. Moreover, we show that the KR duality is actually a special case of the Sobolev duality. Based on the relaxed duality, we further propose a generalized WGAN training scheme named Sobolev Wasserstein GAN (SWGAN), and empirically demonstrate the improvement of SWGAN over existing methods with extensive experiments.

preprint2021arXiv

Utilising Graph Machine Learning within Drug Discovery and Development

Graph Machine Learning (GML) is receiving growing interest within the pharmaceutical and biotechnology industries for its ability to model biomolecular structures, the functional relationships between them, and integrate multi-omic datasets - amongst other data types. Herein, we present a multidisciplinary academic-industrial review of the topic within the context of drug discovery and development. After introducing key terms and modelling approaches, we move chronologically through the drug development pipeline to identify and summarise work incorporating: target identification, design of small molecules and biologics, and drug repurposing. Whilst the field is still emerging, key milestones including repurposed drugs entering in vivo studies, suggest graph machine learning will become a modelling framework of choice within biomedical machine learning.

preprint2020arXiv

Adversarial Meta-Learning

Meta-learning enables a model to learn from very limited data to undertake a new task. In this paper, we study the general meta-learning with adversarial samples. We present a meta-learning algorithm, ADML (ADversarial Meta-Learner), which leverages clean and adversarial samples to optimize the initialization of a learning model in an adversarial manner. ADML leads to the following desirable properties: 1) it turns out to be very effective even in the cases with only clean samples; 2) it is robust to adversarial samples, i.e., unlike other meta-learning algorithms, it only leads to a minor performance degradation when there are adversarial samples; 3) it sheds light on tackling the cases with limited and even contaminated samples. It has been shown by extensive experimental results that ADML consistently outperforms three representative meta-learning algorithms in the cases involving adversarial samples, on two widely-used image datasets, MiniImageNet and CIFAR100, in terms of both accuracy and robustness.

preprint2020arXiv

An Advert Creation System for 3D Product Placements

Over the past decade, the evolution of video-sharing platforms has attracted a significant amount of investment in contextual advertising. The common contextual advertising platforms utilize the information provided by users to integrate 2D visual ads into videos. The existing platforms face many technical challenges such as ad integration with respect to occluding objects and 3D ad placement. This paper presents a Video Advertisement Placement & Integration (Adverts) framework, which is capable of perceiving the 3D geometry of the scene and camera motion to blend 3D virtual objects in videos and create the illusion of reality. The proposed framework contains several modules such as monocular depth estimation, object segmentation, background-foreground separation, alpha matting and camera tracking. Our experiments conducted using the Adverts framework indicate the significant potential of this system in contextual ad integration and in pushing the limits of the advertising industry using mixed reality technologies.

preprint2020arXiv

An End-to-End Neighborhood-based Interaction Model for Knowledge-enhanced Recommendation

This paper studies graph-based recommendation, where an interaction graph is constructed from historical records and is leveraged to alleviate data sparsity and cold start problems. We reveal an early summarization problem in existing graph-based models, and propose the Neighborhood Interaction (NI) model to capture each neighbor pair (between user-side and item-side) distinctively. The NI model is more expressive and can capture more complicated structural patterns behind user-item interactions. To further enrich node connectivity and utilize high-order structural information, we incorporate extra knowledge graphs (KGs) and adopt graph neural networks (GNNs) in NI, called Knowledge-enhanced Neighborhood Interaction (KNI). Compared with the state-of-the-art recommendation methods, e.g., feature-based, meta path-based, and KG-based models, our KNI achieves superior performance in click-through rate prediction (1.1%-8.4% absolute AUC improvements) and outperforms by a wide margin in top-N recommendation on 4 real-world datasets.

preprint2020arXiv

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices

Weight pruning has been widely acknowledged as a straightforward and effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby achieving acceleration on various platforms. However, most of the pruning techniques are essentially trade-offs between model accuracy and regularity, which lead to impaired inference accuracy and limited on-device acceleration performance. To solve the problem, we introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware friendly. With carefully designed patterns, the proposed pruning unprecedentedly and consistently achieves accuracy enhancement and better feature extraction ability on different DNN structures and datasets, and our pattern-aware pruning framework also achieves pattern library extraction, pattern selection, pattern and connectivity pruning and weight training simultaneously. Our approach on the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms. To the best of our knowledge, it is the first time that mobile devices achieve real-time inference for the large-scale DNN models thanks to the unique spatial property of pattern-based sparsity and the help of the code generation capability of compilers.

preprint2020arXiv

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method

Accelerating DNN execution on various resource-limited computing platforms has been a long-standing problem. Prior works utilize l1-based group lasso or dynamic regularization such as ADMM to perform structured pruning on DNN models to leverage the parallel computing architectures. However, both of the pruning dimensions and pruning methods lack universality, which leads to degraded performance and limited applicability. To solve the problem, we propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method. Our framework is universal, which can be applied to both CNNs and RNNs, implying complete support for the two major kinds of computation-intensive layers (i.e., CONV and FC layers). To complete all aspects of the pruning-for-acceleration task, we also integrate compiler-based code optimization into our framework that can perform DNN inference in a real-time manner. To the best of our knowledge, it is the first time that the weight pruning framework achieves universal coverage for both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise.

preprint2020arXiv

Continuous Graph Neural Networks

This paper builds on the connection between graph neural networks and traditional dynamical systems. We propose continuous graph neural networks (CGNN), which generalise existing graph neural networks with discrete dynamics in that they can be viewed as a specific discretisation scheme. The key idea is how to characterise the continuous dynamics of node representations, i.e. the derivatives of node representations, w.r.t. time. Inspired by existing diffusion-based methods on graphs (e.g. PageRank and epidemic models on social networks), we define the derivatives as a combination of the current node representations, the representations of neighbors, and the initial values of the nodes. We propose and analyse two possible dynamics on graphs, in which each dimension of the node representations (a.k.a. the feature channel) either changes independently or interacts with the others, both with theoretical justification. The proposed continuous graph neural networks are robust to over-smoothing and hence allow us to build deeper networks, which in turn are able to capture the long-range dependencies between nodes. Experimental results on the task of node classification demonstrate the effectiveness of our proposed approach over competitive baselines.
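
The description of the derivative can be made concrete with a small sketch. It assumes one simple form consistent with the abstract, dH/dt = A_hat H - H + E, where A_hat is a scaled, row-normalised adjacency matrix, H holds the current node representations and E the initial ones, and it integrates this with explicit Euler steps. The normalisation, the `alpha` scale, the step size and the function names are illustrative assumptions, not the paper's exact formulation.

```python
def scaled_adjacency(adj, alpha):
    """Row-normalise the adjacency matrix and scale it by alpha in (0, 1)."""
    scaled = []
    for row in adj:
        degree = sum(row) or 1  # avoid division by zero for isolated nodes
        scaled.append([alpha * v / degree for v in row])
    return scaled


def integrate(adj, feats, alpha=0.5, dt=0.1, steps=500):
    """Explicit Euler integration of dH/dt = A_hat H - H + E (channels independent)."""
    a_hat = scaled_adjacency(adj, alpha)
    n, dim = len(feats), len(feats[0])
    h = [row[:] for row in feats]
    for _ in range(steps):
        nxt = []
        for i in range(n):
            row = []
            for d in range(dim):
                neighbours = sum(a_hat[i][j] * h[j][d] for j in range(n))
                # neighbours pull in, current state decays, initial value re-injects
                derivative = neighbours - h[i][d] + feats[i][d]
                row.append(h[i][d] + dt * derivative)
            nxt.append(row)
        h = nxt
    return h


# Path graph 0 - 1 - 2 with a single feature channel, signal only on node 0.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0], [0.0], [0.0]]
h_final = integrate(adj, feats)
```

Because the initial values keep being re-injected, the state settles at a fixed point that still depends on E instead of collapsing to a constant vector, which is the intuition behind the robustness to over-smoothing claimed above.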

preprint2020arXiv

Correlated states in twisted double bilayer graphene

Electron-electron interactions play an important role in graphene and related systems and can induce exotic quantum states, especially in a stacked bilayer with a small twist angle. For bilayer graphene where the two layers are twisted by a "magic angle", flat band and strong many-body effects lead to correlated insulating states and superconductivity. In contrast to monolayer graphene, the band structure of untwisted bilayer graphene can be further tuned by a displacement field, providing an extra degree of freedom to control the flat band that should appear when two bilayers are stacked on top of each other. Here, we report the discovery and characterization of such displacement-field tunable electronic phases in twisted double bilayer graphene. We observe insulating states at a half-filled conduction band in an intermediate range of displacement fields. Furthermore, the resistance gap in the correlated insulator increases with respect to the in-plane magnetic fields and we find that the g factor according to spin Zeeman effect is ~2, indicating spin polarization at half filling. These results establish the twisted double bilayer graphene as an easily tunable platform for exploring quantum many-body states.

preprint2020arXiv

COVI White Paper

The SARS-CoV-2 (Covid-19) pandemic has caused significant strain on public health institutions around the world. Contact tracing is an essential tool to change the course of the Covid-19 pandemic. Manual contact tracing of Covid-19 cases has significant challenges that limit the ability of public health authorities to minimize community infections. Personalized peer-to-peer contact tracing through the use of mobile apps has the potential to shift the paradigm. Some countries have deployed centralized tracking systems, but more privacy-protecting decentralized systems offer much of the same benefit without concentrating data in the hands of a state authority or for-profit corporations. Machine learning methods can circumvent some of the limitations of standard digital tracing by incorporating many clues and their uncertainty into a more graded and precise estimation of infection risk. The estimated risk can provide early risk awareness, personalized recommendations and relevant information to the user. Finally, non-identifying risk data can inform epidemiological models trained jointly with the machine learning predictor. These models can provide statistical evidence for the importance of factors involved in disease transmission. They can also be used to monitor, evaluate and optimize health policy and (de)confinement scenarios according to medical and economic productivity indicators. However, such a strategy based on mobile apps and machine learning should proactively mitigate potential ethical and privacy risks, which could have substantial impacts on society (not only impacts on health but also impacts such as stigmatization and abuse of personal data). Here, we present an overview of the rationale, design, ethical considerations and privacy strategy of `COVI,' a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada.

preprint2020arXiv

Differentiable Feature Aggregation Search for Knowledge Distillation

Knowledge distillation has become increasingly important in model compression. It boosts the performance of a miniaturized student network with the supervision of the output distribution and feature maps from a sophisticated teacher network. Some recent works introduce multi-teacher distillation to provide more supervision to the student network. However, the effectiveness of multi-teacher distillation methods is accompanied by costly computation resources. To tackle both the efficiency and the effectiveness of knowledge distillation, we introduce the feature aggregation to imitate the multi-teacher distillation in the single-teacher distillation framework by extracting informative supervision from multiple teacher feature maps. Specifically, we introduce DFA, a two-stage Differentiable Feature Aggregation search method motivated by DARTS in neural architecture search, to efficiently find the aggregations. In the first stage, DFA formulates the searching problem as a bi-level optimization and leverages a novel bridge loss, which consists of a student-to-teacher path and a teacher-to-student path, to find appropriate feature aggregations. The two paths act as two players against each other, trying to optimize the unified architecture parameters in opposite directions while guaranteeing both expressivity and learnability of the feature aggregation simultaneously. In the second stage, DFA performs knowledge distillation with the derived feature aggregation. Experimental results show that DFA outperforms existing methods on CIFAR-100 and CINIC-10 datasets under various teacher-student settings, verifying the effectiveness and robustness of the design.

preprint2020arXiv

Domain Conditioned Adaptation Network

Tremendous research efforts have been made to advance deep domain adaptation (DA) by seeking domain-invariant features. Most existing deep DA models only focus on aligning feature representations of task-specific layers across domains while integrating a totally shared convolutional architecture for source and target. However, we argue that such strongly-shared convolutional layers might be harmful for domain-specific feature learning when source and target data distributions differ to a large extent. In this paper, we relax a shared-convnets assumption made by previous DA methods and propose a Domain Conditioned Adaptation Network (DCAN), which aims to excite distinct convolutional channels with a domain conditioned channel attention mechanism. As a result, the critical low-level domain-dependent knowledge could be explored appropriately. As far as we know, this is the first work to explore the domain-wise convolutional channel activation for deep DA networks. Moreover, to effectively align high-level feature distributions across two domains, we further deploy domain conditioned feature correction blocks after task-specific layers, which will explicitly correct the domain discrepancy. Extensive experiments on three cross-domain benchmarks demonstrate the proposed approach outperforms existing methods by a large margin, especially on very tough cross-domain learning tasks.

preprint2020arXiv

Feasibility and physics potential of detecting $^8$B solar neutrinos at JUNO

The Jiangmen Underground Neutrino Observatory~(JUNO) features a 20~kt multi-purpose underground liquid scintillator sphere as its main detector. Some of JUNO's features make it an excellent experiment for $^8$B solar neutrino measurements, such as its low-energy threshold, its high energy resolution compared to water Cherenkov detectors, and its much larger target mass compared to previous liquid scintillator detectors. In this paper we present a comprehensive assessment of JUNO's potential for detecting $^8$B solar neutrinos via the neutrino-electron elastic scattering process. A reduced 2~MeV threshold on the recoil electron energy is found to be achievable assuming the intrinsic radioactive background $^{238}$U and $^{232}$Th in the liquid scintillator can be controlled to 10$^{-17}$~g/g. With ten years of data taking, about 60,000 signal and 30,000 background events are expected. This large sample will enable an examination of the distortion of the recoil electron spectrum that is dominated by the neutrino flavor transformation in the dense solar matter, which will shed new light on the tension between the measured electron spectra and the predictions of the standard three-flavor neutrino oscillation framework. If $Δm^{2}_{21}=4.8\times10^{-5}~(7.5\times10^{-5})$~eV$^{2}$, JUNO can provide evidence of neutrino oscillation in the Earth at about the 3$σ$~(2$σ$) level by measuring the non-zero signal rate variation with respect to the solar zenith angle. Moreover, JUNO can simultaneously measure $Δm^2_{21}$ using $^8$B solar neutrinos to a precision of 20\% or better depending on the central value and to sub-percent precision using reactor antineutrinos. A comparison of these two measurements from the same detector will help elucidate the current tension between the value of $Δm^2_{21}$ reported by solar neutrino experiments and the KamLAND experiment.

preprint2020arXiv

Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs

This paper studies few-shot relation extraction, which aims at predicting the relation for a pair of entities in a sentence by training with a few labeled examples in each relation. To more effectively generalize to new relations, in this paper we study the relationships between different relations and propose to leverage a global relation graph. We propose a novel Bayesian meta-learning approach to effectively learn the posterior distribution of the prototype vectors of relations, where the initial prior of the prototype vectors is parameterized with a graph neural network on the global relation graph. Moreover, to effectively optimize the posterior distribution of the prototype vectors, we propose to use the stochastic gradient Langevin dynamics, which is related to the MAML algorithm but is able to handle the uncertainty of the prototype vectors. The whole framework can be effectively and efficiently optimized in an end-to-end fashion. Experiments on two benchmark datasets prove the effectiveness of our proposed approach against competitive baselines in both the few-shot and zero-shot settings.
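
For readers unfamiliar with stochastic gradient Langevin dynamics, the sketch below shows the basic update on a toy problem: each step moves along half the step size times the gradient of the log posterior and adds Gaussian noise whose variance equals the step size. It is only an assumed one-dimensional stand-in; the target distribution, the fixed step size and the function names are illustrative, and the exact gradient is used here where the method named in the abstract would use a stochastic estimate over prototype vectors.

```python
import math
import random


def langevin_samples(grad_log_post, theta0, step, n_steps, rng):
    """Draw a chain of samples with the Langevin update
    theta <- theta + (step / 2) * grad_log_post(theta) + N(0, step)."""
    theta = theta0
    chain = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, math.sqrt(step))
        theta = theta + 0.5 * step * grad_log_post(theta) + noise
        chain.append(theta)
    return chain


# Toy posterior: a unit-variance Gaussian centred at 2.0, so the
# gradient of the log density is simply -(theta - 2.0).
rng = random.Random(0)
target_mean = 2.0
chain = langevin_samples(lambda t: -(t - target_mean), 0.0, 0.05, 50000, rng)
kept = chain[1000:]  # discard burn-in
sample_mean = sum(kept) / len(kept)
sample_var = sum((x - sample_mean) ** 2 for x in kept) / len(kept)
```

Unlike a plain gradient step, the injected noise keeps the chain exploring the posterior, so the spread of the samples reflects the uncertainty of the estimated quantity.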

preprint2020arXiv

Flavour Symmetry Embedded -- GLoBES (FaSE-GLoBES)

Neutrino models based on flavour symmetries provide a natural way to explain the origin of tiny neutrino masses. At the dawn of precision measurements of neutrino mixing parameters, neutrino mass models can be constrained and examined by on-going and up-coming neutrino experiments. We present a supplemental tool, Flavour Symmetry Embedded (FaSE), for the General Long Baseline Experiment Simulator (GLoBES); it is available via the link https://github.com/tcwphy/FASE_GLoBES. It can translate the neutrino mass model parameters to standard neutrino oscillation parameters and offer prior functions in a user-friendly way. We demonstrate the robustness of FaSE-GLoBES with four examples on how the model parameters can be constrained and even whether the model is excluded by an experiment or not. We wish that this toolkit will facilitate the study of new neutrino mass models in an efficient and effective manner.

preprint2020arXiv

Global oscillation data analysis on the $3ν$ mixing without unitarity

We present results of a combined analysis in neutrino oscillations without unitarity assumption in the $3ν$ mixing picture. Constraints on neutrino mixing matrix elements are based on recent data from the reactor, solar and long-baseline accelerator neutrino oscillation experiments. The current data are consistent with the standard $3ν$ scheme. The precision on different matrix elements can be as good as a few percent at $3σ$ CL, and is mainly limited by the experimental statistical uncertainty. The $ν_e$ related elements are the most precisely measured among all sectors with the uncertainties $<20\%$. The measured leptonic CP violation is very close to the one assuming the standard $3ν$ mixing. The deviations on normalization and the unitarity triangle closure are confined within $\mathcal{O}(10^{-3})$, $\mathcal{O}(10^{-2})$ and $\mathcal{O}(10^{-1})$, for $ν_e$, $ν_μ$ and $ν_τ$ sectors, respectively. We look forward to the next-generation neutrino oscillation experiments such as DUNE, T2HK, and JUNO, especially the precise measurements on $ν_τ$ oscillations, to significantly improve the precision of unitarity test on the $3ν$ mixing matrix.
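
The two quantities tested here, row normalization and unitarity triangle closure, can be sketched numerically. The example builds a mixing matrix in the standard three-angle, one-phase parametrization, then computes the normalization deviation of a row (sum of squared moduli minus one) and the closure of a triangle formed by two rows. The angle values and the 5% rescaling of the $ν_τ$ row are purely illustrative assumptions and are not fit results from the paper.

```python
import cmath
import math


def mixing_matrix(theta12, theta13, theta23, delta):
    """3x3 lepton mixing matrix in the standard parametrization (angles in radians)."""
    s12, c12 = math.sin(theta12), math.cos(theta12)
    s13, c13 = math.sin(theta13), math.cos(theta13)
    s23, c23 = math.sin(theta23), math.cos(theta23)
    phase = cmath.exp(1j * delta)
    return [
        [complex(c12 * c13), complex(s12 * c13), s13 / phase],
        [-s12 * c23 - c12 * s23 * s13 * phase,
         c12 * c23 - s12 * s23 * s13 * phase,
         complex(s23 * c13)],
        [s12 * s23 - c12 * c23 * s13 * phase,
         -c12 * s23 - s12 * c23 * s13 * phase,
         complex(c23 * c13)],
    ]


def normalization_deviation(u, row):
    """Sum over mass states of |U_ai|^2, minus 1; zero for a unitary matrix."""
    return sum(abs(x) ** 2 for x in u[row]) - 1.0


def triangle_closure(u, row_a, row_b):
    """Sum over mass states of U_ai * conj(U_bi); zero for a unitary matrix."""
    return sum(a * b.conjugate() for a, b in zip(u[row_a], u[row_b]))


# Illustrative angles only (degrees converted to radians).
u = mixing_matrix(*(math.radians(x) for x in (33.4, 8.6, 49.0, 195.0)))

# Hypothetical non-unitary case: rescale the tau row by 0.95.
u_broken = [row[:] for row in u]
u_broken[2] = [0.95 * x for x in u_broken[2]]

tau_deviation = normalization_deviation(u_broken, 2)
e_mu_closure = triangle_closure(u, 0, 1)
```

In an analysis without the unitarity assumption, the matrix elements are fitted independently and these two quantities are then evaluated from the fitted values.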

preprint2020arXiv

GMNN: Graph Markov Neural Networks

This paper studies semi-supervised object classification in relational data, which is a fundamental problem in relational data modeling. The problem has been extensively studied in the literature of both statistical relational learning (e.g. relational Markov networks) and graph neural networks (e.g. graph convolutional networks). Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training. In this paper, we propose the Graph Markov Neural Network (GMNN) that combines the advantages of both worlds. A GMNN models the joint distribution of object labels with a conditional random field, which can be effectively trained with the variational EM algorithm. In the E-step, one graph neural network learns effective object representations for approximating the posterior distributions of object labels. In the M-step, another graph neural network is used to model the local label dependency. Experiments on object classification, link classification, and unsupervised node representation learning show that GMNN achieves state-of-the-art results.

preprint2020arXiv

GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation

Molecular graph generation is a fundamental problem for drug discovery and has been attracting growing attention. The problem is challenging since it requires not only generating chemically valid molecular structures but also optimizing their chemical properties in the meantime. Inspired by the recent progress in deep generative models, in this paper we propose a flow-based autoregressive model for graph generation called GraphAF. GraphAF combines the advantages of both autoregressive and flow-based approaches and enjoys: (1) high model flexibility for data density estimation; (2) efficient parallel computation for training; (3) an iterative sampling process, which allows leveraging chemical domain knowledge for valency checking. Experimental results show that GraphAF is able to generate 68% chemically valid molecules even without chemical knowledge rules and 100% valid molecules with chemical rules. The training process of GraphAF is two times faster than the existing state-of-the-art approach GCPN. After fine-tuning the model for goal-directed property optimization with reinforcement learning, GraphAF achieves state-of-the-art performance on both chemical property optimization and constrained property optimization.
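
The valency check mentioned in point (3) can be pictured as a filter applied at each generation step: a proposed bond is accepted only if neither atom would exceed its maximum valence. The sketch below is a simplified assumption of such a filter; the valence table covers a few neutral atoms only, formal charges are ignored, and the data layout and function name are hypothetical.

```python
# Maximum valence for a few neutral atoms (formal charges ignored).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}


def can_add_bond(atoms, bonds, i, j, order):
    """Return True if adding a bond of the given order between atoms i and j
    keeps both atoms within their maximum valence.

    atoms: list of element symbols; bonds: dict mapping (a, b) pairs to bond order.
    """
    if i == j:
        return False
    used = {i: 0, j: 0}
    for (a, b), existing_order in bonds.items():
        for k in (i, j):
            if k in (a, b):
                used[k] += existing_order
    return all(used[k] + order <= MAX_VALENCE[atoms[k]] for k in (i, j))


# Partial molecule C-O plus a new carbon atom that is not yet connected.
atoms = ["C", "O", "C"]
bonds = {(0, 1): 1}

single_ok = can_add_bond(atoms, bonds, 1, 2, 1)  # O would reach valence 2
double_ok = can_add_bond(atoms, bonds, 1, 2, 2)  # O would reach valence 3
```

During sampling, a rejected proposal would simply be resampled, which is how a sequential generator can guarantee valid molecules when such rules are switched on.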

preprint2020arXiv

High-order minibands and interband Landau level reconstruction in graphene moire superlattice

The propagation of Dirac fermions in graphene through a long-period periodic potential would result in a band folding together with the emergence of a series of cloned Dirac points (DPs). In highly aligned graphene/hexagonal boron nitride (G/hBN) heterostructures, the lattice mismatch between the two atomic crystals generates a unique kind of periodic structure known as a moiré superlattice. Of particular interest are the emergent phenomena related to the reconstructed band structure of graphene, such as the Hofstadter butterfly, topological currents, gate dependent pseudospin mixing, and ballistic miniband conduction. However, most studies so far have been limited to the lower-order minibands, e.g. the 1st and 2nd minibands counted from charge neutrality, and consequently the fundamental nature of the reconstructed higher-order miniband spectra still remains largely unknown. Here we report on probing the higher-order minibands of precisely aligned graphene moiré superlattices by transport spectroscopy. Using dual electrostatic gating, the edges of these high-order minibands, i.e. the 3rd and 4th minibands, can be reached. Interestingly, we have observed interband Landau level (LL) crossing inducing gap closures in a multiband magneto-transport regime, which originates from band overlap between the 2nd and 3rd minibands. The observed high-order minibands and LL reconstruction qualitatively match our simulated results. Our findings highlight the synergistic effect of minibands in transport, thus presenting a new opportunity for graphene electronic devices.

preprint2020arXiv

InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization

This paper studies learning the representations of whole graphs in both unsupervised and semi-supervised scenarios. Graph-level representations are critical in a variety of real-world applications such as predicting the properties of molecules and community analysis in social networks. Traditional graph kernel based methods are simple, yet effective for obtaining fixed-length representations for graphs, but they suffer from poor generalization due to hand-crafted designs. There are also some recent methods based on language models (e.g. graph2vec) but they tend to only consider certain substructures (e.g. subtrees) as graph representatives. Inspired by recent progress of unsupervised representation learning, in this paper we propose a novel method called InfoGraph for learning graph-level representations. We maximize the mutual information between the graph-level representation and the representations of substructures of different scales (e.g., nodes, edges, triangles). By doing so, the graph-level representations encode aspects of the data that are shared across different scales of substructures. Furthermore, we propose InfoGraph*, an extension of InfoGraph for semi-supervised scenarios. InfoGraph* maximizes the mutual information between unsupervised graph representations learned by InfoGraph and the representations learned by existing supervised methods. As a result, the supervised encoder learns from unlabeled data while preserving the latent semantic space favored by the current supervised task. Experimental results on the tasks of graph classification and molecular property prediction show that InfoGraph is superior to state-of-the-art baselines and InfoGraph* can achieve performance competitive with state-of-the-art semi-supervised models.

preprint2020arXiv

Investigating Class-level Difficulty Factors in Multi-label Classification Problems

This work investigates the use of class-level difficulty factors in multi-label classification problems for the first time. Four class-level difficulty factors are proposed: frequency, visual variation, semantic abstraction, and class co-occurrence. Once computed for a given multi-label classification dataset, these difficulty factors are shown to have several potential applications including the prediction of class-level performance across datasets and the improvement of predictive performance through difficulty weighted optimisation. Significant improvements to mAP and AUC performance are observed for two challenging multi-label datasets (WWW Crowd and Visual Genome) with the inclusion of difficulty weighted optimisation. The proposed technique does not require any additional computational complexity during training or inference and can be extended over time with inclusion of other class-level difficulty factors.

preprint2020arXiv

Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning

Over the last decade, there has been significant progress in the field of machine learning for de novo drug design, particularly in deep generative models. However, current generative approaches exhibit a significant challenge as they do not ensure that the proposed molecular structures can be feasibly synthesized nor do they provide the synthesis routes of the proposed small molecules, thereby seriously limiting their practical applicability. In this work, we propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design, Policy Gradient for Forward Synthesis (PGFS), that addresses this challenge by embedding the concept of synthetic accessibility directly into the de novo drug design system. In this setup, the agent learns to navigate through the immense synthetically accessible chemical space by subjecting commercially available small molecule building blocks to valid chemical reactions at every time step of the iterative virtual multi-step synthesis process. The proposed environment for drug discovery provides a highly challenging test-bed for RL algorithms owing to the large state space and high-dimensional continuous action space with hierarchical actions. PGFS achieves state-of-the-art performance in generating structures with high QED and penalized clogP. Moreover, we validate PGFS in an in-silico proof-of-concept associated with three HIV targets. Finally, we describe how the end-to-end training conceptualized in this study represents an important paradigm in radically expanding the synthesizable chemical space and automating the drug discovery process.

preprint2020arXiv

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices

Model compression techniques on Deep Neural Network (DNN) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method. There are currently two mainstreams of pruning methods representing two extremes of pruning regularity: non-structured, fine-grained pruning can achieve high sparsity and accuracy, but is not hardware friendly; structured, coarse-grained pruning exploits hardware-efficient structures in pruning, but suffers from accuracy drop when the pruning rate is high. In this paper, we introduce PCONV, comprising a new sparsity dimension: fine-grained pruning patterns inside the coarse-grained structures. PCONV comprises two types of sparsities: Sparse Convolution Patterns (SCP), which are generated from intra-convolution kernel pruning, and connectivity sparsity, which is generated from inter-convolution kernel pruning. Essentially, SCP enhances accuracy due to its special vision properties, and connectivity sparsity increases pruning rate while maintaining balanced workload on filter computation. To deploy PCONV, we develop a novel compiler-assisted DNN inference framework and execute PCONV models in real-time without accuracy compromise, which cannot be achieved in prior work. Our experimental results show that PCONV outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow-Lite, TVM, and Alibaba Mobile Neural Network with speedup up to 39.2x, 11.4x, and 6.3x, respectively, with no accuracy loss. Mobile devices can achieve real-time inference on large-scale DNNs.
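
The two kinds of sparsity can be illustrated with a small sketch. For intra-kernel (pattern) pruning, each 3x3 kernel keeps only the four weights selected by the pattern that preserves the most squared magnitude; for inter-kernel (connectivity) pruning, whole kernels with the least remaining energy are zeroed. The four-pattern library, the selection criterion and the function names are assumptions chosen for illustration and do not reproduce the paper's pattern set or training procedure.

```python
# Hypothetical pattern library: each pattern keeps the centre plus three neighbours.
PATTERN_LIBRARY = [
    {(0, 1), (1, 0), (1, 1), (1, 2)},
    {(0, 1), (1, 0), (1, 1), (2, 1)},
    {(0, 1), (1, 1), (1, 2), (2, 1)},
    {(1, 0), (1, 1), (1, 2), (2, 1)},
]


def energy(kernel, positions=None):
    """Sum of squared weights, optionally restricted to the given positions."""
    return sum(
        kernel[r][c] ** 2
        for r in range(3)
        for c in range(3)
        if positions is None or (r, c) in positions
    )


def apply_best_pattern(kernel):
    """Intra-kernel pruning: keep the pattern that retains the most energy."""
    best = max(PATTERN_LIBRARY, key=lambda p: energy(kernel, p))
    return [[kernel[r][c] if (r, c) in best else 0.0 for c in range(3)]
            for r in range(3)]


def connectivity_prune(kernels, keep):
    """Inter-kernel pruning: zero all but the `keep` highest-energy kernels."""
    ranked = sorted(range(len(kernels)), key=lambda i: energy(kernels[i]), reverse=True)
    kept = set(ranked[:keep])
    return [kernels[i] if i in kept else [[0.0] * 3 for _ in range(3)]
            for i in range(len(kernels))]


kernels = [
    [[0.1, 0.9, 0.1], [0.8, 1.0, 0.7], [0.1, 0.2, 0.1]],
    [[0.01, 0.01, 0.01], [0.01, 0.01, 0.01], [0.01, 0.01, 0.01]],
    [[0.0, 0.3, 0.0], [0.1, 0.5, 0.6], [0.0, 0.7, 0.0]],
]
patterned = [apply_best_pattern(k) for k in kernels]
pruned = connectivity_prune(patterned, keep=2)
```

Because every surviving kernel follows one of a few fixed shapes, a compiler can generate specialised code per pattern, which is the property the abstract relies on for mobile acceleration.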

preprint2020arXiv

Prospects and requirements of opaque detectors in accelerator neutrino experiments

Opaque detectors are a recently proposed novel detector concept where an opaque scintillator aligned with wavelength-shifting fibers is used to enable the discrimination of electron neutrinos and antineutrinos with a rather low energy threshold. In this work, we investigate the potential effects of the enhanced detection capabilities of the opaque detectors in accelerator neutrino experiments. Focusing on the energy threshold, energy resolution, detection efficiency and background suppression in the analysis of electron-like events, we determine whether using opaque detectors could lead to improvements in the CP violation and light sterile neutrino searches in the future accelerator neutrino experiments. We also identify the minimum requirements for the opaque detectors to reach the designated physics goals in the simulated experiments. We find that a 75.6% fraction of $δ_{CP}$ values could be reached for CP violation discovery by 3$σ$ confidence level or better when opaque detectors of 120 kton and 130 kton fiducial masses are used together with neutrino beams from J-PARC and MOMENT, respectively, whereas near detectors placed about 250 m from sources are sufficient to exclude the gallium anomaly at 2$σ$ confidence level.

preprint2020arXiv

Spin coating TPB film on acrylics and measurement of its wavelength shifting efficiency

Scintillation light from liquid noble gas in a neutrino or dark matter experiment lies typically within the vacuum ultraviolet (VUV) region and might be strongly absorbed by surrounding materials such as light guides or photomultipliers. Tetraphenyl butadiene (TPB) is a fluorescent material and acts as a wavelength shifter (WLS) which can turn the UV light into visible light around a peak wavelength of 425 nm, enabling the light signals to be detected easily for physics study. Compared with a traditional TPB coating method using vapor deposition, we propose an alternative technique with a spin coating procedure in order to facilitate the development of neutrino and dark matter detectors. This article introduces how to fabricate the TPB film on acrylics using the spin coating method, reports measurement of sample film thickness and roughness, shows the reemission spectrum, and quantifies the wavelength shifting efficiency (WLSE).

preprint2020arXiv

TAO Conceptual Design Report: A Precision Measurement of the Reactor Antineutrino Spectrum with Sub-percent Energy Resolution

The Taishan Antineutrino Observatory (TAO, also known as JUNO-TAO) is a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO). A ton-level liquid scintillator detector will be placed at about 30 m from a core of the Taishan Nuclear Power Plant. The reactor antineutrino spectrum will be measured with sub-percent energy resolution, to provide a reference spectrum for future reactor neutrino experiments, and to provide a benchmark measurement to test nuclear databases. A spherical acrylic vessel containing 2.8 ton gadolinium-doped liquid scintillator will be viewed by 10 m^2 Silicon Photomultipliers (SiPMs) of >50% photon detection efficiency with almost full coverage. The photoelectron yield is about 4500 per MeV, an order higher than any existing large-scale liquid scintillator detectors. The detector operates at -50 degree C to lower the dark noise of SiPMs to an acceptable level. The detector will measure about 2000 reactor antineutrinos per day, and is designed to be well shielded from cosmogenic backgrounds and ambient radioactivities to have about 10% background-to-signal ratio. The experiment is expected to start operation in 2022.

preprint2019arXiv

Precision measurements on $δ_\text{CP}$ in MOMENT

As it is very promising to expect a discovery of CP violation in the leptonic sector, the precision measurement of the Dirac CP phase $δ_\text{CP}$ is going to be one of the key interests in the future neutrino oscillation experiments. In this work, we examine the physics reach of the proposed medium baseline muon decay experiment MOMENT. In order to identify potential bottlenecks and opportunities to improve CP precision in MOMENT, we investigate the effect of statistical error, systematic uncertainties, fraction of the muon beam polarity, and adjusting the baseline length to match the first or second oscillation maximum on the precision measurement of $δ_\text{CP}$. We also simulate superbeam experiments T2K, NO$ν$A, T2HK, DUNE and T2HKK in comparison and complementary to MOMENT. To reach the precision of $δ_\text{CP}$ at 12$^\circ$ or better at 1$σ$ confidence level, we find it sufficient to combine the data of MOMENT, DUNE and T2HK.

preprint2019arXiv

Study of a tri-direct littlest seesaw model at MOMENT

The flavour symmetry succeeds in explaining the current global fit results. Flavour-symmetry models can be tested by the future experiments that improve the precision of neutrino oscillation parameters, such as the MuOn-decay MEdium baseline NeuTrino beam experiment (MOMENT). In this work, we consider tri-direct littlest seesaw (TDLS) models for a case study, and analyze how much MOMENT can extend our knowledge on the TDLS model. We find that measurements of $θ_{23}$ and $δ$ are crucial for MOMENT to exclude the model at more than $5σ$ confidence level, if the best fit values in the last global analysis result are confirmed. Moreover, the $3σ$ precision of model parameters can be improved at MOMENT by at least a factor of two. Finally, we project the surface at the $3σ$ confidence level from the model-parameter space to the oscillation-parameter space, and find the potential of MOMENT to observe the sum rule between $θ_{23}$ and $δ$ predicted by TDLS.

preprint2016arXiv

Context-aware Natural Language Generation with Recurrent Neural Networks

This paper studied generating natural languages at particular contexts or situations. We proposed two novel approaches which encode the contexts into a continuous semantic representation and then decode the semantic representation into text sequences with recurrent neural networks. During decoding, the context information is attended to through a gating mechanism, addressing the problem of long-range dependency caused by lengthy sequences. We evaluate the effectiveness of the proposed approaches on user review data, in which rich contexts are available and two informative contexts, sentiments and products, are selected for evaluation. Experiments show that the fake reviews generated by our approaches are very natural. Results of fake review detection with human judges show that more than 50% of the fake reviews are misclassified as real reviews, and more than 90% are misclassified by an existing state-of-the-art fake review detection algorithm.

preprint2016arXiv

Energy-Efficient Power Allocation in Cognitive Radio Systems with Imperfect Spectrum Sensing

This paper studies energy-efficient power allocation schemes for secondary users in sensing-based spectrum sharing cognitive radio systems. It is assumed that secondary users first perform channel sensing possibly with errors and then initiate data transmission with different power levels based on sensing decisions. The circuit power is taken into account in total power consumption. In this setting, the optimization problem is to maximize energy efficiency (EE) subject to peak/average transmission power constraints and peak/average interference constraints. By exploiting quasiconcave property of EE maximization problem, the original problem is transformed into an equivalent parameterized concave problem and an iterative power allocation algorithm based on Dinkelbach's method is proposed. The optimal power levels are identified in the presence of different levels of channel side information (CSI) regarding the transmission and interference links at the secondary transmitter, namely perfect CSI of both transmission and interference links, perfect CSI of the transmission link and imperfect CSI of the interference link, imperfect CSI of both links or only statistical CSI of both links. Through numerical results, the impact of sensing performance, different types of CSI availability, and transmit and interference power constraints on the EE of the secondary users is analyzed.
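
The Dinkelbach iteration can be sketched on a much simpler problem than the one studied here. The example assumes a single link with channel gain `h`, a peak power limit and no sensing errors or interference constraints, and maximizes energy efficiency log2(1 + h p) / (p + p_circuit). Each iteration solves the parameterized concave subproblem max over p of log2(1 + h p) - lam (p + p_circuit) in closed form, then updates `lam` to the achieved ratio. All parameter values and function names are illustrative assumptions.

```python
import math


def rate(p, h):
    """Achievable rate in bits per channel use for transmit power p."""
    return math.log2(1.0 + h * p)


def best_power(lam, h, p_max):
    """Maximiser of rate(p) - lam * (p + p_circuit) over [0, p_max].

    Setting the derivative h / ((1 + h p) ln 2) equal to lam gives a
    water-filling-like solution, clipped to the feasible interval.
    """
    if lam <= 0.0:
        return p_max
    p = 1.0 / (lam * math.log(2.0)) - 1.0 / h
    return min(max(p, 0.0), p_max)


def dinkelbach(h, p_circuit, p_max, tol=1e-9, max_iter=100):
    """Return (power, energy efficiency) found by Dinkelbach's iteration."""
    lam = 0.0
    p = p_max
    for _ in range(max_iter):
        p = best_power(lam, h, p_max)
        gap = rate(p, h) - lam * (p + p_circuit)  # zero at the optimal ratio
        lam = rate(p, h) / (p + p_circuit)
        if gap < tol:
            break
    return p, lam


p_opt, ee_opt = dinkelbach(h=2.0, p_circuit=0.5, p_max=10.0)
```

For these particular values the ratio reduces to log2(2x)/x with x = p + 0.5, whose maximum lies at x = e/2, so the returned power can be checked against p = e/2 - 0.5.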

preprint2016arXiv

Identity-sensitive Word Embedding through Heterogeneous Networks

Most existing word embedding approaches do not distinguish the same words in different contexts, therefore ignoring their contextual meanings. As a result, the learned embeddings of these words are usually a mixture of multiple meanings. In this paper, we acknowledge multiple identities of the same word in different contexts and learn identity-sensitive word embeddings. Based on an identity-labeled text corpus, a heterogeneous network of words and word identities is constructed to model different levels of word co-occurrences. The heterogeneous network is further embedded into a low-dimensional space through a principled network embedding approach, through which we are able to obtain the embeddings of words and the embeddings of word identities. We study three different types of word identities: topics, sentiments and categories. Experimental results on real-world data sets show that the identity-sensitive word embeddings learned by our approach indeed capture different meanings of words and outperform competitive methods on tasks including text classification and word similarity computation.

preprint2016arXiv

Less is More: Learning Prominent and Diverse Topics for Data Summarization

Statistical topic models efficiently facilitate the exploration of large-scale data sets. Many models have been developed and broadly used to summarize the semantic structure in news, science, social media, and digital humanities. However, a common and practical objective in data exploration tasks is not to enumerate all existing topics, but to quickly extract representative ones that broadly cover the content of the corpus, i.e., a few topics that serve as a good summary of the data. Most existing topic models fit exactly as many topics as the user specifies, which imposes an unnecessary burden on users who have limited prior knowledge. We instead propose new models that are able to learn fewer but more representative topics for the purpose of data summarization. We propose a reinforced random walk that allows prominent topics to absorb tokens from similar and smaller topics, thus enhancing the diversity among the top topics extracted. With this reinforced random walk as a general process embedded in classical topic models, we obtain diverse topic models that are able to extract the most prominent and diverse topics from data. The inference procedures of these diverse topic models remain as simple and efficient as those of the classical models. Experimental results demonstrate that the diverse topic models not only discover topics that better summarize the data, but also require minimal prior knowledge from the users.

preprint2016arXiv

PandaX-III: Searching for Neutrinoless Double Beta Decay with High Pressure $^{136}$Xe Gas Time Projection Chambers

Searching for Neutrinoless Double Beta Decay (NLDBD) is now regarded as the most promising technique to explore the nature of neutrinos after the discovery of neutrino masses in oscillation experiments. PandaX-III (Particle And Astrophysical Xenon Experiment III) will search for the NLDBD of $^{136}$Xe at the China Jin Ping underground Laboratory (CJPL). In the first phase of the experiment, a high pressure gas Time Projection Chamber (TPC) will contain 200 kg of 90% $^{136}$Xe-enriched gas operated at 10 bar. Fine pitch micro-pattern gas detectors (Microbulk Micromegas) will be used at both ends of the TPC for the charge readout, with a cathode in the middle. Charge signals can be used to reconstruct tracks of NLDBD events and provide good energy and spatial resolution. The detector will be immersed in a large water tank to ensure $\sim$5 m of water shielding in all directions. The second phase, a ton-scale experiment, will consist of five TPCs in the same water tank, with improved energy resolution and better control over backgrounds.

preprint2016arXiv

Visualizing Large-scale and High-dimensional Data

We study the problem of visualizing large-scale and high-dimensional data in a low-dimensional (typically 2D or 3D) space. Much success has been reported recently by techniques that first compute a similarity structure of the data points and then project them into a low-dimensional space with the structure preserved. These two steps suffer from considerable computational costs, preventing state-of-the-art methods such as t-SNE from scaling to large-scale and high-dimensional data (e.g., millions of data points and hundreds of dimensions). We propose LargeVis, a technique that first constructs an accurately approximated K-nearest neighbor graph from the data and then lays out the graph in the low-dimensional space. Compared to t-SNE, LargeVis significantly reduces the computational cost of the graph construction step and employs a principled probabilistic model for the visualization step, the objective of which can be effectively optimized through asynchronous stochastic gradient descent with a linear time complexity. The whole procedure thus easily scales to millions of high-dimensional data points. Experimental results on real-world data sets demonstrate that LargeVis outperforms the state-of-the-art methods in both efficiency and effectiveness. The hyper-parameters of LargeVis are also much more stable over different data sets.

preprint2015arXiv

LINE: Large-scale Information Network Embedding

This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale to real-world information networks, which usually contain millions of nodes. In this paper, we propose a novel network embedding method called "LINE," which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments demonstrate the effectiveness of LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient: it is able to learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of LINE is available online.

preprint2015arXiv

Neutrino Physics with JUNO

The Jiangmen Underground Neutrino Observatory (JUNO), a 20 kton multi-purpose underground liquid scintillator detector, was proposed with the determination of the neutrino mass hierarchy as a primary physics goal. It is also capable of observing neutrinos from terrestrial and extra-terrestrial sources, including supernova burst neutrinos, diffuse supernova neutrino background, geoneutrinos, atmospheric neutrinos, solar neutrinos, as well as exotic searches such as nucleon decays, dark matter, sterile neutrinos, etc. We present the physics motivations and the anticipated performance of the JUNO detector for various proposed measurements. By detecting reactor antineutrinos from two power plants at 53-km distance, JUNO will determine the neutrino mass hierarchy at a 3-4 sigma significance with six years of running. The measurement of the antineutrino spectrum will also lead to the precise determination of three out of the six oscillation parameters to an accuracy of better than 1%. A neutrino burst from a typical core-collapse supernova at 10 kpc would lead to ~5000 inverse-beta-decay events and ~2000 all-flavor neutrino-proton elastic scattering events in JUNO. Detection of the diffuse supernova neutrino background (DSNB) would provide valuable information on the cosmic star-formation rate and the average core-collapse neutrino energy spectrum. Geoneutrinos can be detected in JUNO with a rate of ~400 events per year, significantly improving the statistics of existing geoneutrino samples. The JUNO detector is sensitive to several exotic searches, e.g. proton decay via the $p \to K^+ + \bar{\nu}$ decay channel. The JUNO detector will provide a unique facility to address many outstanding crucial questions in particle physics and astrophysics. It holds great potential for further advancing our quest to understand the fundamental properties of neutrinos, one of the building blocks of our Universe.

preprint2015arXiv

PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks

Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector, have been attracting increasing attention due to their simplicity, scalability, and effectiveness. However, compared to sophisticated deep learning architectures such as convolutional neural networks, these methods usually yield inferior results when applied to particular machine learning tasks. One possible reason is that these text embedding methods learn the representation of text in a fully unsupervised way, without leveraging the labeled information available for the task. Although the low dimensional representations learned are applicable to many different tasks, they are not particularly tuned for any task. In this paper, we fill this gap by proposing a semi-supervised representation learning method for text data, which we call predictive text embedding (PTE). Predictive text embedding utilizes both labeled and unlabeled data to learn the embedding of text. The labeled information and different levels of word co-occurrence information are first represented as a large-scale heterogeneous text network, which is then embedded into a low dimensional space through a principled and efficient algorithm. This low dimensional embedding not only preserves the semantic closeness of words and documents, but also has a strong predictive power for the particular task. Compared to recent supervised approaches based on convolutional neural networks, predictive text embedding is comparable or more effective, much more efficient, and has fewer parameters to tune.

preprint2014arXiv

"Look Ma, No Hands!" A Parameter-Free Topic Model

It has always been a burden to the users of statistical topic models to predetermine the right number of topics, which is a key parameter of most topic models. Conventionally, automatic selection of this parameter is done through either statistical model selection (e.g., cross-validation, AIC, or BIC) or Bayesian nonparametric models (e.g., hierarchical Dirichlet process). These methods either rely on repeated runs of the inference algorithm to search through a large range of parameter values, which does not suit the mining of big data, or replace this parameter with alternative parameters that are less intuitive and still hard to determine. In this paper, we explore how to "eliminate" this parameter from a new perspective. We first present a nonparametric treatment of the PLSA model named nonparametric probabilistic latent semantic analysis (nPLSA). The inference procedure of nPLSA allows for the exploration and comparison of different numbers of topics within a single execution, yet remains as simple as that of PLSA. This is achieved by substituting the parameter of the number of topics with an alternative parameter that is the minimal goodness of fit of a document. We show that the new parameter can be further eliminated by two parameter-free treatments: either by monitoring the diversity among the discovered topics or by weak supervision from users in the form of an exemplar topic. The parameter-free topic model finds the appropriate number of topics when the diversity among the discovered topics is maximized, or when the granularity of the discovered topics matches the exemplar topic. Experiments on both synthetic and real data show that the parameter-free topic model extracts topics of quality comparable to classical topic models with "manual transmission". The quality of the topics exceeds that of topics extracted through classical Bayesian nonparametric models.

preprint2013arXiv

Study on perturbation schemes for achieving the real PMNS matrix from various symmetric textures

The PMNS matrix displays an obvious, but not exact, symmetry. Several textures have been proposed in the literature, which possess various symmetry patterns and seem to originate from different physics scenarios at high energy scales. To be consistent with experimental measurements, all of these regularities must be slightly relaxed, i.e. the symmetry must be broken. Following the schemes given in the literature, we modify the matrices (9 in total) to obtain the real PMNS matrix by perturbative rotations. The transformations may provide hints about the underlying physics at high energies and the breaking mechanisms which apply during the evolution to the low energy scale; in particular, the results may be useful for future model builders.

preprint2012arXiv

Requirements for a New Detector at the South Pole Receiving an Accelerator Neutrino Beam

There are recent considerations to increase the photomultiplier density in the IceCube detector array beyond that of DeepCore, which will lead to a lower detection threshold and a huge fiducial mass for the neutrino detection. This initiative is known as "Phased IceCube Next Generation Upgrade" (PINGU). We discuss the possibility to send a neutrino beam from one of the major accelerator laboratories in the Northern hemisphere to such a detector. Such an experiment would be unique in the sense that it would be the only neutrino beam where the baseline crosses the Earth's core. We study the detector requirements for a beta beam, a neutrino factory beam, and a superbeam, where we consider both the cases of small theta_13 and large theta_13, as suggested by the recent T2K and Double Chooz results. We illustrate that a flavor-clean beta beam best suits the requirements of such a detector, in particular, that PINGU may replace a magic baseline detector for small values of theta_13 -- even in the absence of any energy resolution capability. For large theta_13, however, a single-baseline beta beam experiment cannot compete if it is constrained by the CERN-SPS. For a neutrino factory, because of the missing charge identification possibility in the detector, a very good energy resolution is required. If this can be achieved, especially a low energy neutrino factory, which does not suffer from the tau contamination, may be an interesting option for large theta_13. For the superbeam, where we use the LBNE beam as a reference, electron neutrino flavor identification and statistics are two of the main limitations. Finally, we demonstrate that, at least in principle, neutrino factory and superbeam can measure the density of the Earth's core to the sub-percent level for sin^2 2theta_13 larger than 0.01.

preprint2011arXiv

Optimization of the Neutrino Factory, revisited

We perform the baseline and energy optimization of the Neutrino Factory including the latest simulation results on the magnetized iron detector (MIND). We also consider the impact of tau decays, generated by nu_mu to nu_tau or nu_e to nu_tau appearance, on the mass hierarchy, CP violation, and theta_{13} discovery reaches, which we find to be negligible for the considered detector. For the baseline-energy optimization for small theta_{13}, we qualitatively recover the results with earlier simulations of the MIND detector. We find optimal baselines of about 2500 km to 5000 km for the CP violation measurement, where now values of E_mu as low as about 12 GeV may be possible. However, for large theta_{13}, we demonstrate that the lower threshold and the backgrounds reconstructed at lower energies allow in fact for muon energies as low as 5 GeV at considerably shorter baselines, such as FNAL-Homestake. This implies that with the latest MIND analysis, low- and high-energy versions of the Neutrino Factory are just two different versions of the same experiment optimized for different parts of the parameter space. Apart from a green-field study of the updated detector performance, we discuss specific implementations for the two-baseline Neutrino Factory, where the considered detector sites are taken to be currently discussed underground laboratories. We find that reasonable setups can be found for the Neutrino Factory source in Asia, Europe, and North America, and that a triangular-shaped storage ring is possible in all cases based on geometrical arguments only.

preprint2011arXiv

Phenomenology of Neutrino Oscillations at the Neutrino Factory

We consider the prospects for a neutrino factory to measure mixing angles, the CP violating phase and mass-squared differences by detecting wrong-charged muons arising from the chain $\mu^+ \to \nu_e \to \nu_\mu \to \mu^-$ and the right-charged muons coming from the chain $\mu^+ \to \bar{\nu}_\mu \to \bar{\nu}_\mu \to \mu^+$ (similar to $\mu^-$ chains), where $\nu_e \to \nu_\mu$ and $\bar{\nu}_\mu \to \bar{\nu}_\mu$ are neutrino oscillation channels through a long baseline. First, we perform the baseline and energy optimization of the neutrino factory including the latest simulation results from the magnetized iron neutrino detector (MIND). Second, we study physics with near detectors and consider the treatment of systematic errors including cross section errors, flux errors, and background uncertainties. Third, the effects of one additional massive sterile neutrino are investigated in the context of near and far detector combinations.

preprint2010arXiv

Neutrino factory in stages: Low energy, high energy, off-axis

We discuss neutrino oscillation physics with a neutrino factory in stages, including the possibility of upgrading the muon energy within the same program. We point out that a detector designed for the low energy neutrino factory may be used off-axis in a high energy neutrino factory beam. We include the re-optimization of the experiment depending on the value of theta_13 found. As upgrade options, we consider muon energy, additional baselines, a detector mass upgrade, an off-axis detector, and the platinum (muon to electron neutrino) channels. In addition, we test the impact of Daya Bay data on the optimization. We find that for large theta_13 (theta_13 discovered by the next generation of experiments), a low energy neutrino factory might be the most plausible minimal version to test the unknown parameters. However, if a higher muon energy is needed for new physics searches, a high energy version including an off-axis detector may be an interesting alternative. For small theta_13 (theta_13 not discovered by the next generation), a plausible program could start with a low energy neutrino factory, followed by energy upgrade, and then baseline or detector mass upgrade, depending on the outcome of the earlier phases.

preprint2010arXiv

Sterile neutrinos beyond LSND at the Neutrino Factory

We discuss the effects of one additional sterile neutrino at the Neutrino Factory. Compared to earlier analyses, which have been motivated by LSND results, we do not impose any constraint on the additional mass squared splitting. This means that the additional mass eigenstate could, with small mixings, be located among the known ones, as suggested by the recent analysis of cosmological data. We use a self-consistent framework at the Neutrino Factory without any constraints on the new parameters. We demonstrate for a combined short and long baseline setup that near detectors can provide the expected sensitivity in the LSND-motivated $\Delta m_{41}^2$ range, while some sensitivity can also be obtained in the region of the atmospheric mass splitting from the long baselines. We point out that limits on such very light sterile neutrinos may also be obtained from a re-analysis of atmospheric and solar neutrino oscillation data, as well as from supernova neutrino observations. In the second part of the analysis, we compare our sensitivity with the existing literature using additional assumptions, such as $|\Delta m_{41}^2| \gg |\Delta m_{31}^2|$, leading to averaging of the fast oscillations in the far detectors. We demonstrate that while the Neutrino Factory has excellent sensitivity compared to existing studies using similar assumptions, one has to be very careful interpreting these results for a combined short and long baseline setup where oscillations could occur in the near detectors. We also test the impact of additional $\nu_\tau$ detectors at the short and long baselines, and we do not find a substantial improvement of the sensitivities.

preprint2009arXiv

On near detectors at a neutrino factory

The geometric effects of the beam in near detectors at a neutrino factory are discussed. The refined systematics treatment, including cross section errors, flux errors and background uncertainties, is compared with the IDS-NF one. Different near detector setups are included. We also probe their effects on both the measurement of standard neutrino oscillation parameters and the constraints on non-standard neutrino interactions.

preprint1995arXiv

Locality of the Strange Sea in the Nucleon

We introduce the concept of "locality" for the strange sea in the nucleon, which measures the proximity of the strange and anti-strange quarks in the momentum and coordinate spaces. The CCFR data for the strange and anti-strange distributions imply a "local" strange sea in the momentum space, which is unexpected in QCD and is at variance with the simple meson-cloud model where the strangeness is generated from the virtual transition of the nucleon to a hyperon plus a kaon. We present a simple model to interpret the CCFR data and to correlate momentum and coordinate space locality, yielding an upper bound of 0.005 fm$^2$ on the strange radius. We also discuss the significance of locality for other charge-conjugation-odd observables.

88 published item(s)

FIS-DiT: Breaking the Few-Step Video Inference Barrier via Training-Free Frame Interleaved Sparsity

Pelican-Unified 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action

Visualization of Tunable Electronic Structure of Monolayer TaIrTe$_4$

Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test

Real-world Reinforcement Learning from Suboptimal Interventions

Iterative Graph Self-Distillation

Optimization of muonium yield in perforated silica aerogel

Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution

A Roadmap for Big Model

Balanced Datasets for IoT IDS

Constraints on cosmic-ray boosted DM in CDEX-10

Continual Few-Shot Learning with Adversarial Class Storage

DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search

Ekar: An Explainable Method for Knowledge Aware Recommendation

Feasibility study of an accelerator neutrino experiment in China

Generative Coarse-Graining of Molecular Conformations

GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation

HIRL: A General Framework for Hierarchical Image Representation Learning

Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs

Mass Testing and Characterization of 20-inch PMTs for JUNO

Muon Collider Physics Summary

Neural Bellman-Ford Networks: A General Graph Neural Network Framework for Link Prediction

Neural Structured Prediction for Inductive Node Classification

Neural-Symbolic Models for Logical Queries on Knowledge Graphs

Non-minimal Lorentz invariance violation in light of muon anomalous magnetic moment and long-baseline neutrino oscillation data

Pre-training Molecular Graph Representation with 3D Geometry

Precision measurements and tau neutrino physics in a future accelerator neutrino experiment

Promising Technologies and R&D Directions for the Future Muon Collider Detectors

RGB-Depth Fusion GAN for Indoor Depth Completion

Robotic Grasping from Classical to Modern: A Survey

Simulated Detector Performance at the Muon Collider

Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering

Temperature-linear Resistivity in Twisted Double Bilayer Graphene

The physics case of a 3 TeV muon collider stage

TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

Tyger: Task-Type-Generic Active Learning for Molecular Property Prediction

Weakly-supervised Temporal Path Representation Learning with Contrastive Curriculum Learning -- Extended Version

Exploring SMEFT Induced Non-Standard Interactions from COHERENT to Neutrino Oscillations

Isospin competitions and valley polarized correlated insulators in twisted double bilayer graphene

JUNO Physics and Detector

Non-autoregressive electron flow generation for reaction prediction

Towards Generalized Implementation of Wasserstein Distance in GANs

Utilising Graph Machine Learning within Drug Discovery and Development

Adversarial Meta-Learning

An Advert Creation System for 3D Product Placements

An End-to-End Neighborhood-based Interaction Model for Knowledge-enhanced Recommendation

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method

Continuous Graph Neural Networks

Correlated states in twisted double bilayer graphene

COVI White Paper

Differentiable Feature Aggregation Search for Knowledge Distillation

Domain Conditioned Adaptation Network

Feasibility and physics potential of detecting $^8$B solar neutrinos at JUNO

Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs

Flavour Symmetry Embedded -- GLoBES (FaSE-GLoBES)

Global oscillation data analysis on the $3ν$ mixing without unitarity

GMNN: Graph Markov Neural Networks

GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation

High-order minibands and interband Landau level reconstruction in graphene moire superlattice

InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization

Investigating Class-level Difficulty Factors in Multi-label Classification Problems

Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices

Prospects and requirements of opaque detectors in accelerator neutrino experiments

Spin coating TPB film on acrylics and measurement of its wavelength shifting efficiency

TAO Conceptual Design Report: A Precision Measurement of the Reactor Antineutrino Spectrum with Sub-percent Energy Resolution

Precision measurements on $δ_\text{CP}$ in MOMENT

Study of a tri-direct littlest seesaw model at MOMENT

Context-aware Natural Language Generation with Recurrent Neural Networks

Energy-Efficient Power Allocation in Cognitive Radio Systems with Imperfect Spectrum Sensing

Identity-sensitive Word Embedding through Heterogeneous Networks

Less is More: Learning Prominent and Diverse Topics for Data Summarization