Source author record

Tong Zhao

Tong Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

32 works

23 topics

4 close collaborators


preprint, 2026, arXiv

Approximately Optimal Global Planning for Contact-Rich SE(2) Manipulation on a Graph of Reachable Sets

In human manipulation, contact-rich manipulation (CRM), the ability to use any surface of the manipulator to make contact with objects, can be far more efficient and natural than relying solely on end-effectors (i.e., fingertips). However, state-of-the-art model-based planners for CRM are still focused on feasibility rather than optimality, limiting their ability to fully exploit CRM's advantages. We introduce a new paradigm that computes approximately optimal manipulator plans. This approach has two phases. Offline, we construct a graph of mutual reachable sets, where each set contains all object orientations reachable from a starting object orientation and grasp. Online, we plan over this graph, effectively computing and sequencing local plans for globally optimized motion. On a challenging, representative contact-rich task, our approach outperforms a leading planner, reducing task cost by 61%. It also achieves a 91% success rate across 250 queries and maintains sub-minute query times, ultimately demonstrating that globally optimized contact-rich manipulation is now practical for real-world tasks.

preprint, 2026, arXiv

Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation

Modern Sequential Recommendation (SR) models commonly utilize modality features to represent items, motivated in large part by recent advancements in language and vision modeling. To do so, several works completely replace ID embeddings with modality embeddings, claiming that modality embeddings render ID embeddings unnecessary because they can match or even exceed ID embedding performance. On the other hand, many works jointly utilize ID and modality features, but posit that complex fusion strategies, such as multi-stage training and/or intricate alignment architectures, are necessary for this joint utilization. However, underlying both these lines of work is a lack of understanding of the complementarity of ID and modality features. In this work, we address this gap by studying the complementarity of ID- and text-based SR models. We show that these models do learn complementary signals, meaning that either should provide performance gain when used properly alongside the other. Motivated by this, we propose a new SR method that preserves ID-text complementarity through independent model training, then harnesses it through a simple ensembling strategy. Despite this method's simplicity, we show it outperforms several competitive SR baselines, implying that both ID and text features are necessary to achieve state-of-the-art SR performance but complex fusion architectures are not.
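The abstract above does not spell out its ensembling strategy, so the following is only a minimal sketch of one common choice: min-max normalize the scores of two independently trained models and take a weighted average. The function names `min_max` and `ensemble_rank`, the `weight` parameter, and the toy scores are all illustrative assumptions, not the authors' method.

```python
def min_max(scores):
    """Rescale a non-empty {item: score} dict to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    # If all scores are equal, every item gets 0.0 to avoid dividing by zero.
    return {item: (s - lo) / span if span else 0.0 for item, s in scores.items()}


def ensemble_rank(id_scores, text_scores, weight=0.5):
    """Rank items by a weighted average of two normalized score sets.

    Hypothetical late-fusion scheme: `weight` applies to the ID model,
    `1 - weight` to the text model. Only items scored by both models are ranked.
    """
    a, b = min_max(id_scores), min_max(text_scores)
    shared = a.keys() & b.keys()
    combined = {i: weight * a[i] + (1 - weight) * b[i] for i in shared}
    # Highest combined score first; ties are broken by item name for determinism.
    return sorted(combined, key=lambda i: (-combined[i], i))


if __name__ == "__main__":
    id_scores = {"x": 3.0, "y": 1.0, "z": 2.0}
    text_scores = {"x": 0.1, "y": 0.9, "z": 0.8}
    print(ensemble_rank(id_scores, text_scores))
```

Because the two models are scored separately and combined only at ranking time, neither needs to be retrained when the other changes, which is the practical appeal of this kind of late fusion.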

preprint, 2026, arXiv

From Failure to Mastery: Generating Hard Samples for Tool-use Agents

The advancement of LLM agents with tool-use capabilities requires diverse and complex training corpora. Existing data generation methods, which predominantly follow a paradigm of random sampling and shallow generation, often yield simple and homogeneous trajectories that fail to capture complex, implicit logical dependencies. To bridge this gap, we introduce HardGen, an automatic agentic pipeline designed to generate hard tool-use training samples with verifiable reasoning. Firstly, HardGen establishes a dynamic API Graph built upon agent failure cases, from which it samples to synthesize hard traces. Secondly, these traces serve as conditional priors to guide the instantiation of modular, abstract advanced tools, which are subsequently leveraged to formulate hard queries. Finally, the advanced tools and hard queries enable the generation of verifiable complex Chain-of-Thought (CoT), with a closed-loop evaluation feedback steering the continuous refinement of the process. Extensive evaluations demonstrate that a 4B parameter model trained with our curated dataset achieves superior performance compared to several leading open-source and closed-source competitors (e.g., GPT-5.2, Gemini-3-Pro and Claude-Opus-4.5). Our code, models, and dataset will be open-sourced to facilitate future research.

preprint, 2026, arXiv

Limits to Extracting Neutron-Star Physics Constraints from NICER Pulse Profiles

Modeling energy-dependent X-ray pulse profiles from rotation-powered millisecond pulsars observed with NICER has emerged as a promising avenue for measuring neutron star radii and probing the equation of state of cold, ultra-dense matter. However, pulse profile models have often required an unwieldy number of parameters to account for complex surface emission geometries, introducing the risk of overfitting and degeneracies. To explore the number of model parameters that can be inferred uniquely, we perform a quantitative assessment of the information content in X-ray pulse profiles by applying Fourier methods. We determine the number of independent observables that can be reliably extracted from the pulse shapes, as well as from complementary X-ray spectral data obtained with XMM-Newton, for key NICER targets. Our analysis provides a framework for evaluating the match between model complexity and data constraints. It also demonstrates the importance of incorporating in the model the pulsed components of the magnetospheric non-thermal emission, which may often contribute significantly to the observed spectra. Our results highlight limitations in previous inferences of neutron-star radii from NICER observations, which may have incorporated model complexity not supported by the data.

preprint, 2026, arXiv

Physics-Driven Data Generation for Contact-Rich Manipulation via Trajectory Optimization

We present a low-cost data generation pipeline that integrates physics-based simulation, human demonstrations, and model-based planning to efficiently generate large-scale, high-quality datasets for contact-rich robotic manipulation tasks. Starting with a small number of embodiment-flexible human demonstrations collected in a virtual reality simulation environment, the pipeline refines these demonstrations using optimization-based kinematic retargeting and trajectory optimization to adapt them across various robot embodiments and physical parameters. This process yields a diverse, physically consistent dataset that enables cross-embodiment data transfer, and offers the potential to reuse legacy datasets collected under different hardware configurations or physical parameters. We validate the pipeline's effectiveness by training diffusion policies from the generated datasets for challenging contact-rich manipulation tasks across multiple robot embodiments, including a floating Allegro hand and bimanual robot arms. The trained policies are deployed zero-shot on hardware for bimanual iiwa arms, achieving high success rates with minimal human input. Project website: https://lujieyang.github.io/physicsgen/.

preprint, 2026, arXiv

V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

Modern autonomous vehicle perception systems often struggle with occlusions and limited perception range. Previous studies have demonstrated the effectiveness of cooperative perception in extending the perception range and overcoming occlusions, thereby enhancing the safety of autonomous driving. In recent years, a series of cooperative perception datasets have emerged; however, these datasets primarily focus on cameras and LiDAR, neglecting 4D Radar, a sensor used in single-vehicle autonomous driving to provide robust perception in adverse weather conditions. In this paper, to bridge the gap created by the absence of 4D Radar datasets in cooperative perception, we present V2X-Radar, the first large-scale, real-world multi-modal dataset featuring 4D Radar. The V2X-Radar dataset is collected using a connected vehicle platform and an intelligent roadside unit equipped with 4D Radar, LiDAR, and multi-view cameras. The collected data encompasses sunny and rainy weather conditions, spanning daytime, dusk, and nighttime, as well as various typical challenging scenarios. The dataset consists of 20K LiDAR frames, 40K camera images, and 20K 4D Radar data, including 350K annotated boxes across five categories. To support various research domains, we have established V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception. Furthermore, we provide comprehensive benchmarks across these three sub-datasets. We will release all datasets and the benchmark codebase at https://huggingface.co/datasets/yanglei18/V2X-Radar and https://github.com/yanglei18/V2X-Radar.

preprint, 2023, arXiv

Graph Data Augmentation for Graph Machine Learning: A Survey

Data augmentation has recently seen increased interest in graph machine learning given its demonstrated ability to improve model performance and generalization by adding training data. Despite this recent surge, the area is still relatively under-explored, due to the challenges brought by the complex, non-Euclidean structure of graph data, which limits the direct transfer of traditional augmentation operations from other data types such as image, video or text. Our work aims to give a necessary and timely overview of existing graph data augmentation methods; notably, we present a comprehensive and systematic survey of graph data augmentation approaches, summarizing the literature in a structured manner. We first introduce three different taxonomies for categorizing graph data augmentation methods from the data, task, and learning perspectives, respectively. Next, we introduce recent advances in graph data augmentation, differentiated by their methodologies and applications. We conclude by outlining currently unsolved challenges and directions for future research. Overall, our work aims to clarify the landscape of existing literature in graph data augmentation and to motivate additional work in this area, providing a helpful resource for researchers and practitioners in the broader graph machine learning domain. Additionally, we provide a continuously updated reading list at https://github.com/zhao-tong/graph-data-augmentation-papers.

preprint, 2022, arXiv

Coarse-to-Fine Sparse Sequential Recommendation

Sequential recommendation aims to model dynamic user behavior from historical interactions. Self-attentive methods have proven effective at capturing short-term dynamics and long-term preferences. Despite their success, these approaches still struggle with sparse data, on which they fail to learn high-quality item representations. We propose to model user dynamics from shopping intents and interacted items simultaneously. The learned intents are coarse-grained and work as prior knowledge for item recommendation. To this end, we present a coarse-to-fine self-attention framework, namely CaFe, which explicitly learns coarse-grained and fine-grained sequential dynamics. Specifically, CaFe first learns intents from coarse-grained sequences, which are dense and hence provide high-quality user intent representations. Then, CaFe fuses intent representations into item encoder outputs to obtain improved item representations. Finally, we infer recommended items based on representations of items and corresponding intents. Experiments on sparse datasets show that CaFe outperforms state-of-the-art self-attentive recommenders by 44.03% NDCG@5 on average.
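The NDCG@5 figure in the abstract above is a ranking metric. As a minimal sketch, assuming binary relevance and the usual log2 position discount (the abstract does not state the exact evaluation protocol), it can be computed as below. The function name `ndcg_at_k` and the toy item IDs are illustrative.

```python
import math


def ndcg_at_k(ranked_items, relevant_items, k=5):
    """NDCG@k with binary relevance (assumed formulation, not taken from the paper).

    `ranked_items` is expected to contain no duplicates.
    """
    relevant = set(relevant_items)
    # DCG: a relevant item at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items (at most k) placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # The single held-out item "a" is ranked second, so the score is 1 / log2(3).
    print(ndcg_at_k(["b", "a", "c"], ["a"], k=5))
```

With a single held-out item per user, a common setup in sequential recommendation, the metric reduces to 1 / log2(rank + 1) when the item appears in the top k and 0 otherwise.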

preprint, 2022, arXiv

Diversifying Content Generation for Commonsense Reasoning with Mixture of Knowledge Graph Experts

Generative commonsense reasoning (GCR) in natural language is the task of reasoning about commonsense knowledge while generating coherent text. Recent years have seen a surge of interest in improving the generation quality of commonsense reasoning tasks. Nevertheless, these approaches have seldom investigated diversity in GCR tasks, which aim to generate alternative explanations for a real-world situation or predict all possible outcomes. Diversifying GCR is challenging as it requires generating multiple outputs that are not only semantically different but also grounded in commonsense knowledge. In this paper, we propose MoKGE, a novel method that diversifies generative reasoning through a mixture of experts (MoE) strategy on commonsense knowledge graphs (KG). A set of knowledge experts seeks diverse reasoning on the KG to encourage varied generation outputs. Empirical experiments demonstrate that MoKGE can significantly improve diversity while achieving on-par accuracy on two GCR benchmarks, based on both automatic and human evaluations.

preprint, 2022, arXiv

Formal Certification Methods for Automated Vehicle Safety Assessment

Challenges related to automated driving are no longer focused on just the construction of automated vehicles (AVs), but on assuring the safety of their operation. Recent advances in Level 3 and Level 4 autonomous driving have motivated more extensive study of safety guarantees for complicated AV maneuvers, which aligns with the goal of ISO 21448 (Safety of the Intended Functions, or SOTIF), i.e., minimizing unsafe scenarios both known and unknown, as well as Vision Zero (eliminating highway fatalities by 2050). A majority of approaches used in providing safety guarantees for AV motion control originate from formal methods, especially reachability analysis (RA), which relies on mathematical models of the dynamic evolution of the system to provide guarantees. However, to the best of the authors' knowledge, there have been no review papers dedicated to describing and interpreting the state of the art in formal methods in the context of AVs. In this work, we provide an overview of the safety verification, validation and certification process, and review formal safety techniques that are best suited to AV applications. We also propose a unified scenario coverage framework that can provide either a formal or sample-based estimate of safety verification for full AVs. Finally, remaining challenges and future opportunities beyond the scope of current published research for assured AV safety are presented.

preprint, 2022, arXiv

Learning from Counterfactual Links for Link Prediction

Learning to predict missing links is important for many graph-based applications. Existing methods were designed to learn the association between the observed graph structure and the existence of a link between a pair of nodes. However, the causal relationship between the two variables has been largely ignored when learning to predict links on a graph. In this work, we examine this factor by asking a counterfactual question: "would the link still exist if the graph structure became different from observation?" Its answer, counterfactual links, can be used to augment the graph data for representation learning. To create these links, we employ causal models that consider the information (i.e., learned representations) of node pairs as context, global graph structural properties as treatment, and link existence as outcome. We propose a novel data augmentation-based link prediction method that creates counterfactual links and learns representations from both the observed and counterfactual links. Experiments on benchmark data show that our graph learning method achieves state-of-the-art performance on the task of link prediction.

preprint, 2022, arXiv

Learning Personalized Item-to-Item Recommendation Metric via Implicit Feedback

This paper studies the item-to-item recommendation problem in recommender systems from a new perspective of metric learning via implicit feedback. We develop and investigate a personalizable deep metric model that captures both the internal contents of items and how users interacted with them. There are two key challenges in learning such a model. First, there is no explicit similarity annotation, which deviates from the assumption of most metric learning methods. Second, these approaches ignore the fact that items are often represented by multiple sources of metadata and different users use different combinations of these sources to form their own notion of similarity. To address these challenges, we develop a new metric representation embedded as kernel parameters of a probabilistic model. This helps express the correlation between items that a user has interacted with, which can be used to predict user interaction with new items. Our approach hinges on the intuition that similar items induce similar interactions from the same user, so fitting a metric-parameterized model to predict an implicit feedback signal could indirectly guide it towards finding the most suitable metric for each user. To this end, we also analyze how and when the proposed method is effective from a theoretical lens. Its empirical effectiveness is also demonstrated on several real-world datasets.

preprint, 2022, arXiv

Neural-PDE: A RNN based neural network for solving time dependent PDEs

Partial differential equations (PDEs) play a crucial role in studying a vast number of problems in science and engineering. Numerically solving nonlinear and/or high-dimensional PDEs is often a challenging task. Inspired by the traditional finite difference and finite element methods and emerging advancements in machine learning, we propose a sequence deep learning framework called Neural-PDE, which can automatically learn the governing rules of any time-dependent PDE system from existing data by using a bidirectional LSTM encoder, and predict the data for the next n time steps. One critical feature of our proposed framework is that the Neural-PDE is able to simultaneously learn and simulate the multiscale variables. We test the Neural-PDE on a range of examples from one-dimensional PDEs to a high-dimensional and nonlinear complex fluids model. The results show that the Neural-PDE is capable of learning the initial conditions, boundary conditions and differential operators without knowledge of the specific form of a PDE system. In our experiments the Neural-PDE can efficiently extract the dynamics within 20 epochs of training, and produces accurate predictions. Furthermore, unlike traditional machine learning approaches to learning PDEs such as CNN and MLP, which require vast numbers of parameters for model precision, Neural-PDE shares parameters across all time steps, which considerably reduces the computational complexity and leads to a fast learning algorithm.

preprint, 2022, arXiv

RETE: Retrieval-Enhanced Temporal Event Forecasting on Unified Query Product Evolutionary Graph

With the increasing demands on e-commerce platforms, a large volume of user action history is accumulating. These rich action records are vital for understanding users' interests and intents. Prior work on user behavior prediction mainly focuses on interactions with product-side information. However, interactions with search queries, which usually act as a bridge between users and products, are still under-investigated. In this paper, we explore a new problem named temporal event forecasting, a generalized user behavior prediction task in a unified query product evolutionary graph, to embrace both query and product recommendation in a temporal manner. This setting involves two challenges: (1) the action data for most users is scarce; (2) user preferences are dynamically evolving and shifting over time. To tackle these issues, we propose a novel Retrieval-Enhanced Temporal Event (RETE) forecasting framework. Unlike existing methods that enhance user representations by coarsely absorbing information from connected entities in the whole graph, RETE efficiently and dynamically retrieves relevant entities centered on each user as high-quality subgraphs, preventing noise propagation from the dense evolutionary graph structures that incorporate abundant search queries. Meanwhile, RETE autoregressively accumulates retrieval-enhanced user representations from each time step to capture evolutionary patterns for joint query and product prediction. Empirically, extensive experiments on both the public benchmark and four real-world industrial datasets demonstrate the effectiveness of the proposed RETE method.

preprint, 2022, arXiv

Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training

Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text. Despite its recent flourishing, keyphrase generation for non-English languages has not been extensively investigated. In this paper, we call attention to a new setting named multilingual keyphrase generation and we contribute two new datasets, EcommerceMKP and AcademicMKP, covering six languages. Technically, we propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the data shortage problem in non-English languages. The retrieval-augmented model leverages keyphrase annotations in English datasets to facilitate generating keyphrases in low-resource languages. Given a non-English passage, a cross-lingual dense passage retrieval module finds relevant English passages. Then the associated English keyphrases serve as external knowledge for keyphrase generation in the current language. Moreover, we develop a retriever-generator iterative training algorithm to mine pseudo-parallel passage pairs to strengthen the cross-lingual passage retriever. Comprehensive experiments and ablations show that the proposed approach outperforms all baselines.

preprint, 2020, arXiv

A Probabilistic Model with Commonsense Constraints for Pattern-based Temporal Fact Extraction

Textual patterns (e.g., Country's president Person) are specified and/or generated for extracting factual information from unstructured data. Pattern-based information extraction methods have been recognized for their efficiency and transferability. However, not every pattern is reliable: a major challenge is to derive the most complete and accurate facts from diverse and sometimes conflicting extractions. In this work, we propose a probabilistic graphical model which formulates fact extraction as a generative process. It automatically infers true facts and pattern reliability without any supervision. It has two novel designs specifically for temporal facts: (1) it models pattern reliability on two types of time signals, including temporal tags in text and text generation time; (2) it models commonsense constraints as observable variables. Experimental results demonstrate that our model significantly outperforms existing methods on extracting true temporal facts from news data.

preprint, 2020, arXiv

AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types

Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs have firmly established themselves as valuable sources of information for search and question answering, and it is natural to wonder if a KG can contain information about products offered at online retail sites. There have been several successful examples of generic KGs, but organizing information about products poses many additional challenges, including sparsity and noise of structured data for products, complexity of the domain with millions of product types and thousands of attributes, heterogeneity across a large number of categories, as well as a large and constantly growing number of products. We describe AutoKnow, our automatic (self-driving) system that addresses these challenges. The system includes a suite of novel techniques for taxonomy construction, product property identification, knowledge extraction, anomaly detection, and synonym discovery. AutoKnow is (a) automatic, requiring little human intervention, (b) multi-scalable, scalable in multiple dimensions (many domains, many products, and many attributes), and (c) integrative, exploiting rich customer behavior logs. AutoKnow has been operational in collecting product knowledge for over 11K product types.

preprint, 2020, arXiv

Canonicalizing Open Knowledge Bases with Multi-Layered Meta-Graph Neural Network

Noun phrases and relational phrases in Open Knowledge Bases are often not canonical, leading to redundant and ambiguous facts. In this work, we integrate structural information (from which tuple, which sentence) and semantic information (semantic similarity) to perform canonicalization. We represent the two types of information as a multi-layered graph: the structural information forms the links across the sentence, relational phrase, and noun phrase layers; the semantic information forms weighted intra-layer links for each layer. We propose a graph neural network model to aggregate the representations of noun phrases and relational phrases through the multi-layered meta-graph structure. Experiments show that our model outperforms existing approaches on a public dataset in the general domain.

preprint, 2020, arXiv

Denoising individual bias for a fairer binary submatrix detection

Low-rank representation of binary matrices is powerful in disentangling sparse individual-attribute associations and has been widely applied. Existing binary matrix factorization (BMF) or co-clustering (CC) methods often assume i.i.d. background noise. However, this assumption can be easily violated in real data, where heterogeneous row- or column-wise probabilities of binary entries result in disparate element-wise background distributions and undermine the rationale of existing methods. We propose a binary data denoising framework, namely BIND, which optimizes the detection of true patterns by estimating the row- or column-wise mixture distribution of patterns and disparate background, and eliminating the binary attributes that are more likely to come from the background. BIND is supported by thoroughly derived mathematical properties of the row- and column-wise mixture distributions. Our experiments on synthetic and real-world data demonstrate that BIND effectively removes background noise and drastically increases the fairness and accuracy of state-of-the-art BMF and CC methods.

preprint, 2020, arXiv

Fast And Efficient Boolean Matrix Factorization By Geometric Segmentation

Boolean matrices have been used to represent digital information in many fields, including bank transactions, crime records, natural language processing, protein-protein interaction, etc. Boolean matrix factorization (BMF) aims to find an approximation of a binary matrix as the Boolean product of two low-rank Boolean matrices, which can reveal a vast amount of information about the patterns of relationships between the features and samples. Inspired by binary matrix permutation theories and geometric segmentation, we developed a fast and efficient BMF approach called MEBF (Median Expansion for Boolean Factorization). Overall, MEBF adopts a heuristic approach to locate binary patterns presented as submatrices that are dense in 1's. At each iteration, MEBF permutes the rows and columns such that the permuted matrix is approximately Upper Triangular-Like (UTL) with the so-called Simultaneous Consecutive-ones Property (SC1P). The largest submatrix dense in 1's lies in the upper triangular area of the permuted matrix, and its location is determined based on a geometric segmentation of a triangle. We compared MEBF with other state-of-the-art approaches on data scenarios with different sparsity and noise levels. MEBF demonstrated superior performance, with lower reconstruction error, higher computational efficiency, and more accurate sparse patterns than popular methods such as ASSO, PANDA and MP. We demonstrated the application of MEBF on both binary and non-binary data sets, and revealed its further potential in knowledge retrieval and data denoising.
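To make the BMF objective in the abstract above concrete, the sketch below computes a Boolean matrix product (AND in place of multiplication, OR in place of addition) and counts the entries where it disagrees with a target matrix. It does not implement MEBF itself; the function names and the toy matrices are illustrative assumptions.

```python
def boolean_product(a, b):
    """Boolean product of 0/1 matrices: out[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    inner = len(b)
    cols = len(b[0]) if b else 0
    return [
        [int(any(row[k] and b[k][j] for k in range(inner))) for j in range(cols)]
        for row in a
    ]


def reconstruction_error(x, a, b):
    """Number of entries where x differs from the Boolean product of a and b."""
    approx = boolean_product(a, b)
    return sum(xv != av for xr, ar in zip(x, approx) for xv, av in zip(xr, ar))


if __name__ == "__main__":
    a = [[1, 0], [1, 1], [0, 1]]    # 3 x 2 factor
    b = [[1, 1, 0], [0, 1, 1]]      # 2 x 3 factor
    x = [[1, 1, 0], [1, 1, 1], [0, 1, 0]]
    print(boolean_product(a, b))    # the middle entry stays 1, not 2, since 1 OR 1 = 1
    print(reconstruction_error(x, a, b))
```

The difference from an ordinary matrix product shows up wherever two rank-one patterns overlap: the overlapping entry remains 1, which is why Boolean factors can describe overlapping dense submatrices directly.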

preprint, 2020, arXiv

Federated Dynamic GNN with Secure Aggregation

Given video data from multiple personal devices or street cameras, can we exploit the structural and dynamic information to learn dynamic representations of objects for applications such as distributed surveillance, without storing data at a central server, which would violate user privacy? In this work, we introduce Federated Dynamic Graph Neural Network (Feddy), a distributed and secure framework to learn object representations from multi-user graph sequences: i) It aggregates structural information from nearby objects in the current graph as well as dynamic information from those in the previous graph. It uses a self-supervised loss of predicting the trajectories of objects. ii) It is trained in a federated learning manner. The centrally located server sends the model to user devices. Local models on the respective user devices learn and periodically send their learning to the central server without ever exposing the user's data to the server. iii) Studies showed that the aggregated parameters could be inspected though decrypted when broadcast to clients for model synchronizing, after the server performed a weighted average. We design an appropriate aggregation mechanism of secure aggregation primitives that can protect security and privacy in federated learning with scalability. Experiments on four video camera datasets (in four different scenes) as well as simulation demonstrate that Feddy achieves great effectiveness and security.
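The weighted average that the server performs, as mentioned in the abstract above, can be sketched as follows. This is a plain, unencrypted average shown only to illustrate the arithmetic; it is not Feddy's secure aggregation mechanism, and the function name, the flat parameter lists, and the use of local sample counts as weights are assumptions.

```python
def weighted_average(client_params, client_weights):
    """Average flat parameter vectors, weighting each client by its given weight.

    Hypothetical helper: `client_params` is a list of equal-length lists of floats,
    and `client_weights` might be, for example, each client's local sample count.
    """
    if not client_params or len(client_params) != len(client_weights):
        raise ValueError("need one weight per client and at least one client")
    total = sum(client_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(client_params[0])
    # Each coordinate of the global model is the weight-normalized sum across clients.
    return [
        sum(w * params[i] for params, w in zip(client_params, client_weights)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two clients; the second holds three times as much data as the first.
    print(weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a secure aggregation scheme the server would obtain only this aggregate, never the individual client vectors that the sketch passes in the clear.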

preprint, 2020, arXiv

Learning Attribute-Structure Co-Evolutions in Dynamic Graphs

Most graph neural network models learn embeddings of nodes in static attributed graphs for predictive analysis. Recent attempts have been made to learn temporal proximity of the nodes. We find that real dynamic attributed graphs exhibit complex co-evolution of node attributes and graph structure. Learning node embeddings for forecasting changes in node attributes and the birth and death of links over time remains an open problem. In this work, we present a novel framework called CoEvoGNN for modeling dynamic attributed graph sequences. It preserves the impact of earlier graphs on the current graph by generating embeddings through the sequence. It has a temporal self-attention mechanism to model long-range dependencies in the evolution. Moreover, CoEvoGNN optimizes model parameters jointly on two dynamic tasks, attribute inference and link prediction over time, so the model can capture the co-evolutionary patterns of attribute change and link formation. This framework can adapt to any graph neural algorithm, so we implemented and investigated three methods based on it: CoEvoGCN, CoEvoGAT, and CoEvoSAGE. Experiments demonstrate that the framework (and its methods) outperforms strong baselines on predicting an entire unseen graph snapshot of personal attributes and interpersonal links in dynamic social graphs and financial graphs.

preprint, 2020, arXiv

MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals

Given multiple input signals, how can we infer node importance in a knowledge graph (KG)? Node importance estimation is a crucial and challenging task that can benefit a lot of applications including recommendation, search, and query disambiguation. A key challenge towards this goal is how to effectively use input from different sources. On the one hand, a KG is a rich source of information, with multiple types of nodes and edges. On the other hand, there are external input signals, such as the number of votes or pageviews, which can directly tell us about the importance of entities in a KG. While several methods have been developed to tackle this problem, their use of these external signals has been limited as they are not designed to consider multiple signals simultaneously. In this paper, we develop an end-to-end model MultiImport, which infers latent node importance from multiple, potentially overlapping, input signals. MultiImport is a latent variable model that captures the relation between node importance and input signals, and effectively learns from multiple signals with potential conflicts. Also, MultiImport provides an effective estimator based on attentive graph neural networks. We ran experiments on real-world KGs to show that MultiImport handles several challenges involved with inferring node importance from multiple input signals, and consistently outperforms existing methods, achieving up to 23.7% higher NDCG@100 than the state-of-the-art method.

preprint2020arXiv

Octet: Online Catalog Taxonomy Enrichment with Self-Supervision

Taxonomies have found wide applications in various domains, especially online for item categorization, browsing, and search. Despite the prevalent use of online catalog taxonomies, most of them in practice are maintained by humans, which is labor-intensive and difficult to scale. While taxonomy construction from scratch has been studied extensively in the literature, how to effectively enrich existing incomplete taxonomies remains an open yet important research question. Taxonomy enrichment requires not only robustness in dealing with emerging terms but also consistency between the existing taxonomy structure and new term attachment. In this paper, we present a self-supervised end-to-end framework, Octet, for Online Catalog Taxonomy EnrichmenT. Octet leverages heterogeneous information unique to online catalog taxonomies such as user queries, items, and their relations to the taxonomy nodes while requiring no other supervision than the existing taxonomies. We propose to distantly train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure as well as the query-item-taxonomy interactions for term attachment. Extensive experiments in different online domains demonstrate the superiority of Octet over state-of-the-art methods via both automatic and human evaluations. Notably, Octet enriches an online catalog taxonomy in production to twice its original size in the open-world evaluation.

preprint2020arXiv

On the stability of two-flavor and three-flavor quark stars

Following our recently proposed self-consistent mean-field approximation approach, we study the chiral phase transition of strong interaction matter within the framework of the Nambu-Jona-Lasinio (NJL) model. The chiral susceptibility and equation of state (EOS) are computed in this work for both two-flavor and three-flavor quark matter for comparison. The Pauli-Villars scheme, which can preserve gauge invariance, is used in this paper. Moreover, we discuss whether three-flavor quark matter is more stable than two-flavor quark matter in quark stars. In our model, when the bag constants are the same, the two-flavor quark matter has a higher pressure than the three-flavor quark matter, which is different from what Witten proposed in his pioneering work.

preprint2019arXiv

Evaluating entropy rate of laser chaos and shot noise

Evaluating the entropy rate of high-dimensional chaos and shot noise from analog raw signals remains elusive and important in information security. We experimentally present an accurate assessment of the entropy rate for physical process randomness. The entropy generation of optical-feedback laser chaos and the physical randomness limit from shot noise are quantified and unambiguously discriminated using the growth rate of the average permutation entropy value in memory time. The permutation entropy difference of filtered laser chaos with varying embedding delay time is investigated experimentally and theoretically. High-resolution maps of the entropy difference are observed over the range of the injection-feedback parameter space. We also clarify an inverse relationship between the entropy rate and the time delay signature of laser chaos over a wide range of parameters. Compared to the original chaos, the time delay signature is suppressed up to 95% with the minimum of 0.015 via a frequency-band extractor, and the experiment agrees well with the theory. Our system provides a commendable entropy evaluation and source for physical random number generation.
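
The abstract above relies on permutation entropy with an embedding delay. As a rough illustration of that quantity only, the following Python sketch computes the standard normalized (Bandt-Pompe) permutation entropy of a sampled signal. The function name and the `order` and `delay` parameters are assumptions for this example, and the sketch does not reproduce the paper's growth-rate analysis or its experimental processing.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3, delay=1):
    """Normalized permutation entropy in [0, 1] of a sequence of samples.

    order is the embedding dimension (window length in samples taken),
    delay is the embedding delay between consecutive samples in a window.
    """
    n = len(signal) - (order - 1) * delay
    if n <= 0:
        raise ValueError("signal too short for the chosen order and delay")
    patterns = Counter()
    for i in range(n):
        window = [signal[i + j * delay] for j in range(order)]
        # Ordinal pattern: the index order that sorts the window
        # (ties are broken by position, since sorted() is stable).
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        patterns[pattern] += 1
    # Shannon entropy of the observed ordinal-pattern distribution.
    entropy = -sum((c / n) * math.log(c / n) for c in patterns.values())
    # Normalize by the maximum entropy, log(order!).
    return entropy / math.log(math.factorial(order))
```

A strictly increasing signal produces a single ordinal pattern and therefore zero entropy, while a signal whose ordinal patterns are all equally frequent reaches the maximum of 1.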

preprint2016arXiv

GT-SEER: Geo-Temporal SEquential Embedding Rank for Point-of-interest Recommendation

Point-of-interest (POI) recommendation is an important application in location-based social networks (LBSNs), which learns user preferences and mobility patterns from check-in sequences to recommend POIs. However, previous POI recommendation systems model check-in sequences based on either tensor factorization or a Markov chain model, which cannot capture contextual check-in information in sequences. The contextual check-in information implies the complementary functions among POIs that compose an individual's daily check-in sequence. In this paper, we exploit the embedding learning technique to capture the contextual check-in information and further propose the SEquential Embedding Rank (SEER) model for POI recommendation. In particular, the SEER model learns user preferences via a pairwise ranking model under the sequential constraint modeled by the POI embedding learning method. Furthermore, we incorporate two important factors, i.e., temporal influence and geographical influence, into the SEER model to enhance the POI recommendation system. Due to the temporal variance of sequences on different days, we propose a temporal POI embedding model and incorporate the temporal POI representations into a temporal preference ranking model to establish the Temporal SEER (T-SEER) model. In addition, we incorporate the geographical influence into the T-SEER model and develop the Geo-Temporal SEER (GT-SEER) model.

preprint2016arXiv

Performance Analysis of Joint Time Delay and Doppler-Stretch Estimation with Random Stepped-Frequency Signals

This paper investigates the performance of joint time delay and Doppler-stretch estimation with the random stepped-frequency (RSF) signal. Applying the ambiguity function (AF) to implement the estimation, we derive compact expressions for the theoretical mean square errors (MSEs) under high signal-to-noise ratios (SNRs). The obtained MSEs are shown to be consistent with the corresponding Cramer-Rao lower bounds (CRLBs), implying that the AF-based estimation is approximately efficient. Waveform parameters including higher carrier frequencies, wider bandwidth covered by the carrier frequencies, and frequency shifting codewords with larger variance are expected to yield better estimation performance. As a synthetic wideband signal, the RSF signal achieves the same estimation performance as the OFDM signal within an identical bandwidth. Due to its instantaneous narrowband character, the bandwidth required of the receiver is much reduced.

preprint2016arXiv

Saliency Detection with Spaces of Background-based Distribution

In this letter, an effective image saliency detection method is proposed by constructing novel spaces to model the background and redefine the distance of the salient patches away from the background. Concretely, given the backgroundness prior, eigendecomposition is utilized to create four spaces of background-based distribution (SBD) to model the background, in which a more appropriate metric (Mahalanobis distance) is used to measure the saliency of every image patch away from the background. After that, a coarse saliency map is obtained by integrating the four adjusted Mahalanobis distance maps, each of which is formed by the distances between all the patches and the background in the corresponding SBD. To be more discriminative, the coarse saliency map is further enhanced into a posterior probability map from a Bayesian perspective. Finally, the final saliency map is generated by properly refining the posterior probability map with geodesic distance. Experimental results on two commonly used datasets show that the proposed method is effective compared with the state-of-the-art algorithms.

preprint2015arXiv

Bio-Inspired Aggregation Control of Carbon Nanotubes for Ultra-Strong Composites

High-performance nanocomposites require good dispersion and high alignment of the nanometer-sized components at a high mass or volume fraction. However, the road towards such a composite structure is severely hindered by the easy aggregation of these nanometer-sized components. Here we demonstrate a big step towards the ideal composite structure for carbon nanotubes (CNTs), where all the CNTs were highly packed, aligned, and unaggregated, with the impregnated polymers acting as interfacial adhesions and mortars to build up the composite structure. The strategy was based on a bio-inspired aggregation control to limit CNT aggregation to below 20–50 nm, a dimension determined by the CNT growth. After being stretched with full structural relaxation in a multi-step way, the CNT/polymer (bismaleimide) composite yielded super-high tensile strengths up to 6.27–6.94 GPa, more than 100% higher than those of carbon fiber/epoxy composites, and toughnesses up to 117–192 MPa. We anticipate that the present study can be generalized for developing multifunctional and smart nanocomposites where all the surfaces of nanometer-sized components can take part in shear transfer of mechanical, thermal, and electrical signals.

preprint2015arXiv

Studies of two-solar-mass hybrid stars within the framework of Dyson-Schwinger equations

In this paper we introduce an equation of state (EOS) of quark matter within the framework of Dyson-Schwinger equations (DSEs) to study the structure of compact stars. The smooth crossover from hadronic matter to quark matter in the hybrid star is studied. We compare different strategies to obtain crossover EOSs and find a new way to construct two-solar-mass hybrid stars with even a relatively soft quark EOS, while earlier works show that the quark EOS should be stiff enough to support a massive hybrid star.

preprint2012arXiv

Aharon-Vaidman quantum game with a Young-type photonic qutrit

The Aharon-Vaidman (AV) game exemplifies the advantage of using simple quantum systems to outperform classical strategies. We present an experimental test of this advantage by using a three-state quantum system (qutrit) encoded in a spatial mode of a single photon passing through three slits. The preparation of a particular state is controlled as the photon propagates through the slits by varying the number of open slits and their respective phases. The measurements are achieved by placing detectors in the specific positions in the near and far fields after the slits. This set of tools allowed us to perform tomographic reconstructions of generalized qutrit states, and to implement the quantum version of the AV game with compelling evidence of the quantum advantage.

Tong Zhao

32 published item(s)

Approximately Optimal Global Planning for Contact-Rich SE(2) Manipulation on a Graph of Reachable Sets

Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation

From Failure to Mastery: Generating Hard Samples for Tool-use Agents

Limits to Extracting Neutron-Star Physics Constraints from NICER Pulse Profiles

Physics-Driven Data Generation for Contact-Rich Manipulation via Trajectory Optimization

V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

Graph Data Augmentation for Graph Machine Learning: A Survey

Coarse-to-Fine Sparse Sequential Recommendation

Diversifying Content Generation for Commonsense Reasoning with Mixture of Knowledge Graph Experts

Formal Certification Methods for Automated Vehicle Safety Assessment

Learning from Counterfactual Links for Link Prediction

Learning Personalized Item-to-Item Recommendation Metric via Implicit Feedback

Neural-PDE: A RNN based neural network for solving time dependent PDEs

RETE: Retrieval-Enhanced Temporal Event Forecasting on Unified Query Product Evolutionary Graph

Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training

A Probabilistic Model with Commonsense Constraints for Pattern-based Temporal Fact Extraction

AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types

Canonicalizing Open Knowledge Bases with Multi-Layered Meta-Graph Neural Network

Denoising individual bias for a fairer binary submatrix detection

Fast And Efficient Boolean Matrix Factorization By Geometric Segmentation

Federated Dynamic GNN with Secure Aggregation

Learning Attribute-Structure Co-Evolutions in Dynamic Graphs

MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals

Octet: Online Catalog Taxonomy Enrichment with Self-Supervision

On the stability of two-flavor and three-flavor quark stars

Evaluating entropy rate of laser chaos and shot noise

GT-SEER: Geo-Temporal SEquential Embedding Rank for Point-of-interest Recommendation

Performance Analysis of Joint Time Delay and Doppler-Stretch Estimation with Random Stepped-Frequency Signals

Saliency Detection with Spaces of Background-based Distribution

Bio-Inspired Aggregation Control of Carbon Nanotubes for Ultra-Strong Composites

Studies of two-solar-mass hybrid stars within the framework of Dyson-Schwinger equations

Aharon-Vaidman quantum game with a Young-type photonic qutrit