Source author record

Yu Rong

Yu Rong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, Artificial Intelligence, astro-ph.GA, Social and Information Networks, Computer Vision, Cryptography and Security, Databases, Quantitative Methods, astro-ph.CO, astro-ph.HE, astro-ph.SR, Computer Science and Game Theory, Information Retrieval, Multiagent Systems, Networking and Internet Architecture

Catalog footprint

What is connected

36 works

15 topics

4 close collaborators

preprint, 2026, arXiv

Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

SFT and RLVR represent two fundamental yet distinct paradigms for LLM post-training, each excelling in distinct dimensions. SFT expands knowledge breadth while RLVR enhances reasoning depth. Yet integrating these complementary strengths remains a formidable challenge. Sequential training can cause catastrophic forgetting, and joint optimization often suffers from severe gradient conflicts. We analyze SFT and RLVR through the lens of task vectors and reveal three structural properties behind these failures: a 30* magnitude disparity, 45* sign interference, and heterogeneous module-wise update distributions. These findings show SFT and RLVR are difficult to integrate directly, but they also suggest that the two paradigms modify partly complementary components of the model. Motivated by these observations, we propose Decoupled Test-time Synthesis (DoTS), a post-hoc framework that allows SFT and RLVR checkpoints to be trained independently and synthesizes their capabilities only at inference time via task vector arithmetic, without updating model parameters. To reduce interference, DoTS applies selective sparsification with norm-preserving rescaling. It then uses Bayesian optimization on a small set of unlabeled queries to search for combination coefficients on the Pareto frontier of consistency and perplexity. Empirically, DoTS matches or exceeds the performance of training-based SFT-RLVR integration methods across multiple mathematical reasoning benchmarks, while incurring only ~3% of the computational cost. When applied to stronger post-trained checkpoints, DoTS surpasses SOTA models and generalizes to out-of-domain benchmarks without re-tuning. Code is available at https://github.com/chaohaoyuan/DoTS.
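The task-vector arithmetic described in this abstract can be illustrated with a toy sketch on flat weight lists. This is not the authors' implementation: the function names are hypothetical, the top-magnitude selection rule is an assumption (the abstract does not state which entries are kept), and the coefficients `alpha` and `beta` are passed in by hand rather than found by Bayesian optimization.

```python
import math

def task_vector(finetuned, base):
    """Task vector: element-wise difference between fine-tuned and base weights."""
    return [f - b for f, b in zip(finetuned, base)]

def l2_norm(vec):
    """Euclidean norm of a flat list of weights."""
    return math.sqrt(sum(x * x for x in vec))

def sparsify_keep_norm(vec, keep_ratio):
    """Keep the largest-magnitude entries, zero the rest, then rescale so the
    L2 norm matches the input norm. The selection rule is an assumption."""
    k = max(1, int(round(keep_ratio * len(vec))))
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    sparse = [x if i in keep else 0.0 for i, x in enumerate(vec)]
    sparse_norm = l2_norm(sparse)
    if sparse_norm == 0.0:
        return sparse  # all-zero task vector: nothing to rescale
    scale = l2_norm(vec) / sparse_norm
    return [x * scale for x in sparse]

def synthesize(base, tv_sft, tv_rlvr, alpha, beta):
    """Combine two task vectors with the base weights; no training step involved."""
    return [b + alpha * s + beta * r for b, s, r in zip(base, tv_sft, tv_rlvr)]
```

In this sketch the two checkpoints are only ever read, never updated, which mirrors the post-hoc character of the approach described above.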

preprint, 2022, arXiv

A Cooperative-Competitive Multi-Agent Framework for Auto-bidding in Online Advertising

In online advertising, auto-bidding has become an essential tool for advertisers to optimize their preferred ad performance metrics by simply expressing high-level campaign objectives and constraints. Previous works designed auto-bidding tools from a single-agent view, without modeling the mutual influence between agents. In this paper, we instead consider this problem from a distributed multi-agent perspective, and propose a general Multi-Agent reinforcement learning framework for Auto-Bidding, namely MAAB, to learn the auto-bidding strategies. First, we investigate the competition and cooperation relation among auto-bidding agents, and propose a temperature-regularized credit assignment to establish a mixed cooperative-competitive paradigm. By carefully making a competition and cooperation trade-off among agents, we can reach an equilibrium state that guarantees not only each individual advertiser's utility but also the system performance (i.e., social welfare). Second, to avoid the potential collusion behaviors of bidding low prices underlying the cooperation, we further propose bar agents to set a personalized bidding bar for each agent, and thereby alleviate the revenue degradation due to the cooperation. Third, to deploy MAAB in a large-scale advertising system with millions of advertisers, we propose a mean-field approach. By grouping advertisers with the same objective as a mean auto-bidding agent, the interactions among the large-scale advertisers are greatly simplified, making it practical to train MAAB efficiently. Extensive experiments on an offline industrial dataset and the Alibaba advertising platform demonstrate that our approach outperforms several baseline methods in terms of social welfare and revenue.

preprint, 2022, arXiv

A Survey of Trustworthy Graph Learning: Reliability, Explainability, and Privacy Protection

Deep graph learning has achieved remarkable progress in both business and scientific areas ranging from finance and e-commerce to drug and advanced material discovery. Despite this progress, how to ensure that various deep graph learning algorithms behave in a socially responsible manner and meet regulatory compliance requirements has become an emerging problem, especially in risk-sensitive domains. Trustworthy graph learning (TwGL) aims to solve the above problems from a technical viewpoint. In contrast to conventional graph learning research, which mainly cares about model performance, TwGL considers various reliability and safety aspects of the graph learning framework including but not limited to robustness, explainability, and privacy. In this survey, we provide a comprehensive review of recent leading approaches in the TwGL field from three dimensions, namely, reliability, explainability, and privacy protection. We give a general categorization for existing work and review typical work for each category. To give further insights for TwGL research, we provide a unified view to inspect previous works and build the connection between them. We also point out some important open problems remaining to be solved in the future developments of TwGL.

preprint, 2022, arXiv

Adversarial Attack Framework on Graph Embedding Models with Limited Knowledge

With the success of graph embedding models in both academia and industry, the robustness of graph embedding against adversarial attack inevitably becomes a crucial problem in graph learning. Existing works usually perform the attack in a white-box fashion: they need to access the predictions/labels to construct their adversarial loss. However, the inaccessibility of predictions/labels makes the white-box attack impractical for a real graph learning system. This paper extends current frameworks in a more general and flexible sense: we aim to attack various kinds of graph embedding models in a black-box setting. We investigate the theoretical connections between graph signal processing and graph embedding models and formulate the graph embedding model as a general graph signal process with a corresponding graph filter. Based on this formulation, we design a generalized adversarial attacker: GF-Attack. Without accessing any labels and model predictions, GF-Attack can perform the attack directly on the graph filter in a black-box fashion. We further prove that GF-Attack can perform an effective attack without knowing the number of layers of graph embedding models. To validate the generalization of GF-Attack, we construct the attacker on four popular graph embedding models. Extensive experiments validate the effectiveness of GF-Attack on several benchmark datasets.

preprint, 2022, arXiv

DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations

AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its extensive use in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with noise, which is inevitable in real world AIDD applications. In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction, which involves both macromolecule (protein target) and small-molecule (drug compound). In contrast to only providing fixed datasets, DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms. Since the molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies have shown a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need to develop better schemes that can allow for OOD generalization under noise for AIDD.

preprint, 2022, arXiv

Energy-Based Learning for Cooperative Games, with Applications to Valuation Problems in Machine Learning

Valuation problems, such as feature interpretation, data valuation and model valuation for ensembles, become increasingly more important in many machine learning applications. Such problems are commonly solved by well-known game-theoretic criteria, such as Shapley value or Banzhaf value. In this work, we present a novel energy-based treatment for cooperative games, with a theoretical justification by the maximum entropy framework. Surprisingly, by conducting variational inference of the energy-based model, we recover various game-theoretic valuation criteria through conducting one-step fixed point iteration for maximizing the mean-field ELBO objective. This observation also verifies the rationality of existing criteria, as they are all attempting to decouple the correlations among the players through the mean-field approach. By running fixed point iteration for multiple steps, we achieve a trajectory of the valuations, among which we define the valuation with the best conceivable decoupling error as the Variational Index. We prove that under uniform initializations, these variational valuations all satisfy a set of game-theoretic axioms. We experimentally demonstrate that the proposed Variational Index enjoys lower decoupling error and better valuation performance on certain synthetic and real-world valuation problems.
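For readers unfamiliar with the game-theoretic criteria this abstract refers to, the classical Shapley value can be computed exactly for a very small game by averaging marginal contributions over all player orderings. The sketch below shows only that classical baseline, not the paper's energy-based Variational Index; `shapley_values` and `glove_game` are hypothetical names, and the three-player glove game is a textbook toy chosen for illustration.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering of the players (feasible only for tiny games)."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}

def glove_game(coalition):
    """Toy game: a coalition is worth 1 only if it holds the left glove 'L'
    and at least one right glove ('R1' or 'R2')."""
    has_left = "L" in coalition
    has_right = bool(coalition & {"R1", "R2"})
    return 1.0 if has_left and has_right else 0.0

values = shapley_values(["L", "R1", "R2"], glove_game)
```

Here the scarce left glove receives 2/3 of the total worth and each right glove receives 1/6, and the three values sum to the worth of the full coalition. The enumeration grows factorially with the number of players, which is one reason approximate or variational treatments are of interest.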

preprint, 2022, arXiv

Equivariant Graph Mechanics Networks with Constraints

Learning to reason about relations and dynamics over multiple interacting objects is a challenging topic in machine learning. The challenges mainly stem from that the interacting systems are exponentially-compositional, symmetrical, and commonly geometrically-constrained. Current methods, particularly the ones based on equivariant Graph Neural Networks (GNNs), have targeted on the first two challenges but remain immature for constrained systems. In this paper, we propose Graph Mechanics Network (GMN) which is combinatorially efficient, equivariant and constraint-aware. The core of GMN is that it represents, by generalized coordinates, the forward kinematics information (positions and velocities) of a structural object. In this manner, the geometrical constraints are implicitly and naturally encoded in the forward kinematics. Moreover, to allow equivariant message passing in GMN, we have developed a general form of orthogonality-equivariant functions, given that the dynamics of constrained systems are more complicated than the unconstrained counterparts. Theoretically, the proposed equivariant formulation is proved to be universally expressive under certain conditions. Extensive experiments support the advantages of GMN compared to the state-of-the-art GNNs in terms of prediction accuracy, constraint satisfaction and data efficiency on the simulated systems consisting of particles, sticks and hinges, as well as two real-world datasets for molecular dynamics prediction and human motion capture.

preprint, 2022, arXiv

Fine-Tuning Graph Neural Networks via Graph Topology induced Optimal Transport

Recently, the pretrain-finetuning paradigm has attracted tons of attention in graph learning community due to its power of alleviating the lack of labels problem in many real-world applications. Current studies use existing techniques, such as weight constraint, representation constraint, which are derived from images or text data, to transfer the invariant knowledge from the pre-train stage to fine-tuning stage. However, these methods failed to preserve invariances from graph structure and Graph Neural Network (GNN) style models. In this paper, we present a novel optimal transport-based fine-tuning framework called GTOT-Tuning, namely, Graph Topology induced Optimal Transport fine-Tuning, for GNN style backbones. GTOT-Tuning is required to utilize the property of graph data to enhance the preservation of representation produced by fine-tuned networks. Toward this goal, we formulate graph local knowledge transfer as an Optimal Transport (OT) problem with a structural prior and construct the GTOT regularizer to constrain the fine-tuned model behaviors. By using the adjacency relationship amongst nodes, the GTOT regularizer achieves node-level optimal transport procedures and reduces redundant transport procedures, resulting in efficient knowledge transfer from the pre-trained models. We evaluate GTOT-Tuning on eight downstream tasks with various GNN backbones and demonstrate that it achieves state-of-the-art fine-tuning performance for GNNs.

preprint, 2022, arXiv

Frustratingly Easy Transferability Estimation

Transferability estimation has been an essential tool for selecting a pre-trained model, and the layers in it to transfer, so as to maximize the performance on a target task and prevent negative transfer. Existing estimation algorithms either require intensive training on target tasks or have difficulties in evaluating the transferability between layers. To this end, we propose a simple, efficient, and effective transferability measure named TransRate. Through a single pass over examples of a target task, TransRate measures the transferability as the mutual information between features of target examples extracted by a pre-trained model and their labels. We overcome the challenge of efficient mutual information estimation by resorting to coding rate, which serves as an effective alternative to entropy. From the perspective of feature representation, the resulting TransRate evaluates both completeness (whether features contain sufficient information of a target task) and compactness (whether features of each class are compact enough for good generalization) of pre-trained features. Theoretically, we have analyzed the close connection of TransRate to the performance after transfer learning. Despite its extraordinary simplicity in 10 lines of code, TransRate performs remarkably well in extensive evaluations on 32 pre-trained models and 16 downstream tasks.

preprint, 2022, arXiv

Local Augmentation for Graph Neural Networks

Graph Neural Networks (GNNs) have achieved remarkable performance on graph-based tasks. The key idea for GNNs is to obtain informative representation through aggregating information from local neighborhoods. However, it remains an open question whether the neighborhood information is adequately aggregated for learning representations of nodes with few neighbors. To address this, we propose a simple and efficient data augmentation strategy, local augmentation, to learn the distribution of the node features of the neighbors conditioned on the central node's feature and enhance GNN's expressive power with generated features. Local augmentation is a general framework that can be applied to any GNN model in a plug-and-play manner. It samples feature vectors associated with each node from the learned conditional distribution as additional input for the backbone model at each training iteration. Extensive experiments and analyses show that local augmentation consistently yields performance improvement when applied to various GNN architectures across a diverse set of benchmarks. For example, experiments show that plugging local augmentation into GCN and GAT improves test accuracy by an average of 3.4% and 1.6%, respectively, on Cora, Citeseer, and Pubmed. Besides, our experimental results on large graphs (OGB) show that our model consistently improves performance over backbones. Code is available at https://github.com/SongtaoLiu0823/LAGNN.

preprint, 2022, arXiv

Neighbour Interaction based Click-Through Rate Prediction via Graph-masked Transformer

Click-Through Rate (CTR) prediction, which aims to estimate the probability that a user will click an item, is an essential component of online advertising. Existing methods mainly attempt to mine user interests from users' historical behaviours, which contain users' directly interacted items. Although these methods have made great progress, they are often limited by the recommender system's direct exposure and inactive interactions, and thus fail to mine all potential user interests. To tackle these problems, we propose Neighbor-Interaction based CTR prediction (NI-CTR), which considers this task under a Heterogeneous Information Network (HIN) setting. In short, Neighbor-Interaction based CTR prediction involves the local neighborhood of the target user-item pair in the HIN to predict their linkage. In order to guide the representation learning of the local neighbourhood, we further consider different kinds of interactions among the local neighborhood nodes from both explicit and implicit perspectives, and propose a novel Graph-Masked Transformer (GMT) to effectively incorporate these kinds of interactions to produce highly representative embeddings for the target user-item pair. Moreover, in order to improve model robustness against neighbour sampling, we enforce a consistency regularization loss over the neighbourhood embedding. We conduct extensive experiments on two real-world datasets with millions of instances and the experimental results show that our proposed method outperforms state-of-the-art CTR models significantly. Meanwhile, the comprehensive ablation studies verify the effectiveness of every component of our model. Furthermore, we have deployed this framework on the WeChat Official Account Platform with billions of users. The online A/B tests demonstrate an average CTR improvement of 21.9 against all online baselines.

preprint, 2022, arXiv

Query Driven-Graph Neural Networks for Community Search: From Non-Attributed, Attributed, to Interactive Attributed

Given one or more query vertices, Community Search (CS) aims to find densely intra-connected and loosely inter-connected structures containing query vertices. Attributed Community Search (ACS), a related problem, is more challenging since it finds communities with both cohesive structures and homogeneous vertex attributes. However, most methods for the CS task rely on inflexible pre-defined structures and studies for ACS treat each attribute independently. Moreover, the most popular ACS strategies decompose ACS into two separate sub-problems, i.e., the CS task and subsequent attribute filtering task. However, in real-world graphs, the community structure and the vertex attributes are closely correlated to each other. This correlation is vital for the ACS problem. In this paper, we propose Graph Neural Network models for both CS and ACS problems, i.e., Query Driven-GNN and Attributed Query Driven-GNN. In QD-GNN, we combine the local query-dependent structure and global graph embedding. In order to extend QD-GNN to handle attributes, we model vertex attributes as a bipartite graph and capture the relation between attributes by constructing GNNs on this bipartite graph. With a Feature Fusion operator, AQD-GNN processes the structure and attribute simultaneously and predicts communities according to each attributed query. Experiments on real-world graphs with ground-truth communities demonstrate that the proposed models outperform existing CS and ACS algorithms in terms of both efficiency and effectiveness. More recently, an interactive setting for CS is proposed that allows users to adjust the predicted communities. We further verify our approaches under the interactive setting and extend to the attributed context. Our method achieves 2.37% and 6.29% improvements in F1-score than the state-of-the-art model without attributes and with attributes respectively.

preprint, 2022, arXiv

Similarity-aware Positive Instance Sampling for Graph Contrastive Pre-training

Graph instance contrastive learning has proven to be an effective task for Graph Neural Network (GNN) pre-training. However, one key issue may seriously impede the representative power in existing works: positive instances created by current methods often miss crucial information of graphs or even yield illegal instances (such as non-chemically-aware graphs in molecular generation). To remedy this issue, we propose to select positive graph instances directly from existing graphs in the training set, which ultimately maintains the legality and similarity to the target graphs. Our selection is based on certain domain-specific pair-wise similarity measurements as well as sampling from a hierarchical graph encoding similarity relations among graphs. Besides, we develop an adaptive node-level pre-training method to dynamically mask nodes to distribute them evenly in the graph. We conduct extensive experiments on 13 graph classification and node classification benchmark datasets from various domains. The results demonstrate that the GNN models pre-trained by our strategies can outperform those trained-from-scratch models as well as the variants obtained by existing methods.

preprint, 2022, arXiv

Smoothing Matters: Momentum Transformer for Domain Adaptive Semantic Segmentation

After the great success of Vision Transformer variants (ViTs) in computer vision, they have also demonstrated great potential in domain adaptive semantic segmentation. Unfortunately, straightforwardly applying local ViTs in domain adaptive semantic segmentation does not bring the expected improvement. We find that the pitfall of local ViTs is due to the severe high-frequency components generated during both the pseudo-label construction and feature alignment for target domains. These high-frequency components make the training of local ViTs very unsmooth and hurt their transferability. In this paper, we introduce a low-pass filtering mechanism, momentum network, to smooth the learning dynamics of target domain features and pseudo labels. Furthermore, we propose a dynamic of discrepancy measurement to align the distributions in the source and target domains via dynamic weights to evaluate the importance of the samples. After tackling the above issues, extensive experiments on sim2real benchmarks show that the proposed method outperforms the state-of-the-art methods. Our code is available at https://github.com/alpc91/TransDA

preprint, 2022, arXiv

Tackling Over-Smoothing for General Graph Convolutional Networks

Increasing the depth of GCN, which is expected to permit more expressivity, is shown to incur performance detriment especially on node classification. The main cause of this lies in over-smoothing. The over-smoothing issue drives the output of GCN towards a space that contains limited distinguished information among nodes, leading to poor expressivity. Several works on refining the architecture of deep GCN have been proposed, but it is still unknown in theory whether or not these refinements are able to relieve over-smoothing. In this paper, we first theoretically analyze how general GCNs act with the increase in depth, including generic GCN, GCN with bias, ResGCN, and APPNP. We find that all these models are characterized by a universal process: all nodes converging to a cuboid. Upon this theorem, we propose DropEdge to alleviate over-smoothing by randomly removing a certain number of edges at each training epoch. Theoretically, DropEdge either reduces the convergence speed of over-smoothing or relieves the information loss caused by dimension collapse. Experimental evaluations on simulated dataset have visualized the difference in over-smoothing between different GCNs. Moreover, extensive experiments on several real benchmarks support that DropEdge consistently improves the performance on a variety of both shallow and deep GCNs.

preprint, 2022, arXiv

Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis

The ability to synthesize long-term human motion sequences in real-world scenes can facilitate numerous applications. Previous approaches for scene-aware motion synthesis are constrained by pre-defined target objects or positions and thus limit the diversity of human-scene interactions for synthesized motions. In this paper, we focus on the problem of synthesizing diverse scene-aware human motions under the guidance of target action sequences. To achieve this, we first decompose the diversity of scene-aware human motions into three aspects, namely interaction diversity (e.g. sitting on different objects with different poses in the given scenes), path diversity (e.g. moving to the target locations following different paths), and the motion diversity (e.g. having various body movements during moving). Based on this factorized scheme, a hierarchical framework is proposed, with each sub-module responsible for modeling one aspect. We assess the effectiveness of our framework on two challenging datasets for scene-aware human motion synthesis. The experiment results show that the proposed framework remarkably outperforms previous methods in terms of diversity and naturalness.

preprint, 2022, arXiv

Transformer for Graphs: An Overview from Architecture Perspective

Recently, Transformer model, which has achieved great success in many artificial intelligence fields, has demonstrated its great potential in modeling graph-structured data. Till now, a great variety of Transformers has been proposed to adapt to the graph-structured data. However, a comprehensive literature review and systematical evaluation of these Transformer variants for graphs are still unavailable. It's imperative to sort out the existing Transformer models for graphs and systematically investigate their effectiveness on various graph tasks. In this survey, we provide a comprehensive review of various Graph Transformer models from the architectural design perspective. We first disassemble the existing models and conclude three typical ways to incorporate the graph information into the vanilla Transformer: 1) GNNs as Auxiliary Modules, 2) Improved Positional Embedding from Graphs, and 3) Improved Attention Matrix from Graphs. Furthermore, we implement the representative components in three groups and conduct a comprehensive comparison on various kinds of famous graph data benchmarks to investigate the real performance gain of each component. Our experiments confirm the benefits of current graph-specific modules on Transformer and reveal their advantages on different kinds of graph tasks.

preprint, 2021, arXiv

Towards Expectation-Maximization by SQL in RDBMS

Integrating machine learning techniques into RDBMSs is an important task since there are many real applications that require modeling (e.g., business intelligence, strategic analysis) as well as querying data in RDBMSs. In this paper, we provide an SQL solution that has the potential to support different machine learning models. As an example, we study how to support unsupervised probabilistic modeling, which has a wide range of applications in clustering, density estimation and data summarization, and focus on Expectation-Maximization (EM) algorithms, which are a general technique for finding maximum likelihood estimators. To train a model by EM, the model parameters are updated by an E-step and an M-step in a while-loop, iterating until the model converges to a level controlled by some threshold or a certain number of iterations has been reached. To support EM in RDBMSs, we show our answers to the matrix/vector representations in RDBMSs, the relational algebra operations to support the linear algebra operations required by EM, parameter updates by relational algebra, and the support of a while-loop. It is important to note that the SQL'99 recursion cannot be used to handle such a while-loop since the M-step is non-monotonic. In addition, assuming that a model has been trained by an EM algorithm, we further design an automatic in-database model maintenance mechanism to maintain the model when the underlying training data changes. We have conducted experimental studies and will report our findings in this paper.
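The E-step/M-step while-loop that this abstract describes can be sketched outside a database as follows. This is a plain-Python illustration for a one-dimensional Gaussian mixture, not the paper's SQL formulation; the function names, the variance floor, and the default `tol` and `max_iter` values are assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, tol=1e-6, max_iter=200):
    """Fit a 1-D Gaussian mixture with EM. The loop stops when the
    log-likelihood gain drops below tol or after max_iter iterations."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    prev_ll = -math.inf
    iterations = 0
    while iterations < max_iter:
        # E-step: responsibility of each component for each data point.
        resp = []
        ll = 0.0
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])
        # M-step: re-estimate parameters from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # assumed floor to avoid a zero variance
            weights[j] = nj / len(data)
        iterations += 1
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return means, variances, weights, iterations
```

Each M-step overwrites the previous parameters rather than adding to them, which gives some intuition for why the abstract calls the M-step non-monotonic.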

preprint, 2020, arXiv

Adversarial Attack on Community Detection by Hiding Individuals

It has been demonstrated that adversarial graphs, i.e., graphs with imperceptible perturbations added, can cause deep graph models to fail on node/graph classification tasks. In this paper, we extend adversarial graphs to the problem of community detection which is much more difficult. We focus on black-box attack and aim to hide targeted individuals from the detection of deep graph community detection models, which has many applications in real-world scenarios, for example, protecting personal privacy in social networks and understanding camouflage patterns in transaction networks. We propose an iterative learning framework that takes turns to update two modules: one working as the constrained graph generator and the other as the surrogate community detection model. We also find that the adversarial graphs generated by our method can be transferred to other learning based community detection models.

preprint, 2020, arXiv

Chasing the Tail in Monocular 3D Human Reconstruction with Prototype Memory

Deep neural networks have achieved great progress in single-image 3D human reconstruction. However, existing methods still fall short in predicting rare poses. The reason is that most of the current models perform regression based on a single human prototype, which is similar to common poses while far from the rare poses. In this work, we 1) identify and analyze this learning obstacle and 2) propose a prototype memory-augmented network, PM-Net, that effectively improves performances of predicting rare poses. The core of our framework is a memory module that learns and stores a set of 3D human prototypes capturing local distributions for either common poses or rare poses. With this formulation, the regression starts from a better initialization, which is relatively easier to converge. Extensive experiments on several widely employed datasets demonstrate the proposed framework's effectiveness compared to other state-of-the-art methods. Notably, our approach significantly improves the models' performances on rare poses while generating comparable results on other samples.

preprint, 2020, arXiv

DropEdge: Towards Deep Graph Convolutional Networks on Node Classification

Over-fitting and over-smoothing are two main obstacles to developing deep Graph Convolutional Networks (GCNs) for node classification. In particular, over-fitting weakens the generalization ability on small datasets, while over-smoothing impedes model training by isolating output representations from the input features with the increase in network depth. This paper proposes DropEdge, a novel and flexible technique to alleviate both issues. At its core, DropEdge randomly removes a certain number of edges from the input graph at each training epoch, acting like a data augmenter and also a message passing reducer. Furthermore, we theoretically demonstrate that DropEdge either reduces the convergence speed of over-smoothing or relieves the information loss caused by it. More importantly, our DropEdge is a general skill that can be equipped with many other backbone models (e.g. GCN, ResGCN, GraphSAGE, and JKNet) for enhanced performance. Extensive experiments on several benchmarks verify that DropEdge consistently improves the performance on a variety of both shallow and deep GCNs. The effect of DropEdge on preventing over-smoothing is empirically visualized and validated as well. Code is released at https://github.com/DropEdge/DropEdge.
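The core step described in this abstract, removing a random subset of edges at every training epoch, can be sketched on a plain edge list. This is a minimal illustration and not the code from the linked repository; `drop_edge`, its `drop_rate` parameter, and the rounding rule for the number of dropped edges are assumptions.

```python
import random

def drop_edge(edges, drop_rate, rng):
    """Return a copy of the edge list with round(drop_rate * len(edges))
    edges removed uniformly at random. The input list is left unchanged."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    n_drop = int(round(drop_rate * len(edges)))
    dropped = set(rng.sample(range(len(edges)), n_drop))
    return [e for i, e in enumerate(edges) if i not in dropped]

# Resample a different sparsified graph for every epoch.
full_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
rng = random.Random(0)
for epoch in range(3):
    epoch_edges = drop_edge(full_edges, 0.4, rng)
    # a GCN forward/backward pass on epoch_edges would go here
```

Because the full edge list is kept and only a fresh copy is thinned each epoch, the model sees a different perturbed graph every time, which is the data-augmentation view mentioned above.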

preprint2020arXiv

Exploring the origin of ultra-diffuse galaxies in clusters from their primordial alignment

We find that the minor axes of the ultra-diffuse galaxies (UDGs) in Abell 2634 tend to be aligned with the major axis of the central dominant galaxy, at a $\gtrsim 95\%$ confidence level. This alignment is produced by the bright UDGs with the absolute magnitudes $M_r<-15.3$ mag, and outer-region UDGs with $R>0.5R_{200}$. The alignment signal implies that these bright, outer-region UDGs are very likely to acquire their angular momenta from the vortices around the large-scale filament before they were accreted into A2634, and form their extended stellar bodies outside of the cluster; in this scenario, the orientations of their primordial angular momenta, which are roughly shown by their minor axes on the images, should tend to be parallel to the elongation of the large-scale filament. When these UDGs fell into the unrelaxed cluster A2634 along the filament, they could still preserve their primordial alignment signal before violent relaxation and encounters. These bright, outer-region UDGs in A2634 are very unlikely to be the descendants of the high-surface-brightness dwarf progenitors under tidal interactions with the central dominant galaxy in the cluster environment. Our results indicate that the primordial alignment could be a useful probe of the origin of UDGs in large-scale structures.

preprint2020arXiv

FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration

Although the essential nuance of human motion is often conveyed as a combination of body movements and hand gestures, the existing monocular motion capture approaches mostly focus on either body motion capture only, ignoring hand parts, or hand motion capture only, without considering body motion. In this paper, we present FrankMocap, a motion capture system that can estimate both 3D hand and body motion from in-the-wild monocular inputs with faster speed (9.5 fps) and better accuracy than previous work. Our method works in near real-time (9.5 fps) and produces 3D body and hand motion capture outputs as a unified parametric model structure. Our method aims to capture 3D body and hand motion simultaneously from challenging in-the-wild monocular videos. To construct FrankMocap, we build the state-of-the-art monocular 3D "hand" motion capture method by taking the hand part of the whole body parametric model (SMPL-X). Our 3D hand motion capture output can be efficiently integrated into monocular body motion capture output, producing whole body motion results in a unified parametric model structure. We demonstrate the state-of-the-art performance of our hand motion capture system in public benchmarks, and show the high quality of our whole body motion capture result in various challenging real-world scenes, including a live demo scenario.

preprint2020arXiv

Graph Ordering: Towards the Optimal by Learning

Graph representation learning has achieved remarkable success in many graph-based applications, such as node classification, link prediction, and community detection. These models are usually designed to preserve the vertex information at different granularities and reduce the problems in discrete space to machine learning tasks in continuous space. However, despite this fruitful progress, some kinds of graph applications, such as graph compression and edge partition, are very hard to reduce to graph representation learning tasks. Moreover, these problems are closely related to reformulating a global layout for a specific graph, which is an important NP-hard combinatorial optimization problem: graph ordering. In this paper, we propose to attack the graph ordering problem behind such applications by a novel learning approach. Distinguished from greedy algorithms based on predefined heuristics, we propose a neural network model, Deep Order Network (DON), to capture the hidden locality structure from partial vertex order sets. Supervised by sampled partial order, DON has the ability to infer unseen combinations. Furthermore, to alleviate the combinatorial explosion in the training space of DON and make partial vertex order sampling efficient, we employ a reinforcement learning model, the Policy Network, to adjust the partial order sampling probabilities automatically during the training phase of DON. In this way, the Policy Network can improve the training efficiency and guide DON to evolve towards a more effective model automatically. Comprehensive experiments on both synthetic and real data validate that DON-RL outperforms the current state-of-the-art heuristic algorithm consistently. Two case studies on graph compression and edge partitioning demonstrate the potential power of DON-RL in real applications.

preprint2020arXiv

Graph Representation Learning via Graphical Mutual Information Maximization

The richness in the content of various information networks such as social networks and communication networks provides the unprecedented potential for learning high-quality expressive representations without external supervision. This paper investigates how to preserve and extract the abundant information from graph-structured data into embedding space in an unsupervised manner. To this end, we propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations. GMI generalizes the idea of conventional mutual information computations from vector space to the graph domain where measuring mutual information from two aspects of node features and topological structure is indispensable. GMI exhibits several benefits: First, it is invariant to the isomorphic transformation of input graphs---an inevitable constraint in many existing graph representation learning algorithms; Besides, it can be efficiently estimated and maximized by current mutual information estimation methods such as MINE; Finally, our theoretical analysis confirms its correctness and rationality. With the aid of GMI, we develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder. Considerable experiments on transductive as well as inductive node classification and link prediction demonstrate that our method outperforms state-of-the-art unsupervised counterparts, and even sometimes exceeds the performance of supervised ones.

preprint2020arXiv

Intrinsic Morphology of Ultra-diffuse Galaxies

With the published data of apparent axis ratios for 1109 ultra-diffuse galaxies (UDGs) located in 17 low-redshift (z ~ 0.020 - 0.063) galaxy clusters and 84 UDGs in 2 intermediate-redshift (z ~ 0.308 - 0.348) clusters, we take advantage of a Markov Chain Monte Carlo approach and assume a ubiquitous triaxial model to investigate the intrinsic morphologies of UDGs. In contrast to the conclusion of Burkert (2017), i.e., the underlying shapes of UDGs are purely prolate ($C=B<A$), we find that the data favor the oblate-triaxial models ($C<B\lesssim A$) over the nearly prolate ones. We also find that the intrinsic morphologies of UDGs are related to their stellar masses/luminosities, environments, and redshifts. First, for the low-redshift UDGs in the same environment, the more-luminous ones are always thicker than the less-luminous counterparts, possibly due to the more violent internal supernovae feedback or external tidal interactions for the progenitors of the more-luminous UDGs. The UDG thickness dependence on luminosity is distinct from that of the typical quiescent dwarf ellipticals (dEs) and dwarf spheroidals (dSphs) in the local clusters and groups, but resembles that of massive galaxies; in this sense, UDGs may not be simply treated as an extension of the dE/dSph class with similar evolutionary histories. Second, for the low-redshift UDGs within the same luminosity range, the ones with smaller cluster-centric distances are more puffed-up, probably owing to tidal interactions. Finally, the intermediate-redshift cluster UDGs are more flattened, which plausibly suggests a 'disky' origin for high-redshift, initial UDGs.

preprint2020arXiv

Inverse Graph Identification: Can We Identify Node Labels Given Graph Labels?

Graph Identification (GI) has long been researched in graph learning and is essential in certain applications (e.g. social community detection). Specifically, GI requires predicting the label/score of a target graph given its collection of node features and edge connections. While this task is common, more complex cases arise in practice: we are supposed to do the inverse by, for example, grouping similar users in a social network given the labels of different communities. This triggers an interesting thought: can we identify nodes given the labels of the graphs they belong to? Therefore, this paper defines a novel problem dubbed Inverse Graph Identification (IGI), as opposed to GI. Following a formal discussion of the variants of IGI, we choose a particular case study of node clustering that makes use of the graph labels and node features, with the assistance of a hierarchical graph that further characterizes the connections between different graphs. To address this task, we propose Gaussian Mixture Graph Convolutional Network (GMGCN), a simple yet effective method that performs node-level message passing using Graph Attention Network (GAT) under the protocol of GI and then infers the category of each node via a Gaussian Mixture Layer (GML). The training of GMGCN is further boosted by a proposed consensus loss that takes advantage of the structure of the hierarchical graph. Extensive experiments are conducted to test the rationality of the formulation of IGI. We verify the superiority of the proposed method compared to other baselines on several benchmarks we have built. We will release our code along with the benchmark data to facilitate more research attention to the IGI problem.

preprint2020arXiv

Lessons on Star-forming Ultra-diffuse Galaxies from The Stacked Spectra of Sloan Digital Sky Survey

We investigate the on-average properties for 28 star-forming ultra-diffuse galaxies (UDGs) located in low-density environments, by stacking their spectra from the Sloan Digital Sky Survey. These relatively-isolated UDGs, with stellar masses of $\log_{10}(M_*/M_{\odot})\sim 8.57\pm0.29$, have the on-average total-stellar-metallicity [M/H]$\sim -0.82\pm0.14$, iron-metallicity [Fe/H]$\sim -1.00\pm0.16$, stellar age $t_*\sim5.2\pm0.5$ Gyr, $α$-enhancement [$α$/Fe]$\sim 0.24\pm0.10$, and oxygen abundance 12+log(O/H)$\sim 8.16\pm0.06$, as well as central stellar velocity dispersion $54\pm12$ km/s. On the star-formation rate versus stellar mass diagram, these UDGs are located lower than the extrapolated star-forming main sequence from the massive spirals, but roughly follow the main sequence of low-surface-brightness dwarf galaxies. We find that these star-forming UDGs are not particularly metal-poor or metal-rich for their stellar masses, as compared with the metallicity-mass relations of the nearby typical dwarfs. With the UDG data of this work and previous studies, we also find a coarse correlation between [Fe/H] and magnesium-element enhancement [Mg/Fe] for UDGs: [Mg/Fe]$\simeq-0.43(\pm0.26)$[Fe/H]$-0.14(\pm0.40)$.

preprint2020arXiv

Multi-View Graph Neural Networks for Molecular Property Prediction

The crux of molecular property prediction is to generate meaningful representations of the molecules. One promising route is to exploit the molecular graph structure through Graph Neural Networks (GNNs). It is well known that both atoms and bonds significantly affect the chemical properties of a molecule, so an expressive model should be able to exploit both node (atom) and edge (bond) information simultaneously. Guided by this observation, we present Multi-View Graph Neural Network (MV-GNN), a multi-view message passing architecture to enable more accurate predictions of molecular properties. In MV-GNN, we introduce a shared self-attentive readout component and disagreement loss to stabilize the training process. This readout component also renders the whole architecture interpretable. We further boost the expressive power of MV-GNN by proposing a cross-dependent message passing scheme that enhances information communication between the two views, which results in the MV-GNN^cross variant. Lastly, we theoretically justify the expressiveness of the two proposed models in terms of distinguishing non-isomorphic graphs. Extensive experiments demonstrate that MV-GNN models achieve remarkably superior performance over the state-of-the-art models on a variety of challenging benchmarks. Meanwhile, visualization results of the node importance are consistent with prior knowledge, which confirms the interpretability of MV-GNN models.

preprint2020arXiv

Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks

Social media has been developing rapidly in public due to its nature of spreading new information, which leads to rumors being circulated. Meanwhile, detecting rumors from such massive information in social media is becoming an arduous challenge. Therefore, some deep learning methods are applied to discover rumors through the way they spread, such as Recursive Neural Network (RvNN) and so on. However, these deep learning methods only take into account the patterns of deep propagation but ignore the structures of wide dispersion in rumor detection. Actually, propagation and dispersion are two crucial characteristics of rumors. In this paper, we propose a novel bi-directional graph model, named Bi-Directional Graph Convolutional Networks (Bi-GCN), to explore both characteristics by operating on both top-down and bottom-up propagation of rumors. It leverages a GCN with a top-down directed graph of rumor spreading to learn the patterns of rumor propagation, and a GCN with an opposite directed graph of rumor diffusion to capture the structures of rumor dispersion. Moreover, the information from the source post is involved in each layer of GCN to enhance the influences from the roots of rumors. Encouraging empirical results on several benchmarks confirm the superiority of the proposed method over the state-of-the-art approaches.

preprint2020arXiv

The Blue Compact Dwarf Galaxy VCC 848 Formed by Dwarf-Dwarf Merging: HI Gas, Star Formation and Numerical Simulations

A clear link between a dwarf-dwarf merger event and enhanced star formation (SF) in the recent past was recently identified in the gas-dominated merger remnant VCC 848, offering by far the clearest view of a gas-rich late-stage dwarf-dwarf merger. We present a joint analysis of JVLA HI emission-line mapping, optical imaging and numerical simulations of VCC 848, in order to examine the impact of the merger on the stellar and gaseous distributions. VCC 848 has less than 30% of its HI gas concentrated within the central high-surface-brightness star-forming region, while the remaining HI is entrained in outlying tidal features. Particularly, a well-defined tidal arm reaches N(HI) comparable to the galaxy center but lacks SF. The molecular gas mass inferred from the current SF rate (SFR) dominates over the atomic gas mass in the central ~ 1.5 kpc. VCC 848 is consistent with being a main-sequence star-forming galaxy for its current stellar mass and SFR. The HII region luminosity distribution largely agrees with that of normal dwarf irregulars with similar luminosities, except that the brightest HII region is extraordinarily luminous. Our N-body/hydrodynamical simulations imply that VCC 848 is a merger between a gas-dominated primary progenitor and a gas-bearing star-dominated secondary. The progenitors had their first passage on a near-radial non-coplanar orbit more than 1 Gyr ago. The merger did not build up a core as compact as typical compact dwarfs with centralized starburst, which may be partly ascribed to the star-dominated nature of the secondary, and in a general sense, a negative stellar feedback following intense starbursts triggered at early stages of the merger.

preprint2020arXiv

The Next Generation Fornax Survey (NGFS): VII. A MUSE view of the nuclear star clusters in Fornax dwarf galaxies

Clues to the formation and evolution of Nuclear Star Clusters (NSCs) lie in their stellar populations. However, these structures are often very faint compared to their host galaxy, and spectroscopic analysis of NSCs is hampered by contamination of light from the rest of the system. With the introduction of wide-field IFU spectrographs, new techniques have been developed to model the light from different components within galaxies, making it possible to cleanly extract the spectra of the NSCs and study their properties with minimal contamination from the light of the rest of the galaxy. This work presents the analysis of the NSCs in a sample of 12 dwarf galaxies in the Fornax Cluster observed with MUSE. Analysis of the stellar populations and star-formation histories reveals that all the NSCs show evidence of multiple episodes of star formation, indicating that they have built up their mass further since their initial formation. The NSCs were found to have systematically lower metallicities than their host galaxies, which is consistent with a scenario for mass-assembly through mergers with infalling globular clusters, while the presence of younger stellar populations and gas emission in the core of two galaxies is indicative of in-situ star formation. We conclude that the NSCs in these dwarf galaxies likely originated as globular clusters that migrated to the core of the galaxy and have built up their mass mainly through mergers with other infalling clusters, with gas-inflow leading to in-situ star formation playing a secondary role.

preprint2015arXiv

Galaxy alignment as a probe of large-scale filaments

The orientations of the red galaxies in a filament are aligned with the orientation of the filament. We thus develop a location-alignment-method (LAM) of detecting filaments around clusters of galaxies, which uses both the alignments of red galaxies and their distributions in two-dimensional images. For the first time, the orientations of red galaxies are used as probes of filaments. We apply LAM to the environment of the Coma cluster, and find four filaments (two filaments are located in sheets) in two selected regions, which are compared with the filaments detected with the method of Falco et al. (2014). We find that LAM can effectively detect the filaments around a cluster, even with $3σ$ confidence level, and clearly reveal the number and overall orientations of the detected filaments. LAM is independent of the redshifts of galaxies, and thus can be applied at relatively high redshifts and to the samples of red galaxies without the information of redshifts. We also find that the images of background galaxies (interlopers) which are lensed by the gravity of foreground filaments are amplifiers to probe the filaments.

preprint2015arXiv

Primordial alignment of elliptical galaxies in intermediate redshift clusters

We measure primordial alignments for the red galaxies in the sample of eight massive galaxy clusters in the southern sky from the CLASH-VLT Large Programme, at a median redshift of 0.375. We find primordial alignment with about $3σ$ significance in the four dynamically young clusters, but null detection of primordial alignment in the four highly relaxed clusters. The observed primordial alignment is not dominated by any single one of the four dynamically young clusters, and is primarily due to a population of bright galaxies ($M_r<-20.5\ \rm{m}$) residing in the region 300 to 810 kpc from the cluster centers. For the first time, we point out that the combination of radial alignment and halo alignment can cause fake primordial alignment. Finally, we find that the detected alignment for the dynamically young clusters is real rather than fake primordial alignment.

preprint2015arXiv

Radial alignment of elliptical galaxies by the tidal force of a cluster of galaxies

Unlike the random radial orientation distribution of field elliptical galaxies, galaxies in a cluster are expected to point preferentially towards the center of the cluster, as a result of the cluster's tidal force on its member galaxies. In this work an analytic model is formulated to simulate this effect. The deformation time scale of a galaxy in a cluster is usually much shorter than the time scale of change of the tidal force; the dynamical process of the tidal interaction within the galaxy can thus be ignored. An equilibrium shape of a galaxy is then assumed to be the surface of equipotential, which is the sum of the self-gravitational potential of the galaxy and the tidal potential of the cluster at this location. We use a Monte-Carlo method to calculate the radial orientation distribution of these galaxies, by assuming the NFW mass profile of the cluster and the initial ellipticity of field galaxies. The radial angles show a single peak distribution centered at zero. The Monte-Carlo simulations also show that a shift of the reference center from the real cluster center weakens the anisotropy of the radial angle distribution. Therefore, the expected radial alignment cannot be revealed if the distribution of spatial position angle is used instead of that of radial angle. The observed radial orientations of elliptical galaxies in cluster Abell 2744 are consistent with the simulated distribution.
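To make the notion of a radial angle concrete, the sketch below computes the signed angle between a galaxy's major axis and the line joining the galaxy to a chosen reference center. It is only an illustration and not the paper's analytic model or Monte-Carlo procedure: the function name `radial_angle_deg`, the flat 2D coordinates, and the convention that angles are measured counterclockwise from the x-axis are all assumptions.

```python
import math

def radial_angle_deg(x, y, position_angle_deg, center=(0.0, 0.0)):
    """Signed angle in [-90, 90) degrees between a galaxy's major axis
    and the direction from the galaxy at (x, y) to `center`.

    Assumed convention: `position_angle_deg` is the major-axis orientation
    measured counterclockwise from the x-axis. A value of zero means the
    galaxy points exactly at the reference center.
    """
    cx, cy = center
    to_center_deg = math.degrees(math.atan2(cy - y, cx - x))
    # An axis has no head or tail, so fold the difference modulo 180 degrees.
    return (position_angle_deg - to_center_deg + 90.0) % 180.0 - 90.0

# A galaxy on the x-axis whose major axis lies along the x-axis is
# radially aligned with a center at the origin.
print(radial_angle_deg(1.0, 0.0, 0.0))
# The same galaxy measured against a shifted reference center is not.
print(radial_angle_deg(1.0, 0.0, 0.0, center=(1.0, -1.0)))
```

The second call shows why the choice of reference center matters: the same galaxy orientation yields a different radial angle once the center is moved.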

preprint2015arXiv

X-ray softening during the 2008 outburst of XTE J1810-189

XTE J1810-189 underwent an outburst in 2008, and was observed over $\sim 100$ d by RXTE. Performing a time-resolved spectral analysis on the photospheric radius expansion burst detected on 2008 May 4, we obtain the source distance in the range of 3.5--8.7 kpc for the first time. During its outburst, XTE J1810-189 did not enter into the high/soft state, and both the soft and hard colours decreased with decreasing flux. The fractional rms remained at high values ($\sim 30$ per cent). The RXTE/PCA spectra for 3-25 keV can be described by an absorbed power-law component with an additional Gaussian component, and the derived photon index $Γ$ increased from $1.84\pm0.01$ to $2.25\pm0.04$ when the unabsorbed X-ray luminosity in 3-25 keV dropped from $4\times10^{36}$ ergs s$^{-1}$ to $6\times10^{35}$ ergs s$^{-1}$. The relatively high flux, dense observations and broadband spectra allow us to provide strong evidence that the softening behaviour detected in the outburst of XTE J1810-189 originates from the evolution of non-thermal component rather than the thermal component (i.e. neutron star surface emission).

36 published item(s)

Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

A Cooperative-Competitive Multi-Agent Framework for Auto-bidding in Online Advertising

A Survey of Trustworthy Graph Learning: Reliability, Explainability, and Privacy Protection

Adversarial Attack Framework on Graph Embedding Models with Limited Knowledge

DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations

Energy-Based Learning for Cooperative Games, with Applications to Valuation Problems in Machine Learning

Equivariant Graph Mechanics Networks with Constraints

Fine-Tuning Graph Neural Networks via Graph Topology induced Optimal Transport

Frustratingly Easy Transferability Estimation

Local Augmentation for Graph Neural Networks

Neighbour Interaction based Click-Through Rate Prediction via Graph-masked Transformer

Query Driven-Graph Neural Networks for Community Search: From Non-Attributed, Attributed, to Interactive Attributed

Similarity-aware Positive Instance Sampling for Graph Contrastive Pre-training

Smoothing Matters: Momentum Transformer for Domain Adaptive Semantic Segmentation

Tackling Over-Smoothing for General Graph Convolutional Networks

Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis

Transformer for Graphs: An Overview from Architecture Perspective

Towards Expectation-Maximization by SQL in RDBMS

Adversarial Attack on Community Detection by Hiding Individuals

Chasing the Tail in Monocular 3D Human Reconstruction with Prototype Memory

DropEdge: Towards Deep Graph Convolutional Networks on Node Classification

Exploring the origin of ultra-diffuse galaxies in clusters from their primordial alignment

FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration

Graph Ordering: Towards the Optimal by Learning

Graph Representation Learning via Graphical Mutual Information Maximization

Intrinsic Morphology of Ultra-diffuse Galaxies

Inverse Graph Identification: Can We Identify Node Labels Given Graph Labels?

Lessons on Star-forming Ultra-diffuse Galaxies from The Stacked Spectra of Sloan Digital Sky Survey

Multi-View Graph Neural Networks for Molecular Property Prediction

Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks

The Blue Compact Dwarf Galaxy VCC 848 Formed by Dwarf-Dwarf Merging: HI Gas, Star Formation and Numerical Simulations

The Next Generation Fornax Survey (NGFS): VII. A MUSE view of the nuclear star clusters in Fornax dwarf galaxies

Galaxy alignment as a probe of large-scale filaments

Primordial alignment of elliptical galaxies in intermediate redshift clusters

Radial alignment of elliptical galaxies by the tidal force of a cluster of galaxies

X-ray softening during the 2008 outburst of XTE J1810-189