Source author record

Yu Zhao

Yu Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

39 works

28 topics

4 close collaborators

preprint, 2026, arXiv

Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios

Recent large vision-language models have achieved strong performance on short- and medium-length video understanding, yet they remain inadequate for ultra-long or even infinite video reasoning, where models must preserve coherent memory over extended durations and infer causal dependencies across temporally distant events. Existing end-to-end video understanding methods are fundamentally limited by the $O(n^2)$ complexity of self-attention, while recent retrieval-augmented generation (RAG) approaches still suffer from fragmented clip-level memory, weak modeling of temporal and causal structure, and high storage and online inference costs. We present Event-Causal RAG, a lightweight retrieval-augmented framework for infinite long-video reasoning. Instead of indexing fixed-length clips, our method segments streaming videos into semantically coherent events and represents each event as a structured State-Event-State (SES) graph, capturing the event together with its surrounding state transitions. These graphs are merged into a global Event Knowledge Graph and stored in a dual-store memory that supports both semantic matching and causal-topological retrieval. On top of this memory, we design a bidirectional retrieval strategy to efficiently identify the most relevant event causal chains and provide them, together with the associated video evidence, to a backbone video foundation model for answer generation. Experiments on long-video understanding benchmarks demonstrate that Event-Causal RAG consistently outperforms strong clip-based retrieval baselines and long-context video models, particularly on questions requiring multi-event integration and causal inference across long temporal gaps, while also achieving improved memory efficiency and robust streaming performance.

preprint, 2026, arXiv

Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling

We present Marco-MoE, a suite of fully open multilingual sparse Mixture-of-Experts (MoE) models. Marco-MoE features a highly sparse design in which only around 5% of the total parameters are activated per input token. This extreme sparsity, combined with upcycling from dense models, enables efficient pre-training on 5T tokens. Our models surpass similarly-sized competitors on English and multilingual benchmarks, achieving a best-in-class performance-to-compute ratio. We further post-train these models to create Marco-MoE-Instruct variants, which surpass the performance of competing models possessing $3$--$14\times$ more activated parameters. Our analysis reveals that Marco-MoE learns structured expert activation patterns shared across related languages, while maintaining highly specialized utilization for linguistically isolated ones. We further show that Marco-MoE allows for scalable language expansion without the interference typical of dense models. To support the community, we disclose our full training datasets, recipes, and model weights.
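
To make the idea of activating only a small share of a sparse MoE model per token concrete, here is a minimal pure-Python sketch of top-k expert routing. It is an illustration only: the abstract does not say which routing scheme Marco-MoE uses, so the softmax-then-top-k rule, the function names `top_k_route` and `moe_forward`, and the toy setup of 20 scalar experts are all assumptions. Activating 1 of 20 experts is meant only as a loose analogy to the roughly 5% of parameters quoted above, not as the model's actual configuration.

```python
import math


def top_k_route(router_logits, k):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Softmax over all experts (shifted by the max for numerical stability).
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k most probable experts and renormalize their weights.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in chosen)
    return {i: probs[i] / kept for i in chosen}


def moe_forward(token, experts, router_logits, k):
    """Weighted sum of the outputs of the selected experts only."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](token) for i, w in weights.items())


# Hypothetical toy setup: 20 scalar "experts", one active per token.
experts = [lambda x, s=s: s * x for s in range(20)]
logits = [0.0] * 20
logits[7] = 5.0  # the router strongly prefers expert 7 for this token
output = moe_forward(2.0, experts, logits, k=1)
```

With `k=1`, only expert 7 is evaluated; the other 19 experts contribute no compute for this token, which is the source of the efficiency that sparse designs aim for.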

preprint, 2025, arXiv

Training Report of TeleChat3-MoE

TeleChat3-MoE is the latest series of TeleChat large language models, featuring a Mixture-of-Experts (MoE) architecture with parameter counts ranging from 105 billion to over one trillion, trained end-to-end on an Ascend NPU cluster. This technical report mainly presents the underlying training infrastructure that enables reliable and efficient scaling to frontier model sizes. We detail systematic methodologies for operator-level and end-to-end numerical accuracy verification, ensuring consistency across hardware platforms and distributed parallelism strategies. Furthermore, we introduce a suite of performance optimizations, including interleaved pipeline scheduling, attention-aware data scheduling for long-sequence training, hierarchical and overlapped communication for expert parallelism, and DVM-based operator fusion. A systematic parallelization framework, leveraging analytical estimation and integer linear programming, is also proposed to optimize multi-dimensional parallelism configurations. Additionally, we present methodological approaches to cluster-level optimizations, addressing host- and device-bound bottlenecks during large-scale training tasks. These infrastructure advancements yield significant throughput improvements and near-linear scaling on clusters comprising thousands of devices, providing a robust foundation for large-scale language model development on hardware ecosystems.

preprint, 2023, arXiv

An approximation to peak detection power using Gaussian random field theory

We study power approximation formulas for peak detection using Gaussian random field theory. The approximation, based on the expected number of local maxima above the threshold $u$, $\mathbb{E}[M_u]$, is proved to work well under three asymptotic scenarios: small domain, large threshold, and sharp signal. An adjusted version of $\mathbb{E}[M_u]$ is also proposed to improve accuracy when the expected number of local maxima $\mathbb{E}[M_{-\infty}]$ exceeds 1. Cheng and Schwartzman (2018) developed explicit formulas for $\mathbb{E}[M_u]$ of smooth isotropic Gaussian random fields with zero mean. In this paper, these formulas are extended to allow for rotationally symmetric mean functions, so that they are suitable for power calculations. We also apply our formulas to 2D and 3D simulated datasets, where the 3D data is induced by a group analysis of fMRI data from the Human Connectome Project to measure performance in a realistic setting.

preprint, 2023, arXiv

Automatically Reproducing Android Bug Reports Using Natural Language Processing and Reinforcement Learning

As part of the process of resolving issues submitted by users via bug reports, Android developers attempt to reproduce and observe the failures described by the bug report. Due to the low quality of bug reports and the complexity of modern apps, the reproduction process is non-trivial and time-consuming. Therefore, automatic approaches that can help reproduce Android bug reports are in great need. However, current approaches to help developers automatically reproduce bug reports are only able to handle limited forms of natural language text and struggle to successfully reproduce failures for which the initial bug report had missing or imprecise steps. In this paper, we introduce a new fully automated Android bug report reproduction approach that addresses these limitations. Our approach accomplishes this by leveraging natural language processing techniques to more holistically and accurately analyze the natural language in Android bug reports and designing new techniques, based on reinforcement learning, to guide the search for successful reproducing steps. We conducted an empirical evaluation of our approach on 77 real-world bug reports. Our approach achieved 67% precision and 77% recall in accurately extracting reproduction steps from bug reports, and reproduced 74% of the bug reports, significantly outperforming state-of-the-art techniques.

preprint, 2023, arXiv

Causal conditional hidden Markov model for multimodal traffic prediction

Multimodal traffic flow can reflect the health of the transportation system, and its prediction is crucial to urban traffic management. Recent works overemphasize spatio-temporal correlations of traffic flow, ignoring the physical concepts that lead to the generation of observations and their causal relationship. Spatio-temporal correlations are considered unstable under the influence of different conditions, and spurious correlations may exist in observations. In this paper, we analyze the physical concepts affecting the generation of multimodal traffic flow from the perspective of the observation generation principle and propose a Causal Conditional Hidden Markov Model (CCHMM) to predict multimodal traffic flow. In the latent variable inference stage, a posterior network disentangles the causal representations of the concepts of interest from conditional information and observations, and a causal propagation module mines their causal relationship. In the data generation stage, a prior network samples the causal latent variables from the prior distribution and feeds them into the generator to generate multimodal traffic flow. We use a mutually supervised training method for the prior and posterior to enhance the identifiability of the model. Experiments on real-world datasets show that CCHMM can effectively disentangle causal representations of concepts of interest and identify causality, and accurately predict multimodal traffic flow.

preprint, 2023, arXiv

Spatio-temporal neural structural causal models for bike flow prediction

As a representative of public transportation, the fundamental issue of managing bike-sharing systems is bike flow prediction. Recent methods overemphasize the spatio-temporal correlations in the data, ignoring the effects of contextual conditions on the transportation system and the inter-regional time-varying causality. In addition, due to the disturbance of incomplete observations in the data, random contextual conditions lead to spurious correlations between data and features, making the prediction of the model ineffective in special scenarios. To overcome this issue, we propose a Spatio-temporal Neural Structure Causal Model (STNSCM) from the perspective of causality. First, we build a causal graph to describe the traffic prediction, and further analyze the causal relationship between the input data, contextual conditions, spatio-temporal states, and prediction results. Second, we propose to apply the front-door criterion to eliminate confounding biases in the feature extraction process. Finally, we propose a counterfactual representation reasoning module to extrapolate the spatio-temporal state under the factual scenario to future counterfactual scenarios to improve the prediction performance. Experiments on real-world datasets demonstrate the superior performance of our model, especially its resistance to fluctuations caused by the external environment. The source code and data will be released.

preprint, 2022, arXiv

Asynchronous Federated Learning Based Mobility-aware Caching in Vehicular Edge Computing

Vehicular edge computing (VEC) is a promising technology for supporting real-time applications by caching content in roadside units (RSUs), so that vehicles can fetch the content requested by vehicular users (VUs) from the RSU within a short time. The capacity of the RSU is limited, and the content requested by VUs changes frequently because of the high mobility of vehicles, so it is essential to predict the most popular content and cache it in the RSU in advance. The RSU can train a model on the VUs' data to predict the popular content effectively. However, VUs are often reluctant to share their data with others for reasons of personal privacy. Federated learning (FL) allows each vehicle to train a local model on the VUs' data and upload the local model, rather than the data, to the RSU to update the global model, so the VUs' private information can be protected. Traditional synchronous FL must wait for all vehicles to complete training and upload their local models before updating the global model, which makes training the global model slow. Asynchronous FL updates the global model as soon as a vehicle's local model is received. However, vehicles with different staying times contribute differently to the accuracy of the global model. In this paper, we consider vehicle mobility and propose an Asynchronous FL based Mobility-aware Edge Caching (AFMC) scheme to obtain an accurate global model, and then propose an algorithm to predict the popular content based on the global model. Experimental results show that AFMC outperforms other baseline caching schemes.
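
The contrast between synchronous and asynchronous aggregation can be sketched in a few lines of Python. This is a hedged illustration rather than the AFMC algorithm: the abstract does not give the update rule, so the convex blend in `async_update`, the helper `mobility_weight`, the `base_mix` parameter and the assumption that a vehicle staying longer in RSU coverage receives a larger weight are all hypothetical choices made for this example.

```python
def async_update(global_model, local_model, mix):
    """Blend one vehicle's local weights into the global weights.

    mix in [0, 1] controls how strongly this upload moves the global model.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return [(1.0 - mix) * g + mix * l for g, l in zip(global_model, local_model)]


def mobility_weight(staying_time, max_staying_time, base_mix=0.5):
    """Hypothetical rule: scale the mixing weight by relative staying time."""
    return base_mix * min(staying_time / max_staying_time, 1.0)


# Uploads are applied in arrival order, without waiting for other vehicles.
global_model = [0.0, 0.0]
uploads = [
    ([1.0, 1.0], 10.0),  # (local weights, staying time in seconds)
    ([2.0, 2.0], 5.0),
]
for local_model, staying_time in uploads:
    mix = mobility_weight(staying_time, max_staying_time=10.0)
    global_model = async_update(global_model, local_model, mix)
```

Each upload changes the global model immediately, so a slow vehicle never blocks the others; the mobility-dependent weight is one simple way to let uploads influence the result unequally.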

preprint, 2022, arXiv

Combining Intra-Risk and Contagion Risk for Enterprise Bankruptcy Prediction Using Graph Neural Networks

Predicting the bankruptcy risk of small and medium-sized enterprises (SMEs) is an important step for financial institutions when making decisions about loans. Existing studies in both finance and AI research fields, however, tend to only consider either the intra-risk or contagion risk of enterprises, ignoring their interactions and combinatorial effects. This study for the first time considers both types of risk and their joint effects in bankruptcy prediction. Specifically, we first propose an enterprise intra-risk encoder based on statistically significant enterprise risk indicators for its intra-risk learning. Then, we propose an enterprise contagion risk encoder based on enterprise relation information from an enterprise knowledge graph for its contagion risk embedding. In particular, the contagion risk encoder includes both the newly proposed Hyper-Graph Neural Networks and Heterogeneous Graph Neural Networks, which can model contagion risk in two different aspects, i.e., common risk factors based on hyperedges and direct diffusion risk from neighbors, respectively. To evaluate the model, we collect real-world multi-source data on SMEs and build a novel benchmark dataset called SMEsD. We provide open access to the dataset, which is expected to further promote research on financial risk analysis. Experiments on SMEsD against twelve state-of-the-art baselines demonstrate the effectiveness of the proposed model for bankruptcy prediction.

preprint, 2022, arXiv

Differentiable Channel Sparsity Search via Weight Sharing within Filters

In this paper, we propose the differentiable channel sparsity search (DCSS) for convolutional neural networks. Unlike traditional channel pruning algorithms which require users to manually set prune ratios for each convolutional layer, DCSS automatically searches for the optimal combination of sparsities. Inspired by the differentiable architecture search (DARTS), we draw lessons from the continuous relaxation and leverage the gradient information to balance the computational cost and metrics. Since directly applying the scheme of DARTS causes shape mismatching and excessive memory consumption, we introduce a novel technique called weight sharing within filters. This technique elegantly eliminates the problem of shape mismatching with negligible additional resources. We conduct comprehensive experiments on not only image classification but also fine-grained tasks including semantic segmentation and image super resolution to verify the effectiveness of DCSS. Compared with previous network pruning approaches, DCSS achieves state-of-the-art results for image classification. Experimental results of semantic segmentation and image super resolution indicate that task-specific search achieves better performance than transferring slim models, demonstrating the wide applicability and high efficiency of DCSS.

preprint, 2022, arXiv

Fast Electromagnetic Validations of Large-Scale Digital Coding Metasurfaces Accelerated by Recurrence Rebuild and Retrieval Method

The recurrence rebuild and retrieval method (R3M) is proposed in this paper to accelerate the electromagnetic (EM) validations of large-scale digital coding metasurfaces (DCMs). R3M aims to accelerate the EM validations of DCMs with varied codebooks, which involves the analysis of a group of similar but not identical structures. The method transforms general DCMs to rigorously periodic arrays by replacing each coding unit with the macro unit, which comprises all possible coding states. The system matrix corresponding to the rigorously periodic array is globally shared for DCMs with arbitrary codebooks via implicit retrieval. Discrepancies in the interactions for edge and corner units are precluded by the basis extension of periodic boundaries. Moreover, the hierarchical pattern exploitation (HPE) algorithm is leveraged to efficiently assemble the system matrix for further acceleration. Owing to the full utilization of the rigid periodicity, the computational complexity of R3M-HPE is theoretically lower than that of $\mathcal{H}$-matrix within the same paradigm. Numerical results for two types of DCMs indicate that R3M-HPE is accurate in comparison with commercial software. In addition, R3M-HPE is compatible with preconditioning for efficient iterative solutions. The efficiency of R3M-HPE for DCMs exceeds that of conventional fast algorithms in both storage and CPU time cost.

preprint, 2022, arXiv

Improving Human-Object Interaction Detection via Phrase Learning and Label Composition

Human-Object Interaction (HOI) detection is a fundamental task in high-level human-centric scene understanding. We propose PhraseHOI, containing an HOI branch and a novel phrase branch, to leverage language prior and improve relation expression. Specifically, the phrase branch is supervised by semantic embeddings, whose ground truths are automatically converted from the original HOI annotations without extra human effort. Meanwhile, a novel label composition method is proposed to deal with the long-tailed problem in HOI, which composes novel phrase labels from semantic neighbors. Further, to optimize the phrase branch, a loss composed of a distilling loss and a balanced triplet loss is proposed. Extensive experiments are conducted to prove the effectiveness of the proposed PhraseHOI, which achieves significant improvement over the baseline and surpasses previous state-of-the-art methods on Full and NonRare on the challenging HICO-DET benchmark.

preprint, 2022, arXiv

Learning Bi-typed Multi-relational Heterogeneous Graph via Dual Hierarchical Attention Networks

Bi-typed multi-relational heterogeneous graphs (BMHGs) are among the most common graphs in practice, for example, academic networks, e-commerce user behavior graphs and enterprise knowledge graphs. Learning a numerical representation for each node that characterizes subtle structures is a critical and challenging problem. However, most previous studies treat all node relations in a BMHG as the same class of relation without distinguishing the different characteristics between the intra-class relations and inter-class relations of the bi-typed nodes, causing the loss of significant structure information. To address this issue, we propose novel Dual Hierarchical Attention Networks (DHAN) based on bi-typed multi-relational heterogeneous graphs to learn comprehensive node representations with intra-class and inter-class attention-based encoders under a hierarchical mechanism. Specifically, the former encoder aggregates information from the same type of nodes, while the latter aggregates node representations from its different types of neighbors. Moreover, to sufficiently model node multi-relational information in BMHGs, we adopt a newly proposed hierarchical mechanism. By doing so, the proposed dual hierarchical attention operations enable our model to fully capture the complex structures of bi-typed multi-relational heterogeneous graphs. Experimental results on various tasks against state-of-the-art methods sufficiently confirm the capability of DHAN in learning node representations on BMHGs.

preprint, 2022, arXiv

Medical Dialogue Response Generation with Pivotal Information Recalling

Medical dialogue generation is an important yet challenging task. Most previous works rely on the attention mechanism and large-scale pretrained language models. However, these methods often fail to acquire pivotal information from the long dialogue history to yield an accurate and informative response, because the medical entities are usually scattered across multiple utterances, along with the complex relationships between them. To mitigate this problem, we propose a medical response generation model with Pivotal Information Recalling (MedPIR), which is built on two components, i.e., a knowledge-aware dialogue graph encoder and a recall-enhanced generator. The knowledge-aware dialogue graph encoder constructs a dialogue graph by exploiting the knowledge relationships between entities in the utterances, and encodes it with a graph attention network. Then, the recall-enhanced generator strengthens the usage of this pivotal information by generating a summary of the dialogue before producing the actual response. Experimental results on two large-scale medical dialogue datasets show that MedPIR outperforms the strong baselines in BLEU scores and medical entity F1 measure.

preprint, 2022, arXiv

MSDF: A General Open-Domain Multi-Skill Dialog Framework

Dialog systems have achieved significant progress and have been widely used in various scenarios. Previous research has mainly focused on designing dialog generation models for a single scenario, while comprehensive abilities are required to handle tasks under various scenarios in the real world. In this paper, we propose a general Multi-Skill Dialog Framework, namely MSDF, which can be applied in different dialog tasks (e.g. knowledge grounded dialog and persona based dialog). Specifically, we propose a transferable response generator pre-trained on diverse large-scale dialog corpora as the backbone of MSDF, consisting of BERT-based encoders and a GPT-based decoder. To select the response consistent with dialog history, we propose a consistency selector trained through negative sampling. Moreover, a flexible copy mechanism for external knowledge is also employed to enhance the utilization of multiform knowledge in various scenarios. We conduct experiments on knowledge grounded dialog, recommendation dialog, and persona based dialog tasks. The experimental results indicate that MSDF outperforms the baseline models by a large margin. In the Multi-skill Dialog track of the 2021 Language and Intelligence Challenge, our general MSDF won the 3rd prize, which indicates that MSDF is effective and competitive.

preprint, 2022, arXiv

On the equilibrium of the Poisson-Nernst-Planck-Bikermann model equipping with the steric and correlation effects

The Poisson-Nernst-Planck-Bikermann (PNPB) model, in which the ions and water molecules are treated as different species with non-uniform sizes and valences with interstitial voids, can describe the steric and correlation effects in ionic solution that are neglected by the Poisson-Nernst-Planck and Poisson-Boltzmann theories under the point-charge assumption. In the PNPB model, the electric potential is governed by the fourth-order Poisson-Bikermann (4PBik) equation instead of the Poisson equation so that it can describe the correlation effect. Moreover, the steric potential, which characterizes the steric effect quantitatively, is included in the ionic and water fluxes as well as in the equilibrium Fermi-like distributions. In this work, after a nondimensionalization step, we analyze the self-adjointness and the kernel of the fourth-order operator of the 4PBik equation. We also show the positivity of the void volume function and the convexity of the free energy. Following these properties, the well-posedness of the PNPB model in equilibrium is given. Furthermore, because the PNPB model has an energy-dissipation structure, we adopt a finite volume scheme which preserves the energy-dissipation property at the semi-discrete level. After that, various numerical investigations are given to show the parameter dependence of the steric effect on the steady state.

preprint, 2022, arXiv

ReMix: A General and Efficient Framework for Multiple Instance Learning based Whole Slide Image Classification

Whole slide image (WSI) classification often relies on deep weakly supervised multiple instance learning (MIL) methods to handle gigapixel resolution images and slide-level labels. Yet the decent performance of deep learning comes from harnessing massive datasets and diverse samples, creating a need for efficient training pipelines that scale to large datasets and for data augmentation techniques that diversify samples. However, current MIL-based WSI classification pipelines are memory-expensive and computation-inefficient since they usually assemble tens of thousands of patches as bags for computation. On the other hand, despite their popularity in other tasks, data augmentations are unexplored for WSI MIL frameworks. To address these issues, we propose ReMix, a general and efficient framework for MIL based WSI classification. It comprises two steps: reduce and mix. First, it reduces the number of instances in WSI bags by substituting instances with instance prototypes, i.e., patch cluster centroids. Then, we propose a "Mix-the-bag" augmentation that contains four online, stochastic and flexible latent space augmentations. It brings diverse and reliable class-identity-preserving semantic changes in the latent space while enforcing semantic-perturbation invariance. We evaluate ReMix on two public datasets with two state-of-the-art MIL methods. In our experiments, consistent improvements in precision, accuracy, and recall have been achieved, together with orders-of-magnitude reductions in training time and memory consumption, demonstrating ReMix's effectiveness and efficiency. Code is available.

preprint, 2022, arXiv

Shuffle Instances-based Vision Transformer for Pancreatic Cancer ROSE Image Classification

The rapid on-site evaluation (ROSE) technique can significantly accelerate the diagnosis of pancreatic cancer by immediately analyzing the fast-stained cytopathological images. Computer-aided diagnosis (CAD) can potentially address the shortage of pathologists in ROSE. However, the cancerous patterns vary significantly between different samples, making the CAD task extremely challenging. Besides, the ROSE images have complicated perturbations regarding color distribution, brightness, and contrast due to different staining qualities and various acquisition device types. To address these challenges, we propose a shuffle instances-based Vision Transformer (SI-ViT) approach, which can reduce the perturbations and enhance the modeling among the instances. With the regrouped bags of shuffle instances and their bag-level soft labels, the approach utilizes a regression head to make the model focus on the cells rather than various perturbations. Simultaneously, combined with a classification head, the model can effectively identify the general distributive patterns among different instances. The results demonstrate significant improvements in the classification accuracy with more accurate attention regions, indicating that the diverse patterns of ROSE images are effectively extracted, and the complicated perturbations are significantly reduced. They also suggest that SI-ViT has excellent potential in analyzing cytopathological images. The code and experimental results are available at https://github.com/sagizty/MIL-SI.

preprint, 2022, arXiv

Stock Movement Prediction Based on Bi-typed Hybrid-relational Market Knowledge Graph via Dual Attention Networks

Stock Movement Prediction (SMP) aims at predicting the future price trend of listed companies' stocks, which is a challenging task due to the volatile nature of financial markets. Recent financial studies show that the momentum spillover effect plays a significant role in stock fluctuation. However, previous studies typically only learn the simple connection information among related companies, which inevitably fails to model the complex relations of listed companies in the real financial market. To address this issue, we first construct a more comprehensive Market Knowledge Graph (MKG) which contains bi-typed entities including listed companies and their associated executives, and hybrid-relations including the explicit relations and implicit relations. Afterward, we propose DanSmp, a novel Dual Attention Networks model, to learn the momentum spillover signals based upon the constructed MKG for stock prediction. The empirical experiments on our constructed datasets against nine SOTA baselines demonstrate that the proposed DanSmp is capable of improving stock prediction with the constructed MKG.

preprint, 2022, arXiv

Time-Dependent Performance Modeling for Platooning Communications at Intersection

With the development of the internet of vehicles, the platooning strategy has been widely studied as a potential approach to ensure the safety of autonomous driving. Vehicles in a platoon adopt 802.11p to exchange messages through vehicle-to-vehicle (V2V) communications. When multiple platoons arrive at an intersection, the leader vehicle of each platoon adjusts its movement characteristics to ensure that it can cross the intersection, and thus the following vehicles have to adjust their movement characteristics accordingly. In this case, the time-varying connectivity among vehicles leads to significant non-stationary performance changes in platooning communications, which may incur safety issues. In this paper, we construct a time-dependent model to evaluate the platooning communication performance at the intersection based on the initial movement characteristics. We first consider the movement behaviors of vehicles at the intersection, including turning, accelerating, decelerating and stopping, as well as the periodic change of traffic lights, to construct a movement model, and then establish a hearing network to reflect the time-varying connectivity among vehicles. Afterwards, we adopt the pointwise stationary fluid flow approximation (PSFFA) to model the non-stationary behavior of the transmission queue. Then, we consider four access categories (ACs) and the continuous backoff freezing of 802.11p to construct models that describe the time-dependent access process of 802.11p. Finally, based on the time-dependent model, the packet transmission delay and packet delivery ratio are derived. The accuracy of our proposed model is verified by comparing the simulation results with analytical results.

preprint, 2022, arXiv

Timing performance simulation for 3D 4H-SiC detector

To meet the high-radiation challenge for detectors in future high-energy physics, a novel 3D 4H-SiC detector was investigated. SiC detectors could potentially operate in radiation-harsh, room-temperature environments because of their high thermal conductivity and high atomic displacement threshold energy. The 3D structure, which decouples thickness and distance between electrodes, further improves the timing performance and radiation hardness of the detector. We developed simulation software, RASER (RAdiation SEmiconductoR), to simulate the time resolution of planar and 3D 4H-SiC detectors with different parameters and structures, and the reliability of the software is verified by comparing the simulated time resolution with data. The rough time resolution of the 3D 4H-SiC detector was estimated, and the simulation parameters could be used as a guideline for 3D 4H-SiC detector design and optimization.

preprint, 2021, arXiv

End-to-End Human Object Interaction Detection with HOI Transformer

We propose HOI Transformer to tackle human object interaction (HOI) detection in an end-to-end manner. Current approaches either decouple the HOI task into separate stages of object detection and interaction classification or introduce a surrogate interaction problem. In contrast, our method, named HOI Transformer, streamlines the HOI pipeline by eliminating the need for many hand-designed components. HOI Transformer reasons about the relations of objects and humans from global image context and directly predicts HOI instances in parallel. A quintuple matching loss is introduced to force HOI predictions in a unified way. Our method is conceptually much simpler and demonstrates improved accuracy. Without bells and whistles, HOI Transformer achieves $26.61\%$ AP on HICO-DET and $52.9\%$ $AP_{role}$ on V-COCO, surpassing previous methods with the advantage of being much simpler. We hope our approach will serve as a simple and effective alternative for HOI tasks. Code is available at https://github.com/bbepoch/HoiTransformer.

preprint2021arXiv

QEMind: Alibaba's Submission to the WMT21 Quality Estimation Shared Task

Quality Estimation, as a crucial step of quality control for machine translation, has been explored for years. The goal is to investigate automatic methods for estimating the quality of machine translation results without reference translations. In this year's WMT QE shared task, we utilize the large-scale XLM-Roberta pre-trained model and additionally propose several useful features to evaluate the uncertainty of the translations to build our QE system, named QEMind. The system has been applied to the sentence-level scoring task of Direct Assessment and the binary score prediction task of Critical Error Detection. In this paper, we present our submissions to the WMT 2021 QE shared task; an extensive set of experimental results shows that our multilingual systems outperform the best system in the Direct Assessment QE task of WMT 2020.

preprint2020arXiv

A unified structure preserving scheme for a multi-species model with a gradient flow structure and nonlocal interactions via singular kernels

In this paper, we consider a nonlinear and nonlocal parabolic model for multi-species ionic fluids and introduce a semi-implicit finite volume scheme, which is second order accurate in space, first order in time and satisfies the following properties: positivity preserving, mass conservation and energy dissipation. Besides, our scheme involves a fast algorithm on the convolution terms with singular but integrable kernels, which otherwise impede the accuracy and efficiency of the whole scheme. Error estimates on the fast convolution algorithm are shown next. Numerous numerical tests are provided to demonstrate the properties, such as unconditional stability, order of convergence, energy dissipation and the complexity of the fast convolution algorithm. Furthermore, extensive numerical experiments are carried out to explore the modeling effects in specific examples, such as steric repulsion, the concentration of ions at the boundary and the blowup phenomenon of the Keller-Segel equations.

preprint2020arXiv

ADMM-IDNN: Iteratively Double-reweighted Nuclear Norm Algorithm for Group-prior based Nonconvex Compressed Sensing via ADMM

Group-prior based regularization methods have led to great successes in various image processing tasks, which can usually be considered as a low-rank matrix minimization problem. As a widely used surrogate function of low-rank, the nuclear norm based convex surrogate usually leads to over-shrinking phenomena, since the nuclear norm shrinks the rank components (singular values) simultaneously. In this paper, we propose a novel Group-prior based nonconvex image compressive sensing (CS) reconstruction framework via a family of nonconvex nuclear norm functions which share common concave and monotonic properties. To solve the resulting nonconvex nuclear norm minimization (NNM) problem, we develop a Group based iteratively double-reweighted nuclear norm algorithm (IDNN) via an alternating direction method of multipliers (ADMM) framework. Our proposed algorithm can convert the nonconvex nuclear norm optimization problem into a double-reweighted singular value thresholding (DSVT) problem. Extensive experiments demonstrate that our proposed framework achieves favorable reconstruction performance compared with current state-of-the-art convex methods.

preprint2020arXiv

Connecting Embeddings for Knowledge Graph Entity Typing

Knowledge graph (KG) entity typing aims at inferring possible missing entity type instances in KG, which is a very significant but still under-explored subtask of knowledge graph completion. In this paper, we propose a novel approach for KG entity typing which is trained by jointly utilizing local typing knowledge from existing entity type assertions and global triple knowledge from KGs. Specifically, we present two distinct knowledge-driven effective mechanisms of entity type inference. Accordingly, we build two novel embedding models to realize the mechanisms. Afterward, a joint model with them is used to infer missing entity type instances, which favors inferences that agree with both entity type instances and triple knowledge in KGs. Experimental results on two real-world datasets (Freebase and YAGO) demonstrate the effectiveness of our proposed mechanisms and models for improving KG entity typing. The source code and data of this paper can be obtained from: https://github.com/Adam1679/ConnectE

preprint2020arXiv

Fixed-Time Cooperative Tracking Control for Double-Integrator Multi-Agent Systems: A Time-Based Generator Approach

In this paper, both the fixed-time distributed consensus tracking and the fixed-time distributed average tracking problems for double-integrator-type multi-agent systems with bounded input disturbances are studied. Firstly, a new practical robust fixed-time sliding mode control method based on the time-based generator is proposed. Secondly, a fixed-time distributed consensus tracking observer for double-integrator-type multi-agent systems is designed to estimate the state disagreements between the leader and the followers under undirected and directed communication, respectively. Thirdly, a fixed-time distributed average tracking observer for double-integrator-type multi-agent systems is designed to measure the average value of reference signals under undirected communication. Note that both the observers for the distributed consensus tracking and the distributed average tracking are devised based on time-based generators and can be extended to high-order multi-agent systems trivially. Furthermore, by combining the fixed-time sliding mode control with the fixed-time observers, the fixed-time controllers are designed to solve the distributed consensus tracking and the distributed average tracking problems. Finally, a few numerical simulations are shown to verify the results.

preprint2020arXiv

Nonconvex Nonsmooth Low-Rank Minimization for Generalized Image Compressed Sensing via Group Sparse Representation

Group sparse representation (GSR) based methods have led to great successes in various image recovery tasks, which can be converted into a low-rank matrix minimization problem. As a widely used surrogate function of low-rank, the nuclear norm based convex surrogate usually leads to an over-shrinking problem, since the standard soft-thresholding operator shrinks all singular values equally. To improve traditional sparse representation based image compressive sensing (CS) performance, we propose a generalized CS framework based on the GSR model, which leads to a nonconvex nonsmooth low-rank minimization problem. The popular L_2-norm and M-estimator are employed to fit the data for the standard image CS and robust CS problems, respectively. For a better approximation of the rank of the group-matrix, a family of nuclear norms is employed to address the over-shrinking problem. Moreover, we also propose a flexible and effective iteratively-weighting strategy to control the weighting and contribution of each singular value. Then we develop an iteratively reweighted nuclear norm algorithm for our generalized framework via an alternating direction method of multipliers framework, namely, GSR-AIR. Experimental results demonstrate that our proposed CS framework can achieve favorable reconstruction performance compared with current state-of-the-art methods and the robust CS framework can suppress the outliers effectively.
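The over-shrinking effect mentioned in this abstract can be illustrated with a minimal sketch. It is not the GSR-AIR algorithm from the paper: the function names, the weight choice 1 / (s + eps), and the sample singular values are all assumptions made for illustration.

```python
def soft_threshold(singular_values, tau):
    """Standard soft-thresholding: subtract the same tau from every value."""
    return [max(s - tau, 0.0) for s in singular_values]


def weighted_threshold(singular_values, tau, eps=1e-6):
    """Reweighted thresholding with assumed weights w_i = 1 / (s_i + eps),
    so larger singular values receive a smaller penalty tau * w_i."""
    return [max(s - tau / (s + eps), 0.0) for s in singular_values]


# Hypothetical singular values of one group matrix, largest first.
sigma = [10.0, 4.0, 1.0, 0.2]

uniform = soft_threshold(sigma, tau=1.0)
reweighted = weighted_threshold(sigma, tau=1.0)

print("uniform   :", uniform)
print("reweighted:", reweighted)
```

With uniform thresholding the dominant singular values lose as much as the small ones; with the assumed reweighting they are almost untouched while the smallest value is still set to zero.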

preprint2020arXiv

Q-greedyUCB: a New Exploration Policy for Adaptive and Resource-efficient Scheduling

This paper proposes a learning algorithm to find a scheduling policy that achieves an optimal delay-power trade-off in communication systems. Reinforcement learning (RL) is used to minimize the expected latency for a given energy constraint where the environments such as traffic arrival rates or channel conditions can change over time. For this purpose, this problem is formulated as an infinite-horizon Markov Decision Process (MDP) with constraints. To handle the constrained optimization problem, we adopt the Lagrangian relaxation technique. Then, we propose a variant of Q-learning, Q-greedyUCB, that combines Q-learning for average reward with the Upper Confidence Bound (UCB) policy to solve this decision-making problem. We prove that the Q-greedyUCB algorithm is convergent through mathematical analysis. Simulation results show that Q-greedyUCB finds an optimal scheduling strategy, and is more efficient than Q-learning with the $\varepsilon$-greedy and Average-payoff RL algorithm in terms of the cumulative reward (i.e., the weighted sum of delay and energy) and the convergence speed. We also show that our algorithm can reduce the regret by up to 12% compared to the Q-learning with the $\varepsilon$-greedy and Average-payoff RL algorithm.
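As a rough sketch of how a UCB-style exploration bonus can be layered on top of Q-value estimates, the snippet below shows a generic action-selection rule. It is not the Q-greedyUCB algorithm from the paper; the function name `ucb_action`, the bonus form, and the parameter `c` are assumptions for illustration only.

```python
import math


def ucb_action(q_values, counts, t, c=1.0):
    """Pick an action index by value estimate plus an exploration bonus.

    q_values: current value estimate per action (hypothetical inputs)
    counts:   number of times each action has been tried
    t:        total number of decisions made so far (t >= 1)
    c:        exploration weight (an assumed tuning parameter)
    """
    # Try every action once before trusting any estimate.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    # The bonus shrinks as an action is tried more often.
    scores = [q + c * math.sqrt(math.log(t) / n)
              for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


# A rarely tried action can win despite a slightly lower estimate.
choice = ucb_action([1.0, 0.9], [100, 1], t=101)
print(choice)
```

Setting `c=0.0` reduces the rule to purely greedy selection, which makes the contribution of the exploration term easy to isolate.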

preprint2016arXiv

Designing Distributed Fixed-Time Consensus Protocols for Linear Multi-Agent Systems Over Directed Graphs

This technical note addresses the distributed fixed-time consensus protocol design problem for multi-agent systems with general linear dynamics over directed communication graphs. By using motion planning approaches, a class of distributed fixed-time consensus algorithms is developed, which rely only on the sampling information at some sampling instants. For linear multi-agent systems, the proposed algorithms solve the fixed-time consensus problem for any directed graph containing a directed spanning tree. In particular, the settling time can be pre-assigned off-line according to task requirements. Compared with the existing results for multi-agent systems, to the best of our knowledge, this is the first time that fixed-time consensus problems have been solved for general linear multi-agent systems over directed graphs having a directed spanning tree. Extensions to fixed-time formation flying are further studied for multiple satellites described by Hill equations.

preprint2016arXiv

Distributed Average Tracking for Multiple Signals Generated by Linear Dynamical Systems: An Edge-based Framework

This paper studies the distributed average tracking problem for multiple time-varying signals generated by linear dynamics, whose reference inputs are nonzero and not available to any agent in the network. In the edge-based framework, a pair of continuous algorithms with, respectively, static and adaptive coupling strengths are designed. Based on the boundary layer concept, the proposed continuous algorithm with static coupling strengths can asymptotically track the average of multiple reference signals without the chattering phenomenon. Furthermore, for the case of algorithms with adaptive coupling strengths, average tracking errors are uniformly ultimately bounded and exponentially converge to a small adjustable bounded set. Finally, a simulation example is presented to show the validity of theoretical results.

preprint2016arXiv

Fixed-time consensus of multiple double-integrator systems under directed topologies: A motion-planning approach

This paper investigates the fixed-time consensus problem under directed topologies. By using a motion-planning approach, a class of distributed fixed-time algorithms is developed for a multi-agent system with double-integrator dynamics. In the context of fixed-time consensus, we focus on both directed fixed and switching topologies. Under the directed fixed topology, a novel class of distributed algorithms is designed, which guarantee the consensus of the multi-agent system with a fixed settling time if the topology has a directed spanning tree. Under the directed periodically switching topologies, the fixed-time consensus is solved via the proposed algorithms if the topologies jointly have a directed spanning tree. In particular, the fixed settling time can be pre-assigned off-line according to task requirements. Compared with the existing results, to the best of our knowledge, this is the first time that the fixed-time consensus problem has been solved for double-integrator systems under directed topologies. Finally, a numerical example is given to illustrate the effectiveness of the analytical results.

preprint2016arXiv

Neural Headline Generation with Sentence-wise Optimization

Recently, neural models have been proposed for headline generation by learning to map documents to headlines with recurrent neural networks. Nevertheless, as traditional neural networks utilize maximum likelihood estimation for parameter optimization, this essentially constrains the expected training objective to the word level rather than the sentence level. Moreover, the performance of model prediction relies significantly on the training data distribution. To overcome these drawbacks, we employ a minimum risk training strategy in this paper, which directly optimizes model parameters at the sentence level with respect to evaluation metrics and leads to significant improvements for headline generation. Experimental results show that our models outperform state-of-the-art systems on both English and Chinese headline generation tasks.

preprint2016arXiv

SICS: Secure In-Cloud Service Function Chaining

There is an increasing trend for enterprises to outsource their network functions to the cloud for lower cost and ease of management. However, network function outsourcing brings threats to the privacy of enterprises since the cloud is able to access the traffic and rules of in-cloud network functions. Current tools for secure network function outsourcing either incur large performance overhead or do not support real-time updates. In this paper, we present SICS, a secure service function chain outsourcing framework. SICS encrypts each packet header and uses a label for in-cloud rule matching, which enables the cloud to perform its functionalities correctly with minimum header information leakage. Evaluation results show that SICS achieves higher throughput, faster construction and update speed, and lower resource overhead at both enterprise and cloud sides, compared to existing solutions.

preprint2015arXiv

Polynomial bounds for decoupling, with applications

Let f(x) = f(x_1, ..., x_n) = \sum_{|S| <= k} a_S \prod_{i \in S} x_i be an n-variate real multilinear polynomial of degree at most k, where S \subseteq [n] = {1, 2, ..., n}. For its "one-block decoupled" version, f~(y,z) = \sum_{|S| <= k} a_S \sum_{i \in S} y_i \prod_{j \in S \setminus \{i\}} z_j, we show tail-bound comparisons of the form Pr[|f~(y,z)| > C_k t] <= D_k Pr[f(x) > t]. Our constants C_k, D_k are significantly better than those known for "full decoupling". For example, when x, y, z are independent Gaussians we obtain C_k = D_k = O(k); when x, y, z are Rademacher random variables we obtain C_k = O(k^2), D_k = k^{O(k)}. By contrast, for full decoupling only C_k = D_k = k^{O(k)} is known in these settings. We describe consequences of these results for query complexity (related to conjectures of Aaronson and Ambainis) and for analysis of Boolean functions (including an optimal sharpening of the DFKO Inequality).
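As a small worked instance of the one-block decoupling defined in this abstract, take n = 2 and k = 2 with only the terms S = {1} and S = {1, 2} present. The coefficient names are illustrative, and the expansion simply applies the stated formula term by term (an empty product equals 1):

```latex
f(x) = a_{\{1\}}\, x_1 + a_{\{1,2\}}\, x_1 x_2
\quad\Longrightarrow\quad
\tilde{f}(y,z) = a_{\{1\}}\, y_1 + a_{\{1,2\}} \left( y_1 z_2 + y_2 z_1 \right)
```

Each monomial keeps exactly one variable from the y block and takes the rest from the z block, which is what distinguishes this from full decoupling into k separate blocks.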

preprint2014arXiv

Intrinsically Motivated Learning of Visual Motion Perception and Smooth Pursuit

We extend the framework of efficient coding, which has been used to model the development of sensory processing in isolation, to model the development of the perception/action cycle. Our extension combines sparse coding and reinforcement learning so that sensory processing and behavior co-develop to optimize a shared intrinsic motivational signal: the fidelity of the neural encoding of the sensory input under resource constraints. Applying this framework to a model system consisting of an active eye behaving in a time varying environment, we find that this generic principle leads to the simultaneous development of both smooth pursuit behavior and model neurons whose properties are similar to those of primary visual cortical neurons selective for different directions of visual motion. We suggest that this general principle may form the basis for a unified and integrated explanation of many perception/action loops.

preprint2013arXiv

Distributed average tracking for multiple reference signals with general linear dynamics

This technical note studies the distributed average tracking problem for multiple time-varying signals with general linear dynamics, whose reference inputs are nonzero and not available to any agent in the network. In distributed fashion, a pair of continuous algorithms with, respectively, static and adaptive coupling strengths are designed. Based on the boundary layer concept, the proposed continuous algorithm with static coupling strengths can asymptotically track the average of the multiple reference signals without chattering phenomenon. Furthermore, for the case of algorithms with adaptive coupling strengths, the average tracking errors are uniformly ultimately bounded and exponentially converge to a small adjustable bounded set. Finally, a simulation example is presented to show the validity of the theoretical results.

preprint2012arXiv

Almost totally complex points on elliptic curves

Let $F/F_0$ be a quadratic extension of totally real number fields, and let $E$ be an elliptic curve over $F$ which is isogenous to its Galois conjugate over $F_0$. A quadratic extension $M/F$ is said to be almost totally complex (ATC) if all archimedean places of $F$ but one extend to a complex place of $M$. The main goal of this note is to provide a new construction of a supply of Darmon-like points on $E$, which are conjecturally defined over certain ring class fields of $M$. These points are constructed by means of an extension of Darmon's ATR method to higher dimensional modular abelian varieties, from which they inherit the following features: they are algebraic provided Darmon's conjectures on ATR points hold true, and they are explicitly computable, as we illustrate with a detailed example that provides certain numerical evidence for the validity of our conjectures.

preprint2012arXiv

The Han-Kobayashi Region for a Class of Gaussian Interference Channels with Mixed Interference

A simple encoding scheme based on Sato's non-naïve frequency division is proposed for a class of Gaussian interference channels with mixed interference. The achievable region is shown to be equivalent to that of Costa's noiseberg region for the one-sided Gaussian interference channel. This allows for an indirect proof that this simple achievable rate region is indeed equivalent to the Han-Kobayashi (HK) region with Gaussian input and with time sharing for this class of Gaussian interference channels with mixed interference.

39 published item(s)

Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios

Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling

Training Report of TeleChat3-MoE

An approximation to peak detection power using Gaussian random field theory

Automatically Reproducing Android Bug Reports Using Natural Language Processing and Reinforcement Learning

Causal conditional hidden Markov model for multimodal traffic prediction

Spatio-temporal neural structural causal models for bike flow prediction

Asynchronous Federated Learning Based Mobility-aware Caching in Vehicular Edge Computing

Combining Intra-Risk and Contagion Risk for Enterprise Bankruptcy Prediction Using Graph Neural Networks

Differentiable Channel Sparsity Search via Weight Sharing within Filters

Fast Electromagnetic Validations of Large-Scale Digital Coding Metasurfaces Accelerated by Recurrence Rebuild and Retrieval Method

Improving Human-Object Interaction Detection via Phrase Learning and Label Composition

Learning Bi-typed Multi-relational Heterogeneous Graph via Dual Hierarchical Attention Networks

Medical Dialogue Response Generation with Pivotal Information Recalling

MSDF: A General Open-Domain Multi-Skill Dialog Framework

On the equilibrium of the Poisson-Nernst-Planck-Bikermann model equipping with the steric and correlation effects

ReMix: A General and Efficient Framework for Multiple Instance Learning based Whole Slide Image Classification

Shuffle Instances-based Vision Transformer for Pancreatic Cancer ROSE Image Classification

Stock Movement Prediction Based on Bi-typed Hybrid-relational Market Knowledge Graph via Dual Attention Networks

Time-Dependent Performance Modeling for Platooning Communications at Intersection

Timing performance simulation for 3D 4H-SiC detector

End-to-End Human Object Interaction Detection with HOI Transformer

QEMind: Alibaba's Submission to the WMT21 Quality Estimation Shared Task

A unified structure preserving scheme for a multi-species model with a gradient flow structure and nonlocal interactions via singular kernels

ADMM-IDNN: Iteratively Double-reweighted Nuclear Norm Algorithm for Group-prior based Nonconvex Compressed Sensing via ADMM

Connecting Embeddings for Knowledge Graph Entity Typing

Fixed-Time Cooperative Tracking Control for Double-Integrator Multi-Agent Systems: A Time-Based Generator Approach

Nonconvex Nonsmooth Low-Rank Minimization for Generalized Image Compressed Sensing via Group Sparse Representation

Q-greedyUCB: a New Exploration Policy for Adaptive and Resource-efficient Scheduling

Designing Distributed Fixed-Time Consensus Protocols for Linear Multi-Agent Systems Over Directed Graphs

Distributed Average Tracking for Multiple Signals Generated by Linear Dynamical Systems: An Edge-based Framework

Fixed-time consensus of multiple double-integrator systems under directed topologies: A motion-planning approach

Neural Headline Generation with Sentence-wise Optimization

SICS: Secure In-Cloud Service Function Chaining

Polynomial bounds for decoupling, with applications

Intrinsically Motivated Learning of Visual Motion Perception and Smooth Pursuit

Distributed average tracking for multiple reference signals with general linear dynamics

Almost totally complex points on elliptic curves

The Han-Kobayashi Region for a Class of Gaussian Interference Channels with Mixed Interference