Source author record

Yao Xiao

Yao Xiao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision; Artificial Intelligence; Information Retrieval; Computation and Language; Machine Learning; math.DS; physics.soc-ph; cond-mat.mes-hall; cond-mat.mtrl-sci; Distributed, Parallel, and Cluster Computing; eess.IV; math.MG; math.NA; Multiagent Systems; Networking and Internet Architecture; Numerical Analysis; physics.comp-ph; physics.med-ph

Catalog footprint

What is connected

22 works

18 topics

4 close collaborators

preprint, 2026, arXiv

DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

Deep research systems are widely used for multi-step web research, analysis, and cross-source synthesis, yet their evaluation remains challenging. Existing benchmarks often require annotation-intensive task construction, rely on static evaluation dimensions, or fail to reliably verify facts when citations are missing. To bridge these gaps, we introduce DeepResearchEval, an automated framework for deep research task construction and agentic evaluation. For task construction, we propose a persona-driven pipeline generating realistic, complex research tasks anchored in diverse user profiles, applying a two-stage filter (Task Qualification and Search Necessity) to retain only tasks requiring multi-source evidence integration and external retrieval. For evaluation, we propose an agentic pipeline with two components: an Adaptive Point-wise Quality Evaluation that dynamically derives task-specific evaluation dimensions, criteria, and weights conditioned on each generated task, and an Active Fact-Checking that autonomously extracts and verifies report statements via web search, even when citations are missing.

preprint, 2026, arXiv

The Impact of Generative AI on Architectural Conceptual Design: Performance, Creative Self-Efficacy and Cognitive Load

Our study examines how generative AI (GenAI) influences performance, creative self-efficacy, and cognitive load in architectural conceptual design tasks. Thirty-six student participants from Architectural Engineering and other disciplines completed a two-phase architectural design task, first independently and then with external tools (GenAI-assisted condition and control condition using an online repository of existing architectural projects). Design outcomes were evaluated by expert raters, while self-efficacy and cognitive load were self-reported after each phase. Difference-in-differences analyses revealed no overall performance advantage of GenAI across participants; however, subgroup analyses showed that GenAI significantly improved design performance for novice designers. In contrast, general creative self-efficacy declined for students using GenAI. Cognitive load did not differ significantly between conditions, though prompt usage patterns showed that iterative idea generation and visual feedback prompts were linked to greater reductions in cognitive load. These findings suggest that GenAI effectiveness depends on users' prior expertise and interaction strategies through prompting.
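
As a rough illustration of the difference-in-differences comparison mentioned in the abstract above, the sketch below subtracts the control group's phase-to-phase change from the GenAI group's change. The function name `did_estimate` and all scores are hypothetical values invented for this example; they are not data from the study, and the study's actual statistical model may differ.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences: treated group's change minus control group's change."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical expert-rating scores, invented purely for illustration.
genai_phase1 = [3.0, 4.0, 5.0]    # phase 1: working independently
genai_phase2 = [4.0, 5.0, 6.0]    # phase 2: with GenAI assistance
control_phase1 = [3.0, 4.0, 5.0]  # phase 1: working independently
control_phase2 = [3.5, 4.5, 5.5]  # phase 2: with the project repository

effect = did_estimate(genai_phase1, genai_phase2, control_phase1, control_phase2)
print(effect)  # 0.5
```

Subtracting the control group's change removes any improvement that both groups share simply from doing the task a second time.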

preprint, 2026, arXiv

When Preference Labels Fall Short: Aligning Diffusion Models from Real Data

Preference alignment aims to guide generative models by learning from comparisons between preferred and non-preferred samples. In practice, most existing approaches rely on preference pairs constructed from model-generated images. Such supervision is inherently relative and can be ambiguous when both samples exhibit artifacts or limited visual quality, making it difficult to infer what constitutes a truly desirable output. In this work, we investigate whether real data can serve as an alternative source of supervision for preference alignment. We adopt a data-centric perspective and study a curation strategy that treats real images as reference points and constructs preference signals by contrasting them with generated or perturbed samples, without requiring manually annotated preference pairs. Through empirical analysis, we show that real-data-based supervision provides effective guidance for aligning diffusion models and achieves performance comparable to existing preference-based methods. Our results suggest that real data offers a practical and complementary source of supervision for preference alignment and highlight directions of label-efficient alignment strategies. Code and models are available at https://cwyxx.github.io/RealAlign.

preprint, 2025, arXiv

Training Report of TeleChat3-MoE

TeleChat3-MoE is the latest series of TeleChat large language models, featuring a Mixture-of-Experts (MoE) architecture with parameter counts ranging from 105 billion to over one trillion, trained end-to-end on an Ascend NPU cluster. This technical report mainly presents the underlying training infrastructure that enables reliable and efficient scaling to frontier model sizes. We detail systematic methodologies for operator-level and end-to-end numerical accuracy verification, ensuring consistency across hardware platforms and distributed parallelism strategies. Furthermore, we introduce a suite of performance optimizations, including interleaved pipeline scheduling, attention-aware data scheduling for long-sequence training, hierarchical and overlapped communication for expert parallelism, and DVM-based operator fusion. A systematic parallelization framework, leveraging analytical estimation and integer linear programming, is also proposed to optimize multi-dimensional parallelism configurations. Additionally, we present methodological approaches to cluster-level optimizations, addressing host- and device-bound bottlenecks during large-scale training tasks. These infrastructure advancements yield significant throughput improvements and near-linear scaling on clusters comprising thousands of devices, providing a robust foundation for large-scale language model development on hardware ecosystems.

preprint, 2023, arXiv

Determinate Node Selection for Semi-supervised Classification Oriented Graph Convolutional Networks

Graph Convolutional Networks (GCNs) have been proved successful in the field of semi-supervised node classification by extracting structural information from graph data. However, the random selection of labeled nodes used by GCNs may lead to unstable generalization performance of GCNs. In this paper, we propose an efficient method for the deterministic selection of labeled nodes: the Determinate Node Selection (DNS) algorithm. The DNS algorithm identifies two categories of representative nodes in the graph: typical nodes and divergent nodes. These labeled nodes are selected by exploring the structure of the graph and determining the ability of the nodes to represent the distribution of data within the graph. The DNS algorithm can be applied quite simply on a wide range of semi-supervised graph neural network models for node classification tasks. Through extensive experimentation, we have demonstrated that the incorporation of the DNS algorithm leads to a remarkable improvement in the average accuracy of the model and a significant decrease in the standard deviation, as compared to the original method.

preprint, 2022, arXiv

A Spatial Deconstruction Behavioural Model for Pedestrian Dynamics

How to reproduce realistic motion in simulations has always been a fundamental problem for pedestrian dynamics, and a critical challenge for current studies is the natural correlation of the movement choices and the human behaviours. To advance the study, we deconstruct the two-dimensional space of pedestrian crowds with a behavioural motion network on the basis of the Voronoi diagram and the Delaunay diagram, and accordingly formulate a behavioural model considering both the long-term route choice and the real-time velocity choice. The comparisons between simulation results and real-world movements suggest that the behavioural model can not only deal with traditional scenarios but also describe well-agreed movements in challenging conflicting situations. The spatial deconstruction idea provides a geometric perspective to recognize and reproduce pedestrian behaviours, and it can further benefit the understanding and realization of pedestrian dynamics and even robot navigation.

preprint, 2022, arXiv

AutoFAS: Automatic Feature and Architecture Selection for Pre-Ranking System

Industrial search and recommendation systems mostly follow the classic multi-stage information retrieval paradigm: matching, pre-ranking, ranking, and re-ranking stages. To account for system efficiency, simple vector-product based models are commonly deployed in the pre-ranking stage. Recent works consider distilling the high knowledge of large ranking models to small pre-ranking models for better effectiveness. However, two major challenges in pre-ranking system still exist: (i) without explicitly modeling the performance gain versus computation cost, the predefined latency constraint in the pre-ranking stage inevitably leads to suboptimal solutions; (ii) transferring the ranking teacher's knowledge to a pre-ranking student with a predetermined handcrafted architecture still suffers from the loss of model performance. In this work, a novel framework AutoFAS is proposed which jointly optimizes the efficiency and effectiveness of the pre-ranking model: (i) AutoFAS for the first time simultaneously selects the most valuable features and network architectures using Neural Architecture Search (NAS) technique; (ii) equipped with ranking model guided reward during NAS procedure, AutoFAS can select the best pre-ranking architecture for a given ranking teacher without any computation overhead. Experimental results in our real world search system show AutoFAS consistently outperforms the previous state-of-the-art (SOTA) approaches at a lower computing cost. Notably, our model has been adopted in the pre-ranking module in the search system of Meituan, bringing significant improvements.

preprint, 2022, arXiv

Automated WBRT Treatment Planning via Deep Learning Auto-Contouring and Customizable Landmark-Based Field Aperture Design

In this work, we developed and evaluated a novel pipeline consisting of two landmark-based field aperture generation approaches for WBRT treatment planning; they are fully automated and customizable. The automation pipeline is beneficial for both clinicians and patients, where we can reduce clinician workload and reduce treatment planning time. The customizability of the field aperture design addresses different clinical requirements and allows the personalized design to become feasible. The performance results regarding quantitative and qualitative evaluations demonstrated that our plans were comparable with the original clinical plans. This technique has been deployed as part of a fully automated treatment planning tool for whole-brain cancer and could be translated to other treatment sites in the future.

preprint, 2022, arXiv

Contrastive Information Transfer for Pre-Ranking Systems

Real-world search and recommender systems usually adopt a multi-stage ranking architecture, including matching, pre-ranking, ranking, and re-ranking. Previous works mainly focus on the ranking stage while very few focus on the pre-ranking stage. In this paper, we focus on the information transfer from the ranking stage to the pre-ranking stage. We propose a new Contrastive Information Transfer (CIT) framework to transfer useful information from the ranking model to the pre-ranking model. We train the pre-ranking model to distinguish the positive pair of representations from a set of positive and negative pairs with a contrastive objective. As a consequence, the pre-ranking model can make full use of the rich information in the ranking model's representations. The CIT framework also has the advantage of alleviating selection bias and improving the performance of recall metrics, which is crucial for pre-ranking models. We conduct extensive experiments including offline datasets and online A/B testing. Experimental results show that CIT achieves better results than competitive models. In addition, a strict online A/B test at one of the world's largest e-commerce platforms shows that the proposed model achieves a 0.63% improvement in CTR and a 1.64% improvement in VBR. The proposed model has now been deployed online and serves the main traffic of this system, contributing to remarkable business growth.
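
The contrastive objective described in the abstract above can be pictured with a minimal sketch. It assumes an InfoNCE-style loss, which is one common contrastive formulation; the abstract does not say which loss CIT uses, and the function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: negative log-softmax of the positive pair's score.

    anchor:    pre-ranking representation of a sample (assumed role)
    positive:  ranking-model representation of the same sample (assumed role)
    negatives: ranking-model representations of other samples (assumed role)
    """
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


# Toy vectors, invented for illustration only.
aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)  # True
```

The loss is small when the anchor scores its positive partner higher than every negative, which is the sense in which the pre-ranking representation is pulled toward the ranking model's representation of the same sample.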

preprint, 2022, arXiv

Contributions of Shape, Texture, and Color in Visual Recognition

We investigate the contributions of three important features of the human visual system (HVS), namely shape, texture, and color, to object classification. We build a humanoid vision engine (HVE) that explicitly and separately computes shape, texture, and color features from images. The resulting feature vectors are then concatenated to support the final classification. We show that HVE can summarize and rank-order the contributions of the three features to object recognition. We use human experiments to confirm that both HVE and humans predominantly use some specific features to support the classification of specific classes (e.g., texture is the dominant feature to distinguish a zebra from other quadrupeds, both for humans and HVE). With the help of HVE, given any environment (dataset), we can summarize the most important features for the whole task (task-specific; e.g., color is the most important feature overall for classification with the CUB dataset), and for each class (class-specific; e.g., shape is the most important feature to recognize boats in the iLab-20M dataset). To demonstrate more usefulness of HVE, we use it to simulate the open-world zero-shot learning ability of humans with no attribute labeling. Finally, we show that HVE can also simulate human imagination ability with the combination of different features. We will open-source the HVE engine and corresponding datasets.

preprint, 2022, arXiv

Encouraging Disentangled and Convex Representation with Controllable Interpolation Regularization

We focus on controllable disentangled representation learning (C-Dis-RL), where users can control the partition of the disentangled latent space to factorize dataset attributes (concepts) for downstream tasks. Two general problems remain under-explored in current methods: (1) They lack comprehensive disentanglement constraints, especially missing the minimization of mutual information between different attributes across latent and observation domains. (2) They lack convexity constraints, which is important for meaningfully manipulating specific attributes for downstream tasks. To encourage both comprehensive C-Dis-RL and convexity simultaneously, we propose a simple yet efficient method: Controllable Interpolation Regularization (CIR), which creates a positive loop where disentanglement and convexity can help each other. Specifically, we conduct controlled interpolation in latent space during training, and we reuse the encoder to help form a 'perfect disentanglement' regularization. In that case, (a) disentanglement loss implicitly enlarges the potential understandable distribution to encourage convexity; (b) convexity can in turn improve robust and precise disentanglement. CIR is a general module and we merge CIR with three different algorithms: ELEGANT, I2I-Dis, and GZS-Net to show the compatibility and effectiveness. Qualitative and quantitative experiments show improvement in C-Dis-RL and latent convexity by CIR. This further improves downstream tasks: controllable image synthesis, cross-modality image translation, and zero-shot synthesis.

preprint, 2022, arXiv

End-to-end Mapping in Heterogeneous Systems Using Graph Representation Learning

To enable heterogeneous computing systems with autonomous programming and optimization capabilities, we propose a unified, end-to-end, programmable graph representation learning (PGL) framework that is capable of mining the complexity of high-level programs down to the universal intermediate representation, extracting the specific computational patterns and predicting which code segments would run best on a specific core in heterogeneous hardware platforms. The proposed framework extracts multi-fractal topological features from code graphs, utilizes graph autoencoders to learn how to partition the graph into computational kernels, and exploits graph neural networks (GNN) to predict the correct assignment to a processor type. In the evaluation, we validate the PGL framework and demonstrate a maximum speedup of 6.42x compared to the thread-based execution, and 2.02x compared to the state-of-the-art technique.

preprint, 2022, arXiv

Sampling Is All You Need on Modeling Long-Term User Behaviors for CTR Prediction

Rich user behavior data has been proven to be of great value for Click-Through Rate (CTR) prediction applications, especially in industrial recommender, search, or advertising systems. However, it's non-trivial for real-world systems to make full use of long-term user behaviors due to the strict requirements of online serving time. Most previous works adopt the retrieval-based strategy, where a small number of user behaviors are retrieved first for subsequent attention. However, the retrieval-based methods are sub-optimal and would cause more or less information loss, and it's difficult to balance the effectiveness and efficiency of the retrieval algorithm. In this paper, we propose SDIM (Sampling-based Deep Interest Modeling), a simple yet effective sampling-based end-to-end approach for modeling long-term user behaviors. We sample from multiple hash functions to generate hash signatures of the candidate item and each item in the user behavior sequence, and obtain the user interest by directly gathering behavior items associated with the candidate item with the same hash signature. We show theoretically and experimentally that the proposed method performs on par with standard attention-based models on modeling long-term user behaviors, while being sizable times faster. We also introduce the deployment of SDIM in our system. Specifically, we decouple the behavior sequence hashing, which is the most time-consuming part, from the CTR model by designing a separate module named BSE (Behavior Sequence Encoding). BSE is latency-free for the CTR server, enabling us to model extremely long user behaviors. Both offline and online experiments are conducted to demonstrate the effectiveness of SDIM. SDIM has now been deployed online in the search system of the Meituan app.
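
The hash-and-gather idea in the abstract above can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes SimHash (sign bits of random projections) as the hash family and a plain mean over matching items as the pooling rule, and every function name and parameter here is hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """One hash function: num_bits random hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """SimHash signature: the sign bit of vec's projection onto each hyperplane."""
    return tuple(sum(h_i * x_i for h_i, x_i in zip(h, vec)) >= 0.0 for h in hyperplanes)


def sampled_interest(candidate, behaviors, hash_functions):
    """Average, over hash functions, the mean of behaviors whose signature matches the candidate's."""
    dim = len(candidate)
    pooled = [0.0] * dim
    for hyperplanes in hash_functions:
        cand_sig = signature(candidate, hyperplanes)
        matched = [b for b in behaviors if signature(b, hyperplanes) == cand_sig]
        if matched:
            for d in range(dim):
                pooled[d] += sum(b[d] for b in matched) / len(matched)
    return [p / len(hash_functions) for p in pooled]


# Toy 2-D embeddings, invented for illustration only.
hash_functions = [make_hyperplanes(num_bits=2, dim=2, seed=s) for s in range(4)]
behaviors = [[1.0, 0.0], [-1.0, 0.0]]
interest = sampled_interest([1.0, 0.0], behaviors, hash_functions)
print(interest)  # [1.0, 0.0]
```

Because a behavior item's signature does not depend on the candidate, the signatures of the behavior sequence can be computed ahead of time, which is consistent with the abstract's description of moving sequence hashing into a separate module.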

preprint, 2022, arXiv

Sequential image recovery from noisy and under-sampled Fourier data

A new algorithm is developed to jointly recover a temporal sequence of images from noisy and under-sampled Fourier data. Specifically, we consider the case where each data set is missing vital information that prevents its (individual) accurate recovery. Our new method is designed to restore the missing information in each individual image by "borrowing" it from the other images in the sequence. As a result, all of the individual reconstructions yield improved accuracy. The use of high resolution Fourier edge detection methods is essential to our algorithm. In particular, edge information is obtained directly from the Fourier data which leads to an accurate coupling term between data sets. Moreover, data loss is largely avoided as coarse reconstructions are not required to process inter- and intra-image information. Numerical examples are provided to demonstrate the accuracy, efficiency and robustness of our new method.

preprint, 2020, arXiv

Modeling indoor-level non-pharmaceutical interventions during the COVID-19 pandemic: a pedestrian dynamics-based microscopic simulation approach

Mathematical modeling of epidemic spreading has been widely adopted to estimate the threats of epidemic diseases (e.g., the COVID-19 pandemic) as well as to evaluate epidemic control interventions. Indoor places are considered a significant source of epidemic spreading risk, but existing widely used epidemic spreading models are usually limited for indoor places since the dynamic physical distance changes between people are ignored, and the empirical features of essential and non-essential travel are not differentiated. In this paper, we introduce a pedestrian-based epidemic spreading model that is capable of modeling indoor transmission risks of diseases during people's social activities. Taking advantage of the before-and-after mobility data from the University of Maryland COVID-19 Impact Analysis Platform, we find that people tend to spend more time in grocery stores once their travel frequencies are restricted to a low level. In other words, an increase in dwell time could balance the decrease in travel frequencies and satisfy people's demand. Based on the pedestrian-based model and the empirical evidence, combined non-pharmaceutical interventions from different operational levels are evaluated. Numerical simulations show that restrictions on people's travel frequency and on the opening hours of indoor places may not be universally effective in reducing the average infection risk for each pedestrian who visits the place. Entry limitations can be a widely effective alternative, although the decision-maker needs to balance the decrease in risky contacts against the increase in queue length outside the place, which may impede people from fulfilling their travel needs.

preprint, 2019, arXiv

Anharmonic corrections to the multiphonon deep-level charge capture ab initio calculations for semiconductors

Nonradiative carrier recombination at semiconductor deep centers is of great importance to both fundamental physics and device engineering. In this letter, we provide a revised analysis of K. Huang's original nonradiative multi-phonon (NMP) theory with ab initio calculations. First, we show at the first-principles level that Huang's concise formula gives the same results as the matrix-based formula, and that Huang's high-temperature formula provides an analytical expression for the coupling constant in Marcus theory. Second, the anharmonic effects are corrected by taking into account local phonon mode variation at different charge states of the defect. The corrected capture rates for defects in GaN and SiC agree well with experiments.

preprint, 2017, arXiv

Correction of exciton binding energy in monolayer transition metal dichalcogenides

We theoretically investigate the corrections of exciton binding energy in monolayer transition metal dichalcogenides (TMDs) due to the exciton-optical phonon coupling in the Fröhlich interaction model by using the linear operator combined Lee-Low-Pines variational method. We consider not only the coupling of excitons with the intrinsic longitudinal optical (LO) phonon modes, but also with the surface optical phonon modes that are induced by the polar substrates underneath the TMDs. We find that exciton binding energies are corrected on a large scale due to these exciton-optical phonon couplings. We discuss the dependence of the exciton binding energy on the cut-off wave vector of optical phonon modes, the polarization parameters of materials and the interlayer distance between the polar substrates and TMDs. These results provide potential explanations for the divergence of the exciton binding energy between experiment and theory in TMDs.

preprint, 2014, arXiv

1-HKUST: Object Detection in ILSVRC 2014

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is one of the most important big data challenges to date. We participated in the object detection track of ILSVRC 2014 and placed fourth among the 38 teams. We introduce in our object detection system a number of novel techniques in localization and recognition. For localization, initial candidate proposals are generated using selective search, and a novel bounding box regression method is used for better object localization. For recognition, to represent a candidate proposal, we adopt three features, namely, RCNN feature, IFV feature, and DPM feature. Given these features, category-specific combination functions are learned to improve the object recognition rate. In addition, object context in the form of background priors and object interaction priors is learned and applied in our system. Our ILSVRC 2014 results are reported alongside the results of other participating teams.

preprint, 2014, arXiv

MobiCacher: Mobility-Aware Content Caching in Small-Cell Networks

Small-cell networks have been proposed to meet the demand of ever growing mobile data traffic. One of the prominent challenges faced by small-cell networks is the lack of sufficient backhaul capacity to connect small-cell base stations (small-BSs) to the core network. We exploit the effective application layer semantics of both spatial and temporal locality to reduce the backhaul traffic. Specifically, small-BSs are equipped with storage facility to cache contents requested by users. As the cache hit ratio increases, most of the users' requests can be satisfied locally without incurring traffic over the backhaul. To make informed caching decisions, the mobility patterns of users must be carefully considered as users might frequently migrate from one small cell to another. We study the issue of mobility-aware content caching, which is formulated into an optimization problem with the objective to maximize the caching utility. As the problem is NP-complete, we develop a polynomial-time heuristic solution termed MobiCacher with bounded approximation ratio. We also conduct trace-based simulations to evaluate the performance of MobiCacher, which show that MobiCacher yields better caching utility than existing solutions.

preprint, 2014, arXiv

Multifractal analysis of dimension spectrum and the set of irregular points in non-uniformly hyperbolic systems

We study the multifractal analysis of the dimension spectrum for almost additive potentials in a class of one-dimensional non-uniformly hyperbolic dynamical systems and prove that the irregular set has full Hausdorff dimension.

preprint, 2013, arXiv

Generalized Fermat point

In this paper, we generalize the classical Fermat point, prove the necessary and sufficient condition for existence and uniqueness in the generalized Fermat point (GFP) theorem, and discuss some interesting geometric properties of the generalized Fermat point.
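
For orientation, the classical Fermat point of a triangle is the point that minimizes the sum of distances to the three vertices. The sketch below approximates it with the Weiszfeld iteration, a standard method for this minimization; it covers only the classical case, since the abstract does not describe the paper's generalization, and the function name and parameters are hypothetical.

```python
import math


def fermat_point(points, iterations=200, eps=1e-12):
    """Approximate the point minimizing the sum of Euclidean distances (Weiszfeld iteration)."""
    # Start from the centroid of the input points.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        wx = wy = wsum = 0.0
        for px, py in points:
            d = math.hypot(x - px, y - py)
            if d < eps:
                # Simplification: stop if the iterate lands on an input point,
                # where the update rule would divide by zero.
                return (px, py)
            wx += px / d
            wy += py / d
            wsum += 1.0 / d
        # New estimate: average of the points weighted by inverse distance.
        x, y = wx / wsum, wy / wsum
    return (x, y)


def total_distance(q, points):
    """Sum of distances from q to every input point."""
    return sum(math.hypot(q[0] - px, q[1] - py) for px, py in points)


# For an equilateral triangle the Fermat point coincides with the centroid.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]
fx, fy = fermat_point(triangle)
```

For the equilateral triangle above, the result should agree with the centroid (0.5, √3/6) up to floating-point error.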

preprint, 2013, arXiv

Higher dimensional multifractal analysis of non-uniformly hyperbolic systems

Johansson, Jordan, Öberg and Pollicott (Israel J. Math., 2010) studied the multifractal analysis of a class of one-dimensional non-uniformly hyperbolic systems. By introducing some new techniques, we extend their results to the higher-dimensional case.
