Source author record

Rui Yao

Rui Yao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Systems and Control, eess.SY, astro-ph.GA, Computer Vision, eess.SP, Machine Learning, Methodology, Artificial Intelligence, astro-ph.HE, astro-ph.IM, Computation and Language, Data Structures and Algorithms, math.OC, Multiagent Systems, physics.soc-ph, stat.OT

Catalog footprint

What is connected

15 works

16 topics

4 close collaborators


Preprint · 2026 · arXiv

Compliance-to-Code: Enhancing Financial Compliance Checking via Code Generation

Regulatory compliance has become a cornerstone of corporate governance, ensuring adherence to systematic legal frameworks. Financial regulations in particular often comprise highly intricate provisions, layered logical structures, and numerous exceptions, which make compliance checking labor-intensive and hard to interpret. To mitigate this, Regulatory Technology (RegTech) and Large Language Models (LLMs) have recently attracted significant attention for automating the conversion of regulatory text into executable compliance logic. However, their performance remains suboptimal, particularly for Chinese-language financial regulations, due to three key limitations: (1) incomplete domain-specific knowledge representation, (2) insufficient hierarchical reasoning capabilities, and (3) failure to maintain temporal and logical coherence. One promising solution is to develop domain-specific, code-oriented datasets for model training. Existing datasets such as LexGLUE, LegalBench, and CODE-ACCORD are often English-focused, domain-mismatched, or lack the fine granularity needed for compliance code generation. To fill these gaps, we present Compliance-to-Code, the first large-scale Chinese dataset dedicated to financial regulatory compliance. It covers 1,159 annotated clauses from 361 regulations across ten categories; each clause is modularly structured into four logical elements (subject, condition, constraint, and contextual information), along with regulation relations. We provide deterministic Python code mappings, detailed code reasoning, and code explanations to facilitate automated auditing. To demonstrate its utility, we present FinCheck, a pipeline for regulation structuring, code generation, and report generation.
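
As an illustration of what a deterministic code mapping for one structured clause might look like, here is a minimal Python sketch. The `Clause` structure, the `check` function, the three-way outcome, and the example disclosure rule (5% threshold, 3-day deadline) are all hypothetical; they are not taken from the Compliance-to-Code dataset or from FinCheck.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

Record = Mapping[str, Any]


@dataclass(frozen=True)
class Clause:
    """One clause split into the four logical elements (hypothetical schema)."""
    clause_id: str
    subject: Callable[[Record], bool]     # who the clause applies to
    condition: Callable[[Record], bool]   # when it is triggered
    constraint: Callable[[Record], bool]  # what must then hold
    context: str = ""                     # free-text contextual information


def check(clause: Clause, record: Record) -> str:
    """Deterministically classify a record against one clause."""
    if not (clause.subject(record) and clause.condition(record)):
        return "not_applicable"
    return "compliant" if clause.constraint(record) else "violation"


# Hypothetical rule: a listed company whose shareholder reaches 5%
# ownership must disclose within 3 days.
disclosure = Clause(
    clause_id="demo-1",
    subject=lambda r: r["entity_type"] == "listed_company",
    condition=lambda r: r["shareholder_pct"] >= 5.0,
    constraint=lambda r: r["days_to_disclose"] <= 3,
    context="illustrative example only",
)

print(check(disclosure, {"entity_type": "listed_company",
                         "shareholder_pct": 6.0, "days_to_disclose": 5}))  # prints "violation"
```

Keeping subject and condition separate from the constraint lets the checker distinguish "this rule does not apply" from "this rule applies and is violated", which matters for audit reports.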

Preprint · 2026 · arXiv

Interpreting Fedspeak with Confidence: A LLM-Based Uncertainty-Aware Framework Guided by Monetary Policy Transmission Paths

"Fedspeak", the stylized and often nuanced language used by the U.S. Federal Reserve, encodes implicit policy signals and strategic stances. The Federal Open Market Committee strategically employs Fedspeak as a communication tool to shape market expectations and influence both domestic and global economic conditions. As such, automatically parsing and interpreting Fedspeak presents a high-impact challenge, with significant implications for financial forecasting, algorithmic trading, and data-driven policy analysis. In this paper, we propose an LLM-based, uncertainty-aware framework for deciphering Fedspeak and classifying its underlying monetary policy stance. Technically, to enrich the semantic and contextual representation of Fedspeak texts, we incorporate domain-specific reasoning grounded in the monetary policy transmission mechanism. We further introduce a dynamic uncertainty decoding module to assess the confidence of model predictions, thereby enhancing both classification accuracy and model reliability. Experimental results demonstrate that our framework achieves state-of-the-art performance on the policy stance analysis task. Moreover, statistical analysis reveals a significant positive correlation between perceptual uncertainty and model error rates, validating the effectiveness of perceptual uncertainty as a diagnostic signal.

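One simple way to attach a confidence signal to a stance classifier is the normalized entropy of its predicted label distribution. The sketch below assumes a three-way dovish/neutral/hawkish label set and raw model logits as input; it is a generic predictive-entropy baseline, not the paper's dynamic uncertainty decoding module, and the 0.5 flagging threshold is arbitrary.

```python
import math

STANCES = ("dovish", "neutral", "hawkish")   # hypothetical label set


def softmax(logits):
    m = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_uncertainty(logits, flag_above=0.5):
    """Return (label, normalized entropy in [0, 1], low-confidence flag)."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    normalized = entropy / math.log(len(probs))   # 1.0 means a uniform prediction
    label = STANCES[probs.index(max(probs))]
    return label, normalized, normalized > flag_above


label, uncertainty, flagged = predict_with_uncertainty([0.2, 0.1, 2.5])
```

If, as the abstract reports, uncertainty correlates with error, a flag like this can route low-confidence statements to human review.
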
Preprint · 2022 · arXiv

A ridesharing simulation platform that considers dynamic supply-demand interactions

This paper presents a new ridesharing simulation platform that accounts for dynamic driver supply and passenger demand, as well as complex interactions between drivers and passengers. The platform explicitly models driver and passenger acceptance or rejection of matching options, and cancellations both before and after matching. New simulation events, procedures, and modules have been developed to handle these realistic interactions. The capabilities of the platform are illustrated through numerical experiments, which confirm the importance of considering supply-demand interactions and provide new insights into ridesharing operations. The results show that an increase in driver supply does not always increase the acceptance rate of matching options, and that a larger matching window could reduce the overall ridesharing success rate. These results emphasize the importance of careful planning of ridesharing systems.
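
The event-driven structure described above can be sketched as a tiny discrete-event simulation. Everything here is a simplifying assumption for illustration: requests arrive uniformly at random, rejection and cancellation are independent Bernoulli draws, drivers are interchangeable, and trip durations are uniform. It is not the authors' platform.

```python
import heapq
import random


def simulate(n_requests=200, n_drivers=20, p_accept=0.8, p_cancel=0.1, seed=0):
    """Toy discrete-event simulation; all behavioural rules are assumptions."""
    rng = random.Random(seed)
    events, seq = [], 0                        # heap of (time, seq, kind)
    for _ in range(n_requests):
        heapq.heappush(events, (rng.uniform(0, 100), seq, "request"))
        seq += 1
    idle = n_drivers
    stats = {"matched": 0, "rejected": 0, "cancelled": 0, "unserved": 0}
    while events:
        t, _, kind = heapq.heappop(events)
        if kind == "trip_end":
            idle += 1                          # driver becomes available again
        elif idle == 0:
            stats["unserved"] += 1             # no supply at this moment
        elif rng.random() >= p_accept:
            stats["rejected"] += 1             # matching option rejected
        elif rng.random() < p_cancel:
            stats["cancelled"] += 1            # cancelled right after matching
        else:
            idle -= 1
            stats["matched"] += 1
            heapq.heappush(events, (t + rng.uniform(5, 15), seq, "trip_end"))
            seq += 1
    return stats


stats = simulate()
```

The sequence counter breaks ties between events scheduled at the same time, so the heap never has to compare event kinds.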

Preprint · 2022 · arXiv

D-optimal Data Fusion: Exact and Approximation Algorithms

We study the D-optimal Data Fusion (DDF) problem, which aims to select new data points, given an existing Fisher information matrix, so as to maximize the logarithm of the determinant of the overall Fisher information matrix. We show that the DDF problem is NP-hard and has no constant-factor polynomial-time approximation algorithm unless P $=$ NP. Therefore, to solve the DDF problem effectively, we propose two convex integer-programming formulations and investigate their corresponding complementary and Lagrangian-dual problems. We also develop scalable randomized-sampling and local-search algorithms with provable performance guarantees. Leveraging the concavity of the objective functions in the two proposed formulations, we design an exact algorithm aimed at solving the DDF problem to optimality. We further derive a family of submodular valid inequalities and optimality cuts, which can significantly enhance the algorithm's performance. Finally, we test our algorithms using real-world data on the problem of placing new phasor measurement units (PMUs) in modern power grids, taking existing conventional sensors into account. Our numerical study demonstrates the efficiency of our exact algorithm and the scalability and high-quality outputs of our approximation algorithms.
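
Assuming each candidate data point $i$ contributes a rank-one term $a_i a_i^T$ to the Fisher information, the DDF objective for a selection $S$ of size $s$ is $\log\det(C + \sum_{i \in S} a_i a_i^T)$, where $C$ is the existing Fisher information matrix. The sketch below evaluates this objective and improves a greedy selection with single-swap local search. It is a generic heuristic under that assumption, not the paper's randomized-sampling or local-search algorithm, and it carries no approximation guarantee.

```python
import numpy as np


def logdet_objective(C, A, S):
    """log det(C + sum_{i in S} a_i a_i^T); C is the existing Fisher information."""
    rows = A[np.array(sorted(S), dtype=int)]
    sign, value = np.linalg.slogdet(C + rows.T @ rows)
    return value if sign > 0 else -np.inf


def greedy_select(C, A, s):
    """Add, one at a time, the candidate with the largest objective gain."""
    S = set()
    for _ in range(s):
        candidates = [i for i in range(len(A)) if i not in S]
        S.add(max(candidates, key=lambda i: logdet_objective(C, A, S | {i})))
    return S


def swap_local_search(C, A, S, tol=1e-10):
    """Swap one selected point for one unselected point while that improves the objective."""
    S, best = set(S), logdet_objective(C, A, S)
    improved = True
    while improved:
        improved = False
        for i in sorted(S):
            for j in range(len(A)):
                if j in S:
                    continue
                T = (S - {i}) | {j}
                value = logdet_objective(C, A, T)
                if value > best + tol:
                    S, best, improved = T, value, True
                    break
            if improved:
                break
    return S, best


rng = np.random.default_rng(1)
C = 0.5 * np.eye(3)                  # hypothetical existing information
A = rng.normal(size=(10, 3))         # hypothetical candidate measurements
S_greedy = greedy_select(C, A, 3)
S_local, value = swap_local_search(C, A, S_greedy)
```

Because every accepted swap strictly increases the objective and there are finitely many subsets, the local search always terminates.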

Preprint · 2022 · arXiv

Efficient Truncated Linear Regression with Unknown Noise Variance

Truncated linear regression is a classical challenge in statistics, wherein a label, $y = w^T x + \varepsilon$, and its corresponding feature vector, $x \in \mathbb{R}^k$, are only observed if the label falls in some subset $S \subseteq \mathbb{R}$; otherwise the existence of the pair $(x, y)$ is hidden from observation. Linear regression with truncated observations has remained a challenge, in its general form, since the early works of Tobin (1958) and Amemiya (1973). When the error distribution is normal with known variance, recent work by Daskalakis et al. (2019) provides computationally and statistically efficient estimators of the linear model $w$. In this paper, we provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown, estimating both the linear model and the variance of the noise. Our estimator is based on an efficient implementation of Projected Stochastic Gradient Descent on the negative log-likelihood of the truncated sample. Importantly, we show that the error of our estimates is asymptotically normal, and we use this to provide explicit confidence regions for our estimates.
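
A minimal sketch of likelihood-based estimation with unknown variance follows, assuming Gaussian noise and the half-line truncation set $S = (0, \infty)$ so that the truncated moments have closed forms. Reparameterizing as $\lambda = 1/\sigma^2$ and $\nu = w/\sigma^2$ makes the negative log-likelihood convex; its gradient is the gap between the model's truncated moments and the observed ones. Deterministic full-batch projected gradient descent stands in for PSGD here, and the step size, iteration count, and projection set $\lambda \ge 0.01$ are assumptions, not the paper's choices.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)


def _moments(mu, sigma):
    """E[y], E[y^2] and P(y > 0) for N(mu, sigma^2) truncated to y > 0."""
    alpha = -mu / sigma
    tail = 0.5 * _erfc(alpha / math.sqrt(2.0))                 # 1 - Phi(alpha)
    beta = np.exp(-0.5 * alpha ** 2) / math.sqrt(2.0 * math.pi) / tail
    mean = mu + sigma * beta
    var = sigma ** 2 * (1.0 + alpha * beta - beta ** 2)
    return mean, var + mean ** 2, tail


def neg_log_likelihood(X, y, nu, lam):
    """Average NLL in (nu, lam) = (w / sigma^2, 1 / sigma^2), up to a constant."""
    theta = X @ nu
    _, _, tail = _moments(theta / lam, 1.0 / math.sqrt(lam))
    log_z = 0.5 * math.log(2.0 * math.pi / lam) + theta ** 2 / (2.0 * lam) + np.log(tail)
    return float(np.mean(0.5 * lam * y ** 2 - y * theta + log_z))


def fit_truncated(X, y, steps=2000, lr=0.1, lam_min=1e-2):
    nu, lam = np.zeros(X.shape[1]), 1.0
    for _ in range(steps):
        m1, m2, _ = _moments(X @ nu / lam, 1.0 / math.sqrt(lam))
        grad_nu = X.T @ (m1 - y) / len(y)                      # (E_model[y] - y) * x
        grad_lam = 0.5 * float(np.mean(y ** 2 - m2))           # (y^2 - E_model[y^2]) / 2
        nu = nu - lr * grad_nu
        lam = max(lam - lr * grad_lam, lam_min)                # projection onto lam >= lam_min
    return nu / lam, 1.0 / math.sqrt(lam)                      # (w_hat, sigma_hat)


# Synthetic data: a pair (x, y) is kept only if its label is positive.
rng = np.random.default_rng(0)
w_true, sigma_true = np.array([1.0, -0.5]), 1.0
X_all = rng.normal(size=(6000, 2))
y_all = X_all @ w_true + sigma_true * rng.normal(size=6000)
keep = y_all > 0
X, y = X_all[keep], y_all[keep]
w_hat, sigma_hat = fit_truncated(X, y)
```

Ordinary least squares on `X, y` would be biased by the truncation; the likelihood above corrects for it by normalizing each density by $P(y > 0 \mid x)$.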

Preprint · 2022 · arXiv

TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask

Arbitrary-shaped scene text detection is challenging because text varies widely in font, size, color, and orientation. Most existing regression-based methods model text instances by regressing the masks or contour points of text regions. However, regressing complete masks incurs high training complexity, and contour points are insufficient to capture the details of highly curved text. To tackle these limitations, we propose a novel lightweight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode text masks as compact vectors. Further, considering the imbalanced number of training samples among pyramid layers, we employ only a single-level head for top-down prediction. To model multi-scale text with a single-level head, we introduce a novel positive sampling strategy that treats the shrunk text region as positive samples, and design a feature awareness module (FAM) for spatial awareness and scale awareness that fuses rich contextual information and focuses on more significant features. Moreover, we propose a segmented non-maximum suppression (S-NMS) method that filters out low-quality mask regressions. Extensive experiments on four challenging datasets demonstrate that TextDCT achieves competitive accuracy and efficiency. Specifically, TextDCT achieves an F-measure of 85.1 at 17.2 frames per second (FPS) on CTW1500 and an F-measure of 84.9 at 15.1 FPS on Total-Text.
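
To show how a binary text mask can be compressed into a compact DCT vector and decoded back, here is a NumPy sketch. It assumes square masks, keeps only the top-left `keep` x `keep` block of low-frequency coefficients, and thresholds the reconstruction at 0.5; the mask size, `keep`, and threshold are illustrative choices, not TextDCT's settings.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d


def encode_mask(mask: np.ndarray, keep: int) -> np.ndarray:
    """2-D DCT of a square mask, keeping the keep x keep low-frequency block."""
    d = dct_matrix(mask.shape[0])
    coeffs = d @ mask @ d.T
    return coeffs[:keep, :keep].ravel()


def decode_mask(vec: np.ndarray, size: int, keep: int, thresh: float = 0.5) -> np.ndarray:
    """Zero-fill the dropped coefficients, invert the DCT, and binarize."""
    d = dct_matrix(size)
    coeffs = np.zeros((size, size))
    coeffs[:keep, :keep] = vec.reshape(keep, keep)
    return (d.T @ coeffs @ d) >= thresh


# Hypothetical 32 x 32 elliptical "text" mask.
yy, xx = np.mgrid[:32, :32]
mask = (((xx - 15.5) / 12) ** 2 + ((yy - 15.5) / 8) ** 2 <= 1).astype(float)
vec = encode_mask(mask, keep=12)              # 144 numbers instead of 1024
rec = decode_mask(vec, size=32, keep=12)
iou = (rec & (mask > 0)).sum() / (rec | (mask > 0)).sum()
```

Because the basis is orthonormal, the inverse transform is just the transpose; most of a smooth mask's energy sits in the low frequencies, which is why a short vector can describe it.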

Preprint · 2021 · arXiv

Online detection of cascading change-points

We propose an online procedure for detecting cascading failures in a network from sequential data; such failures can be modeled as multiple correlated change-points occurring within a short period. We consider a temporal diffusion network model to capture the temporal dynamics of multiple change-points, and develop a sequential Shewhart procedure based on generalized likelihood ratio statistics under this diffusion network model, assuming unknown post-change distribution parameters. We also address the computational complexity posed by the unknown propagation. Numerical experiments demonstrate good performance in detecting cascading failures.
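
A simplified, single-stream version of a window-limited GLR detector is sketched below. It assumes independent N(0, 1) pre-change observations and a post-change mean shift of unknown size, for which the maximized log-likelihood ratio for a candidate change time $k$ is $(\sum_{j=k}^{t} x_j)^2 / (2(t-k+1))$. It omits the paper's diffusion network model and its multiple correlated change-points; the window length and threshold are arbitrary.

```python
import numpy as np


def glr_statistic(x, t, window):
    """max over change times k in the last `window` samples of S_{k:t}^2 / (2 (t - k + 1))."""
    best, s = 0.0, 0.0
    for k in range(t, max(t - window, -1), -1):
        s += x[k]                              # running sum x[k] + ... + x[t]
        best = max(best, s * s / (2 * (t - k + 1)))
    return best


def detect(x, threshold, window=50):
    """Return the first index whose GLR statistic exceeds the threshold, or None."""
    for t in range(len(x)):
        if glr_statistic(x, t, window) > threshold:
            return t
    return None


# Noise-free toy stream: the mean jumps from 0 to 1 at index 100.
x = np.concatenate([np.zeros(100), np.ones(50)])
alarm = detect(x, threshold=5.0)               # 110
```

Maximizing over the post-change mean is what makes this a generalized likelihood ratio: the detector never needs to know the size of the shift in advance.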

Preprint · 2020 · arXiv

A Dynamic Tree Algorithm for On-demand Peer-to-peer Ride-sharing Matching

Innovative shared mobility services provide flexible on-demand mobility options and have the potential to alleviate traffic congestion. These services are challenging from several perspectives. One major challenge is to find suitable ride-sharing matches between drivers and passengers with respect to the system objective and constraints, and to provide drivers with optimal pickup and drop-off sequences. In this paper, we develop an efficient dynamic tree algorithm to find the optimal pickup and drop-off sequence. The algorithm finds an initial solution, keeps track of previously explored feasible solutions, and reduces the solution search space when new requests are considered. In addition, we propose an efficient pre-processing procedure to select candidate passenger requests, which further improves the algorithm's performance. Numerical experiments on a real-size network illustrate the efficiency of our algorithm. Sensitivity analysis suggests that small vehicle capacities and loose excess travel time constraints do not guarantee overall savings in vehicle kilometers traveled.
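
The core subproblem, finding the cheapest pickup and drop-off sequence for one driver, can be illustrated with a plain depth-first tree search that enforces pickup-before-drop-off precedence and vehicle capacity, and prunes branches whose partial cost already reaches the best complete sequence. This is a static sketch with Euclidean distances as an assumed travel cost; it does not reproduce the paper's dynamic reuse of explored solutions or its pre-processing step.

```python
import math


def best_sequence(start, requests, capacity):
    """Depth-first branch-and-bound over pickup/drop-off orders.

    requests: list of (pickup_xy, dropoff_xy).
    Returns (cost, [(kind, request_index), ...]).
    """
    best = [math.inf, None]

    def dfs(pos, cost, picked, dropped, seq):
        if cost >= best[0]:
            return                                    # prune: cannot beat incumbent
        if len(dropped) == len(requests):
            best[0], best[1] = cost, list(seq)
            return
        onboard = len(picked) - len(dropped)
        for r, (pickup, dropoff) in enumerate(requests):
            if r not in picked and onboard < capacity:
                picked.add(r)
                seq.append(("P", r))
                dfs(pickup, cost + math.dist(pos, pickup), picked, dropped, seq)
                picked.remove(r)
                seq.pop()
            elif r in picked and r not in dropped:   # precedence: drop only after pickup
                dropped.add(r)
                seq.append(("D", r))
                dfs(dropoff, cost + math.dist(pos, dropoff), picked, dropped, seq)
                dropped.remove(r)
                seq.pop()

    dfs(start, 0.0, set(), set(), [])
    return best[0], best[1]


# Two hypothetical requests along a straight road.
reqs = [((1.0, 0.0), (3.0, 0.0)), ((2.0, 0.0), (4.0, 0.0))]
cost, seq = best_sequence((0.0, 0.0), reqs, capacity=2)
```

With capacity 2 the driver can pool both passengers (total distance 4.0); with capacity 1 the same requests cost 6.0, a small instance of the capacity effect mentioned in the sensitivity analysis.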

Preprint · 2020 · arXiv

A Fast Radio Burst discovered in FAST drift scan survey

We report the discovery of a highly dispersed fast radio burst, FRB 181123, from an analysis of $\sim$1500 hr of drift-scan survey data taken with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). The pulse has three distinct emission components, which vary with frequency across our 1.0-1.5 GHz observing band. We measure the peak flux density to be $>0.065$ Jy and the corresponding fluence $>0.2$ Jy ms. Based on the observed dispersion measure of 1812 cm$^{-3}$ pc, we infer a redshift of $\sim 1.9$. From this, we estimate the peak luminosity and isotropic energy to be $\lesssim 2\times10^{43}$ erg s$^{-1}$ and $\lesssim 2\times10^{40}$ erg, respectively. With only one FRB detected in the survey so far, our constraints on the event rate are limited. We derive a 95% confidence lower limit on the event rate of 900 FRBs per day for FRBs with fluences $>0.025$ Jy ms. We performed four hours of follow-up observations of the source with FAST and found no repeat bursts. We discuss the implications of this discovery for our understanding of the physical mechanisms of FRBs.
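
The "highly dispersed" description can be made concrete with the standard cold-plasma dispersion delay, $\Delta t \approx 4.148808\,\mathrm{ms} \times \mathrm{DM} \times (\nu_{\rm lo}^{-2} - \nu_{\rm hi}^{-2})$, with frequencies in GHz and DM in cm$^{-3}$ pc. The sketch applies it to the reported DM of 1812 cm$^{-3}$ pc across the 1.0-1.5 GHz band; this arithmetic is an illustration and is not taken from the paper.

```python
K_DM_MS = 4.148808  # dispersion constant, ms GHz^2 cm^3 pc^-1


def dispersion_delay_s(dm, f_lo_ghz, f_hi_ghz):
    """Arrival-time delay (s) of the low-frequency edge relative to the high edge."""
    return K_DM_MS * dm * (f_lo_ghz ** -2 - f_hi_ghz ** -2) / 1000.0


delay = dispersion_delay_s(1812.0, 1.0, 1.5)   # about 4.18 s across the band
```

A sweep of several seconds across a 0.5 GHz band is what a dedispersion search has to undo before the three emission components can be resolved.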

Preprint · 2020 · arXiv

Experiments on route choice set generation using a large GPS trajectory set

Several route choice models in the literature were developed from a relatively small number of observations. With the extensive use of tracking devices in recent surveys, it is now possible to gain insight into travelers' choice behavior. In this paper, different path generation algorithms are evaluated using a large GPS trajectory dataset containing 6,000 observations from the Tel-Aviv metropolitan area. An initial analysis generates a single route per observation based on the shortest path. Almost 60% of the 6,000 observations can be covered by this single path (assuming an 80% overlap threshold), which contrasts significantly with previous findings in the literature. Link penalty, link elimination, simulation, and via-node methods are applied to generate route sets, and the consistency of the algorithms is compared. A modified link penalty method, which accounts for the preference for higher-hierarchy roads, provides a route set with 97% coverage (80% overlap threshold). The via-node method produces route sets with satisfactory coverage and generates more heterogeneous routes (in terms of the number of links and the routes ratio).
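
A basic link penalty route-set generator, together with an overlap measure for judging coverage, might look like the sketch below. It assumes a directed graph with link lengths, multiplies the cost of every link on the latest shortest path by a fixed factor, and defines overlap as the length-weighted share of the observed route's links that the generated route also uses. The factor and the toy network are hypothetical, and the paper's modified variant (which favors higher-hierarchy roads) is not reproduced.

```python
import heapq


def shortest_path(adj, cost, src, dst):
    """Dijkstra on a dict-of-lists graph; returns the node list or None."""
    dist, prev, done = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v in adj.get(u, []):
            nd = d + cost[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in done:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def link_penalty_routes(lengths, src, dst, iterations=5, factor=2.0):
    """Repeatedly take the shortest path and inflate the cost of its links."""
    adj = {}
    for u, v in lengths:
        adj.setdefault(u, []).append(v)
    cost, routes = dict(lengths), []
    for _ in range(iterations):
        path = shortest_path(adj, cost, src, dst)
        if path is None:
            break
        if path not in routes:
            routes.append(path)
        for link in zip(path, path[1:]):
            cost[link] *= factor
    return routes


def overlap(observed, generated, lengths):
    """Length-weighted share of the observed route's links also used by the generated one."""
    obs = set(zip(observed, observed[1:]))
    gen = set(zip(generated, generated[1:]))
    return sum(lengths[e] for e in obs & gen) / sum(lengths[e] for e in obs)


# Hypothetical toy network (directed links with lengths in km).
lengths = {("A", "B"): 1.0, ("B", "D"): 1.0, ("A", "C"): 1.5,
           ("C", "D"): 1.5, ("A", "D"): 4.5}
routes = link_penalty_routes(lengths, "A", "D", iterations=3)
```

An observation counts as covered when some generated route reaches the chosen overlap threshold (80% in the study).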

Preprint · 2020 · arXiv

Guiding Cascading Failure Search with Interpretable Graph Convolutional Network

Power system cascading failures are becoming more time-variant and complex because of increasing network interconnection and higher renewable energy penetration. High computational cost is the main obstacle to more frequent online cascading failure searches, which are essential for improving system security. In this work, we show that the complex mechanism of cascading failures can be well captured by training a graph convolutional network (GCN) offline. Subsequently, the search for cascading failures can be significantly accelerated with the aid of the trained GCN model. We link the power network topology with the structure of the GCN, yielding a smaller parameter space for learning the complex mechanism. We further make the GCN model interpretable with a layer-wise relevance propagation (LRP) algorithm. The proposed method is tested on both the IEEE RTS-79 test system and China's Henan Province power system. The results show that the GCN-guided method not only accelerates the search for cascading failures but also reveals the reasons behind its predictions of potential cascading failures.
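
For reference, a standard graph convolution layer computes $H' = \mathrm{ReLU}(\hat A H W)$ with $\hat A = D^{-1/2}(A + I)D^{-1/2}$, which is one common way to tie a GCN's structure to a fixed topology. The NumPy sketch assumes buses are graph nodes and transmission lines are edges on a hypothetical 4-bus ring; the paper's exact architecture and its LRP interpretation step are not reproduced.

```python
import numpy as np


def normalized_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2}: symmetric normalization with self-loops."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return np.maximum(a_hat @ h @ w, 0.0)


# Hypothetical 4-bus ring: bus i is connected to buses i - 1 and i + 1.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
h0 = rng.normal(size=(4, 3))      # 3 placeholder features per bus
w1 = rng.normal(size=(3, 2))      # untrained weights, for shape illustration only
a_hat = normalized_adjacency(adj)
h1 = gcn_layer(a_hat, h0, w1)
```

Since $\hat A$ is fixed by the grid topology, only $W$ is learned, and its size does not grow with the number of buses; this is one way a topology-aware design can shrink the parameter space.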

Preprint · 2020 · arXiv

The Fundamental Performance of FAST with 19-beam Receiver at L Band

The Five-hundred-meter Aperture Spherical radio Telescope (FAST) has passed national acceptance and is conducting a pilot cycle of 'Shared-Risk' observations. The 19-beam receiver covering 1.05-1.45 GHz was used for most of these observations. The electronics gain fluctuation of the system is better than 1% over 3.5 hours, providing sufficient stability for observations. Pointing accuracy, aperture efficiency, and system temperature are three key parameters of FAST. The measured standard deviation of the pointing accuracy is 7.9$''$, which satisfies the initial design of FAST. When the zenith angle is less than 26.4$^\circ$, the aperture efficiency and system temperature of the central beam around 1.4 GHz are $\sim$0.63 and less than 24 K, respectively. These measured values are better than the design values of 0.6 and 25 K, respectively. Spectral HI observations toward N672 and polarization observations toward 3C286 confirm that the sensitivity and stability of the 19-beam backend meet expectations. This performance allows FAST to carry out sensitive observations for a range of scientific goals, from pulsar studies to galaxy evolution.
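
The reported aperture efficiency and system temperature can be turned into a telescope gain and a system equivalent flux density (SEFD) with $G = \eta A / (2k)$ and $\mathrm{SEFD} = T_{\rm sys}/G$. The sketch assumes a 300 m illuminated aperture and uses the 24 K upper bound for $T_{\rm sys}$; the resulting numbers are back-of-the-envelope estimates, not values quoted in the paper.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
JANSKY = 1e-26       # 1 Jy in W m^-2 Hz^-1


def gain_k_per_jy(aperture_eff, diameter_m):
    """Antenna gain G = eta * A_geo / (2 k), expressed in K/Jy."""
    area = math.pi * (diameter_m / 2.0) ** 2
    return aperture_eff * area * JANSKY / (2.0 * K_B)


def sefd_jy(t_sys_k, gain):
    """System equivalent flux density in Jy."""
    return t_sys_k / gain


g = gain_k_per_jy(0.63, 300.0)   # about 16.1 K/Jy
sefd = sefd_jy(24.0, g)          # about 1.49 Jy
```

A lower SEFD means higher sensitivity, so an efficiency above design (0.63 vs 0.6) and a system temperature below design (24 K vs 25 K) both push in the favorable direction.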

Preprint · 2020 · arXiv

Vehicle Re-Identification Based on Complementary Features

In this work, we present our solution to the vehicle re-identification (vehicle Re-ID) track of the AI City Challenge 2020 (AIC2020). The purpose of vehicle Re-ID is to retrieve images of the same vehicle captured by multiple cameras, which can contribute greatly to Intelligent Traffic Systems (ITS) and smart cities. Variations in vehicle orientation and lighting, together with high inter-class similarity, make it difficult to obtain robust and discriminative feature representations. For the vehicle Re-ID track of AIC2020, our method fuses features extracted from different networks to take advantage of each network and obtain complementary features. For each single model, several techniques, such as multi-loss training, filter grafting, and semi-supervised learning, are used to improve representation ability as much as possible. Our top performance in City-Scale Multi-Camera Vehicle Re-Identification demonstrates the advantage of our methods: we placed 5th in the vehicle Re-ID track of AIC2020. The code is available at https://github.com/gggcy/AIC2020_ReID.
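
Feature-level fusion of several Re-ID models is often done by L2-normalizing each model's embedding, concatenating the results, and renormalizing before cosine-similarity ranking. The sketch shows that scheme with random arrays standing in for real model outputs; whether the authors fused features exactly this way is an assumption.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def fuse(per_model_features):
    """Normalize each model's (n, d_m) embeddings, concatenate, renormalize."""
    return l2_normalize(np.concatenate([l2_normalize(f) for f in per_model_features], axis=1))


def rank_gallery(query, gallery):
    """Gallery indices sorted by descending cosine similarity to one query vector."""
    return np.argsort(-(gallery @ query))


# Random stand-ins for two networks' embeddings of 5 gallery images.
rng = np.random.default_rng(0)
feats_a, feats_b = rng.normal(size=(5, 8)), rng.normal(size=(5, 16))
gallery = fuse([feats_a, feats_b])
query = fuse([feats_a[2:3], feats_b[2:3]])[0]   # same image as gallery item 2
order = rank_gallery(query, gallery)
```

Normalizing per model first keeps a network with larger raw feature magnitudes from dominating the fused similarity.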

Preprint · 2019 · arXiv

Pilot HI Survey of Planck Galactic Cold Clumps with FAST

We present a pilot HI survey of 17 Planck Galactic Cold Clumps (PGCCs) with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). HI Narrow Self-Absorption (HINSA) is an effective method for detecting cold HI mixed with molecular hydrogen (H$_2$), and it improves our understanding of the atomic-to-molecular transition in the interstellar medium. HINSA was detected in 58% of the PGCCs we observed. The column density of HINSA shows an intermediate correlation with that of $^{13}$CO, following $\log N(\mathrm{HINSA}) = (0.52 \pm 0.26)\,\log N(^{13}\mathrm{CO}) + (10 \pm 4.1)$. The HI abundance relative to total hydrogen, [HI]/[H], has an average value of $4.4\times 10^{-3}$, about 2.8 times the average value from previous HINSA surveys toward molecular clouds. For clouds with total column density $N_{\rm H} > 5 \times 10^{20}$ cm$^{-2}$, an inverse correlation between HINSA abundance and total hydrogen column density is found, confirming the depletion of cold HI gas during molecular gas formation in more massive clouds. The nonthermal line width of $^{13}$CO is about 0-0.5 km s$^{-1}$ larger than that of HINSA. One possible explanation for the narrower nonthermal width of HINSA is that the HINSA region is smaller than the $^{13}$CO region. Based on an analytic model of H$_2$ formation and H$_2$ dissociation by cosmic rays, we find cloud ages within 10$^{6.7}$-10$^{7.0}$ yr for five sources.
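
As a worked example of the fitted relation, the sketch evaluates its central values for a given $^{13}$CO column density. It assumes base-10 logarithms with column densities in cm$^{-2}$ and ignores the large uncertainties on the slope and intercept, so the output is only an illustration.

```python
import math


def log_n_hinsa(n_13co_cm2, slope=0.52, intercept=10.0):
    """Central value of log N(HINSA) = slope * log N(13CO) + intercept (base-10 logs assumed)."""
    return slope * math.log10(n_13co_cm2) + intercept


log_n = log_n_hinsa(1e16)         # 0.52 * 16 + 10 = 18.32
n_hinsa = 10 ** log_n             # roughly 2.1e18 cm^-2
```

With a slope uncertainty of $\pm 0.26$, the same input spans several orders of magnitude in $N(\mathrm{HINSA})$, consistent with the correlation being described as only intermediate.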

Preprint · 2016 · arXiv

Risk Assessment of Multi-timescale Cascading Outages based on Markovian Tree Search

In the risk assessment of cascading outages, both realistic simulation and computational efficiency are of great significance. To overcome two drawbacks, namely the huge computational resources required by sampling-based methods and the omission of dependencies within outage sequences in initial contingency selection practices, this paper proposes a novel risk assessment approach based on searching a Markovian tree. The Markovian tree model is reformulated from a recently proposed quasi-dynamic multi-timescale simulation model to ensure reasonable modeling and simulation of cascading outages. A tree search scheme is then established to avoid duplicated simulations of the same cascade paths, significantly saving computation time. To accelerate the convergence of risk assessment, a risk estimation index is proposed to guide the search toward states that contribute most to the risk, and the risk assessment is realized with a forward tree search and backward update algorithm based on this index. The effectiveness of the proposed method is illustrated on a 4-node power system, and its convergence profile and efficiency are demonstrated on the RTS-96 test system.
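
The idea of steering the search toward high-contribution states can be illustrated with a best-first expansion of a Markov tree: states are expanded in order of path probability, and the unexplored probability mass bounds the remaining risk. The transition model, the 0.2 trip probability, and the squared-outage loss are hypothetical, and this is not the paper's risk estimation index or its forward-search/backward-update algorithm.

```python
import heapq
import itertools

N_LINES = 3
TRIP_PROB = 0.2   # hypothetical chance that each surviving line trips next


def transitions(state):
    """state = (tripped_lines, stopped). Returns [(child, prob)]; [] if terminal."""
    tripped, stopped = state
    if stopped:
        return []
    remaining = [i for i in range(N_LINES) if i not in tripped]
    children = [((tripped + (i,), False), TRIP_PROB) for i in remaining]
    children.append(((tripped, True), 1.0 - TRIP_PROB * len(remaining)))
    return children


def loss(state):
    return float(len(state[0]) ** 2)   # hypothetical consequence measure


def exact_risk(state, p=1.0):
    """Full enumeration of the tree, for comparison."""
    children = transitions(state)
    if not children:
        return p * loss(state)
    return sum(exact_risk(child, p * q) for child, q in children)


def best_first_risk(root, budget):
    """Expand the most probable states first; return (risk found, unexplored mass)."""
    tie = itertools.count()                   # tie-breaker so states are never compared
    heap = [(-1.0, next(tie), root)]
    risk = 0.0
    for _ in range(budget):
        if not heap:
            break
        neg_p, _, state = heapq.heappop(heap)
        children = transitions(state)
        if not children:
            risk += -neg_p * loss(state)      # terminal state: add probability * loss
        for child, q in children:
            heapq.heappush(heap, (neg_p * q, next(tie), child))
    return risk, -sum(item[0] for item in heap)


root = ((0,), False)                          # initial contingency: line 0 out
exact = exact_risk(root)
partial, unexplored = best_first_risk(root, budget=5)
```

Since every loss here lies between 0 and 9, the true risk always lies between `partial` and `partial + 9 * unexplored`, which gives a stopping rule once that gap is small enough.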
