Researcher profile

Yi Song

Yi Song contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, unclaimed author
7 works
0 followers
9 topics
4 close collaborators

Published work

7 published items

preprint, 2026, arXiv

Enumeration of weighted plane trees by a permutation model

This work addresses an enumeration problem on weighted bi-colored plane trees with prescribed vertex data, with all vertices labeled distinctly. We give a bijective proof of the enumeration formula originally due to Kochetkov, thereby answering affirmatively a question of Adrianov-Pakovich-Zvonkin. The argument is purely combinatorial and fully constructive, and it remains valid for real-valued edge weights. The central step is a geometric construction that directly encodes each tree as a permutation. We also exhibit algebraic relationships between the enumeration problem, the partial order on partitions of vertices, and the Stirling numbers of the second kind. Some computational examples are presented as appendices.
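
For readers unfamiliar with the Stirling numbers of the second kind mentioned in this abstract, the following minimal sketch computes S(n, k), the number of ways to partition an n-element set into k non-empty blocks, using the standard recurrence S(n, k) = k·S(n-1, k) + S(n-1, k-1). The function name `stirling2` is chosen here for illustration; the sketch does not reproduce the paper's bijection or its enumeration formula.

```python
def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind S(n, k).

    Counts the partitions of an n-element set into exactly k
    non-empty blocks, via S(n, k) = k*S(n-1, k) + S(n-1, k-1).
    """
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    # table[i][j] holds S(i, j); entries not filled in stay 0.
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1  # the empty set has exactly one partition (into 0 blocks)
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


if __name__ == "__main__":
    # Row n = 4 of the Stirling triangle, for k = 0..4.
    print([stirling2(4, k) for k in range(5)])
```

Summing a row over all k gives the total number of set partitions (the Bell number), which is one way to sanity-check the table.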

preprint, 2022, arXiv

A Comprehensive Empirical Investigation on Failure Clustering in Parallel Debugging

The clustering technique has attracted a lot of attention as a promising strategy for parallel debugging in multi-fault scenarios. This heuristic approach (i.e., failure indexing or fault isolation) enables developers to perform multiple debugging tasks simultaneously by dividing failed test cases into several disjoint groups. When using statement ranking representation to model failures for better clustering, several factors influence clustering effectiveness, including the risk evaluation formula (REF), the number of faults (NOF), the fault type (FT), and the number of successful test cases paired with one individual failed test case (NSP1F). In this paper, we present the first comprehensive empirical study of how these four factors influence clustering effectiveness. We conduct extensive controlled experiments on 1060 faulty versions of 228 simulated faults and 141 real faults, and the results reveal that: 1) GP19 is highly competitive across all REFs, 2) clustering effectiveness decreases as NOF increases, 3) higher clustering effectiveness is easier to achieve when a program contains only predicate faults, and 4) clustering effectiveness is maintained when the scale of NSP1F is reduced to 20%.
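
As a rough illustration of the statement ranking representation described in this abstract, the sketch below pairs one failed test with a set of passing tests, scores each statement with a risk evaluation formula, and returns the statements ordered from most to least suspicious. It uses the well-known Ochiai formula as a stand-in REF, not GP19, and the data layout (0/1 coverage lists) and function names are assumptions made for this example rather than the paper's setup.

```python
import math
from typing import List


def ochiai(ef: int, ep: int, nf: int) -> float:
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep)).

    ef/ep: failed/passed tests that execute the statement,
    nf: failed tests that do not execute it.
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def ranking_for_failure(failed_cov: List[int],
                        passed_covs: List[List[int]]) -> List[int]:
    """Rank statement indices for ONE failed test paired with passing tests.

    failed_cov[s] is 1 if the failed test executes statement s, else 0;
    each entry of passed_covs has the same layout for a passing test.
    """
    scores = []
    for s, hit in enumerate(failed_cov):
        ef = hit            # only one failed test in the pairing
        nf = 1 - hit
        ep = sum(cov[s] for cov in passed_covs)
        scores.append(ochiai(ef, ep, nf))
    # Most suspicious first; ties are broken by statement index.
    return sorted(range(len(scores)), key=lambda s: (-scores[s], s))


if __name__ == "__main__":
    print(ranking_for_failure([1, 1, 0], [[1, 0, 0], [1, 0, 1]]))
```

Two failed tests whose rankings are similar would then be candidates for the same cluster; the choice of distance between rankings is left open here.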

preprint, 2022, arXiv

Channel State Acquisition in FDD Massive MIMO: Rate-Distortion Bound and Effectiveness of "Analog" Feedback

We consider the problem of estimating channel fading coefficients (modeled as a correlated Gaussian vector) via Downlink (DL) training and Uplink (UL) feedback in wideband FDD massive MIMO systems. Using rate-distortion theory, we derive optimal bounds on the achievable channel state estimation error in terms of the number of training pilots in DL ($β_{tr}$) and feedback dimension in UL ($β_{fb}$), with random, spatially isotropic pilots. It is shown that when the number of training pilots exceeds the channel covariance rank ($r$), the optimal rate-distortion feedback strategy achieves an estimation error decay of $Θ(SNR^{-α})$ in estimating the channel state, where $α= min (β_{fb}/r , 1)$ is the so-called quality scaling exponent. We also discuss an "analog" feedback strategy, showing that it can achieve the optimal quality scaling exponent for a wide range of training and feedback dimensions with no channel covariance knowledge and simple signal processing at the user side. Our findings are supported by numerical simulations comparing various strategies in terms of channel state mean squared error and achievable ergodic sum-rate in DL with zero-forcing precoding.

preprint, 2022, arXiv

FDD Massive MIMO Channel Training Optimal Rate Distortion Bounds and the Efficiency of one-shot Schemes

We study the problem of providing channel state information (CSI) at the transmitter in multi-user massive MIMO systems operating in frequency division duplexing (FDD). The wideband MIMO channel is a vector-valued random process correlated in time, space (antennas), and frequency (subcarriers). The base station (BS) periodically broadcasts $β_{tr}$ pilot symbols from its $M$ antenna ports to $K$ single-antenna users (UEs). Correspondingly, the $K$ UEs send feedback messages about their channel state using $β_{fb}$ symbols in the uplink (UL). Using results from remote rate-distortion theory, we show that, as $SNR \to \infty$, the optimal feedback strategy achieves a channel state estimation mean squared error (MSE) that behaves as $Θ(1)$ if $β_{tr} < r$ and as $Θ(SNR^{-α})$ if $β_{tr} \geq r$, where $α = \min(β_{fb}/r, 1)$ and $r$ is the rank of the channel covariance matrix. The MSE-optimal rate-distortion strategy implies encoding of long sequences of channel states, which would yield completely stale CSI and therefore poor multiuser precoding performance. Hence, we consider three practical one-shot CSI strategies with minimum one-slot delay and analyze their large-SNR channel estimation MSE behavior. These are: (1) digital feedback via entropy-coded scalar quantization (ECSQ), (2) analog feedback (AF), and (3) local channel estimation at the UEs and digital feedback. These schemes have different requirements in terms of knowledge of the channel statistics at the UE and at the BS. In particular, the last strategy requires no statistical knowledge and is closely inspired by a CSI feedback scheme currently proposed in 3GPP standardization.
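
The large-SNR scaling stated in this abstract can be written as a small helper that returns the exponent of the MSE decay for given training dimension, feedback dimension, and covariance rank. This is only a direct transcription of the stated piecewise rule; the function and parameter names are chosen for this example.

```python
def mse_scaling_exponent(beta_tr: float, beta_fb: float, r: int) -> float:
    """Exponent a such that the optimal MSE behaves as Theta(SNR^(-a)).

    beta_tr: number of downlink training pilots
    beta_fb: number of uplink feedback symbols
    r:       rank of the channel covariance matrix
    """
    if r <= 0:
        raise ValueError("covariance rank r must be positive")
    if beta_tr < r:
        # Too few pilots: the MSE stays Theta(1), i.e. exponent 0.
        return 0.0
    # Enough pilots: decay is limited by the feedback dimension, capped at 1.
    return min(beta_fb / r, 1.0)


if __name__ == "__main__":
    for beta_tr, beta_fb in [(4, 3), (8, 3), (8, 12)]:
        print(beta_tr, beta_fb, mse_scaling_exponent(beta_tr, beta_fb, r=6))
```

The three example calls cover the regimes in the abstract: insufficient training, feedback-limited decay, and the full exponent of 1.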

preprint, 2022, arXiv

Many-Class Text Classification with Matching

In this work, we formulate Text Classification as a Matching problem between the text and the labels, and propose a simple yet effective framework named TCM. Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information of the classification labels, which helps distinguish each class better when the class number is large, especially in low-resource scenarios. TCM is also easy to implement and is compatible with various large pretrained language models. We evaluate TCM on 4 text classification datasets (each with 20+ labels) in both few-shot and full-data settings, and this model demonstrates significant improvements over other text classification paradigms. We also conduct extensive experiments with different variants of TCM and discuss the underlying factors of its success. Our method and analyses offer a new perspective on text classification.
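
The idea of classifying by matching a text against label semantics can be shown with a toy sketch. It replaces the pretrained language model with a bag-of-words vector and cosine similarity, so it is not the TCM model itself; the label descriptions and all function names are hypothetical and exist only to show the matching step.

```python
import math
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    """Toy stand-in for an encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text: str, label_descriptions: Dict[str, str]) -> str:
    """Return the label whose description best matches the text."""
    text_vec = embed(text)
    return max(
        label_descriptions,
        key=lambda label: cosine(text_vec, embed(label_descriptions[label])),
    )


if __name__ == "__main__":
    labels = {
        "sports": "sports game team match",
        "finance": "finance bank money market",
    }
    print(classify("the team won the match", labels))
```

Because each label is represented by its own text, adding a class only requires adding a description, which is the property that makes matching attractive when the number of classes is large.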

preprint, 2021, arXiv

Robust Kalman filter-based dynamic state estimation of natural gas pipeline networks

To obtain accurate transient states of large-scale natural gas pipeline networks in the presence of bad data and non-zero-mean noise, this paper proposes a robust Kalman filter-based dynamic state estimation method using the linearized gas pipeline transient flow equations. First, the dynamic state estimation model is built. Since there are fewer gas pipeline transient flow equations than states, the boundary conditions are used as supplementary constraints to predict the transient states. To increase the measurement redundancy, the zero mass flow rate constraints at the sink nodes are taken as virtual measurements. Second, to ensure stability in the presence of bad data, a robust Kalman filter algorithm is proposed that introduces a time-varying scalar matrix to regulate the measurement error variances according to the innovation vector at every time step. Finally, the proposed method is applied to a 30-node gas pipeline network under several kinds of measurement conditions. The simulation shows that the proposed robust dynamic state estimation can reduce the effects of bad data and achieve better estimation results.
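
The idea of regulating the measurement variance from the innovation can be sketched in one dimension. The example below assumes a scalar random-walk state and a particular inflation rule (the measurement variance is scaled up just enough that the innovation falls within `c` standard deviations); both are assumptions made for illustration and are not the paper's pipeline model or its exact scaling matrix.

```python
from typing import Tuple


def robust_kalman_step(x: float, p: float, z: float,
                       q: float, r: float, c: float = 3.0
                       ) -> Tuple[float, float, float]:
    """One predict/update step of a scalar Kalman filter with
    innovation-based inflation of the measurement variance.

    x, p: previous state estimate and its variance
    z:    new measurement
    q, r: process and nominal measurement noise variances (r > 0)
    c:    innovation gate in standard deviations (assumed value)
    Returns (new estimate, new variance, variance scale factor).
    """
    # Predict: random-walk model, so the state is carried over unchanged.
    x_pred = x
    p_pred = p + q

    # Innovation and the gate test against its nominal variance.
    nu = z - x_pred
    scale = 1.0
    if nu * nu > c * c * (p_pred + r):
        # Inflate r so that nu^2 == c^2 * (p_pred + scale * r).
        scale = (nu * nu / (c * c) - p_pred) / r

    # Update with the (possibly inflated) measurement variance.
    s = p_pred + scale * r
    k = p_pred / s
    x_new = x_pred + k * nu
    p_new = (1.0 - k) * p_pred
    return x_new, p_new, scale


if __name__ == "__main__":
    print(robust_kalman_step(10.0, 1.0, 10.5, q=0.01, r=1.0))   # ordinary sample
    print(robust_kalman_step(10.0, 1.0, 100.0, q=0.01, r=1.0))  # gross outlier
```

With an ordinary measurement the scale factor stays at 1 and the step is a standard Kalman update; with a gross outlier the inflated variance shrinks the gain so the estimate barely moves.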

preprint, 2020, arXiv

An Exploratory Study of Argumentative Writing by Young Students: A Transformer-based Approach

We present a computational exploration of argument critique writing by young students. Middle school students were asked to criticize an argument presented in the prompt, focusing on identifying and explaining the reasoning flaws. This task resembles an established college-level argument critique task. Lexical and discourse features that utilize detailed domain knowledge to identify critiques exist for the college task but do not perform well on the young students' data. Instead, a transformer-based architecture (e.g., BERT) fine-tuned on a large corpus of critique essays from the college task performs much better (over 20% improvement in F1 score). Analysis of the performance of various configurations of the system suggests that while children's writing does not exhibit the standard discourse structure of an argumentative essay, it does share basic local sequential structures with that of more mature writers.