Researcher profile

Yucheng Liu

Yucheng Liu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2026arXiv

Rethinking Model Selection in VLM Through the Lens of Gromov-Wasserstein Distance

Vision-Language Models (VLMs) have enhanced traditional LLMs with visual capabilities through the integration of vision encoders. While recent works have explored various combinations of vision encoders and LLMs, there still lacks a principled understanding of what makes a vision encoder suitable for VLM alignment. In this paper, we systematically investigate this question via comprehensive experiments on a curated collection of 19 pre-trained vision encoders from diverse sources. We first demonstrate that common practices, such as choosing encoders with the largest size or highest zero-shot accuracy, consistently fail to identify optimal models. In fact, these metrics show only weak to moderate correlation with VLM performance. This intriguing finding begs a fundamental question: What factors of vision-encoders matter in VLM? Through comprehensive analysis, we identify that the structural similarity across modalities plays a crucial but previously overlooked role in vision-encoder selection, which we measure using the Gromov-Wasserstein distance as a proxy. From a theoretical perspective, we show that the learnability of cross-modality mapping can be provably associated with the Gromov-Wasserstein distance. Empirical verification on 60+ full VLM training runs shows that our proposed inference-only metric performs significantly better than alternative model selection strategies and exhibits a much stronger correlation with final VLM performance, thereby enabling efficient and effective prediction of VLM performance before full training.

preprint2022arXiv

Information Leakage in Index Coding

We study the information leakage to a guessing adversary in index coding with a general message distribution. Under both vanishing-error and zero-error decoding assumptions, we develop lower and upper bounds on the optimal leakage rate, which are based on the broadcast rate of the subproblem induced by the set of messages the adversary tries to guess. When the messages are independent and uniformly distributed, the lower and upper bounds match, establishing an equivalence between the two rates.

preprint2022arXiv

Information Leakage in Index Coding With Sensitive and Non-Sensitive Messages

Information leakage to a guessing adversary in index coding is studied, where some messages in the system are sensitive and others are not. The non-sensitive messages can be used by the server like secret keys to mitigate leakage of the sensitive messages to the adversary. We construct a deterministic linear coding scheme, developed from the rank minimization method based on fitting matrices (Bar-Yossef et al. 2011). The linear scheme leads to a novel upper bound on the optimal information leakage rate, which is proved to be tight over all deterministic scalar linear codes. We also derive a converse result from a graph-theoretic perspective, which holds in general over all deterministic and stochastic coding schemes.

preprint2022arXiv

Some quadratic inequalities on product varieties

We prove some positivity results on the coefficients in the complexified Hilbert polynomial of a semi-stable object. After applying these results on the classical slope stability conditions, we get sequences of quadratic inequalities for semi-stable objects on product varieties. The leading quadratic inequality can be viewed as a weak version of Bogomolov's inequality holding in arbitrary characteristics.

preprint2021arXiv

A Linear Reduction Method for Local Differential Privacy and Log-lift

This paper considers the problem of publishing data $X$ while protecting correlated sensitive information $S$. We propose a linear method to generate the sanitized data $Y$ with the same alphabet $\mathcal{Y} = \mathcal{X}$ that attains local differential privacy (LDP) and log-lift at the same time. It is revealed that both LDP and log-lift are inversely proportional to the statistical distance between conditional probability $P_{Y|S}(x|s)$ and marginal probability $P_{Y}(x)$: the closer the two probabilities are, the more private $Y$ is. Specifying $P_{Y|S}(x|s)$ that linearly reduces this distance $|P_{Y|S}(x|s) - P_Y(x)| = (1-α)|P_{X|S}(x|s) - P_X(x)|,\forall s,x$ for some $α\in (0,1]$, we study the problem of how to generate $Y$ from the original data $S$ and $X$. The Markov randomization/sanitization scheme $P_{Y|X}(x|x') = P_{Y|S,X}(x|s,x')$ is obtained by solving linear equations. The optimal non-Markov sanitization, the transition probability $P_{Y|S,X}(x|s,x')$ that depends on $S$, can be determined by maximizing the data utility subject to linear equality constraints. We compute the solution for two linear utility function: the expected distance and total variance distance. It is shown that the non-Markov randomization significantly improves data utility and the marginal probability $P_X(x)$ remains the same after the linear sanitization method: $P_Y(x) = P_X(x), \forall x \in \mathcal{X}$.

preprint2021arXiv

Information Leakage in Zero-Error Source Coding: A Graph-Theoretic Perspective

We study the information leakage to a guessing adversary in zero-error source coding. The source coding problem is defined by a confusion graph capturing the distinguishability between source symbols. The information leakage is measured by the ratio of the adversary's successful guessing probability after and before eavesdropping the codeword, maximized over all possible source distributions. Such measurement under the basic adversarial model where the adversary makes a single guess and allows no distortion between its estimator and the true sequence is known as the maximum min-entropy leakage or the maximal leakage in the literature. We develop a single-letter characterization of the optimal normalized leakage under the basic adversarial model, together with an optimum-achieving scalar stochastic mapping scheme. An interesting observation is that the optimal normalized leakage is equal to the optimal compression rate with fixed-length source codes, both of which can be simultaneously achieved by some deterministic coding schemes. We then extend the leakage measurement to generalized adversarial models where the adversary makes multiple guesses and allows certain level of distortion, for which we derive single-letter lower and upper bounds.

preprint2020arXiv

Capacity Theorems for Distributed Index Coding

In index coding, a server broadcasts multiple messages to their respective receivers, each with some side information that can be utilized to reduce the amount of communication from the server. Distributed index coding is an extension of index coding in which the messages are broadcast from multiple servers, each storing different subsets of the messages. In this paper, the optimal tradeoff among the message rates and the server broadcast rates, which is defined formally as the capacity region, is studied for a general distributed index coding problem. Inner and outer bounds on the capacity region are established that have matching sum-rates for all 218 non-isomorphic four-message problems with equal link capacities for all the links from servers to receivers. The proposed inner bound is built on a distributed composite coding scheme that outperforms the existing schemes by incorporating more flexible decoding configurations and enhanced fractional rate allocations into two-stage composite coding, a scheme that was originally introduced for centralized index coding. The proposed outer bound is built on the polymatroidal axioms of entropy, as well as functional dependences such as the $\rm{fd}$-separation introduced by the multi-server nature of the problem. This outer bound utilizes general groupings of servers with different levels of granularity, which allows a natural tradeoff between computational complexity and tightness of the bound, and includes and improves upon all existing outer bounds for distributed index coding. Specific features of the proposed inner and outer bounds are demonstrated through concrete examples with four or five messages.

preprint2020arXiv

Independent User Partition Multicast Scheme for the Groupcast Index Coding Problem

The groupcast index coding (GIC) problem is a generalization of the index coding problem, where one packet can be demanded by multiple users. In this paper, we propose a new coding scheme called independent user partition multicast (IUPM) for the GIC problem. The novelty of this scheme compared to the user partition multicast (UPM) (Shanmugam \textit{et al.}, 2015) is in removing redundancies in the UPM solution by eliminating the linearly dependent coded packets. We also prove that the UPM scheme subsumes the packet partition multicast (PPM) scheme (Tehrani \textit{et al.}, 2012). Hence, the IUPM scheme is a generalization of both PPM and UPM schemes. Furthermore, inspired by jointly considering users and packets, we modify the approximation partition multicast (CAPM) scheme (Unal and Wagner, 2016) to achieve a new polynomial-time algorithm for solving the general GIC problem. We characterize a class of GIC problems with $\frac{k(k-1)}{2}$ packets, for any integer $k\geq 2$, for which the IUPM scheme is optimal. We also prove that for this class, the broadcast rate of the proposed new heuristic algorithm is $k$, while the broadcast rate of the CAPM scheme is $\mathcal{O}(k^2)$.

preprint2020arXiv

Privacy-Utility Tradeoff in a Guessing Framework Inspired by Index Coding

This paper studies the tradeoff in privacy and utility in a single-trial multi-terminal guessing (estimation) framework using a system model that is inspired by index coding. There are $n$ independent discrete sources at a data curator. There are $m$ legitimate users and one adversary, each with some side information about the sources. The data curator broadcasts a distorted function of sources to legitimate users, which is also overheard by the adversary. In terms of utility, each legitimate user wishes to perfectly reconstruct some of the unknown sources and attain a certain gain in the estimation correctness for the remaining unknown sources. In terms of privacy, the data curator wishes to minimize the maximal leakage: the worst-case guessing gain of the adversary in estimating any target function of its unknown sources after receiving the broadcast data. Given the system settings, we derive fundamental performance lower bounds on the maximal leakage to the adversary, which are inspired by the notion of confusion graph and performance bounds for the index coding problem. We also detail a greedy privacy enhancing mechanism, which is inspired by the agglomerative clustering algorithms in the information bottleneck and privacy funnel problems.

preprint2020arXiv

Secure Index Coding with Security Constraints on Receivers

Index coding is concerned with efficient broadcast of a set of messages to receivers in the presence of receiver side information. In this paper, we study the secure index coding problem with security constraints on the receivers themselves. That is, for each receiver there is a single legitimate message it needs to decode and a prohibited message list, none of which should be decoded by that receiver. To this end, our contributions are threefold. We first introduce a secure linear coding scheme, which is an extended version of the fractional local partial clique covering scheme that was originally devised for non-secure index coding. We then develop two information-theoretic bounds on the performance of any valid secure index code, namely secure polymatroidal outer bound (on the capacity region) and secure maximum acyclic induced subgraph lower bound (on the broadcast rate). The structure of these bounds leads us to further develop two necessary conditions for a given index coding problem to be securely feasible (i.e., to have nonzero rates).

preprint2020arXiv

Stability conditions on product varieties

Given a stability condition on a smooth projective variety $X$, we construct a family of stability conditions on $X\times C$, where $C$ is a smooth projective curve. In particular, this gives the existence of stability conditions on arbitrary products of curves. The proof uses, by following an idea of Toda, the positivity lemma established by Bayer and Macrì and weak stability conditions on the Abramovich-Polishchuk heart of a bounded t-structure in $D(X\times C)$.

preprint2019arXiv

Comparative studies of optoelectrical properties of prominent PV materials: Halide Perovskite, CdTe, and GaAs

We compare three representative high performance PV materials: halide perovskite MAPbI3, CdTe, and GaAs, in terms of photoluminescence (PL) efficiency, PL lineshape, carrier diffusion, and surface recombination, over multiple orders of photo-excitation density. An analytic model is used to describe the excitation density dependence of PL intensity and extract the internal PL efficiency and multiple pertinent recombination parameters. A PL imaging technique is used to obtain carrier diffusion length without using a PL quencher, thus, free of unintended influence beyond pure diffusion. Our results show that perovskite samples tend to exhibit lower Shockley-Read-Hall (SRH) recombination rate in both bulk and surface, thus higher PL efficiency than the inorganic counterparts, particularly under low excitation density, even with no or preliminary surface passivation. PL lineshape and diffusion analysis indicate that there is considerable structural disordering in the perovskite materials, and thus photo-generated carriers are not in global thermal equilibrium, which in turn suppresses the nonradiative recombination. This study suggests that relatively low point-defect density, less detrimental surface recombination, and moderate structural disordering contribute to the high PV efficiency in the perovskite. This comparative photovoltaics study provides more insights into the fundamental material science and the search for optimal device designs by learning from different technologies.