Researcher profile

Yuqing Zhang

Yuqing Zhang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
7 works
0 followers
6 topics
4 close collaborators

Published work

7 published items

preprint · 2026 · arXiv

Modeling Human-Like Color Naming Behavior in Context

Modeling the emergence of human-like lexicons in computational systems has advanced through the use of interacting neural agents, which simulate both learning and communicative pressures. The NeLLCom-Lex framework (Zhang et al., 2025) allows neural agents to develop pragmatic color naming behavior and human-like lexicons through supervised learning (SL) from human data and reinforcement learning (RL) in referential games. Despite these successes, the lexicons that emerge diverge systematically from human color categories, producing highly non-convex regions in color space, which contrast with the convexity typical of human categories. To address this, we introduce two factors, upsampling rare color terms during SL and multi-listener RL interactions, and adopt a convexity measure to quantify geometric coherence. We find that upsampling improves lexical diversity and system-level informativeness of the color lexicon, while many-listener setups promote more convex color categories. The combination of moderate upsampling and multiple listeners produces lexicons most similar to human systems.
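The abstract does not define its convexity measure. As a rough illustration only, the sketch below scores a color lexicon by sampling points on the segment between every pair of chips that share a term and checking whether the nearest chip still carries that term (1.0 means no "holes" between same-term chips). The functions `segment_convexity` and `nearest_label` and the sampling scheme are hypothetical and are not the measure used in the paper; chips could be, e.g., CIELAB triples, but any coordinate tuples work.

```python
import math
from itertools import combinations


def nearest_label(point, chips, labels):
    """Term of the chip closest to `point` (Euclidean distance; first chip wins ties)."""
    best = min(range(len(chips)), key=lambda i: math.dist(point, chips[i]))
    return labels[best]


def segment_convexity(chips, labels, samples=5):
    """Hypothetical convexity score: fraction of interpolated points between
    same-term chips whose nearest chip carries that same term."""
    hits = total = 0
    for i, j in combinations(range(len(chips)), 2):
        if labels[i] != labels[j]:
            continue  # only segments inside one category are checked
        for t in range(1, samples):
            frac = t / samples
            p = tuple(a + frac * (b - a) for a, b in zip(chips[i], chips[j]))
            total += 1
            hits += nearest_label(p, chips, labels) == labels[i]
    return hits / total if total else 1.0


# Toy 1-D "color space": a contiguous lexicon vs. one with a split term.
print(segment_convexity([(0.0,), (1.0,), (2.0,), (3.0,)], ["A", "A", "B", "B"]))  # 1.0
print(segment_convexity([(0.0,), (1.0,), (2.0,)], ["A", "B", "A"]))  # 0.5
```

A split category (term A on both sides of term B) drops below 1.0, which is the kind of non-convexity the abstract reports for emergent lexicons.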

preprint · 2022 · arXiv

CSL: A Large-scale Chinese Scientific Literature Dataset

Scientific literature serves as a high-quality corpus that supports a wide range of Natural Language Processing (NLP) research. However, existing datasets are centered on English, which restricts the development of Chinese scientific NLP. In this work, we present CSL, a large-scale Chinese Scientific Literature dataset containing the titles, abstracts, keywords and academic fields of 396k papers. To our knowledge, CSL is the first scientific document dataset in Chinese. CSL can serve as a Chinese corpus, and its semi-structured fields provide natural annotations from which many supervised NLP tasks can be constructed. Based on CSL, we present a benchmark to evaluate model performance on scientific-domain tasks, namely summarization, keyword generation and text classification. We analyze the behavior of existing text-to-text models on these tasks and reveal the challenges of Chinese scientific NLP, providing a valuable reference for future research. Data and code are available at https://github.com/ydli-ai/CSL
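One way to read "semi-structured data as natural annotation" is that the fields of each record pair up directly into input/target examples for the three benchmark tasks. The sketch below assumes a hypothetical `Paper` record holding the four fields named in the abstract; the abstract-to-title pairing for summarization and the `"; "` keyword separator are assumptions, not taken from the CSL repository.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Paper:
    """Hypothetical CSL-style record with the four fields listed in the abstract."""
    title: str
    abstract: str
    keywords: List[str]
    field: str


def to_tasks(paper: Paper) -> Dict[str, Tuple[str, str]]:
    """Derive (input, target) pairs for each task from one record's own fields."""
    return {
        # assumption: summarization framed as abstract -> title
        "summarization": (paper.abstract, paper.title),
        # keyword generation: abstract -> joined keyword string
        "keyword_generation": (paper.abstract, "; ".join(paper.keywords)),
        # text classification: title + abstract -> academic field label
        "classification": (paper.title + " " + paper.abstract, paper.field),
    }


example = Paper("T", "A", ["k1", "k2"], "CS")
print(to_tasks(example)["keyword_generation"])  # ('A', 'k1; k2')
```

No manual labeling is needed: every target string already exists in the record, which is why the dataset can back several supervised tasks at once.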

preprint · 2022 · arXiv

FishFuzz: Throwing Larger Nets to Catch Deeper Bugs

Greybox fuzzing is the de facto standard for discovering bugs during development. Fuzzers execute many inputs to maximize the amount of reached code. Recently, Directed Greybox Fuzzers (DGFs) have proposed an alternative strategy that goes beyond "just" coverage: driving testing toward specific code targets by selecting "closer" seeds. DGFs go through different phases: exploration (i.e., reaching interesting locations) and exploitation (i.e., triggering bugs). In practice, DGFs leverage coverage to directly measure exploration, while exploitation is, at best, measured indirectly by alternating between different targets. Specifically, we observe two limitations in existing DGFs: (i) they lack precision in their distance metric, i.e., they average multiple paths and targets into a single score (to decide which seeds to prioritize), and (ii) they assign energy to seeds in a round-robin fashion without adjusting the priority of the targets (exhaustively explored targets should be dropped). We propose FishFuzz, which draws inspiration from trawl fishing: first casting a wide net, scraping for high coverage, then slowly pulling it in to maximize the harvest. The core of our fuzzer is a novel seed selection strategy that builds on two concepts: (i) a novel multi-distance metric whose precision is independent of the number of targets, and (ii) a dynamic target ranking that automatically discards exhausted targets. This strategy allows FishFuzz to seamlessly scale to tens of thousands of targets and dynamically alternate between exploration and exploitation phases. We evaluate FishFuzz by leveraging all sanitizer labels as targets. An extensive comparison against modern DGFs and coverage-guided fuzzers shows that FishFuzz reaches higher coverage than its direct competitors, reproduces existing bugs 70.2% faster, and discovers 25 new bugs (18 CVEs) in 44 programs.
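The precision problem with a single averaged score can be seen on two seeds and two targets. The sketch below contrasts averaging with a simplified per-target rule: for each live target keep the closest seed, and skip targets already marked exhausted. This is an illustrative stand-in, not FishFuzz's actual multi-distance metric or target ranking, which the abstract does not specify; `averaged_score`, `select_seeds`, and the seed/target names are hypothetical.

```python
from typing import Dict, Set


def averaged_score(row: Dict[str, float]) -> float:
    """Baseline: collapse one seed's distances to all targets into their mean."""
    return sum(row.values()) / len(row)


def select_seeds(distances: Dict[str, Dict[str, float]], exhausted: Set[str]) -> Set[str]:
    """Per-target selection: for each live target keep the closest seed,
    so a seed's closeness to one target is not diluted by the others."""
    best = {}  # target -> (distance, seed)
    for seed, row in distances.items():
        for target, d in row.items():
            if target in exhausted:
                continue  # exhausted targets are dropped from the ranking
            if target not in best or d < best[target][0]:
                best[target] = (d, seed)
    return {seed for _, seed in best.values()}


# s1 is very close to T1 but far from T2; s2 is mediocre for both.
distances = {"s1": {"T1": 1, "T2": 100}, "s2": {"T1": 40, "T2": 40}}
print(min(distances, key=lambda s: averaged_score(distances[s])))  # s2
print(sorted(select_seeds(distances, set())))  # ['s1', 's2']
print(sorted(select_seeds(distances, {"T1"})))  # ['s2']
```

Averaging ranks s2 first (40 vs. 50.5) even though s1 is by far the best seed for T1; keeping one winner per target preserves s1 until T1 is exhausted.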

preprint · 2022 · arXiv

PPA: Preference Profiling Attack Against Federated Learning

Federated learning (FL) trains a global model across a number of decentralized users, each with a local dataset. Compared to traditional centralized learning, FL does not require direct access to local datasets and thus aims to mitigate data privacy concerns. However, privacy leakage in FL still occurs through inference attacks, including membership inference, property inference, and data inversion. In this work, we propose a new type of privacy inference attack, coined Preference Profiling Attack (PPA), that accurately profiles the private preferences of a local user, e.g., the most liked (disliked) items from the client's online shopping and the most common expressions from the user's selfies. In general, PPA can profile the top-k preferences (k = 1, 2, 3, with k = 1 as the primary case), contingent on the local client's (user's) characteristics. Our key insight is that the gradient variation of a local user's model has a distinguishable sensitivity to the sample proportion of a given class, especially the majority (minority) class. By observing a user model's gradient sensitivity to a class, PPA can profile the sample proportion of that class in the user's local dataset, thereby exposing the user's preference for the class. The inherent statistical heterogeneity of FL further facilitates PPA. We have extensively evaluated PPA's effectiveness on four datasets (MNIST, CIFAR10, RAF-DB and Products-10K). Our results show that PPA achieves 90% and 98% top-1 attack accuracy on MNIST and CIFAR10, respectively. More importantly, in real-world commercial scenarios of shopping (Products-10K) and social networks (RAF-DB), PPA achieves a top-1 attack accuracy of 78% in the former case to infer the most ordered items (i.e., as a commercial competitor), and 88% in the latter case to infer a victim user's most frequent facial expression, e.g., disgusted.
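A simplified analytic case shows why gradients can reveal class proportions. For softmax cross-entropy, the mean gradient with respect to the output bias of class c is mean(p_c) minus the fraction of samples labeled c; if the model's predictions are uniform (p_c = 1/K), the client's proportions can be read straight off that gradient. This is only an illustration of the key insight under those assumptions, not PPA itself, which measures gradient sensitivity on real models; the function names and the toy client are hypothetical.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def bias_gradient(logits: List[List[float]], labels: List[int], k: int) -> List[float]:
    """Mean cross-entropy gradient w.r.t. the output bias: mean(p_c - 1[y == c])."""
    grad = [0.0] * k
    for z, y in zip(logits, labels):
        p = softmax(z)
        for c in range(k):
            grad[c] += p[c] - (1.0 if c == y else 0.0)
    return [g / len(labels) for g in grad]


def infer_proportions(grad: List[float], k: int) -> List[float]:
    """Assumes uniform predictions (p_c = 1/K), so proportion_c = 1/K - grad_c."""
    return [1.0 / k - g for g in grad]


# Hypothetical client: 6 samples of class 0, 2 of class 1, 2 of class 2.
labels = [0] * 6 + [1] * 2 + [2] * 2
logits = [[0.0, 0.0, 0.0] for _ in labels]  # uniform model output
props = infer_proportions(bias_gradient(logits, labels, 3), 3)
print([round(p, 6) for p in props])  # [0.6, 0.2, 0.2]
print(max(range(3), key=props.__getitem__))  # 0 (top-1 "preferred" class)
```

With non-uniform predictions the relation is no longer this direct, which is where an attack has to rely on how the gradient varies with class proportion rather than on a closed form.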

preprint · 2022 · arXiv

Several classes of optimal $p$-ary cyclic codes with minimal distance four

Cyclic codes are a subclass of linear codes and have wide applications in data storage systems, communication systems and consumer electronics due to their efficient encoding and decoding algorithms. Let $p\ge 5$ be an odd prime and $m$ be a positive integer. Let $\mathcal{C}_{(1,e,s)}$ denote the $p$-ary cyclic code with three nonzeros $\alpha$, $\alpha^e$, and $\alpha^s$, where $\alpha$ is a generator of ${\mathbb F}_{p^m}^*$, $s=\frac{p^m-1}{2}$, and $2\le e\le p^m-2$. In this paper, we present four classes of optimal $p$-ary cyclic codes $\mathcal{C}_{(1,e,s)}$ with parameters $[p^m-1,p^m-2m-2,4]$ by analyzing the solutions of certain polynomials over finite fields. Some previous results about optimal quinary cyclic codes with parameters $[5^m-1,5^m-2m-2,4]$ are special cases of our constructions. In addition, by analyzing the irreducible factors of certain polynomials over ${\mathbb F}_{5^m}$, we present two classes of optimal quinary cyclic codes $\mathcal{C}_{(1,e,s)}$.
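The dimension $p^m-2m-2$ in the stated parameters can be checked by counting degrees of minimal polynomials, assuming (this is not spelled out in the abstract) that $\alpha$, $\alpha^e$, $\alpha^s$ are read as the roots of the generator polynomial $g(x)$, that their minimal polynomials $m_{\alpha}$, $m_{\alpha^e}$, $m_{\alpha^s}$ over $\mathbb{F}_p$ are distinct, and that $m_{\alpha^e}$ has full degree $m$. Since $\alpha$ has order $p^m-1$, the element $\alpha^s$ squares to $1$ without being $1$, so it equals $-1$:

```latex
\[
\alpha^{s} = \alpha^{\frac{p^m-1}{2}} = -1
\;\Longrightarrow\; m_{\alpha^{s}}(x) = x + 1,
\qquad \deg m_{\alpha}(x) = \deg m_{\alpha^{e}}(x) = m,
\]
\[
k = n - \deg g(x) = (p^m - 1) - (m + m + 1) = p^m - 2m - 2 .
\]
```

For instance, $p = 5$ and $m = 2$ give length $n = 24$ and dimension $k = 19$.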

preprint · 2021 · arXiv

From Machine Learning to Transfer Learning in Laser-Induced Breakdown Spectroscopy: the Case of Rock Analysis for Mars Exploration

With the ChemCam instrument, laser-induced breakdown spectroscopy (LIBS) has successfully contributed to Mars exploration by determining the elemental compositions of the soil, crust and rocks. Two newly launched missions, the Chinese Tianwen-1 and the American Perseverance, will further increase the number of LIBS instruments on Mars after their planned landings in spring 2021. This unprecedented situation calls for a reinforced research effort on methods of LIBS spectral data treatment. Although matrix effects are a general issue in LIBS, they become accentuated in rock analysis for Mars exploration, for two reasons: the large variation in rock composition, which leads to the chemical matrix effect, and the difference in morphology between the laboratory standard samples (pressed pellets, glass or ceramics) used to establish calibration models and the natural rocks encountered on Mars, which leads to the physical matrix effect. The chemical matrix effect has been tackled in the ChemCam project with large sets of laboratory standard samples that offer a good representation of the various compositions of Mars rocks. The present work deals with the physical matrix effect, which still lacks a satisfactory solution. Our approach introduces transfer learning into LIBS data treatment. For the specific case of total alkali-silica (TAS) classification of natural rocks, the results show a significant improvement in the prediction capacity of pellet-based models when they are trained together with suitable information from rocks in a transfer learning procedure. The correct classification rate for rocks increases from 33.3% with a machine learning model to 83.3% with a transfer learning model.

preprint · 2020 · arXiv

Logic Bugs in IoT Platforms and Systems: A Review

In recent years, IoT platforms and systems have been emerging rapidly. Although IoT is a new technology, new does not mean simpler than existing networked systems. On the contrary, the complexity of IoT platforms and systems is increasing because of the interactions between the physical world and cyberspace, and this increased complexity results in new vulnerabilities. This paper reviews recently discovered logic bugs that are specific to IoT platforms and systems. In particular, 17 logic bugs and one weakness, falling into seven categories of vulnerabilities, are reviewed in this survey.