Researcher profile

Yulin Zhang

Yulin Zhang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified
Verification L1
Unclaimed author
5 works
0 followers
5 topics
4 close collaborators

Published work

5 published item(s)

Preprint, 2026, arXiv

ProteinOPD: Towards Effective and Efficient Preference Alignment for Protein Design

Designing proteins with desired functions or properties represents a core goal in synthetic biology and drug discovery. Recent advances in protein language models (PLMs) have enabled the generation of highly designable protein sequences, while preference alignment provides a promising way to steer designs toward desired functions and properties. Nevertheless, they often trigger catastrophic forgetting of pretrained knowledge, degrading basic designability and failing to balance multiple competing objectives. To address these issues, we draw inspiration from On-Policy Distillation (OPD), an advanced post-training method renowned for mitigating catastrophic forgetting through its mode-seeking nature. In this work, we propose ProteinOPD, a multi-objective preference alignment framework that can effectively balance multiple preference objectives while maintaining the inherent designability of PLMs. ProteinOPD adapts a pretrained PLM into preference-specific teachers and distills their knowledge into a shared student via token-level OPD on the student's own trajectories. During this process, the student is aligned to a unique normalized geometric consensus of weighted teachers while ensuring bounded optimization under conflicts. This bridges the gap for OPD in multi-objective/teacher alignment. Extensive experiments show that ProteinOPD achieves substantial gains on target preference objectives without compromising the designability, with an 8x training speedup over RL-based alignment competitors.
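One way to read the "normalized geometric consensus of weighted teachers" mentioned in this abstract is as a weighted geometric mean of the teachers' next-token distributions, renormalized to sum to one. The sketch below illustrates that reading only; the function name `geometric_consensus`, the toy distributions, and the weights are assumptions for illustration and are not taken from the paper.

```python
import math


def geometric_consensus(teacher_probs, weights):
    """Normalized weighted geometric mean of teacher token distributions.

    teacher_probs: list of probability lists, one per teacher (hypothetical input).
    weights: one non-negative weight per teacher.
    """
    if len(teacher_probs) != len(weights):
        raise ValueError("one weight per teacher is required")
    vocab_size = len(teacher_probs[0])
    # Work in log space: score(token) = sum_k w_k * log p_k(token).
    log_scores = []
    for token in range(vocab_size):
        score = 0.0
        for probs, w in zip(teacher_probs, weights):
            # Clamp to avoid log(0) for tokens a teacher rules out entirely.
            score += w * math.log(max(probs[token], 1e-12))
        log_scores.append(score)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Two toy teachers that disagree about the first and last token.
teachers = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
consensus = geometric_consensus(teachers, [0.5, 0.5])
```

With equal weights the two conflicting teachers yield a symmetric consensus, and a token that either teacher considers very unlikely is suppressed more strongly than it would be under an arithmetic average.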

Preprint, 2022, arXiv

Cover Combinatorial Filters and their Minimization Problem (Extended Version)

Recent research has examined algorithms to minimize robots' resource footprints. The class of combinatorial filters (discrete variants of widely-used probabilistic estimators) has been studied and methods for reducing their space requirements introduced. This paper extends existing combinatorial filters by introducing a natural generalization that we dub cover combinatorial filters. In addressing the new -- but still NP-complete -- problem of minimization of cover filters, this paper shows that multiple concepts previously believed to be true about combinatorial filters (and actually conjectured, claimed, or assumed to be) are in fact false. For instance, minimization does not induce an equivalence relation. We give an exact algorithm for the cover filter minimization problem. Unlike prior work (based on graph coloring) we consider a type of clique-cover problem, involving a new conditional constraint, from which we can find more general relations. In addition to solving the more general problem, the algorithm also corrects flaws present in all prior filter reduction methods. In employing SAT, the algorithm provides a promising basis for future practical development.

Preprint, 2022, arXiv

Nondeterminism subject to output commitment in combinatorial filters

We study a class of filters -- discrete finite-state transition systems employed as incremental stream transducers -- that have application to robotics: e.g., to model combinatorial estimators and also as concise encodings of feedback plans/policies. The present paper examines their minimization problem under some new assumptions. Compared to strictly deterministic filters, allowing nondeterminism supplies opportunities for compression via re-use of states. But this paper suggests that the classic automata-theoretic concept of nondeterminism, though it affords said opportunities for reduction in state complexity, is problematic in many robotics settings. Instead, we argue for a new constrained type of nondeterminism that preserves input-output behavior for circumstances when, as for robots, causation forbids 'rewinding' of the world. We identify problem instances where compression under this constrained form of nondeterminism results in improvements over all deterministic filters. In this new setting, we examine computational complexity questions for the problem of reducing the state complexity of some given input filter. A hardness result for general deterministic input filters is presented, as well as for checking specific, narrower requirements, and some special cases. These results show that this class of nondeterminism gives problems of the same complexity class as classical nondeterminism, and the narrower questions help give a more nuanced understanding of the source of this complexity.
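As a rough illustration of a filter in the sense used in this abstract (a finite-state transition system that consumes a stream of observations and emits an output at each step), here is a minimal deterministic sketch. The class name `Filter`, its fields, and the doorway example are hypothetical; the constrained nondeterminism the paper argues for is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    """Deterministic filter: states, observation-labeled transitions, outputs."""

    initial: str
    transitions: dict  # (state, observation) -> next state
    outputs: dict      # state -> output emitted while in that state

    def run(self, observations):
        """Process observations one at a time, emitting an output per step."""
        state = self.initial
        emitted = [self.outputs[state]]
        for obs in observations:
            state = self.transitions[(state, obs)]
            emitted.append(self.outputs[state])
        return emitted


# Toy example: track whether a robot is inside a room from doorway events.
door_filter = Filter(
    initial="inside",
    transitions={
        ("inside", "cross"): "outside",
        ("inside", "stay"): "inside",
        ("outside", "cross"): "inside",
        ("outside", "stay"): "outside",
    },
    outputs={"inside": "in", "outside": "out"},
)
trace = door_filter.run(["cross", "stay", "cross"])
```

Minimization, in this picture, asks for a filter with fewer states that produces the same outputs on every observation sequence the original can process.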

Preprint, 2021, arXiv

A new parsimonious method for classifying Cancer Tissue-of-Origin Based on DNA Methylation 450K data

DNA methylation is a well-studied genetic modification that regulates gene transcription in eukaryotes. Its alterations have been recognized as a significant component of cancer development. In this study, we use the DNA methylation 450k data from The Cancer Genome Atlas to evaluate the efficacy of DNA methylation data on cancer classification for 30 cancer types. We propose a new method for gene selection in high-dimensional data (over 450 thousand features). Variance filtering is first introduced for dimension reduction, and recursive feature elimination (RFE) is then used for feature selection. We address the problem of selecting a small subset of genes from a large number of methylated sites, and our parsimonious model is demonstrated to be efficient, achieving an accuracy over 91% and outperforming other studies which use DNA microarrays and RNA-seq data. The performance of 20 models, which are based on 4 estimators (Random Forest, Decision Tree, Extra Tree and Support Vector Machine) and 5 classifiers (k-Nearest Neighbours, Support Vector Machine, XGBoost, LightGBM and Multi-Layer Perceptron), is compared, and the robustness of the RFE algorithm is examined. Results suggest that the combined model of Extra Tree plus CatBoost classifier offers the best performance in cancer identification, with an overall validation accuracy of 91%, 92.3%, 93.3% and 93.5% for 20, 30, 40 and 50 features respectively. The biological functions in cancer development of the 50 selected genes are also explored through enrichment analysis, and the results show that 12 out of 16 of our top features have already been identified as cancer-specific; we also propose some more genes to be tested in future studies. Therefore, our method may be utilized as an auxiliary diagnostic method to determine the actual clinicopathological status of a specific cancer.
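The two-stage selection described in this abstract (variance filtering for dimension reduction, then recursive feature elimination) can be sketched in a few lines. This is a toy illustration only: the data are made up, and the `class_separation` score is a hypothetical stand-in for the feature importances that a fitted estimator such as Extra Trees would supply in the paper's setting.

```python
from statistics import mean, pvariance


def variance_filter(rows, keep):
    """Return the indices of the `keep` highest-variance columns."""
    n_features = len(rows[0])
    variances = [pvariance([r[j] for r in rows]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:keep])


def class_separation(rows, labels, feature):
    """Toy importance score (assumption): spread of per-class means."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row[feature])
    means = [mean(values) for values in by_class.values()]
    return max(means) - min(means)


def recursive_feature_elimination(rows, labels, features, n_select):
    """Drop the lowest-scoring feature each round until n_select remain."""
    remaining = list(features)
    while len(remaining) > n_select:
        scores = {f: class_separation(rows, labels, f) for f in remaining}
        weakest = min(remaining, key=lambda f: scores[f])
        remaining.remove(weakest)
    return remaining


# Made-up methylation-like values: 4 samples, 4 sites, 2 classes.
rows = [
    [0.10, 0.50, 0.90, 0.20],
    [0.12, 0.50, 0.85, 0.80],
    [0.90, 0.50, 0.15, 0.25],
    [0.88, 0.50, 0.10, 0.75],
]
labels = ["A", "A", "B", "B"]

candidates = variance_filter(rows, keep=3)  # drops the constant column
selected = recursive_feature_elimination(rows, labels, candidates, n_select=2)
```

Because this toy score is computed per feature, the ranking does not change between rounds; in RFE proper the estimator is refit on the remaining features after each elimination, so importances can shift as features are removed.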

Preprint, 2020, arXiv

Abstractions for computing all robotic sensors that suffice to solve a planning problem

Whether a robot can perform some specific task depends on several aspects, including the robot's sensors and the plans it possesses. We are interested in search algorithms that treat plans and sensor designs jointly, yielding solutions (i.e., plan and sensor characterization pairs) if and only if they exist. Such algorithms can help roboticists explore the space of sensors to aid in making design trade-offs. Generalizing prior work where sensors are modeled abstractly as sensor maps on p-graphs, the present paper significantly increases the range of sensors that can be sought. But doing so enlarges a problem currently on the outer limits of being considered tractable. Toward taming this complexity, two contributions are made: (1) we show how to represent the search space for this more general problem and describe data structures that enable whole sets of sensors to be summarized via a single special representative; (2) we give a means by which other structure (either task domain knowledge, sensor technology or fabrication constraints) can be incorporated to reduce the sets to be enumerated. These lead to algorithms that we have implemented and which suffice to solve particular problem instances, albeit only of small scale. Nevertheless, the algorithm helps in understanding what attributes sensors must possess and what information they must provide in order to ensure a robot can achieve its goals despite non-determinism.