Source author record

Xun Xu

Xun Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, cond-mat.mtrl-sci, cond-mat.mes-hall, Artificial Intelligence, cond-mat.supr-con, eess.IV, Genomics, Machine Learning, quant-ph

Catalog footprint

What is connected

20 works

9 topics

4 close collaborators


preprint, 2026, arXiv

The Reward Model Selection Crisis in Personalized Alignment

Personalized alignment from preference data has focused primarily on improving personal reward model (RM) accuracy, with the implicit assumption that better preference ranking translates to better personalized behavior. However, in deployment, computational constraints necessitate inference-time adaptation such as reward-guided decoding (RGD) rather than per-user policy fine-tuning. This creates a critical but overlooked requirement: reward models must not only rank preferences accurately but also effectively guide generation. We demonstrate that standard RM accuracy fails catastrophically as a selection criterion for deployment-ready personalized rewards. We introduce policy accuracy, a metric quantifying whether RGD-adapted LLMs correctly discriminate between preferred and dispreferred responses, and show that upstream RM accuracy correlates only weakly with downstream policy accuracy (Kendall's tau = 0.08--0.31). More critically, we introduce Pref-LaMP, the first personalized alignment benchmark with ground-truth user completions, enabling direct behavioral evaluation. On Pref-LaMP, we expose a complete decoupling between discriminative ranking and generation metrics: methods with 20-point RM accuracy differences produce almost identical output quality, and methods with high ranking accuracy can fail to generate behaviorally aligned responses. These findings reveal that the field has been optimizing for proxy metrics that do not predict deployment performance, and that current personalized alignment methods fail to operationalize preferences into behavioral adaptation under realistic deployment constraints. In contrast, we find simple in-context learning (ICL) to be highly effective, dominating all reward-guided methods for models $\geq$3B parameters and achieving $\sim$3 point ROUGE-1 gains over the best reward method at 7B scale.
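As an illustration of the rank-correlation check mentioned in the abstract above, the sketch below computes Kendall's tau between two lists of scores. The abstract does not say which tau variant was used, so the tau-a form here is an assumption, and the accuracy values are made-up placeholders, not results from the paper.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # ties (sign == 0) count as neither concordant nor discordant
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for five reward models (illustrative only).
rm_accuracy = [0.62, 0.70, 0.75, 0.81, 0.86]
policy_accuracy = [0.55, 0.52, 0.58, 0.54, 0.57]

tau = kendall_tau(rm_accuracy, policy_accuracy)
print(f"Kendall's tau = {tau:.2f}")
```

A tau near 1 would mean that ranking reward models by RM accuracy also ranks them by policy accuracy; values near 0 mean the first ranking says little about the second.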

preprint, 2022, arXiv

Revisiting Pretraining for Semi-Supervised Learning in the Low-Label Regime

Semi-supervised learning (SSL) addresses the lack of labeled data by exploiting large unlabeled data through pseudo-labeling. However, in the extremely low-label regime, pseudo labels can be incorrect, a problem known as confirmation bias, and in turn harm network training. Recent studies combined finetuning (FT) from pretrained weights with SSL to mitigate these challenges and claimed superior results in the low-label regime. In this work, we first show that the better pretrained weights brought in by FT account for the state-of-the-art performance, and importantly that they are universally helpful to off-the-shelf semi-supervised learners. We further argue that direct finetuning from pretrained weights is suboptimal due to covariate shift and propose a contrastive target pretraining step to adapt model weights towards the target dataset. We carried out extensive experiments on both classification and segmentation tasks by performing target pretraining followed by semi-supervised finetuning. The promising results validate the efficacy of target pretraining for SSL, in particular in the low-label regime.

preprint, 2022, arXiv

SemiCurv: Semi-Supervised Curvilinear Structure Segmentation

Recent work on curvilinear structure segmentation has mostly focused on backbone network design and loss engineering. The challenge of collecting labelled data, an expensive and labor-intensive process, has been overlooked. While labelled data is expensive to obtain, unlabelled data is often readily available. In this work, we propose SemiCurv, a semi-supervised learning (SSL) framework for curvilinear structure segmentation that is able to utilize such unlabelled data to reduce the labelling burden. Our framework addresses two key challenges in formulating curvilinear segmentation in a semi-supervised manner. First, to fully exploit the power of consistency-based SSL, we introduce a geometric transformation as strong data augmentation and then align segmentation predictions via a differentiable inverse transformation to enable the computation of pixel-wise consistency. Second, the traditional mean square error (MSE) on unlabelled data is prone to collapsed predictions, and this issue is exacerbated by severe class imbalance (significantly more background pixels). We propose an N-pair consistency loss to avoid trivial predictions on unlabelled data. We evaluate SemiCurv on six curvilinear segmentation datasets, and find that with no more than 5% of the labelled data, it achieves close to 95% of the performance of its fully supervised counterpart.

preprint, 2022, arXiv

Weakly Supervised 3D Point Cloud Segmentation via Multi-Prototype Learning

Addressing the annotation challenge in 3D point cloud segmentation has inspired research into weakly supervised learning. Existing approaches mainly focus on exploiting manifold and pseudo-labeling to make use of large unlabeled data points. A fundamental challenge here lies in the large intra-class variations of local geometric structure, resulting in subclasses within a semantic class. In this work, we leverage this intuition and opt for maintaining an individual classifier for each subclass. Technically, we design a multi-prototype classifier in which each prototype serves as the classifier weights for one subclass. To enable effective updating of multi-prototype classifier weights, we propose two constraints, one for updating the prototypes w.r.t. all point features and one for encouraging the learning of diverse prototypes. Experiments on weakly supervised 3D point cloud segmentation tasks validate the efficacy of the proposed method, in particular in the low-label regime. Our hypothesis is also verified given the consistent discovery of semantic subclasses at no cost of additional annotations.

preprint, 2021, arXiv

Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images

An accurate and automated tissue segmentation algorithm for retinal optical coherence tomography (OCT) images is crucial for the diagnosis of glaucoma. However, due to the presence of the optic disc, the anatomical structure of the peripapillary region of the retina is complicated and challenging to segment. To address this issue, we developed a novel graph convolutional network (GCN)-assisted two-stage framework to simultaneously label the nine retinal layers and the optic disc. Specifically, a multi-scale global reasoning module is inserted between the encoder and decoder of a U-shape neural network to exploit anatomical prior knowledge and perform spatial reasoning. We conducted experiments on human peripapillary retinal OCT images. The Dice score of the proposed segmentation network is 0.820$\pm$0.001 and the pixel accuracy is 0.830$\pm$0.002, both of which outperform those from other state-of-the-art techniques.
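For readers unfamiliar with the two metrics reported above, here is a minimal sketch of how a per-label Dice score and pixel accuracy are commonly computed on label maps. The abstract does not state how the paper averages over labels or images, so the per-label form, the empty-label convention, and the toy label values are all assumptions for illustration.

```python
def dice_score(pred, target, label):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for one label over flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    pred_count = sum(1 for p in pred if p == label)
    target_count = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    if pred_count + target_count == 0:
        return 1.0  # assumed convention: label absent in both maps counts as perfect
    return 2.0 * overlap / (pred_count + target_count)


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the target label."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    return sum(1 for p, t in zip(pred, target) if p == t) / len(pred)


# Toy flattened label maps (illustrative only; 0 = background, 1 and 2 = tissue labels).
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]

dice_label_2 = dice_score(pred, target, label=2)
accuracy = pixel_accuracy(pred, target)
print(dice_label_2, accuracy)
```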

preprint, 2021, arXiv

Sequential Learning on Liver Tumor Boundary Semantics and Prognostic Biomarker Mining

The boundary of tumors (hepatocellular carcinoma, or HCC) contains rich semantics: capsular invasion, visibility, smoothness, folding and protuberance, etc. Capsular invasion on the tumor boundary has proven to be clinically correlated with the prognostic indicator microvascular invasion (MVI). Investigating tumor boundary semantics has tremendous clinical value. In this paper, we propose the first computational framework that disentangles the task into two components: spatial vertex localization and sequential semantic classification. (1) An HCC tumor segmentor is built for tumor mask boundary extraction, followed by a polar transform representing the boundary with radius and angle. A vertex generator is used to produce fixed-length boundary vertices, where vertex features are sampled at the corresponding spatial locations. (2) The sampled deep vertex features with positional embedding are mapped into a sequential space and decoded by a multilayer perceptron (MLP) for semantic classification. Extensive experiments on tumor capsule semantics demonstrate the effectiveness of our framework. Mining the correlation between the boundary semantics and MVI status proves the feasibility of integrating these boundary semantics as a valid HCC prognostic biomarker.
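The polar representation in step (1) can be pictured with a small sketch. This is not the paper's vertex generator: it is a simplified stand-in that assumes the boundary is roughly star-shaped around its centroid, and the function name `polar_vertices` and the nearest-angle sampling rule are hypothetical choices made for illustration.

```python
import math


def polar_vertices(boundary, num_vertices):
    """Resample a closed boundary to a fixed number of (angle, radius) vertices."""
    if not boundary or num_vertices < 1:
        raise ValueError("need a non-empty boundary and at least one vertex")
    # Use the centroid of the boundary points as the pole.
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    two_pi = 2.0 * math.pi
    # Convert each boundary point to (angle in [0, 2*pi), radius).
    polar = [
        (math.atan2(y - cy, x - cx) % two_pi, math.hypot(x - cx, y - cy))
        for x, y in boundary
    ]
    vertices = []
    for k in range(num_vertices):
        target = two_pi * k / num_vertices

        def angular_gap(point, target=target):
            gap = abs(point[0] - target)
            return min(gap, two_pi - gap)  # distance on the circle

        # Take the radius of the boundary point closest in angle to the target.
        _, radius = min(polar, key=angular_gap)
        vertices.append((target, radius))
    return vertices


# Toy boundary: a circle of radius 2 centred at (5, 5), sampled every degree.
circle = [
    (5.0 + 2.0 * math.cos(math.radians(d)), 5.0 + 2.0 * math.sin(math.radians(d)))
    for d in range(360)
]
vertices = polar_vertices(circle, 8)
print(vertices)
```

The fixed-length output is what makes the boundary usable as a sequence: every tumor yields the same number of vertices regardless of its perimeter in pixels.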

preprint, 2020, arXiv

Experimental Realization of Two-Dimensional Buckled Lieb lattice

Two-dimensional (2D) materials with a Lieb lattice can host exotic electronic band structures. Such a system does not exist in nature, and it is also difficult to obtain in the laboratory due to its structural instability. Here, we experimentally realized a 2D system composed of a tin overlayer on an aluminum substrate by molecular beam epitaxy. The specific arrangement of Sn atoms on the Al(100) surface, which benefits from favorable interface interactions, forms a stabilized buckled Lieb lattice. Our theoretical calculations indicate a partially broken nodal line loop protected by its mirror reflection symmetry and a topologically nontrivial insulating state with a spin-orbit coupling (SOC) effect in the band structure of this Lieb lattice. The electronic structure of this system has also been experimentally characterized by scanning tunnelling spectroscopy and angle-resolved photoemission spectroscopy. Our work provides an appealing method for constructing 2D quantum materials based on the Lieb lattice.

preprint, 2020, arXiv

Weakly Supervised Semantic Point Cloud Segmentation: Towards 10X Fewer Labels

Point cloud analysis has received much attention recently, and segmentation is one of the most important tasks. The success of existing approaches is attributed to deep network design and large amounts of labelled training data, where the latter is assumed to be always available. However, obtaining 3D point cloud segmentation labels is often very costly in practice. In this work, we propose a weakly supervised point cloud segmentation approach which requires only a tiny fraction of points to be labelled in the training stage. This is made possible by learning gradient approximation and exploitation of additional spatial and color smoothness constraints. Experiments are done on three public datasets with different degrees of weak supervision. In particular, our proposed method can produce results that are close to and sometimes even better than its fully supervised counterpart with 10$\times$ fewer labels.

preprint, 2016, arXiv

Latent Model Ensemble with Auto-localization

Deep Convolutional Neural Networks (CNN) have exhibited superior performance in many visual recognition tasks including image classification, object detection, and scene labeling, due to their large learning capacity and resistance to overfitting. For the image classification task, most of the current deep CNN-based approaches take the whole size-normalized image as input and have achieved quite promising results. Compared with the previously dominating approaches based on feature extraction, pooling, and classification, the deep CNN-based approaches mainly rely on the learning capability of deep CNN to achieve superior results: the burden of minimizing intra-class variation while maximizing inter-class difference is entirely dependent on the implicit feature learning component of deep CNN; we rely upon the implicitly learned filters and pooling component to select the discriminative regions, which correspond to the activated neurons. However, if the irrelevant regions constitute a large portion of the image of interest, the classification performance of the deep CNN, which takes the whole image as input, can be heavily affected. To solve this issue, we propose a novel latent CNN framework, which treats the most discriminative region as a latent variable. We can jointly learn the global CNN with the latent CNN to avoid the aforementioned issue of large irrelevant regions, and our experimental results show the evident advantage of the proposed latent CNN over traditional deep CNN: latent CNN outperforms the state-of-the-art performance of deep CNN on standard benchmark datasets including the CIFAR-10, CIFAR-100, MNIST and PASCAL VOC 2007 Classification datasets.

preprint, 2016, arXiv

Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation

Zero-Shot Learning (ZSL) promises to scale visual recognition by bypassing the conventional model training requirement of annotated examples for every category. This is achieved by establishing a mapping connecting low-level features and a semantic description of the label space, referred to as visual-semantic mapping, on auxiliary data. Reusing the learned mapping to project target videos into an embedding space thus allows novel classes to be recognised by nearest neighbour inference. However, existing ZSL methods suffer from auxiliary-target domain shift intrinsically induced by assuming the same mapping for the disjoint auxiliary and target classes. This compromises the generalisation accuracy of ZSL recognition on the target data. In this work, we improve the ability of ZSL to generalise across this domain shift in both model- and data-centric ways by formulating a visual-semantic mapping with better generalisation properties and a dynamic data re-weighting method to prioritise auxiliary data that are relevant to the target classes. Specifically: (1) We introduce a multi-task visual-semantic mapping to improve generalisation by constraining the semantic mapping parameters to lie on a low-dimensional manifold, (2) We explore prioritised data augmentation by expanding the pool of auxiliary data with additional instances weighted by relevance to the target domain. The proposed new model is applied to the challenging zero-shot action recognition problem to demonstrate its advantages over existing ZSL models.
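The nearest neighbour inference step described above can be sketched in a few lines. Everything concrete here is an assumption for illustration: the linear form of the mapping, the cosine similarity measure, the toy weight matrix, and the two-dimensional class embeddings are hypothetical and not taken from the paper.

```python
import math


def project(features, weights):
    """Apply a linear visual-semantic mapping: one output value per weight row."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def zero_shot_predict(features, weights, class_embeddings):
    """Project a video into the semantic space and return the nearest class name."""
    embedded = project(features, weights)
    return max(
        class_embeddings,
        key=lambda name: cosine_similarity(embedded, class_embeddings[name]),
    )


# Hypothetical mapping (learned on auxiliary classes) from 3-D features to a 2-D space.
weights = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
# Hypothetical semantic embeddings of two unseen target classes.
class_embeddings = {"run": [1.0, 0.1], "jump": [0.1, 1.0]}

prediction = zero_shot_predict([0.9, 0.2, 0.1], weights, class_embeddings)
print(prediction)
```

No labelled example of "run" or "jump" is needed at training time; only their semantic embeddings are required, which is what makes the setting zero-shot.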

preprint, 2016, arXiv

Transductive Zero-Shot Action Recognition by Word-Vector Embedding

The number of categories for action recognition is growing rapidly and it has become increasingly hard to label sufficient training data for learning conventional models for all categories. Instead of collecting ever more data and labelling them exhaustively for all categories, an attractive alternative approach is "zero-shot learning" (ZSL). To that end, in this study we construct a mapping between visual features and a semantic descriptor of each action category, allowing new categories to be recognised in the absence of any visual training data. Existing ZSL studies focus primarily on still images and attribute-based semantic representations. In this work, we explore word-vectors as the shared semantic space to embed videos and category labels for ZSL action recognition. This is a more challenging problem than existing ZSL of still images and/or attributes, because the mapping between video spacetime features of actions and the semantic space is more complex and harder to learn for the purpose of generalising over any cross-category domain shift. To solve this generalisation problem in ZSL action recognition, we investigate a series of synergistic strategies to improve upon the standard ZSL pipeline. Most of these strategies are transductive in nature, meaning that they require access to testing data in the training phase.

preprint, 2015, arXiv

Discovery of Shared Semantic Spaces for Multi-Scene Video Query and Summarization

The growing rate of public space CCTV installations has generated a need for automated methods for exploiting video surveillance data including scene understanding, query, behaviour annotation and summarization. For this reason, extensive research has been performed on surveillance scene understanding and analysis. However, most studies have considered single scenes, or groups of adjacent scenes. The semantic similarity between different but related scenes (e.g., many different traffic scenes of similar layout) is not generally exploited to improve any automated surveillance tasks and reduce manual effort. Exploiting commonality, and sharing any supervised annotations, between different scenes is however challenging: some scenes are totally unrelated, so any information sharing between them would be detrimental, while others may share only a subset of common activities, so information sharing is useful only if it is selective. Moreover, semantically similar activities which should be modelled together and shared across scenes may have quite different pixel-level appearance in each scene. To address these issues we develop a new framework for distributed multiple-scene global understanding that clusters surveillance scenes by their ability to explain each other's behaviours, and further discovers which subset of activities are shared versus scene-specific within each cluster. We show how to use this structured representation of multiple scenes to improve common surveillance tasks including scene activity understanding, cross-scene query-by-example, behaviour classification with reduced supervised labelling requirements, and video summarization. In each case we demonstrate how our multi-scene model improves on a collection of standard single scene models and a flat model of all scenes.

preprint, 2015, arXiv

Investigation of Electron-Phonon Coupling in Epitaxial Silicene by In-situ Raman Spectroscopy

In this letter, we report that the special coupling between Dirac fermions and lattice vibrations, in other words, electron-phonon coupling (EPC), in silicene layers on the Ag(111) surface was probed by in-situ Raman spectroscopy. We find that the EPC is significantly modulated by tensile strain, which results from the lattice mismatch between silicene and the substrate, and by charge doping from the substrate. The special phonon modes corresponding to two-dimensional electron gas scattering at edge sites in the silicene were identified. Detecting the relationship between EPC and Dirac fermions through Raman scattering will provide a direct route to investigate the exotic properties of buckled two-dimensional honeycomb materials.

preprint, 2015, arXiv

Metal-Silicene Interaction Studied by Scanning Tunneling Microscopy

Ag atoms have been deposited on 3x3 silicene and R3xR3 silicene films by the molecular beam epitaxy method in ultrahigh vacuum. Using scanning tunneling microscopy and Raman spectroscopy, we found that Ag atoms do not form chemical bonds with either 3x3 silicene or R3xR3 silicene films, which is due to the chemically inert surface of silicene. On 3x3 silicene films, Ag atoms mostly form stable flat-top Ag islands. In contrast, Ag atoms form nanoclusters and glide on silicene films, suggesting a more inert nature. Raman spectroscopy suggests that there is more sp2 hybridization in R3xR3 than in R7xR7/3x3 silicene films.

preprint, 2015, arXiv

Observation of van Hove Singularities in Twisted Silicene Multilayers

Interlayer interactions perturb the electronic structure of two-dimensional materials and lead to new physical phenomena, such as van Hove singularities and Hofstadter's butterfly pattern. Silicene, the recently discovered two-dimensional form of silicon, is quite unique, in that silicon atoms adopt competing sp2 and sp3 hybridization states leading to a low-buckled structure promising relatively strong interlayer interaction. In multilayer silicene, the stacking order provides an important yet rarely explored degree of freedom for tuning its electronic structures through manipulating interlayer coupling. Here, we report the emergence of van Hove singularities in multilayer silicene created by an interlayer rotation. We demonstrate that even a large-angle rotation (> 20°) between stacked silicene layers can generate a Moiré pattern and van Hove singularities due to the strong interlayer coupling in multilayer silicene. Our study suggests an intriguing method for expanding the tunability of the electronic structure for electronic applications in this two-dimensional material.

preprint, 2015, arXiv

Semantic Embedding Space for Zero-Shot Action Recognition

The number of categories for action recognition is growing rapidly. It is thus becoming increasingly hard to collect sufficient training data to learn conventional models for each category. This issue may be ameliorated by the increasingly popular 'zero-shot learning' (ZSL) paradigm. In this framework a mapping is constructed between visual features and a human interpretable semantic description of each category, allowing categories to be recognised in the absence of any training data. Existing ZSL studies focus primarily on image data, and attribute-based semantic representations. In this paper, we address zero-shot recognition in contemporary video action recognition tasks, using semantic word vector space as the common space to embed videos and category labels. This is more challenging because the mapping between the semantic space and space-time features of videos containing complex actions is more complex and harder to learn. We demonstrate that a simple self-training and data augmentation strategy can significantly improve the efficacy of this mapping. Experiments on human action datasets including HMDB51 and UCF101 demonstrate that our approach achieves the state-of-the-art zero-shot action recognition performance.

preprint, 2014, arXiv

Effects of Oxygen Adsorption on the Surface State of Epitaxial Silicene on Ag(111)

Epitaxial silicene, which is one single layer of silicon atoms packed in a honeycomb structure, demonstrates a strong interaction with the substrate that dramatically affects its electronic structure. The role of electronic coupling in the chemical reactivity between the silicene and the substrate is still unclear, although it is of great importance for the functionalization of silicene layers. Here, we report the reconstructions and hybridized electronic structures of epitaxial 4x4 silicene on Ag(111), which are revealed by scanning tunneling microscopy and angle-resolved photoemission spectroscopy. The hybridization between Si and Ag results in a metallic surface state, which can gradually decay due to oxygen adsorption. X-ray photoemission spectroscopy confirms the decoupling of Si-Ag bonds after oxygen treatment as well as the relative oxygen resistance of the Ag(111) surface, in contrast to 4x4 silicene [with respect to Ag(111)]. First-principles calculations have confirmed the evolution of the electronic structure of silicene during oxidation. It has been verified experimentally and theoretically that the high chemical activity of 4x4 silicene is attributable to the Si pz state, while the Ag(111) substrate exhibits relatively inert chemical behavior.

preprint, 2014, arXiv

Tuning the Band Gap in Silicene by Oxidation

Silicene monolayers grown on Ag(111) surfaces demonstrate a band gap that is tunable by oxygen adatoms from semimetallic to semiconducting type. By using low-temperature scanning tunneling microscopy, it is found that the adsorption configurations and amounts of oxygen adatoms on the silicene surface are critical for band-gap engineering, which is dominated by different buckled structures in R13xR13, 4x4, and 2R3x2R3 silicene layers. The Si-O-Si bonds are the most energy-favored species formed on R13xR13, 4x4, and 2R3x2R3 structures under oxidation, which is verified by in-situ Raman spectroscopy as well as first-principles calculations. The silicene monolayers retain their structures when fully covered by oxygen adatoms. Our work demonstrates the feasibility of tuning the band gap of silicene with oxygen adatoms, which, in turn, expands the base of available two-dimensional electronic materials for devices with properties that are hardly achievable with graphene oxide.

preprint, 2014, arXiv

Unabridged phase diagram for single-phased FeSexTe1-x thin films

A complete phase diagram and its corresponding physical properties are essential prerequisites for understanding the underlying mechanism of iron-based superconductivity. For the structurally simplest 11 (FeSeTe) system, earlier attempts using bulk samples have not been able to provide one due to fabrication difficulties. Here, thin FeSexTe1-x films with the Se content covering the full range were fabricated using the pulsed laser deposition method. Crystal structure analysis shows that all films retain the tetragonal structure at room temperature. Significantly, the highest superconducting transition temperature (TC = 20 K) occurs in the newly discovered domain, 0.6 - 0.8. The single-phased superconducting dome for the full Se doping range is the first of its kind in iron chalcogenide superconductors. Our results present a new avenue to explore novel physics as well as to optimize superconductors.

preprint, 2013, arXiv

SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads

Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining the sequences for a large number of genes from an organism with no reference genome. With the rapidly increasing throughputs and decreasing costs of next generation sequencing, RNA-Seq has gained in popularity; but given the typically short reads (e.g. 2 x 90 bp paired ends) of this technology, de novo assembly to recover complete or full-length transcript sequences remains an algorithmic challenge. Results: We present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. Its performance was evaluated on transcriptome datasets from rice and mouse. Using the known transcripts from these well-annotated genomes (sequenced a decade ago) as our benchmark, we assessed how SOAPdenovo-Trans and two other popular software packages handle the practical issues of alternative splicing and variable expression levels. Our conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy, and faster execution. Availability and Implementation: Source code and user manual are at http://sourceforge.net/projects/soapdenovotrans/ Contact: xieyl@genomics.cn or bgi-soap@googlegroups.com
