Researcher profile

Zhen Huang

Zhen Huang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
11 works
0 followers
11 topics
4 close collaborators

Published work

11 published items

preprint | 2026 | arXiv

CTTA-T: Continual Test-Time Adaptation for Text Understanding via Teacher-Student with a Domain-aware and Generalized Teacher

Text understanding often suffers from domain shifts. To handle testing domains, domain adaptation (DA) trains a model to adapt to a fixed, observed testing domain. A more challenging paradigm, test-time adaptation (TTA), cannot access the testing domain during training and adapts online to the testing samples, which come from a single fixed domain. We aim to explore a more practical and underexplored scenario, continual test-time adaptation (CTTA) for text understanding, which involves a sequence of unobserved testing domains. Current CTTA methods struggle to reduce error accumulation over domains and to enhance generalization to unobserved domains: 1) noise filtering reduces accumulated errors but discards useful information, and 2) accumulating historical domains enhances generalization, but adaptive accumulation is hard to achieve. In this paper, we propose CTTA-T (continual test-time adaptation for text understanding), a framework adaptable to evolving target domains: it adopts a teacher-student framework in which the teacher is domain-aware and generalized for evolving domains. To improve teacher predictions, we propose a refine-then-filter strategy based on dropout-driven consistency, which calibrates predictions and removes unreliable guidance. For the adaptation-generalization trade-off, we construct a domain-aware teacher by dynamically accumulating cross-domain semantics via incremental PCA, which continuously tracks domain shifts. Experiments show that CTTA-T outperforms the baselines.

preprint | 2026 | arXiv

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent upon extensive human-annotated demonstrations, and models' capabilities are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and STEM fields, surpassing its counterparts trained via conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically harnessed to guide and enhance the reasoning capabilities of smaller models.

preprint | 2024 | arXiv

Realizing topological edge states in graphene-like elastic metamaterials

The study of topological states in electronic structures, which allows robust transport properties against impurities and defects, has been recently extended to the realm of elasticity. This work shows that nontrivial topological flexural edge states located on the free boundary of the elastic graphene-like metamaterial can be realized without breaking the time reversal, mirror, or inversion symmetry of the system. Numerical calculations and experimental studies demonstrate the robust transport of flexural waves along the boundaries of the designed structure. The topological edge states on the free boundary are not limited by the size of the finite structure, which can reduce the scale of the topological state system. In addition, unlike the edge states localized on the free boundary in graphene where the group velocity is zero, the edge states on the elastic metamaterial plate have propagation states with non-zero group velocity. There is a frequency range for the edge states, and we introduce the concept of Shannon entropy for elastic waves and use it to assess the frequency range of the edge states in graphene-like elastic metamaterials. This work represents a relevant advance in the study of elastic wave topological states, providing a theoretical basis for engineering applications such as vibration reduction and vibration isolation of mechanical structures.
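The abstract does not spell out how Shannon entropy is defined for elastic waves. As a rough illustration only, the sketch below applies the standard Shannon entropy to a normalized intensity distribution over lattice sites; the function name `shannon_entropy` and the two toy mode profiles are assumptions made for this example, not quantities taken from the paper.

```python
import math


def shannon_entropy(amplitudes):
    """Shannon entropy (in nats) of the normalized intensity |u_i|^2 / sum_j |u_j|^2.

    Hypothetical helper: the paper's exact definition may differ.
    """
    intensities = [abs(a) ** 2 for a in amplitudes]
    total = sum(intensities)
    if total == 0:
        raise ValueError("at least one amplitude must be non-zero")
    probabilities = [value / total for value in intensities]
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probabilities if p > 0)


# Toy profiles (assumed): an amplitude decaying away from a boundary,
# and a uniform amplitude spread over the same 20 sites.
edge_like_mode = [math.exp(-0.8 * site) for site in range(20)]
bulk_like_mode = [1.0] * 20

edge_entropy = shannon_entropy(edge_like_mode)
bulk_entropy = shannon_entropy(bulk_like_mode)
```

Under this definition a strongly localized profile yields a lower entropy than a uniform one, whose entropy equals the logarithm of the number of sites, so the value can serve as a simple localization indicator.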

preprint | 2024 | arXiv

Topological transmission in Suzuki phase sonic crystals

This work reports extraordinary topological properties of sound transmission through topological states in sonic crystals known as the Suzuki phase, which consists of a rectangular lattice of vacancies created in a triangular lattice. These low-symmetry crystals exhibit unique properties due to the embedded lattice of vacancies. A generalized folding method explains the band structure and the quasi-type-II Dirac point in the Suzuki phase, which is related to the underlying triangular lattice. In analogy to the acoustic valley Hall effect, the Suzuki phase contains three types of topological edge states on the four possible interfaces separating two Suzuki phase crystals with distinct topological phases. The edge states have defined symmetries with inherent directionality, which affect the topological sound transmission and are different from chirality, valley vorticity or helicity. In particular, the existence of topological deaf bands is reported here. The propagation of topological eigenmodes on the same interface is also different, which is quantified using the acoustic Shannon entropy, making the topological transport dependent on the frequency of the edge states. Based on the abundant topological edge states of Suzuki phase crystals, a multifunctional device with acoustic diodes, multi-channel transmission, and selective acoustic transmission can be designed. Numerical simulations and measurements demonstrate the topological transmission. Our work extends the research platform of acoustic topological states to lattices with low symmetry, which opens new avenues for enriching topological states with broad engineering applications.

preprint | 2023 | arXiv

UniFed: All-In-One Federated Learning Platform to Unify Open-Source Frameworks

Federated Learning (FL) has become a practical and widely adopted distributed learning paradigm. However, the lack of a comprehensive and standardized solution covering diverse use cases makes it challenging to use in practice. In addition, selecting an appropriate FL framework for a specific use case can be a daunting task. In this work, we present UniFed, the first unified platform for standardizing existing open-source FL frameworks. The platform streamlines the end-to-end workflow for distributed experimentation and deployment, encompassing 11 popular open-source FL frameworks. In particular, to address the substantial variations in workflows and data formats, UniFed introduces a configuration-based schema-enforced task specification, offering 20 editable fields. UniFed also provides functionalities such as distributed execution management, logging, and data analysis. With UniFed, we evaluate and compare 11 popular FL frameworks from the perspectives of functionality, privacy protection, and performance, through conducting developer surveys and code-level investigation. We collect 15 diverse FL scenario setups (e.g., horizontal and vertical settings) for FL framework evaluation. This comprehensive evaluation allows us to analyze both model and system performance, providing detailed comparisons and offering recommendations for framework selection. UniFed simplifies the process of selecting and utilizing the appropriate FL framework for specific use cases, while enabling standardized distributed experimentation and deployment. Our results and analysis based on experiments with up to 178 distributed nodes provide valuable system design and deployment insights, aiming to empower practitioners in their pursuit of effective FL solutions.

preprint | 2022 | arXiv

Cloth-Changing Person Re-identification from A Single Image with Gait Prediction and Regularization

Cloth-changing person re-identification (CC-ReID) aims at matching the same person across different locations over a long duration, e.g., over days, and therefore inevitably faces the challenge of changing clothing. In this paper, we focus on handling the CC-ReID problem under a more challenging setting, i.e., from just a single image, which enables high-efficiency and latency-free pedestrian identification for real-time surveillance applications. Specifically, we introduce gait recognition as an auxiliary task to drive the image ReID model to learn cloth-agnostic representations by leveraging personally unique and cloth-independent gait information; we name this framework GI-ReID. GI-ReID adopts a two-stream architecture that consists of an image ReID-Stream and an auxiliary gait recognition stream (Gait-Stream). The Gait-Stream, which is discarded at inference for high computational efficiency, acts as a regulator to encourage the ReID-Stream to capture cloth-invariant biometric motion features during training. To get temporally continuous motion cues from a single image, we design a Gait Sequence Prediction (GSP) module for the Gait-Stream to enrich gait information. Finally, high-level semantic consistency over the two streams is enforced for effective knowledge regularization. Experiments on multiple image-based cloth-changing ReID benchmarks, e.g., LTCC, PRCC, Real28, and VC-Clothes, demonstrate that GI-ReID performs favorably against state-of-the-art methods. Code is available at https://github.com/jinx-USTC/GI-ReID.

preprint | 2022 | arXiv

IMCI: Integrate Multi-view Contextual Information for Fact Extraction and Verification

With the rapid development of automatic fake news detection technology, fact extraction and verification (FEVER) has been attracting more attention. The task aims to extract the most relevant evidence from millions of open-domain Wikipedia documents and then verify the credibility of the corresponding claims. Although several strong models have been proposed for the task and have made great progress, we argue that they fail to utilize multi-view contextual information and thus cannot obtain better performance. In this paper, we propose to integrate multi-view contextual information (IMCI) for fact extraction and verification. For each evidence sentence, we define two kinds of context, i.e., intra-document context and inter-document context. Intra-document context consists of the document title and all the other sentences from the same document. Inter-document context consists of all other evidence sentences, which may come from different documents. We then integrate the multi-view contextual information to encode the evidence sentences to handle the task. Our experimental results on the FEVER 1.0 shared task show that our IMCI framework makes great progress on both fact extraction and verification, and achieves state-of-the-art performance with a winning FEVER score of 72.97% and label accuracy of 75.84% on the online blind test set. We also conduct an ablation study to measure the impact of multi-view contextual information. Our code will be released at https://github.com/phoenixsecularbird/IMCI.

preprint | 2022 | arXiv

SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples

Unsupervised sentence embedding aims to obtain the most appropriate embedding for a sentence to reflect its semantics. Contrastive learning has been attracting growing attention. For a sentence, current models use diverse data augmentation methods to generate positive samples, while treating other independent sentences as negative samples. They then adopt the InfoNCE loss to pull the embeddings of positive pairs together and push those of negative pairs apart. Although these models have made great progress on sentence embedding, we argue that they may suffer from feature suppression: the models fail to distinguish and decouple textual similarity and semantic similarity, and they may overestimate the semantic similarity of any pair with similar text regardless of the actual semantic difference between them. This is because positive pairs in unsupervised contrastive learning come with similar or even identical text through data augmentation. To alleviate feature suppression, we propose contrastive learning for unsupervised sentence embedding with soft negative samples (SNCSE). Soft negative samples share highly similar text with the original samples but have clearly different semantics. Specifically, we take the negation of original sentences as soft negative samples, and propose a Bidirectional Margin Loss (BML) to introduce them into the traditional contrastive learning framework, which involves only positive and negative samples. Our experimental results show that SNCSE obtains state-of-the-art performance on the semantic textual similarity (STS) task, with an average Spearman's correlation coefficient of 78.97% on BERTbase and 79.23% on RoBERTabase. In addition, we adopt a rank-based error analysis method to identify the weaknesses of SNCSE for future study.
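The two loss terms mentioned in this abstract can be sketched as follows. The InfoNCE part follows the usual softmax-over-similarities form. The bidirectional margin part is only one plausible reading of "bidirectional margin": the abstract gives no formula, so the hinge form, the margins `alpha` and `beta`, the `temperature` value, and all function names here are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.05):
    """InfoNCE loss for one anchor: -log softmax of the positive similarity."""
    positive_logit = cosine(anchor, positive) / temperature
    logits = [positive_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    largest = max(logits)
    log_denominator = largest + math.log(sum(math.exp(l - largest) for l in logits))
    return log_denominator - positive_logit


def bidirectional_margin(anchor, positive, soft_negative, alpha=0.1, beta=0.3):
    """Assumed hinge form: zero when -beta <= delta <= -alpha, where
    delta = cos(anchor, soft_negative) - cos(anchor, positive)."""
    delta = cosine(anchor, soft_negative) - cosine(anchor, positive)
    return max(0.0, delta + alpha) + max(0.0, -delta - beta)
```

With these assumed margins, a soft negative is penalized both when it sits as close to the anchor as the positive does and when it is pushed as far away as an unrelated sentence, which matches the idea of text that is similar on the surface but different in meaning.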

preprint | 2020 | arXiv

Characteristic Lengths of Interlayer Charge-Transfer in Correlated Oxide Heterostructures

Using interlayer interaction to control functional heterostructures with atomic-scale designs has become one of the most effective interface-engineering strategies. Here, we demonstrate the effect of a crystalline LaFeO3 buffer layer on amorphous and crystalline LaAlO3/SrTiO3 heterostructures. The LaFeO3 buffer layer acts as an energetically favored electron acceptor in both LaAlO3/SrTiO3 systems, resulting in modulation of the interfacial carrier density and hence a metal-to-insulator transition. For amorphous and crystalline LaAlO3/SrTiO3 heterostructures, the metal-to-insulator transition is found when the LaFeO3 layer thickness crosses 3 and 6 unit cells, respectively. These different critical LaFeO3 thicknesses are explained in terms of distinct characteristic lengths of the redox-reaction-mediated and polar-catastrophe-dominated charge transfer, controlled by the interfacial atomic contact and the Thomas-Fermi screening effect, respectively. Our results not only shed light on the complex interlayer charge transfer across oxide heterostructures but also provide a new route to precisely tailor the charge-transfer process at a functional interface.

preprint | 2020 | arXiv

SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition

Very deep CNNs achieve state-of-the-art results in both computer vision and speech recognition, but are difficult to train. The most popular way to train very deep CNNs is to use shortcut connections (SC) together with batch normalization (BN). Inspired by Self-Normalizing Neural Networks, we propose the self-normalizing deep CNN (SNDCNN) based acoustic model topology, by removing the SC/BN and replacing the typical ReLU activations with scaled exponential linear unit (SELU) in ResNet-50. SELU activations make the network self-normalizing and remove the need for both shortcut connections and batch normalization. Compared to ResNet-50, we can achieve the same or lower (up to 4.5% relative) word error rate (WER) while boosting both training and inference speed by 60%-80%. We also explore other model inference optimization schemes to further reduce latency for production use.
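The SELU activation named in this abstract can be illustrated with a minimal sketch. The two constants are the standard published SELU values; the sampling check (the helper `selu_moments`, its sample size, and its seed) is an assumption added for illustration and is unrelated to the paper's acoustic model.

```python
import math
import random

# Standard SELU constants (alpha and scale) from the self-normalizing
# neural networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit.

    Returns scale * x for x > 0, else scale * alpha * (exp(x) - 1).
    """
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * math.expm1(x)


def selu_moments(num_samples=100_000, seed=0):
    """Sample mean and variance of selu(z) for z drawn from a standard normal.

    Hypothetical helper used only to illustrate the self-normalizing property.
    """
    rng = random.Random(seed)
    outputs = [selu(rng.gauss(0.0, 1.0)) for _ in range(num_samples)]
    mean = sum(outputs) / num_samples
    variance = sum((y - mean) ** 2 for y in outputs) / num_samples
    return mean, variance
```

For standard normal inputs the sample mean stays near 0 and the variance near 1. This fixed point is what allows a SELU network to keep activations well scaled without batch normalization.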

preprint | 2019 | arXiv

Aperiodic quantum oscillations in the two-dimensional electron gas at the LaAlO3/SrTiO3 interface

Despite several attempts, the intimate electronic structure of two-dimensional electron systems buried at the interface between LaAlO3 and SrTiO3 still remains to be experimentally revealed. Here, we investigate the transport properties of a high-mobility quasi-two-dimensional electron gas at this interface under high magnetic field (55 T) and provide new insights for electronic band structure by analyzing the Shubnikov-de Haas oscillations. Interestingly, the quantum oscillations are not 1/B-periodic and produce a highly non-linear Landau plot (Landau level index versus 1/B). Among possible scenarios, the Roth-Gao-Niu equation provides a natural explanation for 1/B-aperiodic oscillations in relation with the magnetic response functions of the system. Overall, the magneto-transport data are discussed in light of high-resolution scanning transmission electron microscopy analysis of the interface as well as calculations from density functional theory.
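A Landau plot, as described in this abstract, places the Landau level index against 1/B. For conventional 1/B-periodic oscillations the points fall on a straight line whose slope is the oscillation frequency. The sketch below contrasts that case with a toy aperiodic one. All numbers (a 30 T frequency, a 0.5 phase offset, a quadratic distortion) are invented for illustration, and the quadratic term is a stand-in rather than the Roth-Gao-Niu equation.

```python
def fit_landau_plot(inverse_fields, indices):
    """Least-squares line through (1/B, n): returns slope, intercept, max residual."""
    count = len(inverse_fields)
    mean_x = sum(inverse_fields) / count
    mean_y = sum(indices) / count
    s_xx = sum((x - mean_x) ** 2 for x in inverse_fields)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inverse_fields, indices))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    max_residual = max(
        abs(y - (slope * x + intercept)) for x, y in zip(inverse_fields, indices)
    )
    return slope, intercept, max_residual


# Toy parameters (assumed, not taken from the paper).
FREQUENCY_TESLA = 30.0
PHASE_OFFSET = 0.5
DISTORTION = 0.002

landau_indices = list(range(1, 9))

# Periodic case: n = F / B + offset, so 1/B is evenly spaced in n.
periodic_inverse_fields = [(n - PHASE_OFFSET) / FREQUENCY_TESLA for n in landau_indices]

# Toy aperiodic case: a quadratic term bends the spacing in 1/B.
aperiodic_inverse_fields = [
    (n - PHASE_OFFSET) / FREQUENCY_TESLA + DISTORTION * n**2 for n in landau_indices
]

periodic_fit = fit_landau_plot(periodic_inverse_fields, landau_indices)
aperiodic_fit = fit_landau_plot(aperiodic_inverse_fields, landau_indices)
```

In the periodic case the fitted slope recovers the assumed frequency and the residual is negligible. In the distorted case no single straight line describes the points, which is the kind of non-linear Landau plot the abstract reports.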