Researcher profile

Lijie Wen

Lijie Wen contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
14 works
0 followers
9 topics
4 close collaborators

Published work

14 published items

preprint · 2026 · arXiv

Do We Really Need External Tools to Mitigate Hallucinations? SIRA: Shared-Prefix Internal Reconstruction of Attribution

Large vision-language models (LVLMs) often hallucinate when language priors dominate weak or ambiguous visual evidence. Existing contrastive decoding methods mitigate this problem by comparing predictions from the original image with those from externally perturbed visual inputs, but such references can introduce off-manifold artifacts and require costly extra forward passes. We propose SIRA, a training-free internal contrastive decoding framework that constructs a counterfactual reference inside the same LVLM by exploiting the staged information flow of multimodal transformers. Instead of removing visual information from the input, SIRA first lets image and text tokens interact through a shared prefix, forming an aligned multimodal state that preserves prompt interpretation, decoding history, positional structure, and early visual grounding. It then forks a counterfactual branch in later transformer layers, where attention to image-token positions is masked. This branch retains the shared multimodal context but lacks continued access to fine-grained visual evidence, yielding a language-prior-dominated internal reference for token-level contrast. During decoding, SIRA suppresses tokens that remain strong without late visual access and favors predictions whose advantage depends on the full visual pathway. Experiments on POPE, CHAIR, and AMBER with Qwen2.5-VL and LLaVA-v1.5 show that SIRA consistently reduces hallucinations while preserving descriptive coverage and incurring lower overhead than two-pass contrastive decoding. SIRA requires no training, external verifier, or perturbed input, and applies to open-weight LVLMs with white-box inference access.
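
The token-level contrast described in this abstract can be illustrated with a small sketch. The abstract does not give SIRA's exact scoring rule, so the code below assumes a commonly used contrastive decoding form: scale up the full-pathway log-probabilities, subtract the reference log-probabilities, and only consider tokens that are already plausible under the full pathway. The function names and the `alpha` and `beta` parameters are hypothetical, and the reference logits are simply passed in rather than produced by a masked late-layer branch.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(full_logits, reference_logits, alpha=1.0, beta=0.1):
    """Score tokens by how much they depend on the full visual pathway.

    Assumed rule (not taken from the paper):
        score = (1 + alpha) * log p_full - alpha * log p_reference
    Tokens whose full-pathway probability is below beta times the best
    token's probability are excluded with a score of -inf.
    """
    lp_full = log_softmax(full_logits)
    lp_ref = log_softmax(reference_logits)
    cutoff = math.log(beta) + max(lp_full)
    scores = []
    for f, r in zip(lp_full, lp_ref):
        if f < cutoff:
            scores.append(float("-inf"))
        else:
            scores.append((1 + alpha) * f - alpha * r)
    return scores


def pick_token(full_logits, reference_logits, alpha=1.0, beta=0.1):
    """Greedy choice under the contrastive score."""
    scores = contrastive_scores(full_logits, reference_logits, alpha, beta)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vocabulary of three tokens. Token 0 stays strong without late visual
# access (language prior); token 1 is only competitive with the image.
full = [2.0, 1.8, -3.0]
reference = [2.5, 0.0, -3.0]
greedy = max(range(len(full)), key=full.__getitem__)
print("greedy:", greedy, "contrastive:", pick_token(full, reference))
```

In this toy case plain greedy decoding picks the prior-dominated token, while the contrastive score prefers the token whose advantage comes from the full pathway.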

preprint · 2026 · arXiv

Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction

Real-time duplex interaction is essential for multimodal AI systems operating in real-world scenarios, where models must continuously process streaming inputs and respond at appropriate moments. However, most existing multimodal large language models (MLLMs) are evaluated in offline settings, where the entire video input is processed before any response is generated. While recent work has started to explore real-time duplex MLLMs, there is still no comprehensive benchmark or automatic evaluation method for this setting. To address this gap, we propose Omni-DuplexEval, a benchmark for systematically evaluating real-time duplex interaction. The benchmark consists of two complementary scenarios: (1) Real-Time Description, which evaluates the ability to generate continuous, time-aligned responses that track evolving multimodal inputs, and (2) Proactive Reminder, which evaluates the ability to identify salient events and respond at appropriate moments. Omni-DuplexEval contains 660 videos with fine-grained, human-annotated labels and precise temporal metadata, spanning 9 tasks grounded in real-world scenarios, where all questions are formulated as open-ended queries. We further introduce an automatic evaluation framework based on LLM-as-a-Judge, which enables systematic assessment by jointly evaluating response-content alignment and response timing through timestamp-aware and sequential reasoning, achieving strong alignment with human judgments. Experiments on state-of-the-art duplex MLLMs reveal substantial limitations. The best-performing model achieves only 39.6% overall, while scoring only 20.0% on Proactive Reminder. Our analysis identifies two key challenges: models struggle to balance timely responses with coherent, holistic content generation, and they often fail to determine both when to respond and what to produce. We hope our work facilitates further progress in MLLMs.

preprint · 2022 · arXiv

A Multi-level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference

Natural Language Inference (NLI) is an increasingly important task in natural language understanding that requires inferring the relationship between a sentence pair (premise and hypothesis). Recently, low-resource natural language inference has gained increasing attention, due to significant savings in manual annotation costs and a better fit with real-world scenarios. Existing works fail to characterize discriminative representations between different classes with limited training data, which may cause errors in label prediction. Here we propose a multi-level supervised contrastive learning framework named MultiSCL for low-resource natural language inference. MultiSCL leverages sentence-level and pair-level contrastive learning objectives to discriminate between different classes of sentence pairs by bringing those in one class together and pushing away those in different classes. MultiSCL adopts a data augmentation module that generates different views for input samples to better learn the latent representation. The pair-level representation is obtained from a cross attention module. We conduct extensive experiments on two public NLI datasets in low-resource settings, and the accuracy of MultiSCL exceeds other models by 3.1% on average. Moreover, our method outperforms the previous state-of-the-art method on cross-domain tasks of text classification.

preprint · 2022 · arXiv

CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking

The explosion of misinformation spreading in the media ecosystem calls for automated fact-checking. While misinformation spans both geographic and linguistic boundaries, most work in the field has focused on English. Datasets and tools available in other languages, such as Chinese, are limited. In order to bridge this gap, we construct CHEF, the first CHinese Evidence-based Fact-checking dataset of 10K real-world claims. The dataset covers multiple domains, ranging from politics to public health, and provides annotated evidence retrieved from the Internet. Further, we develop established baselines and a novel approach that is able to model the evidence retrieval as a latent variable, allowing joint training with the veracity prediction model in an end-to-end fashion. Extensive experiments show that CHEF will provide a challenging testbed for the development of fact-checking systems designed to retrieve and reason over non-English claims.

preprint · 2022 · arXiv

Emergence of insulating ferrimagnetism and perpendicular magnetic anisotropy in 3d-5d perovskite oxide composite films for insulator spintronics

Magnetic insulators with strong perpendicular magnetic anisotropy (PMA) play a key role in exploring pure spin current phenomena and developing ultralow-dissipation spintronic devices; it is therefore highly desirable to develop new material platforms. Here we report epitaxial growth of La2/3Sr1/3MnO3 (LSMO)-SrIrO3 (SIO) composite oxide films (LSMIO) with different crystalline orientations, fabricated by a sequential two-target ablation process using pulsed laser deposition. The LSMIO films exhibit high crystalline quality with a homogeneous mixture of LSMO and SIO at the atomic level. Ferrimagnetic and insulating transport characteristics are observed, with the temperature-dependent electric resistivity well fitted by the Mott variable-range-hopping model. Moreover, the LSMIO films show strong PMA. By further constructing all-perovskite oxide heterostructures of the ferrimagnetic insulator LSMIO and a strongly spin-orbit-coupled SIO layer, pronounced spin Hall magnetoresistance (SMR) and spin Hall-like anomalous Hall effect (SH-AHE) were observed. These results illustrate the potential application of the ferrimagnetic insulator LSMIO in developing all-oxide ultralow-dissipation spintronic devices.
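
For readers unfamiliar with the transport model named in this abstract, the standard textbook form of Mott variable-range hopping is reproduced below. The abstract does not state the dimensionality used in the fit; the three-dimensional case (d = 3) shown on the right is an assumption. Here ρ_0 is a prefactor and T_0 is the characteristic Mott temperature, both treated as fit parameters.

```latex
\rho(T) = \rho_0 \exp\!\left[\left(\frac{T_0}{T}\right)^{1/(d+1)}\right],
\qquad d = 3 \;\Rightarrow\; \ln\rho \propto T^{-1/4}
```

A fit of this kind is usually checked by plotting ln ρ against T^{-1/4} and looking for a straight line.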

preprint · 2022 · arXiv

Graph Neural Network with Curriculum Learning for Imbalanced Node Classification

Graph Neural Network (GNN) is an emerging technique for graph-based learning tasks such as node classification. In this work, we reveal the vulnerability of GNN to the imbalance of node labels. Traditional solutions for imbalanced classification (e.g. resampling) are ineffective in node classification because they do not consider the graph structure. Worse still, they may even lead to overfitting or underfitting due to a lack of sufficient prior knowledge. To solve these problems, we propose a novel graph neural network framework with curriculum learning (GNN-CL) consisting of two modules. First, we acquire reliable interpolation nodes and edges through a novel graph-based oversampling method based on smoothness and homophily. Second, we combine a graph classification loss with a metric learning loss, which adjusts the distance between different nodes associated with the minority class in feature space. Inspired by curriculum learning, we dynamically adjust the weights of the different modules during the training process to achieve better generalization and discrimination. The proposed framework is evaluated on several widely used graph datasets, showing that our proposed model consistently outperforms the existing state-of-the-art methods.

preprint · 2022 · arXiv

Pair-Level Supervised Contrastive Learning for Natural Language Inference

Natural language inference (NLI) is an increasingly important task for natural language understanding, which requires one to infer the relationship between the sentence pair (premise and hypothesis). Many recent works have used contrastive learning by incorporating the relationship of the sentence pair from NLI datasets to learn sentence representation. However, these methods only focus on comparisons with sentence-level representations. In this paper, we propose a Pair-level Supervised Contrastive Learning approach (PairSCL). We adopt a cross attention module to learn the joint representations of the sentence pairs. A contrastive learning objective is designed to distinguish the varied classes of sentence pairs by pulling those in one class together and pushing apart the pairs in other classes. We evaluate PairSCL on two public datasets of NLI where the accuracy of PairSCL outperforms other methods by 2.1% on average. Furthermore, our method outperforms the previous state-of-the-art method on seven transfer tasks of text classification.
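
The objective of pulling same-class pairs together and pushing other classes apart can be sketched with the generic supervised contrastive loss. This is a minimal illustration and not the PairSCL implementation: the embeddings stand in for the joint pair representations that the paper obtains from a cross attention module, and the function names and the `temperature` value are assumptions.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch.

    For each anchor i, every other item with the same label is a positive.
    The loss is the mean negative log-probability of the positives under a
    softmax over similarities to all other items in the batch.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Four toy pair representations forming two tight clusters.
pairs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(pairs, ["entail", "entail", "contra", "contra"])
shuffled = supervised_contrastive_loss(pairs, ["entail", "contra", "entail", "contra"])
print(aligned < shuffled)
```

The loss is small when labels agree with the geometry of the representations and large when same-class items sit far apart, which is the signal a training loop would minimize.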

preprint · 2022 · arXiv

Semantic Enhanced Text-to-SQL Parsing via Iteratively Learning Schema Linking Graph

The generalizability to new databases is of vital importance to Text-to-SQL systems which aim to parse human utterances into SQL statements. Existing works achieve this goal by leveraging the exact matching method to identify the lexical matching between the question words and the schema items. However, these methods fail in other challenging scenarios, such as the synonym substitution in which the surface form differs between the corresponding question words and schema items. In this paper, we propose a framework named ISESL-SQL to iteratively build a semantic enhanced schema-linking graph between question tokens and database schemas. First, we extract a schema linking graph from PLMs through a probing procedure in an unsupervised manner. Then the schema linking graph is further optimized during the training process through a deep graph learning method. Meanwhile, we also design an auxiliary task called graph regularization to improve the schema information mentioned in the schema-linking graph. Extensive experiments on three benchmarks demonstrate that ISESL-SQL could consistently outperform the baselines and further investigations show its generalizability and robustness.
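
The limitation of exact matching mentioned in this abstract is easy to reproduce. The sketch below is a hypothetical baseline linker, not part of ISESL-SQL: it links a question token to a table or column only when the lowercase strings are identical. The schema and the questions are made up for illustration.

```python
import re


def exact_match_links(question, schema):
    """Link question tokens to schema items by exact lowercase string match.

    `schema` maps each table name to a list of its column names.
    Returns a list of (question_token, schema_item) pairs.
    """
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    links = []
    for table, columns in schema.items():
        if table.lower() in tokens:
            links.append((table.lower(), table))
        for column in columns:
            if column.lower() in tokens:
                links.append((column.lower(), f"{table}.{column}"))
    return links


schema = {"singer": ["name", "age"]}

# Surface forms match the schema, so both items are linked.
print(exact_match_links("What is the age of each singer?", schema))

# Synonym substitution: same intent, but no token matches any schema item.
print(exact_match_links("How old is each vocalist?", schema))
```

The second question yields no links at all, which is the failure case that motivates learning a semantic schema-linking graph instead of relying on surface forms.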

preprint · 2022 · arXiv

Towards the Future: Bring Program Correctness back to the focus

Program correctness used to be the main concern of computer software in the early days when formal semantics was a hot topic. However, the word "correct" was afterwards replaced by reliable, robust, trustworthy and so on, reflecting a tradeoff. This is not because correctness is no longer important, but because people found no way forward in this direction. The tradeoff has led software engineers to focus on techniques and testing tools. Rapid development of software engineering has now reached a peak, and programmers are now working freely without worrying too much about bugs, since bugs are not avoidable anyway. Is it meaningful to talk about program correctness today? Our answer is yes. It is time to seriously consider correctness again, before it is too late, to prepare for the future. Future generation computer systems should be correct, both syntactically (statically) and semantically (dynamically). The book "OESPA: Semantic Oriented Theory of Programming" (2019) by the first author has opened a new direction for semantic study. Theoretically speaking, it is now possible, based on OESPA, to compute program semantics from program text so that program correctness can be proved. However, semantic computations and correctness proofs cannot be done by hand when a program is large. Automatic tools are necessary. This paper tries to lay a foundation for developing the needed automatic tools, so that OESPA is enriched to serve future needs. To this end, a new concept named conditional semantic predicate is proposed. Concepts in OESPA, including semantic functions, semantic predicates, semantic formulas and semantic calculus, are re-represented accordingly. Such re-introduction is necessary since the book is the only publication on semantic calculus so far. The new version of semantic calculus illustrates how automatic semantic computation would be carried out.

preprint · 2022 · arXiv

What Makes the Story Forward? Inferring Commonsense Explanations as Prompts for Future Event Generation

Prediction over event sequences is critical for many real-world applications in Information Retrieval and Natural Language Processing. Future Event Generation (FEG) is a challenging task in event sequence prediction because it requires not only fluent text generation but also commonsense reasoning to maintain the logical coherence of the entire event story. In this paper, we propose a novel explainable FEG framework, Coep. It highlights and integrates two types of event knowledge, sequential knowledge of direct event-event relations and inferential knowledge that reflects the intermediate character psychology between events, such as intents, causes, reactions, which intrinsically pushes the story forward. To alleviate the knowledge forgetting issue, we design two modules, Im and Gm, for each type of knowledge, which are combined via prompt tuning. First, Im focuses on understanding inferential knowledge to generate commonsense explanations and provide a soft prompt vector for Gm. We also design a contrastive discriminator for better generalization ability. Second, Gm generates future events by modeling direct sequential knowledge with the guidance of Im. Automatic and human evaluation demonstrate that our approach can generate more coherent, specific, and logical future events.

preprint · 2021 · arXiv

Cooperative control of perpendicular magnetic anisotropy via crystal structure and orientation in single-crystal flexible SrRuO3 membranes

Flexible magnetic materials with robust and controllable perpendicular magnetic anisotropy (PMA) are highly desirable for developing flexible high-performance spintronic devices. However, it is still a challenge to fabricate PMA films through current techniques of direct deposition on polymers. Here, we report a facile method for synthesizing single-crystal freestanding SrRuO3 (SRO) membranes with controlled crystal structure and orientation using water-soluble Ca3-xSrxAl2O6 sacrificial layers. Through the cooperative effect of crystal structure and orientation engineering, flexible SrRuO3 membranes reveal highly tunable magnetic anisotropy from in-plane to out-of-plane with a remarkable PMA energy of 7.34 × 10^6 erg/cm^3. First-principles calculations reveal that the underlying mechanism of PMA modulation is intimately correlated with structure-controlled Ru 4d-orbital occupation, as well as the spin-orbital matrix element differences, dependent on the crystal orientation. In addition, there are no obvious changes in the magnetism after 10,000 bending cycles, indicating excellent magnetic reliability of the prepared films. This work provides a feasible approach to prepare flexible oxide films with strong and controllable PMA.

preprint · 2021 · arXiv

Lateral modulation of magnetic anisotropy in tricolor 3d-5d oxide superlattices

Manipulating magnetic anisotropy (MA) purposefully in transition metal oxides (TMOs) enables the development of oxide-based spintronic devices with practical applications. Here, we report a pathway to reversibly switch the lateral magnetic easy-axis via interfacial oxygen octahedral coupling (OOC) effects in 3d-5d tricolor superlattices, i.e. [SrIrO3,mRTiO3,SrIrO3,2La0.67Sr0.33MnO3]10 (RTiO3: SrTiO3 and CaTiO3). In the heterostructures, the magnetic anisotropy energy (MAE) is enhanced by more than one order of magnitude to ~10^6 erg/cm^3 compared to La0.67Sr0.33MnO3 films. Moreover, the magnetic easy-axis is reversibly reoriented between the (100)- and (110)-directions by changing the RTiO3. Using first-principles density functional theory calculations, we find that SrIrO3 has a large single-ion anisotropy due to its strong spin-orbit interaction. This anisotropy can be reversibly controlled by the OOC, which then reorients the easy-axis of the superlattices. Additionally, it enlarges the MAE of the films through cooperation with a robust orbital hybridization between the Ir and Mn atoms. Our results indicate that tricolor superlattices consisting of 3d and 5d oxides provide a powerful platform to study the MA and develop oxide-based spintronic devices.

preprint · 2020 · arXiv

An Approach for Process Model Extraction By Multi-Grained Text Classification

Process model extraction (PME) is a recently emerged interdisciplinary task between natural language processing (NLP) and business process management (BPM), which aims to extract process models from textual descriptions. Previous process extractors depend heavily on manual features and ignore the potential relations between clues of different text granularities. In this paper, we formalize the PME task as a multi-grained text classification problem, and propose a hierarchical neural network to effectively model and extract multi-grained information without manually defined procedural features. Under this structure, we accordingly propose a coarse-to-fine (grained) learning mechanism, training multi-grained tasks in coarse-to-fine grained order to share high-level knowledge with the low-level tasks. To evaluate our approach, we construct two multi-grained datasets from two different domains and conduct extensive experiments from different dimensions. The experimental results demonstrate that our approach outperforms the state-of-the-art methods with statistical significance, and further investigations demonstrate its effectiveness.

preprint · 2020 · arXiv

Computation of Transition Adjacency Relations Based on Complete Prefix Unfolding (Technical Report)

An increasing number of works have been devoted to the application of Transition Adjacency Relation (TAR) as a means to capture behavioral features of business process models. In this paper, we systematically study efficient TAR derivation from process models using the unfolding technique, which has previously been used to address state space explosion when dealing with concurrent behaviors of a Petri net. We reveal and formally describe the equivalence between TAR and Event Adjacency Relation (EAR), the manifestation of TAR in the Complete Prefix Unfolding (CPU) of a Petri net. By computing TARs from the CPU using this equivalence, we can alleviate the state-explosion issues caused by concurrency. Furthermore, structural boosting rules are categorized, proved and added to the TAR computing algorithm. Formal proofs of correctness and generality of CPU-based TAR computation are provided for the first time by this work, and they significantly expand the range of Petri nets from which TARs can be efficiently derived. Experiments on both industrial and synthesized process models show the effectiveness of the proposed CPU-based algorithms, as well as the observation that they scale well with increasing size and concurrency of business process models.
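
To make the notion of a transition adjacency relation concrete, the sketch below collects TARs the naive way, from an explicit list of firing sequences, where a pair (a, b) is a TAR if b directly follows a in at least one sequence. This is not the CPU-based algorithm from the report: enumerating sequences is exactly what becomes infeasible under heavy concurrency. The function name and the example process are hypothetical.

```python
def transition_adjacency_relations(traces):
    """Collect every ordered pair (a, b) where b directly follows a in a trace."""
    tars = set()
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            tars.add((current, following))
    return tars


# Toy process: A, then B and C concurrently, then D.
# Concurrency shows up as two interleavings of B and C.
traces = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
]

for pair in sorted(transition_adjacency_relations(traces)):
    print(pair)
```

With k mutually concurrent transitions the number of interleavings grows as k!, so trace enumeration does not scale, while the set of adjacent pairs itself stays small. That gap is what makes deriving TARs from a compact structure attractive.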