Source author record

Wenjie Du

Wenjie Du appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Logic in Computer Science Machine Learning Artificial Intelligence Computation and Language Networking and Internet Architecture Software Engineering

Catalog footprint

What is connected

5works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Environment-Aware Code Generation: How far are We?

Recent progress in large language models (LLMs) has improved code generation, but most evaluations still test isolated, small-scale code (e.g., a single function) under default or unspecified software environments. As a result, it is unclear whether LLMs can reliably generate executable code tailored to a user's specific environment. We present the first systematic study of Environment-Aware Code Generation (EACG), where generated code must be functionally correct and directly executable under arbitrary software configurations. To enable realistic evaluation, we introduce VersiBCB, a benchmark that is multi-package, execution-verified, and deprecation-aware, capturing complex and evolving environments that prior datasets often overlook. Using VersiBCB, we investigate three complementary adaptation axes: data, parameters, and cache, and develop representative strategies for each. Our results show that current LLMs struggle with environment-specific code generation, while our adaptations improve environment compatibility and executability. These findings highlight key challenges and opportunities for deploying LLMs in practical software engineering workflows.

preprint2026arXiv

HELIX: Hybrid Encoding with Learnable Identity and Cross-dimensional Synthesis for Time Series Imputation

Time series imputation benefits from leveraging cross-feature correlations, yet existing attention-based methods re-discover feature relationships at each layer, lacking persistent anchors to maintain consistent representations. To address this, we propose HELIX, which assigns each feature a learnable feature identity, a persistent embedding that captures intrinsic semantic properties throughout the network. Unlike graph-based methods that rely on predefined topology and assume homogeneous spatial relationships, HELIX learns arbitrary feature dependencies end-to-end from temporal co-variation, naturally handling datasets where features mix spatial locations with semantic variables. Integrated with hybrid temporal-feature attention, HELIX achieves the state-of-the-art performance, surpassing all 16 baselines on 5 public datasets across 21 experimental settings in our evaluation. Furthermore, our mechanistic analysis reveals that HELIX aligns learned feature identities and dependencies with latent physical and semantic structure progressively across layers, demonstrating that it more effectively translates cross-feature structure into imputation accuracy.

preprint2022arXiv

Forecasting Loss of Signal in Optical Networks with Machine Learning

Loss of Signal (LOS) represents a significant cost for operators of optical networks. By studying large sets of real-world Performance Monitoring (PM) data collected from six international optical networks, we find that it is possible to forecast LOS events with good precision 1-7 days before they occur, albeit at relatively low recall, with supervised machine learning (ML). Our study covers twelve facility types, including 100G lines and ETH10G clients. We show that the precision for a given network improves when training on multiple networks simultaneously relative to training on an individual network. Furthermore, we show that it is possible to forecast LOS from all facility types and all networks with a single model, whereas fine-tuning for a particular facility or network only brings modest improvements. Hence our ML models remain effective for optical networks previously unknown to the model, which makes them usable for commercial applications.

preprint2015arXiv

Modal Characterisations of Behavioural Pseudometrics

For the model of probabilistic labelled transition systems that allow for the co-existence of nondeterminism and probabilities, we present two notions of bisimulation metrics: one is state-based and the other is distribution-based. We provide a sound and complete modal characterisation for each of them, using real-valued modal logics based on the Hennessy-Milner logic. The logic for characterising the state-based metric is much simpler than an earlier logic by Desharnais et al. as it uses only two non-expansive operators rather than the general class of non-expansive operators.

preprint2011arXiv

Logical, Metric, and Algorithmic Characterisations of Probabilistic Bisimulation

Many behavioural equivalences or preorders for probabilistic processes involve a lifting operation that turns a relation on states into a relation on distributions of states. We show that several existing proposals for lifting relations can be reconciled to be different presentations of essentially the same lifting operation. More interestingly, this lifting operation nicely corresponds to the Kantorovich metric, a fundamental concept used in mathematics to lift a metric on states to a metric on distributions of states, besides the fact the lifting operation is related to the maximum flow problem in optimisation theory. The lifting operation yields a neat notion of probabilistic bisimulation, for which we provide logical, metric, and algorithmic characterisations. Specifically, we extend the Hennessy-Milner logic and the modal mu-calculus with a new modality, resulting in an adequate and an expressive logic for probabilistic bisimilarity, respectively. The correspondence of the lifting operation and the Kantorovich metric leads to a natural characterisation of bisimulations as pseudometrics which are post-fixed points of a monotone function. We also present an "on the fly" algorithm to check if two states in a finitary system are related by probabilistic bisimilarity, exploiting the close relationship between the lifting operation and the maximum flow problem.