Researcher profile

Xinwei Shen

Xinwei Shen contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
8 works
0 followers
9 topics
4 close collaborators

Published work

8 published items

preprint · 2026 · arXiv

Perturbation is All You Need for Extrapolating Language Models

We introduce a simple yet powerful framework for training large language models. In contrast to the standard autoregressive next-token prediction based on an exact prefix, we propose a perturbation-based procedure that first transforms the prefix into a semantic neighbor and then conditions on this perturbed variant for next-token prediction. This yields a hierarchical model with a pre-post-additive noise structure. Within this framework, we develop a rigorous theory of extrapolability, namely, the capacity of a model class to make reliable predictions for token sequences that lie outside the empirical support of the training corpus. We evaluate the finite-sample performance of the proposed procedure using both synthetic and real-world language data. Results show that the proposed method consistently improves out-of-support prediction while maintaining competitive in-support performance, demonstrating that perturbation offers a practical route to language modeling.

preprint · 2025 · arXiv

To ArXiv or not to ArXiv: A Study Quantifying Pros and Cons of Posting Preprints Online

Double-blind conferences have engaged in debates over whether to allow authors to post their papers online on arXiv or elsewhere during the review process. Independently, some authors of research papers face the dilemma of whether to put their papers on arXiv due to its pros and cons. We conduct a study to substantiate this debate and dilemma via quantitative measurements. Specifically, we conducted surveys of reviewers in two top-tier double-blind computer science conferences -- ICML 2021 (5361 submissions and 4699 reviewers) and EC 2021 (498 submissions and 190 reviewers). Our three main findings are as follows. First, more than a third of the reviewers self-report searching online for a paper they are assigned to review. Second, conference policies restricting authors from publicising their work on social media or posting preprints before the review process may have only limited effectiveness in maintaining anonymity. Third, outside the review process, we find that preprints from better-ranked institutions experience a very small increase in visibility compared to preprints from other institutions.

preprint · 2022 · arXiv

A Fair and Efficient Hybrid Federated Learning Framework based on XGBoost for Distributed Power Prediction

In a modern power system, real-time data on power generation/consumption and its relevant features are stored in various distributed parties, including household meters, transformer stations and external organizations. To fully exploit the underlying patterns of these distributed data for accurate power prediction, federated learning is needed as a collaborative but privacy-preserving training scheme. However, current federated learning frameworks are polarized towards addressing either the horizontal or vertical separation of data, and tend to overlook the case where both are present. Furthermore, in mainstream horizontal federated learning frameworks, only artificial neural networks are employed to learn the data patterns, which are considered less accurate and interpretable compared to tree-based models on tabular datasets. To this end, we propose a hybrid federated learning framework based on XGBoost, for distributed power prediction from real-time external features. In addition to introducing boosted trees to improve accuracy and interpretability, we combine horizontal and vertical federated learning, to address the scenario where features are scattered in local heterogeneous parties and samples are scattered in various local districts. Moreover, we design a dynamic task allocation scheme such that each party gets a fair share of information, and the computing power of each party can be fully leveraged to boost training efficiency. A follow-up case study is presented to justify the necessity of adopting the proposed framework. The advantages of the proposed framework in fairness, efficiency and accuracy performance are also confirmed.

preprint · 2022 · arXiv

A Stochastic Planning Method for Low-carbon Building-level Integrated Energy System Considering Electric-Heat-V2G Coupling

The concept of the low-carbon building has been proposed to mitigate climate change and to realize carbon neutrality at the building level in urban areas. In addition, renewable energy curtailment in the power distribution system, as well as the low efficiency caused by the independent operation of traditional energy systems, has been addressed to some extent by the application of integrated energy systems (IES). In this paper, we propose a planning method for a low-carbon building-level IES that considers electric vehicles (EV) and the Vehicle to Grid (V2G) mode, which further increase the flexibility of low-carbon buildings. The proposed planning model optimizes the investment cost, operation cost and CO2 emissions of the building-level IES, so as to maximize the benefit of constructing the low-carbon building and to help realize carbon neutrality. Moreover, we consider the uncertainty of distributed renewable energy, multi-energy load fluctuation and the random behavior of EV users, and formulate a two-stage stochastic programming model with chance constraints, to which heuristic moment matching scenario generation (HMMSG) and the sample average approximation (SAA) method are applied. In the case study, a real commercial IES building in Shanghai, where photovoltaic (PV), energy storage system (ESS), fuel cell (FC), EV, etc. are included as planning options, is used as a numerical example to verify the effectiveness of the proposed planning method, and the functions of the ESS and EV in the IES are analyzed in detail under different operation scenarios.

preprint · 2022 · arXiv

Asymptotic Statistical Analysis of $f$-divergence GAN

Generative Adversarial Networks (GANs) have achieved great success in data generation. However, their statistical properties are not fully understood. In this paper, we consider the statistical behavior of the general $f$-divergence formulation of GAN, which includes the Kullback--Leibler divergence that is closely related to the maximum likelihood principle. We show that for parametric generative models that are correctly specified, all $f$-divergence GANs with the same discriminator classes are asymptotically equivalent under suitable regularity conditions. Moreover, with an appropriately chosen local discriminator, they become equivalent to the maximum likelihood estimate asymptotically. For generative models that are misspecified, GANs with different $f$-divergences converge to different estimators, and thus cannot be directly compared. However, it is shown that for some commonly used $f$-divergences, the original $f$-GAN is not optimal in that one can achieve a smaller asymptotic variance when the discriminator training in the original $f$-GAN formulation is replaced by logistic regression. The resulting estimation method is referred to as Adversarial Gradient Estimation (AGE). Empirical studies are provided to support the theory and to demonstrate the advantage of AGE over the original $f$-GANs under model misspecification.
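
For readers unfamiliar with the term, the block below recalls the textbook definition of an $f$-divergence and the variational lower bound commonly used in the $f$-GAN literature. It is background only: the symbols $p$, $q$, $T$ and $\mathcal{T}$ are notation assumed here, and the abstract does not confirm that the paper uses exactly this form.

```latex
% f-divergence between distributions P and Q with densities p and q,
% for a convex function f with f(1) = 0:
\[
  D_f(P \,\|\, Q) \;=\; \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx .
\]
% Choosing f(t) = t \log t recovers the Kullback--Leibler divergence:
\[
  D_f(P \,\|\, Q) \;=\; \int p(x) \log\frac{p(x)}{q(x)}\, dx \;=\; \mathrm{KL}(P \,\|\, Q).
\]
% Variational lower bound over a discriminator class \mathcal{T},
% where f^* denotes the convex conjugate of f:
\[
  D_f(P \,\|\, Q) \;\ge\; \sup_{T \in \mathcal{T}}
  \Big( \mathbb{E}_{x \sim P}\big[T(x)\big] - \mathbb{E}_{x \sim Q}\big[f^*(T(x))\big] \Big).
\]
```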

preprint · 2022 · arXiv

Reframed GES with a Neural Conditional Dependence Measure

In a nonparametric setting, the causal structure is often identifiable only up to Markov equivalence, and for the purpose of causal inference, it is useful to learn a graphical representation of the Markov equivalence class (MEC). In this paper, we revisit the Greedy Equivalence Search (GES) algorithm, which is widely cited as a score-based algorithm for learning the MEC of the underlying causal structure. We observe that in order to make the GES algorithm consistent in a nonparametric setting, it is not necessary to design a scoring metric that evaluates graphs. Instead, it suffices to plug in a consistent estimator of a measure of conditional dependence to guide the search. We therefore present a reframing of the GES algorithm, which is more flexible than the standard score-based version and readily lends itself to the nonparametric setting with a general measure of conditional dependence. In addition, we propose a neural conditional dependence (NCD) measure, which utilizes the expressive power of deep neural networks to characterize conditional independence in a nonparametric manner. We establish the optimality of the reframed GES algorithm under standard assumptions and the consistency of using our NCD estimator to decide conditional independence. Together these results justify the proposed approach. Experimental results demonstrate the effectiveness of our method in causal discovery, as well as the advantages of using our NCD measure over kernel-based measures.

preprint · 2020 · arXiv

Bidirectional Generative Modeling Using Adversarial Gradient Estimation

This paper considers the general $f$-divergence formulation of bidirectional generative modeling, which includes VAE and BiGAN as special cases. We present a new optimization method for this formulation, where the gradient is computed using an adversarially learned discriminator. In our framework, we show that different divergences induce similar algorithms in terms of gradient evaluation, except with different scaling. Therefore this paper gives a general recipe for a class of principled $f$-divergence based generative modeling methods. Theoretical justifications and extensive empirical studies are provided to demonstrate the advantage of our approach over existing methods.

preprint · 2020 · arXiv

Stochastic Unit Commitment in Electricity-Gas Coupled Integrated Energy Systems based on Modified Progressive Hedging

The increasing number of gas-fired units has significantly intensified the coupling between power and gas networks. Traditionally, the nonlinearity and nonconvexity in gas flow equations, together with renewable-induced stochasticity, result in a computationally expensive model for unit commitment in electricity-gas coupled integrated energy systems (IES). To accelerate stochastic day-ahead scheduling, we apply and modify Progressive Hedging (PH), a heuristic approach that can be computed in parallel to yield scenario-independent unit commitment. By applying a termination and enumeration technique, the modified PH algorithm saves considerable computational time, especially when the unit production prices are similar for all generators, and when the scale of the IES is large. Moreover, an adapted second-order cone relaxation (SOCR) is utilized to tackle the nonconvex gas flow equation. Case studies are performed on the IEEE 24-bus system/Belgium 20-node gas system and the IEEE 118-bus system/Belgium 20-node gas system. The computational efficiency when employing PH is 188 times that of commercial software, even outperforming Benders Decomposition. Meanwhile, the gap between the PH algorithm and the benchmark is less than 0.01% in both test systems, which shows that the solution produced by PH reaches acceptable optimality in this stochastic UC problem.
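
To make the idea of Progressive Hedging concrete, here is a minimal sketch of the textbook PH iteration on a toy convex problem. It is not the modified algorithm from the paper, which targets a mixed-integer unit commitment model where PH acts as a heuristic. The function name, the quadratic scenario costs and the parameter values are all assumptions chosen for illustration.

```python
def progressive_hedging(targets, probs, rho=1.0, iterations=200):
    """Toy PH for min_x sum_s p_s * (x - a_s)^2 with one shared decision x.

    targets: hypothetical per-scenario targets a_s
    probs:   scenario probabilities p_s (assumed to sum to 1)
    rho:     penalty parameter of the PH proximal term
    """
    # Iteration 0: solve every scenario independently (no penalty yet).
    xs = list(targets)
    x_bar = sum(p * x for p, x in zip(probs, xs))
    # Multipliers that price the disagreement with the consensus value.
    ws = [rho * (x - x_bar) for x in xs]

    for _ in range(iterations):
        # Scenario subproblems are independent, so they could run in parallel.
        # Each one minimises (x - a)^2 + w*x + (rho/2)*(x - x_bar)^2,
        # which has the closed-form solution below.
        xs = [(2 * a - w + rho * x_bar) / (2 + rho)
              for a, w in zip(targets, ws)]
        # Consensus (scenario-independent) decision.
        x_bar = sum(p * x for p, x in zip(probs, xs))
        # Multiplier update.
        ws = [w + rho * (x - x_bar) for w, x in zip(ws, xs)]

    return x_bar, xs


if __name__ == "__main__":
    consensus, per_scenario = progressive_hedging([1.0, 4.0, 10.0],
                                                  [0.5, 0.3, 0.2])
    print(consensus, per_scenario)
```

For this toy problem the consensus value should approach the probability-weighted mean of the targets, and the per-scenario decisions should collapse onto it, which is the scenario-independence the abstract refers to.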