Source author record

Wenbin Zhang

Wenbin Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

20 works

20 topics

4 close collaborators

preprint, 2026, arXiv

Elucidating Representation Degradation Problem in Diffusion Model Training

Diffusion models have achieved remarkable success, yet their training remains inefficient due to a severe optimization bottleneck, which we term Representation Degradation. As noise levels increase, the outputs of the trained model exhibit progressive structural distortion, which can destabilize training and impair generation quality. Our analysis suggests that this instability is driven by mismatched target recoverability, which is associated with Neural Tangent Kernel (NTK) spectral weakening and effective low-rank behavior. To address this, we propose Elucidated Representation Diffusion (ERD), a plug-and-play framework that dynamically reallocates optimization effort according to effective recoverability. By stabilizing representation learning without external supervision, ERD accelerates convergence and achieves strong empirical performance across diffusion backbones.

preprint, 2026, arXiv

Fairness Definitions in Language Models Explained

Language Models (LMs) have demonstrated exceptional performance across various Natural Language Processing (NLP) tasks. Despite these advancements, LMs can inherit and amplify societal biases related to sensitive attributes such as gender and race, limiting their adoption in real-world applications. Therefore, fairness has been extensively explored in LMs, leading to the proposal of various fairness notions. However, the lack of clear agreement on which fairness definition to apply in specific contexts and the complexity of understanding the distinctions between these definitions can create confusion and impede further progress. To this end, this paper proposes a systematic survey that clarifies the definitions of fairness as they apply to LMs. Specifically, we begin with a brief introduction to LMs and fairness in LMs, followed by a comprehensive, up-to-date overview of existing fairness notions in LMs and the introduction of a novel taxonomy that categorizes these concepts based on their transformer architecture: encoder-only, decoder-only, and encoder-decoder LMs. We further illustrate each definition through experiments, showcasing their practical implications and outcomes. Finally, we discuss current research challenges and open questions, aiming to foster innovative ideas and advance the field. The repository is publicly available online at https://github.com/vanbanTruong/Fairness-in-Large-Language-Models/tree/main/definitions.

preprint, 2026, arXiv

Geospatial foundation-model embeddings improve population estimation unevenly across space and scale

Reliable subnational population estimates are essential for applications, yet remain difficult where censuses are sparse, outdated or spatially coarse. Existing population-mapping workflows rely on hand-built geospatial covariates, such as settlement extent, night-time lights, and environmental conditions, which must be assembled and harmonised across scales and geographies. Geospatial foundation models offer an alternative by learning reusable representations of place from more multifaceted and heterogeneous data sources. Here, we benchmark Population Dynamics Foundation Model (PDFM) embeddings against harmonised geospatial covariates for subnational population estimation in Brazil, Nigeria and the United States. Under geographically structured validation, PDFM improved predictive fit, with a median 20.1% reduction in unexplained variance (IQR: 10.0-33.2%, across country-model comparisons), and reduced Kullback-Leibler divergence by 23.2% (9.2-26.2%). However, these gains were uneven. PDFM was most advantageous where the geospatial covariates weakly characterised settlement context, such as larger and less-developed subnational areas. Moreover, PDFM performance was scale-coupled, with embeddings providing less flexible transfer across spatial aggregations than geospatial covariates. These findings show that geospatial foundation-model representations of place can improve population estimation in data-poor settings, but their benefits break down predictably under spatial scale mismatch, revealing a fundamental limitation of current geospatial AI.
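As an illustration of the Kullback-Leibler metric quoted in the abstract above, the following is a minimal sketch of how a divergence between an observed and a predicted population distribution could be computed, and how a percentage reduction between two models could be reported. The function names, the normalisation to population shares, and the sample counts are all assumptions made for illustration; the abstract does not say how the authors computed the metric.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two non-negative vectors.

    Both vectors are normalised to sum to 1 first, so raw population
    counts can be passed in directly (an assumption of this sketch).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    p_total, q_total = sum(p), sum(q)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("p and q must each have a positive sum")

    kl = 0.0
    for p_i, q_i in zip(p, q):
        p_share = p_i / p_total
        q_share = q_i / q_total
        if p_share == 0:
            continue  # a term with p = 0 contributes nothing
        if q_share == 0:
            return math.inf  # q assigns zero mass where p has mass
        kl += p_share * math.log(p_share / q_share)
    return kl


def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` is lower than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


# Hypothetical counts for four areas (not taken from the paper).
observed = [120, 300, 80, 500]
covariate_prediction = [150, 250, 120, 480]
embedding_prediction = [130, 290, 90, 490]

kl_covariates = kl_divergence(observed, covariate_prediction)
kl_embeddings = kl_divergence(observed, embedding_prediction)
print(relative_reduction(kl_covariates, kl_embeddings))
```

A lower divergence means the predicted shares across areas sit closer to the observed shares, so a positive reduction favours the second model.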

preprint, 2024, arXiv

Tracking Surface Charge Dynamics on Single Nanoparticles

Surface charges play a fundamental role in physics and chemistry, particularly in shaping the catalytic properties of nanomaterials. Tracking nanoscale surface charge dynamics remains challenging due to the involved length and time scales. Here, we demonstrate real-time access to the nanoscale charge dynamics on dielectric nanoparticles employing reaction nanoscopy. We present a four-dimensional visualization of the non-linear charge dynamics on strong-field irradiated single SiO$_2$ nanoparticles with femtosecond-nanometer resolution and reveal how surface charges affect surface molecular bonding with quantum dynamical simulations. We performed semi-classical simulations to uncover the roles of diffusion and charge loss in the surface charge redistribution process. Understanding nanoscale surface charge dynamics and its influence on chemical bonding on a single nanoparticle level unlocks an increased ability to address global needs in renewable energy and advanced healthcare.

preprint, 2023, arXiv

Unveiling and Mitigating Bias in Ride-Hailing Pricing for Equitable Policy Making

Ride-hailing services have skyrocketed in popularity due to the convenience they offer, but recent research has shown that their pricing strategies can have a disparate impact on some riders, such as those living in disadvantaged neighborhoods with a greater share of residents of color or residents below the poverty line. Since these communities tend to be more dependent on ride-hailing services due to lack of adequate public transportation, it is imperative to address this inequity. To this end, this paper presents the first thorough study on fair pricing for ride-hailing services by devising applicable fairness measures and corresponding fair pricing mechanisms. By providing discounts that may be subsidized by the government, our approach results in a larger number of more affordable rides for the disadvantaged community. Experiments on real-world Chicago taxi data confirm our theoretical findings, which provide a basis for the government to establish fair ride-hailing policies.

preprint, 2022, arXiv

A survey on datasets for fairness-aware machine learning

As decision-making increasingly relies on Machine Learning (ML) and (big) data, the issue of fairness in data-driven Artificial Intelligence (AI) systems is receiving increasing attention from both research and industry. A large variety of fairness-aware machine learning solutions have been proposed which involve fairness-related interventions in the data, learning algorithms and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware machine learning. We focus on tabular data as the most common data representation for fairness-aware machine learning. We start our analysis by identifying relationships between the different attributes, particularly w.r.t. protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate the interesting relationships using exploratory analysis.

preprint, 2022, arXiv

Femtosecond rotational dynamics of D$_2$ molecules in superfluid helium nanodroplets

Rotational dynamics of D$_2$ molecules inside helium nanodroplets is induced by a moderately intense femtosecond (fs) pump pulse and measured as a function of time by recording the yield of HeD$^+$ ions, created through strong-field dissociative ionization with a delayed fs probe pulse. The yield oscillates with a period of 185 fs, reflecting field-free rotational wave packet dynamics, and the oscillation persists for more than 500 periods. Within the experimental uncertainty, the rotational constant $B_{\mathrm{He}}$ of the in-droplet D$_2$ molecule, determined by Fourier analysis, is the same as $B_{\mathrm{gas}}$ for an isolated D$_2$ molecule. Our observations show that the D$_2$ molecules inside helium nanodroplets essentially rotate as free D$_2$ molecules.
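The link between the 185 fs oscillation period and a rotational constant in the abstract above can be sketched with a short calculation. This is only an illustration under two assumptions that the abstract does not state: that the molecule behaves as a rigid rotor with levels $E_J = B J(J+1)$, and that the observed oscillation is the beat between the J = 0 and J = 2 levels, whose energy gap is $6B$. The function name and its parameters are hypothetical.

```python
SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10  # exact SI value, expressed in cm/s


def rotational_constant_cm1(beat_period_fs, j_lower=0):
    """Estimate B (in cm^-1) from the period of a J <-> J+2 beat.

    Assumes rigid-rotor levels E_J = B * J * (J + 1), so the gap between
    J and J + 2 is B * [(J + 2)(J + 3) - J(J + 1)] = B * (4J + 6).
    """
    gap_in_units_of_b = 4 * j_lower + 6
    period_s = beat_period_fs * 1e-15
    # Beat frequency 1/T converted to a wavenumber by dividing by c.
    beat_wavenumber_cm1 = 1.0 / (period_s * SPEED_OF_LIGHT_CM_PER_S)
    return beat_wavenumber_cm1 / gap_in_units_of_b


# Period quoted in the abstract; the J = 0 <-> 2 assignment is an assumption.
print(rotational_constant_cm1(185.0))
```

Under these assumptions the estimate comes out at roughly 30 cm$^{-1}$. A Fourier analysis of the full time trace, as described in the abstract, would resolve each beat frequency separately instead of relying on a single period.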

preprint, 2022, arXiv

Incremental Mining of Frequent Serial Episodes Considering Multiple Occurrences

The need to analyze information from streams arises in a variety of applications. One of its fundamental research directions is to mine sequential patterns over data streams. Current studies mine series of items based on the presence of the pattern in transactions but pay no attention to the series of itemsets and their multiple occurrences. The pattern over a window of itemsets stream and their multiple occurrences, however, provides additional capability to recognize the essential characteristics of the patterns and the inter-relationships among them that are unidentifiable by the existing presence-based studies. In this paper, we study such a new sequential pattern mining problem and propose a corresponding sequential miner with novel strategies to prune the search space efficiently. Experiments on both real and synthetic data show the utility of our approach.

preprint, 2022, arXiv

Longitudinal Fairness with Censorship

Recent works in artificial intelligence fairness attempt to mitigate discrimination by proposing constrained optimization programs that achieve parity for some fairness statistic. Most assume availability of the class label, which is impractical in many real-world applications such as precision medicine, actuarial analysis and recidivism prediction. Here we consider fairness in longitudinal right-censored environments, where the time to event might be unknown, resulting in censorship of the class label and inapplicability of existing fairness studies. We devise applicable fairness measures, propose a debiasing algorithm, and provide necessary theoretical constructs to bridge fairness with and without censorship for these important and socially-sensitive tasks. Our experiments on four censored datasets confirm the utility of our approach.

preprint, 2021, arXiv

Disentangled Dynamic Graph Deep Generation

Deep generative models for graphs have exhibited promising performance in ever-increasing domains such as design of molecules (i.e., graphs of atoms) and structure prediction of proteins (i.e., graphs of amino acids). Existing work typically focuses on static rather than dynamic graphs, although dynamic graphs are very important in applications such as protein folding, molecule reactions, and human mobility. Extending existing deep generative models from static to dynamic graphs is a challenging task, which requires handling the factorization of static and dynamic characteristics as well as mutual interactions among node and edge patterns. This paper proposes a novel framework of factorized deep generative models to achieve interpretable dynamic graph generation. Various generative models are proposed to characterize conditional independence among node, edge, static, and dynamic factors. Then, variational optimization strategies as well as dynamic graph decoders are proposed based on newly designed factorized variational autoencoders and recurrent graph deconvolutions. Extensive experiments on multiple datasets demonstrate the effectiveness of the proposed models.

preprint, 2021, arXiv

Using Machine Learning to Automate Mammogram Images Analysis

Breast cancer is the second leading cause of cancer-related death after lung cancer in women. Early detection of breast cancer in X-ray mammography is believed to have effectively reduced the mortality rate. However, a relatively high false positive rate and a low specificity in mammography technology still exist. In this work, a computer-aided automatic mammogram analysis system is proposed to process the mammogram images and automatically classify them as either normal or cancerous in three consecutive stages: image processing, feature selection, and image classification. In designing the system, the discrete wavelet transforms (Daubechies 2, Daubechies 4, and Biorthogonal 6.8) and the Fourier cosine transform were first used to parse the mammogram images and extract statistical features. Then, an entropy-based feature selection method was implemented to reduce the number of features. Finally, different pattern recognition methods (including the Back-propagation Network, the Linear Discriminant Analysis, and the Naive Bayes Classifier) and a voting classification scheme were employed. The performance of each classification strategy was evaluated for sensitivity, specificity, and accuracy, and for general performance using the Receiver Operating Characteristic (ROC) curve. Our method is validated on the dataset from Eastern Health in Newfoundland and Labrador, Canada. The experimental results demonstrated that the proposed automatic mammogram analysis system could effectively improve classification performance.

preprint, 2019, arXiv

Echo in a Single Molecule

Echo is a ubiquitous phenomenon found in many physical systems, ranging from spins in magnetic fields to particle beams in hadron accelerators. It is typically observed in inhomogeneously broadened ensembles of nonlinear objects, and is used to eliminate the effects of environment-induced dephasing, enabling observation of proper, inherent object properties. Here, we report experimental observation of quantum wave packet echoes in a single isolated molecule. In contrast to conventional echoes, here the entire dephasing-rephasing cycle occurs within a single molecule without any inhomogeneous spread of molecular properties, or any interaction with the environment. In our experiments, we use a short laser pulse to impulsively excite a vibrational wave packet in an anharmonic molecular potential, and observe its oscillations and eventual dispersion with time. A second delayed pulsed excitation is applied, giving rise to an echo: a partial recovery of the initial coherent wave packet. The vibrational dynamics of single molecules is visualized by a time-delayed probe pulse dissociating them one at a time. Two mechanisms for the echo formation are discussed: ac Stark-induced molecular potential shaking and creation of a depletion-induced "hole" in the nuclear spatial distribution. Interplay between the optically induced echoes and quantum revivals of the vibrational wave packets is observed and theoretically analyzed. The single molecule wave packet echoes may lead to the development of new tools for probing ultrafast intramolecular processes in various molecules.

preprint, 2016, arXiv

Disentangling the role of laser coupling in directional breaking of molecules

The directional control of molecular dissociation with the laser electric field waveform is a paradigm and was demonstrated for a variety of molecules. In most cases, the directional control occurs via a dissociative ionization pathway. The role of laser-induced coupling of electronic states in the dissociating ion versus selective ionization of oriented neutral molecules, however, could not be distinguished for even small heteronuclear molecules such as CO. Here, we introduce a technique, using elliptically polarized pump and linearly polarized two-color probe pulses that unambiguously distinguishes the roles of laser-induced state coupling and selective ionization. The measured photoelectron momentum distributions governed by the light polarizations allow us to coincidently identify the ionization and dissociation from the pump and probe pulses. Directional dissociation of CO+ as a function of the relative phase of the linearly polarized two-color pulse is observed for both parallel and orthogonally oriented molecules. We find that the laser-induced coupling of various electronic states of CO+ plays an important role for the observed directional bond breaking, which is verified by quantum calculations.

preprint, 2016, arXiv

Molecular echoes in space and time

Mountain echoes are a well-known phenomenon, where an impulse excitation is mirrored by the rocks to generate a replica of the original stimulus, often with reverberating recurrences. For spin echoes in magnetic resonance and photon echoes in atomic and molecular systems, the role of the mirror is played by a second, time-delayed pulse which is able to reverse the flow of time and recreate the original event. Recently, laser-induced rotational alignment and orientation echoes were introduced for molecular gases, and discussed in terms of rotational-phase-space filamentation. Here we present, for the first time, a direct spatiotemporal analysis of various molecular alignment echoes by means of coincidence Coulomb explosion imaging. We observe hitherto unreported spatially rotated echoes that depend on the polarization direction of the pump pulses, and find surprising imaginary echoes at negative times.

preprint, 2014, arXiv

A Multi-factor Adaptive Statistical Arbitrage Model

This paper examines the implementation of a statistical arbitrage trading strategy based on co-integration relationships where we discover candidate portfolios using multiple factors rather than just price data. The portfolio selection methodologies include K-means clustering, graphical lasso and a combination of the two. Our results show that clustering appears to yield better candidate portfolios on average than naively using graphical lasso over the entire equity pool. A hybrid approach of using the combination of graphical lasso and clustering yields better results still. We also examine the effects of an adaptive approach during the trading period, by re-computing potential portfolios once to account for change in relationships with passage of time. However, the adaptive approach does not produce better results than the one without re-learning. Our results passed the test for the presence of statistical arbitrage at a statistically significant level. Additionally, we were able to validate our findings over a separate dataset for formation and trading periods.

preprint, 2014, arXiv

News-Based Group Modeling and Forecasting

In this paper, we study news group modeling and forecasting methods using quantitative data generated by our large-scale natural language processing (NLP) text analysis system. A news group is a set of news entities, like top U.S. cities, governors, senators, golfers, or movie actors. Our fame distribution analysis of news groups shows that log-normal and power-law distributions generally could describe news groups in many aspects. We use several real news groups, including cities, politicians, and CS professors, to evaluate our news group models in terms of time series data distribution analysis, group-fame probability analysis, and fame-changing analysis over a long time. We also build a practical news generation model using an HMM (Hidden Markov Model) based approach. Most importantly, our analysis shows the future entity fame distribution has a power-law tail. That is, only a small number of news entities in a group could become famous in the future. Based on these analyses, we are able to answer some interesting forecasting problems: for example, what is the future average fame (or maximum fame) of a specific news group? And what is the probability that some news entity becomes very famous within a certain future time range? We also give concrete examples to illustrate our forecasting approaches.

preprint, 2013, arXiv

Orbital stability of peakons for a generalized Camassa-Holm equation with both quadratic and cubic nonlinearity

In this paper, we investigate the orbital stability problem of peakons for a modified Camassa-Holm equation with both quadratic and cubic nonlinearity. This equation was derived from integrable theory and admits peaked soliton (peakon) and multipeakon solutions. By constructing two suitable piecewise functions, we establish the polynomial inequality relating to two conserved quantities and the maximum of the solution to this equation. The error estimate between the maximum of the solution and the peakon then follows from the structure of the polynomial inequality. Finally, we prove that a wave starting close to the peakon remains close to some translate of it at all later times, that is, the shapes of these peakons are stable under small perturbations.

preprint, 2012, arXiv

Group Operads and Homotopy Theory

We introduce the classical theory of the interplay between group theory and topology into the context of operads and explore some applications to homotopy theory. We first propose a notion of a group operad and then develop a theory of group operads, extending the classical theories of groups, spaces with actions of groups, covering spaces and classifying spaces of groups. In particular, the fundamental groups of a topological operad is naturally a group operad and its higher homotopy groups are naturally operads with actions of its fundamental groups operad, and a topological $K(π,1)$ operad is characterized by and can be reconstructed from its fundamental groups operad. Two most important examples of group operads are the symmetric groups operad and the braid groups operad which provide group models for $Ω^{\infty} Σ^{\infty} X$ (due to Barratt and Eccles) and $Ω^2 Σ^2 X$ (due to Fiedorowicz) respectively. We combine the two models together to produce a free group model for the canonical stabilization $Ω^2 Σ^2 X \hookrightarrow Ω^{\infty} Σ^{\infty} X$, in particular a free group model for its homotopy fibre.

preprint, 2012, arXiv

Nanoparticle enhanced evaporation of liquids: A case study of silicone oil and water

Evaporation is a fundamental physical phenomenon, about which many challenging questions remain unanswered. Enhanced evaporation of liquids is on some occasions of enormous practical significance. Here we report the enhanced evaporation of the nearly permanently stable silicone oil by dispersing nanoparticles in it, including CaTiO$_3$, anatase and rutile TiO$_2$. The results can inspire research into the atomistic mechanism of nanoparticle enhanced evaporation and the exploration of evaporation control techniques for treatment of oil pollution and restoration of dirty water.

preprint, 2011, arXiv

Operations on Spaces over Operads and Applications to Homotopy Groups

We establish certain smash operations on spaces over operads which are general analogues of the Samelson product on single loop spaces, and obtain a conceptual description of the structure of the homotopy groups of spaces $Y$ over a symmetric $K(π,1)$ operad: $π_*Y$ is a module over the free algebraic symmetric operad generated by operations on homotopy groups induced by these smash operations. In particular the homotopy groups of double loop spaces is a module over the free algebraic symmetric operad generated by the conjugacy classes of Brunnian braids modulo the conjugation action of pure braids.
