Source author record

Jian Zhu

Jian Zhu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: cond-mat.mes-hall, Computation and Language, Computer Vision, cond-mat.mtrl-sci, eess.AS, Applications, Artificial Intelligence, cond-mat.other, cond-mat.soft, cond-mat.stat-mech, cond-mat.supr-con, Information Retrieval, Machine Learning, Software Engineering, Sound

Catalog footprint

What is connected

15 works

15 topics

4 close collaborators


preprint, 2026, arXiv

Generative Diffusion Contrastive Network for Multi-View Clustering

In recent years, Multi-View Clustering (MVC) has advanced significantly under the influence of deep learning. By integrating heterogeneous data from multiple views, MVC enhances clustering analysis, making multi-view fusion critical to clustering performance. However, multi-view fusion suffers from low-quality data. This problem arises primarily for two reasons: 1) certain views are contaminated by noisy data; 2) some views suffer from missing data. This paper proposes a novel Stochastic Generative Diffusion Fusion (SGDF) method to address this problem. SGDF leverages a multiple generative mechanism for the multi-view feature of each sample and is robust to low-quality data. Building on SGDF, we further present the Generative Diffusion Contrastive Network (GDCN). Extensive experiments show that GDCN achieves state-of-the-art results in deep MVC tasks. The source code is publicly available at https://github.com/HackerHyper/GDCN.

preprint, 2026, arXiv

POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

Recent advances in spoken language processing have led to substantial progress in phonetic tasks such as automatic speech recognition (ASR), phone recognition (PR), grapheme-to-phoneme conversion (G2P), and phoneme-to-grapheme conversion (P2G). Despite their conceptual similarity, these tasks have largely been studied in isolation, each relying on task-specific architectures and datasets. In this paper, we introduce POWSM (Phonetic Open Whisper-style Speech Model), the first unified framework capable of jointly performing multiple phone-related tasks. POWSM enables seamless conversion between audio, text (graphemes), and phones, opening up new possibilities for universal and low-resource speech processing. Our model outperforms or matches specialized PR models of similar size (Wav2Vec2Phoneme and ZIPA) while jointly supporting G2P, P2G, and ASR. Our training data, code and models are released to foster open science.

preprint, 2026, arXiv

ViBE: Visual-to-M/EEG Brain Encoding via Spatio-Temporal VAE and Distribution-Aligned Projection

Brain encoding models not only serve to decipher how visual stimuli are transformed into neural responses, but also represent a critical step toward visual prostheses that restore vision for patients with severe vision disorders. Brain encoding involves two fundamental steps: achieving faithful reconstruction of neural responses and establishing cross-modal alignment between visual stimuli and neural responses. To this end, we propose ViBE, a novel brain encoding framework for generating magnetoencephalography (MEG) and electroencephalography (EEG) signals from visual stimuli. Specifically, we first design a spatio-temporal convolutional variational autoencoder (TSC-VAE) that captures the spatio-temporal characteristics of M/EEG signals for effective neural response reconstruction. To bridge the modality gap between visual features and neural representations, we employ Q-Former to map CLIP image embeddings to the TSC-VAE latent space, producing neural proxy embeddings. For comprehensive cross-modal alignment, we combine mean squared error (MSE) loss for point-wise feature matching with sliced Wasserstein distance (SWD) for probability distribution alignment between the neural proxy embeddings and TSC-VAE latent embeddings. We conduct extensive experiments on the THINGS-EEG2 and THINGS-MEG datasets, demonstrating the effectiveness of our approach in generating high-quality M/EEG signals from visual stimuli.
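The combined alignment objective described above (point-wise MSE plus a sliced Wasserstein term) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the number of random projections, the squared 2-Wasserstein form, and the `swd_weight` balance factor are all assumptions made for illustration, and plain Python lists stand in for the embedding tensors.

```python
import math
import random


def mse(x, y):
    """Point-wise mean squared error between two equally shaped point sets."""
    n_values = len(x) * len(x[0])
    return sum((a - b) ** 2 for p, q in zip(x, y) for a, b in zip(p, q)) / n_values


def sliced_wasserstein(x, y, n_projections=64, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    Both point sets must contain the same number of points (an assumption
    of this sketch, which lets us match sorted projections one to one).
    """
    if len(x) != len(y):
        raise ValueError("point sets must have the same size")
    dim = len(x[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        # Draw a random unit direction by normalizing a Gaussian vector.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in direction))
        direction = [v / norm for v in direction]
        # Project both sets onto the direction; in 1D the optimal transport
        # plan simply pairs the sorted values.
        proj_x = sorted(sum(a * d for a, d in zip(p, direction)) for p in x)
        proj_y = sorted(sum(a * d for a, d in zip(p, direction)) for p in y)
        total += sum((a - b) ** 2 for a, b in zip(proj_x, proj_y)) / len(proj_x)
    return total / n_projections


def alignment_loss(proxy, latent, swd_weight=1.0):
    """Hypothetical combined loss: MSE plus weighted sliced Wasserstein term."""
    return mse(proxy, latent) + swd_weight * sliced_wasserstein(proxy, latent)
```

The two terms respond to different mismatches: shuffling the rows of one set changes the MSE, because the point-wise pairing is broken, but leaves the sliced Wasserstein term at zero, because the two sets still describe the same distribution.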

preprint, 2022, arXiv

ByT5 model for massively multilingual grapheme-to-phoneme conversion

In this study, we tackle massively multilingual grapheme-to-phoneme conversion by implementing G2P models based on ByT5. We have curated a G2P dataset from various sources that covers around 100 languages and trained large-scale multilingual G2P models based on ByT5. We found that ByT5 operating on byte-level inputs significantly outperformed the token-based mT5 model in terms of multilingual G2P. Pairwise comparison with monolingual models in these languages suggests that multilingual ByT5 models generally lower the phone error rate by jointly learning from a variety of languages. The pretrained model can further benefit low-resource G2P through zero-shot prediction on unseen languages or provide pretrained weights for fine-tuning, which helps the model converge to a lower phone error rate than randomly initialized weights. To facilitate future research on multilingual G2P, we make available our code and pretrained multilingual G2P models at: https://github.com/lingjzhu/CharsiuG2P.
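For readers unfamiliar with the metric, phone error rate is conventionally the edit distance between predicted and reference phone sequences, divided by the total number of reference phones. The sketch below shows that conventional definition only; the function names are hypothetical, and the exact scoring used in the paper may differ (for example in how phones are tokenized).

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_phone in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_phone in enumerate(hyp, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of a hypothesis phone
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """Total edit errors divided by total reference phones over a corpus."""
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference set contains no phones")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_phones


# Illustrative data only: one perfect prediction, one with a substitution
# and an insertion, giving 2 errors over 6 reference phones.
refs = [["k", "æ", "t"], ["d", "ɔ", "g"]]
hyps = [["k", "æ", "t"], ["d", "ɑ", "g", "z"]]
print(phone_error_rate(refs, hyps))
```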

preprint, 2022, arXiv

Phone-to-audio alignment without text: A Semi-supervised Approach

The task of phone-to-audio alignment has many applications in speech research. Here we introduce two Wav2Vec2-based models for both text-dependent and text-independent phone-to-audio alignment. The proposed Wav2Vec2-FS, a semi-supervised model, directly learns phone-to-audio alignment through contrastive learning and a forward sum loss, and can be coupled with a pretrained phone recognizer to achieve text-independent alignment. The other model, Wav2Vec2-FC, is a frame classification model trained on forced-aligned labels that can perform both forced alignment and text-independent segmentation. Evaluation results suggest that both proposed methods, even when transcriptions are not available, generate results very close to those of existing forced alignment tools. Our work presents a neural pipeline for fully automated phone-to-audio alignment. Code and pretrained models are available at https://github.com/lingjzhu/charsiu.

preprint, 2022, arXiv

Rethinking Position Bias Modeling with Knowledge Distillation for CTR Prediction

Click-through rate (CTR) prediction is of great importance in real-world online ad systems. One challenge for the CTR prediction task is to capture the real interest of users from their clicked items, which is inherently biased by the presented positions of items, i.e., more front positions tend to obtain higher CTR values. A popular line of existing work focuses on explicitly estimating position bias by result randomization, which is expensive and inefficient, or by inverse propensity weighting (IPW), which relies heavily on the quality of the propensity estimation. Another common solution is modeling position as a feature during offline training and simply adopting a fixed value or dropout tricks when serving. However, training-inference inconsistency can lead to sub-optimal performance. Furthermore, post-click information such as position values is informative yet underexploited in CTR prediction. This work proposes a simple yet efficient knowledge distillation framework to alleviate the impact of position bias and leverage position information to improve CTR prediction. We demonstrate the performance of our proposed method on a real-world production dataset and in online A/B tests, achieving significant improvements over competing baseline models. The proposed method has been deployed in real-world online ad systems, serving main traffic on one of the world's largest e-commerce platforms.

preprint, 2020, arXiv

Formal Verification of Solidity contracts in Event-B

Smart contracts are the artifacts of the blockchain that provide immutable and verifiable specifications of physical transactions. Solidity is a domain-specific programming language for defining smart contracts. It aims at reducing the transaction costs occasioned by the execution of contracts on distributed ledgers such as Ethereum. However, Solidity contracts need to adhere to safety and security requirements that require formal verification and certification. This paper proposes a method to meet such requirements by translating Solidity contracts to Event-B models, supporting certification. To that end, we define a restrained Solidity subset and a transfer function that translates Solidity contracts to Event-B models. We then take advantage of the Event-B method's capability to refine models at different levels of abstraction in order to verify properties of Solidity contracts. Finally, we verify the generated proof obligations of the Event-B model with the help of the Rodin platform.

preprint, 2019, arXiv

Alternative Analysis Methods for Time to Event Endpoints under Non-proportional Hazards: A Comparative Analysis

The log-rank test is most powerful under proportional hazards (PH). In practice, non-PH patterns are often observed in clinical trials, such as in immuno-oncology; therefore, alternative methods are needed to restore the efficiency of statistical testing. Three categories of testing methods were evaluated, including weighted log-rank tests, Kaplan-Meier curve-based tests (including weighted Kaplan-Meier and Restricted Mean Survival Time, RMST), and combination tests (including Breslow test, Lee's combo test, and MaxCombo test). Nine scenarios representing the PH and various non-PH patterns were simulated. The power, type I error, and effect estimates of each method were compared. In general, all tests control type I error well. There is not a single most powerful test across all scenarios. In the absence of prior knowledge regarding the PH or non-PH patterns, the MaxCombo test is relatively robust across patterns. Since the treatment effect changes over time under non-PH, the overall profile of the treatment effect may not be represented comprehensively by a single measure. Thus, multiple measures of the treatment effect should be pre-specified as sensitivity analyses to evaluate the totality of the data.
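Of the measures named above, RMST is the simplest to compute by hand: it is the area under the Kaplan-Meier survival curve from time zero up to a truncation time. The sketch below is illustrative only and is not the simulation code behind the study; the function names and the toy data are assumptions.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event_time, survival) steps.

    `events[i]` is 1 if the event was observed at `times[i]`, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all subjects sharing the same time point.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / n_at_risk
            steps.append((t, survival))
        n_at_risk -= leaving
    return steps


def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM curve on [0, tau]."""
    area = 0.0
    prev_time, prev_survival = 0.0, 1.0
    for t, survival in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_survival * (t - prev_time)
        prev_time, prev_survival = t, survival
    # Final rectangle from the last step before tau up to tau itself.
    area += prev_survival * (tau - prev_time)
    return area


# Toy data: four subjects, the second one censored at time 2.
print(rmst([1, 2, 3, 4], [1, 0, 1, 1], tau=4))
```

A treatment effect can then be summarized as the difference (or ratio) of RMST between two arms at a pre-specified truncation time, which remains interpretable when hazards are not proportional.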

preprint, 2016, arXiv

Stable Aqueous Dispersions of Optically and Electronically Active Phosphorene

Understanding and exploiting the remarkable optical and electronic properties of phosphorene require mass production methods that avoid chemical degradation. While solution-based strategies have been developed for scalable exfoliation of black phosphorus, these techniques have thus far employed anhydrous organic solvents in an effort to minimize exposure to known oxidants, but at the cost of limited exfoliation yield and flake size distribution. Here, we present an alternative phosphorene production method based on surfactant-assisted exfoliation and post-processing of black phosphorus in deoxygenated water. From comprehensive microscopic and spectroscopic analysis, this approach is shown to yield phosphorene dispersions that are stable, highly concentrated, and comparable to micromechanically exfoliated phosphorene in structure and chemistry. Due to the high exfoliation efficiency of this process, the resulting phosphorene flakes are thinner than anhydrous organic solvent dispersions, thus allowing the observation of layer-dependent photoluminescence down to the monolayer limit. Furthermore, to demonstrate preservation of electronic properties following solution processing, the aqueous-exfoliated phosphorene flakes are employed in field-effect transistors with high drive currents and current modulation ratios. Overall, this method enables the isolation and mass production of few-layer phosphorene, which will accelerate ongoing efforts to realize a diverse range of phosphorene-based applications.

preprint, 2015, arXiv

Large deviations of Rouse polymer chain: First passage problem

The purpose of this paper is to investigate several analytical methods of solving the first passage (FP) problem for the Rouse model, the simplest model of a polymer chain. We show that this problem has to be treated as a multi-dimensional Kramers' problem, which presents rich and unexpected behavior. We first perform direct and forward-flux sampling (FFS) simulations, and measure the mean first-passage time $\tau(z)$ for the free end to reach a certain distance $z$ away from the origin. The results show that the mean FP time becomes shorter when the Rouse chain is represented by more beads. Two scaling regimes of $\tau(z)$ are observed, with the transition between them varying as a function of chain length. We use these simulation results to test two theoretical approaches. One is a well-known asymptotic theory valid in the limit of zero temperature. We show that this limit corresponds to a fully extended chain in which each chain segment is stretched, which is not particularly realistic. A new theory based on the well-known Freidlin-Wentzell theory is proposed, where the dynamics is projected onto the minimal action path. The new theory predicts both scaling regimes correctly, but fails to get the correct numerical prefactor in the first regime. Combining our theory with the FFS simulations leads us to a simple analytical expression valid for all extensions and chain lengths. One of the applications of the polymer FP problem occurs in the context of branched polymer rheology. In this paper, we consider the arm-retraction mechanism in the tube model, which maps exactly onto the model we have solved. The results are compared to the Milner-McLeish theory without constraint release, which is found to overestimate the FP time by a factor of 10 or more.

preprint, 2013, arXiv

Time Domain Mapping of Spin Torque Oscillator Effective Energy

Stochastic dynamics of spin torque oscillators (STOs) can be described in terms of magnetization drift and diffusion over a current-dependent effective energy surface given by the Fokker-Planck equation. Here we present a method that directly probes this effective energy surface via time-resolved measurements of the microwave voltage generated by an STO. We show that the effective energy approach provides a simple recipe for predicting spectral line widths and line shapes near the generation threshold. Our time domain technique also accurately measures the field-like component of spin torque over a wide range of voltage bias values.

preprint, 2012, arXiv

Voltage-Induced Ferromagnetic Resonance in Magnetic Tunnel Junctions

We demonstrate excitation of ferromagnetic resonance in CoFeB/MgO/CoFeB magnetic tunnel junctions (MTJs) by the combined action of voltage-controlled magnetic anisotropy (VCMA) and spin transfer torque (ST). Our measurements reveal that GHz-frequency VCMA torque and ST in low-resistance MTJs have similar magnitudes, and thus that both torques are equally important for understanding high-frequency voltage-driven magnetization dynamics in MTJs. As an example, we show that VCMA can increase the sensitivity of an MTJ-based microwave signal detector to the sensitivity level of semiconductor Schottky diodes.

preprint, 2010, arXiv

Angular Dependence of the Superconducting Transition Temperature in Ferromagnet-Superconductor-Ferromagnet Trilayers

The superconducting transition temperature, $T_c$, of a ferromagnet (F) - superconductor (S) - ferromagnet trilayer depends on the mutual orientation of the magnetic moments of the F layers. This effect has been previously observed in F/S/F systems as a $T_c$ difference between parallel and antiparallel configurations of the F layers. Here we report measurements of $T_c$ in CuNi/Nb/CuNi trilayers as a function of the angle between the magnetic moments of the CuNi ferromagnets. The observed angular dependence of $T_c$ is in qualitative agreement with an F/S proximity theory that accounts for the odd triplet component of the condensate predicted to arise for non-collinear orientations of the magnetic moments of the F layers.

preprint, 2009, arXiv

Resonant Nonlinear Damping of Quantized Spin Waves in Ferromagnetic Nanowires

We use spin torque ferromagnetic resonance to measure the spectral properties of dipole-exchange spin waves in permalloy nanowires. Our measurements reveal that geometric confinement has a profound effect on the damping of spin waves in the nanowire geometry. The damping parameter of the lowest-energy quantized spin wave mode depends on applied magnetic field in a resonant way and exhibits a maximum at a field that increases with decreasing nanowire width. This enhancement of damping originates from a nonlinear resonant three-magnon confluence process allowed at a particular bias field value determined by quantization of the spin wave spectrum in the nanowire geometry.

preprint, 2009, arXiv

Stochastic resonance of a nanomagnet excited by spin transfer torque

Spin transfer torque from spin-polarized electrical current can excite large-amplitude magnetization dynamics in metallic ferromagnets of nanoscale dimensions. Since magnetic anisotropy energies of nanomagnets are comparable to the thermal energy scale, temperature can have a profound effect on the dynamics of a nanomagnet driven by spin transfer torque. Here we report the observation of unusual types of microwave-frequency nonlinear magnetization dynamics co-excited by alternating spin transfer torque and thermal fluctuations. In these dynamics, temperature amplifies the amplitude of GHz-range precession of magnetization and enables excitation of highly nonlinear dynamical states of magnetization by weak alternating spin transfer torque. We explain these thermally activated dynamics in terms of non-adiabatic stochastic resonance of magnetization driven by spin transfer torque. This type of magnetic stochastic resonance may find use in sensitive nanometer-scale microwave signal detectors.
