Source author record

Li Zhu

Li Zhu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

cond-mat.mtrl-sci, Computer Vision, Artificial Intelligence, Discrete Mathematics, Machine Learning, math.NA, Numerical Analysis, Computer Science and Game Theory, cond-mat.str-el, Cryptography and Security, eess.IV, eess.SP, Information Theory, math.IT, physics.comp-ph, physics.geo-ph, Software Engineering

Catalog footprint

What is connected

21 works

17 topics

4 close collaborators

preprint, 2026, arXiv

MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents

Deep Research Agents (DRAs) generate citation-rich reports via multi-step search and synthesis, yet existing benchmarks mainly target text-only settings or short-form multimodal QA, missing end-to-end multimodal evidence use. We introduce MMDeepResearch-Bench (MMDR-Bench), a benchmark of 140 expert-crafted tasks across 21 domains, where each task provides an image-text bundle to evaluate multimodal understanding and citation-grounded report generation. Compared to prior setups, MMDR-Bench emphasizes report-style synthesis with explicit evidence use, where models must connect visual artifacts to sourced claims and maintain consistency across narrative, citations, and visual references. We further propose a unified, interpretable evaluation pipeline: Formula-LLM Adaptive Evaluation (FLAE) for report quality, Trustworthy Retrieval-Aligned Citation Evaluation (TRACE) for citation-grounded evidence alignment, and Multimodal Support-Aligned Integrity Check (MOSAIC) for text-visual integrity, each producing fine-grained signals that support error diagnosis beyond a single overall score. Experiments across 25 state-of-the-art models reveal systematic trade-offs between generation quality, citation discipline, and multimodal grounding, highlighting that strong prose alone does not guarantee faithful evidence use and that multimodal integrity remains a key bottleneck for deep research agents.

preprint, 2026, arXiv

Physiology-Aware Masked Cross-Modal Reconstruction for Biosignal Representation Learning

Biosignals acquired from different locations on the body often provide temporally ordered views of the same underlying physiological process. However, most existing self-supervised learning methods treat these signals as interchangeable views, overlooking the directional temporal dynamics that link them. A canonical example is the relationship between electrocardiography (ECG), which captures the electrical activation initiating each heartbeat, and photoplethysmography (PPG), which records the resulting peripheral pulse delayed by vascular dynamics. To capture this structured relationship, we introduce xMAE, a biosignal pretraining framework that leverages masked cross-modal reconstruction across temporally ordered biosignals as a training-time constraint to encourage physiologically meaningful timing structure in the learned representations. We show that pretraining with xMAE yields representations that outperform both unimodal and multimodal baselines on 15 of 19 downstream tasks, including cardiovascular outcome prediction, abnormal laboratory test detection, sleep staging, and demographic inference, while generalizing across devices, body locations, and acquisition settings. Further analysis suggests that the ECG-PPG timing structure is reflected in the learned PPG representations. More broadly, xMAE demonstrates the effectiveness of incorporating temporal structure into multimodal pretraining when signals observe different stages of a shared underlying process. Code is available at https://github.com/hzhou3/xMAE.

preprint, 2026, arXiv

Wavelet-Driven Masked Multiscale Reconstruction for PPG Foundation Models

Wearable foundation models have the potential to transform digital health by learning transferable representations from large-scale biosignals collected in everyday settings. While recent progress has been made in large-scale pretraining, most approaches overlook the spectral structure of photoplethysmography (PPG) signals, wherein physiological rhythms unfold across multiple frequency bands. Motivated by the insight that many downstream health-related tasks depend on multi-resolution features spanning fine-grained waveform morphology to global rhythmic dynamics, we introduce Masked Multiscale Reconstruction (MMR) for PPG representation learning, a self-supervised pretraining framework that explicitly learns from hierarchical time-frequency scales of PPG data. The pretraining task is designed to reconstruct randomly masked-out coefficients obtained from a wavelet-based multiresolution decomposition of PPG signals, forcing the transformer encoder to integrate information across temporal and spectral scales. We pretrain our model with MMR using ~17 million unlabeled 10-second PPG segments from ~32,000 smartwatch users. On 17 of 19 diverse health-related tasks, MMR trained on large-scale wearable PPG data improves over or matches state-of-the-art open-source PPG foundation models, time-series foundation models, and other self-supervised baselines. Extensive analysis of our learned embeddings and systematic ablations underscore the value of wavelet-based representations, showing that they capture robust and physiologically grounded features. Together, these results highlight the potential of MMR as a step toward generalizable PPG foundation models.
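The masking step of the pretraining task described above might look roughly like the sketch below. It is an illustration only, not the authors' implementation: it assumes a Haar wavelet, a 3-level decomposition, a 40% mask ratio and zero-filling of masked coefficients, none of which the abstract states, and `haar_decompose` and `mask_coefficients` are hypothetical names.

```python
import math
import random

def haar_decompose(signal, levels):
    """Orthonormal Haar transform; returns [approx, detail_coarsest, ..., detail_finest]."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return [approx] + details[::-1]

def mask_coefficients(bands, mask_ratio, rng):
    """Zero a random subset of coefficients in every band (assumed masking rule)."""
    masked, mask = [], []
    for band in bands:
        flags = [rng.random() < mask_ratio for _ in band]
        mask.append(flags)
        masked.append([0.0 if f else c for c, f in zip(band, flags)])
    return masked, mask

# Toy stand-in for a PPG segment: a slow rhythm plus a faster component.
# The length must be divisible by 2**levels.
n = 64
segment = [math.sin(2 * math.pi * 2 * t / n) + 0.3 * math.sin(2 * math.pi * 9 * t / n)
           for t in range(n)]

bands = haar_decompose(segment, levels=3)
masked_bands, mask = mask_coefficients(bands, mask_ratio=0.4, rng=random.Random(0))
# An encoder would receive masked_bands and be trained to predict the entries
# of bands at the positions where mask is True.
```

Because the transform is orthonormal, the coefficient bands carry the same total energy as the input segment, so a reconstruction loss on masked coefficients covers both the coarse rhythm (approximation band) and the fine morphology (detail bands).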

preprint, 2025, arXiv

SoK: Web3 RegTech for Cryptocurrency VASP AML/CFT Compliance

The decentralized architecture of Web3 technologies creates fundamental challenges for Anti-Money Laundering and Counter-Financing of Terrorism compliance. Traditional regulatory technology solutions designed for centralized financial systems prove inadequate for blockchain's transparent yet pseudonymous networks. This systematization examines how blockchain-native RegTech solutions leverage distributed ledger properties to enable novel compliance capabilities. We develop three taxonomies organizing the Web3 RegTech domain: a regulatory paradigm evolution framework across ten dimensions, a compliance protocol taxonomy encompassing five verification layers, and a RegTech lifecycle framework spanning preventive, real-time, and investigative phases. Through analysis of 41 operational commercial platforms and 28 academic prototypes selected from systematic literature review (2015-2025), we demonstrate that Web3 RegTech enables transaction graph analysis, real-time risk assessment, cross-chain analytics, and privacy-preserving verification approaches that are difficult to achieve or less commonly deployed in traditional centralized systems. Our analysis reveals critical gaps between academic innovation and industry deployment, alongside persistent challenges in cross-chain tracking, DeFi interaction analysis, privacy protocol monitoring, and scalability. We synthesize architectural best practices and identify research directions addressing these gaps while respecting Web3's core principles of decentralization, transparency, and user sovereignty.

preprint, 2022, arXiv

A filtering technique for the matrix power series being near-sparse

This work presents a new algorithm for matrix power series that are near-sparse, that is, that contain a large number of near-zero elements. The proposed algorithm uses a filtering technique to improve the sparsity of the matrices involved in the calculation process of the Paterson-Stockmeyer (PS) scheme. Based on an error analysis that considers both the truncation error and the error introduced by filtering, the proposed algorithm obtains accuracy similar to that of the original PS scheme while being more efficient. For near-sparse matrix power series, the proposed method is also more efficient than the MATLAB built-in codes.
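The idea of combining the PS scheme with filtering can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the threshold `tol`, the choice to filter after every matrix product, and the names `filter_small` and `ps_evaluate` are all hypothetical, and dense lists stand in for a sparse matrix format.

```python
import math

def matmul(X, Y):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def filter_small(X, tol):
    """Set entries with magnitude below tol to exactly zero (assumed filtering rule)."""
    return [[0.0 if abs(x) < tol else x for x in row] for row in X]

def ps_evaluate(coeffs, A, tol=0.0):
    """Evaluate sum_k coeffs[k] * A**k with the Paterson-Stockmeyer grouping."""
    n, m = len(A), len(coeffs) - 1
    s = max(1, math.isqrt(m))                  # block size, roughly sqrt(degree)
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    powers = [identity]                        # powers[i] approximates A**i, i = 0..s
    for _ in range(s):
        powers.append(filter_small(matmul(powers[-1], A), tol))

    def block(j):
        # B_j = sum over i < s of coeffs[s*j + i] * A**i, skipping indices past the degree.
        B = [[0.0] * n for _ in range(n)]
        for i in range(s):
            k = s * j + i
            if k <= m:
                B = [[B[r][c] + coeffs[k] * powers[i][r][c] for c in range(n)]
                     for r in range(n)]
        return B

    r = m // s
    P = block(r)
    for j in range(r - 1, -1, -1):             # Horner recurrence in the variable A**s
        P = filter_small(matmul(P, powers[s]), tol)
        Bj = block(j)
        P = [[P[a][b] + Bj[a][b] for b in range(n)] for a in range(n)]
    return P

# Truncated exponential series (coefficients 1/k!) on a toy near-sparse matrix.
coeffs = [1.0 / math.factorial(k) for k in range(9)]
A = [[0.5, 1e-9, 0.0], [0.0, 0.4, 1e-9], [1e-9, 0.0, 0.3]]
reference = ps_evaluate(coeffs, A)             # tol = 0.0 disables filtering
filtered = ps_evaluate(coeffs, A, tol=1e-8)    # near-zero entries are dropped
max_diff = max(abs(reference[i][j] - filtered[i][j]) for i in range(3) for j in range(3))
nonzeros = lambda X: sum(x != 0.0 for row in X for x in row)
```

In this toy case the filtered result keeps fewer nonzero entries than the unfiltered one while differing from it only at the scale of the dropped entries, which is the trade-off the abstract's error analysis addresses.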

preprint, 2022, arXiv

A new stable and avoiding inversion iteration for computing matrix square root

The objective of this research was to compute the principal matrix square root with sparse approximation. A new stable iterative scheme avoiding full matrix inversion (SIAI) is provided. An analysis of the sparsity and error of the matrices involved in the iterative process is given. Based on the bandwidth and error analysis, a more efficient algorithm combining the SIAI with the filtering technique is proposed. The high computational efficiency and accuracy of the proposed method are demonstrated by computing the principal square roots of different matrices to reveal its applicability over the existing methods.
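The abstract does not give the SIAI recurrence, so the sketch below substitutes a different, well-known inversion-free scheme, the coupled Newton-Schulz iteration, purely to show what a square-root iteration built only from matrix products looks like. It assumes the input satisfies ||I - A|| < 1, which this iteration needs for convergence; `newton_schulz_sqrt` is a hypothetical name.

```python
def matmul(X, Y):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def newton_schulz_sqrt(A, n_iters=20):
    """Coupled Newton-Schulz iteration (not SIAI): Y -> A**0.5 and Z -> A**-0.5.

    Uses only matrix multiplications; assumes ||I - A|| < 1.
    """
    n = len(A)
    Y = [row[:] for row in A]
    Z = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(n_iters):
        ZY = matmul(Z, Y)
        # T = (3I - Z Y) / 2
        T = [[(3.0 * (i == j) - ZY[i][j]) / 2.0 for j in range(n)] for i in range(n)]
        Y, Z = matmul(Y, T), matmul(T, Z)
    return Y, Z

A = [[1.0, 0.2], [0.2, 0.9]]
root, inv_root = newton_schulz_sqrt(A)
check = matmul(root, root)
residual = max(abs(check[i][j] - A[i][j]) for i in range(2) for j in range(2))
```

A filtering step such as thresholding small entries of `Y` and `Z` after each product could be added in the same spirit as the abstract describes, though where and how the paper applies it is not stated here.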

preprint, 2022, arXiv

Progressive Glass Segmentation

Glass is very common in the real world. Because of the uncertainty about the glass region and the varying, complex scenes behind the glass, the presence of glass poses severe challenges to many computer vision tasks, making glass segmentation an important computer vision task. Glass does not have its own visual appearance but only transmits/reflects the appearance of its surroundings, making it fundamentally different from other common objects. To address such a challenging task, existing methods typically explore and combine useful cues from different levels of features in the deep network. As there exists a characteristic gap between features of different levels, i.e., deep-layer features embed more high-level semantics and are better at locating the target objects, while shallow-layer features have larger spatial sizes and keep richer and more detailed low-level information, fusing these features naively would lead to a sub-optimal solution. In this paper, we approach effective feature fusion for accurate glass segmentation in two steps. First, we attempt to bridge the characteristic gap between different levels of features by developing a Discriminability Enhancement (DE) module, which enables level-specific features to be a more discriminative representation, alleviating the feature incompatibility for fusion. Second, we design a Focus-and-Exploration Based Fusion (FEBF) module to richly excavate useful information in the fusion process by highlighting the commonalities and exploring the differences between features of different levels.

preprint, 2022, arXiv

ReGO: Reference-Guided Outpainting for Scenery Image

We aim to tackle the challenging yet practical scenery image outpainting task in this work. Recently, generative adversarial learning has significantly advanced image outpainting by producing semantically consistent content for the given image. However, existing methods always suffer from blurry textures and artifacts in the generated part, making the overall outpainting results lack authenticity. To overcome this weakness, this work investigates a principled way to synthesize texture-rich results by borrowing pixels from the image's neighbors (i.e., reference images), named Reference-Guided Outpainting (ReGO). In particular, ReGO uses an Adaptive Content Selection (ACS) module to transfer pixels from the reference images to compensate the texture of the target image. To prevent the style of the generated part from being affected by the reference images, a style ranking loss is further proposed to help ReGO synthesize style-consistent results. Extensive experiments on two popular benchmarks, NS6K and NS8K, demonstrate the effectiveness of ReGO. Our code will be made publicly available.

preprint, 2021, arXiv

Sketch-Guided Scenery Image Outpainting

The outpainting results produced by existing approaches are often too random to meet users' requirements. In this work, we take image outpainting one step forward by allowing users to obtain customized outpainting results using sketches as guidance. To this end, we propose an encoder-decoder based network to conduct sketch-guided outpainting, where two alignment modules are adopted to constrain the generated content to be realistic and consistent with the provided sketches. First, we apply a holistic alignment module to make the synthesized part similar to the real one from a global view. Second, we reversely produce sketches from the synthesized part and encourage them to be consistent with the ground-truth ones using a sketch alignment module. In this way, the learned generator is forced to pay more attention to fine details and to be sensitive to the guiding sketches. To our knowledge, this work is the first attempt to explore the challenging yet meaningful conditional scenery image outpainting. We conduct extensive experiments on two collected benchmarks to qualitatively and quantitatively validate the effectiveness of our approach compared with other state-of-the-art generative models.

preprint, 2020, arXiv

DONet: Dual Objective Networks for Skin Lesion Segmentation

Skin lesion segmentation is a crucial step in the computer-aided diagnosis of dermoscopic images. In the last few years, deep learning based semantic segmentation methods have significantly advanced skin lesion segmentation results. However, the current performance is still unsatisfactory due to challenging factors such as the large variety of lesion scales and the ambiguous difference between lesion region and background. In this paper, we propose a simple yet effective framework, named Dual Objective Networks (DONet), to improve skin lesion segmentation. DONet adopts two symmetric decoders to produce different predictions for approaching different objectives. Concretely, the two objectives are defined by different loss functions. In this way, the two decoders are encouraged to produce differentiated probability maps to match different optimization targets, resulting in complementary predictions. The complementary information learned by these two objectives is further aggregated to make the final prediction, by which the uncertainty existing in segmentation maps can be significantly alleviated. Besides, to address the challenge of the large variety of lesion scales and shapes in dermoscopic images, we additionally propose a recurrent context encoding module (RCEM) to model the complex correlation among skin lesions, where features with different scale contexts are efficiently integrated to form a more robust representation. Extensive experiments on two popular benchmarks demonstrate the effectiveness of the proposed DONet. In particular, DONet achieves Dice scores of 0.881 and 0.931 on ISIC 2018 and PH2, respectively. Code will be made publicly available.
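For readers unfamiliar with the reported metric, the Dice score for a pair of binary masks can be computed as in the short sketch below. The flattened-list mask representation, the smoothing constant `eps`, and the name `dice_score` are assumptions made for illustration; the abstract does not describe the evaluation code.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for flattened 0/1 masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)

# Toy 2x2 masks flattened row by row: one of two predicted pixels is correct.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 0, 0]
score = dice_score(pred_mask, true_mask)
```

A score of 1 means the predicted and reference lesion regions coincide, and 0 means they do not overlap at all.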

preprint, 2020, arXiv

Localization-aware Channel Pruning for Object Detection

Channel pruning is one of the important methods for deep model compression. Most existing pruning methods focus mainly on classification, and few of them conduct systematic research on object detection. However, object detection differs from classification in that it requires not only semantic information but also localization information. In this paper, based on discrimination-aware channel pruning (DCP), which is a state-of-the-art pruning method for classification, we propose a localization-aware auxiliary network to find the channels with key information for classification and regression, so that we can conduct channel pruning directly for object detection, which saves considerable time and computing resources. In order to capture the localization information, we first design the auxiliary network with a contextual ROIAlign layer, which can obtain precise localization information of the default boxes by pixel alignment and enlarges the receptive fields of the default boxes when pruning shallow layers. Then, we construct a loss function for the object detection task that tends to keep the channels containing the key information for classification and regression. Extensive experiments demonstrate the effectiveness of our method. On MS COCO, we prune 70% of the parameters of the SSD based on ResNet-50 with a modest accuracy drop, which outperforms the state-of-the-art method.

preprint, 2020, arXiv

Prediction of an Extended Ferroelectric Clathrate

Using first-principles calculations, we predict a lightweight room-temperature ferroelectric carbon-boron framework in a host/guest clathrate structure. This ferroelectric clathrate, with composition ScB3C3, exhibits high polarization density and low mass density compared with widely used commercial ferroelectrics. Molecular dynamics simulations show spontaneous polarization with a moderate above-room-temperature Tc of ~370 K, which implies large susceptibility and possibly large electrocaloric and piezoelectric constants at room temperature. Our findings open the possibility for a new class of ferroelectric materials with potential across a broad range of applications.

preprint, 2019, arXiv

Realization of metallic state in 1T-TaS2 with persisting long-range order of charge density wave

Metallization of 1T-TaS2 is generally initiated at the domain boundary of the charge density wave (CDW), at the expense of its long-range order. However, we demonstrate in this study that the metallization of 1T-TaS2 can also be realized without breaking the long-range CDW order upon surface alkali doping. By using scanning tunneling microscopy, we find that the long-range CDW order always persists, and the metallization is instead associated with additional in-gap excitations. Interestingly, the in-gap excitation is near the top of the lower Hubbard band, in contrast to a conventional electron-doped Mott insulator, where it is beneath the upper Hubbard band. In combination with numerical calculations, we suggest that the appearance of the in-gap excitations near the lower Hubbard band is mainly due to the on-site Coulomb energy being effectively reduced by the adsorbed alkali ions.

preprint, 2015, arXiv

A fingerprint based metric for measuring similarities of crystalline structures

Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and allow one to define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity, and it allows one in particular to distinguish structures. The new method is a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms and high-throughput screenings.

preprint, 2013, arXiv

Finite-State Markov Modeling of Leaky Waveguide Channels in Communication-based Train Control (CBTC) Systems

Leaky waveguide has been adopted in communication-based train control (CBTC) systems, as it can significantly enhance railway network efficiency, safety and capacity. Since CBTC systems have high requirements for train-ground communications, modeling the leaky waveguide channels is very important for designing the wireless networks and evaluating the performance of CBTC systems. In this letter, we develop a finite-state Markov channel (FSMC) model for leaky waveguide channels in CBTC systems based on real field channel measurements obtained from a subway line in commercial operation. The proposed FSMC channel model takes train locations into account to obtain a more accurate channel model. The overall leaky waveguide is divided into intervals, and an FSMC model is applied in each interval. The accuracy of the proposed FSMC model is illustrated by comparing the simulation results generated from the model with the real field measurement results.

preprint, 2013, arXiv

Finite-State Markov Modeling of Tunnel Channels in Communication-based Train Control (CBTC) Systems

Communication-based train control (CBTC) is gradually being adopted in urban rail transit systems, as it can significantly enhance railway network efficiency, safety and capacity. Since CBTC systems are mostly deployed in underground tunnels and trains move at high speed, building a train-ground wireless communication system for CBTC is a challenging task. Modeling the tunnel channels is very important for designing CBTC systems and evaluating their performance. Most existing works on channel modeling do not consider the unique characteristics of CBTC systems, such as high mobility speed, deterministic moving direction, and accurate train location information. In this paper, we develop a finite-state Markov channel (FSMC) model for tunnel channels in CBTC systems. The proposed FSMC model is based on real field CBTC channel measurements obtained from a subway line in commercial operation. Unlike most existing channel models, which are not related to specific locations, the proposed FSMC channel model takes train locations into account to obtain a more accurate channel model. The distance between the transmitter and the receiver is divided into intervals, and an FSMC model is applied in each interval. The accuracy of the proposed FSMC model is illustrated by comparing the simulation results generated from the model with the real field measurement results.
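The location-dependent FSMC idea described above could be prototyped as in the sketch below. Everything concrete in it is an assumption for illustration: the SNR thresholds that define the channel states, the single interval boundary at 100 m, the made-up measurement trace, and the names `to_state` and `fit_fsmc`. The abstract does not state how the authors partition states or estimate transition probabilities.

```python
import bisect

def to_state(snr_db, thresholds):
    """Map an SNR sample to a channel state index using sorted thresholds."""
    return bisect.bisect_right(thresholds, snr_db)

def fit_fsmc(samples, thresholds, interval_edges):
    """Estimate one state-transition matrix per distance interval.

    samples: time-ordered (distance_m, snr_db) pairs.
    Returns {interval_index: row-stochastic matrix}; rows with no data stay all zero.
    """
    n_states = len(thresholds) + 1
    counts = {}
    prev = None
    for distance, snr in samples:
        interval = bisect.bisect_right(interval_edges, distance)
        state = to_state(snr, thresholds)
        # Count a transition only when both samples fall in the same interval.
        if prev is not None and prev[0] == interval:
            table = counts.setdefault(interval, [[0] * n_states for _ in range(n_states)])
            table[prev[1]][state] += 1
        prev = (interval, state)
    return {
        interval: [[c / sum(row) if sum(row) else 0.0 for c in row] for row in table]
        for interval, table in counts.items()
    }

# Hypothetical trace: (transmitter-receiver distance in m, received SNR in dB).
trace = [(10, 25), (20, 24), (30, 15), (40, 26), (110, 8), (120, 9), (130, 16)]
models = fit_fsmc(trace, thresholds=[12, 20], interval_edges=[100])
```

Here `models[0]` describes the channel while the train is within 100 m of the transmitter and `models[1]` the channel beyond it, so a simulator would switch transition matrices as the train's known position crosses an interval boundary.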

preprint, 2013, arXiv

Xenon Reacts with Iron at the Conditions of the Earth's Core

Studies of the Earth's atmosphere have shown that more than 90% of xenon (Xe) is depleted compared with its abundance in chondritic meteorites. This long-standing missing Xe paradox has become the subject of considerable interest, and several models for a Xe reservoir have been proposed. Whether the missing Xe is hiding in the Earth's core has remained a long unanswered question. The key to addressing this issue lies in the reactivity of Xe with iron (Fe, the main constituent of the Earth's core), which has been denied by earlier studies. Here we report the first evidence of the chemical reaction of Xe and Fe at the conditions of the Earth's core, predicted through first-principles calculations and unbiased structure searching techniques. We find that Xe and Fe form a stable intermetallic compound, XeFe3, adopting a Cu3Au-type face-centered cubic structure above 183 GPa and at 4470 K. As a result of a Xe -> Fe charge transfer, Xe loses its chemical inertness by opening up the filled 5p electron shell and functioning as a 5p-like element, whilst Fe is unusually negatively charged, acting as an oxidant rather than a reductant as usual. Our work establishes that the Earth's core is a natural reservoir for Xe storage, and possibly provides the key to unlocking the missing Xe paradox.

preprint, 2012, arXiv

CALYPSO: a method for crystal structure prediction

We have developed a software package, CALYPSO (Crystal structure AnaLYsis by Particle Swarm Optimization), to predict the energetically stable/metastable crystal structures of materials at given chemical compositions and external conditions (e.g., pressure). The CALYPSO method is based on several major techniques (e.g., the particle-swarm optimization algorithm, symmetry constraints on structural generation, a bond characterization matrix for the elimination of similar structures, partial random structures per generation to enhance structural diversity, and a penalty function) for global structural minimization from scratch. All of these techniques have been demonstrated to be critical to the prediction of the globally stable structure. We have implemented these techniques in the CALYPSO code. Testing of the code on many known and unknown systems shows the high efficiency and high success rate of the CALYPSO method [Wang et al., Phys. Rev. B 82 (2010) 094116]. In this paper, we focus on descriptions of the implementation of the CALYPSO code and why it works.

preprint, 2012, arXiv

Room-Temperature Structures of Solid Hydrogen at High Pressures

By employing first-principles metadynamics simulations, we explore the 300 K structures of solid hydrogen over the pressure range 150-300 GPa. At 200 GPa, we find that the ambient-pressure disordered hexagonal close-packed (hcp) phase transforms into an insulating partially ordered hcp phase (po-hcp), a mixture of ordered graphene-like H2 layers and other layers of weakly coupled, disordered H2 molecules. Within this phase, hydrogen remains in paired states with the creation of shorter intra-molecular bonds, which are responsible for the very high experimental Raman peak above 4000 cm^-1. At 275 GPa, our simulations predicted a transformation from po-hcp into the ordered molecular metallic Cmca phase (4 molecules/cell) that was previously proposed to be stable only above 400 GPa. Gibbs free energy calculations at 300 K confirmed the energetic stabilities of the po-hcp and metallic Cmca phases over all known structures at 220-242 GPa and >242 GPa, respectively. Our simulations highlighted the major role played by temperature in tuning the phase stabilities and provided theoretical support for the claimed metallization of solid hydrogen below 300 GPa at 300 K.

preprint, 2012, arXiv

Spiral Chain O4 Form of Dense Oxygen

Oxygen is in many ways a unique element: it is the only known diatomic molecular magnet, and it stabilizes the hitherto unexpected O8 cluster structure in its solid form at high pressure. Molecular dissociation upon compression, one of the fundamental problems, has been reported for other diatomic solids (e.g., H2, I2, Br2, and N2), but it remains elusive for solid oxygen, making oxygen an intractable system. We here report the theoretical prediction of the dissociation of molecular oxygen into a polymeric spiral chain O4 structure (θ-O4) by using the first-principles CALYPSO method for crystal structure prediction. The θ-O4 structure stabilizes above 2 TPa, and the same structure has been observed as the third high-pressure phase of sulfur (S-III). We find that the molecular O8 phase remains extremely stable over a large pressure range of 0.008 - 2 TPa; its breakdown is driven by the pressure-induced instability of a transverse acoustic phonon mode at the zone boundary, leading to the ultimate formation of θ-O4. Remarkably, stabilization of θ-O4 turns oxygen from a superconductor into an insulator with a wide band gap (approximately 5.9 eV) originating from the sp3-like hybridized orbitals of oxygen and the localization of valence electrons. (This is a pre-print version of the following article: Li Zhu et al, Spiral chain O4 form of dense oxygen, Proc. Natl. Acad. Sci. U.S.A. (2011), doi: 10.1073/pnas.1119375109, which has been published online at http://www.pnas.org/content/early/2011/12/27/1119375109 .)

preprint, 2010, arXiv

Crystal Structure Prediction via Particle Swarm Optimization

We have developed a powerful method for crystal structure prediction from "scratch" through the particle swarm optimization (PSO) algorithm within the evolutionary scheme. The PSO technique is dramatically different from the genetic algorithm and avoids the use of evolution operators (e.g., crossover and mutation). The approach is based on a highly efficient global minimization of free energy surfaces, merging total-energy calculations via the PSO technique, and requires only the chemical composition of a given compound to predict stable or metastable structures at given external conditions (e.g., pressure). A specially devised geometrical structure factor method, which allows the elimination of similar structures during structure evolution, was implemented to enhance the structure search efficiency. The application of a variable unit cell size technique has greatly reduced the computational cost. Moreover, the symmetry constraint imposed in the structure generation enables the realization of diverse structures, leads to a significantly reduced search space and fewer optimization variables, and thus accelerates the global structural convergence. The PSO algorithm has been successfully applied to the prediction of many known systems (e.g., elemental, binary and ternary compounds) with various chemical bonding environments (e.g., metallic, ionic, and covalent bonding). The remarkable success rate demonstrates the reliability of this methodology and illustrates the great promise of PSO as a major technique for crystal structure determination.
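The core PSO update that the abstract builds on can be illustrated with the generic sketch below. It is a textbook global-best PSO, not the authors' code: the toy quadratic `toy_energy` stands in for a total-energy calculation, a plain coordinate vector stands in for a crystal structure, and the inertia and acceleration parameters (`w`, `c1`, `c2`) are common default choices assumed here. The symmetry constraints, similar-structure elimination, and variable cell size described above are not modeled.

```python
import random

def pso(objective, dim, bounds, n_particles=20, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over a box with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes inertia, pull toward own best, pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_energy(x):
    """Hypothetical stand-in for an energy surface, minimized at x = (1, 1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)

best, best_val = pso(toy_energy, dim=3, bounds=(-5.0, 5.0))
```

Note that no crossover or mutation operator appears: each candidate moves using only its own history and the swarm's best candidate, which is the contrast with genetic algorithms that the abstract draws.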
