Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
20 works
0 followers
17 topics
4 close collaborators

Published work

20 published item(s)

preprint 2026 arXiv

CARD: Coarse-to-fine Autoregressive Modeling with Radix-based Decomposition for Transferable Free Energy Estimation

Estimating free energy differences quantifies thermodynamic preferences in molecular interactions, which is central to chemistry and drug discovery. Despite fruitful progress, existing methods still face key limitations: classical computational approaches remain prohibitively expensive due to their reliance on extensive molecular dynamics simulations, while deep learning-based methods are constrained by either less-expressive generative models or input dimensions tied to a specific system, resulting in negligible generalization. To address these challenges, we propose CARD, a generative framework that employs a novel radix-based decomposition to bijectively convert 3D coordinates into mixed discrete-continuous sequences, enabling coarse-to-fine autoregressive modeling with enhanced expressiveness. Notably, the model corresponds to a distribution with zero free energy, serving as a proposal for absolute free energy computation of arbitrary systems without relying on alchemical pathways. Experiments across diverse tasks demonstrate that CARD matches the accuracy of classical computational methods on unseen systems with diverse topologies, while achieving an approximately 40-fold speedup in inference.
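As a rough illustration of how a normalized proposal distribution can yield an absolute free energy without an alchemical pathway, the sketch below applies plain importance sampling to a one-dimensional harmonic potential. This is not CARD itself: the toy potential, the Gaussian proposal, and every function name are assumptions chosen so the estimate can be compared with the analytic value.

```python
import math
import random


def harmonic_energy(x, k=4.0):
    """Toy potential U(x) = k x^2 / 2 (assumed for illustration only)."""
    return 0.5 * k * x * x


def gaussian_pdf(x, sigma):
    """Zero-mean normal density; it is normalized, so its own free energy is zero."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def estimate_free_energy(n_samples=20000, beta=1.0, k=4.0, sigma=1.0, seed=0):
    """Estimate F = -(1/beta) log Z, where Z = E_q[exp(-beta U(x)) / q(x)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)  # draw from the proposal q
        total += math.exp(-beta * harmonic_energy(x, k)) / gaussian_pdf(x, sigma)
    return -math.log(total / n_samples) / beta


def exact_free_energy(beta=1.0, k=4.0):
    """Analytic reference: Z = sqrt(2 pi / (beta k)) for the harmonic potential."""
    return -math.log(math.sqrt(2.0 * math.pi / (beta * k))) / beta
```

Comparing `estimate_free_energy()` with `exact_free_energy()` shows how closely the sampled estimate tracks the analytic value. The proposal is deliberately wider than the target so the importance weights stay bounded.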

preprint 2025 arXiv

ChartBlender: An Interactive System for Authoring and Synchronizing Visualization Charts in Video

Embedding data visualizations in video can enhance the communication of complex information. However, this process is often labor-intensive, requiring designers to adjust visualizations frame by frame manually. In this work, we present ChartBlender, a novel system that streamlines this process by enabling users to create data visualizations, embed them seamlessly into video scenes, and automatically synchronize them with both camera motion and moving objects. In particular, ChartBlender incorporates a tracking algorithm that supports both object and camera tracking, ensuring robust alignment of visualizations with dynamic video content. To maintain visual clarity and aesthetic coherence, we also explore the design space of video-suited visualizations and develop a library of customizable templates optimized for video embedding. We evaluate ChartBlender through two controlled experiments and expert interviews with five domain experts. Results show that our system enables accurate synchronization and accelerates the production of data-driven videos.

preprint 2025 arXiv

EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model

Recent research shows that emotions can enhance users' cognition and influence information communication. While research on visual emotion analysis is extensive, limited work has been done on helping users generate emotionally rich image content. Existing work on emotional image generation relies on discrete emotion categories, making it challenging to capture complex and subtle emotional nuances accurately. Additionally, these methods struggle to control the specific content of generated images based on text prompts. In this work, we introduce the new task of continuous emotional image content generation (C-EICG) and present EmotiCrafter, an emotional image generation model that generates images based on text prompts and Valence-Arousal values. Specifically, we propose a novel emotion-embedding mapping network that embeds Valence-Arousal values into textual features, enabling the capture of specific emotions in alignment with intended input prompts. Additionally, we introduce a loss function to enhance emotion expression. The experimental results show that our method effectively generates images representing specific emotions with the desired content and outperforms existing techniques.

preprint 2025 arXiv

Improved Bounds for Private and Robust Alignment

In this paper, we study the private and robust alignment of language models from a theoretical perspective by establishing upper bounds on the suboptimality gap in both offline and online settings. We consider preference labels subject to privacy constraints and/or adversarial corruption, and analyze two distinct interplays between them: privacy-first and corruption-first. For the privacy-only setting, we show that log loss with an MLE-style algorithm achieves near-optimal rates, in contrast to conventional wisdom. For the joint privacy-and-corruption setting, we first demonstrate that existing offline algorithms in fact provide stronger guarantees -- simultaneously in terms of corruption level and privacy parameters -- than previously known, which further yields improved bounds in the corruption-only regime. In addition, we also present the first set of results for private and robust online alignment. Our results are enabled by new uniform convergence guarantees for log loss and square loss under privacy and corruption, which we believe have broad applicability across learning theory and statistics.

preprint 2022 arXiv

A Multi-Metric Latent Factor Model for Analyzing High-Dimensional and Sparse data

High-dimensional and sparse (HiDS) matrices are omnipresent in a variety of big data-related applications. Latent factor analysis (LFA) is a typical representation learning method that extracts useful yet latent knowledge from HiDS matrices via low-rank approximation. Current LFA-based models mainly focus on a single-metric representation, where the representation strategy designed for the approximation loss function is fixed and exclusive. However, real-world HiDS matrices are commonly heterogeneous and inclusive and have diverse underlying patterns, such that a single-metric representation is most likely to yield inferior performance. Motivated by this, in this paper we propose a multi-metric latent factor (MMLF) model. Its main idea is two-fold: 1) two vector spaces and three Lp-norms are simultaneously employed to develop six variants of the LFA model, each of which resides in a unique metric representation space, and 2) all the variants are ensembled with a tailored, self-adaptive weighting strategy. As such, our proposed MMLF enjoys the merits originating from a set of disparate metric spaces all at once, achieving a comprehensive and unbiased representation of HiDS matrices. A theoretical study guarantees that MMLF attains a performance gain. Extensive experiments on eight real-world HiDS datasets, spanning a wide range of industrial and science domains, verify that our MMLF significantly outperforms ten state-of-the-art shallow and deep counterparts.

preprint 2022 arXiv

A Systematic Study of Android Non-SDK (Hidden) Service API Security

Android allows apps to communicate with its system services via system service helpers so that these apps can use various functions provided by the system services. Meanwhile, the system services rely on their service helpers to enforce security checks for protection. Unfortunately, the security checks in the service helpers may be bypassed by directly exploiting the non-SDK (hidden) APIs, degrading stability and posing severe security threats such as privilege escalation, automatic function execution without user interaction, crashes, and DoS attacks. Google has proposed various approaches to address this problem, e.g., fixing the bugs case by case or even proposing a blacklist to block all the non-SDK APIs. However, developers can still figure out new ways of exploiting these hidden APIs to evade the non-SDK restrictions. In this paper, we systematically study the vulnerabilities due to hidden API exploitation and analyze the effectiveness of Google's countermeasures. We aim to answer whether there are still vulnerable hidden APIs that can be exploited in the newest Android 12. We develop a static analysis tool called ServiceAudit to automatically mine the inconsistent security enforcement between service helper classes and the hidden service APIs. We apply ServiceAudit to Android 6 through 12. Our tool discovers 112 vulnerabilities in Android 6 with higher precision than existing approaches. Moreover, in Android 11 and 12, we identify more than 25 hidden APIs with inconsistent protections; however, only one of the vulnerable APIs can lead to severe security problems in Android 11, and none of them work on Android 12.

preprint 2022 arXiv

An Online Sparse Streaming Feature Selection Algorithm

Online streaming feature selection (OSFS), which conducts feature selection in an online manner, plays an important role in dealing with high-dimensional data. In many real applications, such as intelligent healthcare platforms, streaming features always have some missing data, which raises a crucial challenge in conducting OSFS, i.e., how to establish the uncertain relationship between sparse streaming features and labels. Unfortunately, existing OSFS algorithms never consider such an uncertain relationship. To fill this gap, in this paper we propose an online sparse streaming feature selection with uncertainty (OS2FSU) algorithm. OS2FSU consists of two main parts: 1) latent factor analysis is utilized to pre-estimate the missing data in sparse streaming features before conducting feature selection, and 2) fuzzy logic and neighborhood rough sets are employed to alleviate the uncertainty between estimated streaming features and labels while conducting feature selection. In the experiments, OS2FSU is compared with five state-of-the-art OSFS algorithms on six real datasets. The results demonstrate that OS2FSU outperforms its competitors when missing data are encountered in OSFS.

preprint 2022 arXiv

Graph-incorporated Latent Factor Analysis for High-dimensional and Sparse Matrices

A high-dimensional and sparse (HiDS) matrix is frequently encountered in big data-related applications such as e-commerce systems or social network services. Performing highly accurate representation learning on it is of great significance for extracting latent knowledge and patterns. Latent factor analysis (LFA), which represents an HiDS matrix by learning low-rank embeddings based on its observed entries only, is one of the most effective and efficient approaches to this issue. However, most existing LFA-based models perform such embeddings on an HiDS matrix directly without exploiting its hidden graph structures, thereby resulting in accuracy loss. To address this issue, this paper proposes a graph-incorporated latent factor analysis (GLFA) model. It adopts two ideas: 1) a graph is constructed for identifying the hidden high-order interaction (HOI) among nodes described by an HiDS matrix, and 2) a recurrent LFA structure is carefully designed with the incorporation of HOI, thereby improving the representation learning ability of the resultant model. Experimental results on three real-world datasets demonstrate that GLFA outperforms six state-of-the-art models in predicting the missing data of an HiDS matrix, which supports its strong representation learning ability on HiDS data.
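To make "learning low-rank embeddings based on observed entries only" concrete, here is a minimal sketch of plain latent factor analysis trained by stochastic gradient descent. It covers only the basic LFA step, not the graph construction or recurrent structure of GLFA; the function names, hyperparameters, and initialization are assumptions for illustration.

```python
import random


def train_lfa(observed, n_rows, n_cols, rank=2, lr=0.02, reg=0.01, epochs=300, seed=0):
    """Fit row and column factors by SGD over observed (row, col, value) triples only."""
    rng = random.Random(seed)
    # Small positive initialization (an assumption; suits non-negative data).
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, value in observed:
            err = value - predict(P, Q, i, j)
            for f in range(rank):
                p, q = P[i][f], Q[j][f]
                # Gradient step on the squared error with L2 regularization.
                P[i][f] += lr * (err * q - reg * p)
                Q[j][f] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, i, j):
    """Predicted entry (i, j): inner product of the row and column factors."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


def rmse(P, Q, observed):
    """Root mean squared error over the observed entries."""
    total = sum((v - predict(P, Q, i, j)) ** 2 for i, j, v in observed)
    return (total / len(observed)) ** 0.5
```

Unobserved cells never enter the loss, so the cost of each epoch scales with the number of known entries rather than with the full matrix size. After training, `predict(P, Q, i, j)` can be evaluated at a missing position to estimate it.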

preprint 2022 arXiv

Intermediate-layer output Regularization for Attention-based Speech Recognition with Shared Decoder

Intermediate layer output (ILO) regularization by means of multitask training on the encoder side has been shown to be an effective approach to yielding improved results on a wide range of end-to-end ASR frameworks. In this paper, we propose a novel method to do ILO regularized training differently. Instead of using conventional multitask methods that entail more training overhead, we directly feed the intermediate layer output to the decoder; that is, during training our decoder accepts not only the output of the final encoder layer but also the encoder ILO as input. With the proposed method, as both encoder and decoder are simultaneously "regularized", the network is more sufficiently trained, consistently leading to improved results over the ILO-based CTC method, as well as over the original attention-based modeling method without the proposed method employed.

preprint 2022 arXiv

Internal Language Model Estimation based Language Model Fusion for Cross-Domain Code-Switching Speech Recognition

Internal Language Model Estimation (ILME) based language model (LM) fusion has been shown to significantly improve recognition results over conventional shallow fusion in both intra-domain and cross-domain speech recognition tasks. In this paper, we attempt to apply our ILME method to cross-domain code-switching speech recognition (CSSR) work. Specifically, our curiosity comes from several aspects. First, we are curious about how effective the ILME-based LM fusion is for both intra-domain and cross-domain CSSR tasks. We verify this with or without merging two code-switching domains. More importantly, we train an end-to-end (E2E) speech recognition model by means of merging two monolingual data sets and observe the efficacy of the proposed ILME-based LM fusion for CSSR. Experimental results on SEAME, a Southeast Asian data set, and on another code-switching data set from Mainland China demonstrate the effectiveness of the proposed ILME-based LM fusion method.

preprint 2022 arXiv

Multi-peak solutions for singularly perturbed nonlinear Dirichlet problems involving critical growth

We consider the following singularly perturbed elliptic problem \[ -\varepsilon^2 \Delta u + u = f(u) \text{ in } \Omega, \quad u > 0 \text{ in } \Omega, \quad u = 0 \text{ on } \partial\Omega, \] where $\Omega$ is a domain in $\mathbb{R}^N$ ($N \ge 3$), not necessarily bounded, with boundary $\partial\Omega \in C^2$, and the nonlinearity $f$ is of critical growth. In this paper, we construct a family of multi-peak solutions to the equation given above which concentrate around any prescribed finite set of local maxima of the distance function from the boundary $\partial\Omega$.

preprint 2022 arXiv

Online Deep Learning from Doubly-Streaming Data

This paper investigates a new online learning problem with doubly-streaming data, where the data streams are described by feature spaces that constantly evolve, with new features emerging and old features fading away. The challenges of this problem are twofold: 1) Data samples ceaselessly flowing in may carry shifted patterns over time, requiring learners to update and hence adapt on the fly. 2) Newly emerging features are described by very few samples, resulting in weak learners that tend to make erroneous predictions. A plausible idea to overcome the challenges is to establish a relationship between the pre- and post-evolving feature spaces, so that an online learner can leverage the knowledge learned from the old features to improve the learning performance on the new features. Unfortunately, this idea does not scale up to high-dimensional media streams with complex feature interplay, which suffer from a tradeoff between onlineness (biasing shallow learners) and expressiveness (requiring deep learners). Motivated by this, we propose a novel OLD^3S paradigm, where a shared latent subspace is discovered to summarize information from the old and new feature spaces, building an intermediate feature mapping relationship. A key trait of OLD^3S is to treat the model capacity as a learnable semantics, yielding optimal model depth and parameters jointly, in accordance with the complexity and non-linearity of the input data streams in an online fashion. Both theoretical analyses and empirical studies substantiate the viability and effectiveness of our proposal.

preprint 2022 arXiv

Polarization effects on fluorescence emission of zebrafish neurons using light-sheet microscopy

Light-sheet fluorescence microscopy (LSFM) uses a thin plane of light to optically section and image transparent tissues or organisms in vivo, with the advantages of fast imaging speed and low phototoxicity. In this paper, we have employed light-sheet microscopy to investigate the polarization effects on fluorescence emission of zebrafish neurons by modifying the electric oscillation orientation of the excitation light. The intensity of the fluorescence emission from the excited zebrafish larvae follows a cosine-squared function with respect to the polarization state of the excitation light and reveals a 40% higher fluorescence emission when the polarization orientation is orthogonal to the illumination and detection axes. Through registration and subtraction of fluorescence images under different polarization states, we have demonstrated that most of the enhanced fluorescence signals are from the nerve cells rather than the extracellular substance. This provides us a way to distinguish the cell boundaries and observe the organism structures with improved contrast and resolution.
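The cosine-squared dependence described above can be written as a small model. The functional form below (a constant baseline plus a cosine-squared term), the reading of "40% higher" as the ratio between the maximum and minimum emission, and all parameter names are assumptions; the abstract does not give the fitted expression.

```python
import math


def emission_intensity(angle_deg, baseline=1.0, enhancement=0.40, peak_angle_deg=0.0):
    """Assumed model: I = baseline * (1 + enhancement * cos^2(angle - peak_angle)).

    angle_deg is the excitation polarization angle; peak_angle_deg is the
    (hypothetical) orientation at which emission is strongest.
    """
    delta = math.radians(angle_deg - peak_angle_deg)
    return baseline * (1.0 + enhancement * math.cos(delta) ** 2)
```

Under these assumptions the emission at the peak orientation is 1.4 times the emission at the perpendicular orientation, and the curve repeats every 180 degrees.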

preprint 2022 arXiv

Sketch-based 3D Shape Modeling from Sparse Point Clouds

3D modeling based on point clouds is an efficient way to reconstruct and create detailed 3D content. However, the geometric procedure may lose accuracy due to high redundancy and the absence of an explicit structure. In this work, we propose a human-in-the-loop sketch-based point cloud reconstruction framework to leverage users' cognitive abilities in geometry extraction. We present an interactive drawing interface for 3D model creation from point cloud data with the help of user sketches. We adopt an optimization method in which the user can continuously edit the contours extracted from the obtained 3D model and retrieve the model iteratively. Finally, we verify the proposed user interface for modeling from sparse point clouds. A video is available at https://www.youtube.com/watch?v=0H19NyXDRJE .

preprint 2021 arXiv

Reinventing 2D Convolutions for 3D Images

There has been considerable debate over 2D and 3D representation learning on 3D medical images. 2D approaches can benefit from large-scale 2D pretraining, but they are generally weak in capturing large 3D contexts. 3D approaches are natively strong in 3D contexts; however, few publicly available 3D medical datasets are large and diverse enough for universal 3D pretraining. Even for hybrid (2D + 3D) approaches, the intrinsic disadvantages within the 2D / 3D parts still exist. In this study, we bridge the gap between 2D and 3D convolutions by reinventing the 2D convolutions. We propose ACS (axial-coronal-sagittal) convolutions to perform natively 3D representation learning while utilizing the pretrained weights on 2D datasets. In ACS convolutions, 2D convolution kernels are split by channel into three parts and convolved separately on the three views (axial, coronal and sagittal) of 3D representations. Theoretically, any 2D CNN (ResNet, DenseNet, or DeepLab) can be converted into a 3D ACS CNN, with pretrained weights of the same parameter size. Extensive experiments on several medical benchmarks (including classification, segmentation and detection tasks) validate the consistent superiority of the pretrained ACS CNNs over the 2D / 3D CNN counterparts with / without pretraining. Even without pretraining, the ACS convolution can be used as a plug-and-play replacement for standard 3D convolution, with smaller model size and less computation.
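The kernel split described above can be illustrated with shape bookkeeping alone. The sketch below assumes the split is along the output-channel axis, that any remainder goes to the first groups, and that the three views map onto a (depth, height, width) volume as commented; these choices and all names are assumptions, not details confirmed by the abstract.

```python
def acs_kernel_shapes(out_channels, in_channels, kernel_size):
    """Return three 3D kernel shapes derived from one 2D kernel bank.

    A 2D bank of shape (out_channels, in_channels, k, k) is split into three
    groups, and each group gains a singleton axis so that it convolves within
    one family of planes of the volume.
    """
    k = kernel_size
    base, extra = divmod(out_channels, 3)
    # Spread any remainder over the first groups so sizes sum to out_channels.
    sizes = [base + (1 if g < extra else 0) for g in range(3)]
    return {
        "axial": (sizes[0], in_channels, 1, k, k),     # height-width planes
        "coronal": (sizes[1], in_channels, k, 1, k),   # depth-width planes
        "sagittal": (sizes[2], in_channels, k, k, 1),  # depth-height planes
    }


def parameter_count(shape):
    """Number of weights in a kernel of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count
```

Summing `parameter_count` over the three shapes gives the same number as the original 2D bank, `out_channels * in_channels * k * k`, which is why 2D pretrained weights can be reused without adding parameters.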

preprint 2020 arXiv

A Method for Vehicle Collision Risk Assessment through Inferring Driver's Braking Actions in Near-Crash Situations

Driving information and data under potential vehicle crashes create opportunities for extensive real-world observations of driver behaviors and relevant factors that significantly influence driving safety in emergency scenarios. Furthermore, the availability of such data also enhances collision avoidance systems (CASs) by evaluating the driver's actions in near-crash scenarios and providing timely warnings. These applications motivate the need for heuristic tools capable of inferring the relationships among driving risk, driver/vehicle characteristics, and road environment. In this paper, we acquired real-world driving data and built a comprehensive dataset, which contains multiple "driver-vehicle-road" attributes. The proposed method works in two steps. In the first step, a variable precision rough set (VPRS) based classification technique is applied to draw a reduced core subset from the field driving dataset, which presents the essential attribute set most relevant to driving safety assessment. In the second step, we design a decision strategy by introducing mutual information entropy to quantify the significance of each attribute; a representative index is then calculated through accumulation of weighted "driver-vehicle-road" factors to reflect the driving risk for the actual situation. The performance of the proposed method is demonstrated in an offline analysis of the driving data collected in field trials, where the aim is to infer emergency braking actions in the near term. The results indicate that our proposed model is a good alternative for providing improved warnings in real time because of its high prediction accuracy and stability.
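The second step, weighting attributes by mutual information and accumulating them into a risk index, might look like the sketch below. It assumes discrete attribute values, weights proportional to each attribute's mutual information with the braking label, and a simple weighted sum; the paper's actual definitions may differ, and all names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def attribute_weights(columns, labels):
    """Normalize each attribute's mutual information with the label into a weight."""
    scores = [mutual_information(col, labels) for col in columns]
    total = sum(scores)
    if total == 0.0:
        # No attribute is informative: fall back to uniform weights (assumption).
        return [1.0 / len(columns)] * len(columns)
    return [s / total for s in scores]


def risk_index(attribute_scores, weights):
    """Weighted accumulation of per-attribute risk scores for one sample."""
    return sum(w * s for w, s in zip(weights, attribute_scores))
```

An attribute that perfectly tracks the label receives all the weight, while one that is statistically independent of it receives none, so the index is driven by the most informative factors.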

preprint 2020 arXiv

Ab Initio Modeling of Phonon-Assisted Relaxation of Electrons and Excitons in Semiconductor Nanocrystals for Multiexciton Generation

Electron-phonon and exciton-phonon interactions in nanoclusters are formulated and computed within the GW-BSE (Bethe-Salpeter equation) framework. The phonon effect is modeled with the two-particle representation for the first time. The nonradiative relaxation rates of electrons and excitons are calculated. We find that both single-phonon relaxation and multiple-phonon relaxation are significant in nanocrystals, and that they correspond to two types of physical processes with totally different spectral lineshapes. Furthermore, the multiple-phonon relaxation always occurs, and its rates are comparable to the corresponding single-phonon relaxation rates for both electrons and excitons in the system studied (Si46). The inelastic scattering rates of electrons and excitons are also calculated based on many-body Green function theory. For the electronic states in Si46, the inelastic scattering decay is predicted to be a primary decay mechanism for multiexciton relaxation, and nonradiative relaxation rates are larger than inelastic scattering rates for most excitonic states in Si46.

preprint 2020 arXiv

Cavity-Enhanced Photon Emission from a Single Germanium-Vacancy Center in a Diamond Membrane

The nitrogen-vacancy center in diamond has been explored extensively as a light-matter interface for quantum information applications; however, it is limited by low coherent photon emission and spectral instability. Here, we present a promising interface based on an alternate defect with superior optical properties (the germanium-vacancy) coupled to a finesse $\approx 11{,}000$ fiber cavity, resulting in a $31^{+11}_{-15}$-fold increase in the spectral density of emission. This work sets the stage for cryogenic experiments, where we predict a measurable increase in the spontaneous emission rate.

preprint 2020 arXiv

Generating Fundus Fluorescence Angiography Images from Structure Fundus Images Using Generative Adversarial Networks

Fluorescein angiography can provide a map of retinal vascular structure and function and is commonly used in ophthalmology diagnosis; however, this imaging modality may pose risks of harm to patients. To help physicians reduce the potential risks of diagnosis, an image translation method is adopted. In this work, we propose a conditional generative adversarial network (GAN)-based method to directly learn the mapping relationship between structure fundus images and fundus fluorescence angiography images. Moreover, local saliency maps, which define each pixel's importance, are used to define a novel saliency loss in the GAN cost function. This facilitates more accurate learning of small-vessel and fluorescein leakage features.