Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
22 works
0 followers
19 topics
4 close collaborators

Published work

22 published item(s)

preprint 2026 arXiv

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

We present NextFlow, a unified decoder-only autoregressive transformer trained on 6 trillion interleaved text-image discrete tokens. By leveraging a unified vision representation within a unified autoregressive architecture, NextFlow natively activates multimodal understanding and generation capabilities, unlocking abilities of image editing, interleaved content and video generation. Motivated by the distinct nature of modalities - where text is strictly sequential and images are inherently hierarchical - we retain next-token prediction for text but adopt next-scale prediction for visual generation. This departs from traditional raster-scan methods, enabling the generation of 1024x1024 images in just 5 seconds - orders of magnitude faster than comparable AR models. We address the instabilities of multi-scale generation through a robust training recipe. Furthermore, we introduce a prefix-tuning strategy for reinforcement learning. Experiments demonstrate that NextFlow achieves state-of-the-art performance among unified models and rivals specialized diffusion baselines in visual quality.
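The speed gap between raster-scan and next-scale generation comes down to the number of sequential decoding steps. The sketch below only counts steps under stated assumptions: the 64x64 token grid and the doubling scale schedule are illustrative choices, not values taken from the NextFlow abstract, and the function names are hypothetical.

```python
def raster_scan_steps(side: int) -> int:
    """Raster-scan AR decoding emits one token per step over a side x side grid."""
    return side * side


def next_scale_schedule(side: int) -> list[int]:
    """Assumed schedule: resolutions double from 1 up to `side` (hypothetical)."""
    scales = []
    s = 1
    while s < side:
        scales.append(s)
        s *= 2
    scales.append(side)
    return scales


def next_scale_steps(side: int) -> int:
    """Next-scale decoding emits one whole token map per step, one step per scale."""
    return len(next_scale_schedule(side))


if __name__ == "__main__":
    side = 64  # assumed token-grid side, not stated in the abstract
    print(raster_scan_steps(side))    # 4096 sequential steps
    print(next_scale_schedule(side))  # [1, 2, 4, 8, 16, 32, 64]
    print(next_scale_steps(side))     # 7 sequential steps
```

Under these assumptions the number of sequential steps grows logarithmically with the grid side instead of quadratically, which is the intuition behind the reported speedup.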

preprint 2025 arXiv

OpenGround: Active Cognition-based Reasoning for Open-World 3D Visual Grounding

3D visual grounding aims to locate objects based on natural language descriptions in 3D scenes. Existing methods rely on a pre-defined Object Lookup Table (OLT) to query Visual Language Models (VLMs) for reasoning about object locations, which limits the applications in scenarios with undefined or unforeseen targets. To address this problem, we present OpenGround, a novel zero-shot framework for open-world 3D visual grounding. Central to OpenGround is the Active Cognition-based Reasoning (ACR) module, which is designed to overcome the fundamental limitation of pre-defined OLTs by progressively augmenting the cognitive scope of VLMs. The ACR module performs human-like perception of the target via a cognitive task chain and actively reasons about contextually relevant objects, thereby extending VLM cognition through a dynamically updated OLT. This allows OpenGround to function with both pre-defined and open-world categories. We also propose a new dataset named OpenTarget, which contains over 7000 object-description pairs to evaluate our method in open-world scenarios. Extensive experiments demonstrate that OpenGround achieves competitive performance on Nr3D, state-of-the-art on ScanRefer, and delivers a substantial 17.6% improvement on OpenTarget. Project Page at https://why-102.github.io/openground.io/.

preprint 2024 arXiv

Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train

Foundation models have exhibited remarkable success in various applications, such as disease diagnosis and text report generation. To date, a foundation model for endoscopic video analysis is still lacking. In this paper, we propose Endo-FM, a foundation model specifically developed using massive endoscopic video data. First, we build a video transformer, which captures both local and global long-range dependencies across spatial and temporal dimensions. Second, we pre-train our transformer model using global and local views in a self-supervised manner, aiming to make it robust to spatial-temporal variations and discriminative across different scenes. To develop the foundation model, we construct a large-scale endoscopy video dataset by combining 9 publicly available datasets and a privately collected dataset from Baoshan Branch of Renji Hospital in Shanghai, China. Overall, our dataset consists of over 33K video clips with up to 5 million frames, encompassing various protocols, target organs, and disease types. Our pre-trained Endo-FM can be easily adopted for a given downstream task via fine-tuning by serving as the backbone. With experiments on 3 different types of downstream tasks, including classification, segmentation, and detection, our Endo-FM surpasses the current state-of-the-art (SOTA) self-supervised pre-training and adapter-based transfer learning methods by a significant margin, such as VCL (3.1% F1, 4.8% Dice, and 5.5% F1 for classification, segmentation, and detection) and ST-Adapter (5.9% F1, 9.6% Dice, and 9.9% F1 for classification, segmentation, and detection). Code, datasets, and models are released at https://github.com/med-air/Endo-FM.

preprint 2022 arXiv

$C^*$-algebra Net: A New Approach Generalizing Neural Network Parameters to $C^*$-algebra

We propose a new framework that generalizes the parameters of neural network models to $C^*$-algebra-valued ones. $C^*$-algebra is a generalization of the space of complex numbers. A typical example is the space of continuous functions on a compact space. This generalization enables us to combine multiple models continuously and use tools for functions such as regression and integration. Consequently, we can learn features of data efficiently and adapt the models to problems continuously. We apply our framework to practical problems such as density estimation and few-shot learning and show that our framework enables us to learn features of data even with a limited number of samples. Our new framework highlights the possibility of applying the theory of $C^*$-algebra to general neural network models.

preprint 2022 arXiv

Deep Learning for Wireless Dynamics

This paper aims to predict radio channel variations over time by deep learning from channel observations without knowledge of the underlying channel dynamics. In next-generation wideband cellular systems, multicarrier transmission for higher data rates leads to a high-resolution prediction problem. By leveraging recent advances in deep learning for high-resolution image processing, we propose a purely data-driven deep learning (DL) approach to predicting the high-resolution temporal evolution of wideband radio channels. In order to investigate the effect of architectural design choices, we develop and study three deep learning prediction models, namely, baseline, image completion, and next-frame prediction models using UNet. Numerical results show that the proposed DL approach achieves a 52% lower prediction error than the traditional approach based on the Kalman filter (KF) in terms of mean absolute error. To quantify the impact of channel aging and prediction on precoding performance, we also evaluate the performance degradation due to outdated and predicted channel state information (CSI) compared to perfect CSI. Our simulations show that the proposed DL approach can reduce the performance loss due to channel aging by 71% by adapting the precoding vector to changes in the radio channel, while the traditional KF approach only shows a 27% reduction.

preprint 2022 arXiv

Distributed On-Sensor Compute System for AR/VR Devices: A Semi-Analytical Simulation Framework for Power Estimation

Augmented Reality/Virtual Reality (AR/VR) glasses are widely foreseen as the next generation computing platform. AR/VR glasses are a complex "system of systems" which must satisfy stringent form factor, computing-, power- and thermal- requirements. In this paper, we will show that a novel distributed on-sensor compute architecture, coupled with new semiconductor technologies (such as dense 3D-IC interconnects and Spin-Transfer Torque Magneto Random Access Memory, STT-MRAM) and, most importantly, a full hardware-software co-optimization are the solutions to achieve attractive and socially acceptable AR/VR glasses. To this end, we developed a semi-analytical simulation framework to estimate the power consumption of novel AR/VR distributed on-sensor computing architectures. The model allows the optimization of the main technological features of the system modules, as well as the computer-vision algorithm partition strategy across the distributed compute architecture. We show that, in the case of the compute-intensive machine learning based Hand Tracking algorithm, the distributed on-sensor compute architecture can reduce the system power consumption compared to a centralized system, with the additional benefits in terms of latency and privacy.

preprint 2022 arXiv

Experimental test of Tsirelson's bound with a single photonic qubit

For many protocols, quantum strategies have advantages over their classical counterparts, and these advantages have attracted much interest and many applications. One famous example is the Clauser-Horne-Shimony-Holt (CHSH) game, which recasts Bell's theorem [2] into the framework of a game. In the CHSH game, two space-like separated players, Alice and Bob, are assigned classical bits $a$ and $b$, respectively. They then return bits $x$ and $y$ according to some pre-agreed strategy. They win the game when $x\oplus y= a\cdot b$. If the players use classical strategies, the optimal success probability is $w(\text{CHSH})=0.75$. However, if they add quantum resources, the success probability increases up to a maximal value of $\cos^2(\pi/8)$, which is known as Tsirelson's bound. Moreover, Popescu and Rohrlich noted that the perfect success probability $1$ can also be achieved in a more general theory without violating the no-signaling assumption.
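The two success probabilities quoted above can be checked numerically. The sketch below assumes the input bits $a$ and $b$ are drawn uniformly at random, which the abstract does not state explicitly, and it enumerates only deterministic classical strategies (randomized ones are mixtures of these and cannot do better). Function names are illustrative.

```python
import math
from itertools import product


def best_classical_chsh() -> float:
    """Best winning probability over all deterministic strategies x=f(a), y=g(b)."""
    best = 0.0
    # Each player's strategy is a lookup table from its input bit to its output bit.
    for f0, f1, g0, g1 in product((0, 1), repeat=4):
        f, g = (f0, f1), (g0, g1)
        # Count input pairs (a, b) on which the win condition x XOR y == a AND b holds.
        wins = sum((f[a] ^ g[b]) == (a & b) for a, b in product((0, 1), repeat=2))
        best = max(best, wins / 4)  # uniform inputs assumed
    return best


def tsirelson_success_probability() -> float:
    """Maximal quantum winning probability cos^2(pi/8)."""
    return math.cos(math.pi / 8) ** 2


if __name__ == "__main__":
    print(best_classical_chsh())            # 0.75
    print(tsirelson_success_probability())  # about 0.8536
```

The quantum value equals $(2+\sqrt{2})/4$, so the gap between the classical and quantum optima is roughly 0.10.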

preprint 2022 arXiv

Federated Learning from Only Unlabeled Data with Class-Conditional-Sharing Clients

Supervised federated learning (FL) enables multiple clients to share the trained model without sharing their labeled data. However, potential clients might even be reluctant to label their own data, which could limit the applicability of FL in practice. In this paper, we show the possibility of unsupervised FL whose model is still a classifier for predicting class labels, if the class-prior probabilities are shifted while the class-conditional distributions are shared among the unlabeled data owned by the clients. We propose federation of unsupervised learning (FedUL), where the unlabeled data are transformed into surrogate labeled data for each of the clients, a modified model is trained by supervised FL, and the wanted model is recovered from the modified model. FedUL is a very general solution to unsupervised FL: it is compatible with many supervised FL methods, and the recovery of the wanted model can be theoretically guaranteed as if the data have been labeled. Experiments on benchmark and real-world datasets demonstrate the effectiveness of FedUL. Code is available at https://github.com/lunanbit/FedUL.

preprint 2022 arXiv

Long-Range Order and Quantum Criticality in Antiferromagnetic Chains with Long-Range Staggered Interactions

We study quantum phase transitions in Heisenberg antiferromagnetic chains with staggered power-law decaying long-range interactions. Employing the density-matrix renormalization group (DMRG) algorithm and the fidelity susceptibility as the criticality measure, we establish more accurate values of the quantum critical points than those obtained from the spin-wave approximation, quantum Monte Carlo and DMRG in the literature. The deviation is especially evident for strong long-range interactions. We extend isotropic long-range interactions to the anisotropic case and find that a kaleidoscope of quantum phases emerges from the interplay of the anisotropy of the long-range exchange interaction and symmetry breaking. We demonstrate that nonfrustrating long-range interactions induce true long-range order in Heisenberg antiferromagnetic chains with continuous symmetry breaking, lifting the restrictions imposed by the Mermin-Wagner theorem.

preprint 2022 arXiv

Optical Properties of C-rich ($^{12}$C, SiC and FeC) Dust Layered Structure of Massive Stars

The composition and structure of interstellar dust are important and complex for the study of the evolution of stars and the interstellar medium (ISM). However, there is a lack of corresponding experimental data and theoretical models. Using ab initio calculations, we have predicted and geometry-optimized the structures of carbon-rich (C-rich) dusts, namely carbon ($^{12}$C), iron carbide (FeC), and silicon carbide (SiC), as well as silicon ($^{28}$Si) and iron ($^{56}$Fe), and investigated the optical absorption and emission coefficients of these materials in 0D (zero-dimensional), 1D, and 2D nanostructures. Comparing the nebular spectra of supernovae (SN) with the dust coefficients, we find that the optical absorption coefficient of the 2D $^{12}$C, $^{28}$Si, $^{56}$Fe, SiC and FeC structures corresponds to the absorption peak in the infrared band (5-8 $\mu$m) of the spectrum at 7554 days after the SN1987A explosion. It also corresponds to the spectrum at 535 days after the explosion of SN2018bsz, in the wavelength ranges 0.2-0.8 $\mu$m and 3-10 $\mu$m. Nevertheless, 2D SiC and FeC correspond to the spectrum at 844 days after the explosion of SN2010jl, in the wavelength range 0.08-10 $\mu$m. Therefore, FeC and SiC may be the second type of dust in SN1987A corresponding to the 5-8 $\mu$m infrared band, and may be present in the ejecta of SN2010jl and SN2018bsz.

preprint 2022 arXiv

Scattering-assisted and logic-controllable WGM laser in liquid crystal micropillar

Whispering gallery mode (WGM) microcavities can efficiently store and manipulate light with strong light confinement and long photon lifetime, while coupling light into and from WGMs is intrinsically hindered by their unique feature of rotational symmetry. Here, a scattering-assisted liquid crystal (LC) micropillar WGM laser is proposed. WGM lasing at the surface of the micropillar is obviously enhanced by fluorescence scattering in the core of the micropillar. Besides, weak scattering of LC molecules also builds efficient coupling channels between the laser modes and the axial transmission modes of the micropillar-based waveguide, providing an all-in-one liquid WGM laser with functions of self-seeding and self-guiding. Furthermore, based on the hysteresis characteristics of the electrically anchored LC molecules under the interaction of thermal force, an erasable read-write liquid memory device is proposed, paving the way for the application of logic-controllable WGM lasers in optical storage and optical control.

preprint 2022 arXiv

Some multivariable Rado numbers

The Rado number of an equation is a Ramsey-theoretic quantity associated to the equation. Let $\mathcal{E}$ be a linear equation. Denote by $\operatorname{R}_r(\mathcal{E})$ the minimal integer, if it exists, such that any $r$-coloring of $[1,\operatorname{R}_r(\mathcal{E})]$ must admit a monochromatic solution to $\mathcal{E}$. In this paper, we give upper and lower bounds for the Rado number of $\sum_{i=1}^{m-2}x_i+kx_{m-1}=\ell x_{m}$, and some exact values are also given. Furthermore, we derive some results for the cases $\ell=m=4$ and $m=5, \ell=k+i \ (1\leq i\leq 5)$. As a generalization, the $r$-color Rado number for linear equations $\mathcal{E}_1,\mathcal{E}_2,...,\mathcal{E}_r$ is defined as the minimal integer, if it exists, such that any $r$-coloring of $[1,\operatorname{R}_r(\mathcal{E}_1,\mathcal{E}_2,...,\mathcal{E}_r)]$ must admit a monochromatic solution to some $\mathcal{E}_i$, where $1\leq i\leq r$. A lower bound for $\operatorname{R}_r(\mathcal{E}_1,\mathcal{E}_2,...,\mathcal{E}_r)$ and the exact values of $\operatorname{R}_2(x+y=z,\ell x=y)=5k$ and $\operatorname{R}_2(x+y=z, x+a=y)$ are obtained using the Lovász Local Lemma.
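For very small cases, the definition of $\operatorname{R}_r(\mathcal{E})$ can be evaluated directly by exhaustive search. The sketch below is not the paper's method: it simply tries every $r$-coloring of $[1,n]$ for increasing $n$. The function names, the coefficient encoding, and the search limit are illustrative assumptions, and repeated variable values (such as $x=y$) are allowed in a solution.

```python
from itertools import product


def has_mono_solution(coloring, coeffs):
    """Check for a monochromatic solution of a_1*x_1 + ... + a_{m-1}*x_{m-1} = a_m*x_m.

    coloring[i - 1] is the color of integer i; coeffs = (a_1, ..., a_m), all positive.
    """
    n = len(coloring)
    *lhs, rhs = coeffs
    for xs in product(range(1, n + 1), repeat=len(lhs)):
        color = coloring[xs[0] - 1]
        if any(coloring[x - 1] != color for x in xs):
            continue  # left-hand variables are not all the same color
        total = sum(a * x for a, x in zip(lhs, xs))
        if total % rhs == 0:
            z = total // rhs
            if 1 <= z <= n and coloring[z - 1] == color:
                return True
    return False


def rado_number(coeffs, r=2, limit=12):
    """Smallest n <= limit such that every r-coloring of [1, n] has a monochromatic solution."""
    for n in range(1, limit + 1):
        if all(has_mono_solution(c, coeffs) for c in product(range(r), repeat=n)):
            return n
    return None  # not found within the search limit


if __name__ == "__main__":
    # Schur's equation x + y = z with two colors.
    print(rado_number((1, 1, 1), r=2))  # 5
```

The search is exponential in $n$, so it is only practical for tiny instances; bounds such as those in the paper are needed for general parameters.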

preprint 2021 arXiv

A Cross Channel Context Model for Latents in Deep Image Compression

This paper presents a cross channel context model for latents in deep image compression. Generally, deep image compression is based on an autoencoder framework, which transforms the original image to latents at the encoder and recovers the reconstructed image from the quantized latents at the decoder. The transform is usually combined with an entropy model, which estimates the probability distribution of the quantized latents for arithmetic coding. Currently, joint autoregressive and hierarchical prior entropy models are widely adopted to capture both the global contexts from the hyper latents and the local contexts from the quantized latent elements. For the local contexts, the widely adopted 2D mask convolution can only capture the spatial context. However, we observe that there are strong correlations between different channels in the latents. To utilize the cross channel correlations, we propose to divide the latents into several groups according to channel index and code the groups one by one, where previously coded groups are utilized to provide cross channel context for the current group. The proposed cross channel context model is combined with the joint autoregressive and hierarchical prior entropy model. Experimental results show that, using PSNR as the distortion metric, the combined model achieves BD-rate reductions of 6.30% and 6.31% over the baseline entropy model, and 2.50% and 2.20% over the latest video coding standard Versatile Video Coding (VVC) for the Kodak and CVPR CLIC2020 professional dataset, respectively. In addition, when optimized for the MS-SSIM metric, our approach generates visually more pleasant reconstructed images.
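The group-by-group coding order described above can be sketched as a schedule that lists, for each channel group, which previously coded channels are available as context. This is only an illustration: equal-size contiguous groups, the group count, and the function names are assumptions, not details from the paper, and no entropy model is implemented.

```python
def channel_groups(num_channels: int, max_groups: int) -> list[list[int]]:
    """Split channel indices into contiguous groups of equal size (at most max_groups)."""
    size = -(-num_channels // max_groups)  # ceiling division
    return [
        list(range(start, min(start + size, num_channels)))
        for start in range(0, num_channels, size)
    ]


def coding_schedule(num_channels: int, max_groups: int):
    """Return (current_group, context_channels) pairs in coding order.

    The context of a group is every channel from the groups coded before it,
    so the first group has no cross channel context.
    """
    schedule = []
    coded: list[int] = []
    for group in channel_groups(num_channels, max_groups):
        schedule.append((group, list(coded)))
        coded.extend(group)
    return schedule


if __name__ == "__main__":
    for group, context in coding_schedule(num_channels=8, max_groups=4):
        print(group, "conditioned on", context)
```

In a real codec, the context channels would feed a learned network that predicts the distribution parameters for the current group; here they are only listed.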

preprint 2021 arXiv

Homogeneous Linear Ion Crystal in a Hybrid Potential

We investigate the properties of a linear ion crystal in a combination of quadratic and quartic potentials. Both the discrete and the continuous model are employed to explore the homogeneity of a linear ion crystal by controlling the proportional parameter between the quadratic and quartic components. It is found that a uniform ion distribution in such a hybrid potential can be made larger than that in a purely quadratic or quartic potential. The zigzag transition is also investigated in the hybrid potential. Its critical condition and phase diagram are determined numerically and agree well with previous theoretical and experimental results. This paves the way for experimental investigation of phase transitions in a large linear Coulomb crystal.

preprint 2020 arXiv

Chirality-Selective Transport of Benzene Molecules on Carbon Nanotubes

Using molecular dynamics simulations, we predict an effect of chirality on the conduction of benzene molecules along the surface of carbon nanotubes (CNTs) subjected to a thermal gradient. The group drift velocity of the molecules is found to be maximal in the case of an armchair CNT, and to decrease with decreasing chiral angle. This chirality effect on thermodiffusion is induced by a variation in the optimized paths of molecules that change with different electronic overlap at the interface. The mechanism for the thermophoretic transport is identified to be coupled with a gradient of adsorbate-substrate interaction energy, which originates from the anharmonic nature of the van der Waals potential.

preprint 2020 arXiv

Contrastive Cross-site Learning with Redesigned Net for COVID-19 CT Classification

The pandemic of coronavirus disease 2019 (COVID-19) has led to a global public health crisis spreading across hundreds of countries. With the continuous growth of new infections, developing automated tools for COVID-19 identification from CT images is highly desired to assist clinical diagnosis and reduce the tedious workload of image interpretation. To enlarge the datasets for developing machine learning methods, it is helpful to aggregate cases from different medical systems for learning robust and generalizable models. This paper proposes a novel joint learning framework to perform accurate COVID-19 identification by effectively learning with heterogeneous datasets with distribution discrepancy. We build a powerful backbone by redesigning the recently proposed COVID-Net in aspects of network architecture and learning strategy to improve the prediction accuracy and learning efficiency. On top of our improved backbone, we further explicitly tackle the cross-site domain shift by conducting separate feature normalization in latent space. Moreover, we propose to use a contrastive training objective to enhance the domain invariance of semantic embeddings for boosting the classification performance on each dataset. We develop and evaluate our method with two public large-scale COVID-19 diagnosis datasets made up of CT images. Extensive experiments show that our approach consistently improves performance on both datasets, outperforming the original COVID-Net trained on each dataset by 12.16% and 14.23% in AUC respectively, and also exceeding existing state-of-the-art multi-site learning methods.

preprint 2020 arXiv

Design of a novel monolithic parabolic-mirror ion-trap to precisely align the RF null point with the optical focus

We propose a novel ion trap design with a high-collection-efficiency parabolic mirror integrated with the ion trap electrodes. This design has three radio frequency (RF) electrodes and eight direct current (DC) compensation electrodes. By carefully adjusting the three RF voltages, the parabolic mirror focus can be made precisely coincident with the RF null point. Thus, the aberration and the ion micromotion can be minimized at the same time. This monolithic design can significantly improve the ion-ion entanglement generation speed by extending the photon collecting solid angle beyond $90\%\cdot 4\pi$. Further analysis of the trapping setup shows that the RF voltage variation method relaxes the required machining accuracy to a broad range. This design is expected to be a robust scheme for trapping ions to speed up entanglement network nodes.

preprint 2020 arXiv

Formation of interstellar complex polycyclic aromatic hydrocarbons: Insights from molecular dynamics simulations of dehydrogenated benzene

Small organic molecules are thought to provide building blocks for the formation of complex interstellar polycyclic aromatic hydrocarbons (PAHs). However, the underlying chemical mechanisms remain unclear, particularly concerning the role of interstellar dust. Using molecular dynamics, we simulate the chemical reaction between dehydrogenated benzene molecules in the gas phase or on the surface of an onion-like carbon nanoparticle (NP). The reaction leads to the formation of PAHs of complex structures. The size of the formed molecules is found to roughly increase with increasing temperature up to 800 K, and to be correlated with the level of dehydrogenation. Morphology analysis features the formation of large rings that contain up to 32 carbon atoms at high temperature. Density functional theory (DFT) calculations are performed to search for the fundamental energetic reaction pathways. The DFT results quantitatively confirm the correlation between the reactivity and the dehydrogenation level, and the formation of stable C-8 rings. Moreover, the nanostructures formed on the NP surface point to a possible layer-by-layer formation mechanism for interstellar fullerene and carbon onions.

preprint 2020 arXiv

Gallai Ramsey number for double stars

Given a graph $G$ and a positive integer $k$, the Gallai-Ramsey number is defined to be the minimum number of vertices $n$ such that any $k$-edge coloring of $K_n$ contains either a rainbow (all different colored) copy of $G$ or a monochromatic copy of $G$. In this paper, we obtain general upper and lower bounds on the Gallai-Ramsey numbers for double stars $S(n,m)$, where $S(n,m)$ is the graph obtained from the union of two stars $K_{1,n}$ and $K_{1,m}$ by adding an edge between their centers. We also provide the sharp result in some cases.
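The construction of the double star $S(n,m)$ can be made concrete with a short sketch. The vertex labels (`u`, `v`, `a0`, `b0`, ...) and function names are illustrative assumptions; only the construction itself (two stars joined by an edge between their centers) comes from the abstract.

```python
def double_star_edges(n: int, m: int) -> list[tuple[str, str]]:
    """Edge list of S(n, m): stars K_{1,n} and K_{1,m} plus an edge between centers."""
    edges = [("u", "v")]                               # edge joining the two centers
    edges += [("u", f"a{i}") for i in range(n)]        # n leaves attached to center u
    edges += [("v", f"b{j}") for j in range(m)]        # m leaves attached to center v
    return edges


def vertex_set(edges) -> set[str]:
    """All vertices that appear in the edge list."""
    return {vertex for edge in edges for vertex in edge}


if __name__ == "__main__":
    edges = double_star_edges(3, 2)
    print(len(vertex_set(edges)))  # n + m + 2 = 7 vertices
    print(len(edges))              # n + m + 1 = 6 edges
```

Since $S(n,m)$ is a tree on $n+m+2$ vertices, it always has exactly $n+m+1$ edges.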

preprint 2020 arXiv

Gallai-Ramsey number for the union of stars

Given a graph $G$ and a positive integer $k$, define the Gallai-Ramsey number to be the minimum number of vertices $n$ such that any $k$-edge coloring of the complete graph $K_n$ contains either a rainbow (all different colored) triangle or a monochromatic copy of $G$. In this paper, we obtain the exact value of the Gallai-Ramsey numbers for the union of two stars in many cases and bounds in other cases. This work represents the first class of disconnected graphs to be considered as the desired monochromatic subgraph.

preprint 2020 arXiv

Neural Architecture Search on Acoustic Scene Classification

Convolutional neural networks are widely adopted in Acoustic Scene Classification (ASC) tasks, but they generally carry a heavy computational burden. In this work, we propose a lightweight yet high-performing baseline network inspired by MobileNetV2, which replaces square convolutional kernels with unidirectional ones to extract features alternately in temporal and frequency dimensions. Furthermore, we explore a dynamic architecture space built on the basis of the proposed baseline with the recent Neural Architecture Search (NAS) paradigm, which first trains a supernet that incorporates all candidate networks and then applies a well-known evolutionary algorithm NSGA-II to discover more efficient networks with higher accuracy and lower computational cost. Experimental results demonstrate that our searched network is competent in ASC tasks, which achieves 90.3% F1-score on the DCASE2018 task 5 evaluation set, marking a new state-of-the-art performance while saving 25% of FLOPs compared to our baseline network.

preprint 2020 arXiv

Selective Conduction of Organic Molecules via Free-Standing Graphene

A race is held between ten species of organic gas molecules on a graphene substrate driven by thermal gradients via molecular dynamics. Fast conduction of the molecules is observed with selectivity for aromatic compounds. This selectivity stems from the fact that the planar structure of the aromatic molecule helps keep a shorter distance to the substrate, which is the key to the driving force at the gas-solid interface. The drift velocity monotonically increases with decreasing molecule density, with no ballistic transport observable even for a single molecule. A non-linear regime is discovered for the conduction of benzene molecules under large thermal gradients. At low temperature, molecules form aggregates and move collectively along specific paths on the graphene substrate.