Trust snapshot

Trust 21 (Emerging)
Verification L1
Unclaimed author
16 works
0 followers
15 topics
4 close collaborators

Published work

16 published items

Preprint · 2026 · arXiv

PolarVLM: Bridging the Semantic-Physical Gap in Vision-Language Models

Mainstream vision-language models (VLMs) fundamentally struggle with severe optical ambiguities, such as reflections and transparent objects, due to the inherent limitations of standard RGB inputs. While polarization imaging captures polarimetric physical parameters that resolve these ambiguities, existing methods are constrained by fixed-format outputs and remain isolated from open-ended reasoning. To bridge this semantic-physical gap, we introduce PolarVLM, the first multimodal framework integrating polarimetric physical parameters into VLMs. By employing a dual-stream architecture and a progressive two-stage training strategy, PolarVLM effectively prevents physical misinterpretations while preserving general visual abilities. Complementing our architecture, we construct PolarVQA, the first benchmark for polarization-aware VQA, featuring 75K physics-grounded instruction-tuning pairs targeting reflective and transparent scenes. Experiments show that PolarVLM surpasses the RGB baseline by 25.4% overall across five evaluation tasks, with remarkable gains of 26.6% in reflection recognition and 34.0% in glass counting, successfully unlocking physics-aware semantic understanding.
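The abstract does not say which polarimetric parameters PolarVLM consumes. A common choice is the degree and angle of linear polarization (DoLP, AoLP) derived from the linear Stokes parameters of four polarizer-angle images, so the sketch below assumes that input. It only illustrates the kind of physical signal involved and is not PolarVLM's preprocessing; the function name is hypothetical.

```python
import math

def polarization_params(i0, i45, i90, i135):
    """Return (S0, S1, S2, DoLP, AoLP) from intensities behind a linear
    polarizer at 0, 45, 90 and 135 degrees (assumed ideal polarizer)."""
    s0 = (i0 + i45 + i90 + i135) / 2.0  # total intensity (average of I0+I90 and I45+I135)
    s1 = i0 - i90                       # horizontal vs vertical component
    s2 = i45 - i135                     # +45 vs -45 degree component
    # Degree of linear polarization in [0, 1]; defined as 0 for a dark pixel.
    dolp = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
    aolp = 0.5 * math.atan2(s2, s1)     # angle of linear polarization, radians
    return s0, s1, s2, dolp, aolp
```

For example, `polarization_params(1.0, 0.5, 0.0, 0.5)` describes fully polarized light at 0 degrees (DoLP 1, AoLP 0), while equal intensities at all four angles give DoLP 0. Light reflected from dielectric surfaces such as glass is partially polarized, which is why such parameters can help separate reflections from scene content where RGB alone is ambiguous.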

Preprint · 2023 · arXiv

ECSAS: Exploring Critical Scenarios from Action Sequence in Autonomous Driving

Critical scenario generation requires the ability to sample critical combinations from the infinite parameter space of a logical scenario. Existing solutions explore correlations among action parameters in the initial scenario rather than across action sequences. The bottleneck is how to model action sequences so that the effects of different action parameters in the scenario can be taken into account. In this paper, we address the problem by proposing the ECSAS framework. Specifically, we first propose a description language, BTScenario, that allows us to model the action sequences of scenarios. We then use reinforcement learning to search for combinations of critical action parameters. To increase efficiency, we further propose several optimizations, including action masking and a replay buffer. We have implemented ECSAS, and experimental results show that it is more efficient than naive approaches such as random and combination testing in various nontrivial scenarios.
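Action masking, one of the optimizations named above, is a generic reinforcement-learning technique: actions that are invalid in the current state receive zero probability, so the agent never wastes samples on them. The sketch below shows that generic idea only; how ECSAS derives valid actions from BTScenario is not described here, so the `valid` mask and both function names are hypothetical.

```python
import math

def masked_softmax(logits, valid):
    """Softmax over action logits where invalid actions get probability 0.

    `valid` is a list of booleans aligned with `logits` (hypothetical mask).
    """
    if not any(valid):
        raise ValueError("at least one action must be valid")
    # Subtract the max over valid logits for numerical stability.
    m = max(l for l, ok in zip(logits, valid) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, valid)]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_action(logits, valid):
    """Return the index of the most probable valid action."""
    probs = masked_softmax(logits, valid)
    return max(range(len(probs)), key=probs.__getitem__)
```

With `logits = [1.0, 5.0, 2.0]` and `valid = [True, False, True]`, the highest-scoring action is masked out, so `greedy_action` returns index 2.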

Preprint · 2022 · arXiv

A New Probabilistic V-Net Model with Hierarchical Spatial Feature Transform for Efficient Abdominal Multi-Organ Segmentation

Accurate and robust abdominal multi-organ segmentation from CT imaging of different modalities is challenging because of complex inter- and intra-organ variations in shape and appearance. In this paper, we propose a probabilistic multi-organ segmentation network with hierarchical spatial-wise feature modulation that captures flexible organ semantic variants and injects the learned variants into feature maps at different scales to guide segmentation. More specifically, we design an input decomposition module based on a conditional variational auto-encoder to learn organ-specific distributions in a low-dimensional latent space and to model richer organ semantic variations conditioned on the input images. These learned variations are then integrated hierarchically into the V-Net decoder via spatial feature transformation, which converts them into conditional affine transformation parameters that modulate the spatial feature maps and guide fine-scale segmentation. The proposed method is trained on the publicly available AbdomenCT-1K dataset and evaluated on two other open datasets: 100 challenging/pathological testing patient cases from the AbdomenCT-1K fully-supervised abdominal organ segmentation benchmark and 90 cases from the TCIA+&BTCV dataset. Highly competitive or superior quantitative segmentation results were achieved on these datasets for four abdominal organs (liver, kidney, spleen and pancreas), with Dice scores improved by 7.3% for kidneys and 9.7% for pancreas, while being ~7 times faster than two strong baseline segmentation methods (nnUNet and CoTr).
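The spatial feature transformation described above turns the learned variations into affine parameters that modulate feature maps element-wise: each feature value is scaled by a per-location `gamma` and shifted by a per-location `beta`. The minimal sketch below shows only that modulation step on nested lists of shape `[channels][height][width]`; it assumes `gamma` and `beta` have already been predicted from the latent variations, and the function name is hypothetical rather than taken from the paper.

```python
def spatial_feature_transform(features, gamma, beta):
    """Element-wise affine modulation: out = gamma * features + beta.

    All inputs are nested lists of shape [channels][height][width]; gamma and
    beta are assumed to come from a conditioning branch (not modeled here).
    """
    return [
        [
            [g * f + b for f, g, b in zip(f_row, g_row, b_row)]
            for f_row, g_row, b_row in zip(f_ch, g_ch, b_ch)
        ]
        for f_ch, g_ch, b_ch in zip(features, gamma, beta)
    ]
```

Because `gamma` and `beta` vary per pixel, the same decoder feature can be amplified in one region and suppressed in another, which is what lets organ-specific variations steer fine-scale segmentation.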

Preprint · 2022 · arXiv

Edge-preserving Near-light Photometric Stereo with Neural Surfaces

This paper presents a near-light photometric stereo method that faithfully preserves sharp depth edges in the 3D reconstruction. Unlike previous methods that rely on finite differences to approximate depth partial derivatives and surface normals, we introduce an analytically differentiable neural surface into near-light photometric stereo, in which depth is represented as a neural function of the image coordinates, to avoid differentiation errors at sharp depth edges. By further formulating the Lambertian albedo as a dependent variable of the surface normal and depth, our method is robust to inaccurate depth initialization. Experiments on both synthetic and real-world scenes demonstrate the effectiveness of our method for detailed shape recovery with edge preservation.
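One plausible reading of "albedo as a dependent variable" is that, once normal and depth fix the shading under each light, the albedo follows in closed form by least squares. The sketch below assumes isotropic point lights with inverse-square falloff (ignoring the LED angular falloff many near-light setups also model) and the closed-form estimate rho = sum(I_j s_j) / sum(s_j^2). It illustrates the idea and is not the paper's formulation; both function names are hypothetical.

```python
import math

def near_light_shading(point, normal, light_pos, light_power=1.0):
    """Lambertian shading for an isotropic point light at finite distance.

    Assumes `normal` is unit length; uses inverse-square falloff only.
    """
    l = [s - p for s, p in zip(light_pos, point)]  # surface-to-light vector
    d = math.sqrt(sum(c * c for c in l))
    cos_theta = sum(n * c for n, c in zip(normal, l)) / d
    return light_power * max(0.0, cos_theta) / (d * d)

def least_squares_albedo(intensities, shadings):
    """Albedo minimizing sum_j (I_j - rho * s_j)^2 for fixed shadings s_j."""
    num = sum(i * s for i, s in zip(intensities, shadings))
    den = sum(s * s for s in shadings)
    return num / den if den > 0 else 0.0
```

Because the albedo is recomputed from the current normal and depth rather than optimized as a free variable, a poor initial depth does not lock in a compensating albedo error.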

Preprint · 2022 · arXiv

Improved bounds for randomly colouring simple hypergraphs

We study the problem of sampling almost uniform proper $q$-colourings in $k$-uniform simple hypergraphs with maximum degree $Δ$. For any $δ > 0$, if $k \geq \frac{20(1+δ)}{δ}$ and $q \geq 100Δ^{\frac{2+δ}{k-4/δ-4}}$, the running time of our algorithm is $\tilde{O}(\mathrm{poly}(Δk)\cdot n^{1.01})$, where $n$ is the number of vertices. Our result requires fewer colours than previous results for general hypergraphs (Jain, Pham, and Vuong, 2021; He, Sun, and Wu, 2021) and, unlike the work of Frieze and Anastos (2017), does not require $Ω(\log n)$ colours.
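The two conditions above can be checked mechanically for concrete parameters. The helper below (a hypothetical name) only tests whether $(q, k, Δ, δ)$ falls in the stated regime; it does not implement the sampler. Note that $k \geq 20(1+δ)/δ$ already forces $k - 4/δ - 4 > 0$, so the exponent is well defined whenever the first check passes.

```python
def colour_bound_holds(q, k, max_degree, delta):
    """True if k >= 20(1+delta)/delta and q >= 100 * Delta**((2+delta)/(k - 4/delta - 4))."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    if k < 20 * (1 + delta) / delta:
        return False
    # Positive denominator is guaranteed by the k-condition above.
    exponent = (2 + delta) / (k - 4 / delta - 4)
    return q >= 100 * max_degree ** exponent
```

For instance, with $δ = 1$ the uniformity must be at least $k = 40$; for $k = 40$ and $Δ = 10$ the exponent is $3/32$ and the threshold is about 124.09 colours, so `colour_bound_holds(125, 40, 10, 1.0)` is true while `q = 124` is not.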

Preprint · 2022 · arXiv

Inapproximability of counting hypergraph colourings

Recent developments in approximate counting have made startling progress in developing fast algorithmic methods for approximating the number of solutions to constraint satisfaction problems (CSPs) with large arities, using connections to the Lovász Local Lemma. Nevertheless, the boundaries of these methods for CSPs with non-Boolean domain are not well understood. Our goal in this paper is to fill this gap and obtain strong inapproximability results by studying the prototypical problem in this class of CSPs, hypergraph colourings. More precisely, we focus on the problem of approximately counting $q$-colourings on $K$-uniform hypergraphs with bounded degree $Δ$. An efficient algorithm exists if $Δ \lesssim \frac{q^{K/3-1}}{4^K K^2}$ (Jain, Pham, and Vuong, 2021; He, Sun, and Wu, 2021). Somewhat surprisingly, however, a hardness bound is not known even for the easier problem of finding colourings. For the counting problem, the situation is even less clear, and there is no evidence of the right constant controlling the growth of the exponent in terms of $K$. To this end, we first establish that, for general $q$, computational hardness for finding a colouring on simple/linear hypergraphs occurs at $Δ \gtrsim Kq^K$, almost matching the algorithm from the Lovász Local Lemma. Our second and main contribution is a far more refined bound for the counting problem that goes well beyond the hardness of finding a colouring and that we conjecture is asymptotically tight (up to constant factors). In particular, we show that for all even $q \geq 4$ it is NP-hard to approximate the number of colourings when $Δ \gtrsim q^{K/2}$.
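To make the counted quantity concrete: a $q$-colouring of a hypergraph is proper when no hyperedge is monochromatic. The exhaustive counter below simply enumerates all $q^n$ colourings, so it is usable only on tiny instances. It is the exact quantity whose approximation the paper studies, not any algorithm or reduction from the paper, and the function name is hypothetical.

```python
from itertools import product

def count_proper_colourings(num_vertices, hyperedges, q):
    """Count q-colourings with no monochromatic hyperedge by brute force.

    Takes q**num_vertices steps, so it only works for very small inputs.
    """
    count = 0
    for colouring in product(range(q), repeat=num_vertices):
        # A hyperedge is fine if its vertices use at least two distinct colours.
        if all(len({colouring[v] for v in edge}) > 1 for edge in hyperedges):
            count += 1
    return count
```

For a single 3-uniform hyperedge and $q = 2$, two of the eight colourings are monochromatic, leaving 6. For two hyperedges `(0, 1, 2)` and `(2, 3, 4)` sharing vertex 2, inclusion-exclusion gives $32 - (8 + 8 - 2) = 18$ proper 2-colourings.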

Preprint · 2022 · arXiv

Swendsen-Wang dynamics for the ferromagnetic Ising model with external fields

We study the sampling problem for the ferromagnetic Ising model with consistent external fields, and in particular, Swendsen-Wang dynamics on this model. We introduce a new grand model unifying two closely related models: the subgraph world and the random cluster model. Through this new viewpoint, we show: (1) polynomial mixing time bounds for Swendsen-Wang dynamics and (edge-flipping) Glauber dynamics of the random cluster model, generalising the bounds and simplifying the proofs for the no-field case by Guo and Jerrum (2018); (2) near linear mixing time for the two dynamics above if the maximum degree is bounded and all fields are (consistent and) bounded away from $1$.
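For readers unfamiliar with the dynamics, here is a minimal sketch of one standard Swendsen-Wang step for a ferromagnetic Ising model with external fields. It assumes the parameterization $w(\sigma) = \beta^{m(\sigma)} \prod_{v:\sigma_v=+} \lambda_v$ with $\beta > 1$, where $m(\sigma)$ counts monochromatic edges; the paper's own conventions and its grand model are not reproduced here, and the function name is hypothetical. Each monochromatic edge is kept with probability $1 - 1/\beta$, and each resulting cluster $C$ then takes spin $+$ with probability $\Lambda_C/(1+\Lambda_C)$, where $\Lambda_C = \prod_{v \in C}\lambda_v$.

```python
import random

def swendsen_wang_step(spins, edges, beta, lam, rng=random):
    """One Swendsen-Wang update under the assumed weight
    w(sigma) = beta**(#monochromatic edges) * prod_{v: sigma_v = +1} lam[v].

    spins: dict vertex -> +1/-1; edges: list of (u, v); beta > 1; lam[v] > 0.
    """
    parent = {v: v for v in spins}  # union-find forest for the clusters

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Step 1: keep each monochromatic edge independently with prob. 1 - 1/beta.
    keep_prob = 1.0 - 1.0 / beta
    for u, v in edges:
        if spins[u] == spins[v] and rng.random() < keep_prob:
            parent[find(u)] = find(v)

    # Step 2: cluster C gets +1 with prob. L/(1+L), L = product of lam over C.
    # (For large clusters this product may over/underflow; fine for a sketch.)
    cluster_field = {}
    for v in spins:
        root = find(v)
        cluster_field[root] = cluster_field.get(root, 1.0) * lam[v]
    cluster_spin = {
        root: (1 if rng.random() < L / (1.0 + L) else -1)
        for root, L in cluster_field.items()
    }
    return {v: cluster_spin[find(v)] for v in spins}
```

Summing the joint edge-and-spin weight over kept edges recovers $\beta^{m(\sigma)}$, which is why this two-step update leaves the assumed Ising distribution invariant. Setting every $\lambda_v = 1$ recovers the no-field case, where each cluster flips a fair coin.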

Preprint · 2020 · arXiv

Kaluza-Klein modes of $U(1)$ gauge vector field on brane with codimension-$d$

From the paper [JHEP 01 (2019) 021], it is known that the effective action of a massless $U(1)$ gauge vector field on a codimension-2 brane is gauge invariant due to the coupling between the vector Kaluza-Klein (KK) modes and two types of scalar KK modes. It is interesting to generalize this result to a brane world model with an arbitrary number of extra dimensions. In this work, we first investigate the case with three extra dimensions. After KK decomposition, there are three types of scalar KK modes. In addition to the mutual couplings between these scalar modes, there are also couplings between the scalar and the vector KK modes. The coupling constants are not all independent. The relationships between them enable us to obtain a gauge invariant effective action, from which we can see that the masses of the vector KK modes depend on all three extra dimensions, whereas the masses of the scalar modes depend on only two of the three. We then generalize the results to branes with codimension $d$ ($d = 1, 2, \dots$) and find that $d$ directly affects the masses of the KK modes. Nevertheless, there is always a gauge invariant effective action for the massive vector KK modes.

Preprint · 2020 · arXiv

Perfect sampling from spatial mixing

We introduce a new perfect sampling technique that can be applied to general Gibbs distributions and runs in linear time if the correlation decays faster than the neighborhood growth. In particular, in graphs with sub-exponential neighborhood growth like $\mathbb{Z}^d$, our algorithm achieves linear running time as long as Gibbs sampling is rapidly mixing. As concrete applications, we obtain the currently best perfect samplers for colorings and for monomer-dimer models in such graphs.

Preprint · 2020 · arXiv

Rapid mixing from spectral independence beyond the Boolean domain

We extend the notion of spectral independence (introduced by Anari, Liu, and Oveis Gharan [ALO20]) from the Boolean domain to general discrete domains. This property characterises distributions with limited correlations and implies that the corresponding Glauber dynamics is rapidly mixing. As a concrete application, we show that Glauber dynamics for sampling proper $q$-colourings mixes in polynomial time for the family of triangle-free graphs with maximum degree $Δ$, provided $q \ge (α^*+δ)Δ$, where $α^* \approx 1.763$ is the unique solution to $α^* = \exp(1/α^*)$ and $δ > 0$ is any constant. This is the first efficient algorithm for sampling proper $q$-colourings in this regime with possibly unbounded $Δ$. Our main tool for establishing spectral independence is the recursive coupling of Goldberg, Martin, and Paterson [GMP05].
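The constant $α^*$ can be computed numerically: $f(α) = α - \exp(1/α)$ is strictly increasing on $[1, 2]$ with $f(1) < 0 < f(2)$, so bisection finds the unique root. The sketch below does that and then returns the smallest integer $q$ meeting $q \ge (α^* + δ)Δ$ for given $Δ$ and $δ$; both function names are hypothetical.

```python
import math

def _gap(a):
    """f(a) = a - exp(1/a); increasing on [1, 2], root at alpha*."""
    return a - math.exp(1.0 / a)

def alpha_star(tol=1e-12):
    """Solve alpha = exp(1/alpha) by bisection on [1, 2]."""
    lo, hi = 1.0, 2.0  # f(1) = 1 - e < 0 and f(2) = 2 - sqrt(e) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if _gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def min_colours(max_degree, delta):
    """Smallest integer q with q >= (alpha* + delta) * max_degree."""
    return math.ceil((alpha_star() + delta) * max_degree)
```

The root is approximately 1.7632228, matching the stated $α^* \approx 1.763$ (it equals the reciprocal of the omega constant). With $Δ = 10$ and $δ = 0.1$, the condition requires $q \ge 18.63\ldots$, so `min_colours(10, 0.1)` is 19.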

Preprint · 2020 · arXiv

Towards a Fast Steady-State Visual Evoked Potentials (SSVEP) Brain-Computer Interface (BCI)

Steady-state visual evoked potential (SSVEP) brain-computer interfaces (BCIs) provide reliable responses, leading to high accuracy and information throughput. However, achieving high accuracy typically requires a relatively long time window of one second or more. Various methods have been proposed to improve sub-second response accuracy through subject-specific training and calibration. These achieved substantial performance improvements, but at the cost of tedious calibration and subject-specific training, resulting in user discomfort. We therefore propose a training-free method that combines spatial filtering and temporal alignment (CSTA) to recognize SSVEP responses in sub-second response time. CSTA exploits linear correlation and non-linear similarity between steady-state responses and stimulus templates, with complementary fusion, to achieve the desired performance improvements. We evaluated CSTA in terms of accuracy and information transfer rate (ITR), in comparison with both training-based and training-free methods, on two SSVEP datasets. In offline analysis, CSTA achieves maximum mean accuracies of 97.43 $\pm$ 2.26% and 85.71 $\pm$ 13.41% on the four-class and forty-class SSVEP datasets, respectively, in sub-second response time. CSTA yields significantly higher mean performance (p < 0.001) than the training-free method on both datasets. Compared with training-based methods, CSTA shows 29.33 $\pm$ 19.65% higher mean accuracy, with statistically significant differences for time windows shorter than 0.5 s. In longer time windows, CSTA performs better than or comparably to training-based methods, though not statistically significantly better. We show that the proposed method brings the advantages of subject-independent SSVEP classification without requiring training while enabling high target recognition performance in sub-second response time.
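The abstract reports ITR without giving the formula. BCI studies commonly use the Wolpaw definition, $B = \log_2 N + P\log_2 P + (1-P)\log_2\frac{1-P}{N-1}$ bits per selection, scaled by $60/T$ for bits per minute, and the sketch below assumes that definition. What $T$ includes (for example, any gaze-shift interval between selections) is also an assumption, and the function name is hypothetical.

```python
import math

def wolpaw_itr(n_classes, accuracy, seconds_per_selection):
    """Information transfer rate in bits/min using the Wolpaw formula."""
    n, p = n_classes, accuracy
    if not (n >= 2 and 0.0 <= p <= 1.0 and seconds_per_selection > 0):
        raise ValueError("need n >= 2, 0 <= p <= 1 and a positive selection time")
    bits = math.log2(n)
    # The p*log2(p) and (1-p)*log2(...) terms vanish at p = 1 and p = 0 respectively.
    if p > 0:
        bits += p * math.log2(p)
    if p < 1:
        bits += (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * 60.0 / seconds_per_selection
```

For a four-class task, perfect accuracy at one selection per second gives 2 bits per selection, or 120 bits/min, while chance-level accuracy (0.25) gives 0. Shortening $T$ raises ITR directly, which is why sub-second recognition matters even when accuracy drops somewhat.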