Researcher profile

Kun Huang

Kun Huang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2026arXiv

How Mobile World Model Guides GUI Agents?

Recent advances in vision-language models have enabled mobile GUI agents to perceive visual interfaces and execute user instructions, but reliable prediction of action consequences remains critical for long-horizon and high-risk interactions. Existing mobile world models provide either text-based or image-based future states, yet it remains unclear which representation is useful, whether generated rollouts can replace real environments, and how test-time guidance helps agents of different strengths. To answer the above questions, we filter and annotate mobile world-model data, then train world models across four modalities: delta text, full text, diffusion-based images, and renderable code. These models achieve SoTA performance on both MobileWorldBench and Code2WorldBench. Furthermore, by evaluating their downstream utility on AITZ, AndroidControl, and AndroidWorld, we obtain three findings. First, renderable code reconstruction achieves high in-distribution fidelity and provides effective multimodal supervision for data construction, while text-based feedback is more robust for online out-of-distribution (OOD) execution. Second, world-model-generated trajectories can provide transferable interaction experience in the training process and improve agents' end-to-end task performance, although these data do not preserve the original distribution. Last, for overconfident mobile agents with low action entropy, posterior self-reflection provides limited gains, suggesting that world models are more effective as prior perception or training supervision than as universal post-hoc verifiers.

preprint2022arXiv

A Compressed Gradient Tracking Method for Decentralized Optimization with Linear Convergence

Communication compression techniques are of growing interests for solving the decentralized optimization problem under limited communication, where the global objective is to minimize the average of local cost functions over a multi-agent network using only local computation and peer-to-peer communication. In this paper, we propose a novel compressed gradient tracking algorithm (C-GT) that combines gradient tracking technique with communication compression. In particular, C-GT is compatible with a general class of compression operators that unifies both unbiased and biased compressors. We show that C-GT inherits the advantages of gradient tracking-based algorithms and achieves linear convergence rate for strongly convex and smooth objective functions. Numerical examples complement the theoretical findings and demonstrate the efficiency and flexibility of the proposed algorithm.

preprint2022arXiv

Accurate calibration of multi-perspective cameras from a generalization of the hand-eye constraint

Multi-perspective cameras are quickly gaining importance in many applications such as smart vehicles and virtual or augmented reality. However, a large system size or absence of overlap in neighbouring fields-of-view often complicate their calibration. We present a novel solution which relies on the availability of an external motion capture system. Our core contribution consists of an extension to the hand-eye calibration problem which jointly solves multi-eye-to-base problems in closed form. We furthermore demonstrate its equivalence to the multi-eye-in-hand problem. The practical validity of our approach is supported by our experiments, indicating that the method is highly efficient and accurate, and outperforms existing closed-form alternatives.

preprint2022arXiv

Deep 360$^\circ$ Optical Flow Estimation Based on Multi-Projection Fusion

Optical flow computation is essential in the early stages of the video processing pipeline. This paper focuses on a less explored problem in this area, the 360$^\circ$ optical flow estimation using deep neural networks to support increasingly popular VR applications. To address the distortions of panoramic representations when applying convolutional neural networks, we propose a novel multi-projection fusion framework that fuses the optical flow predicted by the models trained using different projection methods. It learns to combine the complementary information in the optical flow results under different projections. We also build the first large-scale panoramic optical flow dataset to support the training of neural networks and the evaluation of panoramic optical flow estimation methods. The experimental results on our dataset demonstrate that our method outperforms the existing methods and other alternative deep networks that were developed for processing 360° content.

preprint2022arXiv

Label Adversarial Learning for Skeleton-level to Pixel-level Adjustable Vessel Segmentation

You can have your cake and eat it too. Microvessel segmentation in optical coherence tomography angiography (OCTA) images remains challenging. Skeleton-level segmentation shows clear topology but without diameter information, while pixel-level segmentation shows a clear caliber but low topology. To close this gap, we propose a novel label adversarial learning (LAL) for skeleton-level to pixel-level adjustable vessel segmentation. LAL mainly consists of two designs: a label adversarial loss and an embeddable adjustment layer. The label adversarial loss establishes an adversarial relationship between the two label supervisions, while the adjustment layer adjusts the network parameters to match the different adversarial weights. Such a design can efficiently capture the variation between the two supervisions, making the segmentation continuous and tunable. This continuous process allows us to recommend high-quality vessel segmentation with clear caliber and topology. Experimental results show that our results outperform manual annotations of current public datasets and conventional filtering effects. Furthermore, such a continuous process can also be used to generate an uncertainty map representing weak vessel boundaries and noise.

preprint2021arXiv

Quantum key distribution over scattering channel

Scattering of light by cloud, haze, and fog decreases the transmission efficiency of communication channels in quantum key distribution (QKD), reduces the system's practical security, and thus constrains the deployment of free-space QKD. Here, we employ the wavefront shaping technology to compensate distorted optical signals in high-loss scattering quantum channels and fulfill a polarization-encoded BB84 QKD experiment. With this quantum channel compensation technology, we achieve a typical enhancement of about 250 in transmission efficiency and improve the secure key rate from 0 to $1.85\times10^{-6}$ per sifted key. The method and its first time validation show the great potential to expand the territory of QKD systems from lossless channels to highly scattered ones and therefore enhances the deployment ability of global quantum communication network.

preprint2020arXiv

Generalized Perfect Optical Vortex along Arbitrary Trajectories

Perfect optical vortex (POV) is a type of vortex beam with an infinite thin ring and a fixed radius independent of its topological charge. Here we propose the concept of generalized perfect optical vortex along arbitrary curves beyond the regular shapes of circle and ellipse. Generalized perfect optical vortices also share the similar properties as POVs, such as defined only along infinite thin curves and owning topological charges independent of scales. Notably, they naturally degenerate to the POVs and elliptic POVs along circles and ellipses, respectively. We also experimentally generated the generalized perfect optical vortices through a digital micromirror device (DMD) and measured the phase distributions by interferometry, exhibiting good agreements with the simulations. Moreover, we derive a proper modified formula to yield the generalized perfect optical vortices with uniform intensity distribution along predesigned curves. The generalized perfect optical vortices might find the potential applications in optical tweezers and communication.

preprint2020arXiv

Learning to Parse Wireframes in Images of Man-Made Environments

In this paper, we propose a learning-based approach to the task of automatically extracting a "wireframe" representation for images of cluttered man-made environments. The wireframe (see Fig. 1) contains all salient straight lines and their junctions of the scene that encode efficiently and accurately large-scale geometry and object shapes. To this end, we have built a very large new dataset of over 5,000 images with wireframes thoroughly labelled by humans. We have proposed two convolutional neural networks that are suitable for extracting junctions and lines with large spatial support, respectively. The networks trained on our dataset have achieved significantly better performance than state-of-the-art methods for junction detection and line segment detection, respectively. We have conducted extensive experiments to evaluate quantitatively and qualitatively the wireframes obtained by our method, and have convincingly shown that effectively and efficiently parsing wireframes for images of man-made environments is a feasible goal within reach. Such wireframes could benefit many important visual tasks such as feature correspondence, 3D reconstruction, vision-based mapping, localization, and navigation. The data and source code are available at https://github.com/huangkuns/wireframe.

preprint2020arXiv

Low-Rank Reorganization via Proportional Hazards Non-negative Matrix Factorization Unveils Survival Associated Gene Clusters

One of the central goals in precision health is the understanding and interpretation of high-dimensional biological data to identify genes and markers associated with disease initiation, development, and outcomes. Though significant effort has been committed to harness gene expression data for multiple analyses while accounting for time-to-event modeling by including survival times, many traditional analyses have focused separately on non-negative matrix factorization (NMF) of the gene expression data matrix and survival regression with Cox proportional hazards model. In this work, Cox proportional hazards regression is integrated with NMF by imposing survival constraints. This is accomplished by jointly optimizing the Frobenius norm and partial log likelihood for events such as death or relapse. Simulation results on synthetic data demonstrated the superiority of the proposed method, when compared to other algorithms, in finding survival associated gene clusters. In addition, using human cancer gene expression data, the proposed technique can unravel critical clusters of cancer genes. The discovered gene clusters reflect rich biological implications and can help identify survival-related biomarkers. Towards the goal of precision health and cancer treatments, the proposed algorithm can help understand and interpret high-dimensional heterogeneous genomics data with accurate identification of survival-associated gene clusters.