Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
24 works
0 followers
20 topics
4 close collaborators


Published work

24 published items

preprint · 2026 · arXiv

CMDAR: A Chinese Multi-scene Dynamic Audio Reasoning Benchmark with Diverse Challenges

The ability to reason from audio, including speech, environmental sounds, and music, is essential for AI agents to interact effectively in real-world scenarios. Existing benchmarks mainly focus on static or single-scene settings and English audio data, and do not fully capture scenarios where multiple speakers, unfolding events, and heterogeneous audio sources interact. To address these challenges, we introduce CMDAR, a Chinese benchmark for evaluating models on complex, multi-scene, and dynamically evolving audio reasoning tasks. CMDAR comprises 3,000 carefully curated question-answer pairs linked to diverse audio clips, covering five categories of complex reasoning and spanning three question types. We benchmark 26 state-of-the-art audio language models on CMDAR and observe that they exhibit limitations in complex reasoning tasks. In CMDAR-main, Qwen2.5-Omni achieves 76.67% accuracy, whereas GPT-4o Audio reaches 68.47%. However, GPT-4o Audio substantially outperforms Qwen2.5-Omni on the more challenging multiple-choice with multiple audios and open-ended tasks. We also provide a detailed analysis and corresponding suggestions for the future development of large audio language models.

preprint · 2026 · arXiv

Coding Agent Is Good As World Simulator

World models have emerged as a powerful paradigm for building interactive simulation environments, with recent video-based approaches demonstrating impressive progress in generating visually plausible dynamics. However, because these models typically infer dynamics from video and represent them in latent states, they do not explicitly enforce physical constraints. As a result, the generated video rollouts are not physically plausible, exhibiting unstable contacts, distorted shapes, or inconsistent motion. In this paper, we present an agentic framework that constructs physics-based world models through executable simulation code. The framework coordinates planning, code generation, visual review, and physics analysis agents. The planning agent converts the natural language prompt into a structured scene plan, the code agent implements it as executable simulation code, and the visual review agent provides visual feedback while the physics analysis agent checks physical consistency. The code is iteratively revised based on the feedback until the simulation matches the prompt requirements and physical constraints. Experimental results show that our framework outperforms advanced video-based models in physical accuracy, instruction fidelity and visual quality, and that it could be applied to various scenarios including driving simulation and embodied robot tasks.

preprint · 2026 · arXiv

DenseSplat: Densifying Gaussian Splatting SLAM with Neural Radiance Prior

Gaussian SLAM systems excel in real-time rendering and fine-grained reconstruction compared to NeRF-based systems. However, their reliance on extensive keyframes is impractical for deployment in real-world robotic systems, which typically operate under sparse-view conditions that can result in substantial holes in the map. To address these challenges, we introduce DenseSplat, the first SLAM system that effectively combines the advantages of NeRF and 3DGS. DenseSplat utilizes sparse keyframes and NeRF priors for initializing primitives that densely populate maps and seamlessly fill gaps. It also implements geometry-aware primitive sampling and pruning strategies to manage granularity and enhance rendering efficiency. Moreover, DenseSplat integrates loop closure and bundle adjustment, significantly enhancing frame-to-frame tracking accuracy. Extensive experiments on multiple large-scale datasets demonstrate that DenseSplat achieves superior performance in tracking and mapping compared to current state-of-the-art methods.

preprint · 2026 · arXiv

MG-SLAM: Structure Gaussian Splatting SLAM with Manhattan World Hypothesis

Gaussian Splatting SLAMs have made significant advancements in improving the efficiency and fidelity of real-time reconstructions. However, these systems often encounter incomplete reconstructions in complex indoor environments, characterized by substantial holes due to unobserved geometry caused by obstacles or limited view angles. To address this challenge, we present Manhattan Gaussian SLAM, an RGB-D system that leverages the Manhattan World hypothesis to enhance geometric accuracy and completeness. By seamlessly integrating fused line segments derived from structured scenes, our method ensures robust tracking in textureless indoor areas. Moreover, the extracted lines and planar surface assumption allow strategic interpolation of new Gaussians in regions of missing geometry, enabling efficient scene completion. Extensive experiments conducted on both synthetic and real-world scenes demonstrate that these advancements enable our method to achieve state-of-the-art performance, marking a substantial improvement in the capabilities of Gaussian SLAM systems.

preprint · 2026 · arXiv

MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models

Large multimodal Mixture-of-Experts (MoEs) effectively scale the model size to boost performance while maintaining fixed active parameters. However, previous works primarily utilized full-precision experts during sparse up-cycling. Although they show superior performance on end tasks, the large number of experts introduces a higher memory footprint, which poses significant challenges for deployment on edge devices. In this work, we propose MoTE, a scalable and memory-efficient approach to train Mixture-of-Ternary-Experts models from a dense checkpoint. Instead of training fewer high-precision experts, we propose to train more low-precision experts during up-cycling. Specifically, we use the pre-trained FFN as a shared expert and train ternary routed experts with parameters in {-1, 0, 1}. Extensive experiments show that our approach has a promising scaling trend along model size. MoTE achieves comparable performance to the full-precision baseline MoE-LLaVA while offering a lower memory footprint. Furthermore, our approach is compatible with post-training quantization methods, and the advantage grows as the memory constraint tightens. Given the same expert memory footprint of 3.4GB and combined with post-training quantization, MoTE outperforms MoE-LLaVA by a gain of 4.3% average accuracy on end tasks, demonstrating its effectiveness and potential for memory-constrained devices.
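
The abstract states only that the routed experts hold parameters in {-1, 0, 1}; it does not say how full-precision weights are mapped to those values. The following is a minimal sketch of one common option, assuming an absmean scaling rule. The function names `ternarize` and `dequantize` are hypothetical, and nothing here should be read as MoTE's actual quantizer.

```python
def ternarize(weights, eps=1e-8):
    """Map a flat list of floats to values in {-1, 0, 1} plus one shared scale.

    Assumption: absmean scaling, chosen for illustration only; the abstract
    does not specify the quantization rule.
    """
    # Shared scale: mean absolute value of the weights (eps avoids division by zero).
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round each scaled weight to the nearest integer, then clip to [-1, 1].
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def dequantize(ternary, scale):
    """Recover an approximate full-precision weight list from ternary values."""
    return [t * scale for t in ternary]


weights = [0.9, -0.05, 0.4, -1.2, 0.0, 0.3]
ternary, scale = ternarize(weights)
approx = dequantize(ternary, scale)
print(ternary, round(scale, 3))
```

Each ternary weight needs under two bits of storage plus one shared scale per tensor, which is the kind of saving that lets more experts fit in a fixed memory budget.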

preprint · 2026 · arXiv

Truth or Tribe: How In-group Favoritism Prioritize Facts in Persona Agents

In-group favoritism refers to the phenomenon of favoring members of one's in-group over out-group members and is widely observed in numerous social cooperative behaviors. Recently, in-group favoritism biases have also been identified in generative language models. However, whether in-group favoritism exists when persona agents are faced with contradicting information (e.g., misinformation), and how to mitigate the adverse effects of in-group favoritism biases in persona agents, remain understudied. To address these problems, we propose a Truth or Tribe simulation framework to study agent cooperation within the spread of contradicting information through a triadic interaction paradigm, and conduct controlled trials to evaluate the primary moderating factors. Extensive results show that persona agents display strong in-group favoritism, accepting incorrect answers from identity-similar peers at much higher rates than from dissimilar peers. In-group favoritism continues to emerge in defeasible reasoning contexts where no absolute truth exists, and it intensifies as cognitive complexity increases. Furthermore, three intervention strategies--Identity-Blind Instruction, Structured Counterfactual Reasoning, and Heterogeneous Perspective Ensemble--are proposed to mitigate the in-group favoritism.

preprint · 2024 · arXiv

PLE-SLAM: A Visual-Inertial SLAM Based on Point-Line Features and Efficient IMU Initialization

Visual-inertial SLAM is crucial in various fields, such as aerial vehicles, industrial robots, and autonomous driving. The fusion of a camera and an inertial measurement unit (IMU) makes up for the shortcomings of a single sensor, which significantly improves the accuracy and robustness of localization in challenging environments. This article presents PLE-SLAM, an accurate and real-time visual-inertial SLAM algorithm based on point-line features and efficient IMU initialization. First, we use parallel computing methods to extract features and compute descriptors to ensure real-time performance. Adjacent short line segments are merged into long line segments, and isolated short line segments are directly deleted. Second, a rotation-translation-decoupled initialization method is extended to use both points and lines. Gyroscope bias is optimized by tightly coupling IMU measurements and image observations. Accelerometer bias and gravity direction are solved by an analytical method for efficiency. To improve the system's intelligence in handling complex environments, a scheme that leverages semantic information and geometric constraints to eliminate dynamic features and a solution for loop detection and closed-loop frame pose estimation using a CNN and a GNN are integrated into the system. All networks are accelerated to ensure real-time performance. The experimental results on public datasets illustrate that PLE-SLAM is one of the state-of-the-art visual-inertial SLAM systems.

preprint · 2024 · arXiv

Temporal Adaptive RGBT Tracking with Modality Prompt

RGBT tracking has been widely used in various fields such as robotics, surveillance processing, and autonomous driving. Existing RGBT trackers fully explore the spatial information between the template and the search region and locate the target based on the appearance matching results. However, these RGBT trackers make very limited use of temporal information, either ignoring it or exploiting it through online sampling and training. The former struggles to cope with object state changes, while the latter neglects the correlation between spatial and temporal information. To alleviate these limitations, we propose a novel Temporal Adaptive RGBT Tracking framework, named TATrack. TATrack has a spatio-temporal two-stream structure and captures temporal information through an online updated template, where the two-stream structure refers to the multi-modal feature extraction and cross-modal interaction for the initial template and the online update template respectively. TATrack comprehensively exploits spatio-temporal information and multi-modal information for target localization. In addition, we design a spatio-temporal interaction (STI) mechanism that bridges the two branches and enables cross-modal interaction to span longer time scales. Extensive experiments on three popular RGBT tracking benchmarks show that our method achieves state-of-the-art performance while running at real-time speed.

preprint · 2023 · arXiv

Bi-Hölder extensions of quasi-isometries on pseudoconvex domains of finite type in $\mathbb{C}^2$

In this paper, we prove that the identity map for the smoothly bounded pseudoconvex domain of finite type in $\mathbb{C}^2$ extends to a bi-Hölder map between the Euclidean boundary and Gromov boundary. As an application, we show the bi-Hölder boundary extensions for quasi-isometries between these domains. Moreover, we get a more accurate index of the Gehring-Hayman type theorem for the bounded $m$-convex domains with Dini-smooth boundary.

preprint · 2023 · arXiv

The state-of-the-art 3D anisotropic intracranial hemorrhage segmentation on non-contrast head CT: The INSTANCE challenge

Automatic intracranial hemorrhage segmentation in 3D non-contrast head CT (NCCT) scans is significant in clinical practice. Existing hemorrhage segmentation methods usually ignore the anisotropic nature of NCCT and are evaluated on different in-house datasets with distinct metrics, making it highly challenging to improve segmentation performance and perform objective comparisons among different methods. The INSTANCE 2022 was a grand challenge held in conjunction with the 2022 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). It is intended to resolve the above-mentioned problems and promote the development of both intracranial hemorrhage segmentation and anisotropic data processing. The INSTANCE released a training set of 100 cases with ground-truth and a validation set with 30 cases without ground-truth labels that were available to the participants. A held-out testing set with 70 cases is utilized for the final evaluation and ranking. The methods from different participants are ranked based on four metrics: Dice Similarity Coefficient (DSC), Hausdorff Distance (HD), Relative Volume Difference (RVD) and Normalized Surface Dice (NSD). A total of 13 teams submitted distinct solutions to resolve the challenges, making several baseline models, pre-processing strategies and anisotropic data processing techniques available to future researchers. The winning method achieved an average DSC of 0.6925, demonstrating a significant improvement over our proposed baseline method. To the best of our knowledge, the proposed INSTANCE challenge releases the first intracranial hemorrhage segmentation benchmark, and is also the first challenge intended to resolve the anisotropic problem in 3D medical image segmentation, which provides new alternatives in these research fields.
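
For readers unfamiliar with the headline metric, the Dice Similarity Coefficient between a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on flattened binary masks. The function name `dice_coefficient` and the handling of two empty masks are assumptions made for illustration; the challenge's own evaluation code and conventions are not described in the abstract.

```python
def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
score = dice_coefficient(pred, truth)
print(round(score, 4))
```

A 3D volume can be passed by flattening it first; the metric itself does not depend on voxel spacing, which is one reason surface-based metrics such as HD and NSD are reported alongside it.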

preprint · 2022 · arXiv

An Efficient Target Detection and Recognition Method in Aerial Remote-sensing Images Based on Multiangle Regions-of-Interest

Recently, deep learning technology has been extensively used in the field of image recognition. However, its main application is the recognition and detection of ordinary pictures and common scenes. It is challenging to effectively and expediently analyze remote-sensing images obtained by the image acquisition systems on unmanned aerial vehicles (UAVs), which includes the identification of the target and calculation of its position. Aerial remote-sensing images have different shooting angles and methods compared with ordinary pictures or images, which makes remote-sensing images play an irreplaceable role in some areas. In this study, a new target detection and recognition method for remote-sensing images is proposed, based on a deep convolutional neural network (CNN) that provides multilevel image information in combination with a region proposal network used to generate multiangle regions-of-interest. The proposed method generated results that were much more accurate and precise than those obtained with traditional methods. This demonstrates that the model proposed herein has tremendous potential for application in remote-sensing image recognition.

preprint · 2022 · arXiv

DeepNet: Scaling Transformers to 1,000 Layers

In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers. Specifically, we introduce a new normalization function (DeepNorm) to modify the residual connection in Transformer, accompanied by a theoretically derived initialization. In-depth theoretical analysis shows that model updates can be bounded in a stable way. The proposed method combines the best of two worlds, i.e., good performance of Post-LN and stable training of Pre-LN, making DeepNorm a preferred alternative. We successfully scale Transformers up to 1,000 layers (i.e., 2,500 attention and feed-forward network sublayers) without difficulty, which is one order of magnitude deeper than previous deep Transformers. Remarkably, on a multilingual benchmark with 7,482 translation directions, our 200-layer model with 3.2B parameters significantly outperforms the 48-layer state-of-the-art model with 12B parameters by 5 BLEU points, which indicates a promising scaling direction.
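
To make the idea of a modified residual connection concrete, here is a minimal sketch assuming the residual takes the form LayerNorm(alpha * x + sublayer(x)) with a constant alpha of at least 1. The depth-dependent choice of alpha and the matching initialization scaling are derived in the paper and are not reproduced here; `layer_norm`, `deepnorm_residual`, and the toy sublayer are hypothetical names, and the layer norm omits learnable gain and bias.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learnable gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def deepnorm_residual(x, sublayer, alpha):
    """Post-LN style residual with the skip path up-weighted by alpha.

    Assumption: alpha is supplied by the caller; with alpha = 1 this reduces
    to the standard Post-LN residual LayerNorm(x + sublayer(x)).
    """
    out = sublayer(x)
    return layer_norm([alpha * xi + oi for xi, oi in zip(x, out)])


def toy_sublayer(v):
    """Stand-in for an attention or feed-forward sublayer."""
    return [0.5 * t for t in v]


x = [1.0, 2.0, 3.0]
y = deepnorm_residual(x, toy_sublayer, alpha=2.0)
print([round(v, 4) for v in y])
```

Up-weighting the skip path reduces the relative contribution of each sublayer's output, which is the intuition behind bounding the size of model updates in very deep stacks.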

preprint · 2022 · arXiv

Extend The Levin-Wen Model To Two-dimensional Topological Orders With Gapped Boundary Junctions

A realistic material may possess defects, which often bring the material new properties that have practical applications. The boundary defects of a two-dimensional topologically ordered system are thought of as an alternative way of realizing topological quantum computation. To facilitate the study of such boundary defects, in this paper, we construct an exactly solvable Hamiltonian model of topological orders with gapped boundary junctions, where the boundary defects reside, by placing the Levin-Wen model on a disk, whose gapped boundary is separated into multiple segments by junctions. We find that the Hamiltonian of a gapped boundary junction is characterized by either a morphism between or a common Frobenius subalgebra of the two Frobenius algebras (in the input fusion category) characterizing the two boundary segments joined by the junction. We derive a formula for the ground state degeneracy and an explicit ground-state basis of our model. We propose the notion of mobile and immobile charges on the boundary and find that they are quantum observables and label the ground-state basis. Our model is computation friendly.

preprint · 2022 · arXiv

Parameterized Colorings And Labellings Of Graphs In Topological Coding

The coming quantum computation is forcing us to reexamine the cryptosystems people use. We are applying graph colorings of topological coding to modern information security and future cryptography against supercomputer and quantum computer attacks in the near future. Many of the techniques introduced here are associated with mathematical conjectures and NP-problems. We introduce a group of W-constraint (k,d)-total colorings, together with algorithms for realizing these colorings in some kinds of graphs, which are used to quickly make public keys and private keys that resist quantum computing. These (k,d)-total colorings are: graceful (k,d)-total colorings, harmonious (k,d)-total colorings, (k,d)-edge-magic total colorings, (k,d)-graceful-difference total colorings and (k,d)-felicitous-difference total colorings. One of the useful tools we use is the Topcode-matrix, whose elements can be all sorts of things, for example sets, graphs, and number-based strings. Most of the parameterized graphic colorings/labelings are defined here by Topcode-matrix algebra. From the application point of view, many of our coloring techniques are given by algorithms and are easily converted into programs.

preprint · 2022 · arXiv

Topological Authentication Technique In Topologically Asymmetric Cryptosystem

Taking topological authentication from theory to practical application is an important and challenging task. More and more researchers are paying attention to the coming quantum computation, data privacy protection, lattices and cryptography. Research shows that topological authentications built from graph operations, various matrices, graph colorings and graph labelings have the following advantages: they relate to two or more different mathematical areas, they are not pictures, there are huge numbers of colorings and labelings, they are rooted in modern mathematics, and they offer a diversity of asymmetric ciphers, simplicity and convenience, easy creation, irreversibility, computational security, provable security, and so on. Topological authentications are based on various graph homomorphisms, degree-sequence homomorphisms, and graph-set homomorphisms. Randomly topological coding and topological authentications are based on Hanzi authentication, randomly adding-edge-removing operation, randomly leaf-adding algorithms, graph random increasing techniques, operation graphic lattice and dynamic networked models and their spanning trees and maximum leaf spanning trees. Realization of topological authentication is an important topic; we study number-based strings generated from colored graphs, particular graphs (complete graphs, trees, planar graphs), and some methods of generating public keys. Some techniques of the topologically asymmetric cryptosystem are: W-type matching labelings, dual-type labelings, reciprocal-type labelings, topological homomorphisms, indexed colorings, graphic lattices, degree-sequence lattices, every-zero Cds-matrix groups of degree-sequences, every-zero graphic groups, graphic lattices having coloring closure property, self-similar networked lattices.

preprint · 2021 · arXiv

AEGCN: An Autoencoder-Constrained Graph Convolutional Network

We propose a novel neural network architecture, called autoencoder-constrained graph convolutional network, to solve the node classification task on graph domains. As suggested by its name, the core of this model is a convolutional network operating directly on graphs, whose hidden layers are constrained by an autoencoder. Compared with vanilla graph convolutional networks, the autoencoder step is added to reduce the information loss brought by Laplacian smoothing. We consider applying our model to both homogeneous graphs and heterogeneous graphs. For homogeneous graphs, the autoencoder approximates the adjacency matrix of the input graph by taking hidden layer representations as the encoder and another one-layer graph convolutional network as the decoder. For heterogeneous graphs, since there are multiple adjacency matrices corresponding to different types of edges, the autoencoder approximates the feature matrix of the input graph instead, and changes the encoder to a specially designed multi-channel pre-processing network with two layers. In both cases, the error incurred in the autoencoder approximation enters the loss function as a penalty term. In extensive experiments on citation networks and other heterogeneous graphs, we demonstrate that adding autoencoder constraints significantly improves the performance of graph convolutional networks. Further, we notice that our technique can also be applied to graph attention networks to improve performance. This reveals the wide applicability of the proposed autoencoder technique.

preprint · 2020 · arXiv

Graph Homomorphisms Based On Particular Total Colorings of Graphs and Graphic Lattices

Lattice-based cryptography is not only a means of thwarting future quantum computers but also the basis of Fully Homomorphic Encryption. Motivated by the advantages of graph homomorphisms, we combine graph homomorphisms with graph total colorings to design new types of graph homomorphisms: totally-colored graph homomorphisms, graphic-lattice homomorphisms from sets to sets, every-zero graphic group homomorphisms from sets to sets. Our graph-homomorphism lattices are made up of graph homomorphisms. These new homomorphisms induce some problems of graph theory, for example, Number String Decomposition and Graph Homomorphism Problem.

preprint · 2020 · arXiv

Ice-Flower Systems And Star-graphic Lattices

Lattice theory is believed to resist attacks by classical computers and quantum computers. Since there are connections between traditional lattices and graphic lattices, it is meaningful to research graphic lattices. We define the so-called ice-flower systems by our uncolored or colored leaf-splitting and leaf-coinciding operations. These ice-flower systems enable us to construct several star-graphic lattices. We use our star-graphic lattices to express some well-known results of graph theory and compute the number of elements of a particular star-graphic lattice. For further research on ice-flower systems and star-graphic lattices, we propose the Decomposition Number String Problem, the problem of finding strongly colored uniform ice-flower systems, and the problem of connecting our star-graphic lattices with traditional lattices.

preprint · 2020 · arXiv

Localization of the Kobayashi metric and applications

In this paper we introduce a new class of domains -- log-type convex domains, which have no boundary regularity assumptions. Then we will localize the Kobayashi metric in log-type convex subdomains. As an application, we prove a local version of continuous extension of rough isometric maps between two bounded domains with log-type convex Dini-smooth boundary points. Moreover we prove that the Teichmüller space $\mathcal T_{g,n}$ is not biholomorphic to any bounded pseudoconvex domain in $\mathbb C^{3g-3+n}$ which is locally log-type convex near some boundary point.

preprint · 2020 · arXiv

Recognizing Handwritten Mathematical Expressions as LaTex Sequences Using a Multiscale Robust Neural Network

In this paper, a robust multiscale neural network is proposed to recognize handwritten mathematical expressions and output LaTeX sequences. The network can effectively and correctly focus on the region relevant to each output step, which has a positive effect on analyzing the two-dimensional structure of handwritten mathematical expressions and identifying different mathematical symbols in a long expression. With the addition of visualization, the model's recognition process is shown in detail. In addition, our model achieved 49.459% and 46.062% ExpRate on the public CROHME 2014 and CROHME 2016 datasets. The present model results suggest that the state-of-the-art model has better robustness, fewer errors, and higher accuracy.

preprint · 2020 · arXiv

Robust Encoder-Decoder Learning Framework towards Offline Handwritten Mathematical Expression Recognition Based on Multi-Scale Deep Neural Network

Offline handwritten mathematical expression recognition is a challenging task because the recognition process mainly faces two problems. The first is how to correctly recognize different mathematical symbols. The second is how to correctly recognize the two-dimensional structure existing in mathematical expressions. Inspired by recent work in deep learning, a new neural network model that combines a Multi-Scale convolutional neural network (CNN) with an Attention recurrent neural network (RNN) is proposed to identify two-dimensional handwritten mathematical expressions as one-dimensional LaTeX sequences. As a result, the model proposed in the present work has achieved a WER of 25.715% and an ExpRate of 28.216%.
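
The two reported metrics can be illustrated on token sequences. ExpRate is the fraction of expressions whose predicted LaTeX sequence matches the reference exactly, and WER is the token-level edit distance divided by the number of reference tokens. The sketch below assumes pre-tokenized sequences and corpus-level aggregation for WER; the tokenization and aggregation actually used in the paper are not stated in the abstract, and all function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def exp_rate(refs, hyps):
    """Fraction of expressions recognized exactly."""
    exact = sum(1 for r, h in zip(refs, hyps) if r == h)
    return exact / len(refs)


def word_error_rate(refs, hyps):
    """Total token edits divided by total reference tokens (assumed corpus-level)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    tokens = sum(len(r) for r in refs)
    return errors / tokens


refs = [["x", "^", "2"], ["\\frac", "{", "a", "}", "{", "b", "}"]]
hyps = [["x", "^", "2"], ["\\frac", "{", "a", "}", "{", "6", "}"]]
print(exp_rate(refs, hyps), word_error_rate(refs, hyps))
```

The example shows why the two numbers can diverge: a single wrong symbol costs one token in WER but makes the whole expression count as a miss for ExpRate.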

preprint · 2020 · arXiv

The Gehring-Hayman type theorems on complex domains

In this paper we establish Gehring-Hayman type theorems for some complex domains. Suppose that $Ω\subset \mathbb{C}^n$ is a bounded $m$-convex domain with Dini-smooth boundary, or a bounded strongly pseudoconvex domain with $C^2$-smooth boundary. Then we prove that the Euclidean length of a Kobayashi geodesic $[x,y]$ in $Ω$ is less than $c_1|x-y|^{c_2}$. Furthermore, if $Ω$ endowed with the Kobayashi metric is Gromov hyperbolic, then we can generalize this result to quasi-geodesics with respect to the Bergman metric, the Carathéodory metric or the Kähler-Einstein metric. As applications, we prove the bi-Hölder equivalence between the Euclidean boundary and the Gromov boundary. Moreover, by using this boundary correspondence, we can show some extension results for biholomorphisms and, more generally, for rough quasi-isometries with respect to the Kobayashi metrics between the domains.

preprint · 2019 · arXiv

Deep Multiphase Level Set for Scene Parsing

Recently, the Fully Convolutional Network (FCN) has become the go-to architecture for image segmentation, including semantic scene parsing. However, it is difficult for a generic FCN to discriminate pixels around object boundaries, so FCN-based methods may output parsing results with inaccurate boundaries. Meanwhile, level-set-based active contours are superior for boundary estimation due to the sub-pixel accuracy that they achieve. However, they are quite sensitive to initial settings. To address these limitations, in this paper we propose a novel Deep Multiphase Level Set (DMLS) method for semantic scene parsing, which efficiently incorporates multiphase level sets into deep neural networks. The proposed method consists of three modules, i.e., recurrent FCNs, adaptive multiphase level set, and deeply supervised learning. More specifically, recurrent FCNs learn multi-level representations of input images with different contexts. Adaptive multiphase level set drives the discriminative contour for each semantic class, which makes use of the advantages of both global and local information. In each time-step of the recurrent FCNs, deeply supervised learning is incorporated for model training. Extensive experiments on three public benchmarks have shown that our proposed method achieves new state-of-the-art performance.

preprint · 2019 · arXiv

Electric-magnetic duality in the quantum double models of topological orders with gapped boundaries

We generalize the Electric-magnetic (EM) duality in the quantum double (QD) models to the case of topological orders with gapped boundaries. We also map the QD models with boundaries to the Levin-Wen (LW) models with boundaries. To this end, we Fourier transform and rewrite the extended QD model with a finite gauge group $G$ on a trivalent lattice with a boundary. Gapped boundary conditions of the model before the transformation are known to be characterized by the subgroups $K \subseteq G$. We find that after the transformation, the boundary conditions are then characterized by the Frobenius algebras $A_{G,K}$ in $\mathrm{Rep}_G$. An $A_{G,K}$ is the dual space of the quotient of the group algebra of $G$ over that of $K$, and $\mathrm{Rep}_G$ is the category of the representations of $G$. The EM duality on the boundary is revealed by mapping the $K$'s to $A_{G,K}$'s. We also show that our transformed extended QD model can be mapped to an extended LW model on the same lattice via enlarging the Hilbert space of the extended LW model. Moreover, our transformed extended QD model elucidates the phenomenon of anyon splitting in anyon condensation.