Researcher profile

Teng Wang

Teng Wang contributes to research discovery and scholarly infrastructure.

Published work

20 published items

preprint, 2026, arXiv

Charge-dependent nucleon-nucleon interaction at N$^3$LO in nuclear lattice effective field theory

Nuclear lattice effective field theory (NLEFT) is an efficient tool for solving nuclear many-body problems, which takes high-fidelity lattice chiral interactions as input and computes nuclear low-energy observables via quantum Monte Carlo techniques. In this work, we present the first next-to-next-to-next-to-leading order (N$^3$LO) chiral forces on the lattice with the isospin-breaking effects fully taken into account. We focus on both the charge-independence breaking (CIB) and charge-symmetry breaking (CSB) effects. Specifically, we include the isospin-breaking effect from the mass difference between the charged and neutral pions in the one-pion-exchange potential (OPEP), the Coulomb force for the $pp$ interaction and the contribution of two additional charge-dependent contact operators. We also explicitly incorporate the two-pion-exchange potentials, which were mostly neglected in previous NLEFT calculations. With these improvements, we are able to accurately reproduce the $np$ and $pp$ scattering phase shifts up to relative momentum $p \sim 200$ MeV as well as the deuteron properties. The construction of these charge-dependent lattice nuclear forces establishes a solid foundation for future high-precision nuclear ab initio calculations within the NLEFT framework.

preprint, 2026, arXiv

CVBench: Benchmarking Cross-Video Synergies for Complex Multimodal Reasoning

While multimodal large language models (MLLMs) exhibit strong performance on single-video tasks (e.g., video question answering), their capability for spatiotemporal pattern reasoning across multiple videos remains a critical gap in pattern recognition research. This capability is essential for real-world applications, including multi-camera surveillance and cross-video procedural learning. To bridge this gap, we present CVBench, the first diagnostic benchmark designed to assess cross-video relational reasoning rigorously. CVBench comprises 1,000 question-answer pairs spanning three hierarchical tiers: cross-video object association (identifying shared entities), cross-video event association (linking temporal or causal event chains), and cross-video complex reasoning (integrating commonsense and domain knowledge). Built from five domain-diverse video clusters (e.g., sports, life records), the benchmark challenges models to analyze and integrate spatiotemporal patterns from dynamic visual streams. We extensively evaluate 10+ leading MLLMs (including GPT-4o, Gemini-2.0-flash, Qwen2.5-VL) under zero-shot or chain-of-thought prompting paradigms. Key findings reveal stark performance gaps: even top models, such as GPT-4o, achieve only 63.5% accuracy on causal reasoning tasks, compared with 91.3% for humans. Crucially, our analysis reveals fundamental bottlenecks inherent in current MLLM architectures, notably deficient inter-video context retention and poor disambiguation of overlapping entities. CVBench establishes a rigorous framework for advancing pattern recognition methodologies in multi-video scenarios, providing architectural insights for next-generation models. The data and evaluation code are available at: https://github.com/Hokhim2/CVBench.

preprint, 2026, arXiv

Investigating nuclear beta decay using lattice quantum Monte Carlo approach

We present an ab initio calculation of nuclear $β$ decay within the framework of nuclear lattice effective field theory (NLEFT), employing auxiliary-field quantum Monte Carlo methods to solve the nuclear many-body problem. Our approach combines next-to-next-to-leading order two- and three-body chiral interactions with one- and two-body axial current operators, all consistently derived in chiral effective field theory. Low-energy constants are determined exclusively from nucleon-nucleon scattering phase shifts and few-body observables for systems with $A \leq 3$. Using these interactions and transition operators, we perform two-channel Monte Carlo simulations to compute the $β$-decay matrix element for $^6$He, obtaining results in reasonable agreement with experimental measurements. To address the Monte Carlo sign problem, we implement a perturbative expansion around a leading-order Hamiltonian with approximate Wigner-SU(4) symmetry. This systematic approach provides a foundation for extending NLEFT simulations to precision studies of weak processes in medium-mass nuclei.

preprint, 2025, arXiv

Basal layer of granular flow down smooth and rough inclines: kinematics, slip laws and rheology

Granular flow down an inclined plane is ubiquitous in geophysical and industrial applications. On rough inclines, the flow exhibits Bagnold's velocity profile and follows the so-called $μ(I)$ local rheology. On insufficiently rough or smooth inclines, however, velocity slip occurs at the bottom and a basal layer with strong agitation emerges below the bulk, which is not predicted by the local rheology. Here, we use discrete element method simulations to study the detailed dynamics of the basal layer in granular flows down both smooth and rough inclines. We control the roughness via a dimensionless parameter, $R_a$, varied systematically from 0 (flat, frictional plane) to near 1 (very rough plane). Three flow regimes are identified: a slip regime ($R_a \lesssim 0.45$) where a dilated basal layer appears, a no-slip regime ($R_a \gtrsim 0.6$) and an intermediate transition regime. In the slip regime, the kinematic profiles (velocity, shear rate and granular temperature) of the basal layer strongly deviate from Bagnold's profiles. General basal slip laws are developed which express the slip velocity as a function of the local shear rate (or granular temperature), base roughness and slope angle. Moreover, the basal layer thickness is insensitive to flow conditions but depends somewhat on the inter-particle coefficient of restitution. Finally, we show that the rheological properties of the basal layer do not follow the $μ(I)$ rheology, but are captured by Bagnold's stress scaling and an extended kinetic theory for granular flows. Our findings can help develop more predictive granular flow models in the future.

preprint, 2022, arXiv

Anomalous High-Field Magnetotransport in CaFeAsF due to the Quantum Hall Effect

CaFeAsF is an iron-based superconductor parent compound whose Fermi surface is quasi-two dimensional, composed of Dirac-electron and Schrödinger-hole cylinders elongated along the $c$ axis. We measured the longitudinal and Hall resistivities in CaFeAsF with the electrical current in the $ab$ plane in magnetic fields up to 45 T applied along the $c$ axis and obtained the corresponding conductivities via tensor inversion. We found that both the longitudinal and Hall conductivities approached zero above $\sim$40 T as the temperature was lowered to 0.4 K. Our analysis indicates that the Landau-level filling factor is $ν$ = 2 for both electrons and holes at these high field strengths, resulting in a total filling factor $ν$ = $ν_{hole} - ν_{electron}$ = 0. We therefore argue that the $ν$ = 0 quantum Hall state emerges under these conditions.
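The tensor inversion mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes an isotropic in-plane 2x2 resistivity tensor with ρ_yy = ρ_xx and ρ_yx = −ρ_xy, and it defines σ_xy as the (x, y) element of the inverse matrix (sign conventions for the Hall conductivity differ between papers). The function name and the sample values are hypothetical.

```python
def invert_resistivity(rho_xx, rho_xy):
    """Invert the 2x2 in-plane resistivity tensor.

    Assumed form (illustrative convention, not taken from the paper):
        rho = [[ rho_xx, rho_xy],
               [-rho_xy, rho_xx]]
    Returns (sigma_xx, sigma_xy), the first row of the inverse matrix.
    """
    denom = rho_xx ** 2 + rho_xy ** 2
    if denom == 0:
        raise ValueError("resistivity tensor is singular")
    sigma_xx = rho_xx / denom
    sigma_xy = -rho_xy / denom
    return sigma_xx, sigma_xy


# Hypothetical sample values in arbitrary units.
sigma_xx, sigma_xy = invert_resistivity(3.0, 4.0)
print(sigma_xx, sigma_xy)
```

In this formula both conductivities become small whenever ρ_xx is much larger than |ρ_xy|, since σ_xx ≈ 1/ρ_xx and σ_xy ≈ −ρ_xy/ρ_xx² in that limit.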

preprint, 2022, arXiv

Biasing Like Human: A Cognitive Bias Framework for Scene Graph Generation

Scene graph generation is a sophisticated task because there is no specific recognition pattern (e.g., "looking at" and "near" have no conspicuous difference concerning vision, whereas "near" could occur between entities with different morphology). Thus some scene graph generation methods are trapped into predicting the most frequent relations because of capricious visual features and trivial dataset annotations. Therefore, recent works emphasized "unbiased" approaches that balance predictions for a more informative scene graph. However, humans' quick and accurate judgments over relations between numerous objects should be attributed to "bias" (i.e., experience and linguistic knowledge) rather than pure vision. To enhance the model capability, inspired by the "cognitive bias" mechanism, we propose a novel three-paradigm framework that simulates how humans incorporate the label linguistic features as guidance of vision-based representations to better mine hidden relation patterns and alleviate noisy visual propagation. Our framework is model-agnostic and applies to any scene graph model. Comprehensive experiments show that our framework outperforms baseline modules in several metrics with a minimal parameter increase and achieves new SOTA performance on the Visual Genome dataset.

preprint, 2022, arXiv

Exploiting Context Information for Generic Event Boundary Captioning

Generic Event Boundary Captioning (GEBC) aims to generate three sentences describing the status change for a given time boundary. Previous methods only process the information of a single boundary at a time and therefore make little use of video context information. To tackle this issue, we design a model that directly takes the whole video as input and generates captions for all boundaries in parallel. The model can learn the context information for each time boundary by modeling the boundary-boundary interactions. Experiments demonstrate the effectiveness of context information. The proposed method achieved a score of 72.84 on the test set, and we reached $2^{nd}$ place in this challenge. Our code is available at: https://github.com/zjr2000/Context-GEBC

preprint, 2022, arXiv

Lattice QCD calculation of $K\to \ellν_\ell \ell'^+ \ell'^-$ decay width

We develop a methodology for the computation of the $K\to \ellν_\ell \ell'^+ \ell'^-$ decay width using lattice QCD and present an exploratory study here. We use a scalar function method to account for the momentum dependence of the decay amplitude and adopt the infinite volume reconstruction method to reduce systematic errors such as the temporal truncation effects and the finite-volume effects. We then perform a four-body phase-space integral to obtain the decay width. The only remaining technical problem is the possible power-law finite-volume effects associated with the process of $K\toππ\ellν_\ell\to \ellν_\ell \ell'^+ \ell'^-$, where the intermediate state involves multiple hadrons. In this work, we use a gauge ensemble of twisted-mass fermions with a pion mass $m_π=352$ MeV and a nearly-physical kaon mass. With these kinematics, the $ππ$ pair in the intermediate state cannot be on shell, as $2m_π>m_K$, and the finite-volume effects associated with the $ππ$ state are exponentially suppressed. Using the methods developed above, we calculate the branching ratios for four channels of $K\to \ellν_\ell\ell'^+ \ell'^-$ and obtain results comparable to the experimental measurements and ChPT predictions. Our work demonstrates the capability of lattice QCD to improve the Standard Model prediction of the $K\to \ellν_\ell \ell'^+ \ell'^-$ decay width.
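The threshold condition quoted in this abstract can be checked with simple arithmetic. The abstract does not quote the ensemble's kaon mass, so the physical kaon mass of roughly 494 to 498 MeV is used here as an assumed stand-in for the "nearly-physical" value:

```latex
2 m_\pi = 2 \times 352~\text{MeV} = 704~\text{MeV} \;>\; m_K \approx 494\text{--}498~\text{MeV}
```

Under that assumption the two-pion threshold lies roughly 200 MeV above the kaon mass.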

preprint, 2022, arXiv

Multi-modal Visual Place Recognition in Dynamics-Invariant Perception Space

Visual place recognition is one of the essential and challenging problems in the field of robotics. In this letter, we explore, for the first time, the multi-modal fusion of semantic and visual modalities in a dynamics-invariant space to improve place recognition in dynamic environments. We achieve this by first designing a novel deep learning architecture to generate the static semantic segmentation and recover the static image directly from the corresponding dynamic image. We then leverage the spatial-pyramid-matching model to encode the static semantic segmentation into feature vectors. In parallel, the static image is encoded using the popular Bag-of-words model. On the basis of the above multi-modal features, we finally measure the similarity between the query image and target landmark by the joint similarity of their semantic and visual codes. Extensive experiments demonstrate the effectiveness and robustness of the proposed approach for place recognition in dynamic environments.

preprint, 2022, arXiv

Nonlinear stability of planar viscous shock wave to three-dimensional compressible Navier-Stokes equations

We prove the nonlinear stability of the planar viscous shock up to a time-dependent shift for the three-dimensional (3D) compressible Navier-Stokes equations under generic perturbations, in particular, without zero mass conditions. Moreover, the time-dependent shift function keeps the shock profile shape time-asymptotically. Our stability result is unconditional for the weak planar Navier-Stokes shock. Our proof is motivated by the $a$-contraction method (a kind of weighted $L^2$-relative entropy method) with time-dependent shift introduced in [10,11,13] for the stability of viscous shock in the one-dimensional (1D) case. Instead of the classical anti-derivative techniques, we perform the stability analysis of the planar Navier-Stokes shock in the original $H^2$-perturbation framework, and therefore zero mass conditions are not needed, which, in turn, brings out the essential difficulties due to the compressibility of the viscous shock. Furthermore, compared with the 1D case, there are additional difficulties coming from the wave propagation along the multi-dimensional transverse directions and their interactions with the viscous shock. To overcome these difficulties, a multi-dimensional version of the sharp weighted Poincaré inequality (see Lemma 3.1), $a$-contraction techniques with time-dependent shift, and some essential physical structures of the multi-dimensional Navier-Stokes system are fully used.

preprint, 2022, arXiv

Semantic-Aware Pretraining for Dense Video Captioning

This report describes the details of our approach for the event dense-captioning task in ActivityNet Challenge 2021. We present a semantic-aware pretraining method for dense video captioning, which empowers the learned features to recognize high-level semantic concepts. Diverse video features of different modalities are fed into an event captioning module to generate accurate and meaningful sentences. Our final ensemble model achieves a 10.00 METEOR score on the test set.

preprint, 2022, arXiv

Topological frequency shift of quantum oscillation in CaFeAsF

Guo, Alexandradinata, et al. have recently proposed that quantum-oscillation frequencies from Dirac/Weyl fermions exhibit a negative shift proportional to $T^2$ because of the energy dependence of the effective mass peculiar to a linear band-dispersion. We have measured Shubnikov–de Haas oscillation in CaFeAsF up to $T$ = 9 K. The frequency of the $α$ Dirac electron exhibits a negative shift with increasing $T$, while that of the $β$ Schrödinger hole does not. For $T \geqslant 5$ K where $β$ is negligible, the $α$-frequency shift is proportional to $T^2$ and its rate agrees with the theoretical prediction within experimental accuracy. At lower temperatures, the shifts of $α$ and $β$ deviate from theoretical expectations, which we ascribe to the inaccuracy in the frequency determination due to unfavorable interference between frequencies. Our results confirm that the topological frequency shift can be utilized to identify Dirac/Weyl fermions when quantum-oscillation frequencies can be determined accurately.

preprint, 2022, arXiv

Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images

Semantic segmentation from very fine resolution (VFR) urban scene images plays a significant role in several application scenarios including autonomous driving, land cover classification, and urban planning. However, the tremendous details contained in the VFR image, especially the considerable variations in scale and appearance of objects, severely limit the potential of the existing deep learning approaches. Addressing such issues represents a promising research field in the remote sensing community, which paves the way for scene-level landscape pattern analysis and decision making. In this paper, we propose a Bilateral Awareness Network (BANet) which contains a dependency path and a texture path to fully capture the long-range relationships and fine-grained details in VFR images. Specifically, the dependency path is built on ResT, a novel Transformer backbone with memory-efficient multi-head self-attention, while the texture path is built on the stacked convolution operation. In addition, using the linear attention mechanism, a feature aggregation module is designed to effectively fuse the dependency features and texture features. Extensive experiments conducted on three large-scale urban scene image segmentation datasets, i.e., the ISPRS Vaihingen dataset, the ISPRS Potsdam dataset, and the UAVid dataset, demonstrate the effectiveness of our BANet. Specifically, a 64.6% mIoU is achieved on the UAVid dataset. Code is available at https://github.com/WangLibo1995/GeoSeg.
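The mIoU figure quoted in this abstract is a mean intersection-over-union score. A minimal sketch of the standard definition is shown below, assuming flat lists of integer class labels and averaging only over classes that occur in either the ground truth or the prediction. The function name and the toy labels are hypothetical, and the benchmark's exact evaluation protocol (for example, which classes are ignored) is not stated in the abstract.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union across classes.

    For each class c: IoU_c = TP_c / (TP_c + FP_c + FN_c).
    Classes absent from both y_true and y_pred are skipped.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # pixel wrongly assigned to class p
            fn[t] += 1  # pixel of class t that was missed
    ious = []
    for c in range(num_classes):
        union = tp[c] + fp[c] + fn[c]
        if union > 0:
            ious.append(tp[c] / union)
    if not ious:
        raise ValueError("no class present in the inputs")
    return sum(ious) / len(ious)


# Toy example with two classes and four pixels.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(round(score, 4))
```

In the toy example class 0 has an IoU of 1/2 and class 1 an IoU of 2/3, so the mean is 7/12.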

preprint, 2022, arXiv

Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization

Ground-to-aerial geolocalization refers to localizing a ground-level query image by matching it to a reference database of geo-tagged aerial imagery. This is very challenging due to the huge perspective differences in visual appearances and geometric configurations between these two views. In this work, we propose a novel Transformer-guided convolutional neural network (TransGCNN) architecture, which couples CNN-based local features with Transformer-based global representations for enhanced representation learning. Specifically, our TransGCNN consists of a CNN backbone extracting a feature map from an input image and a Transformer head modeling global context from the CNN map. In particular, our Transformer head acts as a spatial-aware importance generator to select salient CNN features as the final feature representation. Such a coupling procedure allows us to leverage a lightweight Transformer network to greatly enhance the discriminative capability of the embedded features. Furthermore, we design a dual-branch Transformer head network to combine image features from multi-scale windows in order to improve details of the global feature representation. Extensive experiments on popular benchmark datasets demonstrate that our model achieves top-1 accuracy of 94.12% and 84.92% on CVUSA and CVACT_val, respectively, which outperforms the second-best baseline with less than 50% of the parameters and an almost 2x higher frame rate, therefore achieving a preferable accuracy-efficiency tradeoff.

preprint, 2022, arXiv

VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix

Existing vision-language pre-training (VLP) methods primarily rely on paired image-text datasets, which are either annotated with enormous human labor or crawled from the internet followed by elaborate data cleaning techniques. To reduce the dependency on well-aligned image-text pairs, it is promising to directly leverage large-scale text-only and image-only corpora. This paper proposes a data augmentation method, namely cross-modal CutMix (CMC), for implicit cross-modal alignment learning in unpaired VLP. Specifically, CMC transforms natural sentences from the textual view into a multi-modal view, where visually-grounded words in a sentence are randomly replaced by diverse image patches with similar semantics. The proposed CMC has several appealing properties. First, it enhances the data diversity while keeping the semantic meaning intact for tackling problems where the aligned data are scarce; second, by attaching cross-modal noise on uni-modal data, it guides models to learn token-level interactions across modalities for better denoising. Furthermore, we present a new unpaired VLP method, dubbed VLMixer, that integrates CMC with contrastive learning to pull together the uni-modal and multi-modal views for better instance-level alignments among different modalities. Extensive experiments on five downstream tasks show that VLMixer could surpass previous state-of-the-art unpaired VLP methods.

preprint, 2021, arXiv

A Comprehensive Survey on Local Differential Privacy Toward Data Statistics and Analysis

Collecting and analyzing massive data generated from smart devices has become increasingly pervasive in crowdsensing and is a building block for data-driven decision-making. However, extensive statistics and analysis of such data will seriously threaten the privacy of participating users. Local differential privacy (LDP) has been proposed as an excellent and prevalent privacy model with distributed architecture, which can provide strong privacy guarantees for each user while collecting and analyzing data. LDP ensures that each user's data is locally perturbed first on the client side and then sent to the server side, thereby protecting data from privacy leaks on both the client side and the server side. This survey presents a comprehensive and systematic overview of LDP with respect to privacy models, research tasks, enabling mechanisms, and various applications. Specifically, we first provide a theoretical summarization of LDP, including the LDP model, the variants of LDP, and the basic framework of LDP algorithms. Then, we investigate and compare the diverse LDP mechanisms for various data statistics and analysis tasks from the perspectives of frequency estimation, mean estimation, and machine learning. We also summarize practical LDP-based application scenarios. Finally, we outline several future research directions under LDP.
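The client-side perturbation and server-side frequency estimation described in this abstract can be illustrated with binary randomized response, a classic LDP mechanism. This is a minimal sketch and not a mechanism taken from the survey itself: each user reports the true bit with probability e^ε / (1 + e^ε) and the flipped bit otherwise, and the server debiases the observed mean. The function names, the seed, and the simulated population are hypothetical.

```python
import math
import random


def truth_probability(epsilon):
    """Probability of reporting the true bit under epsilon-LDP."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def perturb_bit(bit, epsilon, rng):
    """Client side: keep the bit with probability p, flip it otherwise."""
    p = truth_probability(epsilon)
    return bit if rng.random() < p else 1 - bit


def estimate_frequency(reports, epsilon):
    """Server side: unbiased estimate of the fraction of users holding 1.

    E[mean(reports)] = f * (2p - 1) + (1 - p), solved here for f.
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Simulated population: 30% of 20,000 users hold the bit 1.
rng = random.Random(0)
true_bits = [1] * 6000 + [0] * 14000
reports = [perturb_bit(b, 1.0, rng) for b in true_bits]
print(estimate_frequency(reports, 1.0))
```

A smaller ε gives stronger privacy but a noisier estimate, because the denominator 2p − 1 shrinks toward zero.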

preprint, 2020, arXiv

Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020

This technical report presents a brief description of our submission to the dense video captioning task of ActivityNet Challenge 2020. Our approach follows a two-stage pipeline: first, we extract a set of temporal event proposals; then we propose a multi-event captioning model to capture the event-level temporal relationships and effectively fuse the multi-modal information. Our approach achieves a 9.28 METEOR score on the test set.

preprint, 2020, arXiv

Elastoresistance measurements on CaKFe$_4$As$_4$ and KCa$_2$Fe$_4$As$_4$F$_2$ with the Fe site of $C_{2v}$ symmetry

We report resistance and elastoresistance measurements on (Ba$_{0.5}$K$_{0.5}$)Fe$_2$As$_2$, CaKFe$_4$As$_4$, and KCa$_2$Fe$_4$As$_4$F$_2$. The Fe-site symmetry is $D_{2d}$ in the first compound but $C_{2v}$ in the latter two, which lifts the degeneracy of the Fe $d_{xz}$ and $d_{yz}$ orbitals. The temperature dependence of the resistance and elastoresistance is similar among the three compounds. In particular, the [110] elastoresistance is enhanced with decreasing temperature irrespective of the Fe-site symmetry. This appears to be in conflict with recent Raman scattering studies on CaKFe$_4$As$_4$, which suggest the absence of nematic fluctuations. We consider possible ways of reconciliation and suggest that the present result is important in elucidating the origin of in-plane resistivity anisotropy in iron-based superconductors.

preprint, 2020, arXiv

Low temperature specific heat of 12442-type KCa$_2$Fe$_4$As$_4$F$_2$ single crystals

Low-temperature specific heat (SH) is measured for the 12442-type KCa$_2$Fe$_4$As$_4$F$_2$ single crystal under different magnetic fields. A clear SH jump with the height of $ΔC/T|_{T_c}$ = 130 mJ/mol K$^2$ is observed at the superconducting transition temperature $T_c$. It is found that the electronic SH coefficient $Δγ(H)$ increases quickly in the low-field region below 3 T and then increases considerably more slowly as the field rises further, which indicates a rather strong anisotropy or multi-gap feature with a small minimum in the superconducting gap(s). The temperature-dependent SH data indicate the presence of a $T^2$ term, which supplies further information and supports the picture of a line-nodal gap structure. Moreover, the onset point of the SH transition remains almost unchanged under fields as high as 9 T, which is similar to the behavior observed in cuprates and places this system in the middle between the BCS limit and the Bose-Einstein condensation.