Researcher profile

Hong Chang

Hong Chang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
7 topics
4 close collaborators

Published work

9 published items

preprint, 2026, arXiv

MATS: An Audio Language Model under Text-only Supervision

Large audio-language models (LALMs), built upon powerful large language models (LLMs), have exhibited remarkable audio comprehension and reasoning capabilities. However, training LALMs demands a large corpus of audio-language pairs, which incurs substantial costs in both data collection and training resources. In this paper, we propose MATS, an audio-language multimodal LLM designed to handle Multiple Audio tasks using solely Text-only Supervision. By leveraging pre-trained audio-language alignment models such as CLAP, we develop a text-only training strategy that projects the shared audio-language latent space into the LLM latent space, endowing the LLM with audio comprehension capabilities without relying on audio data during training. To further bridge the modality gap between audio and language embeddings within CLAP, we propose the Strongly-related noisy text with audio (Santa) mechanism. Santa maps audio embeddings into the CLAP language embedding space while preserving essential information from the audio input. Extensive experiments demonstrate that MATS, despite being trained exclusively on text data, achieves competitive performance compared to recent LALMs trained on large-scale audio-language pairs. The code is publicly available at https://github.com/wangwen-banban/MATS.

preprint, 2022, arXiv

Clothes-Changing Person Re-identification with RGB Modality Only

The key to addressing clothes-changing person re-identification (re-id) is to extract clothes-irrelevant features, e.g., face, hairstyle, body shape, and gait. Most current works focus on modeling body shape from multi-modality information (e.g., silhouettes and sketches), but do not make full use of the clothes-irrelevant information in the original RGB images. In this paper, we propose a Clothes-based Adversarial Loss (CAL) to mine clothes-irrelevant features from the original RGB images by penalizing the predictive power of the re-id model w.r.t. clothes. Extensive experiments demonstrate that, using RGB images only, CAL outperforms all state-of-the-art methods on widely used clothes-changing person re-id benchmarks. In addition, compared with images, videos contain richer appearance and additional temporal information, which can be used to model proper spatiotemporal patterns to assist clothes-changing re-id. Since there is no publicly available clothes-changing video re-id dataset, we contribute a new dataset named CCVID and show that there is much room for improvement in modeling spatiotemporal information. The code and new dataset are available at: https://github.com/guxinqian/Simple-CCReID.

preprint, 2022, arXiv

Efficient geodesics in the curve complex and their dot graphs

For the complex of curves of a closed orientable surface of genus $g$, $\mathcal{C}(S_{g>1})$, the notion of an efficient geodesic was introduced in arXiv:1408.4133. There it was established that there always exist (finitely many) efficient geodesics between any two vertices, $v_\alpha, v_\beta \in \mathcal{C}(S_g)$, representing homotopy classes of simple closed curves, $\alpha, \beta \subset S_g$. The main tool used in establishing the existence of efficient geodesics was a dot graph, a bookkeeping scheme for recording the intersection pattern of a reference arc, $\gamma \subset S_g$, with the simple closed curves associated with the vertices of a geodesic path in the zero skeleton, $\mathcal{C}^0(S_g)$. In particular, for an efficient geodesic between $v_\alpha$ and $v_\beta$ of length $d \geq 3$, it was shown that any curve corresponding to the vertex that is distance one from $v_\alpha$ intersects any $\gamma$ at most $d - 2$ times. In this note we make a more expansive study of the characterizing "shape" of the dot graphs over the entire set of vertices in an efficient geodesic edge-path. The key takeaway of this study is that the shape of a dot graph for any efficient geodesic is contained within a spindle-shaped region. Since the Nielson-Thurston coordinates of any curve on $S_g$ are directly derived from its intersection number with finitely many reference arcs, spindle-shaped dot graphs control the coordinate behavior of curves associated with the vertices of an efficient geodesic.

preprint, 2022, arXiv

Floquet engineering Hz-Level Rabi Spectra in Shallow Optical Lattice Clock

Quantum metrology with ultra-high precision usually requires atoms prepared in an ultra-stable environment with well-defined quantum states. Thus, in optical lattice clock systems, deep lattice potentials are used to trap ultra-cold atoms. However, decoherence induced by Raman scattering and higher-order light shifts can be significantly reduced if atomic clocks are realized in shallow optical lattices. On the other hand, in such lattices, tunneling among different sites can cause additional dephasing and strong broadening of the Rabi spectrum. Here, in our experiment, we periodically drive a shallow $^{87}$Sr optical lattice clock. Counterintuitively, shaking the system can deform the broad spectral line into a sharp peak with a 5.4 Hz linewidth. With careful comparison between theory and experiment, we demonstrate that the Rabi frequency and the Bloch bands can be tuned simultaneously and independently. Our work not only provides a different idea for quantum metrology, such as building shallow optical lattice clocks in outer space, but also paves the way for quantum simulation of new phases of matter by engineering exotic spin-orbit couplings.

preprint, 2022, arXiv

Theoretical Calculation of the Quadratic Zeeman Shift Coefficient of the 3P0 clock state for Strontium Optical Lattice Clock

The quadratic Zeeman shift coefficient of the $^3P_0$ clock state for strontium is determined in theory and experiment. In theory, we derive the expression for the quadratic Zeeman shift of the $^3P_0$ clock state for $^{88}$Sr and $^{87}$Sr in the weak-magnetic-field approximation. Using multi-configuration Dirac-Hartree-Fock theory, the quadratic Zeeman shift coefficients are calculated. To verify the calculated results, the quadratic Zeeman shift coefficient of the $^3P_0$, $F=9/2$, $M_F=\pm 9/2$ clock state was measured in our $^{87}$Sr optical lattice clock. The calculated results, $C_2 = -23.38(5)$ MHz/T$^2$ for $^{88}$Sr and for the $^3P_0$, $F=9/2$, $M_F=\pm 9/2$ clock state of $^{87}$Sr, agree well with other experimental and theoretical values, especially the most accurate recent measurement. As the $^1S_0$, $F=9/2$, $M_F=\pm 5/2$ to $^3P_0$, $F=9/2$, $M_F=\pm 3/2$ transitions have been used as another clock transition because they are less sensitive to magnetic field noise, we also calculated the quadratic Zeeman shift coefficients for the other magnetic states.
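
To give a sense of the scale this coefficient implies, here is a short worked example. It assumes the usual definition in which the second-order shift of the clock state equals the coefficient times the square of the magnetic field, and it uses a hypothetical bias field of $B = 10^{-4}$ T (1 G), a value chosen only for illustration and not taken from the paper:

$$\Delta\nu^{(2)} = C_2 B^2 = \left(-23.38\ \mathrm{MHz/T^2}\right) \times \left(10^{-4}\ \mathrm{T}\right)^2 = -23.38 \times 10^{6} \times 10^{-8}\ \mathrm{Hz} \approx -0.234\ \mathrm{Hz}$$

Because the shift scales with $B^2$, halving the field in this example would reduce the shift by a factor of four.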

preprint, 2020, arXiv

Appearance-Preserving 3D Convolution for Video-based Person Re-identification

Due to imperfect person detection results and posture changes, temporal appearance misalignment is unavoidable in video-based person re-identification (ReID). In this case, 3D convolution may destroy the appearance representation of person video clips, which is harmful to ReID. To address this problem, we propose Appearance-Preserving 3D Convolution (AP3D), which is composed of two components: an Appearance-Preserving Module (APM) and a 3D convolution kernel. With APM aligning the adjacent feature maps at the pixel level, the following 3D convolution can model temporal information while maintaining the quality of the appearance representation. It is easy to combine AP3D with existing 3D ConvNets by simply replacing the original 3D convolution kernels with AP3Ds. Extensive experiments demonstrate the effectiveness of AP3D for video-based ReID, and the results on three widely used datasets surpass the state of the art. Code is available at: https://github.com/guxinqian/AP3D.

preprint, 2020, arXiv

Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training

Although two-stage object detectors have continuously advanced the state-of-the-art performance in recent years, the training process itself is far from crystal clear. In this work, we first point out the inconsistency problem between the fixed network settings and the dynamic training procedure, which greatly affects the performance. For example, the fixed label assignment strategy and regression loss function cannot fit the distribution change of proposals and thus are harmful to training high quality detectors. Consequently, we propose Dynamic R-CNN to adjust the label assignment criteria (IoU threshold) and the shape of the regression loss function (parameters of SmoothL1 Loss) automatically based on the statistics of proposals during training. This dynamic design makes better use of the training samples and pushes the detector to fit more high quality samples. Specifically, our method improves upon the ResNet-50-FPN baseline by 1.9% AP and 5.5% AP$_{90}$ on the MS COCO dataset with no extra overhead. Code and models are available at https://github.com/hkzhang95/DynamicRCNN.
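
The idea of adjusting the IoU threshold and the SmoothL1 parameter from proposal statistics can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `DynamicSettings`, its parameters `top_k` and `update_every`, and the specific statistics used (the k-th largest IoU and the k-th smallest absolute error per batch) are assumptions made for this example, and the abstract does not specify them. Only the standard SmoothL1 form with a `beta` switch point is a well-known definition.

```python
from statistics import mean, median
from typing import List


def smooth_l1(error: float, beta: float) -> float:
    """SmoothL1 loss for one regression error; beta is the quadratic-to-linear switch point."""
    abs_error = abs(error)
    if abs_error < beta:
        return 0.5 * abs_error * abs_error / beta
    return abs_error - 0.5 * beta


class DynamicSettings:
    """Hypothetical tracker that re-estimates the IoU threshold and beta from proposal statistics."""

    def __init__(self, iou_threshold: float = 0.5, beta: float = 1.0,
                 top_k: int = 2, update_every: int = 2) -> None:
        self.iou_threshold = iou_threshold
        self.beta = beta
        self.top_k = top_k                # assumed rank used to pick the per-batch statistic
        self.update_every = update_every  # assumed number of batches between updates
        self._iou_stats: List[float] = []
        self._error_stats: List[float] = []

    def observe(self, ious: List[float], errors: List[float]) -> None:
        """Record one non-empty batch of proposal IoUs and regression errors."""
        k_iou = min(self.top_k, len(ious)) - 1
        k_err = min(self.top_k, len(errors)) - 1
        # k-th largest IoU: rises as proposals improve during training.
        self._iou_stats.append(sorted(ious, reverse=True)[k_iou])
        # k-th smallest absolute error: shrinks as regression gets more accurate.
        self._error_stats.append(sorted(abs(e) for e in errors)[k_err])
        if len(self._iou_stats) >= self.update_every:
            self.iou_threshold = mean(self._iou_stats)
            self.beta = median(self._error_stats)
            self._iou_stats.clear()
            self._error_stats.clear()
```

In this sketch, a training loop would call `observe` once per batch, then use `iou_threshold` to label proposals as positive or negative and pass `beta` to `smooth_l1` when computing the box regression loss.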

preprint, 2020, arXiv

IAUnet: Global Context-Aware Feature Learning for Person Re-Identification

Person re-identification (reID) with CNN-based networks has achieved favorable performance in recent years. However, most existing CNN-based methods do not take full advantage of spatial-temporal context modeling. In fact, the global spatial-temporal context can greatly clarify local distractions to enhance the target feature representation. To comprehensively leverage the spatial-temporal context information, in this work we present a novel block, Interaction-Aggregation-Update (IAU), for high-performance person reID. First, a Spatial-Temporal IAU (STIAU) module is introduced. STIAU jointly incorporates two types of contextual interactions into a CNN framework for target feature learning. Here the spatial interactions learn to compute the contextual dependencies between different body parts of a single frame, while the temporal interactions capture the contextual dependencies between the same body parts across all frames. Furthermore, a Channel IAU (CIAU) module is designed to model the semantic contextual interactions between channel features to enhance the feature representation, especially for small-scale visual cues and body parts. Therefore, the IAU block enables the feature to incorporate the global spatial, temporal, and channel context. It is lightweight, end-to-end trainable, and can be easily plugged into existing CNNs to form IAUnet. The experiments show that IAUnet performs favorably against the state of the art on both image and video reID tasks and achieves compelling results on a general object categorization task. The source code is available at https://github.com/blue-blue272/ImgReID-IAnet.

preprint, 2020, arXiv

Temporal Complementary Learning for Video Person Re-Identification

This paper proposes a Temporal Complementary Learning Network that extracts complementary features of consecutive video frames for video person re-identification. First, we introduce a Temporal Saliency Erasing (TSE) module comprising a saliency erasing operation and a series of ordered learners. Specifically, for a given frame of a video, the saliency erasing operation drives the corresponding learner to mine new and complementary parts by erasing the parts activated by previous frames. In this way, diverse visual features can be discovered for consecutive frames and finally form an integral characteristic of the target identity. Furthermore, a Temporal Saliency Boosting (TSB) module is designed to propagate the salient information among video frames to enhance the salient feature. It is complementary to TSE, effectively alleviating the information loss caused by the erasing operation of TSE. Extensive experiments show that our method performs favorably against the state of the art. The source code is available at https://github.com/blue-blue272/VideoReID-TCLNet.
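
The erasing idea described above can be illustrated with a toy sketch. It is not the authors' implementation: the function `erase_and_mine` is hypothetical, each frame is reduced to a flat list of per-part activation scores, and "erasing" is modeled as zeroing the parts that earlier frames already selected. The actual module operates on CNN feature maps.

```python
from typing import List, Set


def erase_and_mine(frames: List[List[float]]) -> List[int]:
    """For each frame in order, pick the most activated part not used by earlier frames."""
    erased: Set[int] = set()
    picks: List[int] = []
    for activations in frames:
        # Erase (zero out) the parts that previous frames already activated.
        masked = [0.0 if i in erased else a for i, a in enumerate(activations)]
        # The learner for this frame attends to the strongest remaining part.
        best = max(range(len(masked)), key=masked.__getitem__)
        picks.append(best)
        erased.add(best)
    return picks
```

Without the erasing step, every frame in a clip with similar activations would select the same dominant part. With it, successive frames are pushed toward different parts, which is the complementary behavior the abstract describes.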