Source author record

Jiansheng Chen

Jiansheng Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

astro-ph · Computer Vision · Artificial Intelligence · astro-ph.CO · Multimedia

Catalog footprint

What is connected

7 works

5 topics

4 close collaborators


Preprint · 2026 · arXiv

MHSA: A Lightweight Framework for Mitigating Hallucinations via Steered Attention in LVLMs

Large vision-language models (LVLMs) have achieved remarkable performance across diverse multimodal tasks, yet they continue to suffer from hallucinations, generating content that is inconsistent with the visual input. Prior work DHCP (Detecting Hallucinations by Cross-modal Attention Pattern) has explored hallucination detection from the perspective of cross-modal attention, but does not address hallucination mitigation. In this paper, we propose MHSA (Mitigating Hallucinations via Steered Attention), a lightweight framework that mitigates hallucinations by learning to correct cross-modal attention patterns in LVLMs. MHSA trains a simple three-layer MLP generator to produce corrected attention, guided by supervisory signals from the DHCP discriminator and the LVLM itself. During inference, MHSA mitigates both discriminative and generative hallucinations across various datasets and LVLMs by simply replacing the original cross-modal attention with the corrected one, without modifying any LVLM parameters. By extending cross-modal attention mechanisms from hallucination detection to hallucination mitigation, MHSA offers a novel perspective on hallucination research in LVLMs and helps enhance their reliability.
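The inference-time idea in this abstract (a small MLP produces corrected cross-modal attention that simply replaces the original, with no change to LVLM weights) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the class and function names, the hidden width, the random untrained weights, and the final softmax renormalization over image tokens are all assumptions, and the training loop driven by the DHCP discriminator and the LVLM is omitted entirely.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax so the corrected weights sum to 1
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def relu(xs):
    return [max(0.0, x) for x in xs]

def linear(weights, bias, xs):
    # weights holds one row per output unit
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

class AttentionCorrector:
    """Hypothetical three-layer MLP: maps one attention row over image tokens
    to a corrected row of the same length (weights here are untrained)."""

    def __init__(self, n_tokens, hidden=16, seed=0):
        rng = random.Random(seed)
        self.layers = [init_layer(n_tokens, hidden, rng),
                       init_layer(hidden, hidden, rng),
                       init_layer(hidden, n_tokens, rng)]

    def __call__(self, attn):
        h = attn
        for i, (w, b) in enumerate(self.layers):
            h = linear(w, b, h)
            if i < len(self.layers) - 1:  # ReLU between layers, none after the last
                h = relu(h)
        return softmax(h)  # assumed renormalization step

def replace_cross_modal_attention(attn_rows, corrector):
    # Hypothetical inference step: swap each attention row for its corrected
    # version; the LVLM's own parameters are never touched.
    return [corrector(row) for row in attn_rows]
```

In a real model the corrected rows would be written back into the attention computation of the chosen layers; how and where that happens in MHSA is not specified in the abstract.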

Preprint · 2023 · arXiv

Distribution Aligned Feature Clustering for Zero-Shot Sketch-Based Image Retrieval

Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a challenging cross-modal retrieval task. In prior work, retrieval is performed by ranking the gallery images by their distance to the query sketch. However, the domain gap and the zero-shot setting make it hard for neural networks to generalize. This paper tackles these challenges from a new perspective: exploiting gallery image features. We propose a Cluster-then-Retrieve (ClusterRetri) method that clusters the gallery images and uses the cluster centroids as proxies for retrieval. Furthermore, we propose a distribution alignment loss that aligns the image and sketch features with a common Gaussian distribution, reducing the domain gap. Despite its simplicity, our method outperforms state-of-the-art methods by a large margin on popular datasets, e.g., with relative mAP@all improvements of up to 31% on Sketchy and 39% on TU-Berlin.
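The cluster-then-retrieve idea can be illustrated with plain k-means over gallery features, ranking images by how close their cluster centroid is to the query. This is a minimal sketch under stated assumptions: features are small toy vectors rather than learned embeddings, Euclidean distance is assumed, and the within-cluster tie-break by each image's own distance is a hypothetical choice; the distribution alignment loss is not modeled.

```python
import math
import random

def dist(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centroids):
    # Index of the centroid closest to `point`
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))

def mean(points):
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]

def kmeans(points, k, iters=20, seed=0):
    # Plain Lloyd's k-means; an empty cluster keeps its previous centroid
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

def cluster_then_retrieve(query, gallery, k, seed=0):
    """Return gallery indices ranked by the distance from the query to each
    image's cluster centroid (the centroid acts as the retrieval proxy)."""
    centroids, labels = kmeans(gallery, k, seed=seed)
    cluster_rank = sorted(range(k), key=lambda c: dist(query, centroids[c]))
    position = {c: r for r, c in enumerate(cluster_rank)}
    return sorted(range(len(gallery)),
                  key=lambda i: (position[labels[i]], dist(query, gallery[i])))
```

With two well-separated groups of gallery features, a query near the second group retrieves all of that group's images before any image from the first.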

Preprint · 2020 · arXiv

Teacher-Critical Training Strategies for Image Captioning

Existing image captioning models are usually trained with a cross-entropy (XE) loss and reinforcement learning (RL), which set ground-truth words as hard targets and force the captioning model to learn from them. However, these widely adopted training strategies suffer from misalignment in XE training and inappropriate reward assignment in RL training. To tackle these problems, we introduce a teacher model that serves as a bridge between the ground-truth caption and the caption model by generating easier-to-learn word proposals as soft targets. The teacher model is constructed by incorporating the ground-truth image attributes into the baseline caption model. To learn effectively from the teacher model, we propose Teacher-Critical Training Strategies (TCTS) for both XE and RL training, which facilitate better learning for the caption model. Experimental evaluations of several widely adopted caption models on the MSCOCO benchmark show that TCTS comprehensively improves most evaluation metrics, especially the BLEU and ROUGE-L scores, in both training stages. TCTS achieves the best single-model BLEU-4 and ROUGE-L scores published to date, 40.2% and 59.4% respectively, on the MSCOCO Karpathy test split. Our code and pre-trained models will be open-sourced.
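The contrast between hard ground-truth targets and a teacher's soft word proposals can be seen in a generic soft-target cross-entropy. This is a sketch in the spirit of the abstract, not the TCTS loss itself: mixing the one-hot ground truth with the teacher distribution through a weight `alpha` is a hypothetical choice, and the RL-stage reward assignment is not modeled.

```python
import math

def soft_target_xe(pred_probs, gt_index, teacher_probs, alpha):
    """Cross-entropy of the caption model's word distribution against a soft
    target: alpha * one-hot(ground truth) + (1 - alpha) * teacher distribution.
    alpha = 1 recovers ordinary hard-target XE."""
    target = [(1.0 - alpha) * t for t in teacher_probs]
    target[gt_index] += alpha
    # Skip zero-weight terms so log(p) is only taken where the target is nonzero
    return -sum(q * math.log(p) for q, p in zip(target, pred_probs) if q > 0)
```

With `alpha = 1.0` the loss reduces to `-log(pred_probs[gt_index])`; smaller values let plausible alternative words proposed by the teacher share the credit.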

Preprint · 2010 · arXiv

Determination of fundamental properties of an M31 globular cluster from main-sequence photometry

M31 globular cluster B379 is the first extragalactic cluster whose age was determined from main-sequence photometry. In this method, the age of a cluster is obtained by fitting its color-magnitude diagram (CMD) with stellar evolutionary models. However, different stellar evolutionary models adopt different input physics, such as the range of stellar masses, opacities, equations of state, and other recipes. It is therefore worth checking whether different stellar evolutionary models give consistent results for the same cluster. Brown et al. (2004a) constrained the age of B379 by comparing its CMD with isochrones of the 2006 VandenBerg models. Using the BC03 simple stellar population (SSP) models and multicolor photometry, Ma et al. (2007) independently determined the age of B379, in good agreement with Brown et al. (2004a). Because the BC03 models are computed from the Padova evolutionary tracks, it is necessary to check whether an age for B379 determined directly from the Padova tracks agrees with that of Brown et al. (2004a). In this paper, we therefore re-determine its age using isochrones of the Padova stellar evolutionary models, and we also determine the metal abundance, distance modulus, and reddening of B379. Our results are consistent with previous determinations, including the age obtained by Brown et al. (2004a). This confirms that the Padova isochrones and the 2006 VandenBerg isochrones give a consistent age scale for B379, so the comparison between Brown et al. (2004a) and Ma et al. (2007) is meaningful. We obtain a metallicity of [M/H] = -0.325, an age of $\tau = 11.0 \pm 1.5$ Gyr, a reddening of $E(B-V) = 0.08$, and a distance modulus of $(m-M)_0 = 24.44 \pm 0.10$.
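The quoted distance modulus and reddening translate into physical quantities through standard relations. The worked numbers below are a quick illustration derived here, not values stated in the paper; the visual extinction additionally assumes the common Galactic ratio $R_V = 3.1$.

```latex
d = 10^{[(m-M)_0 + 5]/5}\ \mathrm{pc}
  = 10^{(24.44 + 5)/5}\ \mathrm{pc}
  = 10^{5.888}\ \mathrm{pc}
  \approx 7.73 \times 10^{5}\ \mathrm{pc} \approx 773\ \mathrm{kpc}

\frac{\delta d}{d} = \frac{\ln 10}{5}\,\delta(m-M)_0
  \approx 0.4605 \times 0.10 \approx 0.046
  \;\Rightarrow\; \delta d \approx 36\ \mathrm{kpc}

A_V = R_V\,E(B-V) \approx 3.1 \times 0.08 \approx 0.25\ \mathrm{mag}
  \quad (\text{assuming } R_V = 3.1)
```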

Preprint · 2006 · arXiv

Optical Monitoring of BL Lacertae Object S5 0716+714 with a Novel Multi-Peak Interference Filter

We first introduce a novel photometric system consisting of a Schmidt telescope, an objective prism, a CCD camera, and, most importantly, a multi-peak interference filter. The multi-peak interference filter transmits light in multiple passbands simultaneously. Light in different passbands is differentially refracted by the objective prism and focused onto separate positions on the CCD, so each object produces multiple "images" on every CCD frame. This system allows us to monitor blazars strictly simultaneously in multiple wavebands with a single telescope and to trace color changes accurately during variations. We used it to monitor the BL Lacertae object S5 0716+714 in January and February 2006, achieving very high temporal resolution. The object was very bright and very active during this period. Two strong flares were observed, with variation amplitudes of about 0.8 and 0.6 mag in the $V'$ band, respectively. Strong bluer-when-brighter correlations were found for both internight and intranight variations. No apparent time lag was observed between the $V'$- and $R'$-band variations, and the observed bluer-when-brighter chromatism may be mainly attributed to the larger variation amplitude at shorter wavelengths. In addition to the bluer-when-brighter trend, the object also appeared bluer when it was more active. The observed variability and color behavior are consistent with the shock-in-jet model.
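A bluer-when-brighter trend shows up as a positive correlation between the $V'$ magnitude and the $V'-R'$ color index: as the source brightens (smaller magnitude), the color index drops (bluer). The sketch below computes a plain Pearson coefficient on hypothetical, simultaneous $V'$ and $R'$ magnitudes; it is an illustration of the diagnostic, not the paper's analysis pipeline, and the input values are invented toy data.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def color_magnitude_trend(v_mag, r_mag):
    """Correlation between V' magnitude and V'-R' color.
    A positive value means the source is bluer when brighter."""
    color = [v - r for v, r in zip(v_mag, r_mag)]
    return pearson(v_mag, color)
```

In the toy data used to check it, $V'$ varies by 0.6 mag while $R'$ varies by only 0.45 mag with no lag, which by itself produces the bluer-when-brighter correlation, mirroring the explanation in the abstract that a larger amplitude at shorter wavelengths drives the chromatism.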

Preprint · 2005 · arXiv

Metallicity Estimates for Old Star Clusters in M33

Using the BC96 theoretical stellar population synthesis models, Kong et al. (2003) showed that some BATC colors and color indices can be used to disentangle the effects of age and metallicity. They found a strong relation between the flux ratio $L_{8510}/L_{9170}$ and metallicity for stellar populations older than 1 Gyr. In this paper, based on the results of Kong et al. and on the multicolor spectrophotometry of Ma et al. (2001, 2002a,b,c), we estimate the metallicities of 31 old star clusters in the nearby spiral galaxy M33, 23 of which are "true" globular clusters. The results show that most of these old clusters are metal-poor. We also find that the ages and metal abundances of these old M33 star clusters do not vary with deprojected radial position.

Preprint · 2004 · arXiv

Multicolor Photometric Observations of Optical Candidates to Faint ROSAT X-ray Sources in a 1 deg$^2$ field of the BATC Survey

We present optical candidates for 75 X-ray sources in a $\sim 1$ deg$^2$ region that overlaps the medium-deep ROSAT survey. The candidates are selected using multicolor CCD imaging of the T329 field of the Beijing-Arizona-Taipei-Connecticut (BATC) Sky Survey. These X-ray sources are relatively faint ($\mathrm{CR} \ll 0.2\ \mathrm{s}^{-1}$), so most are not included in the RBS catalog; they also remained without optical candidates in a previous identification program carried out by the Hamburg Quasar Survey. Almost all of the X-ray sources have one or more spatially associated optical candidates within their position-error circles, down to $m_V \sim 23.1$. Using a SED-based Object Classification Approach (SOCA), we have classified 149 of the 156 optical candidates detected for 73 of the 75 X-ray sources. These optical candidates include 31 QSOs, 39 stars, 37 starburst galaxies, 42 galaxies, and 7 "just" visible objects. We have also cross-correlated the positions of these optical objects with NED, the FIRST radio source catalog, and the 2MASS catalog. Separately, we have SED-classified the remaining 6011 objects in our field of view. Optical objects are found in the error circles at the $6.5\sigma$ level above what would be expected from a random distribution, and only QSOs are over-represented in these error circles at greater than $4\sigma$. We estimate redshifts for all extragalactic objects and find good agreement between our predicted and measured redshifts (a mean error of 0.04 in $\Delta z$). There appears to be a supercluster at $z \sim 0.3$-$0.35$ in this direction, and many of the galaxies in the X-ray error circles lie in this redshift range.
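Two of the summary statistics above can be illustrated with simple formulas: the significance of an excess of objects in the error circles relative to a random expectation, and a mean redshift error. The abstract does not say how either was computed, so the Gaussian approximation to a Poisson excess, $(N_\mathrm{obs} - N_\mathrm{exp})/\sqrt{N_\mathrm{exp}}$, and the use of a mean absolute $\Delta z$ are assumptions; the function names and any numbers passed to them are hypothetical.

```python
import math

def excess_significance(n_observed, n_expected):
    """Gaussian approximation to the significance (in sigma) of an excess of
    counts over a Poisson expectation: (observed - expected) / sqrt(expected)."""
    if n_expected <= 0:
        raise ValueError("expected count must be positive")
    return (n_observed - n_expected) / math.sqrt(n_expected)

def mean_abs_dz(z_predicted, z_measured):
    # Mean absolute difference between photometric and spectroscopic redshifts
    diffs = [abs(p - m) for p, m in zip(z_predicted, z_measured)]
    return sum(diffs) / len(diffs)
```

For example, 25 objects found where 16 are expected by chance corresponds to a $2.25\sigma$ excess under this approximation; for small counts an exact Poisson tail probability would be more appropriate.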