Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
20works
0followers
18topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

20 published item(s)

preprint2026arXiv

Machine Learning for neutron source distributions

In light of the recent advancements in machine learning, we propose a novel approach to neutron source distribution estimation through the utilisation of probabilistic generative models. The estimation is based on a Monte Carlo particle list, which is only required during the training stage of the machine learning model. Once the source distribution has been learned, the model is independent of the original particle list, allowing for further sampling in an efficient, rapid, and memory-costless manner. The performance of various generative models is evaluated, including a variational autoencoder, a normalizing flow, a generative adversarial network, and a denoising diffusion model. These approaches are then compared to existing source distribution estimations, and the advantages and disadvantages of each approach are discussed. The results demonstrate that source distributions can be modeled through the use of probabilistic generative models, which paves the way for further advancements in this field.

preprint2026arXiv

Towards Unified Surgical Scene Understanding:Bridging Reasoning and Grounding via MLLMs

Surgical scene understanding is a cornerstone of computer-assisted intervention. While recent advances, particularly in surgical image segmentation, have driven progress, real-world clinical applications require a more holistic understanding that jointly captures procedural context, semantic reasoning, and precise visual grounding. However, existing approaches typically address these components in isolation, leading to fragmented representations and limited semantic consistency. To address this limitation, we propose SurgMLLM, a unified surgical scene understanding framework that bridges high-level reasoning and low-level visual grounding within a single model. Given surgical videos, SurgMLLM fine-tunes a multimodal large language model (MLLM) to support structured interpretability reasoning, which is used to jointly model phases, instrument-verb-target (IVT) triplets, and triplet-entity segmentation tokens. These tokens are then temporally aggregated and serve as prompts for a segmentation network, enabling accurate pixel-wise grounding of triplet instruments and targets. The entire framework is trained end-to-end with a unified objective that couples language-based reasoning supervision with visual grounding losses, promoting coherent cross-task learning and clinically consistent scene representations. To facilitate unified evaluation, we introduce CholecT45-Scene, extending CholecT45 dataset with 64,299 frames of pixel-level mask annotations for instruments and targets, aligned with existing triplet labels. Extensive experiments show that SurgMLLM significantly advances surgical scene understanding, improving the primary triplet recognition metric AP_IVT from 40.7% to 46.0% and consistently outperforming prior methods in phase recognition and segmentation. These results highlight the effectiveness of unified reasoning-and-grounding for reliable, context-aware surgical assistance.

preprint2026arXiv

WhiteTesseract: Reframing the Interpretation of Cultural Heritage through XR and Conversational AI

Cultural heritage exhibitions often struggle to sustain attention and support reflective engagement. Physical exhibitions rely on fixed interpretive aids that lack adaptability to individual backgrounds or curiosity, and their effectiveness depends heavily on a visitor's Personal Context, prior knowledge, and cultural literacy. Meanwhile, digital exhibitions prioritize convenience and accessibility but risk weakening the Physical and Social Contexts that define embodied cultural experience. WhiteTesseract addresses this gap by enabling in-situ interpretation through high-resolution XR and conversational AI. The system integrates spatial intelligence via artwork recognition to allow visitors to selectively reduce environmental distractions (via diminished reality) and engage in context-aware dialogue (via large language models). The goal is to preserve the richness of the physical and social environment while providing a flexible space for personal reflection, enhancing Personal Context without compromising physical authenticity. We deployed the system in a Claude Monet exhibition and conducted a controlled user study with 26 participants. Quantitative results showed that WhiteTesseract modulation significantly increased average viewing duration from 35.3 to 98.3 seconds (p < 0.001). Analysis of 529 visitor-AI interactions revealed that 60% extended beyond factual queries to include analytical, emotional, and comparative inquiries. These findings demonstrate how XR and AI can enrich the physical exhibition experience by supporting deeper, more personalized engagement without displacing the embodied value of cultural heritage. We discuss technical and social constraints for real-world deployment and limitations of our controlled setting.

preprint2022arXiv

A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model

Synthetic data construction of Grammatical Error Correction (GEC) for non-English languages relies heavily on human-designed and language-specific rules, which produce limited error-corrected patterns. In this paper, we propose a generic and language-independent strategy for multilingual GEC, which can train a GEC system effectively for a new non-English language with only two easy-to-access resources: 1) a pretrained cross-lingual language model (PXLM) and 2) parallel translation data between English and the language. Our approach creates diverse parallel GEC data without any language-specific operations by taking the non-autoregressive translation generated by PXLM and the gold translation as error-corrected sentence pairs. Then, we reuse PXLM to initialize the GEC model and pretrain it with the synthetic data generated by itself, which yields further improvement. We evaluate our approach on three public benchmarks of GEC in different languages. It achieves the state-of-the-art results on the NLPCC 2018 Task 2 dataset (Chinese) and obtains competitive performance on Falko-Merlin (German) and RULEC-GEC (Russian). Further analysis demonstrates that our data construction method is complementary to rule-based approaches.

preprint2022arXiv

GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning

Generalized Zero-Shot Learning (GZSL) aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes. It is a promising solution to take the advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes. However, due to the generation shifts, the synthesized samples by most existing methods may drift from the real distribution of the unseen data. To address this issue, we propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation. Specifically, we discover and address three potential problems that trigger the generation shifts, i.e., semantic inconsistency, variance collapse, and structure disorder. First, to enhance the reflection of the semantic information in the generated samples, we explicitly embed the semantic information into the transformation in each conditional affine coupling layer. Second, to recover the intrinsic variance of the real unseen features, we introduce a boundary sample mining strategy with entropy maximization to discover more difficult visual variants of semantic prototypes and hereby adjust the decision boundary of the classifiers. Third, a relative positioning strategy is proposed to revise the attribute embeddings, guiding them to fully preserve the inter-class geometric structure and further avoid structure disorder in the semantic space. Extensive experimental results on four GZSL benchmark datasets demonstrate that GSMFlow achieves the state-of-the-art performance on GZSL.

preprint2022arXiv

Local Spiral Structure Traced by Red Clump Stars

Using the cross-matched data of Gaia EDR3 and the 2MASS Point Source Catalog, a sample of RC stars with parallax accuracies better than 20% is identified and used to reveal the nearby spiral pattern traced by old stars. As shown in the overdensity distribution of RC stars, there is an arc-like feature extended from $l~\sim$ 90$^\circ$ to $\sim$ 243$^\circ$, which passed close to the Sun. This feature is probably an arm segment traced by old stars, indicating the Galaxy potential in the vicinity of the Sun. By comparing to the spiral arms depicted by young objects, we found that there are considerable offsets between the two different components of Galactic spiral arms. The spiral arm traced by RC stars tends to have a larger pitch angle, hence a more loose wound pattern.

preprint2022arXiv

Promoting Saliency From Depth: Deep Unsupervised RGB-D Saliency Detection

Growing interests in RGB-D salient object detection (RGB-D SOD) have been witnessed in recent years, owing partly to the popularity of depth sensors and the rapid progress of deep learning techniques. Unfortunately, existing RGB-D SOD methods typically demand large quantity of training images being thoroughly annotated at pixel-level. The laborious and time-consuming manual annotation has become a real bottleneck in various practical scenarios. On the other hand, current unsupervised RGB-D SOD methods still heavily rely on handcrafted feature representations. This inspires us to propose in this paper a deep unsupervised RGB-D saliency detection approach, which requires no manual pixel-level annotation during training. It is realized by two key ingredients in our training pipeline. First, a depth-disentangled saliency update (DSU) framework is designed to automatically produce pseudo-labels with iterative follow-up refinements, which provides more trustworthy supervision signals for training the saliency network. Second, an attentive training strategy is introduced to tackle the issue of noisy pseudo-labels, by properly re-weighting to highlight the more reliable pseudo-labels. Extensive experiments demonstrate the superior efficiency and effectiveness of our approach in tackling the challenging unsupervised RGB-D SOD scenarios. Moreover, our approach can also be adapted to work in fully-supervised situation. Empirical studies show the incorporation of our approach gives rise to notably performance improvement in existing supervised RGB-D SOD models.

preprint2022arXiv

Text Revision by On-the-Fly Representation Optimization

Text revision refers to a family of natural language generation tasks, where the source and target sequences share moderate resemblance in surface form but differentiate in attributes, such as text formality and simplicity. Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems, which rely on large-scale parallel training corpus. In this paper, we present an iterative in-place editing approach for text revision, which requires no parallel data. In this approach, we simply fine-tune a pre-trained Transformer with masked language modeling and attribute classification. During inference, the editing at each iteration is realized by two-step span replacement. At the first step, the distributed representation of the text optimizes on the fly towards an attribute function. At the second step, a text span is masked and another new one is proposed conditioned on the optimized representation. The empirical experiments on two typical and important text revision tasks, text formalization and text simplification, show the effectiveness of our approach. It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification, and gains better performance than strong unsupervised methods on text formalization \footnote{Code and model are available at \url{https://github.com/jingjingli01/OREO}}.

preprint2022arXiv

Water Maser Survey towards off-plane O-rich AGBs around the orbital plane of the Sagittarius Stellar Stream

A 22 GHz water maser survey was conducted towards 178 O-rich AGB stars with the aim of identifying maser emission associated with the Sagittarius stellar stream. In this survey, maser emissions were detected in 21 targets, of which 20 were new detections. We studied the Galactic distributions of H2O and SiO maser-traced AGBs towards the Sgr orbital plane, and found an elongated structure towards the (l, b)~(340, 40) direction. In order to verify its association with the Sagittarius tidal stream, we further studied the 3D motions of these sources, but found, kinematically, these maser-traced AGBs are still Galactic disc sources rather than Stream debris. In addition, we found a remarkable outward motion, ~50 km/s away from the Galactic center of these maser-traced AGBs, but with no systermatic lag of rotational speed which were reported in 2000 for solar neighborhood Miras.

preprint2021arXiv

Dual MINE-based Neural Secure Communications under Gaussian Wiretap Channel

Recently, some researches are devoted to the topic of end-to-end learning a physical layer secure communication system based on autoencoder under Gaussian wiretap channel. However, in those works, the reliability and security of the encoder model were learned through necessary decoding outputs of not only legitimate receiver but also the eavesdropper. In fact, the assumption of known eavesdropper&#39;s decoder or its output is not practical. To address this issue, in this paper we propose a dual mutual information neural estimation (MINE) based neural secure communications model. The security constraints of this method is constructed only with the input and output signal samples of the legal and eavesdropper channels and benefit that training the encoder is completely independent of the decoder. Moreover, since the design of secure coding does not rely on the eavesdropper&#39;s decoding results, the security performance would not be affected by the eavesdropper&#39;s decoding means. Numerical results show that the performance of our model is guaranteed whether the eavesdropper learns the decoder himself or uses the legal decoder.

preprint2021arXiv

Entropy-Based Uncertainty Calibration for Generalized Zero-Shot Learning

Compared to conventional zero-shot learning (ZSL) where recognising unseen classes is the primary or only aim, the goal of generalized zero-shot learning (GZSL) is to recognise both seen and unseen classes. Most GZSL methods typically learn to synthesise visual representations from semantic information on the unseen classes. However, these types of models are prone to overfitting the seen classes, resulting in distribution overlap between the generated features of the seen and unseen classes. The overlapping region is filled with uncertainty as the model struggles to determine whether a test case from within the overlap is seen or unseen. Further, these generative methods suffer in scenarios with sparse training samples. The models struggle to learn the distribution of high dimensional visual features and, therefore, fail to capture the most discriminative inter-class features. To address these issues, in this paper, we propose a novel framework that leverages dual variational autoencoders with a triplet loss to learn discriminative latent features and applies the entropy-based calibration to minimize the uncertainty in the overlapped area between the seen and unseen classes. Specifically, the dual generative model with the triplet loss synthesises inter-class discriminative latent features that can be mapped from either visual or semantic space. To calibrate the uncertainty for seen classes, we calculate the entropy over the softmax probability distribution from a general classifier. With this approach, recognising the seen samples within the seen classes is relatively straightforward, and there is less risk that a seen sample will be misclassified into an unseen class in the overlapped region. Extensive experiments on six benchmark datasets demonstrate that the proposed method outperforms state-of-the-art approaches.

preprint2021arXiv

Light Deflection under the Gravitational Field of Jupiter -- Testing General Relativity

We measured the relative positions between two pairs of compact extragalactic sources (CESs), J1925-2219 \& J1923-2104 (C1--C2) and J1925-2219 \& J1928-2035 (C1--C3) on 2020 October 23--25 and 2021 February 5 (totaling four epochs), respectively, using the Very Long Baseline Array (VLBA) at 15 GHz. Accounting for the deflection angle dominated by Jupiter, as well as the contributions from the Sun, planets other than Earth, the Moon and Ganymede (the most massive of the solar system&#39;s moons), our theoretical calculations predict that the dynamical ranges of the relative positions across four epochs in R.A. of the C1--C2 pair and C1--C3 pair are 841.2 and 1127.9 $μ$as, respectively. The formal accuracy in R.A. is about 20 $μ$as, but the error in Decl. is poor. The measured standard deviations of the relative positions across the four epochs are 51.0 and 29.7 $μ$as in R.A. for C1--C2 and C1--C3, respectively. These values indicate that the accuracy of the post-Newtonian relativistic parameter, $γ$, is $\sim 0.061$ for C1--C2 and $\sim 0.026$ for C1--C3. Combining the two CES pairs, the measured value of $γ$ is $0.984 \pm 0.037$, which is comparable to the latest published results for Jupiter as a gravitational lens reported by Fomalont \& Kopeikin, i.e., $1.01 \pm 0.03$.

preprint2020arXiv

44 GHz methanol masers: Observations toward 95 GHz methanol masers

We report a simultaneous 44 and 95 GHz class I methanol maser survey toward 144 sources from the 95 GHz class I methanol maser catalog. The observations were made with the three telescopes of the Korean very long baseline interferometry network operating in single-dish mode. The detection rates are 89% at 44 GHz and 77% at 95 GHz. There are 106 new discoveries at 44 GHz. Comparing the previous 95 GHz detections with new observations of the same transitions made using the Purple Mountain Observatory 13.7 m radio telescope shows no clear evidence of variability on a timescale of six years. Emission from the 44 and 95 GHz transitions shows strong correlations in peak velocity, peak flux density, and integrated flux density, indicating that they are likely cospatial. We found that the peak flux density ratio Spk,95/Spk,44 decreases as the 44 GHz peak flux density increases. We found that some class I methanol masers in our sample could be associated with infrared dark clouds, while others are associated with H II regions, indicating that some sources occur at an early stage of high-mass star formation, while others are located toward more evolved sources.

preprint2020arXiv

Accurate RGB-D Salient Object Detection via Collaborative Learning

Benefiting from the spatial cues embedded in depth images, recent progress on RGB-D saliency detection shows impressive ability on some challenge scenarios. However, there are still two limitations. One hand is that the pooling and upsampling operations in FCNs might cause blur object boundaries. On the other hand, using an additional depth-network to extract depth features might lead to high computation and storage cost. The reliance on depth inputs during testing also limits the practical applications of current RGB-D models. In this paper, we propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way, which solves those problems tactfully. The explicitly extracted edge information goes together with saliency to give more emphasis to the salient regions and object boundaries. Depth and saliency learning is innovatively integrated into the high-level feature learning process in a mutual-benefit manner. This strategy enables the network to be free of using extra depth networks and depth inputs to make inference. To this end, it makes our model more lightweight, faster and more versatile. Experiment results on seven benchmark datasets show its superior performance.

preprint2020arXiv

Dual-level Semantic Transfer Deep Hashing for Efficient Social Image Retrieval

Social network stores and disseminates a tremendous amount of user shared images. Deep hashing is an efficient indexing technique to support large-scale social image retrieval, due to its deep representation capability, fast retrieval speed and low storage cost. Particularly, unsupervised deep hashing has well scalability as it does not require any manually labelled data for training. However, owing to the lacking of label guidance, existing methods suffer from severe semantic shortage when optimizing a large amount of deep neural network parameters. Differently, in this paper, we propose a Dual-level Semantic Transfer Deep Hashing (DSTDH) method to alleviate this problem with a unified deep hash learning framework. Our model targets at learning the semantically enhanced deep hash codes by specially exploiting the user-generated tags associated with the social images. Specifically, we design a complementary dual-level semantic transfer mechanism to efficiently discover the potential semantics of tags and seamlessly transfer them into binary hash codes. On the one hand, instance-level semantics are directly preserved into hash codes from the associated tags with adverse noise removing. Besides, an image-concept hypergraph is constructed for indirectly transferring the latent high-order semantic correlations of images and tags into hash codes. Moreover, the hash codes are obtained simultaneously with the deep representation learning by the discrete hash optimization strategy. Extensive experiments on two public social image retrieval datasets validate the superior performance of our method compared with state-of-the-art hashing methods. The source codes of our method can be obtained at https://github.com/research2020-1/DSTDH

preprint2020arXiv

Hybrid-DNNs: Hybrid Deep Neural Networks for Mixed Inputs

Rapid development of big data and high-performance computing have encouraged explosive studies of deep learning in geoscience. However, most studies only take single-type data as input, frittering away invaluable multisource, multi-scale information. We develop a general architecture of hybrid deep neural networks (HDNNs) to support mixed inputs. Regarding as a combination of feature learning and target learning, the new proposed networks provide great capacity in high-hierarchy feature extraction and in-depth data mining. Furthermore, the hybrid architecture is an aggregation of multiple networks, demonstrating good flexibility and wide applicability. The configuration of multiple networks depends on application tasks and varies with inputs and targets. Concentrating on reservoir production prediction, a specific HDNN model is configured and applied to an oil development block. Considering their contributions to hydrocarbon production, core photos, logging images and curves, geologic and engineering parameters can all be taken as inputs. After preprocessing, the mixed inputs are prepared as regular-sampled structural and numerical data. For feature learning, convolutional neural networks (CNN) and multilayer perceptron (MLP) network are configured to separately process structural and numerical inputs. Learned features are then concatenated and fed to subsequent networks for target learning. Comparison with typical MLP model and CNN model highlights the superiority of proposed HDNN model with high accuracy and good generalization.

preprint2020arXiv

Multi-Feature Discrete Collaborative Filtering for Fast Cold-start Recommendation

Hashing is an effective technique to address the large-scale recommendation problem, due to its high computation and storage efficiency on calculating the user preferences on items. However, existing hashing-based recommendation methods still suffer from two important problems: 1) Their recommendation process mainly relies on the user-item interactions and single specific content feature. When the interaction history or the content feature is unavailable (the cold-start problem), their performance will be seriously deteriorated. 2) Existing methods learn the hash codes with relaxed optimization or adopt discrete coordinate descent to directly solve binary hash codes, which results in significant quantization loss or consumes considerable computation time. In this paper, we propose a fast cold-start recommendation method, called Multi-Feature Discrete Collaborative Filtering (MFDCF), to solve these problems. Specifically, a low-rank self-weighted multi-feature fusion module is designed to adaptively project the multiple content features into binary yet informative hash codes by fully exploiting their complementarity. Additionally, we develop a fast discrete optimization algorithm to directly compute the binary hash codes with simple operations. Experiments on two public recommendation datasets demonstrate that MFDCF outperforms the state-of-the-arts on various aspects.

preprint2020arXiv

Rethinking Generative Zero-Shot Learning: An Ensemble Learning Perspective for Recognising Visual Patches

Zero-shot learning (ZSL) is commonly used to address the very pervasive problem of predicting unseen classes in fine-grained image classification and other tasks. One family of solutions is to learn synthesised unseen visual samples produced by generative models from auxiliary semantic information, such as natural language descriptions. However, for most of these models, performance suffers from noise in the form of irrelevant image backgrounds. Further, most methods do not allocate a calculated weight to each semantic patch. Yet, in the real world, the discriminative power of features can be quantified and directly leveraged to improve accuracy and reduce computational complexity. To address these issues, we propose a novel framework called multi-patch generative adversarial nets (MPGAN) that synthesises local patch features and labels unseen classes with a novel weighted voting strategy. The process begins by generating discriminative visual features from noisy text descriptions for a set of predefined local patches using multiple specialist generative models. The features synthesised from each patch for unseen classes are then used to construct an ensemble of diverse supervised classifiers, each corresponding to one local patch. A voting strategy averages the probability distributions output from the classifiers and, given that some patches are more discriminative than others, a discrimination-based attention mechanism helps to weight each patch accordingly. Extensive experiments show that MPGAN has significantly greater accuracy than state-of-the-art methods.

preprint2020arXiv

Transferring Inter-Class Correlation

The Teacher-Student (T-S) framework is widely utilized in the classification tasks, through which the performance of one neural network (the student) can be improved by transferring knowledge from another trained neural network (the teacher). Since the transferring knowledge is related to the network capacities and structures between the teacher and the student, how to define efficient knowledge remains an open question. To address this issue, we design a novel transferring knowledge, the Self-Attention based Inter-Class Correlation (ICC) map in the output layer, and propose our T-S framework, Inter-Class Correlation Transfer (ICCT).

preprint2020arXiv

Unsupervised Text Generation by Learning from Search

In this work, we present TGLS, a novel framework to unsupervised Text Generation by Learning from Search. We start by applying a strong search algorithm (in particular, simulated annealing) towards a heuristically defined objective that (roughly) estimates the quality of sentences. Then, a conditional generative model learns from the search results, and meanwhile smooth out the noise of search. The alternation between search and learning can be repeated for performance bootstrapping. We demonstrate the effectiveness of TGLS on two real-world natural language generation tasks, paraphrase generation and text formalization. Our model significantly outperforms unsupervised baseline methods in both tasks. Especially, it achieves comparable performance with the state-of-the-art supervised methods in paraphrase generation.