Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
15 works
0 followers
14 topics
4 close collaborators

Published work

15 published items

preprint 2026 arXiv

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent upon extensive human-annotated demonstrations, and models' capabilities are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and STEM fields, surpassing its counterparts trained via conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically harnessed to guide and enhance the reasoning capabilities of smaller models.

preprint 2026 arXiv

RA-CMF: Region-Adaptive Conditional MeanFlow for CT Image Reconstruction

The use of CT imaging is important for screening, diagnosis, therapy planning, and prognosis of lung cancers. Unfortunately, due to differences in imaging protocols and scanner models, CT images acquired by different means may show large differences in noise statistics, contrast, and texture. In this study, we develop a novel conditional MeanFlow pipeline for CT image reconstruction. We introduce a conditional MeanFlow network that models the enhancement trajectory by predicting image-conditioned flow fields given intermediate image states. The image enhancement network is trained with a MeanFlow consistency loss along with the image reconstruction loss. In order to provide an adaptive refinement process in terms of spatial location of enhancements, we integrate a regional reinforcement learning-driven policy network into our approach. The policy network receives information about the MeanFlow rollouts and provides predictions in terms of tile-wise refinement budgets, stopping criteria, and total budget allocation of enhancement processes. Our policy network is trained through reinforcement learning in a policy gradient framework, where the goal of the training reward is to maximize improvement of enhancements while minimizing unnecessary computations and avoiding instabilities. In this way, our approach combines conditional flow-based enhancement with reinforcement learning-based spatial enhancement control. This allows our approach to focus more attention on enhancing difficult areas while stabilizing areas already showing sufficient quality. Our results show high accuracy in the tumor ROI, with the average radiomic feature CCC being 0.96, an average PSNR of 31.30 $\pm$ 4.16, and average SSIM of 0.94 $\pm$ 0.07. Moreover, there is an improvement in the overall quality of images, with an average PSNR of 34.23 $\pm$ 1.71 and average SSIM of 0.95 $\pm$ 0.01.
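The PSNR figures quoted above follow the standard definition, PSNR = 10 log10(MAX^2 / MSE). A minimal sketch of that computation, assuming images are given as nested lists of intensities scaled to [0, max_value]; the function name and the scaling are illustrative assumptions, not details from the paper:

```python
import math


def psnr(reference, enhanced, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images.

    Assumption: pixel intensities lie in [0, max_value].
    """
    flat_ref = [p for row in reference for p in row]
    flat_enh = [p for row in enhanced for p in row]
    if len(flat_ref) != len(flat_enh):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_enh)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A uniform error of 0.1 on a [0, 1] intensity scale gives an MSE of 0.01 and therefore a PSNR of about 20 dB.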

preprint 2024 arXiv

Text2MDT: Extracting Medical Decision Trees from Medical Texts

Knowledge of the medical decision process, which can be modeled as medical decision trees (MDTs), is critical to building clinical decision support systems. However, current MDT construction methods rely heavily on time-consuming and laborious manual annotation. In this work, we propose a novel task, Text2MDT, to explore the automatic extraction of MDTs from medical texts such as medical guidelines and textbooks. We normalize the form of the MDT and create an annotated Text-to-MDT dataset in Chinese with the participation of medical experts. We investigate two different methods for the Text2MDT task: (a) an end-to-end framework that relies only on instruction tuning of a GPT-style large language model (LLM) to generate all the node information and tree structures; (b) a pipeline framework that decomposes the Text2MDT task into three subtasks. Experiments on our Text2MDT dataset demonstrate that: (a) the end-to-end method based on LLMs (7B parameters or larger) shows promising results and outperforms the pipeline method; (b) the chain-of-thought (CoT) prompting method (Wei et al., 2022) can improve the performance of the fine-tuned LLMs on the Text2MDT test set; (c) the lightweight pipeline method based on encoder-based pretrained models can perform comparably with LLMs with model complexity two orders of magnitude smaller. Our Text2MDT dataset is open-sourced at https://tianchi.aliyun.com/dataset/95414, and the source code is open-sourced at https://github.com/michael-wzhu/text2dt.

preprint 2022 arXiv

Cache-Augmented Inbatch Importance Resampling for Training Recommender Retriever

Recommender retrievers aim to rapidly retrieve a fraction of items from the entire item corpus in response to a user query, with the representative two-tower model trained with the log softmax loss. To train recommender retrievers efficiently on modern hardware, inbatch sampling, where the items in the mini-batch are shared as negatives to estimate the softmax function, has attracted growing interest. However, existing inbatch sampling strategies only correct the sampling bias of inbatch items with item frequency; they cannot distinguish between the user queries within the mini-batch and still incur significant bias from the softmax. In this paper, we propose Cache-Augmented Inbatch Importance Resampling (XIR) for training recommender retrievers, which not only offers different negatives to user queries with inbatch items, but also adaptively achieves a more accurate estimation of the softmax distribution. Specifically, XIR resamples items for the given mini-batch training pairs based on certain probabilities, where a cache of more frequently sampled items is adopted to augment the candidate item set, with the purpose of reusing historical informative samples. XIR makes it possible to sample query-dependent negatives based on inbatch items and to capture dynamic changes during model training, which leads to a better approximation of the softmax and further contributes to better convergence. Finally, we conduct experiments to validate the superior performance of the proposed XIR compared with competitive approaches.
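The resampling idea can be illustrated with generic importance resampling: if inbatch items arrive with probability q(i) (roughly their frequency), redrawing them with weights proportional to exp(s(u, i)) / q(i) yields negatives that approximately follow the query's softmax distribution. The sketch below is a simplified illustration under that assumption; it omits the cache, and the function names are hypothetical rather than taken from the paper.

```python
import math
import random


def resampling_weights(scores, log_proposal):
    """Normalized weights w_i proportional to exp(score_i - log q_i).

    scores: query-item scores s(u, i) for the inbatch items.
    log_proposal: log q(i), e.g. log item frequency (an assumption here).
    """
    logits = [s - lq for s, lq in zip(scores, log_proposal)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def resample_negatives(scores, log_proposal, k, rng):
    """Draw k query-dependent negative indices (with replacement)."""
    weights = resampling_weights(scores, log_proposal)
    return rng.choices(range(len(scores)), weights=weights, k=k)
```

Because the weights depend on the query's own scores, two queries in the same mini-batch generally receive different negatives, which is the property the abstract emphasizes.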

preprint 2022 arXiv

Fast Variational AutoEncoder with Inverted Multi-Index for Collaborative Filtering

Variational AutoEncoder (VAE) has been extended as a representative nonlinear method for collaborative filtering. However, the bottleneck of VAE lies in the softmax computation over all items, so that computing the loss and gradient for optimization takes time linear in the number of items. This hinders practical use, since real-world scenarios involve millions of items. Importance sampling is an effective approximation method, based on which the sampled softmax has been derived. However, existing methods usually use the uniform or popularity sampler as the proposal distribution, leading to a large bias in gradient estimation. To this end, we propose to decompose the inner-product-based softmax probability based on the inverted multi-index, leading to sublinear-time and highly accurate sampling. Based on the proposed proposal distribution, we develop a fast Variational AutoEncoder (FastVAE) for collaborative filtering. FastVAE can outperform the state-of-the-art baselines in terms of both sampling quality and efficiency according to experiments on three real-world datasets.

preprint 2022 arXiv

Learning 3D Mineral Prospectivity from 3D Geological Models Using Convolutional Neural Networks: Application to a Structure-controlled Hydrothermal Gold Deposit

Three-dimensional (3D) geological models are the typical and key data source in 3D mineral prospectivity modeling. Identifying prospectivity-informative predictor variables from the 3D geological models is a challenging and tedious task. Motivated by the ability of convolutional neural networks (CNNs) to learn intrinsic features, in this paper we present a novel method that leverages CNNs to learn 3D mineral prospectivity from the 3D geological models. By exploiting the learning ability of CNNs, the presented method allows for disentangling complex correlations with the mineralization and thus opens a door to circumventing the tedious work of designing the predictor variables. Specifically, to explore the unstructured 3D geological models with CNNs, whose input should be structured, we develop a 2D CNN framework in which the geometry of geological boundaries is compiled and reorganized into multi-channel images and fed into the CNN. This ensures effective and efficient training of CNNs while allowing the prospective model to approximate the ore-forming process. The presented method is applied to a typical structure-controlled hydrothermal deposit, the Dayingezhuang gold deposit in eastern China, where it is compared with prospectivity modeling methods that use hand-designed predictor variables. The results demonstrate that the presented method boosts the performance of 3D prospectivity modeling and reduces the workload and prospecting risk in the prediction of deep-seated orebodies.

preprint 2022 arXiv

Multivariate Sparse Group Lasso Joint Model for Radiogenomics Data

Radiogenomics is an emerging field in cancer research that combines medical imaging data with genomic data to predict patients' clinical outcomes. In this paper, we propose a multivariate sparse group lasso joint model to integrate imaging and genomic data for building prediction models. Specifically, we jointly consider two models: one regresses imaging features on genomic features, and the other regresses patients' clinical outcomes on genomic features. The regularization penalties through sparse group lasso allow incorporation of intrinsic group information, e.g. biological pathway and imaging category, to select both important intrinsic groups and important features within a group. To integrate information from the two models, in each model we introduce a weight in the penalty term of each individual genomic feature, where the weight is inversely correlated with the model coefficient of that feature in the other model. This weight gives a feature a higher chance of selection by one model if it is selected by the other model. Our model is applicable to both continuous and time-to-event outcomes. It also allows the use of two separate datasets to fit the two models, addressing the practical challenge that many genomic datasets do not have imaging data available. Simulations and real data analyses demonstrate that our method outperforms existing methods in the literature.
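To make the penalty structure concrete, here is a minimal sketch of a weighted sparse group lasso penalty for a single coefficient vector. It uses the common textbook form, lam * (alpha * weighted L1 + (1 - alpha) * sum over groups of sqrt(group size) * L2 norm). The exact form used in the paper is not given in the abstract, and the 1 / (|coefficient| + eps) weighting below is only one hypothetical way to make the weight decrease as the other model's coefficient grows.

```python
import math


def sparse_group_lasso_penalty(beta, groups, lam, alpha, weights=None):
    """Weighted sparse group lasso penalty (textbook form, assumed here).

    beta: list of coefficients.
    groups: list of index lists, e.g. one list per biological pathway.
    weights: optional per-feature weights on the L1 term.
    """
    if weights is None:
        weights = [1.0] * len(beta)
    # Feature-level sparsity: weighted L1 norm.
    l1_term = sum(w * abs(b) for w, b in zip(weights, beta))
    # Group-level sparsity: size-scaled L2 norm of each group.
    group_term = sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g))
        for g in groups
    )
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


def cross_model_weights(other_beta, eps=1e-3):
    """Hypothetical weights: smaller when the other model's coefficient is larger."""
    return [1.0 / (abs(b) + eps) for b in other_beta]
```

With these weights, a feature that the other model selects with a large coefficient is penalized less, which raises its chance of being selected in the current model.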

preprint 2021 arXiv

Automated Creative Optimization for E-Commerce Advertising

Advertising creatives are ubiquitous in E-commerce advertisements and aesthetic creatives may improve the click-through rate (CTR) of the products. Nowadays, smart advertisement platforms provide the function of compositing creatives based on source materials provided by advertisers. Since a great number of creatives can be generated, it is difficult to accurately predict their CTR given a limited amount of feedback. Factorization machine (FM), which models inner product interaction between features, can be applied for the CTR prediction of creatives. However, interactions between creative elements may be more complex than the inner product, and the FM-estimated CTR may be of high variance due to limited feedback. To address these two issues, we propose an Automated Creative Optimization (AutoCO) framework to model complex interaction between creative elements and to balance between exploration and exploitation. Specifically, motivated by AutoML, we propose one-shot search algorithms for searching effective interaction functions between elements. We then develop stochastic variational inference to estimate the posterior distribution of parameters based on the reparameterization trick, and apply Thompson Sampling for efficiently exploring potentially better creatives. We evaluate the proposed method with both a synthetic dataset and two public datasets. The experimental results show our method can outperform competing baselines with respect to cumulative regret. The online A/B test shows our method leads to a 7% increase in CTR compared to the baseline.
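The exploration step can be illustrated with the simplest form of Thompson Sampling. The sketch below swaps the paper's variational posterior over model parameters for an independent Beta-Bernoulli posterior per creative. That is a deliberate simplification, and all names here are illustrative.

```python
import random


def thompson_select(clicks, impressions, rng):
    """Pick the creative with the highest CTR sampled from Beta(1 + clicks, 1 + non-clicks)."""
    samples = [
        rng.betavariate(1 + c, 1 + (n - c))
        for c, n in zip(clicks, impressions)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(true_ctrs, rounds, seed=0):
    """Run a simulated campaign; true_ctrs are hypothetical ground-truth CTRs."""
    rng = random.Random(seed)
    clicks = [0] * len(true_ctrs)
    impressions = [0] * len(true_ctrs)
    for _ in range(rounds):
        arm = thompson_select(clicks, impressions, rng)
        impressions[arm] += 1
        if rng.random() < true_ctrs[arm]:  # simulated user click
            clicks[arm] += 1
    return clicks, impressions
```

Creatives with few impressions have wide posteriors and are occasionally sampled high, which drives exploration; as feedback accumulates, impressions concentrate on the creative with the best observed CTR.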

preprint 2021 arXiv

Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure

Ad creatives are one of the prominent mediums for online e-commerce advertisements. Ad creatives with enjoyable visual appearance may increase the click-through rate (CTR) of products. Ad creatives are typically handcrafted by advertisers and then delivered to the advertising platforms for advertisement. In recent years, advertising platforms have become capable of instantly compositing ad creatives with arbitrarily designated elements of each ingredient, so advertisers are only required to provide basic materials. While this makes things easier for advertisers, a great number of potential ad creatives can be composited, making it difficult to accurately estimate CTR for them given limited real-time feedback. To this end, we propose an Adaptive and Efficient ad creative Selection (AES) framework based on a tree structure. The tree structure on compositing ingredients enables dynamic programming for efficient ad creative selection on the basis of CTR. Due to limited feedback, the CTR estimator is usually of high variance. Exploration techniques based on Thompson sampling are widely used for reducing variances of the CTR estimator, alleviating feedback sparsity. Based on the tree structure, Thompson sampling is adapted with dynamic programming, leading to efficient exploration for potential ad creatives with the largest CTR. We finally evaluate the proposed algorithm on a synthetic dataset and a real-world dataset. The results show that our approach can outperform competing baselines in terms of convergence rate and overall CTR.
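A rough sketch of how a tree over ingredients enables dynamic programming: assume, purely for illustration, that a creative's score is a sum of per-element scores plus pairwise scores between each parent ingredient's element and each child ingredient's element. The best combination can then be found bottom-up without enumerating every creative. The additive score, the data layout, and the function name are assumptions, not the paper's formulation.

```python
def best_creative(tree, node_score, edge_score, root):
    """Select one element per ingredient to maximize an additive tree score.

    tree: dict ingredient -> list of child ingredients.
    node_score: dict ingredient -> list of scores, one per candidate element.
    edge_score: dict (parent, child) -> matrix [parent_element][child_element].
    Returns (best score, dict ingredient -> chosen element index).
    """
    best = {}  # best[node][i]: best subtree score if node uses element i
    arg = {}   # arg[(node, child, i)]: child's best element given i

    def solve(node):
        children = tree.get(node, [])
        for child in children:
            solve(child)
        best[node] = []
        for i, score in enumerate(node_score[node]):
            total = score
            for child in children:
                options = [
                    edge_score[(node, child)][i][j] + best[child][j]
                    for j in range(len(node_score[child]))
                ]
                j_star = max(range(len(options)), key=options.__getitem__)
                arg[(node, child, i)] = j_star
                total += options[j_star]
            best[node].append(total)

    def backtrack(node, i, choice):
        choice[node] = i
        for child in tree.get(node, []):
            backtrack(child, arg[(node, child, i)], choice)

    solve(root)
    root_choice = max(range(len(best[root])), key=best[root].__getitem__)
    choice = {}
    backtrack(root, root_choice, choice)
    return best[root][root_choice], choice
```

Each ingredient is visited once and each parent-child element pair is scored once, so the cost grows with the sum of pairwise table sizes rather than with the product of all candidate counts. Replacing the fixed scores with posterior samples would give a Thompson-style exploratory selection under the same assumed structure.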

preprint 2021 arXiv

Elliptic Quantum Curves of 6d SO(N) theories

We discuss supersymmetric defects in 6d $\mathcal{N}=(1,0)$ SCFTs with $\mathrm{SO}(N_c)$ gauge group and $N_c-8$ fundamental flavors. The codimension 2 and 4 defects are engineered by coupling the 6d gauge fields to charged free fields in four and two dimensions, respectively. We find that the partition function in the presence of the codimension 2 defect on $\mathbb{R}^4\times \mathbb{T}^2$ in the Nekrasov-Shatashvili limit satisfies an elliptic difference equation which quantizes the Seiberg-Witten curve of the 6d theory. The expectation value of the codimension 4 defect appearing in the difference equation is an even (under reflection) degree $N_c$ section over the elliptic curve when $N_c$ is even, and an odd section when $N_c$ is odd. We also find that RG-flows of the defects and the associated difference equations in the 6d $\mathrm{SO}(2N+1)$ gauge theories triggered by Higgs VEVs of KK-momentum states provide quantum Seiberg-Witten curves for $\mathbb{Z}_2$ twisted compactifications of the 6d $\mathrm{SO}(2N)$ gauge theories.

preprint 2020 arXiv

4d N=1 from 6d D-type N=(1,0)

Compactifications of 6d N=(1,0) SCFTs give rise to new 4d N=1 SCFTs and shed light on interesting dualities between such theories. In this paper we continue exploring this line of research by extending the class of compactified 6d theories to the D-type case. The simplest such 6d theory arises from D5 branes probing D-type singularities. Equivalently, this theory can be obtained from an F-theory compactification using "-2"-curves intersecting according to a D-type quiver. Our approach is two-fold. We start by compactifying the 6d SCFT on a Riemann surface and compute the central charges of the resulting 4d theory by integrating the 6d anomaly polynomial over the Riemann surface. As a second step, in order to find candidate 4d UV Lagrangians, there is an intermediate 5d theory that serves to construct 4d domain walls. These can be used as building blocks to obtain torus compactifications. In contrast to the A-type case, the vanishing of anomalies in the 4d theory turns out to be very restrictive and constrains the choices of gauge nodes and matter content severely. As a consequence, in this paper one has to resort to non-maximal boundary conditions for the 4d domain walls. However, the comparison to the 6d theory compactified on the Riemann surface becomes less tractable.

preprint 2020 arXiv

A Coalition-Based Communication Framework for Task-Driven Flying Ad-Hoc Networks

In this paper, we develop a task-driven networking framework for Flying Ad-hoc Networks (FANETs), where a coalition-based model is outlined. First, we present a brief survey to show the state-of-the-art studies on the intra-communication of unmanned aerial vehicle (UAV) swarms. The features and deficiencies of existing models are analyzed. To capture the task-driven requirement of the flying multi-agent system, a coalition-based framework is proposed. We discuss the composition, networking mode and the classification of data transmission. After that, the application scenario of UAV coalitions is given, where large-scale, distributed and highly dynamic characteristics greatly increase the difficulty of resource optimization for UAVs. To tackle the problem, we design an intelligence-based optimization architecture, which mainly includes the game model, machine learning and real-time decision. Under the guidance of game theory and machine learning, UAVs can make comprehensive decisions by combining the previous training results with their sensing, information interaction, and game strategies. Finally, a preliminary case and promising open issues of UAV coalitions are studied.

preprint 2020 arXiv

Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection

Co-saliency detection aims to discover the common and salient foregrounds from a group of relevant images. For this task, we present a novel adaptive graph convolutional network with attention graph clustering (GCAGC). Three major contributions have been made, and are experimentally shown to have substantial practical merits. First, we propose a graph convolutional network design to extract information cues to characterize the intra- and inter-image correspondence. Second, we develop an attention graph clustering algorithm to discriminate the common objects from all the salient foreground objects in an unsupervised fashion. Third, we present a unified framework with encoder-decoder structure to jointly train and optimize the graph convolutional network, attention graph cluster, and co-saliency detection decoder in an end-to-end manner. We evaluate our proposed GCAGC method on three co-saliency detection benchmark datasets (iCoseg, Cosal2015 and COCO-SEG). Our GCAGC method obtains significant improvements over the state of the art on most of them.

preprint 2020 arXiv

STAN-CT: Standardizing CT Image using Generative Adversarial Network

Computed tomography (CT) plays an important role in lung malignancy diagnostics and therapy assessment, and in facilitating precision medicine delivery. However, the use of personalized imaging protocols poses a challenge in large-scale cross-center CT image radiomic studies. We present an end-to-end solution called STAN-CT for CT image standardization and normalization, which effectively reduces discrepancies in image features caused by using different imaging protocols or using different CT scanners with the same imaging protocol. STAN-CT consists of two components: 1) a novel Generative Adversarial Network (GAN) model that is capable of effectively learning the data distribution of a standard imaging protocol with only a few rounds of generator training, and 2) an automatic DICOM reconstruction pipeline with systematic image quality control that ensures the generation of high-quality standard DICOM images. Experimental results indicate that the training efficiency and model performance of STAN-CT have been significantly improved compared to the state-of-the-art CT image standardization and normalization algorithms.

preprint 2020 arXiv

Video Saliency Prediction Using Enhanced Spatiotemporal Alignment Network

Due to a variety of motions across different frames, it is highly challenging to learn an effective spatiotemporal representation for accurate video saliency prediction (VSP). To address this issue, we develop an effective spatiotemporal feature alignment network tailored to VSP, mainly including two key sub-networks: a multi-scale deformable convolutional alignment network (MDAN) and a bidirectional convolutional Long Short-Term Memory (Bi-ConvLSTM) network. The MDAN learns to align the features of the neighboring frames to the reference one in a coarse-to-fine manner, which can handle various motions well. Specifically, the MDAN has a pyramidal feature hierarchy structure that first leverages deformable convolution (Dconv) to align the lower-resolution features across frames, and then aggregates the aligned features to align the higher-resolution features, progressively enhancing the features from top to bottom. The output of MDAN is then fed into the Bi-ConvLSTM for further enhancement, which captures the useful long-time temporal information along forward and backward timing directions to effectively guide attention orientation shift prediction under complex scene transformation. Finally, the enhanced features are decoded to generate the predicted saliency map. The proposed model is trained end-to-end without any intricate post-processing. Extensive evaluations on four VSP benchmark datasets demonstrate that the proposed method achieves favorable performance against state-of-the-art methods. The source code and all the results will be released.