Source author record

Jie Cao

Jie Cao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

27 works

20 topics

4 close collaborators


preprint, 2026, arXiv

AWPO: Enhancing Tool-Use of Large Language Models through Adaptive Integration of Reasoning Rewards

While Reinforcement Learning (RL) shows promise in training tool-use Large Language Models (LLMs) using verifiable outcome rewards, existing methods largely overlook the potential of reasoning rewards based on chain-of-thought quality for better tool utilization. Furthermore, naïvely combining reasoning and outcome rewards may yield suboptimal performance or conflict with the primary optimization objective. To address this, we propose Advantage-Weighted Policy Optimization (AWPO), a principled RL framework that adaptively integrates reasoning rewards into advantage estimation to improve tool-use performance. AWPO incorporates variance-aware gating and difficulty-aware weighting to adaptively modulate advantages from reasoning signals based on group-relative statistics, alongside a tailored clipping mechanism for stable optimization. Extensive experiments demonstrate that AWPO achieves state-of-the-art performance across standard tool-use benchmarks, significantly outperforming strong baselines and leading closed-source models in challenging multi-turn scenarios. Notably, with exceptional parameter efficiency, our 4B model surpasses Grok-4 by $16.0\%$ in multi-turn accuracy while preserving generalization capability on the out-of-distribution MMLU-Pro benchmark.
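
The abstract mentions modulating advantages "based on group-relative statistics" but does not give AWPO's formulas. The sketch below shows only the generic group-relative normalization used by GRPO-style methods, in which each rollout's reward is standardized against the other rollouts for the same prompt. The function name, the `eps` constant, and the sample rewards are illustrative assumptions. AWPO's variance-aware gating, difficulty-aware weighting, and clipping are not reproduced here.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the mean and std of its own group.

    Generic GRPO-style normalization, not AWPO's actual estimator.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical outcome rewards for four rollouts of the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group with identical rewards carries no learning signal: all zeros.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

When every rollout in a group earns the same reward, the advantages collapse to zero. That is one plausible reason a method would want an additional reasoning signal for such groups, although the abstract does not state AWPO's motivation in those terms.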

preprint, 2026, arXiv

HydroAgent: Closing the Gap Between Frontier LLMs and Human Experts in Hydrologic Model Calibration via Simulator-Grounded RL

Calibrating distributed hydrologic models is a critical bottleneck across operational water resources management: streamflow prediction, reservoir operation, drought monitoring, infrastructure design, and flood forecasting all depend on it. Each basin demands an expert to translate hydrograph signatures into adjustments of a high-dimensional parameter vector, and the resulting workflow does not transfer between watersheds. We ask: can frontier large language model (LLM) agents replace the human hydrologic modeler, and if not, what would it take? We benchmark nine frontier LLM agents (Claude Opus 4.6/4.7, Sonnet 4.6, GPT-5/5.4/5.4-pro, and Gemini 2.5-pro/3.1-pro/3-flash) on the operational CREST distributed hydrologic model used by the U.S. National Weather Service for flash-flood forecasting. Best-of-twenty-rounds Nash-Sutcliffe Efficiency (NSE) across four held-out gauges spanning 329 to 40,792 km² ranges from -0.16 (GPT-5.4) to 0.75 (Sonnet 4.6); the ceiling reproduces across all three vendors and capability tiers, with the strongest models concentrating in the 0.65-0.75 band, and no model reaches the human-expert reference except Opus-4.7 on one gauge. We argue this gap is not a parameter-count problem but a domain-grounding problem. We then propose HYDROAGENT, fine-tuning open-weight Qwen3-4B with supervised fine-tuning on 2,576 expert calibration trajectories and Group-Relative Policy Optimization using NSE as a verifiable reward from online CREST simulations, i.e., reinforcement learning with simulation feedback (RLSF). For Earth system science, a small domain-tuned policy with simulator-in-the-loop RL is a more compute-efficient and physically faithful path than scaling generic frontier models, and the multi-modal richness of Earth data (remote sensing, in-situ time series, and forecaster narrative) makes domain agents a leveraged direction for AI in physical science.
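
For readers unfamiliar with the reward signal named in this abstract, the following is a minimal sketch of the standard Nash-Sutcliffe Efficiency: one minus the ratio of the squared simulation error to the variance of the observations about their mean. The function name and sample series are illustrative. How HydroAgent wires this score into CREST simulations is not described in the abstract and is not shown.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - residual / variance


# Hypothetical streamflow observations (arbitrary units).
observed = [1.0, 2.0, 3.0, 4.0]

perfect = nash_sutcliffe_efficiency(observed, observed)      # exact match
mean_only = nash_sutcliffe_efficiency(observed, [2.5] * 4)   # predicts the mean
```

A perfect simulation scores 1, predicting the observed mean everywhere scores 0, and negative values (such as the -0.16 reported above) mean the simulation is worse than that mean predictor.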

preprint, 2026, arXiv

Incentive Mechanism Design for Resource Management in Satellite Networks: A Comprehensive Survey

Resource management is one of the challenges in satellite networks due to their high mobility, wide coverage, long propagation distances, and stringent constraints on energy, communication, and computation resources. Traditional resource allocation approaches rely only on hard and rigid system performance metrics. Meanwhile, incentive mechanisms, which are based on game theory and auction theory, investigate systems from the "economic" perspective in addition to the "system" perspective. In particular, incentive mechanisms can take into account the rationality and other behavior of human users, which guarantees the benefits/utility of all system entities, thereby improving scalability, adaptability, and fairness in resource allocation. This paper presents a comprehensive survey of incentive mechanism design for resource management in satellite networks. The paper covers key issues in satellite networks, such as communication resource allocation, computation offloading, privacy and security, and coordination. We conclude with future research directions including learning-based mechanism design for satellite networks.

preprint, 2026, arXiv

ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning

Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversity due to the over-incentivization of positive rewards. Although methods like Negative Sample Reinforcement (NSR) mitigate this issue by upweighting penalty from negative samples, they may suppress the semantic distributions shared between positive and negative responses. To boost reasoning ability without losing diversity, this paper proposes negative sample projection Residual Reinforcement Learning (ResRL) that decouples similar semantic distributions among positive and negative responses. We theoretically link Lazy Likelihood Displacement (LLD) to negative-positive head-gradient interference and derive a single-forward proxy that upper-bounds representation alignment to guide conservative advantage reweighting. ResRL then projects negative-token hidden representations onto an SVD-based low-rank positive subspace and uses projection residuals to modulate negative gradients, improving reasoning while preserving diversity and outperforming strong baselines on average across twelve benchmarks spanning Mathematics, Code, Agent Tasks, and Function Calling. Notably, ResRL surpasses NSR on mathematical reasoning by 9.4% in Avg@16 and 7.0% in Pass@128. Code is available at https://github.com/1229095296/ResRL.git.

preprint, 2025, arXiv

Do LLMs Encode Frame Semantics? Evidence from Frame Identification

We investigate whether large language models encode latent knowledge of frame semantics, focusing on frame identification, a core challenge in frame semantic parsing that involves selecting the appropriate semantic frame for a target word in context. Using the FrameNet lexical resource, we evaluate models under prompt-based inference and observe that they can perform frame identification effectively even without explicit supervision. To assess the impact of task-specific training, we fine-tune the model on FrameNet data, which substantially improves in-domain accuracy while generalizing well to out-of-domain benchmarks. Further analysis shows that the models can generate semantically coherent frame definitions, highlighting the model's internalized understanding of frame semantics.

preprint, 2025, arXiv

Introduction to the Chinese Space Station Survey Telescope (CSST)

The Chinese Space Station Survey Telescope (CSST) is an upcoming Stage-IV sky survey telescope, distinguished by its large field of view (FoV), high image quality, and multi-band observation capabilities. It can simultaneously conduct precise measurements of the Universe by performing multi-color photometric imaging and slitless spectroscopic surveys. The CSST is equipped with five scientific instruments, i.e. Multi-band Imaging and Slitless Spectroscopy Survey Camera (SC), Multi-Channel Imager (MCI), Integral Field Spectrograph (IFS), Cool Planet Imaging Coronagraph (CPI-C), and THz Spectrometer (TS). Using these instruments, CSST is expected to make significant contributions and discoveries across various astronomical fields, including cosmology, galaxies and active galactic nuclei (AGN), the Milky Way and nearby galaxies, stars, exoplanets, Solar System objects, astrometry, and transients and variable sources. This review aims to provide a comprehensive overview of the CSST instruments, observational capabilities, data products, and scientific potential.

preprint, 2025, arXiv

OUNLP at TSAR 2025 Shared Task: Multi-Round Text Simplifier via Code Generation

This paper describes the OUNLP system submitted to the TSAR-2025 Shared Task (Alva-Manchego et al., 2025), designed for readability-controlled text simplification using LLM-prompting-based generation. Based on an analysis of prompt-based text simplification methods, we found that text simplification performance is highly related to the gap between the source CEFR (Arase et al., 2022) level and the target CEFR level. Inspired by this finding, we propose two multi-round simplification methods and generate them via GPT-4o: rule-based simplification (MRS-Rule) and jointly rule-based LLM simplification (MRS-Joint). Our submitted systems ranked 7th out of 20 teams. Later improvements with MRS-Joint show that taking the LLM-simplified candidates as the starting point could further boost multi-round simplification performance.

preprint, 2024, arXiv

Goal-Oriented Communication, Estimation, and Control over Bidirectional Wireless Links

We consider a wireless networked control system (WNCS) with bidirectional imperfect links for real-time applications such as smart grids. To maintain the stability of the WNCS, captured by the probability that the plant state violates preset values, at minimal cost, heterogeneous physical processes are monitored by multiple sensors. This status information, such as dynamic plant state and Markov Process-based context information, is then received/estimated by the controller for remote control. However, scheduling multiple sensors and designing the controller with limited resources is challenging due to their coupling, delay, and transmission loss. We formulate a Constrained Markov Decision Problem (CMDP) to minimize violation probability with cost constraints. We reveal the relationship between the goal and different updating actions by analyzing the significance of information that incorporates goal-related usefulness and contextual importance. Subsequently, a goal-oriented deterministic scheduling policy is proposed. Two sensing-assisted control strategies and a control-aware estimation policy are proposed to improve the violation probability-cost tradeoff, integrated with the scheduling policy to form a goal-oriented co-design framework. Additionally, we explore retransmission in downlink transmission and qualitatively analyze its preference scenario. Simulation results demonstrate that the proposed goal-oriented co-design policy outperforms previous work in simultaneously reducing violation probability and cost.

preprint, 2024, arXiv

Goal-Oriented Integration of Sensing, Communication, Computing, and Control for Mission-Critical Internet-of-Things

Driven by the development goal of network paradigm and demand for various functions in the sixth-generation (6G) mission-critical Internet-of-Things (MC-IoT), we foresee a goal-oriented integration of sensing, communication, computing, and control (GIS3C) in this paper. We first provide an overview of the tasks, requirements, and challenges of MC-IoT. Then we introduce an end-to-end GIS3C architecture, in which goal-oriented communication is leveraged to bridge and empower sensing, communication, control, and computing functionalities. By revealing the interplay among multiple subsystems in terms of key performance indicators and parameters, this paper introduces unified metrics, i.e., task completion effectiveness and cost, to facilitate S3C co-design in MC-IoT. The preliminary results demonstrate the benefits of GIS3C in improving task completion effectiveness while reducing costs. We also identify and highlight the gaps and challenges in applying GIS3C in the future 6G networks.

preprint, 2024, arXiv

Risk-Aware and Energy-Efficient AoI Optimization for Multi-Connectivity WNCS with Short Packet Transmissions

Age of Information (AoI) has been proposed to quantify the freshness of information for emerging real-time applications such as remote monitoring and control in wireless networked control systems (WNCSs). Minimization of the average AoI and its outage probability can ensure timely and stable transmission. Energy efficiency (EE) also plays an important role in WNCSs, as many devices are low-cost and battery-limited. Multi-connectivity over multiple links enables a decrease in AoI, at the cost of energy. We tackle the unresolved problem of selecting the optimal number of connections that is both AoI-optimal and energy-efficient, while avoiding risky states. To address this issue, the average AoI and peak AoI (PAoI), as well as the PAoI violation probability, are formulated as functions of the number of connections. Then the EE-PAoI ratio is introduced to allow a tradeoff between AoI and energy, which is maximized by the proposed risk-aware, AoI-optimal and energy-efficient connectivity scheme. To obtain this, we analyze the properties of the formulated EE-PAoI ratio and prove the monotonicity of the PAoI violation probability. Interestingly, we reveal that the multi-connectivity scheme is not always preferable, and the signal-to-noise ratio (SNR) threshold that determines the selection of the multi-connectivity scheme is derived as a function of the coding rate. Also, the optimal number of connections is obtained and shown to be a decreasing function of the transmit power. Simulation results demonstrate that the proposed scheme achieves a more than 15-fold EE-PAoI gain over the single-connectivity scheme at low SNR.

preprint, 2023, arXiv

Bi-directional Feature Reconstruction Network for Fine-Grained Few-Shot Image Classification

The main challenge for fine-grained few-shot image classification is to learn feature representations with higher inter-class and lower intra-class variations from only a few labelled samples. Conventional few-shot learning methods, however, cannot be naively adopted for this fine-grained setting: a quick pilot study reveals that they in fact push for the opposite (i.e., lower inter-class variations and higher intra-class variations). To alleviate this problem, prior works predominantly use a support set to reconstruct the query image and then utilize metric learning to determine its category. Upon careful inspection, we further reveal that such unidirectional reconstruction methods only help to increase inter-class variations and are not effective in tackling intra-class variations. In this paper, we introduce for the first time a bi-reconstruction mechanism that can simultaneously accommodate inter-class and intra-class variations. In addition to using the support set to reconstruct the query set for increasing inter-class variations, we further use the query set to reconstruct the support set for reducing intra-class variations. This design effectively helps the model to explore more subtle and discriminative features, which is key for the fine-grained problem at hand. Furthermore, we also construct a self-reconstruction module to work alongside the bi-directional module to make the features even more discriminative. Experimental results on three widely used fine-grained image classification datasets consistently show considerable improvements compared with other methods. Codes are available at: https://github.com/PRIS-CV/Bi-FRN.

preprint, 2022, arXiv

Boundaries of capture hyperbolic components

In complex dynamics, the boundaries of higher dimensional hyperbolic components in holomorphic families of polynomials or rational maps are mysterious objects, whose topological and analytic properties are fundamental problems. In this paper, we show that in some typical families of polynomials (i.e. algebraic varieties defined by periodic critical relations), the boundary of a capture hyperbolic component $\mathcal H$ is homeomorphic to the sphere $S^{2\dim_\mathbb{C}(\mathcal{H})-1}$. Furthermore, we establish an unexpected identity for the Hausdorff dimension of $\partial \mathcal H$: $$\operatorname{H{.}dim}(\partial\mathcal{H}) = 2 \dim_\mathbb{C}(\mathcal{H})-2+\max_{f\in\partial\mathcal{H}} \operatorname{H{.}dim}(\partial A^J(f)),$$ where $A^J(f)$ is the union of the bounded attracting Fatou components of $f$ associated with the free critical points in the Julia set $J(f)$. In the proof, some new results of independent interest are obtained.

preprint, 2022, arXiv

Improving Subgraph Recognition with Variational Graph Information Bottleneck

Subgraph recognition aims at discovering a compressed substructure of a graph that is most informative to the graph property. It can be formulated by optimizing Graph Information Bottleneck (GIB) with a mutual information estimator. However, GIB suffers from training instability and degenerated results due to its intrinsic optimization process. To tackle these issues, we reformulate the subgraph recognition problem into two steps: graph perturbation and subgraph selection, leading to a novel Variational Graph Information Bottleneck (VGIB) framework. VGIB first employs noise injection to modulate the information flow from the input graph to the perturbed graph. Then, the perturbed graph is encouraged to be informative to the graph property. VGIB further obtains the desired subgraph by filtering out the noise in the perturbed graph. With the customized noise prior for each input, the VGIB objective is endowed with a tractable variational upper bound, leading to superior empirical performance as well as theoretical properties. Extensive experiments on graph interpretation, explainability of Graph Neural Networks, and graph classification show that VGIB finds better subgraphs than existing methods. Code is available at https://github.com/Samyu0304/VGIB

preprint, 2022, arXiv

Modeling the Social Influence of COVID-19 via Personalized Propagation with Deep Learning

Social influence prediction has permeated many domains, including marketing, behavior prediction, recommendation systems, and more. However, traditional methods of predicting social influence not only require domain expertise but also rely on extracting user features, which can be very tedious. Additionally, graph convolutional networks (GCNs), which deal with graph data in non-Euclidean space, are not directly applicable to Euclidean space. To overcome these problems, we extended DeepInf such that it can predict the social influence of COVID-19 via the transition probability of the PageRank domain. Furthermore, our implementation gives rise to a deep learning-based personalized propagation algorithm, called DeepPP. The resulting algorithm combines the personalized propagation of a neural prediction model with the approximate personalized propagation of a neural prediction model from PageRank analysis. Four social networks from different domains as well as two COVID-19 datasets were used to demonstrate the efficiency and effectiveness of the proposed algorithm. Compared to other baseline methods, DeepPP provides more accurate social influence predictions. Further, experiments demonstrate that DeepPP can be applied to real-world prediction data for COVID-19.
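
The abstract does not spell out DeepPP's update rule. As background only, the sketch below shows the generic approximate personalized-PageRank propagation used in "predict then propagate" models: predictions `h` are repeatedly smoothed over a normalized adjacency matrix `a_hat`, and at each step a fraction `alpha` of the original prediction is mixed back in. All names, the toy graph, and the parameter values are illustrative assumptions, not taken from the paper.

```python
def personalized_propagation(a_hat, h, alpha=0.1, steps=10):
    """Iterate z <- (1 - alpha) * a_hat @ z + alpha * h, starting from z = h."""
    n = len(a_hat)
    width = len(h[0])
    z = [row[:] for row in h]
    for _ in range(steps):
        z = [
            [
                (1 - alpha) * sum(a_hat[i][k] * z[k][c] for k in range(n))
                + alpha * h[i][c]
                for c in range(width)
            ]
            for i in range(n)
        ]
    return z


# Hypothetical two-node graph with a doubly stochastic normalized adjacency.
a_hat = [[0.5, 0.5],
         [0.5, 0.5]]
h = [[1.0], [0.0]]  # initial per-node predictions (one score per node)

z = personalized_propagation(a_hat, h, alpha=0.1, steps=10)
```

The restart term `alpha * h` keeps each node's output anchored to its own prediction, so node 0 stays above node 1 here even after many smoothing steps.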

preprint, 2022, arXiv

Point Cloud Semantic Segmentation using Multi Scale Sparse Convolution Neural Network

In recent years, with the development of computing resources and LiDAR, point cloud semantic segmentation has attracted many researchers. Although sparse convolution already addresses the sparsity of point clouds, it does not consider multi-scale features. In this letter, we propose a feature extraction module based on multi-scale sparse convolution and a feature selection module based on channel attention, and build a point cloud segmentation network framework on top of them. By introducing multi-scale sparse convolution, the network can capture richer feature information from convolution kernels of different sizes, improving point cloud segmentation results. Experimental results on the Stanford Large-Scale 3D Indoor Spaces (S3DIS) dataset and the outdoor SemanticKITTI dataset demonstrate the effectiveness and superiority of the proposed method.

preprint, 2022, arXiv

Styleverse: Towards Identity Stylization across Heterogeneous Domains

We propose a new challenging task, namely IDentity Stylization (IDS) across heterogeneous domains. IDS focuses on stylizing the content identity, rather than completely swapping it using the reference identity. We use an effective heterogeneous-network-based framework, Styleverse, that uses a single domain-aware generator to exploit the Metaverse of diverse heterogeneous faces, based on the proposed dataset FS13 with limited data. FS13 means 13 kinds of Face Styles considering diverse lighting conditions, art representations and life dimensions. Previous similar tasks, e.g., image style transfer, can handle textural style transfer based on a reference image. This task usually ignores the high structure-aware facial area and high-fidelity preservation of the content. However, Styleverse intends to controllably create topology-aware faces in the Parallel Style Universe, where the source facial identity is adaptively styled via AdaIN guided by the domain-aware and reference-aware style embeddings from heterogeneous pretrained models. We first establish the IDS quantitative benchmark as well as the qualitative Styleverse matrix. Extensive experiments demonstrate that Styleverse achieves higher-fidelity identity stylization compared with other state-of-the-art methods.
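
As background for the AdaIN step named in this abstract, here is a minimal single-channel sketch of adaptive instance normalization in its standard form: the content features are normalized to zero mean and unit variance, then rescaled and shifted to the style statistics. In this sketch the style mean and standard deviation are computed from a reference feature vector. According to the abstract, Styleverse instead derives its guidance from style embeddings, which is not shown. The function name, `eps`, and the sample values are illustrative assumptions.

```python
import statistics


def adain_channel(content, style, eps=1e-5):
    """AdaIN for one channel: sigma_s * (x - mu_c) / sigma_c + mu_s."""
    c_mean, c_std = statistics.fmean(content), statistics.pstdev(content)
    s_mean, s_std = statistics.fmean(style), statistics.pstdev(style)
    return [s_std * (x - c_mean) / (c_std + eps) + s_mean for x in content]


# Hypothetical activations of one feature channel.
content = [0.0, 1.0, 2.0, 3.0]
style = [10.0, 20.0, 30.0, 40.0]

stylized = adain_channel(content, style)
```

The output keeps the relative ordering of the content activations while taking on the mean and spread of the style channel.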

preprint, 2022, arXiv

Triangle-oriented Community Detection considering Node Features and Network Topology

The joint use of node features and network topology to detect communities is called community detection in attributed networks. Most of the existing work along this line has been carried out through objective function optimization and has proposed numerous approaches. However, they tend to focus only on lower-order details, i.e., capture node features and network topology from node and edge views, and purely seek a higher degree of optimization to guarantee the quality of the found communities, which exacerbates unbalanced communities and free-rider effect. To further clarify and reveal the intrinsic nature of networks, we conduct triangle-oriented community detection considering node features and network topology. Specifically, we first introduce a triangle-based quality metric to preserve higher-order details of node features and network topology, and then formulate so-called two-level constraints to encode lower-order details of node features and network topology. Finally, we develop a local search framework based on optimizing our objective function consisting of the proposed quality metric and two-level constraints to achieve both non-overlapping and overlapping community detection in attributed networks. Extensive experiments demonstrate the effectiveness and efficiency of our framework and its potential in alleviating unbalanced communities and free-rider effect.

preprint, 2022, arXiv

Ultralow loss hollow-core negative curvature fibers with nested elliptical antiresonance tubes

Hollow-core negative curvature fibers can confine light within an air core and have small nonlinearity and dispersion and a high damage threshold, thereby attracting a great deal of interest in the field of hollow-core fibers. However, reducing the loss of hollow-core negative curvature fibers is a serious problem. On this basis, three new types of fibers with different nested tube structures are proposed in the near-infrared spectral regions and compared in detail with a previously proposed hollow-core negative curvature fiber. We use the finite-element method to numerically simulate their transmission loss, bending loss, and single-mode performance, and then compare the transmission performance of the various fiber structures. We found that the nested elliptical antiresonant fiber 1 has better transmission performance than the three other types of fibers in the spectral range of 0.72-1.6 μm. Results show that the transmission loss of the LP01 mode is as low as $6.45\times10^{-6}$ dB/km at λ = 1.06 μm. To the best of our knowledge, this is a record low transmission loss for hollow-core antiresonant fibers with nested tube structures. In addition, the nested elliptical antiresonant fiber 1 has better bending resistance, and its bending loss was below $2.99\times10^{-2}$ dB/km at a 5 cm bending radius.

preprint, 2022, arXiv

Window Filtering Algorithm for Pulsed Light Coherent Combining of Low Repetition Frequency

The multi-dithering method has been well verified in phase locking for polarization coherent combination experiments. However, it is hard to apply to low repetition frequency pulsed lasers, since the pulsed laser and the amplitude phase noise overlap in the frequency domain and traditional filters cannot effectively separate the phase noise. To solve this problem, we propose a novel method of pulse noise detection, identification, and filtering based on the autocorrelation characteristics between noise signals. In the proposed algorithm, a self-designed window algorithm is used to identify the pulse, and then the pulse signal group in the window is replaced by interpolation, which effectively filters out the pulse signal mixed into the phase noise within 0.1 ms. After filtering the pulses in the phase noise, the phase difference of two pulsed beams (10 kHz) is successfully compensated to zero in 1 ms, and the coherent combination of closed-loop phase lock is realized. At the same time, few phase corrections are needed, the phase lock effect is stable, and the final light intensity increases to the ideal value (0.9 Imax).

preprint, 2020, arXiv

Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation

The target of human pose estimation is to determine the body part or joint locations of each person in an image. This is a challenging problem with wide applications. To address this issue, this paper proposes an augmented parallel-pyramid net with an attention partial module and differentiable auto-data augmentation. Technically, a parallel pyramid structure is proposed to compensate for the loss of information. We adopt the parallel structure for reverse compensation. Meanwhile, the overall computational complexity does not increase. We further define an Attention Partial Module (APM) operator to extract weighted features from the different-scale feature maps generated by the parallel pyramid structure. Compared with refining through an upsampling operator, APM can better capture the relationship between channels. Finally, we propose a differentiable auto data augmentation method to further improve estimation accuracy. We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component. Experiments corroborate the effectiveness of our proposed method. Notably, our method achieves top-1 accuracy on the challenging COCO keypoint benchmark and state-of-the-art results on the MPII datasets.

preprint, 2020, arXiv

BSNet: Bi-Similarity Network for Few-shot Fine-grained Image Classification

Few-shot learning for fine-grained image classification has gained recent attention in computer vision. Among the approaches for few-shot learning, metric-based methods are state-of-the-art on many tasks owing to their simplicity and effectiveness. Most of the metric-based methods assume a single similarity measure and thus obtain a single feature space. However, if samples can simultaneously be well classified via two distinct similarity measures, the samples within a class can distribute more compactly in a smaller feature space, producing more discriminative feature maps. Motivated by this, we propose a so-called Bi-Similarity Network (BSNet) that consists of a single embedding module and a bi-similarity module of two similarity measures. After the support images and the query images pass through the convolution-based embedding module, the bi-similarity module learns feature maps according to two similarity measures of diverse characteristics. In this way, the model is enabled to learn more discriminative and less similarity-biased features from few shots of fine-grained images, such that the model generalization ability can be significantly improved. Through extensive experiments by slightly modifying established metric/similarity based networks, we show that the proposed approach produces a substantial improvement on several fine-grained image benchmark datasets. Codes are available at: https://github.com/spraise/BSNet

preprint, 2020, arXiv

Informative Sample Mining Network for Multi-Domain Image-to-Image Translation

The performance of multi-domain image-to-image translation has been significantly improved by recent progress in deep generative models. Existing approaches can use a unified model to achieve translations between all the visual domains. However, their outcomes are far from satisfying when there are large domain variations. In this paper, we reveal that improving the sample selection strategy is an effective solution. To select informative samples, we dynamically estimate sample importance during the training of Generative Adversarial Networks, presenting Informative Sample Mining Network. We theoretically analyze the relationship between the sample importance and the prediction of the global optimal discriminator. Then a practical importance estimation function for general conditions is derived. Furthermore, we propose a novel multi-stage sample training scheme to reduce sample hardness while preserving sample informativeness. Extensive experiments on a wide range of specific image-to-image translation tasks are conducted, and the results demonstrate our superiority over current state-of-the-art methods.

preprint, 2020, arXiv

OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer

A deep neural network of multiple nonlinear layers forms a large function space, which can easily lead to overfitting when it encounters small-sample data. To mitigate overfitting in small-sample classification, learning more discriminative features from small-sample data is becoming a new trend. To this end, this paper aims to find a subspace of neural networks that can facilitate a large decision margin. Specifically, we propose the Orthogonal Softmax Layer (OSL), which makes the weight vectors in the classification layer remain orthogonal during both the training and test processes. The Rademacher complexity of a network using the OSL is only $\frac{1}{K}$, where $K$ is the number of classes, of that of a network using the fully connected classification layer, leading to a tighter generalization error bound. Experimental results demonstrate that the proposed OSL performs better than the compared methods on four small-sample benchmark datasets and is also applicable to large-sample datasets. Codes are available at: https://github.com/dongliangchang/OSLNet.
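
The abstract states that the class weight vectors stay orthogonal during training and testing but does not say how. One simple way to guarantee this, shown below purely as an assumption, is to multiply the classification weights by a fixed block mask so that each class only reads its own disjoint slice of the features. Rows with disjoint support are orthogonal regardless of the learned values. The function names, sizes, and random weights are illustrative.

```python
import random


def block_mask(num_classes, num_features):
    """Fixed 0/1 mask giving each class a disjoint, equal-sized feature block."""
    if num_features % num_classes:
        raise ValueError("num_features must be divisible by num_classes")
    block = num_features // num_classes
    return [
        [1.0 if k * block <= j < (k + 1) * block else 0.0
         for j in range(num_features)]
        for k in range(num_classes)
    ]


def apply_mask(weights, mask):
    """Element-wise product of the raw weights with the fixed mask."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
K, D = 3, 6  # hypothetical number of classes and feature dimension
raw = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(K)]
W = apply_mask(raw, block_mask(K, D))

# Inner products between weight vectors of different classes.
off_diagonal = [dot(W[i], W[j]) for i in range(K) for j in range(K) if i != j]
```

Because the mask is fixed, gradient updates to `raw` can never break orthogonality. This is also why each class effectively uses only a 1/K share of the connections in this sketch.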

preprint, 2020, arXiv

Reference-guided Face Component Editing

Face portrait editing has achieved great progress in recent years. However, previous methods either 1) operate on pre-defined face attributes, lacking the flexibility of controlling shapes of high-level semantic facial components (e.g., eyes, nose, mouth), or 2) take manually edited mask or sketch as an intermediate representation for observable changes, but such additional input usually requires extra efforts to obtain. To break the limitations (e.g. shape, mask or sketch) of the existing methods, we propose a novel framework termed r-FACE (Reference-guided FAce Component Editing) for diverse and controllable face component editing with geometric changes. Specifically, r-FACE takes an image inpainting model as the backbone, utilizing reference images as conditions for controlling the shape of face components. In order to encourage the framework to concentrate on the target face components, an example-guided attention module is designed to fuse attention features and the target face component features extracted from the reference image. Through extensive experimental validation and comparisons, we verify the effectiveness of the proposed framework.

preprint2020arXiv

ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image Classification

Despite achieving state-of-the-art performance, deep learning methods generally require a large amount of labeled data during training and may suffer from overfitting when the sample size is small. To ensure good generalizability of deep networks under small sample sizes, learning discriminative features is crucial. To this end, several loss functions have been proposed to encourage large intra-class compactness and inter-class separability. In this paper, we propose to enhance the discriminative power of features from a new perspective by introducing a novel neural network termed Relation-and-Margin learning Network (ReMarNet). Our method assembles two networks of different backbones so as to learn features that perform well under two classification mechanisms. Specifically, a relation network is used to learn the features that can support classification based on the similarity between a sample and a class prototype; meanwhile, a fully connected network with the cross-entropy loss is used for classification via the decision boundary. Experiments on four image datasets demonstrate that our approach is effective in learning discriminative features from a small set of labeled samples and achieves competitive performance against state-of-the-art methods. Code is available at https://github.com/liyunyu08/ReMarNet.

preprint2019arXiv

Topological phase transition based on the attractive Hubbard model

We theoretically investigate the effect of an attractive on-site interaction on the two-band magnetic Dirac fermion model based on a square lattice system. When the attractive fermion interaction is taken into account by the mean-field approximation, a phase diagram is obtained. It is found that a quantum phase transition from a band insulator state to a quantum anomalous Hall state occurs with increased attractive interaction. For an existing quantum anomalous Hall state, the attractive interaction enlarges its nontrivial band gap and makes the topological edge states more localized, which protects the transport of linear-dispersive edge states against finite-size and further disorder effects.

preprint2011arXiv

Stable Nanotubular Crystal of Silicon: A Predicted Allotrope With Direct Band Gap

On the basis of first-principles calculations, we show that a crystalline structure of silicon, a novel allotrope with nanotubular holes along two perpendicular directions, is stable. The calculations of geometrical and electronic properties reveal that this allotrope possesses a direct band gap 0.5 eV wider than the indirect gap of silicon with the diamond structure. The crystal belongs to the $I4_1/amd$ space group and shows anisotropic optical properties and Young's modulus. The bulk modulus is 64.4 GPa and the density is 1.9 g/cm$^{3}$, lower than that of diamond silicon due to the presence of nanotubular holes. Owing to its direct band gap and specific nanotubular structure, the allotrope may widely expand the applications of silicon in many fields.

27 published items

AWPO: Enhancing Tool-Use of Large Language Models through Adaptive Integration of Reasoning Rewards

HydroAgent: Closing the Gap Between Frontier LLMs and Human Experts in Hydrologic Model Calibration via Simulator-Grounded RL

Incentive Mechanism Design for Resource Management in Satellite Networks: A Comprehensive Survey

ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning

Do LLMs Encode Frame Semantics? Evidence from Frame Identification

Introduction to the Chinese Space Station Survey Telescope (CSST)

OUNLP at TSAR 2025 Shared Task: Multi-Round Text Simplifier via Code Generation

Goal-Oriented Communication, Estimation, and Control over Bidirectional Wireless Links

Goal-Oriented Integration of Sensing, Communication, Computing, and Control for Mission-Critical Internet-of-Things

Risk-Aware and Energy-Efficient AoI Optimization for Multi-Connectivity WNCS with Short Packet Transmissions

Bi-directional Feature Reconstruction Network for Fine-Grained Few-Shot Image Classification

Boundaries of capture hyperbolic components

Improving Subgraph Recognition with Variational Graph Information Bottleneck

Modeling the Social Influence of COVID-19 via Personalized Propagation with Deep Learning

Point Cloud Semantic Segmentation using Multi Scale Sparse Convolution Neural Network

Styleverse: Towards Identity Stylization across Heterogeneous Domains

Triangle-oriented Community Detection considering Node Features and Network Topology

Ultralow loss hollow-core negative curvature fibers with nested elliptical antiresonance tubes

Window Filtering Algorithm for Pulsed Light Coherent Combining of Low Repetition Frequency

Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation

BSNet: Bi-Similarity Network for Few-shot Fine-grained Image Classification

Informative Sample Mining Network for Multi-Domain Image-to-Image Translation

OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer

Reference-guided Face Component Editing

ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image Classification

Topological phase transition based on the attractive Hubbard model

Stable Nanotubular Crystal of Silicon: A Predicted Allotrope With Direct Band Gap