Source author record

Feng Yao

Feng Yao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Cryptography and Security eess.IV Computation and Language Artificial Intelligence Methodology Networking and Internet Architecture Software Engineering

Catalog footprint

What is connected

9works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Rose-SQL: Role-State Evolution Guided Structured Reasoning for Multi-Turn Text-to-SQL

Recent advances in Large Reasoning Models (LRMs) trained with Long Chain-of-Thought have demonstrated remarkable capabilities in code generation and mathematical reasoning. However, their potential in multi-turn Text-to-SQL tasks remains largely underexplored. Existing approaches typically rely on unstable API-based inference or require expensive fine-tuning on small-scale models. In this work, we present Rose-SQL, a training-free framework that leverages small-scale LRMs through in-context learning to enable accurate context-dependent parsing. We introduce the Role-State, a fine-grained representation that bridges the structural gap between schema linking and SQL generation by serving as a structural blueprint. To handle conversational dependencies, Rose-SQL traces the evolution of Role-State through historical context via structural isomorphism checks, guiding the model to infer the possible SQL composition for the current question through verified interaction trajectories. Experiments on the SParC and CoSQL benchmarks show that, within the Qwen3 series, Rose-SQL outperforms in-context learning baselines at the 4B scale and substantially surpasses state-of-the-art fine-tuned models at the 8B and 14B scales, while showing consistent gains on additional reasoning backbones.

preprint2022arXiv

Faithful learning with sure data for lung nodule diagnosis

Recent evolution in deep learning has proven its value for CT-based lung nodule classification. Most current techniques are intrinsically black-box systems, suffering from two generalizability issues in clinical practice. First, benign-malignant discrimination is often assessed by human observers without pathologic diagnoses at the nodule level. We termed these data as "unsure data". Second, a classifier does not necessarily acquire reliable nodule features for stable learning and robust prediction with patch-level labels during learning. In this study, we construct a sure dataset with pathologically-confirmed labels and propose a collaborative learning framework to facilitate sure nodule classification by integrating unsure data knowledge through nodule segmentation and malignancy score regression. A loss function is designed to learn reliable features by introducing interpretability constraints regulated with nodule segmentation maps. Furthermore, based on model inference results that reflect the understanding from both machine and experts, we explore a new nodule analysis method for similar historical nodule retrieval and interpretable diagnosis. Detailed experimental results demonstrate that our approach is beneficial for achieving improved performance coupled with faithful model reasoning for lung cancer prediction. Extensive cross-evaluation results further illustrate the effect of unsure data for deep-learning-based methods in lung nodule classification.

preprint2022arXiv

LEVEN: A Large-Scale Chinese Legal Event Detection Dataset

Recognizing facts is the most fundamental step in making judgments, hence detecting events in the legal documents is important to legal case analysis tasks. However, existing Legal Event Detection (LED) datasets only concern incomprehensive event types and have limited annotated data, which restricts the development of LED methods and their downstream applications. To alleviate these issues, we present LEVEN a large-scale Chinese LEgal eVENt detection dataset, with 8,116 legal documents and 150,977 human-annotated event mentions in 108 event types. Not only charge-related events, LEVEN also covers general events, which are critical for legal case understanding but neglected in existing LED datasets. To our knowledge, LEVEN is the largest LED dataset and has dozens of times the data scale of others, which shall significantly promote the training and evaluation of LED methods. The results of extensive experiments indicate that LED is challenging and needs further effort. Moreover, we simply utilize legal events as side information to promote downstream applications. The method achieves improvements of average 2.2 points precision in low-resource judgment prediction, and 1.5 points mean average precision in unsupervised case retrieval, which suggests the fundamentality of LED. The source code and dataset can be obtained from https://github.com/thunlp/LEVEN.

preprint2022arXiv

Re-thinking and Re-labeling LIDC-IDRI for Robust Pulmonary Cancer Prediction

The LIDC-IDRI database is the most popular benchmark for lung cancer prediction. However, with subjective assessment from radiologists, nodules in LIDC may have entirely different malignancy annotations from the pathological ground truth, introducing label assignment errors and subsequent supervision bias during training. The LIDC database thus requires more objective labels for learning-based cancer prediction. Based on an extra small dataset containing 180 nodules diagnosed by pathological examination, we propose to re-label LIDC data to mitigate the effect of original annotation bias verified on this robust benchmark. We demonstrate in this paper that providing new labels by similar nodule retrieval based on metric learning would be an effective re-labeling strategy. Training on these re-labeled LIDC nodules leads to improved model performance, which is enhanced when new labels of uncertain nodules are added. We further infer that re-labeling LIDC is current an expedient way for robust lung cancer prediction while building a large pathological-proven nodule database provides the long-term solution.

preprint2021arXiv

Learning Tubule-Sensitive CNNs for Pulmonary Airway and Artery-Vein Segmentation in CT

Training convolutional neural networks (CNNs) for segmentation of pulmonary airway, artery, and vein is challenging due to sparse supervisory signals caused by the severe class imbalance between tubular targets and background. We present a CNNs-based method for accurate airway and artery-vein segmentation in non-contrast computed tomography. It enjoys superior sensitivity to tenuous peripheral bronchioles, arterioles, and venules. The method first uses a feature recalibration module to make the best use of features learned from the neural networks. Spatial information of features is properly integrated to retain relative priority of activated regions, which benefits the subsequent channel-wise recalibration. Then, attention distillation module is introduced to reinforce representation learning of tubular objects. Fine-grained details in high-resolution attention maps are passing down from one layer to its previous layer recursively to enrich context. Anatomy prior of lung context map and distance transform map is designed and incorporated for better artery-vein differentiation capacity. Extensive experiments demonstrated considerable performance gains brought by these components. Compared with state-of-the-art methods, our method extracted much more branches while maintaining competitive overall segmentation performance. Codes and models are available at http://www.pami.sjtu.edu.cn/News/56

preprint2016arXiv

Event-Driven Implicit Authentication for Mobile Access Control

In order to protect user privacy on mobile devices, an event-driven implicit authentication scheme is proposed in this paper. Several methods of utilizing the scheme for recognizing legitimate user behavior are investigated. The investigated methods compute an aggregate score and a threshold in real-time to determine the trust level of the current user using real data derived from user interaction with the device. The proposed scheme is designed to: operate completely in the background, require minimal training period, enable high user recognition rate for implicit authentication, and prompt detection of abnormal activity that can be used to trigger explicitly authenticated access control. In this paper, we investigate threshold computation through standard deviation and EWMA (exponentially weighted moving average) based algorithms. The result of extensive experiments on user data collected over a period of several weeks from an Android phone indicates that our proposed approach is feasible and effective for lightweight real-time implicit authentication on mobile smartphones.

preprint2016arXiv

Fuzzy Logic-based Implicit Authentication for Mobile Access Control

In order to address the increasing compromise of user privacy on mobile devices, a Fuzzy Logic based implicit authentication scheme is proposed in this paper. The proposed scheme computes an aggregate score based on selected features and a threshold in real-time based on current and historic data depicting user routine. The tuned fuzzy system is then applied to the aggregated score and the threshold to determine the trust level of the current user. The proposed fuzzy-integrated implicit authentication scheme is designed to: operate adaptively and completely in the background, require minimal training period, enable high system accuracy while provide timely detection of abnormal activity. In this paper, we explore Fuzzy Logic based authentication in depth. Gaussian and triangle-based membership functions are investigated and compared using real data over several weeks from different Android phone users. The presented results show that our proposed Fuzzy Logic approach is a highly effective, and viable scheme for lightweight real-time implicit authentication on mobile devices.

preprint2016arXiv

Nonparametric estimation of conditional value-at-risk and expected shortfall based on extreme value theory

We propose nonparametric estimators for conditional value-at-risk (CVaR) and conditional expected shortfall (CES) associated with conditional distributions of a series of returns on a financial asset. The return series and the conditioning covariates, which may include lagged returns and other exogenous variables, are assumed to be strong mixing and follow a nonparametric conditional location-scale model. First stage nonparametric estimators for location and scale are combined with a generalized Pareto approximation for distribution tails proposed by Pickands (1975) to give final estimators for CVaR and CES. We provide consistency and asymptotic normality of the proposed estimators under suitable normalization. We also present the results of a Monte Carlo study that sheds light on their finite sample performance. Empirical viability of the model and estimators is investigated through a backtesting exercise using returns on future contracts for five agricultural commodities.

preprint2015arXiv

Software as a Service: Analyzing Security Issues

Software-as-a-service (SaaS) is a type of software service delivery model which encompasses a broad range of business opportunities and challenges. Users and service providers are reluctant to integrate their business into SaaS due to its security concerns while at the same time they are attracted by its benefits. This article highlights SaaS utility and applicability in different environments like cloud computing, mobile cloud computing, software defined networking and Internet of things. It then embarks on the analysis of SaaS security challenges spanning across data security, application security and SaaS deployment security. A detailed review of the existing mainstream solutions to tackle the respective security issues mapping into different SaaS security challenges is presented. Finally, possible solutions or techniques which can be applied in tandem are presented for a secure SaaS platform.

Feng Yao

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

Rose-SQL: Role-State Evolution Guided Structured Reasoning for Multi-Turn Text-to-SQL

Faithful learning with sure data for lung nodule diagnosis

LEVEN: A Large-Scale Chinese Legal Event Detection Dataset

Re-thinking and Re-labeling LIDC-IDRI for Robust Pulmonary Cancer Prediction

Learning Tubule-Sensitive CNNs for Pulmonary Airway and Artery-Vein Segmentation in CT

Event-Driven Implicit Authentication for Mobile Access Control

Fuzzy Logic-based Implicit Authentication for Mobile Access Control

Nonparametric estimation of conditional value-at-risk and expected shortfall based on extreme value theory

Software as a Service: Analyzing Security Issues