Source author record

Zihao Chen

Zihao Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning Computer Vision Computation and Language eess.IV quant-ph Cryptography and Security physics.optics

Catalog footprint

What is connected

14works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

GRID: Graph Representation of Intelligence Data for Security Text Knowledge Graph Construction

Security knowledge graphs can provide computable external memory for security agents, but constructing them from long-form cyber threat intelligence (CTI) remains difficult: LLMs often lack grounded security-domain knowledge, and end-to-end document-to-graph training is hard to supervise with cheap, stable rewards. We present GRID (Graph Representation of Intelligence Data), an end-to-end framework for security text knowledge graph construction. GRID first builds security-domain supervision from CTI articles by creating traceable article-graph alignments through graph extraction and knowledge-graph-conditioned text revision. It then turns document-to-graph learning into a scripted task bank combining four-option multi-select questions with triple-level regex matching targets, yielding more stable task-specific rewards than repeatedly scoring full graph outputs with an LLM judge. Using this supervision pipeline, we train two Qwen3-4B-Instruct-2507-based 4B extractors: a primary Task-bank Reward model and a secondary End2End Reward model with LLM-as-judge precision/recall rewards. On 249 CTI articles from GRID, CASIE, CTINexus, MalKG, and SecureNLP, the Task-bank Reward model with the ontology-guided GRID extraction pipeline reaches 84.62% source-averaged precision, 64.91% source-averaged recall, and 68.53% Avg F1, achieving the best source-averaged recall and near-top Avg F1 with lower token usage and deployment cost. The End2End Reward model reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1. Further analyses show that task-bank rewards can be built once offline and reused across later post-training runs, outperforming online End2End LLM-as-judge reward and weaker alternatives such as Choice-only Reward and End2End SFT without RL.

preprint2026arXiv

Learning Multi-Indicator Weights for Data Selection: A Joint Task-Model Adaptation Framework with Efficient Proxies

Data selection is a key component of efficient instruction tuning for large language models, as recent work has shown that data quality often matters more than data quantity. Accordingly, prior studies have introduced various multi-dimensional heuristics to evaluate and filter instruction data. However, most existing methods rely on static task-agnostic and model-agnostic weighting schemes, which overlook the varying requirements of specific downstream tasks and the differing pre-existing capabilities of models. In this paper, we propose a framework for learning multi-indicator weights that jointly adapts data selection to both the downstream task and the specific model. Our method identifies optimal weight configurations without full-scale fine-tuning by utilizing in-context learning (ICL) signals on compact tiny-validation sets. These signals serve as efficient performance proxies that ensure high-fidelity evaluation at minimal computational cost. Experiments across multiple benchmarks and model families, including Mistral, Qwen, and Llama, show that the approach achieves performance comparable to or exceeding full-dataset tuning while using only 30\% of the training samples on GSM8K. Furthermore, our analysis reveals a trade-off between semantic diversity and logical complexity in reasoning tasks, highlighting the necessity of joint task-model adaptation.

preprint2026arXiv

Quantifying the Upper Limit of Backflash Attack in Quantum Key Distribution

Quantum key distribution (QKD) provides information-theoretic security grounded in the fundamental laws of physics. Nevertheless, practical imperfections can introduce side channels that expose QKD systems to quantum hacking, especially passive attacks that are inherently difficult to detect. In this study, we experimentally and theoretically investigate the upper limit of the backflash attack-a representative passive side-channel threat. Using a fully equipped fiber-based QKD receiver, we demonstrate the feasibility of the attack and reveal its limited capability in distinguishing quantum states. We further develop a theoretical framework to quantify the maximum distinguishability achievable by an eavesdropper, taking into account the broadband spectral nature of backflash photons. The analysis shows that Eve can extract effective key information from at most 95.7% of the backflash photons. Based on these findings, we evaluate the secure key rate of a decoy-state BB84 QKD system under backflash attack. Our results provide a quantitative assessment of the vulnerability of QKD systems to backflash emissions and offer a general methodology to evaluate the practical security of QKD systems.

preprint2025arXiv

PRISM: A hierarchical multiscale approach for time series forecasting

Forecasting is critical in areas such as finance, biology, and healthcare. Despite the progress in the field, making accurate forecasts remains challenging because real-world time series contain both global trends, local fine-grained structure, and features on multiple scales in between. Here, we present a new forecasting method, PRISM (Partitioned Representation for Iterative Sequence Modeling), that addresses this challenge through a learnable tree-based partitioning of the signal. At the root of the tree, a global representation captures coarse trends in the signal, while recursive splits reveal increasingly localized views of the signal. At each level of the tree, data are projected onto a time-frequency basis (e.g., wavelets or exponential moving averages) to extract scale-specific features, which are then aggregated across the hierarchy. This design allows the model to jointly capture global structure and local dynamics of the signal, enabling accurate forecasting. Experiments across benchmark datasets show that our method outperforms state-of-the-art methods for forecasting. Overall, these results demonstrate that our hierarchical approach provides a lightweight and flexible framework for forecasting multivariate time series. The code is available at https://github.com/nerdslab/prism.

preprint2023arXiv

JCSE: Contrastive Learning of Japanese Sentence Embeddings and Its Applications

Contrastive learning is widely used for sentence representation learning. Despite this prevalence, most studies have focused exclusively on English and few concern domain adaptation for domain-specific downstream tasks, especially for low-resource languages like Japanese, which are characterized by insufficient target domain data and the lack of a proper training strategy. To overcome this, we propose a novel Japanese sentence representation framework, JCSE (derived from ``Contrastive learning of Sentence Embeddings for Japanese''), that creates training data by generating sentences and synthesizing them with sentences available in a target domain. Specifically, a pre-trained data generator is finetuned to a target domain using our collected corpus. It is then used to generate contradictory sentence pairs that are used in contrastive learning for adapting a Japanese language model to a specific task in the target domain. Another problem of Japanese sentence representation learning is the difficulty of evaluating existing embedding methods due to the lack of benchmark datasets. Thus, we establish a comprehensive Japanese Semantic Textual Similarity (STS) benchmark on which various embedding models are evaluated. Based on this benchmark result, multiple embedding methods are chosen and compared with JCSE on two domain-specific tasks, STS in a clinical domain and information retrieval in an educational domain. The results show that JCSE achieves significant performance improvement surpassing direct transfer and other training strategies. This empirically demonstrates JCSE's effectiveness and practicability for downstream tasks of a low-resource language.

preprint2022arXiv

Data-Consistent Non-Cartesian Deep Subspace Learning for Efficient Dynamic MR Image Reconstruction

Non-Cartesian sampling with subspace-constrained image reconstruction is a popular approach to dynamic MRI, but slow iterative reconstruction limits its clinical application. Data-consistent (DC) deep learning can accelerate reconstruction with good image quality, but has not been formulated for non-Cartesian subspace imaging. In this study, we propose a DC non-Cartesian deep subspace learning framework for fast, accurate dynamic MR image reconstruction. Four novel DC formulations are developed and evaluated: two gradient decent approaches, a directly solved approach, and a conjugate gradient approach. We applied a U-Net model with and without DC layers to reconstruct T1-weighted images for cardiac MR Multitasking (an advanced multidimensional imaging method), comparing our results to the iteratively reconstructed reference. Experimental results show that the proposed framework significantly improves reconstruction accuracy over the U-Net model without DC, while significantly accelerating the reconstruction over conventional iterative reconstruction.

preprint2022arXiv

Experimental study of secure quantum key distribution with source and detection imperfections

The quantum key distribution (QKD), guaranteed by the principle of quantum physics, is a promising solution for future secure information and communication technology. However, device imperfections compromise the security of real-life QKD systems, restricting the wide deployment of QKD. This study reports a decoy-state BB84 QKD experiment that considers both source and detection imperfections. In particular, we achieved a rigorous finite-key security bound over fiber links of up to 75 km by applying a systematic performance analysis. Furthermore, our study considers more device imperfections than most previous experiments, and the proposed theory can be extended to other discrete-variable QKD systems. These features constitute a crucial step toward securing QKD with imperfect practical devices.

preprint2022arXiv

Low-threshold nanolasers based on miniaturized bound states in the continuum

The pursuit of compact lasers with low-thresholds has imposed strict requirements on tight light confinements with minimized radiation losses. Bound states in the continuum (BICs) have been recently demonstrated as an effective mechanism to trap light along the out-of-plane direction, paving the way to low-threshold lasers. To date, most reported BIC lasers are still bulky due to the absence of in-plane light confinement. In this work, we combine BICs and photonic band gaps to realize three-dimensional (3D) light confinements, as referred to miniaturized (mini-) BICs. Together with 3D carrier confinements provided by quantum dots (QDs) as optical gain materials, we have realized highly-compact active BIC resonators with a record-high quality ($Q$) factor up to 32500, which enables single-mode continuous wave (CW) lasing with the lowest threshold of 80 W/cm$^{2}$ among the reported BIC lasers. In addidtion, our photon statistics measurements under both CW and pulsed excitations confirm the occurence of the phase transition from spontaneous emission to stimulated emission, further suggesting that conventional criteria of input-output and linewidth are not sufficient for claiming nanoscale lasing. Our work reveal a via path towards compact BIC lasers with ultra-low power consumption and potentially boost the applications in cavity quantum electrodynamics (QEDs), nonlinear optics and integrated photonics.

preprint2022arXiv

Magnitude-image based data-consistent deep learning method for MRI super resolution

Magnetic Resonance Imaging (MRI) is important in clinic to produce high resolution images for diagnosis, but its acquisition time is long for high resolution images. Deep learning based MRI super resolution methods can reduce scan time without complicated sequence programming, but may create additional artifacts due to the discrepancy between training data and testing data. Data consistency layer can improve the deep learning results but needs raw k-space data. In this work, we propose a magnitude-image based data consistency deep learning MRI super resolution method to improve super resolution images' quality without raw k-space data. Our experiments show that the proposed method can improve NRMSE and SSIM of super resolution images compared to the same Convolutional Neural Network (CNN) block without data consistency module.

preprint2022arXiv

Where is VALDO? VAscular Lesions Detection and segmentatiOn challenge at MICCAI 2021

Imaging markers of cerebral small vessel disease provide valuable information on brain health, but their manual assessment is time-consuming and hampered by substantial intra- and interrater variability. Automated rating may benefit biomedical research, as well as clinical assessment, but diagnostic reliability of existing algorithms is unknown. Here, we present the results of the \textit{VAscular Lesions DetectiOn and Segmentation} (\textit{Where is VALDO?}) challenge that was run as a satellite event at the international conference on Medical Image Computing and Computer Aided Intervention (MICCAI) 2021. This challenge aimed to promote the development of methods for automated detection and segmentation of small and sparse imaging markers of cerebral small vessel disease, namely enlarged perivascular spaces (EPVS) (Task 1), cerebral microbleeds (Task 2) and lacunes of presumed vascular origin (Task 3) while leveraging weak and noisy labels. Overall, 12 teams participated in the challenge proposing solutions for one or more tasks (4 for Task 1 - EPVS, 9 for Task 2 - Microbleeds and 6 for Task 3 - Lacunes). Multi-cohort data was used in both training and evaluation. Results showed a large variability in performance both across teams and across tasks, with promising results notably for Task 1 - EPVS and Task 2 - Microbleeds and not practically useful results yet for Task 3 - Lacunes. It also highlighted the performance inconsistency across cases that may deter use at an individual level, while still proving useful at a population level.

preprint2020arXiv

CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search

Neural Architecture Search (NAS) achieved many breakthroughs in recent years. In spite of its remarkable progress, many algorithms are restricted to particular search spaces. They also lack efficient mechanisms to reuse knowledge when confronting multiple tasks. These challenges preclude their applicability, and motivate our proposal of CATCH, a novel Context-bAsed meTa reinforcement learning (RL) algorithm for transferrable arChitecture searcH. The combination of meta-learning and RL allows CATCH to efficiently adapt to new tasks while being agnostic to search spaces. CATCH utilizes a probabilistic encoder to encode task properties into latent context variables, which then guide CATCH's controller to quickly "catch" top-performing networks. The contexts also assist a network evaluator in filtering inferior candidates and speed up learning. Extensive experiments demonstrate CATCH's universality and search efficiency over many other widely-recognized algorithms. It is also capable of handling cross-domain architecture search as competitive networks on ImageNet, COCO, and Cityscapes are identified. This is the first work to our knowledge that proposes an efficient transferrable NAS solution while maintaining robustness across various settings.

preprint2019arXiv

Accelerating Training using Tensor Decomposition

Tensor decomposition is one of the well-known approaches to reduce the latency time and number of parameters of a pre-trained model. However, in this paper, we propose an approach to use tensor decomposition to reduce training time of training a model from scratch. In our approach, we train the model from scratch (i.e., randomly initialized weights) with its original architecture for a small number of epochs, then the model is decomposed, and then continue training the decomposed model till the end. There is an optional step in our approach to convert the decomposed architecture back to the original architecture. We present results of using this approach on both CIFAR10 and Imagenet datasets, and show that there can be upto 2x speed up in training time with accuracy drop of upto 1.5% only, and in other cases no accuracy drop. This training acceleration approach is independent of hardware and is expected to have similar speed ups on both CPU and GPU platforms.

preprint2016arXiv

A Proximal Stochastic Quasi-Newton Algorithm

In this paper, we discuss the problem of minimizing the sum of two convex functions: a smooth function plus a non-smooth function. Further, the smooth part can be expressed by the average of a large number of smooth component functions, and the non-smooth part is equipped with a simple proximal mapping. We propose a proximal stochastic second-order method, which is efficient and scalable. It incorporates the Hessian in the smooth part of the function and exploits multistage scheme to reduce the variance of the stochastic gradient. We prove that our method can achieve linear rate of convergence.

preprint2016arXiv

Communication Lower Bounds for Distributed Convex Optimization: Partition Data on Features

Recently, there has been an increasing interest in designing distributed convex optimization algorithms under the setting where the data matrix is partitioned on features. Algorithms under this setting sometimes have many advantages over those under the setting where data is partitioned on samples, especially when the number of features is huge. Therefore, it is important to understand the inherent limitations of these optimization problems. In this paper, with certain restrictions on the communication allowed in the procedures, we develop tight lower bounds on communication rounds for a broad class of non-incremental algorithms under this setting. We also provide a lower bound on communication rounds for a class of (randomized) incremental algorithms.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint