Source author record

Hong Zhou

Hong Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, cond-mat.mtrl-sci, Artificial Intelligence, cond-mat.mes-hall, Computation and Language, Data Structures and Algorithms, eess.IV, eess.SY, Machine Learning, math.DS, Systems and Control

Catalog footprint

What is connected

14 works

11 topics

4 close collaborators

preprint · 2026 · arXiv

Spatial Multi-Task Learning for Breast Cancer Molecular Subtype Prediction from Single-Phase DCE-MRI

Accurate molecular subtype classification is essential for personalized breast cancer treatment, yet conventional immunohistochemical analysis relies on invasive biopsies and is prone to sampling bias. Although dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) enables non-invasive tumor characterization, clinical workflows typically acquire only single-phase post-contrast images to reduce scan time and contrast agent dose. In this study, we propose a spatial multi-task learning framework for breast cancer molecular subtype prediction from clinically practical single-phase DCE-MRI. The framework simultaneously predicts estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2) status, and the Ki-67 proliferation index, biomarkers that collectively define molecular subtypes. The architecture integrates a deep feature extraction network with multi-scale spatial attention to capture intratumoral and peritumoral characteristics, together with a region-of-interest weighting module that emphasizes the tumor core, rim, and surrounding tissue. Multi-task learning exploits biological correlations among biomarkers through shared representations with task-specific prediction branches. Experiments on a dataset of 960 cases (886 internal cases split 7:1:2 for training/validation/testing, and 74 external cases evaluated via five-fold cross-validation) demonstrate that the proposed method achieves an AUC of 0.893, 0.824, and 0.857 for ER, PR, and HER2 classification, respectively, and a mean absolute error of 8.2% for Ki-67 regression, significantly outperforming radiomics and single-task deep learning baselines. These results indicate the feasibility of accurate, non-invasive molecular subtype prediction using standard imaging protocols.

preprint · 2026 · arXiv

TokenSeg: Efficient 3D Medical Image Segmentation via Hierarchical Visual Token Compression

Three-dimensional medical image segmentation is a fundamental yet computationally demanding task due to the cubic growth of voxel processing and the redundant computation on homogeneous regions. To address these limitations, we propose TokenSeg, a boundary-aware sparse token representation framework for efficient 3D medical volume segmentation. Specifically, (1) we design a multi-scale hierarchical encoder that extracts 400 candidate tokens across four resolution levels to capture both global anatomical context and fine boundary details; (2) we introduce a boundary-aware tokenizer that combines VQ-VAE quantization with importance scoring to select 100 salient tokens, over 60% of which lie near tumor boundaries; and (3) we develop a sparse-to-dense decoder that reconstructs full-resolution masks through token reprojection, progressive upsampling, and skip connections. Extensive experiments on a 3D breast DCE-MRI dataset comprising 960 cases demonstrate that TokenSeg achieves state-of-the-art performance with 94.49% Dice and 89.61% IoU, while reducing GPU memory and inference latency by 64% and 68%, respectively. To verify the generalization capability, our evaluations on MSD cardiac and brain MRI benchmark datasets demonstrate that TokenSeg consistently delivers optimal performance across heterogeneous anatomical structures. These results highlight the effectiveness of anatomically informed sparse representation for accurate and efficient 3D medical image segmentation.
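
The Dice and IoU figures quoted in this abstract are overlap metrics between a predicted and a reference binary mask. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code; the function name `dice_and_iou` and the toy masks are illustrative assumptions.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length sequences of 0/1 voxel labels."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_size = sum(1 for p in pred if p)
    target_size = sum(1 for t in target if t)
    union = pred_size + target_size - intersection
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention, not a rule).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_size + target_size)
    iou = intersection / union
    return dice, iou


# Toy flattened masks (hypothetical values, not data from the paper).
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
print(f"Dice={dice:.3f}, IoU={iou:.3f}")
```

For a single pair of masks the two metrics are linked by IoU = Dice / (2 - Dice); averages taken over many cases need not satisfy this identity exactly.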

preprint · 2023 · arXiv

GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized discourse state

Extracting summaries from long documents can be regarded as sentence classification using the structural information of the documents. How to use such structural information to summarize a document is challenging. In this paper, we propose GoSum, a novel graph and reinforcement learning based extractive model for long-paper summarization. In particular, GoSum encodes sentence states in reinforcement learning by building a heterogeneous graph for each input document at different discourse levels. An edge in the graph reflects the discourse hierarchy of a document, restraining semantic drift across section boundaries. We evaluate GoSum on two datasets for scientific article summarization: PubMed and arXiv. The experimental results demonstrate that GoSum achieves state-of-the-art results compared with strong baselines of both extractive and abstractive models. The ablation studies further validate that the performance of GoSum benefits from the use of discourse information.

preprint · 2022 · arXiv

Contrastive Learning of Semantic and Visual Representations for Text Tracking

Semantic representation is of great benefit to the video text tracking (VTT) task, which requires simultaneously classifying, detecting, and tracking text in video. Most existing approaches tackle this task using appearance similarity across consecutive frames, while ignoring the abundant semantic features. In this paper, we explore robust video text tracking with contrastive learning of semantic and visual representations. Correspondingly, we present an end-to-end video text tracker with Semantic and Visual Representations (SVRep), which detects and tracks text by exploiting the visual and semantic relationships between different texts in a video sequence. In addition, with a lightweight architecture, SVRep achieves state-of-the-art performance while maintaining competitive inference speed. Specifically, with a ResNet-18 backbone, SVRep achieves an IDF1 of 65.9%, running at 16.7 FPS, on the ICDAR2015 (video) dataset, an 8.6% improvement over the previous state-of-the-art methods.
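
The IDF1 score reported for this tracker is the standard identity F1 measure used in multi-object tracking: the harmonic mean of identity precision and identity recall. The sketch below assumes the identity-level counts (IDTP, IDFP, IDFN) have already been obtained from an identity matching step that is not shown; the function name and the counts are hypothetical.

```python
def idf1(idtp, idfp, idfn):
    """Identity F1 from identity true positives, false positives and false negatives."""
    denominator = 2 * idtp + idfp + idfn
    if denominator == 0:
        return 0.0  # no detections and no ground truth: score defined as 0 here
    return 2 * idtp / denominator


def id_precision_recall(idtp, idfp, idfn):
    """Identity precision and recall; IDF1 is their harmonic mean."""
    precision = idtp / (idtp + idfp) if idtp + idfp else 0.0
    recall = idtp / (idtp + idfn) if idtp + idfn else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not taken from any benchmark).
score = idf1(idtp=60, idfp=20, idfn=20)
precision, recall = id_precision_recall(60, 20, 20)
print(score, precision, recall)
```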

preprint · 2022 · arXiv

Data-Free Quantization with Accurate Activation Clipping and Adaptive Batch Normalization

Data-free quantization is the task of compressing a neural network to low bit-width without access to the original training data. Most existing data-free quantization methods suffer severe performance degradation due to inaccurate activation clipping ranges and quantization error, especially at low bit-widths. In this paper, we present a simple yet effective data-free quantization method with accurate activation clipping and adaptive batch normalization. Accurate activation clipping (AAC) improves model accuracy by exploiting accurate activation information from the full-precision model. Adaptive batch normalization addresses, for the first time, the quantization error arising from distribution changes by updating the batch normalization layers adaptively. Extensive experiments demonstrate that the proposed data-free quantization method yields surprisingly strong performance, achieving 64.33% top-1 accuracy with ResNet18 on the ImageNet dataset, a 3.7% absolute improvement over the existing state-of-the-art methods.
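
To illustrate why the activation clipping range matters at low bit-width, here is a minimal sketch of generic uniform quantization with clipping. It is a textbook quantizer, not the AAC method described in the abstract; the function name, the 2-bit setting and the activation values are all assumptions chosen for illustration.

```python
def quantize_clipped(values, clip_max, num_bits):
    """Clip non-negative activations to [0, clip_max], then quantize uniformly."""
    steps = 2 ** num_bits - 1      # number of quantization intervals
    scale = clip_max / steps       # width of one interval
    quantized = []
    for v in values:
        clipped = min(max(v, 0.0), clip_max)
        quantized.append(round(clipped / scale) * scale)
    return quantized


# Hypothetical activations with a single large outlier.
activations = [0.1, 0.4, 0.7, 0.9, 1.3, 6.0]

narrow = quantize_clipped(activations, clip_max=1.5, num_bits=2)
wide = quantize_clipped(activations, clip_max=6.0, num_bits=2)
print("clip at 1.5:", narrow)
print("clip at 6.0:", wide)
```

With the narrow range the small activations retain more resolution but the outlier saturates at the clip value; with the wide range the outlier is preserved while most small activations collapse to zero. Choosing the range is therefore a trade-off, which is the problem that clipping-range estimation methods try to solve.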

preprint · 2022 · arXiv

End-to-End Video Text Spotting with Transformer

Recent video text spotting methods usually require a three-stage pipeline, i.e., detecting text in individual images, recognizing localized text, and tracking text streams with post-processing to generate final results. These methods typically follow the tracking-by-match paradigm and develop sophisticated pipelines. In this paper, rooted in Transformer sequence modeling, we propose a simple but effective end-to-end video text DEtection, Tracking, and Recognition framework (TransDETR). TransDETR has two main advantages: 1) Unlike the explicit match paradigm between adjacent frames, TransDETR tracks and recognizes each text implicitly through a distinct query, termed a text query, over a long-range temporal sequence (more than 7 frames). 2) TransDETR is the first end-to-end trainable video text spotting framework, which simultaneously addresses the three sub-tasks (i.e., text detection, tracking, and recognition). Extensive experiments on four video text datasets (i.e., ICDAR2013 Video, ICDAR2015 Video, Minetto, and YouTube Video Text) demonstrate that TransDETR achieves state-of-the-art performance, with improvements of up to around 8.0% on video text spotting tasks. The code of TransDETR can be found at https://github.com/weijiawu/TransDETR.

preprint · 2022 · arXiv

Polygon-free: Unconstrained Scene Text Detection with Box Annotations

Although a polygon is a more accurate representation than an upright bounding box for text detection, polygon annotations are extremely expensive and challenging to produce. Unlike existing works that employ fully supervised training with polygon annotations, this study proposes an unconstrained text detection system termed Polygon-free (PF), in which most existing polygon-based text detectors (e.g., PSENet [33], DB [16]) are trained with only upright bounding box annotations. Our core idea is to transfer knowledge from synthetic data to real data to enhance the supervision information of upright bounding boxes. This is made possible with a simple segmentation network, namely the Skeleton Attention Segmentation Network (SASN), which includes three vital components (i.e., channel attention, spatial attention, and a skeleton attention map) and one soft cross-entropy loss. Experiments demonstrate that the proposed Polygon-free system can be combined with general detectors (e.g., EAST, PSENet, DB) to yield surprisingly high-quality pixel-level results with only upright bounding box annotations on a variety of datasets (e.g., ICDAR2019-Art, TotalText, ICDAR2015). For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% for its fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, while saving more than 80% of labeling costs. We hope that PF can provide a new perspective on reducing labeling costs for text detection. The code can be found at https://github.com/weijiawu/Unconstrained-Text-Detection-with-Box-Supervisionand-Dynamic-Self-Training.

preprint · 2022 · arXiv

Real-time End-to-End Video Text Spotter with Contrastive Representation Learning

Video text spotting (VTS) is the task of simultaneously detecting, tracking, and recognizing text in video. Existing video text spotting methods typically develop sophisticated pipelines with multiple models, which is not friendly to real-time applications. Here we propose a real-time end-to-end video text spotter with Contrastive Representation learning (CoText). Our contributions are three-fold: 1) CoText simultaneously addresses the three tasks (i.e., text detection, tracking, and recognition) in a real-time end-to-end trainable framework. 2) With contrastive learning, CoText models long-range dependencies and learns temporal information across multiple frames. 3) A simple, lightweight architecture is designed for effective and accurate performance, including GPU-parallel detection post-processing and a CTC-based recognition head with Masked RoI. Extensive experiments show the superiority of our method. In particular, CoText achieves a video text spotting IDF1 of 72.0% at 41.0 FPS on ICDAR2015 video, improvements of 10.5% and 32.0 FPS over the previous best method. The code can be found at github.com/weijiawu/CoText.

preprint · 2022 · arXiv

The Observability in Unobservable Systems

In this paper, we introduce the concept of observability of targeted state variables for systems that may not be fully observable. For their estimation, we introduce and exemplify a deep filter, which is a neural network specifically designed for the estimation of targeted state variables without computing the trajectory of the entire system. The observability definition is quantitative rather than a yes or no answer so that one can compare the level of observability between different sensor locations.

preprint · 2020 · arXiv

A Spectral Approach to Network Design

We present a spectral approach to design approximation algorithms for network design problems. We observe that the underlying mathematical questions are the spectral rounding problems, which were studied in spectral sparsification and in discrepancy theory. We extend these results to incorporate additional non-negative linear constraints, and show that they can be used to significantly extend the scope of network design problems that can be solved. Our algorithm for spectral rounding is an iterative randomized rounding algorithm based on the regret minimization framework. In some settings, this provides an alternative spectral algorithm to achieve constant factor approximation for the classical survivable network design problem, and partially answers a question of Bansal about survivable network design with concentration property. We also show many other applications of the spectral rounding results, including weighted experimental design and additive spectral sparsification.

preprint · 2016 · arXiv

Observation of Optical and Electrical In-plane Anisotropy in High-mobility Few-layer ZrTe5

Transition metal pentatelluride ZrTe5 is a versatile material in condensed-matter physics and has been intensively studied since the 1980s. The most fascinating feature of ZrTe5 is that it is a 3D Dirac semimetal, which has linear energy dispersion in all three dimensions in momentum space. Structure-wise, ZrTe5 is a layered material held together by weak interlayer van der Waals forces. The combination of its unique band structure and 2D atomic structure provides fertile ground for further exotic physical phenomena in ZrTe5 related to 3D Dirac semimetals. However, the physical properties of its few-layer form have yet to be thoroughly explored. Here we report strong optical and electrical in-plane anisotropy of mechanically exfoliated few-layer ZrTe5. Raman spectroscopy shows significant intensity change with sample orientation, and the behavior of angle-resolved phonon modes at the gamma point is explained by theoretical calculation. DC conductance measurement indicates a 50% difference along different in-plane directions. The diminishing of the resistivity anomaly in few-layer samples indicates the evolution of the band structure with reduced thickness. Low-temperature Hall experiments shed light on the more intrinsic anisotropic electrical transport, with hole mobilities of 3,000 and 1,500 cm2/Vs along the a-axis and c-axis, respectively. Pronounced quantum oscillations in magneto-resistance are observed at low temperatures, with the highest electron mobility up to 44,000 cm2/Vs.

preprint · 2016 · arXiv

Performance Enhancement of Black Phosphorus Field-Effect Transistors by Chemical Doping

In this letter, a new approach to chemically dope black phosphorus (BP) is presented, which significantly enhances the device performance of BP field-effect transistors for an initial period of 18 h, before degrading to previously reported levels. By applying 2,3,5,6-tetrafluoro-7,7,8,8-tetracyanoquinodimethane (F4-TCNQ), a low ON-state resistance of 3.2 Ω·mm and a high field-effect mobility of 229 cm2/Vs are achieved, with a record high drain current of 532 mA/mm at a moderate channel length of 1.5 μm.

preprint · 2016 · arXiv

Weak Localization in Few-Layer Black Phosphorus

We have conducted a comprehensive investigation into the magneto-transport properties of few-layer black phosphorus in terms of phase coherence length, phase coherence time, and mobility via weak localization and Hall-effect measurements. We present magnetoresistance data showing the weak localization effect in bare p-type few-layer black phosphorus and reveal its strong dependence on temperature and carrier concentration. The measured weak localization agrees well with the Hikami-Larkin-Nagaoka model, and the extracted phase coherence length is 104 nm at 350 mK, decreasing as ~T^(-0.51 ± 0.05) with increasing temperature. Weak localization measurement allows us to qualitatively probe the temperature-dependent phase coherence time τ, which is in agreement with the theory of carrier interaction in the diffusive regime. We also observe the universal conductance fluctuation phenomenon in few-layer black phosphorus in the moderate magnetic field and low temperature regime.
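
The link between the quoted length exponent and the phase coherence time can be made explicit with the standard diffusive relation between the two quantities. The step below is a sketch that assumes the diffusion constant D is roughly temperature-independent over the measured range, an assumption not stated in the abstract; the subscripted symbols L_φ and τ_φ are notation introduced here.

```latex
L_\phi = \sqrt{D\,\tau_\phi}
\quad\Longrightarrow\quad
\tau_\phi = \frac{L_\phi^{2}}{D}
\;\propto\; T^{-2\,(0.51 \pm 0.05)}
= T^{-1.02 \pm 0.10}
```

Under that assumption the coherence time falls off approximately as 1/T, which is the temperature dependence usually associated with carrier-carrier scattering in a two-dimensional diffusive conductor.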

preprint · 2015 · arXiv

Er3+-doped Na0.5Bi0.5TiO3 Ferroelectric Thin Films with Enhanced Electrical Properties and Strong Green Up-conversion Luminescence

Ferroelectric materials with up-conversion luminescence (UCL) properties have potential optoelectronic applications in areas such as display and sensing. Here, we demonstrate strong green UCL and enhanced electrical properties in Er3+-doped Na0.5Bi0.5TiO3 thin films. The thin films are prepared using a modified chemical solution deposition method. These thin films are phase-pure and crystallized in the perovskite structure. The largest remnant polarization (Pr) and highest dielectric constant are obtained from Na0.5Bi0.49Er0.01TiO3 thin films, with values of 0.22 C/m2 and 1166, respectively. Meanwhile, strong green UCL emissions at 525 nm and 548 nm are observed in the Er3+-doped thin films. They are attributed to the 2H11/2 to 4I15/2 and 4S3/2 to 4I15/2 transitions of Er3+ ions, respectively. These thin films have potential for optoelectronic device applications.
