Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
46 works
0 followers
33 topics
4 close collaborators

Published work

46 published item(s)

preprint · 2026 · arXiv

Chronicles-OCR: A Cross-Temporal Perception Benchmark for the Evolutionary Trajectory of Chinese Characters

Vision Large Language Models (VLLMs) have achieved remarkable success in modern text-rich visual understanding. However, their perceptual robustness in the face of the continuous morphological evolution of historical writing systems remains largely unexplored. Existing ancient text datasets typically focus on isolated historical periods, failing to capture the systematic visual distribution shifts spanning thousands of years. To bridge this gap and empower Digital Humanities, we introduce Chronicles-OCR, the first comprehensive benchmark specifically designed to evaluate the cross-temporal visual perception capabilities of VLLMs across the complete evolutionary trajectory of Chinese characters, known as the Seven Chinese Scripts. Curated in collaboration with top-tier institutional domain experts, the dataset comprises 2,800 strictly balanced images encompassing highly diverse physical media, ranging from tortoise shells to paper-based calligraphy. To accommodate the drastic morphological and topological variations across different historical stages, we propose a novel Stage-Adaptive Annotation Paradigm. Based on this, Chronicles-OCR formulates four rigorous quantitative tasks: cross-period character spotting, fine-grained archaic character recognition via visual referring, ancient text parsing, and script classification. By isolating visual perception from semantic reasoning, Chronicles-OCR provides an authoritative platform to expose the limitations of current VLLMs, paving the way for robust, evolution-aware historical text perception. Chronicles-OCR is publicly available at https://github.com/VirtualLUOUCAS/Chronicles-OCR.

preprint · 2024 · arXiv

Indoor Obstacle Discovery on Reflective Ground via Monocular Camera

Visual obstacle discovery is a key step towards autonomous navigation of indoor mobile robots. Successful solutions have many applications in multiple scenes. One of the exceptions is the reflective ground. In this case, the reflections on the floor resemble the true world, which confuses the obstacle discovery and leaves navigation unsuccessful. We argue that the key to this problem lies in obtaining discriminative features for reflections and obstacles. Note that obstacle and reflection can be separated by the ground plane in 3D space. With this observation, we firstly introduce a pre-calibration based ground detection scheme that uses robot motion to predict the ground plane. Due to the immunity of robot motion to reflection, this scheme avoids failed ground detection caused by reflection. Given the detected ground, we design a ground-pixel parallax to describe the location of a pixel relative to the ground. Based on this, a unified appearance-geometry feature representation is proposed to describe objects inside rectangular boxes. Eventually, based on segmenting by detection framework, an appearance-geometry fusion regressor is designed to utilize the proposed feature to discover the obstacles. It also prevents our model from concentrating too much on parts of obstacles instead of whole obstacles. For evaluation, we introduce a new dataset for Obstacle on Reflective Ground (ORG), which comprises 15 scenes with various ground reflections, a total of more than 200 image sequences and 3400 RGB images. The pixel-wise annotations of ground and obstacle provide a comparison to our method and other methods. By reducing the misdetection of the reflection, the proposed approach outperforms others. The source code and the dataset will be available at https://github.com/XuefengBUPT/IndoorObstacleDiscovery-RG.

preprint · 2024 · arXiv

Observation and manipulation of quantum interference in a superconducting Kerr parametric oscillator

Quantum tunneling is the phenomenon that makes superconducting circuits "quantum". Recently, there has been a renewed interest in using quantum tunneling in phase space of a Kerr parametric oscillator as a resource for quantum information processing. Here, we report a direct observation of quantum interference induced by such tunneling in a planar superconducting circuit through Wigner tomography. We experimentally elucidate all essential properties of this quantum interference, such as mapping from Fock states to cat states, a temporal oscillation due to the pump detuning, as well as its characteristic Rabi oscillations and Ramsey fringes. Finally, we perform gate operations as manipulations of the observed quantum interference. Our findings lay the groundwork for further studies on quantum properties of superconducting Kerr parametric oscillators and their use in quantum information technologies.

preprint · 2023 · arXiv

Test Reuse Based on Adaptive Semantic Matching across Android Mobile Applications

Automatic test generation can help verify and develop the behavior of mobile applications. Test reuse based on semantic similarities between applications of the same category has been utilized to reduce the manual effort of Graphical User Interface (GUI) testing. However, most of the existing studies fail to solve the semantic problem of event matching, which leads to the failure of test reuse. To overcome this challenge, we propose TRASM (Test Reuse based on Adaptive Semantic Matching), a test reuse approach based on adaptive strategies to find a better event matching across android mobile applications. TRASM first performs GUI events deduplication on the initial test set obtained from test generation, and then employs an adaptive strategy to find better event matching, which enables reusing the existing test. Preliminary experiments with comparison to baseline methods on 15 applications demonstrate that TRASM can improve the precision of GUI event matching while reducing the failure of test reuse and the running time required for test reuse.

preprint · 2022 · arXiv

Dispersion Analysis of CIP-FEM for Helmholtz Equation

When solving the Helmholtz equation numerically, the accuracy of the numerical solution deteriorates as the wave number $k$ increases. This is known as the 'pollution effect', which is directly related to the phase difference between the exact and numerical solutions caused by numerical dispersion. In this paper, we propose a dispersion analysis for the continuous interior penalty finite element method (CIP-FEM) and derive an explicit formula of the penalty parameter for the $p^{\rm th}$ order CIP-FEM on tensor product (Cartesian) meshes, with which the phase difference is reduced from $\mathcal{O}\big(k(kh)^{2p}\big)$ to $\mathcal{O}\big(k(kh)^{2p+2}\big)$. Extensive numerical tests show that the pollution error of the CIP-FE solution is also reduced by two orders in $kh$ with the same penalty parameter.

preprint · 2022 · arXiv

Dual-Distribution Discrepancy for Anomaly Detection in Chest X-Rays

Chest X-ray (CXR) is the most typical radiological exam for diagnosis of various diseases. Due to the expensive and time-consuming annotations, detecting anomalies in CXRs in an unsupervised fashion is very promising. However, almost all of the existing methods consider anomaly detection as a one-class classification (OCC) problem. They model the distribution of only known normal images during training and identify the samples not conforming to normal profile as anomalies in the testing phase. A large number of unlabeled images containing anomalies are thus ignored in the training phase, although they are easy to obtain in clinical practice. In this paper, we propose a novel strategy, Dual-distribution Discrepancy for Anomaly Detection (DDAD), utilizing both known normal images and unlabeled images. The proposed method consists of two modules. During training, one module takes both known normal and unlabeled images as inputs, capturing anomalous features from unlabeled images in some way, while the other one models the distribution of only known normal images. Subsequently, inter-discrepancy between the two modules, and intra-discrepancy inside the module that is trained on only normal images are designed as anomaly scores to indicate anomalies. Experiments on three CXR datasets demonstrate that the proposed DDAD achieves consistent, significant gains and outperforms state-of-the-art methods. Code is available at https://github.com/caiyu6666/DDAD.

preprint · 2022 · arXiv

Field Evaluation of Four Low-cost PM Sensors and Design, Development and Field Evaluation of A Wearable PM Exposure Monitoring System

To mitigate the significant biases/errors in research studying the associations between PM and health, which are introduced by the coarse/inadequate assessments of PM exposure from conventional PM monitoring paradigm, a personalized monitoring system consisting of a low-cost wearable PM device is proposed. However, due to the absence of a unifying evaluation protocol for low-cost PM sensors, the evaluation results/performance specifications from existing studies/datasheets are of limited reference values when attempting to determine the best candidate for the proposed system. In this regard, the authors appeal to the research community to develop a standardized evaluation protocol for low-cost PM sensors/devices, and a unifying attempt is established in this manuscript by adopting the definitive terminology from international documents and the evaluation metrics regarded as best practices. Collocated on the rooftop of the HKUST Supersite, four empirically selected PM sensors were compared against each other and calibrated against two reference monitors. They were then evaluated against the reference following the protocol. The PlanTower PMS-A003 sensor was selected for the wearable device as it outperformed the others in terms of affordability, portability, detection capability, data quality, as well as humidity and condensation insusceptibility. An automated approach was proposed to identify and remove the condensation associated abnormal measurements. The proposed device has better affordability and portability as well as similar usability and data accessibility compared to those existing devices recognized. The first 10 devices were also evaluated and calibrated at the Supersite. Additional 120 units were manufactured and delivered to the subjects to acquire their daily PM2.5 exposures for investigating the association with subclinical atherosclerosis.

preprint · 2022 · arXiv

Focal Inverse Distance Transform Maps for Crowd Localization

In this paper, we focus on the crowd localization task, a crucial topic of crowd analysis. Most regression-based methods utilize convolutional neural networks (CNNs) to regress a density map, which cannot accurately locate instances in extremely dense scenes for two crucial reasons: 1) the density map consists of a series of blurry Gaussian blobs, and 2) severe overlaps exist in the dense regions of the density map. To tackle this issue, we propose a novel Focal Inverse Distance Transform (FIDT) map for the crowd localization task. Compared with the density maps, the FIDT maps accurately describe the persons' locations without overlapping in dense regions. Based on the FIDT maps, a Local-Maxima-Detection-Strategy (LMDS) is derived to effectively extract the center point for each individual. Furthermore, we introduce an Independent SSIM (I-SSIM) loss to make the model tend to learn the local structural information, better recognizing local maxima. Extensive experiments demonstrate that the proposed method reports state-of-the-art localization performance on six crowd datasets and one vehicle dataset. Additionally, we find that the proposed method shows superior robustness on the negative and extremely dense scenes, which further verifies the effectiveness of the FIDT maps. The code and model will be available at https://github.com/dk-liang/FIDTM.

preprint · 2022 · arXiv

GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation

Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving for its powerful spatial representation ability. It is challenging to estimate the BEV semantic maps from monocular images due to the spatial gap, since it is implicitly required to realize both the perspective-to-BEV transformation and segmentation. We present a novel two-stage Geometry Prior-based Transformation framework named GitNet, consisting of (i) the geometry-guided pre-alignment and (ii) ray-based transformer. In the first stage, we decouple the BEV segmentation into the perspective image segmentation and geometric prior-based mapping, with explicit supervision by projecting the BEV semantic labels onto the image plane to learn visibility-aware features and learnable geometry to translate into BEV space. Second, the pre-aligned coarse BEV features are further deformed by ray-based transformers to take visibility knowledge into account. GitNet achieves the leading performance on the challenging nuScenes and Argoverse Datasets.

preprint · 2022 · arXiv

Graph Neural Networks: Taxonomy, Advances and Trends

Graph neural networks provide a powerful toolkit for embedding real-world graphs into low-dimensional spaces according to specific tasks. Several surveys on this topic already exist. However, they usually emphasize different angles, so readers cannot see a panorama of graph neural networks. This survey aims to overcome this limitation and provide a comprehensive review of graph neural networks. First, we provide a novel taxonomy for graph neural networks, and then refer to up to 400 relevant publications to show the panorama of the field. All of them are classified into the corresponding categories. In order to drive graph neural networks into a new stage, we summarize four future research directions so as to overcome the challenges they face. It is expected that more and more scholars can understand and exploit graph neural networks, and use them in their research community.

preprint · 2022 · arXiv

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Real-world data often follows a long-tailed distribution, which makes the performance of existing classification algorithms degrade heavily. A key issue is that samples in tail categories fail to depict their intra-class diversity. Humans can imagine a sample in new poses, scenes, and view angles with their prior knowledge even if it is the first time to see this category. Inspired by this, we propose a novel reasoning-based implicit semantic data augmentation method to borrow transformation directions from other classes. Since the covariance matrix of each category represents the feature transformation directions, we can sample new directions from similar categories to generate definitely different instances. Specifically, the long-tailed distributed data is first adopted to train a backbone and a classifier. Then, a covariance matrix for each category is estimated, and a knowledge graph is constructed to store the relations of any two categories. Finally, tail samples are adaptively enhanced via propagating information from all the similar categories in the knowledge graph. Experimental results on CIFAR-100-LT, ImageNet-LT, and iNaturalist 2018 have demonstrated the effectiveness of our proposed method compared with the state-of-the-art methods.

preprint · 2022 · arXiv

Improving Information Freshness via Backbone-Assisted Cooperative Access Points

Information freshness, characterized by age of information (AoI), is important for sensor applications involving timely status updates. In many cases, the wireless signals from one sensor can be received by multiple access points (APs). This paper investigates the average AoI for cooperative APs, in which they can share information through a wired backbone network. We first study a basic backbone-assisted COoperative AP (Co-AP) system where APs share only decoded packets. Experimental results on software-defined radios (SDR) indicate that Co-AP significantly improves the average AoI performance over a single-AP system. Next, we investigate an improved Co-AP system, called Soft-Co-AP. In addition to sharing decoded packets, Soft-Co-AP shares and collects soft information of packets that the APs fail to decode for further joint decoding. A critical issue in Soft-Co-AP is determining the number of quantization bits that represent the soft information (each soft bit) shared over the backbone. While more quantization bits per soft bit improves the joint decoding performance, it leads to higher backbone delay. We experimentally study the average AoI of Soft-Co-AP by evaluating the tradeoff between the backbone delay and the number of quantization bits. SDR experiments show that when the number of sensors is large, Soft-Co-AP further reduces the average AoI by 12% compared with Co-AP. Interestingly, good average AoI performance is usually achieved when the number of quantization bits per soft bit is neither too large nor too small.
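
As a rough illustration of the metric this abstract evaluates, the sketch below computes a time-average AoI from a log of (generation time, reception time) pairs. The function name `average_aoi`, the log format, and the sample timestamps are assumptions made for illustration; they are not taken from the paper, and the paper's SDR measurement procedure may differ.

```python
from typing import List, Tuple


def average_aoi(updates: List[Tuple[float, float]], t_end: float) -> float:
    """Time-average age of information between the first reception and t_end.

    updates: hypothetical log of (generation_time, reception_time) pairs.
    The age at time t is t minus the generation time of the freshest
    update received so far, so it grows linearly between receptions.
    """
    # Sort by reception time and ignore anything received after t_end.
    received = sorted((r, g) for g, r in updates if r <= t_end)
    if not received or received[0][0] >= t_end:
        raise ValueError("need at least one reception before t_end")

    t_start, freshest = received[0]
    t_prev = t_start
    area = 0.0
    for r, g in received[1:]:
        # Trapezoid under the age curve on [t_prev, r].
        area += 0.5 * ((t_prev - freshest) + (r - freshest)) * (r - t_prev)
        t_prev = r
        # A stale or duplicate packet does not reduce the age.
        freshest = max(freshest, g)
    # Final segment up to the end of the observation window.
    area += 0.5 * ((t_prev - freshest) + (t_end - freshest)) * (t_end - t_prev)
    return area / (t_end - t_start)


# Hypothetical logs from two access points hearing the same sensor.
ap1 = [(0.0, 1.0), (4.0, 5.0)]
ap2 = [(2.0, 3.0)]
single_ap = average_aoi(ap1, t_end=5.0)
cooperative = average_aoi(ap1 + ap2, t_end=5.0)  # decoded packets are shared
```

In this toy log the second access point delivers an update the first one missed, so pooling the decoded packets lowers the average age, which is the qualitative effect the abstract reports for Co-AP.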

preprint · 2022 · arXiv

Investigations into the characteristics and influences of nonequilibrium evolution

In order to estimate qualitatively the influence of nonequilibrium evolution in relativistic heavy ion collisions, we use the three dimensional Ising model with Metropolis algorithm to study the evolution from nonequilibrium to equilibrium on the phase boundary. The evolution of order parameter approaches its equilibrium value exponentially, the same as that given by Langevin equation. The average relaxation time is defined which is demonstrated to well represent the relaxation time in dynamical equations. It is shown that the average relaxation time at critical temperature diverges as the zth power of system size. The third and the fourth cumulants of order parameter during the nonequilibrium evolution could be either positive or negative, depending on the observation time, consistent with dynamical models at T > Tc. It is found that the nonequilibrium evolution at T > Tc lasts very short, and the influence is weaker than that at T < Tc. Those qualitative features are instructive to determine experimentally the critical point and the phase boundary of QCD.
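
The kind of simulation this abstract describes can be sketched as a single-spin-flip Metropolis update on a small three-dimensional Ising lattice, tracking the magnetization (the order parameter) sweep by sweep. The lattice size, temperatures, coupling J = 1, zero external field, and all function names are assumptions chosen for illustration; this is not the authors' code or parameter set.

```python
import math
import random


def metropolis_sweep(spins, L, beta, rng):
    """One sweep (L**3 attempted flips) on a periodic L x L x L lattice, J = 1."""
    for _ in range(L ** 3):
        x, y, z = rng.randrange(L), rng.randrange(L), rng.randrange(L)
        s = spins[x][y][z]
        # Sum over the six nearest neighbours with periodic boundaries.
        nb = (spins[(x + 1) % L][y][z] + spins[(x - 1) % L][y][z]
              + spins[x][(y + 1) % L][z] + spins[x][(y - 1) % L][z]
              + spins[x][y][(z + 1) % L] + spins[x][y][(z - 1) % L])
        d_energy = 2 * s * nb  # energy change if this spin is flipped
        if d_energy <= 0 or rng.random() < math.exp(-beta * d_energy):
            spins[x][y][z] = -s


def magnetization(spins, L):
    """Order parameter: mean spin per site."""
    return sum(s for plane in spins for row in plane for s in row) / L ** 3


def relax(L, T, sweeps, seed=0):
    """Start from the fully ordered state and record m after each sweep."""
    rng = random.Random(seed)
    spins = [[[1] * L for _ in range(L)] for _ in range(L)]
    history = [magnetization(spins, L)]
    for _ in range(sweeps):
        metropolis_sweep(spins, L, 1.0 / T, rng)
        history.append(magnetization(spins, L))
    return history


cold = relax(L=4, T=1.0, sweeps=5)     # far below the critical temperature
hot = relax(L=4, T=100.0, sweeps=20)   # far above it: order decays quickly
```

Fitting an exponential to such a magnetization history is one way to extract a relaxation time; how the paper defines its average relaxation time is not specified in the abstract.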

preprint · 2022 · arXiv

Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

Active speaker detection and speech enhancement have become two increasingly attractive topics in audio-visual scenario understanding. According to their respective characteristics, the scheme of independently designed architecture has been widely used in correspondence to each single task. This may lead to the representation learned by the model being task-specific, and inevitably result in the lack of generalization ability of the feature based on multi-modal modeling. More recent studies have shown that establishing cross-modal relationship between auditory and visual stream is a promising solution for the challenge of audio-visual multi-task learning. Therefore, as a motivation to bridge the multi-modal associations in audio-visual tasks, a unified framework is proposed to achieve target speaker detection and speech enhancement with joint learning of audio-visual modeling in this study.

preprint · 2022 · arXiv

MARF: Multiscale Adaptive-switch Random Forest for Leg Detection with 2D Laser Scanners

For 2D laser-based tasks, e.g., people detection and people tracking, leg detection is usually the first step. Thus, it carries great weight in determining the performance of people detection and people tracking. However, many leg detectors ignore the inevitable noise and the multiscale characteristics of the laser scan, which makes them sensitive to the unreliable features of the point cloud and further degrades the performance of the leg detector. In this paper, we propose a multiscale adaptive-switch Random Forest (MARF) to overcome these two challenges. Firstly, the adaptive-switch decision tree is designed to use noise-sensitive features to conduct weighted classification and noise-invariant features to conduct binary classification, which makes our detector more robust to noise. Secondly, considering the multiscale property that the sparsity of the 2D point cloud is proportional to the length of laser beams, we design a multiscale random forest structure to detect legs at different distances. Moreover, the proposed approach allows us to discover a sparser human leg from point clouds than others. Consequently, our method shows an improved performance compared to other state-of-the-art leg detectors on the challenging Moving Legs dataset and runs the whole pipeline at a speed of 60+ FPS on low-computational laptops. Moreover, we further apply the proposed MARF to the people detection and tracking system, achieving a considerable gain in all metrics.

preprint · 2022 · arXiv

One Password: An Encryption Scheme for Hiding Users' Register Information

In recent years, attacks that leverage register information (e.g. accounts and passwords) leaked from 3rd party applications to try other applications have become popular and serious. We call this attack "database collision". Traditionally, people have to keep dozens of accounts and passwords for different applications to prevent this attack. In this paper, we propose a novel encryption scheme for hiding users' register information and preventing this attack. Specifically, we first hash the register information using an existing safe hash function. Then the hash string is hidden, and a coefficient vector is stored for verification instead. Coefficient vectors of the same register information are generated randomly for different applications. Hence, the original information can hardly be cracked by dictionary-based attacks or database collision in practice. Using our encryption scheme, each user only needs to keep one password for dozens of applications.

preprint · 2022 · arXiv

Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via Role Interactions

Role-oriented dialogue summarization is to generate summaries for different roles in the dialogue, e.g., merchants and consumers. Existing methods handle this task by summarizing each role's content separately and thus are prone to ignore the information from other roles. However, we believe that other roles' content could benefit the quality of summaries, such as the omitted information mentioned by other roles. Therefore, we propose a novel role interaction enhanced method for role-oriented dialogue summarization. It adopts cross attention and decoder self-attention interactions to interactively acquire other roles' critical information. The cross attention interaction aims to select other roles' critical dialogue utterances, while the decoder self-attention interaction aims to obtain key information from other roles' summaries. Experimental results have shown that our proposed method significantly outperforms strong baselines on two public role-oriented dialogue summarization datasets. Extensive analyses have demonstrated that other roles' content could help generate summaries with more complete semantics and correct topic structures.

preprint · 2022 · arXiv

Reliable and Broad-range Layer Identification of Au-assisted Exfoliated Large Area MoS$_2$ and WS$_2$ Using Reflection Spectroscopic Fingerprints

The emerging Au-assisted exfoliation technique provides a wealth of large-area and high-quality ultrathin two-dimensional (2D) materials compared with traditional tape-based exfoliation. Fast, damage-free, and reliable determination of the layer number of such 2D films is essential to study layer-dependent physics and promote device applications. Here, an optical method has been developed for simple, high throughput, and accurate determination of the layer number for Au-assisted exfoliated MoS$_2$ and WS$_2$ films in a broad thickness range. The method is based on quantitative analysis of layer-dependent white light reflection spectra, revealing that the reflection peak intensity can be used as a clear indicator for determining the layer number. The simple yet robust method will facilitate the fundamental study on layer-dependent optical, electrical, and thermal properties and device applications of 2D materials. The technique can also be readily combined with photoluminescence and Raman spectroscopies to study other layer-dependent physical properties of 2D materials.

preprint · 2022 · arXiv

Scaling Blockchain with Adaptivity

This paper presents Balloon, a scalable blockchain consensus protocol that can dynamically adapt its performance to changes in the overall computation power. Balloon is based on a parallel chain architecture combined with a greedy heaviest sub-chain selection strategy. It adopts an innovative block sampling approach to assess the change of block generation rate in the network. By introducing a view change mechanism, Balloon is able to dynamically adjust the number of parallel sub-chains. Balloon redefines the concept of block subtree weight with view change in consideration, so that a total order of blocks can be obtained safely. To deal with a rapidly increasing block generation rate in the blockchain network, participants of previous Nakamoto-style protocols are required to continuously increase their mining difficulty so as to maintain an expected security guarantee. Balloon, however, can accommodate a fixed difficulty setup and assign superfluous block processing capability to new sub-chains, which makes it more open and also economical.

preprint · 2022 · arXiv

Spatial Power Spectral Analysis of the Suzaku X-ray Background

Power spectra of spatial fluctuations of X-ray emission may impose constraints on the origins of the emission independent of that from the energy spectra. We generated spatial power spectrum densities (PSD) of blank X-ray skies observed with Suzaku X-ray observatory utilizing the modified $Δ$-variance method. Using the total measured count rate as the diagnostic tool, we found that a model consisting of the sum of two components, one for the unresolved faint point sources and one for the uniform flat-field emission, can well represent the observed PSD in three different energy bands (0.2-0.5 keV, 0.5-2 keV, and 2-10 keV); only an upper limit is obtained for the latter component in 2-10 keV. X-ray counting rates corresponding to the best-fit PSD model functions and diffuse emission fractions were estimated, and we confirmed that the sum of the counting rates of two model components is consistent with those actually observed with the detector for all energy bands. The ratio of the flat-field counting rate to the total in 0.5-2 keV, however, is significantly larger than the diffuse emission fraction estimated from the model fits of energy spectra. We discussed that this discrepancy can be reconciled by systematic effects in the PSD and energy spectrum analyses. The present study demonstrates that the spatial power spectrum is powerful in constraining the origins of the X-ray emission.

preprint · 2022 · arXiv

Towards Personalized Healthcare in Cardiac Population: The Development of a Wearable ECG Monitoring System, an ECG Lossy Compression Schema, and a ResNet-Based AF Detector

Cardiovascular diseases (CVDs) are the number one cause of death worldwide. While there is growing evidence that the atrial fibrillation (AF) has strong associations with various CVDs, this heart arrhythmia is usually diagnosed using electrocardiography (ECG) which is a risk-free, non-intrusive, and cost-efficient tool. Continuously and remotely monitoring the subjects' ECG information unlocks the potentials of prompt pre-diagnosis and timely pre-treatment of AF before the development of any life-threatening conditions/diseases. Ultimately, the CVDs associated mortality could be reduced. In this manuscript, the design and implementation of a personalized healthcare system embodying a wearable ECG device, a mobile application, and a back-end server are presented. This system continuously monitors the users' ECG information to provide personalized health warnings/feedbacks. The users are able to communicate with their paired health advisors through this system for remote diagnoses, interventions, etc. The implemented wearable ECG devices have been evaluated and showed excellent intra-consistency (CVRMS=5.5%), acceptable inter-consistency (CVRMS=12.1%), and negligible RR-interval errors (ARE<1.4%). To boost the battery life of the wearable devices, a lossy compression schema utilizing the quasi-periodic feature of ECG signals to achieve compression was proposed. Compared to the recognized schemata, it outperformed the others in terms of compression efficiency and distortion, and achieved at least 2x of CR at a certain PRD or RMSE for ECG signals from the MIT-BIH database. To enable automated AF diagnosis/screening in the proposed system, a ResNet-based AF detector was developed. For the ECG records from the 2017 PhysioNet CinC challenge, this AF detector obtained an average testing F1=85.10% and a best testing F1=87.31%, outperforming the state-of-the-art.

preprint · 2022 · arXiv

TPSNet: Reverse Thinking of Thin Plate Splines for Arbitrary Shape Scene Text Representation

The research focus of scene text detection and recognition has shifted to arbitrary shape text in recent years, where the text shape representation is a fundamental problem. An ideal representation should be compact, complete, efficient, and reusable for subsequent recognition in our opinion. However, previous representations have flaws in one or more aspects. Thin-Plate-Spline (TPS) transformation has achieved great success in scene text recognition. Inspired by this, we reversely think of its usage and sophisticatedly take TPS as an exquisite representation for arbitrary shape text representation. The TPS representation is compact, complete, and efficient. With the predicted TPS parameters, the detected text region can be directly rectified to a near-horizontal one to assist the subsequent recognition. To further exploit the potential of the TPS representation, the Border Alignment Loss is proposed. Based on these designs, we implement the text detector TPSNet, which can be extended to a text spotter conveniently. Extensive evaluation and ablation of several public benchmarks demonstrate the effectiveness and superiority of the proposed method for text representation and spotting. Particularly, TPSNet achieves the detection F-Measure improvement of 4.4% (78.4% vs. 74.0%) on Art dataset and the end-to-end spotting F-Measure improvement of 5.0% (78.5% vs. 73.5%) on Total-Text, which are large margins with no bells and whistles.

preprint · 2022 · arXiv

TransCrowd: weakly-supervised crowd counting with transformers

The mainstream crowd counting methods usually utilize the convolution neural network (CNN) to regress a density map, requiring point-level annotations. However, annotating each person with a point is an expensive and laborious process. During the testing phase, the point-level annotations are not considered to evaluate the counting accuracy, which means the point-level annotations are redundant. Hence, it is desirable to develop weakly-supervised counting methods that just rely on count-level annotations, a more economical way of labeling. Current weakly-supervised counting methods adopt the CNN to regress a total count of the crowd by an image-to-count paradigm. However, having limited receptive fields for context modeling is an intrinsic limitation of these weakly-supervised CNN-based methods. These methods thus cannot achieve satisfactory performance, with limited applications in the real world. The transformer is a popular sequence-to-sequence prediction model in natural language processing (NLP), which contains a global receptive field. In this paper, we propose TransCrowd, which reformulates the weakly-supervised crowd counting problem from the perspective of sequence-to-count based on transformers. We observe that the proposed TransCrowd can effectively extract the semantic crowd information by using the self-attention mechanism of transformer. To the best of our knowledge, this is the first work to adopt a pure transformer for crowd counting research. Experiments on five benchmark datasets demonstrate that the proposed TransCrowd achieves superior performance compared with all the weakly-supervised CNN-based counting methods and gains highly competitive counting performance compared with some popular fully-supervised counting methods.

preprint2022arXiv

Transformation-Invariant Learning of Optimal Individualized Decision Rules with Time-to-Event Outcomes

In many important applications of precision medicine, the outcome of interest is time to an event (e.g., death, relapse of disease) and the primary goal is to identify the optimal individualized decision rule (IDR) to prolong survival time. Existing work in this area has mostly focused on estimating the optimal IDR to maximize the restricted mean survival time in the population. We propose a new robust framework for estimating an optimal static or dynamic IDR with time-to-event outcomes based on an easy-to-interpret quantile criterion. The new method does not need to specify an outcome regression model and is robust to heavy-tailed distributions. The estimation problem corresponds to a nonregular M-estimation problem with both finite and infinite-dimensional nuisance parameters. Employing advanced empirical process techniques, we establish the statistical theory of the estimated parameter indexing the optimal IDR. Furthermore, we prove a novel result that the proposed approach can consistently estimate the optimal value function under mild conditions even when the optimal IDR is non-unique, which happens in the challenging setting of exceptional laws. We also propose a smoothed resampling procedure for inference. The proposed methods are implemented in the R-package QTOCen. We demonstrate the performance of the proposed new methods via extensive Monte Carlo studies and a real data application.

preprint2022arXiv

UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection

Recent scene text detection methods are almost all based on deep learning and are data-driven. Synthetic data is commonly adopted for pre-training due to expensive annotation cost. However, there are obvious domain discrepancies between synthetic data and real-world data. Directly adopting a model initialized on synthetic data in the fine-tuning stage may lead to sub-optimal performance. In this paper, we propose a new training paradigm for scene text detection, which introduces an UNsupervised Intermediate Training Stage (UNITS) that builds a buffer path to real-world data and can alleviate the gap between the pre-training stage and fine-tuning stage. Three training strategies are further explored to perceive information from real-world data in an unsupervised way. With UNITS, scene text detectors are improved without introducing any parameters and computations during inference. Extensive experimental results show consistent performance improvements on three public datasets.

preprint2021arXiv

0.8% Nyquist computational ghost imaging via non-experimental deep learning

We present a framework for computational ghost imaging based on deep learning and customized pink noise speckle patterns. The deep neural network in this work, which can learn the sensing model and enhance image reconstruction quality, is trained merely by simulation. To demonstrate the sub-Nyquist level in our work, the conventional computational ghost imaging results, reconstructed imaging results using white noise and pink noise via deep learning are compared under multiple sampling rates at different noise conditions. We show that the proposed scheme can provide high-quality images with a sampling rate of 0.8% even when the object is outside the training dataset, and it is robust to noisy environments. This method is excellent for various applications, particularly those that require a low sampling rate, fast reconstruction efficiency, or experience strong noise interference.

preprint2021arXiv

Ghost Imaging Based on Recurrent Neural Network

Benefiting from the promising features of second-order correlation, ghost imaging (GI) has received extensive attention in recent years. At the same time, GI is affected by the poor trade-off between sampling rate and imaging quality. The traditional image reconstruction method in GI is to accumulate the action result of each speckle and the corresponding bucket signal. We found that the image reconstruction process of GI is very similar to the Recurrent Neural Network (RNN), which is a deep learning algorithm. In this paper, we propose a novel method that effectively implements GI on the RNN architecture, called GI-RNN. The state of each layer in the RNN is determined by the output of the previous layer and the input of this layer, and the output of the network is the sum of all previous states. Therefore, we take the speckle of each illumination and the corresponding bucket signal as the input of each layer, and the output of the network is the sum of all previous speckle and bucket signals, which is the image of the target. The testing results show that the proposed method can achieve image reconstruction at a very low sampling rate (0.38%). Moreover, we compare GI-RNN with the traditional GI algorithm and the compressed sensing algorithm. The results for different targets show that GI-RNN is 6.61 dB higher than the compressed sensing algorithm and 12.58 dB higher than the traditional GI algorithm on average. In our view, the proposed method makes an important step toward applications of GI.
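The "traditional" reconstruction mentioned in this abstract, accumulating each speckle pattern together with its bucket signal, can be illustrated with a small sketch. This is not the GI-RNN method and not the authors' code; it is a generic correlation-style reconstruction on a toy 1-D object, and the function names (`simulate_bucket`, `gi_reconstruct`), the uniform random speckles, and the object layout are all assumptions made for illustration.

```python
import random

def simulate_bucket(speckle, obj):
    # Bucket detector (assumed model): total intensity transmitted by the object.
    return sum(s * o for s, o in zip(speckle, obj))

def gi_reconstruct(speckles, buckets):
    # Accumulate every speckle pattern weighted by the fluctuation of its
    # bucket signal around the mean, then average over all patterns.
    m = len(buckets)
    mean_b = sum(buckets) / m
    image = [0.0] * len(speckles[0])
    for speckle, b in zip(speckles, buckets):
        w = b - mean_b
        for i, s in enumerate(speckle):
            image[i] += w * s
    return [v / m for v in image]

random.seed(0)
n_pixels = 16
# Hypothetical toy object: pixels 5..8 transmit light, the rest are opaque.
obj = [1.0 if 5 <= i <= 8 else 0.0 for i in range(n_pixels)]
speckles = [[random.random() for _ in range(n_pixels)] for _ in range(5000)]
buckets = [simulate_bucket(s, obj) for s in speckles]
image = gi_reconstruct(speckles, buckets)
```

In this toy setting the reconstructed values at the transmitting pixels come out clearly larger than those at the opaque pixels, which is the accumulation behaviour the abstract likens to a recurrent network summing its states.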

preprint2021arXiv

Nth-order nonlinear intensity fluctuation amplifier

Stronger light intensity fluctuations are pursued by related applications such as optical resolution, image enhancement, and beam positioning. In this paper, an Nth-order light intensity fluctuation amplifier is proposed, which was demonstrated by a four-wave mixing process with different statistical distribution coupling lights. Firstly, its amplification mechanism is revealed both theoretically and experimentally. The ratio $R$ of statistical distributions and the degree of second-order coherence ${g^{(2)}}(0)$ of beams are used to characterize the affected modulations and the increased light intensity fluctuations through the four-wave mixing process. The results show that the amplification of light intensity fluctuations is caused by not only the fluctuating light fields of incident coupling beams, but also the fluctuating nonlinear coefficient of interaction. At last, we highlight the potentiality of applying such amplifier to other N-order nonlinear optical effects.

preprint2021arXiv

Quick Annotator: an open-source digital pathology based rapid image annotation tool

Image based biomarker discovery typically requires an accurate segmentation of histologic structures (e.g., cell nuclei, tubules, epithelial regions) in digital pathology Whole Slide Images (WSI). Unfortunately, annotating each structure of interest is laborious and often intractable even in moderately sized cohorts. Here, we present an open-source tool, Quick Annotator (QA), designed to improve annotation efficiency of histologic structures by orders of magnitude. While the user annotates regions of interest (ROI) via an intuitive web interface, a deep learning (DL) model is concurrently optimized using these annotations and applied to the ROI. The user iteratively reviews DL results to either (a) accept accurately annotated regions, or (b) correct erroneously segmented structures to improve subsequent model suggestions, before transitioning to other ROIs. We demonstrate the effectiveness of QA over comparable manual efforts via three use cases. These include annotating (a) 337,386 nuclei in 5 pancreatic WSIs, (b) 5,692 tubules in 10 colorectal WSIs, and (c) 14,187 regions of epithelium in 10 breast WSIs. Efficiency gains in terms of annotations per second of 102x, 9x, and 39x were respectively witnessed while retaining f-scores >.95, suggesting QA may be a valuable tool for efficiently fully annotating WSIs employed in downstream biomarker studies.

preprint2020arXiv

Artificial intelligence control of a turbulent jet

An artificial intelligence (AI) control system is developed to maximize the mixing rate of a turbulent jet. This system comprises six independently operated unsteady minijet actuators, two hot-wire sensors placed in the jet, and genetic programming for the unsupervised learning of a near-optimal control law. The ansatz of this law includes multi-frequency open-loop forcing, sensor-feedback and nonlinear combinations thereof. Mixing performance is quantified by the decay rate of the centreline mean velocity of jet. Intriguingly, the learning process of AI control discovers the classical forcings, i.e. axisymmetric, helical and flapping achievable from conventional control techniques, one by one in the order of increased performance, and finally converges to a hitherto unexplored forcing. Careful examination of the control landscape unveils typical control laws, generated in the learning process, and their evolutions. The best AI forcing produces a complex turbulent flow structure that is characterized by periodically generated mushroom structures, helical motion and oscillating jet column, all enhancing the mixing rate and vastly outperforming others. Being never reported before, this flow structure is examined in various aspects, including the velocity spectra, mean and fluctuating velocity fields and their downstream evolution, and flow visualization images in three orthogonal planes, all compared with other classical flow structures. Along with the knowledge of the minijet-produced flow and its effect on the initial condition of the main jet, these aspects cast valuable insight into the physics behind the highly effective mixing of this newly found flow structure. The results point to the great potential of AI in conquering the vast opportunity space of control laws for many actuators and sensors and in optimizing turbulence.

preprint2020arXiv

CodeGRU: Context-aware Deep Learning with Gated Recurrent Unit for Source Code Modeling

Recently deep learning based Natural Language Processing (NLP) models have shown great potential in the modeling of source code. However, a major limitation of these approaches is that they take source code as simple tokens of text and ignore its contextual, syntactical and structural dependencies. In this work, we present CodeGRU, a gated recurrent unit based source code language model that is capable of capturing source code's contextual, syntactical and structural dependencies. We introduce a novel approach which can capture the source code context by leveraging the source code token types. Further, we adopt a novel approach which can learn variable size context by taking into account source code's syntax and structural information. We evaluate CodeGRU with a real-world data set and it shows that CodeGRU outperforms the state-of-the-art language models and helps reduce the vocabulary size up to 24.93%. Unlike previous works, we tested CodeGRU with an independent test set, which suggests that our methodology does not require the source code to come from the same domain as the training data while providing suggestions. We further evaluate CodeGRU with two software engineering applications: source code suggestion, and source code completion. Our experiment confirms that the source code's contextual information can be vital and can help improve the software language models. The extensive evaluation of CodeGRU shows that it outperforms the state-of-the-art models. The results further suggest that the proposed approach can help reduce the vocabulary size and is of practical use for software developers.

preprint2020arXiv

Deep Transfer Learning for Source Code Modeling

In recent years, deep learning models have shown great potential in source code modeling and analysis. Generally, deep learning-based approaches are problem-specific and data-hungry. A challenging issue of these approaches is that they require training from scratch for a different related problem. In this work, we propose a transfer learning-based approach that significantly improves the performance of deep learning-based source code models. In contrast to traditional learning paradigms, transfer learning can transfer the knowledge learned in solving one problem into another related problem. First, we present two recurrent neural network-based models, RNN and GRU, for the purpose of transfer learning in the domain of source code modeling. Next, via transfer learning, these pre-trained (RNN and GRU) models are used as feature extractors. Then, these extracted features are combined into an attention learner for different downstream tasks. The attention learner leverages the learned knowledge of the pre-trained models and fine-tunes them for a specific downstream task. We evaluate the performance of the proposed approach with extensive experiments with the source code suggestion task. The results indicate that the proposed approach outperforms the state-of-the-art models in terms of accuracy, precision, recall, and F-measure without training the models from scratch.

preprint2020arXiv

DeepVS: An Efficient and Generic Approach for Source Code Modeling Usage

The source code suggestions provided by current IDEs are mostly dependent on static type learning. These suggestions often end up proposing irrelevant suggestions for a peculiar context. Recently, deep learning-based approaches have shown great potential in the modeling of source code for various software engineering tasks. However, these techniques lack adequate generalization and resistance to acclimate the use of such models in a real-world software development environment. This letter presents DeepVS, an end-to-end deep neural code completion tool that learns from existing codebases by exploiting the bidirectional Gated Recurrent Unit (BiGRU) neural net. The proposed tool is capable of providing source code suggestions instantly in an IDE by using pre-trained BiGRU neural net. The evaluation of this work is two-fold, quantitative and qualitative. Through extensive evaluation on ten real-world open-source software systems, the proposed method shows significant performance enhancement and its practicality. Moreover, the results also suggest that DeepVS tool is capable of suggesting zero-day (unseen) code tokens by learning coding patterns from real-world software systems.

preprint2020arXiv

Expert Training: Task Hardness Aware Meta-Learning for Few-Shot Classification

Deep neural networks are highly effective when a large number of labeled samples are available but fail with few-shot classification tasks. Recently, meta-learning methods have received much attention, which train a meta-learner on massive additional tasks to gain the knowledge to instruct the few-shot classification. Usually, the training tasks are randomly sampled and performed indiscriminately, often making the meta-learner stuck into a bad local optimum. Some works in the optimization of deep neural networks have shown that a better arrangement of training data can make the classifier converge faster and perform better. Inspired by this idea, we propose an easy-to-hard expert meta-training strategy to arrange the training tasks properly, where easy tasks are preferred in the first phase, then, hard tasks are emphasized in the second phase. A task hardness aware module is designed and integrated into the training procedure to estimate the hardness of a task based on the distinguishability of its categories. In addition, we explore multiple hardness measurements including the semantic relation, the pairwise Euclidean distance, the Hausdorff distance, and the Hilbert-Schmidt independence criterion. Experimental results on the miniImageNet and tieredImageNetSketch datasets show that the meta-learners can obtain better results with our expert training strategy.

preprint2020arXiv

Exploring Relations in Untrimmed Videos for Self-Supervised Learning

Existing video self-supervised learning methods mainly rely on trimmed videos for model training. However, trimmed datasets are manually annotated from untrimmed videos. In this sense, these methods are not really self-supervised. In this paper, we propose a novel self-supervised method, referred to as Exploring Relations in Untrimmed Videos (ERUV), which can be straightforwardly applied to untrimmed videos (real unlabeled) to learn spatio-temporal features. ERUV first generates single-shot videos by shot change detection. Then a designed sampling strategy is used to model relations for video clips. The strategy is saved as our self-supervision signals. Finally, the network learns representations by predicting the category of relations between the video clips. ERUV is able to compare the differences and similarities of videos, which is also an essential procedure for action and video related tasks. We validate our learned models with action recognition and video retrieval tasks with three kinds of 3D CNNs. Experimental results show that ERUV is able to learn richer representations and it outperforms state-of-the-art self-supervised methods with significant margins.

preprint2020arXiv

FC2RN: A Fully Convolutional Corner Refinement Network for Accurate Multi-Oriented Scene Text Detection

Recent scene text detection works mainly focus on curve text detection. However, in real applications, the curve texts are more scarce than the multi-oriented ones. Accurate detection of multi-oriented text with large variations of scales, orientations, and aspect ratios is of great significance. Among the multi-oriented detection methods, direct regression for the geometry of scene text shares a simple yet powerful pipeline and gets popular in academic and industrial communities, but it may produce imperfect detections, especially for long texts due to the limitation of the receptive field. In this work, we aim to improve this while keeping the pipeline simple. A fully convolutional corner refinement network (FC2RN) is proposed for accurate multi-oriented text detection, in which an initial corner prediction and a refined corner prediction are obtained at one pass. With a novel quadrilateral RoI convolution operation tailored for multi-oriented scene text, the initial quadrilateral prediction is encoded into the feature maps which can be further used to predict offset between the initial prediction and the ground-truth as well as output a refined confidence score. Experimental results on four public datasets including MSRA-TD500, ICDAR2017-RCTW, ICDAR2015, and COCO-Text demonstrate that FC2RN can outperform the state-of-the-art methods. The ablation study shows the effectiveness of corner refinement and scoring for accurate text localization.

preprint2020arXiv

Progressive Cluster Purification for Unsupervised Feature Learning

In unsupervised feature learning, sample specificity based methods ignore the inter-class information, which deteriorates the discriminative capability of representation models. Clustering based methods are error-prone to explore the complete class boundary information due to the inevitable class inconsistent samples in each cluster. In this work, we propose a novel clustering based method, which, by iteratively excluding class inconsistent samples during progressive cluster formation, alleviates the impact of noise samples in a simple-yet-effective manner. Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training, while the sizes of clusters continuously expand consistently with the growth of model representation capability. With a well-designed cluster purification mechanism, it further purifies clusters by filtering noise samples which facilitate the subsequent feature learning by utilizing the refined clusters as pseudo-labels. Experiments on commonly used benchmarks demonstrate that the proposed PCP improves baseline method with significant margins. Our code will be available at https://github.com/zhangyifei0115/PCP.

preprint2020arXiv

SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

Scene text recognition is a hot research topic in computer vision. Recently, many recognition methods based on the encoder-decoder framework have been proposed, and they can handle scene texts of perspective distortion and curve shape. Nevertheless, they still face lots of challenges like image blur, uneven illumination, and incomplete characters. We argue that most encoder-decoder methods are based on local visual features without explicit global semantic information. In this work, we propose a semantics enhanced encoder-decoder framework to robustly recognize low-quality scene texts. The semantic information is used both in the encoder module for supervision and in the decoder module for initializing. In particular, the state-of-the art ASTER method is integrated into the proposed framework as an exemplar. Extensive experiments demonstrate that the proposed framework is more robust for low-quality text images, and achieves state-of-the-art results on several benchmark datasets.

preprint2020arXiv

Self-Training for Domain Adaptive Scene Text Detection

Though deep learning based scene text detection has achieved great progress, well-trained detectors suffer from severe performance degradation for different domains. In general, a tremendous amount of data is indispensable to train the detector in the target domain. However, data collection and annotation are expensive and time-consuming. To address this problem, we propose a self-training framework to automatically mine hard examples with pseudo-labels from unannotated videos or images. To reduce the noise of hard examples, a novel text mining module is implemented based on the fusion of detection and tracking results. Then, an image-to-video generation method is designed for the tasks that videos are unavailable and only images can be used. Experimental results on standard benchmarks, including ICDAR2015, MSRA-TD500, ICDAR2017 MLT, demonstrate the effectiveness of our self-training method. The simple Mask R-CNN adapted with self-training and fine-tuned on real data can achieve comparable or even superior results with the state-of-the-art methods.

preprint2020arXiv

Subtracting non-critical fluctuations in higher cumulants of conserved charges

Using the sample produced by the AMPT default model, we construct a corresponding mixed sample by the method of mixed events. The mixed sample provides an effective estimation for non-critical fluctuations which are caused by global and systematic effects. The dynamical cumulants of conserved charges are defined as the cumulants of the original sample minus the cumulants of the mixed sample. It is demonstrated that dynamical cumulants have the statistical fluctuations subtracted and are independent of centrality bin width and detection efficiency, consistent with formula-corrected cumulants. Therefore, dynamical cumulants are helpful in obtaining critical fluctuations at the RHIC BES.
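The definition stated in this abstract, cumulants of the original sample minus cumulants of the mixed sample, can be written down as a short sketch. It assumes each sample is simply a list of per-event net-charge values and uses the standard first four cumulants; how the mixed sample is built is not shown here, and the function names are hypothetical rather than taken from the paper.

```python
def cumulants(sample):
    # First four cumulants from central moments:
    # C1 = mean, C2 = mu2, C3 = mu3, C4 = mu4 - 3 * mu2**2.
    n = len(sample)
    mean = sum(sample) / n

    def central_moment(k):
        return sum((x - mean) ** k for x in sample) / n

    mu2, mu3, mu4 = central_moment(2), central_moment(3), central_moment(4)
    return [mean, mu2, mu3, mu4 - 3 * mu2 ** 2]

def dynamical_cumulants(original, mixed):
    # Order-by-order difference: original-sample cumulant minus
    # mixed-sample cumulant.
    return [c_orig - c_mix
            for c_orig, c_mix in zip(cumulants(original), cumulants(mixed))]
```

By construction, a sample compared against itself gives zero at every order, so any non-zero value reflects a difference between the original and mixed samples.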

preprint2020arXiv

Two-Level Residual Distillation based Triple Network for Incremental Object Detection

Modern object detection methods based on convolutional neural network suffer from severe catastrophic forgetting in learning new classes without original data. Due to time consumption, storage burden and privacy of old data, it is inadvisable to train the model from scratch with both old and new data when new object classes emerge after the model trained. In this paper, we propose a novel incremental object detector based on Faster R-CNN to continuously learn from new object classes without using old data. It is a triple network where an old model and a residual model as assistants for helping the incremental model learning on new classes without forgetting the previous learned knowledge. To better maintain the discrimination of features between old and new classes, the residual model is jointly trained on new classes in the incremental learning procedure. In addition, a corresponding distillation scheme is designed to guide the training process, which consists of a two-level residual distillation loss and a joint classification distillation loss. Extensive experiments on VOC2007 and COCO are conducted, and the results demonstrate that the proposed method can effectively learn to incrementally detect objects of new classes, and the problem of catastrophic forgetting is mitigated in this context.

preprint2020arXiv

Two-photon superbunching effect of broadband chaotic stationary light at femtosecond timescale based on cascaded Michelson interferometer

It is challenging to observe the superbunching effect with true chaotic light. Here we propose and demonstrate a method to achieve a superbunching effect with a degree of second-order coherence of 2.42 using broadband stationary chaotic light based on a cascaded Michelson interferometer (CMI), exceeding the theoretical upper limit of 2 for the two-photon bunching effect of chaotic light. The superbunching correlation peak is measured with an ultrafast two-photon absorption detector, and its full width at half maximum reaches about 95 fs. Two-photon superbunching theory in a CMI is developed to interpret the effect and is in agreement with experimental results. The theory also predicts that the degree of second-order coherence can be much greater than $2$ if chaotic light propagates $N$ times in a CMI. Finally, a new type of weak signal detection setup which employs broadband chaotic light circulating in a CMI is proposed. Theoretically, it can increase the detection sensitivity of weak signals 79 times after the chaotic light circulates 100 times in the CMI.

preprint2020arXiv

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatial-temporal representations. VCP first generates "blanks" by withholding video clips and then creates "options" by applying spatio-temporal operations on the withheld clips. Finally, it fills the blanks with "options" and learns representations by predicting the categories of operations applied on the clips. VCP can act as either a proxy task or a target task in self-supervised learning. As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning. As a target task, it can assess learned representation models in a uniform and interpretable manner. With VCP, we train spatial-temporal representation models (3D-CNNs) and apply such models on action recognition and video retrieval tasks. Experiments on commonly used benchmarks show that the trained models outperform the state-of-the-art self-supervised models with significant margins.

preprint2020arXiv

Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning

In self-supervised spatio-temporal representation learning, the temporal resolution and long-short term characteristics are not yet fully explored, which limits representation capabilities of learned models. In this paper, we propose a novel self-supervised method, referred to as video Playback Rate Perception (PRP), to learn spatio-temporal representation in a simple-yet-effective way. PRP roots in a dilated sampling strategy, which produces self-supervision signals about video playback rates for representation model learning. PRP is implemented with a feature encoder, a classification module, and a reconstructing decoder, to achieve spatio-temporal semantic retention in a collaborative discrimination-generation manner. The discriminative perception model follows a feature encoder to prefer perceiving low temporal resolution and long-term representation by classifying fast-forward rates. The generative perception model acts as a feature decoder to focus on comprehending high temporal resolution and short-term representation by introducing a motion-attention mechanism. PRP is applied on typical video target tasks including action recognition and video retrieval. Experiments show that PRP outperforms state-of-the-art self-supervised models with significant margins. Code is available at github.com/yuanyao366/PRP
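The dilated sampling idea in this abstract, where the sampling stride plays the role of a playback rate and doubles as the self-supervision label, can be sketched roughly as follows. This is an illustrative guess at the general mechanism, not the released PRP code; the function name `sample_clip`, the rate set `(1, 2, 4, 8)`, and the clip length are assumptions.

```python
import random

def sample_clip(num_frames, clip_len, rates, rng):
    # Pick a playback rate at random; its index is the classification label.
    label = rng.randrange(len(rates))
    rate = rates[label]
    # Number of source frames the dilated clip spans.
    span = (clip_len - 1) * rate + 1
    if span > num_frames:
        raise ValueError("video too short for the chosen rate")
    start = rng.randrange(num_frames - span + 1)
    # Dilated sampling: take every `rate`-th frame starting at `start`.
    indices = [start + i * rate for i in range(clip_len)]
    return indices, label

rng = random.Random(0)
rates = (1, 2, 4, 8)  # hypothetical set of fast-forward rates
indices, label = sample_clip(num_frames=300, clip_len=16, rates=rates, rng=rng)
```

A classifier trained to recover `label` from the frames at `indices` would then be solving a playback-rate prediction task of the kind the abstract describes for the discriminative branch.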

preprint2019arXiv

Tunable Microwave Single-photon Source Based on Transmon Qubit with High Efficiency

Single-photon sources are of great interest because they are key elements in different promising applications of quantum technologies. Here we demonstrate a highly efficient tunable on-demand microwave single-photon source based on a transmon qubit with the intrinsic emission efficiency above 98%. The high efficiency ensures a negligible pure dephasing rate and the necessary condition for generation of indistinguishable photons. We provide an extended discussion and analysis of the efficiency of the photon generation. To further experimentally confirm the single-photon property of the source, correlation functions of the emission field are also measured using linear detectors with a GPU-enhanced signal processing technique. Our results experimentally demonstrate that frequency tunability and negligible pure dephasing rate can be achieved simultaneously and show that such a tunable single-photon source can be good for various practical applications in quantum communication, simulations and information processing in the microwave regime.