Researcher profile

Guanghui Wang

Guanghui Wang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
32works
0followers
14topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

32 published item(s)

preprint2026arXiv

Distilling 3D Spatial Reasoning into a Lightweight Vision-Language Model with CoT

Large-scale 3D vision-language models (VLMs) like LLaVA-3D offer strong spatial reasoning but are difficult to deploy due to high computational costs. We propose a knowledge distillation framework that transfers spatial reasoning from a 7B teacher to a 2.29B student model. Our approach achieves 8.7x lower inference latency and a 3x reduction in model size while retaining 54-72% of the teacher's performance. The framework utilizes VGGT as the vision encoder and a multi-task distillation pipeline with uncertainty-aware loss weighting. To improve reasoning without chain-of-thought (CoT) data, we introduce "Hidden CoT": learnable latent tokens that serve as an internal scratchpad before answer generation. This is the first use of latent scratchpad reasoning in distilled 3D VLMs. The student model jointly performs spatial description, depth estimation, and object detection. Experiments on ScanNet and 3D-FRONT show strong spatial understanding, reaching 68-72% accuracy on proximity and contact tasks. Our framework enables efficient 3D scene QA on resource-constrained platforms.

preprint2026arXiv

Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries

Self-evolving skill libraries face a silent failure mode we term \emph{library drift}: unbounded skill accumulation without outcome-driven lifecycle management causes retrieval degradation, false-positive injections, and performance stagnation. Recent evaluation confirms the symptom--LLM-authored skills deliver +0.0pp gain while human-curated ones deliver +16.2pp (SkillsBench)--yet the underlying mechanism has not been isolated. We provide (1) a reproducible trigger: ablations that isolate drift--one disables skill injection (flat floor, +0.002), one imposes premature retirement (active harm, $-$0.019); (2) trace-level diagnostics: an append-only evidence log with per-skill contribution scores, attribution verdicts, and router engagement metrics that make the failure visible before it reaches end-task scores; and (3) a verified fix: a minimal governance recipe (outcome-driven retirement + bounded active-cap + meta-skill authoring prior) that lifts held-out pass@1 from a 0.258 baseline to a late-window mean of 0.584 (rolling gain $+$0.328) on MBPP+ hard-100 over 100 rounds. Eight ablations decompose which governance mechanisms are load-bearing and which are subsumed, providing a concrete playbook for diagnosing library drift in any self-evolving agent.

preprint2026arXiv

Model-Agnostic and Uncertainty-Aware Dimensionality Reduction in Supervised Learning

Dimension reduction is a fundamental tool for analyzing high-dimensional data in supervised learning. Traditional methods for estimating intrinsic order often prioritize model-specific structural assumptions over predictive utility. This paper introduces predictive order determination (POD), a model-agnostic framework that determines the minimal predictively sufficient dimension by directly evaluating out-of-sample predictiveness. POD quantifies uncertainty via error bounds for over- and underestimation and achieves consistency under mild conditions. By unifying dimension reduction with predictive performance, POD applies flexibly across diverse reduction tasks and supervised learners. Simulations and real-data analyses show that POD delivers accurate, uncertainty-aware order estimates, making it a versatile component for prediction-centric pipelines.

preprint2025arXiv

Arbitrary orientations of cycles in oriented graphs

We show that every sufficiently large oriented graph $G$ with minimum indegree and outdegree both at least $(3|V(G)|-1)/8$ contains every orientation of a Hamilton cycle. This result improves the approximate bound established by Kelly and resolves a long-standing problem posed by Häggkvist and Thomason in 1995. The degree condition is tight and it can be improved to $(3|V(G)|-4)/8$ for Hamilton cycles that are nearly directed, generalizing a classic result by Keevash, Kühn and Osthus. Additionally, we derive a pancyclicity result for arbitrary orientations. More precisely, the above degree condition suffices to guarantee the existence of cycles of every possible orientation and every possible length unless $G$ is isomorphic to one of the exceptional oriented graphs.

preprint2025arXiv

Ore-type condition for antidirected Hamilton cycles in oriented graphs

An antidirected cycle in a digraph $G$ is a subdigraph whose underlying graph is a cycle, and in which no two consecutive edges form a directed path in $G$. Let $σ_{+-}(G)$ be the minimum value of $d^+(x)+d^-(y)$ over all pairs of vertices $x, y$ such that there is no edge from $x$ to $y$, that is, $$σ_{+-}(G)=\min\{d^+(x)+d^-(y): \{x,y\}\subseteq V(G), xy\notin E(G)\}.$$ In 1972, Woodall extended Ore's theorem to digraphs by showing that every digraph $G$ on $n$ vertices with $σ_{+-}(G)\geqslant n$ contains a directed Hamilton cycle. Very recently, this result was generalized to oriented graphs under the condition $σ_{+-}(G)\geqslant(3n-3)/4$. In this paper, we give the exact Ore-type degree threshold for the existence of antidirected Hamilton cycles in oriented graphs. More precisely, we prove that for sufficiently large even integer $n$, every oriented graph $G$ on $n$ vertices with $σ_{+-}(G)\geqslant(3n+2)/4$ contains an antidirected Hamilton cycle. Moreover, we show that this degree condition is best possible.

preprint2025arXiv

STED and Consistency Scoring: A Framework for Evaluating LLM Structured Output Reliability

Large Language Models (LLMs) are increasingly deployed for structured data generation, yet output consistency remains critical for production applications. We introduce a comprehensive framework for evaluating and improving consistency in LLM-generated structured outputs. Our approach combines: (1) STED (Semantic Tree Edit Distance), a novel similarity metric balancing semantic flexibility with structural strictness when comparing JSON outputs, and (2) a consistency scoring framework aggregating multiple STED measurements across repeated generations to quantify reliability. Through systematic experiments on synthetic datasets with controlled schema, expression, and semantic variations, we demonstrate STED achieves superior performance ($0.86-0.90$ similarity for semantic equivalents, $0.0$ for structural breaks) compared to existing metrics including TED, BERTScore, and DeepDiff. Applying our framework to benchmark six LLMs reveals significant variations: Claude-3.7-Sonnet demonstrates exceptional consistency, maintaining near-perfect structural reliability even at high temperatures ($T=0.9$), while models like Claude-3-Haiku and Nova-Pro exhibit substantial degradation requiring careful tuning. Our framework enables practical applications including targeted model selection for structured tasks, iterative prompt refinement for reproducible results, and diagnostic analysis to identify inconsistency root causes. This work provides theoretical foundations and practical tools for ensuring reliable structured output generation in LLM-based production systems.

preprint2022arXiv

$F$-factors in Quasi-random Hypergraphs

Given $k\ge 2$ and two $k$-graphs ($k$-uniform hypergraphs) $F$ and $H$, an $F$-factor in $H$ is a set of vertex-disjoint copies of $F$ that together covers the vertex set of $H$. Lenz and Mubayi [J. Combin. Theory Ser. B, 2016] studied the $F$-factor problem in quasi-random $k$-graphs with minimum degree $Ω(n^{k-1})$. They posed the problem of characterizing the $k$-graphs $F$ such that every sufficiently large quasi-random $k$-graph with constant edge density and minimum degree $Ω(n^{k-1})$ contains an $F$-factor, and in particular, they showed that all linear $k$-graphs satisfy this property. In this paper we prove a general theorem on $F$-factors which reduces the $F$-factor problem of Lenz and Mubayi to a natural sub-problem, that is, the $F$-cover problem. By using this result, we answer the question of Lenz and Mubayi for those $F$ which are $k$-partite $k$-graphs, and for all 3-graphs $F$, separately. Our characterization result on 3-graphs is motivated by the recent work of Reiher, Rödl and Schacht [J. Lond. Math. Soc., 2018] that classifies the 3-graphs with vanishing Turán density in quasi-random $k$-graphs.

preprint2022arXiv

$H$-factors in graphs with small independence number

Let $H$ be an $h$-vertex graph. The vertex arboricity $ar(H)$ of $H$ is the least integer $r$ such that $V(H)$ can be partitioned into $r$ parts and each part induces a forest in $H$. We show that for sufficiently large $n\in h\mathbb{N}$, every $n$-vertex graph $G$ with $δ(G)\geq \max\left\{\left(1-\frac{2}{f(H)}+o(1)\right)n, \left(\frac{1}{2}+o(1)\right)n\right\}$ and $α(G)=o(n)$ contains an $H$-factor, where $f(H)=2ar(H)$ or $2ar(H)-1$. The result can be viewed an analogue of the Alon--Yuster theorem \cite{MR1376050} in Ramsey--Turán theory, which generalises the results of Balogh--Molla--Sharifzadeh~\cite{MR3570984} and Knierm--Su~\cite{MR4193066} on clique factors. In particular the degree conditions are asymptotically sharp for infinitely many graphs $H$ which are not cliques.

preprint2022arXiv

Aggregating Global Features into Local Vision Transformer

Local Transformer-based classification models have recently achieved promising results with relatively low computational costs. However, the effect of aggregating spatial global information of local Transformer-based architecture is not clear. This work investigates the outcome of applying a global attention-based module named multi-resolution overlapped attention (MOA) in the local window-based transformer after each stage. The proposed MOA employs slightly larger and overlapped patches in the key to enable neighborhood pixel information transmission, which leads to significant performance gain. In addition, we thoroughly investigate the effect of the dimension of essential architecture components through extensive experiments and discover an optimum architecture design. Extensive experimental results CIFAR-10, CIFAR-100, and ImageNet-1K datasets demonstrate that the proposed approach outperforms previous vision Transformers with a comparatively fewer number of parameters.

preprint2022arXiv

Clique immersion in graphs without fixed bipartite graph

A graph $G$ contains $H$ as an \emph{immersion} if there is an injective mapping $ϕ: V(H)\rightarrow V(G)$ such that for each edge $uv\in E(H)$, there is a path $P_{uv}$ in $G$ joining vertices $ϕ(u)$ and $ϕ(v)$, and all the paths $P_{uv}$, $uv\in E(H)$, are pairwise edge-disjoint. An analogue of Hadwiger's conjecture for the clique immersions by Lescure and Meyniel, and independently by Abu-Khzam and Langston, states that every graph $G$ contains $K_{χ(G)}$ as an immersion. We prove that for any constant $\varepsilon>0$ and integers $s,t\ge2$, there exists $d_0=d_0(\varepsilon,s,t)$ such that every $K_{s,t}$-free graph $G$ with $d(G)\ge d_0$ contains a clique immersion of order $(1-\varepsilon)d(G)$. This implies that the above-mentioned conjecture is asymptotically true for graphs without a fixed complete bipartite graph.

preprint2022arXiv

In Defense of Subspace Tracker: Orthogonal Embedding for Visual Tracking

The paper focuses on a classical tracking model, subspace learning, grounded on the fact that the targets in successive frames are considered to reside in a low-dimensional subspace or manifold due to the similarity in their appearances. In recent years, a number of subspace trackers have been proposed and obtained impressive results. Inspired by the most recent results that the tracking performance is boosted by the subspace with discrimination capability learned over the recently localized targets and their immediately surrounding background, this work aims at solving such a problem: how to learn a robust low-dimensional subspace to accurately and discriminatively represent these target and background samples. To this end, a discriminative approach, which reliably separates the target from its surrounding background, is injected into the subspace learning by means of joint learning, achieving a dimension-adaptive subspace with superior discrimination capability. The proposed approach is extensively evaluated and compared with the state-of-the-art trackers on four popular tracking benchmarks. The experimental results demonstrate that the proposed tracker performs competitively against its counterparts. In particular, it achieves more than 9% performance increase compared with the state-of-the-art subspace trackers.

preprint2022arXiv

Integer colorings with no rainbow $k$-term arithmetic progression

In this paper, we study the rainbow Erdős-Rothschild problem with respect to $k$-term arithmetic progressions. For a set of positive integers $S \subseteq [n]$, an $r$-coloring of $S$ is \emph{rainbow $k$-AP-free} if it contains no rainbow $k$-term arithmetic progression. Let $g_{r,k}(S)$ denote the number of rainbow $k$-AP-free $r$-colorings of $S$. For sufficiently large $n$ and fixed integers $r\ge k\ge 3$, we show that $g_{r,k}(S)<g_{r,k}([n])$ for any proper subset $S\subset [n]$. Further, we prove that $\lim_{n\to \infty}g_{r,k}([n])/(k-1)^n= \binom{r}{k-1}$. Our result is asymptotically best possible and implies that, almost all rainbow $k$-AP-free $r$-colorings of $[n]$ use only $k-1$ colors.

preprint2022arXiv

Matching of given sizes in hypergraphs

For all integers $k,d$ such that $k \geq 3$ and $k/2\leq d \leq k-1$, let $n$ be a sufficiently large integer {\rm(}which may not be divisible by $k${\rm)} and let $s\le \lfloor n/k\rfloor-1$. We show that if $H$ is a $k$-uniform hypergraph on $n$ vertices with $δ_{d}(H)>\binom{n-d}{k-d}-\binom{n-d-s+1}{k-d}$, then $H$ contains a matching of size $s$. This improves a recent result of Lu, Yu, and Yuan and also answers a question of Kühn, Osthus, and Townsend. In many cases, our result can be strengthened to $s\leq \lfloor n/k\rfloor$, which then covers the entire possible range of $s$. On the other hand, there are examples showing that the result does not hold for certain $n, k, d$ and $s= \lfloor n/k\rfloor$.

preprint2022arXiv

Model Predictive Control of Nonlinear Latent Force Models: A Scenario-Based Approach

Control of nonlinear uncertain systems is a common challenge in the robotics field. Nonlinear latent force models, which incorporate latent uncertainty characterized as Gaussian processes, carry the promise of representing such systems effectively, and we focus on the control design for them in this work. To enable the design, we adopt the state-space representation of a Gaussian process to recast the nonlinear latent force model and thus build the ability to predict the future state and uncertainty concurrently. Using this feature, a stochastic model predictive control problem is formulated. To derive a computational algorithm for the problem, we use the scenario-based approach to formulate a deterministic approximation of the stochastic optimization. We evaluate the resultant scenario-based model predictive control approach through a simulation study based on motion planning of an autonomous vehicle, which shows much effectiveness. The proposed approach can find prospective use in various other robotics applications.

preprint2022arXiv

Momentum Accelerates the Convergence of Stochastic AUPRC Maximization

In this paper, we study stochastic optimization of areas under precision-recall curves (AUPRC), which is widely used for combating imbalanced classification tasks. Although a few methods have been proposed for maximizing AUPRC, stochastic optimization of AUPRC with convergence guarantee remains an undeveloped territory. A state-of-the-art complexity is $O(1/ε^5)$ for finding an $ε$-stationary solution. In this paper, we further improve the stochastic optimization of AURPC by (i) developing novel stochastic momentum methods with a better iteration complexity of $O(1/ε^4)$ for finding an $ε$-stationary solution; and (ii) designing a novel family of stochastic adaptive methods with the same iteration complexity, which enjoy faster convergence in practice. To this end, we propose two innovative techniques that are critical for improving the convergence: (i) the biased estimators for tracking individual ranking scores are updated in a randomized coordinate-wise manner; and (ii) a momentum update is used on top of the stochastic gradient estimator for tracking the gradient of the objective. The novel analysis of Adam-style updates is also one main contribution. Extensive experiments on various data sets demonstrate the effectiveness of the proposed algorithms. Of independent interest, the proposed stochastic momentum and adaptive algorithms are also applicable to a class of two-level stochastic dependent compositional optimization problems.

preprint2022arXiv

Multiple Classifiers Based Maximum Classifier Discrepancy for Unsupervised Domain Adaptation

Adversarial training based on the maximum classifier discrepancy between two classifier structures has achieved great success in unsupervised domain adaptation tasks for image classification. The approach adopts the structure of two classifiers, though simple and intuitive, the learned classification boundary may not well represent the data property in the new domain. In this paper, we propose to extend the structure to multiple classifiers to further boost its performance. To this end, we develop a very straightforward approach to adding more classifiers. We employ the principle that the classifiers are different from each other to construct a discrepancy loss function for multiple classifiers. The proposed construction method of loss function makes it possible to add any number of classifiers to the original framework. The proposed approach is validated through extensive experimental evaluations. We demonstrate that, on average, adopting the structure of three classifiers normally yields the best performance as a trade-off between accuracy and efficiency. With minimum extra computational costs, the proposed approach can significantly improve the performance of the original algorithm. The source code of the proposed approach can be downloaded from \url{https://github.com/rucv/MMCD\_DA}.

preprint2022arXiv

Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks with Implicit Gradients

Deep neural networks for 3D point cloud classification, such as PointNet, have been demonstrated to be vulnerable to adversarial attacks. Current adversarial defenders often learn to denoise the (attacked) point clouds by reconstruction, and then feed them to the classifiers as input. In contrast to the literature, we propose a family of robust structured declarative classifiers for point cloud classification, where the internal constrained optimization mechanism can effectively defend adversarial attacks through implicit gradients. Such classifiers can be formulated using a bilevel optimization framework. We further propose an effective and efficient instantiation of our approach, namely, Lattice Point Classifier (LPC), based on structured sparse coding in the permutohedral lattice and 2D convolutional neural networks (CNNs) that is end-to-end trainable. We demonstrate state-of-the-art robust point cloud classification performance on ModelNet40 and ScanNet under seven different attackers. For instance, we achieve 89.51% and 83.16% test accuracy on each dataset under the recent JGBA attacker that outperforms DUP-Net and IF-Defense with PointNet by ~70%. Demo code is available at https://zhang-vislab.github.io.

preprint2022arXiv

Shape of the asymptotic maximum sum-free sets in integer lattice grids

We determine the shape of all sum-free sets in $\{1,2,\ldots,n\}^2$ of size close to the maximum $\frac{3}{5}n^2$, solving a problem of Elsholtz and Rackham. We show that all such asymptotic maximum sum-free sets lie completely in the stripe $\frac{4}{5}n-o(n)\le x+y\le\frac{8}{5}n+ o(n)$. We also determine for any positive integer $p$ the maximum size of a subset $A\subseteq \{1,2,\ldots,n\}^2$ which forbids the triple $(x,y,z)$ satisfying $px+py=z$.

preprint2021arXiv

An Unsupervised Domain Adaptation Model based on Dual-module Adversarial Training

In this paper, we propose a dual-module network architecture that employs a domain discriminative feature module to encourage the domain invariant feature module to learn more domain invariant features. The proposed architecture can be applied to any model that utilizes domain invariant features for unsupervised domain adaptation to improve its ability to extract domain invariant features. We conduct experiments with the Domain-Adversarial Training of Neural Networks (DANN) model as a representative algorithm. In the training process, we supply the same input to the two modules and then extract their feature distribution and prediction results respectively. We propose a discrepancy loss to find the discrepancy of the prediction results and the feature distribution between the two modules. Through the adversarial training by maximizing the loss of their feature distribution and minimizing the discrepancy of their prediction results, the two modules are encouraged to learn more domain discriminative and domain invariant features respectively. Extensive comparative evaluations are conducted and the proposed approach outperforms the state-of-the-art in most unsupervised domain adaptation tasks.

preprint2021arXiv

Classification of Long Noncoding RNA Elements Using Deep Convolutional Neural Networks and Siamese Networks

In the last decade, the discovery of noncoding RNA(ncRNA) has exploded. Classifying these ncRNA is critical todetermining their function. This thesis proposes a new methodemploying deep convolutional neural networks (CNNs) to classifyncRNA sequences. To this end, this paper first proposes anefficient approach to convert the RNA sequences into imagescharacterizing their base-pairing probability. As a result, clas-sifying RNA sequences is converted to an image classificationproblem that can be efficiently solved by available CNN-basedclassification models. This research also considers the foldingpotential of the ncRNAs in addition to their primary sequence.Based on the proposed approach, a benchmark image classifi-cation dataset is generated from the RFAM database of ncRNAsequences. In addition, three classical CNN models and threeSiamese network models have been implemented and comparedto demonstrate the superior performance and efficiency of theproposed approach. Extensive experimental results show thegreat potential of using deep learning approaches for RNAclassification.

preprint2021arXiv

Rainbow Pancyclicity in Graph Systems

Let $G_1,...,G_n$ be graphs on the same vertex set of size $n$, each graph with minimum degree $δ(G_i)\ge n/2$. A recent conjecture of Aharoni asserts that there exists a rainbow Hamiltonian cycle i.e. a cycle with edge set $\{e_1,...,e_n\}$ such that $e_i\in E(G_i)$ for $1\leq i \leq n$. This can be viewed as a rainbow version of the well-known Dirac theorem. In this paper, we prove this conjecture asymptotically by showing that for every $\varepsilon>0$, there exists an integer $N>0$, such that when $n>N$ for any graphs $G_1,...,G_n$ on the same vertex set of size $n$ with $δ(G_i)\ge (\frac{1}{2}+\varepsilon)n$, there exists a rainbow Hamiltonian cycle. Our main tool is the absorption technique. Additionally, we prove that with $δ(G_i)\geq \frac{n+1}{2}$ for each $i$, one can find rainbow cycles of length $3,...,n-1$.

preprint2021arXiv

Six-channel Image Representation for Cross-domain Object Detection

Most deep learning models are data-driven and the excellent performance is highly dependent on the abundant and diverse datasets. However, it is very hard to obtain and label the datasets of some specific scenes or applications. If we train the detector using the data from one domain, it cannot perform well on the data from another domain due to domain shift, which is one of the big challenges of most object detection models. To address this issue, some image-to-image translation techniques have been employed to generate some fake data of some specific scenes to train the models. With the advent of Generative Adversarial Networks (GANs), we could realize unsupervised image-to-image translation in both directions from a source to a target domain and from the target to the source domain. In this study, we report a new approach to making use of the generated images. We propose to concatenate the original 3-channel images and their corresponding GAN-generated fake images to form 6-channel representations of the dataset, hoping to address the domain shift problem while exploiting the success of available detection models. The idea of augmented data representation may inspire further study on object detection and other applications.

preprint2021arXiv

SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular images

Depth estimation and semantic segmentation play essential roles in scene understanding. The state-of-the-art methods employ multi-task learning to simultaneously learn models for these two tasks at the pixel-wise level. They usually focus on sharing the common features or stitching feature maps from the corresponding branches. However, these methods lack in-depth consideration on the correlation of the geometric cues and the scene parsing. In this paper, we first introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks through an analysis of the imaging process, then propose a Semantic Object Segmentation and Depth Estimation Network (SOSD-Net) based on the objectness assumption. To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation. In addition, considering the mutual implicit relationship between these two tasks, we exploit the iterative idea from the expectation-maximization algorithm to train the proposed network more effectively. Extensive experimental results on the Cityscapes and NYU v2 dataset are presented to demonstrate the superior performance of the proposed approach.

preprint2020arXiv

A Comparative Study on Polyp Classification using Convolutional Neural Networks

Colorectal cancer is the third most common cancer diagnosed in both men and women in the United States. Most colorectal cancers start as a growth on the inner lining of the colon or rectum, called &#39;polyp&#39;. Not all polyps are cancerous, but some can develop into cancer. Early detection and recognition of the type of polyps is critical to prevent cancer and change outcomes. However, visual classification of polyps is challenging due to varying illumination conditions of endoscopy, variant texture, appearance, and overlapping morphology between polyps. More importantly, evaluation of polyp patterns by gastroenterologists is subjective leading to a poor agreement among observers. Deep convolutional neural networks have proven very successful in object classification across various object categories. In this work, we compare the performance of the state-of-the-art general object classification models for polyp classification. We trained a total of six CNN models end-to-end using a dataset of 157 video sequences composed of two types of polyps: hyperplastic and adenomatous. Our results demonstrate that the state-of-the-art CNN models can successfully classify polyps with an accuracy comparable or better than reported among gastroenterologists. The results of this study can guide future research in polyp classification.

preprint2020arXiv

Bandit Convex Optimization in Non-stationary Environments

Bandit Convex Optimization (BCO) is a fundamental framework for modeling sequential decision-making with partial information, where the only feedback available to the player is the one-point or two-point function values. In this paper, we investigate BCO in non-stationary environments and choose the \emph{dynamic regret} as the performance measure, which is defined as the difference between the cumulative loss incurred by the algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path-length of the comparator sequence that reflects the non-stationarity of environments. We propose a novel algorithm that achieves $O(T^{3/4}(1+P_T)^{1/2})$ and $O(T^{1/2}(1+P_T)^{1/2})$ dynamic regret respectively for the one-point and two-point feedback models. The latter result is optimal, matching the $Ω(T^{1/2}(1+P_T)^{1/2})$ lower bound established in this paper. Notably, our algorithm is more adaptive to non-stationary environments since it does not require prior knowledge of the path-length $P_T$ ahead of time, which is generally unknown.

preprint2020arXiv

Boosting Occluded Image Classification via Subspace Decomposition Based Estimation of Deep Features

Classification of partially occluded images is a highly challenging computer vision problem even for the cutting edge deep learning technologies. To achieve a robust image classification for occluded images, this paper proposes a novel scheme using subspace decomposition based estimation (SDBE). The proposed SDBE-based classification scheme first employs a base convolutional neural network to extract the deep feature vector (DFV) and then utilizes the SDBE to compute the DFV of the original occlusion-free image for classification. The SDBE is performed by projecting the DFV of the occluded image onto the linear span of a class dictionary (CD) along the linear span of an occlusion error dictionary (OED). The CD and OED are constructed respectively by concatenating the DFVs of a training set and the occlusion error vectors of an extra set of image pairs. Two implementations of the SDBE are studied in this paper: the $l_1$-norm and the squared $l_2$-norm regularized least-squares estimates. By employing the ResNet-152, pre-trained on the ILSVRC2012 training set, as the base network, the proposed SBDE-based classification scheme is extensively evaluated on the Caltech-101 and ILSVRC2012 datasets. Extensive experimental results demonstrate that the proposed SDBE-based scheme dramatically boosts the classification accuracy for occluded images, and achieves around $22.25\%$ increase in classification accuracy under $20\%$ occlusion on the ILSVRC2012 dataset.

preprint2020arXiv

Classification of Noncoding RNA Elements Using Deep Convolutional Neural Networks

The paper proposes to employ deep convolutional neural networks (CNNs) to classify noncoding RNA (ncRNA) sequences. To this end, we first propose an efficient approach to convert the RNA sequences into images characterizing their base-pairing probability. As a result, classifying RNA sequences is converted to an image classification problem that can be efficiently solved by available CNN-based classification models. The paper also considers the folding potential of the ncRNAs in addition to their primary sequence. Based on the proposed approach, a benchmark image classification dataset is generated from the RFAM database of ncRNA sequences. In addition, three classical CNN models have been implemented and compared to demonstrate the superior performance and efficiency of the proposed approach. Extensive experimental results show the great potential of using deep learning approaches for RNA classification.

preprint2020arXiv

Location-Aware Box Reasoning for Anchor-Based Single-Shot Object Detection

In the majority of object detection frameworks, the confidence of instance classification is used as the quality criterion of predicted bounding boxes, like the confidence-based ranking in non-maximum suppression (NMS). However, the quality of bounding boxes, indicating the spatial relations, is not only correlated with the classification scores. Compared with the region proposal network (RPN) based detectors, single-shot object detectors suffer the box quality as there is a lack of pre-selection of box proposals. In this paper, we aim at single-shot object detectors and propose a location-aware anchor-based reasoning (LAAR) for the bounding boxes. LAAR takes both the location and classification confidences into consideration for the quality evaluation of bounding boxes. We introduce a novel network block to learn the relative location between the anchors and the ground truths, denoted as a localization score, which acts as a location reference during the inference stage. The proposed localization score leads to an independent regression branch and calibrates the bounding box quality by scoring the predicted localization score so that the best-qualified bounding boxes can be picked up in NMS. Experiments on MS COCO and PASCAL VOC benchmarks demonstrate that the proposed location-aware framework enhances the performances of current anchor-based single-shot object detection frameworks and yields consistent and robust detection results.

preprint2020arXiv

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

In this paper, we study the problem of stochastic linear bandits with finite action sets. Most of existing work assume the payoffs are bounded or sub-Gaussian, which may be violated in some scenarios such as financial markets. To settle this issue, we analyze the linear bandits with heavy-tailed payoffs, where the payoffs admit finite $1+ε$ moments for some $ε\in(0,1]$. Through median of means and dynamic truncation, we propose two novel algorithms which enjoy a sublinear regret bound of $\widetilde{O}(d^{\frac{1}{2}}T^{\frac{1}{1+ε}})$, where $d$ is the dimension of contextual information and $T$ is the time horizon. Meanwhile, we provide an $Ω(d^{\fracε{1+ε}}T^{\frac{1}{1+ε}})$ lower bound, which implies our upper bound matches the lower bound up to polylogarithmic factors in the order of $d$ and $T$ when $ε=1$. Finally, we conduct numerical experiments to demonstrate the effectiveness of our algorithms and the empirical results strongly support our theoretical guarantees.

preprint2020arXiv

Plug-and-Play Rescaling Based Crowd Counting in Static Images

Crowd counting is a challenging problem especially in the presence of huge crowd diversity across images and complex cluttered crowd-like background regions, where most previous approaches do not generalize well and consequently produce either huge crowd underestimation or overestimation. To address these challenges, we propose a new image patch rescaling module (PRM) and three independent PRM employed crowd counting methods. The proposed frameworks use the PRM module to rescale the image regions (patches) that require special treatment, whereas the classification process helps in recognizing and discarding any cluttered crowd-like background regions which may result in overestimation. Experiments on three standard benchmarks and cross-dataset evaluation show that our approach outperforms the state-of-the-art models in the RMSE evaluation metric with an improvement up to 10.4%, and possesses superior generalization ability to new datasets.

preprint2020arXiv

Self-Orthogonality Module: A Network Architecture Plug-in for Learning Orthogonal Filters

In this paper, we investigate the empirical impact of orthogonality regularization (OR) in deep learning, either solo or collaboratively. Recent works on OR showed some promising results on the accuracy. In our ablation study, however, we do not observe such significant improvement from existing OR techniques compared with the conventional training based on weight decay, dropout, and batch normalization. To identify the real gain from OR, inspired by the locality sensitive hashing (LSH) in angle estimation, we propose to introduce an implicit self-regularization into OR to push the mean and variance of filter angles in a network towards 90 and 0 simultaneously to achieve (near) orthogonality among the filters, without using any other explicit regularization. Our regularization can be implemented as an architectural plug-in and integrated with an arbitrary network. We reveal that OR helps stabilize the training process and leads to faster convergence and better generalization.

preprint2020arXiv

ZoomCount: A Zooming Mechanism for Crowd Counting in Static Images

This paper proposes a novel approach for crowd counting in low to high density scenarios in static images. Current approaches cannot handle huge crowd diversity well and thus perform poorly in extreme cases, where the crowd density in different regions of an image is either too low or too high, leading to crowd underestimation or overestimation. The proposed solution is based on the observation that detecting and handling such extreme cases in a specialized way leads to better crowd estimation. Additionally, existing methods find it hard to differentiate between the actual crowd and the cluttered background regions, resulting in further count overestimation. To address these issues, we propose a simple yet effective modular approach, where an input image is first subdivided into fixed-size patches and then fed to a four-way classification module labeling each image patch as low, medium, high-dense or no-crowd. This module also provides a count for each label, which is then analyzed via a specifically devised novel decision module to decide whether the image belongs to any of the two extreme cases (very low or very high density) or a normal case. Images, specified as high- or low-density extreme or a normal case, pass through dedicated zooming or normal patch-making blocks respectively before routing to the regressor in the form of fixed-size patches for crowd estimate. Extensive experimental evaluations demonstrate that the proposed approach outperforms the state-of-the-art methods on four benchmarks under most of the evaluation criteria.