Source author record

Bo He

Bo He appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · Computer Vision · Artificial Intelligence · cond-mat.mtrl-sci · math.NT · hep-ex · hep-ph · Neural and Evolutionary Computing · Robotics

Catalog footprint

What is connected

15 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding

This paper presents VideoLoom, a unified Video Large Language Model (Video LLM) for joint spatial-temporal understanding. To facilitate the development of fine-grained spatial and temporal localization capabilities, we curate LoomData-8.7k, a human-centric video dataset with temporally grounded and spatially localized captions. With this, VideoLoom achieves state-of-the-art or highly competitive performance across a variety of spatial and temporal benchmarks (e.g., 63.1 J&F on ReVOS for referring video object segmentation, and 48.3 R1@0.7 on Charades-STA for temporal grounding). In addition, we introduce LoomBench, a novel benchmark consisting of temporal, spatial, and compositional video-question pairs, enabling a comprehensive evaluation of Video LLMs from diverse aspects. Collectively, these contributions offer a universal and effective suite for joint spatial-temporal video understanding, setting a new standard in multimodal intelligence.
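The temporal grounding figure quoted above (R1@0.7) is conventionally read as the fraction of queries whose top-ranked predicted segment overlaps the ground-truth segment with a temporal IoU of at least 0.7. The following is a minimal sketch of that metric under this standard reading; the function names and the toy segments are illustrative assumptions and are not taken from the VideoLoom code or from Charades-STA.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec); hypothetical representation


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Segment], gts: List[Segment],
                threshold: float = 0.7) -> float:
    """Fraction of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


# Toy data (made up for illustration): one good prediction, one miss.
preds = [(2.0, 10.0), (5.0, 9.0)]
gts = [(3.0, 10.0), (0.0, 4.0)]
score = recall_at_1(preds, gts)  # one of the two queries clears IoU 0.7
```

Here the first pair overlaps for 7 s out of a union of 8 s (IoU 0.875), while the second pair does not overlap at all, so half of the queries count as hits.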

preprint · 2022 · arXiv

ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization

Weakly-supervised temporal action localization (WTAL) aims to recognize and localize action segments in untrimmed videos given only video-level action labels for training. Without the boundary information of action segments, existing methods mostly rely on multiple instance learning (MIL), where the predictions of unlabeled instances (i.e., video snippets) are supervised by classifying labeled bags (i.e., untrimmed videos). However, this formulation typically treats snippets in a video as independent instances, ignoring the underlying temporal structures within and across action segments. To address this problem, we propose ASM-Loc, a novel WTAL framework that enables explicit, action-aware segment modeling beyond standard MIL-based methods. Our framework entails three segment-centric components: (i) dynamic segment sampling for compensating the contribution of short actions; (ii) intra- and inter-segment attention for modeling action dynamics and capturing temporal dependencies; (iii) pseudo instance-level supervision for improving action boundary prediction. Furthermore, a multi-step refinement strategy is proposed to progressively improve action proposals along the model training process. Extensive experiments on THUMOS-14 and ActivityNet-v1.3 demonstrate the effectiveness of our approach, establishing a new state of the art on both datasets. The code and models are publicly available at https://github.com/boheumd/ASM-Loc.

preprint · 2022 · arXiv

ColdGuess: A General and Effective Relational Graph Convolutional Network to Tackle Cold Start Cases

Low-quality listings and bad-actor behavior on online retail websites threaten e-commerce businesses, as they result in a sub-optimal buying experience and erode customer trust. When a new listing is created, how can we tell whether it is of good quality? Is the method effective, fast, and scalable? Previous approaches often have three limitations: (1) they cannot handle cold start problems, where new sellers or listings lack sufficient selling histories; (2) they cannot score hundreds of millions of listings at scale, or they compromise performance for scalability; (3) they face space challenges from the large-scale graph of a giant e-commerce business. To overcome these limitations, we propose ColdGuess, an inductive graph-based risk predictor built upon a heterogeneous seller-product graph, which effectively identifies risky sellers, products, and listings at scale. ColdGuess tackles the large-scale graph with consolidated nodes and addresses the cold start problems using homogeneous influence. The evaluation on real data demonstrates that ColdGuess has stable performance as the number of unknown features increases. It outperforms LightGBM by up to 34 pcp ROC-AUC in a cold start case when a new seller sells a new product. The resulting system, ColdGuess, is effective, adaptable to changing risky seller behavior, and is already in production.

preprint · 2022 · arXiv

High flexoelectric constants in Janus transition-metal dichalcogenides

Due to their combination of mechanical stiffness and flexibility, two-dimensional (2D) materials have received significant interest as potential electromechanical materials. Flexoelectricity is an electromechanical coupling between strain gradient and polarization. Unlike piezoelectricity, which exists only in non-centrosymmetric materials, flexoelectricity theoretically exists in all dielectric materials. However, most work on the electromechanical energy conversion potential of 2D materials has focused on their piezoelectric rather than their flexoelectric behavior. In the present work, we demonstrate that the intrinsic structural asymmetry present in monolayer Janus transition metal dichalcogenides (TMDCs) enables significant flexoelectric properties. We report these flexoelectric properties using a recently developed charge-dipole model that couples with classical molecular dynamics simulations. By employing a prescribed bending deformation, we directly calculate the flexoelectric constants while eliminating the piezoelectric contribution to the polarization. We find that the flexoelectric response of a Janus TMDC is positively correlated with its initial degree of asymmetry, which contributes to stronger $σ-σ$ interactions as the initial degree of asymmetry rises. In addition, the high transfer of charge across atoms in Janus TMDCs leads to larger electric fields due to $π-σ$ coupling. These enhanced $σ-σ$ and $π-σ$ interactions are found to cause the flexoelectric coefficients of the Janus TMDCs to be several times higher than those of traditional TMDCs such as MoS$_{2}$, whose flexoelectric constant is already ten times larger than that of graphene.
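For readers unfamiliar with the distinction drawn in this abstract, the textbook constitutive relation below may help. It is a generic sketch and not the charge-dipole model used in the paper; the symbols ($P_i$ for polarization, $e_{ijk}$ for the piezoelectric tensor, $\mu_{ijkl}$ for the flexoelectric tensor, $\varepsilon_{jk}$ for strain) are assumed notation.

```latex
% Polarization induced by strain (piezoelectric term) and by the
% strain gradient (flexoelectric term); summation over repeated indices.
P_i \;=\; e_{ijk}\,\varepsilon_{jk} \;+\; \mu_{ijkl}\,\frac{\partial \varepsilon_{jk}}{\partial x_l}
```

The first term vanishes in centrosymmetric materials, whereas the second is allowed by symmetry in any dielectric, which is why a deformation that isolates the strain gradient lets the flexoelectric constants be read off separately from the piezoelectric contribution.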

preprint · 2022 · arXiv

Intrinsic bending flexoelectric constants in two-dimensional materials

Flexoelectricity is a form of electromechanical coupling that has recently emerged because, unlike piezoelectricity, it is theoretically possible in any dielectric material. Two-dimensional (2D) materials have also garnered significant interest because of their unusual electromechanical properties and high flexibility, but the intrinsic flexoelectric properties of these materials remain unresolved. In this work, using atomistic modeling accounting for charge-dipole interactions, we report the intrinsic flexoelectric constants for a range of two-dimensional materials, including graphene allotropes, nitrides, graphene analogs of group-IV elements, and the transition metal dichalcogenides (TMDCs). We accomplish this through a proposed mechanical bending scheme that eliminates the piezoelectric contribution to the total polarization, which enables us to directly measure the flexoelectric constants. While flat 2D materials like graphene have low flexoelectric constants due to weak $π-σ$ interactions, buckling is found to increase the flexoelectric constants in monolayer group-IV elements. Finally, due to significantly enhanced charge transfer coupled with structural asymmetry due to bending, the TMDCs are found to have the largest flexoelectric constants, including MoS$_{2}$ having a flexoelectric constant ten times larger than graphene.

preprint · 2022 · arXiv

Learning Semantic Correspondence with Sparse Annotations

Finding dense semantic correspondence is a fundamental problem in computer vision, which remains challenging in complex scenes due to background clutter, extreme intra-class variation, and a severe lack of ground truth. In this paper, we aim to address the challenge of label sparsity in semantic correspondence by enriching supervision signals from sparse keypoint annotations. To this end, we first propose a teacher-student learning paradigm for generating dense pseudo-labels and then develop two novel strategies for denoising pseudo-labels. In particular, we use spatial priors around the sparse annotations to suppress the noisy pseudo-labels. In addition, we introduce a loss-driven dynamic label selection strategy for label denoising. We instantiate our paradigm with two variants of learning strategies: a single offline teacher setting and a mutual online teachers setting. Our approach achieves notable improvements on three challenging benchmarks for semantic correspondence and establishes a new state of the art. Project page: https://shuaiyihuang.github.io/publications/SCorrSAN.

preprint · 2021 · arXiv

GTA: Global Temporal Attention for Video Action Understanding

Self-attention learns pairwise interactions to model long-range dependencies, yielding great improvements for video action recognition. In this paper, we seek a deeper understanding of self-attention for temporal modeling in videos. We first demonstrate that the entangled modeling of spatio-temporal information by flattening all pixels is sub-optimal, failing to capture temporal relationships among frames explicitly. To this end, we introduce Global Temporal Attention (GTA), which performs global temporal attention on top of spatial attention in a decoupled manner. We apply GTA on both pixels and semantically similar regions to capture temporal relationships at different levels of spatial granularity. Unlike conventional self-attention that computes an instance-specific attention matrix, GTA directly learns a global attention matrix that is intended to encode temporal structures that generalize across different samples. We further augment GTA with a cross-channel multi-head fashion to exploit channel interactions for better temporal modeling. Extensive experiments on 2D and 3D networks demonstrate that our approach consistently enhances temporal modeling and provides state-of-the-art performance on three video action recognition datasets.

preprint · 2020 · arXiv

Deep Interactive Reinforcement Learning for Path Following of Autonomous Underwater Vehicle

Autonomous underwater vehicles (AUVs) play an increasingly important role in ocean exploration. Existing AUVs are usually not fully autonomous and are generally limited to pre-planned or pre-programmed tasks. Reinforcement learning (RL) and deep reinforcement learning have been introduced into AUV design and research to improve autonomy. However, these methods are still difficult to apply directly to actual AUV systems because of sparse rewards and low learning efficiency. In this paper, we propose a deep interactive reinforcement learning method for AUV path following that combines the advantages of deep reinforcement learning and interactive RL. In addition, since a human trainer cannot provide rewards while the AUV is running in the ocean, and the AUV needs to adapt to a changing environment, we further propose a deep reinforcement learning method that learns from human rewards and environmental rewards at the same time. We test our methods on two path-following tasks, straight-line and sinusoidal-curve following, simulated on the Gazebo platform. Our experimental results show that with the proposed deep interactive RL method, the AUV converges faster than a DQN learner that uses only environmental rewards. Moreover, an AUV learning with our deep RL method from both human and environmental rewards achieves similar or even better performance than one using the deep interactive RL method, and can adapt to the actual environment by further learning from environmental rewards.

preprint · 2015 · arXiv

Another generalization of a theorem of Baker and Davenport

Dujella and Pethő, generalizing a result of Baker and Davenport, proved that the set $\{1, 3\}$ cannot be extended to a Diophantine quintuple. As a consequence of our main result, it is shown that the Diophantine pair $\{1, b\}$ cannot be extended to a Diophantine quintuple if $b-1$ is a prime.
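As background for this abstract: a Diophantine $m$-tuple is a set of $m$ distinct positive integers in which the product of any two distinct elements, plus one, is a perfect square. The sketch below checks that property by brute force and lists small one-element extensions of $\{1, 3\}$. It is only an illustration of the definition; the function names and the search limit are assumptions, and a finite search of this kind says nothing about the non-existence result proved in the paper.

```python
from itertools import combinations
from math import isqrt
from typing import Iterable, List


def is_square(n: int) -> bool:
    """True if n is a perfect square."""
    r = isqrt(n)
    return r * r == n


def is_diophantine_tuple(values: Iterable[int]) -> bool:
    """True if a*b + 1 is a perfect square for every pair of elements."""
    return all(is_square(a * b + 1) for a, b in combinations(values, 2))


def extensions(base: List[int], limit: int) -> List[int]:
    """All c in [1, limit], c not in base, such that base + [c] stays Diophantine."""
    return [c for c in range(1, limit + 1)
            if c not in base and is_diophantine_tuple(base + [c])]


print(is_diophantine_tuple([1, 3, 8, 120]))  # prints True
print(extensions([1, 3], 200))               # prints [8, 120]
```

The set $\{1, 3, 8, 120\}$ is the classical quadruple: the six pairwise products plus one are 4, 9, 121, 25, 361 and 961, all perfect squares.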

preprint · 2014 · arXiv

HSR: L1/2 Regularized Sparse Representation for Fast Face Recognition using Hierarchical Feature Selection

In this paper, we propose a novel method for fast face recognition called L1/2 Regularized Sparse Representation using Hierarchical Feature Selection (HSR). By employing hierarchical feature selection, we can compress the scale and dimension of the global dictionary, which directly reduces the computational cost of the sparse representation on which our approach is built. The hierarchical feature selection consists of Gabor wavelets followed by an Extreme Learning Machine Auto-Encoder (ELM-AE). In the Gabor wavelet stage, local features are extracted at multiple scales and orientations to form a Gabor-feature based image, which in turn improves the recognition rate. Moreover, when face images are occluded, the scale of the Gabor-feature based global dictionary can be compressed accordingly, because redundancies exist in the Gabor-feature based occlusion dictionary. In the ELM-AE stage, the dimension of the Gabor-feature based global dictionary can be compressed, because high-dimensional face images can be rapidly represented by low-dimensional features. By introducing L1/2 regularization, our approach produces a sparser and more robust representation than regularized Sparse Representation based Classification (SRC), which also reduces the computational cost of sparse representation. In comparison with related work such as SRC and Gabor-feature based SRC (GSRC), experimental results on a variety of face databases demonstrate the clear advantage of our method in computational cost. Moreover, we achieve a comparable or even better recognition rate.

preprint · 2014 · arXiv

LARSEN-ELM: Selective Ensemble of Extreme Learning Machines using LARS for Blended Data

The extreme learning machine (ELM), a neural network algorithm, offers advantages such as fast speed and a simple structure, but weak robustness on blended data is an unavoidable defect of the original ELM. We present a new machine learning framework called LARSEN-ELM to overcome this problem. LARSEN-ELM has two key steps. In the first step, preprocessing, we select the input variables highly related to the output using least angle regression (LARS). In the second step, training, we employ a Genetic Algorithm (GA) based selective ensemble together with the original ELM. In the experiments, we use a sum of two sines and four datasets from the UCI repository to verify the robustness of our approach. The experimental results show that, compared with the original ELM and other methods such as OP-ELM, GASEN-ELM and LSBoost, LARSEN-ELM significantly improves robustness while keeping a relatively high speed.

preprint · 2014 · arXiv

On Diophantine quintuple Conjecture

In this note, we prove that if $\{a,b,c,d,e\}$ with $a<b<c<d<e$ is a Diophantine quintuple, then $d<10^{76}$.

preprint · 2014 · arXiv

RMSE-ELM: Recursive Model based Selective Ensemble of Extreme Learning Machines for Robustness Improvement

The extreme learning machine (ELM), an emerging branch of shallow networks, has shown excellent generalization and fast learning speed. However, for blended data, the robustness of ELM is weak because the weights and biases of its hidden nodes are set randomly. Moreover, noisy data exert a negative effect. To solve this problem, a new framework called RMSE-ELM is proposed in this paper. It is a two-layer recursive model. In the first layer, the framework trains many ELMs in different groups concurrently, then employs selective ensemble to pick out an optimal set of ELMs in each group; these sets are merged into a large group of ELMs called the candidate pool. In the second layer, selective ensemble is applied recursively to the candidate pool to obtain the final ensemble. In the experiments, we use UCI blended datasets to confirm the robustness of our approach in two key aspects (mean square error and standard deviation). The space complexity of our method increases to some degree, but the results show that RMSE-ELM significantly improves robustness at a slight cost in computational time compared with representative methods (ELM, OP-ELM, GASEN-ELM, GASEN-BP and E-GASEN). It is a potential framework for addressing the robustness issue of ELM on high-dimensional blended data.

preprint · 2014 · arXiv

Robust OS-ELM with a novel selective ensemble based on particle swarm optimization

In this paper, a robust online sequential extreme learning machine (ROS-ELM) is proposed. It is based on the original OS-ELM with an adaptive selective ensemble framework. Two novel ideas are proposed. First, a novel selective ensemble algorithm referred to as particle swarm optimization selective ensemble (PSOSEN) is introduced. Note that PSOSEN is a general selective ensemble method applicable to any learning algorithm, including batch learning and online learning. Second, an adaptive selective ensemble framework for online learning is designed to balance the robustness and complexity of the algorithm. Experiments on both regression and classification problems with UCI data sets are carried out. Comparisons between OS-ELM, simple ensemble OS-ELM (EOS-ELM) and the proposed ROS-ELM empirically show that ROS-ELM significantly improves robustness and stability.

preprint · 2002 · arXiv

Less suppressed mu-e-gamma and tau-mu-gamma loop amplitudes and extra dimension theories

When the mu-e-gamma (or tau-mu-gamma) loop involves a vector boson, the amplitude is suppressed by more than two powers of heavy particle masses. However, we show that the scalar boson loop diagrams are much less damped. In particular, the loop amplitude in which the intermediate fermion and scalar boson have comparable masses is as large as possible, as allowed by the decoupling theorem. Such a situation is realized in the "universal extra dimension theory", and can yield a large enough rate to be detectable in current experiments. Our investigation involves a precise calculation of the scalar boson loop's dependence on the masses of the intermediate states.
