Source author record

Jun Hou

Jun Hou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Machine Learning, astro-ph.GA, eess.IV, astro-ph.CO, eess.SY, Systems and Control

Catalog footprint

What is connected

9 works

7 topics

4 close collaborators

Preprint, 2022, arXiv

An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection

Image-based 3D detection is an indispensable component of the perception system for autonomous driving. However, it still suffers from unsatisfying performance, one of the main reasons for which is the limited training data. Unfortunately, annotating objects in 3D space is extremely time- and resource-consuming, which makes it hard to extend the training set arbitrarily. In this work, we focus on the semi-supervised manner and explore the feasibility of a cheaper alternative, i.e. pseudo-labeling, to leverage the unlabeled data. For this purpose, we conduct extensive experiments to investigate whether the pseudo-labels can provide effective supervision for the baseline models under varying settings. The experimental results not only demonstrate the effectiveness of the pseudo-labeling mechanism for image-based 3D detection (e.g. under monocular setting, we achieve 20.23 AP for moderate level on the KITTI-3D testing set without bells and whistles, improving the baseline model by 6.03 AP), but also show several interesting and noteworthy findings (e.g. the models trained with pseudo-labels perform better than those trained with ground-truth annotations based on the same training data). We hope this work can provide insights for the image-based 3D detection community under a semi-supervised setting. The codes, pseudo-labels, and pre-trained models will be publicly available.

Preprint, 2022, arXiv

Probing Visual-Audio Representation for Video Highlight Detection via Hard-Pairs Guided Contrastive Learning

Video highlight detection is a crucial yet challenging problem that aims to identify the interesting moments in untrimmed videos. The key to this task lies in effective video representations that jointly pursue two goals, i.e., cross-modal representation learning and fine-grained feature discrimination. In this paper, these two challenges are tackled by not only enriching intra-modality and cross-modality relations for representation modeling but also shaping the features in a discriminative manner. Our proposed method mainly leverages the intra-modality encoding and cross-modality co-occurrence encoding for full representation modeling. Specifically, intra-modality encoding augments the modality-wise features and dampens irrelevant modality via within-modality relation learning in both audio and visual signals. Meanwhile, cross-modality co-occurrence encoding focuses on the co-occurrence inter-modality relations and selectively captures effective information among multi-modality. The multi-modal representation is further enhanced by the global information abstracted from the local context. In addition, we enlarge the discriminative power of feature embedding with a hard-pairs guided contrastive learning (HPCL) scheme. A hard-pairs sampling strategy is further employed to mine the hard samples for improving feature discrimination in HPCL. Extensive experiments conducted on two benchmarks demonstrate the effectiveness and superiority of our proposed methods compared to other state-of-the-art methods.

Preprint, 2022, arXiv

Pyramid Region-based Slot Attention Network for Temporal Action Proposal Generation

It has been found that temporal action proposal generation, which aims to discover the temporal action instances within the range of the start and end frames in the untrimmed videos, can largely benefit from proper temporal and semantic context exploitation. The latest efforts were dedicated to considering the temporal context and similarity-based semantic contexts through self-attention modules. However, they still suffer from cluttered background information and limited contextual feature learning. In this paper, we propose a novel Pyramid Region-based Slot Attention (PRSlot) module to address these issues. Instead of using the similarity computation, our PRSlot module directly learns the local relations in an encoder-decoder manner and generates the representation of a local region enhanced based on the attention over input features called slot. Specifically, upon the input snippet-level features, PRSlot module takes the target snippet as query, its surrounding region as key and then generates slot representations for each query-key slot by aggregating the local snippet context with a parallel pyramid strategy. Based on PRSlot modules, we present a novel Pyramid Region-based Slot Attention Network termed PRSA-Net to learn a unified visual representation with rich temporal and semantic context for better proposal generation. Extensive experiments are conducted on two widely adopted THUMOS14 and ActivityNet-1.3 benchmarks. Our PRSA-Net outperforms other state-of-the-art methods. In particular, we improve the AR@100 from the previous best 50.67% to 56.12% for proposal generation and raise the mAP under 0.5 tIoU from 51.9% to 58.7% for action detection on THUMOS14. Code is available at https://github.com/handhand123/PRSA-Net

Preprint, 2022, arXiv

StyleFlow For Content-Fixed Image to Image Translation

Image-to-image (I2I) translation is a challenging topic in computer vision. We divide this problem into three tasks: strongly constrained translation, normally constrained translation, and weakly constrained translation. The constraint here indicates the extent to which the content or semantic information in the original image is preserved. Although previous approaches have achieved good performance in weakly constrained tasks, they failed to fully preserve the content in both strongly and normally constrained tasks, including photo-realism synthesis, style transfer, and colorization, etc. To achieve content-preserving transfer in strongly constrained and normally constrained tasks, we propose StyleFlow, a new I2I translation model that consists of normalizing flows and a novel Style-Aware Normalization (SAN) module. With the invertible network structure, StyleFlow first projects input images into deep feature space in the forward pass, while the backward pass utilizes the SAN module to perform content-fixed feature transformation and then projects back to image space. Our model supports both image-guided translation and multi-modal synthesis. We evaluate our model in several I2I translation benchmarks, and the results show that the proposed model has advantages over previous methods in both strongly constrained and normally constrained tasks.

Preprint, 2020, arXiv

A Non-Intrusive Correction Algorithm for Classification Problems with Corrupted Data

A novel correction algorithm is proposed for multi-class classification problems with corrupted training data. The algorithm is non-intrusive, in the sense that it post-processes a trained classification model by adding a correction procedure to the model prediction. The correction procedure can be coupled with any approximators, such as logistic regression, neural networks of various architectures, etc. When the training dataset is sufficiently large, we prove that the corrected models deliver correct classification results as if there were no corruption in the training data. For datasets of finite size, the corrected models produce significantly better recovery results, compared to the models without the correction algorithm. All of the theoretical findings in the paper are verified by our numerical examples.

Preprint, 2020, arXiv

GTC: Guided Training of CTC Towards Efficient and Accurate Scene Text Recognition

Connectionist Temporal Classification (CTC) and attention mechanism are two main approaches used in recent scene text recognition works. Compared with attention-based methods, CTC decoder has a much shorter inference time, yet a lower accuracy. To design an efficient and effective model, we propose the guided training of CTC (GTC), where CTC model learns a better alignment and feature representations from a more powerful attentional guidance. With the benefit of guided training, CTC model achieves robust and accurate prediction for both regular and irregular scene text while maintaining a fast inference speed. Moreover, to further leverage the potential of CTC decoder, a graph convolutional network (GCN) is proposed to learn the local correlations of extracted features. Extensive experiments on standard benchmarks demonstrate that our end-to-end model achieves a new state-of-the-art for regular and irregular scene text recognition and needs 6 times shorter inference time than attention-based methods.

Preprint, 2020, arXiv

Individual Cell Fault Detection for Parallel-Connected Battery Cells Based on the Statistical Model and Analysis

Fault diagnosis is extremely important to the safe operation of Lithium-ion batteries. To avoid severe safety issues (e.g., thermal runaway), initial faults should be timely detected and resolved. In this paper, we consider parallel-connected battery cells with only one voltage and one current sensor. The lack of independent current sensors makes it difficult to detect individual cell degradation. To this end, based on the high-frequency response of the battery, a simplified fault detection-oriented model is derived and validated by a physics-informed battery model. The resistance of the battery string, which is significantly influenced by the faulty cell, is estimated and used as the health indicator. The statistical resistance distribution of battery strings is first analyzed considering the distribution of fresh and aged cells. A fault diagnosis algorithm is proposed and the thresholds (i.e., 2 standard deviation interval) are obtained through statistical analysis. Monte Carlo simulation results show that the proposed fault diagnosis algorithm can balance false alarms and missed detections well. In addition, it is verified that the proposed algorithm is robust to the uniform parameter changes of individual battery cells.
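The thresholding step mentioned in this abstract (a 2 standard deviation interval around the expected string resistance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a sample of healthy string resistances to estimate the distribution, and the milliohm values are all assumptions made for illustration only.

```python
import statistics


def two_sigma_interval(healthy_resistances):
    """Return (lower, upper) bounds of the mean +/- 2 standard deviation interval.

    `healthy_resistances` is a hypothetical sample of string resistances
    estimated from strings known to contain no faulty cell.
    """
    mean = statistics.mean(healthy_resistances)
    std = statistics.stdev(healthy_resistances)  # sample standard deviation
    return mean - 2 * std, mean + 2 * std


def is_faulty(estimated_resistance, interval):
    """Flag a string whose estimated resistance falls outside the interval."""
    lower, upper = interval
    return not (lower <= estimated_resistance <= upper)


# Hypothetical string resistances in milliohms (illustrative values only).
healthy = [10.0, 10.2, 9.8, 10.1, 9.9]
interval = two_sigma_interval(healthy)

print(is_faulty(10.1, interval))  # inside the interval
print(is_faulty(11.0, interval))  # outside the interval
```

A wider interval reduces false alarms but raises the chance of missed detections, which is the trade-off the abstract refers to.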

Preprint, 2016, arXiv

Constraining SN feedback: a tug of war between reionization and the Milky Way satellites

Theoretical models of galaxy formation based on the cold dark matter cosmogony typically require strong feedback from supernova (SN) explosions in order to reproduce the Milky Way satellite galaxy luminosity function and the faint end of the field galaxy luminosity function. However, too strong a SN feedback also leads to the universe reionizing too late, and the metallicities of Milky Way satellites being too low. The combination of these four observations therefore places tight constraints on SN feedback. We investigate these constraints using the semi-analytical galaxy formation model galform. We find that these observations favour a SN feedback model in which the feedback strength evolves with redshift. We find that, for our best fit model, half of the ionizing photons are emitted by galaxies with rest-frame far-UV absolute magnitudes $M_{\rm AB}(1500{\rm Å})<-17.5$, which implies that already observed galaxy populations contribute about half of the photons responsible for reionization. The $z=0$ descendants of these galaxies are mainly galaxies with stellar mass $M_*>10^{10}\,{\rm M}_{\odot}$ and preferentially inhabit halos with mass $M_{\rm halo}>10^{13}\,{\rm M}_{\odot}$.

Preprint, 2014, arXiv

Probing baryonic processes and gastrophysics in the formation of the Milky Way dwarf satellites: I. metallicity distribution properties

In this paper, we study the chemical properties of the stars in the dwarf satellites around the MW-like host galaxies, and explore the possible effects of several baryonic processes, including supernova (SN) feedback, the reionization of the universe and H$_2$ cooling, on them and how current and future observations may put some constraints on these processes. We use a semi-analytical model to generate MW-like galaxies, for which a fiducial model can reproduce the luminosity function and the stellar metallicity--stellar mass correlation of the MW dwarfs. Using the simulated MW-like galaxies, we focus on investigating three metallicity properties of their dwarfs: the stellar metallicity--stellar mass correlation of the dwarf population, and the metal-poor and metal-rich tails of the stellar metallicity distribution in individual dwarfs. We find that (1) the slope of the stellar metallicity--stellar mass correlation is sensitive to the SN feedback strength and the reionization epoch; (2) the extension of the metal-rich tails is mainly sensitive to the SN feedback strength; (3) the extension of the metal-poor tails is mainly sensitive to the reionization epoch; (4) none of the three chemical properties are sensitive to the H$_2$ cooling process; and (5) comparison of our model results with the current observational slope of the stellar metallicity--stellar mass relation suggests that the local universe is reionized earlier than the cosmic average and local sources may have a significant contribution to the reionization in the local region, and an intermediate to strong SN feedback strength is preferred. Future observations of metal-rich and metal-poor tails of stellar metallicity distributions will put further constraints on the SN feedback and the reionization processes.
