Researcher profile

Hajime Nagahara

Hajime Nagahara contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2026arXiv

Point-Supervised Facial Expression Spotting with Gaussian-Based Instance-Adaptive Intensity Modeling

Automatic facial expression spotting, which aims to identify facial expression instances in untrimmed videos, is crucial for facial expression analysis. Existing methods primarily focus on fully-supervised learning and rely on costly, time-consuming temporal boundary annotations. In this paper, we investigate point-supervised facial expression spotting (P-FES), where only a single timestamp annotation per instance is required for training. We propose a unique two-branch framework for P-FES. First, to mitigate the limitation of hard pseudo-labeling, which often confuses neutral and expression frames with various intensities, we propose a Gaussian-based instance-adaptive intensity modeling (GIM) module to model instance-level expression intensity distribution for soft pseudo-labeling. By detecting the pseudo-apex frame around each point label, estimating the duration, and constructing an instance-level Gaussian distribution, GIM assigns soft pseudo-labels to expression frames for more reliable intensity supervision. The GIM module is incorporated into our framework to optimize the class-agnostic expression intensity branch. Second, we design a class-aware apex classification branch that distinguishes macro- and micro-expressions solely based on their pseudo-apex frames. During inference, the two branches work independently: the class-agnostic expression intensity branch generates expression proposals, while the class-aware apex-classification branch is responsible for macro- and micro-expression classification. Furthermore, we introduce an intensity-aware contrastive loss to enhance discriminative feature learning and suppress neutral noise by contrasting neutral frames with expression frames with various intensities. Extensive experiments on the SAMM-LV, CAS(ME)$^2$, and CAS(ME)$^3$ datasets demonstrate the effectiveness of our proposed framework.

preprint2022arXiv

Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks

Is more data always better to train vision-and-language models? We study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks their overall performance will improve. However, we show that not all the knowledge transfers well or has a positive impact on related tasks, even when they share a common goal. We conduct an exhaustive analysis based on hundreds of cross-experiments on 12 vision-and-language tasks categorized in 4 groups. Whereas tasks in the same group are prone to improve each other, results show that this is not always the case. Other factors such as dataset size or pre-training stage have also a great impact on how well the knowledge is transferred.

preprint2022arXiv

Privacy-Preserving Action Recognition via Motion Difference Quantization

The widespread use of smart computer vision systems in our personal spaces has led to an increased consciousness about the privacy and security risks that these systems pose. On the one hand, we want these systems to assist in our daily lives by understanding their surroundings, but on the other hand, we want them to do so without capturing any sensitive information. Towards this direction, this paper proposes a simple, yet robust privacy-preserving encoder called BDQ for the task of privacy-preserving human action recognition that is composed of three modules: Blur, Difference, and Quantization. First, the input scene is passed to the Blur module to smoothen the edges. This is followed by the Difference module to apply a pixel-wise intensity subtraction between consecutive frames to highlight motion features and suppress obvious high-level privacy attributes. Finally, the Quantization module is applied to the motion difference frames to remove the low-level privacy attributes. The BDQ parameters are optimized in an end-to-end fashion via adversarial training such that it learns to allow action recognition attributes while inhibiting privacy attributes. Our experiments on three benchmark datasets show that the proposed encoder design can achieve state-of-the-art trade-off when compared with previous works. Furthermore, we show that the trade-off achieved is at par with the DVS sensor-based event cameras. Code available at: https://github.com/suakaw/BDQ_PrivacyAR.

preprint2020arXiv

Joint Learning of Vessel Segmentation and Artery/Vein Classification with Post-processing

Retinal imaging serves as a valuable tool for diagnosis of various diseases. However, reading retinal images is a difficult and time-consuming task even for experienced specialists. The fundamental step towards automated retinal image analysis is vessel segmentation and artery/vein classification, which provide various information on potential disorders. To improve the performance of the existing automated methods for retinal image analysis, we propose a two-step vessel classification. We adopt a UNet-based model, SeqNet, to accurately segment vessels from the background and make prediction on the vessel type. Our model does segmentation and classification sequentially, which alleviates the problem of label distribution bias and facilitates training. To further refine classification results, we post-process them considering the structural information among vessels to propagate highly confident prediction to surrounding vessels. Our experiments show that our method improves AUC to 0.98 for segmentation and the accuracy to 0.92 in classification over DRIVE dataset.