Source author record

Kanji Tanaka

Kanji Tanaka appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

14 works
3 topics
4 close collaborators

Published work

14 published items

preprint · 2026 · arXiv

MOON: Multi-Objective Optimization-Driven Object-Goal Navigation Using a Variable-Horizon Set-Orienteering Planner

This paper proposes MOON (Multi-Objective Optimization-driven Object-goal Navigation), a novel framework designed for efficient navigation in large-scale, complex indoor environments. While existing methods often rely on local heuristics, they frequently fail to address the strategic trade-offs between competing objectives in vast areas. To overcome this, we formulate the task as a multi-objective optimization problem (MOO) that balances frontier-based exploration with the exploitation of observed landmarks. Our prototype integrates three key pillars: (1) QOM [IROS05] for discriminative landmark encoding; (2) StructNav [RSS23] to enhance the navigation pipeline; and (3) a variable-horizon Set Orienteering Problem (SOP) formulation for globally coherent planning. To further support the framework's scalability, we provide a detailed theoretical foundation for the budget-constrained SOP formulation and the data-driven mode-switching strategy that enables long-horizon resource allocation. Additionally, we introduce a high-speed neural planner that distills the expert solver into a transformer-based model, reducing decision latency by a factor of nearly 10 while maintaining high planning quality.

preprint · 2022 · arXiv

Active Domain-Invariant Self-Localization Using Ego-Centric and World-Centric Maps

The training of a next-best-view (NBV) planner for visual place recognition (VPR) is a fundamentally important task in autonomous robot navigation, for which a typical approach is the use of visual experiences that are collected in the target domain as training data. However, the collection of a wide variety of visual experiences in everyday navigation is costly and prohibitive for real-time robotic applications. We address this issue by employing a novel domain-invariant NBV planner. A standard VPR subsystem based on a convolutional neural network (CNN) is assumed to be available, and its domain-invariant state recognition ability is proposed to be transferred to train the domain-invariant NBV planner. Specifically, we divide the visual cues that are available from the CNN model into two types: the output layer cue (OLC) and intermediate layer cue (ILC). The OLC is available at the output layer of the CNN model and aims to estimate the state of the robot (e.g., the robot viewpoint) with respect to the world-centric view coordinate system. The ILC is available within the middle layers of the CNN model as a high-level description of the visual content (e.g., a saliency image) with respect to the ego-centric view. In our framework, the ILC and OLC are mapped to a state vector and subsequently used to train a multiview NBV planner via deep reinforcement learning. Experiments using the public NCLT dataset validate the effectiveness of the proposed method.

preprint · 2022 · arXiv

Domain Invariant Siamese Attention Mask for Small Object Change Detection via Everyday Indoor Robot Navigation

The problem of image change detection via everyday indoor robot navigation is explored from a novel perspective of the self-attention technique. Detecting semantically non-distinctive and visually small changes remains a key challenge in the robotics community. Intuitively, these small non-distinctive changes may be better handled by the recent paradigm of the attention mechanism, which is the basic idea of this work. However, existing self-attention models require significant retraining cost per domain, so they are not directly applicable to robotics applications. We propose a new self-attention technique capable of unsupervised on-the-fly domain adaptation, which introduces an attention mask into the intermediate layer of an image change detection model, without modifying the input and output layers of the model. Experiments, in which an indoor robot aims to detect visually small changes in everyday navigation, demonstrate that our attention technique significantly boosts the state-of-the-art image change detection model.

preprint · 2022 · arXiv

Exploring Self-Attention for Visual Intersection Classification

In robot vision, self-attention has recently emerged as a technique for capturing non-local contexts. In this study, we introduced a self-attention mechanism into the intersection recognition system as a method to capture the non-local contexts behind the scenes. An intersection classification system comprises two distinctive modules: (a) a first-person vision (FPV) module, which uses a short egocentric view sequence as the intersection is passed, and (b) a third-person vision (TPV) module, which uses a single view immediately before entering the intersection. The self-attention mechanism is effective in the TPV module because most parts of the local pattern (e.g., road edges, buildings, and sky) are similar to each other, and thus the use of a non-local context (e.g., the angle between two diagonal corners around an intersection) would be effective. This study makes three major contributions. First, we proposed a self-attention-based approach for intersection classification using TPVs. Second, we presented a practical system in which a self-attention-based TPV module is combined with an FPV module to improve the overall recognition performance. Finally, experiments using the public KITTI dataset show that the above self-attention-based system outperforms conventional recognition based on local patterns and recognition based on convolution operations.

preprint · 2022 · arXiv

Minimum Cost Multicuts for Incorrect Landmark Edge Detection in Pose-graph SLAM

Pose-graph SLAM is the de facto standard framework for constructing large-scale maps from multi-session experiences of relative observations and motions during visual robot navigation. It has received increasing attention in the context of recent advanced SLAM frameworks such as graph neural SLAM. One remaining challenge is landmark misrecognition errors (i.e., incorrect landmark edges) that can have catastrophic effects on the inferred pose-graph map. In this study, we present comprehensive criteria to maximize global consistency in the pose graph using a new robust graph cut technique. Our key idea is to formulate the problem as a minimum-cost multi-cut that enables us to optimize not only landmark correspondences but also the number of landmarks while allowing for a varying number of landmarks. This makes our proposed approach invariant against the type of landmark measurement, graph topology, and metric information, such as the speed of the robot motion. The proposed graph cut technique was integrated into a practical SLAM framework and verified experimentally using the public NCLT dataset.

preprint · 2021 · arXiv

Domain-invariant NBV Planner for Active Cross-domain Self-localization

Pole-like landmarks have received increasing attention as domain-invariant visual cues for visual robot self-localization across domains (e.g., seasons, times of day, weather conditions). However, self-localization using pole-like landmarks can be ill-posed for a passive observer, as many viewpoints may not provide any pole-like landmark view. To alleviate this problem, we consider an active observer and explore a novel "domain-invariant" next-best-view (NBV) planner that attains consistent performance over different domains (i.e., maintenance-free), without requiring the expensive task of training data collection and retraining. In our approach, a novel multi-encoder deep convolutional neural network enables the detection of domain-invariant pole-like landmarks, which are then used as the sole input to a model-free deep reinforcement learning-based domain-invariant NBV planner. Further, we develop a practical system for active self-localization using sparse invariant landmarks and dense discriminative landmarks. In experiments, we demonstrate that the proposed method is effective both in efficient landmark detection and in discriminative self-localization.

preprint · 2016 · arXiv

Compressive Change Retrieval for Moving Object Detection

Change detection, or anomaly detection, from street-view images acquired by an autonomous robot at multiple different times is a major problem in robotic mapping and autonomous driving. Formulation as an image comparison task, which operates on a given pair of query and reference images, is common to many existing approaches to this problem. Unfortunately, providing relevant reference images is not straightforward. In this paper, we propose a novel formulation for change detection, termed compressive change retrieval, which can operate on a query image and similar reference images retrieved from the web. Compared to previous formulations, there are two sources of difficulty. First, the retrieved reference images may frequently contain non-relevant reference images, because even state-of-the-art place-recognition techniques suffer from retrieval noise. Second, image comparison needs to be conducted in a compressed domain to minimize the storage cost of large collections of street-view images. To address the above issues, we also present a practical change detection algorithm that uses compressed bag-of-words (BoW) image representation as a scalable solution. The results of experiments conducted on a practical change detection task, "moving object detection (MOD)," using the publicly available Malaga dataset validate the effectiveness of the proposed approach.

preprint · 2016 · arXiv

Deformable Map Matching for Uncertain Loop-Less Maps

In the classical context of robotic mapping and localization, map matching is typically defined as the task of finding a rigid transformation (i.e., 3DOF rotation/translation on the 2D moving plane) that aligns the query and reference maps built by mobile robots. This definition is valid in loop-rich trajectories that enable a mapper robot to close many loops, for which precise maps can be assumed. The same cannot be said about the newly emerging autonomous navigation and driving systems, which typically operate in loop-less trajectories that have no large loop (e.g., straight paths). In this paper, we propose a solution that overcomes this limitation by merging the two maps. Our study is motivated by the observation that even when there is no large loop in either the query or reference map, many loops can often be obtained in the merged map. We add two new aspects to map matching: (1) image retrieval with discriminative deep convolutional neural network (DCNN) features, which efficiently generates a small number of good initial alignment hypotheses; and (2) map merge, which jointly deforms the two maps to minimize differences in shape between them. To realize practical computation time, we also present a preemption scheme that avoids excessive evaluation of useless map-matching hypotheses. To verify our approach experimentally, we created a novel collection of uncertain loop-less maps by utilizing the recently published North Campus Long-Term (NCLT) dataset and its ground-truth GPS data. The results obtained using these map collections confirm that our approach improves on previous map-matching approaches.

preprint · 2016 · arXiv

Multi-Model Hypothesize-and-Verify Approach for Incremental Loop Closure Verification

Loop closure detection, which is the task of identifying locations revisited by a robot in a sequence of odometry and perceptual observations, is typically formulated as a visual place recognition (VPR) task. However, even state-of-the-art VPR techniques generate a considerable number of false positives as a result of confusing visual features and perceptual aliasing. In this paper, we propose a robust incremental framework for loop closure detection, termed incremental loop closure verification. Our approach reformulates the problem of loop closure detection as an instance of a multi-model hypothesize-and-verify framework, in which multiple loop closure hypotheses are generated and verified in terms of the consistency between loop closure hypotheses and VPR constraints at multiple viewpoints along the robot's trajectory. Furthermore, we consider the general incremental setting of loop closure detection, in which the system must update both the set of VPR constraints and that of loop closure hypotheses when new constraints or hypotheses arrive during robot navigation. Experimental results using a stereo SLAM system, DCNN features, and visual odometry validate the effectiveness of the proposed approach.

preprint · 2015 · arXiv

Discriminative Map Retrieval Using View-Dependent Map Descriptor

Map retrieval, the problem of similarity search over a large collection of 2D pointset maps previously built by mobile robots, is crucial for autonomous navigation in indoor and outdoor environments. Bag-of-words (BoW) methods constitute a popular approach to map retrieval; however, these methods have extremely limited descriptive ability because they ignore the spatial layout information of the local features. The main contribution of this paper is an extension of the bag-of-words map retrieval method to enable the use of spatial information from local features. Our strategy is to explicitly model a unique viewpoint of an input local map; the pose of the local feature is defined with respect to this unique viewpoint, and can be viewed as an additional invariant feature for discriminative map retrieval. Specifically, we wish to determine a unique viewpoint that is invariant to moving objects, clutter, occlusions, and actual viewpoints. Hence, we perform scene parsing to analyze the scene structure, and consider the "center" of the scene structure to be the unique viewpoint. Our scene parsing is based on a Manhattan world grammar that imposes a quasi-Manhattan world constraint to enable the robust detection of a scene structure that is invariant to clutter and moving objects. Experimental results using the publicly available radish dataset validate the efficacy of the proposed approach.

preprint · 2015 · arXiv

Incremental Loop Closure Verification by Guided Sampling

Loop closure detection, the task of identifying locations revisited by a robot in a sequence of odometry and perceptual observations, is typically formulated as a combination of two subtasks: (1) bag-of-words image retrieval and (2) post-verification using RANSAC geometric verification. The main contribution of this study is the proposal of a novel post-verification framework that achieves a good precision-recall trade-off in loop closure detection. This study is motivated by the fact that not all loop closure hypotheses are equally plausible (e.g., owing to mutual consistency between loop closure constraints) and that if we have evidence that one hypothesis is more plausible than the others, then it should be verified more frequently. We demonstrate that the problem of loop closure detection can be viewed as an instance of a multi-model hypothesize-and-verify framework and build guided sampling strategies on the framework where loop closures proposed using image retrieval are verified in a planned order (rather than in a conventional uniform order) to operate in a constant time. Experimental results using a stereo SLAM system confirm that the proposed strategy, the use of loop closure constraints and robot trajectory hypotheses as a guide, achieves promising results despite the fact that there exists a significant number of false positive constraints and hypotheses.

preprint · 2015 · arXiv

Incremental RANSAC for Online Relocation in Large Dynamic Environments

Vehicle relocation is the problem in which a mobile robot has to estimate its self-position with respect to an a priori map of landmarks using perception and motion measurements, without any knowledge of the initial self-position. Recently, RANdom SAmple Consensus (RANSAC), a robust multi-hypothesis estimator, has been successfully applied to offline relocation in static environments. On the other hand, online relocation in dynamic environments is still a difficult problem, because the available computation time is always limited and the measurements include many outliers. To realize a real-time algorithm for such an online process, we have developed an incremental version of the RANSAC algorithm by extending an efficient preemption RANSAC scheme. This novel scheme, named incremental RANSAC, is able to find inlier hypotheses of self-positions out of a large number of outlier hypotheses contaminated by outlier measurements.

preprint · 2015 · arXiv

Self-localization Using Visual Experience Across Domains

In this study, we aim to solve the single-view robot self-localization problem by using visual experience across domains. Although the bag-of-words method constitutes a popular approach to single-view localization, it fails badly when its visual vocabulary is learned and tested in different domains. Further, we are interested in using a cross-domain setting, in which the visual vocabulary is learned in different seasons and routes from the input query/database scenes. Our strategy is to mine a cross-domain visual experience, a library of raw visual images collected in different domains, to discover the relevant visual patterns that effectively explain the input scene, and use them for scene retrieval. In particular, we show that the appearance and the pose of the mined visual patterns of a query scene can be efficiently and discriminatively matched against those of the database scenes by employing image-to-class distance and spatial pyramid matching. Experimental results obtained using a novel cross-domain dataset show that our system achieves promising results despite our visual vocabulary being learned and tested in different domains.