Source author record

Ryosuke Nakamura

Ryosuke Nakamura appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, astro-ph.EP, eess.IV, physics.app-ph, physics.optics, Robotics

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators

preprint · 2022 · arXiv

SalFBNet: Learning Pseudo-Saliency Distribution via Feedback Convolutional Networks

Feed-forward-only convolutional neural networks (CNNs) may ignore intrinsic relationships and potential benefits of feedback connections in vision tasks such as saliency detection, despite their significant representation capabilities. In this work, we propose a feedback-recursive convolutional framework (SalFBNet) for saliency detection. The proposed feedback model can learn abundant contextual representations by bridging a recursive pathway from higher-level feature blocks to low-level layers. Moreover, we create a large-scale Pseudo-Saliency dataset to alleviate the problem of data deficiency in saliency detection. We first use the proposed feedback model to learn saliency distribution from pseudo-ground-truth. Afterwards, we fine-tune the feedback model on existing eye-fixation datasets. Furthermore, we present a novel Selective Fixation and Non-Fixation Error (sFNE) loss to help the proposed feedback model better learn distinguishable eye-fixation-based features. Extensive experimental results show that our SalFBNet, with fewer parameters, achieves competitive results on public saliency detection benchmarks, which demonstrates the effectiveness of the proposed feedback model and Pseudo-Saliency data. Source code and the Pseudo-Saliency dataset can be found at https://github.com/gqding/SalFBNet

preprint · 2022 · arXiv

Surgical Skill Assessment via Video Semantic Aggregation

Automated video-based assessment of surgical skills is a promising task in assisting young surgical trainees, especially in resource-poor areas. Existing works often resort to a CNN-LSTM joint framework that models long-term relationships by LSTMs on spatially pooled short-term CNN features. However, this practice inevitably neglects the differences among semantic concepts such as tools, tissues, and background in the spatial dimension, impeding the subsequent temporal relationship modeling. In this paper, we propose a novel skill assessment framework, Video Semantic Aggregation (ViSA), which discovers different semantic parts and aggregates them across spatiotemporal dimensions. The explicit discovery of semantic parts provides an explanatory visualization that helps understand the neural network's decisions. It also enables us to further incorporate auxiliary information such as kinematic data to improve representation learning and performance. Experiments on two datasets show the competitiveness of ViSA compared to state-of-the-art methods. Source code is available at: bit.ly/MICCAI2022ViSA.

preprint · 2022 · arXiv

When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition

Object recognition and scene recognition are two challenging but essential tasks in image understanding. In particular, the use of RGB-D sensors in handling these tasks has emerged as an important area of focus for better visual understanding. Meanwhile, deep neural networks, specifically convolutional neural networks (CNNs), have become widespread and have been applied to many visual tasks by replacing hand-crafted features with effective deep features. However, how to exploit deep features from a multi-layer CNN model effectively remains an open problem. In this paper, we propose a novel two-stage framework that extracts discriminative feature representations from multi-modal RGB-D images for object and scene recognition tasks. In the first stage, a pretrained CNN model is employed as a backbone to extract visual features at multiple levels. The second stage efficiently maps these features into high-level representations using a fully randomized structure of recursive neural networks (RNNs). To cope with the high dimensionality of CNN activations, a random weighted pooling scheme is proposed by extending the idea of randomness in RNNs. Multi-modal fusion is performed through a soft voting approach that computes weights from the individual recognition confidences (i.e. SVM scores) of the RGB and depth streams separately. This produces consistent class label estimates in the final RGB-D classification. Extensive experiments verify that the fully randomized structure in the RNN stage successfully encodes CNN activations into discriminative, solid features. Comparative experimental results on the popular Washington RGB-D Object and SUN RGB-D Scene datasets show that the proposed approach achieves superior or on-par performance compared to state-of-the-art methods in both object and scene recognition tasks. Code is available at https://github.com/acaglayan/CNN_randRNN.
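The soft voting fusion mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything below is an assumption: per-stream scores are turned into probabilities with a softmax, each stream's confidence is taken to be its top class probability, and the fusion weights are those confidences normalized to sum to one. The function names `softmax` and `soft_vote` are hypothetical and do not come from the authors' code.

```python
import math


def softmax(scores):
    """Convert raw classifier scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(rgb_scores, depth_scores):
    """Fuse two streams with confidence-based weights (illustrative assumption).

    Returns the index of the fused class and the fused probability vector.
    """
    if len(rgb_scores) != len(depth_scores):
        raise ValueError("both streams must score the same set of classes")

    p_rgb = softmax(rgb_scores)
    p_depth = softmax(depth_scores)

    # Assumption: a stream's confidence is its highest class probability.
    c_rgb, c_depth = max(p_rgb), max(p_depth)
    w_rgb = c_rgb / (c_rgb + c_depth)
    w_depth = 1.0 - w_rgb

    fused = [w_rgb * a + w_depth * b for a, b in zip(p_rgb, p_depth)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused
```

With this scheme, a stream that is sure of its prediction pulls the fused decision toward its own label, while an undecided stream contributes less.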

preprint · 2020 · arXiv

SOIC: Semantic Online Initialization and Calibration for LiDAR and Camera

This paper presents a novel semantic-based online extrinsic calibration approach, SOIC (so, I see), for Light Detection and Ranging (LiDAR) and camera sensors. Previous online calibration methods usually need prior knowledge of rough initial values for optimization. The proposed approach removes this limitation by converting the initialization problem to a Perspective-n-Point (PnP) problem with the introduction of semantic centroids (SCs). The closed-form solution of this PnP problem has been well researched and can be found with existing PnP methods. Since the semantic centroid of the point cloud usually does not accurately match that of the corresponding image, the accuracy of the parameters is not improved even after a nonlinear refinement process. Thus, a cost function based on the constraint of the correspondence between semantic elements from both point cloud and image data is formulated. Subsequently, optimal extrinsic parameters are estimated by minimizing the cost function. We evaluate the proposed method with either ground-truth (GT) or predicted semantics on the KITTI dataset. Experimental results and comparisons with the baseline method verify the feasibility of the initialization strategy and the accuracy of the calibration approach. In addition, we release the source code at https://github.com/--/SOIC.
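The semantic-centroid idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the SOIC implementation: it assumes every LiDAR point and every image pixel already carries a semantic label, takes a centroid to be the plain mean of the coordinates sharing a label, and pairs 3D and 2D centroids by label. The names `semantic_centroids` and `centroid_correspondences` are hypothetical.

```python
from collections import defaultdict


def semantic_centroids(samples):
    """Mean coordinate per semantic label.

    samples: iterable of (label, coords), where coords is a tuple of floats
    (3 values for LiDAR points, 2 values for image pixels).
    """
    sums = {}
    counts = defaultdict(int)
    for label, coords in samples:
        if label not in sums:
            sums[label] = [0.0] * len(coords)
        for i, c in enumerate(coords):
            sums[label][i] += c
        counts[label] += 1
    return {
        label: tuple(s / counts[label] for s in total)
        for label, total in sums.items()
    }


def centroid_correspondences(lidar_points, image_pixels):
    """Pair 3D and 2D centroids for labels seen by both sensors."""
    c3d = semantic_centroids(lidar_points)
    c2d = semantic_centroids(image_pixels)
    shared = sorted(set(c3d) & set(c2d))  # deterministic label order
    return [(c3d[label], c2d[label]) for label in shared]
```

The resulting 3D-to-2D pairs are the kind of input an existing PnP solver expects (OpenCV's `cv2.solvePnP` is one option, given the camera intrinsics). As the abstract notes, such centroids rarely match exactly across sensors, so the pose obtained this way serves only as an initial value for the subsequent cost-function minimization.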

preprint · 2019 · arXiv

Optical vortex-induced forward mass transfer: Manifestation of helical trajectory of optical vortex

The orbital angular momentum of an optical vortex field is found to twist high-viscosity donor material to form a micron-scale 'spin jet'. This unique phenomenon manifests the helical trajectory of the optical vortex. Going beyond both the conventional ink jet and laser induced forward mass transfer (LIFT) patterning technologies, it also offers the formation and ejection of a micron-scale 'spin jet' of the donor material even with an ultrahigh viscosity of 4 Pa·s. This optical vortex laser induced forward mass transfer (OV-LIFT) patterning technique will enable the development of next-generation printed photonic/electric/spintronic circuits formed of ultrahigh-viscosity donor dots containing functional nanoparticles, such as quantum dots, metallic particles and magnetic ferrite particles, with ultrahigh spatial resolution. It could also potentially open up a completely new form of needleless drug injection.

preprint · 2017 · arXiv

Solar Power Plant Detection on Multi-Spectral Satellite Imagery using Weakly-Supervised CNN with Feedback Features and m-PCNN Fusion

Most traditional convolutional neural networks (CNNs) implement a bottom-up (feed-forward) approach for image classification. However, many scientific studies demonstrate that visual perception in primates relies on both bottom-up and top-down connections. Therefore, in this work, we propose a CNN with a feedback structure for solar power plant detection on middle-resolution satellite images. To express the strength of the top-down connections, we introduce a feedback CNN (FB-Net) to a baseline CNN model used for solar power plant classification on multi-spectral satellite data. Moreover, we introduce a method to improve class activation mapping (CAM) for our FB-Net, which takes advantage of a multi-channel pulse coupled neural network (m-PCNN) for weakly-supervised localization of solar power plants from the features of the proposed FB-Net. For the proposed FB-Net CAM with m-PCNN, experiments showed promising results on both solar power plant image classification and detection tasks.

preprint · 2009 · arXiv

The Hayabusa Spacecraft Asteroid Multi-Band Imaging Camera: AMICA

The Hayabusa Spacecraft Asteroid Multiband Imaging Camera (AMICA) has acquired more than 1400 multispectral and high-resolution images of its target asteroid, 25143 Itokawa, since late August 2005. In this paper, we summarize the design and performance of AMICA. In addition, we describe the calibration methods, assumptions, and models, based on measurements. Major calibration steps include corrections for linearity and modeling and subtraction of bias, dark current, read-out smear, and pixel-to-pixel responsivity variations. AMICA v-band data were calibrated to radiance using in-flight stellar observations. The other band data were calibrated to reflectance by comparing them to ground-based observations to avoid the uncertainty of the solar irradiation in those bands. We found that the AMICA signal was linear with respect to the input signal to an accuracy of ≪ 1% when the signal level was < 3800 DN. We verified that the absolute radiance calibration of the AMICA v-band (0.55 µm) was accurate to 4% or less, the accuracy of the disk-integrated spectra with respect to the AMICA v-band was about 1%, and the pixel-to-pixel responsivity (flatfield) variation was 3% or less. The uncertainty in background zero-level was 5 DN. From wide-band observations of star clusters, we found that the AMICA optics have an effective focal length of 120.80 ± 0.03 mm, yielding a field-of-view (FOV) of 5.83° × 5.69°. The resulting geometric distortion model was accurate to within a third of a pixel. We demonstrated an image-restoration technique using the point-spread functions of stars, and confirmed that the technique functions well in all lossless images. An artifact not corrected by this calibration is scattered light associated with bright disks in the FOV.
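The relation between the effective focal length and the field of view quoted in this abstract can be checked with the standard pinhole formula FOV = 2·atan(d / 2f), where d is the detector size along one axis. The focal length is taken from the abstract; the detector geometry (1024 × 1000 pixels with a 12 µm pitch) is not stated there and is an assumption made for this sketch, as is the helper name `field_of_view_deg`.

```python
import math


def field_of_view_deg(focal_length_mm, n_pixels, pixel_pitch_mm):
    """Full field of view along one detector axis, in degrees (pinhole model)."""
    detector_size_mm = n_pixels * pixel_pitch_mm
    return math.degrees(2.0 * math.atan(detector_size_mm / (2.0 * focal_length_mm)))


FOCAL_LENGTH_MM = 120.80   # effective focal length quoted in the abstract
PIXEL_PITCH_MM = 0.012     # ASSUMPTION: 12 µm pixels, not stated in the abstract
N_PIXELS_X = 1024          # ASSUMPTION: detector width in pixels
N_PIXELS_Y = 1000          # ASSUMPTION: detector height in pixels

fov_x = field_of_view_deg(FOCAL_LENGTH_MM, N_PIXELS_X, PIXEL_PITCH_MM)
fov_y = field_of_view_deg(FOCAL_LENGTH_MM, N_PIXELS_Y, PIXEL_PITCH_MM)
print(f"FOV: {fov_x:.2f} deg x {fov_y:.2f} deg")
```

Under these assumed detector values the computed angles land within a few hundredths of a degree of the quoted 5.83° × 5.69°. The pinhole formula ignores the geometric distortion that the paper models separately.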
