Source author record

Ikuro Sato

Ikuro Sato appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Computer Vision, hep-lat, Machine Learning, Biomolecules, hep-ph, nucl-th

Catalog footprint

What is connected

10 works

6 topics

4 close collaborators


preprint, 2026, arXiv

What-Where Transformer: A Slot-Centric Visual Backbone for Concurrent Representation and Localization

Many image understanding tasks involve identifying what is present and where it appears. However, tasks that address where, such as object discovery, detection, and segmentation, are often considerably more complex than image classification, which primarily focuses on what. One possible reason is that classification-oriented backbones tend to emphasize semantic information about what, while implicitly entangling or suppressing information about where. In this work, we focus on an inductive bias termed what-where separation, which encourages models to represent object appearance and spatial location in a decomposed manner. To incorporate this bias throughout an attentive backbone in the style of Vision Transformer (ViT), we propose the What-Where Transformer (WWT). Our method introduces two key novel designs: (1) it treats tokens as representations of what and attention maps as representations of where, and processes them in concurrent feed-forward modules via a multi-stream, slot-based architecture; (2) it reuses both the final-layer tokens and attention maps for downstream tasks, and directly exposes them to gradients derived from task losses, thereby facilitating more effective and explicit learning of localization. We demonstrate that even under standard single-label classification-based supervision on ImageNet, WWT exhibits emergent multiple object discovery directly from raw attention maps, rather than via additional postprocessing such as token clustering. Furthermore, WWT achieves superior performance compared to ViT-based methods on zero-shot object discovery and weakly supervised semantic segmentation, and it is transferable to various localization setups with minimal modifications. Code will be published after acceptance.

preprint, 2022, arXiv

Feature Space Particle Inference for Neural Network Ensembles

Ensembles of deep neural networks demonstrate improved performance over single models. To enhance the diversity of ensemble members while preserving their performance, particle-based inference methods offer a promising approach from a Bayesian perspective. However, the best way to apply these methods to neural networks is still unclear: seeking samples from the weight-space posterior is inefficient because of over-parameterization, while seeking samples directly from the function-space posterior often results in serious underfitting. To address these difficulties, we propose optimizing particles in the feature space, where the activations of a specific intermediate layer lie. Our method encourages each member to capture distinct features, which is expected to improve ensemble prediction robustness. Extensive evaluation on real-world datasets shows that our model significantly outperforms the gold-standard Deep Ensembles on various metrics, including accuracy, calibration, and robustness. Code is available at https://github.com/DensoITLab/featurePI.

preprint, 2022, arXiv

Implicit Neural Representations for Variable Length Human Motion Generation

We propose an action-conditional human motion generation method using variational implicit neural representations (INR). The variational formalism enables action-conditional distributions of INRs, from which one can easily sample representations to generate novel human motion sequences. Our method offers variable-length sequence generation by construction because a part of INR is optimized for a whole sequence of arbitrary length with temporal embeddings. In contrast, previous works reported difficulties with modeling variable-length sequences. We confirm that our method with a Transformer decoder outperforms all relevant methods on HumanAct12, NTU-RGBD, and UESTC datasets in terms of realism and diversity of generated motions. Surprisingly, even our method with an MLP decoder consistently outperforms the state-of-the-art Transformer-based auto-encoder. In particular, we show that variable-length motions generated by our method are better than fixed-length motions generated by the state-of-the-art method in terms of realism and diversity. Code at https://github.com/PACerv/ImplicitMotion.

preprint, 2022, arXiv

PoF: Post-Training of Feature Extractor for Improving Generalization

The local shape of the loss landscape near a minimum, especially its flatness, has been intensively investigated as an important factor in the generalization of deep models. We developed a training algorithm called PoF (Post-Training of Feature Extractor) that updates the feature extractor part of an already-trained deep model to search for a flatter minimum. Its characteristics are two-fold: 1) the feature extractor is trained under parameter perturbations in the higher-layer parameter space, based on observations that suggest flattening the higher-layer parameter space, and 2) the perturbation range is determined in a data-driven manner, aiming to reduce the part of the test loss caused by positive loss curvature. We provide a theoretical analysis showing that the proposed algorithm implicitly reduces the target Hessian components as well as the loss. Experimental results show that PoF improved model performance against baseline methods on both the CIFAR-10 and CIFAR-100 datasets with only 10 epochs of post-training, and on the SVHN dataset with 50 epochs of post-training. Source code is available at: https://github.com/DensoITLab/PoF-v1

preprint, 2015, arXiv

APAC: Augmented PAttern Classification with Neural Networks

Deep neural networks have been exhibiting splendid accuracies in many visual pattern classification problems. Many of the state-of-the-art methods employ a technique known as data augmentation at the training stage. This paper addresses the issue of the decision rule for classifiers trained with augmented data. Our method is named APAC (Augmented PAttern Classification), a way of classifying using the optimal decision rule for augmented data learning. Discussion of methods of data augmentation is not our primary focus. We show clear evidence in several experiments that APAC gives far better generalization performance than the traditional way of class prediction. Our convolutional neural network model with APAC achieved a state-of-the-art accuracy on the MNIST dataset among non-ensemble classifiers. Even our multilayer perceptron model beats some of the convolutional models with recently invented stochastic regularization techniques on the CIFAR-10 dataset.
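
The abstract does not spell out the decision rule, so the following is only a minimal sketch of one plausible reading: instead of predicting from a single forward pass, the classifier scores each class by its average negative log-probability over several augmented copies of the test input and picks the lowest. That rule, the function names `predict_single` and `predict_augmented`, and the toy probabilities are all assumptions made for illustration, not the authors' code.

```python
import math

def predict_single(probs):
    """Traditional prediction: arg max over one probability vector."""
    return max(range(len(probs)), key=probs.__getitem__)

def predict_augmented(prob_rows, eps=1e-12):
    """Assumed augmented rule: arg min of the mean negative log-probability.

    prob_rows holds one class-probability vector per augmented copy
    of the same test input. eps guards against log(0).
    """
    n_classes = len(prob_rows[0])
    mean_nll = [
        sum(-math.log(max(row[k], eps)) for row in prob_rows) / len(prob_rows)
        for k in range(n_classes)
    ]
    return min(range(n_classes), key=mean_nll.__getitem__)

# Toy example: the unaugmented input narrowly favours class 0,
# but the augmented copies favour class 1 on average.
original = [0.55, 0.45]
augmented = [[0.55, 0.45], [0.20, 0.80], [0.30, 0.70]]
single_label = predict_single(original)
augmented_label = predict_augmented(augmented)
```

In this toy case the two rules disagree, which is the situation where the choice of decision rule matters.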

preprint, 2015, arXiv

Pairwise Rotation Hashing for High-dimensional Features

Binary hashing is widely used for efficient approximate nearest-neighbor search. Even though various binary hashing methods have been proposed, very few are feasible for the extremely high-dimensional features often used in visual tasks today. We propose a novel, highly sparse linear hashing method based on pairwise rotations. The encoding cost of the proposed algorithm is $\mathrm{O}(n \log n)$ for $n$-dimensional features, whereas that of the existing state-of-the-art method is typically $\mathrm{O}(n^2)$. The proposed method is also remarkably faster in the learning phase. Along with this efficiency, the retrieval accuracy is comparable to or slightly better than the state of the art. The pairwise rotations used in our method are formulated from an analytical study of the trade-off between quantization error and entropy of binary codes. Although these hashing criteria are widely used in previous research, their analytical behavior is rarely studied. All building blocks of our algorithm are based on the analytical solution, which provides a fairly simple and efficient procedure.
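
To illustrate how pairwise rotations can reach an $\mathrm{O}(n \log n)$ encoding cost, here is a rough sketch that applies $\log_2 n$ layers of $n/2$ two-dimensional rotations in a butterfly pattern and then binarizes by sign. The butterfly pairing, the single angle per layer, the power-of-two length, and the names `rotate_pairs` and `encode` are assumptions chosen for illustration; the abstract does not say how the pairs or angles are selected, and the learned rotations in the paper may look quite different.

```python
import math

def rotate_pairs(x, angles):
    """Apply log2(n) layers of n/2 plane rotations (butterfly pairing).

    angles[layer] is a placeholder rotation angle shared by every
    pair in that layer. Total work is (n/2) * log2(n) rotations.
    """
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two (assumption of this sketch)")
    v = list(x)
    stride, layer = 1, 0
    while stride < n:
        c, s = math.cos(angles[layer]), math.sin(angles[layer])
        for start in range(0, n, 2 * stride):
            for i in range(start, start + stride):
                j = i + stride
                a, b = v[i], v[j]
                # Rotate the (i, j) coordinate pair in its own plane.
                v[i], v[j] = c * a - s * b, s * a + c * b
        stride *= 2
        layer += 1
    return v

def encode(x, angles):
    """Binary code: sign of each rotated coordinate (1 if non-negative)."""
    return [1 if t >= 0.0 else 0 for t in rotate_pairs(x, angles)]

# Example: an 8-dimensional feature needs 3 layers of 4 rotations each.
feature = [0.5, -1.5, 2.0, 0.25, -3.0, 1.0, 0.0, 4.0]
code = encode(feature, [0.3, 1.1, -0.7])
```

Because every layer is a product of disjoint plane rotations, the overall transform stays orthogonal, so the vector norm is preserved before binarization. A dense projection of the same size would instead need $n^2$ multiplications.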

preprint, 2013, arXiv

The atomic-level mechanism underlying the functionality of aquaporin-0

So far, more than 82,000 protein structures have been reported in the Protein Data Bank, but the driving force and structures that allow for protein functions have not been elucidated at the atomic level for even one protein. We have been able to clarify that an inter-subunit hydrophobic interaction drives the electrostatic opening of the pore in aquaporin 0 (AQP0). Aquaporins are membrane channels for water and small non-ionic solutes found in animals, plants, and microbes. The structures of aquaporins have high homology and consist of homotetramers, each monomer of which has one pore for a water channel. Each pore has two narrow portions: one is the narrowest constriction region, consisting of aromatic residues and an arginine (ar/R), and the other consists of two asparagine-proline-alanine (NPA) homolog portions. Here we show that an inter-subunit hydrophobic interaction in AQP0 drives a stick portion consisting of four amino acids toward the pore, and that the tip of the stick portion, consisting of a nitrogen atom, opens the pore: that movement is the swing mechanism (this http URL). The energetics and conformational change of amino acids participating in the swing mechanism confirm this view. The swing mechanism, in which inter-subunit hydrophobic interactions in the tetramer drive the on-off switching of the pore, explains why aquaporins consist of tetramers. We also report that experimental and molecular dynamics findings using various mutants support this view of the swing mechanism. The finding that mutations of the amino acids in AQP2 corresponding to the stick of the swing mechanism cause severe recessive nephrogenic diabetes insipidus (NDI) demonstrates the critical role of the swing mechanism in aquaporin function. We report for the first time that the inter-subunit hydrophobic interaction in aquaporin 0 drives the electrostatic opening of the aquaporin pore at the atomic level.

preprint, 2006, arXiv

Finite volume corrections to pi-pi scattering

Lattice QCD studies of hadron-hadron interactions are performed by computing the energy levels of the system in a finite box. The shifts in energy levels proportional to inverse powers of the volume are related to scattering parameters in a model-independent way. In addition, there are non-universal, exponentially suppressed corrections that distort this relation. These terms are proportional to exp(-m_pi L) and become relevant as the chiral limit is approached. In this paper we report on a one-loop chiral perturbation theory calculation of the leading exponential corrections in the case of I=2 pi-pi scattering near threshold.
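
To give a feel for why these terms matter near the chiral limit, the sketch below evaluates only the bare factor exp(-m_pi L) for a few pion masses at a fixed box size. The prefactors from the one-loop calculation are not included, and the function name `exp_suppression`, the sample masses, and the 2.5 fm box are illustrative assumptions rather than values from the paper.

```python
import math

HBARC_MEV_FM = 197.327  # hbar * c in MeV * fm, makes m_pi * L dimensionless

def exp_suppression(m_pi_mev, box_fm):
    """Bare suppression factor exp(-m_pi * L), with m_pi in MeV and L in fm."""
    return math.exp(-m_pi_mev * box_fm / HBARC_MEV_FM)

# Assumed sample values: a 2.5 fm box and three pion masses.
box_fm = 2.5
factors = {m: exp_suppression(m, box_fm) for m in (500.0, 300.0, 140.0)}

for mass, factor in factors.items():
    print(f"m_pi = {mass:5.0f} MeV, L = {box_fm} fm: exp(-m_pi L) = {factor:.4f}")
```

At a fixed box size the factor grows as the pion mass decreases (about 0.17 for a 140 MeV pion in a 2.5 fm box), which is the sense in which the exponential corrections become relevant toward the chiral limit.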

preprint, 2005, arXiv

Clebsch-Gordan Construction of Lattice Interpolating Fields for Excited Baryons

Large sets of baryon interpolating field operators are developed for use in lattice QCD studies of baryons with zero momentum. Operators are classified according to the double-valued irreducible representations of the octahedral group. At first, three-quark smeared, local operators are constructed for each isospin and strangeness and they are classified according to their symmetry with respect to exchange of Dirac indices. Nonlocal baryon operators are formulated in a second step as direct products of the spinor structures of smeared, local operators together with gauge-covariant lattice displacements of one or more of the smeared quark fields. Linear combinations of direct products of spinorial and spatial irreducible representations are then formed with appropriate Clebsch-Gordan coefficients of the octahedral group. The construction attempts to maintain maximal overlap with the continuum SU(2) group in order to provide a physically interpretable basis. Nonlocal operators provide direct couplings to states that have nonzero orbital angular momentum.

preprint, 2005, arXiv

Combining Quark and Link Smearing to Improve Extended Baryon Operators

The effects of Gaussian quark-field smearing and analytic stout-link smearing on the correlations of gauge-invariant extended baryon operators are studied. Gaussian quark-field smearing substantially reduces contributions from the short wavelength modes of the theory, while stout-link smearing significantly reduces the noise from the stochastic evaluations. The use of gauge-link smearing is shown to be crucial for baryon operators constructed of covariantly-displaced quark fields. Preferred smearing parameters are determined for a lattice spacing a_s ~ 0.1 fm.
