Source author record

Satoshi Tsutsui

Satoshi Tsutsui appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, cond-mat.mtrl-sci, cond-mat.str-el, cond-mat.supr-con, cond-mat.dis-nn, cond-mat.stat-mech, eess.AS, Multimedia, physics.chem-ph, physics.comp-ph, physics.data-an

Catalog footprint

What is connected

15 works

11 topics

4 close collaborators

preprint, 2026, arXiv

WBCAtt+: Fine-Grained Pixel-Level Morphological Annotations for White Blood Cell Images

The microscopic examination of white blood cells (WBCs) plays a fundamental role in pathology and is essential for diagnosing blood disorders such as leukemia and anemia. To support further research on WBC images, multiple datasets have been proposed. However, they mainly annotate cell categories, and lack detailed morphological characteristics that pathologists use to explain their interpretations of cells. To address this gap, we introduce WBCAtt+, a novel dataset of WBC images densely annotated with 11 morphological attributes and five pixel-level cell components. With 113k image-level labels and 10k segmentation maps, WBCAtt+ is the first to provide comprehensive annotations for WBC images. Leveraging this dataset, we provide baseline models for attribute recognition and semantic segmentation. We also design an attribute recognition model to incorporate compositional structure of cells, further improving the recognition performance. Lastly, we showcase various applications enabled by our dataset, such as explainable AI models, including counterfactual example generation. The dataset and code are publicly available at https://doi.org/10.57967/hf/8143.

preprint, 2022, arXiv

Action Recognition based on Cross-Situational Action-object Statistics

Machine learning models of visual action recognition are typically trained and tested on data from specific situations where actions are associated with certain objects. It is an open question how action-object associations in the training set influence a model's ability to generalize beyond trained situations. We set out to identify properties of training data that lead to action recognition models with greater generalization ability. To do this, we take inspiration from a cognitive mechanism called cross-situational learning, which states that human learners extract the meaning of concepts by observing instances of the same concept across different situations. We perform controlled experiments with various types of action-object associations, and identify key properties of action-object co-occurrence in training data that lead to better classifiers. Given that these properties are missing in the datasets that are typically used to train action classifiers in the computer vision literature, our work provides useful insights on how we should best construct datasets for efficiently training for better generalization.

preprint, 2022, arXiv

AVA-AVD: Audio-Visual Speaker Diarization in the Wild

Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory and visual signals. Existing audio-visual diarization datasets are mainly focused on indoor environments like meeting rooms or news studios, which are quite different from in-the-wild videos in many scenarios such as movies, documentaries, and audience sitcoms. To develop diarization methods for these challenging videos, we create the AVA Audio-Visual Diarization (AVA-AVD) dataset. Our experiments demonstrate that adding AVA-AVD to the training set can produce significantly better diarization models for in-the-wild videos even though the dataset is relatively small. Moreover, this benchmark is challenging due to the diverse scenes, complicated acoustic conditions, and completely off-screen speakers. As a first step towards addressing the challenges, we design the Audio-Visual Relation Network (AVR-Net), which introduces a simple yet effective modality mask to capture discriminative information based on face visibility. Experiments show that our method not only outperforms state-of-the-art methods but is also more robust as the ratio of off-screen speakers varies. Our data and code have been made publicly available at https://github.com/showlab/AVA-AVD.

preprint, 2022, arXiv

Bayesian Inference on Hamiltonian Selections for Mössbauer Spectroscopy

Mössbauer spectroscopy, which provides knowledge related to electronic states in materials, has been applied to various fields such as condensed matter physics and material sciences. In conventional spectral analyses based on least-squares fitting, hyperfine interactions in materials have been determined from the shape of observed spectra. In such analyses, it is difficult to discuss the validity of the hyperfine interactions and the estimated values. We propose a spectral analysis method based on Bayesian inference for the selection of hyperfine interactions and the estimation of Mössbauer parameters. An appropriate Hamiltonian has been selected by comparing Bayesian free energy among possible Hamiltonians. We have estimated the Mössbauer parameters and evaluated their estimated values by calculating the posterior distribution of each Mössbauer parameter with confidence intervals. We have also discussed the accuracy of the spectral analyses to elucidate the noise intensity dependence of numerical experiments.

preprint, 2022, arXiv

LO-mode phonon of KCl and NaCl at 300 K by inelastic X ray scattering measurements and first principles calculations

Longitudinal-optical (LO) mode phonon branches of KCl and NaCl were measured using inelastic X-ray scattering (IXS) at 300 K and calculated by the first-principles phonon calculation with the stochastic self-consistent harmonic approximation. Spectral shapes of the IXS measurements and calculated spectral functions agreed well. We analyzed the calculated spectral functions that provide higher resolutions of the spectra than the IXS measurements. Due to strong anharmonicity, the spectral functions of these phonon branches have several peaks and the LO modes along $Γ$--L paths are disconnected.

preprint, 2022, arXiv

Novel View Synthesis for High-fidelity Headshot Scenes

Rendering scenes with a high-quality human face from arbitrary viewpoints is a practical and useful technique for many real-world applications. Recently, Neural Radiance Fields (NeRF), a rendering technique that uses neural networks to approximate classical ray tracing, has been considered one of the promising approaches for synthesizing novel views from a sparse set of images. We find that NeRF can render new views while maintaining geometric consistency, but it does not properly maintain skin details, such as moles and pores. These details are particularly important for faces because when we look at an image of a face, we are much more sensitive to details than when we look at other objects. On the other hand, 3D Morphable Models (3DMMs) based on traditional meshes and textures can perform well in terms of skin detail, although they have less precise geometry and cannot cover the head and the entire scene with background. Based on these observations, we propose a method to use both NeRF and 3DMM to synthesize a high-fidelity novel view of a scene with a face. Our method learns a Generative Adversarial Network (GAN) to mix a NeRF-synthesized image and a 3DMM-rendered image and produces a photorealistic scene with a face preserving the skin details. Experiments with various real-world scenes demonstrate the effectiveness of our approach. The code will be available at https://github.com/showlab/headshot.

preprint, 2022, arXiv

Reinforcing Generated Images via Meta-learning for One-Shot Fine-Grained Visual Recognition

One-shot fine-grained visual recognition often suffers from the problem of having few training examples for new fine-grained classes. To alleviate this problem, off-the-shelf image generation techniques based on Generative Adversarial Networks (GANs) can potentially create additional training images. However, these GAN-generated images are often not helpful for actually improving the accuracy of one-shot fine-grained recognition. In this paper, we propose a meta-learning framework to combine generated images with original images, so that the resulting "hybrid" training images improve one-shot learning. Specifically, the generic image generator is updated by a few training instances of novel classes, and a Meta Image Reinforcing Network (MetaIRNet) is proposed to conduct one-shot fine-grained recognition as well as image reinforcement. Our experiments demonstrate consistent improvement over baselines on one-shot fine-grained image classification benchmarks. Furthermore, our analysis shows that the reinforced images have more diversity compared to the original and GAN-generated images.

preprint, 2020, arXiv

A Computational Model of Early Word Learning from the Infant's Point of View

Human infants have the remarkable ability to learn the associations between object names and visual objects from inherently ambiguous experiences. Researchers in cognitive science and developmental psychology have built formal models that implement in-principle learning algorithms, and then used pre-selected and pre-cleaned datasets to test the abilities of the models to find statistical regularities in the input data. In contrast to previous modeling approaches, the present study used egocentric video and gaze data collected from infant learners during natural toy play with their parents. This allowed us to capture the learning environment from the perspective of the learner's own point of view. We then used a Convolutional Neural Network (CNN) model to process sensory data from the infant's point of view and learn name-object associations from scratch. As the first model that takes raw egocentric video to simulate infant word learning, the present study provides a proof of principle that the problem of early word learning can be solved, using actual visual data perceived by infant learners. Moreover, we conducted simulation experiments to systematically determine how visual, perceptual, and attentional properties of infants' sensory experiences may affect word learning.

preprint, 2019, arXiv

Phonon anomalies with doping in superconducting oxychlorides Ca2-xCuO2Cl2

We measure the dispersion of the Cu-O bond-stretching phonon mode in the high-temperature superconducting parent compound Ca$_2$CuO$_2$Cl$_2$. Our density functional theory calculations predict a cosine-shaped bending of the dispersion along both the ($ξ$00) and ($ξξ$0) directions, while comparison with previous results on Ca$_{1.84}$CuO$_2$Cl$_2$ shows it only along ($ξ$00), suggesting an anisotropic effect which is not reproduced in calculation at optimal doping. Comparison with isostructural La$_{2-x}$Sr$_x$CuO$_4$ suggests that these calculations reproduce the overdoped regime well; however, they overestimate the doping effect on the Cu-O bond-stretching mode at optimal doping.

preprint, 2016, arXiv

First-Order Structural Change Accompanied by Yb Valence Transition in YbInCu4

A diffraction experiment using a high energy x-ray was carried out on YbInCu4. Below the Yb valence transition temperature, the splitting of Bragg peaks was detected in higher-order reflections. No superlattice reflections accompanying the valence ordering were found below the transition temperature. These experimental findings indicate that a structural change from a cubic structure to a tetragonal structure without valence ordering occurs at the transition temperature. Such a structural change free from any valence ordering is difficult to understand only in terms of Yb valence degrees of freedom. This means that the structural change may be related to electronic symmetries such as quadrupolar degrees of freedom as well as the change in Yb valence.

preprint, 2015, arXiv

Signature of a polyamorphic transition in the THz spectrum of vitreous GeO2

The THz spectrum of density fluctuations, $S(Q, ω)$, of vitreous GeO$_2$ at ambient temperature was measured by inelastic x-ray scattering from ambient pressure up to pressures well beyond that of the known $α$-quartz to rutile polyamorphic (PA) transition. We observe significant differences in the spectral shape measured below and above the PA transition, in particular, in the 30-80 meV range. Guided by first-principle lattice dynamics calculations, we interpret the changes in the phonon dispersion as the evolution from a quartz-like to a rutile-like coordination. Notably, such a crossover is accompanied by a cusp-like behavior in the pressure dependence of the elastic response of the system. Overall, the presented results highlight the complex fingerprint of PA phenomena on the high-frequency phonon dispersion.

preprint, 2013, arXiv

Non-complicated EuTiO3 Structure

A recently published paper [arXiv:1206.5417] showed strong incommensurate diffraction peaks in EuTiO3 around zone boundary R-points. We wish to convey that in our samples, and, it seems, at least another case from the literature, these peaks are absent.

preprint, 2012, arXiv

Phonon Softening and Dispersion in EuTiO3

We measured phonon dispersion in single crystal EuTiO$_3$ using inelastic x-ray scattering. A structural transition to an antiferrodistortive phase was found at a critical temperature $T_0 = 287 \pm 1$ K using powder and single-crystal x-ray diffraction. Clear softening of the zone boundary $R$-point $\mathbf{q}$ = (0.5 0.5 0.5) acoustic phonon shows this to be a displacive transition. The mode energy plotted against reduced temperature could be seen to nearly overlap that of SrTiO$_3$, suggesting a universal scaling relation. Phonon dispersion was measured along $Γ$-$X$ (0 0 0) $\rightarrow$ (0.5 0 0). Mode eigenvectors were obtained from a shell model consistent with the $\mathbf{q}$-dependence of intensity and energy, which also showed that the dispersion is nominally the same as in SrTiO$_3$ at room temperature, but corrected for mass. The lowest energy optical mode, determined to be of Slater character, softens approximately linearly with temperature until the 70-100 K range where the softening stops, and at low temperature, the mode disperses linearly near the zone center.

preprint, 2010, arXiv

Superconducting and Structural Transitions in the β-Pyrochlore Oxide KOs2O6 under High Pressure

Rattling-induced superconductivity in the β-pyrochlore oxide KOs2O6 is investigated under high pressure up to 5 GPa. Resistivity measurements in a high-quality single crystal reveal a gradual decrease in the superconducting transition temperature Tc from 9.7 K at 1.0 GPa to 6.5 K at 3.5 GPa, followed by a sudden drop to 3.3 K at 3.6 GPa. Powder X-ray diffraction experiments show a structural transition from cubic to monoclinic or triclinic at a similar pressure. The sudden drop in Tc is ascribed to this structural transition, by which an enhancement in Tc due to a strong electron-rattler interaction present in the low-pressure cubic phase is abrogated as the rattling of the K ion is completely suppressed or weakened in the high-pressure phase of reduced symmetry. In addition, we find two anomalies in the temperature dependence of resistivity in the low-pressure phase, which may be due to subtle changes in rattling vibration.

preprint, 2009, arXiv

Effect of K Doping on Phonons in Ba1-xKxFe2As2

The lattice dynamics of Ba1-xKxFe2As2 (x = 0.00, 0.27) have been studied by inelastic X-ray scattering measurement at room temperature. K doping induces the softening and broadening of phonon modes in the energy range E = 10-15 meV. Analysis with a Born-von Karman force-constant model indicates that the softening results from reduced interatomic force constants around (Ba,K) sites following the displacement of divalent Ba by monovalent K. The phonon broadening may be explained by the local distortions induced by the K substitution. Extra phonon modes are observed around the wave vector q = (0.5,0,0) at E = 16.5 meV for the x = 0.27 sample. These modes may arise either from the local disorder induced by K doping or from electron-phonon coupling.