Source author record

Tomoyuki Okuno

Tomoyuki Okuno appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher Unclaimed source record

astro-ph.HE Computer Vision

Catalog footprint

What is connected

4 works

2 topics

4 close collaborators

preprint 2026 arXiv

Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment

Spatial intelligence in vision-language models (VLMs) attracts research interest with the practical demand to reason in the 3D world. Despite promising results, most existing methods follow the conventional 2D pipeline in VLMs and use pixel-aligned representations for the vision modality. However, correspondence-based models with implicit 3D scene understanding often fail to achieve spatial consistency, and representation-based models with 3D geometric priors lack efficiency in vision sequence serialization. To address this, we propose a Proxy3D method with compact yet comprehensive 3D proxy representations for the vision modality. Given only video frames as input, we employ semantic and geometric encoders to extract scene features and then perform their semantic-aware clustering to obtain a set of proxies in the 3D space. For representation alignment, we further curate the SpaceSpan dataset and apply multi-stage training to adopt the proposed 3D proxy representations with the VLM. When using shorter sequences for vision information, our method achieves competitive or state-of-the-art performance in 3D visual question answering, visual grounding and general spatial intelligence benchmarks.
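To illustrate why clustering scene points into proxies shortens the vision sequence, here is a minimal sketch. It is not the Proxy3D algorithm: the abstract does not specify how proxies are computed, so this assumes cluster labels are already given and simply averages 3D positions per cluster. The function name `pool_proxies` and the label values are hypothetical.

```python
from collections import defaultdict


def pool_proxies(points, labels):
    """Average 3D points that share a cluster label.

    points: list of (x, y, z) tuples.
    labels: list of hashable cluster ids, one per point (assumed given).
    Returns a dict mapping each label to its mean (x, y, z) position.
    Mean pooling is an illustrative assumption, not the paper's method.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for (x, y, z), label in zip(points, labels):
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1
    return {
        label: (sx / n, sy / n, sz / n)
        for label, (sx, sy, sz, n) in sums.items()
    }


# Three input points collapse into two proxies, one per cluster.
proxies = pool_proxies(
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    ["chair", "chair", "table"],
)
```

In this toy case the sequence passed downstream would hold two entries instead of three; with dense per-pixel features the reduction would be far larger.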

preprint 2022 arXiv

MTTrans: Cross-Domain Object Detection with Mean-Teacher Transformer

Recently, DEtection TRansformer (DETR), an end-to-end object detection pipeline, has achieved promising performance. However, it requires large-scale labeled data and suffers from domain shift, especially when no labeled data is available in the target domain. To solve this problem, we propose an end-to-end cross-domain detection Transformer based on the mean teacher framework, MTTrans, which can fully exploit unlabeled target domain data in object detection training and transfer knowledge between domains via pseudo labels. We further propose a comprehensive multi-level feature alignment to improve the pseudo labels generated by the mean teacher framework, taking advantage of the cross-scale self-attention mechanism in Deformable DETR. Image and object features are aligned at the local, global, and instance levels with domain query-based feature alignment (DQFA), bi-level graph-based prototype alignment (BGPA), and token-wise image feature alignment (TIFA). In turn, the unlabeled target domain data, once pseudo-labeled by the mean teacher framework and made available for object detection training, can lead to better feature extraction and alignment. Thus, the mean teacher framework and the comprehensive multi-level feature alignment can be optimized iteratively and mutually based on the architecture of Transformers. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance in three domain adaptation scenarios; in particular, the result in the Sim10k to Cityscapes scenario improves markedly from 52.6 mAP to 57.9 mAP. Code will be released.
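The mean teacher framework mentioned in this abstract can be sketched generically as follows. This is a minimal illustration of the standard idea (teacher weights as an exponential moving average of student weights, with confident teacher predictions kept as pseudo labels), not the MTTrans implementation. The function names, the `decay` and `threshold` values, and the dictionary layout are all assumptions for illustration.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights as an exponential moving average.

    teacher, student: dicts mapping parameter names to floats.
    decay: assumed smoothing factor; closer to 1 means a slower teacher.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


def select_pseudo_labels(predictions, threshold=0.5):
    """Keep teacher detections whose confidence meets the threshold.

    predictions: list of dicts with a "score" key (hypothetical layout).
    """
    return [p for p in predictions if p["score"] >= threshold]


# One update step: the teacher moves a small fraction toward the student.
teacher = ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9)

# Only the confident detection survives as a pseudo label.
kept = select_pseudo_labels(
    [{"box": (0, 0, 10, 10), "score": 0.9},
     {"box": (5, 5, 8, 8), "score": 0.2}],
    threshold=0.5,
)
```

The student would then be trained on unlabeled target-domain images using the kept detections as targets, and the teacher refreshed with `ema_update` after each step.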

preprint 2021 arXiv

Rapid Deceleration of Blast Waves Witnessed in Tycho's Supernova Remnant

In spite of their importance as standard candles in cosmology and as major sites of nucleosynthesis in the Universe, what kinds of progenitor systems lead to type Ia supernovae (SNe) remains a subject of considerable debate in the literature. This is true even for the case of Tycho's SN, which exploded in 1572, although it has been deeply studied both observationally and theoretically. Analyzing X-ray data of Tycho's supernova remnant (SNR) obtained with Chandra in 2003, 2007, 2009, and 2015, we discover that the expansion before 2007 was substantially faster than radio measurements reported in the past decades and then rapidly decelerated during the last ~15 years. The result is well explained if the shock waves recently hit a wall of dense gas surrounding the SNR. Such a gas structure is in fact expected in the so-called single-degenerate scenario, in which the progenitor is a binary system consisting of a white dwarf and a stellar companion, whereas it is not generally predicted by a competing scenario, the double-degenerate scenario, which has a binary of two white dwarfs as the progenitor. Our result thus favors the former scenario. This work also demonstrates a novel technique to probe gas environments surrounding SNRs and thus disentangle the two progenitor scenarios for Type Ia SNe.

preprint 2020 arXiv

Time Variability of Nonthermal X-ray Stripes in Tycho's Supernova Remnant with Chandra

Analyzing Chandra data of Tycho's supernova remnant (SNR) taken in 2000, 2003, 2007, 2009, and 2015, we search for time-variable features of synchrotron X-rays in the southwestern part of the SNR, where stripe structures of hard X-ray emission were previously found. By comparing X-ray images obtained at each epoch, we discover that a knot-like structure in the northernmost part of the stripe region became brighter, particularly in 2015. We also find that a bright filamentary structure gradually became fainter and narrower as it moved outward. Our spectral analysis reveals that not only the nonthermal X-ray flux but also the photon indices of the knot-like structure change from year to year. During the period from 2000 to 2015, the small knot shows brightening of $\sim 70\%$ and hardening of $\Delta\Gamma \sim 0.45$. The time variability can be explained if the magnetic field is amplified to $\sim 100~\mathrm{\mu G}$ and/or if magnetic turbulence significantly changes with time.