Source author record

Yusuke Sakai

Yusuke Sakai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computation and Language, Artificial Intelligence, gr-qc, Machine Learning, astro-ph.SR, cond-mat.supr-con, Cryptography and Security, Digital Libraries, Information Retrieval, physics.data-an

Catalog footprint

What is connected

8 works

10 topics

4 close collaborators


Preprint, 2026, arXiv

Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction

Grammatical error correction (GEC) using large language models often suffers from over-correction. To mitigate this, we propose a training-free inference method that performs edit-level majority voting over multiple candidates generated by a single model, without requiring model modifications or additional training. Across nine benchmarks covering English, Czech, German, Ukrainian, Korean, Hindi, and Romanian, the proposed method outperforms both greedy and minimum Bayes risk (MBR) decoding in most cases. Moreover, it yields stable correction quality regardless of the instruction prompt used. We release two repositories supporting GEC dataset loading and LLM inference.
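
To illustrate what "edit-level majority voting" can mean, here is a minimal sketch. It assumes edits are extracted as word-level diffs against the source using Python's standard `difflib`, and that an edit is kept only when a strict majority of candidates propose it. The function names (`extract_edits`, `apply_edits`, `majority_vote_correct`) and the voting threshold are illustrative assumptions, not the authors' released code.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_edits(source, candidate):
    """Return word-level edits as (start, end, replacement) tuples over source tokens."""
    matcher = SequenceMatcher(a=source, b=candidate, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace, delete, or insert: record which source span changes and to what.
            edits.append((i1, i2, tuple(candidate[j1:j2])))
    return edits


def apply_edits(source, edits):
    """Apply non-overlapping edits to the source tokens, right to left so indices stay valid."""
    tokens = list(source)
    for start, end, replacement in sorted(edits, reverse=True):
        tokens[start:end] = list(replacement)
    return tokens


def majority_vote_correct(source, candidates):
    """Keep only edits proposed by more than half of the candidates (hypothetical threshold)."""
    n = len(candidates)
    counts = Counter()
    for cand in candidates:
        # Count each distinct edit at most once per candidate.
        counts.update(set(extract_edits(source, cand)))
    # With a strict majority, any two kept edits co-occur in at least one candidate,
    # and edits from a single diff never overlap, so the kept set is compatible.
    kept = [edit for edit, c in counts.items() if c > n / 2]
    return apply_edits(source, kept)


source = "He go to school yesterday .".split()
candidates = [
    "He went to school yesterday .".split(),
    "He went to the school yesterday .".split(),
    "He goes to school yesterday .".split(),
]
print(" ".join(majority_vote_correct(source, candidates)))
```

In this toy input, "go → went" is proposed by two of three candidates and survives, while the single-vote insertion of "the" and the alternative "goes" are discarded, which is the intuition behind suppressing over-correction.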

Preprint, 2026, arXiv

Hacking Neural Evaluation Metrics with Single Hub Text

Strongly human-correlated evaluation metrics serve as an essential compass for the development and improvement of generation models and must be highly reliable and robust. Recent embedding-based neural text evaluation metrics, such as COMET for translation tasks, are widely used in both research and development fields. However, there is no guarantee that they yield reliable evaluation results due to the black-box nature of neural networks. To raise concerns about the reliability and safety of such metrics, we propose a method for finding a single adversarial text in the discrete space that is consistently evaluated as high-quality, regardless of the test cases, to identify the vulnerabilities in evaluation metrics. The single hub text found with our method achieved 79.1 COMET% and 67.8 COMET% in the WMT'24 English-to-Japanese (En–Ja) and English-to-German (En–De) translation tasks, respectively, outperforming translations generated individually for each source sentence by using M2M100, a general translation model. Furthermore, we also confirmed that the hub text found with our method generalizes across multiple language pairs such as Ja–En and De–En.

Preprint, 2026, arXiv

HalluCiteChecker: A Lightweight Toolkit for Hallucinated Citation Detection and Verification in the Era of AI Scientists

We introduce HalluCiteChecker, a toolkit for detecting and verifying hallucinated citations in scientific papers. While AI assistant technologies have transformed the academic writing process, including citation recommendation, they have also led to the emergence of hallucinated citations that do not correspond to any existing work. Such citations not only undermine the credibility of scientific papers but also impose an additional burden on reviewers and authors, who must manually verify their validity during the review process. In this study, we formalize hallucinated citation detection as an NLP task and provide a corresponding toolkit as a practical foundation for addressing this problem. Our package is lightweight and can perform verification in seconds on a standard laptop. It can also be executed entirely offline and runs efficiently using only CPUs. We hope that HalluCiteChecker will help reduce reviewer workload and support organizers by enabling systematic pre-review and publication checks. Our code is released under the Apache 2.0 license on GitHub and is distributed as an installable package via PyPI. A demonstration video is available on YouTube.

Preprint, 2026, arXiv

One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness

The hubness problem, in which hub embeddings are close to many unrelated examples, often occurs in high-dimensional embedding spaces and may pose a practical threat to applications such as information retrieval and automatic evaluation metrics. The threat is particularly relevant to cross-modal encoders: because the similarity between text and images cannot be computed by direct comparison such as string matching, encoders that project different modalities into a shared space are used in many cross-modal applications, so hubs in that space can cause practical problems. To reveal the vulnerabilities of cross-modal encoders, we propose a method for identifying the hub embedding and its corresponding hub text. Experiments on image captioning evaluation in MSCOCO and nocaps, along with image-to-text retrieval in MSCOCO and Flickr30k, showed that our method can identify a single hub text that unreasonably achieves similarity scores comparable to or higher than those of human-written reference captions for many images, thereby revealing the vulnerabilities of cross-modal encoders.
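
Hubness is commonly quantified by the k-occurrence count: how many queries include a given item among their k nearest neighbors. A hub text in image-to-text retrieval would show an unusually high count across image queries. The pure-Python sketch below assumes cosine similarity over precomputed, non-zero embedding vectors; the function names and toy vectors are illustrative, and the sketch only measures hubness rather than reproducing the authors' hub-search method.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def k_occurrence(queries, items, k):
    """Count how often each item appears in the top-k neighbor lists of the queries."""
    counts = [0] * len(items)
    for q in queries:
        # Rank items (e.g. text embeddings) by similarity to this query (e.g. an image embedding).
        ranked = sorted(range(len(items)), key=lambda i: cosine(q, items[i]), reverse=True)
        for i in ranked[:k]:
            counts[i] += 1
    return counts


# Toy 2-D embeddings: texts[0] lies "between" the other texts and tends to become a hub.
texts = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
images = [[1.0, 0.2], [0.2, 1.0], [1.0, 1.1]]
print(k_occurrence(images, texts, k=2))  # [3, 1, 2]
```

Here the first text appears in the top-2 list of every image query even though it is not the best match for two of them, which is the kind of skew the hubness literature describes.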

Preprint, 2022, arXiv

Training Process of Unsupervised Learning Architecture for Gravity Spy Dataset

Transient noise in the data from gravitational-wave detectors frequently causes problems, such as detector instability and the overlapping or mimicking of gravitational-wave signals. Because transient noise is considered to be associated with the environment and the instrument, classifying it would help to understand its origin and improve the detector's performance. A previous study proposed an architecture for classifying transient noise from time-frequency 2D images (spectrograms) using unsupervised deep learning that combines a variational autoencoder with invariant information clustering. That unsupervised-learning architecture is applied to the Gravity Spy dataset, which consists of transient noise from the Advanced Laser Interferometer Gravitational-Wave Observatory (Advanced LIGO) together with associated metadata, to discuss its potential for online or offline data analysis. In this study, focusing on the Gravity Spy dataset, we examine and report the training process of the unsupervised-learning architecture from the previous study.

Preprint, 2022, arXiv

Unsupervised Learning Architecture for Classifying the Transient Noise of Interferometric Gravitational-wave Detectors

In the data obtained by laser interferometric gravitational-wave detectors, transient noise with non-stationary and non-Gaussian features occurs at a high rate. This often results in problems such as detector instability and the hiding and/or imitation of gravitational-wave signals. This transient noise has various characteristics in the time–frequency representation, which are considered to be associated with environmental and instrumental origins. Classification of transient noise can offer clues for exploring its origin and improving the performance of the detector. One approach for accomplishing this is supervised learning. However, supervised learning generally requires annotation of the training data, and there are issues with ensuring objectivity in the classification and its corresponding new classes. By contrast, unsupervised learning can reduce the annotation work for the training data and ensure objectivity in the classification and its corresponding new classes. In this study, we propose an unsupervised learning architecture for the classification of transient noise that combines a variational autoencoder and invariant information clustering. To evaluate the effectiveness of the proposed architecture, we used the dataset (time–frequency two-dimensional spectrogram images and labels) of the Laser Interferometer Gravitational-wave Observatory (LIGO) first observation run prepared by the Gravity Spy project. The classes produced by the proposed unsupervised learning architecture were consistent with the labels annotated by the Gravity Spy project, and the results indicate the potential existence of unrevealed classes.
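
Invariant information clustering (IIC) trains a classifier so that the cluster assignments of two paired views of the same sample share maximal mutual information; the training loss is the negative of that quantity. The pure-Python sketch below computes the objective, assuming each row of `probs_a` and `probs_b` is a softmax output over the same C clusters for a sample and its paired view. How the paired views are produced by the variational autoencoder in the authors' architecture is not shown, and the function name is hypothetical.

```python
import math


def iic_mutual_information(probs_a, probs_b, eps=1e-12):
    """Mutual information I(z; z') of the joint cluster distribution from paired softmax outputs."""
    n = len(probs_a)
    c = len(probs_a[0])
    # Joint distribution: average outer product of the paired cluster probabilities.
    joint = [[0.0] * c for _ in range(c)]
    for pa, pb in zip(probs_a, probs_b):
        for i in range(c):
            for j in range(c):
                joint[i][j] += pa[i] * pb[j] / n
    # Symmetrize, since the two views are interchangeable.
    joint = [[(joint[i][j] + joint[j][i]) / 2 for j in range(c)] for i in range(c)]
    # Marginals of the symmetrized joint distribution.
    row = [sum(joint[i]) for i in range(c)]
    col = [sum(joint[i][j] for i in range(c)) for j in range(c)]
    mi = 0.0
    for i in range(c):
        for j in range(c):
            p = joint[i][j]
            if p > eps:
                mi += p * math.log(p / (row[i] * col[j]))
    return mi


# Confident, consistent, balanced assignments over 2 clusters reach the maximum ln(2).
confident = [[1.0, 0.0], [0.0, 1.0]]
print(iic_mutual_information(confident, confident))
# Uniform (uninformative) assignments carry no mutual information.
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(iic_mutual_information(uniform, uniform))
```

Maximizing this quantity rewards predictions that are both consistent across views and spread evenly over clusters, which is why IIC needs no labels.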

Preprint, 2015, arXiv

ALMA imaging study of methyl formate (HCOOCH$_{3}$) in the torsionally excited states towards Orion KL

We recently reported the first identification of rotational transitions of methyl formate (HCOOCH$_{3}$) in the second torsionally excited state toward Orion Kleinmann-Low (KL) observed with the Nobeyama 45 m telescope. Combining these with the identified transitions of methyl formate in the ground state and the first torsionally excited state, we found that the rotational and vibrational temperatures differ, with the vibrational temperature being higher. In this study, high spatial resolution analysis using Atacama Large Millimeter/Submillimeter Array (ALMA) science verification data was carried out to verify and understand this difference. Toward the Compact Ridge, two different velocity components at 7.3 and 9.1 km s$^{-1}$ were confirmed, while a single component at 7.3 km s$^{-1}$ was identified toward the Hot Core. The intensity maps in the ground, first, and second torsionally excited states have quite similar distributions. Using extensive ALMA data, we determined the rotational and vibrational temperatures for the Compact Ridge and Hot Core by the conventional rotation diagram method. The rotational and vibrational temperatures agree for the Hot Core and for one component of the Compact Ridge. For the 7.3 km s$^{-1}$ velocity component of the Compact Ridge, the rotational temperature was found to be higher than the vibrational temperature. This differs from the result obtained with the single-dish observation. The difference might be explained by the beam dilution effect of the single-dish data and/or the smaller number of observed transitions within the limited range of upper-state energies ($E_u \leq 30$ K) in the previous study.
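
The conventional rotation diagram method assumes optically thin emission in local thermodynamic equilibrium, so that ln(N_u/g_u) = ln(N/Q(T_rot)) - E_u/(k T_rot). A straight-line fit of ln(N_u/g_u) against E_u/k (in kelvin) therefore gives T_rot = -1/slope. The sketch below performs that least-squares fit with the standard library only, assuming the upper-state column densities N_u and degeneracies g_u have already been derived from the line intensities; the function name and the input values in the usage are synthetic and hypothetical, not taken from the paper.

```python
import math


def rotation_temperature(e_upper_k, n_upper, g_upper):
    """Fit ln(N_u/g_u) = a + b * (E_u/k) by least squares; return (T_rot, N/Q) with T_rot = -1/b."""
    x = list(e_upper_k)  # upper-state energies E_u/k in kelvin
    y = [math.log(n / g) for n, g in zip(n_upper, g_upper)]
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The intercept is ln(N/Q(T_rot)); the slope is -1/T_rot.
    return -1.0 / slope, math.exp(intercept)


# Synthetic populations generated at T_rot = 120 K with N/Q = 1e14 (illustrative only).
energies = [20.0, 50.0, 100.0, 200.0]
degeneracies = [10, 20, 30, 40]
populations = [g * 1e14 * math.exp(-e / 120.0) for e, g in zip(energies, degeneracies)]
t_rot, n_over_q = rotation_temperature(energies, populations, degeneracies)
print(round(t_rot, 6), n_over_q)
```

With noise-free synthetic data the fit recovers the input temperature; with real spectra, the spread of E_u across the fitted transitions controls how well T_rot is constrained, which relates to the limited energy range noted for the single-dish study.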

Preprint, 2012, arXiv

Reflective terahertz time-domain spectroscopy measurement on the stripe-ordered superconductor La$_{1.84-y}$Nd$_y$Sr$_{0.16}$CuO$_4$

We measured the reflectivity of static stripe-ordered La$_{1.84-y}$Nd$_y$Sr$_{0.16}$CuO$_4$ (LNSCO) with $y = 0.1$, 0.2, 0.3, and 0.4 by means of reflective terahertz time-domain spectroscopy (THz-TDS) with the electric field polarized along the c axis, which provides lower-frequency information than a conventional Fourier-transform spectrometer. Recently, two-dimensional superconductivity (2DSC) \cite{tajima,schafgans,li}, that is, intralayer perfect conductivity not accompanied by interlayer superconducting behavior, was reported in LNSCO ($x = 0.15$, $y \geq 0.2$). We observed a Josephson plasma edge, which suggests that 2DSC is not realized in these compounds.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arXiv (confidence 95%)

external id: arxiv:2604.27674:author:3:yusuke-sakai

Imported May 20, 2026; synced May 20, 2026

arXiv (confidence 95%)

external id: arxiv:2604.26835:author:1:yusuke-sakai

Imported May 20, 2026; synced May 20, 2026

arXiv (confidence 95%)

external id: arxiv:2605.13624:author:2:yusuke-sakai

Imported May 20, 2026; synced May 20, 2026

Close collaborators

Chihiro Kozakai, Researcher (2 works)

Gen Ueshima, Researcher (2 works)

Hirotaka Takahashi, Researcher (2 works)

Hiroyuki Deguchi, Researcher (2 works)