Researcher profile

Hideyuki Kawashima

Hideyuki Kawashima contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
2 topics
4 close collaborators

Published work

2 published items

Preprint, 2026, arXiv

Orth-Dion: Eliminating Geometric Mismatch in Distributed Low-Rank Spectral Optimization

Low-rank gradient compression reduces communication in distributed training by representing updates with rank-$r$ factors. Dion is a recent method that approximates Muon, a spectral optimizer that orthogonalizes momentum, using one step of power iteration followed by column normalization (rescaling each column of the right factor to unit length). This makes it compatible with fully sharded data parallel training, but it converges more slowly than full-rank spectral methods. We show that this gap is geometric: column normalization does not yield the rank-$r$ polar factor that Muon implicitly targets, so the resulting direction violates the dual-norm constraint of the low-rank spectral geometry, and the rate picks up an extra factor of $\sqrt{r}$ even though the low-rank approximation of the gradient itself is accurate. The same mismatch enters the smoothness term and the error-feedback recursion in the analysis, which has a knock-on effect on empirical performance. We propose Orth-Dion, which replaces column normalization with QR orthogonalization of the right factor. Under non-Euclidean smoothness, with $L_r$ the curvature constant along rank-$r$ directions, Orth-Dion attains rate $O(\sqrt{L_r/T})$, matching exact spectral methods at the same per-step communication cost as Dion. The proof removes the bounded-drift assumption common in prior error-feedback analyses via a self-consistent fixed-point argument, and uses a time-averaged contraction that only requires the error sequence to contract on average rather than at every step. Experiments on large-scale language model pre-training validate the predicted $\sqrt{r}$ scaling and show that Orth-Dion closes the convergence gap to Muon at Dion's communication cost.
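The difference between the two normalizations described in the abstract can be illustrated numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names, the matrix sizes, the random test data, and the choice to orthonormalize the left factor by QR inside the power-iteration step are all assumptions made for illustration. It shows only that a column-normalized right factor has unit-length columns that are generally not mutually orthogonal, whereas a QR-orthogonalized right factor gives a rank-$r$ update whose nonzero singular values are all equal to 1.

```python
import numpy as np

def power_iteration_factors(M, Q_prev):
    """One power-iteration step (assumed form): orthonormal left factor P, raw right factor R."""
    P, _ = np.linalg.qr(M @ Q_prev)  # (m, r) with orthonormal columns
    R = M.T @ P                      # (n, r); columns are generally not orthogonal
    return P, R

def column_normalize(R, eps=1e-12):
    """Rescale each column of the right factor to unit length."""
    return R / (np.linalg.norm(R, axis=0, keepdims=True) + eps)

def qr_orthogonalize(R):
    """Replace the right factor by an orthonormal basis of its column space."""
    Q, _ = np.linalg.qr(R)
    return Q

# Hypothetical sizes and random data, chosen only for the demonstration.
rng = np.random.default_rng(0)
m, n, r = 64, 32, 4
M = rng.standard_normal((m, n))       # stands in for a momentum matrix
Q_prev = rng.standard_normal((n, r))  # stands in for the previous right factor

P, R = power_iteration_factors(M, Q_prev)
update_colnorm = P @ column_normalize(R).T
update_qr = P @ qr_orthogonalize(R).T

# Top-r singular values of each rank-r update direction.
sv_colnorm = np.linalg.svd(update_colnorm, compute_uv=False)[:r]
sv_qr = np.linalg.svd(update_qr, compute_uv=False)[:r]

print("column-normalized:", sv_colnorm)
print("QR-orthogonalized:", sv_qr)
```

Because the left factor is orthonormal, the singular values of each update equal those of its right factor. For the column-normalized factor they are bounded by its Frobenius norm, $\sqrt{r}$, but are not pinned to 1; how this feeds into the convergence rate is the subject of the paper's analysis and is not reproduced here.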

Preprint, 2020, arXiv

NWR: Rethinking Thomas Write Rule for Omittable Write Operations

Concurrency control protocols are key to scaling the performance of current DBMSs. They efficiently interleave the read and write operations of transactions, but occasionally they restrict concurrency through coordination such as exclusive locking. Although exclusive locking ensures the correctness of the DBMS, it incurs serious performance penalties in multi-core environments. In particular, existing protocols generally suffer under emerging workloads with high write contention, since they acquire a large number of locks for write operations. In this paper, we rethink the Thomas write rule (TWR), which allows the timestamp ordering (T/O) protocol to omit write operations without any locking. We formalize the notion of omitting and decouple it from the T/O protocol implementation in order to define a new rule, the non-visible write rule (NWR). When the rules of NWR are satisfied, any protocol can in theory generate omittable write operations while preserving correctness, without any locking. In the experiments, we implement three NWR-extended protocols: Silo+NWR, TicToc+NWR, and MVTO+NWR. The results demonstrate the efficiency and low overhead of the extended protocols. The NWR-extended protocols are more than 11x faster than the originals in the best case, on highly write-contended YCSB-A, and perform comparably to the originals on the other workloads.
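For readers unfamiliar with the starting point of this paper, the textbook Thomas write rule under basic timestamp ordering can be sketched as follows. This is a single-threaded illustration of TWR only, not the NWR formalization or any of the NWR-extended protocols from the paper; the names `Record`, `WriteOutcome`, `to_read`, and `to_write` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class WriteOutcome(Enum):
    APPLIED = "applied"
    OMITTED = "omitted"
    ABORTED = "aborted"

@dataclass
class Record:
    value: object = None
    read_ts: int = 0   # largest timestamp of any transaction that read this record
    write_ts: int = 0  # timestamp of the transaction that wrote the current value

def to_read(record, ts):
    """Timestamp-ordering read: reject reads that arrive after a newer write."""
    if ts < record.write_ts:
        raise RuntimeError("abort: value was overwritten by a newer transaction")
    record.read_ts = max(record.read_ts, ts)
    return record.value

def to_write(record, ts, value):
    """Timestamp-ordering write with the Thomas write rule."""
    if ts < record.read_ts:
        # A newer transaction already read the old value; this write came too late.
        return WriteOutcome.ABORTED
    if ts < record.write_ts:
        # Thomas write rule: a newer write is already in place and no reader
        # needs this one, so the write is skipped instead of aborting.
        return WriteOutcome.OMITTED
    record.value = value
    record.write_ts = ts
    return WriteOutcome.APPLIED

rec = Record()
first = to_write(rec, 10, "a")   # applied: no newer read or write exists
second = to_write(rec, 5, "b")   # omitted: older than the write at ts=10
seen = to_read(rec, 12)          # reads "a" and raises read_ts to 12
third = to_write(rec, 11, "c")   # aborted: a reader at ts=12 already ran
print(first, second, seen, third)
```

The omitted write at timestamp 5 touches no shared state, which is why TWR needs no lock for it. The paper's contribution, as the abstract states, is to decouple this notion of omission from the T/O protocol so that other protocols can use it.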