Source author record

Hideyuki Kawashima

Hideyuki Kawashima appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, Unclaimed source record

Databases, Machine Learning

Catalog footprint

What is connected

2 works

2 topics

4 close collaborators

Preprint, 2026, arXiv

Orth-Dion: Eliminating Geometric Mismatch in Distributed Low-Rank Spectral Optimization

Low-rank gradient compression reduces communication in distributed training by representing updates with rank-$r$ factors. Dion is a recent method that approximates Muon, a spectral optimizer that orthogonalizes momentum, using one step of power iteration followed by column normalization (rescaling each column of the right factor to unit length). This makes it compatible with fully sharded data parallel training, but it converges more slowly than full-rank spectral methods. We show that this gap is geometric: column normalization does not yield the rank-$r$ polar factor that Muon implicitly targets, so the resulting direction violates the dual-norm constraint of the low-rank spectral geometry, and the rate picks up an extra factor of $\sqrt{r}$ even though the low-rank approximation of the gradient itself is accurate. The same mismatch enters the smoothness term and the error-feedback recursion in the analysis, which has a knock-on effect on empirical performance. We propose Orth-Dion, which replaces column normalization with QR orthogonalization of the right factor. Under non-Euclidean smoothness, with $L_r$ the curvature constant along rank-$r$ directions, Orth-Dion attains rate $O(\sqrt{L_r/T})$, matching exact spectral methods at the same per-step communication cost as Dion. The proof removes the bounded-drift assumption common in prior error-feedback analyses via a self-consistent fixed-point argument, and uses a time-averaged contraction that only requires the error sequence to contract on average rather than at every step. Experiments on large-scale language model pre-training validate the predicted $\sqrt{r}$ scaling and show that Orth-Dion closes the convergence gap to Muon at Dion's communication cost.
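The geometric difference the abstract describes between column normalization and QR orthogonalization can be illustrated on a small right factor. The sketch below is not the authors' implementation: the helper names (`column_normalize`, `qr_orthogonalize`, `gram`) and the example matrix `R` are hypothetical, and modified Gram-Schmidt is used as a stand-in for a library QR routine, assuming `R` has full column rank.

```python
import math

def column_normalize(R):
    """Rescale each column of R (a list of rows) to unit Euclidean length."""
    n, r = len(R), len(R[0])
    norms = [math.sqrt(sum(R[i][j] ** 2 for i in range(n))) for j in range(r)]
    return [[R[i][j] / norms[j] for j in range(r)] for i in range(n)]

def qr_orthogonalize(R):
    """Return the Q factor of a thin QR of R via modified Gram-Schmidt.

    Assumption: R has full column rank, so no column collapses to zero.
    """
    n, r = len(R), len(R[0])
    columns = [[R[i][j] for i in range(n)] for j in range(r)]
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            # Remove the component of w along each earlier basis vector.
            proj = sum(a * b for a, b in zip(q, w))
            w = [a - proj * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return [[basis[j][i] for j in range(r)] for i in range(n)]

def gram(Q):
    """Compute Q^T Q; the identity means the columns are orthonormal."""
    n, r = len(Q), len(Q[0])
    return [[sum(Q[i][a] * Q[i][b] for i in range(n)) for b in range(r)]
            for a in range(r)]

# Hypothetical 3x2 right factor with correlated columns.
R = [[1.0, 1.0],
     [0.0, 1.0],
     [0.0, 0.0]]

G_norm = gram(column_normalize(R))  # unit diagonal, nonzero off-diagonal
G_qr = gram(qr_orthogonalize(R))    # identity matrix
```

With column normalization the columns have unit length but remain correlated, so the Gram matrix is not the identity. With QR orthogonalization the Gram matrix is the identity, meaning every singular value of the factor equals 1. A matrix with $r$ unit-length columns can have spectral norm as large as $\sqrt{r}$, which is suggestive of the factor in the abstract, although this toy example does not reproduce the paper's analysis.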

Preprint, 2020, arXiv

NWR: Rethinking Thomas Write Rule for Omittable Write Operations

Concurrency control protocols are the key to scaling the performance of current DBMSs. They efficiently interleave read and write operations in transactions, but they occasionally restrict concurrency through coordination such as exclusive locking. Although exclusive locking ensures the correctness of a DBMS, it incurs serious performance penalties in multi-core environments. In particular, existing protocols generally suffer under emerging highly write-contended workloads, since they acquire numerous locks for write operations. In this paper, we rethink the Thomas write rule (TWR), which allows the timestamp ordering (T/O) protocol to omit write operations without any locking. We formalize the notion of omitting and decouple it from the T/O protocol implementation in order to define a new rule, the non-visible write rule (NWR). When the rules of NWR are satisfied, any protocol can in theory generate omittable write operations while preserving correctness, without any locking. In the experiments, we implement three NWR-extended protocols: Silo+NWR, TicToc+NWR, and MVTO+NWR. Experimental results demonstrate the efficiency and low overhead of the extended protocols. We confirm that the NWR-extended protocols are more than 11x faster than the originals in the best case of highly write-contended YCSB-A, and perform comparably to the originals on the other workloads.
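For readers unfamiliar with the rule the paper starts from, the following is a minimal single-threaded sketch of the textbook Thomas write rule under basic timestamp ordering. It is not NWR and not the authors' code: the names `Item`, `Abort`, `read`, and `write` are hypothetical, and real protocols would need atomic access to the per-item timestamps.

```python
class Abort(Exception):
    """Raised when an operation arrives too late to be ordered by timestamp."""

class Item:
    """A data item tracking the largest read and write timestamps seen."""
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0
        self.write_ts = 0

def read(item, ts):
    # A reader older than the latest writer would see a value from its future.
    if ts < item.write_ts:
        raise Abort("read too late")
    item.read_ts = max(item.read_ts, ts)
    return item.value

def write(item, ts, value):
    # A newer transaction already read the item, so this write cannot be ordered.
    if ts < item.read_ts:
        raise Abort("write too late")
    # Thomas write rule: a newer write already exists and nobody needs this
    # version, so the write is omitted instead of aborting the transaction.
    if ts < item.write_ts:
        return "omitted"
    item.value = value
    item.write_ts = ts
    return "applied"

x = Item("init")
first = write(x, 10, "new")    # applied: timestamp 10 is the newest write
second = write(x, 5, "stale")  # omitted: overwritten by timestamp 10 anyway
seen = read(x, 12)             # reads "new" and raises read_ts to 12
try:
    write(x, 11, "late")       # a reader at timestamp 12 has already read x
    third = "applied"
except Abort:
    third = "aborted"
```

The omitted write at timestamp 5 takes no lock and changes no state, which is the behavior the abstract generalizes beyond the T/O protocol.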