Researcher profile

Akira Sakai

Akira Sakai contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
6 works
0 followers
11 topics
4 close collaborators

Published work

6 published items

preprint · 2026 · arXiv

PHOTON: Hierarchical Autoregressive Modeling for Lightspeed and Memory-Efficient Language Generation

Transformers operate as horizontal token-by-token scanners: at each generation step, they attend to an ever-growing sequence of token-level states. This access pattern increases prefill latency and makes long-context decoding more memory-bound, as KV-cache reads and writes dominate inference time over arithmetic operations. We propose Parallel Hierarchical Operation for TOp-down Networks (PHOTON), a hierarchical autoregressive model that replaces horizontal scanning with vertical, multi-resolution context scanning. PHOTON maintains a hierarchy of latent streams: a bottom-up encoder compresses tokens into low-rate contextual states, while lightweight top-down decoders reconstruct fine-grained token representations in parallel. We further introduce recursive generation that updates only the coarsest latent stream and eliminates bottom-up re-encoding. Experimental results show that PHOTON is superior to competitive Transformer-based language models regarding the throughput-quality trade-off, providing advantages in long-context and multi-query tasks. In particular, PHOTON reduces decode-time KV-cache traffic, yielding up to $10^{3}\times$ higher throughput per unit memory.

preprint · 2025 · arXiv

More Than Bits: Multi-Envelope Double Binary Factorization for Extreme Quantization

For extreme low-bit quantization of large language models (LLMs), Double Binary Factorization (DBF) is attractive as it enables efficient inference without sacrificing accuracy. However, the scaling parameters of DBF are too restrictive: after factoring out signs, all rank components share the same magnitude profile, resulting in performance saturation. We propose Multi-envelope DBF (MDBF), which retains a shared pair of 1-bit sign bases but replaces the single envelope with a rank-$l$ envelope. By sharing sign matrices among envelope components, MDBF effectively maintains a binary carrier and utilizes the limited memory budget for magnitude expressiveness. We also introduce a closed-form initialization and an alternating refinement method to optimize MDBF. Across the LLaMA and Qwen families, MDBF improves perplexity and zero-shot accuracy over previous binary formats at matched bits per weight while preserving the same deployment-friendly inference primitive.

preprint · 2023 · arXiv

Mixing time and simulated annealing for the stochastic cellular automata

Finding a ground state of a given Hamiltonian of an Ising model on a graph $G=(V,E)$ is an important but hard problem. The standard approach for this kind of problem is the application of algorithms that rely on single-spin-flip Markov chain Monte Carlo methods, such as the simulated annealing based on Glauber or Metropolis dynamics. In this paper, we investigate a particular kind of stochastic cellular automata, in which all spins are updated independently and simultaneously. We prove that (i) if the temperature is fixed sufficiently high, then the mixing time is at most of order $\log|V|$, and that (ii) if the temperature drops in time $n$ as $1/\log n$, then the limiting measure is uniformly distributed over the ground states. We also provide some simulations of the algorithms studied in this paper implemented on a GPU and show their superior performance compared to the conventional simulated annealing.
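The idea in the abstract above, updating all spins simultaneously while the temperature decays like $1/\log n$, can be illustrated with a small Python sketch. It is not the paper's algorithm: the abstract does not give the transition probabilities, so the heat-bath-style rule in `parallel_step`, the constant `c`, the schedule `beta = log(n + 1) / c`, and all function names are assumptions made for illustration. Fully synchronous updates of this simple kind can also oscillate on some graphs, which the sketch does not try to prevent.

```python
import math
import random


def ising_energy(spins, couplings, fields):
    """H(s) = -sum_{(i,j)} J_ij s_i s_j - sum_i h_i s_i for spins in {-1, +1}."""
    pair = sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())
    field = sum(h * s for h, s in zip(fields, spins))
    return -pair - field


def local_fields(spins, couplings, fields):
    """Effective field on each spin: h_i + sum_j J_ij s_j."""
    local = list(fields)
    for (i, j), J in couplings.items():
        local[i] += J * spins[j]
        local[j] += J * spins[i]
    return local


def parallel_step(spins, couplings, fields, beta, rng):
    """Resample every spin independently from the same current configuration.

    Assumption: a heat-bath rule, P(s_i = +1) = 1 / (1 + exp(-2 * beta * x_i)).
    """
    new_spins = []
    for x in local_fields(spins, couplings, fields):
        p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * x))
        new_spins.append(1 if rng.random() < p_up else -1)
    return new_spins


def anneal(couplings, fields, n_steps, c=1.0, seed=0):
    """Run n_steps parallel updates with temperature c / log(n + 1)."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in fields]
    best, best_energy = spins, ising_energy(spins, couplings, fields)
    for n in range(1, n_steps + 1):
        beta = math.log(n + 1) / c  # inverse temperature grows like log n
        spins = parallel_step(spins, couplings, fields, beta, rng)
        energy = ising_energy(spins, couplings, fields)
        if energy < best_energy:
            best, best_energy = spins, energy
    return best, best_energy


if __name__ == "__main__":
    # Hypothetical 4-spin ferromagnetic ring with a small positive field.
    J = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0}
    h = [0.1, 0.1, 0.1, 0.1]
    print(anneal(J, h, n_steps=200))
```

Because every spin is resampled from the same current configuration, the inner loop of `parallel_step` has no sequential dependency, which is the property that makes this family of dynamics a natural fit for GPU execution.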

preprint · 2022 · arXiv

Molecular beam homoepitaxy of N-polar AlN: enabling role of Al-assisted surface cleaning

N-polar aluminum nitride (AlN) is an important building block for next-generation high-power RF electronics. We report successful homoepitaxial growth of N-polar AlN by molecular beam epitaxy (MBE) on large-area, cost-effective N-polar AlN templates. Direct growth without any in-situ surface cleaning leads to films with inverted Al-polarity. It is found that Al-assisted cleaning before growth enables the epitaxial film to maintain N-polarity. The grown N-polar AlN epilayer, with its smooth, pit-free surface, duplicates the structural quality of the substrate, as evidenced by a clean and smooth growth interface with no noticeable generation of extended defects. Near band-edge photoluminescence peaks are observed at room temperature on samples with MBE-grown layers but not on the bare AlN substrates, implying the suppression of non-radiative recombination centers in the epitaxial N-polar AlN. These results are pivotal steps towards future high-power RF electronics and deep ultraviolet photonics based on the N-polar AlN platform.

preprint · 2022 · arXiv

Three approaches to facilitate DNN generalization to objects in out-of-distribution orientations and illuminations

The training data distribution is often biased towards objects in certain orientations and illumination conditions. While humans have a remarkable capability of recognizing objects in out-of-distribution (OoD) orientations and illuminations, Deep Neural Networks (DNNs) severely suffer in this case, even when large amounts of training examples are available. In this paper, we investigate three different approaches to improve DNNs in recognizing objects in OoD orientations and illuminations. Namely, these are (i) training much longer after convergence of the in-distribution (InD) validation accuracy, i.e., late-stopping, (ii) tuning the momentum parameter of the batch normalization layers, and (iii) enforcing invariance of the neural activity in an intermediate layer to orientation and illumination conditions. Each of these approaches substantially improves the DNN's OoD accuracy (more than 20% in some cases). We report results on four datasets: two datasets are modified from the MNIST and iLab datasets, and the other two are novel (one of 3D rendered cars and another of objects taken from various controlled orientations and illumination conditions). These datasets allow us to study the effects of different amounts of bias and are challenging, as DNNs perform poorly in OoD conditions. Finally, we demonstrate that even though the three approaches focus on different aspects of DNNs, they all tend to lead to the same underlying neural mechanism to enable OoD accuracy gains: individual neurons in the intermediate layers become more selective to a category and also invariant to OoD orientations and illuminations. We anticipate this study to be a basis for further improvement of deep neural networks' OoD generalization performance, which is much needed to achieve safe and fair AI applications.

preprint · 2021 · arXiv

Stability of energy landscape for Ising models

In this paper, we explore the stability of the energy landscape of an Ising Hamiltonian when subjected to two kinds of perturbations: a perturbation on the coupling coefficients and external fields, and a perturbation on the underlying graph structure. We give sufficient conditions so that the ground states of a given Hamiltonian are stable under perturbations of the first kind in terms of order preservation. Here by order preservation we mean that the ordering of energy corresponding to two spin configurations in a perturbed Hamiltonian will be preserved in the original Hamiltonian up to a given error margin. We also estimate the probability that the energy gap between ground states for the original Hamiltonian and the perturbed Hamiltonian is bounded by a given error margin when the coupling coefficients and local external magnetic fields of the original Hamiltonian are i.i.d. Gaussian random variables. In the end we show a concrete example of a system which is stable under perturbations of the second kind.
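To make the notion of ground-state stability in the abstract above concrete, here is a brute-force Python sketch for a tiny system. It does not reproduce the paper's sufficient conditions. The helper names, the example couplings, and the reading of "order preservation" used in `order_preserved` (if the perturbed energies satisfy $H'(\sigma) \le H'(\tau)$, then $H(\sigma) \le H(\tau) + \varepsilon$) are assumptions based only on the wording of the abstract.

```python
from itertools import product


def energy(spins, couplings, fields):
    """H(s) = -sum_{(i,j)} J_ij s_i s_j - sum_i h_i s_i for spins in {-1, +1}."""
    pair = sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())
    field = sum(h * s for h, s in zip(fields, spins))
    return -pair - field


def ground_states(n, couplings, fields, tol=1e-12):
    """Enumerate all 2^n configurations and return the minimum-energy ones."""
    configs = list(product((-1, 1), repeat=n))
    energies = [energy(s, couplings, fields) for s in configs]
    e_min = min(energies)
    return [s for s, e in zip(configs, energies) if e <= e_min + tol]


def order_preserved(n, original, perturbed, eps):
    """Check one reading of order preservation over all pairs of configurations.

    original and perturbed are (couplings, fields) tuples.
    """
    configs = list(product((-1, 1), repeat=n))
    e_orig = {s: energy(s, *original) for s in configs}
    e_pert = {s: energy(s, *perturbed) for s in configs}
    return all(
        e_orig[s] <= e_orig[t] + eps
        for s in configs
        for t in configs
        if e_pert[s] <= e_pert[t]
    )


if __name__ == "__main__":
    # Hypothetical 3-spin chain; the perturbation shifts each coupling by 0.05.
    original = ({(0, 1): 1.0, (1, 2): 1.0}, [0.1, 0.1, 0.1])
    perturbed = ({(0, 1): 1.05, (1, 2): 0.95}, [0.1, 0.1, 0.1])
    print(ground_states(3, *original))
    print(ground_states(3, *perturbed))
    print(order_preserved(3, original, perturbed, eps=0.25))
```

Exhaustive enumeration costs $2^n$ energy evaluations, so this is only usable as a sanity check on very small graphs.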