Source author record

Akira Sakai

Akira Sakai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math-ph; math.MP; math.PR; Artificial Intelligence; Machine Learning; Computation and Language; Computer Vision; cond-mat.mtrl-sci; Distributed, Parallel, and Cluster Computing; math.OC; physics.app-ph

Catalog footprint

What is connected

9 works

11 topics

4 close collaborators

Preprint, 2026, arXiv

PHOTON: Hierarchical Autoregressive Modeling for Lightspeed and Memory-Efficient Language Generation

Transformers operate as horizontal token-by-token scanners: at each generation step, they attend to an ever-growing sequence of token-level states. This access pattern increases prefill latency and makes long-context decoding more memory-bound, as KV-cache reads and writes dominate inference time over arithmetic operations. We propose Parallel Hierarchical Operation for TOp-down Networks (PHOTON), a hierarchical autoregressive model that replaces horizontal scanning with vertical, multi-resolution context scanning. PHOTON maintains a hierarchy of latent streams: a bottom-up encoder compresses tokens into low-rate contextual states, while lightweight top-down decoders reconstruct fine-grained token representations in parallel. We further introduce recursive generation that updates only the coarsest latent stream and eliminates bottom-up re-encoding. Experimental results show that PHOTON is superior to competitive Transformer-based language models regarding the throughput-quality trade-off, providing advantages in long-context and multi-query tasks. In particular, PHOTON reduces decode-time KV-cache traffic, yielding up to $10^{3}\times$ higher throughput per unit memory.

Preprint, 2025, arXiv

More Than Bits: Multi-Envelope Double Binary Factorization for Extreme Quantization

For extreme low-bit quantization of large language models (LLMs), Double Binary Factorization (DBF) is attractive as it enables efficient inference without sacrificing accuracy. However, the scaling parameters of DBF are too restrictive; after factoring out signs, all rank components share the same magnitude profile, resulting in performance saturation. We propose Multi-envelope DBF (MDBF), which retains a shared pair of 1-bit sign bases but replaces the single envelope with a rank-$l$ envelope. By sharing sign matrices among envelope components, MDBF effectively maintains a binary carrier and utilizes the limited memory budget for magnitude expressiveness. We also introduce a closed-form initialization and an alternating refinement method to optimize MDBF. Across the LLaMA and Qwen families, MDBF improves perplexity and zero-shot accuracy over previous binary formats at matched bits per weight while preserving the same deployment-friendly inference primitive.

Preprint, 2023, arXiv

Mixing time and simulated annealing for the stochastic cellular automata

Finding a ground state of a given Hamiltonian of an Ising model on a graph $G=(V,E)$ is an important but hard problem. The standard approach for this kind of problem is the application of algorithms that rely on single-spin-flip Markov chain Monte Carlo methods, such as the simulated annealing based on Glauber or Metropolis dynamics. In this paper, we investigate a particular kind of stochastic cellular automata, in which all spins are updated independently and simultaneously. We prove that (i) if the temperature is fixed sufficiently high, then the mixing time is at most of order $\log|V|$, and that (ii) if the temperature drops in time $n$ as $1/\log n$, then the limiting measure is uniformly distributed over the ground states. We also provide some simulations of the algorithms studied in this paper implemented on a GPU and show their superior performance compared to the conventional simulated annealing.
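The parallel update described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the heat-bath rule on the local field, the `pinning` parameter, the constant in the schedule `beta = log(n + 1)` and all function names are choices made here for illustration.

```python
import math
import random

def local_field(J, h, spins, x):
    """Field felt by site x: sum_y J[x][y] * spins[y] + h[x]."""
    return sum(J[x][y] * spins[y] for y in range(len(spins))) + h[x]

def energy(J, h, spins):
    """Ising energy H = -sum_{x<y} J[x][y] s_x s_y - sum_x h[x] s_x."""
    n = len(spins)
    pair = sum(J[x][y] * spins[x] * spins[y]
               for x in range(n) for y in range(x + 1, n))
    return -pair - sum(h[x] * spins[x] for x in range(n))

def parallel_step(J, h, spins, beta, pinning, rng):
    """Update every spin independently, all from the same old configuration."""
    new_spins = []
    for x in range(len(spins)):
        # Assumed rule: heat-bath on the local field plus a pinning term
        # that favors keeping the current spin value.
        f = local_field(J, h, spins, x) + pinning * spins[x]
        # 1 / (1 + exp(-beta * f)), written with tanh to avoid overflow.
        p_up = 0.5 * (1.0 + math.tanh(0.5 * beta * f))
        new_spins.append(1 if rng.random() < p_up else -1)
    return new_spins

def anneal(J, h, steps, pinning=1.0, seed=0):
    """Run the parallel dynamics with temperature proportional to 1 / log n."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in h]
    for n in range(1, steps + 1):
        beta = math.log(n + 1)  # inverse temperature; the constant is arbitrary
        spins = parallel_step(J, h, spins, beta, pinning, rng)
    return spins

# Ferromagnetic ring on 4 sites (hypothetical toy instance).
J = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
h = [0, 0, 0, 0]
final = anneal(J, h, steps=200)
print(final, energy(J, h, final))
```

Because every new spin depends only on the previous configuration, the loop over sites in `parallel_step` has no sequential dependency, which is what makes this kind of update natural to run on a GPU.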

Preprint, 2022, arXiv

Molecular beam homoepitaxy of N-polar AlN: enabling role of Al-assisted surface cleaning

N-polar aluminum nitride (AlN) is an important building block for next-generation high-power RF electronics. We report successful homoepitaxial growth of N-polar AlN by molecular beam epitaxy (MBE) on large-area cost-effective N-polar AlN templates. Direct growth without any in-situ surface cleaning leads to films with inverted Al-polarity. It is found that Al-assisted cleaning before growth enables the epitaxial film to maintain N-polarity. The grown N-polar AlN epilayer with its smooth, pit-free surface duplicates the structural quality of the substrate, as evidenced by a clean and smooth growth interface with no noticeable generation of extended defects. Near band-edge photoluminescence peaks are observed at room temperature on samples with MBE-grown layers but not on the bare AlN substrates, implying the suppression of non-radiative recombination centers in the epitaxial N-polar AlN. These results are pivotal steps towards future high-power RF electronics and deep ultraviolet photonics based on the N-polar AlN platform.

Preprint, 2022, arXiv

Three approaches to facilitate DNN generalization to objects in out-of-distribution orientations and illuminations

The training data distribution is often biased towards objects in certain orientations and illumination conditions. While humans have a remarkable capability of recognizing objects in out-of-distribution (OoD) orientations and illuminations, Deep Neural Networks (DNNs) severely suffer in this case, even when large amounts of training examples are available. In this paper, we investigate three different approaches to improve DNNs in recognizing objects in OoD orientations and illuminations. Namely, these are (i) training much longer after convergence of the in-distribution (InD) validation accuracy, i.e., late-stopping, (ii) tuning the momentum parameter of the batch normalization layers, and (iii) enforcing invariance of the neural activity in an intermediate layer to orientation and illumination conditions. Each of these approaches substantially improves the DNN's OoD accuracy (more than 20% in some cases). We report results in four datasets: two datasets are modified from the MNIST and iLab datasets, and the other two are novel (one of 3D rendered cars and another of objects taken from various controlled orientations and illumination conditions). These datasets make it possible to study the effects of different amounts of bias and are challenging as DNNs perform poorly in OoD conditions. Finally, we demonstrate that even though the three approaches focus on different aspects of DNNs, they all tend to lead to the same underlying neural mechanism to enable OoD accuracy gains: individual neurons in the intermediate layers become more selective to a category and also invariant to OoD orientations and illuminations. We anticipate this study to be a basis for further improvement of deep neural networks' OoD generalization performance, which is much needed to achieve safe and fair AI applications.

Preprint, 2021, arXiv

Stability of energy landscape for Ising models

In this paper, we explore the stability of the energy landscape of an Ising Hamiltonian when subjected to two kinds of perturbations: a perturbation on the coupling coefficients and external fields, and a perturbation on the underlying graph structure. We give sufficient conditions so that the ground states of a given Hamiltonian are stable under perturbations of the first kind in terms of order preservation. Here by order preservation we mean that the ordering of energy corresponding to two spin configurations in a perturbed Hamiltonian will be preserved in the original Hamiltonian up to a given error margin. We also estimate the probability that the energy gap between ground states for the original Hamiltonian and the perturbed Hamiltonian is bounded by a given error margin when the coupling coefficients and local external magnetic fields of the original Hamiltonian are i.i.d. Gaussian random variables. In the end we show a concrete example of a system which is stable under perturbations of the second kind.
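The notion of order preservation described in this abstract can be checked by brute force on a tiny system. The sketch below is one possible reading of that definition, assuming it means "if configuration s is no higher than t in the perturbed Hamiltonian, then s is no higher than t in the original one, up to a margin eps". The function names and the toy chain are hypothetical, and the exhaustive search is only feasible for a handful of spins.

```python
from itertools import product

def energy(J, h, spins):
    """Ising energy H = -sum_{x<y} J[x][y] s_x s_y - sum_x h[x] s_x."""
    n = len(spins)
    pair = sum(J[x][y] * spins[x] * spins[y]
               for x in range(n) for y in range(x + 1, n))
    return -pair - sum(h[x] * spins[x] for x in range(n))

def order_preserved(J, h, J_pert, h_pert, eps):
    """Check the assumed order-preservation condition over all pairs of configurations."""
    configs = list(product((-1, 1), repeat=len(h)))
    e_orig = {s: energy(J, h, s) for s in configs}
    e_pert = {s: energy(J_pert, h_pert, s) for s in configs}
    for s in configs:
        for t in configs:
            # A violation: s ranks no higher than t after perturbation,
            # yet exceeds t by more than eps in the original Hamiltonian.
            if e_pert[s] <= e_pert[t] and e_orig[s] > e_orig[t] + eps:
                return False
    return True

# Ferromagnetic chain on 3 sites, with one coupling slightly strengthened.
J = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
J_small = [[0, 1.1, 0], [1.1, 0, 1], [0, 1, 0]]
h = [0, 0, 0]
print(order_preserved(J, h, J_small, h, eps=0.0))
```

Flipping the sign of every coupling, by contrast, reverses the energy ordering, so the same check fails unless eps is at least as large as the full energy range.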

Preprint, 2016, arXiv

The quenched critical point for self-avoiding walk on random conductors

Following similar analysis to that in Lacoin (PTRF 159, 777-808, 2014), we can show that the quenched critical point for self-avoiding walk on random conductors on the d-dimensional integer lattice is almost surely a constant, which does not depend on the location of the reference point. We provide its upper and lower bounds that are valid for all dimensions.

Preprint, 2015, arXiv

Critical two-point functions for long-range statistical-mechanical models in high dimensions

We consider long-range self-avoiding walk, percolation and the Ising model on $\mathbb{Z}^d$ that are defined by power-law decaying pair potentials of the form $D(x)\asymp|x|^{-d-\alpha}$ with $\alpha>0$. The upper-critical dimension $d_{\mathrm{c}}$ is $2(\alpha\wedge2)$ for self-avoiding walk and the Ising model, and $3(\alpha\wedge2)$ for percolation. Let $\alpha\ne2$ and assume certain heat-kernel bounds on the $n$-step distribution of the underlying random walk. We prove that, for $d>d_{\mathrm{c}}$ (and the spread-out parameter sufficiently large), the critical two-point function $G_{p_{\mathrm{c}}}(x)$ for each model is asymptotically $C|x|^{\alpha\wedge2-d}$, where the constant $C\in(0,\infty)$ is expressed in terms of the model-dependent lace-expansion coefficients and exhibits crossover between $\alpha<2$ and $\alpha>2$. We also provide a class of random walks that satisfy those heat-kernel bounds.
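The two formulas quoted in this abstract can be evaluated directly for given parameters. The helper names and model labels below are hypothetical, and the sketch only restates the arithmetic: it does not check the abstract's hypotheses (the heat-kernel bounds, or the spread-out parameter being sufficiently large).

```python
def upper_critical_dimension(alpha, model):
    """d_c = 2 * min(alpha, 2) for self-avoiding walk and Ising, 3 * min(alpha, 2) for percolation."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    factor = {"self-avoiding walk": 2, "ising": 2, "percolation": 3}[model]
    return factor * min(alpha, 2)

def two_point_decay_exponent(alpha, d):
    """Exponent e in G(x) ~ C |x|^e, namely e = min(alpha, 2) - d (stated for d > d_c, alpha != 2)."""
    return min(alpha, 2) - d

for alpha in (1, 3):
    for model in ("self-avoiding walk", "ising", "percolation"):
        print(alpha, model, upper_critical_dimension(alpha, model))
```

For alpha greater than 2 this gives the values 4 and 6, since min(alpha, 2) is then 2; for alpha below 2 the upper-critical dimension drops with alpha.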

Preprint, 2011, arXiv

Asymptotic behavior of the gyration radius for long-range self-avoiding walk and long-range oriented percolation

We consider random walk and self-avoiding walk whose 1-step distribution is given by $D$, and oriented percolation whose bond-occupation probability is proportional to $D$. Suppose that $D(x)$ decays as $|x|^{-d-\alpha}$ with $\alpha>0$. For random walk in any dimension $d$ and for self-avoiding walk and critical/subcritical oriented percolation above the common upper-critical dimension $d_{\mathrm{c}}\equiv2(\alpha\wedge2)$, we prove large-$t$ asymptotics of the gyration radius, which is the average end-to-end distance of random walk/self-avoiding walk of length $t$ or the average spatial size of an oriented percolation cluster at time $t$. This proves the conjecture for long-range self-avoiding walk in [Ann. Inst. H. Poincaré Probab. Statist. (2010), to appear] and for long-range oriented percolation in [Probab. Theory Related Fields 142 (2008) 151--188] and [Probab. Theory Related Fields 145 (2009) 435--458].