Source author record

Neal Livesay

Neal Livesay appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Cryptography and Security; Distributed, Parallel, and Cluster Computing; Hardware Architecture; math.AG; math.DG; math.RT; Performance

Catalog footprint

What is connected

2 works

7 topics

4 close collaborators


Preprint · 2022 · arXiv

Accelerating Polynomial Multiplication for Homomorphic Encryption on GPUs

Homomorphic Encryption (HE) enables users to securely outsource both the storage and computation of sensitive data to untrusted servers. Not only does HE offer an attractive solution for security in cloud systems, but lattice-based HE systems are also believed to be resistant to attacks by quantum computers. However, current HE implementations suffer from prohibitively high latency. For lattice-based HE to become viable for real-world systems, it is necessary for the key bottlenecks - particularly polynomial multiplication - to be highly efficient. In this paper, we present a characterization of GPU-based implementations of polynomial multiplication. We begin with a survey of modular reduction techniques and analyze several variants of the widely-used Barrett modular reduction algorithm. We then propose a modular reduction variant optimized for 64-bit integer words on the GPU, obtaining a 1.8x speedup over the existing comparable implementations. Next, we explore the following GPU-specific improvements for polynomial multiplication targeted at optimizing latency and throughput: 1) We present a 2D mixed-radix, multi-block implementation of NTT that results in a 1.85x average speedup over the previous state-of-the-art. 2) We explore shared memory optimizations aimed at reducing redundant memory accesses, further improving speedups by 1.2x. 3) Finally, we fuse the Hadamard product with neighboring stages of the NTT, reducing the twiddle factor memory footprint by 50%. By combining our NTT optimizations, we achieve an overall speedup of 123.13x and 2.37x over the previous state-of-the-art CPU and GPU implementations of NTT kernels, respectively.
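To make the building blocks named in the abstract concrete, here is a minimal CPU-side sketch in Python of classical Barrett reduction and an NTT-based polynomial product with a Hadamard (pointwise) step. It is not the paper's GPU implementation and does not reproduce its 64-bit variant, 2D mixed-radix layout, shared-memory use, or stage fusion. The ring Z_q[x]/(x^n + 1), the small parameters Q = 7681 and N = 8, and all function and class names are assumptions chosen for illustration.

```python
# Illustrative sketch only: textbook Barrett reduction and a radix-2 NTT.
# All names and parameters below are hypothetical, not taken from the paper.
# Requires Python 3.8+ (pow with a negative exponent computes a modular inverse).

class Barrett:
    """Classical Barrett reduction for a fixed modulus q; valid for 0 <= x < q*q."""
    def __init__(self, q):
        self.q = q
        self.k = q.bit_length()                # q < 2**k
        self.mu = (1 << (2 * self.k)) // q     # precomputed floor(4**k / q)

    def reduce(self, x):
        t = (x * self.mu) >> (2 * self.k)      # estimate of floor(x / q), low by at most 1
        r = x - t * self.q                     # so r lies in [0, 2q)
        return r - self.q if r >= self.q else r

    def mul(self, a, b):
        return self.reduce(a * b)

def find_2n_th_root(n, q):
    """Search for a primitive 2n-th root of unity mod prime q (n a power of two)."""
    assert (q - 1) % (2 * n) == 0
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:            # psi**n == -1 means the order is exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root found")

def bit_reverse_copy(a):
    n = len(a)
    bits = n.bit_length() - 1
    out = [0] * n
    for i, x in enumerate(a):
        out[int(format(i, f"0{bits}b")[::-1], 2)] = x
    return out

def ntt(a, root, br):
    """Iterative radix-2 NTT; `root` is a primitive len(a)-th root of unity mod br.q."""
    q, n = br.q, len(a)
    a = bit_reverse_copy(a)
    length = 2
    while length <= n:
        w_len = pow(root, n // length, q)      # primitive length-th root for this stage
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for j in range(half):
                u = a[start + j]
                v = br.mul(a[start + j + half], w)
                s, d = u + v, u - v
                a[start + j] = s - q if s >= q else s
                a[start + j + half] = d + q if d < 0 else d
                w = br.mul(w, w_len)
        length *= 2
    return a

def negacyclic_mul(a, b, q, psi):
    """Multiply a and b in Z_q[x]/(x^n + 1); coefficients must lie in [0, q)."""
    n, br = len(a), Barrett(q)
    omega = br.mul(psi, psi)                   # primitive n-th root of unity
    psi_inv, omega_inv, n_inv = pow(psi, -1, q), pow(omega, -1, q), pow(n, -1, q)
    fa = ntt([br.mul(x, pow(psi, i, q)) for i, x in enumerate(a)], omega, br)
    fb = ntt([br.mul(x, pow(psi, i, q)) for i, x in enumerate(b)], omega, br)
    fc = [br.mul(x, y) for x, y in zip(fa, fb)]  # Hadamard (pointwise) product
    c = ntt(fc, omega_inv, br)                   # inverse transform, up to the 1/n factor
    return [br.mul(br.mul(x, n_inv), pow(psi_inv, i, q)) for i, x in enumerate(c)]

def negacyclic_schoolbook(a, b, q):
    """O(n^2) reference product, used only to check the NTT path."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q   # x^n = -1
    return c

Q, N = 7681, 8                                 # assumed toy parameters: Q = 1 mod 2N
PSI = find_2n_th_root(N, Q)
```

Calling `negacyclic_mul(a, b, Q, PSI)` on two length-8 coefficient lists should agree with `negacyclic_schoolbook(a, b, Q)`. Python integers are unbounded, so the word-size concerns that motivate the paper's 64-bit variant do not show up here; the sketch only fixes the arithmetic that such kernels compute.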

Preprint · 2022 · arXiv

The Deligne-Simpson problem for connections on $\mathbb{G}_m$ with a maximally ramified singularity

The classical additive Deligne-Simpson problem is the existence problem for Fuchsian connections with residues at the singular points in specified adjoint orbits. Crawley-Boevey found the solution in 2003 by reinterpreting the problem in terms of quiver varieties. A more general version of this problem, solved by Hiroe, allows additional unramified irregular singularities. We apply the theory of fundamental and regular strata due to Bremer and Sage to formulate a version of the Deligne-Simpson problem in which certain ramified singularities are allowed. These allowed singular points are called toral singularities; they are singularities whose leading term with respect to a lattice chain filtration is regular semisimple. We solve this problem in the important special case of connections on $\mathbb{G}_m$ with a maximally ramified singularity at $0$ and possibly an additional regular singular point at infinity. We also give a complete characterization of all such connections which are rigid, under the additional hypothesis of unipotent monodromy at infinity.
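For orientation, the classical additive problem named at the start of the abstract can be written out as below. This is the standard formulation rather than notation taken from the paper; the symbols $a_i$, $A_i$, $\mathcal{O}_i$ and the choice of the trivial rank-$n$ bundle on $\mathbb{P}^1$ are assumptions made here for illustration.

```latex
% Fuchsian connection with finite singular points a_1, ..., a_k
% and residues A_i lying in prescribed adjoint orbits O_i:
\nabla = d + \sum_{i=1}^{k} \frac{A_i}{z - a_i}\,dz,
\qquad A_i \in \mathcal{O}_i \subset \mathfrak{gl}_n(\mathbb{C}).

% The residue at infinity is -\sum_i A_i, so infinity is nonsingular exactly when
\sum_{i=1}^{k} A_i = 0.

% Additive Deligne-Simpson problem: for which tuples (O_1, ..., O_k) do such
% A_i exist with no common proper nonzero invariant subspace (irreducibility)?
```

The version studied in the paper replaces some of these simple poles with ramified irregular singularities, so the data at those points is no longer a single residue orbit.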
