Researcher profile

George A. Constantinides

George A. Constantinides contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2026arXiv

Direction-Preserving Number Representations

Low-precision number formats are widely used in modern machine learning systems due to their efficiency. Accurate direction representation is key to the accuracy of vector operations. This work precisely explores the extent to which the direction of a vector can be represented by selecting its scalar elements from a common finite alphabet of a given size. This is standard practice in machine learning, where low-precision significands may be narrow-width floating-point or integer values. A geometric framework is introduced for analyzing the directional coverage of such product-structured codes. This work analytically quantifies the suboptimality gap between such product-structured codes and spherical codes for the vector as a whole, in both low and asymptotically high dimensions. Furthermore, within the product code class, it is proven that the standard formats of two's complement, fixed-point, and floating-point are suboptimal, again with quantified gap, pointing to the potential to develop new scalar number formats. Such scalar alphabets are numerically optimized across multiple block dimensions for directional coverage, including the dimension used in NVIDIA's NVFP4 format. Experimental results are presented comparing the performance of standard formats and the optimized alphabet. We find that for four bits, NVIDIA's choice of E2M1 closely approximates the optimized alphabet, providing a geometric explanation for its strong performance in low-precision machine learning workloads and an analytical understanding of the link between that superiority and block size. We provide open-source formal proofs in Lean for the theorems in this work, along with the experimental code and the optimized alphabets obtained.

preprint2022arXiv

Abstract Interpretation on E-Graphs

Recent e-graph applications have typically considered concrete semantics of expressions, where the notion of equivalence stems from concrete interpretation of expressions. However, equivalences that hold over one interpretation may not hold in an alternative interpretation. Such an observation can be exploited. We consider the application of abstract interpretation to e-graphs, and show that within an e-graph, the lattice meet operation associated with the abstract domain has a natural interpretation for an e-class, leading to improved precision in over-approximation. In this extended abstract, we use Interval Arithmetic (IA) to illustrate this point.

preprint2022arXiv

Automatic Datapath Optimization using E-Graphs

Manual optimization of Register Transfer Level (RTL) datapath is commonplace in industry but holds back development as it can be very time consuming. We utilize the fact that a complex transformation of one RTL into another equivalent RTL can be broken down into a sequence of smaller, localized transformations. By representing RTL as a graph and deploying modern graph rewriting techniques we can automate the circuit design space exploration, allowing us to discover functionally equivalent but optimized architectures. We demonstrate that modern rewriting frameworks can adequately capture a wide variety of complex optimizations performed by human designers on bit-vector manipulating code, including significant error-prone subtleties regarding the validity of transformations under complex interactions of bitwidths. The proposed automated optimization approach is able to reproduce the results of typical industrial manual optimization, resulting in a reduction in circuit area by up to 71%. Not only does our tool discover optimized RTL, but also correctly identifies that the optimal architecture to implement a given arithmetic expression can depend on the width of the operands, thus producing a library of optimized designs rather than the single design point typically generated by manual optimization. In addition, we demonstrate that prior academic work on maximally exploiting carry-save representation and on multiple constant multiplication are both generalized and extended, falling out as special cases of this paper.

preprint2022arXiv

High-level Synthesis using the Julia Language

The growing proliferation of FPGAs and High-level Synthesis (HLS) tools has led to a large interest in designing hardware accelerators for complex operations and algorithms. However, existing HLS toolflows typically require a significant amount of user knowledge or training to be effective in both industrial and research applications. In this paper, we propose using the Julia language as the basis for an HLS tool. The Julia HLS tool aims to decrease the barrier to entry for hardware acceleration by taking advantage of the readability of the Julia language and by allowing the use of the existing large library of standard mathematical functions written in Julia. We present a prototype Julia HLS tool, written in Julia, that transforms Julia code to VHDL. We highlight how features of Julia and its compiler simplified the creation of this tool, and we discuss potential directions for future work.

preprint2022arXiv

Horizon-independent Preconditioner Design for Linear Predictive Control

First-order optimization solvers, such as the Fast Gradient Method, are increasingly being used to solve Model Predictive Control problems in resource-constrained environments. Unfortunately, the convergence rate of these solvers is significantly affected by the conditioning of the problem data, with ill-conditioned problems requiring a large number of iterations. To reduce the number of iterations required, we present a simple method for computing a horizon-independent preconditioning matrix for the Hessian of the condensed problem. The preconditioner is based on the block Toeplitz structure of the Hessian. Horizon-independence allows one to use only the predicted system and cost matrices to compute the preconditioner, instead of the full Hessian. The proposed preconditioner has equivalent performance to an optimal preconditioner in numerical examples, producing speedups between 2x and 9x for the Fast Gradient Method. Additionally, we derive horizon-independent spectral bounds for the Hessian in terms of the transfer function of the predicted system, and show how these can be used to compute a novel horizon-independent bound on the condition number for the preconditioned Hessian.

preprint2022arXiv

Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference

FPGA-specific DNN architectures using the native LUTs as independently trainable inference operators have been shown to achieve favorable area-accuracy and energy-accuracy tradeoffs. The first work in this area, LUTNet, exhibited state-of-the-art performance for standard DNN benchmarks. In this paper, we propose the learned optimization of such LUT-based topologies, resulting in higher-efficiency designs than via the direct use of off-the-shelf, hand-designed networks. Existing implementations of this class of architecture require the manual specification of the number of inputs per LUT, K. Choosing appropriate K a priori is challenging, and doing so at even high granularity, e.g. per layer, is a time-consuming and error-prone process that leaves FPGAs' spatial flexibility underexploited. Furthermore, prior works see LUT inputs connected randomly, which does not guarantee a good choice of network topology. To address these issues, we propose logic shrinkage, a fine-grained netlist pruning methodology enabling K to be automatically learned for every LUT in a neural network targeted for FPGA inference. By removing LUT inputs determined to be of low importance, our method increases the efficiency of the resultant accelerators. Our GPU-friendly solution to LUT input removal is capable of processing large topologies during their training with negligible slowdown. With logic shrinkage, we better the area and energy efficiency of the best-performing LUTNet implementation of the CNV network classifying CIFAR-10 by 1.54x and 1.31x, respectively, while matching its accuracy. This implementation also reaches 2.71x the area efficiency of an equally accurate, heavily pruned BNN. On ImageNet with the Bi-Real Net architecture, employment of logic shrinkage results in a post-synthesis area reduction of 2.67x vs LUTNet, allowing for implementation that was previously impossible on today's largest FPGAs.

preprint2022arXiv

Nonideality-Aware Training for Accurate and Robust Low-Power Memristive Neural Networks

Recent years have seen a rapid rise of artificial neural networks being employed in a number of cognitive tasks. The ever-increasing computing requirements of these structures have contributed to a desire for novel technologies and paradigms, including memristor-based hardware accelerators. Solutions based on memristive crossbars and analog data processing promise to improve the overall energy efficiency. However, memristor nonidealities can lead to the degradation of neural network accuracy, while the attempts to mitigate these negative effects often introduce design trade-offs, such as those between power and reliability. In this work, we design nonideality-aware training of memristor-based neural networks capable of dealing with the most common device nonidealities. We demonstrate the feasibility of using high-resistance devices that exhibit high $I$-$V$ nonlinearity -- by analyzing experimental data and employing nonideality-aware training, we estimate that the energy efficiency of memristive vector-matrix multipliers is improved by three orders of magnitude ($0.715\ \mathrm{TOPs}^{-1}\mathrm{W}^{-1}$ to $381\ \mathrm{TOPs}^{-1}\mathrm{W}^{-1}$) while maintaining similar accuracy. We show that associating the parameters of neural networks with individual memristors allows to bias these devices towards less conductive states through regularization of the corresponding optimization problem, while modifying the validation procedure leads to more reliable estimates of performance. We demonstrate the universality and robustness of our approach when dealing with a wide range of nonidealities.

preprint2020arXiv

LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference

Research has shown that deep neural networks contain significant redundancy, and thus that high classification accuracy can be achieved even when weights and activations are quantized down to binary values. Network binarization on FPGAs greatly increases area efficiency by replacing resource-hungry multipliers with lightweight XNOR gates. However, an FPGA's fundamental building block, the K-LUT, is capable of implementing far more than an XNOR: it can perform any K-input Boolean operation. Inspired by this observation, we propose LUTNet, an end-to-end hardware-software framework for the construction of area-efficient FPGA-based neural network accelerators using the native LUTs as inference operators. We describe the realization of both unrolled and tiled LUTNet architectures, with the latter facilitating smaller, less power-hungry deployment over the former while sacrificing area and energy efficiency along with throughput. For both varieties, we demonstrate that the exploitation of LUT flexibility allows for far heavier pruning than possible in prior works, resulting in significant area savings while achieving comparable accuracy. Against the state-of-the-art binarized neural network implementation, we achieve up to twice the area efficiency for several standard network models when inferencing popular datasets. We also demonstrate that even greater energy efficiency improvements are obtainable.