Researcher profile

Lior Horesh

Lior Horesh contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2023arXiv

AI Descartes: Combining Data and Theory for Derivable Scientific Discovery

Scientists have long aimed to discover meaningful formulae which accurately describe experimental data. A common approach is to manually create mathematical models of natural phenomena using domain knowledge, and then fit these models to data. In contrast, machine-learning algorithms automate the construction of accurate data-driven models while consuming large amounts of data. The problem of incorporating prior knowledge in the form of constraints on the functional form of a learned model (e.g., nonnegativity) has been explored in the literature. However, finding models that are consistent with prior knowledge expressed in the form of general logical axioms (e.g., conservation of energy) is an open problem. We develop a method to enable principled derivations of models of natural phenomena from axiomatic knowledge and experimental data by combining logical reasoning with symbolic regression. We demonstrate these concepts for Kepler's third law of planetary motion, Einstein's relativistic time-dilation law, and Langmuir's theory of adsorption, automatically connecting experimental data with background theory in each case. We show that laws can be discovered from few data points when using formal logical reasoning to distinguish the correct formula from a set of plausible formulas that have similar error on the data. The combination of reasoning with machine learning provides generalizeable insights into key aspects of natural phenomena. We envision that this combination will enable derivable discovery of fundamental laws of science and believe that our work is an important step towards automating the scientific method.

preprint2022arXiv

Combining Fast and Slow Thinking for Human-like and Efficient Navigation in Constrained Environments

Current AI systems lack several important human capabilities, such as adaptability, generalizability, self-control, consistency, common sense, and causal reasoning. We believe that existing cognitive theories of human decision making, such as the thinking fast and slow theory, can provide insights on how to advance AI systems towards some of these capabilities. In this paper, we propose a general architecture that is based on fast/slow solvers and a metacognitive component. We then present experimental results on the behavior of an instance of this architecture, for AI systems that make decisions about navigating in a constrained environment. We show how combining the fast and slow decision modalities allows the system to evolve over time and gradually pass from slow to fast thinking with enough experience, and that this greatly helps in decision quality, resource consumption, and efficiency.

preprint2022arXiv

Distributed Adversarial Training to Robustify Deep Neural Networks at Scale

Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach, known as adversarial training (AT), has been shown to mitigate the negative impact of adversarial attacks by virtue of a min-max robust training method. While effective, it remains unclear whether it can successfully be adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general, which supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization. Theoretically, we provide, under standard conditions in the optimization theory, the convergence rate of DAT to the first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., on ResNet-50 under ImageNet). Codes are available at https://github.com/dat-2022/dat.

preprint2022arXiv

Overcoming Catastrophic Forgetting via Direction-Constrained Optimization

This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data is non-stationary and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space, where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the plausible cones of individual tasks that were so far encountered during training. Based on this observation we present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. Furthermore, in order to control the memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm called compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-art regularization-based continual learning methods.

preprint2022arXiv

Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra

We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression that are comparable to their quantum analogues. De-quantizing such algorithms has received a flurry of attention in recent years; we obtain sharper bounds for these problems. More significantly, we achieve these improvements by arguing that the previous quantum-inspired algorithms for these problems are doing leverage or ridge-leverage score sampling in disguise; these are powerful and standard techniques in randomized numerical linear algebra. With this recognition, we are able to employ the large body of work in numerical linear algebra to obtain algorithms for these problems that are simpler or faster (or both) than existing approaches. Our experiments demonstrate that the proposed data structures also work well on real-world datasets.

preprint2022arXiv

Representation of the Fermionic Boundary Operator

The boundary operator is a linear operator that acts on a collection of high-dimensional binary points (simplices) and maps them to their boundaries. This boundary map is one of the key components in numerous applications, including differential equations, machine learning, computational geometry, machine vision and control systems. We consider the problem of representing the full boundary operator on a quantum computer. We first prove that the boundary operator has a special structure in the form of a complete sum of fermionic creation and annihilation operators. We then use the fact that these operators pairwise anticommute to produce an $O(n)$-depth circuit that exactly implements the boundary operator without any Trotterization or Taylor series approximation errors. Having fewer errors reduces the number of shots required to obtain desired accuracies.

preprint2021arXiv

Fast randomized non-Hermitian eigensolver based on rational filtering and matrix partitioning

This paper describes a set of rational filtering algorithms to compute a few eigenvalues (and associated eigenvectors) of non-Hermitian matrix pencils. Our interest lies in computing eigenvalues located inside a given disk, and the proposed algorithms approximate these eigenvalues and associated eigenvectors by harmonic Rayleigh-Ritz projections on subspaces built by computing range spaces of rational matrix functions through randomized range finders. These rational matrix functions are designed so that directions associated with non-sought eigenvalues are dampened to (approximately) zero. Variants based on matrix partitionings are introduced to further reduce the overall complexity of the proposed framework. Compared with existing eigenvalue solvers based on rational matrix functions, the proposed technique requires no estimation of the number of eigenvalues located inside the disk. Several theoretical and practical issues are discussed, and the competitiveness of the proposed framework is demonstrated via numerical experiments.

preprint2021arXiv

Sparse graph based sketching for fast numerical linear algebra

In recent years, a variety of randomized constructions of sketching matrices have been devised, that have been used in fast algorithms for numerical linear algebra problems, such as least squares regression, low-rank approximation, and the approximation of leverage scores. A key property of sketching matrices is that of subspace embedding. In this paper, we study sketching matrices that are obtained from bipartite graphs that are sparse, i.e., have left degree~s that is small. In particular, we explore two popular classes of sparse graphs, namely, expander graphs and magical graphs. For a given subspace $\mathcal{U} \subseteq \mathbb{R}^n$ of dimension $k$, we show that the magical graph with left degree $s=2$ yields a $(1\pm ε)$ ${\ell}_2$-subspace embedding for $\mathcal{U}$, if the number of right vertices (the sketch size) $m = \mathcal{O}({k^2}/{ε^2})$. The expander graph with $s = \mathcal{O}({\log k}/ε)$ yields a subspace embedding for $m = \mathcal{O}({k \log k}/{ε^2})$. We also discuss the construction of sparse sketching matrices with reduced randomness using expanders based on error-correcting codes. Empirical results on various synthetic and real datasets show that these sparse graph sketching matrices work very well in practice.

preprint2020arXiv

Denoising quantum states with Quantum Autoencoders -- Theory and Applications

We implement a Quantum Autoencoder (QAE) as a quantum circuit capable of correcting Greenberger-Horne-Zeilinger (GHZ) states subject to various noisy quantum channels : the bit-flip channel and the more general quantum depolarizing channel. The QAE shows particularly interesting results, as it enables to perform an almost perfect reconstruction of noisy states, but can also, more surprisingly, act as a generative model to create noise-free GHZ states. Finally, we detail a useful application of QAEs : Quantum Secret Sharing (QSS). We analyze how noise corrupts QSS, causing it to fail, and show how the QAE allows the QSS protocol to succeed even in the presence of noise.

preprint2020arXiv

Pareto-Efficient Quantum Circuit Simulation Using Tensor Contraction Deferral

With the current rate of progress in quantum computing technologies, systems with more than 50 qubits will soon become reality. Computing ideal quantum state amplitudes for circuits of such and larger sizes is a fundamental step to assess both the correctness, performance, and scaling behavior of quantum algorithms and the fidelities of quantum devices. However, resource requirements for such calculations on classical computers grow exponentially. We show that deferring tensor contractions can extend the boundaries of what can be computed on classical systems. To demonstrate this technique, we present results obtained from a calculation of the complete set of output amplitudes of a universal random circuit with depth 27 in a 2D lattice of $7 \times 7$ qubits, and an arbitrarily selected slice of $2^{37}$ amplitudes of a universal random circuit with depth 23 in a 2D lattice of $8 \times 7$ qubits. Combining our methodology with other decomposition approaches found in the literature, we show that we can simulate $7 \times 7$-qubit random circuits to arbitrary depth by leveraging secondary storage. These calculations were thought to be impossible due to resource requirements.

preprint2020arXiv

Symbolic Regression using Mixed-Integer Nonlinear Optimization

The Symbolic Regression (SR) problem, where the goal is to find a regression function that does not have a pre-specified form but is any function that can be composed of a list of operators, is a hard problem in machine learning, both theoretically and computationally. Genetic programming based methods, that heuristically search over a very large space of functions, are the most commonly used methods to tackle SR problems. An alternative mathematical programming approach, proposed in the last decade, is to express the optimal symbolic expression as the solution of a system of nonlinear equations over continuous and discrete variables that minimizes a certain objective, and to solve this system via a global solver for mixed-integer nonlinear programming problems. Algorithms based on the latter approach are often very slow. We propose a hybrid algorithm that combines mixed-integer nonlinear optimization with explicit enumeration and incorporates constraints from dimensional analysis. We show that our algorithm is competitive, for some synthetic data sets, with a state-of-the-art SR software and a recent physics-inspired method called AI Feynman.

preprint2019arXiv

Tensor-Tensor Products for Optimal Representation and Compression

In this era of big data, data analytics and machine learning, it is imperative to find ways to compress large data sets such that intrinsic features necessary for subsequent analysis are not lost. The traditional workhorse for data dimensionality reduction and feature extraction has been the matrix SVD, which presupposes that the data has been arranged in matrix format. Our main goal in this study is to show that high-dimensional data sets are more compressible when treated as tensors (aka multiway arrays) and compressed via tensor-SVDs under the tensor-tensor product structures in (Kilmer and Martin, 2011; Kernfeld et al., 2015). We begin by proving Eckart Young optimality results for families of tensor-SVDs under two different truncation strategies. As such optimality properties can be proven in both matrix and tensor-based algebras, a fundamental question arises: does the tensor construct subsume the matrix construct in terms of representation efficiency? The answer is yes, as shown when we prove that a tensor-tensor representation of an equal dimensional spanning space can be superior to its matrix counterpart. We then investigate how the compressed representation provided by the truncated tensor-SVD is related both theoretically and in compression performance to its closest tensor-based analogue, truncated HOSVD (De Lathauwer et al., 2000; De Lathauwer and Vandewalle, 2004), thereby showing the potential advantages of our tensor-based algorithms. Finally, we propose new tensor truncated SVD variants, namely multi-way tensor SVDs, provide further approximated representation efficiency and discuss under which conditions they are considered optimal. We conclude with a numerical study demonstrating the utility of the theory.