Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
24 works
0 followers
16 topics
4 close collaborators

Published work

24 published items

preprint · 2026 · arXiv

A Theoretical Game of Attacks via Compositional Skills

As large language models grow increasingly capable, concerns about their safe deployment have intensified. While numerous alignment strategies aim to restrict harmful behavior, these defenses can still be circumvented through carefully designed adversarial prompts. In this work, we introduce a theoretical framework that formalizes a game between an attacker and a defender. Within this framework, we design a theoretical best-response attack strategy and show that it is closely related to many existing adversarial prompting methods. We further analyze the resulting game, characterize its equilibria, and reveal inherent advantages for the attacker. Drawing on our theoretical analysis, we also derive a provably optimal defense strategy. Empirically, we evaluate a practical instantiation of the theoretically optimal attack and observe stronger performance relative to existing adversarial prompting approaches in diverse settings encompassing different LLMs and benchmarks.

preprint · 2026 · arXiv

Combinatorial Creativity: A New Frontier in Generalization Abilities

Artificial intelligence (AI) systems, and Large Language Models (LLMs) in particular, are increasingly employed for creative tasks like scientific idea generation, constituting a form of generalization from training data unaddressed by existing conceptual frameworks. Despite its similarities to compositional generalization (CG), combinatorial creativity (CC) is an open-ended ability. Instead of evaluating for accuracy or correctness against fixed targets, which would contradict the open-ended nature of CC, we propose a theoretical framework and algorithmic task for evaluating outputs by their degrees of novelty and utility. From here, we make several important empirical contributions: (1) We obtain the first insights into the scaling behavior of creativity for LLMs. (2) We discover that, for fixed compute budgets, there exist optimal model depths and widths for creative ability. (3) We find that the ideation-execution gap, whereby LLMs excel at generating novel scientific ideas but struggle to ensure their practical feasibility, may be explained by a more fundamental novelty-utility tradeoff characteristic of creativity algorithms in general. Though our findings persist up to the 100M scale, frontier models today are well into the billions of parameters. Therefore, our conceptual framework and empirical findings can best serve as a starting point for understanding and improving the creativity of frontier-size models today, as we begin to bridge the gap between human and machine intelligence.

preprint · 2026 · arXiv

Containment Verification: AI Safety Guarantees Independent of Alignment

Agentic frameworks are the software layer through which AI agents act in the world. Existing safety methods intervene on the model and therefore remain conditional on unverifiable properties of learned behavior. We introduce containment verification, which locates safety guarantees in the agentic framework itself. Under havoc oracle semantics, the AI is modeled as an unconstrained oracle ranging over the entire typed action space, and the verified containment layer must enforce the boundary policy for every possible AI output. For boundary-enforceable properties, expressed over modeled boundary events, action arguments, and state, we prove a universal guarantee by forward-simulation refinement and mechanize it in Dafny. We instantiate the paradigm by verifying PocketFlow, a minimalist agentic LLM framework, and use an agentic synthesis pipeline to generate the specification, operational model, and refinement proof under an information barrier against tautological specifications. To our knowledge, this is the first deductive formal verification of an agentic framework, and its guarantee is invariant to model capability over the modeled typed action boundary.

preprint · 2026 · arXiv

Context-Gated Associative Retrieval: From Theory to Transformers

Hopfield networks and their generalizations have established deep connections among biological associative memories, statistical physics, and transformers. Yet most models treat retrieval as a fixed query-to-memory mapping, ignoring the role of external context in recall. In this work, we propose a two-stage associative memory architecture, wherein a context-gate subcircuit reshapes the retrieval energy landscape before and during recall. We show theoretically that context gating increases inter-memory separation while inducing sparsity, translating into exponential improvements in retrieval. Crucially, we prove that the system admits a unique self-consistent fixed point, revealing that the resulting retrieval state is driven by both a direct contextual bias and a second-order retrieval-gate feedback loop. We then bridge this theory to transformers; specifically, we evaluate a first-order approximation on Llama-3, confirming that in-context learning acts as context-gated retrieval. Native dynamics mirror our theory: context localizes a memory subspace, enabling the zero-shot query to cleanly discriminate. Ultimately, this framework provides a mechanistic link between associative memory theory and LLM phenomenology.

preprint · 2022 · arXiv

Accelerated Design and Deployment of Low-Carbon Concrete for Data Centers

Concrete is the most widely used engineered material in the world with more than 10 billion tons produced annually. Unfortunately, with that scale comes a significant burden in terms of energy, water, and release of greenhouse gases and other pollutants; indeed 8% of worldwide carbon emissions are attributed to the production of cement, a key ingredient in concrete. As such, there is interest in creating concrete formulas that minimize this environmental burden, while satisfying engineering performance requirements including compressive strength. Specifically for computing, concrete is a major ingredient in the construction of data centers. In this work, we use conditional variational autoencoders (CVAEs), a type of semi-supervised generative artificial intelligence (AI) model, to discover concrete formulas with desired properties. Our model is trained just using a small open dataset from the UCI Machine Learning Repository joined with environmental impact data from standard lifecycle analysis. Computational predictions demonstrate CVAEs can design concrete formulas with much lower carbon requirements than existing formulations while meeting design requirements. Next we report laboratory-based compressive strength experiments for five AI-generated formulations, which demonstrate that the formulations exceed design requirements. The resulting formulations were then used by Ozinga Ready Mix -- a concrete supplier -- to generate field-ready concrete formulations, based on local conditions and their expertise in concrete design. Finally, we report on how these formulations were used in the construction of buildings and structures in a Meta data center in DeKalb, IL, USA. Results from field experiments as part of this real-world deployment corroborate the efficacy of AI-generated low-carbon concrete mixes.

preprint · 2022 · arXiv

Advanced Methods for Connectome-Based Predictive Modeling of Human Intelligence: A Novel Approach Based on Individual Differences in Cortical Topography

Individual differences in human intelligence can be modeled and predicted from in vivo neurobiological connectivity. Many established modeling frameworks for predicting intelligence, however, discard higher-order information about individual differences in brain network topology, and show only moderate performance when generalized to make predictions in out-of-sample subjects. In this paper, we propose that connectome-based predictive modeling, a common predictive modeling framework for neuroscience data, can be productively modified to incorporate information about brain network topology and individual differences via the incorporation of bagged decision trees and the network based statistic. These modifications produce a novel predictive modeling framework that leverages individual differences in cortical tractography to generate accurate regression predictions of intelligence scores. Network topology-based feature selection provides for natively interpretable networks as input features, increasing the model's explainability. Investigating the proposed modeling framework's efficacy, we find that advanced connectome-based predictive modeling generates neuroscience predictions that account for a significantly greater proportion of variance in general intelligence scores than previously established methods, advancing our scientific understanding of the network architecture that underlies human intelligence.

preprint · 2022 · arXiv

Debiased Large Language Models Still Associate Muslims with Uniquely Violent Acts

Recent work demonstrates a bias in the GPT-3 model towards generating violent text completions when prompted about Muslims, compared with Christians and Hindus. Two pre-registered replication attempts, one exact and one approximate, found only the weakest bias in the more recent Instruct Series version of GPT-3, fine-tuned to eliminate biased and toxic outputs. Few violent completions were observed. Additional pre-registered experiments, however, showed that using common names associated with the religions in prompts yields a highly significant increase in violent completions, also revealing a stronger second-order bias against Muslims. Names of Muslim celebrities from non-violent domains resulted in relatively fewer violent completions, suggesting that access to individualized information can steer the model away from using stereotypes. Nonetheless, content analysis revealed religion-specific violent themes containing highly offensive ideas regardless of prompt format. Our results show the need for additional debiasing of large language models to address higher-order schemas and associations.

preprint · 2022 · arXiv

The CEO Problem with $r$th Power of Difference and Logarithmic Distortions

The CEO problem has received much attention since first introduced by Berger et al., but there are limited results on non-Gaussian models with non-quadratic distortion measures. In this work, we extend the quadratic Gaussian CEO problem to two non-Gaussian settings with general $r$th power of difference distortion. Assuming an identical observation channel across agents, we study the asymptotics of distortion decay as the number of agents and sum-rate, $R_{sum}$, grow without bound, while individual rates vanish. The first setting is a regular source-observation model with $r$th power of difference distortion, which subsumes the quadratic Gaussian CEO problem, and we establish that the distortion decays at $\mathcal{O}(R_{sum}^{-r/2})$ when $r \ge 2$. We use sample median estimation after the Berger-Tung scheme for achievability. The other setting is a \emph{non-regular} source-observation model, including uniform additive noise models, with $r$th power of difference distortion for which estimation-theoretic regularity conditions do not hold. The distortion decay $\mathcal{O}(R_{sum}^{-r})$ when $r \ge 1$ is obtained for the non-regular model by midrange estimator following the Berger-Tung scheme. We also provide converses based on the Shannon lower bound for the regular model and the Chazan-Zakai-Ziv bound for the non-regular model, respectively. Lastly, we provide a sufficient condition for the regular model, under which quadratic and logarithmic distortions are asymptotically equivalent by an entropy power relationship as the number of agents grows. This proof relies on the Bernstein-von Mises theorem.

preprint · 2021 · arXiv

Adversarial Linear Contextual Bandits with Graph-Structured Side Observations

This paper studies adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverages two of the most common categories of side information: \emph{contexts} and \emph{side observations}. In this setting, a learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector. The agent not only incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures, which are encoded as a series of feedback graphs. This setting models a variety of applications in social networks, where both contexts and graph-structured side observations are available. Two efficient algorithms are developed based on \texttt{EXP3}. Under mild conditions, our analysis shows that for undirected feedback graphs the first algorithm, \texttt{EXP3-LGC-U}, achieves regret of order $\mathcal{O}(\sqrt{(K+\alpha(G)d)T\log{K}})$ over the time horizon $T$, where $\alpha(G)$ is the average \emph{independence number} of the feedback graphs. A slightly weaker result is presented for the directed graph setting as well. The second algorithm, \texttt{EXP3-LGC-IX}, is developed for a special class of problems, for which the regret is reduced to $\mathcal{O}(\sqrt{\alpha(G)dT\log{K}\log(KT)})$ for both directed and undirected feedback graphs. Numerical tests corroborate the efficiency of the proposed algorithms.

preprint · 2021 · arXiv

Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity

Neural text decoding is important for generating high-quality texts using language models. To generate high-quality text, popular decoding algorithms like top-k, top-p (nucleus), and temperature-based sampling truncate or distort the unreliable low probability tail of the language model. Though these methods generate high-quality text after parameter tuning, they are ad hoc. Not much is known about the control they provide over the statistics of the output, which is important since recent reports show text quality is highest for a specific range of likelihoods. Here, first we provide a theoretical analysis of perplexity in top-k, top-p, and temperature sampling, finding that cross-entropy behaves approximately linearly as a function of p in top-p sampling whereas it is a nonlinear function of k in top-k sampling, under Zipfian statistics. We use this analysis to design a feedback-based adaptive top-k text decoding algorithm called mirostat that generates text (of any length) with a predetermined value of perplexity, and thereby high-quality text without any tuning. Experiments show that for low values of k and p in top-k and top-p sampling, perplexity drops significantly with generated text length, which is also correlated with excessive repetitions in the text (the boredom trap). On the other hand, for large values of k and p, we find that perplexity increases with generated text length, which is correlated with incoherence in the text (confusion trap). Mirostat avoids both traps: experiments show that cross-entropy has a near-linear relation with repetition in generated text. This relation is almost independent of the sampling method but slightly dependent on the model used. Hence, for a given language model, control over perplexity also gives control over repetitions. Experiments with human raters for fluency, coherence, and quality further verify our findings.
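
The feedback idea in this abstract can be illustrated with a small toy loop. The sketch below is a simplification under stated assumptions, not the paper's exact algorithm: the next-token distribution is a fixed Zipfian toy distribution rather than a language model, and the names `zipf_probs`, `sample_with_feedback`, `cap`, and `learning_rate` are hypothetical. Because the toy probabilities are sorted in decreasing order, capping per-token surprise is equivalent to choosing an adaptive top-k.

```python
import math
import random

def zipf_probs(vocab_size, exponent=1.1):
    """Toy next-token distribution with Zipfian statistics (an assumption).

    Probabilities are sorted in decreasing order, so index 0 is the top token.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_with_feedback(probs, target_surprise, steps, learning_rate=0.1, seed=0):
    """Sample tokens while steering observed surprise (in bits) toward a target."""
    rng = random.Random(seed)
    cap = 2.0 * target_surprise  # hypothetical initial cap on per-token surprise
    tokens, surprises = [], []
    for _ in range(steps):
        # Keep tokens whose surprise -log2(p) is under the cap; always keep the top token.
        kept = [i for i, p in enumerate(probs) if -math.log2(p) < cap] or [0]
        mass = sum(probs[i] for i in kept)
        token = rng.choices(kept, weights=[probs[i] for i in kept])[0]
        # Surprise of the sampled token under the renormalized, truncated distribution.
        observed = -math.log2(probs[token] / mass)
        # Feedback step: lower the cap after a surprising token, raise it otherwise.
        cap -= learning_rate * (observed - target_surprise)
        tokens.append(token)
        surprises.append(observed)
    return tokens, surprises
```

A call such as `sample_with_feedback(zipf_probs(1000), target_surprise=3.0, steps=200)` returns the sampled token indices together with the per-step surprise values, which can be averaged to see how closely the loop tracks the target.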

preprint · 2021 · arXiv

The Twelvefold Way of Non-Sequential Lossless Compression

Many information sources are not just sequences of distinguishable symbols but rather have invariances governed by alternative counting paradigms such as permutations, combinations, and partitions. We consider an entire classification of these invariances called the twelvefold way in enumerative combinatorics and develop a method to characterize lossless compression limits. Explicit computations for all twelve settings are carried out for i.i.d. uniform and Bernoulli distributions. Comparisons among settings provide quantitative insight.
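
As a rough illustration of why counting paradigms matter for compression, the sketch below compares the cost of a fixed-length index for three of the twelve settings: ordered sequences, multisets, and sets. These are simple log-cardinality figures, not the entropy limits computed in the paper, and the function names are hypothetical.

```python
import math

def bits_sequence(n, k):
    """Ordered sequence of n symbols from an alphabet of size k: k**n outcomes."""
    return n * math.log2(k)

def bits_multiset(n, k):
    """Order ignored, repeats allowed: C(n + k - 1, n) outcomes."""
    return math.log2(math.comb(n + k - 1, n))

def bits_set(n, k):
    """Order ignored, no repeats (requires n <= k): C(k, n) outcomes."""
    return math.log2(math.comb(k, n))
```

For n = 2 symbols over an alphabet of size k = 3 there are 9 sequences, 6 multisets, and 3 sets, so discarding order and repeats shrinks the index from log2(9) to log2(6) to log2(3) bits.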

preprint · 2021 · arXiv

Wireless Power Transfer for Future Networks: Signal Processing, Machine Learning, Computing, and Sensing

Wireless power transfer (WPT) is an emerging paradigm that will enable using wireless to its full potential in future networks, not only to convey information but also to deliver energy. Such networks will enable trillions of future low-power devices to sense, compute, connect, and energize anywhere, anytime, and on the move. The design of such future networks brings new challenges and opportunities for signal processing, machine learning, sensing, and computing so as to make the best use of the RF radiations, spectrum, and network infrastructure in providing cost-effective and real-time power supplies to wireless devices and enable wireless-powered applications. In this paper, we first review recent signal processing techniques to make WPT and wireless information and power transfer as efficient as possible. Topics include power amplifier and energy harvester nonlinearities, active and passive beamforming, intelligent reflecting surfaces, receive combining with multi-antenna harvester, modulation, coding, waveform, massive MIMO, channel acquisition, transmit diversity, multi-user power region characterization, coordinated multipoint, and distributed antenna systems. Then, we overview two different design methodologies: the model and optimize approach relying on analytical system models, modern convex optimization, and communication theory, and the learning approach based on data-driven end-to-end learning and physics-based learning. We discuss the pros and cons of each approach, especially when accounting for various nonlinearities in wireless-powered networks, and identify interesting emerging opportunities for the approaches to complement each other. Finally, we identify new emerging wireless technologies where WPT may play a key role -- wireless-powered mobile edge computing and wireless-powered sensing -- arguing WPT, communication, computation, and sensing must be jointly designed.

preprint · 2020 · arXiv

A Difficulty in Controlling Blockchain Mining Costs via Cryptopuzzle Difficulty

Blockchain systems often employ proof-of-work consensus protocols to validate and add transactions into hashchains. These protocols stimulate competition among miners in solving cryptopuzzles (e.g. SHA-256 hash computation in Bitcoin) in exchange for a monetary reward. Here, we model mining as an all-pay auction, where miners' computational efforts are interpreted as bids, and the allocation function is the probability of solving the cryptopuzzle in a single attempt with unit (normalized) computational capability. Such an allocation function captures how blockchain systems control the difficulty of the cryptopuzzle as a function of miners' computational abilities (bids). In an attempt to reduce mining costs, we investigate designing a mining auction mechanism which induces a logit equilibrium amongst the miners with choice distributions that are unilaterally decreasing with costs at each miner. We show it is impossible to design a lenient allocation function that does this. Specifically, we show that there exists no allocation function that discourages miners from bidding higher costs at logit equilibrium, if the rate of change of difficulty with respect to each miner's cost is bounded by the inverse of the sum of costs at all the miners.

preprint · 2020 · arXiv

A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits

We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, \texttt{GLR-CUCB}, which combines an efficient combinatorial semi-bandit algorithm, \texttt{CUCB}, with an almost parameter-free change-point detector, the \emph{Generalized Likelihood Ratio Test} (GLRT). Our analysis shows that the regret of \texttt{GLR-CUCB} is upper bounded by $\mathcal{O}(\sqrt{NKT\log{T}})$, where $N$ is the number of piecewise-stationary segments, $K$ is the number of base arms, and $T$ is the number of time steps. As a complement, we also derive a nearly matching regret lower bound on the order of $\Omega(\sqrt{NKT})$, for both piecewise-stationary multi-armed bandits and combinatorial semi-bandits, using information-theoretic techniques and judiciously constructed piecewise-stationary bandit instances. Our lower bound is tighter than the best available regret lower bound, which is $\Omega(\sqrt{T})$. Numerical experiments on both synthetic and real-world datasets demonstrate the superiority of \texttt{GLR-CUCB} compared to other state-of-the-art algorithms.
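
The change-point detector named in this abstract can be sketched for Bernoulli rewards as follows. This is a minimal illustration of a generalized likelihood ratio statistic, assuming rewards in {0, 1}; the function names are hypothetical, and the detection threshold is left as a free parameter here rather than taken from the paper.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def glr_statistic(samples):
    """Maximum over split points s of the log-likelihood gain from allowing one change."""
    n = len(samples)
    if n < 2:
        return 0.0  # no split point exists
    total = sum(samples)
    overall_mean = total / n
    best = 0.0
    prefix = 0.0
    for s in range(1, n):
        prefix += samples[s - 1]
        left_mean = prefix / s
        right_mean = (total - prefix) / (n - s)
        stat = (s * bernoulli_kl(left_mean, overall_mean)
                + (n - s) * bernoulli_kl(right_mean, overall_mean))
        best = max(best, stat)
    return best

def change_detected(samples, threshold):
    """Flag a change when the statistic reaches the caller-supplied threshold (an assumption)."""
    return glr_statistic(samples) >= threshold
```

In a bandit loop, a detector of this kind would be run on each arm's reward history, and a detection would trigger a restart of the underlying bandit algorithm.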

preprint · 2020 · arXiv

Classes of Full-Duplex Channels with Capacity Achieved Without Adaptation

Full-duplex communication allows a terminal to transmit and receive signals simultaneously, and hence, it is helpful in general to adapt transmissions to received signals. However, this often requires unaffordable complexity. This work focuses on simple non-adaptive transmission, and provides two classes of channels for which Shannon's information capacity regions are achieved without adaptation. The first is the injective semi-deterministic two-way channel that includes additive channels with various types of noises modeling wireless, coaxial cable, and other settings. The other is the Poisson two-way channel, for which we show that non-adaptive transmission is asymptotically optimal in the high dark current regime.

preprint · 2020 · arXiv

Energy-Reliability Limits in Nanoscale Feedforward Neural Networks and Formulas

Due to energy-efficiency requirements, computational systems are now being implemented using noisy nanoscale semiconductor devices whose reliability depends on energy consumed. We study circuit-level energy-reliability limits for deep feedforward neural networks (multilayer perceptrons) built using such devices, and en route also establish the same limits for formulas (boolean tree-structured circuits). To obtain energy lower bounds, we extend Pippenger's mutual information propagation technique for characterizing the complexity of noisy circuits, since small circuit complexity need not imply low energy. Many device technologies require all gates to have the same electrical operating point; in circuits of such uniform gates, we show that the minimum energy required to achieve any non-trivial reliability scales superlinearly with the number of inputs. Circuits implemented in emerging device technologies like spin electronics can, however, have gates operate at different electrical points; in circuits of such heterogeneous gates, we show energy scaling can be linear in the number of inputs. Building on our extended mutual information propagation technique and using crucial insights from convex optimization theory, we develop an algorithm to compute energy lower bounds for any given boolean tree under heterogeneous gates. This algorithm runs in time linear in the number of gates, and is therefore practical for modern circuit design. As part of our development, we find a simple procedure for energy allocation across circuit gates with different operating points and neural networks with differently-operating layers.

preprint · 2020 · arXiv

Finite-Sample Analysis of Image Registration

We study the problem of image registration in the finite-resolution regime and characterize the error probability of algorithms as a function of properties of the transformation and the image capture noise. Specifically, we define a channel-aware Feinstein decoder to obtain upper bounds on the minimum achievable error probability under finite resolution. We focus on the higher-order terms and use Berry-Esseen type CLTs to obtain a stronger characterization of the achievability condition for the problem. Then, we derive a strong type-counting result to characterize the performance of the MMI decoder in terms of the maximum likelihood decoder, in a simplified setting of the problem. We then describe how this analysis, when related to the results from the channel-aware context, provides a stronger characterization of the finite-sample performance of universal image registration.

preprint · 2020 · arXiv

Human Evaluation of Interpretability: The Case of AI-Generated Music Knowledge

Interpretability of machine learning models has gained more and more attention among researchers in the artificial intelligence (AI) and human-computer interaction (HCI) communities. Most existing work focuses on decision making, whereas we consider knowledge discovery. In particular, we focus on evaluating AI-discovered knowledge/rules in the arts and humanities. From a specific scenario, we present an experimental procedure to collect and assess human-generated verbal interpretations of AI-generated music theory/rules rendered as sophisticated symbolic/numeric objects. Our goal is to reveal both the possibilities and the challenges in such a process of decoding expressive messages from AI sources. We treat this as a first step towards 1) better design of AI representations that are human interpretable and 2) a general methodology to evaluate interpretability of AI-discovered knowledge representations.

preprint · 2020 · arXiv

Limits of Detecting Text Generated by Large-Scale Language Models

Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from considering specific language models to considering maximum likelihood language models, among the class of k-order Markov approximations; error probabilities are characterized. Some discussion of incorporating semantic side information is also given.

preprint · 2020 · arXiv

Nearly Optimal Algorithms for Piecewise-Stationary Cascading Bandits

The cascading bandit (CB) is a popular model for web search and online advertising, where an agent aims to learn the $K$ most attractive items out of a ground set of size $L$ during the interaction with a user. However, the stationary CB model may be too simple to apply to real-world problems, where user preferences may change over time. Considering piecewise-stationary environments, two efficient algorithms, \texttt{GLRT-CascadeUCB} and \texttt{GLRT-CascadeKL-UCB}, are developed and shown to ensure regret upper bounds on the order of $\mathcal{O}(\sqrt{NLT\log{T}})$, where $N$ is the number of piecewise-stationary segments, and $T$ is the number of time slots. At the crux of the proposed algorithms is an almost parameter-free change-point detector, the generalized likelihood ratio test (GLRT). Compared with existing work, the GLRT-based algorithms: i) are free of change-point-dependent information for choosing parameters; ii) have fewer tuning parameters; iii) improve at least the $L$ dependence in regret upper bounds. In addition, we show that the proposed algorithms are optimal (up to a logarithmic factor) in terms of regret by deriving a minimax lower bound on the order of $\Omega(\sqrt{NLT})$ for piecewise-stationary CB. The efficiency of the proposed algorithms relative to state-of-the-art approaches is validated through numerical experiments on both synthetic and real-world datasets.

preprint · 2020 · arXiv

On Multiple-Access in Queue-Length Sensitive Systems

We consider transmission of packets over queue-length sensitive unreliable links, where packets are randomly corrupted through a noisy channel whose transition probabilities are modulated by the queue-length. The goal is to characterize the capacity of this channel. We particularly consider multiple-access systems, where transmitters dispatch encoded symbols over a system that is a superposition of continuous-time $GI_k/GI/1$ queues. A server receives and processes symbols in order of arrivals with queue-length dependent noise. We first determine the capacity of single-user queue-length dependent channels. Further, we characterize the best and worst dispatch processes for $GI/M/1$ queues and the best and worst service processes for $M/GI/1$ queues. Then, the multiple-access channel capacity is obtained using point processes. When the number of transmitters is large and each arrival process is sparse, the superposition of arrivals approaches a Poisson point process. In characterizing the Poisson approximation, we show that the capacity of the multiple-access system converges to that of a single-user $M/GI/1$ queue-length dependent system, and an upper bound on the convergence rate is obtained. This implies that the best and worst server behaviors of single-user $M/GI/1$ queues are preserved in the sparse multiple-access case.

preprint · 2020 · arXiv

Orbit Computation for Atomically Generated Subgroups of Isometries of $\mathbb{Z}^n$

Isometries are ubiquitous in nature; isometries of discrete (quantized) objects---abstracted as the group of isometries of $\mathbb{Z}^n$ denoted by $\mathsf{ISO}(\mathbb{Z}^n)$---are important concepts in the computational world. In this paper, we compute various isometric invariances which mathematically are orbit-computation problems under various isometry-subgroup actions $H \curvearrowright \mathbb{Z}^n, H \leq \mathsf{ISO}(\mathbb{Z}^n)$. One computational challenge here is about the \emph{infinite}: in general, we can have an infinite subgroup acting on $\mathbb{Z}^n$, resulting in possibly an infinite number of orbits of possibly infinite size. In practice, we restrict the set of orbits (a partition of $\mathbb{Z}^n$) to a finite subset $Z \subseteq \mathbb{Z}^n$ (a partition of $Z$), where $Z$ is specified a priori by an application domain or a data set. Our main contribution is an efficient algorithm to solve this \emph{restricted} orbit-computation problem in the special case of \emph{atomically generated subgroups}---a new notion partially motivated from interpretable AI. The atomic property is key to preserving the \emph{semidirect-product structure}---the core structure we leverage to make our algorithm outperform generic approaches. Besides algorithmic merit, our approach enables \emph{parallel-computing} implementations in many subroutines, which can further benefit from hardware boosts. Moreover, our algorithm works efficiently for \emph{any} finite subset ($Z$) regardless of the shape (continuous/discrete, (non)convex) or location; so it is application-independent.

preprint · 2020 · arXiv

Respect for Human Autonomy in Recommender Systems

Recommender systems can influence human behavior in significant ways, in some cases making people more machine-like. In this sense, recommender systems may be deleterious to notions of human autonomy. Many ethical systems point to respect for human autonomy as a key principle arising from human rights considerations, and several emerging frameworks for AI include this principle. Yet, no specific formalization has been defined. Separately, self-determination theory shows that autonomy is an innate psychological need for people, and moreover has a significant body of experimental work that formalizes and measures the level of human autonomy. In this position paper, we argue that there is a need to specifically operationalize respect for human autonomy in the context of recommender systems. Moreover, we argue that such an operational definition can be developed based on well-established approaches from experimental psychology, which can then be used to design future recommender systems that respect human autonomy.

preprint · 2020 · arXiv

Universal and Succinct Source Coding of Deep Neural Networks

Deep neural networks have shown incredible performance for inference tasks in a variety of domains. Unfortunately, most current deep networks are enormous cloud-based structures that require significant storage space, which limits scaling of deep learning as a service (DLaaS) and use for on-device intelligence. This paper is concerned with finding universal lossless compressed representations of deep feedforward networks with synaptic weights drawn from discrete sets, and directly performing inference without full decompression. The basic insight that allows less rate than naive approaches is recognizing that the bipartite graph layers of feedforward networks have a kind of permutation invariance to the labeling of nodes, in terms of inferential operation. We provide efficient algorithms to dissipate this irrelevant uncertainty and then use arithmetic coding to nearly achieve the entropy bound in a universal manner. We also provide experimental results of our approach on several standard datasets.
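
The permutation invariance mentioned in this abstract can be checked on a toy network. The sketch below assumes a two-layer feedforward network with integer weights and ReLU activations; the helper names `forward` and `relabel_hidden` are hypothetical and this is not the paper's compression algorithm, only a demonstration that relabeling hidden nodes leaves the computed function unchanged.

```python
def relu(x):
    return x if x > 0 else 0

def forward(x, w_in, w_out):
    """Two-layer feedforward pass with integer weights.

    w_in[j] holds the input weights of hidden node j;
    w_out[j] holds the output weights of hidden node j.
    """
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    n_out = len(w_out[0])
    return [sum(hidden[j] * w_out[j][o] for j in range(len(hidden)))
            for o in range(n_out)]

def relabel_hidden(w_in, w_out, perm):
    """Relabel hidden nodes so that new node i is old node perm[i]."""
    return [w_in[j] for j in perm], [w_out[j] for j in perm]
```

Since every relabeling of the hidden nodes yields the same input-output map, the specific labeling carries no information needed for inference, which is the redundancy a compressed representation can discard.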