Source author record

Hanie Sedghi

Hanie Sedghi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Neural and Evolutionary Computing Artificial Intelligence Computer Vision math.ST Statistics Theory math.OC Social and Information Networks Systems and Control

Catalog footprint

What is connected

16works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

Enhancing LLM Planning Capabilities through Intrinsic Self-Critique

We demonstrate an approach for LLMs to critique their \emph{own} answers with the goal of enhancing their performance that leads to significant improvements over established planning benchmarks. Despite the findings of earlier research that has cast doubt on the effectiveness of LLMs leveraging self critique methods, we show significant performance gains on planning datasets in the Blocksworld domain through intrinsic self-critique, without external source such as a verifier. We also demonstrate similar improvements on Logistics and Mini-grid datasets, exceeding strong baseline accuracies. We employ a few-shot learning technique and progressively extend it to a many-shot approach as our base method and demonstrate that it is possible to gain substantial improvement on top of this already competitive approach by employing an iterative process for correction and refinement. We illustrate how self-critique can significantly boost planning performance. Our empirical results present new state-of-the-art on the class of models considered, namely LLM model checkpoints from October 2024. Our primary focus lies on the method itself, demonstrating intrinsic self-improvement capabilities that are applicable regardless of the specific model version, and we believe that applying our method to more complex search techniques and more capable models will lead to even better performance.

preprint2022arXiv

The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks

In this paper, we conjecture that if the permutation invariance of neural networks is taken into account, SGD solutions will likely have no barrier in the linear interpolation between them. Although it is a bold conjecture, we show how extensive empirical attempts fall short of refuting it. We further provide a preliminary theoretical result to support our conjecture. Our conjecture has implications for lottery ticket hypothesis, distributed training, and ensemble methods.

preprint2022arXiv

Understanding the effect of sparsity on neural networks robustness

This paper examines the impact of static sparsity on the robustness of a trained network to weight perturbations, data corruption, and adversarial examples. We show that, up to a certain sparsity achieved by increasing network width and depth while keeping the network capacity fixed, sparsified networks consistently match and often outperform their initially dense versions. Robustness and accuracy decline simultaneously for very high sparsity due to loose connectivity between network layers. Our findings show that a rapid robustness drop caused by network compression observed in the literature is due to a reduced network capacity rather than sparsity.

preprint2021arXiv

The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers

We propose a new framework for reasoning about generalization in deep learning. The core idea is to couple the Real World, where optimizers take stochastic gradient steps on the empirical loss, to an Ideal World, where optimizers take steps on the population loss. This leads to an alternate decomposition of test error into: (1) the Ideal World test error plus (2) the gap between the two worlds. If the gap (2) is universally small, this reduces the problem of generalization in offline learning to the problem of optimization in online learning. We then give empirical evidence that this gap between worlds can be small in realistic deep learning settings, in particular supervised image classification. For example, CNNs generalize better than MLPs on image distributions in the Real World, but this is "because" they optimize faster on the population loss in the Ideal World. This suggests our framework is a useful tool for understanding generalization in deep learning, and lays a foundation for future research in the area.

preprint2021arXiv

What is being transferred in transfer learning?

One desired capability for machines is the ability to transfer their knowledge of one domain to another where data is (usually) scarce. Despite ample adaptation of transfer learning in various deep learning applications, we yet do not understand what enables a successful transfer and which part of the network is responsible for that. In this paper, we provide new tools and analyses to address these fundamental questions. Through a series of analyses on transferring to block-shuffled images, we separate the effect of feature reuse from learning low-level statistics of data and show that some benefit of transfer learning comes from the latter. We present that when training from pre-trained weights, the model stays in the same basin in the loss landscape and different instances of such model are similar in feature space and close in parameter space.

preprint2020arXiv

Generalization bounds for deep convolutional neural networks

We prove bounds on the generalization error of convolutional networks. The bounds are in terms of the training loss, the number of parameters, the Lipschitz constant of the loss and the distance from the weights to the initial weights. They are independent of the number of pixels in the input, and the height and width of hidden feature maps. We present experiments using CIFAR-10 with varying hyperparameters of a deep convolutional network, comparing our bounds with practical generalization gaps.

preprint2020arXiv

The intriguing role of module criticality in the generalization of deep networks

We study the phenomenon that some modules of deep neural networks (DNNs) are more critical than others. Meaning that rewinding their parameter values back to initialization, while keeping other modules fixed at the trained parameters, results in a large drop in the network's performance. Our analysis reveals interesting properties of the loss landscape which leads us to propose a complexity measure, called module criticality, based on the shape of the valleys that connects the initial and final values of the module parameters. We formulate how generalization relates to the module criticality, and show that this measure is able to explain the superior generalization performance of some architectures over others, whereas earlier measures fail to do so.

preprint2016arXiv

Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the function and the input for learnability. Our training method is based on tensor decomposition, which provably converges to the global optimum, under a set of mild non-degeneracy conditions. It consists of simple embarrassingly parallel linear and multi-linear operations, and is competitive with standard stochastic gradient descent (SGD), in terms of computational complexity. Thus, we propose a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer.

preprint2016arXiv

Provable Tensor Methods for Learning Mixtures of Generalized Linear Models

We consider the problem of learning mixtures of generalized linear models (GLM) which arise in classification and regression problems. Typical learning approaches such as expectation maximization (EM) or variational Bayes can get stuck in spurious local optima. In contrast, we present a tensor decomposition method which is guaranteed to correctly recover the parameters. The key insight is to employ certain feature transformations of the input, which depend on the input generative model. Specifically, we employ score function tensors of the input and compute their cross-correlation with the response variable. We establish that the decomposition of this tensor consistently recovers the parameters, under mild non-degeneracy conditions. We demonstrate that the computational and sample complexity of our method is a low order polynomial of the input and the latent dimensions.

preprint2016arXiv

Training Input-Output Recurrent Neural Networks through Spectral Methods

We consider the problem of training input-output recurrent neural networks (RNN) for sequence labeling tasks. We propose a novel spectral approach for learning the network parameters. It is based on decomposition of the cross-moment tensor between the output and a non-linear transformation of the input, based on score functions. We guarantee consistent learning with polynomial sample and computational complexity under transparent conditions such as non-degeneracy of model parameters, polynomial activations for the neurons, and a Markovian evolution of the input sequence. We also extend our results to Bidirectional RNN which uses both previous and future information to output the label at each time point, and is employed in many NLP tasks such as POS tagging.

preprint2015arXiv

Learning Mixed Membership Community Models in Social Tagging Networks through Tensor Methods

Community detection in graphs has been extensively studied both in theory and in applications. However, detecting communities in hypergraphs is more challenging. In this paper, we propose a tensor decomposition approach for guaranteed learning of communities in a special class of hypergraphs modeling social tagging systems or folksonomies. A folksonomy is a tripartite 3-uniform hypergraph consisting of (user, tag, resource) hyperedges. We posit a probabilistic mixed membership community model, and prove that the tensor method consistently learns the communities under efficient sample complexity and separation requirements.

preprint2015arXiv

Multi-Step Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Noisy Matrix Decomposition

We propose an efficient ADMM method with guarantees for high-dimensional problems. We provide explicit bounds for the sparse optimization problem and the noisy matrix decomposition problem. For sparse optimization, we establish that the modified ADMM method has an optimal convergence rate of $\mathcal{O}(s\log d/T)$, where $s$ is the sparsity level, $d$ is the data dimension and $T$ is the number of steps. This matches with the minimax lower bounds for sparse estimation. For matrix decomposition into sparse and low rank components, we provide the first guarantees for any online method, and prove a convergence rate of $\tilde{\mathcal{O}}((s+r)β^2(p) /T) + \mathcal{O}(1/p)$ for a $p\times p$ matrix, where $s$ is the sparsity level, $r$ is the rank and $Θ(\sqrt{p})\leq β(p)\leq Θ(p)$. Our guarantees match the minimax lower bound with respect to $s,r$ and $T$. In addition, we match the minimax lower bound with respect to the matrix dimension $p$, i.e. $β(p)=Θ(\sqrt{p})$, for many important statistical models including the independent noise model, the linear Bayesian network and the latent Gaussian graphical model under some conditions. Our ADMM method is based on epoch-based annealing and consists of inexpensive steps which involve projections on to simple norm balls. Experiments show that for both sparse optimization and matrix decomposition problems, our algorithm outperforms the state-of-the-art methods. In particular, we reach higher accuracy with same time complexity.

preprint2015arXiv

Provable Methods for Training Neural Networks with Sparse Connectivity

We provide novel guaranteed approaches for training feedforward neural networks with sparse connectivity. We leverage on the techniques developed previously for learning linear networks and show that they can also be effectively adopted to learn non-linear networks. We operate on the moments involving label and the score function of the input, and show that their factorization provably yields the weight matrix of the first layer of a deep network under mild conditions. In practice, the output of our method can be employed as effective initializers for gradient descent.

preprint2015arXiv

Score Function Features for Discriminative Learning

Feature learning forms the cornerstone for tackling challenging learning problems in domains such as speech, computer vision and natural language processing. In this paper, we consider a novel class of matrix and tensor-valued features, which can be pre-trained using unlabeled samples. We present efficient algorithms for extracting discriminative information, given these pre-trained features and labeled samples for any related task. Our class of features are based on higher-order score functions, which capture local variations in the probability density function of the input. We establish a theoretical framework to characterize the nature of discriminative information that can be extracted from score-function features, when used in conjunction with labeled samples. We employ efficient spectral decomposition algorithms (on matrices and tensors) for extracting discriminative components. The advantage of employing tensor-valued features is that we can extract richer discriminative information in the form of an overcomplete representations. Thus, we present a novel framework for employing generative models of the input for discriminative learning.

preprint2014arXiv

Score Function Features for Discriminative Learning: Matrix and Tensor Framework

preprint2014arXiv

Statistical Structure Learning, Towards a Robust Smart Grid

Robust control and maintenance of the grid relies on accurate data. Both PMUs and state estimators are prone to false data injection attacks. Thus, it is crucial to have a mechanism for fast and accurate detection of an agent maliciously tampering with the data---for both preventing attacks that may lead to blackouts, and for routine monitoring and control tasks of current and future grids. We propose a decentralized false data injection detection scheme based on Markov graph of the bus phase angles. We utilize the Conditional Covariance Test (CCT) to learn the structure of the grid. Using the DC power flow model, we show that under normal circumstances, and because of walk-summability of the grid graph, the Markov graph of the voltage angles can be determined by the power grid graph. Therefore, a discrepancy between calculated Markov graph and learned structure should trigger the alarm. Local grid topology is available online from the protection system and we exploit it to check for mismatch. Should a mismatch be detected, we use correlation anomaly score to detect the set of attacked nodes. Our method can detect the most recent stealthy deception attack on the power grid that assumes knowledge of bus-branch model of the system and is capable of deceiving the state estimator, damaging power network observatory, control, monitoring, demand response and pricing schemes. Specifically, under the stealthy deception attack, the Markov graph of phase angles changes. In addition to detect a state of attack, our method can detect the set of attacked nodes. To the best of our knowledge, our remedy is the first to comprehensively detect this sophisticated attack and it does not need additional hardware. Moreover, our detection scheme is successful no matter the size of the attacked subset. Simulation of various power networks confirms our claims.

Hanie Sedghi

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

Enhancing LLM Planning Capabilities through Intrinsic Self-Critique

The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks

Understanding the effect of sparsity on neural networks robustness

The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers

What is being transferred in transfer learning?

Generalization bounds for deep convolutional neural networks

The intriguing role of module criticality in the generalization of deep networks

Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

Provable Tensor Methods for Learning Mixtures of Generalized Linear Models

Training Input-Output Recurrent Neural Networks through Spectral Methods

Learning Mixed Membership Community Models in Social Tagging Networks through Tensor Methods

Multi-Step Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Noisy Matrix Decomposition

Provable Methods for Training Neural Networks with Sparse Connectivity

Score Function Features for Discriminative Learning

Score Function Features for Discriminative Learning: Matrix and Tensor Framework

Statistical Structure Learning, Towards a Robust Smart Grid