Researcher profile

Benyamin Ghojogh

Benyamin Ghojogh contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
21works
0followers
13topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

21 published item(s)

preprint2026arXiv

Foundations of Riemannian Geometry for Riemannian Optimization: A Monograph with Detailed Derivations

Riemannian geometry provides the fundamental framework for optimization on nonlinear spaces such as matrix manifolds, which arise in machine learning, signal processing, and robotics. While the underlying theory is classical, existing literature often presents results at a high level of abstraction, omitting the detailed coordinate-level derivations required for implementation and algorithm development. This work provides a self-contained and rigorous treatment of the foundations of Riemannian geometry, with a focus on explicit derivations tailored to Riemannian optimization. We systematically develop the key geometric structures -- including tangent and cotangent spaces, tensor calculus, metric tensors, Levi-Civita connections, curvature, and geodesics -- emphasizing step-by-step derivations in coordinates and matrix form. Building on these foundations, we derive the Riemannian gradient, Hessian, exponential map, and retraction in a form suitable for numerical computation. We further specialize these constructions to important matrix manifolds, including the Stiefel, Grassmann, and SPD (Symmetric Positive Definite) manifolds, providing explicit formulas widely used in optimization and geometric machine learning. This monograph develops a unified and implementation-oriented treatment of Riemannian geometry for optimization on manifolds. Its main contribution is the systematic organization and detailed derivation of classical geometric constructions in forms directly usable for algorithm design and numerical implementation. By connecting coordinate-level differential geometry with matrix-manifold formulas, the monograph bridges the gap between abstract theory and practical computation, and provides a reference for researchers and practitioners working in Riemannian optimization and related fields.

preprint2026arXiv

Wittgenstein's Family Resemblance Clustering Algorithm

This paper, introducing a novel method in philomatics, draws on Wittgenstein's concept of family resemblance from analytic philosophy to develop a clustering algorithm for machine learning. According to Wittgenstein's Philosophical Investigations (1953), family resemblance holds that members of a concept or category are connected by overlapping similarities rather than a single defining property. Consequently, a family of entities forms a chain of items sharing overlapping traits. This philosophical idea naturally lends itself to a graph-based approach in machine learning. Accordingly, we propose the Wittgenstein's Family Resemblance (WFR) clustering algorithm and its kernel variant, kernel WFR. This algorithm computes resemblance scores between neighboring data instances, and after thresholding these scores, a resemblance graph is constructed. The connected components of this graph define the resulting clusters. Simulations on benchmark datasets demonstrate that WFR is an effective nonlinear clustering algorithm that does not require prior knowledge of the number of clusters or assumptions about their shapes.

preprint2022arXiv

Affective Manifolds: Modeling Machine's Mind to Like, Dislike, Enjoy, Suffer, Worry, Fear, and Feel Like A Human

After the development of different machine learning and manifold learning algorithms, it may be a good time to put them together to make a powerful mind for machine. In this work, we propose affective manifolds as components of a machine's mind. Every affective manifold models a characteristic group of mind and contains multiple states. We define the machine's mind as a set of affective manifolds. We use a learning model for mapping the input signals to the embedding space of affective manifold. Using this mapping, a machine or a robot takes an input signal and can react emotionally to it. We use deep metric learning, with Siamese network, and propose a loss function for affective manifold learning. We define margins between states based on the psychological and philosophical studies. Using triplets of instances, we train the network to minimize the variance of every state and have the desired distances between states. We show that affective manifolds can have various applications for machine-machine and human-machine interactions. Some simulations are also provided for verification of the proposed method. It is possible to have as many affective manifolds as required in machine's mind. More affective manifolds in the machine's mind can make it more realistic and effective. This paper opens the door; we invite the researchers from various fields of science to propose more affective manifolds to be inserted in machine's mind.

preprint2022arXiv

Factor Analysis, Probabilistic Principal Component Analysis, Variational Inference, and Variational Autoencoder: Tutorial and Survey

This is a tutorial and survey paper on factor analysis, probabilistic Principal Component Analysis (PCA), variational inference, and Variational Autoencoder (VAE). These methods, which are tightly related, are dimensionality reduction and generative models. They assume that every data point is generated from or caused by a low-dimensional latent factor. By learning the parameters of distribution of latent space, the corresponding low-dimensional factors are found for the sake of dimensionality reduction. For their stochastic and generative behaviour, these models can also be used for generation of new data points in the data space. In this paper, we first start with variational inference where we derive the Evidence Lower Bound (ELBO) and Expectation Maximization (EM) for learning the parameters. Then, we introduce factor analysis, derive its joint and marginal distributions, and work out its EM steps. Probabilistic PCA is then explained, as a special case of factor analysis, and its closed-form solutions are derived. Finally, VAE is explained where the encoder, decoder and sampling from the latent space are introduced. Training VAE using both EM and backpropagation are explained.

preprint2022arXiv

Fisher and Kernel Fisher Discriminant Analysis: Tutorial

This is a detailed tutorial paper which explains the Fisher discriminant Analysis (FDA) and kernel FDA. We start with projection and reconstruction. Then, one- and multi-dimensional FDA subspaces are covered. Scatters in two- and then multi-classes are explained in FDA. Then, we discuss on the rank of the scatters and the dimensionality of the subspace. A real-life example is also provided for interpreting FDA. Then, possible singularity of the scatter is discussed to introduce robust FDA. PCA and FDA directions are also compared. We also prove that FDA and linear discriminant analysis are equivalent. Fisher forest is also introduced as an ensemble of fisher subspaces useful for handling data with different features and dimensionality. Afterwards, kernel FDA is explained for both one- and multi-dimensional subspaces with both two- and multi-classes. Finally, some simulations are performed on AT&T face dataset to illustrate FDA and compare it with PCA.

preprint2022arXiv

Laplacian-Based Dimensionality Reduction Including Spectral Clustering, Laplacian Eigenmap, Locality Preserving Projection, Graph Embedding, and Diffusion Map: Tutorial and Survey

This is a tutorial and survey paper for nonlinear dimensionality and feature extraction methods which are based on the Laplacian of graph of data. We first introduce adjacency matrix, definition of Laplacian matrix, and the interpretation of Laplacian. Then, we cover the cuts of graph and spectral clustering which applies clustering in a subspace of data. Different optimization variants of Laplacian eigenmap and its out-of-sample extension are explained. Thereafter, we introduce the locality preserving projection and its kernel variant as linear special cases of Laplacian eigenmap. Versions of graph embedding are then explained which are generalized versions of Laplacian eigenmap and locality preserving projection. Finally, diffusion map is introduced which is a method based on Laplacian of data and random walks on the data graph.

preprint2022arXiv

Offline versus Online Triplet Mining based on Extreme Distances of Histopathology Patches

We analyze the effect of offline and online triplet mining for colorectal cancer (CRC) histopathology dataset containing 100,000 patches. We consider the extreme, i.e., farthest and nearest patches to a given anchor, both in online and offline mining. While many works focus solely on selecting the triplets online (batch-wise), we also study the effect of extreme distances and neighbor patches before training in an offline fashion. We analyze extreme cases' impacts in terms of embedding distance for offline versus online mining, including easy positive, batch semi-hard, batch hard triplet mining, neighborhood component analysis loss, its proxy version, and distance weighted sampling. We also investigate online approaches based on extreme distance and comprehensively compare offline, and online mining performance based on the data patterns and explain offline mining as a tractable generalization of the online mining with large mini-batch size. As well, we discuss the relations of different colorectal tissue types in terms of extreme distances. We found that offline and online mining approaches have comparable performances for a specific architecture, such as ResNet-18 in this study. Moreover, we found the assorted case, including different extreme distances, is promising, especially in the online approach.

preprint2022arXiv

On Manifold Hypothesis: Hypersurface Submanifold Embedding Using Osculating Hyperspheres

Consider a set of $n$ data points in the Euclidean space $\mathbb{R}^d$. This set is called dataset in machine learning and data science. Manifold hypothesis states that the dataset lies on a low-dimensional submanifold with high probability. All dimensionality reduction and manifold learning methods have the assumption of manifold hypothesis. In this paper, we show that the dataset lies on an embedded hypersurface submanifold which is locally $(d-1)$-dimensional. Hence, we show that the manifold hypothesis holds at least for the embedding dimensionality $d-1$. Using an induction in a pyramid structure, we also extend the embedding dimensionality to lower embedding dimensionalities to show the validity of manifold hypothesis for embedding dimensionalities $\{1, 2, \dots, d-1\}$. For embedding the hypersurface, we first construct the $d$ nearest neighbors graph for data. For every point, we fit an osculating hypersphere $S^{d-1}$ using its neighbors where this hypersphere is osculating to a hypothetical hypersurface. Then, using surgery theory, we apply surgery on the osculating hyperspheres to obtain $n$ hyper-caps. We connect the hyper-caps to one another using partial hyper-cylinders. By connecting all parts, the embedded hypersurface is obtained as the disjoint union of these elements. We discuss the geometrical characteristics of the embedded hypersurface, such as having boundary, its topology, smoothness, boundedness, orientability, compactness, and injectivity. Some discussion are also provided for the linearity and structure of data. This paper is the intersection of several fields of science including machine learning, differential geometry, and algebraic topology.

preprint2022arXiv

Restricted Boltzmann Machine and Deep Belief Network: Tutorial and Survey

This is a tutorial and survey paper on Boltzmann Machine (BM), Restricted Boltzmann Machine (RBM), and Deep Belief Network (DBN). We start with the required background on probabilistic graphical models, Markov random field, Gibbs sampling, statistical physics, Ising model, and the Hopfield network. Then, we introduce the structures of BM and RBM. The conditional distributions of visible and hidden variables, Gibbs sampling in RBM for generating variables, training BM and RBM by maximum likelihood estimation, and contrastive divergence are explained. Then, we discuss different possible discrete and continuous distributions for the variables. We introduce conditional RBM and how it is trained. Finally, we explain deep belief network as a stack of RBM models. This paper on Boltzmann machines can be useful in various fields including data science, statistics, neural computation, and statistical physics.

preprint2022arXiv

Spectral, Probabilistic, and Deep Metric Learning: Tutorial and Survey

This is a tutorial and survey paper on metric learning. Algorithms are divided into spectral, probabilistic, and deep metric learning. We first start with the definition of distance metric, Mahalanobis distance, and generalized Mahalanobis distance. In spectral methods, we start with methods using scatters of data, including the first spectral metric learning, relevant methods to Fisher discriminant analysis, Relevant Component Analysis (RCA), Discriminant Component Analysis (DCA), and the Fisher-HSIC method. Then, large-margin metric learning, imbalanced metric learning, locally linear metric adaptation, and adversarial metric learning are covered. We also explain several kernel spectral methods for metric learning in the feature space. We also introduce geometric metric learning methods on the Riemannian manifolds. In probabilistic methods, we start with collapsing classes in both input and feature spaces and then explain the neighborhood component analysis methods, Bayesian metric learning, information theoretic methods, and empirical risk minimization in metric learning. In deep learning methods, we first introduce reconstruction autoencoders and supervised loss functions for metric learning. Then, Siamese networks and its various loss functions, triplet mining, and triplet sampling are explained. Deep discriminant analysis methods, based on Fisher discriminant analysis, are also reviewed. Finally, we introduce multi-modal deep metric learning, geometric metric learning by neural networks, and few-shot metric learning.

preprint2022arXiv

Stochastic Neighbor Embedding with Gaussian and Student-t Distributions: Tutorial and Survey

Stochastic Neighbor Embedding (SNE) is a manifold learning and dimensionality reduction method with a probabilistic approach. In SNE, every point is consider to be the neighbor of all other points with some probability and this probability is tried to be preserved in the embedding space. SNE considers Gaussian distribution for the probability in both the input and embedding spaces. However, t-SNE uses the Student-t and Gaussian distributions in these spaces, respectively. In this tutorial and survey paper, we explain SNE, symmetric SNE, t-SNE (or Cauchy-SNE), and t-SNE with general degrees of freedom. We also cover the out-of-sample extension and acceleration for these methods.

preprint2022arXiv

Theoretical Connection between Locally Linear Embedding, Factor Analysis, and Probabilistic PCA

Locally Linear Embedding (LLE) is a nonlinear spectral dimensionality reduction and manifold learning method. It has two main steps which are linear reconstruction and linear embedding of points in the input space and embedding space, respectively. In this work, we look at the linear reconstruction step from a stochastic perspective where it is assumed that every data point is conditioned on its linear reconstruction weights as latent factors. The stochastic linear reconstruction of LLE is solved using expectation maximization. We show that there is a theoretical connection between three fundamental dimensionality reduction methods, i.e., LLE, factor analysis, and probabilistic Principal Component Analysis (PCA). The stochastic linear reconstruction of LLE is formulated similar to the factor analysis and probabilistic PCA. It is also explained why factor analysis and probabilistic PCA are linear and LLE is a nonlinear method. This work combines and makes a bridge between two broad approaches of dimensionality reduction, i.e., the spectral and probabilistic algorithms.

preprint2022arXiv

Unified Framework for Spectral Dimensionality Reduction, Maximum Variance Unfolding, and Kernel Learning By Semidefinite Programming: Tutorial and Survey

This is a tutorial and survey paper on unification of spectral dimensionality reduction methods, kernel learning by Semidefinite Programming (SDP), Maximum Variance Unfolding (MVU) or Semidefinite Embedding (SDE), and its variants. We first explain how the spectral dimensionality reduction methods can be unified as kernel Principal Component Analysis (PCA) with different kernels. This unification can be interpreted as eigenfunction learning or representation of kernel in terms of distance matrix. Then, since the spectral methods are unified as kernel PCA, we say let us learn the best kernel for unfolding the manifold of data to its maximum variance. We first briefly introduce kernel learning by SDP for the transduction task. Then, we explain MVU in detail. Various versions of supervised MVU using nearest neighbors graph, by class-wise unfolding, by Fisher criterion, and by colored MVU are explained. We also explain out-of-sample extension of MVU using eigenfunctions and kernel mapping. Finally, we introduce other variants of MVU including action respecting embedding, relaxed MVU, and landmark MVU for big data.

preprint2022arXiv

Unsupervised and Supervised Principal Component Analysis: Tutorial

This is a detailed tutorial paper which explains the Principal Component Analysis (PCA), Supervised PCA (SPCA), kernel PCA, and kernel SPCA. We start with projection, PCA with eigen-decomposition, PCA with one and multiple projection directions, properties of the projection matrix, reconstruction error minimization, and we connect to autoencoder. Then, PCA with singular value decomposition, dual PCA, and kernel PCA are covered. SPCA using both scoring and Hilbert-Schmidt independence criterion are explained. Kernel SPCA using both direct and dual approaches are then introduced. We cover all cases of projection and reconstruction of training and out-of-sample data. Finally, some simulations are provided on Frey and AT&T face datasets for verifying the theory in practice.

preprint2020arXiv

Anomaly Detection and Prototype Selection Using Polyhedron Curvature

We propose a novel approach to anomaly detection called Curvature Anomaly Detection (CAD) and Kernel CAD based on the idea of polyhedron curvature. Using the nearest neighbors for a point, we consider every data point as the vertex of a polyhedron where the more anomalous point has more curvature. We also propose inverse CAD (iCAD) and Kernel iCAD for instance ranking and prototype selection by looking at CAD from an opposite perspective. We define the concept of anomaly landscape and anomaly path and we demonstrate an application for it which is image denoising. The proposed methods are straightforward and easy to implement. Our experiments on different benchmarks show that the proposed methods are effective for anomaly detection and prototype selection.

preprint2020arXiv

Backprojection for Training Feedforward Neural Networks in the Input and Feature Spaces

After the tremendous development of neural networks trained by backpropagation, it is a good time to develop other algorithms for training neural networks to gain more insights into networks. In this paper, we propose a new algorithm for training feedforward neural networks which is fairly faster than backpropagation. This method is based on projection and reconstruction where, at every layer, the projected data and reconstructed labels are forced to be similar and the weights are tuned accordingly layer by layer. The proposed algorithm can be used for both input and feature spaces, named as backprojection and kernel backprojection, respectively. This algorithm gives an insight to networks with a projection-based perspective. The experiments on synthetic datasets show the effectiveness of the proposed method.

preprint2020arXiv

Multidimensional Scaling, Sammon Mapping, and Isomap: Tutorial and Survey

Multidimensional Scaling (MDS) is one of the first fundamental manifold learning methods. It can be categorized into several methods, i.e., classical MDS, kernel classical MDS, metric MDS, and non-metric MDS. Sammon mapping and Isomap can be considered as special cases of metric MDS and kernel classical MDS, respectively. In this tutorial and survey paper, we review the theory of MDS, Sammon mapping, and Isomap in detail. We explain all the mentioned categories of MDS. Then, Sammon mapping, Isomap, and kernel Isomap are explained. Out-of-sample embedding for MDS and Isomap using eigenfunctions and kernel mapping are introduced. Then, Nystrom approximation and its use in landmark MDS and landmark Isomap are introduced for big data embedding. We also provide some simulations for illustrating the embedding by these methods.

preprint2020arXiv

Roweisposes, Including Eigenposes, Supervised Eigenposes, and Fisherposes, for 3D Action Recognition

Human action recognition is one of the important fields of computer vision and machine learning. Although various methods have been proposed for 3D action recognition, some of which are basic and some use deep learning, the need of basic methods based on generalized eigenvalue problem is sensed for action recognition. This need is especially sensed because of having similar basic methods in the field of face recognition such as eigenfaces and Fisherfaces. In this paper, we propose Roweisposes which uses Roweis discriminant analysis for generalized subspace learning. This method includes Fisherposes, eigenposes, supervised eigenposes, and double supervised eigenposes as its special cases. Roweisposes is a family of infinite number of action recongition methods which learn a discriminative subspace for embedding the body poses. Experiments on the TST, UTKinect, and UCFKinect datasets verify the effectiveness of the proposed method for action recognition.

preprint2020arXiv

Supervision and Source Domain Impact on Representation Learning: A Histopathology Case Study

As many algorithms depend on a suitable representation of data, learning unique features is considered a crucial task. Although supervised techniques using deep neural networks have boosted the performance of representation learning, the need for a large set of labeled data limits the application of such methods. As an example, high-quality delineations of regions of interest in the field of pathology is a tedious and time-consuming task due to the large image dimensions. In this work, we explored the performance of a deep neural network and triplet loss in the area of representation learning. We investigated the notion of similarity and dissimilarity in pathology whole-slide images and compared different setups from unsupervised and semi-supervised to supervised learning in our experiments. Additionally, different approaches were tested, applying few-shot learning on two publicly available pathology image datasets. We achieved high accuracy and generalization when the learned representations were applied to two different pathology datasets.

preprint2020arXiv

Theoretical Insights into the Use of Structural Similarity Index In Generative Models and Inferential Autoencoders

Generative models and inferential autoencoders mostly make use of $\ell_2$ norm in their optimization objectives. In order to generate perceptually better images, this short paper theoretically discusses how to use Structural Similarity Index (SSIM) in generative models and inferential autoencoders. We first review SSIM, SSIM distance metrics, and SSIM kernel. We show that the SSIM kernel is a universal kernel and thus can be used in unconditional and conditional generated moment matching networks. Then, we explain how to use SSIM distance in variational and adversarial autoencoders and unconditional and conditional Generative Adversarial Networks (GANs). Finally, we propose to use SSIM distance rather than $\ell_2$ norm in least squares GAN.

preprint2020arXiv

Weighted Fisher Discriminant Analysis in the Input and Feature Spaces

Fisher Discriminant Analysis (FDA) is a subspace learning method which minimizes and maximizes the intra- and inter-class scatters of data, respectively. Although, in FDA, all the pairs of classes are treated the same way, some classes are closer than the others. Weighted FDA assigns weights to the pairs of classes to address this shortcoming of FDA. In this paper, we propose a cosine-weighted FDA as well as an automatically weighted FDA in which weights are found automatically. We also propose a weighted FDA in the feature space to establish a weighted kernel FDA for both existing and newly proposed weights. Our experiments on the ORL face recognition dataset show the effectiveness of the proposed weighting schemes.