Source author record

Johan Karlsson

Johan Karlsson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

math.OC; Information Theory; math.IT; Machine Learning; math.FA; Systems and Control; Computer Vision; Distributed, Parallel, and Cluster Computing; eess.SP; eess.SY; math.ST; Statistics Theory

Catalog footprint

What is connected

13 works

12 topics

4 close collaborators

preprint, 2022, arXiv

Orthogonalization of data via Gromov-Wasserstein type feedback for clustering and visualization

In this paper we propose an adaptive approach for clustering and visualization of data by an orthogonalization process. Starting with the data points being represented by a Markov process using the diffusion map framework, the method adaptively increases the orthogonality of the clusters by applying a feedback mechanism inspired by the Gromov-Wasserstein distance. This mechanism iteratively increases the spectral gap and refines the orthogonality of the data to achieve a clustering with high specificity. By using the diffusion map framework and representing the relation between data points using transition probabilities, the method is robust with respect to the underlying distance, noise in the data, and random initialization. We prove that the method converges globally to a unique fixed point for certain parameter values. We also propose a related approach where the transition probabilities in the Markov process are required to be doubly stochastic, in which case the method generates a minimizer to a nonconvex optimization problem. We apply the method to cryo-electron microscopy image data from biopharmaceutical manufacturing where we can confirm biologically relevant insights related to therapeutic efficacy. We consider an example with morphological variations of gene packaging and confirm that the method produces biologically meaningful clustering results consistent with human expert classification.

preprint, 2022, arXiv

VidHarm: A Clip Based Dataset for Harmful Content Detection

Automatically identifying harmful content in video is an important task with a wide range of applications. However, there is a lack of professionally labeled open datasets available. In this work VidHarm, an open dataset of 3589 video clips from film trailers annotated by professionals, is presented. An analysis of the dataset is performed, revealing among other things the relation between clip and trailer level annotations. Audiovisual models are trained on the dataset and an in-depth study of modeling choices is conducted. The results show that performance is greatly improved by combining the visual and audio modalities, pre-training on large-scale video recognition datasets, and class-balanced sampling. Lastly, biases of the trained models are investigated using discrimination probing. VidHarm is openly available, and further details are available at: https://vidharm.github.io

preprint, 2020, arXiv

Incremental inference of collective graphical models

We consider incremental inference problems from aggregate data for collective dynamics. In particular, we address the problem of estimating the aggregate marginals of a Markov chain from noisy aggregate observations in an incremental (online) fashion. We propose a sliding window Sinkhorn belief propagation (SW-SBP) algorithm that utilizes a sliding window filter of the most recent noisy aggregate observations along with encoded information from discarded observations. Our algorithm is built upon the recently proposed multi-marginal optimal transport based SBP algorithm, which leverages standard belief propagation and the Sinkhorn algorithm to solve inference problems from aggregate data. We demonstrate the performance of our algorithm on applications such as inferring population flow from aggregate observations.

preprint, 2020, arXiv

M$^2$-Spectral Estimation: A Relative Entropy Approach

This paper deals with M$^2$-signals, namely multivariate (or vector-valued) signals defined over a multidimensional domain. In particular, we propose an optimization technique to solve the covariance extension problem for stationary random vector fields. The multidimensional Itakura-Saito distance is employed as an optimization criterion to select the solution among the spectra satisfying a finite number of moment constraints. In order to avoid technicalities that may happen on the boundary of the feasible set, we deal with the discrete version of the problem where the multidimensional integrals are approximated by Riemann sums. The spectrum solution is also discrete, which occurs naturally when the underlying random field is periodic. We show that a solution to the discrete problem exists, is unique and depends smoothly on the problem data. Therefore, we have a well-posed problem whose solution can be tuned in a smooth manner. Finally, we have applied our theory to the target parameter estimation problem in an integrated system of automotive modules. Simulation results show that our spectral estimator has promising performance.

preprint, 2020, arXiv

Modeling collective behaviors: A moment-based approach

In this work we introduce an approach for modeling and analyzing collective behavior of a group of agents using moments. We represent the group of agents via their distribution and derive a method to estimate the dynamics of the moments. We use this to predict the evolution of the distribution of agents by first computing the moment trajectories and then using these to reconstruct the distribution of the agents. In the latter step, an inverse problem is solved in order to reconstruct a nominal distribution and to recover the macro-scale properties of the group of agents. The proposed method is applicable to several types of multi-agent systems, e.g., leader-follower systems. We derive error bounds for the moment trajectories and describe how to take these error bounds into account when computing the moment dynamics. The convergence of the moment dynamics is also analyzed for cases with monomial moments. To illustrate the theory, two numerical examples are given. In the first we consider a multi-agent system with interactions and compare the proposed methods for several types of moments. In the second example we apply the framework to a leader-follower problem for modeling pedestrian crowd dynamics.

preprint, 2020, arXiv

Multi-marginal optimal transport and probabilistic graphical models

We study multi-marginal optimal transport problems from a probabilistic graphical model perspective. We point out an elegant connection between the two when the underlying cost for optimal transport allows a graph structure. In particular, an entropy regularized multi-marginal optimal transport is equivalent to a Bayesian marginal inference problem for probabilistic graphical models with the additional requirement that some of the marginal distributions are specified. This relation on the one hand extends the optimal transport as well as the probabilistic graphical model theories, and on the other hand leads to fast algorithms for multi-marginal optimal transport by leveraging the well-developed algorithms in Bayesian inference. Several numerical examples are provided to highlight the results.

preprint, 2020, arXiv

Smart Resource Management for Data Streaming using an Online Bin-packing Strategy

Data stream processing frameworks provide reliable and efficient mechanisms for executing complex workflows over large datasets. A common challenge for the majority of currently available streaming frameworks is efficient utilization of resources. Most frameworks use static or semi-static settings for resource utilization that work well for established use cases but lead to marginal improvements for unseen scenarios. Another pressing issue is the efficient processing of large individual objects such as images and matrices typical for scientific datasets. HarmonicIO has proven to be a good solution for streams of relatively large individual objects, as demonstrated in a benchmark comparison with the Spark and Kafka streaming frameworks. We here present an extension of the HarmonicIO framework based on the online bin-packing algorithm, to allow for efficient utilization of resources. Based on a real world use case from large-scale microscopy pipelines, we compare results of the new system to Spark's auto-scaling mechanism.

preprint, 2018, arXiv

Data-driven nonsmooth optimization

In this work, we consider methods for solving large-scale optimization problems with a possibly nonsmooth objective function. The key idea is to first specify a class of optimization algorithms using a generic iterative scheme involving only linear operations and applications of proximal operators. This scheme contains many modern primal-dual first-order solvers like the Douglas-Rachford and hybrid gradient methods as special cases. Moreover, we show convergence to an optimal point for a new method which also belongs to this class. Next, we interpret the generic scheme as a neural network and use unsupervised training to learn the best set of parameters for a specific class of objective functions while imposing a fixed number of iterations. In contrast to other approaches of "learning to optimize", we present an approach which learns parameters only in the set of convergent schemes. As use cases, we consider optimization problems arising in tomographic reconstruction and image deconvolution, and in particular a family of total variation regularization problems.

preprint, 2015, arXiv

Robust Optimal Power Distribution for Hyperthermia Cancer Treatment

We consider an optimization problem for spatial power distribution generated by an array of transmitting elements. Using ultrasound hyperthermia cancer treatment as a motivating example, the signal design problem consists of optimizing the power distribution across the tumor and healthy tissue regions, respectively. The models used in the optimization problem are, however, invariably subject to errors. [...] deposition as well as inefficient treatment. To combat such unknown model errors, we formulate a robust signal design framework that can take the uncertainty into account using a worst-case approach. This leads to a semi-infinite programming (SIP) robust design problem which we reformulate as a tractable convex problem. The approach potentially has a wider range of applications.

preprint, 2015, arXiv

The Multidimensional Moment Problem with Complexity Constraint

A long series of previous papers has been devoted to the (one-dimensional) moment problem with nonnegative rational measure. The rationality assumption is a complexity constraint motivated by applications where a parameterization of the solution set in terms of a bounded finite number of parameters is required. In this paper we provide a complete solution of the multidimensional moment problem with a complexity constraint, also allowing for solutions that require a singular measure added to the rational, absolutely continuous one. Such solutions occur on the boundary of a certain convex cone of solutions. In this paper we provide complete parameterizations of all such solutions. We also provide errata for a previous paper in this journal coauthored by one of the authors of the present paper.

preprint, 2015, arXiv

The role of the time-arrow in mean-square estimation of stochastic processes

The purpose of this paper is to explain a certain dichotomy between the information that the past and future values of a multivariate stochastic process carry about the present. More specifically, vector-valued, second-order stochastic processes may be deterministic in one time-direction and not the other. This phenomenon, which is absent in scalar-valued processes, is deeply rooted in the geometry of the shift-operator. The exposition and the examples we discuss are based on the work of Douglas, Shapiro and Shields on cyclic vectors of the backward shift and relate to classical ideas going back to Wiener and Kolmogorov. We focus on rank-one stochastic processes for which we present a characterization of all regular processes that are deterministic in the reverse time-direction. The paper builds on examples and the goal is to provide pertinent insights to a control engineering audience.

preprint, 2015, arXiv

Time Localization and Capacity of Faster-Than-Nyquist Signaling

In this paper, we consider communication over the bandwidth-limited analog white Gaussian noise channel using non-orthogonal pulses. In particular, we consider non-orthogonal transmission by signaling samples at a rate higher than the Nyquist rate. Using the faster-than-Nyquist (FTN) framework, Mazo showed that one may transmit symbols carried by sinc pulses at a higher rate than that dictated by Nyquist without losing bit error rate. However, as we will show in this paper, such pulses are not necessarily well localized in time. In fact, assuming that signals in the FTN framework are well localized in time, one can construct a signaling scheme that violates the Shannon capacity bound. We also show directly that FTN signals are in general not well localized in time. Therefore, the results of Mazo do not imply that one can transmit more data per time unit without degrading performance in terms of error probability. We also consider FTN signaling in the case of pulses that are different from the sinc pulses. We show that one can use a precoding scheme of low complexity to remove the inter-symbol interference. This leads to the possibility of increasing the number of transmitted samples per time unit and compensating for spectral inefficiency due to signaling at the Nyquist rate of the non-sinc pulses. We demonstrate the power of the precoding scheme by simulations.

preprint, 2012, arXiv

Uncertainty Bounds for Spectral Estimation

The purpose of this paper is to study metrics suitable for assessing uncertainty of power spectra when these are based on finite second-order statistics. The family of power spectra which is consistent with a given range of values for the estimated statistics represents the uncertainty set about the "true" power spectrum. Our aim is to quantify the size of this uncertainty set using suitable notions of distance, and in particular, to compute the diameter of the set since this represents an upper bound on the distance between any choice of a nominal element in the set and the "true" power spectrum. Since the uncertainty set may contain power spectra with lines and discontinuities, it is natural to quantify distances in the weak topology, that is, the topology defined by continuity of moments. We provide examples of such weakly-continuous metrics and focus on particular metrics for which we can explicitly quantify spectral uncertainty. We then consider certain high resolution techniques which utilize filter-banks for pre-processing, and compute worst-case a priori uncertainty bounds solely on the basis of the filter dynamics. This allows the a priori tuning of the filter-banks for improved resolution over selected frequency bands.
