Source author record

Ondřej Sokol

Ondřej Sokol appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Applications, Artificial Intelligence, Data Structures and Algorithms, Machine Learning, math.OC

Catalog footprint

What is connected

3 works

5 topics

3 close collaborators

Preprint, 2022, arXiv

The NP-hard problem of computing the maximal sample variance over interval data is solvable in almost linear time with high probability

We consider the algorithm by Ferson et al. (Reliable Computing 11(3), p. 207-233, 2005) designed for solving the NP-hard problem of computing the maximal sample variance over interval data, motivated by robust statistics (in fact, the formulation can be written as a nonconvex quadratic program with a specific structure). First, we propose a new version of the algorithm improving its original time bound $O(n^2 2^\omega)$ to $O(n \log n + n \cdot 2^\omega)$, where $n$ is the number of input data and $\omega$ is the clique number in a certain intersection graph. Then we treat input data as random variables (as is usual in statistics) and introduce a natural probabilistic data generating model. We get $2^\omega = O(n^{1/\log\log n})$ and $\omega = O(\log n / \log\log n)$ on average. This results in average computing time $O(n^{1+\varepsilon})$ for $\varepsilon > 0$ arbitrarily small, which may be considered as "surprisingly good" average time complexity for solving an NP-hard problem. Moreover, we prove the following tail bound on the distribution of computation time: hard instances, forcing the algorithm to compute in time $2^{\Omega(n)}$, occur rarely, with probability tending to zero at the rate $e^{-n\log\log n}$.
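The optimization problem behind this abstract can be illustrated with a brute-force baseline. The sample variance is a convex function of the data, so its maximum over a box of intervals is attained at a vertex, and checking all $2^n$ endpoint combinations is exact. The sketch below is only that exponential baseline, not the algorithm of Ferson et al. or the improved version described above; the function names are hypothetical.

```python
from itertools import product


def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def max_sample_variance_bruteforce(intervals):
    """Maximal sample variance over interval data by vertex enumeration.

    `intervals` is a list of (lower, upper) pairs. Each observation may take
    any value in its interval; because variance is convex, it is enough to
    test the 2^n combinations of endpoints. Practical only for small n.
    """
    if len(intervals) < 2:
        raise ValueError("need at least two intervals")
    for lower, upper in intervals:
        if lower > upper:
            raise ValueError("each interval must satisfy lower <= upper")
    return max(sample_variance(vertex) for vertex in product(*intervals))
```

For example, `max_sample_variance_bruteforce([(0, 1), (0, 1)])` picks the endpoints 0 and 1 and returns 0.5. The running time grows as $2^n$, which is the cost that the bounds in terms of the clique number $\omega$ are meant to avoid.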

Preprint, 2021, arXiv

Clustering with Penalty for Joint Occurrence of Objects: Computational Aspects

The method of Holý, Sokol and Černý (Applied Soft Computing, 2017, Vol. 60, p. 752-762) clusters objects based on their incidence in a large number of given sets. The idea is to minimize the occurrence of multiple objects from the same cluster in the same set. In the current paper, we study computational aspects of the method. First, we prove that the problem of finding the optimal clustering is NP-hard. Second, to numerically find a suitable clustering, we propose to use the genetic algorithm augmented by a renumbering procedure, a fast task-specific local search heuristic and an initial solution based on a simplified model. Third, in a simulation study, we demonstrate that our improvements of the standard genetic algorithm significantly enhance its computational performance.
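The idea of penalizing joint occurrence can be made concrete with a small scoring function. The sketch below counts, for every set, the pairs of objects that share a cluster; this pairwise penalty is an assumption chosen for illustration, and the exact objective used by Holý, Sokol and Černý may differ. The name `joint_occurrence_penalty` and the data layout are hypothetical.

```python
from collections import Counter


def joint_occurrence_penalty(sets, assignment):
    """Count same-cluster pairs that occur together in a set.

    `sets` is an iterable of collections of objects (for example, shopping
    baskets); `assignment` maps each object to a cluster label. A set that
    contains k objects from one cluster contributes k * (k - 1) / 2.
    This pairwise form is an illustrative assumption, not the paper's formula.
    """
    total = 0
    for members in sets:
        # Deduplicate objects within a set before counting cluster labels.
        counts = Counter(assignment[obj] for obj in set(members))
        total += sum(k * (k - 1) // 2 for k in counts.values())
    return total
```

A lower score means fewer same-cluster objects appear together, so a search heuristic such as a genetic algorithm could use a function of this kind as the fitness to minimize.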

Preprint, 2020, arXiv

How Many Customers Does a Retail Store Have?

Knowledge of the number of customers is a pillar of retail business analytics. In our setting, we assume that a portion of customers is monitored and easily counted thanks to the loyalty program, while the rest is not monitored. The behavior of customers in the two groups may differ significantly, making the estimation of the number of unmonitored customers a non-trivial task. We identify shopping patterns of several customer segments, which allows us to estimate the distribution of customers without the loyalty card using the maximum likelihood method. In a simulation study, we find that the proposed approach is quite precise even when the data sample is very small and its assumptions are violated to a certain degree. In an empirical study of a drugstore chain, we validate and illustrate the proposed approach in practice. The actual number of customers estimated by the proposed method is much higher than the number suggested by the naive estimate, which assumes a constant customer distribution. The proposed method can also be used to determine the penetration of the loyalty program in the individual customer segments.