Researcher profile

Ondřej Sokol

Ondřej Sokol's listed work covers the computational complexity of problems on interval data, clustering methods, and retail customer analytics.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 15 (Unverified) · Verification L1 · Unclaimed author
3 works
0 followers
5 topics
3 close collaborators

Published work

3 published items

Preprint · 2022 · arXiv

The NP-hard problem of computing the maximal sample variance over interval data is solvable in almost linear time with high probability

We consider the algorithm by Ferson et al. (Reliable Computing 11(3), pp. 207–233, 2005) designed for solving the NP-hard problem of computing the maximal sample variance over interval data, motivated by robust statistics (in fact, the formulation can be written as a nonconvex quadratic program with a specific structure). First, we propose a new version of the algorithm that improves its original time bound $O(n^2 2^\omega)$ to $O(n \log n + n \cdot 2^\omega)$, where $n$ is the number of input data points and $\omega$ is the clique number of a certain intersection graph. Then we treat the input data as random variables (as is usual in statistics) and introduce a natural probabilistic data-generating model. We get $2^\omega = O(n^{1/\log\log n})$ and $\omega = O(\log n / \log\log n)$ on average. This results in an average computation time of $O(n^{1+\varepsilon})$ for arbitrarily small $\varepsilon > 0$, which may be considered a "surprisingly good" average time complexity for solving an NP-hard problem. Moreover, we prove the following tail bound on the distribution of computation time: hard instances, which force the algorithm to run in time $2^{\Omega(n)}$, occur rarely, with probability tending to zero at the rate $e^{-n\log\log n}$.
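Two ingredients of this abstract can be made concrete with a small sketch. It is not the algorithm of Ferson et al. nor the improved version from the paper. `max_variance_bruteforce` shows the exponential baseline: because sample variance is convex, its maximum over a box of intervals sits at a vertex, so trying all $2^n$ endpoint combinations is exact but only practical for small $n$. `interval_clique_number` shows how a clique number $\omega$ can be computed in $O(n \log n)$ for an interval intersection graph. The abstract does not say which intervals define the paper's graph, so treating $\omega$ as the clique number of a plain list of closed intervals is our assumption, and both function names are hypothetical.

```python
from itertools import product
from statistics import variance


def max_variance_bruteforce(intervals):
    """Exact maximal sample variance over interval data by trying every endpoint choice.

    Sample variance is a convex function of (x_1, ..., x_n), so its maximum over the
    box [lo_1, hi_1] x ... x [lo_n, hi_n] is reached at a vertex, i.e. with every x_i
    at one of its endpoints. Enumerating all 2^n vertices is exact but exponential.
    Requires at least two intervals.
    """
    return max(variance(point) for point in product(*intervals))


def interval_clique_number(intervals):
    """Clique number of the intersection graph of closed intervals (assumes lo <= hi).

    Pairwise intersecting intervals always share a common point (Helly property),
    so the largest clique equals the maximum number of intervals covering one point.
    """
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # 0 = interval opens; sorts before a close at the same x
        events.append((hi, 1))  # 1 = interval closes
    depth = best = 0
    for _, kind in sorted(events):
        depth += 1 if kind == 0 else -1
        best = max(best, depth)
    return best


print(max_variance_bruteforce([(0.0, 1.0), (0.0, 1.0)]))  # 0.5
print(interval_clique_number([(0, 5), (1, 2), (3, 4)]))  # 2
```

The abstract's improved bound $O(n \log n + n \cdot 2^\omega)$ confines the exponential part to $\omega$ rather than $n$, which is why the average-case result depends on how large $\omega$ typically is.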

Preprint · 2021 · arXiv

Clustering with Penalty for Joint Occurrence of Objects: Computational Aspects

The method of Holý, Sokol and Černý (Applied Soft Computing, 2017, Vol. 60, pp. 752–762) clusters objects based on their incidence in a large number of given sets. The idea is to minimize the occurrence of multiple objects from the same cluster in the same set. In the current paper, we study computational aspects of the method. First, we prove that the problem of finding the optimal clustering is NP-hard. Second, to find a suitable clustering numerically, we propose using a genetic algorithm augmented with a renumbering procedure, a fast task-specific local search heuristic, and an initial solution based on a simplified model. Third, in a simulation study, we demonstrate that our improvements to the standard genetic algorithm significantly enhance its computational performance.
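The abstract does not give the objective or the heuristics in detail, so the sketch below is only an illustration under our own assumptions. `penalty` counts pairs of objects that appear in the same set and share a cluster. `renumber` relabels clusters by order of first appearance, one common way to remove the label symmetry that makes many genetic-algorithm encodings describe the same clustering. `local_search` is a simple first-improvement heuristic that moves one object at a time. None of these names or definitions come from the paper.

```python
from itertools import combinations


def penalty(labels, sets):
    """Assumed objective: number of object pairs that share both a set and a cluster."""
    return sum(
        1
        for s in sets
        for a, b in combinations(s, 2)
        if labels[a] == labels[b]
    )


def renumber(labels):
    """Relabel clusters by order of first appearance, e.g. [2, 2, 5, 0] -> [0, 0, 1, 2]."""
    mapping = {}
    for lab in labels:
        mapping.setdefault(lab, len(mapping))
    return [mapping[lab] for lab in labels]


def local_search(labels, sets, k):
    """First-improvement local search over single-object moves among k clusters.

    Recomputes the full penalty for every candidate move for clarity; a fast,
    task-specific version would update the penalty incrementally instead.
    """
    labels = list(labels)
    best = penalty(labels, sets)
    improved = True
    while improved:
        improved = False
        for obj in range(len(labels)):
            current = labels[obj]
            for c in range(k):
                if c == current:
                    continue
                labels[obj] = c
                score = penalty(labels, sets)
                if score < best:
                    best, current, improved = score, c, True  # keep the move
                else:
                    labels[obj] = current  # undo the move
    return labels, best


labels, score = local_search([0, 0, 0, 0], [[0, 1], [2, 3]], k=2)
print(renumber(labels), score)  # [0, 1, 0, 1] 0
```

Because the penalty strictly decreases with every accepted move and cannot go below zero, the loop always terminates; it reaches a local optimum, not necessarily the global one, which is consistent with the problem being NP-hard.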

Preprint · 2020 · arXiv

How Many Customers Does a Retail Store Have?

Knowing the number of customers is a pillar of retail business analytics. In our setting, we assume that a portion of customers is monitored and easily counted thanks to a loyalty program, while the rest are not monitored. The behavior of customers in the two groups may differ significantly, which makes estimating the number of unmonitored customers a non-trivial task. We identify the shopping patterns of several customer segments, which allows us to estimate the distribution of customers without a loyalty card using the maximum likelihood method. In a simulation study, we find that the proposed approach is quite precise even when the data sample is very small and its assumptions are violated to a certain degree. In an empirical study of a drugstore chain, we validate and illustrate the proposed approach in practice. The actual number of customers estimated by the proposed method is much higher than the number suggested by the naive estimate, which assumes a constant customer distribution. The proposed method can also be used to determine the penetration of the loyalty program in individual customer segments.
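The abstract gives no formulas, so the following is a deliberately simplified model of our own that shows how segment-level shopping patterns can feed a maximum likelihood estimate. We assume each segment $s$ has a per-customer visit rate $f_s$ over the study period, estimated from loyalty-card customers, and that the number of transactions $T_s$ by unmonitored customers in that segment is Poisson with mean $N_s f_s$. Maximizing $T_s \log(N_s f_s) - N_s f_s$ gives $\hat N_s = T_s / f_s$. All function names, segments and numbers are hypothetical, and the paper's actual model may differ.

```python
def estimate_unmonitored(transactions, visit_rate):
    """Per-segment Poisson MLE under the assumed model: N_s = T_s / f_s."""
    return {s: transactions[s] / visit_rate[s] for s in transactions}


def pooled_estimate(transactions, visit_rate, loyal_counts):
    """Comparison that ignores segments: one average visit rate for everybody."""
    total_loyal = sum(loyal_counts.values())
    avg_rate = sum(loyal_counts[s] * visit_rate[s] for s in loyal_counts) / total_loyal
    return sum(transactions.values()) / avg_rate


def penetration(loyal_counts, unmonitored):
    """Share of each segment's customers that hold a loyalty card."""
    return {s: loyal_counts[s] / (loyal_counts[s] + unmonitored[s]) for s in loyal_counts}


# Hypothetical data: "frequent" customers visit 4 times per period, "occasional" once.
transactions = {"frequent": 40, "occasional": 30}  # unmonitored transactions T_s
visit_rate = {"frequent": 4.0, "occasional": 1.0}  # f_s from loyalty customers
loyal_counts = {"frequent": 10, "occasional": 10}  # monitored customers per segment

est = estimate_unmonitored(transactions, visit_rate)
print(est)  # {'frequent': 10.0, 'occasional': 30.0}
print(pooled_estimate(transactions, visit_rate, loyal_counts))  # 28.0
print(penetration(loyal_counts, est))  # {'frequent': 0.5, 'occasional': 0.25}
```

In this made-up example the per-segment estimate (40 unmonitored customers) exceeds the single-rate one (28) because most unmonitored transactions come from the occasional segment, where each customer accounts for fewer visits. The numbers only illustrate the mechanism; they are not results from the paper.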