Researcher profile

Djamal Belazzougui

Djamal Belazzougui contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2020arXiv

Efficient tree-structured categorical retrieval

We study a document retrieval problem in the new framework where $D$ text documents are organized in a {\em category tree} with a pre-defined number $h$ of categories. This situation occurs e.g. with taxomonic trees in biology or subject classification systems for scientific literature. Given a string pattern $p$ and a category (level in the category tree), we wish to efficiently retrieve the $t$ \emph{categorical units} containing this pattern and belonging to the category. We propose several efficient solutions for this problem. One of them uses $n(\logσ(1+o(1))+\log D+O(h)) + O(Δ)$ bits of space and $O(|p|+t)$ query time, where $n$ is the total length of the documents, $σ$ the size of the alphabet used in the documents and $Δ$ is the total number of nodes in the category tree. Another solution uses $n(\logσ(1+o(1))+O(\log D))+O(Δ)+O(D\log n)$ bits of space and $O(|p|+t\log D)$ query time. We finally propose other solutions which are more space-efficient at the expense of a slight increase in query time.

preprint2013arXiv

Cache-Oblivious Peeling of Random Hypergraphs

The computation of a peeling order in a randomly generated hypergraph is the most time-consuming step in a number of constructions, such as perfect hashing schemes, random $r$-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear time algorithm, its poor I/O performance makes it impractical for hypergraphs whose size exceeds the available internal memory. We show how to reduce the computation of a peeling order to a small number of sequential scans and sorts, and analyze its I/O complexity in the cache-oblivious model. The resulting algorithm requires $O(\mathrm{sort}(n))$ I/Os and $O(n \log n)$ time to peel a random hypergraph with $n$ edges. We experimentally evaluate the performance of our implementation of this algorithm in a real-world scenario by using the construction of minimal perfect hash functions (MPHF) as our test case: our algorithm builds a MPHF of $7.6$ billion keys in less than $21$ hours on a single machine. The resulting data structure is both more space-efficient and faster than that obtained with the current state-of-the-art MPHF construction for large-scale key sets.

preprint2013arXiv

Optimal Lower and Upper Bounds for Representing Sequences

Sequence representations supporting queries $access$, $select$ and $rank$ are at the core of many data structures. There is a considerable gap between the various upper bounds and the few lower bounds known for such representations, and how they relate to the space used. In this article we prove a strong lower bound for $rank$, which holds for rather permissive assumptions on the space used, and give matching upper bounds that require only a compressed representation of the sequence. Within this compressed space, operations $access$ and $select$ can be solved in constant or almost-constant time, which is optimal for large alphabets. Our new upper bounds dominate all of the previous work in the time/space map.

preprint2013arXiv

Single and multiple consecutive permutation motif search

Let $t$ be a permutation (that shall play the role of the {\em text}) on $[n]$ and a pattern $p$ be a sequence of $m$ distinct integer(s) of $[n]$, $m\leq n$. The pattern $p$ occurs in $t$ in position $i$ if and only if $p_1... p_m$ is order-isomorphic to $t_i... t_{i+m-1}$, that is, for all $1 \leq k< \ell \leq m$, $p_k>p_\ell$ if and only if $t_{i+k-1}>t_{i+\ell-1}$. Searching for a pattern $p$ in a text $t$ consists in identifying all occurrences of $p$ in $t$. We first present a forward automaton which allows us to search for $p$ in $t$ in $O(m^2\log \log m +n)$ time. We then introduce a Morris-Pratt automaton representation of the forward automaton which allows us to reduce this complexity to $O(m\log \log m +n)$ at the price of an additional amortized constant term by integer of the text. Both automata occupy $O(m)$ space. We then extend the problem to search for a set of patterns and exhibit a specific Aho-Corasick like algorithm. Next we present a sub-linear average case search algorithm running in $O(\frac{m\log m}{\log\log m}+\frac{n\log m}{m\log\log m})$ time, that we eventually prove to be optimal on average.

preprint2013arXiv

Various improvements to text fingerprinting

Let s = s_1 .. s_n be a text (or sequence) on a finite alphabet Σof size σ. A fingerprint in s is the set of distinct characters appearing in one of its substrings. The problem considered here is to compute the set {\cal F} of all fingerprints of all substrings of s in order to answer efficiently certain questions on this set. A substring s_i .. s_j is a maximal location for a fingerprint f in F (denoted by <i,j>) if the alphabet of s_i .. s_j is f and s_{i-1}, s_{j+1}, if defined, are not in f. The set of maximal locations ins is {\cal L} (it is easy to see that |{\cal L}| \leq n σ). Two maximal locations <i,j> and <k,l> such that s_i .. s_j = s_k .. s_l are named {\em copies}, and the quotient set of {\cal L} according to the copy relation is denoted by {\cal L}_C. We present new exact and approximate efficient algorithms and data structures for the following three problems: (1) to compute {\cal F}; (2) given f as a set of distinct characters in Σ, to answer if f represents a fingerprint in {\cal F}; (3) given f, to find all maximal locations of f in s.

preprint2012arXiv

Predecessor search with distance-sensitive query time

A predecessor (successor) search finds the largest element $x^-$ smaller than the input string $x$ (the smallest element $x^+$ larger than or equal to $x$, respectively) out of a given set $S$; in this paper, we consider the static case (i.e., $S$ is fixed and does not change over time) and assume that the $n$ elements of $S$ are available for inspection. We present a number of algorithms that, with a small additional index (usually of O(n log w) bits, where $w$ is the string length), can answer predecessor/successor queries quickly and with time bounds that depend on different kinds of distance, improving significantly several results that appeared in the recent literature. Intuitively, our first result has a running time that depends on the distance between $x$ and $x^\pm$: it is especially efficient when the input $x$ is either very close to or very far from $x^-$ or $x^+$; our second result depends on some global notion of distance in the set $S$, and is fast when the elements of $S$ are more or less equally spaced in the universe; finally, for our third result we rely on a finger (i.e., an element of $S$) to improve upon the first one; its running time depends on the distance between the input and the finger.