Source author record

Ibrahim Sabek

Ibrahim Sabek appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Databases, Machine Learning, Networking and Internet Architecture, Computation and Language

Catalog footprint

What is connected

7 works

4 topics

4 close collaborators


Preprint, 2026, arXiv

Is Quantum Computing Ready for Real-Time Database Optimization?

Database systems encompass several performance-critical optimization tasks, such as join ordering and index tuning. As data volumes grow and workloads become more complex, these problems have become exponentially harder to solve efficiently. Quantum computing, especially quantum annealing, is a promising paradigm that can efficiently explore very large search spaces through quantum tunneling: it can escape local optima by tunneling through energy barriers rather than climbing over them. Earlier works mainly focused on providing an abstract representation (e.g., Quadratic Unconstrained Binary Optimization (QUBO)) of database optimization problems (e.g., join ordering) and overlooked actual integration into database systems because of the high overhead of quantum computing services (e.g., a minimum runtime of 5 s for D-Wave's CQM-Solver). Recently, quantum annealing providers have begun offering lower-latency solvers, e.g., the NL-Solver, which paves the way toward realizing quantum solutions within DBMSs. However, this raises new systems research challenges in balancing efficiency and solution quality. In this talk, we show that this balance is achievable. As a proof of concept, we present Q2O, the first real Quantum-augmented Query Optimizer. We show the end-to-end workflow: using actual database statistics, we encode the join order problem as a nonlinear model, a format solvable by the NL-Solver; the solution is then translated into a plan hint that guides PostgreSQL's optimizer to produce a complete plan. Q2O can handle actual queries in real time.
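The last step of that workflow, turning a chosen join order into a plan hint, can be illustrated with a small sketch. It assumes the `pg_hint_plan` extension's `Leading` hint (the abstract does not say which hint mechanism Q2O uses), uses made-up cardinalities and selectivities, and replaces the NL-Solver with a classical exhaustive search over left-deep orders under the simple C_out cost model (the sum of intermediate result sizes). It is not Q2O's nonlinear encoding.

```python
from itertools import permutations

def cout_cost(order, card, sel):
    """C_out cost of a left-deep join order: the sum of estimated intermediate sizes.

    card: table -> row count; sel: frozenset({t1, t2}) -> join selectivity.
    Pairs missing from sel default to 1.0, i.e., a cross product.
    """
    size = float(card[order[0]])
    joined = [order[0]]
    cost = 0.0
    for t in order[1:]:
        size *= card[t]
        for u in joined:
            size *= sel.get(frozenset((u, t)), 1.0)
        joined.append(t)
        cost += size
    return cost

def best_left_deep_order(card, sel):
    """Classical exhaustive search; a stand-in for the quantum solver, not Q2O's method."""
    return min(permutations(card), key=lambda o: cout_cost(o, card, sel))

def leading_hint(order):
    """Render a left-deep order as a pg_hint_plan Leading hint, e.g. Leading(((a b) c))."""
    if len(order) < 2:
        raise ValueError("need at least two tables")
    expr = f"({order[0]} {order[1]})"
    for t in order[2:]:
        expr = f"({expr} {t})"
    return f"/*+ Leading({expr}) */"

# Hypothetical statistics for three tables a, b, c.
card = {"a": 1000, "b": 10, "c": 100}
sel = {frozenset(("a", "b")): 0.01, frozenset(("b", "c")): 0.5}
order = best_left_deep_order(card, sel)
print(order, leading_hint(order))  # ('a', 'b', 'c') /*+ Leading(((a b) c)) */
```

The hint string would be prepended to the SQL text so that PostgreSQL's own planner still chooses join methods and access paths, which matches the abstract's description of a hint that guides, rather than replaces, the optimizer.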

Preprint, 2022, arXiv

The Case for Learned In-Memory Joins

In-memory join is an essential operator in any database engine and has been extensively investigated in the database literature. In this paper, we study whether exploiting CDF-based learned models to boost join performance is practical. To the best of our knowledge, we are the first to fill this gap. We investigate the use of CDF-based partitioning and learned indexes (e.g., Recursive Model Indexes (RMI) and RadixSpline) in the three join categories: indexed nested loop joins (INLJ), sort-based joins (SJ), and hash-based joins (HJ). Our study shows that there is room to improve the performance of the INLJ and SJ categories through our proposed optimized learned variants. Our experimental analysis shows that these learned variants of INLJ and SJ consistently outperform the state-of-the-art techniques.
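How a learned index plugs into an indexed nested loop join can be shown with a minimal sketch. It assumes a single least-squares linear model over the sorted inner keys (a one-model simplification, not RMI or RadixSpline): the model predicts a position, and a binary search bounded by the model's maximum observed error corrects it. The class and function names are hypothetical.

```python
from bisect import bisect_left

class LinearLearnedIndex:
    """A single linear model approximating the CDF of a sorted key array."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        # Least-squares fit of position = slope * key + intercept.
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest prediction error over all stored keys bounds the search window.
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Position of the first occurrence of key, or -1 if key is absent."""
        n = len(self.keys)
        pred = self._predict(key)
        # Clamp the window to [0, n]; hi is kept >= lo so it is never negative.
        lo = min(max(0, pred - self.err), n)
        hi = min(max(lo, pred + self.err + 1), n)
        pos = bisect_left(self.keys, key, lo, hi)
        return pos if pos < n and self.keys[pos] == key else -1

def indexed_nested_loop_join(outer_keys, inner_index):
    """Equi-join: probe the learned index once per outer key, then scan duplicates."""
    keys = inner_index.keys
    result = []
    for k in outer_keys:
        pos = inner_index.lookup(k)
        while 0 <= pos < len(keys) and keys[pos] == k:
            result.append((k, pos))  # (outer key, matching position in the inner array)
            pos += 1
    return result

print(indexed_nested_loop_join([3, 4, 9], LinearLearnedIndex([5, 1, 9, 3, 3, 7])))
# [(3, 1), (3, 2), (9, 5)]
```

Because the error bound is computed over every stored position, the first occurrence of any present key always falls inside the searched window, so the probe is exact; a better-fitting model only shrinks the window.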

Preprint, 2021, arXiv

The Case for Distance-Bounded Spatial Approximations

Spatial approximations have traditionally been used in spatial databases to accelerate the processing of complex geometric operations. However, approximations are typically used only in a first filtering step that determines a set of candidate spatial objects that may fulfill the query condition. To provide accurate results, the exact geometries of the candidate objects are then tested against the query condition, which is typically an expensive operation. Yet many emerging applications (e.g., visualization tools) require interactive responses while needing only approximate results. Moreover, real-world geospatial data is inherently imprecise, which makes exact data processing unnecessary. Given the uncertainty associated with spatial data and the relaxed precision requirements of many applications, this vision paper advocates for approximate spatial data processing techniques that omit exact geometric tests and provide final answers solely on the basis of (fine-grained) approximations. Thanks to recent hardware advances, this vision can be realized today. Furthermore, our approximate techniques employ a distance-based error bound, i.e., a bound on the maximum spatial distance between false (or missing) results and exact results, which is crucial for meaningful analyses. This bound makes it possible to control the precision of the approximation and to trade accuracy for performance.
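One simple fine-grained approximation with a distance bound is a uniform raster: keep the grid cells whose centers fall inside a polygon and answer containment from the cells alone, with no exact geometric test. If the cell side is s = eps * sqrt(2), half the cell diagonal equals eps, so a false positive lies within eps of the polygon and a missing result lies within eps of its boundary. The raster choice, the center-inclusion rule, and the function names are assumptions for illustration, not necessarily the approximations proposed in the paper.

```python
import math

def point_in_polygon(x, y, poly):
    """Exact even-odd ray-casting test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(poly, eps):
    """Approximate poly by the grid cells whose centers lie inside it.

    Cell side s = eps * sqrt(2), so every point of a cell is within eps of the
    cell's center; this is what bounds the distance error of the approximation.
    """
    s = eps * math.sqrt(2)
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    cells = set()
    for i in range(math.floor(min(xs) / s), math.floor(max(xs) / s) + 1):
        for j in range(math.floor(min(ys) / s), math.floor(max(ys) / s) + 1):
            if point_in_polygon((i + 0.5) * s, (j + 0.5) * s, poly):
                cells.add((i, j))
    return s, cells

def approx_contains(raster, x, y):
    """Approximate containment answered from the raster alone (one hash lookup)."""
    s, cells = raster
    return (math.floor(x / s), math.floor(y / s)) in cells

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
r = rasterize(square, eps=0.5)
print(approx_contains(r, 5, 5), approx_contains(r, 20, 20))  # True False
```

Halving eps quadruples the number of cells, which is the accuracy-for-performance (here, memory) trade-off the bound makes explicit.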

Preprint, 2020, arXiv

The Case for Learned Spatial Indexes

Spatial data is ubiquitous. Massive amounts of data are generated every day by billions of GPS-enabled devices, such as cell phones, cars, and sensors, and by various consumer-based applications, such as Uber, Tinder, and location-tagged posts on Facebook, Twitter, Instagram, etc. This exponential growth in spatial data has led the research community to focus on building systems and applications that can process spatial data efficiently. Meanwhile, recent research has introduced learned index structures. In this work, we take techniques from a state-of-the-art learned multi-dimensional index structure (namely, Flood) and apply them to five classical multi-dimensional indexes to answer spatial range queries. By tuning each partitioning technique for optimal performance, we show that (i) machine-learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension, (ii) the bottleneck for tree structures is index lookup, which could potentially be improved by linearizing the indexed partitions, (iii) filtering on one dimension and refining using machine-learned indexes is 1.23x to 1.83x faster than the closest competitor, which filters on two dimensions, and (iv) learned indexes can have a significant impact on the performance of low-selectivity queries while being less effective at higher selectivities.
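The "filter on one dimension, learned search within a partition, then refine" pattern can be sketched as follows. Flood learns its grid layout from the data and workload; this sketch instead fixes equal-width columns on x and fits a min/max interpolation model of the y distribution in each column, both simplifying assumptions. The class names are hypothetical.

```python
from bisect import bisect_left

class Column:
    """One x-partition: points sorted by y, plus a linear model of y's CDF."""

    def __init__(self, pts):
        self.pts = sorted(pts, key=lambda p: p[1])
        self.ys = [p[1] for p in self.pts]
        n = len(self.ys)
        self.y0 = self.ys[0] if n else 0.0
        span = self.ys[-1] - self.y0 if n else 0.0
        # Min/max interpolation: maps the smallest y to 0 and the largest to n - 1.
        self.slope = (n - 1) / span if span > 0 else 0.0
        self.err = max((abs(self._predict(y) - i) for i, y in enumerate(self.ys)), default=0)

    def _predict(self, y):
        return int(round(self.slope * (y - self.y0)))

    def lower_bound(self, y):
        """Index of the first stored y >= the query y (learned guess + bounded search)."""
        n = len(self.ys)
        pred = self._predict(y)
        # The model is monotone, so the true answer lies in [pred - err, pred + err + 1].
        lo = min(max(0, pred - self.err), n)
        hi = min(max(lo, pred + self.err + 1), n)
        return bisect_left(self.ys, y, lo, hi)

class GridLearnedIndex:
    """Equal-width columns on x (filter), learned search on y, then refine on x."""

    def __init__(self, points, x_min, x_max, k):
        self.x_min, self.width, self.k = x_min, (x_max - x_min) / k, k
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[self._col(p[0])].append(p)
        self.cols = [Column(b) for b in buckets]

    def _col(self, x):
        return min(self.k - 1, max(0, int((x - self.x_min) // self.width)))

    def range_query(self, xlo, xhi, ylo, yhi):
        out = []
        for c in range(self._col(xlo), self._col(xhi) + 1):
            col = self.cols[c]
            i = col.lower_bound(ylo)          # learned search on the sorted dimension
            while i < len(col.ys) and col.ys[i] <= yhi:
                x, y = col.pts[i]
                if xlo <= x <= xhi:           # refine on the partitioned dimension
                    out.append((x, y))
                i += 1
        return out

pts = [((i * 37) % 100, (i * 53) % 100) for i in range(250)]
idx = GridLearnedIndex(pts, 0, 100, 8)
print(len(idx.range_query(10, 60, 20, 45)))
```

Replacing `col.lower_bound` with a plain `bisect_left(col.ys, ylo)` gives the binary-search baseline that the abstract's first finding compares against; the result set is identical, only the search cost differs.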

Preprint, 2013, arXiv

Intelligent Hybrid Man-Machine Translation Quality Estimation

Inferring evaluation scores from human judgments is invaluable compared to current evaluation metrics, which are not suitable for real-time applications, e.g., post-editing. However, such judgments are much more expensive to collect, especially from expert translators, than evaluations based on indicators that contrast source and translation texts. This work introduces a novel approach to quality estimation that combines learnt confidence scores from a probabilistic inference model based on human judgments with selective linguistic-feature-based scores; the proposed inference model infers the credibility of the given human ranks to address the scarcity and inconsistency of human judgments. Experimental results on challenging language pairs demonstrate improved correlation with human judgments over traditional evaluation metrics.
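The idea of inferring how credible each human judge is can be illustrated with a simple iterative reliability-weighting scheme. This is a swapped-in technique, not the paper's probabilistic inference model: it alternates between a credibility-weighted consensus score per sentence and a credibility per annotator, taken as the inverse of that annotator's mean squared deviation from the consensus. The function name, `iters`, and `eps` are hypothetical.

```python
def aggregate_with_credibility(scores, iters=10, eps=1e-6):
    """scores[a][s] = score annotator a gave sentence s, or None if not judged.

    Returns (consensus score per sentence, credibility weight per annotator).
    """
    n_ann, n_sent = len(scores), len(scores[0])
    w = [1.0] * n_ann  # start by trusting every annotator equally
    consensus = [None] * n_sent
    for _ in range(iters):
        # Step 1: credibility-weighted mean score for each sentence.
        for s in range(n_sent):
            judged = [a for a in range(n_ann) if scores[a][s] is not None]
            den = sum(w[a] for a in judged)
            consensus[s] = sum(w[a] * scores[a][s] for a in judged) / den if den else None
        # Step 2: annotators who deviate more from the consensus get less credibility.
        for a in range(n_ann):
            devs = [(scores[a][s] - consensus[s]) ** 2 for s in range(n_sent)
                    if scores[a][s] is not None and consensus[s] is not None]
            w[a] = 1.0 / (sum(devs) / len(devs) + eps) if devs else 1.0
    return consensus, w

# Two consistent annotators and one who reverses the ranking (made-up data).
cons, weights = aggregate_with_credibility([[1, 2, 3, 4, 5],
                                            [1, 2, 3, 4, 5],
                                            [5, 4, 3, 2, 1]])
print([round(c, 2) for c in cons], [round(x, 3) for x in weights])
```

The inconsistent annotator ends up with a much smaller weight, so the consensus converges toward the scores of the two agreeing judges; `eps` only prevents division by zero when an annotator matches the consensus exactly.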

Preprint, 2013, arXiv

MonoStream: A Minimal-Hardware High Accuracy Device-free WLAN Localization System

Device-free (DF) localization is an emerging technology that allows the detection and tracking of entities that neither carry any devices nor participate actively in the localization process. Typically, DF systems require a large number of transmitters and receivers to achieve acceptable accuracy, which are not available in many scenarios, such as homes and small businesses. In this paper, we introduce MonoStream, an accurate single-stream DF localization system that leverages the rich Channel State Information (CSI) as well as MIMO information from the physical layer to provide accurate DF localization with only one stream. To boost its accuracy and keep computational requirements low, MonoStream models the DF localization problem as an object recognition problem and uses a novel set of CSI-context features and techniques with proven accuracy and efficiency. Experimental evaluation in two typical testbeds, with a side-by-side comparison with the state of the art, shows that MonoStream achieves an accuracy of 0.95 m, an improvement of at least 26% in median distance error, using a single stream only. This accuracy comes with efficient execution of less than 23 ms per location update on a typical laptop, which highlights MonoStream's potential for real-time DF tracking applications.
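The abstract does not detail MonoStream's CSI-context features or its recognition model, so the sketch below swaps in the simplest fingerprint-matching baseline: per-subcarrier CSI amplitudes normalized to unit mean, matched to offline training fingerprints by a 1-nearest-neighbor search. It also computes the median distance error metric the abstract reports. The feature definition, function names, and sample data are all hypothetical.

```python
import math
from statistics import median

def csi_features(csi):
    """Hypothetical features: per-subcarrier CSI amplitudes scaled to unit mean."""
    amps = [abs(h) for h in csi]  # csi: complex channel coefficients, one per subcarrier
    mean = sum(amps) / len(amps)
    return [a / mean for a in amps]

def train(fingerprints):
    """fingerprints: {(x, y) location: [CSI sample, ...]} collected offline."""
    return {loc: [csi_features(c) for c in samples] for loc, samples in fingerprints.items()}

def localize(model, csi):
    """Return the training location whose fingerprint is nearest (1-NN) to the query."""
    f = csi_features(csi)
    _, loc = min((math.dist(f, g), loc) for loc, feats in model.items() for g in feats)
    return loc

def median_distance_error(estimates, truths):
    """Median Euclidean distance between estimated and true locations."""
    return median(math.dist(e, t) for e, t in zip(estimates, truths))

model = train({
    (0.0, 0.0): [[1 + 1j, 2, 3, 4], [1, 2.1, 3, 3.9]],
    (3.0, 0.0): [[4, 3, 2, 1j], [4.1, 3, 2, 1]],
})
est = [localize(model, [1.1, 2, 2.9, 4.2]), localize(model, [3.9, 3.1, 2, 1])]
print(est, median_distance_error(est, [(0.0, 0.0), (3.0, 0.0)]))
# [(0.0, 0.0), (3.0, 0.0)] 0.0
```

Normalizing by the mean amplitude removes overall gain differences between captures so that matching depends on the shape of the frequency response; a real system would use far richer features and a stronger classifier.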

Preprint, 2012, arXiv

Spot: An accurate and efficient multi-entity device-free WLAN localization system

Device-free (DF) localization in WLANs has been introduced as a value-added service that allows tracking indoor entities that do not carry any devices. Previous work in DF WLAN localization focused on tracking a single entity because of the intractability of the multi-entity tracking problem, whose complexity grows exponentially with the number of humans being tracked. In this paper, we introduce Spot, an accurate and efficient system for multi-entity DF detection and tracking. Spot is based on a probabilistic energy minimization framework that combines a conditional random field with a Markov model to capture the temporal and spatial relations between the entities' poses. A novel cross-calibration technique reduces the calibration overhead for multiple entities to linear, regardless of the number of humans being tracked; it also helps increase system accuracy. We design the energy minimization function so that it can be solved efficiently, and we show that it can be mapped to a binary graph-cut problem whose solution has linear complexity on average and third-order polynomial complexity in the worst case. We further apply clustering to the estimated location candidates to reduce outliers and obtain more accurate tracking. Experimental evaluation in two typical testbeds, with a side-by-side comparison with the state of the art, shows that Spot achieves a multi-entity tracking accuracy of less than 1.1 m. This corresponds to an improvement of at least 36% in median distance error over state-of-the-art DF localization systems, which can only track a single entity. In addition, Spot estimates the number of entities correctly to within an error of one. This shows that Spot meets its goal of being an accurate and efficient software-only DF tracking solution for multiple entities in indoor environments.
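The step of mapping an energy to a binary graph cut can be sketched for a generic binary Potts energy with nonnegative costs. Each node gets a source edge weighted by its label-1 cost and a sink edge weighted by its label-0 cost, and each neighboring pair gets two opposing edges weighted by the disagreement penalty, so the minimum s-t cut equals the minimum energy. The costs below are made up, max flow is computed with plain Edmonds-Karp (not the solver Spot uses), and Spot's actual CRF and Markov-model terms are not modeled.

```python
from collections import deque
from itertools import product

def energy(labels, unary, pairs):
    """E(x) = sum_i U_i(x_i) + sum_(i,j) w_ij * [x_i != x_j] (binary Potts energy)."""
    e = sum(unary[i][l] for i, l in enumerate(labels))
    return e + sum(w for i, j, w in pairs if labels[i] != labels[j])

def min_cut_labels(n, unary, pairs):
    """Minimize the energy above exactly via an s-t min cut (all costs must be >= 0)."""
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, (c0, c1) in enumerate(unary):
        cap[s][i] += c1  # cut iff node i ends on the sink side (label 1)
        cap[i][t] += c0  # cut iff node i stays on the source side (label 0)
    for i, j, w in pairs:
        cap[i][j] += w   # exactly one of these two edges is cut iff the labels differ
        cap[j][i] += w
    while True:
        # BFS for a shortest augmenting path in the residual graph (Edmonds-Karp).
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no augmenting path left: the flow is maximum
        # Push the bottleneck capacity along the path found.
        f, v = float("inf"), t
        while v != s:
            f = min(f, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= f
            cap[v][u] += f
            v = u
    # Nodes still reachable from s in the residual graph form the source side (label 0).
    return [0 if parent[i] != -1 else 1 for i in range(n)]

unary = [(0, 5), (2, 1), (4, 0), (1, 1)]
pairs = [(0, 1, 2), (1, 2, 2), (2, 3, 1)]
labels = min_cut_labels(len(unary), unary, pairs)
best = min(energy(x, unary, pairs) for x in product((0, 1), repeat=len(unary)))
print(labels, energy(labels, unary, pairs), best)  # [0, 1, 1, 1] 4 4
```

The brute-force check over all 2^n labelings is only feasible for tiny graphs; the point of the graph-cut mapping is that the exact minimum is found in polynomial time instead.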
