Source author record

Franck Morvan

Franck Morvan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Databases

Catalog footprint

What is connected

3works

1topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Selectivity correction with online machine learning

Computer systems are full of heuristic rules which drive the decisions they make. These rules of thumb are designed to work well on average, but ignore specific information about the available context, and are thus sub-optimal. The emerging field of machine learning for systems attempts to learn decision rules with machine learning algorithms. In the database community, many recent proposals have been made to improve selectivity estimation with batch machine learning methods. Such methods are all batch methods which require retraining and cannot handle concept drift, such as workload changes and schema modifications. We present online machine learning as an alternative approach. Online models learn on the fly and do not require storing data, they are more lightweight than batch models, and finally may adapt to concept drift. As an experiment, we teach models to improve the selectivity estimates made by PostgreSQL's cost model. Our experiments make the case that simple online models are able to compete with a recently proposed deep learning method.

preprint2020arXiv

Selectivity Estimation with Attribute Value Dependencies using Linked Bayesian Networks

Relational query optimisers rely on cost models to choose between different query execution plans. Selectivity estimates are known to be a crucial input to the cost model. In practice, standard selectivity estimation procedures are prone to large errors. This is mostly because they rely on the so-called attribute value independence and join uniformity assumptions. Therefore, multidimensional methods have been proposed to capture dependencies between two or more attributes both within and across relations. However, these methods require a large computational cost which makes them unusable in practice. We propose a method based on Bayesian networks that is able to capture cross-relation attribute value dependencies with little overhead. Our proposal is based on the assumption that dependencies between attributes are preserved when joins are involved. Furthermore, we introduce a parameter for trading between estimation accuracy and computational cost. We validate our work by comparing it with other relevant methods on a large workload derived from the JOB and TPC-DS benchmarks. Our results show that our method is an order of magnitude more efficient than existing methods, whilst maintaining a high level of accuracy.

preprint2010arXiv

Query Routing and Processing in Peer-To-Peer Data Sharing Systems

Sharing musical files via the Internet was the essential motivation of early P2P systems. Despite of the great success of the P2P file sharing systems, these systems support only "simple" queries. The focus in such systems is how to carry out an efficient query routing in order to find the nodes storing a desired file. Recently, several research works have been made to extend P2P systems to be able to share data having a fine granularity (i.e. atomic attribute) and to process queries written with a highly expressive language (i.e. SQL). These works have led to the emergence of P2P data sharing systems that represent a new generation of P2P systems and, on the other hand, a next stage in a long period of the database research area. ? The characteristics of P2P systems (e.g. large-scale, node autonomy and instability) make impractical to have a global catalog that represents often an essential component in traditional database systems. Usually, such a catalog stores information about data, schemas and data sources. Query routing and processing are two problems affected by the absence of a global catalog. Locating relevant data sources and generating a close to optimal execution plan become more difficult. In this paper, we concentrate our study on proposed solutions for the both problems. Furthermore, selected case studies of main P2P data sharing systems are analyzed and compared.