Source author record

Ioana Manolescu

Ioana Manolescu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Databases Artificial Intelligence Digital Libraries

Catalog footprint

What is connected

9works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Integrating connection search in graph queries

Graph data management and querying has many practical applications. When graphs are very heterogeneous and/or users are unfamiliar with their structure, they may need to find how two or more groups of nodes are connected in a graph, even when users are not able to describe the connections. This is only partially supported by existing query languages, which allow searching for paths, but not for trees connecting three or more node groups. The latter is related to the NP-hard Group Steiner Tree problem, and has been previously considered for keyword search in databases. In this work, we formally show how to integrate connecting tree patterns (CTPs, in short) within a graph query language such as SPARQL or Cypher, leading to an Extended Query Language (or EQL, in short). We then study a set of algorithms for evaluating CTPs; we generalize prior keyword search work, most importantly by (i) considering bidirectional edge traversal and (ii) allowing users to select any score function for ranking CTP results. To cope with very large search spaces, we propose an efficient pruning technique and formally establish a large set of cases where our algorithm, MOLESP, is complete even with pruning. Our experiments validate the performance of our CTP and EQL evaluation algorithms on a large set of synthetic and real-world workloads.

preprint2021arXiv

Empowering Investigative Journalism with Graph-based Heterogeneous Data Management

Investigative Journalism (IJ, in short) is staple of modern, democratic societies. IJ often necessitates working with large, dynamic sets of heterogeneous, schema-less data sources, which can be structured, semi-structured, or textual, limiting the applicability of classical data integration approaches. In prior work, we have developed ConnectionLens, a system capable of integrating such sources into a single heterogeneous graph, leveraging Information Extraction (IE) techniques; users can then query the graph by means of keywords, and explore query results and their neighborhood using an interactive GUI. Our keyword search problem is complicated by the graph heterogeneity, and by the lack of a result score function that would allow to prune some of the search space. In this work, we describe an actual IJ application studying conflicts of interest in the biomedical domain, and we show how ConnectionLens supports it. Then, we present novel techniques addressing the scalability challenges raised by this application: one allows to reduce the significant IE costs while building the graph, while the other is a novel, parallel, in-memory keyword search engine, which achieves orders of magnitude speed-up over our previous engine. Our experimental study on the real-world IJ application data confirms the benefits of our contributions.

preprint2012arXiv

Proceedings of the first International Workshop On Open Data, WOD-2012

WOD-2012 aims at facilitating new trends and ideas from a broad range of topics concerned within the widely-spread Open Data movement, from the viewpoint of computer science research. While being most commonly known from the recent Linked Open Data movement, the concept of publishing data explicitly as Open Data has meanwhile developed many variants and facets that go beyond publishing large and highly structured RDF/S repositories. Open Data comprises text and semi-structured data, but also open multi-modal contents, including music, images, and videos. With the increasing amount of data that is published by governments (see, e.g., data.gov, data.gov.uk or data.gouv.fr), by international organizations (data.worldbank.org or data.undp.org) and by scientific communities (tdar.org, cds.u-strasbg.fr, GenBank, IRIS or KNB) explicitly under an Open Data policy, new challenges arise not only due to the scale at which this data becomes available. A number of community-based conferences accommodate tracks or workshops which are dedicated to Open Data. However, WOD aims to be a premier venue to gather researchers and practitioners who are contributing to and interested in the emerging field of managing Open Data from a computer science perspective. Hence, it is a unique opportunity to find in a single place up-to-date scientific works on Web-scale Open Data issues that have so far only partially been addressed by different research communities such as Databases, Data Mining and Knowledge Management, Distributed Systems, Data Privacy, and Data Visualization.

preprint2011arXiv

The ViP2P Platform: XML Views in P2P

The growing volumes of XML data sources on the Web or produced by enterprises, organizations etc. raise many performance challenges for data management applications. In this work, we are concerned with the distributed, peer-to-peer management of large corpora of XML documents, based on distributed hash table (or DHT, in short) overlay networks. We present ViP2P (standing for Views in Peer-to-Peer), a distributed platform for sharing XML documents based on a structured P2P network infrastructure (DHT). At the core of ViP2P stand distributed materialized XML views, defined by arbitrary XML queries, filled in with data published anywhere in the network, and exploited to efficiently answer queries issued by any network peer. ViP2P allows user queries to be evaluated over XML documents published by peers in two modes. First, a long-running subscription mode, when a query can be registered in the system and receive answers incrementally when and if published data matches the query. Second, queries can also be asked in an ad-hoc, snapshot mode, where results are required immediately and must be computed based on the results of other long-running, subscription queries. ViP2P innovates over other similar DHT-based XML sharing platforms by using a very expressive structured XML query language. This expressivity leads to a very flexible distribution of XML content in the ViP2P network, and to efficient snapshot query execution. ViP2P has been tested in real deployments of hundreds of computers. We present the platform architecture, its internal algorithms, and demonstrate its efficiency and scalability through a set of experiments. Our experimental results outgrow by orders of magnitude similar competitor systems in terms of data volumes, network size and data dissemination throughput.

preprint2011arXiv

View Selection in Semantic Web Databases

We consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples, and implicit data, implied by the RDF semantics. Based on a query workload, we address the problem of selecting a set of views to be materialized in the database, minimizing a combination of query processing, view storage, and view maintenance costs. Starting from an existing relational view selection method, we devise new algorithms for recommending view sets, and show that they scale significantly beyond the existing relational ones when adapted to the RDF context. To account for implicit triples in query answers, we propose a novel RDF query reformulation algorithm and an innovative way of incorporating it into view selection in order to avoid a combinatorial explosion in the complexity of the selection process. The interest of our techniques is demonstrated through a set of experiments.

preprint2011arXiv

XML content warehousing: Improving sociological studies of mailing lists and web data

In this paper, we present the guidelines for an XML-based approach for the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many adjunctions or crossings of data sources, without modifying existing data sets, while allowing possible structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows altogether exhaustive warehousing and recursive queries through contents, with far less dependence on the initial storage. We finally present the possibility of exporting the data stored in the warehouse to commonly-used advanced software devoted to sociological analysis.

preprint2010arXiv

LiquidXML: Adaptive XML Content Redistribution

We propose to demonstrate LiquidXML, a platform for managing large corpora of XML documents in large-scale P2P networks. All LiquidXML peers may publish XML documents to be shared with all the network peers. The challenge then is to efficiently (re-)distribute the published content in the network, possibly in overlapping, redundant fragments, to support efficient processing of queries at each peer. The novelty of LiquidXML relies in its adaptive method of choosing which data fragments are stored where, to improve performance. The "liquid" aspect of XML management is twofold: XML data flows from many sources towards many consumers, and its distribution in the network continuously adapts to improve query performance.

preprint2010arXiv

RDFViewS: A Storage Tuning Wizard for RDF Applications

In recent years, the significant growth of RDF data used in numerous applications has made its efficient and scalable manipulation an important issue. In this paper, we present RDFViewS, a system capable of choosing the most suitable views to materialize, in order to minimize the query response time for a specific SPARQL query workload, while taking into account the view maintenance cost and storage space constraints. Our system employs practical algorithms and heuristics to navigate through the search space of potential view configurations, and exploits the possibly available semantic information - expressed via an RDF Schema - to ensure the completeness of the query evaluation.

preprint2010arXiv

The WebStand Project

In this paper we present the state of advancement of the French ANR WebStand project. The objective of this project is to construct a customizable XML based warehouse platform to acquire, transform, analyze, store, query and export data from the web, in particular mailing lists, with the final intension of using this data to perform sociological studies focused on social groups of World Wide Web, with a specific emphasis on the temporal aspects of this data. We are currently using this system to analyze the standardization process of the W3C, through its social network of standard setters.

Ioana Manolescu

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

Integrating connection search in graph queries

Empowering Investigative Journalism with Graph-based Heterogeneous Data Management

Proceedings of the first International Workshop On Open Data, WOD-2012

The ViP2P Platform: XML Views in P2P

View Selection in Semantic Web Databases

XML content warehousing: Improving sociological studies of mailing lists and web data

LiquidXML: Adaptive XML Content Redistribution

RDFViewS: A Storage Tuning Wizard for RDF Applications

The WebStand Project