Source author record

Petros Maniatis

Petros Maniatis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Programming Languages Operating Systems Software Engineering Artificial Intelligence Computation and Language Cryptography and Security Distributed, Parallel, and Cluster Computing Networking and Internet Architecture Performance

Catalog footprint

What is connected

7works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A Library for Representing Python Programs as Graphs for Machine Learning

Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs suitable for training machine learning models. Our library admits the construction of control-flow graphs, data-flow graphs, and composite ``program graphs'' that combine control-flow, data-flow, syntactic, and lexical information about a program. We present the capabilities and limitations of the library, perform a case study applying the library to millions of competitive programming submissions, and showcase the library's utility for machine learning research.

preprint2020arXiv

Learning and Evaluating Contextual Embedding of Source Code

Recent research has achieved impressive results on understanding and improving source code by building up on machine-learning techniques developed for natural languages. A significant advancement in natural-language understanding has come with the development of pre-trained contextual embeddings, such as BERT, which can be fine-tuned for downstream tasks with less labeled data and training budget, while achieving better accuracies. However, there is no attempt yet to obtain a high-quality contextual embedding of source code, and to evaluate it on multiple program-understanding tasks simultaneously; that is the gap that this paper aims to mitigate. Specifically, first, we curate a massive, deduplicated corpus of 7.4M Python files from GitHub, which we use to pre-train CuBERT, an open-sourced code-understanding BERT model; and, second, we create an open-sourced benchmark that comprises five classification tasks and one program-repair task, akin to code-understanding tasks proposed in the literature before. We fine-tune CuBERT on our benchmark tasks, and compare the resulting models to different variants of Word2Vec token embeddings, BiLSTM and Transformer models, as well as published state-of-the-art models, showing that CuBERT outperforms them all, even with shorter training, and with fewer labeled examples. Future work on source-code embedding can benefit from reusing our benchmark, and from comparing against CuBERT models as a strong baseline.

preprint2020arXiv

Neural Program Synthesis with a Differentiable Fixer

We present a new program synthesis approach that combines an encoder-decoder based synthesis architecture with a differentiable program fixer. Our approach is inspired from the fact that human developers seldom get their program correct on the first attempt, and perform iterative testing-based program fixing to get to the desired program functionality. Similarly, our approach first learns a distribution over programs conditioned on an encoding of a set of input-output examples, and then iteratively performs fix operations using the differentiable fixer. The fixer takes as input the original examples and the current program's outputs on example inputs, and generates a new distribution over the programs with the goal of reducing the discrepancies between the current program outputs and the desired example outputs. We train our architecture end-to-end on the RobustFill domain, and show that the addition of the fixer module leads to a significant improvement on synthesis accuracy compared to using beam search.

preprint2010arXiv

A Data Capsule Framework For Web Services: Providing Flexible Data Access Control To Users

This paper introduces the notion of a secure data capsule, which refers to an encapsulation of sensitive user information (such as a credit card number) along with code that implements an interface suitable for the use of such information (such as charging for purchases) by a service (such as an online merchant). In our capsule framework, users provide their data in the form of such capsules to web services rather than raw data. Capsules can be deployed in a variety of ways, either on a trusted third party or the user's own computer or at the service itself, through the use of a variety of hardware or software modules, such as a virtual machine monitor or trusted platform module: the only requirement is that the deployment mechanism must ensure that the user's data is only accessed via the interface sanctioned by the user. The framework further allows an user to specify policies regarding which services or machines may host her capsule, what parties are allowed to access the interface, and with what parameters. The combination of interface restrictions and policy control lets us bound the impact of an attacker who compromises the service to gain access to the user's capsule or a malicious insider at the service itself.

preprint2010arXiv

CloneCloud: Boosting Mobile Device Applications Through Cloud Clone Execution

Mobile applications are becoming increasingly ubiquitous and provide ever richer functionality on mobile devices. At the same time, such devices often enjoy strong connectivity with more powerful machines ranging from laptops and desktops to commercial clouds. This paper presents the design and implementation of CloneCloud, a system that automatically transforms mobile applications to benefit from the cloud. The system is a flexible application partitioner and execution runtime that enables unmodified mobile applications running in an application-level virtual machine to seamlessly off-load part of their execution from mobile devices onto device clones operating in a computational cloud. CloneCloud uses a combination of static analysis and dynamic profiling to optimally and automatically partition an application so that it migrates, executes in the cloud, and re-integrates computation in a fine-grained manner that makes efficient use of resources. Our evaluation shows that CloneCloud can achieve up to 21.2x speedup of smartphone applications we tested and it allows different partitioning for different inputs and networks.

preprint2010arXiv

Mantis: Predicting System Performance through Program Analysis and Modeling

We present Mantis, a new framework that automatically predicts program performance with high accuracy. Mantis integrates techniques from programming language and machine learning for performance modeling, and is a radical departure from traditional approaches. Mantis extracts program features, which are information about program execution runs, through program instrumentation. It uses machine learning techniques to select features relevant to performance and creates prediction models as a function of the selected features. Through program analysis, it then generates compact code slices that compute these feature values for prediction. Our evaluation shows that Mantis can achieve more than 93% accuracy with less than 10% training data set, which is a significant improvement over models that are oblivious to program features. The system generates code slices that are cheap to compute feature values.

preprint2010arXiv

Verifiable Network-Performance Measurements

In the current Internet, there is no clean way for affected parties to react to poor forwarding performance: when a domain violates its Service Level Agreement (SLA) with a contractual partner, the partner must resort to ad-hoc probing-based monitoring to determine the existence and extent of the violation. Instead, we propose a new, systematic approach to the problem of forwarding-performance verification. Our mechanism relies on voluntary reporting, allowing each domain to disclose its loss and delay performance to its neighbors; it does not disclose any information regarding the participating domains' topology or routing policies beyond what is already publicly available. Most importantly, it enables verifiable performance measurements, i.e., domains cannot abuse it to significantly exaggerate their performance. Finally, our mechanism is tunable, allowing each participating domain to determine how many resources to devote to it independently (i.e., without any inter-domain coordination), exposing a controllable trade-off between performance-verification quality and resource consumption. Our mechanism comes at the cost of deploying modest functionality at the participating domains' border routers; we show that it requires reasonable processing and memory resources within modern network capabilities.

Petros Maniatis

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

A Library for Representing Python Programs as Graphs for Machine Learning

Learning and Evaluating Contextual Embedding of Source Code

Neural Program Synthesis with a Differentiable Fixer

A Data Capsule Framework For Web Services: Providing Flexible Data Access Control To Users

CloneCloud: Boosting Mobile Device Applications Through Cloud Clone Execution

Mantis: Predicting System Performance through Program Analysis and Modeling

Verifiable Network-Performance Measurements