Source author record

Grigori Fursin

Grigori Fursin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Software Engineering Distributed, Parallel, and Cluster Computing Performance Digital Libraries Human-Computer Interaction Mathematical Software Programming Languages

Catalog footprint

What is connected

10works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

A Community Roadmap for Scientific Workflows Research and Development

The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projects partnered to bring the international workflows community together. This paper reports on discussions and findings from two virtual "Workflows Community Summits" (January and April, 2021). The overarching goals of these workshops were to develop a view of the state of the art, identify crucial research challenges in the workflows community, articulate a vision for potential community efforts, and discuss technical approaches for realizing this vision. To this end, participants identified six broad themes: FAIR computational workflows; AI workflows; exascale challenges; APIs, interoperability, reuse, and standards; training and education; and building a workflows community. We summarize discussions and recommendations for each of these themes.

preprint2020arXiv

CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking

We present CodeReef - an open platform to share all the components necessary to enable cross-platform MLOps (MLSysOps), i.e. automating the deployment of ML models across diverse systems in the most efficient way. We also introduce the CodeReef solution - a way to package and share models as non-virtualized, portable, customizable and reproducible archive files. Such ML packages include JSON meta description of models with all dependencies, Python APIs, CLI actions and portable workflows necessary to automatically build, benchmark, test and customize models across diverse platforms, AI frameworks, libraries, compilers and datasets. We demonstrate several CodeReef solutions to automatically build, run and measure object detection based on SSD-Mobilenets, TensorFlow and COCO dataset from the latest MLPerf inference benchmark across a wide range of platforms from Raspberry Pi, Android phones and IoT devices to data centers. Our long-term goal is to help researchers share their new techniques as production-ready packages along with research papers to participate in collaborative and reproducible benchmarking, compare the different ML/software/hardware stacks and select the most efficient ones on a Pareto frontier using online CodeReef dashboards.

preprint2020arXiv

The Collective Knowledge project: making ML models more portable and reproducible with open APIs, reusable best practices and MLOps

This article provides an overview of the Collective Knowledge technology (CK or cKnowledge). CK attempts to make it easier to reproduce ML&systems research, deploy ML models in production, and adapt them to continuously changing data sets, models, research techniques, software, and hardware. The CK concept is to decompose complex systems and ad-hoc research projects into reusable sub-components with unified APIs, CLI, and JSON meta description. Such components can be connected into portable workflows using DevOps principles combined with reusable automation actions, software detection plugins, meta packages, and exposed optimization parameters. CK workflows can automatically plug in different models, data and tools from different vendors while building, running and benchmarking research code in a unified way across diverse platforms and environments. Such workflows also help to perform whole system optimization, reproduce results, and compare them using public or private scoreboards on the CK platform (https://cKnowledge.io). For example, the modular CK approach was successfully validated with industrial partners to automatically co-design and optimize software, hardware, and machine learning models for reproducible and efficient object detection in terms of speed, accuracy, energy, size, and other characteristics. The long-term goal is to simplify and accelerate the development and deployment of ML models and systems by helping researchers and practitioners to share and reuse their knowledge, experience, best practices, artifacts, and techniques using open CK APIs.

preprint2018arXiv

A model-driven approach for a new generation of adaptive libraries

Efficient high-performance libraries often expose multiple tunable parameters to provide highly optimized routines. These can range from simple loop unroll factors or vector sizes all the way to algorithmic changes, given that some implementations can be more suitable for certain devices by exploiting hardware characteristics such as local memories and vector units. Traditionally, such parameters and algorithmic choices are tuned and then hard-coded for a specific architecture and for certain characteristics of the inputs. However, emerging applications are often data-driven, thus traditional approaches are not effective across the wide range of inputs and architectures used in practice. In this paper, we present a new adaptive framework for data-driven applications which uses a predictive model to select the optimal algorithmic parameters by training with synthetic and real datasets. We demonstrate the effectiveness of a BLAS library and specifically on its matrix multiplication routine. We present experimental results for two GPU architectures and show significant performance gains of up to 3x (on a high-end NVIDIA Pascal GPU) and 2.5x (on an embedded ARM Mali GPU) when compared to a traditionally optimized library.

preprint2015arXiv

Collective Mind, Part II: Towards Performance- and Cost-Aware Software Engineering as a Natural Science

Nowadays, engineers have to develop software often without even knowing which hardware it will eventually run on in numerous mobile phones, tablets, desktops, laptops, data centers, supercomputers and cloud services. Unfortunately, optimizing compilers are not keeping pace with ever increasing complexity of computer systems anymore and may produce severely underperforming executable codes while wasting expensive resources and energy. We present our practical and collaborative solution to this problem via light-weight wrappers around any software piece when more than one implementation or optimization choice available. These wrappers are connected with a public Collective Mind autotuning infrastructure and repository of knowledge (c-mind.org/repo) to continuously monitor various important characteristics of these pieces (computational species) across numerous existing hardware configurations together with randomly selected optimizations. Similar to natural sciences, we can now continuously track winning solutions (optimizations for a given hardware) that minimize all costs of a computation (execution time, energy spent, code size, failures, memory and storage footprint, optimization time, faults, contentions, inaccuracy and so on) of a given species on a Pareto frontier along with any unexpected behavior. The community can then collaboratively classify solutions, prune redundant ones, and correlate them with various features of software, its inputs (data sets) and used hardware either manually or using powerful predictive analytics techniques. Our approach can then help create a large, realistic, diverse, representative, and continuously evolving benchmark with related optimization knowledge while gradually covering all possible software and hardware to be able to predict best optimizations and improve compilers and hardware depending on usage scenarios and requirements.

preprint2014arXiv

Collective Tuning Initiative

Computing systems rarely deliver best possible performance due to ever increasing hardware and software complexity and limitations of the current optimization technology. Additional code and architecture optimizations are often required to improve execution time, size, power consumption, reliability and other important characteristics of computing systems. However, it is often a tedious, repetitive, isolated and time consuming process. In order to automate, simplify and systematize program optimization and architecture design, we are developing open-source modular plugin-based Collective Tuning Infrastructure (CTI, http://cTuning.org) that can distribute optimization process and leverage optimization experience of multiple users. CTI provides a novel fully integrated, collaborative, "one button" approach to improve existing underperfoming computing systems ranging from embedded architectures to high-performance servers based on systematic iterative compilation, statistical collective optimization and machine learning. Our experimental results show that it is possible to reduce execution time (and code size) of some programs from SPEC2006 and EEMBC among others by more than a factor of 2 automatically. It can also reduce development and testing time considerably. Together with the first production quality machine learning enabled interactive research compiler (MILEPOST GCC) this infrastructure opens up many research opportunities to study and develop future realistic self-tuning and self-organizing adaptive intelligent computing systems based on systematic statistical performance evaluation and benchmarking. Finally, using common optimization repository is intended to improve the quality and reproducibility of the research on architecture and code optimization.

preprint2014arXiv

Community-driven reviewing and validation of publications

In this report, we share our practical experience on crowdsourcing evaluation of research artifacts and reviewing of publications since 2008. We also briefly discuss encountered problems including reproducibility of experimental results and possible solutions.

preprint2014arXiv

Finding representative sets of optimizations for adaptive multiversioning applications

Iterative compilation is a widely adopted technique to optimize programs for different constraints such as performance, code size and power consumption in rapidly evolving hardware and software environments. However, in case of statically compiled programs, it is often restricted to optimizations for a specific dataset and may not be applicable to applications that exhibit different run-time behavior across program phases, multiple datasets or when executed in heterogeneous, reconfigurable and virtual environments. Several frameworks have been recently introduced to tackle these problems and enable run-time optimization and adaptation for statically compiled programs based on static function multiversioning and monitoring of online program behavior. In this article, we present a novel technique to select a minimal set of representative optimization variants (function versions) for such frameworks while avoiding performance loss across available datasets and code-size explosion. We developed a novel mapping mechanism using popular decision tree or rule induction based machine learning techniques to rapidly select best code versions at run-time based on dataset features and minimize selection overhead. These techniques enable creation of self-tuning static binaries or libraries adaptable to changing behavior and environments at run-time using staged compilation that do not require complex recompilation frameworks while effectively outperforming traditional single-version non-adaptable code.

preprint2014arXiv

Proceedings of the 5th International Workshop on Adaptive Self-tuning Computing Systems 2015 (ADAPT'15)

This is the proceedings of the 5th International Workshop on Adaptive Self-tuning Computing Systems 2015 (ADAPT'15).

preprint2013arXiv

Collective Mind: cleaning up the research and experimentation mess in computer engineering using crowdsourcing, big data and machine learning

Software and hardware co-design and optimization of HPC systems has become intolerably complex, ad-hoc, time consuming and error prone due to enormous number of available design and optimization choices, complex interactions between all software and hardware components, and multiple strict requirements placed on performance, power consumption, size, reliability and cost. We present our novel long-term holistic and practical solution to this problem based on customizable, plugin-based, schema-free, heterogeneous, open-source Collective Mind repository and infrastructure with unified web interfaces and on-line advise system. This collaborative framework distributes analysis and multi-objective off-line and on-line auto-tuning of computer systems among many participants while utilizing any available smart phone, tablet, laptop, cluster or data center, and continuously observing, classifying and modeling their realistic behavior. Any unexpected behavior is analyzed using shared data mining and predictive modeling plugins or exposed to the community at cTuning.org for collaborative explanation, top-down complexity reduction, incremental problem decomposition and detection of correlating program, architecture or run-time properties (features). Gradually increasing optimization knowledge helps to continuously improve optimization heuristics of any compiler, predict optimizations for new programs or suggest efficient run-time (online) tuning and adaptation strategies depending on end-user requirements. We decided to share all our past research artifacts including hundreds of codelets, numerical applications, data sets, models, universal experimental analysis and auto-tuning pipelines, self-tuning machine learning based meta compiler, and unified statistical analysis and machine learning plugins in a public repository to initiate systematic, reproducible and collaborative research, development and experimentation with a new publication model where experiments and techniques are validated, ranked and improved by the community.

Grigori Fursin

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

A Community Roadmap for Scientific Workflows Research and Development

CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking

The Collective Knowledge project: making ML models more portable and reproducible with open APIs, reusable best practices and MLOps

A model-driven approach for a new generation of adaptive libraries

Collective Mind, Part II: Towards Performance- and Cost-Aware Software Engineering as a Natural Science

Collective Tuning Initiative

Community-driven reviewing and validation of publications

Finding representative sets of optimizations for adaptive multiversioning applications

Proceedings of the 5th International Workshop on Adaptive Self-tuning Computing Systems 2015 (ADAPT'15)

Collective Mind: cleaning up the research and experimentation mess in computer engineering using crowdsourcing, big data and machine learning