Topic overview

General Literature

20 works · 108 researchers · 0 institutions

Papers in this area

20 featured works

Preprint · 2022 · arXiv

A Brief Guide to Designing and Evaluating Human-Centered Interactive Machine Learning

Interactive machine learning (IML) is a field of research that explores how to leverage both human and computational abilities in decision making systems. IML represents a collaboration between multiple complementary human and machine intelligent systems working as a team, each with their own unique abilities and limitations. This teamwork might mean that both systems take actions at the same time, or in sequence. Two major open research questions in the field of IML are: "How should we design systems that can learn to make better decisions over time with human interaction?" and "How should we evaluate the design and deployment of such systems?" A lack of appropriate consideration for the humans involved can lead to problematic system behaviour, and issues of fairness, accountability, and transparency. Thus, our goal with this work is to present a human-centred guide to designing and evaluating IML systems while mitigating risks. This guide is intended to be used by machine learning practitioners who are responsible for the health, safety, and well-being of interacting humans. An obligation of responsibility for public interaction means acting with integrity, honesty, fairness, and abiding by applicable legal statutes. With these values and principles in mind, we as a machine learning research community can better achieve goals of augmenting human skills and abilities. This practical guide therefore aims to support many of the responsible decisions necessary throughout the iterative design, development, and dissemination of IML systems.

Preprint · 2022 · arXiv

Data science to investigate temperature profiles of large networks of food refrigeration systems

The electrical generation and transmission infrastructures of many countries are under increasing pressure. This partly reflects the move towards low-carbon economies and the increased reliance on renewable power generation systems. Traditional fossil-fuel generation systems, which provide a stable base load, are being used less and are being replaced by less predictable renewable generation. As a consequence, the load available on the grid is becoming more unstable. To cope with this variability, the UK National Grid has placed emphasis on investigating various technical mechanisms (e.g. smart grids, energy storage technologies, auxiliary power sources) that may be able to prevent critical situations in which the grid becomes unstable. Implementing these mechanisms successfully may require large numbers of electricity consumers (e.g. HVAC systems, food refrigeration systems) to, for example, make additional investments in energy storage technologies (food refrigeration systems) or integrate the electrical demand of their industrial processes into the National Grid (HVAC systems). For food refrigeration systems, however, even if the thermal inertia of the system can maintain effective performance for a short period (e.g. under 1 minute) when its electrical input is reduced during such critical situations, this still carries a paramount food-safety risk, even over such short periods. Therefore, before considering any future action (e.g. investing in energy storage technologies) to prevent critical situations in which the grid becomes unstable, it is also necessary to understand how temperature profiles evolve over time during normal operation across these large networks of food refrigeration systems.

Preprint · 2026 · arXiv

Tensor Cookbook: Mastering Tensors through Diagrams

High-dimensional data arise naturally in many areas of science and engineering, including machine learning, signal processing, computational physics, and statistics. Such data are often represented as tensors, multi-dimensional generalizations of matrices. While tensors provide a natural representation for multi-modal structure, manipulating them directly quickly becomes challenging as the order grows: the number of parameters increases exponentially, and algebraic expressions involving many indices become difficult to interpret and implement. Tensor networks (TNs) provide an effective framework for addressing these challenges. Originally introduced by Penrose and developed extensively in quantum physics, the graphical language of tensor networks encodes contractions as edges in a graph, reducing notational overhead and revealing structural properties obscured by index notation. Despite the central role of high-dimensional tensors in modern machine learning and numerical analysis, tensor network diagrams remain underutilized outside quantum computing, partly due to the lack of a self-contained mathematical reference accessible to a broad technical audience. This manuscript provides a self-contained guide to tensor networks and their use in tensor algebra. We present the main operations on tensors (contractions, products, and reshaping) in graphical notation, and show how classical tensor decompositions and related computations are naturally expressed in this framework. We also illustrate how tensor networks simplify the derivation of gradients and the manipulation of high-dimensional probability distributions. Throughout, we show that the diagrammatic approach yields genuinely shorter and more transparent proofs of classical identities, rank bounds, and gradient formulas that would otherwise require laborious index manipulation.

Preprint · 2023 · arXiv

Charles Babbage, Ada Lovelace, and the Bernoulli Numbers

This chapter makes needed corrections to an unduly negative scholarly view of Ada Lovelace. Credit between Lovelace and Babbage is not a zero-sum game, where any credit added to Lovelace somehow detracts from Babbage. Ample evidence indicates that Babbage and Lovelace each made important contributions to the famous 1843 Sketch of Babbage's Analytical Engine and the accompanying Notes. Further, Lovelace's correspondence with two highly accomplished figures in 19th-century mathematics, Charles Babbage and Augustus De Morgan, establishes her mathematical background and sophistication. Babbage and Lovelace's treatment of the Bernoulli numbers in Note 'G' spotlights this aspect of their collaboration. Finally, while acknowledging significant definitional problems in calling Lovelace the world's "first computer programmer," I affirm that Lovelace created an elemental sequence of instructions (that is, an algorithm) for computing the series of Bernoulli numbers.
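
To show what "computing the series of Bernoulli numbers" involves, the sketch below generates them with exact rational arithmetic. It uses the standard modern recurrence C(m+1, 0)·B_0 + C(m+1, 1)·B_1 + ... + C(m+1, m)·B_m = 0 for m ≥ 1, with B_0 = 1 (which yields the convention B_1 = -1/2). It is not a transcription of the procedure in Note G, and its indexing follows modern convention rather than the Note's; the function name `bernoulli` is illustrative.

```python
from fractions import Fraction
from math import comb


def bernoulli(n):
    """Return [B_0, B_1, ..., B_n] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(0)] * (n + 1)
    B[0] = Fraction(1)
    for m in range(1, n + 1):
        # The last term of the recurrence is C(m+1, m) * B_m = (m + 1) * B_m,
        # so B_m = -(sum of the earlier terms) / (m + 1).
        earlier = sum(comb(m + 1, k) * B[k] for k in range(m))
        B[m] = -earlier / (m + 1)
    return B


for i, b in enumerate(bernoulli(8)):
    print(f"B_{i} = {b}")
```

The last line printed is B_8 = -1/30. Using `Fraction` keeps every value exact, so the odd-indexed numbers beyond B_1 come out as exactly 0 rather than as small floating-point residues.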

Preprint · 2022 · arXiv

Advancing Data Justice Research and Practice: An Integrated Literature Review

The Advancing Data Justice Research and Practice (ADJRP) project aims to widen the lens of current thinking around data justice and to provide actionable resources that will help policymakers, practitioners, and impacted communities gain a broader understanding of what equitable, freedom-promoting, and rights-sustaining data collection, governance, and use should look like in increasingly dynamic and global data innovation ecosystems. In this integrated literature review we hope to lay the conceptual groundwork needed to support this aspiration. The introduction motivates the broadening of data justice that is undertaken by the literature review which follows. First, we address how certain limitations of the current study of data justice drive the need for a re-location of data justice research and practice. We map out the strengths and shortcomings of the contemporary state of the art and then elaborate on the challenges faced by our own effort to broaden the data justice perspective in the decolonial context. The body of the literature review covers seven thematic areas. For each theme, the ADJRP team has systematically collected and analysed key texts in order to tell the critical empirical story of how existing social structures and power dynamics present challenges to data justice and related justice fields. In each case, this critical empirical story is also supplemented by the transformational story of how activists, policymakers, and academics are challenging longstanding structures of inequity to advance social justice in data innovation ecosystems and adjacent areas of technological practice.

Preprint · 2022 · arXiv

Towards Specificationless Monitoring of Provenance-Emitting Systems

Monitoring often requires insight into the monitored system as well as concrete specifications of expected behavior. More and more systems, however, provide information about their inner procedures by emitting provenance information in a W3C-standardized graph format. In this work, we present an approach to monitor such provenance data for anomalous behavior by performing spectral graph analysis on slices of the constructed provenance graph and by comparing the characteristics of each slice with those of a sliding window over recently seen slices. We argue that this approach not only simplifies the monitoring of heterogeneous distributed systems, but also enables applying a host of well-studied techniques to monitor such systems.
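
A minimal sketch of the monitoring loop described above, under assumptions the abstract does not fix: each slice arrives as a plain list of undirected edges (parsing the W3C provenance format is out of scope), each slice is summarized by the first three spectral moments of its graph Laplacian L (tr(L^k)/n for k = 1, 2, 3, which equal the mean k-th powers of L's eigenvalues and so summarize the spectrum without an eigensolver), and a slice is flagged when the Euclidean distance between its features and the mean over a sliding window of recent slices exceeds a fixed threshold. The names `laplacian`, `spectral_moments` and `SliceMonitor`, the window size and the threshold are all hypothetical; the paper's own spectral features and comparison may differ.

```python
from collections import deque
from math import sqrt


def laplacian(nodes, edges):
    """Graph Laplacian L = D - A of an undirected simple graph, as nested lists."""
    index = {v: i for i, v in enumerate(sorted(nodes))}
    n = len(index)
    L = [[0.0] * n for _ in range(n)]
    # Deduplicate edges and drop self-loops so repeated relations between
    # the same two nodes are counted once.
    for u, v in {tuple(sorted(e)) for e in edges if e[0] != e[1]}:
        i, j = index[u], index[v]
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def spectral_moments(L, k_max=3):
    """Return [tr(L^k) / n for k = 1..k_max], the mean k-th power of L's eigenvalues."""
    n = len(L)
    moments, power = [], L
    for k in range(1, k_max + 1):
        if k > 1:
            power = [[sum(power[i][m] * L[m][j] for m in range(n)) for j in range(n)]
                     for i in range(n)]
        moments.append(sum(power[i][i] for i in range(n)) / n)
    return moments


class SliceMonitor:
    """Flags a slice whose spectral features deviate from a sliding window of recent slices."""

    def __init__(self, window=3, threshold=1.0):
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, nodes, edges):
        features = spectral_moments(laplacian(nodes, edges))
        anomalous = False
        if len(self.recent) == self.recent.maxlen:
            mean = [sum(col) / len(self.recent) for col in zip(*self.recent)]
            distance = sqrt(sum((f - m) ** 2 for f, m in zip(features, mean)))
            anomalous = distance > self.threshold
        self.recent.append(features)
        return anomalous


nodes = ["a", "b", "c", "d"]
path = [("a", "b"), ("b", "c"), ("c", "d")]
complete = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]
monitor = SliceMonitor(window=3, threshold=1.0)
print([monitor.observe(nodes, s) for s in [path, path, path, path, complete]])
```

This prints `[False, False, False, False, True]`: the first three slices only fill the window, the fourth matches it exactly, and the fully connected slice deviates sharply. Whether a flagged slice should itself enter the window is a design choice; this sketch appends it, so a persistent change is eventually absorbed as the new normal.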

Preprint · 2022 · arXiv

A survey study of success factors in data science projects

In recent years, the data science community has pursued excellence and made significant research efforts to develop advanced analytics, focusing on solving technical problems at the expense of organizational and socio-technical challenges. According to previous surveys on the state of data science project management, there is a significant gap between technical and organizational processes. In this article we present new empirical data from a survey of 237 data science professionals on the use of project management methodologies for data science. We provide additional profiling of the survey respondents' roles and their priorities when executing data science projects. Based on this survey study, the main findings are: (1) Agile data science lifecycle is the most widely used framework, but only 25% of the survey participants state that they follow a data science project methodology. (2) The most important success factors are precisely describing stakeholders' needs, communicating the results to end-users, and team collaboration and coordination. (3) Professionals who adhere to a project methodology place greater emphasis on the project's potential risks and pitfalls, version control, the deployment pipeline to production, and data security and privacy.

Preprint · 2022 · arXiv

Data Science in Perspective

Data and science have stood out in generating results, whether in projects in the scientific domain or the business domain. The CERN project, scientific institutes, and companies such as Walmart, Google and Apple, among others, need data to present their results and make predictions in a competitive, data-driven world. Together, the words "data" and "science" culminated in a globally recognized term: Data Science. Data Science is in its initial phase; it may be considered part of the formal sciences and is also presented as part of the applied sciences, capable of generating value and supporting decision making. Data Science draws on science, and consequently on the scientific method, to promote decision making through data intelligence. In many cases, the method (or part of it) is applied in Data Science projects in the scientific domain (social sciences, bioinformatics, geospatial projects) or the business domain (finance, logistics, retail), among others. In this sense, this article addresses the perspectives of Data Science as a multidisciplinary area, considering science and the scientific method, and its formal structure, which integrates Statistics, Computer Science, and Business Science and also takes into account Artificial Intelligence, especially Machine Learning, among others. The article also deals with the perspective of applied Data Science, since Data Science is used to generate value through scientific and business projects. The Data Science persona is also discussed, concerning the education of Data Science professionals and their corresponding profiles, since its projection is changing the field of data worldwide.

Preprint · 2022 · arXiv

The EL-X8 computer and the BOL detector: Networking, programming, time-sharing and data-handling in the Amsterdam nuclear research project 'BOL'. A personal historical review

From 1967 to 1974, an Electrologica X8 computer was installed at the Institute for Nuclear Research (IKO) in Amsterdam, primarily for online and offline evaluation of experimental data, an application quite different from those of its 'brothers', the other X8s. During that time, the nuclear detection system 'BOL' was in operation to study nuclear reactions. The BOL detector embodied a new and bold concept: it consisted of a large number of state-of-the-art detection units, mounted in a spherical arrangement around a target in a beam of nuclear particles. Two minicomputers performed data acquisition and control of the experiment and supported online visual display of acquired data. The X8 computer, networked with the minicomputers, allowed fast high-level data processing and analysis. Pioneering work in both experimental nuclear physics and programming turned out to be a surprisingly good combination. For the network of the X8 and the minicomputers, advanced software layers were developed to program extensive data handling efficiently and flexibly.

Preprint · 2022 · arXiv

The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink

Machine Learning (ML) workloads have rapidly grown in importance, but have raised concerns about their carbon footprint. Four best practices can reduce ML training energy by up to 100x and CO2 emissions by up to 1000x. By following best practices, overall ML energy use (across research, development, and production) held steady at <15% of Google's total energy use for the past three years. If the whole ML field were to adopt best practices, total carbon emissions from training would decrease. Hence, we recommend that ML papers include emissions explicitly to foster competition on more than just model quality. Estimates of emissions in papers that omitted them have been off by 100x-100,000x, so publishing emissions has the added benefit of ensuring accurate accounting. Given the importance of climate change, we must get the numbers right to make certain that we work on its biggest challenges.
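
Publishing emissions requires an accounting model. A common back-of-the-envelope form multiplies accelerator-hours by average power per accelerator and by the datacenter's power usage effectiveness (PUE) to obtain energy, then multiplies that energy by the carbon intensity of the electricity to obtain CO2-equivalent emissions. The sketch below uses that form; the `TrainingRun` class, its field names and every number in it are illustrative assumptions, not values or methods from the paper. It shows one way a CO2 reduction can exceed the energy reduction: the factors multiply, so a cleaner grid compounds with fewer, more efficient accelerator-hours.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    accelerator_hours: float  # total accelerator (chip) hours used by the run
    avg_power_kw: float       # average power drawn per accelerator, in kW
    pue: float                # datacenter power usage effectiveness (>= 1.0)
    grid_kg_per_kwh: float    # carbon intensity of the electricity, kg CO2e per kWh

    def energy_kwh(self) -> float:
        return self.accelerator_hours * self.avg_power_kw * self.pue

    def emissions_kg(self) -> float:
        return self.energy_kwh() * self.grid_kg_per_kwh


# Hypothetical baseline and improved runs (illustrative numbers only).
baseline = TrainingRun(accelerator_hours=10_000, avg_power_kw=0.25, pue=1.5, grid_kg_per_kwh=0.5)
improved = TrainingRun(accelerator_hours=1_000, avg_power_kw=0.25, pue=1.125, grid_kg_per_kwh=0.0625)

print(baseline.energy_kwh(), baseline.emissions_kg())  # 3750.0 1875.0
print(improved.energy_kwh(), improved.emissions_kg())  # 281.25 17.578125
energy_ratio = baseline.energy_kwh() / improved.energy_kwh()
co2_ratio = baseline.emissions_kg() / improved.emissions_kg()
print(round(energy_ratio, 1), round(co2_ratio, 1))     # 13.3 106.7
```

Because emissions equal energy times carbon intensity, the emissions ratio here is the energy ratio multiplied by the 8x drop in grid carbon intensity (0.5 to 0.0625 kg CO2e/kWh), which energy accounting alone would not reveal.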

Preprint · 2022 · arXiv

Contextualizing Artificially Intelligent Morality: A Meta-Ethnography of Top-Down, Bottom-Up, and Hybrid Models for Theoretical and Applied Ethics in Artificial Intelligence

In this meta-ethnography, we explore three different angles of ethical artificial intelligence (AI) design implementation: the philosophical ethical viewpoint, the technical perspective, and framing through a political lens. Our qualitative research includes a literature review that highlights the cross-referencing of these angles by discussing the value and drawbacks of previously published contrasting top-down, bottom-up, and hybrid approaches. The novel contribution to this framework is the political angle, which frames ethics in AI either as determined by corporations and governments and imposed through policies or law (coming from the top), or as called for by the people (coming from the bottom). It also covers the top-down, bottom-up, and hybrid technicalities of how AI is developed within a moral construct and in consideration of its users, with expected and unexpected consequences and long-term impact in the world. There is a focus on reinforcement learning as an example of a bottom-up applied technical approach and on AI ethics principles as a practical top-down approach. This investigation includes real-world case studies to impart a global perspective, as well as philosophical debate on the ethics of AI and theoretical future thought experiments based on historical facts, current world circumstances, and possible ensuing realities.

Preprint · 2022 · arXiv

Moore's Law is dead, long live Moore's Law!

Moore's Law has been used by the semiconductor industry as a predictive indicator for the industry, and it has become a self-fulfilling prophecy. More people now agree that the original Moore's Law has started to falter. This paper proposes a possible quantitative modification to Moore's Law, which can also cover other laws derived from it. It aims to predict the roadmap of chip performance and energy consumption more accurately.

Preprint · 2022 · arXiv

RangL: A Reinforcement Learning Competition Platform

The RangL project hosted by The Alan Turing Institute aims to encourage the wider uptake of reinforcement learning by supporting competitions relating to real-world dynamic decision problems. This article describes the reusable code repository developed by the RangL team and deployed for the 2022 Pathways to Net Zero Challenge, supported by the UK Net Zero Technology Centre. The winning solutions to this particular Challenge seek to optimize the UK's energy transition policy to net zero carbon emissions by 2050. The RangL repository includes an OpenAI Gym reinforcement learning environment and code that supports both submission to, and evaluation in, a remote instance of the open source EvalAI platform as well as all winning learning agent strategies. The repository is an illustrative example of RangL's capability to provide a reusable structure for future challenges.

Preprint · 2022 · arXiv

Satoshi Nakamoto and the Origins of Bitcoin -- The Profile of a 1-in-a-Billion Genius

The mystery of the ingenious creator of Bitcoin hiding behind the pseudonym Satoshi Nakamoto has fascinated the global public for more than a decade. Suddenly emerging from the dark in 2008, this persona launched the decentralized electronic cash system "Bitcoin", which has reached a peak market capitalization in the region of 1 trillion USD. In a purposely agnostic and meticulous "leaving no stone unturned" approach, this study presents new hard facts that evidently slipped through Satoshi Nakamoto's elaborate privacy shield, and derives meaningful pointers that are primarily inferred from Bitcoin's whitepaper, its blockchain parameters, and data that were largely at his discretion. This ample stack of established and novel evidence is systematically categorized, analyzed, and then connected to its related real-world context, such as relevant locations and events, both in the past and at the time. Evidence compounds towards a substantial role of the Benelux cryptography ecosystem, with strong transatlantic links, in the creation of Bitcoin. A consistent biography, a psychogram, and a gripping story of an ingenious, multi-talented, autodidactic, reticent, and capricious polymath emerge, which are absolutely unique from a history of science and technology perspective. A cohort of previously proposed candidates and best matches emerging from the investigations is probed against an unprecedentedly restrictive, multi-stage exclusion filter, which can, with maximum certainty, rule out most "Satoshi Nakamoto" candidates, while some of them remain to be confirmed. With this article, you will be able to decide who is not, or is highly unlikely to be, Satoshi Nakamoto, be equipped with an ample stack of systematically categorized evidence and efficient methodologies to find suitable candidates, and can possibly unveil the real identity of the creator of Bitcoin, if you want.

Preprint · 2022 · arXiv

SIND: A Drone Dataset at Signalized Intersection in China

Intersections are among the most challenging scenarios for autonomous driving. Due to their complexity and stochasticity, essential applications at intersections (e.g., behavior modeling, motion prediction, and safety validation) rely heavily on data-driven techniques. There is therefore an intense demand for trajectory datasets of traffic participants (TPs) at intersections. Currently, most intersections in urban areas are equipped with traffic lights. However, there is not yet a large-scale, high-quality, publicly available trajectory dataset for signalized intersections. Therefore, in this paper, a typical two-phase signalized intersection in Tianjin, China, is selected, and a pipeline is designed to construct a Signalized INtersection Dataset (SIND), which contains 7 hours of recording including over 13,000 TPs of 7 types. Traffic light violations in SIND are then recorded, and SIND is compared with other similar works. The features of SIND can be summarized as follows: 1) SIND provides more comprehensive information, including traffic light states, motion parameters, a High Definition (HD) map, etc.; 2) the TP categories are diverse and characteristic, with vulnerable road users (VRUs) making up as much as 62.6% of TPs; 3) multiple traffic light violations by non-motor vehicles are shown. We believe that SIND will be an effective supplement to existing datasets and can promote related research on autonomous driving. The dataset is available online via: https://github.com/SOTIF-AVLab/SinD

Preprint · 2020 · arXiv

From the digital data revolution to digital health and digital economy toward a digital society: Pervasiveness of Artificial Intelligence

Technological progress has led to powerful computers and communication technologies that now penetrate all areas of science, industry and our private lives. As a consequence, all these areas generate digital traces of data, amounting to big data resources. This opens unprecedented opportunities but also poses challenges for the analysis, management, interpretation and utilization of these data. Fortunately, recent breakthroughs in deep learning algorithms now complement machine learning and statistics methods for an efficient analysis of such data. Furthermore, advances in text mining and natural language processing, e.g., word-embedding methods, also enable the processing of large amounts of text data from diverse sources such as governmental reports, blog entries in social media or clinical health records of patients. In this paper, we present a perspective on the role of artificial intelligence in these developments and also discuss potential problems we face in a digital society.

Preprint · 2021 · arXiv

Empirical Standards for Software Engineering Research

Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair.

Preprint · 2021 · arXiv

The Slodderwetenschap (Sloppy Science) of Stochastic Parrots -- A Plea for Science to NOT take the Route Advocated by Gebru and Bender

This article is a position paper written in reaction to the now-infamous paper titled "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" by Timnit Gebru, Emily Bender, and others who were, as of the date of this writing, still unnamed. I find the ethics of the Parrot Paper lacking, and in that lack, I worry about the direction in which computer science, machine learning, and artificial intelligence are heading. At best, I would describe the argumentation and evidentiary practices embodied in the Parrot Paper as Slodderwetenschap (Dutch for Sloppy Science), a word which the academic world last widely used in conjunction with the Diederik Stapel affair in psychology [2]. What is missing in the Parrot Paper are three critical elements: 1) acknowledgment that it is a position paper/advocacy piece rather than research, 2) explicit articulation of the critical presuppositions, and 3) explicit consideration of cost/benefit trade-offs rather than a mere recitation of potential "harms" as if benefits did not matter. To leave out these three elements is not good practice for either science or research.
