Topic overview

cs.CY

2620 works, 9246 researchers, 0 institutions

Topic snapshot

What this area looks like now

2620 works
9246 authors
0 experts visible
0 communities

Papers in this area

24 featured work(s)

preprint, 2016, arXiv

Superintelligence cannot be contained: Lessons from Computability Theory

Superintelligence is a hypothetical agent that possesses intelligence far surpassing that of the brightest and most gifted human minds. In light of recent advances in machine intelligence, a number of scientists, philosophers and technologists have revived the discussion about the potential catastrophic risks entailed by such an entity. In this article, we trace the origins and development of the neo-fear of superintelligence, and some of the major proposals for its containment. We argue that such containment is, in principle, impossible, due to fundamental limits inherent to computing itself. Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world, strict containment requires simulations of such a program, something theoretically (and practically) infeasible.

preprint, 2017, arXiv

Divergent discourse between protests and counter-protests: #BlackLivesMatter and #AllLivesMatter

Since the shooting of Black teenager Michael Brown by White police officer Darren Wilson in Ferguson, Missouri, the protest hashtag #BlackLivesMatter has amplified critiques of extrajudicial killings of Black Americans. In response to #BlackLivesMatter, other Twitter users have adopted #AllLivesMatter, a counter-protest hashtag whose content argues that equal attention should be given to all lives regardless of race. Through a multi-level analysis of over 860,000 tweets, we study how these protests and counter-protests diverge by quantifying aspects of their discourse. We find that #AllLivesMatter facilitates opposition between #BlackLivesMatter and hashtags such as #PoliceLivesMatter and #BlueLivesMatter in such a way that historically echoes the tension between Black protesters and law enforcement. In addition, we show that a significant portion of #AllLivesMatter use stems from hijacking by #BlackLivesMatter advocates. Beyond simply injecting #AllLivesMatter with #BlackLivesMatter content, these hijackers use the hashtag to directly confront the counter-protest notion of "All lives matter." Our findings suggest that the Black Lives Matter movement was able to grow, exhibit diverse conversations, and avoid derailment on social media by making discussion of counter-protest opinions a central topic of #AllLivesMatter, rather than the movement itself.

preprint, 2018, arXiv

Temporal Limits of Privacy in Human Behavior

Large-scale collection of human behavioral data by companies raises serious privacy concerns. We show that behavior captured in the form of application usage data collected from smartphones is highly unique even in very large datasets encompassing millions of individuals. This makes behavior-based re-identification of users across datasets possible. We study 12 months of data from 3.5 million users and show that four apps are enough to uniquely re-identify 91.2% of users using a simple strategy based on public information. Furthermore, we show that there is seasonal variability in uniqueness and that application usage fingerprints drift over time at an average constant rate.
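
The re-identification idea in this abstract can be illustrated with a toy uniqueness check. The sketch below is only an assumption-laden illustration, not the paper's method: the `usage` data, the function names, and the "best-case attacker" rule (a user counts as re-identifiable if any k of their apps appear together in no other user's app set) are all hypothetical.

```python
from itertools import combinations


def is_unique(user, usage, k):
    """Return True if some k-app subset of `user`'s apps is contained
    in no other user's app set (hypothetical best-case attacker rule)."""
    others = [apps for other, apps in usage.items() if other != user]
    for combo in combinations(sorted(usage[user]), k):
        chosen = set(combo)
        if not any(chosen <= apps for apps in others):
            return True
    return False


def unique_fraction(usage, k):
    """Fraction of users that can be singled out with k apps."""
    return sum(is_unique(user, usage, k) for user in usage) / len(usage)


# Toy data: user id -> set of apps used (invented for illustration).
usage = {
    "u1": {"mail", "maps", "chat"},
    "u2": {"mail", "maps", "bank"},
    "u3": {"mail", "chat", "game"},
    "u4": {"mail", "maps"},
}

for k in (1, 2):
    print(k, unique_fraction(usage, k))
```

On this toy data, more apps per fingerprint single out more users, which mirrors the qualitative claim of the abstract; the reported 91.2% figure comes from the paper's own data and strategy, not from this sketch.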

preprint, 2017, arXiv

Improving Assessment on MOOCs Through Peer Identification and Aligned Incentives

Massive Open Online Courses (MOOCs) use peer assessment to grade open-ended questions at scale, allowing students to provide feedback. Relative to teacher-based grading, peer assessment on MOOCs traditionally delivers lower-quality feedback and fewer learner interactions. We present the identified peer review (IPR) framework, which provides non-blind peer assessment and incentives driving high-quality feedback. We show that, compared to traditional peer assessment methods, IPR leads to significantly longer and more useful feedback as well as more discussion between peers.

preprint, 2018, arXiv

Facebook Use of Sensitive Data for Advertising in Europe

The upcoming European General Data Protection Regulation (GDPR) prohibits the processing and exploitation of some categories of personal data (health, political orientation, sexual preferences, religious beliefs, ethnic origin, etc.) due to the obvious privacy risks that may be derived from a malicious use of this type of information. These categories are referred to as sensitive personal data. Facebook was recently fined EUR 1.2M in Spain for collecting, storing and processing sensitive personal data for advertising purposes. This paper quantifies the portion of Facebook users in the European Union (EU) who are labeled with interests linked to sensitive personal data. The results of our study reveal that Facebook labels 73% of EU users with sensitive interests. This corresponds to 40% of the overall EU population. We also estimate that a malicious third party could unveil the identity of Facebook users that have been assigned a sensitive interest at a cost as low as EUR 0.015 per user. Finally, we propose and implement a web browser extension to inform Facebook users of the sensitive interests Facebook has assigned them.

preprint, 2018, arXiv

Open Data Analytical Model for Human Development Index Optimization to Support Government Policy

The transparency of Open Data helps citizens evaluate government performance. In Indonesia, each government body or ministry has its own standard operating procedure for handling data, which results in incoherent information between agencies and makes it likely that valuable insights are missed. Our motivation is therefore to show the advantage of the Open Data movement in supporting unified government decision-making. We use datasets from data.go.id, which publishes official data from each government body. The idea is that important patterns can be found even with these official but limited data. The case study concerns Human Development Index value prediction and its clustered nature. We explore the data patterns using two important data analytics methods: classification and clustering. Data analytics is the collection of activities that reveal unknown data patterns. Specifically, we use Artificial Neural Network classification and K-means clustering. The classification objective is to categorize the Human Development Index level of cities or regions in Indonesia based on indicator data for Gross Domestic Product, Number of Population in Poverty, Number of Internet Users, Number of Laborers, and Number of Population. We determine which of the four Human Development categories defined by the UNDP standard each city belongs to. The clustering objective is to find group characteristics relating the Human Development Index and Gross Domestic Product.

preprint, 2018, arXiv

Motivations, Classification and Model Trial of Conversational Agents for Insurance Companies

Advances in artificial intelligence have renewed interest in conversational agents. So-called chatbots have reached maturity for industrial applications. German insurance companies are interested in improving their customer service and digitizing their business processes. In this work we investigate the potential use of conversational agents in insurance companies by determining which classes of agents are of interest to insurance companies, finding relevant use cases and requirements, and developing a prototype for an exemplary insurance scenario. Based on this approach, we derive key findings for conversational agent implementation in insurance companies.

preprint, 2019, arXiv

Fair Regression for Health Care Spending

The distribution of health care payments to insurance plans has substantial consequences for social policy. Risk adjustment formulas predict spending in health insurance markets in order to provide fair benefits and health care coverage for all enrollees, regardless of their health status. Unfortunately, current risk adjustment formulas are known to underpredict spending for specific groups of enrollees leading to undercompensated payments to health insurers. This incentivizes insurers to design their plans such that individuals in undercompensated groups will be less likely to enroll, impacting access to health care for these groups. To improve risk adjustment formulas for undercompensated groups, we expand on concepts from the statistics, computer science, and health economics literature to develop new fair regression methods for continuous outcomes by building fairness considerations directly into the objective function. We additionally propose a novel measure of fairness while asserting that a suite of metrics is necessary in order to evaluate risk adjustment formulas more fully. Our data application using the IBM MarketScan Research Databases and simulation studies demonstrate that these new fair regression methods may lead to massive improvements in group fairness (e.g., 98%) with only small reductions in overall fit (e.g., 4%).
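
To make "building fairness considerations directly into the objective function" concrete, here is a minimal sketch of one possible penalized objective. It is an assumption for illustration, not necessarily the formulation used in the paper: the usual mean squared error is combined with a penalty on the mean residual of a designated group, so that systematic underprediction for that group raises the loss. The function name, the weight `lam`, and the toy numbers are all hypothetical.

```python
def fair_objective(y, pred, in_group, lam):
    """Mean squared error plus lam * (mean residual of the group)^2.

    y, pred  : lists of observed and predicted spending
    in_group : list of booleans marking the (hypothetical) undercompensated group
    lam      : non-negative penalty weight; lam = 0 gives plain MSE
    """
    residuals = [obs - est for obs, est in zip(y, pred)]
    mse = sum(r * r for r in residuals) / len(residuals)
    group_res = [r for r, flag in zip(residuals, in_group) if flag]
    group_mean = sum(group_res) / len(group_res)
    return mse + lam * group_mean ** 2


# Invented numbers: the last two enrollees are underpredicted by 5 each.
y = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 25.0, 35.0]
in_group = [False, False, True, True]

print(fair_objective(y, pred, in_group, lam=0.0))
print(fair_objective(y, pred, in_group, lam=2.0))
```

A fitting procedure would minimize this quantity over model parameters; larger `lam` trades overall fit for a smaller average shortfall in the group, which is the kind of trade-off the abstract reports.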

preprint, 2019, arXiv

A Quantitative Approach to Understanding Online Antisemitism

A new wave of growing antisemitism, driven by fringe Web communities, is an increasingly worrying presence in the socio-political realm. The ubiquitous and global nature of the Web has provided tools used by these groups to spread their ideology to the rest of the Internet. Although the study of antisemitism and hate is not new, the scale and rate of change of online data has impacted the efficacy of traditional approaches to measure and understand these troubling trends. In this paper, we present a large-scale, quantitative study of online antisemitism. We collect hundreds of millions of posts and images from alt-right Web communities like 4chan's Politically Incorrect board (/pol/) and Gab. Using scientifically grounded methods, we quantify the escalation and spread of antisemitic memes and rhetoric across the Web. We find the frequency of antisemitic content greatly increases (in some cases more than doubling) after major political events such as the 2016 US Presidential Election and the "Unite the Right" rally in Charlottesville. We extract semantic embeddings from our corpus of posts and demonstrate how automated techniques can discover and categorize the use of antisemitic terminology. We additionally examine the prevalence and spread of the antisemitic "Happy Merchant" meme, and in particular how these fringe communities influence its propagation to more mainstream communities like Twitter and Reddit. Taken together, our results provide a data-driven, quantitative framework for understanding online antisemitism. Our methods serve as a framework to augment current qualitative efforts by anti-hate groups, providing new insights into the growth and spread of hate online.

preprint, 2019, arXiv

Does Facebook Use Sensitive Data for Advertising Purposes? Worldwide Analysis and GDPR Impact

The recent European General Data Protection Regulation (GDPR) and other data protection regulations restrict the processing of some categories of personal data (health, political orientation, sexual preferences, religious beliefs, ethnic origin, etc.) due to the privacy risks associated with such information. The GDPR refers to these categories as sensitive personal data. This paper quantifies the portion of Facebook (FB) users, across 197 countries, who are labeled with advertising interests linked to potentially sensitive personal data. Our study reveals that Facebook labels 67% of users with potentially sensitive interests. This corresponds to 22% of the population in the referred 197 countries. Moreover, our work shows that GDPR enforcement had a negligible impact in this context, since the portion of FB users labeled with sensitive interests in the European Union remained almost the same 5 months before and 9 months after the GDPR was enacted. The paper also illustrates potential risks associated with the use of sensitive interests. For instance, we quantify the portion of FB users labeled with the interest "Homosexuality" in countries where being gay may be punished with the death penalty. The last contribution is the implementation of a web browser extension that allows FB users to easily remove the potentially sensitive interests FB has assigned them.

preprint, 2019, arXiv

Characterizing the Use of Images in State-Sponsored Information Warfare Operations by Russian Trolls on Twitter

State-sponsored organizations are increasingly linked to efforts aimed at exploiting social media for information warfare and manipulating public opinion. Typically, their activities rely on a number of social network accounts they control, aka trolls, that post and interact with other users disguised as "regular" users. These accounts often use images and memes, along with textual content, in order to increase the engagement and the credibility of their posts. In this paper, we present the first study of images shared by state-sponsored accounts by analyzing a ground truth dataset of 1.8M images posted to Twitter by accounts controlled by the Russian Internet Research Agency. First, we analyze the content of the images as well as their posting activity. Then, using Hawkes Processes, we quantify their influence on popular Web communities like Twitter, Reddit, 4chan's Politically Incorrect board (/pol/), and Gab, with respect to the dissemination of images. We find that the extensive image posting activity of Russian trolls coincides with real-world events (e.g., the Unite the Right rally in Charlottesville), and shed light on their targets as well as the content disseminated via images. Finally, we show that the trolls were more effective in disseminating politics-related imagery than other images.
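
For readers unfamiliar with Hawkes processes, the sketch below shows the conditional intensity of the simplest variant, a univariate process with an exponential kernel: each past event temporarily raises the rate of future events. This is a textbook form shown as an assumption for illustration; the paper's influence estimates across several communities would need a multivariate model, and the parameter values here are invented.

```python
import math


def hawkes_intensity(t, events, mu, alpha, beta):
    """Intensity at time t: mu + sum over past events s < t of
    alpha * exp(-beta * (t - s)).

    mu    : background rate
    alpha : jump in intensity caused by each event
    beta  : decay rate of that excitation
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - s)) for s in events if s < t
    )


# Invented example: posts at times 1.0 and 1.5, intensity queried at 2.0.
events = [1.0, 1.5]
print(hawkes_intensity(2.0, events, mu=0.5, alpha=1.0, beta=1.0))
```

In a multivariate setting, a separate excitation weight for each ordered pair of communities plays the role of `alpha`, and those weights are what an influence analysis would estimate.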

preprint, 2020, arXiv

Demographic Bias in Biometrics: A Survey on an Emerging Challenge

Systems incorporating biometric technologies have become ubiquitous in personal, commercial, and governmental identity management applications. Both cooperative (e.g. access control) and non-cooperative (e.g. surveillance and forensics) systems have benefited from biometrics. Such systems rely on the uniqueness of certain biological or behavioural characteristics of human beings, which enable individuals to be reliably recognised using automated algorithms. Recently, however, there has been a wave of public and academic concerns regarding the existence of systemic bias in automated decision systems (including biometrics). Most prominently, face recognition algorithms have often been labelled as "racist" or "biased" by the media, non-governmental organisations, and researchers alike. The main contributions of this article are: (1) an overview of the topic of algorithmic bias in the context of biometrics, (2) a comprehensive survey of the existing literature on biometric bias estimation and mitigation, (3) a discussion of the pertinent technical and social matters, and (4) an outline of the remaining challenges and future work items, both from technological and social points of view.

preprint, 2020, arXiv

COMPLEX-IT: A Case-Based Modeling and Scenario Simulation Platform for Social Inquiry

COMPLEX-IT is a case-based, mixed-methods platform for social inquiry into complex data/systems, designed to increase non-expert access to the tools of computational social science (i.e., cluster analysis, artificial intelligence, data visualization, data forecasting, and scenario simulation). In particular, COMPLEX-IT aids social inquiry through a heavy emphasis on learning about the complex data/system under study, which it does by (a) identifying and forecasting major and minor clusters/trends; (b) visualizing their complex causality; and (c) simulating scenarios for potential interventions. COMPLEX-IT is accessible through the web or can be run locally and is powered by R and the Shiny web framework.

preprint, 2020, arXiv

Social and Child Care Provision in Kinship Networks: an Agent-Based Model

Providing for the needs of the vulnerable is a critical component of social and health policy-making. In particular, caring for children and for vulnerable older people is vital to the wellbeing of millions of families throughout the world. In most developed countries, this care is provided through both formal and informal means, and is therefore governed by complex policies that interact in non-obvious ways with other areas of policy-making. In this paper we present an agent-based model of social and child care provision in the UK, in which agents can provide informal care or pay for private care for their relatives. Agents make care decisions based on numerous factors including their health status, employment, financial situation, and social and physical distance to those in need. Simulation results show that the model can produce plausible patterns of care need and availability, and therefore can provide an important aid to this complex area of policy-making. We conclude that the model's use of kinship networks for distributing care and the explicit modelling of interactions between social care and child care will enable policy-makers to develop more informed policy interventions in these critical areas.

preprint, 2020, arXiv

Demographic Bias: A Challenge for Fingervein Recognition Systems?

Recently, concerns regarding potential biases in the underlying algorithms of many automated systems (including biometrics) have been raised. In this context, a biased algorithm produces statistically different outcomes for different groups of individuals based on certain (often protected by anti-discrimination legislation) attributes such as sex and age. While several preliminary studies investigating this matter for facial recognition algorithms do exist, said topic has not yet been addressed for vascular biometric characteristics. Accordingly, in this paper, several popular types of recognition algorithms are benchmarked to ascertain the matter for fingervein recognition. The experimental evaluation suggests a lack of bias for the tested algorithms, although future work with larger datasets is needed to validate and confirm those preliminary results.

preprint, 2020, arXiv

Explainable AI as a Social Microscope: A Case Study on Academic Performance

Academic performance is perceived as a product of complex interactions between students' overall experience, personal characteristics and upbringing. Data science techniques, most commonly involving regression analysis and related approaches, serve as a viable means to explore this interplay. However, these tend to extract factors with wide-ranging impact, while overlooking variations specific to individual students. Focusing on each student's peculiarities is generally impossible with thousands or even hundreds of subjects, yet data mining methods might prove effective in devising more targeted approaches. For instance, subjects with shared characteristics can be assigned to clusters, which can then be examined separately with machine learning algorithms, thereby providing a more nuanced view of the factors affecting individuals in a particular group. In this context, we introduce a data science workflow allowing for fine-grained analysis of academic performance correlates that captures the subtle differences in students' sensitivities to these factors. Leveraging the Local Interpretable Model-Agnostic Explanations (LIME) algorithm from the toolbox of Explainable Artificial Intelligence (XAI) techniques, the proposed pipeline yields groups of students having similar academic attainment indicators, rather than similar features (e.g. familial background) as typically practiced in prior studies. As a proof-of-concept case study, a rich longitudinal dataset is selected to evaluate the effectiveness of the proposed approach versus a standard regression model.

preprint, 2020, arXiv

A Deep Learning Pipeline for Patient Diagnosis Prediction Using Electronic Health Records

The augmentation of disease diagnosis and decision-making in healthcare with machine learning algorithms has gained much impetus in recent years. In particular, in the current epidemiological situation caused by the COVID-19 pandemic, swift and accurate prediction of disease diagnosis with machine learning algorithms could facilitate the identification and care of vulnerable population clusters, such as those with multi-morbidity conditions. To build a useful disease diagnosis prediction system, advances in both data representation and machine learning architectures are imperative. First, with respect to data collection and representation, we face severe problems due to the multitude of formats and the lack of coherency prevalent in Electronic Health Records (EHRs). This hinders the extraction of the valuable information contained in EHRs. Currently, no universal global data standard has been established. As a useful solution, we develop and publish a Python package that transforms public health datasets into an easy-to-access universal format. This transformation to an international health data format makes it easy for researchers to combine EHR datasets with clinical datasets of diverse formats. Second, machine learning algorithms that predict multiple disease diagnosis categories simultaneously remain underdeveloped. We propose two novel model architectures in this regard: DeepObserver, which uses structured numerical data to predict the diagnosis categories, and ClinicalBERT_Multi, which incorporates the rich information available in clinical notes via natural language processing methods and also provides interpretable visualizations to medical practitioners. We show that both models can predict multiple diagnoses simultaneously with high accuracy.

preprint, 2020, arXiv

Exploiting the Solar Energy Surplus for Edge Computing

In the context of the global energy ecosystem transformation, we introduce a new approach to reduce the carbon emissions of the cloud-computing sector and, at the same time, foster the deployment of small-scale private photovoltaic plants. We consider the opportunity cost of moving some cloud services to private, distributed, solar-powered computing facilities. To this end, we compare the potential revenue of leasing computing resources to a cloud pool with the revenue obtained by selling the surplus energy to the grid. We first estimate the consumption of virtualized cloud computing instances, establishing a metric of computational efficiency per nominal photovoltaic power installed. Based on this metric and characterizing the site's annual solar production, we estimate the total return and payback. The results show that the model is economically viable and technically feasible. Finally, we outline the many questions that remain open, such as security, and the fundamental barriers to address, mainly related to a cloud model ruled by a few big players.

preprint, 2020, arXiv

The Threats of Artificial Intelligence Scale (TAI). Development, Measurement and Test Over Three Application Domains

In recent years Artificial Intelligence (AI) has gained much popularity, with the scientific community as well as with the public. AI is often ascribed many positive impacts for different social domains such as medicine and the economy. On the other side, there is also growing concern about its precarious impact on society and individuals. Several opinion polls frequently query the public fear of autonomous robots and artificial intelligence (FARAI), a phenomenon coming also into scholarly focus. As potential threat perceptions arguably vary with regard to the reach and consequences of AI functionalities and the domain of application, research still lacks necessary precision of a respective measurement that allows for wide-spread research applicability. We propose a fine-grained scale to measure threat perceptions of AI that accounts for four functional classes of AI systems and is applicable to various domains of AI applications. Using a standardized questionnaire in a survey study (N=891), we evaluate the scale over three distinct AI domains (loan origination, job recruitment and medical treatment). The data support the dimensional structure of the proposed Threats of AI (TAI) scale as well as the internal consistency and factorial validity of the indicators. Implications of the results and the empirical application of the scale are discussed in detail. Recommendations for further empirical use of the TAI scale are provided.

preprint, 2020, arXiv

Comparative Analysis of Economic Instruments in Intersection Operation: A User-Based Perspective

Focusing on different economic instruments implemented in intersection operations under a connected environment, this paper analyzes their advantages and disadvantages from the travelers' perspective. Travelers' concerns revolve around whether a new instrument is easy to learn and operate, whether it can save time or money, and whether it can reduce the rich-poor gap. After a comparative analysis, we found that both credit and free-market schemes can benefit users. Second-price auctions can only benefit high VOT vehicles. From the perspective of technology deployment and adoption, a credit scheme is not easy to learn and operate for travelers.
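
The second-price auction mentioned in this abstract follows a standard rule: the highest bidder wins and pays the second-highest bid. The sketch below applies that rule to vehicles bidding for right of way at an intersection. The vehicle names and bid amounts are invented, and how bids relate to value of time (VOT) in the paper is not modeled here.

```python
def second_price_auction(bids):
    """Return (winner, price) for a sealed-bid second-price auction.

    bids : dict mapping a bidder id to a non-negative bid.
    The highest bidder wins and pays the second-highest bid.
    Ties go to the bidder listed first in `bids`.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


# Invented bids from three vehicles waiting at an intersection.
bids = {"vehicle_a": 3.0, "vehicle_b": 5.0, "vehicle_c": 1.5}
print(second_price_auction(bids))
```

Because the winner is always the highest bidder, vehicles that can afford higher bids cross first, which is consistent with the abstract's finding that this scheme mainly benefits high-VOT vehicles.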

preprint, 2020, arXiv

The Robot Economy: Here It Comes

Automation is not a new phenomenon, and questions about its effects have long followed its advances. More than a half-century ago, US President Lyndon B. Johnson established a national commission to examine the impact of technology on the economy, declaring that automation "can be the ally of our prosperity if we will just look ahead". In this paper, our premise is that we are at a technological inflection point in which robots are developing the capacity to greatly increase their cognitive and physical capabilities, thus raising questions about labor dynamics. With increasing levels of autonomy and human-robot interaction, intelligent robots could soon accomplish new human-like capabilities such as engaging in social activities. Therefore, an increase in automation and autonomy raises the question of robots directly participating in some economic activities as autonomous agents. In this paper, a technological framework describing a robot economy is outlined and the challenges it might represent in the current socio-economic scenario are pondered.

preprint, 2020, arXiv

Traffic Prediction Framework for OpenStreetMap using Deep Learning based Complex Event Processing and Open Traffic Cameras

Displaying near-real-time traffic information is a useful feature of digital navigation maps. However, most commercial providers rely on privacy-compromising measures such as deriving location information from cellphones to estimate traffic. The lack of an open-source traffic estimation method using open data platforms is a bottleneck for building sophisticated navigation services on top of OpenStreetMap (OSM). We propose a deep learning-based Complex Event Processing (CEP) method that relies on publicly available video camera streams for traffic estimation. The proposed framework performs near-real-time object detection and object property extraction across camera clusters in parallel to derive multiple measures related to traffic, with the results visualized on OpenStreetMap. The estimation of object properties (e.g. vehicle speed, count, direction) provides multidimensional data that can be leveraged to create metrics and visualizations for congestion beyond commonly used density-based measures. Our approach couples both flow and count measures during interpolation by considering each vehicle as a sample point and its speed as the weight. We demonstrate multidimensional traffic metrics (e.g. flow rate, congestion estimation) over OSM by processing 22 traffic cameras from London streets. The system achieves a near-real-time performance of 1.42 seconds median latency and an average F-score of 0.80.

preprint, 2020, arXiv

Conservative AI and social inequality: Conceptualizing alternatives to bias through social theory

In response to calls for greater interdisciplinary involvement from the social sciences and humanities in the development, governance, and study of artificial intelligence systems, this paper presents one sociologist's view on the problem of algorithmic bias and the reproduction of societal bias. Discussions of bias in AI cover much of the same conceptual terrain that sociologists studying inequality have long understood using more specific terms and theories. Concerns over reproducing societal bias should be informed by an understanding of the ways that inequality is continually reproduced in society -- processes that AI systems are either complicit in, or can be designed to disrupt and counter. The contrast presented here is between conservative and radical approaches to AI, with conservatism referring to dominant tendencies that reproduce and strengthen the status quo, while radical approaches work to disrupt systemic forms of inequality. The limitations of conservative approaches to class, gender, and racial bias are discussed as specific examples, along with the social structures and processes that biases in these areas are linked to. Societal issues can no longer be out of scope for AI and machine learning, given the impact of these systems on human lives. This requires engagement with a growing body of critical AI scholarship that goes beyond biased data to analyze structured ways of perpetuating inequality, opening up the possibility for radical alternatives.

preprint, 2020, arXiv

[not Rp] Reproducibility of 'Poincare dodecahedral space parameter estimates'

Is a scientific research paper based on (i) public, online observational data files and (ii) providing free-licensed software for reproducing its results easy to reproduce by the same author a decade later? This paper attempts to reproduce a cosmic topology observational paper published in 2008 and satisfying both criteria (i) and (ii). The reproduction steps are defined formally in a free-licensed git repository package "0807.4260" and qualitatively in the current paper. It was found that the effort in upgrading the Fortran 77 code at the heart of the software, interfaced with a C front end, and originally compiled with g77, in the context of the contemporary gfortran compiler, risked being too great to be justified on any short time scale. In this sense, the results of RBG08 are not as reproducible as they appeared to be, despite both (i) data availability and (ii) free-licensing and public availability of the software. The software and a script to reproduce the steps of this incomplete reproduction are combined in a new git repository named 0807.4260, following the ArXiv identity code (arXiv:0807.4260) of RBG08.

People in this topic

12 visible researcher(s)