Source author record

Carlos Sarraute

Carlos Sarraute appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Social and Information Networks, Cryptography and Security, cs.CY, physics.soc-ph, Artificial Intelligence, Machine Learning, Neural and Evolutionary Computing, Applications, Databases

Catalog footprint

What is connected

35 works

9 topics

4 close collaborators


Preprint · 2020 · arXiv

A Comparative Study of Social Network Classifiers for Predicting Churn in the Telecommunication Industry

Relational learning in networked data has been shown to be effective in a number of studies. Relational learners, composed of relational classifiers and collective inference methods, enable inference about the nodes in a network given the existence and strength of their links to other nodes. These methods have been adapted to predict customer churn in telecommunication companies, showing that incorporating them may give more accurate predictions. In this research, the performance of a variety of relational learners is compared by applying them to a number of CDR datasets originating from the telecommunication industry, with the goal of ranking them as a whole and investigating the effects of relational classifiers and collective inference methods separately. Our results show that collective inference methods do not improve the performance of relational classifiers, and that the best-performing relational classifier is the network-only link-based classifier, which builds a logistic model using link-based measures for the nodes in the network.
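The network-only link-based classifier described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: each node's link-based features are assumed to be its total edge weight to known churners and to known non-churners, the logistic model is fitted with plain gradient descent, and collective inference is not shown. The toy graph, labels and all function names are hypothetical.

```python
import math

def build_graph(edges):
    """Undirected graph as {node: {neighbour: weight}}, every edge weight 1.0."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, {})[v] = 1.0
        graph.setdefault(v, {})[u] = 1.0
    return graph

def link_features(graph, labels, node):
    """Assumed link-based features: edge weight to known churners (label 1)
    and to known non-churners (label 0); unlabeled neighbours are ignored."""
    churn = sum(w for nbr, w in graph[node].items() if labels.get(nbr) == 1)
    stay = sum(w for nbr, w in graph[node].items() if labels.get(nbr) == 0)
    return [1.0, churn, stay]  # leading 1.0 acts as the intercept

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, targets, lr=0.1, epochs=2000):
    """Batch gradient descent on the average logistic log-loss."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, targets):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad[j] += err * xi
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def churn_probability(weights, features):
    return sigmoid(sum(w * xi for w, xi in zip(weights, features)))

# Hypothetical toy call graph: a, b, c churned; d, e, f stayed; u and v are unknown.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"),
         ("d", "f"), ("e", "f"), ("u", "a"), ("u", "b"), ("v", "e"), ("v", "f")]
labels = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0, "f": 0}
graph = build_graph(edges)
train = sorted(labels)
weights = fit_logistic([link_features(graph, labels, n) for n in train],
                       [labels[n] for n in train])
p_u = churn_probability(weights, link_features(graph, labels, "u"))
p_v = churn_probability(weights, link_features(graph, labels, "v"))
```

Node `u`, whose neighbours churned, receives a higher churn probability than `v`, whose neighbours stayed.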

Preprint · 2020 · arXiv

BatPay: a gas efficient protocol for the recurrent micropayment of ERC20 tokens

BatPay is a proxy scaling solution for the transfer of ERC20 tokens. It is suitable for micropayments in one-to-many and few-to-many scenarios, including digital markets and the distribution of rewards and dividends. In BatPay, many similar operations are bundled together into a single transaction in order to optimize gas consumption on the Ethereum blockchain. In addition, some costly verifications are replaced by a challenge game, pushing most of the computing cost off-chain. This reduces the gas cost of transfers by three orders of magnitude, achieving around 1700 transactions per second on the Ethereum blockchain. It also includes several relevant features, such as meta-transactions that let end users operate without holding ether, and key-locked payments for the atomic exchange of digital goods.
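The key-locked payments mentioned above rest on the general idea of a hash-lock. The sketch below models that idea in plain Python rather than as an Ethereum contract: funds are escrowed and released to the seller only when a key matching a previously published SHA-256 digest is revealed, and that revealed key is what lets the buyer decrypt the goods. The class, its methods and the refund rule are hypothetical and do not reproduce BatPay's contract interface.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class KeyLockedPayment:
    """Toy escrowed payment, released only when the seller reveals a key
    whose SHA-256 digest equals `lock` (a hash-lock)."""
    buyer: str
    seller: str
    amount: int
    lock: bytes          # sha256(key), published before the payment
    settled: bool = False

    def unlock(self, key: bytes, balances: dict) -> bool:
        """Pay the seller if `key` opens the lock. The key is now public,
        so the buyer can use it to decrypt the purchased goods."""
        if self.settled or hashlib.sha256(key).digest() != self.lock:
            return False
        balances[self.seller] = balances.get(self.seller, 0) + self.amount
        self.settled = True
        return True

    def refund(self, balances: dict) -> bool:
        """Return escrowed funds to the buyer if the key was never revealed
        (a real contract would only allow this after a timeout)."""
        if self.settled:
            return False
        balances[self.buyer] = balances.get(self.buyer, 0) + self.amount
        self.settled = True
        return True

def open_payment(balances, buyer, seller, amount, lock):
    """Move `amount` from the buyer's balance into escrow."""
    if balances.get(buyer, 0) < amount:
        raise ValueError("insufficient balance")
    balances[buyer] -= amount
    return KeyLockedPayment(buyer, seller, amount, lock)

# Hypothetical usage: the seller committed to a content key in advance.
balances = {"buyer": 100}
key = b"content-decryption-key"
payment = open_payment(balances, "buyer", "seller", 30, hashlib.sha256(key).digest())
```

Either the seller reveals the key and is paid, or the buyer is refunded; in neither case does one party end up with both the funds and the goods.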

Preprint · 2020 · arXiv

Computing Accessibility Metrics for Argentina

We present a tool to calculate distances and travel times between a set of origins and a set of destinations, using different modes of transport in Argentina. The input data for the tool is a set of destinations (a geo-referenced list of city amenities or "opportunities", such as firms, schools, hospitals, parks, banks or retail) and a set of origins characterized by their geographic coordinates, which could represent households or other locations. For each origin, the tool determines the closest destination, by distance or travel time, for a given mode of transport (on foot, by bicycle, by car, or by public transport). Both sets are large and can contain up to several thousand points. We applied and developed algorithms to improve the scalability of the different parts of the procedure. For the public transportation network, we pre-processed the reachable lines from each point and used quad-trees to determine the distance between those points and each bus line's path. A second objective of this project was to rely only on open data, such as OpenStreetMap (OSM) data, and to make the tool open source. The tool is therefore potentially beneficial to public sector agencies as well as to NGOs and other civil society organizations that focus their work on the design and implementation of public policies aimed at improving accessibility in cities as a way to reduce spatial inequalities and social exclusion.
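The core query of the tool, finding the closest destination to each origin, can be illustrated with straight-line (great-circle) distances. This is a simplified sketch that assumes haversine distance and a brute-force scan; the tool itself also handles travel times, several transport modes and quad-tree indexing, none of which is shown here. Function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def closest_destination(origins, destinations):
    """For each origin, return (index of nearest destination, distance in km).
    Brute force: O(len(origins) * len(destinations)) distance evaluations."""
    result = []
    for olat, olon in origins:
        best = min(range(len(destinations)),
                   key=lambda i: haversine_km(olat, olon, *destinations[i]))
        result.append((best, haversine_km(olat, olon, *destinations[best])))
    return result

# Hypothetical points on the equator: one degree of longitude is about 111.19 km.
origins = [(0.0, 0.0)]
destinations = [(0.0, 1.0), (0.0, 2.0)]
results = closest_destination(origins, destinations)
```

With several thousand points on each side, the brute-force scan is the part a spatial index such as a quad-tree would replace.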

Preprint · 2020 · arXiv

Credit Scoring for Good: Enhancing Financial Inclusion with Smartphone-Based Microlending

Globally, two billion people and more than half of the poorest adults do not use formal financial services. Consequently, there is increased emphasis on developing financial technology that can facilitate access to financial products for the unbanked. In this regard, smartphone-based microlending has emerged as a potential solution to enhance financial inclusion. We propose a methodology to improve the predictive performance of credit scoring models used by these applications. Our approach is composed of several steps, where we mostly focus on engineering appropriate features from the user data. In particular, we construct pseudo-social networks to identify similar people and combine complex network analysis with representation learning. Subsequently, we build credit scoring models using advanced machine learning techniques with the goal of obtaining the most accurate credit scores, while also taking into consideration ethical and privacy regulations to avoid unfair discrimination. A successful deployment of our proposed methodology could improve the performance of microlending smartphone applications and help enhance financial wellbeing worldwide.

Preprint · 2020 · arXiv

Detecting Areas of Potential High Prevalence of Chagas in Argentina

A map of potential prevalence of Chagas disease (ChD) with high spatial disaggregation is presented. It aims to detect areas outside the Gran Chaco ecoregion (hyperendemic for ChD) characterized by high affinity with ChD and high health vulnerability. To quantify potential prevalence, we developed several indicators, starting with an Affinity Index that quantifies the degree of linkage between endemic areas of ChD and the rest of the country. We also studied favorable habitability conditions for Triatoma infestans, looking for areas where the predominant materials of floors, roofs and internal ceilings favor the presence of the disease vector. We studied determinants of a more general nature that can be encompassed under the concept of a Health Vulnerability Index. These determinants are associated with access to health providers and the socio-economic level of different segments of the population. Finally, we constructed a Chagas Potential Prevalence Index (ChPPI), which combines the affinity index, the health vulnerability index, and the population density. We show and discuss the maps obtained. These maps are intended to assist public health specialists, public health policy makers and public officials in developing cost-effective strategies to improve access to diagnosis and treatment of ChD.

Preprint · 2020 · arXiv

Fair and Decentralized Exchange of Digital Goods

We construct a privacy-preserving, distributed and decentralized marketplace where parties can exchange data for tokens. In this market, buyers and sellers make transactions on a blockchain and interact with a third party, called a notary, who has the ability to vouch for the authenticity and integrity of the data. We introduce a protocol for the data-token exchange in which neither party gains more information than what it is paying for, and the exchange is fair: either both parties get the other's item or neither does. No third-party involvement is required after setup, and no dispute resolution is needed.

Preprint · 2020 · arXiv

Segregated interactions in urban and online space

Urban income segregation is a widespread phenomenon that challenges societies across the globe. Classical studies on segregation have largely focused on the geographic distribution of residential neighborhoods rather than on patterns of social behaviors and interactions. In this study, we analyze segregation in economic and social interactions by observing credit card transactions and Twitter mentions among thousands of individuals in three culturally different metropolitan areas. We show that segregated interaction is amplified relative to the expected effects of geographic segregation in terms of both purchase activity and online communication. Furthermore, we find that segregation increases with difference in socio-economic status but is asymmetric for purchase activity, i.e., the amount of interaction from poorer to wealthier neighborhoods is larger than vice versa. Our results provide novel insights into the understanding of behavioral segregation in human interactions with significant socio-political and economic implications.
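One common way to quantify segregated interaction is to compare observed interaction counts between socio-economic groups with the counts expected under random mixing. The sketch below uses a simple null model that preserves each group's total outgoing and incoming activity. It is an illustrative assumption and, unlike the study, does not account for the expected effect of geographic segregation. Group names and counts are hypothetical.

```python
def mixing_ratio(interactions, classes):
    """Observed / expected interaction counts for every (source, target) pair
    of groups. Expected counts assume random mixing that preserves each group's
    total outgoing (row) and incoming (column) activity."""
    total = sum(interactions[a][b] for a in classes for b in classes)
    out_total = {a: sum(interactions[a][b] for b in classes) for a in classes}
    in_total = {b: sum(interactions[a][b] for a in classes) for b in classes}
    return {
        (a, b): interactions[a][b] / (out_total[a] * in_total[b] / total)
        for a in classes
        for b in classes
    }

# Hypothetical counts of interactions (e.g. purchases) from row group to column group.
classes = ["poor", "rich"]
interactions = {"poor": {"poor": 30, "rich": 20},
                "rich": {"poor": 10, "rich": 40}}
ratios = mixing_ratio(interactions, classes)
```

A diagonal ratio above 1 means a group interacts with itself more than random mixing predicts. In this toy example the poorer-to-wealthier ratio is also larger than the wealthier-to-poorer one, which is the kind of asymmetry the abstract describes.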

Preprint · 2020 · arXiv

Snel: SQL Native Execution for LLVM

Snel is a relational database engine featuring Just-In-Time (JIT) compilation of queries and columnar data representation. Snel is designed for fast online analytics by leveraging the LLVM compiler infrastructure. It also provides custom extensions to the SQL language, such as methods for resolving histograms. "Snel" stands for "SQL Native Execution for LLVM". Unlike traditional database engines, it does not provide a client-server interface. Instead, it exposes its interface as an extension to SQLite, for simple interactive use from the command line and for embedding in applications. Since Snel tables are read-only, it does not provide features like transactions or updates. This allows queries to be very fast, since they avoid the overhead of table locking and of ensuring consistency. At its core, Snel is simply a dynamic library that can be used by client applications; its SQLite extension provides seamless integration with a traditional SQL environment.

Preprint · 2020 · arXiv

Social Network Analytics for Churn Prediction in Telco: Model Building, Evaluation and Network Architecture

Social network analytics methods are being used in the telecommunication industry to predict customer churn with great success. In particular, it has been shown that relational learners adapted to this specific problem enhance the performance of predictive models. In the current study, we benchmark different strategies for constructing a relational learner by applying them to a total of eight distinct call-detail record datasets, originating from telecommunication organizations across the world. We statistically evaluate the effect of relational classifiers and collective inference methods on the predictive power of relational learners, as well as the performance of models where relational learners are combined with traditional methods of predicting customer churn in the telecommunication industry. Finally, we investigate the effect of network construction on model performance; our findings imply that the definition of edges and weights in the network does have an impact on the results of the predictive models. The best configuration found in the study is a non-relational learner enriched with network variables, without collective inference, using binary weights and undirected networks. In addition, we provide guidelines on how to apply social network analytics for churn prediction in the telecommunication industry in an optimal way, ranging from network architecture to model building and evaluation.
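The network-construction choices compared in the study (directed versus undirected edges, binary versus call-count weights) can be sketched as a single function over call-detail records. The record format, a list of (caller, callee) pairs, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def build_network(cdrs, directed=False, binary=True):
    """Build a call graph from (caller, callee) records.
    directed=False merges A->B and B->A into a single edge;
    binary=True gives every edge weight 1, otherwise weight = number of calls."""
    weights = defaultdict(int)
    for caller, callee in cdrs:
        if caller == callee:
            continue  # ignore self-calls
        key = (caller, callee) if directed else tuple(sorted((caller, callee)))
        weights[key] += 1
    if binary:
        return {edge: 1 for edge in weights}
    return dict(weights)

# Hypothetical call-detail records.
cdrs = [("a", "b"), ("b", "a"), ("a", "b"), ("a", "c")]
undirected_binary = build_network(cdrs)  # the configuration the study found best
directed_weighted = build_network(cdrs, directed=True, binary=False)
```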

Preprint · 2020 · arXiv

The Value of Big Data for Credit Scoring: Enhancing Financial Inclusion using Mobile Phone Data and Social Network Analytics

Credit scoring is without a doubt one of the oldest applications of analytics. In recent years, a multitude of sophisticated classification techniques have been developed to improve the statistical performance of credit scoring models. Instead of focusing on the techniques themselves, this paper leverages alternative data sources to enhance both statistical and economic model performance. The study demonstrates how including call networks, in the context of positive credit information, as a new Big Data source adds value in terms of profit, by applying a profit measure and profit-based feature selection. A unique combination of datasets, including call-detail records and the credit and debit account information of customers, is used to create scorecards for credit card applicants. Call-detail records are used to build call networks, and advanced social network analytics techniques are applied to propagate influence from prior defaulters throughout the network to produce influence scores. The results show that combining call-detail records with traditional data in credit scoring models significantly increases their performance when measured in AUC. In terms of profit, the best model is the one built with only calling behavior features. In addition, the calling behavior features are the most predictive in other models, both in terms of statistical and economic performance. The results have implications for the ethical use of call-detail records, regulation, financial inclusion, and data sharing and privacy.
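The abstract does not name the propagation technique. Personalized PageRank, which repeatedly restarts a random walk at the known defaulters, is one common way to turn a call network into influence scores, and it is assumed in the sketch below. The graph, seed set and parameters (damping factor 0.85, 100 iterations) are hypothetical.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iters=100):
    """Influence scores via PageRank with restarts to `seeds` (prior defaulters).
    graph: {node: {neighbour: weight}}, assumed undirected (symmetric)."""
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            if out_weight[n] == 0:
                # dangling node: send its mass back to the seeds
                for s in seeds:
                    new[s] += alpha * score[n] / len(seeds)
                continue
            for nbr, w in graph[n].items():
                new[nbr] += alpha * score[n] * w / out_weight[n]
        score = new
    return score

# Hypothetical path-shaped call network d - a - b - c, where d defaulted.
graph = {"d": {"a": 1.0}, "a": {"d": 1.0, "b": 1.0},
         "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
scores = personalized_pagerank(graph, {"d"})
```

Scores decay with network distance from the defaulter and always sum to 1.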

Preprint · 2020 · arXiv

Wibson Protocol for Secure Data Exchange and Batch Payments

Wibson is a blockchain-based, decentralized data marketplace that provides individuals a way to securely and anonymously sell information in a trusted environment. The combination of the Wibson token and blockchain-enabled smart contracts is intended to allow Data Sellers and Data Buyers to transact with each other directly, while giving individuals the ability to maintain anonymity as desired. The Wibson marketplace will provide infrastructure and financial incentives for individuals to securely sell personal information without sacrificing personal privacy. Data Buyers receive information from willing and actively participating individuals, with the benefit of knowing that the personal information should be accurate and current. We present here two components that work together to achieve an efficient decentralized marketplace. The first is a smart contract called Data Exchange, which stores references to the Data Orders that Buyers open to signal to the market that they are interested in buying certain types of data, and which provides secure mechanisms to perform the transactions. The second, called Batch Payments, processes payments from Buyers to Sellers and intermediaries.

Preprint · 2020 · arXiv

WibsonTree: Efficiently Preserving Seller's Privacy in a Decentralized Data Marketplace

We present a cryptographic primitive called WibsonTree, designed to preserve users' privacy by allowing them to demonstrate predicates on their personal attributes without revealing the values of those attributes. We assume three types of agents (buyers, sellers and notaries) that interact in a decentralized privacy-preserving data marketplace (dPDM) such as the Wibson marketplace. We introduce the WibsonTree protocol as an efficient cryptographic primitive that enables the exchange of private information while preserving the seller's privacy. Using our primitive, a data seller can efficiently prove that he/she belongs to the target audience of a buyer's data request, without revealing any additional information.

Preprint · 2018 · arXiv

A Bayesian Approach to Income Inference in a Communication Network

The explosion of mobile phone communications in recent years has coincided with an exponential increase in data processing power. Together, these two global changes have opened the way to using mobile phone communications to generate inferences about, and characterizations of, mobile phone users. In this work, we use the communication network, enriched by a set of users' attributes, to gain a better understanding of the demographic features of a population. Namely, we use call detail records and banking information to infer the income of each person in the graph.
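As an illustration of Bayesian inference on a communication graph (the paper's actual model may differ), the sketch below gives an unlabeled user a posterior over income classes from the classes of labeled neighbours. It treats neighbours as conditionally independent given the user's class, which is a naive Bayes assumption. Class names, the toy graph and the smoothing constant are hypothetical.

```python
import math
from collections import Counter

def neighbours(edges):
    """Adjacency sets of an undirected graph."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

def fit(edges, labels, classes, smoothing=1.0):
    """Estimate log P(class) and log P(neighbour class | own class) from edges
    whose endpoints both have a known class (with Laplace smoothing)."""
    prior = Counter(labels.values())
    pair = Counter()
    for u, v in edges:
        if u in labels and v in labels:
            pair[(labels[u], labels[v])] += 1
            pair[(labels[v], labels[u])] += 1
    k = len(classes)
    log_prior = {c: math.log((prior[c] + smoothing) / (len(labels) + smoothing * k))
                 for c in classes}
    log_compat = {}
    for c in classes:
        total = sum(pair[(c, d)] for d in classes)
        for d in classes:
            log_compat[(c, d)] = math.log((pair[(c, d)] + smoothing) / (total + smoothing * k))
    return log_prior, log_compat

def posterior(node, nbrs, labels, log_prior, log_compat, classes):
    """P(class of node | classes of its labeled neighbours)."""
    scores = {c: log_prior[c] + sum(log_compat[(c, labels[n])]
                                    for n in nbrs.get(node, ()) if n in labels)
              for c in classes}
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}  # numerically stable
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

# Hypothetical graph: two homophilous income groups and an unlabeled user x.
classes = ["low", "high"]
edges = [("h1", "h2"), ("h2", "h3"), ("h1", "h3"), ("l1", "l2"), ("l2", "l3"),
         ("l1", "l3"), ("h3", "l1"), ("x", "h1"), ("x", "h2")]
labels = {"h1": "high", "h2": "high", "h3": "high", "l1": "low", "l2": "low", "l3": "low"}
log_prior, log_compat = fit(edges, labels, classes)
post = posterior("x", neighbours(edges), labels, log_prior, log_compat, classes)
```

Here `x` talks only to high-income users, so the posterior favours "high" (49/53 with these counts).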

Preprint · 2018 · arXiv

Brief survey of Mobility Analyses based on Mobile Phone Datasets

This is a brief survey of the research performed by Grandata Labs in collaboration with numerous academic groups around the world on the topic of human mobility. A driving theme in these projects is to use and improve Data Science techniques to understand mobility, as it can be observed through the lens of mobile phone datasets. We describe applications of mobility analyses for urban planning, prediction of data traffic usage, building delay tolerant networks, generating epidemiologic risk maps and measuring the predictability of human mobility.

Preprint · 2018 · arXiv

Comparison of Feature Extraction Methods and Predictors for Income Inference

Patterns of mobile phone communications, coupled with the information of the social network graph and financial behavior, allow us to make inferences about users' socio-economic attributes, such as their income level. We present here several methods to extract features from mobile phone usage (calls and messages), and compare different combinations of supervised machine learning techniques and sets of features used as input for the inference of users' income. Our experimental results show that the Bayesian method based on the communication graph outperforms standard machine learning algorithms using node-based features.

Preprint · 2016 · arXiv

Socioeconomic correlations and stratification in social-communication networks

The uneven distribution of wealth and individual economic capacities are among the main forces that shape modern societies and arguably bias the emerging social structures. However, the study of correlations between the social network and the economic status of individuals is difficult due to the lack of large-scale multimodal data disclosing both the social ties and the economic indicators of the same population. Here, we close this gap through the analysis of coupled datasets recording the mobile phone communications and bank transaction history of one million anonymised individuals living in a Latin American country. We show that wealth and debt are unevenly distributed among people in agreement with the Pareto principle; that the observed social structure is strongly stratified, with people being better connected to others of their own socioeconomic class than to others of different classes; that the social network shows assortative socioeconomic correlations and tightly connected "rich clubs"; and that egos from the same class live closer to each other but commute further if they are wealthier. These results are based on a representative, society-wide population, and empirically demonstrate some long-standing hypotheses on socioeconomic correlations that potentially lie behind social segregation and induce differences in human mobility.
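Assortative socioeconomic correlations can be measured with the standard assortativity coefficient for a scalar node attribute: the Pearson correlation between the attribute values at the two ends of each edge. The sketch below assumes socioeconomic class is encoded as one number per node; the data are hypothetical.

```python
def attribute_assortativity(edges, value):
    """Pearson correlation of a scalar node attribute across the two ends of
    each undirected edge (every edge is counted in both directions)."""
    xs, ys = [], []
    for u, v in edges:
        xs += [value[u], value[v]]
        ys += [value[v], value[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share a mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var

# Hypothetical socioeconomic classes (1 = poorest, 5 = richest).
value = {"a": 1, "b": 1, "c": 5, "d": 5}
stratified = attribute_assortativity([("a", "b"), ("c", "d")], value)
mixed = attribute_assortativity([("a", "c"), ("b", "d")], value)
```

A value near +1 means people mostly connect within their own class, and a value near -1 means ties mostly cross classes.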

Preprint · 2015 · arXiv

A Study of Age and Gender seen through Mobile Phone Usage Patterns in Mexico

Mobile phone usage provides a wealth of information, which can be used to better understand the demographic structure of a population. In this paper we focus on the population of Mexican mobile phone users. Our first contribution is an observational study of mobile phone usage according to gender and age groups. We were able to detect significant differences in phone usage among different subgroups of the population. Our second contribution is to provide a novel methodology to predict demographic features (namely age and gender) of unlabeled users by leveraging individual calling patterns, as well as the structure of the communication graph. We provide details of the methodology and show experimental results on a real world dataset that involves millions of users.

Preprint · 2015 · arXiv

Harnessing Mobile Phone Social Network Topology to Infer Users Demographic Attributes

We study the structure of the social graph of mobile phone users in Mexico, with a focus on demographic attributes of the users (more specifically, the users' age). We examine assortativity patterns in the graph and observe a strong age homophily in communication preferences. We propose a graph-based algorithm for predicting the age of mobile phone users. The algorithm exploits the topology of the mobile phone network, together with a subset of users whose ages are known (seeds), to infer the age of the remaining users. We provide the details of the methodology and show experimental results on a network GT with more than 70 million users. By carefully examining the topological relations of the seeds to the rest of the nodes in GT, we find topological metrics that have a direct influence on the performance of the algorithm. In particular, we characterize subsets of users for which the accuracy of the algorithm is 62% when predicting between 4 age categories (whereas a pure random guess would yield an accuracy of 25%). We also show that we can use the probabilistic information computed by the algorithm to further increase its inference power to 72% on a significant subset of users.
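The abstract does not detail the algorithm, so the sketch below uses a generic stand-in: iterative label propagation in which seed users keep their known age category and every other user repeatedly takes the average category distribution of its neighbours. A confidence threshold then keeps only users whose top probability is high, mirroring the idea of using probabilistic output to trade coverage for accuracy. Graph, seeds and threshold are hypothetical.

```python
def propagate(neighbours, seeds, n_classes, iters=30):
    """Label propagation: seeds keep a one-hot distribution over age
    categories; every other node repeatedly takes the mean of its
    neighbours' distributions (all nodes updated from the previous round)."""
    dist = {}
    for node in neighbours:
        if node in seeds:
            dist[node] = [1.0 if k == seeds[node] else 0.0 for k in range(n_classes)]
        else:
            dist[node] = [1.0 / n_classes] * n_classes  # uninformative start
    for _ in range(iters):
        new = {}
        for node, nbrs in neighbours.items():
            if node in seeds or not nbrs:
                new[node] = dist[node]
            else:
                new[node] = [sum(dist[n][k] for n in nbrs) / len(nbrs)
                             for k in range(n_classes)]
        dist = new
    return dist

def predict_classes(dist, min_confidence=0.0):
    """Most probable category per node, keeping only nodes whose top
    probability reaches min_confidence."""
    out = {}
    for node, probs in dist.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= min_confidence:
            out[node] = best
    return out

# Hypothetical graph with 4 age categories (0..3) and two seeds.
neighbours = {"s0": ["x", "z"], "x": ["s0", "y"], "y": ["x", "s3"],
              "s3": ["y"], "z": ["s0"]}
seeds = {"s0": 0, "s3": 3}
dist = propagate(neighbours, seeds, n_classes=4)
all_predictions = predict_classes(dist)
confident = predict_classes(dist, min_confidence=0.9)
```

Node `x` sits between two seeds of different categories and ends up with a split distribution, so the threshold drops it, while `z`, which only touches seed `s0`, is kept.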

Preprint · 2013 · arXiv

An Oblivious Password Cracking Server

Building a password cracking server that preserves the privacy of the queries made to it is a problem that has not yet been solved. Such a server could acquire practical relevance in the future: for instance, the tables used to crack the passwords could be calculated, stored and hosted in cloud-computing services, and could be queried from devices with limited computing power. In this paper we present a method to preserve the confidentiality of a password cracker (in which the tables used to crack the passwords are stored by a third party) by combining Hellman tables and Private Information Retrieval (PIR) protocols. We provide the technical details of this method, analyze its complexity, and show the experimental results obtained with our implementation.
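A Hellman time-memory trade-off table, one of the two ingredients named above, can be sketched on a toy key space. Chains are built by alternating a hash and a reduction function, and only each chain's start and end points are stored. A lookup walks forward from the target digest until it reaches a stored end point, then regenerates that chain. The key space, SHA-256 as the one-way function and the reduction function are assumptions for illustration, and the PIR layer that hides queries from the server is not shown.

```python
import hashlib
import random

KEY_SPACE = 2 ** 16  # toy password space: integers 0..65535

def H(key: int) -> bytes:
    """One-way function whose preimages we want to recover (toy: SHA-256)."""
    return hashlib.sha256(key.to_bytes(4, "big")).digest()

def R(digest: bytes) -> int:
    """Reduction function mapping a digest back into the key space."""
    return int.from_bytes(digest[:4], "big") % KEY_SPACE

def f(key: int) -> int:
    return R(H(key))

def build_table(m, t, seed=0):
    """m chains of t steps each; only (end point -> start point) is stored."""
    rng = random.Random(seed)
    table = {}
    for _ in range(m):
        start = rng.randrange(KEY_SPACE)
        x = start
        for _ in range(t):
            x = f(x)
        table[x] = start
    return table

def lookup(table, target, t):
    """Search for a key k with H(k) == target using the stored chains."""
    z = R(target)
    for _ in range(t):
        if z in table:
            x = table[z]  # regenerate the candidate chain from its start
            for _ in range(t):
                if H(x) == target:
                    return x
                x = f(x)
            # false alarm (another chain merged into this one): keep walking
        z = f(z)
    return None

table = build_table(m=200, t=50)
end_point, start_point = next(iter(table.items()))
target = H(start_point)  # a digest whose preimage lies on a stored chain
found = lookup(table, target, t=50)
```

Only `m` pairs are stored instead of all `m * t` keys, at the price of up to `t` hash evaluations per lookup step.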

Preprint · 2013 · arXiv

Aplicación de las Redes Neuronales al Reconocimiento de Sistemas Operativos (Application of Neural Networks to Operating System Recognition)

In this work we present a family of neural networks, the multi-layer perceptron networks, and some of the algorithms used to train them (with, we hope, enough detail and precision to satisfy a mathematical audience). We then study how to use these networks to solve a problem from the field of information security: the remote identification of operating systems, which is part of the information-gathering steps of the penetration testing methodology. The contribution of this work is an application of classic Artificial Intelligence techniques to this classification problem, which gave better results than the classic techniques previously used to solve it.

Preprint · 2013 · arXiv

Attack Planning in the Real World

Assessing network security is a complex and difficult task. Attack graphs have been proposed as a tool to help network administrators understand the potential weaknesses of their networks. However, one problem has not yet been addressed by previous work on this subject: how to actually execute and validate the attack paths resulting from the analysis of the attack graph. In this paper we present a complete PDDL representation of an attack model, and an implementation that integrates a planner into a penetration testing tool. This makes it possible to automatically generate attack paths for penetration testing scenarios, and to validate these attacks by executing the corresponding actions (including exploits) against the real target network. We present an algorithm for transforming the information present in the penetration testing tool into the planning domain, and show how the scalability issues of attack graphs can be solved using current planners. We include an analysis of the performance of our solution, showing how our model scales to medium-sized networks and to the number of actions available in current penetration testing tools.
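The paper encodes attacks in PDDL and hands them to a planner. As a simplified stand-in, the sketch below represents actions in a STRIPS-like way in Python (preconditions and added facts) and finds a shortest attack path by breadth-first search. The facts, the action names and the monotonic assumption (actions never remove facts) are hypothetical.

```python
from collections import deque
from typing import FrozenSet, NamedTuple

class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]  # facts that must hold before the action
    add: FrozenSet[str]  # facts the action makes true

def plan(initial, goal, actions):
    """Breadth-first search for a shortest action sequence reaching `goal`.
    Monotonic model: actions only add facts (access is never lost)."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for act in actions:
            if act.pre <= state:
                nxt = state | act.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [act.name]))
    return None  # goal unreachable with these actions

# Hypothetical scenario: compromise a web server, then pivot to a database host.
actions = [
    Action("scan_web", frozenset({"access:attacker"}), frozenset({"knows:web:apache"})),
    Action("phish", frozenset({"access:attacker"}), frozenset({"creds:user"})),
    Action("exploit_web", frozenset({"access:attacker", "knows:web:apache"}),
           frozenset({"shell:web"})),
    Action("exploit_db", frozenset({"shell:web"}), frozenset({"shell:db"})),
]
attack_path = plan({"access:attacker"}, {"shell:db"}, actions)
```

The irrelevant `phish` action is left out of the returned path because breadth-first search returns a shortest plan.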

Preprint · 2013 · arXiv

Automated Attack Planning

Penetration Testing is a methodology for assessing network security, by generating and executing possible attacks. Doing so automatically allows for regular and systematic testing. A key question then is how to automatically generate the attacks. A natural way to address this issue is as an attack planning problem. In this thesis, we are concerned with the specific context of regular automated pentesting, and use the term "attack planning" in that sense. The following three research directions are investigated. First, we introduce a conceptual model of computer network attacks, based on an analysis of the penetration testing practices. We study how this attack model can be represented in the PDDL language. Then we describe an implementation that integrates a classical planner with a penetration testing tool. This allows us to automatically generate attack paths for real world pentesting scenarios, and to validate these attacks by executing them. Secondly, we present efficient probabilistic planning algorithms, specifically designed for this problem, that achieve industrial-scale runtime performance (able to solve scenarios with several hundred hosts and exploits). These algorithms take into account the probability of success of the actions and their expected cost (for example in terms of execution time, or network traffic generated). Finally, we take a different direction: instead of trying to improve the efficiency of the solutions developed, we focus on improving the model of the attacker. We model the attack planning problem in terms of partially observable Markov decision processes (POMDP). This grounds penetration testing in a well-researched formalism. POMDPs allow the modelling of information gathering as an integral part of the problem, thus providing for the first time a means to intelligently mix scanning actions with actual exploits.
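When independent attacks on a target are tried one after another until the first success, a classic result states that the expected cost is minimized by ordering them by decreasing ratio of success probability to cost. The sketch below illustrates that rule and checks it by brute force. It assumes independent actions with known probabilities and costs (hypothetical values) and is not the thesis's full algorithm.

```python
from itertools import permutations

def expected_cost(order):
    """Expected cost of trying (probability, cost) actions in sequence,
    stopping at the first success."""
    total, p_all_failed = 0.0, 1.0
    for p, c in order:
        total += p_all_failed * c  # c is paid only if every earlier attempt failed
        p_all_failed *= 1 - p
    return total

def best_order(actions):
    """Sort independent attempts by decreasing success-probability / cost ratio."""
    return sorted(actions, key=lambda pc: pc[0] / pc[1], reverse=True)

def brute_force_best(actions):
    """Exhaustive check over all orderings (only feasible for a few actions)."""
    return min(permutations(actions), key=expected_cost)

# Hypothetical exploits as (probability of success, cost in seconds).
actions = [(0.2, 1.0), (0.9, 5.0), (0.5, 2.0)]
ordered = best_order(actions)
```

Here the ratio rule tries `(0.5, 2.0)` first, for an expected cost of 4.5, which matches the exhaustive search.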

Preprint · 2013 · arXiv

Evolution of Communities with Focus on Stability

Community detection is an important tool for analyzing the social graph of mobile phone users. The problem of finding communities in static graphs has been widely studied. However, since mobile social networks evolve over time, static graph algorithms are not sufficient. To be useful in practice (e.g. when used by a telecom analyst), the stability of the partitions becomes critical. We tackle this particular use case in this paper: tracking evolution of communities in dynamic scenarios with focus on stability. We propose two modifications to a widely used static community detection algorithm: we introduce fixed nodes and preferential attachment to pre-existing communities. We then describe experiments to study the stability and quality of the resulting partitions on real-world social networks, represented by monthly call graphs for millions of subscribers.
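To make "stability" concrete, the sketch below matches each community in one month's partition to its most similar community in the next month by Jaccard overlap, then averages the scores. This metric and the function names are illustrative assumptions, not the measure defined in the paper.

```python
def jaccard(a, b):
    """Overlap of two member sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a | b else 1.0

def match_communities(previous, current):
    """Map each community id at month t to its most similar community at
    month t+1; returns {prev_id: (curr_id, jaccard_score)}."""
    matches = {}
    for pid, members in previous.items():
        cid = max(current, key=lambda c: jaccard(members, current[c]))
        matches[pid] = (cid, jaccard(members, current[cid]))
    return matches

def stability(previous, current):
    """Average best-match score: 1.0 means every community survived intact."""
    matches = match_communities(previous, current)
    return sum(score for _, score in matches.values()) / len(matches)

# Hypothetical partitions of subscribers in two consecutive monthly call graphs.
month_1 = {"A": {1, 2, 3}, "B": {4, 5, 6}}
month_2 = {"X": {1, 2, 3}, "Y": {4, 5, 7}}
matches = match_communities(month_1, month_2)
```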

Preprint · 2013 · arXiv

Evolution of Communities with Focus on Stability (extended abstract)

The detection of communities is an important tool used to analyze the social graph of mobile phone users. Within each community, customers are likely to attract new customers, retain existing ones and/or accept new products or services through the leverage of mutual influences. Communities of users are smaller units, easier to grasp, and allow, for example, role analysis based on the centrality of an actor within his community. The problem of finding communities in static graphs has been widely studied. However, from the point of view of a telecom analyst, to be really useful, the detected communities must evolve as the social graph of communications changes over time, for example in order to perform marketing actions on communities and track the results of those actions over time. Additionally, the behaviors of communities of users over time can be used to predict future activity that interests telecom operators, such as subscriber churn or handset adoption. Similarly, group evolution can provide insights for designing strategies, such as the early warning of group churn. Stability is a crucial issue: the analysis performed on a given community will be lost if the analyst cannot keep track of this community in the following time steps. This is the particular use case that we tackle in this paper: tracking the evolution of communities in dynamic scenarios with a focus on stability. We propose two modifications to a widely used static community detection algorithm. We then describe experiments to study the stability and quality of the resulting partitions on real-world social networks, represented by monthly call graphs for millions of subscribers.

Preprint · 2013 · arXiv

Human Mobility and Predictability enriched by Social Phenomena Information

The massive amounts of geolocation data collected from mobile phone records have sparked an ongoing effort to understand and predict the mobility patterns of human beings. In this work, we study the extent to which social phenomena are reflected in mobile phone data, focusing in particular on the cases of urban commute and major sports events. We illustrate how these events are reflected in the data, and show how information about the events can be used to improve predictability in a simple model for a mobile phone user's location.
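A first-order Markov chain is one example of a simple next-location model. The sketch below is an assumption rather than the paper's model. It conditions the prediction on an event flag attached to each observation, showing how context such as a match day can change the predicted next location. Locations, flags and the trajectory are hypothetical.

```python
from collections import Counter, defaultdict

def fit_markov(trajectory, with_context=True):
    """Count transitions between consecutive observations.
    trajectory: list of (location, context) pairs in time order.
    The state is (location, context) if with_context, otherwise just location."""
    transitions = defaultdict(Counter)
    for (loc, ctx), (nxt, _) in zip(trajectory, trajectory[1:]):
        state = (loc, ctx) if with_context else loc
        transitions[state][nxt] += 1
    return transitions

def predict_next(transitions, state):
    """Most frequent next location seen after `state`, or None if unseen."""
    counts = transitions.get(state)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical user: commutes to work on normal days, goes to the stadium on match days.
trajectory = [("home", "normal"), ("work", "normal"), ("home", "normal"),
              ("work", "normal"), ("home", "match"), ("stadium", "match"),
              ("home", "normal"), ("work", "normal"), ("home", "match"),
              ("stadium", "match")]
with_events = fit_markov(trajectory)
without_events = fit_markov(trajectory, with_context=False)
```

Without the event flag the model always predicts "work" after "home"; with it, match days correctly lead to "stadium".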

Preprint · 2013 · arXiv

Human Mobility and Predictability enriched by Social Phenomena Information (extended abstract)

The information collected by mobile phone operators can be considered the most detailed information on human mobility across a large part of the population. The study of the dynamics of human mobility using the collected geolocations of users, and its application to predicting users' future locations, has been an active field of research in recent years. In this work, we study the extent to which social phenomena are reflected in mobile phone data, focusing in particular on the cases of urban commute and major sports events. We illustrate how these events are reflected in the data, and show how information about the events can be used to improve predictability in a simple model for a mobile phone user's location.

Preprint · 2013 · arXiv

Les POMDP font de meilleurs hackers: Tenir compte de l'incertitude dans les tests de pénétration (POMDPs Make Better Hackers: Accounting for Uncertainty in Penetration Testing)

Penetration Testing is a methodology for assessing network security by generating and executing possible hacking attacks. Doing so automatically allows for regular and systematic testing. A key question is how to generate the attacks. This is naturally formulated as planning under uncertainty, i.e., under incomplete knowledge about the network configuration. Previous work uses classical planning, and requires costly pre-processes that reduce this uncertainty by extensive application of scanning methods. By contrast, we herein model the attack planning problem in terms of partially observable Markov decision processes (POMDP). This makes it possible to reason about the knowledge available, and to intelligently employ scanning actions as part of the attack. As one would expect, this accurate solution does not scale. We devise a method that relies on POMDPs to find good attacks on individual machines, which are then composed into an attack on the network as a whole. This decomposition exploits network structure to the extent possible, making targeted approximations (only) where needed. Evaluating this method on a suitably adapted industrial test suite, we demonstrate its effectiveness in both runtime and solution quality.

Preprint · 2013 · arXiv

Penetration Testing == POMDP Solving?

Penetration Testing is a methodology for assessing network security by generating and executing possible attacks. Doing so automatically allows for regular and systematic testing without a prohibitive amount of human labor. A key question then is how to generate the attacks. This is naturally formulated as a planning problem. Previous work (Lucangeli et al. 2010) used classical planning and hence ignored all the incomplete knowledge that characterizes hacking. More recent work (Sarraute et al. 2011) makes strong independence assumptions for the sake of scaling, and lacks a clear formal concept of what the attack planning problem actually is. Herein, we model that problem in terms of partially observable Markov decision processes (POMDP). This grounds penetration testing in a well-researched formalism, highlighting important aspects of this problem's nature. POMDPs make it possible to model information gathering as an integral part of the problem, thus providing for the first time a means to intelligently mix scanning actions with actual exploits.
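The belief update at the heart of a POMDP can be sketched generically. After an action and an observation, the probability of each hidden state is reweighted by how likely that observation is in that state. In the hypothetical example below, the hidden state is the target's operating system, scanning does not change it, and the scan reports the correct OS 80% of the time. These numbers and names are assumptions, not values from the paper.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """POMDP Bayes filter:
    b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    new = {}
    for s2 in belief:
        predicted = sum(transition(s, action, s2) * p for s, p in belief.items())
        new[s2] = observation_model(s2, action, observation) * predicted
    z = sum(new.values())
    if z == 0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / z for s, p in new.items()}

# Hypothetical model: the hidden state is the target's operating system.
def transition(s, action, s2):
    # scanning does not change the target's operating system
    return 1.0 if s == s2 else 0.0

def observation_model(s2, action, observation):
    # assumed: the OS scan reports the true OS with probability 0.8
    return 0.8 if observation == s2 else 0.2

prior = {"linux": 0.5, "windows": 0.5}
after_one = belief_update(prior, "os_scan", "linux", transition, observation_model)
after_two = belief_update(after_one, "os_scan", "linux", transition, observation_model)
```

Two consistent "linux" readings raise the belief from 0.5 to 0.8 and then to 16/17. This is the kind of quantified knowledge a POMDP planner weighs when deciding whether to scan again or launch an exploit.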

preprint · 2010 · arXiv

Advanced Software Protection Now

Software digital rights management is a pressing need for the software development industry that remains unmet, as no practical solution has been acclaimed as successful by the industry. We introduce a novel software-protection method, fully implemented with today's technologies, that provides traitor tracing and license enforcement and requires neither additional hardware nor interconnectivity. Our work benefits from the use of secure triggers, a cryptographic primitive that is secure assuming the existence of an IND-CPA secure block cipher. Using our framework, developers may insert license checks and fingerprints, and obfuscate the code using secure triggers. As a result, this raises the cost for software analysis tools to detect and modify the protection mechanisms, thus raising the complexity of cracking the system.
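The idea behind a secure trigger (a hidden payload that can only be decrypted when the correct triggering input shows up) can be sketched with the Python standard library. This is an illustration in the spirit of the primitive, not the paper's construction: the key is derived by hashing a random salt with the trigger input, and HMAC-SHA256 in counter mode is swapped in for the IND-CPA block cipher the abstract assumes. The function names and the license string are hypothetical.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode, standing in for the block cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _derive_key(salt: bytes, trigger_input: bytes) -> bytes:
    return hashlib.sha256(salt + trigger_input).digest()


def make_trigger(trigger_input: bytes, payload: bytes) -> dict:
    """Hide `payload` so that it can only be recovered with `trigger_input`."""
    salt = os.urandom(16)
    key = _derive_key(salt, trigger_input)
    ciphertext = bytes(a ^ b for a, b in zip(payload, _keystream(key, len(payload))))
    # The check tag lets the program recognize the right input without storing it.
    check = hmac.new(key, b"trigger-check", hashlib.sha256).digest()
    return {"salt": salt, "check": check, "ciphertext": ciphertext}


def fire_trigger(trigger: dict, candidate: bytes):
    """Return the hidden payload if `candidate` is the right input, else None."""
    key = _derive_key(trigger["salt"], candidate)
    check = hmac.new(key, b"trigger-check", hashlib.sha256).digest()
    if not hmac.compare_digest(check, trigger["check"]):
        return None
    ct = trigger["ciphertext"]
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, len(ct))))


if __name__ == "__main__":
    t = make_trigger(b"LICENSE-1234", b"check_license()")
    print(fire_trigger(t, b"LICENSE-1234"))  # b'check_license()'
    print(fire_trigger(t, b"wrong-key"))     # None
```

An analyst who only sees the stored salt, tag and ciphertext cannot read the payload without the triggering input; note that a low-entropy input could still be guessed by brute force.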

preprint · 2010 · arXiv

An attack on MySQL's login protocol

The MySQL challenge-and-response authentication protocol is proved insecure. We show how an eavesdropper can impersonate a valid user after witnessing only a few executions of this protocol. The algorithm of the underlying attack is presented. Finally, we comment on implementations and statistical results.
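The general shape of such an eavesdropping attack (each observed challenge/response pair eliminates candidate secrets until one is left) can be shown on a deliberately weak toy protocol. This is not MySQL's scramble function: the 16-bit secret and the `toy_response` function, which leaks a different byte of the secret depending on the challenge, are hypothetical and chosen only so that the narrowing is easy to follow.

```python
MASK16 = 0xFFFF


def rotl16(x: int, r: int) -> int:
    """Rotate a 16-bit value left by r bits."""
    r %= 16
    return ((x << r) | (x >> (16 - r))) & MASK16


def toy_response(secret: int, challenge: int) -> int:
    """Hypothetical weak scheme: leaks 8 bits of the rotated secret."""
    return rotl16(secret, challenge % 16) & 0xFF


def consistent_secrets(transcripts):
    """Brute-force every 16-bit secret that explains all observed pairs."""
    return [
        s for s in range(1 << 16)
        if all(toy_response(s, c) == r for c, r in transcripts)
    ]


if __name__ == "__main__":
    secret = 0xBEEF
    # The eavesdropper records two legitimate logins.
    seen = [(c, toy_response(secret, c)) for c in (0x1230, 0x4568)]
    candidates = consistent_secrets(seen)
    print([hex(s) for s in candidates])  # ['0xbeef']
    # With the secret recovered, the attacker can answer any new challenge.
    print(toy_response(candidates[0], 0x7777) == toy_response(secret, 0x7777))  # True
```

One transcript leaves 256 candidates in this toy; the second pins the secret down, after which the attacker answers fresh challenges exactly like the legitimate user.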

preprint · 2010 · arXiv

Building Computer Network Attacks

In this work we take a first step toward a new perspective on cyberwarfare scenarios by introducing conceptual tools (a formal model) to evaluate the costs of an attack and to describe the theater of operations, targets, missions, actions, plans and assets involved in cyberwarfare attacks. We also describe two applications of this model: autonomous planning, leading to automated penetration tests, and attack simulation, which allows a system administrator to evaluate the vulnerabilities of their network.
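Autonomous planning over such a model can be sketched by treating each action as a set of required assets, a set of provided assets and a cost, and running uniform-cost search over the attacker's asset sets until the goal asset is obtained. The action names, assets and costs below are hypothetical, and uniform-cost search is one simple planner choice, not necessarily the one used in the paper.

```python
import heapq
from typing import FrozenSet, NamedTuple


class Action(NamedTuple):
    name: str
    requires: FrozenSet[str]  # assets the attacker must already hold
    provides: FrozenSet[str]  # assets gained when the action succeeds
    cost: float


def plan(initial, goal, actions):
    """Uniform-cost search over sets of attacker assets."""
    start = frozenset(initial)
    frontier = [(0.0, 0, start, [])]  # (cost, tiebreak, assets, steps)
    best = {start: 0.0}
    tie = 0
    while frontier:
        cost, _, assets, steps = heapq.heappop(frontier)
        if goal in assets:
            return steps, cost
        if cost > best.get(assets, float("inf")):
            continue  # stale entry
        for a in actions:
            if a.requires <= assets and not a.provides <= assets:
                nxt = assets | a.provides
                ncost = cost + a.cost
                if ncost < best.get(nxt, float("inf")):
                    best[nxt] = ncost
                    tie += 1
                    heapq.heappush(frontier, (ncost, tie, nxt, steps + [a.name]))
    return None, float("inf")


# Hypothetical scenario: scanning first makes the web exploit cheaper.
ACTIONS = [
    Action("scan_dmz", frozenset({"internet_access"}), frozenset({"knows_web_os"}), 1.0),
    Action("exploit_web", frozenset({"internet_access", "knows_web_os"}), frozenset({"agent_on_web"}), 5.0),
    Action("blind_exploit_web", frozenset({"internet_access"}), frozenset({"agent_on_web"}), 9.0),
    Action("exploit_db", frozenset({"agent_on_web"}), frozenset({"agent_on_db"}), 4.0),
    Action("steal_records", frozenset({"agent_on_db"}), frozenset({"customer_records"}), 1.0),
]

if __name__ == "__main__":
    print(plan({"internet_access"}, "customer_records", ACTIONS))
    # (['scan_dmz', 'exploit_web', 'exploit_db', 'steal_records'], 11.0)
```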

preprint · 2010 · arXiv

Outrepasser les limites des techniques classiques de Prise d'Empreintes grace aux Reseaux de Neurones (Overcoming the Limits of Classical Fingerprinting Techniques with Neural Networks)

We present an application of Artificial Intelligence techniques to the field of Information Security. The problem of remote Operating System (OS) Detection, also called OS Fingerprinting, is a crucial step of the penetration testing process, since the attacker (hacker or security professional) needs to know the OS of the target host in order to choose which exploits to use. OS Detection is accomplished by passively sniffing network packets and actively sending test packets to the target host, to study specific variations in the host responses that reveal information about its operating system. The first fingerprinting implementations were based on the analysis of differences between TCP/IP stack implementations. The next generation focused the analysis on application layer data such as the DCE RPC endpoint information. Even though more information was analyzed, some variation of the "best fit" algorithm was still used to interpret this new information. Our new approach involves an analysis of the composition of the information collected during the OS identification process to identify key elements and their relations. To implement this approach, we have developed tools using Neural Networks and techniques from the field of Statistics. These tools have been successfully integrated into a commercial software product (Core Impact).
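For contrast with the neural approach, a classical "best fit" matcher can be sketched in a few lines: compare the observed test results with each stored signature and rank the signatures by the fraction of tests that match. The signature database and the test names (`ttl`, `df`, `win_size`, `ts_opt`) are hypothetical illustrations, not entries from Nmap or Core Impact.

```python
# Hypothetical signature database: expected result of each test per OS.
SIGNATURES = {
    "Windows XP": {"ttl": 128, "df": True, "win_size": 65535, "ts_opt": False},
    "Linux 2.6": {"ttl": 64, "df": True, "win_size": 5840, "ts_opt": True},
    "OpenBSD 4": {"ttl": 64, "df": True, "win_size": 16384, "ts_opt": True},
}


def best_fit(observed):
    """Rank signatures by the fraction of shared tests whose results match."""
    ranking = []
    for name, sig in SIGNATURES.items():
        shared = [t for t in sig if t in observed]
        matches = sum(1 for t in shared if sig[t] == observed[t])
        ranking.append((name, matches / len(shared) if shared else 0.0))
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking


if __name__ == "__main__":
    obs = {"ttl": 64, "df": True, "win_size": 5840, "ts_opt": True}
    print(best_fit(obs)[0])  # ('Linux 2.6', 1.0)
```

Every test counts equally here, and nothing models how test results relate to each other; that is the kind of limitation the neural-network approach targets.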

preprint · 2010 · arXiv

Simulating Cyber-Attacks for Fun and Profit

We introduce a new simulation platform called Insight, created to design and simulate cyber-attacks against large arbitrary target scenarios. Insight has surprisingly low hardware and configuration requirements, while making the simulation a realistic experience from the attacker's standpoint. The scenarios include a crowd of simulated actors: network devices, hardware devices, software applications, protocols, users, etc. A novel characteristic of this tool is to simulate vulnerabilities (including 0-days) and exploits, allowing an attacker to compromise machines and use them as pivots to continue the attack. A user can test and modify complex scenarios, with several interconnected networks, where the attacker has no initial connectivity with the objective of the attack. We give a concise description of this new technology and its possible uses in the security research field, such as pentesting training, study of the impact of 0-day vulnerabilities, evaluation of security countermeasures, and risk assessment.
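Pivoting toward an objective the attacker cannot reach directly can be sketched as a breadth-first search in which every newly compromised host becomes a launch point. This is a toy model, not Insight's engine: the host names, the connectivity map and the set of simulated vulnerabilities (one labeled as a 0-day) are hypothetical, and every exploit against a vulnerable host is assumed to succeed.

```python
from collections import deque

# Hypothetical scenario: which hosts can open connections to which.
CONNECTIVITY = {
    "attacker": ["dmz-web"],
    "dmz-web": ["app-server"],
    "app-server": ["db-server", "backup"],
    "db-server": [],
    "backup": [],
}
# Hypothetical simulated vulnerabilities; a 0-day is just another entry.
VULNERABLE = {
    "dmz-web": "simulated-rce",
    "app-server": "simulated-0day",
    "db-server": "weak-password",
}


def simulate_attack(start, objective):
    """Breadth-first pivoting: each compromised host becomes a new launch point."""
    compromised = {start}
    parent = {start: None}
    queue = deque([start])
    while queue:
        host = queue.popleft()
        if host == objective:
            path = []
            while host is not None:
                path.append(host)
                host = parent[host]
            return path[::-1]
        for target in CONNECTIVITY.get(host, []):
            if target not in compromised and target in VULNERABLE:
                compromised.add(target)
                parent[target] = host
                queue.append(target)
    return None  # objective unreachable with the simulated vulnerabilities


if __name__ == "__main__":
    print(simulate_attack("attacker", "db-server"))
    # ['attacker', 'dmz-web', 'app-server', 'db-server']
    print(simulate_attack("attacker", "backup"))  # None
```

Removing the simulated 0-day from `VULNERABLE` cuts the path to the database, which is the kind of impact study the abstract mentions.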

preprint · 2010 · arXiv

Simulation of Computer Network Attacks

In this work we present a prototype for simulating computer network attacks. Our objective is to simulate large networks (thousands of hosts, with applications and vulnerabilities) while remaining realistic from the attacker's point of view. The foundation for the simulator is a model of computer intrusions, based on the analysis of real world attacks. In particular we show how to interpret vulnerabilities and exploits as communication channels. This conceptual model gives a tool to describe the theater of operations, targets, actions and assets involved in multistep network attacks. We conclude with applications of the attack simulator.
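One way to read "exploits as communication channels" is that an existing channel (the ability to send commands to some host) is consumed by an exploit to yield a new channel on the target. The sketch below models exactly that composition; the `Channel` and `Exploit` types, the routes, services and privilege names are hypothetical and only illustrate the idea, not the paper's formal model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """Hypothetical model: the attacker can send commands to `host` with `privilege`."""
    host: str
    privilege: str


@dataclass(frozen=True)
class Exploit:
    name: str
    target_service: str  # service the exploit attacks
    grants: str          # privilege of the channel it opens


# Hypothetical scenario data.
SERVICES = {"web01": {"http"}, "db01": {"mysql"}}
ROUTES = {"attacker-box": {"web01"}, "web01": {"db01"}}


def launch(exploit, via, target):
    """Relay an exploit through an existing channel, yielding a channel on `target`."""
    if target not in ROUTES.get(via.host, set()):
        raise ValueError(f"{via.host} cannot reach {target}")
    if exploit.target_service not in SERVICES.get(target, set()):
        raise ValueError(f"{target} does not run {exploit.target_service}")
    return Channel(target, exploit.grants)


if __name__ == "__main__":
    local = Channel("attacker-box", "user")
    web = launch(Exploit("http-overflow", "http", "user"), local, "web01")
    db = launch(Exploit("mysql-auth-bypass", "mysql", "admin"), web, "db01")
    print(db)  # Channel(host='db01', privilege='admin')
```

Multistep attacks then become chains of `launch` calls, and a failed precondition (no route, wrong service) surfaces as an error at the step where the chain breaks.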

preprint · 2010 · arXiv

Using Neural Networks to improve classical Operating System Fingerprinting techniques

We present remote Operating System detection as an inference problem: given a set of observations (the target host responses to a set of tests), we want to infer the OS type which most probably generated these observations. Classical techniques used to perform this analysis present several limitations. To improve the analysis, we have developed tools using neural networks and statistical techniques. We present two working modules: one that uses DCE-RPC endpoints to distinguish Windows versions, and another that uses Nmap signatures to distinguish different versions of Windows, Linux, Solaris, OpenBSD, FreeBSD and NetBSD systems. We explain the details of the topology and inner workings of the neural networks used, and the fine-tuning of their parameters. Finally, we show positive experimental results.
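The inference view (output the OS that most probably produced the observed responses) can be illustrated with the simplest possible network: a single-layer softmax classifier trained by batch gradient descent. This is a minimal sketch, not the topology described in the paper; the binary features, the three classes and the six training examples are hypothetical.

```python
import math

CLASSES = ["Windows", "Linux", "OpenBSD"]
# Hypothetical binary features: [ttl_is_128, ttl_is_64, large_window, timestamp_opt, bias]
TRAIN = [
    ([1, 0, 1, 0, 1], 0),
    ([1, 0, 1, 1, 1], 0),
    ([0, 1, 0, 1, 1], 1),
    ([0, 1, 1, 1, 1], 1),
    ([0, 1, 1, 0, 1], 2),
    ([0, 1, 0, 0, 1], 2),
]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def probs(weights, x):
    """Class probabilities P(OS | observations) from one linear layer + softmax."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def train(data, n_classes, lr=0.5, epochs=1000):
    """Batch gradient descent on the mean cross-entropy loss."""
    n_features = len(data[0][0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        for x, y in data:
            p = probs(weights, x)
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)
                for j in range(n_features):
                    grad[k][j] += err * x[j] / len(data)
        for k in range(n_classes):
            for j in range(n_features):
                weights[k][j] -= lr * grad[k][j]
    return weights


def predict(weights, x):
    """Return the most probable OS and its probability."""
    p = probs(weights, x)
    best = max(range(len(p)), key=p.__getitem__)
    return CLASSES[best], p[best]


if __name__ == "__main__":
    w = train(TRAIN, len(CLASSES))
    print(predict(w, [0, 1, 0, 1, 1]))
```

Unlike best-fit matching, the output is a probability for each OS, so a weak match shows up as low confidence rather than as a silent wrong answer.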

35 published items

A Comparative Study of Social Network Classifiers for Predicting Churn in the Telecommunication Industry

BatPay: a gas efficient protocol for the recurrent micropayment of ERC20 tokens

Computing Accessibility Metrics for Argentina

Credit Scoring for Good: Enhancing Financial Inclusion with Smartphone-Based Microlending

Detecting Areas of Potential High Prevalence of Chagas in Argentina

Fair and Decentralized Exchange of Digital Goods

Segregated interactions in urban and online space

Snel: SQL Native Execution for LLVM

Social Network Analytics for Churn Prediction in Telco: Model Building, Evaluation and Network Architecture

The Value of Big Data for Credit Scoring: Enhancing Financial Inclusion using Mobile Phone Data and Social Network Analytics

Wibson Protocol for Secure Data Exchange and Batch Payments

WibsonTree: Efficiently Preserving Seller's Privacy in a Decentralized Data Marketplace

A Bayesian Approach to Income Inference in a Communication Network

Brief survey of Mobility Analyses based on Mobile Phone Datasets

Comparison of Feature Extraction Methods and Predictors for Income Inference

Socioeconomic correlations and stratification in social-communication networks

A Study of Age and Gender seen through Mobile Phone Usage Patterns in Mexico

Harnessing Mobile Phone Social Network Topology to Infer Users Demographic Attributes

An Oblivious Password Cracking Server

Aplicacion de las Redes Neuronales al Reconocimiento de Sistemas Operativos (Application of Neural Networks to Operating System Recognition)

Attack Planning in the Real World

Automated Attack Planning

Evolution of Communities with Focus on Stability

Evolution of Communities with Focus on Stability (extended abstract)

Human Mobility and Predictability enriched by Social Phenomena Information

Human Mobility and Predictability enriched by Social Phenomena Information (extended abstract)

Les POMDP font de meilleurs hackers: Tenir compte de l'incertitude dans les tests de penetration (POMDPs Make Better Hackers: Accounting for Uncertainty in Penetration Testing)

Penetration Testing == POMDP Solving?

Advanced Software Protection Now

An attack on MySQL's login protocol

Building Computer Network Attacks

Outrepasser les limites des techniques classiques de Prise d'Empreintes grace aux Reseaux de Neurones (Overcoming the Limits of Classical Fingerprinting Techniques with Neural Networks)

Simulating Cyber-Attacks for Fun and Profit

Simulation of Computer Network Attacks

Using Neural Networks to improve classical Operating System Fingerprinting techniques