Researcher profile

Carlos Sarraute

Carlos Sarraute contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
15works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

15 published item(s)

preprint2020arXiv

A Comparative Study of Social Network Classifiers for Predicting Churn in the Telecommunication Industry

Relational learning in networked data has been shown to be effective in a number of studies. Relational learners, composed of relational classifiers and collective inference methods, enable the inference of nodes in a network given the existence and strength of links to other nodes. These methods have been adapted to predict customer churn in telecommunication companies showing that incorporating them may give more accurate predictions. In this research, the performance of a variety of relational learners is compared by applying them to a number of CDR datasets originating from the telecommunication industry, with the goal to rank them as a whole and investigate the effects of relational classifiers and collective inference methods separately. Our results show that collective inference methods do not improve the performance of relational classifiers and the best performing relational classifier is the network-only link-based classifier, which builds a logistic model using link-based measures for the nodes in the network.

preprint2020arXiv

BatPay: a gas efficient protocol for the recurrent micropayment of ERC20 tokens

BatPay is a proxy scaling solution for the transfer of ERC20 tokens. It is suitable for micropayments in one-to-many and few-to-many scenarios, including digital markets and the distribution of rewards and dividends. In BatPay, many similar operations are bundled together into a single transaction in order to optimize gas consumption on the Ethereum blockchain. In addition, some costly verifications are replaced by a challenge game, pushing most of the computing cost off-chain. This results in a gas reduction of the transfer costs of three orders of magnitude, achieving around 1700 transactions per second on the Ethereum blockchain. Furthermore, it includes many relevant features, like meta-transactions for end-user operation without ether, and key-locked payments for atomic exchange of digital goods.

preprint2020arXiv

Computing Accessibility Metrics for Argentina

We present a tool to calculate distances and travel times between a set of origins and a set of destinations, using different modes of transport in Argentina. The input data for the tool is a set of destinations (a geo-referenced list of points of city amenities or "opportunities", such as firms, schools, hospitals, parks, banks or retail, etc.) and a set of origins characterized by their geographic coordinates that could be interpreted as households or other. The tool determines, from each origin, which is the closest destination, depending on the distance or travel time and the mode of transport (on foot, by bicycle, by car, and by public transport). The sets of origins and destinations are large sets, which can contain up to several thousand points. We applied and developed algorithms to improve the scalability of the different parts of the procedure. For the public transportation network, we pre-processed the reachable lines from each point and used quad-trees to determine the distance between said points and the bus line's path. A second objective of this project was to rely only on open data, such as Open Street Map (OSM) data, together with making this tool open source. Therefore, the successful development and implementation of this tool is potentially beneficial to both public sector agencies as well as NGOs and other civil society organizations that focus their work on the design and implementation of public policies, aimed at improving accessibility in cities as a way to reduce spatial inequalities and social exclusion.

preprint2020arXiv

Credit Scoring for Good: Enhancing Financial Inclusion with Smartphone-Based Microlending

Globally, two billion people and more than half of the poorest adults do not use formal financial services. Consequently, there is increased emphasis on developing financial technology that can facilitate access to financial products for the unbanked. In this regard, smartphone-based microlending has emerged as a potential solution to enhance financial inclusion. We propose a methodology to improve the predictive performance of credit scoring models used by these applications. Our approach is composed of several steps, where we mostly focus on engineering appropriate features from the user data. Thereby, we construct pseudo-social networks to identify similar people and combine complex network analysis with representation learning. Subsequently we build credit scoring models using advanced machine learning techniques with the goal of obtaining the most accurate credit scores, while also taking into consideration ethical and privacy regulations to avoid unfair discrimination. A successful deployment of our proposed methodology could improve the performance of microlending smartphone applications and help enhance financial wellbeing worldwide.

preprint2020arXiv

Detecting Areas of Potential High Prevalence of Chagas in Argentina

A map of potential prevalence of Chagas disease (ChD) with high spatial disaggregation is presented. It aims to detect areas outside the Gran Chaco ecoregion (hyperendemic for the ChD), characterized by high affinity with ChD and high health vulnerability. To quantify potential prevalence, we developed several indicators: an Affinity Index which quantifies the degree of linkage between endemic areas of ChD and the rest of the country. We also studied favorable habitability conditions for Triatoma infestans, looking for areas where the predominant materials of floors, roofs and internal ceilings favor the presence of the disease vector. We studied determinants of a more general nature that can be encompassed under the concept of Health Vulnerability Index. These determinants are associated with access to health providers and the socio-economic level of different segments of the population. Finally we constructed a Chagas Potential Prevalence Index (ChPPI) which combines the affinity index, the health vulnerability index, and the population density. We show and discuss the maps obtained. These maps are intended to assist public health specialists, decision makers of public health policies and public officials in the development of cost-effective strategies to improve access to diagnosis and treatment of ChD.

preprint2020arXiv

Fair and Decentralized Exchange of Digital Goods

We construct a privacy-preserving, distributed and decentralized marketplace where parties can exchange data for tokens. In this market, buyers and sellers make transactions in a blockchain and interact with a third party, called notary, who has the ability to vouch for the authenticity and integrity of the data. We introduce a protocol for the data-token exchange where neither party gains more information than what it is paying for, and the exchange is fair: either both parties gets the other's item or neither does. No third party involvement is required after setup, and no dispute resolution is needed.

preprint2020arXiv

Segregated interactions in urban and online space

Urban income segregation is a widespread phenomenon that challenges societies across the globe. Classical studies on segregation have largely focused on the geographic distribution of residential neighborhoods rather than on patterns of social behaviors and interactions. In this study, we analyze segregation in economic and social interactions by observing credit card transactions and Twitter mentions among thousands of individuals in three culturally different metropolitan areas. We show that segregated interaction is amplified relative to the expected effects of geographic segregation in terms of both purchase activity and online communication. Furthermore, we find that segregation increases with difference in socio-economic status but is asymmetric for purchase activity, i.e., the amount of interaction from poorer to wealthier neighborhoods is larger than vice versa. Our results provide novel insights into the understanding of behavioral segregation in human interactions with significant socio-political and economic implications.

preprint2020arXiv

Snel: SQL Native Execution for LLVM

Snel is a relational database engine featuring Just-In-Time (JIT) compilation of queries and columnar data representation. Snel is designed for fast on-line analytics by leveraging the LLVM compiler infrastructure. It also has custom special methods like resolving histograms as extensions to the SQL language. "Snel" means "SQL Native Execution for LLVM". Unlike traditional database engines, it does not provide a client-server interface. Instead, it exposes its interface as an extension to SQLite, for a simple interactive usage from command line and for embedding in applications. Since Snel tables are read-only, it does not provide features like transactions or updates. This allows queries to be very fast since they don't have the overhead of table locking or ensuring consistency. At its core, Snel is simply a dynamic library that can be used by client applications. It has an SQLite extension for seamless integration with a traditional SQL environment and simple interactive usage from command line.

preprint2020arXiv

Social Network Analytics for Churn Prediction in Telco: Model Building, Evaluation and Network Architecture

Social network analytics methods are being used in the telecommunication industry to predict customer churn with great success. In particular it has been shown that relational learners adapted to this specific problem enhance the performance of predictive models. In the current study we benchmark different strategies for constructing a relational learner by applying them to a total of eight distinct call-detail record datasets, originating from telecommunication organizations across the world. We statistically evaluate the effect of relational classifiers and collective inference methods on the predictive power of relational learners, as well as the performance of models where relational learners are combined with traditional methods of predicting customer churn in the telecommunication industry. Finally we investigate the effect of network construction on model performance; our findings imply that the definition of edges and weights in the network does have an impact on the results of the predictive models. As a result of the study, the best configuration is a non-relational learner enriched with network variables, without collective inference, using binary weights and undirected networks. In addition, we provide guidelines on how to apply social networks analytics for churn prediction in the telecommunication industry in an optimal way, ranging from network architecture to model building and evaluation.

preprint2020arXiv

The Value of Big Data for Credit Scoring: Enhancing Financial Inclusion using Mobile Phone Data and Social Network Analytics

Credit scoring is without a doubt one of the oldest applications of analytics. In recent years, a multitude of sophisticated classification techniques have been developed to improve the statistical performance of credit scoring models. Instead of focusing on the techniques themselves, this paper leverages alternative data sources to enhance both statistical and economic model performance. The study demonstrates how including call networks, in the context of positive credit information, as a new Big Data source has added value in terms of profit by applying a profit measure and profit-based feature selection. A unique combination of datasets, including call-detail records, credit and debit account information of customers is used to create scorecards for credit card applicants. Call-detail records are used to build call networks and advanced social network analytics techniques are applied to propagate influence from prior defaulters throughout the network to produce influence scores. The results show that combining call-detail records with traditional data in credit scoring models significantly increases their performance when measured in AUC. In terms of profit, the best model is the one built with only calling behavior features. In addition, the calling behavior features are the most predictive in other models, both in terms of statistical and economic performance. The results have an impact in terms of ethical use of call-detail records, regulatory implications, financial inclusion, as well as data sharing and privacy.

preprint2020arXiv

Wibson Protocol for Secure Data Exchange and Batch Payments

Wibson is a blockchain-based, decentralized data marketplace that provides individuals a way to securely and anonymously sell information in a trusted environment. The combination of the Wibson token and blockchain-enabled smart contracts hopes to allow Data Sellers and Data Buyers to transact with each other directly while providing individuals the ability to maintain anonymity as desired. The Wibson marketplace will provide infrastructure and financial incentives for individuals to securely sell personal information without sacrificing personal privacy. Data Buyers receive information from willing and actively participating individuals with the benefit of knowing that the personal information should be accurate and current. We present here two different components working together to achieve an efficient decentralized marketplace. The first is a smart contract called Data Exchange, which stores references to Data Orders that different Buyers open in order to show to the market that they are interested in buying certain types of data, and provides secure mechanisms to perform the transactions. The second is used to process payments from Buyers to Sellers and intermediaries, and is called Batch Payments.

preprint2020arXiv

WibsonTree: Efficiently Preserving Seller's Privacy in a Decentralized Data Marketplace

We present a cryptographic primitive called WibsonTree designed to preserve users' privacy by allowing them to demonstrate predicates on their personal attributes, without revealing the values of those attributes. We suppose that there are three types of agents --buyers, sellers and notaries-- who interact in a decentralized privacy-preserving data marketplace (dPDM) such as the Wibson marketplace. We introduce the WibsonTree protocol as an efficient cryptographic primitive that enables the exchange of private information while preserving the seller's privacy. Using our primitive, a data seller can efficiently prove that he/she belongs to the target audience of a buyer's data request, without revealing any additional information.

preprint2018arXiv

A Bayesian Approach to Income Inference in a Communication Network

The explosion of mobile phone communications in the last years occurs at a moment where data processing power increases exponentially. Thanks to those two changes in a global scale, the road has been opened to use mobile phone communications to generate inferences and characterizations of mobile phone users. In this work, we use the communication network, enriched by a set of users' attributes, to gain a better understanding of the demographic features of a population. Namely, we use call detail records and banking information to infer the income of each person in the graph.

preprint2018arXiv

Brief survey of Mobility Analyses based on Mobile Phone Datasets

This is a brief survey of the research performed by Grandata Labs in collaboration with numerous academic groups around the world on the topic of human mobility. A driving theme in these projects is to use and improve Data Science techniques to understand mobility, as it can be observed through the lens of mobile phone datasets. We describe applications of mobility analyses for urban planning, prediction of data traffic usage, building delay tolerant networks, generating epidemiologic risk maps and measuring the predictability of human mobility.

preprint2018arXiv

Comparison of Feature Extraction Methods and Predictors for Income Inference

Patterns of mobile phone communications, coupled with the information of the social network graph and financial behavior, allow us to make inferences of users' socio-economic attributes such as their income level. We present here several methods to extract features from mobile phone usage (calls and messages), and compare different combinations of supervised machine learning techniques and sets of features used as input for the inference of users' income. Our experimental results show that the Bayesian method based on the communication graph outperforms standard machine learning algorithms using node-based features.