Source author record

Paolo D'Alberto

Paolo D'Alberto appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Computation Other Computer Science Applications Information Retrieval Mathematical Software

Catalog footprint

What is connected

5 works

5 topics

3 close collaborators


preprint, 2015, arXiv

Mapping and Matching Algorithms: Data Mining by Adaptive Graphs

Assume we have two bijective functions $U(x)$ and $M(x)$, with $U, M: \mathbb{N} \rightarrow \mathbb{N}$ and $M(x) \neq U(x)$ for all $x$. Every day and in different locations, we observe results of $U$ and $M$ without seeing $x$. We are assured of neither the time stamp nor the order within the day, but at least the location is fully defined. We want to find the matching between $U(x)$ and $M(x)$ without ever knowing $x$. We formulate this problem as adaptive graph mining: we develop the theory, the solution, and the implementation. This work stems from a practical problem, which motivates our definitions. The solution is simple and clear, and the implementation is parallel and efficient. In our experience, the problem and the solution are novel, and we want to share our findings.

preprint, 2015, arXiv

Multiple-Campaign Ad-Targeting Deployment: Parallel Response Modeling, Calibration and Scoring Without Personal User Information

We present a vertical introduction to campaign optimization; that is, the ability to predict, on average and for each exposed ad, the user response to an ad campaign without any user profiles. In practice, we present an approach to build a polytomous (multi-response) model composed of several hundred binary models using generalized linear models. The theory was introduced twenty years ago and has been applied in different fields since then. Here, we show how we optimize hundreds of campaigns and how this large number of campaigns may overcome a few characteristic caveats of single-campaign optimization. We discuss the problem and solution of training and calibration at scale. We present statistical performance as coverage, precision, and recall, as used in classification. We also discuss the potential performance as throughput: how many decisions can be made per second while streaming the bid auctions, including with dedicated hardware.

preprint, 2012, arXiv

A Heterogeneous Accelerated Matrix Multiplication: OpenCL + APU + GPU+ Fast Matrix Multiply

As users and developers, we are witnessing the opening of a new computing scenario: the introduction of hybrid processors into a single die, such as an accelerated processing unit (APU) processor, and the plug-and-play of additional graphics processing units (GPUs) onto a single motherboard. These APU processors provide multiple symmetric cores with their memory hierarchies and an integrated GPU. Moreover, these processors are designed to work with external GPUs that can push the peak performance towards the TeraFLOPS boundary. We present a case study for the development of dense Matrix Multiplication (MM) codes for matrix sizes up to $19\text{K} \times 19\text{K}$, thus using all of the above computational engines, and an achievable peak performance of 200 GFLOPS for, literally, a made-at-home build. We present the results of our experience, the quirks, the pitfalls, the achieved performance, and the achievable peak performance.

preprint, 2012, arXiv

Non-Parametric Methods Applied to the N-Sample Series Comparison

Anomaly and similarity detection in multidimensional series have a long history and have found practical usage in many different fields such as medicine, networks, and finance. Anomaly detection appeals to many different disciplines: for example, mathematicians searching for a unified mathematical formulation based on probability, statisticians searching for error bound estimates, and computer scientists trying to design fast algorithms, to name just a few. In summary, we have two contributions. First, we present a self-contained survey of the most promising methods used today in the fields of machine learning, statistics, and bio-informatics. We include discussions of conformal prediction, kernels in the Hilbert space, Kolmogorov's information measure, and non-parametric cumulative distribution function comparison methods (NCDF). Second, building upon this foundation, we provide a powerful NCDF method for series with small dimensionality. Through a combination of data organization and statistical tests, we describe extensions that scale well with increased dimensionality.
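The abstract does not spell out the NCDF method itself. As a rough illustration of what a non-parametric CDF comparison looks like, the sketch below computes the two-sample Kolmogorov-Smirnov statistic, a standard technique swapped in here for illustration and not necessarily the one used in the paper. The function names `ecdf` and `ks_statistic` are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a one-dimensional sample as a function."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # Fraction of observations less than or equal to x.
        return bisect_right(ordered, x) / n

    return cdf


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b.

    Illustrative only: a standard two-sample Kolmogorov-Smirnov statistic,
    assumed here as one example of a CDF comparison.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    fa, fb = ecdf(a), ecdf(b)
    # Both CDFs are step functions, so the maximum gap is attained
    # at one of the observed values.
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


if __name__ == "__main__":
    print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

A value near 0 suggests the two series are drawn from similar distributions, while a value near 1 suggests they barely overlap; turning the statistic into a decision would still require a significance threshold, which is not shown here.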

preprint, 2011, arXiv

On the Weakenesses of Correlation Measures used for Search Engines' Results (Unsupervised Comparison of Search Engine Rankings)

The correlation of the result lists provided by search engines is fundamental and it has deep and multidisciplinary ramifications. Here, we present automatic and unsupervised methods to assess whether or not search engines provide results that are comparable or correlated. We have two main contributions. First, we provide evidence that for more than 80% of the input queries, independently of their frequency, the two major search engines share only three or fewer URLs in their search results, leading to an increasing divergence. In this scenario (divergence), we show that even the most robust measures based on comparing lists are useless to apply; that is, the small contribution of too few common items yields no confidence. Second, to overcome this problem, we propose the first content-based measures, i.e., direct comparison of the contents of search results; these measures are based on the Jaccard ratio and distribution similarity measures (CDF measures). We show that they are orthogonal to each other (i.e., Jaccard and distribution) and extend the discriminative power w.r.t. list-based measures. Our approach stems from the real need to compare search-engine results; it is automatic from the query selection to the final evaluation and it applies to any geographical market. It is thus designed to scale and to serve as a first filtering step of query selection (necessary) for supervised methods.
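The Jaccard ratio mentioned above is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows, assuming each search result's content has already been reduced to a collection of tokens; the function name `jaccard` and the whitespace tokenization are hypothetical choices, not details taken from the paper.

```python
def jaccard(a, b):
    """Jaccard ratio |A & B| / |A | B| of two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Hypothetical content snippets from two engines for the same query.
    engine_one = "the quick fox".split()
    engine_two = "the lazy fox".split()
    print(jaccard(engine_one, engine_two))
```

The same function applies to URL lists, but as the abstract notes, two lists sharing three or fewer URLs give too little overlap to be informative, which is the motivation for comparing contents instead.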
