Source author record

Tiep Mai

Tiep Mai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Methodology, Computation, Computation and Language, Information Retrieval, Machine Learning, Social and Information Networks

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

preprint, 2016, arXiv

Distributed Entity Disambiguation with Per-Mention Learning

Entity disambiguation, or mapping a phrase to its canonical representation in a knowledge base, is a fundamental step in many natural language processing applications. Existing techniques based on global ranking models fail to capture the individual peculiarities of words and hence either struggle to meet the accuracy requirements of many real-world applications or are too complex to satisfy their real-time constraints. In this paper, we propose a new disambiguation system that learns specialized features and models for disambiguating each ambiguous phrase in the English language. To train and validate the hundreds of thousands of learning models required for this purpose, we use a Wikipedia hyperlink dataset with more than 170 million labelled annotations. We provide an extensive experimental evaluation to show that the accuracy of our approach compares favourably with that of many state-of-the-art disambiguation systems. The training required for our approach can be easily distributed over a cluster. Furthermore, updating our system for new entities or calibrating it for special ones is a computationally fast process that does not affect the disambiguation of the other entities.
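To make the per-mention idea concrete, here is a toy sketch in which each ambiguous phrase gets its own small model. The abstract does not describe the paper's actual features or learners, so the class name `PerMentionDisambiguator`, the bag-of-words context counts, and the example annotations are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class PerMentionDisambiguator:
    """Toy stand-in: one independent context-word model per mention string."""

    def __init__(self):
        # mention -> entity -> counts of context words seen with that entity
        self.models = defaultdict(lambda: defaultdict(Counter))

    def train(self, annotations):
        """annotations: iterable of (mention, context_text, entity) triples."""
        for mention, context, entity in annotations:
            words = (w.lower() for w in context.split())
            self.models[mention.lower()][entity].update(words)

    def disambiguate(self, mention, context):
        """Return the best-scoring entity for this mention, or None if unseen."""
        model = self.models.get(mention.lower())
        if not model:
            return None
        words = [w.lower() for w in context.split()]

        def score(entity):
            counts = model[entity]
            return sum(counts[w] for w in words)

        return max(model, key=score)


# Hypothetical training annotations (not taken from the Wikipedia dataset).
system = PerMentionDisambiguator()
system.train([
    ("Jaguar", "the jaguar hunts in the rainforest", "Jaguar (animal)"),
    ("Jaguar", "jaguar unveiled a new sports car", "Jaguar Cars"),
])
prediction = system.disambiguate("jaguar", "a fast car from Jaguar")
```

Because each mention owns its own entry in `models`, retraining one mention leaves every other mention's model untouched, which mirrors the update property the abstract claims.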

preprint, 2015, arXiv

Bayesian sequential parameter estimation with a Laplace type approximation

A method based on the iterated Laplace approximation is proposed and evaluated for sequential inference of the fixed parameters of dynamic latent Gaussian models. The method provides a useful trade-off between computational performance and the accuracy of the approximation to the true posterior distribution. Approximation corrections are shown to improve the accuracy of the approximation in simulation studies. A population-based approach is also shown to provide a more robust inference method.
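For orientation, the sketch below shows the plain single-Gaussian Laplace approximation in one dimension: locate the mode of a log density and use the negative inverse curvature there as the variance. It is not the iterated (iterLap) variant or the sequential scheme of the paper; the function name `laplace_approximation`, the finite-difference step, and the Gamma example are assumptions chosen for illustration.

```python
import math


def laplace_approximation(log_density, start, step=1e-4, iterations=50):
    """Fit a Gaussian at the mode of a 1-D log density.

    Returns (mode, variance). Derivatives are estimated with central
    finite differences and the mode is found by Newton's method.
    """

    def curvature(x):
        return (log_density(x + step) - 2.0 * log_density(x)
                + log_density(x - step)) / step ** 2

    x = start
    for _ in range(iterations):
        grad = (log_density(x + step) - log_density(x - step)) / (2.0 * step)
        curv = curvature(x)
        if curv >= 0:
            raise ValueError("log density is not locally concave")
        x_new = x - grad / curv  # Newton step towards the mode
        converged = abs(x_new - x) < 1e-10
        x = x_new
        if converged:
            break
    return x, -1.0 / curvature(x)


# Unnormalised log density of a Gamma(shape=3, rate=2) distribution.
# Its mode is (3 - 1) / 2 = 1 and the curvature there is -2.
def gamma_log_density(x):
    return 2.0 * math.log(x) - 2.0 * x


mode, variance = laplace_approximation(gamma_log_density, start=0.8)
```

For this example the fitted Gaussian is centred near 1 with variance near 0.5, whereas the true Gamma distribution is skewed; that gap is the kind of approximation error the corrections mentioned in the abstract aim to reduce.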

preprint, 2015, arXiv

Modifying iterated Laplace approximations

In this paper, several modifications are introduced to the functional approximation method iterLap to reduce the approximation error, including an adjusted stopping rule, a new residual function, starting-point selection for numerical optimisation, and scaling of the Hessian matrix. Illustrative examples are also provided to show the trade-off between running time and accuracy for the original and modified methods.

preprint, 2015, arXiv

Profiling user activities with minimal traffic traces

Understanding user behavior is essential to personalize and enrich a user's online experience. While there are significant benefits to be accrued from the pursuit of personalized services based on a fine-grained behavioral analysis, care must be taken to address user privacy concerns. In this paper, we consider the use of web traces with truncated URLs (each URL is trimmed to contain only the web domain) for this purpose. While such truncation removes the fine-grained sensitive information, it also strips the data of many features that are crucial to the profiling of user activity. We show how to overcome this lack of crucial features and separate, with high accuracy, the URLs representing a user activity from the noisy network traffic trace (including advertisements, spam, analytics and web scripts). This activity profiling with truncated URLs enables network operators to provide personalized services while mitigating privacy concerns by storing and sharing only truncated traffic traces. In order to offset the accuracy loss due to truncation, our statistical methodology leverages specialized features extracted from bursts, that is, groups of consecutive URLs that represent a micro user action such as a web click or a chat reply. These bursts, in turn, are detected by a novel algorithm based on our observed characteristics of the inter-arrival time of HTTP records. We present an extensive experimental evaluation on a real dataset of mobile web traces, consisting of more than 130 million records, representing the browsing activities of 10,000 users over a period of 30 days. Our results show that the proposed methodology achieves around 90% accuracy in segregating URLs representing user activities from non-representative URLs.
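The two preprocessing ideas in this abstract, trimming URLs to their domain and grouping records into bursts by inter-arrival time, can be sketched as follows. The abstract does not give the paper's burst detection algorithm, so the fixed gap threshold `max_gap`, the function names, and the sample records are assumptions standing in for it.

```python
from urllib.parse import urlsplit


def truncate_url(url):
    """Keep only the host part of a URL, dropping path and query."""
    return urlsplit(url).hostname or ""


def split_into_bursts(records, max_gap=1.0):
    """Group (timestamp_seconds, url) records into bursts.

    A new burst starts whenever the gap to the previous record exceeds
    max_gap seconds. The fixed threshold is a simplification; the paper
    derives its rule from observed inter-arrival time characteristics.
    """
    bursts = []
    previous_time = None
    for timestamp, url in sorted(records):
        if previous_time is None or timestamp - previous_time > max_gap:
            bursts.append([])  # gap too large: open a new burst
        bursts[-1].append((timestamp, truncate_url(url)))
        previous_time = timestamp
    return bursts


# Hypothetical trace: a page load with an embedded script, then a chat reply.
trace = [
    (0.0, "https://news.example.com/story?id=7"),
    (0.2, "https://cdn.example.net/app.js"),
    (5.0, "https://chat.example.org/reply"),
]
bursts = split_into_bursts(trace, max_gap=1.0)
```

Features such as burst size or the number of distinct domains per burst could then be computed from each group without ever storing the full URLs.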
