Researcher profile

Patrick K. Nicholson

Patrick K. Nicholson contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2016arXiv

Distributed Entity Disambiguation with Per-Mention Learning

Entity disambiguation, or mapping a phrase to its canonical representation in a knowledge base, is a fundamental step in many natural language processing applications. Existing techniques based on global ranking models fail to capture the individual peculiarities of the words and hence, either struggle to meet the accuracy requirements of many real-world applications or they are too complex to satisfy real-time constraints of applications. In this paper, we propose a new disambiguation system that learns specialized features and models for disambiguating each ambiguous phrase in the English language. To train and validate the hundreds of thousands of learning models for this purpose, we use a Wikipedia hyperlink dataset with more than 170 million labelled annotations. We provide an extensive experimental evaluation to show that the accuracy of our approach compares favourably with respect to many state-of-the-art disambiguation systems. The training required for our approach can be easily distributed over a cluster. Furthermore, updating our system for new entities or calibrating it for special ones is a computationally fast process, that does not affect the disambiguation of the other entities.

preprint2015arXiv

Encodings of Range Maximum-Sum Segment Queries and Applications

Given an array A containing arbitrary (positive and negative) numbers, we consider the problem of supporting range maximum-sum segment queries on A: i.e., given an arbitrary range [i,j], return the subrange [i' ,j' ] \subseteq [i,j] such that the sum of the numbers in A[i'..j'] is maximized. Chen and Chao [Disc. App. Math. 2007] presented a data structure for this problem that occupies Θ(n) words, can be constructed in Θ(n) time, and supports queries in Θ(1) time. Our first result is that if only the indices [i',j'] are desired (rather than the maximum sum achieved in that subrange), then it is possible to reduce the space to Θ(n) bits, regardless the numbers stored in A, while retaining the same construction and query time. We also improve the best known space lower bound for any data structure that supports range maximum-sum segment queries from n bits to 1.89113n - Θ(lg n) bits, for sufficiently large values of n. Finally, we provide a new application of this data structure which simplifies a previously known linear time algorithm for finding k-covers: i.e., given an array A of n numbers and a number k, find k disjoint subranges [i_1 ,j_1 ],...,[i_k ,j_k ], such that the total sum of all the numbers in the subranges is maximized.

preprint2015arXiv

Optimal Encodings for Range Top-k, Selection, and Min-Max

We consider encoding problems for range queries on arrays. In these problems the goal is to store a structure capable of recovering the answer to all queries that occupies the information theoretic minimum space possible, to within lower order terms. As input, we are given an array $A[1..n]$, and a fixed parameter $k \in [1,n]$. A range top-$k$ query on an arbitrary range $[i,j] \subseteq [1,n]$ asks us to return the ordered set of indices $\{\ell_1, ..., \ell_{k}\}$ such that $A[\ell_m]$ is the $m$-th largest element in $A[i..j]$, for $1 \le m \le k$. A range selection query for an arbitrary range $[i,j] \subseteq [1,n]$ and query parameter $k' \in [1,k]$ asks us to return the index of the $k'$-th largest element in $A[i..j]$. We completely resolve the space complexity of both of these heavily studied problems---to within lower order terms---for all $k = o(n)$. Previously, the constant factor in the space complexity was known only for $k=1$. We also resolve the space complexity of another problem, that we call range min-max, in which the goal is to return the indices of both the minimum and maximum elements in a range.

preprint2014arXiv

Algorithms in the Ultra-Wide Word Model

The effective use of parallel computing resources to speed up algorithms in current multi-core parallel architectures remains a difficult challenge, with ease of programming playing a key role in the eventual success of various parallel architectures. In this paper we consider an alternative view of parallelism in the form of an ultra-wide word processor. We introduce the Ultra-Wide Word architecture and model, an extension of the word-RAM model that allows for constant time operations on thousands of bits in parallel. Word parallelism as exploited by the word-RAM model does not suffer from the more difficult aspects of parallel programming, namely synchronization and concurrency. For the standard word-RAM algorithms, the speedups obtained are moderate, as they are limited by the word size. We argue that a large class of word-RAM algorithms can be implemented in the Ultra-Wide Word model, obtaining speedups comparable to multi-threaded computations while keeping the simplicity of programming of the sequential RAM model. We show that this is the case by describing implementations of Ultra-Wide Word algorithms for dynamic programming and string searching. In addition, we show that the Ultra-Wide Word model can be used to implement a nonstandard memory architecture, which enables the sidestepping of lower bounds of important data structure problems such as priority queues and dynamic prefix sums. While similar ideas about operating on large words have been mentioned before in the context of multimedia processors [Thorup 2003], it is only recently that an architecture like the one we propose has become feasible and that details can be worked out.

preprint2014arXiv

Weighted ancestors in suffix trees

The classical, ubiquitous, predecessor problem is to construct a data structure for a set of integers that supports fast predecessor queries. Its generalization to weighted trees, a.k.a. the weighted ancestor problem, has been extensively explored and successfully reduced to the predecessor problem. It is known that any solution for both problems with an input set from a polynomially bounded universe that preprocesses a weighted tree in O(n polylog(n)) space requires Ω(loglogn) query time. Perhaps the most important and frequent application of the weighted ancestors problem is for suffix trees. It has been a long-standing open question whether the weighted ancestors problem has better bounds for suffix trees. We answer this question positively: we show that a suffix tree built for a text w[1..n] can be preprocessed using O(n) extra space, so that queries can be answered in O(1) time. Thus we improve the running times of several applications. Our improvement is based on a number of data structure tools and a periodicity-based insight into the combinatorial structure of a suffix tree.

preprint2013arXiv

Dynamic Range Selection in Linear Space

Given a set $S$ of $n$ points in the plane, we consider the problem of answering range selection queries on $S$: that is, given an arbitrary $x$-range $Q$ and an integer $k > 0$, return the $k$-th smallest $y$-coordinate from the set of points that have $x$-coordinates in $Q$. We present a linear space data structure that maintains a dynamic set of $n$ points in the plane with real coordinates, and supports range selection queries in $O((\lg n / \lg \lg n)^2)$ time, as well as insertions and deletions in $O((\lg n / \lg \lg n)^2)$ amortized time. The space usage of this data structure is an $Θ(\lg n / \lg \lg n)$ factor improvement over the previous best result, while maintaining asymptotically matching query and update times. We also present a succinct data structure that supports range selection queries on a dynamic array of $n$ values drawn from a bounded universe.

preprint2012arXiv

Dynamic Range Majority Data Structures

Given a set $P$ of coloured points on the real line, we study the problem of answering range $α$-majority (or "heavy hitter") queries on $P$. More specifically, for a query range $Q$, we want to return each colour that is assigned to more than an $α$-fraction of the points contained in $Q$. We present a new data structure for answering range $α$-majority queries on a dynamic set of points, where $α\in (0,1)$. Our data structure uses O(n) space, supports queries in $O((\lg n) / α)$ time, and updates in $O((\lg n) / α)$ amortized time. If the coordinates of the points are integers, then the query time can be improved to $O(\lg n / (α\lg \lg n) + (\lg(1/α))/α))$. For constant values of $α$, this improved query time matches an existing lower bound, for any data structure with polylogarithmic update time. We also generalize our data structure to handle sets of points in d-dimensions, for $d \ge 2$, as well as dynamic arrays, in which each entry is a colour.

preprint2012arXiv

Succinct Posets

We describe an algorithm for compressing a partially ordered set, or \emph{poset}, so that it occupies space matching the information theory lower bound (to within lower order terms), in the worst case. Using this algorithm, we design a succinct data structure for representing a poset that, given two elements, can report whether one precedes the other in constant time. This is equivalent to succinctly representing the transitive closure graph of the poset, and we note that the same method can also be used to succinctly represent the transitive reduction graph. For an $n$ element poset, the data structure occupies $n^2/4 + o(n^2)$ bits, in the worst case, which is roughly half the space occupied by an upper triangular matrix. Furthermore, a slight extension to this data structure yields a succinct oracle for reachability in arbitrary directed graphs. Thus, using roughly a quarter of the space required to represent an arbitrary directed graph, reachability queries can be supported in constant time.