Source author record

Carl Lagoze

Carl Lagoze appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Digital Libraries Human-Computer Interaction physics.soc-ph Social and Information Networks Databases Information Retrieval

Catalog footprint

What is connected

8works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2013arXiv

Flexible and Extensible Digital Object and Repository Architecture (FEDORA)

We describe a digital object and respository architecture for storing and disseminating digital library content. The key features of the architecture are: (1) support for heterogeneous data types; (2) accommodation of new types as they emerge; (3) aggregation of mixed, possibly distributed, data into complex objects; (4) the ability to specify multiple content disseminations of these objects; and (5) the ability to associate rights management schemes with these disseminations. This architecture is being implemented in the context of a broader research project to develop next-generation service modules for a layered digital library architecture.

preprint2013arXiv

Policy-Carrying, Policy-Enforcing Digital Objects

We describe the motivation for moving policy enforcement for access control down to the digital object level. The reasons for this include handling of item-specific behaviors, adapting to evolution of digital objects, and permitting objects to move among repositories and portable devices. We then describe our experiments that integrate the Fedora architecture for digital objects and repositories and the PoET implementation of security automata to effect such objectcentric policy enforcement.

preprint2013arXiv

RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

We address the Named Entity Disambiguation (NED) problem for short, user-generated texts on the social Web. In such settings, the lack of linguistic features and sparse lexical context result in a high degree of ambiguity and sharp performance drops of nearly 50% in the accuracy of conventional NED systems. We handle these challenges by developing a model of user-interest with respect to a personal knowledge context; and Wikipedia, a particularly well-established and reliable knowledge base, is used to instantiate the procedure. We conduct systematic evaluations using individuals' posts from Twitter, YouTube, and Flickr and demonstrate that our novel technique is able to achieve substantial performance gains beyond state-of-the-art NED methods.

preprint2013arXiv

ResourceSync: Leveraging Sitemaps for Resource Synchronization

Many applications need up-to-date copies of collections of changing Web resources. Such synchronization is currently achieved using ad-hoc or proprietary solutions. We propose ResourceSync, a general Web resource synchronization protocol that leverages XML Sitemaps. It provides a set of capabilities that can be combined in a modular manner to meet local or community requirements. We report on work to implement this protocol for arXiv.org and also provide an experimental prototype for the English Wikipedia as well as a client API.

preprint2013arXiv

Semantic Tagging on Historical Maps

Tags assigned by users to shared content can be ambiguous. As a possible solution, we propose semantic tagging as a collaborative process in which a user selects and associates Web resources drawn from a knowledge context. We applied this general technique in the specific context of online historical maps and allowed users to annotate and tag them. To study the effects of semantic tagging on tag production, the types and categories of obtained tags, and user task load, we conducted an in-lab within-subject experiment with 24 participants who annotated and tagged two distinct maps. We found that the semantic tagging implementation does not affect these parameters, while providing tagging relationships to well-defined concept definitions. Compared to label-based tagging, our technique also gathers positive and negative tagging relationships. We believe that our findings carry implications for designers who want to adopt semantic tagging in other contexts and systems on the Web.

preprint2013arXiv

The Extraction of Community Structures from Publication Networks to Support Ethnographic Observations of Field Differences in Scientific Communication

The scientific community of researchers in a research specialty is an important unit of analysis for understanding the field specific shaping of scientific communication practices. These scientific communities are, however, a challenging unit of analysis to capture and compare because they overlap, have fuzzy boundaries, and evolve over time. We describe a network analytic approach that reveals the complexities of these communities through examination of their publication networks in combination with insights from ethnographic field studies. We suggest that the structures revealed indicate overlapping sub- communities within a research specialty and we provide evidence that they differ in disciplinary orientation and research practices. By mapping the community structures of scientific fields we aim to increase confidence about the domain of validity of ethnographic observations as well as of collaborative patterns extracted from publication networks thereby enabling the systematic study of field differences. The network analytic methods presented include methods to optimize the delineation of a bibliographic data set in order to adequately represent a research specialty, and methods to extract community structures from this data. We demonstrate the application of these methods in a case study of two research specialties in the physical and chemical sciences.

preprint2013arXiv

Web Synchronization Simulations using the ResourceSync Framework

Maintenance of multiple, distributed up-to-date copies of collections of changing Web resources is important in many application contexts and is often achieved using ad hoc or proprietary synchronization solutions. ResourceSync is a resource synchronization framework that integrates with the Web architecture and leverages XML sitemaps. We define a model for the ResourceSync framework as a basis for understanding its properties. We then describe experiments in which simulations of a variety of synchronization scenarios illustrate the effects of model configuration on consistency, latency, and data transfer efficiency. These results provide insight into which congurations are appropriate for various application scenarios.

preprint2011arXiv

Resolving Author Name Homonymy to Improve Resolution of Structures in Co-author Networks

We investigate how author name homonymy distorts clustered large-scale co-author networks, and present a simple, effective, scalable and generalizable algorithm to ameliorate such distortions. We evaluate the performance of the algorithm to improve the resolution of mesoscopic network structures. To this end, we establish the ground truth for a sample of author names that is statistically representative of different types of nodes in the co-author network, distinguished by their role for the connectivity of the network. We finally observe that this distinction of node roles based on the mesoscopic structure of the network, in combination with a quantification of author name commonality, suggests a new approach to assess network distortion by homonymy and to analyze the reduction of distortion in the network after disambiguation, without requiring ground truth sampling.

Carl Lagoze

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Flexible and Extensible Digital Object and Repository Architecture (FEDORA)

Policy-Carrying, Policy-Enforcing Digital Objects

RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

ResourceSync: Leveraging Sitemaps for Resource Synchronization

Semantic Tagging on Historical Maps

The Extraction of Community Structures from Publication Networks to Support Ethnographic Observations of Field Differences in Scientific Communication

Web Synchronization Simulations using the ResourceSync Framework

Resolving Author Name Homonymy to Improve Resolution of Structures in Co-author Networks