Source author record

Ted Pedersen

Ted Pedersen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language cmp-lg

Catalog footprint

What is connected

5works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Duluth at SemEval-2019 Task 6: Lexical Approaches to Identify and Categorize Offensive Tweets

This paper describes the Duluth systems that participated in SemEval--2019 Task 6, Identifying and Categorizing Offensive Language in Social Media (OffensEval). For the most part these systems took traditional Machine Learning approaches that built classifiers from lexical features found in manually labeled training data. However, our most successful system for classifying a tweet as offensive (or not) was a rule-based black--list approach, and we also experimented with combining the training data from two different but related SemEval tasks. Our best systems in each of the three OffensEval tasks placed in the middle of the comparative evaluation, ranking 57th of 103 in task A, 39th of 75 in task B, and 44th of 65 in task C.

preprint2020arXiv

Duluth at SemEval-2020 Task 12: Offensive Tweet Identification in English with Logistic Regression

This paper describes the Duluth systems that participated in SemEval--2020 Task 12, Multilingual Offensive Language Identification in Social Media (OffensEval--2020). We participated in the three English language tasks. Our systems provide a simple Machine Learning baseline using logistic regression. We trained our models on the distantly supervised training data made available by the task organizers and used no other resources. As might be expected we did not rank highly in the comparative evaluation: 79th of 85 in Task A, 34th of 43 in Task B, and 24th of 39 in Task C. We carried out a qualitative analysis of our results and found that the class labels in the gold standard data are somewhat noisy. We hypothesize that the extremely high accuracy (> 90%) of the top ranked systems may reflect methods that learn the training data very well but may not generalize to the task of identifying offensive language in English. This analysis includes examples of tweets that despite being mildly redacted are still offensive.

preprint2020arXiv

Duluth at SemEval-2020 Task 7: Using Surprise as a Key to Unlock Humorous Headlines

We use pretrained transformer-based language models in SemEval-2020 Task 7: Assessing the Funniness of Edited News Headlines. Inspired by the incongruity theory of humor, we use a contrastive approach to capture the surprise in the edited headlines. In the official evaluation, our system gets 0.531 RMSE in Subtask 1, 11th among 49 submissions. In Subtask 2, our system gets 0.632 accuracy, 9th among 32 submissions.

preprint2010arXiv

Computational Approaches to Measuring the Similarity of Short Contexts : A Review of Applications and Methods

Measuring the similarity of short written contexts is a fundamental problem in Natural Language Processing. This article provides a unifying framework by which short context problems can be categorized both by their intended application and proposed solution. The goal is to show that various problems and methodologies that appear quite different on the surface are in fact very closely related. The axes by which these categorizations are made include the format of the contexts (headed versus headless), the way in which the contexts are to be measured (first-order versus second-order similarity), and the information used to represent the features in the contexts (micro versus macro views). The unifying thread that binds together many short context applications and methods is the fact that similarity decisions must be made between contexts that share few (if any) words in common.

preprint1995arXiv

Lexical Acquisition via Constraint Solving

This paper describes a method to automatically acquire the syntactic and semantic classifications of unknown words. Our method reduces the search space of the lexical acquisition problem by utilizing both the left and the right context of the unknown word. Link Grammar provides a convenient framework in which to implement our method.

Ted Pedersen

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Duluth at SemEval-2019 Task 6: Lexical Approaches to Identify and Categorize Offensive Tweets

Duluth at SemEval-2020 Task 12: Offensive Tweet Identification in English with Logistic Regression

Duluth at SemEval-2020 Task 7: Using Surprise as a Key to Unlock Humorous Headlines

Computational Approaches to Measuring the Similarity of Short Contexts : A Review of Applications and Methods

Lexical Acquisition via Constraint Solving