Source author record

Scott Duke Kominers

Scott Duke Kominers appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · math.CO · Computer Science and Game Theory · cs.CY · math.NT · Computation and Language · Discrete Mathematics · Distributed, Parallel, and Cluster Computing · econ.TH · math.HO · physics.soc-ph

Catalog footprint

What is connected

7 works

11 topics

4 close collaborators


preprint · 2025 · arXiv

Shill-Proof Auctions

We characterize single-item auction formats that are shill-proof in the sense that a profit-maximizing seller has no incentive to submit shill bids. We distinguish between strong shill-proofness, in which a seller with full knowledge of bidders' valuations can never profit from shilling, and weak shill-proofness, which requires only that the expected equilibrium profit from shilling is non-positive. The Dutch auction (with a suitable reserve) is the unique (revenue-)optimal and strongly shill-proof auction. Any deterministic auction can satisfy only two properties in the set {static, strategy-proof, weakly shill-proof}. Our main results extend to settings with affiliated and interdependent values.
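To make the strong notion concrete, here is a minimal Python sketch (not taken from the paper) of why a sealed-bid second-price auction is not strongly shill-proof. A seller who knows the bids can place a shill just below the highest bid and collect nearly that amount. The sketch assumes truthful bidding (a dominant strategy in this format), no reserve price, and that an unsold item is worth 0 to the seller. The function name `second_price_revenue` is hypothetical.

```python
from typing import List, Optional


def second_price_revenue(bids: List[float], shill: Optional[float] = None) -> float:
    """Seller revenue in a sealed-bid second-price auction with no reserve.

    Illustrative assumptions: real bidders bid their true values, there are
    at least two entries (real bids plus any shill), and if the seller's own
    shill bid wins, the item goes unsold and earns 0.
    """
    entries = [(b, False) for b in bids]          # (bid, is_shill)
    if shill is not None:
        entries.append((shill, True))
    # Stable sort: on a tie, a real bid listed earlier stays ahead of the shill.
    entries.sort(key=lambda e: e[0], reverse=True)
    (_, winner_is_shill), (second_bid, _) = entries[0], entries[1]
    if winner_is_shill:
        return 0.0
    return second_bid                             # the winner pays the second-highest bid


values = [10.0, 6.0, 3.0]
print(second_price_revenue(values))              # 6.0  (honest seller)
print(second_price_revenue(values, shill=9.5))   # 9.5  (shill just below the top bid)
```

By contrast, in a Dutch (descending-clock) auction the winner pays the price at which they stop the clock, so a shill can only end the auction early with no sale. This is consistent with the Dutch auction being strongly shill-proof.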

preprint · 2022 · arXiv

The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications

Innovation is a major driver of economic and social development, and information about many kinds of innovation is embedded in semi-structured data from patents and patent applications. Although the impact and novelty of innovations expressed in patent data are difficult to measure through traditional means, machine learning (ML) offers a promising set of techniques for evaluating novelty, summarizing contributions, and embedding semantics. In this paper, we introduce the Harvard USPTO Patent Dataset (HUPD), a large-scale, well-structured, and multi-purpose corpus of English-language patent applications filed to the United States Patent and Trademark Office (USPTO) between 2004 and 2018. With more than 4.5 million patent documents, HUPD is two to three times larger than comparable corpora. Unlike previously proposed patent datasets in NLP, HUPD contains the inventor-submitted versions of patent applications, not the final versions of granted patents, thereby allowing us to study patentability at the time of filing using NLP methods for the first time. It is also novel in its inclusion of rich structured metadata alongside the text of patent filings: by providing each application's metadata along with all of its text fields, the dataset enables researchers to perform new sets of NLP tasks that leverage variation in structured covariates. As a case study on the types of research HUPD makes possible, we introduce a new task to the NLP community, namely binary classification of patent decisions. We additionally show that the structured metadata provided in the dataset enables us to conduct explicit studies of concept shifts for this task. Finally, we demonstrate how HUPD can be used for three additional tasks: multi-class classification of patent subject areas, language modeling, and summarization.

preprint · 2020 · arXiv

Generalization by Recognizing Confusion

A recently proposed technique called self-adaptive training augments modern neural networks by allowing them to adjust training labels on the fly, to avoid overfitting to samples that may be mislabeled or otherwise non-representative. By combining the self-adaptive objective with mixup, we further improve the accuracy of self-adaptive models for image recognition; the resulting classifier obtains state-of-the-art accuracies on datasets corrupted with label noise. Robustness to label noise implies a lower generalization gap; thus, our approach also leads to improved generalizability. We find evidence that the Rademacher complexity of these algorithms is low, suggesting a new path towards provable generalization for this type of deep learning model. Last, we highlight a novel connection between difficulties accounting for rare classes and robustness under noise, as rare classes are in a sense indistinguishable from label noise. Our code can be found at https://github.com/Tuxianeer/generalizationconfusion.
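The two ingredients named in the abstract can be sketched in a few lines of plain Python. This is not the authors' implementation (their code is at the GitHub link above) and it leaves out the network entirely. Assuming labels are stored as probability vectors, it shows how a self-adaptive target is nudged toward the model's prediction by an exponential moving average, and how mixup forms convex combinations of inputs and targets. The function names and default hyperparameters (`momentum=0.9`, `alpha=1.0`) are illustrative choices.

```python
import random
from typing import List, Tuple


def self_adaptive_update(target: List[float], pred: List[float],
                         momentum: float = 0.9) -> List[float]:
    """Move a soft label toward the model's prediction: t <- m*t + (1-m)*p.

    If both inputs are probability vectors, the result is one as well.
    """
    return [momentum * t + (1 - momentum) * p for t, p in zip(target, pred)]


def mixup(x1: List[float], t1: List[float], x2: List[float], t2: List[float],
          alpha: float = 1.0, rng=random) -> Tuple[List[float], List[float], float]:
    """Convex-combine two inputs and their (soft) targets with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    t = [lam * a + (1 - lam) * b for a, b in zip(t1, t2)]
    return x, t, lam


# A possibly mislabeled example: the given label says class 1,
# but the model keeps predicting class 0 with high probability.
t = [0.0, 1.0, 0.0]
for _ in range(5):
    t = self_adaptive_update(t, [0.8, 0.15, 0.05])
print([round(v, 3) for v in t])  # mass has shifted from class 1 toward class 0
```

In a full training loop these updated soft targets (rather than the raw labels) would be what mixup combines, which is one plausible reading of how the two objectives fit together.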

preprint · 2020 · arXiv

Prisoners, Rooms, and Lightswitches

We examine a new variant of the classic prisoners and lightswitches puzzle: A warden leads his $n$ prisoners in and out of $r$ rooms, one at a time, in some order, with each prisoner eventually visiting every room an arbitrarily large number of times. The rooms are indistinguishable, except that each one has $s$ lightswitches; the prisoners win their freedom if at some point a prisoner can correctly declare that each prisoner has been in every room at least once. What is the minimum number of switches per room, $s$, such that the prisoners can manage this? We show that if the prisoners do not know the switches' starting configuration, then they have no chance of escape, but if the prisoners do know the starting configuration, then the minimum sufficient $s$ is surprisingly small. The analysis gives rise to a number of puzzling open questions as well.
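For context, the classic single-room version with one switch whose starting position is known is solved by a designated counter. The Python sketch below simulates that baseline strategy only; it is not the paper's multi-room, $s$-switch construction. The uniformly random warden, the function name `counter_strategy`, and the step cap are assumptions made for the simulation.

```python
import random
from typing import Optional, Set, Tuple


def counter_strategy(n: int, rng: random.Random,
                     max_steps: int = 10**6) -> Optional[Tuple[int, Set[int]]]:
    """Classic puzzle: one room, one switch known to start off; requires n >= 2.

    Prisoner 0 is the counter. Each other prisoner turns the switch on the first
    time they find it off, and never touches it again. The counter turns it off
    whenever he finds it on and increments his count; at n - 1 he declares.
    Returns (step of declaration, prisoners who had visited by then), or None
    if no declaration happens within max_steps.
    """
    switch_on = False
    has_signaled = [False] * n
    count = 0
    visited: Set[int] = set()
    for step in range(1, max_steps + 1):
        p = rng.randrange(n)            # warden's choice: uniform at random (an assumption)
        visited.add(p)
        if p == 0:
            if switch_on:
                switch_on = False
                count += 1
                if count == n - 1:
                    return step, visited
        elif not switch_on and not has_signaled[p]:
            switch_on = True
            has_signaled[p] = True
    return None


print(counter_strategy(5, random.Random(1)))
```

Whenever the counter declares, every prisoner has provably been in the room, since each of the $n - 1$ increments corresponds to a distinct prisoner's single signal.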

preprint · 2020 · arXiv

Smarter Parking: Using AI to Identify Parking Inefficiencies in Vancouver

On-street parking is convenient but has many disadvantages: on-street spots come at the expense of other road uses such as traffic lanes, transit lanes, bike lanes, or parklets; drivers looking for parking contribute substantially to traffic congestion and hence to greenhouse gas emissions; and safety is reduced, both because drivers looking for spots are more distracted than other road users and because people exiting parked cars pose a risk to cyclists. These social costs may not be worth paying when off-street parking lots are nearby and have surplus capacity. To see where this might be true in downtown Vancouver, we used artificial intelligence techniques to estimate how long it would take drivers to park both on and off street for destinations throughout the city. For on-street parking, we developed (1) a deep-learning model of block-by-block parking availability based on data from parking meters and audits and (2) a computational simulation of drivers searching for an on-street spot. For off-street parking, we developed a computational simulation of the time it would take drivers to drive from their original destination to the nearest city-owned off-street lot and then to queue for a spot, based on traffic and lot occupancy data. Finally, in both cases we also computed the time it would take the driver to walk from their parking spot to their original destination. We compared these time estimates for destinations in each block of Vancouver's downtown core and each hour of the day. We found many areas where parking off street would actually save drivers time compared with searching the streets for a spot, and many more where the extra time cost of parking off street was small. Identifying such areas gives the city an opportunity to repurpose valuable curbside space for community-friendly uses more in line with its transportation goals.
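The final comparison step can be sketched as follows, assuming the per-block, per-hour time estimates have already been produced by the models and simulations described above. The field names, the 2-minute threshold for a "small" extra cost, and the sample numbers are hypothetical illustrations, not values from the study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BlockHour:
    """Hypothetical per-(block, hour) time estimates, all in minutes."""
    block: str
    hour: int
    search: float        # on-street: simulated cruising time to find a curb spot
    walk_on: float       # on-street: walk from the curb spot to the destination
    drive_to_lot: float  # off-street: drive from the destination to the nearest city-owned lot
    queue: float         # off-street: wait for a spot at the lot
    walk_off: float      # off-street: walk from the lot back to the destination

    def on_street(self) -> float:
        return self.search + self.walk_on

    def off_street(self) -> float:
        return self.drive_to_lot + self.queue + self.walk_off


def compare(rows: List[BlockHour], small: float = 2.0) -> Tuple[list, list]:
    """Split cells into those where off-street saves time and those where it
    costs at most `small` extra minutes (the threshold is an assumption)."""
    saves, cheap = [], []
    for r in rows:
        extra = r.off_street() - r.on_street()
        if extra <= 0:
            saves.append((r.block, r.hour))
        elif extra <= small:
            cheap.append((r.block, r.hour))
    return saves, cheap


# Made-up numbers, for illustration only.
rows = [
    BlockHour("A", 12, search=9.0, walk_on=2.0, drive_to_lot=3.0, queue=1.0, walk_off=4.0),
    BlockHour("B", 12, search=3.0, walk_on=1.0, drive_to_lot=3.0, queue=0.5, walk_off=2.0),
    BlockHour("C", 18, search=1.0, walk_on=1.0, drive_to_lot=4.0, queue=2.0, walk_off=5.0),
]
print(compare(rows))  # ([('A', 12)], [('B', 12)])
```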

preprint · 2015 · arXiv

Configurations of Extremal Type II Codes

We prove configuration results for extremal Type II codes, analogous to the configuration results of Ozeki and of the second author for extremal Type II lattices. Specifically, we show that for $n \in \{8, 24, 32, 48, 56, 72, 96\}$ every extremal Type II code of length $n$ is generated by its codewords of minimal weight. Where Ozeki and Kominers used spherical harmonics and weighted theta functions, we use discrete harmonic polynomials and harmonic weight enumerators. Along the way we introduce "$t\frac12$-designs" as a discrete analog of Venkov's spherical designs of the same name.
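As a small sanity check of the statement (not of the paper's proof technique), the following Python sketch brute-forces the length-8 case. Up to equivalence, the only Type II code of length 8 is the [8, 4, 4] extended Hamming code, written here as the first-order Reed-Muller code RM(1, 3); the extremal minimum weight at this length is $4\lfloor n/24 \rfloor + 4 = 4$. The script confirms that the code is doubly even and self-dual and that its minimal-weight codewords span it. The helper names are illustrative.

```python
from itertools import product

N = 8
# Generator matrix of RM(1, 3), i.e. the [8, 4, 4] extended Hamming code.
G = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]


def codewords(gen):
    """All GF(2) linear combinations of the generator rows."""
    return {
        tuple(sum(c * row[i] for c, row in zip(coeffs, gen)) % 2 for i in range(N))
        for coeffs in product([0, 1], repeat=len(gen))
    }


def gf2_rank(vectors):
    """Rank over GF(2) by Gaussian elimination on integer bitmasks."""
    pivots = {}  # leading-bit position -> basis vector with that leading bit
    for v in vectors:
        x = int("".join(map(str, v)), 2)
        while x:
            lead = x.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = x
                break
            x ^= pivots[lead]  # clears the leading bit, so x strictly decreases
    return len(pivots)


C = codewords(G)
min_wt = min(sum(c) for c in C if any(c))
minimal = [c for c in C if sum(c) == min_wt]
print(len(C), min_wt, len(minimal), gf2_rank(minimal))  # 16 4 14 4
```

A rank of 4 equal to the code's dimension means the 14 minimal-weight codewords generate the code, matching the $n = 8$ case of the theorem.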

preprint · 2011 · arXiv

Weighted Generating Functions for Type II Lattices and Codes

We give a new structural development of harmonic polynomials on Hamming space, and harmonic weight enumerators of binary linear codes, that parallels one approach to harmonic polynomials on Euclidean space and weighted theta functions of Euclidean lattices. Namely, we use the finite-dimensional representation theory of $\mathfrak{sl}_2$ to derive a decomposition theorem for the spaces of discrete homogeneous polynomials in terms of the spaces of discrete harmonic polynomials, and prove a generalized MacWilliams identity for harmonic weight enumerators. We then present several applications of harmonic weight enumerators, corresponding to some uses of weighted theta functions: an equivalent characterization of $t$-designs, the Assmus-Mattson Theorem in the case of extremal Type II codes, and configuration results for extremal Type II codes of lengths 8, 24, 32, 48, 56, 72, and 96.
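For reference, the classical MacWilliams identity, which the generalized identity extends, relates the weight enumerator $W_C(x, y) = \sum_{c \in C} x^{n - \mathrm{wt}(c)} y^{\mathrm{wt}(c)}$ of a binary linear code $C$ of length $n$ to that of its dual: $W_{C^\perp}(x, y) = \frac{1}{|C|} W_C(x + y, x - y)$. The Python sketch below checks this classical identity numerically for the [7, 4] Hamming code. It does not implement harmonic weight enumerators, and the helper names are illustrative.

```python
from itertools import product


def span(gen, n):
    """All GF(2) linear combinations of the generator rows."""
    return {
        tuple(sum(c * row[i] for c, row in zip(coeffs, gen)) % 2 for i in range(n))
        for coeffs in product([0, 1], repeat=len(gen))
    }


def dual(code, n):
    """Brute-force dual: every vector with even inner product against all codewords."""
    return {
        v for v in product([0, 1], repeat=n)
        if all(sum(a * b for a, b in zip(v, c)) % 2 == 0 for c in code)
    }


def weight_enumerator(code, n, x, y):
    """Evaluate W(x, y) = sum over codewords of x^(n - wt) * y^wt."""
    return sum(x ** (n - sum(c)) * y ** sum(c) for c in code)


n = 7
H = [  # generator matrix of the [7, 4, 3] Hamming code in systematic form
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
C = span(H, n)
D = dual(C, n)
# MacWilliams with the 1/|C| factor cleared, so the check uses exact integers.
for x, y in [(1, 1), (2, 1), (3, -2), (5, 7)]:
    assert len(C) * weight_enumerator(D, n, x, y) == weight_enumerator(C, n, x + y, x - y)
print(len(C), len(D))  # 16 8
```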
