Researcher profile

Sam Roweis

Sam Roweis contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified
Verification L1
Unclaimed author
3 works
0 followers
4 topics
4 close collaborators

Published work

3 published items

preprint · 2012 · arXiv

Collaborative Filtering and the Missing at Random Assumption

Rating prediction is an important application, and a popular research topic in collaborative filtering. However, both the validity of learning algorithms, and the validity of standard testing procedures rest on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an online radio service. An analysis of the rating data collected in the study shows that the sample of random ratings has markedly different properties than ratings of user-selected songs. When asked to report on their own rating behaviour, a large number of users indicate they believe their opinion of a song does affect whether they choose to rate that song, a violation of the MAR condition. Finally, we present experimental results showing that incorporating an explicit model of the missing data mechanism can lead to significant improvements in prediction performance on the random sample of ratings.

preprint · 2009 · arXiv

Astrometry.net: Blind astrometric calibration of arbitrary astronomical images

We have built a reliable and robust system that takes as input an astronomical image, and returns as output the pointing, scale, and orientation of that image (the astrometric calibration or WCS information). The system requires no first guess, and works with the information in the image pixels alone; that is, the problem is a generalization of the "lost in space" problem in which nothing--not even the image scale--is known. After robust source detection is performed in the input image, asterisms (sets of four or five stars) are geometrically hashed and compared to pre-indexed hashes to generate hypotheses about the astrometric calibration. A hypothesis is only accepted as true if it passes a Bayesian decision theory test against a background hypothesis. With indices built from the USNO-B Catalog and designed for uniformity of coverage and redundancy, the success rate is 99.9% for contemporary near-ultraviolet and visual imaging survey data, with no false positives. The failure rate is consistent with the incompleteness of the USNO-B Catalog; augmentation with indices built from the 2MASS Catalog brings the completeness to 100% with no false positives. We are using this system to generate consistent and standards-compliant meta-data for digital and digitized imaging from plate repositories, automated observatories, individual scientific investigators, and hobbyists. This is the first step in a program of making it possible to trust calibration meta-data for astronomical data of arbitrary provenance.

preprint · 2008 · arXiv

Cleaning the USNO-B Catalog through automatic detection of optical artifacts

The USNO-B Catalog contains spurious entries that are caused by diffraction spikes and circular reflection halos around bright stars in the original imaging data. These spurious entries appear in the Catalog as if they were real stars; they are confusing for some scientific tasks. The spurious entries can be identified by simple computer vision techniques because they produce repeatable patterns on the sky. Some techniques employed here are variants of the Hough transform, one of which is sensitive to (two-dimensional) overdensities of faint stars in thin right-angle cross patterns centered on bright ($<13$ mag) stars, and one of which is sensitive to thin annular overdensities centered on very bright ($<7$ mag) stars. After enforcing conservative statistical requirements on spurious-entry identifications, we find that of the 1,042,618,261 entries in the USNO-B Catalog, 24,148,382 of them (2.3 percent) are identified as spurious by diffraction-spike criteria and 196,133 (0.02 percent) are identified as spurious by reflection-halo criteria. The spurious entries are often detected in more than 2 bands and are not overwhelmingly outliers in any photometric properties; they therefore cannot be rejected easily on other grounds, i.e., without the use of computer vision techniques. We demonstrate our method, and return to the community in electronic form a table of spurious entries in the Catalog.