Researcher profile

Alex Rubinsteyn

Alex Rubinsteyn contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
1topics
3close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2013arXiv

How fast can we make interpreted Python?

Python is a popular dynamic language with a large part of its appeal coming from powerful libraries and extension modules. These augment the language and make it a productive environment for a wide variety of tasks, ranging from web development (Django) to numerical analysis (NumPy). Unfortunately, Python's performance is quite poor when compared to modern implementations of languages such as Lua and JavaScript. Why does Python lag so far behind these other languages? As we show, the very same API and extension libraries that make Python a powerful language also make it very difficult to efficiently execute. Given that we want to retain access to the great extension libraries that already exist for Python, how fast can we make it? To evaluate this, we designed and implemented Falcon, a high-performance bytecode interpreter fully compatible with the standard CPython interpreter. Falcon applies a number of well known optimizations and introduces several new techniques to speed up execution of Python bytecode. In our evaluation, we found Falcon an average of 25% faster than the standard Python interpreter on most benchmarks and in some cases about 2.5X faster.

preprint2013arXiv

Locality Optimization for Data Parallel Programs

Productivity languages such as NumPy and Matlab make it much easier to implement data-intensive numerical algorithms. However, these languages can be intolerably slow for programs that don't map well to their built-in primitives. In this paper, we discuss locality optimizations for our system Parakeet, a just-in-time compiler and runtime system for an array-oriented subset of Python. Parakeet dynamically compiles whole user functions to high performance multi-threaded native code. Parakeet makes extensive use of the classic data parallel operators Map, Reduce, and Scan. We introduce a new set of data parallel operators,TiledMap, TiledReduce, and TiledScan, that break up their computations into local pieces of bounded size so as better to make use of small fast memories. We introduce a novel tiling transformation to generate tiled operators automatically. Applying this transformation once tiles the program for cache, and applying it again enables tiling for registers. The sizes for cache tiles are left unspecified until runtime, when an autotuning search is performed. Finally, we evaluate our optimizations on benchmarks and show significant speedups on programs that exhibit data locality.