Source author record

Alexander M. Petersen

Alexander M. Petersen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

physics.soc-ph physics.data-an Digital Libraries physics.pop-ph q-fin.ST Computation and Language physics.geo-ph Social and Information Networks Applications cond-mat.stat-mech cs.CY econ.GN Information Retrieval nlin.AO q-fin.EC q-fin.RM q-fin.TR

Catalog footprint

What is connected

19 works

17 topics

4 close collaborators

preprint · 2022 · arXiv

Evolution of biomedical innovation quantified via billions of distinct article-level MeSH keyword combinations

We develop a systematic approach to measuring combinatorial innovation in the biomedical sciences based upon the comprehensive ontology of Medical Subject Headings (MeSH). This approach leverages an expert-defined knowledge ontology that features both breadth (27,875 MeSH terms analyzed across 25 million articles indexed by PubMed from 1902 onwards) and depth (we differentiate between Major and Minor MeSH terms to identify differences in the knowledge network representation constructed from primary research topics only). With this level of uniform resolution, we differentiate between three modes of innovation contributing to the combinatorial knowledge network: (i) conceptual innovation, associated with the emergence of new concepts and entities (measured as the entry of new MeSH terms); and (ii) recombinant innovation, associated with the emergence of new combinations, which itself consists of two types: peripheral (i.e., combinations involving new knowledge) and core (combinations composed of pre-existing knowledge only). Another question we seek to address is whether examining triplet and quartet combinations, in addition to the more traditional dyadic or pairwise combinations, provides evidence of any new phenomena associated with higher-order combinations. Analysis of the size, growth, and coverage of combinatorial innovation yields results that are largely independent of the combination order, suggesting that the common dyadic approach is sufficient to capture the essential phenomena. Our main results are twofold: (a) despite the persistent addition of new MeSH terms, the network is densifying over time, meaning that scholars are increasingly exploring and realizing the vast space of all knowledge combinations; and (b) conceptual innovation is increasingly concentrated within single research articles, a harbinger of the recent paradigm shift towards convergence science.
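To make the three innovation modes concrete, here is a minimal sketch that counts them over a stream of keyword sets. It assumes articles are processed in chronological order and that a keyword counts as "new" the first time it appears in any article. The paper's exact definitions (for example, new-by-year rather than new-by-article) may differ, and `classify_combinations` is a hypothetical helper, not the authors' code.

```python
from itertools import combinations


def classify_combinations(articles, order=2):
    """Count conceptual, peripheral and core innovation events.

    Hypothetical helper. `articles` is a list of keyword sets, assumed to be
    sorted by publication date.
      conceptual: keywords appearing for the first time
      peripheral: first-seen combinations containing at least one new keyword
      core:       first-seen combinations built only from previously seen keywords
    """
    seen_terms, seen_combos = set(), set()
    conceptual = peripheral = core = 0
    for keywords in articles:
        new_terms = set(keywords) - seen_terms
        conceptual += len(new_terms)
        # Sorting makes each combination a canonical tuple.
        for combo in combinations(sorted(keywords), order):
            if combo in seen_combos:
                continue
            seen_combos.add(combo)
            if new_terms.intersection(combo):
                peripheral += 1
            else:
                core += 1
        seen_terms |= set(keywords)
    return conceptual, peripheral, core
```

Passing `order=3` or `order=4` gives the triplet and quartet counts, which is the kind of comparison across combination orders that the abstract describes.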

preprint · 2020 · arXiv

Renormalizing individual performance metrics for cultural heritage management of sports records

Individual performance metrics are commonly used to compare players from different eras. However, such cross-era comparison is often biased due to significant changes in success factors underlying player achievement rates (e.g., performance-enhancing drugs and modern training regimens). Such historical comparison is more than fodder for casual discussion among sports fans, as it is also an issue of critical importance to the multi-billion-dollar professional sport industry and the institutions (e.g., Hall of Fame) charged with preserving sports history and the legacy of outstanding players and achievements. To address this cultural heritage management issue, we report an objective statistical method for renormalizing career achievement metrics, one that is particularly tailored for common seasonal performance metrics, which are often aggregated into summary career metrics -- despite the fact that many player careers span different eras. Remarkably, we find that the method applied to comprehensive Major League Baseball and National Basketball Association player data preserves the overall functional form of the distribution of career achievement, both at the season and career level. As such, subsequent re-ranking of the top-50 all-time records in MLB and the NBA using renormalized metrics indicates reordering at the local rank level, as opposed to bulk reordering by era. This local order refinement signals time-independent mechanisms underlying annual and career achievement in professional sports, meaning that appropriately renormalized achievement metrics can be used to compare players from eras with different season lengths, team strategies, rules -- and possibly even different sports.

preprint · 2014 · arXiv

A quantitative perspective on ethics in large team science

The gradual crowding out of singleton and small team science by large team endeavors is challenging key features of research culture. It is therefore important for the future of scientific practice to reflect upon the individual scientist's ethical responsibilities within teams. To facilitate this reflection we show labor force trends in the US revealing a skewed growth in academic ranks and increased levels of competition for promotion within the system; we analyze teaming trends across disciplines and national borders demonstrating why it is becoming difficult to distribute credit and to avoid conflicts of interest; and we use more than a century of Nobel Prize data to show how science is outgrowing its old institutions of singleton awards. Of particular concern within the large team environment is the weakening of the mentor-mentee relation, which undermines the cultivation of virtue ethics across scientific generations. These trends and emerging organizational complexities call for a universal set of behavioral norms that transcend team heterogeneity and hierarchy. To this end, our expository analysis provides a survey of ethical issues in team settings to inform science ethics education and science policy.

preprint · 2014 · arXiv

Inequality and cumulative advantage in science careers: a case study of high-impact journals

Analyzing a large data set of publications drawn from the most competitive journals in the natural and social sciences, we show that research careers exhibit the broad distributions of individual achievement characteristic of systems in which cumulative advantage plays a key role. While most researchers are personally aware of the competition implicit in the publication process, little is known about inequality at the level of individual researchers. We analyzed both productivity and impact measures for a large set of researchers publishing in high-impact journals. For each researcher cohort we calculated Gini inequality coefficients, with average Gini values around 0.48 for total publications and 0.73 for total citations. For perspective, these observed values are well in excess of the inequality levels observed for personal income in developing countries. Investigating possible sources of this inequality, we identify two potential mechanisms, acting at the level of the individual, that may play defining roles in the emergence of the broad productivity and impact distributions found in science. First, we show that the average time interval between a researcher's successive publications in top journals decreases with each subsequent publication. Second, after controlling for the time-dependent features of citation distributions, we compare the citation impact of subsequent publications within a researcher's publication record. We find that as researchers continue to publish in top journals, the relative citation impact of each subsequent publication is more likely to follow a decreasing trend. This pattern highlights the difficulty of repeatedly publishing high-impact research and the intriguing possibility that confirmation bias plays a role in the evaluation of scientific careers.
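The Gini values quoted above summarize how unevenly publications or citations are spread across a cohort. The sketch below uses one standard formula for nonnegative data, the rank-weighted form of the mean-absolute-difference definition. The abstract does not state which estimator the authors used, and `gini` is a hypothetical helper name.

```python
def gini(values):
    """Gini coefficient of nonnegative values (0 = perfect equality).

    Uses the rank-weighted form
        G = 2 * sum_i(i * x_(i)) / (n * sum(x)) - (n + 1) / n,
    where x_(i) are the values sorted ascending and i runs from 1 to n.
    This is equivalent to sum_ij |x_i - x_j| / (2 * n^2 * mean).
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

For example, a cohort in which one of four researchers holds every citation gives `gini([0, 0, 0, 1]) == 0.75`, and a perfectly even cohort gives 0.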

preprint · 2014 · arXiv

Reputation and Impact in Academic Careers

Reputation is an important social construct in science, which enables informed quality assessments of both publications and careers of scientists in the absence of complete systemic information. However, the relation between reputation and career growth of an individual remains poorly understood, despite the recent proliferation of quantitative research evaluation methods. Here we develop an original framework for measuring how a publication's citation rate $\Delta c$ depends on the reputation of its central author $i$, in addition to its net citation count $c$. To estimate the strength of the reputation effect, we perform a longitudinal analysis on the careers of 450 highly cited scientists, using the total citations $C_{i}$ of each scientist as his/her reputation measure. We find a citation crossover $c_{\times}$ which distinguishes the strength of the reputation effect. For publications with $c < c_{\times}$, the author's reputation is found to dominate the annual citation rate. Hence, a new publication may gain a significant early advantage corresponding to roughly a 66% increase in the citation rate for each tenfold increase in $C_{i}$. However, the reputation effect becomes negligible for highly cited publications, meaning that for $c \geq c_{\times}$ the citation rate measures scientific impact more transparently. In addition, we have developed a stochastic reputation model, which is found to reproduce numerous statistical observations for real careers, thus providing insight into the microscopic mechanisms underlying cumulative advantage in science.
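The "66% per tenfold increase" figure can be restated as a scaling exponent. Suppose, purely as an assumption for illustration, that below the crossover the citation rate scales as a power of reputation, $\Delta c \propto C_{i}^{\alpha_{R}}$. The symbol $\alpha_{R}$ is introduced here and does not appear in the abstract. Then:

```latex
\frac{\Delta c(10\,C_{i})}{\Delta c(C_{i})} = 10^{\alpha_{R}} \approx 1.66
\quad\Longrightarrow\quad
\alpha_{R} \approx \log_{10} 1.66 \approx 0.22
```

Under this reading, doubling an author's total citations would raise the early citation rate by about $2^{0.22} \approx 1.16$, i.e. roughly 16%.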

preprint · 2013 · arXiv

Is Europe Evolving Toward an Integrated Research Area?

An integrated European Research Area (ERA) is a critical component for a more competitive and open European R&D system. However, the impact of EU-specific integration policies aimed at overcoming innovation barriers associated with national borders is not well understood. Here we analyze $2.4 \times 10^{6}$ patent applications filed with the European Patent Office (EPO) over the 25-year period 1986-2010 along with a sample of $2.6 \times 10^{5}$ records from the ISI Web of Science to quantitatively measure the role of borders in international R&D collaboration and mobility. From these data we construct five different networks for each year analyzed: (i) the patent co-inventor network, (ii) the publication co-author network, (iii) the co-applicant patent network, (iv) the patent citation network, and (v) the patent mobility network. We use methods from network science and econometrics to perform a comparative analysis across time and between EU and non-EU countries to determine the "treatment effect" resulting from EU integration policies. Using non-EU countries as a control set, we provide quantitative evidence that, despite decades of efforts to build a European Research Area, there has been little integration above global trends in patenting and publication. This analysis provides concrete evidence that Europe remains a collection of national innovation systems.

preprint · 2013 · arXiv

On the Predictability of Future Impact in Science

Correctly assessing a scientist's past research impact and potential for future impact is key in recruitment decisions and other evaluation processes. While a candidate's future impact is the main concern for these decisions, most measures only quantify the impact of previous work. Recently, it has been argued that linear regression models are capable of predicting a scientist's future impact. By applying that future impact model to 762 careers drawn from three disciplines (physics, biology, and mathematics), we identify a number of subtle but critical flaws in current models. Specifically, cumulative non-decreasing measures like the h-index contain intrinsic autocorrelation, resulting in significant overestimation of their "predictive power". Moreover, the predictive power of these models depends heavily upon scientists' career age, producing the least accurate estimates for young researchers. Our results place in doubt the suitability of such models, and indicate that further investigation is required before they can be used in recruiting decisions.

preprint · 2013 · arXiv

The Z-index: A geometric representation of productivity and impact which accounts for information in the entire rank-citation profile

We present a simple generalization of Hirsch's h-index, $Z = \sqrt{h^{2}+C}/\sqrt{5}$, where $C$ is the total number of citations. $Z$ is aimed at correcting the potentially excessive penalty that $h$ imposes on a scientist's highly cited papers: for the majority of scientists analyzed, we find the excess citation fraction $(C-h^{2})/C$ to be distributed closely around 0.75, meaning that 75 percent of the author's impact is neglected by $h$. Additionally, $Z$ is less sensitive to local changes in a scientist's citation profile, namely perturbations which increase $h$ while only marginally affecting $C$. Using real career data for 476 physicist careers and 488 biologist careers, we analyze both the distribution of $Z$ and the rank stability of $Z$ with respect to the Hirsch index $h$ and the Egghe index $g$. We analyze careers distributed across a wide range of total impact, including top-cited physicists and biologists for benchmark comparison. In practice, the Z-index requires the same information needed to calculate $h$ and could be effortlessly incorporated within career profile databases, such as Google Scholar and ResearcherID. Because $Z$ incorporates information from the entire publication profile while being more robust than $h$ and $g$ to local perturbations, we argue that $Z$ is better suited for ranking comparisons in academic decision-making scenarios comprising a large number of scientists.
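Because $Z$ needs only the citation list, it is easy to compute alongside $h$. The sketch below implements the formula stated above together with the excess citation fraction. The helper names are hypothetical. The $\sqrt{5}$ normalization has a simple reading: when the excess fraction equals 0.75, $C = 4h^{2}$ and $Z$ reduces exactly to $h$.

```python
import math


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break  # sorted descending, so no later paper can qualify
    return h


def z_index(citations):
    """Z = sqrt(h^2 + C) / sqrt(5), with C the total citation count."""
    h = h_index(citations)
    return math.sqrt(h * h + sum(citations)) / math.sqrt(5)


def excess_fraction(citations):
    """(C - h^2) / C: share of citations lying outside the h-by-h square."""
    h, total = h_index(citations), sum(citations)
    return (total - h * h) / total if total else 0.0
```

For the profile `[10, 8, 5, 4, 3]`, $h = 4$ and $C = 30$, so $Z = \sqrt{46/5} \approx 3.03$ and the excess fraction is $14/30 \approx 0.47$.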

preprint · 2012 · arXiv

Languages cool as they expand: Allometric scaling and the decreasing need for new words

We analyze the occurrence frequencies of over 15 million words recorded in millions of books published during the past two centuries in seven different languages. For all languages and chronological subsets of the data we confirm that two scaling regimes characterize the word frequency distributions, with only the more common words obeying the classic Zipf law. Using corpora of unprecedented size, we test the allometric scaling relation between the corpus size and the vocabulary size of growing languages to demonstrate a decreasing marginal need for new words, a feature that is likely related to the underlying correlations between words. We calculate the annual growth fluctuations of word use, which show a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This "cooling pattern" forms the basis of a third statistical regularity, which, unlike the Zipf and Heaps laws, is dynamical in nature.
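An allometric relation between vocabulary size $V$ and corpus size $N$ has the form $V \propto N^{b}$. The marginal need for new words, $dV/dN = bV/N$, therefore falls as the corpus grows whenever $b < 1$. A minimal way to estimate $b$ is ordinary least squares on log-transformed data. The sketch assumes that estimator; the paper may fit differently, and `loglog_slope` is a hypothetical helper.

```python
import math


def loglog_slope(xs, ys):
    """OLS slope of log(y) against log(x): the exponent b in y ~ x^b.

    All inputs must be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var


# Hypothetical usage: corpus sizes N (word tokens) and vocabulary sizes V (distinct words).
corpus_sizes = [10 ** k for k in range(3, 9)]
vocab_sizes = [5 * n ** 0.6 for n in corpus_sizes]  # synthetic data with b = 0.6
b = loglog_slope(corpus_sizes, vocab_sizes)
```

A fitted $b$ below 1 on real yearly corpora is what "a decreasing marginal need for new words" means quantitatively.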

preprint · 2012 · arXiv

Persistence and Uncertainty in the Academic Career

Understanding how institutional changes within academia may affect the overall potential of science requires a better quantitative representation of how careers evolve over time. Since knowledge spillovers, cumulative advantage, competition, and collaboration are distinctive features of the academic profession, both the employment relationship and the procedures for assigning recognition and allocating funding should be designed to account for these factors. We study the annual production $n_{i}(t)$ of a given scientist $i$ by analyzing longitudinal career data for 200 leading scientists and 100 assistant professors from the physics community. We compare our results with 21,156 sports careers. Our empirical analysis of individual productivity dynamics shows that (i) there are increasing returns for the top individuals within the competitive cohort, and that (ii) the distribution of production growth is a leptokurtic "tent-shaped" distribution that is remarkably symmetric. Our methodology is general, and we speculate that similar features appear in other disciplines where academic publication is essential and collaboration is a key feature. We introduce a model of proportional growth which reproduces these two observations, and additionally accounts for the significantly right-skewed distributions of career longevity and achievement in science. Using this theoretical model, we show that short-term contracts can amplify the effects of competition and uncertainty, making careers more vulnerable to early termination, not necessarily because of a lack of individual talent and persistence, but because of random negative production shocks. We show that fluctuations in scientific production are quantitatively related to a scientist's collaboration radius and team efficiency.
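"Leptokurtic" here means the growth-rate distribution has heavier tails and a sharper peak than a Gaussian. A Laplace ("tent-shaped") distribution has excess kurtosis 3, and a Gaussian has 0. The sketch below computes log growth rates from a production series and their excess kurtosis. It assumes growth is measured as $\ln n(t+1) - \ln n(t)$ and simply skips years with zero output. The abstract does not specify either choice, and both helper names are hypothetical.

```python
import math


def growth_rates(production):
    """Log growth rates r(t) = ln n(t+1) - ln n(t).

    Pairs involving a year with zero output are skipped (an assumption).
    """
    return [math.log(b / a) for a, b in zip(production, production[1:]) if a > 0 and b > 0]


def excess_kurtosis(xs):
    """Excess kurtosis m4 / m2^2 - 3 from population moments; > 0 means leptokurtic."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3
```

Pooling `growth_rates` across many careers and checking that `excess_kurtosis` is clearly positive reproduces, in spirit, observation (ii) above.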

preprint · 2012 · arXiv

Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death

We analyze the dynamic properties of $10^{7}$ words recorded in English, Spanish and Hebrew over the period 1800–2008 in order to gain insight into the coevolution of language and culture. We report language-independent patterns useful as benchmarks for theoretical models of language evolution. A significantly decreasing (increasing) trend in the birth (death) rate of words indicates a recent shift in the selection laws governing word use. For new words, we observe a peak in the growth-rate fluctuations around 40 years after introduction, consistent with the typical entry time into standard dictionaries and the human generational timescale. Pronounced changes in the dynamics of language during periods of war show that word correlations, occurring across time and between words, are largely influenced by coevolutionary social, technological, and political factors. We quantify cultural memory by analyzing the long-term correlations in the use of individual words using detrended fluctuation analysis.
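The abstract names detrended fluctuation analysis but not its settings. Below is a generic first-order DFA (DFA-1) sketch in pure Python. The linear detrending, the box sizes and the `dfa_exponent` name are assumptions for illustration, not the paper's configuration. An exponent near 0.5 indicates no long-term memory, and values above 0.5 indicate persistent correlations.

```python
import math
import random


def dfa_exponent(series, scales):
    """First-order detrended fluctuation analysis (DFA-1).

    1. Profile: cumulative sum of deviations from the series mean.
    2. For each box size n, split the profile into non-overlapping boxes,
       subtract a least-squares line in each box, and compute the RMS residual F(n).
    3. Return the OLS slope of log F(n) versus log n (the DFA exponent alpha).
    """
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for x in series:
        running += x - mean
        profile.append(running)

    log_n, log_f = [], []
    for n in scales:
        t_mean = (n - 1) / 2
        t_var = sum((t - t_mean) ** 2 for t in range(n))
        sq_sum, count = 0.0, 0
        for start in range(0, len(profile) - n + 1, n):
            box = profile[start:start + n]
            y_mean = sum(box) / n
            slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(box)) / t_var
            for t, y in enumerate(box):
                resid = y - (y_mean + slope * (t - t_mean))
                sq_sum += resid * resid
                count += 1
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(sq_sum / count))  # log of the RMS fluctuation

    mx, my = sum(log_n) / len(log_n), sum(log_f) / len(log_f)
    cov = sum((a - mx) * (b - my) for a, b in zip(log_n, log_f))
    return cov / sum((a - mx) ** 2 for a in log_n)


# Usage on uncorrelated noise: alpha should come out close to 0.5.
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(4096)]
alpha = dfa_exponent(noise, [16, 32, 64, 128, 256])
```

Applied to a word's yearly usage series (for example, its annual growth rates), an $\alpha$ clearly above 0.5 would signal the long-term "cultural memory" the abstract refers to.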

preprint · 2011 · arXiv

Detrending career statistics in professional baseball: Accounting for the steroids era and beyond

There is a long-standing debate over how to objectively compare the career achievements of professional athletes from different historical eras. Developing an objective approach will be of particular importance over the next decade as Major League Baseball (MLB) players from the "steroids era" become eligible for Hall of Fame induction. Here we address this issue, as well as the general problem of comparing statistics from distinct eras, by detrending the seasonal statistics of professional baseball players. We detrend player statistics by normalizing achievements to seasonal averages, which accounts for changes in relative player ability resulting from both exogenous and endogenous factors, such as talent dilution from expansion, equipment and training improvements, as well as performance-enhancing drugs (PEDs). In this paper we compare the probability density function (pdf) of detrended career statistics to the pdf of raw career statistics for five statistical categories -- hits (H), home runs (HR), runs batted in (RBI), wins (W) and strikeouts (K) -- over the 90-year period 1920-2009. We find that the functional form of these pdfs is stationary under detrending. This stationarity implies that the statistical regularity observed in the right-skewed distributions for longevity and success in professional baseball arises from both the wide range of intrinsic talent among athletes and the underlying nature of competition. Using this simple detrending technique, we examine the top 50 all-time careers for H, HR, RBI, W and K. We fit the pdfs for career success by the Gamma distribution in order to calculate objective benchmarks based on extreme statistics which can be used for the identification of extraordinary careers.
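A minimal sketch of "normalizing achievements to seasonal averages". It assumes that league-wide prowess in a season is total successes divided by total opportunities (for example, home runs per at-bat), and that each player's seasonal total is rescaled so that every season shares a common baseline prowess. The helper names, the opportunity measure and the choice of baseline are all assumptions, not details taken from the paper.

```python
def league_prowess(season_totals):
    """League-average prowess for one season.

    season_totals: list of (successes, opportunities) per player,
    e.g. (home runs, at-bats). Returns total successes / total opportunities.
    """
    successes = sum(s for s, _ in season_totals)
    opportunities = sum(o for _, o in season_totals)
    return successes / opportunities


def detrend(stat, season_prowess, baseline_prowess):
    """Rescale a season statistic as if league prowess had equalled the baseline."""
    return stat * baseline_prowess / season_prowess
```

For example, in a hypothetical season where league prowess is 0.05 HR per at-bat, twice a baseline of 0.025, a 40-HR season detrends to 20. Career totals would then be sums of detrended seasons.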

preprint · 2011 · arXiv

Quantitative and empirical demonstration of the Matthew effect in a study of career longevity

The Matthew effect refers to the adage written some two thousand years ago in the Gospel of St. Matthew: "For to all those who have, more will be given." Even two millennia later, this idiom is used by sociologists to qualitatively describe the dynamics of individual progress and the interplay between status and reward. Quantitative studies of professional careers are traditionally limited by the difficulty in measuring progress and the lack of data on individual careers. However, in some professions, there are well-defined metrics that quantify career longevity, success, and prowess, which together contribute to the overall success rating for an individual employee. Here we present testable evidence of the age-old Matthew "rich get richer" effect, wherein the longevity and past success of an individual lead to a cumulative advantage in further developing his/her career. We develop an exactly solvable stochastic career progress model that quantitatively incorporates the Matthew effect, and validate our model predictions for several competitive professions. We test our model on the careers of 400,000 scientists using data from six high-impact journals, and further confirm our findings by testing the model on the careers of more than 20,000 athletes in four sports leagues. Our model highlights the importance of early career development, showing that many careers are stunted by the relative disadvantage associated with inexperience.

preprint · 2011 · arXiv

Statistical regularities in the rank-citation profile of scientists

Recent "science of science" research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate scientific production and impact of individual careers using the rank-citation profile $c_{i}(r)$ of 200 distinguished professors and 100 assistant professors. For the entire range of paper rank $r$, we fit each $c_{i}(r)$ to a common distribution function that is parameterized by two scaling exponents. Since two scientists with equivalent Hirsch h-index can have significantly different $c_{i}(r)$ profiles, our results demonstrate the utility of the $\beta_{i}$ scaling parameter in conjunction with $h_{i}$ for quantifying individual publication impact. We show that the total number of citations $C_{i}$ tallied from a scientist's $N_{i}$ papers scales as $C_{i} \sim h_{i}^{1+\beta_{i}}$. Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.
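The abstract does not name the two-exponent distribution. One common two-exponent form for rank-ordered data is the discrete generalized beta distribution, $c(r) = A\,r^{-\beta}(N+1-r)^{\gamma}$. Assuming that form, it becomes linear after taking logs and can be fitted by two-regressor least squares. This is a sketch under that assumption, not the authors' fitting procedure, and `fit_rank_citation_profile` is a hypothetical name.

```python
import math


def fit_rank_citation_profile(citations):
    """Fit log c(r) = log A - beta*log r + gamma*log(N + 1 - r) by least squares.

    Assumes the discrete generalized beta form. Papers with zero citations are
    dropped because log(0) is undefined. Returns (A, beta, gamma).
    """
    ranked = sorted(citations, reverse=True)
    n_papers = len(ranked)
    rows = [(math.log(r), math.log(n_papers + 1 - r), math.log(c))
            for r, c in enumerate(ranked, start=1) if c > 0]
    m = len(rows)
    mu = [sum(row[k] for row in rows) / m for k in range(3)]
    # Centered sums of squares and cross-products for regressors x1, x2 and response y.
    s11 = sum((x1 - mu[0]) ** 2 for x1, _, _ in rows)
    s22 = sum((x2 - mu[1]) ** 2 for _, x2, _ in rows)
    s12 = sum((x1 - mu[0]) * (x2 - mu[1]) for x1, x2, _ in rows)
    s1y = sum((x1 - mu[0]) * (y - mu[2]) for x1, _, y in rows)
    s2y = sum((x2 - mu[1]) * (y - mu[2]) for _, x2, y in rows)
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det  # coefficient on log r (equals -beta)
    b2 = (s2y * s11 - s1y * s12) / det  # coefficient on log(N + 1 - r) (equals gamma)
    log_a = mu[2] - b1 * mu[0] - b2 * mu[1]
    return math.exp(log_a), -b1, b2
```

The fitted $\beta$ complements $h$ by describing how steeply citations fall off among a scientist's top-ranked papers.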

preprint · 2010 · arXiv

Bankruptcy risk model and empirical tests

We analyze the size dependence and temporal stability of firm bankruptcy risk in the US economy by applying Zipf scaling techniques. We focus on a single risk factor, the debt-to-asset ratio $R$, in order to study the stability of the Zipf distribution of $R$ over time. We find that the Zipf exponent increases during market crashes, implying that firms go bankrupt with larger values of $R$. Based on the Zipf analysis, we employ Bayes's theorem and relate the conditional probability that a bankrupt firm has a ratio $R$ with the conditional probability of bankruptcy for a firm with a given $R$ value. For 2,737 bankrupt firms, we demonstrate size dependence in asset changes during the bankruptcy proceedings. Prepetition firm assets and petition firm assets follow Zipf distributions but with different exponents, meaning that firms with smaller assets adjust their assets more than firms with larger assets during the bankruptcy process. We compare bankrupt firms with nonbankrupt firms by analyzing the assets and liabilities of two large subsets of the US economy: 2,545 Nasdaq members and 1,680 New York Stock Exchange (NYSE) members. We find that both assets and liabilities follow a Pareto distribution. The finding is not a trivial consequence of the Zipf scaling relationship of firm size quantified by employees: although the market capitalization of Nasdaq stocks follows a Pareto distribution, the same distribution does not describe NYSE stocks. We propose a coupled Simon model that simultaneously evolves both assets and debt with the possibility of bankruptcy, and we also consider the possibility of firm mergers.
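The Bayes step mentioned above connects the two conditional probabilities through standard Bayes' theorem. Here $\mathcal{B}$ denotes the event that a firm goes bankrupt, $P(R)$ is the distribution of the debt-to-asset ratio over all firms, and $P(\mathcal{B})$ is the unconditional bankruptcy rate. The notation is introduced here for illustration; the paper's Zipf-based estimates of each factor are not reproduced.

```latex
P(\mathcal{B} \mid R) = \frac{P(R \mid \mathcal{B})\, P(\mathcal{B})}{P(R)}
```

In words: the risk of bankruptcy at a given ratio equals how common that ratio is among bankrupt firms, weighted by the base rate, relative to how common it is among all firms.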

preprint · 2010 · arXiv

Cross-correlations between volume change and price change

In finance, one usually deals not with prices but with growth rates $R$, defined as the difference in logarithm between two consecutive prices. Here we consider not the trading volume, but rather the volume growth rate $\tilde{R}$, the difference in logarithm between two consecutive values of trading volume. To this end, we use several methods to analyze the properties of volume changes $|\tilde{R}|$, and their relationship to price changes $|R|$. We analyze 14,981 daily recordings of the S&P 500 index over the 59-year period 1950–2009, and find power-law cross-correlations between $|R|$ and $|\tilde{R}|$ using detrended cross-correlation analysis (DCCA). We introduce a joint stochastic process that models these cross-correlations. Motivated by the relationship between $|R|$ and $|\tilde{R}|$, we estimate the tail exponent $\tilde{\alpha}$ of the probability density function $P(|\tilde{R}|) \sim |\tilde{R}|^{-1-\tilde{\alpha}}$ for both the S&P 500 index as well as the collection of 1819 constituents of the New York Stock Exchange Composite index on 17 July 2009. As a new method to estimate $\tilde{\alpha}$, we calculate the time intervals $\tau_q$ between events where $\tilde{R} > q$. We demonstrate that $\bar{\tau}_q$, the average of $\tau_q$, obeys $\bar{\tau}_q \sim q^{\tilde{\alpha}}$. We find $\tilde{\alpha} \approx 3$. Furthermore, by aggregating all $\tau_q$ values of 28 global financial indices, we also observe an approximate inverse cubic law.
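The inter-event method works as follows: compute log growth rates, mark the time steps where the rate exceeds a threshold $q$, and average the gaps between those steps. The sketch below implements that bookkeeping. The helper names are hypothetical, time is measured in samples (e.g. trading days), and the `None` return for fewer than two events is a design choice of this sketch.

```python
import math


def log_growth_rates(values):
    """Growth rates ln x(t) - ln x(t-1) for a series of positive values."""
    return [math.log(b) - math.log(a) for a, b in zip(values, values[1:])]


def mean_interval_above(rates, q):
    """Average number of samples between successive events with rate > q.

    Returns None when fewer than two events exceed the threshold.
    """
    event_times = [t for t, r in enumerate(rates) if r > q]
    if len(event_times) < 2:
        return None
    gaps = [b - a for a, b in zip(event_times, event_times[1:])]
    return sum(gaps) / len(gaps)
```

Evaluating `mean_interval_above` over a range of thresholds and fitting the slope of $\log \bar{\tau}_q$ against $\log q$ gives an estimate of $\tilde{\alpha}$, following the relation $\bar{\tau}_q \sim q^{\tilde{\alpha}}$ stated above. Passing absolute values gives the $|\tilde{R}|$ version.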

preprint · 2010 · arXiv

Market dynamics immediately before and after financial shocks: quantifying the Omori, productivity and Bath laws

We study the cascading dynamics immediately before and immediately after 219 market shocks. We define the time of a market shock $T_{c}$ to be the time for which the market volatility $V(T_{c})$ has a peak that exceeds a predetermined threshold. The cascade of high-volatility "aftershocks" triggered by the "main shock" is quantitatively similar to earthquakes and solar flares, which have been described by three empirical laws: the Omori law, the productivity law, and the Bath law. We analyze the 531 most-traded stocks in U.S. markets during the two-year period 2001-2002 at 1-minute time resolution. We find quantitative relations between (i) the "main shock" magnitude $M \equiv \log V(T_{c})$ occurring at the time $T_{c}$ of each of the 219 "volatility quakes" analyzed, and (ii) the parameters quantifying the decay of volatility aftershocks as well as the volatility preshocks. We also find that stocks with larger trading activity react more strongly and more quickly to market shocks than stocks with smaller trading activity. Our findings characterize the typical volatility response conditional on $M$, both at the market and the individual stock scale. We argue that these three quantitative statistical relations have potential utility in option pricing and volatility trading.

preprint · 2010 · arXiv

Methods for measuring the citations and productivity of scientists across time and discipline

Publication statistics are ubiquitous in the ratings of scientific achievement, with citation counts and paper tallies factoring into an individual's consideration for postdoctoral positions, junior faculty, tenure, and even visa status for international scientists. Citation statistics are designed to quantify individual career achievement, both at the level of a single publication, and over an individual's entire career. While some academic careers are defined by a few significant papers (possibly out of many), other academic careers are defined by the cumulative contribution made by the author's publications to the body of science. Several metrics have been formulated to quantify an individual's publication career, yet none of these metrics account for the dependence of citation counts and journal size on time. In this paper, we normalize publication metrics across both time and discipline in order to achieve a universal framework for analyzing and comparing scientific achievement. We study the publication careers of individual authors over the 50-year period 1958-2008 within six high-impact journals: CELL, the New England Journal of Medicine (NEJM), Nature, the Proceedings of the National Academy of Sciences (PNAS), Physical Review Letters (PRL), and Science. In comparing the achievement of authors within each journal, we uncover quantifiable statistical regularity in the probability density function (pdf) of scientific achievement across both time and discipline. The universal distribution of career success within these publication arenas raises the possibility that a fundamental driving force underlying scientific achievement is the competitive nature of scientific advancement.

preprint · 2010 · arXiv

Quantitative law describing market dynamics before and after interest-rate change

We study the behavior of U.S. markets both before and after U.S. Federal Open Market Committee (FOMC) meetings, and show that the announcement of a U.S. Federal Reserve rate change causes a financial shock, where the dynamics after the announcement are described by an analogue of the Omori earthquake law. We quantify the rate $n(t)$ of aftershocks following an interest rate change at time $T$, and find power-law decay which scales as $n(t-T) \sim (t-T)^{-\Omega}$, with $\Omega > 0$. Surprisingly, we find that the same law describes the rate $n'(|t-T|)$ of "pre-shocks" before the interest rate change at time $T$. This is the first study to quantitatively relate the size of the market response to the news which caused the shock and to uncover the presence of quantifiable preshocks. We demonstrate that the news associated with interest rate change is responsible for causing both the anticipation before the announcement and the surprise after the announcement. We estimate the magnitude of financial news using the relative difference between the U.S. Treasury bill rate and the Federal Funds effective rate. Our results are consistent with the "sign effect," in which "bad news" has a larger impact than "good news." Furthermore, we observe significant volatility aftershocks, confirming a "market underreaction" that lasts at least 1 trading day.
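The Omori-type law $n(t-T) \sim (t-T)^{-\Omega}$ implies a straight line of slope $-\Omega$ on log-log axes. A minimal estimator is therefore least squares on $\log n$ versus $\log(t-T)$. This is an assumption for illustration: the abstract does not say how $\Omega$ was fitted, and `omori_exponent` is a hypothetical helper. The same function applies to pre-shocks if the lags are given as $|t-T|$.

```python
import math


def omori_exponent(lags, rates):
    """Estimate Omega in n(t - T) ~ (t - T)^(-Omega) via OLS on log-log data.

    lags:  positive time offsets from the shock, t - T (or |t - T| for pre-shocks)
    rates: event rate n at each lag; all values must be > 0
    """
    lx = [math.log(t) for t in lags]
    ly = [math.log(n) for n in rates]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return -slope  # the decay law has slope -Omega
```

In practice, aftershock rates are noisy and often binned logarithmically in time before fitting. This sketch leaves binning to the caller.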
