A. M. Petersen, F. Arroyave, F. Pammolli
The disruption index is biased by citation inflation (pdf)    (short version)
A recent analysis of scientific publication and patent citation networks by Park et al. (Nature, 2023) suggests that publications and patents are becoming less disruptive over time. Here we show that the reported decrease in disruptiveness is an artifact of systematic shifts in the structure of citation networks unrelated to innovation system capacity. Instead, the decline is attributable to ‘citation inflation’, an unavoidable characteristic of real citation networks that manifests as a systematic time-dependent bias and renders cross-temporal analysis challenging. One driver of citation inflation is the ever-increasing lengths of reference lists over time, which in turn increases the density of links in citation networks, and causes the disruption index to converge to 0. The impact of this systematic bias further stymies efforts to correlate disruption to other measures that are also time-dependent, such as team size and citation counts. In order to demonstrate this fundamental measurement problem, we present three complementary lines of critique (deductive, empirical and computational modeling), and also make available an ensemble of synthetic citation networks that can be used to test alternative citation-based measures for systematic bias.
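As a rough illustration of the density argument above, the sketch below computes a disruption-style index for one focal paper in a toy citation network. It assumes the commonly used definition D = (n_i - n_j) / (n_i + n_j + n_k), which may differ in detail from the variant analyzed in the paper. The function name, the data layout, and the toy networks are hypothetical.

```python
def disruption_index(focal, references, citers_of):
    """Disruption-style index for one focal paper (illustrative sketch).

    focal: id of the focal paper
    references: set of paper ids cited by the focal paper
    citers_of: dict mapping a paper id to the set of papers citing it
    """
    cites_focal = citers_of.get(focal, set())
    cites_refs = set()
    for ref in references:
        cites_refs |= citers_of.get(ref, set())
    cites_refs.discard(focal)  # the focal paper is not one of its own successors

    n_i = len(cites_focal - cites_refs)  # cite the focal paper only
    n_j = len(cites_focal & cites_refs)  # cite the focal paper and its references
    n_k = len(cites_refs - cites_focal)  # cite the references only
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0


# Hypothetical toy networks: the dense one adds citations to the references only.
sparse = {"F": {"A", "B", "C"}, "R1": {"F", "C", "D"}, "R2": {"F"}}
dense = {"F": {"A", "B", "C"}, "R1": {"F", "C", "D"}, "R2": {"F", "E", "G", "H", "I"}}

d_sparse = disruption_index("F", {"R1", "R2"}, sparse)  # (2 - 1) / 4
d_dense = disruption_index("F", {"R1", "R2"}, dense)    # (2 - 1) / 8
```

In this toy case the focal paper's own citations are unchanged, yet the index halves because extra links to its references enlarge the denominator. This is only one channel through which a denser network can pull the index toward 0.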
F. Arroyave, J. Jenkins, S. Shackelton, A. M. Petersen
Research alignment in the U.S. National Park Service: Impact of transformative science policy on the supply of scientific knowledge for protected area management (pdf)
The US National Park System includes 63 national parks encompassing diverse environmental and tourism management regimes, together governed by the 1916 Organic Act and its dual mandate of conservation and provision of public enjoyment. However, with the introduction of transformative science policy mandates concentrated around the year 2000 (e.g., National Parks Omnibus Management Act; Natural Resources Challenge), the mission scope has since expanded to promote overarching science-based objectives, thereby fostering knowledge generation critical to park management, as well as promoting the “pristine” territory for “wild science” – as protected areas represent valuable counterfactuals to anthropogenic biomes. To this end, individual US national parks formally explicate itemized “need statements” representing targeted calls for science-based problem solving. Yet despite the paradigm shift instituting “science for parks, parks for science”, there is scant research exploring the impact of science policy on research alignment (i.e., supply-demand) in national parks. We address this gap by leveraging the clearly delineated and well-ordered attributes of the US national parks to develop a spatiotemporal framework for evaluating knowledge alignment, here operationalized via quantifiable measures of supply and demand for scientific knowledge. More specifically, we apply a machine learning algorithm (Latent Dirichlet Allocation) to a comprehensive park-specific text corpus (combining official needs statements and scientific research metadata) in order to define a joint topic space, which thereby facilitates quantifying the direction and degree of knowledge alignment at both the parks and systems levels.
Additionally, we grouped topics into two categories — normative (overarching) and non-normative (idiosyncratic) — to facilitate assessing their differential response to the transformative science policy characterized as addressing normative issues such as air quality and wilderness. Results indicate an overall robust degree of knowledge alignment, with misaligned topics tending to be over-researched (as opposed to over-demanded), which may be favorable to many parks, but is inefficient from the park system perspective. Results further indicate that the transformative science policy exacerbated the over-supply of research in normative knowledge domains, manifesting in higher levels of misalignment. In light of these results, we argue for improved decision support mechanisms to achieve more timely alignment of research efforts towards distinctive park needs, thereby fostering convergent knowledge co-production and leveraging the full value of national parks as living laboratories.
F. Arroyave, J. Jenkins, A. M. Petersen
Network embedding for understanding the National Park System through the lenses of news media, scientific communication and biogeography (pdf)
The U.S. national parks encompass a variety of biophysical and historical resources important for national cultural heritage. Yet how these resources are socially constructed often depends upon the beholder. Parks tend to be conceptualized according to their (fixed) geographic context, so our understanding of this system of systems is dominated by this geographic lens. To expose the systemic structure that exists beyond their geographic embedding, we analyze three representations of the national park system using park-park similarity networks according to their co-occurrence in: (a) ~423,000 news media articles; (b) ~11,000 research publications; and (c) ~60,000 species inhabiting parks. We quantify structural variation between network representations leveraging similarity measures at different scales: park-level (park-park correlations) and system-level (network communities’ consistency). Because parks are governed and experienced at multiple scales, cross-network comparison informs how management should account for the varying objectives and constraints that dominate at each scale. Our results identify an interesting paradox: whereas park-level correlations depend strongly on the representative lens, the network communities are remarkably robust and consistent with the underlying geographic embedding. Overall, our data-driven methodology is generalizable to other geographically embedded systems and supports the holistic analysis of systems-level structure that may elude other approaches.
A. M. Petersen
Shift in house price estimates during COVID-19 reveals effect of crisis on collective speculation (pdf)    (short summary)
We exploit a timely city-level panel of individual house price estimates for both small and big real-estate markets in California USA to estimate the impact of COVID-19 on the housing market. Descriptive analysis of spot house price estimates, including contemporaneous price uncertainty and 30-day price change for individual properties listed on the online real-estate platform Zillow.com, together facilitate quantifying both the excess valuation and valuation confidence attributable to this global socio-economic shock. Our quasi-experimental pre-/post-COVID-19 design spans several years around 2020 and leverages contemporaneous price estimates of rental properties – i.e., real estate entering the habitation market, just not for purchase and hence free of speculation – as an appropriate counterfactual to properties listed for sale. Combining unit-level matching and difference-in-difference approaches, we estimate that properties listed for sale after the pandemic featured an excess monthly price growth of roughly 1 percentage point, corresponding to an excess annual price growth of roughly 12.7 percentage points, which accounts for more than half of the annual growth observed across those regions in 2021. Simultaneously, uncertainty in price estimates decreased, signaling the irrational confidence characteristic of prior asset bubbles. We explore how these two trends are related to market size, local market supply and borrowing costs, which altogether lend support for the counter-intuitive roles of uncertainty and interruptions in decision-making.
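The monthly and annual figures quoted above are consistent with simple monthly compounding. The short check below assumes that this is how the two numbers relate; the paper may derive the annual figure differently, and the variable names are illustrative.

```python
# Assumption: the annual excess growth is the monthly excess compounded over 12 months.
monthly_excess = 0.01  # roughly 1 percentage point per month, as quoted in the abstract
annual_excess = (1 + monthly_excess) ** 12 - 1

print(f"{annual_excess:.1%}")  # 12.7%
```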
A. M. Petersen, F. Arroyave, I. Pavlidis
Methods for measuring social and conceptual dimensions of Convergence Science (pdf)    (Supporting Information)
Research Evaluation (2023). DOI:10.1093/reseval/rvad020
Convergence science is an intrepid form of interdisciplinarity defined by the US National Research Council as "the coming together of insights and approaches from originally distinct fields" to strategically address grand challenges. This paradigm has been promoted extensively in the last decade, becoming a model for designing flagship research programs that strategically address grand challenges. Despite its increasing relevance to science policy and institutional design, there is still no practical framework for measuring convergence. We address this gap by developing a measure of disciplinary distance based upon disciplinary boundaries delineated by hierarchical ontologies. We apply this approach using two widely used ontologies – the Classification of Instructional Programs (CIP) and the Medical Subject Headings (MeSH) – each comprised of thousands of entities that facilitate classifying two distinct research dimensions, respectively. The social dimension codifies the disciplinary pedigree of individual scholars, connoting core expertise associated with traditional modes of mono-disciplinary graduate education. The conceptual dimension codifies the knowledge, methods, and equipment fundamental to a given target problem, which together may exceed the researchers' core expertise. Considered in tandem, this decomposition facilitates measuring social-conceptual alignment and optimizing team assembly around domain-spanning problems – a key aspect that eludes other approaches. We demonstrate the utility of this framework in a case study of the human brain science (HBS) ecosystem, a relevant convergence nexus that highlights several practical considerations for designing, evaluating, institutionalizing and accelerating convergence. 
Econometric analysis of 655,386 publications derived from 9,121 distinct HBS scholars reveals an 11.4% article-level citation premium attributable to research featuring full topical convergence, and an additional 2.7% citation premium if the social (disciplinary) configuration of scholars is maximally aligned with the topical configuration of the research.
- Funded by NSF award #1738163
D. Yang, I. Pavlidis, A. M. Petersen
Biomedical convergence facilitated by the emergence of technological and informatic capabilities (pdf)
Advances in Complex Systems 26, 2350003 (2023). DOI:10.1142/S0219525923500030
We analyzed Medical Subject Headings (MeSH) from 21.6 million research articles indexed by PubMed to map this vast space of entities and their relations, providing insights into the origins and future of biomedical convergence. Detailed analysis of MeSH co-occurrence networks identifies three robust knowledge clusters: the vast universe of microscopic biological entities and structures; systems, disease and diagnostics; and emergent biological and social phenomena underlying the complex problems driving the health, behavioral and brain science frontiers. These domains integrated from the 1990s onward by way of technological and informatic capabilities that introduced highly controllable, scalable and permutable research processes and invaluable imaging techniques for illuminating fundamental structure-function-behavior questions. Article-level analysis confirms a positive relationship between team size and topical diversity, and shows convergence to be increasing in prominence but with recent saturation. Together, our results invite additional policy support for cross-disciplinary team assembly to harness transdisciplinary convergence.
- Funded by NSF award #1738163
I. Pavlidis, E. Akleman, A. M. Petersen
From Polymaths to Cyborgs - Convergence Is Relentless (pdf)
American Scientist 110, 196-200 (2022). DOI:10.1511/2022.110.4.196
The first draft of the human genome - a historic map of our species’ genetic instruction manual - was completed not by biologists, but by a computer science group at the University of California, Santa Cruz. Parsing the complexity of 2.85 billion nucleotides, written across more than 20,000 genes, required technical assistance from and close collaboration with researchers from many disciplines. Ultimately, the Human Genome Project included researchers from engineering, informatics, ethics, physics, biology, and chemistry. The Human Genome Project is a powerful example of the convergence approach in science. In a 2014 report, the National Research Council defined convergence science as the integration of multidisciplinary approaches aiming to address complex questions. For more than half a century, convergent approaches have become increasingly common and impactful in science, prompting some historians of ideas like Peter Watson to identify ongoing convergence as the ultimate scientific trend.
- Read more in the American Scientist special issue on Convergence
- Destiny of science modeled and explained in new study, Phys.Org press summary
A. M. Petersen
Evolution of biomedical innovation quantified via billions of distinct article-level MeSH keyword combinations (pdf)
Advances in Complex Systems 24, 2150016 (2022). DOI:10.1142/S0219525921500168
To what degree has the vast space of higher-order knowledge combinations been explored and how has it evolved over time? To address these questions, we first develop a systematic approach to measuring combinatorial innovation in the biomedical sciences based upon the comprehensive ontology of Medical Subject Headings (MeSH) developed and maintained by the US National Library of Medicine. As such, this approach leverages an expert-defined knowledge ontology that features both breadth (27,875 MeSH analyzed across 25 million articles indexed by PubMed that were published from 1902 onwards) and depth (we differentiate between Major and Minor MeSH terms to identify differences in the knowledge network representation constructed from primary research topics only). With this level of uniform resolution we differentiate between three different modes of innovation contributing to the combinatorial knowledge network: (i) conceptual innovation associated with the emergence of new concepts and entities (measured as the entry of new MeSH); and (ii) recombinant innovation, associated with the emergence of new combinations, which itself consists of two types: peripheral (i.e., combinations involving new knowledge) and core (combinations comprised of pre-existing knowledge only). Another relevant question we seek to address is whether examining triplet and quartet combinations, in addition to the more traditional dyadic or pairwise combinations, provides evidence of any new phenomena associated with higher-order combinations. Analysis of the size, growth, and coverage of combinatorial innovation yields results that are largely independent of the combination order, thereby suggesting that the common dyadic approach is sufficient to capture essential phenomena.
Our main results are twofold: (a) despite the persistent addition of new MeSH terms, the network is densifying over time meaning that scholars are increasingly exploring and realizing the vast space of all knowledge combinations; and (b) conceptual innovation is increasingly concentrated within single research articles, a harbinger of the recent paradigm shift towards convergence science.
A. M. Petersen, M. E. Ahmed, I. Pavlidis
Grand challenges and emergent modes of convergence science (pdf)    (Supporting Information)
Nature Humanities and Social Sciences Communications 8, 194 (2021). DOI:10.1057/s41599-021-00869-9
To address complex problems, scholars are increasingly faced with challenges of integrating diverse knowledge domains. We analyzed the evolution of this convergence paradigm in the broad ecosystem of brain science, which provides a real-time testbed for evaluating two modes of cross-domain integration – subject area exploration via expansive learning and cross-disciplinary collaboration among domain experts. We show that research involving both modes features a 16% citation premium relative to a mono-disciplinary baseline. Further comparison of research integrating neighboring versus distant research domains shows that the cross-disciplinary mode is essential for integrating across relatively large disciplinary distances. Yet we find research utilizing cross-domain subject area exploration alone – a convergence shortcut – to be growing in prevalence at roughly 3% per year, significantly faster than the alternative cross-disciplinary mode, despite being less effective at integrating domains and markedly less impactful. By measuring shifts in the prevalence and impact of different convergence modes in the 5-year intervals before and after 2013, our results indicate that these counterproductive patterns may relate to competitive pressures associated with global Human Brain flagship funding initiatives. Without additional policy guidance, such Grand Challenge flagships may unintentionally incentivize such convergence shortcuts, thereby undercutting the advantages of cross-disciplinary teams in tackling challenges calling on convergence.
- Funded by NSF award #1738163
- Presented at INSciTS 2022 & ICCSI 2022 hosted by the US National Academy of Sciences: Full Poster & Video Summary
F. Arroyave, O. Y. Romero Goyeneche, M. Gore, G. Heimeriks, J. Jenkins, A. M. Petersen
On the social and cognitive dimensions of wicked environmental problems characterized by conceptual and solution uncertainty (pdf)
Advances in Complex Systems 24, 2150005 (2021). DOI:10.1142/S0219525921500053
We develop a quantitative framework for understanding the class of wicked problems that emerge at the intersections of natural, social, and technological complex systems. Wicked problems reflect our incomplete understanding of interdependent global systems and the systemic risk they pose; such problems escape solutions because they are often ill-defined, and thus mis-identified and under-appreciated by communities of problem-solvers. While there are well-documented benefits to tackling boundary-crossing problems from various viewpoints, the integration of diverse approaches can nevertheless contribute confusion around the collective understanding of the core concepts and feasible solutions. We explore this paradox by analyzing the development of both scholarly (social) and topical (cognitive) communities - two facets of knowledge production studied here that contribute towards the evolution of knowledge in and around a problem, termed a knowledge trajectory - associated with three wicked problems: deforestation, invasive species, and wildlife trade. We posit that saturation in the dynamics of social and cognitive diversity growth is an indicator of reduced uncertainty in the evolution of the comprehensive knowledge trajectory emerging around each wicked problem. Informed by comprehensive bibliometric data capturing both social and cognitive dimensions of each problem domain, we thereby develop a framework that assesses the stability of knowledge trajectory dynamics as an indicator of wickedness associated with conceptual and solution uncertainty. As such, our results identify wildlife trade as a wicked problem that may be difficult to address given recent instability in its knowledge trajectory.
F. J. Arroyave, A. M. Petersen, J. Jenkins, R. G. Hurtado
Multiplex networks reveal geographic constraints on illicit wildlife trafficking (pdf)
Applied Network Science 5, 20 (2020). DOI:10.1007/s41109-020-00262-6
Illicit wildlife trafficking poses a threat to the conservation of species and ecosystems, and represents a fundamental source of biodiversity loss, alongside climate change and large-scale land degradation. Despite the seriousness of this issue, little is known about various socio-cultural demand sources underlying trafficking networks, for example the forthright consumption of endangered species in different cultural contexts. Our study illustrates how wildlife trafficking represents a wicked problem at the intersection of criminal enforcement, cultural heritage and environmental systems management. As with similar network-based crimes, institutions are frequently ineffective at curbing wildlife trafficking, partly due to the lack of information detailing activities within illicit trading networks. To address this shortcoming, we leverage official government records documenting the illegal trade of reptiles in Colombia. As such, our study contributes to the understanding of how and why wildlife trafficking persists across robust trafficking networks, which are conduits for a broader range of black-market goods. Leveraging geo-spatial data, we construct a multiplex representation of wildlife trafficking networks, which facilitates identifying network properties that are signatures of strategic trafficker behavior. In particular, our results indicate that traffickers’ actions are constrained by spatial and market customs, a result which is apparent only within an integrated multiplex representation. Characteristic levels of sub-network coupling further indicate that traffickers strategically leverage knowledge of the entire system. We argue that this multiplex representation is essential for prioritizing crime enforcement strategies aimed at disrupting robust trade networks, thereby enhancing the effectiveness and resource allocation of institutions charged with curbing illicit trafficking.
We develop a generalizable model of multiplex criminal trade networks suitable for communicating with policy makers and practitioners, thereby facilitating rapid translation into public policy and environmental conservation efforts.
A. M. Petersen, O. Penner
Renormalizing individual performance metrics for cultural heritage management of sports records (pdf)    (Pre-print OA Version)
Chaos, Solitons & Fractals 136, 109821 (2020). DOI:10.1016/j.chaos.2020.109821
Individual performance metrics are commonly used to compare players from different eras. However, such cross-era comparison is often biased due to significant changes in success factors underlying player achievement rates (e.g. performance enhancing drugs and modern training regimens). Such historical comparison is more than fodder for casual discussion among sports fans, as it is also an issue of critical importance to the multi-billion dollar professional sport industry and the institutions (e.g. Hall of Fame) charged with preserving sports history and the legacy of outstanding players and achievements. To address this cultural heritage management issue, we report an objective statistical method for renormalizing career achievement metrics, one that is particularly tailored for common seasonal performance metrics, which are often aggregated into summary career metrics - despite the fact that many player careers span different eras. Remarkably, we find that the method applied to comprehensive Major League Baseball and National Basketball Association player data preserves the overall functional form of the distribution of career achievement, both at the season and career level. As such, subsequent re-ranking of the top-50 all-time records in MLB and the NBA using renormalized metrics indicates reordering at the local rank level, as opposed to bulk reordering by era. This local order refinement signals time-independent mechanisms underlying seasonal and cumulative achievement in professional sports, meaning that appropriately renormalized achievement metrics can be used to compare players from eras with different season lengths, team strategies, rules - and possibly even different sports.
D. Majeti, E. Akleman, M. E. Ahmed, A. M. Petersen, B. Uzzi, I. Pavlidis
Scholar Plot: Design and Evaluation of an Information Interface for Faculty Research Performance (pdf)
Frontiers in Research Metrics and Analytics 4, 6 (2020). DOI:10.3389/frma.2019.00006
The ability to objectively assess academic performance is critical to rewarding academic merit, charting academic policy, and promoting science. Quintessential to performing these functions is first the ability to collect valid and current data through increasingly automated online interfaces. Moreover, it is crucial to remove disciplinary and other biases from these data, presenting them in ways that support insightful analysis at various levels. Existing systems are lacking in some of these respects. Here we present Scholar Plot (SP), an interface that harvests bibliographic and research funding data from online sources. SP addresses systematic biases in the collected data through nominal and normalized metrics. Finally, SP synergistically combines these metrics in a plot form for expert appraisal, and an iconic form for broader consumption. SP’s plot and iconic forms are scalable, representing equally well individual scholars and their academic units, thus contributing to consistent ranking practices across the university organizational structure. In order to appreciate the design principles underlying SP, in particular the informativeness of nominal versus normalized metrics, we also present the results of an evaluation survey taken by senior faculty (n=28) with significant promotion and tenure assessment experience.
- Funded by NSF award #1738163
A. M. Petersen
Megajournal mismanagement: Manuscript decision bias and anomalous editor activity at PLOS ONE (pdf)
J. Informetrics 13, 100974 (2019). DOI:10.1016/j.joi.2019.100974
Since their emergence just a decade ago, nearly 2% of scientific research is now published by megajournals, representing a major industrial shift in the production of knowledge. Such high-throughput production stresses several aspects of the publication process, including the editorial oversight of peer-review. As the largest megajournal, PLOS ONE has relied on a single-tier editorial board comprised of ∼7,000 active academics, who thereby face conflicts of interest relating to their dual roles as both producers and gatekeepers of peer-reviewed literature. While such conflicts of interest are also a factor for editorial boards of smaller journals, little is known about how the scalability of megajournals may introduce perverse incentives for editorial service. To address this issue, we analyzed the activity of PLOS ONE editors over the journal’s inaugural decade (2006-2015) and find highly variable activity levels. We then leverage this variation to model how editorial bias in the manuscript decision process relates to two editor-specific factors: repeated editor-author interactions and shifts in the rates of citations directed at editors – a form of citation remuneration that is analogous to self-citation. Our results indicate significantly stronger manuscript bias among a relatively small number of extremely active editors, who also feature relatively high self-citation rates coincident in the manuscripts they handle. These anomalous activity patterns are consistent with the perverse incentives and the temptations they offer at scale, which is theoretically grounded in the "slippery-slope" evolution of apathy and misconduct in power-driven environments. By applying quantitative evaluation to the gatekeepers of scientific knowledge, we shed light on various ethics issues crucial to science policy – in particular, calling for more transparent and structured management of editor activity in megajournals that rely on active academics.
- Personal biases speed up research publication: Megajournal editors under the microscope, Nature Index, Gemma Conroy
- Analysis of highly prolific Plos One editors finds evidence for 'editor-author backscratching', Times Higher Education, Jack Grove
- Editors secured citation bump, Science, Jeffrey Brainard
A. M. Petersen, R. K. Pan, F. Pammolli, S. Fortunato
Methods to Account for Citation Inflation in Research Evaluation (pdf)
Research Policy 48, 1855-1865 (2019). DOI:10.1016/j.respol.2019.04.009
Quantitative research evaluation requires measures that are transparent, simple, and free of disciplinary and temporal bias. We document and provide a solution to a hitherto unaddressed temporal bias - citation inflation - which arises from the basic fact that scientific publication is steadily growing at roughly 4% per year. Because the total production of citations grows by a factor of 2 every 12 years, this means that the real value of a citation depends on when it was produced. As such, failing to convert nominal citation values into real citation values produces significant mis-measurement of scientific impact. To address this problem, we develop a citation deflator method, outline the steps to generalize and implement it using the Web of Science portal, and analyze a large set of researchers from biology and physics to demonstrate how two common evaluation metrics (total citations and h-index) can differ, by a remarkable amount, depending on whether the citations are deflated or not. In particular, our results show that the scientific impact of older generations is likely to be significantly underestimated when citations are not deflated, often by 100% or more of the nominal value. Thus, our study points to the need for a systemic overhaul of the counting methods used in evaluating citation impact - especially in the case of researchers, journals, and institutions - which can span several decades and thus several doubling periods.
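The nominal-versus-real distinction above can be sketched with a stylized deflator. The example assumes that total citation production grows as a pure exponential with the 12-year doubling period quoted in the abstract; the deflator in the paper is presumably built from empirical citation totals instead. The function name and the sample citation counts are hypothetical.

```python
def deflate_citations(citations_by_year, base_year, doubling_years=12.0):
    """Convert nominal citation counts into base-year 'real' citations.

    Assumption (not from the paper): citation supply doubles every
    `doubling_years`, so a citation produced t years after `base_year`
    is worth 2 ** (-t / doubling_years) base-year citations.
    """
    return sum(
        count * 2.0 ** (-(year - base_year) / doubling_years)
        for year, count in citations_by_year.items()
    )


# Hypothetical record: 100 citations received in each of three years.
nominal = {1990: 100, 2002: 100, 2014: 100}
real_1990 = deflate_citations(nominal, base_year=1990)  # 100 + 50 + 25
```

Under this assumption the 300 nominal citations correspond to 175 citations in 1990 terms, since each doubling period halves the real value of a newly produced citation.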
A. M. Petersen
Multiscale Impact of Researcher Mobility (pdf)    (Supporting Information)
Journal of the Royal Society Interface 15, 20180580 (2018). DOI:10.1098/rsif.2018.0580
International mobility facilitates the exchange of scientific, institutional and cultural knowledge. Yet whether globalization and advances in virtual communication technologies have altered the impact of researcher mobility is a relevant and open question that we address by analysing a broad international set of 26,170 physicists from 1980 to 2009, focusing on the 10-year period centred around each mobility event to assess the impact of mobility on research outcomes. We account for secular globalization trends by splitting the analysis into three periods, measuring for each period the effect of mobility on researchers' citation impact, research topic diversity, collaboration networks and geographical coordination. In order to identify causal effects we leverage statistical matching methods that pair mobile researchers with non-mobile researchers that are similar in research profile attributes prior to the mobility event. We find that mobile researchers gain up to a 17% increase in citations relative to their non-mobile counterparts, which can be explained by the simultaneous increase in their diversity of co-authors, topics and geographical coordination in the period immediately following migration. Nevertheless, we also observe that researchers completely curtail prior collaborations with their source country in 11% of the cross-border mobility events. As such, these individual-level perturbations fuel multiscale churning in scientific networks, e.g. rewiring the connectivity of individuals and ideas and affecting international integration. Together these results provide additional clarity on the complex relationship between human capital mobility and the dynamics of social capital investment, with implications for immigration and national innovation system policy.
- Why you should move country, Nature, Virginia Gewin
- Considering going abroad for work?, Science, Elisabeth Pain
- Physicists who move abroad can receive a 17% uplift in citations, study reveals, Physics World, Michael Allen
- Migration brings citations boost, Science, Warren Cornwall
A. M. Petersen, D. Majeti, K. Kwon, M. E. Ahmed, I. Pavlidis
Cross-disciplinary evolution of the genomics revolution (pdf)    (Supporting Information)
Science Advances 4(8), eaat4211 (2018). DOI:10.1126/sciadv.aat4211
Born out of the Human Genome Project (HGP), the field of genomics evolved with phenomenal speed into a dominant scientific and business force. While other efforts were intent on estimating the economic impact of the genomics revolution, we shift focus to the social and cultural capital generated by bridging together biology and computing - two of the constitutive disciplines of "genomics". We quantify this capital by measuring the pervasiveness of bio-computing cross-disciplinarity (XD) in genomics research during and after the HGP. To provide interlocking perspectives at the career and epistemic levels, we assembled three data sets to measure XD via (i) the collaboration network between 4190 biology and computing faculty from 155 departments in the United States, (ii) cross-departmental affiliations within a comprehensive set of human genomics publications, and (iii) the application of computational concepts and methods in research published in a preeminent genomics journal. Our results show the following: First, research featuring XD collaborations has higher citation impact than other disciplinary research - an effect observed at both the career and individual article levels. Second, genomics articles featuring XD methods tend to have higher citation impact than epistemically pure articles. Third, XD researchers of computing pedigree are drawn to the biology culture. This statistical evidence acquires deeper meaning when viewed against the organizational and knowledge transfer mechanisms revealed by the data models. With cross-disciplinary initiatives set to dominate the agenda of funding agencies, our case study provides a framework for appreciating the long-term effects of these initiatives on science and its standard-bearers.
-How gene hunting changed the culture of science, EurekAlert! / CPLab Univ. Houston
- Funded by NSF award #1738163
R. K. Pan, A. M. Petersen, F. Pammolli, S. Fortunato
The Memory of Science: Inflation, Myopia, and the Knowledge Network (pdf)
J. Informetrics 12, 656-678 (2018). DOI:10.1016/j.joi.2018.06.005
Scientific production is steadily growing, exhibiting 4% annual growth in publications and 1.8% annual growth in the number of references per publication, together producing a 12-year doubling period in the total supply of references, i.e. links in the science citation network. This growth has far-reaching implications for how academic knowledge is connected, accessed and evaluated. Against this background, we analyzed a citation network comprised of 837 million references produced by 32.6 million publications over the period 1965-2012, allowing for a detailed analysis of the 'attention economy' in science. Our results show how growth relates to 'citation inflation', increased connectivity in the citation network resulting from decreased levels of uncitedness, and a narrowing range of attention - as both very classic and very recent literature are being cited increasingly less. The decreasing attention to recent literature published within the last 6 years suggests that science has become stifled by a publication deluge destabilizing the balance between production and consumption. To better understand these patterns together, we developed a generative model of the citation network, featuring exponential growth, the redirection of scientific attention via publications' reference lists, and the crowding out of old literature by the new. We validate our model against several empirical benchmarks, and then use perturbation analysis to measure the impact of shifts in citing behavior on the synthetic system's properties, thereby providing insights into the functionality of the science citation network as an infrastructure supporting the memory of science.
-The growth of papers is crowding out old classics, Nature Index, Gemma Conroy
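The 12-year doubling period quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below is only an illustration and assumes the two growth rates compound multiplicatively (publications per year times references per publication); the function name doubling_period is hypothetical and not taken from the paper.

```python
import math


def doubling_period(*annual_growth_rates: float) -> float:
    """Years needed for a quantity to double when several annual
    growth rates compound multiplicatively (an assumption of this sketch)."""
    combined_factor = 1.0
    for rate in annual_growth_rates:
        combined_factor *= 1.0 + rate
    if combined_factor <= 1.0:
        raise ValueError("combined growth must be positive for doubling to occur")
    return math.log(2.0) / math.log(combined_factor)


# Rates quoted in the abstract: 4% (publications) and
# 1.8% (references per publication).
years = doubling_period(0.04, 0.018)
print(f"Doubling period of the total supply of references: {years:.1f} years")
```

Under these assumptions the combined growth is just under 5.9% per year, which gives a doubling period of roughly 12 years, in line with the figure stated in the abstract.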
S. Fortunato, C. T. Bergstrom, K. Borner, J. A. Evans, D. Helbing, S. Milojevic, A. M. Petersen, F. Radicchi, R. Sinatra, B. Uzzi, A. Vespignani, L. Waltman, D. Wang, A.-L. Barabasi
Science of Science (pdf)
Science 359, eaao0185 (2018). DOI:10.1126/science.aao0185
Identifying fundamental drivers of science and developing predictive models to capture its evolution are instrumental for the design of policies that can improve the scientific enterprise - for example, through enhanced career paths for scientists, better performance evaluation for organizations hosting research, discovery of novel effective funding vehicles, and even identification of promising regions along the scientific frontier. The science of science uses large-scale data on the production of science to search for universal and domain-specific patterns. Here, we review recent developments in this transdisciplinary field.
O. A. Doria Arrieta, F. Pammolli, A. M. Petersen
Quantifying the negative impact of brain drain on the integration of European science (pdf)    (Supporting Information)
Science Advances 3(4), e1602232 (2017). DOI:10.1126/sciadv.1602232
The 2004/2007 European Union (EU) enlargement by 12 member states offers a unique opportunity to quantify the impact of EU efforts to expand and integrate the scientific competitiveness of the European Research Area (ERA). We apply two causal estimation schemes to cross-border collaboration data extracted from millions of academic publications from 1996 to 2012, which are disaggregated across 14 subject areas and 32 European countries. Our results illustrate the unintended consequences following the 2004/2007 enlargement, namely, its negative impact on cross-border collaboration in science. First, we use the synthetic control method to show that levels of European cross-border collaboration would have been higher without EU enlargement, despite the 2004/2007 EU entrants gaining access to EU resources incentivizing cross-border integration. Second, we implement a difference-in-difference panel regression, incorporating official intra-European high-skilled mobility statistics, to identify migration imbalance - principally from entrant to incumbent EU member states - as a major factor underlying the divergence in cross-border integration between Western and Eastern Europe. These results challenge central tenets underlying ERA integration policies that unifying labor markets will increase the international competitiveness of the ERA, thereby calling attention to the need for effective home-return incentives and policies.
-Study Identifies Effects of EU Expansion on Labor, Research, UC Merced Communications, Lorena Anderson
-Europe's paradox: Why increased scientific mobility has not led to more international collaborations, Science, Erik Stokstad
-Joining the European Union leads to less cross-border collaboration, Nature, Daniel Cressey
-EU Expansion did not Increase Cross-Border Research, AAAS, Megan Jula
-The EU Had a Scientific Collaboration Problem Long Before Brexit, Motherboard, Ben Sullivan
A. M. Petersen, M. Puliga
High-skilled labour mobility in Europe before and after the 2004 enlargement (pdf)    (Supporting Information)
Journal of the Royal Society Interface 14, 20170030 (2017). DOI:10.1098/rsif.2017.0030
The extent to which international high-skilled mobility channels are forming is a question of great importance in an increasingly global knowledge-based economy. One factor facilitating the growth of high-skilled labor markets is the standardization of certifiable degrees meriting international recognition. Within this context, we analyzed an extensive high-skilled mobility database comprising roughly 382,000 individuals from 5 broad profession groups (Medical, Education, Technical, Science & Engineering, and Business & Legal) over the period 1997-2014, using the 13-country expansion of the European Union (EU) to provide insight into labor market integration. We compare the periods before and after the 2004 enlargement, showing the emergence of a new East-West migration channel between the 13 mostly eastern EU entrants (E) and the rest of the western European countries (W). Indeed, we observe a net directional loss of human capital from E->W, representing 29% of the total mobility after 2004. Nevertheless, the counter-migration from W->E is 7% of the total mobility over the same period, signaling the emergence of brain circulation within the EU. Our analysis of the country-country mobility networks and the country-profession bipartite networks provides timely quantitative evidence for the convergent integration of the EU, and highlights the central role of the UK and Germany as high-skilled labor hubs. We conclude with two data-driven models to explore the structural dynamics of the mobility networks. First, we operationalize a redistribution model to explore the potential ramifications of Brexit, showing the extent to which a 'hard' Brexit, i.e. complete disintegration from the EU, may benefit the overall homogeneity of the European mobility network. Second, we use a panel regression model to explain empirical high-skilled mobility rates in terms of various economic 'push-pull' factors, the results of which show that government expenditure on education, per-capita wealth, geographic proximity, and labor force size are significant attractive features of destination countries.
-European mobility and the potential consequences of Brexit, The Royal Society, Ruth Milne
-Cover of April 2017 issue of JRSI
L. Leydesdorff, A. M. Petersen, I. Ivanova
Self-Organization of Meaning and the Reflexive Communication of Information (pdf)
Social Science Information 56(1), 4-27 (2017). DOI:10.1177/0539018416675074
Following a suggestion of Warren Weaver, we extend the Shannon model of communication piecemeal into a complex systems model in which communication is differentiated both vertically and horizontally. This model enables us to bridge the divide between Niklas Luhmann's theory of the self-organization of meaning in communications and empirical research using information theory. First, we distinguish between communication relations and correlations between patterns of relations. The correlations span a vector space in which relations are positioned and thus provided with meaning. Second, positions provide reflexive perspectives. Whereas the different meanings are integrated locally, each instantiation opens horizons of meaning that can be codified along eigenvectors of the communication matrix. The next-order specification of codified meaning can generate redundancies (as feedback on the forward arrow of entropy production). The horizontal differentiation among the codes of communication enables us to quantify the creation of new options as mutual redundancy. Increases in redundancy can then be measured as local reduction of prevailing uncertainty (in bits). The generation of options can also be considered as a hallmark of the knowledge-based economy: new knowledge provides new options. Both the communication-theoretical and the operational (information-theoretical) perspectives can thus be further developed.
A. M. Petersen, D. Rotolo, L. Leydesdorff
A Triple Helix Model of Medical Innovation: Supply, Demand, and Technological Capabilities in terms of Medical Subject Headings (pdf)
Research Policy 45(3), 666-681 (2016). DOI:10.1016/j.respol.2015.12.004
We develop a model of innovation that enables us to trace the interplay among three key dimensions of the innovation process: (i) demand of and (ii) supply for innovation, and (iii) technological capabilities available to generate innovation in the forms of products, processes, and services. Building on triple helix research, we use entropy statistics to elaborate an indicator of mutual information among these dimensions that can provide indication of reduction of uncertainty. To do so, we focus on the medical context, where uncertainty poses significant challenges to the governance of innovation. We use the Medical Subject Headings (MeSH) of MEDLINE/PubMed to identify publications classified within the categories "Diseases" (C), "Drugs and Chemicals" (D), "Analytic, Diagnostic, and Therapeutic Techniques and Equipment" (E) and use these as knowledge representations of demand, supply, and technological capabilities, respectively. Three case-studies of medical research areas are used as representative 'entry perspectives' of the medical innovation process. These are: (i) human papilloma virus, (ii) RNA interference, and (iii) magnetic resonance imaging. We find statistically significant periods of synergy among demand, supply, and technological capabilities (C-D-E) that point to three-dimensional interactions as a fundamental perspective for the understanding and governance of the uncertainty associated with medical innovation. Among the pairwise configurations in these contexts, the demand-technological capabilities (C-E) provided the strongest link, followed by the supply-demand (D-C) and the supply-technological capabilities (D-E) channels.
A. M. Petersen
Quantifying the impact of weak, strong, and super ties in scientific careers (pdf)    (short summary)    (Supporting Information)
Proceedings of the National Academy of Sciences USA 112, E4671-E4680 (2015). DOI:10.1073/pnas.1501444112
Scientists are frequently faced with the important decision to start or terminate a creative partnership. This process can be influenced by strategic motivations, as early career researchers are pursuers, whereas senior researchers are typically attractors, of new collaborative opportunities. Focusing on the longitudinal aspects of scientific collaboration, we analyzed 473 collaboration profiles using an ego-centric perspective which accounts for researcher-specific characteristics and provides insight into a range of topics, from career achievement and sustainability to team dynamics and efficiency. From more than 166,000 collaboration records, we quantify the frequency distributions of collaboration duration and tie-strength, showing that collaboration networks are dominated by weak ties characterized by high turnover rates. We use analytic extreme-value thresholds to identify a new class of indispensable 'super ties', the strongest of which commonly exhibit >50% publication overlap with the central scientist. The prevalence of super ties suggests that they arise from career strategies based upon cost, risk, and reward sharing and complementary skill matching. We then use a combination of descriptive and panel regression methods to compare the subset of publications coauthored with a super tie to the subset without one, controlling for pertinent features such as career age, prestige, team size, and prior group experience. We find that super ties contribute to above-average productivity and a 17% citation increase per publication, thus identifying these partnerships - the analog of life partners - as a major factor in science career development.
-Dynamic duos in science can reap rewards of academic partnerships, Times Higher Education
-Lifetime collaborators reap the benefits, Nature
-Quantifying scientific collaboration, Physics Today
-Publishing Partners, The Scientist
-Collaboration and scientific career development, PNAS Highlight
-Study suggests long term collaborations result in more productive scientific careers, Phys.org
-Collaboration Fosters More Productive Scientific Careers than Competition, Technology.org
-What science can tell us about building great teams, Kellogg School of Management, Emily Stone
A. Morescalchi, F. Pammolli, O. Penner, A. M. Petersen, M. Riccaboni
The evolution of networks of innovators within and across borders: Evidence from patent data (pdf)
Research Policy 44(3), 651-668 (2015). DOI:10.1016/j.respol.2014.10.015
Recent studies on the geography of knowledge networks have documented a negative impact of physical distance and institutional borders upon research and development (R&D) collaborations. Though it is widely recognized that geographic constraints hamper the diffusion of knowledge, less attention has been devoted to the temporal evolution of these constraints. In this study we use data on patents filed with the European Patent Office (EPO) for 50 countries to analyze the impact of physical distance and country borders on inter-regional links in four different networks over the period 1988-2009: (1) co-inventorship, (2) patent citations, (3) inventor mobility and (4) the location of R&D laboratories. We find the constraint imposed by country borders and distance decreased until the mid-1990s and then started to grow, particularly for distance. The intensity of European cross-country inventor collaborations increased at a higher pace than their non-European counterparts until 2004, with no significant relative progress afterwards. Moreover, when analyzing networks of geographical mobility, multinational R&D activities and patent citations we do not detect any substantial progress in European research integration aside from the influence of common global trends.
C. Schulz, A. Mazloumian, A. M. Petersen, O. Penner, D. Helbing
Exploiting citation networks for large-scale author name disambiguation (pdf)
EPJ Data Science 3, 11 (2014). DOI:10.1140/epjds/s13688-014-0011-3
We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects linked papers and then merges similar clusters. This parameterized model is optimized towards an h-index based recall, which favors the inclusion of well-cited publications, and a name-initials-based precision using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88% with a preference for researchers with high h-index values. 47 million articles of WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirming that the prediction is in excellent agreement with the empirical data, and yielding insight into the utility of the h-index in real academic ranking scenarios.
A. M. Petersen, O. Penner
Inequality and cumulative advantage in science careers: a case study of high-impact journals (pdf)    (short summary)
EPJ Data Science 3, 24 (2014). DOI:10.1140/epjds/s13688-014-0024-y
Analyzing a large data set of publications drawn from the most competitive journals in the natural and social sciences we show that research careers exhibit the broad distributions of individual achievement characteristic of systems in which cumulative advantage plays a key role.
While most researchers are personally aware of the competition implicit in the publication process, little is known about the levels of inequality at the level of individual researchers. Here we analyzed both productivity and impact measures for a large set of researchers publishing in high-impact journals, accounting for censoring biases in the publication data by using distinct researcher cohorts defined over non-overlapping time periods. For each researcher cohort we calculated Gini inequality coefficients, with average Gini values around 0.48 for total publications and 0.73 for total citations. For perspective, these observed values are well in excess of the inequality levels observed for personal income in developing countries.
Investigating possible sources of this inequality, we identify two potential mechanisms that act at the level of the individual that may play defining roles in the emergence of the broad productivity and impact distributions found in science. First, we show that the average time interval between a researcher's successive publications in top journals decreases with each subsequent publication. Second, after controlling for the time dependent features of citation distributions, we compare the citation impact of subsequent publications within a researcher's publication record. We find that as researchers continue to publish in top journals, there is more likely to be a decreasing trend in the relative citation impact with each subsequent publication. This pattern highlights the difficulty of repeatedly producing research findings in the highest citation-impact echelon, as well as the role played by finite career and knowledge life-cycles, and the intriguing possibility that confirmation bias plays a role in the evaluation of scientific careers.
-Scientific networks and success in science, EPJ Data Science Editorial
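The Gini values quoted in the abstract above (around 0.48 for total publications and 0.73 for total citations) summarize how unevenly a total is shared among researchers. As a hedged illustration only, the sketch below computes a Gini coefficient from a list of non-negative counts using the standard rank-weighted formula; the function name gini and the toy inputs are hypothetical, and the paper's cohort definitions and censoring corrections are not reproduced here.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfect equality,
    and (n - 1) / n is the maximum for n values (one holder of everything)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Toy citation totals for four hypothetical researchers.
print(gini([5, 5, 5, 5]))    # everyone holds the same share
print(gini([0, 0, 0, 10]))   # one researcher holds all citations
```

For a real cohort, the input would be one total (publications or citations) per researcher.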
I. Pavlidis, A. M. Petersen, I. Semendeferi
Together we stand (pdf)
Nature Physics 10, 700-702 (2014). DOI:10.1038/nphys3110
During the past 70 years science has been transforming, from the solitary operation that for centuries it used to be, into an endeavor characterized by ever-increasing team size. The importance of this transformation to our technology-driven society cannot be overestimated. As science undergoes this phenomenal evolution, one might expect that the scientific community and its main host - academia - would develop new norms that better serve a new stage. Moth metamorphosis is an example of a natural process that does exactly this, brilliantly adapting form to function. Alas, social constructs are not as flexible as natural processes. The academic career structure, originally conceived to reward self-sufficient singletons, continues to be implemented in a system dominated by teams and characterized by symbiotic relationships. To make matters worse, increasingly specialized education leaves academics ill prepared to cope with this challenge. When, how, and why did this malformation start, where does it lead, and how can it be ameliorated? By addressing these questions we bring to the fore the causal links and future projections of the problem, informing a policy and moral dialogue for its resolution.
-Team Science Is Tied to Growth in Grants With Multiple Recipients, The Chronicle of Higher Education
-Researchers say academia can learn from Hollywood, Phys.org
A. M. Petersen, S. Fortunato, R. K. Pan, K. Kaski, O. Penner, A. Rungi, M. Riccaboni, H. E. Stanley, F. Pammolli
Reputation and Impact in Academic Careers (pdf)    (Supporting Information)
Proceedings of the National Academy of Sciences USA 111, 15316-15321 (2014). DOI:10.1073/pnas.1323111111
Reputation is an important social construct in science, which enables informed quality assessments of both publications and careers of scientists in the absence of complete systemic information. However, the relation between reputation and career growth of an individual remains poorly understood, despite recent proliferation of quantitative research evaluation methods. Here we develop an original framework for measuring how a publication's citation rate \Delta c depends on the reputation of its central author i, in addition to its net citation count c. To estimate the strength of the reputation effect, we perform a longitudinal analysis on the careers of 450 highly-cited scientists, using the total citations C_i of each scientist as his/her reputation measure. We find a citation crossover c_x which distinguishes the strength of the reputation effect. For publications with c < c_x, the author's reputation is found to dominate the annual citation rate. Hence, a new publication may gain a significant early advantage corresponding to roughly a 66% increase in the citation rate for each tenfold increase in C_i. However, the reputation effect becomes negligible for highly cited publications, meaning that for c >= c_x the citation rate measures scientific impact more transparently. In addition we have developed a stochastic reputation model, which is found to reproduce numerous statistical observations for real careers, thus providing insight into the microscopic mechanisms underlying cumulative advantage in science.
-Recognition: Build a reputation, Nature Jobs
-Being a big name in science brings benefits, Nature
-Researchers prefer citing researchers of good reputation, Phys.org
-Scientists' reputations and citation rates, PNAS Highlight
A. M. Petersen, I. Pavlidis, I. Semendeferi
A quantitative perspective on ethics in large team science (pdf)
Science & Engineering Ethics 20, 923-945 (2014). DOI: 10.1007/s11948-014-9562-8
The gradual crowding out of singleton and small team science by large team endeavors is challenging key features of research culture. It is therefore important for the future of scientific practice to reflect upon the scientists' ethical responsibilities within teams. To facilitate this reflection we show labor force trends in the US revealing a skewed growth in academic ranks and increased levels of competition for promotion within the system; we analyze teaming trends across disciplines and national borders demonstrating why it is becoming difficult to distribute credit and to avoid conflicts of interest; and we use more than a century of Nobel prize data to show how science is outgrowing its old institutions of singleton awards. Of particular concern within the large team environment is the weakening of the mentor-mentee relation, which undermines the cultivation of virtue ethics across scientific generations. These trends and emerging organizational complexities call for a universal set of behavioral norms that transcend team heterogeneity and hierarchy. To this end, our expository analysis provides a survey of ethical issues in team settings to inform science ethics education and science policy.
- Family values, Philip Ball (Chemistry World, April 17, 2014)
O. Penner, R. K. Pan, A. M. Petersen, K. Kaski, S. Fortunato
On the Predictability of Future Impact in Science (pdf)
Scientific Reports 3, 3052 (2013). DOI: 10.1038/srep03052
Correct assessment of a scientist's past research impact and potential for future impact is fundamental to all personnel recruitment decisions in science. Quantitative measures for impact of previous work are already, formally and informally, involved in the recruitment and evaluation process. Of greater concern in the recruitment process is what a candidate will do in the future. Attempts have recently been made to develop models capable of predicting a scientist's future impact by way of his or her future h-index. Here we present a cross-sectional analysis of 762 longitudinal careers drawn from three disciplines: physics, biology, and mathematics. By applying future impact models to these careers we identify a number of subtle, but critical, flaws in current models. Specifically, cumulative non-decreasing measures like the h-index contain intrinsic spurious autocorrelation, resulting in a significant overestimation of their "predictive power". When applied to a scientist's annual h-index change (a non-cumulative measure), the models exhibit far less predictive power. Moreover, the predictive power of these models varies greatly with the career age of scientists, producing the least accurate estimates for already risk-burdened early career researchers. These results place in doubt the suitability of linear regression models of future h-index for real application in recruitment decisions and indicate that more effort is needed to develop and benchmark career predictability algorithms.
- Models to predict scientists' future impact often fail, Phys.Org (Oct.30, 2013)
- Divinations of academic success may be flawed, Nature
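The point about spurious autocorrelation in cumulative measures, made in the abstract above, can be illustrated with a toy simulation. The sketch below is not the authors' analysis: it assumes purely random, independent annual increments (so nothing about a career is genuinely predictable) and compares how well a cumulative total "predicts" its own future value against how well an annual increment predicts a later increment. All names (pearson, simulate_careers, compare_predictability) and parameters are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_careers(n_careers=2000, n_years=20, seed=0):
    """Each synthetic career is a list of independent random annual increments."""
    rng = random.Random(seed)
    return [[rng.randint(0, 3) for _ in range(n_years)] for _ in range(n_careers)]


def compare_predictability(careers, t=10, dt=5):
    """Correlation of (cumulative total at t, cumulative total at t + dt)
    versus correlation of (increment in year t, increment in year t + dt)."""
    cum_now = [sum(c[:t]) for c in careers]
    cum_future = [sum(c[:t + dt]) for c in careers]
    inc_now = [c[t - 1] for c in careers]
    inc_future = [c[t + dt - 1] for c in careers]
    return pearson(cum_now, cum_future), pearson(inc_now, inc_future)


cumulative_r, increment_r = compare_predictability(simulate_careers())
print(f"cumulative measure: r = {cumulative_r:.2f}")
print(f"annual increments:  r = {increment_r:.2f}")
```

Even though the increments carry no information about the future, the cumulative totals correlate strongly simply because the future total contains the present one, while the increment-to-increment correlation stays near zero.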
A. M. Petersen, S. Succi
The Z-index: A geometric representation of productivity and impact which accounts for information in the entire rank-citation profile (pdf)
J. Informetrics 7, 823-832 (2013). DOI: 10.1016/j.joi.2013.07.003
We present a simple generalization of Hirsch's h-index, Z = sqrt(h^2 + C)/sqrt(5), where C is the total number of citations. Z is aimed at correcting the potentially excessive penalty made by h on a scientist's highly cited papers, because for the majority of scientists analyzed, we find the excess citation fraction (C-h^2)/C to be distributed closely around the value 0.75, meaning that 75 percent of the author's impact is neglected. Additionally, Z is less sensitive to local changes in a scientist's citation profile, namely perturbations which increase h while only marginally affecting C. Using real career data for 476 physicist careers and 488 biologist careers, we analyze both the distribution of Z and the rank stability of Z with respect to the Hirsch index h and the Egghe index g. We analyze careers distributed across a wide range of total impact, including top-cited physicists and biologists for benchmark comparison. In practice, the Z-index requires the same information needed to calculate h and could be effortlessly incorporated within career profile databases, such as Google Scholar and ResearcherID. Because Z incorporates information from the entire publication profile while being more robust than h and g to local perturbations, we argue that Z is better suited for ranking comparisons in academic decision-making scenarios comprising a large number of scientists.
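As a rough illustration of the quantities in the abstract above, the sketch below computes h, the total citations C, the excess citation fraction (C - h^2)/C, and Z from a list of per-paper citation counts. It assumes the normalization by sqrt(5), under which Z equals h exactly when C = 4h^2, i.e. at the typical excess fraction of 0.75; the function names and the toy citation lists are hypothetical.

```python
import math


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def excess_fraction(citations):
    """Share of total citations not captured by h^2, i.e. (C - h^2) / C."""
    total = sum(citations)
    h = h_index(citations)
    return (total - h * h) / total


def z_index(citations):
    """Z = sqrt(h^2 + C) / sqrt(5); the sqrt(5) normalization is an
    assumption of this sketch, chosen so that Z == h when C == 4 h^2."""
    h = h_index(citations)
    total = sum(citations)
    return math.sqrt(h * h + total) / math.sqrt(5)


profile = [14, 2]  # toy profile with h = 2 and C = 16 = 4 h^2
print(h_index(profile), excess_fraction(profile), z_index(profile))
```

Because Z needs only h and C, it can be computed from the same citation list that already yields the h-index.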
O. Penner, A. M. Petersen, R. K. Pan, S. Fortunato
The case for caution in predicting scientists' future impact (pdf)
Physics Today 66, 8-9 (2013). DOI: 10.1063/PT.3.1928
To further examine dimensions of career predictability as proposed by Acuna et al. [Nature 489, 201-202 (2012)], we applied their model to a longitudinal career data set of 100 assistant professors in physics, two from each of the top 50 physics departments in the US. We use the Acuna model to calculate the predictive power of the model as a function of the number of years into the future we are attempting to predict as well as the career age of the scientists. The Acuna model does a respectable job of predicting h(t+Delta t), say, 3 or 4 years into the future when aggregating all age cohorts together. However, when calculated for subsets of specific age cohorts we find that the model's predictive power significantly decreases, especially when applied to researchers in the first three years of their career. In those cases the model does a much worse job of predicting future success, and hence, exposes a serious limitation. The limitation is particularly concerning as early career decisions make up a significant portion, if not the majority, of cases where quantitative approaches are likely to be applied.
A. Chessa, A. Morescalchi, F. Pammolli, O. Penner, A. M. Petersen, M. Riccaboni
Is Europe Evolving Toward an Integrated Research Area? (pdf)
Science 339, 650-651 (2013). DOI: 10.1126/science.1227970
An integrated European Research Area (ERA) is a critical component for a more competitive and open European R&D system. However, the impact of EU-specific integration policies aimed at overcoming innovation barriers associated with national borders is not well understood. Here we analyze 2.4 x 10^6 patent applications filed with the European Patent Office (EPO) over the 25-year period 1986-2010 along with a sample of 2.6 x 10^5 records from the ISI Web of Science to quantitatively measure the role of borders in international R&D collaboration and mobility. From these data we construct five different networks for each year analyzed: (i) the patent co-inventor network, (ii) the publication co-author network, (iii) the co-applicant patent network, (iv) the patent citation network, and (v) the patent mobility network. We use methods from network science and econometrics to perform a comparative analysis across time and between EU and non-EU countries to determine the "treatment effect" resulting from EU integration policies. Using non-EU countries as a control set, we provide quantitative evidence that, despite decades of efforts to build a European Research Area, there has been little integration above global trends in patenting and publication. This analysis provides concrete evidence that Europe remains a collection of national innovation systems.
- European Research: Still Fragmented After All These Years, AlphaGalileo Foundation
- Europe still has a way to go to achieve true unity, Research Europe, Issue 359
- Ricerca europea, l'integrazione ancora non c'e, Le Scienze (Scientific American, Italy)
- Unione europea, ancora non cadono le frontiere della ricerca, Wired
A. M. Petersen, J. Tenenbaum, S. Havlin, H. E. Stanley, M. Perc
Languages cool as they expand: Allometric scaling and the decreasing need for new words (pdf)
Scientific Reports 2, 943 (2012). DOI: 10.1038/srep00943
Language is the hallmark of our cumulative culture, by which means we are able to continuously improve on the achievements of previous generations. According to the most recent estimates, the size of the English lexicon has grown by roughly 88% during the 20th century alone. But what is the utility of so many new words? What is the reach of each new word? Since many new words are technical, what is the likelihood of encountering them, and alternatively, what is the use in remembering them? Underlying these questions is the pressure applied by technological change, which is fundamentally altering the ways in which humans communicate, store, and recall information. We test the stability of the large-scale statistical properties of written language over the 209-year period 1800-2008, analyzing the Zipf law, the Heaps' law, and the size-variance relation quantifying language growth patterns for all Google 1-gram databases comprising 7 different languages. We find that the annual growth fluctuations of word use have a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This "cooling pattern" forms the basis of a third statistical regularity, which unlike the Zipf and the Heaps law, is dynamical in nature.
- Choice Words: Graphing the evolution of language, arts&sciences Fall 2013 Magazine (Annual BU Research Highlight)
- How big is your language?, The Hindu, (Dec. 20, 2012)
- Physicists Explore The Rise And Fall Of Words, Inside Science News Service (ISNS)
- When physicists do linguistics, The Boston Globe / International Herald Tribune (Feb. 10/11, 2013)
A. M. Petersen, M. Riccaboni, H. E. Stanley, F. Pammolli.
Persistence and Uncertainty in the Academic Career (pdf)
Proceedings of the National Academy of Sciences USA 109, 5213 - 5218 (2012). DOI: 10.1073/pnas.1121429109
Recent shifts in the business structure of universities and a bottleneck in the supply of tenure track positions are two issues that threaten to change the longstanding patronage system in academia. Understanding how institutional changes within academia may affect the overall potential of science requires a better quantitative understanding of how careers evolve over time. Since knowledge spillovers, cumulative advantage, and collaboration are distinctive features of the academic profession, the employment relationship should be designed to account for these factors. We quantify the impact of these factors in the production n_i(t) of a given scientist i by analyzing the longitudinal career data of 300 scientists and comparing our results with 21,156 sports careers comprising a non-academic labor force. The increase in the typical size of scientific collaborations has led to the increasingly difficult task of allocating funding and assigning recognition. We use measures of the scientific collaboration radius, which can change dramatically over the course of a career, to provide insight into the role of collaboration in production. We introduce a model of proportional growth to provide insight into the complex relation between knowledge spillovers, competition, and uncertainty at the individual scale. Our model shows that high competition levels can make careers vulnerable to "sudden death" termination relatively early in the career as a result of negative production fluctuations and not necessarily due to lack of individual persistence.
- Short-term contracts may hinder young scientists, PNAS Highlight
A. M. Petersen, J. Tenenbaum, S. Havlin, H. E. Stanley.
Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death (pdf)
Scientific Reports 2, 313 (2012). DOI: 10.1038/srep00313
In this aggregate analysis of the growth rates of millions of words we demonstrate significant signatures of competition-driven systems in the linguistic arena of English, Spanish and Hebrew.
How often a given word is used, relative to other words, can convey information about the word's linguistic utility. Using Google word data for 3 languages over the 209-year period 1800-2008, we find an anomalous recent change in the birth and death rates of words, which indicates a shift towards increased levels of competition between words as a result of new standardization technology. We demonstrate unexpected analogies between the growth dynamics of word use and the growth dynamics of economic institutions. Our results support the intriguing concept that a language's lexicon is a generic arena for competition which evolves according to selection laws that are related to social, technological, and political trends. Specifically, the aggregate properties of language show pronounced differences during periods of world conflict, e.g. World War II.
- F1000 Evaluated Article, Faculty of 1000 post-publication peer review
- The New Science of the Birth and Death of Words, Wall Street Journal (Mar. 17, 2012)
- Languages Lose Vocab to Science and Spell-Check, InnovationNewsDaily
- Digital Spell-Checking May Be Killing Off Words, LiveScience / MSNBC / Discovery.com
- Modern era brings death to words, ScienceNews
- Study tracks births, deaths of words, United Press International (UPI)
- Study reveals words' Darwinian struggle for survival, theGuardian
- La guerra de las palabras, el Espectador (Colombia)
- Word Extinction, A nice blog summary by Dev Gualtieri (Aug. 11, 2011)
A. M. Petersen, W-S. Jung, J-S. Yang, H. E. Stanley.
Quantitative and Empirical demonstration of the Matthew Effect in a study of Career Longevity (pdf)
Proceedings of the National Academy of Sciences USA 108, 18-23 (2011). DOI: 10.1073/pnas.1016733108
In many competitive systems, there are typically only a few "big winners." This largely reflects the everyday fact that obtaining future opportunities often depends on an individual's record of achievement since employment opportunities are limited to a finite number of competitors. We solve exactly a longevity model which predicts the distribution of career length P(x) for professions characterized by high selectivity and uncertainty. We confirm the model's prediction for P(x) using extensive empirical data for the careers of both scientists (publishing in high-impact journals such as Nature, Science, etc.) and professional athletes (playing in MLB, NBA, Premier League, and Korean Professional Baseball). This study uncovers a remarkably simple statistical law which describes the frequencies of the extremely short careers of "one-hit wonders" as well as the extremely long careers of the "iron-horses". Our model highlights the importance of early career development, showing that many careers are stunted by the relative disadvantage associated with inexperience.
A. M. Petersen, H. E. Stanley, S. Succi.
Statistical regularities in the rank-citation profile of scientists (pdf)
Scientific Reports 1, 181 (2011). DOI: 10.1038/srep00181
We analyze the individual career publication statistics of 200 "stellar" physicists and 100 assistant professors in order to better understand success, productivity, and the h-index. In order to analyze the entire set of publications of a given scientist at once, we analyze the rank-citation curve c(r) using the Zipf ranking technique. Remarkably, we observe a universal feature: although every scientist has a distinct h value, each scientist also has a similar (two-parameter) curve c(r). Using the properties of this universal curve we show that the total number of citations C scales with an author's h-index as C ~ h^(1+\beta), where \beta is a high-rank power-law scaling exponent for c(r). That the human endeavors of these scientists produce a common representative curve suggests that scientific careers are governed by the statistical laws of competition and cumulative advantage. Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.
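The quantities named in this abstract can be illustrated with a minimal sketch. It builds a rank-citation profile c(r) by sorting one scientist's citation counts in decreasing order, then reads off the h-index and the total citation count C. The function names and the citation counts are hypothetical and chosen for illustration only. The sketch does not fit the two-parameter curve or estimate \beta.

```python
def rank_citation_profile(citations):
    """Return c(r): citation counts sorted in decreasing order (rank r = 1, 2, ...)."""
    return sorted(citations, reverse=True)


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(rank_citation_profile(citations), start=1):
        if c >= rank:
            h = rank  # the paper at this rank still has enough citations
        else:
            break     # c(r) is non-increasing, so no later rank can qualify
    return h


# Hypothetical citation counts for one scientist (illustrative, not real data).
papers = [3, 25, 0, 8, 5, 1, 3]
profile = rank_citation_profile(papers)
total_citations = sum(papers)  # C in the abstract's notation

print("c(r):", profile)
print("h-index:", h_index(papers))
print("C:", total_citations)
```

Because c(r) is non-increasing, the loop can stop at the first rank where c(r) < r.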
A. M. Petersen, O. Penner, H. E. Stanley.
Methods for detrending success metrics to account for inflationary and deflationary factors (pdf)
Eur. Phys. J. B 79, 67-78 (2011). DOI: 10.1140/epjb/e2010-10647-1
Pre-print title: Detrending career statistics in professional Baseball: accounting for the Steroids Era and beyond
We compare both the career and seasonal achievements of baseball players spanning 130+ years (e.g., addressing the question of who effectively hit more home runs, Babe Ruth or Barry Bonds?), using statistical methods to account for time-dependent factors that inflate success measures. We provide non-technical top-50 record tables for career HR, H, RBI, W, K and season HR, H, RBI, K, focussing on the accessible measures found in newspaper box-scores and on the back of baseball cards.
- Complexity Theory and the National Baseball Hall of Fame, the European Physical Journal News Highlights
- New Statistical Method Ranks Sports Players From Different Eras, MIT Technology Review
- Boston University clip, The Daily Free Press
- A Physics Curveball, arts&sciences Fall 2010 Magazine (Annual BU Research Highlight)
- Baseball Greats Reranked, BU Today, April 8, 2011
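The general idea of detrending a success metric, as described in the abstract above, can be sketched as follows. This is a simplified illustration and not the method of the paper: it assumes that the time-dependent inflation factor for each year is the mean of the metric over all players in that year, and it rescales every value to a common baseline. The function name, the record layout, and the numbers are hypothetical.

```python
from collections import defaultdict


def detrend(records, baseline=None):
    """Rescale each value by (baseline / yearly mean).

    records  : list of (player, year, value) tuples with non-negative values
    baseline : common reference level; defaults to the mean of the yearly means
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, year, value in records:
        totals[year] += value
        counts[year] += 1
    yearly_mean = {year: totals[year] / counts[year] for year in totals}

    if baseline is None:
        baseline = sum(yearly_mean.values()) / len(yearly_mean)

    detrended = []
    for player, year, value in records:
        mean = yearly_mean[year]
        # A zero yearly mean implies every value that year is zero.
        adjusted = value * baseline / mean if mean > 0 else 0.0
        detrended.append((player, year, adjusted))
    return detrended


# Hypothetical season totals for two eras (illustrative, not real data).
seasons = [("A", 1927, 60), ("B", 1927, 20), ("C", 2001, 70), ("D", 2001, 50)]
for player, year, value in detrend(seasons):
    print(player, year, round(value, 2))
```

In this toy example the yearly means are 40 and 60, so the default baseline is 50. Player A's 60 in the low-scoring year is adjusted upward to 75, above player C's raw 70, which is adjusted down to about 58.33.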
A. M. Petersen.
Applications of Statistical Physics to the Social and Economic Sciences (pdf)
PhD Thesis, Boston University (2011). Thesis Advisor: H. Eugene Stanley
B. Podobnik, D. Horvatic, A. M. Petersen, B. Urosevic, H. E. Stanley.
Bankruptcy risk model and empirical tests (pdf)
Proceedings of the National Academy of Sciences USA 107, 18325 (2010). DOI: 10.1073/pnas.1011942107
We compare bankrupt companies with non-bankrupt companies using Zipf ranking techniques to analyze the debt-to-assets leverage ratio R.
Using the distribution of R for bankrupt versus non-bankrupt companies, we estimate the bankruptcy risk of an existing company conditional on its current R value and find that the probability of bankruptcy P(B) ~ R.
- The relationship between bankruptcy and relative debt for U.S. companies, PNAS Highlight
A. M. Petersen, F. Wang, S. Havlin, H. E. Stanley.
Market dynamics immediately before and after financial shocks: quantifying the Omori, productivity and Bath laws (pdf)
Physical Review E 82, 036114 (2010). DOI: 10.1103/PhysRevE.82.036114
Financial shocks (incoming information) can cause significant cascading (e.g. "market rallies"), so we use methods from earthquake physics to better understand the expected dynamics before and after shocks of characteristic main-shock magnitude M.
A. M. Petersen, F. Wang, S. Havlin, H. E. Stanley.
Quantitative law describing market dynamics before and after interest-rate change (pdf)
Physical Review E 81, 066121 (2010). DOI: 10.1103/PhysRevE.81.066121
We analyze the financial "earthquake" that occurs every time the U.S. Federal Reserve makes an announcement to change the federal target interest rate, and estimate the magnitude of market "anticipation" and "surprise" using the fundamental relationship between the federal effective "overnight" interest rate and the 6-month Treasury Bill.
- Bernanke Announcement Leaves Quake Like Aftershocks, Inside Science News Service
A. M. Petersen, B. Podobnik, D. Horvatic, H. E. Stanley.
Scale-invariant properties of public-debt growth (pdf)
Europhysics Letters 90, 38006 (2010). DOI: 10.1209/0295-5075/90/38006
Applying methods from macro-economic growth theory, we find 'convergence' in country debt-to-GDP leverage ratios over the last 30+ years.
A. M. Petersen, F. Wang, H. E. Stanley.
Methods for measuring the citations and productivity of scientists across time and discipline (pdf)
Physical Review E 81, 036114 (2010). DOI: 10.1103/PhysRevE.81.036114
If we account for the time-dependent increase in paper citations as well as variations in paper collaboration group size, what do the distributions of (i) career total citations and (ii) career total number of publications look like for individual scientists in highly competitive journals? We also find evidence of cumulative advantage, demonstrated by the increasing publication rate of individual scientists with each new publication in their careers.
B. Podobnik, D. Horvatic, A. M. Petersen, M. Njavro, H. E. Stanley.
Common scaling behavior in finance and macroeconomics (pdf)
Eur. Phys. J. B 76, 487 (2010). DOI: 10.1140/epjb/e2009-00380-3
We analyze the growth rates of worldwide stock indices and relate the market capitalization (MC) of the index to the gross domestic product (GDP) of the index country.
B. Podobnik, D. Horvatic, A. M. Petersen, H. E. Stanley.
Quantitative relations between risk, return, and firm size (pdf)
Europhysics Letters 85, 50003 (2009). DOI: 10.1209/0295-5075/85/50003
For individual companies comprising the Nasdaq (2002-2008) and S&P500 (2003-2008) indices, we analyze the logarithmic growth rate (return) R of the stock price. We also relate the annual market capitalization (MC) and the return-to-risk < R >/sigma(R) for each company and find interesting differences between the Nasdaq and S&P500.
B. Podobnik, D. Horvatic, A. M. Petersen, H. E. Stanley.
Cross-Correlations between Volume Change and Price Change (pdf)
Proceedings of the National Academy of Sciences USA 106, 22079 (2009). DOI: 10.1073/pnas.0911983106
In analogy to the analysis of price volatility in financial markets, we analyze the absolute logarithmic returns (volatility) of total volume at the 1-day time resolution for individual stocks as well as stock indices, and use Detrended Cross-Correlation Analysis (DCCA) to quantify the relation between price volatility and volume.
A. M. Petersen, W-S. Jung, H. E. Stanley.
On the distribution of career longevity and the evolution of home run prowess in professional baseball (pdf)
Europhysics Letters 83, 50010 (2008). DOI: 10.1209/0295-5075/83/50010
How is it that 3% of all fielders finish their career with one at-bat and 3% of all pitchers finish their career with less than one inning pitched, yet there are also some careers that span more than 10,000 at-bats and 3,000 innings pitched? Analyzing every Major League Baseball player career over the 80-year period 1920-2000, we find a beautiful statistical law which describes both the extremely short careers of "one-hit wonders" as well as the extremely long careers of the "iron-horses". Furthermore, analyzing home run rates, we find evidence consistent with the use of performance-enhancing drugs during the "Steroids Era" of the 1990s and 2000s.
M. Mobilia, A. Petersen, S. Redner.
On the role of Zealotry in the Voter Model (pdf)
J. Stat. Mech. 08, P08029 (2007). DOI: 10.1088/1742-5468/2007/08/P08029
Why is it that in the history of democratic elections (e.g. Presidential elections), complete consensus (polarization) has never been achieved? For example, the largest share of the vote won by a U.S. President-elect was approximately 61%, in Johnson's 1964 victory over Goldwater. We investigate a stochastic opinion model in which consensus is stymied by the presence of zealots, agents who are completely fixed in their opinion, even if all their neighbors are of opposite opinion. Surprisingly, we find that the number and not the density of zealots determines the degree of consensus among the voters in our model.
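The kind of model described above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the paper's implementation: agents hold opinion +1 or -1, every agent is a neighbor of every other agent (a complete graph, assumed here for simplicity), and at each step one randomly chosen non-zealot adopts the opinion of a randomly chosen other agent. Zealots never update. The function name, parameters, and update order are hypothetical.

```python
import random


def simulate_voter_model(n_agents, zealots_plus, zealots_minus, steps, seed=0):
    """Voter dynamics on a complete graph with fixed-opinion zealots.

    Returns the final list of opinions; free (susceptible) voters come first,
    followed by the +1 zealots and then the -1 zealots.
    """
    n_free = n_agents - zealots_plus - zealots_minus
    if n_agents < 2 or n_free < 1:
        raise ValueError("need at least two agents and one non-zealot voter")

    rng = random.Random(seed)
    opinions = [rng.choice((1, -1)) for _ in range(n_free)]
    opinions += [1] * zealots_plus + [-1] * zealots_minus

    for _ in range(steps):
        i = rng.randrange(n_free)        # only non-zealots can change opinion
        j = rng.randrange(n_agents - 1)  # pick a neighbor other than i
        if j >= i:
            j += 1
        opinions[i] = opinions[j]        # voter i adopts neighbor j's opinion
    return opinions


final = simulate_voter_model(n_agents=200, zealots_plus=5, zealots_minus=5, steps=20000)
print("fraction holding +1:", sum(1 for s in final if s == 1) / len(final))
```

With zealots present on both sides, neither opinion can take over the whole population, because the opposing zealots keep reintroducing their opinion.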
Using big data to quantify the evolution of language at the micro and macro scale (2013) (pdf)
R-rated version here (pdf), presented at Nerd Nite Milan, Oct. 30 2013
Ascent in competitive arenas: From Fenway Park to Mass Ave (2013) (pdf)
Presented at the "Science of Success" Symposium, Northeastern Univ. & IQSS Harvard University
Multilevel networks in science: from individual careers to Europe (2013) (pdf)
Presented at the "Econophysics and Networks Across Scales" Workshop, Lorentz Center International Center for workshops in the Sciences, Leiden University
Beyond the Asterisk* : Adjusting for Performance Inflation in Professional Sports (2012) (pdf)
Presented at the "Sabermetrics, Scouting and the Science of Baseball" weekend seminar for the benefit of the Jimmy Fund
Persistency and uncertainty across the academic career (2012) (pdf)
Quantifying statistical regularities in the career achievements of scientists and professional athletes (2012) (pdf)
Quantitative laws describing market dynamics before and after interest-rate change and other financial shocks (2011) (pdf)