Places I like to go

Zabriskie Point — Death Valley National Park, CA



Complex Systems:

Computational and Mathematical Modeling

  1. A. M. Petersen, F. Arroyave, F. Pammolli
    The disruption index is biased by citation inflation (pdf)
    In press, Quantitative Science Studies (2024). DOI:10.1162/qss_a_00333 Abstract A recent analysis of scientific publication and patent citation networks by Park et al. (Nature, 2023) suggests that publications and patents are becoming less disruptive over time. Here we show that the reported decrease in disruptiveness is an artifact of systematic shifts in the structure of citation networks unrelated to innovation system capacity. Instead, the decline is attributable to ‘citation inflation’, an unavoidable characteristic of real citation networks that manifests as a systematic time-dependent bias and renders cross-temporal analysis challenging. One driver of citation inflation is the ever-increasing lengths of reference lists over time, which in turn increases the density of links in citation networks, and causes the disruption index to converge to 0. The impact of this systematic bias further stymies efforts to correlate disruption to other measures that are also time-dependent, such as team size and citation counts. In order to demonstrate this fundamental measurement problem, we present three complementary lines of critique (deductive, empirical and computational modeling), and also make available an ensemble of synthetic citation networks that can be used to test alternative citation-based measures for systematic bias.

  2. F. Arroyave, J. Jenkins, A. M. Petersen
    Network embedding for understanding the National Park System through the lenses of news media, scientific communication and biogeography (pdf)
    Annals of the American Association of Geographers (2023). DOI:10.1080/24694452.2023.2277808 Abstract The U.S. national parks encompass a variety of biophysical and historical resources important for national cultural heritage. Yet how these resources are socially constructed often depends upon the beholder. Parks tend to be conceptualized according to their (fixed) geographic context, so our understanding of this system of systems is dominated by this geographic lens. To expose the systemic structure that exists beyond their geographic embedding, we analyze three representations of the national park system using park-park similarity networks according to their co-occurrence in: (a) ~423,000 news media articles; (b) ~11,000 research publications; and (c) ~60,000 species inhabiting parks. We quantify structural variation between network representations leveraging similarity measures at different scales: park-level (park-park correlations) and system-level (network communities’ consistency). Because parks are governed and experienced at multiple scales, cross-network comparison informs how management should account for the varying objectives and constraints that dominate at each scale. Our results identify an interesting paradox: whereas park-level correlations depend strongly on the representative lens, the network communities are remarkably robust and consistent with the underlying geographic embedding. Overall, our data-driven methodology is generalizable to other geographically embedded systems and supports the holistic analysis of systems-level structure that may elude other approaches.
    -How Scientific Research Can Inform Visitor and Environmental Management at National Parks, UC Merced Newsroom, Patty Guerra

  3. F. J. Arroyave, A. M. Petersen, J. Jenkins, R. G. Hurtado
    Multiplex networks reveal geographic constraints on illicit wildlife trafficking (pdf)
    Applied Network Science 5, 20 (2020). DOI:10.1007/s41109-020-00262-6 Abstract Illicit wildlife trafficking poses a threat to the conservation of species and ecosystems, and represents a fundamental source of biodiversity loss, alongside climate change and large-scale land degradation. Despite the seriousness of this issue, little is known about various socio-cultural demand sources underlying trafficking networks, for example the forthright consumption of endangered species in different cultural contexts. Our study illustrates how wildlife trafficking represents a wicked problem at the intersection of criminal enforcement, cultural heritage and environmental systems management. As with similar network-based crimes, institutions are frequently ineffective at curbing wildlife trafficking, partly due to the lack of information detailing activities within illicit trading networks. To address this shortcoming, we leverage official government records documenting the illegal trade of reptiles in Colombia. As such, our study contributes to the understanding of how and why wildlife trafficking persists across robust trafficking networks, which are conduits for a broader range of black-market goods. Leveraging geo-spatial data, we construct a multiplex representation of wildlife trafficking networks, which facilitates identifying network properties that are signatures of strategic trafficker behavior. In particular, our results indicate that traffickers’ actions are constrained by spatial and market customs, a result which is apparent only within an integrated multiplex representation. Characteristic levels of sub-network coupling further indicate that traffickers strategically leverage knowledge of the entire system. We argue that this multiplex representation is essential for prioritizing crime enforcement strategies aimed at disrupting robust trade networks, thereby enhancing the effectiveness and resource allocation of institutions charged with curbing illicit trafficking. We develop a generalizable model of multiplex criminal trade networks suitable for communicating with policy makers and practitioners, thereby facilitating rapid translation into public policy and environmental conservation efforts.

  4. R. K. Pan, A. M. Petersen, F. Pammolli, S. Fortunato
    The Memory of Science: Inflation, Myopia, and the Knowledge Network (pdf)
    J. Informetrics 12, 656-678 (2018). DOI:10.1016/j.joi.2018.06.005 Abstract Scientific production is steadily growing, exhibiting 4% annual growth in publications and 1.8% annual growth in the number of references per publication, together producing a 12-year doubling period in the total supply of references, i.e. links in the science citation network. This growth has far-reaching implications for how academic knowledge is connected, accessed and evaluated. Against this background, we analyzed a citation network comprised of 837 million references produced by 32.6 million publications over the period 1965-2012, allowing for a detailed analysis of the 'attention economy' in science. Our results show how growth relates to 'citation inflation', increased connectivity in the citation network resulting from decreased levels of uncitedness, and a narrowing range of attention - as both very classic and very recent literature are being cited increasingly less. The decreasing attention to recent literature published within the last 6 years suggests that science has become stifled by a publication deluge destabilizing the balance between production and consumption. To better understand these patterns together, we developed a generative model of the citation network, featuring exponential growth, the redirection of scientific attention via publications' reference lists, and the crowding out of old literature by the new. We validate our model against several empirical benchmarks, and then use perturbation analysis to measure the impact of shifts in citing behavior on the synthetic system's properties, thereby providing insights into the functionality of the science citation network as an infrastructure supporting the memory of science.
    -The growth of papers is crowding out old classics, Nature Index, Gemma Conroy

  5. L. Leydesdorff, A. M. Petersen, I. Ivanova
    Self-Organization of Meaning and the Reflexive Communication of Information (pdf)
    Social Science Information 56(1), 4-27 (2017). DOI:10.1177/0539018416675074 Abstract Following a suggestion of Warren Weaver, we extend the Shannon model of communication piecemeal into a complex systems model in which communication is differentiated both vertically and horizontally. This model enables us to bridge the divide between Niklas Luhmann's theory of the self-organization of meaning in communications and empirical research using information theory. First, we distinguish between communication relations and correlations between patterns of relations. The correlations span a vector space in which relations are positioned and thus provided with meaning. Second, positions provide reflexive perspectives. Whereas the different meanings are integrated locally, each instantiation opens horizons of meaning that can be codified along eigenvectors of the communication matrix. The next-order specification of codified meaning can generate redundancies (as feedback on the forward arrow of entropy production). The horizontal differentiation among the codes of communication enables us to quantify the creation of new options as mutual redundancy. Increases in redundancy can then be measured as local reduction of prevailing uncertainty (in bits). The generation of options can also be considered as a hallmark of the knowledge-based economy: new knowledge provides new options. Both the communication-theoretical and the operational (information-theoretical) perspectives can thus be further developed.

  6. A. M. Petersen, D. Rotolo, L. Leydesdorff
    A Triple Helix Model of Medical Innovation: Supply, Demand, and Technological Capabilities in terms of Medical Subject Headings (pdf)
    Research Policy 45(3), 666-681 (2016). DOI:10.1016/j.respol.2015.12.004 Abstract We develop a model of innovation that enables us to trace the interplay among three key dimensions of the innovation process: (i) demand of and (ii) supply for innovation, and (iii) technological capabilities available to generate innovation in the forms of products, processes, and services. Building on triple helix research, we use entropy statistics to elaborate an indicator of mutual information among these dimensions that can provide indication of reduction of uncertainty. To do so, we focus on the medical context, where uncertainty poses significant challenges to the governance of innovation. We use the Medical Subject Headings (MeSH) of MEDLINE/PubMed to identify publications classified within the categories "Diseases" (C), "Drugs and Chemicals" (D), "Analytic, Diagnostic, and Therapeutic Techniques and Equipment" (E) and use these as knowledge representations of demand, supply, and technological capabilities, respectively. Three case-studies of medical research areas are used as representative 'entry perspectives' of the medical innovation process. These are: (i) human papilloma virus, (ii) RNA interference, and (iii) magnetic resonance imaging. We find statistically significant periods of synergy among demand, supply, and technological capabilities (C-D-E) that point to three-dimensional interactions as a fundamental perspective for the understanding and governance of the uncertainty associated with medical innovation. Among the pairwise configurations in these contexts, the demand-technological capabilities (C-E) provided the strongest link, followed by the supply-demand (D-C) and the supply-technological capabilities (D-E) channels.

  7. C. Schulz, A. Mazloumian, A. M. Petersen, O. Penner, D. Helbing
    Exploiting citation networks for large-scale author name disambiguation (pdf)
    EPJ Data Science 3, 11 (2014). DOI:10.1140/epjds/s13688-014-0011-3 Abstract We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only on the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects linked papers and then merges similar clusters. This parameterized model is optimized towards an h-index based recall, which favors the inclusion of well-cited publications, and a name-initials-based precision using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88% with a preference for researchers with high h-index values. 47 million articles of WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirming that the prediction is in excellent agreement with the empirical data, and yielding insight into the utility of the h-index in real academic ranking scenarios.

  8. A. M. Petersen, S. Fortunato, R. K. Pan, K. Kaski, O. Penner, A. Rungi, M. Riccaboni, H. E. Stanley, F. Pammolli
    Reputation and Impact in Academic Careers (pdf)    (Supporting Information)
    Proceedings of the National Academy of Sciences USA 111, 15316-15321 (2014). DOI:10.1073/pnas.1323111111 Abstract Reputation is an important social construct in science, which enables informed quality assessments of both publications and careers of scientists in the absence of complete systemic information. However, the relation between reputation and career growth of an individual remains poorly understood, despite recent proliferation of quantitative research evaluation methods. Here we develop an original framework for measuring how a publication's citation rate \Delta c depends on the reputation of its central author i, in addition to its net citation count c. To estimate the strength of the reputation effect, we perform a longitudinal analysis on the careers of 450 highly-cited scientists, using the total citations C_i of each scientist as his/her reputation measure. We find a citation crossover c_x which distinguishes the strength of the reputation effect. For publications with c < c_x, the author's reputation is found to dominate the annual citation rate. Hence, a new publication may gain a significant early advantage corresponding to roughly a 66% increase in the citation rate for each tenfold increase in C_i. However, the reputation effect becomes negligible for highly cited publications meaning that for c >= c_x the citation rate measures scientific impact more transparently. In addition we have developed a stochastic reputation model, which is found to reproduce numerous statistical observations for real careers, thus providing insight into the microscopic mechanisms underlying cumulative advantage in science.
    -Recognition: Build a reputation, Nature Jobs
    -Being a big name in science brings benefits, Nature
    -Researchers prefer citing researchers of good reputation, Phys.org
    -Scientists' reputations and citation rates, PNAS Highlight

  9. A. M. Petersen, M. Riccaboni, H. E. Stanley, F. Pammolli.
    Persistence and Uncertainty in the Academic Career (pdf)
    Proceedings of the National Academy of Sciences USA 109, 5213-5218 (2012). DOI: 10.1073/pnas.1121429109 Abstract Recent shifts in the business structure of universities and a bottleneck in the supply of tenure track positions are two issues that threaten to change the longstanding patronage system in academia. Understanding how institutional changes within academia may affect the overall potential of science requires a better quantitative understanding of how careers evolve over time. Since knowledge spillovers, cumulative advantage, and collaboration are distinctive features of the academic profession, the employment relationship should be designed to account for these factors. We quantify the impact of these factors in the production n_i(t) of a given scientist i by analyzing the longitudinal career data of 300 scientists and compare our results with 21,156 sports careers comprising a non-academic labor force. The increase in the typical size of scientific collaborations has led to the increasingly difficult task of allocating funding and assigning recognition. We use measures of the scientific collaboration radius, which can change dramatically over the course of a career, to provide insight into the role of collaboration in production efficiency. We introduce a model of proportional growth to provide insight into the complex relation between knowledge spillovers, competition, and uncertainty at the individual scale. Our model shows that high competition levels can make careers vulnerable to "sudden death" termination relatively early in the career as a result of negative production fluctuations and not necessarily due to lack of individual persistence.
    - Short-term contracts may hinder young scientists, PNAS Highlight

  10. A. M. Petersen, W-S. Jung, J-S. Yang, H. E. Stanley.
    Quantitative and Empirical demonstration of the Matthew Effect in a study of Career Longevity (pdf)
    Proceedings of the National Academy of Sciences USA 108, 18-23 (2011). DOI: 10.1073/pnas.1016733108 Abstract In many competitive systems, there are typically only few "big winners." This largely reflects the everyday fact that obtaining future opportunities often depends on an individual's record of achievement since employment opportunities are limited to a finite number of competitors. We solve exactly a longevity model which predicts the distribution of career length P(x) for professions characterized by high selectivity and uncertainty. We confirm the model's prediction for P(x) using extensive empirical data for the careers of both scientists (publishing in high-impact journals such as Nature, Science, etc.) and professional athletes (playing in MLB, NBA, Premier League, and Korean Professional Baseball). This study uncovers a remarkably simple statistical law which describes the frequencies of the extremely short careers of "one-hit wonders" as well as the extremely long careers of the "iron-horses". Our model highlights the importance of early career development, showing that many careers are stunted by the relative disadvantage associated with inexperience.

  11. A. M. Petersen, H. E. Stanley, S. Succi.
    Statistical regularities in the rank-citation profile of scientists (pdf)
    Scientific Reports 1, 181 (2011). DOI: 10.1038/srep00181 Abstract We analyze the individual career publication statistics of 200 'stellar' physicists and 100 assistant professors in order to better understand success, productivity, and the h-index. In order to analyze the entire set of publications of a given scientist at once, we analyze the rank-citation curve c(r) using the Zipf ranking technique. Incredibly, we observe a universal feature: although every scientist has a distinct h value, each scientist also has a similar (two-parameter) curve c(r)! Using the properties of this universal curve we show that the total number of citations C scales with an author's h-index as C ~ h^(1+\beta), where \beta is a high-rank power-law scaling exponent for c(r). That the human endeavors of these scientists produce a common representative curve suggests that scientific careers are governed by the statistical laws of competition and cumulative advantage. Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.

  12. B. Podobnik, D. Horvatic, A. M. Petersen, B. Urosevic, H. E. Stanley.
    Bankruptcy risk model and empirical tests (pdf)
    Proceedings of the National Academy of Sciences USA 107, 18325 (2010). DOI: 10.1073/pnas.1011942107 Abstract We compare bankrupt companies with non-bankrupt companies using Zipf ranking techniques to analyze the debt-to-assets leverage ratio R. Using the distribution of R for bankrupt versus non-bankrupt companies, we estimate the bankruptcy risk of an existing company conditional on its current R value and find that the probability of bankruptcy P(B) ~ R.
    - The relationship between bankruptcy and relative debt for U.S. companies, PNAS Highlight

  13. B. Podobnik, D. Horvatic, A. M. Petersen, H. E. Stanley.
    Cross-Correlations between Volume Change and Price Change (pdf)
    Proceedings of the National Academy of Sciences USA 106, 22079 (2009). DOI: 10.1073/pnas.0911983106 Abstract In analogy to the analysis of price volatility in financial markets, we analyze the absolute logarithmic returns (volatility) of total volume at the 1-day time resolution for individual stocks as well as stock indices, and use Detrended Cross-Correlation Analysis (DCCA) to quantify the relation between price volatility and volume volatility.

  14. M. Mobilia, A. Petersen, S. Redner.
    On the role of Zealotry in the Voter Model (pdf)
    J. Stat. Mech. 08, P08029 (2007). DOI: 10.1088/1742-5468/2007/08/P08029 Abstract Why is it that in the history of democratic elections (e.g. Presidential elections), complete consensus (polarization) has never been achieved? For example, the largest percentage of voters for U.S. President elect was approximately 61% in Johnson over Goldwater, 1964. We investigate a stochastic opinion model in which consensus is stymied by the presence of zealots, agents who are completely fixed in their opinion, even if all their neighbors are of opposite opinion. Surprisingly, we find that the number and not the density of zealots determines the degree of consensus among the voters in our model.

Presentations: