IMT Institutional Repository: No conditions. Results ordered -Date Deposited.

A Neural Network Ensemble Approach for GDP Forecasting

2021-03-17T10:04:23Z

We propose an ensemble learning methodology to forecast the future US GDP growth release. Our approach combines a Recurrent Neural Network (RNN) with a Dynamic Factor model accounting for time-variation in mean with a Generalized Autoregressive Score (DFM-GAS). The analysis is based on a set of predictors encompassing a wide range of variables measured at different frequencies. The forecast exercise is aimed at evaluating the predictive ability of each component model of the ensemble by considering variations in mean, potentially caused by recessions affecting the economy. Thus, we show how the combination of RNN and DFM-GAS improves forecasts of the US GDP growth rate in the aftermath of the 2008-09 global financial crisis. We find that a neural network ensemble markedly reduces the root mean squared error for the short-term forecast horizon.

Improving the Prediction of Clinical Success Using Machine Learning

2020-10-05T08:12:33Z

In pharmaceutical research, assessing drug candidates’ odds of success as they move through clinical research often relies on crude methods based on historical data. However, the rapid progress of machine learning offers a new tool to identify the more promising projects. To evaluate its usefulness, we trained and validated several machine learning algorithms on a large database of projects. Using various project descriptors as input data we were able to predict the clinical success and failure rates of projects with an average balanced accuracy of 83% to 89%, which compares favorably with the 56% to 70% balanced accuracy of the method based on historical data. We also identified the variables that contributed most to trial success and used the algorithm to predict the success (or failure) of assets currently in the industry pipeline. We conclude by discussing how pharmaceutical companies can use such a model to improve the quantity and quality of their new drugs, and how the broad adoption of this technology could reduce the industry’s risk profile with important consequences for industry structure, R&D investment, and the cost of innovation.
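
As a reading aid for the metric quoted above: balanced accuracy is the mean of the recall on the positive class and the recall on the negative class, so it is not inflated when one outcome dominates. The sketch below is a minimal illustration only; the function name and the toy labels are hypothetical and are not taken from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on 1s) and specificity (recall on 0s)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    return 0.5 * (tp / pos + tn / neg)


# Hypothetical toy labels: 1 = project succeeded, 0 = project failed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

score = balanced_accuracy(y_true, y_pred)
plain = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
print(f"balanced accuracy = {score:.3f}, plain accuracy = {plain:.3f}")
```

On imbalanced data such as this toy example, plain accuracy is higher than balanced accuracy because the majority class (failures) is easy to get right.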

Measuring the Input Rank in Global Supply Networks

2020-09-02T14:12:07Z

We introduce the Input Rank as a measure of relevance of direct and indirect suppliers in Global Value Chains. We conceive an intermediate input to be more relevant for a downstream buyer if a decrease in that input’s productivity affects that buyer more. In particular, in our framework, the relevance of any input depends on: i) the network position of the supplier relative to the buyer, ii) the patterns of intermediate inputs vs. labor intensities connecting the buyer and the supplier, and iii) the competitive pressures along supply chains. After we compute the Input Rank from both U.S. and world Input-Output tables, we provide useful insights on the crucial role of services inputs as well as on the relatively higher relevance of domestic suppliers and suppliers coming from regionally integrated partners. Finally, we test that the Input Rank is a good predictor of vertical integration choices made by 20,489 U.S. parent companies controlling 154,836 subsidiaries worldwide.

Machine Learning for Zombie Hunting. Firms’ Failures and Financial Constraints.

2020-06-15T11:52:24Z

In this contribution, we exploit machine learning techniques to predict the risk of failure of firms. Then, we propose an empirical definition of zombies as firms that persist in a status of high risk, beyond the highest decile, after which we observe that the chances to transit to lower risk are minimal. We implement a Bayesian Additive Regression Tree with Missing Incorporated in Attributes (BART-MIA), which is specifically useful in our setting as we provide evidence that patterns of undisclosed accounts correlate with firms’ failures. After training our algorithm on 304,906 firms active in Italy in the period 2008-2017, we show how it outperforms proxy models like the Z-scores and the Distance-to-Default, traditional econometric methods, and other widely used machine learning techniques. We document that zombies are on average 21% less productive, 76% smaller, and they increased in times of financial crisis. In general, we argue that our application helps in the design of evidence-based policies in the presence of market failures, for example optimal bankruptcy laws. We believe our framework can help to inform the design of support programs for highly distressed firms after the recent pandemic crisis.

Simulation of Covid-19 epidemic evolution: are compartmental models really predictive?

2020-04-16T09:53:42Z

Computational models for the simulation of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemic evolution would be extremely useful to support authorities in designing healthcare policies and lockdown measures to contain its impact on public health and economy. In Italy, the devised forecasts have been mostly based on a pure data-driven approach, by fitting and extrapolating open data on the epidemic evolution collected by the Italian Civil Protection Center. In this respect, SIR epidemiological models, which start from the description of the nonlinear interactions between population compartments, would be a much more desirable approach to understand and predict the collective emergent response. The present contribution addresses the fundamental question of whether a SIR epidemiological model, suitably enriched with asymptomatic and dead individual compartments, could provide reliable predictions on the epidemic evolution. To this aim, a machine learning approach based on particle swarm optimization (PSO) is proposed to automatically identify the model parameters based on a training set of data of progressively increasing size, considering Lombardy in Italy as a case study. The analysis of the scatter in the forecasts shows that model predictions are quite sensitive to the size of the dataset used for training, and that further data are still required to achieve convergent, and therefore reliable, predictions.
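
To make the idea of a compartmental model with asymptomatic and dead compartments concrete, here is a minimal sketch. It is not the model from the paper: the compartment structure (S, A, I, R, D), the parameter names `beta`, `alpha`, `gamma`, `mu`, and all numerical values are assumptions chosen for illustration, and the PSO parameter-identification step is deliberately left out.

```python
def simulate(beta, alpha, gamma, mu, s0, a0, i0, days, dt=0.1):
    """Forward-Euler integration of a toy S-A-I-R-D model (population fractions).

    beta  : transmission rate (assumed equal for asymptomatic and symptomatic)
    alpha : fraction of new infections that become symptomatic
    gamma : recovery rate, mu : death rate of symptomatic individuals
    Returns one (S, A, I, R, D) tuple per day, including day 0.
    """
    s, a, i, r, d = s0, a0, i0, 0.0, 0.0
    steps_per_day = int(round(1 / dt))
    history = [(s, a, i, r, d)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * (a + i) * dt  # nonlinear S x infectious term
            new_sym = alpha * new_inf          # share developing symptoms
            rec_a = gamma * a * dt             # asymptomatic recoveries
            rec_i = gamma * i * dt             # symptomatic recoveries
            dead = mu * i * dt                 # deaths among symptomatic
            s -= new_inf
            a += (new_inf - new_sym) - rec_a
            i += new_sym - rec_i - dead
            r += rec_a + rec_i
            d += dead
        history.append((s, a, i, r, d))
    return history


# Hypothetical parameter values, for illustration only.
history = simulate(beta=0.3, alpha=0.6, gamma=0.1, mu=0.01,
                   s0=0.999, a0=0.0, i0=0.001, days=60)
print("final state (S, A, I, R, D):", tuple(round(x, 4) for x in history[-1]))
```

In a fitting exercise of the kind the abstract describes, an optimizer would repeatedly call such a simulator with candidate parameters and compare the output with observed data; the sensitivity to training-set size reported above concerns that step.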

Reconnecting statistical physics and combinatorics beyond ensemble equivalence

2018-03-09T13:30:20Z

In statistical physics, the challenging combinatorial enumeration of the configurations of a system subject to hard constraints (microcanonical ensemble) is mapped to a mathematically easier calculation where the constraints are softened (canonical ensemble). However, the mapping is exact only when the size of the system is infinite and if the property of ensemble equivalence (EE), i.e. the asymptotic identity of canonical and microcanonical large deviations, holds. For finite systems, or when EE breaks down, statistical physics is currently believed to provide no answer to the combinatorial problem. In contrast with this expectation, here we establish exact relationships connecting conjugate ensembles in full generality, even for finite system size and when EE does not hold. We also show that in the thermodynamic limit the ensembles are directly related through the matrix of canonical (co)variances of the constraints, plus a correction term that survives only if this matrix has an infinite number of finite eigenvalues. These new relationships restore the possibility of enumerating microcanonical configurations via canonical probabilities, thus reconnecting statistical physics and combinatorics in realms where they were believed to be no longer in mutual correspondence.

The double role of GDP in shaping the structure of the International Trade Network

2018-03-09T13:15:51Z

The International Trade Network (ITN) is the network formed by trade relationships between world countries. The complex structure of the ITN impacts important economic processes such as globalization, competitiveness, and the propagation of instabilities. Modeling the structure of the ITN in terms of simple macroeconomic quantities is therefore of paramount importance. While traditional macroeconomics has mainly used the Gravity Model to characterize the magnitude of trade volumes, modern network theory has predominantly focused on modeling the topology of the ITN. Combining these two complementary approaches is still an open problem. Here we review these approaches and emphasize the double role played by GDP in empirically determining both the existence and the volume of trade linkages. Moreover, we discuss a unified model that exploits these patterns and uses only the GDP as the relevant macroeconomic factor for reproducing both the topology and the link weights of the ITN.

Evidence for Mixed Rationalities in Preference Formation

2018-03-08T17:16:02Z

Understanding the mechanisms underlying the formation of cultural traits is an open challenge. This is intimately connected to cultural dynamics, which has been the focus of a variety of quantitative models. Recent studies have emphasized the importance of connecting those models to empirically accessible snapshots of cultural dynamics. In particular, it has been suggested that empirical cultural states, which differ systematically from randomized counterparts, exhibit properties that are universally present. Hence, a question about the mechanism responsible for the observed patterns naturally arises. This study proposes a stochastic structural model for generating cultural states that retain those robust empirical properties. One ingredient of the model assumes that every individual’s set of traits is partly dictated by one of several universal “rationalities,” informally postulated by several social science theories. The second, new ingredient assumes that, apart from a dominant rationality, each individual also has a certain exposure to the other rationalities. It is shown that both ingredients are required for reproducing the empirical regularities. This suggests that the effects of cultural dynamics in the real world can be described as an interplay of multiple, mixing rationalities, providing indirect evidence for the class of social science theories postulating such a mixing.

Synchronization and functional central limit theorems for interacting reinforced random walks

2018-03-05T09:56:48Z

We obtain Central Limit Theorems in Functional form for a class of time-inhomogeneous interacting random walks. Due to a reinforcement mechanism and interaction, the walks are strongly correlated and converge almost surely to the same, possibly random, limit. We study random walks interacting through a mean-field rule and compare the rate at which they converge to their limit with the rate of synchronization, i.e. the rate at which their mutual distances converge to zero. We show that, under certain conditions, synchronization is faster than convergence. Even if our focus is on theoretical results, we propose as main motivations two contexts in which such results could directly apply: urn models and opinion dynamics in a random network evolving via preferential attachment.

Management Science for Complex Networks and Smart Water Grids: a case study in Italy

2018-03-02T11:48:35Z

As the effects of climate change unfold and become more visible, infrastructures, especially those related to the distribution of water, are the most exposed to the deep changes expected in the next years. Water is fundamental for people, and for infrastructures like energy, waste, and food production. Water sustainability is then a fundamental aspect to address by an efficient use of the resources and the maintenance of quality standards adopting a management science perspective. Therefore, water industry and infrastructure need a deep transformation, and we claim that this transformation is the result of a synergy between different fields of research. Our paper presents a managerial framework based on complex systems, used to reshape and optimize in different senses the performance of the water infrastructure through the development of a case study in Italy. Our framework, called Acque 2.0 (Water 2.0), is based on these pillars: 1. The current and future scenarios for water management 2. Management science and water 3. Digitalization of water infrastructure 4. Increase the network resiliency and quality of service using complex networks 5. Use of predictive maintenance methods based on network simulations and big data 6. Involve utilities, regulators, policy makers, and citizens 7. Remarks and conclusion. The case study will be developed in the municipality of Viareggio, characterized by old infrastructures, seasonal variation of population, and water scarcity.

The Network of U.S. Mutual Fund Investments: Diversification, Similarity and Fragility throughout the Global Financial Crisis

2018-01-16T10:04:27Z

Network theory proved recently to be useful in the quantification of many properties of financial systems. The analysis of the structure of investment portfolios is a major application since their eventual correlation and overlap impact the actual risk diversification by individual investors. We investigate the bipartite network of US mutual fund portfolios and their assets. We follow its evolution during the Global Financial Crisis and analyse the interplay between diversification, as understood in classical portfolio theory, and similarity of the investments of different funds. We show that, on average, portfolios have become more diversified and less similar during the crisis. However, we also find that large overlap is far more likely than expected from models of random allocation of investments. This indicates the existence of strong correlations between fund portfolio strategies. We introduce a simplified model of propagation of financial shocks, which we exploit to show that a systemic risk component originates from the similarity of portfolios. The network is still vulnerable after the crisis because of this effect, despite the increase in the diversification of portfolios. Our results indicate that diversification may even increase systemic risk when funds diversify in the same way. Diversification and similarity can play antagonistic roles and the trade-off between the two should be taken into account to properly assess systemic risk.

A Complex Network Approach for the Estimation of the Energy Demand of Electric Mobility

2018-01-16T09:53:09Z

We study how renewable energy impacts regional infrastructures considering the full deployment of electric mobility at that scale. We use the island of Sardinia in Italy as a paradigmatic case study of a semi-closed system from both the energy and the mobility point of view. Human mobility patterns are estimated by means of census data listing the mobility dynamics of about 700,000 vehicles; the energy demand is estimated by modeling the charging behavior of electric vehicle owners. Here we show that current renewable energy production of Sardinia is able to sustain the commuter mobility even in the theoretical case of a full switch from internal combustion vehicles to electric ones. Centrality measures from network theory on the reconstructed network of commuter trips allow us to identify the most important areas (hubs) involved in regional mobility. The analysis of the expected energy flows reveals long-range effects on infrastructures outside metropolitan areas and points out that the most relevant imbalances are caused by spatial segregation between production and consumption areas. Finally, results suggest the adoption of planning actions supporting the installation of renewable energy plants in the areas most involved in commuting mobility, avoiding spatial segregation between consumption and generation areas.

Spatio-Temporal Patterns of the International Merger and Acquisition Network

2018-01-15T08:04:11Z

This paper analyses the world web of mergers and acquisitions (M&As) using a complex network approach. We use data of M&As to build a temporal sequence of binary and weighted-directed networks for the period 1995-2010 and 224 countries (nodes) connected according to their M&As flows (links). We study different geographical and temporal aspects of the international M&A network (IMAN), building sequences of filtered sub-networks whose links belong to specific intervals of distance or time. Given that M&As and trade are complementary ways of reaching foreign markets, we perform our analysis using statistics employed for the study of the international trade network (ITN), highlighting the similarities and differences between the ITN and the IMAN. In contrast to the ITN, the IMAN is a low density network characterized by a persistent giant component with many external nodes and low reciprocity. Clustering patterns are very heterogeneous and dynamic. High-income economies are the main acquirers and are characterized by high connectivity, implying that most countries are targets of a few acquirers. Like in the ITN, geographical distance strongly impacts the structure of the IMAN: link-weights and node degrees have a non-linear relation with distance, and an assortative pattern is present at short distances.

Consistency and Trends of Technological Innovations: A Network Approach to the International Patent Classification Data

2017-12-28T11:03:30Z

Classifying patents by the technology areas they pertain to is important to enable information search and facilitate policy analysis and socio-economic studies. Based on the OECD Triadic Patent Family database, this study constructs a cohort network based on the grouping of IPC subclasses in the same patent families, and a citation network based on citations between subclasses of patent families citing each other. This paper presents a systematic analysis approach which obtains naturally formed network clusters identified using a Lumped Markov Chain method, extracts community keys traceable over time, and investigates two important community characteristics: consistency and changing trends. The results are verified against several other methods, including a recent study measuring patent text similarity. The proposed method contributes to the literature a network-based approach to study the endogenous community properties of an exogenously devised classification system. The application of this method may improve accuracy and efficiency of the IPC search platform and help detect the emergence of new technologies.

Mapping social dynamics on Facebook: The Brexit debate

2017-08-04T11:32:46Z

Nowadays users get informed and shape their opinion through social media. However, the disintermediated access to contents does not guarantee quality of information. Selective exposure and confirmation bias, indeed, have been shown to play a pivotal role in content consumption and information spreading. Users tend to select information adhering to (and reinforcing) their worldview and to ignore dissenting information. This pattern elicits the formation of polarized groups (i.e., echo chambers) where the interaction with like-minded people might even reinforce polarization. In this work we address news consumption around Brexit in the UK on Facebook. In particular, we perform a massive analysis on more than 1 million users interacting with Brexit related posts from the main news providers between January and July 2016. We show that consumption patterns elicit the emergence of two distinct communities of news outlets. Furthermore, to better characterize inner group dynamics, we introduce a new technique which combines automatic topic extraction and sentiment analysis. We compare how the same topics are presented on posts and the related emotional response on comments, finding significant differences in both echo chambers and that polarization influences the perception of topics. Our results provide important insights about the determinants of polarization and evolution of core narratives on online debating.

Investigating the interplay between fundamentals of national research systems: Performance, investments and international collaborations

2017-08-04T11:26:13Z

We discuss, at the macro-level of nations, the contribution of research funding and rate of international collaboration to research performance, with important implications for the “science of science policy”. In particular, we cross-correlate suitable measures of these quantities with a scientometric-based assessment of scientific success, studying both the average performance of nations and their temporal dynamics in the space defined by these variables during the last decade. We find significant differences among nations in terms of efficiency in turning (financial) input into bibliometrically measurable output, and we confirm that growth of international collaboration positively correlates with scientific success, with significant benefits brought by EU integration policies. Various geo-cultural clusters of nations naturally emerge from our analysis. We critically discuss the factors that potentially determine the observed patterns.

Entangling Credit and Funding Shocks in Interbank Markets

2017-08-04T11:22:51Z

Credit and liquidity shocks represent the main channels of financial contagion for interbank lending markets. On one hand, banks face potential losses whenever their counterparties are under distress and thus unable to fulfill their obligations. On the other hand, solvency constraints may force banks to recover lost funding by selling their illiquid assets, resulting in effective losses in the presence of fire sales, that is, when funding shortcomings are widespread over the market. Because of the complex structure of the network of interbank exposures, these losses reverberate among banks and eventually get amplified, with potentially catastrophic consequences for the whole financial system. Inspired by the recently proposed Debt Rank, in this work we define a systemic risk metric that estimates the potential amplification of losses in interbank markets accounting for both credit and liquidity contagion channels: the Debt-Solvency Rank. We implement this framework on a dataset of 183 European banks that were publicly traded between 2004 and 2013, showing indeed that liquidity spillovers substantially increase systemic risk, and thus cannot be neglected in stress-test scenarios. We also provide additional evidence that the interbank market was extremely fragile up to the global financial crisis, becoming slightly more robust only afterwards.

Statistically validated network of portfolio overlaps and systemic risk

2017-08-04T10:57:35Z

Common asset holding by financial institutions (portfolio overlap) is nowadays regarded as an important channel for financial contagion with the potential to trigger fire sales and severe losses at the systemic level. We propose a method to assess the statistical significance of the overlap between heterogeneously diversified portfolios, which we use to build a validated network of financial institutions where links indicate potential contagion channels. The method is implemented on a historical database of institutional holdings ranging from 1999 to the end of 2013, but can be applied to any bipartite network. We find that the proportion of validated links (i.e. of significant overlaps) increased steadily before the 2007–2008 financial crisis and reached a maximum when the crisis occurred. We argue that the nature of this measure implies that systemic risk from fire sales liquidation was maximal at that time. After a sharp drop in 2008, systemic risk resumed its growth in 2009, with a notable acceleration in 2013. We finally show that market trends tend to be amplified in the portfolios identified by the algorithm, such that it is possible to have an informative signal about institutions that are about to suffer (enjoy) the most significant losses (gains).

Who's who in global value chains? A weighted network approach

2017-08-04T07:29:15Z

This paper represents global value chains (GVCs) as weighted networks of foreign value added in exports, which allows for the identification of the specific roles of countries and for the quantification of their relative importance over time. A major structural change occurred in the beginning of the century as GVCs steadily turned into global networks, amid an unprecedented growth of value-added flows and the rise of China as a major player. First-order network metrics highlight the vital but also distinct roles of Germany, the US, China and Japan in the international organisation of production. Germany is very relevant both as a user and as a supplier of foreign inputs, while the US acts mostly as a supplier of value added to other countries. Second-order properties of networks shed light on the complex architecture of GVCs, notably in terms of cyclical triangular relationships. Germany's GVCs mostly root in direct relationships, while Japanese ones typically involve more than two countries.

Organization and hierarchy of the human functional brain network lead to a chain-like core

2017-08-03T07:18:17Z

The brain is a paradigmatic example of a complex system: its functionality emerges as a global property of local mesoscopic and microscopic interactions. Complex network theory makes it possible to elicit the functional architecture of the brain in terms of links (correlations) between nodes (grey matter regions) and to extract information out of the noise. Here we present the analysis of functional magnetic resonance imaging data from forty healthy humans at rest for the investigation of the basal scaffold of the functional brain network organization. We show how brain regions tend to coordinate by forming a highly hierarchical chain-like structure of homogeneously clustered anatomical areas. A maximum spanning tree approach revealed the centrality of the occipital cortex and the peculiar aggregation of cerebellar regions to form a closed core. We also report the hierarchy of network segregation and the level of cluster integration as a function of the connectivity strength between brain regions.
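
For readers unfamiliar with the maximum spanning tree mentioned above, the sketch below shows the general technique on a symmetric matrix of link weights: edges are added from strongest to weakest, skipping any edge that would close a loop. It uses Kruskal's algorithm as one standard way to do this; the abstract does not say which algorithm was used, and the 4x4 correlation matrix is a made-up toy example, not brain data.

```python
def maximum_spanning_tree(weights):
    """Kruskal's algorithm on a symmetric weight matrix.

    Returns the tree as a list of (i, j, weight) edges with i < j.
    """
    n = len(weights)
    parent = list(range(n))  # union-find forest over the nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All undirected edges, strongest first.
    edges = sorted(
        ((weights[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    tree = []
    for w, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # edge joins two components: keep it
            parent[root_i] = root_j
            tree.append((i, j, w))
    return tree


# Hypothetical correlation matrix between four regions.
corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.3],
    [0.2, 0.8, 1.0, 0.7],
    [0.1, 0.3, 0.7, 1.0],
]
tree = maximum_spanning_tree(corr)
print(tree)
```

In this toy matrix the strongest links connect consecutive regions, so the resulting tree is a simple chain 0-1-2-3.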

Networks of reinforced stochastic processes: Asymptotics for the empirical means

2017-05-08T12:57:15Z

This work deals with systems of interacting reinforced stochastic processes, where each process X^j = (X_{n,j})_n is located at a vertex j of a finite weighted directed graph, and it can be interpreted as the sequence of “actions” adopted by an agent j of the network. The interaction among the evolving dynamics of these processes depends on the weighted adjacency matrix W associated with the underlying graph: indeed, the probability that an agent j chooses a certain action depends on its personal “inclination” Z_{n,j} and on the inclinations Z_{n,h}, with h not equal to j, of the other agents according to the elements of W. Asymptotic results for the stochastic processes of the personal inclinations Z^j = (Z_{n,j})_n have been the subject of studies in recent papers (e.g. [2, 21]), while the asymptotic behavior of the stochastic processes of the actions (X_{n,j})_n has not yet been studied. In this paper, we fill this gap by characterizing the asymptotic behavior of the empirical means N_{n,j} = \sum_{k=1}^n X_{k,j} /n, proving their almost sure synchronization and some central limit theorems in the sense of stable convergence. Moreover, we discuss some statistical applications of these convergence results concerning confidence intervals for the random limit toward which all the processes of the system converge and tools to make inference on the matrix W.

Modeling networks with a growing feature-structure

2017-05-08T12:56:40Z

We present a new network model accounting for multidimensional assortativity. Each node is characterized by a number of features and the probability of a link between two nodes depends on common features. We do not fix a priori the total number of possible features. The bipartite network of the nodes and the features evolves according to a stochastic dynamics that depends on three parameters that respectively regulate the preferential attachment in the transmission of the features to the nodes, the number of new features per node, and the power-law behavior of the total number of observed features. Our model also takes into account a mechanism of triadic closure. We provide theoretical results and statistical estimators for the parameters of the model. We validate our approach by means of simulations and an empirical analysis of a network of scientific collaborations.

Synchronization of Reinforced Stochastic Processes with a Network-based Interaction

2017-05-08T12:28:10Z

Randomly evolving systems composed of elements which interact with each other have always been of great interest in several scientific fields. This work deals with the synchronization phenomenon, which could be roughly defined as the tendency of different components to adopt a common behavior. We continue the study of a model of interacting stochastic processes with reinforcement that has recently been introduced in [21]. Generally speaking, by reinforcement we mean any mechanism for which the probability that a given event occurs has an increasing dependence on the number of times that events of the same type occurred in the past. The particularity of systems of such interacting stochastic processes is that synchronization is induced along time by the reinforcement mechanism itself and does not require a large-scale limit. We focus on the relationship between the topology of the network of the interactions and the long-time synchronization phenomenon. After proving the almost sure synchronization, we provide some CLTs in the sense of stable convergence that establish the convergence rates and the asymptotic distributions for both convergence to the common limit and synchronization. The obtained results lead to the construction of asymptotic confidence intervals for the limit random variable and of statistical tests to make inference on the topology of the network.

Modeling confirmation bias and polarization

2017-04-18T08:57:40Z

Online users tend to select claims that adhere to their system of beliefs and to ignore dissenting information. Confirmation bias, indeed, plays a pivotal role in viral phenomena. Furthermore, the wide availability of content on the web fosters the aggregation of like-minded people where debates tend to enforce group polarization. Such a configuration might alter the public debate and thus the formation of the public opinion. In this paper we provide a mathematical model to study online social debates and the related polarization dynamics. We assume the basic updating rule of the Bounded Confidence Model (BCM) and we develop two variations: a) the Rewire with Bounded Confidence Model (RBCM), in which discordant links are broken until convergence is reached; and b) the Unbounded Confidence Model, under which the interaction among discordant pairs of users is allowed even with a negative feedback, either with the rewiring step (RUCM) or without it (UCM). From numerical simulations we find that the new models (UCM and RUCM), unlike the BCM, are able to explain the coexistence of two stable final opinions, often observed in reality. Lastly, we present a mean field approximation of the newly introduced models.
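
The basic bounded-confidence updating rule referred to above can be sketched as follows, assuming the commonly used pairwise (Deffuant-style) form: two agents interact only if their opinions differ by less than a tolerance, and if so each moves toward the other. The names `epsilon` (tolerance) and `mu` (convergence rate), the fully mixed population, and the parameter values are assumptions for illustration; the rewiring and negative-feedback variants from the paper are not implemented here.

```python
import random


def bcm_step(opinions, i, j, epsilon, mu):
    """One bounded-confidence interaction between agents i and j (in place).

    Returns True if the pair was close enough to interact.
    """
    diff = opinions[j] - opinions[i]
    if abs(diff) < epsilon:
        opinions[i] += mu * diff  # i moves toward j
        opinions[j] -= mu * diff  # j moves toward i by the same amount
        return True
    return False  # discordant pair: opinions are left unchanged


def run_bcm(n_agents, epsilon, mu, n_steps, seed=0):
    """Random pairwise interactions in a fully mixed population (assumption)."""
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n_agents)]
    for _ in range(n_steps):
        i, j = rng.sample(range(n_agents), 2)
        bcm_step(opinions, i, j, epsilon, mu)
    return opinions


final = run_bcm(n_agents=50, epsilon=0.2, mu=0.5, n_steps=2000, seed=1)
print(sorted(round(x, 2) for x in final))
```

Because the two agents move by equal and opposite amounts, the mean opinion is conserved under this rule; only the spread of opinions changes over time.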

It’s Always April Fools’ Day! On the Difficulty of Social Network Misinformation Classification via Propagation Features

2017-04-18T08:50:07Z

Given the huge impact that Online Social Networks (OSN) had in the way people get informed and form their opinion, they became an attractive playground for malicious entities that want to spread misinformation, and leverage their effect. In fact, misinformation easily spreads on OSN and is a huge threat for modern society, possibly influencing also the outcome of elections, or even putting people’s lives at risk (e.g., spreading “anti-vaccines” misinformation). Therefore, it is of paramount importance for our society to have some sort of “validation” on information spreading through OSN. The need for a wide-scale validation would greatly benefit from automatic tools. In this paper, we show that it is difficult to carry out an automatic classification of misinformation considering only structural properties of content propagation cascades. We focus on structural properties, because they would be inherently difficult to manipulate with the aim of circumventing classification systems. To support our claim, we carry out an extensive evaluation on Facebook posts belonging to conspiracy theories (as representative of misinformation), and scientific news (representative of fact-checked content). Our findings show that conspiracy content actually reverberates in a way which is hard to distinguish from the one scientific content does: for the classification mechanisms we investigated, classification F1-score never exceeds 0.65 during content propagation stages, and is still less than 0.7 even after propagation is complete.

Public discourse and news consumption on online social media: A quantitative, cross-platform analysis of the Italian Referendum

2017-04-18T08:40:58Z

The rising attention to the spreading of fake news and unsubstantiated rumors on online social media and the pivotal role played by confirmation bias led researchers to investigate different aspects of the phenomenon. Experimental evidence showed that confirmatory information gets accepted even if it contains deliberately false claims, while dissenting information is mainly ignored or might even increase group polarization. It seems reasonable that, to address the misinformation problem properly, we have to understand the main determinants behind content consumption and the emergence of narratives on online social media. In this paper we address such a challenge by focusing on the discussion around the Italian Constitutional Referendum by conducting a quantitative, cross-platform analysis on both Facebook public pages and Twitter accounts. We observe the spontaneous emergence of well-separated communities on both platforms. Such segregation is completely spontaneous, since no categorization of contents was performed a priori. By exploring the dynamics behind the discussion, we find that users tend to restrict their attention to a specific set of Facebook pages/Twitter accounts. Finally, taking advantage of automatic topic extraction and sentiment analysis techniques, we are able to identify the most controversial topics inside and across both platforms. We measure the distance between how a certain topic is presented in the posts/tweets and the related emotional response of users. Our results provide interesting insights for the understanding of the evolution of the core narratives behind different echo chambers and for the early detection of massive viral phenomena around false claims.

Mapping social dynamics on Facebook: The Brexit debate

2017-04-18T08:25:38Z

Central limit theorems for a hypergeometric randomly reinforced urn

2016-11-14T11:56:25Z

We consider a variant of the randomly reinforced urn where more balls can be simultaneously drawn out and balls of different colors can be simultaneously added. More precisely, at each time-step, the conditional distribution of the number of extracted balls of a certain color given the past is assumed to be hypergeometric. We prove some central limit theorems in the sense of stable convergence and of almost sure conditional convergence, which are stronger than convergence in distribution. The proven results provide asymptotic confidence intervals for the limit proportion, whose distribution is generally unknown. Moreover, we also consider the case of more urns subjected to some random common factors.

Introduzione alla nozione di convergenza stabile e sue varianti (Introduction to the notion of stable convergence and its variants)

2016-05-31T12:09:02Z

The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to Human Migration

2016-05-11T10:39:38Z

Disentangling the relations between human migrations and water resources is relevant for food security and trade policy in water-scarce countries. It is commonly believed that human migrations are beneficial to the water endowments of origin countries because they reduce the pressure on local resources. We show here that such a belief is oversimplistic. We reframe the problem by considering the international food trade and the corresponding virtual water fluxes, which quantify the water used for the production of traded agricultural commodities. By means of robust analytical tools, we show that migrants strengthen the commercial links between countries, triggering trade fluxes caused by food consumption habits persisting after migration. Thus migrants significantly increase the virtual water fluxes and the use of water in the countries of origin. The flux ascribable to each migrant, i.e. the "water suitcase", is found to have increased from 321 m³/y in 1990 to 1367 m³/y in 2010. A comparison with the water footprint of individuals shows that where the water suitcase exceeds the water footprint of inhabitants, migrations turn out to be detrimental to the water endowments of origin countries, challenging the common perception that migrations tend to relieve the pressure on the local (water) resources of origin countries.

A Network Model characterized by a Latent Attribute Structure with Competition

2016-04-28T07:52:35Z

The quest for a model that is able to explain, describe, analyze and simulate real-world complex networks is of utmost practical, as well as theoretical, interest. In fact, networks can be a natural way to represent many phenomena; often, they arise from a complex interweaving of some features of the nodes. For example, in a co-authorship network, a link stems more easily between authors with similar interests; similarly, in a genetic regulatory network, links are affected by the different biological functions of the regulators. In this paper we introduce and study a novel network model that is based on a latent attribute structure: this model, inspired by a generalization of the Indian Buffet process, is simple and contains a small number of parameters, with a clear and intuitive role. Each node is characterized by a number of features and the probability of the existence of an edge between two nodes depends on the features they share; the number of possible features is not fixed a priori and can grow indefinitely. Moreover, a random fitness parameter is introduced for each node in order to determine its ability to transmit its own features to other nodes; this behavior is added on top of a process of Indian-Buffet type. Because of the fitness property, a node’s connectivity does not depend on its age alone, so that “young but fit” nodes are also able to compete and succeed in propagating their features and acquiring links. We also show how, considering the resulting bipartite node-attribute network, it is possible to gain some insight about which nodes were originally the most “fit”. Our model for this bipartite network depends on a few parameters that are characterized by their straightforward interpretation and by the availability of proper estimators. Even if the parameters are easy to interpret and tune, the model is general enough to represent complex phenomena (e.g., homophily, heterophily, or any interplay between features).
We provide some theoretical as well as experimental results regarding the power-law behavior of the model and the proposed tools for the estimation of the parameters. We also show, through a number of experiments, how the proposed model naturally captures most local and global properties (e.g., degree distributions, connectivity and distance distributions) real networks exhibit.

Asymptotics for randomly reinforced urns with random barriers

2016-02-24T12:03:56Z

An urn contains black and red balls. Let Zn be the proportion of black balls at time n and 0≤L [...] L, then bn is replaced together with a random number Rn of red balls. Otherwise, no additional balls are added, and bn alone is replaced. In this paper we assume that Rn=Bn. Then, under mild conditions, it is shown that Zn → Z a.s. for some random variable Z, and Dn ≔ √n(Zn − Z) → [...]

Fluctuation Theorems for Synchronization of Interacting Polya's urns

2016-01-25T09:41:37Z

We consider a model of N two-color urns in which the reinforcement of each urn depends also on the content of all the other urns. This interaction is of mean-field type and it is tuned by a parameter \alpha \in [0,1]; in particular, for \alpha=0 the N urns behave as N independent Polya's urns. For \alpha>0 urns synchronize, in the sense that the fraction of balls of a given color converges a.s. to the same (random) limit in all urns. In this paper we study fluctuations around this synchronized regime. The scaling of these fluctuations depends on the parameter \alpha. In particular, the standard scaling t^{-1/2} appears only for \alpha>1/2. For \alpha\geq 1/2 we also determine the limit distribution of the rescaled fluctuations. We use the notion of stable convergence, which is stronger than convergence in distribution.
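
A small simulation can make the synchronization described above concrete. The sketch below assumes one plausible mean-field rule: at each step every urn receives one ball, whose color is drawn with probability equal to a mix of the urn's own fraction (weight 1 − \alpha) and the average fraction over all urns (weight \alpha). This specific rule, the initial composition of one ball per color, and the name `simulate_urns` are assumptions made for illustration; the abstract does not state the exact reinforcement mechanism.

```python
import random

def simulate_urns(n_urns=5, alpha=0.5, steps=20_000, seed=0):
    """Simulate interacting two-color urns under an assumed mean-field rule.

    Returns the final fraction of 'red' balls in each urn.
    """
    rng = random.Random(seed)
    red = [1] * n_urns      # each urn starts with one red ball ...
    total = [2] * n_urns    # ... and one ball of the other color
    for _ in range(steps):
        fractions = [r / t for r, t in zip(red, total)]
        mean_fraction = sum(fractions) / n_urns
        for i in range(n_urns):
            # alpha = 0 recovers independent Polya urns; alpha = 1 uses only the mean.
            p_red = (1 - alpha) * fractions[i] + alpha * mean_fraction
            if rng.random() < p_red:
                red[i] += 1
            total[i] += 1
    return [r / t for r, t in zip(red, total)]
```

Comparing the spread of the returned fractions for `alpha=0` and for a positive `alpha` gives a rough picture of the difference between independent and synchronized urns under this assumed rule.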

Viral Misinformation: The Role of Homophily and Polarization

2016-01-20T08:59:10Z

The spreading of misinformation online

2016-01-20T08:51:59Z

The wide availability of user-provided content in online social media facilitates the aggregation of people around common interests, worldviews, and narratives. However, the World Wide Web (WWW) also allows for the rapid dissemination of unsubstantiated rumors and conspiracy theories that often elicit rapid, large, but naive social responses such as the recent case of Jade Helm 15, where a simple military exercise turned out to be perceived as the beginning of a new civil war in the United States. In this work, we address the determinants governing misinformation spreading through a thorough quantitative analysis. In particular, we focus on how Facebook users consume information related to two distinct narratives: scientific and conspiracy news. We find that, although consumers of scientific and conspiracy stories present similar consumption patterns with respect to content, cascade dynamics differ. Selective exposure to content is the primary driver of content diffusion and generates the formation of homogeneous clusters, i.e., “echo chambers.” Indeed, homogeneity appears to be the primary driver for the diffusion of contents and each echo chamber has its own cascade dynamics. Finally, we introduce a data-driven percolation model mimicking rumor spreading and we show that homogeneity and polarization are the main determinants for predicting cascades’ size.

Memory Kernel in the Expertise of Chess Players

2015-11-16T09:10:37Z

In this work we investigate a mechanism for the emergence of long-range time correlations observed in a chronologically ordered database of chess games. We analyze a modified Yule-Simon preferential growth process proposed by Cattuto et al., which includes memory effects by means of a probabilistic kernel. According to the Hurst exponent of different constructed time series from the record of games, artificially generated databases from the model exhibit similar long-range correlations. In addition, the inter-event time frequency distribution is well reproduced by the model for realistic parameter values. In particular, we find the inter-event time distribution properties to be correlated with the expertise of the chess players through the memory kernel extension. Our work provides new information about the strategies implemented by players with different levels of expertise, showing an interesting example of how popularities and long-range correlations build together during a collective learning process.

Correlated bursts and the role of memory range

2015-11-16T09:04:49Z

Inhomogeneous temporal processes in natural and social phenomena have been described by bursts, that is, rapidly occurring events within short time periods alternating with long periods of low activity. In addition to the analysis of heavy-tailed inter-event time distributions, higher-order correlations between inter-event times, called correlated bursts, have been studied only recently. As the possible mechanisms underlying such correlated bursts are far from being fully understood, we devise a simple model for correlated bursts by using a self-exciting point process with variable memory range. Here the probability that a new event occurs is determined by a memory function that is the sum of decaying memories of the past events. In order to incorporate the noise and/or limited memory capacity of systems, we apply two memory loss mechanisms, namely either a fixed number or a variable number of memories. By using theoretical analysis and numerical simulations we find that an excessive amount of memory effect may lead to a Poissonian process, which implies that for the memory effect there exists an intermediate range that will generate correlated bursts of magnitude comparable to empirical findings. Hence our results provide a deeper understanding of how long-range memory affects correlated bursts.

Contact Patterns in a High School: A Comparison between Data Collected Using Wearable Sensors, Contact Diaries and Friendship Surveys

2015-11-09T15:05:53Z

Given their importance in shaping social networks and determining how information or transmissible diseases propagate in a population, interactions between individuals are the subject of many data collection efforts. To this aim, different methods are commonly used, ranging from diaries and surveys to decentralised infrastructures based on wearable sensors. Each of these methods has advantages and limitations, but they are rarely compared in a given setting. Moreover, as surveys targeting friendship relations might suffer less from memory biases than contact diaries, it is interesting to explore how actual contact patterns occurring in day-to-day life compare with friendship relations and with online social links. Here we make progress in these directions by leveraging data collected in a French high school and concerning (i) face-to-face contacts measured by two concurrent methods, namely wearable sensors and contact diaries, (ii) self-reported friendship surveys, and (iii) online social links. We compare the resulting data sets and find that most short contacts are not reported in diaries while long contacts have a large reporting probability, and that the durations of contacts tend to be overestimated in the diaries. Moreover, measured contacts corresponding to reported friendship can have durations of any length but all long contacts do correspond to a reported friendship. On the contrary, online links that are not also reported in the friendship survey correspond to short face-to-face contacts, highlighting the difference of nature between reported friendships and online links. Diaries and surveys moreover suffer from a low sampling rate, as many students did not fill them in, showing that the sensor-based platform had a higher acceptability.
We also show that, despite the biases of diaries and surveys, the overall structure of the contact network, as quantified by the mixing patterns between classes, is correctly captured by both networks of self-reported contacts and of friendships, and we investigate the correlations between the number of neighbors of individuals in the three networks. Overall, diaries and surveys tend to yield a correct picture of the global structural organization of the contact network, albeit with far fewer links, and give access to a sort of backbone of the contact network corresponding to the strongest links, i.e., the contacts of longest cumulative durations.

Evolutionary network games: equilibria from imitation and best-response dynamics

2015-11-06T13:27:40Z

We consider games of strategic substitutes and strategic complements on networks. We introduce two different evolutionary dynamics in order to refine their multiplicity of equilibria, and we analyse the system through a mean field approach. We find that for the best-shot game, taken as a model for substitutes, a replicator-like dynamics does not lead to Nash equilibria, whereas it leads to unique equilibria (full cooperation or full defection, depending on the initial condition and the game parameter) for complements, represented by a coordination game. On the other hand, when the dynamics becomes more cognitively demanding in the form of a best response evolution, predictions are always Nash equilibria (at least when individuals are fully rational): For the best-shot game we find equilibria with a definite value of the fraction of contributors, whereas for the coordination game symmetric equilibria arise only for low or high initial fractions of cooperators. We also extend our study by considering complex heterogeneous topologies, and show that the nature of the selected equilibria does not change for the best-shot game. However for coordination games we reveal an important difference, namely that on infinitely large scale-free networks cooperation arises for any value of the incentive to cooperate.

Network-Driven Reputation in Online Scientific Communities

2015-11-06T12:41:43Z

The ever-increasing quantity and complexity of scientific production have made it difficult for researchers to keep track of advances in their own fields. This, together with growing popularity of online scientific communities, calls for the development of effective information filtering tools. We propose here an algorithm which simultaneously computes reputation of users and fitness of papers in a bipartite network representing an online scientific community. Evaluation on artificially-generated data and real data from the Econophysics Forum is used to determine the method's best-performing variants. We show that when the input data is extended to a multilayer network including users, papers and authors and the algorithm is correspondingly modified, the resulting performance improves on multiple levels. In particular, top papers have higher citation count and top authors have higher h-index than top papers and top authors chosen by other algorithms. We finally show that our algorithm is robust against persistent authors (spammers) which makes the method readily applicable to the existing online scientific communities.

Learning dynamics explains human behaviour in Prisoner’s Dilemma on networks

2015-11-06T12:36:51Z

Cooperative behaviour lies at the very basis of human societies, yet its evolutionary origin remains a key unsolved puzzle. Whereas reciprocity or conditional cooperation is one of the most prominent mechanisms proposed to explain the emergence of cooperation in social dilemmas, recent experimental findings on networked Prisoner’s Dilemma games suggest that conditional cooperation also depends on the previous action of the player, namely on the ‘mood’ the player is currently in. Roughly, a majority of people behave as conditional cooperators if they cooperated in the past, whereas they ignore the context and free ride with high probability if they did not. However, the ultimate origin of this behaviour represents a conundrum itself. Here, we aim specifically to provide an evolutionary explanation of moody conditional cooperation (MCC). To this end, we perform an extensive analysis of different evolutionary dynamics for players’ behavioural traits, ranging from standard processes used in game theory based on pay-off comparison to others that include non-economic or social factors. Our results show that only a dynamic built upon reinforcement learning is able to give rise to evolutionarily stable MCC, and ultimately to reproduce the human behaviours observed in the experiments.

Adaptive social recommendation in a multiple category landscape

2015-11-06T12:28:26Z

People in the Internet era have to cope with the information overload, striving to find what they are interested in, and usually face this situation by following a limited number of sources or friends that best match their interests. A recent line of research, namely adaptive social recommendation, has therefore emerged to optimize the information propagation in social networks and provide users with personalized recommendations. Validation of these methods by agent-based simulations often assumes that the tastes of users can be represented by binary vectors, with entries denoting users’ preferences. In this work we introduce a more realistic assumption that users’ tastes are modeled by multiple vectors. We show that within this framework the social recommendation process has a poor outcome. Accordingly, we design novel measures of users’ taste similarity that can substantially improve the precision of the recommender system. Finally, we discuss the issue of enhancing the recommendations’ diversity while preserving their accuracy.

Measuring Quality, Reputation and Trust in Online Communities

2015-11-06T11:17:03Z

In the Internet era, information overload and the challenge of detecting quality content have raised the issue of how to rank both resources and users in online communities. In this paper we develop a general ranking method that can simultaneously evaluate users’ reputation and objects’ quality in an iterative procedure, and that exploits the trust relationships and social acquaintances of users as an additional source of information. We test our method on two real online communities, the EconoPhysics forum and the Last.fm music catalogue, and determine how different variants of the algorithm influence the resultant ranking. We show the benefits of considering trust relationships, and define the form of the algorithm best suited to common situations.

Emergence of Scale-Free Leadership Structure in Social Recommender Systems

2015-11-06T11:04:00Z

The study of the organization of social networks is important for the understanding of opinion formation, rumor spreading, and the emergence of trends and fashion. This paper reports empirical analysis of networks extracted from four leading sites with social functionality (Delicious, Flickr, Twitter and YouTube) and shows that they all display a scale-free leadership structure. To reproduce this feature, we propose an adaptive network model driven by social recommending. Artificial agent-based simulations of this model highlight a “good get richer” mechanism where users with broad interests and good judgments are likely to become popular leaders for the others. Simulations also indicate that the studied social recommendation mechanism can gradually improve the user experience by adapting to tastes of its users. Finally we outline implications for real online resource-sharing systems.

Early-warning signals of topological collapse in interbank networks

2015-11-05T12:03:38Z

The financial crisis clearly illustrated the importance of characterizing the level of ‘systemic’ risk associated with an entire credit network, rather than with single institutions. However, the interplay between financial distress and topological changes is still poorly understood. Here we analyze the quarterly interbank exposures among Dutch banks over the period 1998–2008, ending with the crisis. After controlling for the link density, many topological properties display an abrupt change in 2008, providing a clear – but unpredictable – signature of the crisis. By contrast, if the heterogeneity of banks' connectivity is controlled for, the same properties show a gradual transition to the crisis, starting in 2005 and preceded by an even earlier period during which anomalous debt loops could have led to the underestimation of counter-party risk. These early-warning signals are undetectable if the network is reconstructed from partial bank-specific data, as routinely done. We discuss important implications for bank regulatory policies.

Reciprocity of weighted networks

2015-11-05T11:58:38Z

In directed networks, reciprocal links have dramatic effects on dynamical processes, network growth, and higher-order structures such as motifs and communities. While the reciprocity of binary networks has been extensively studied, that of weighted networks is still poorly understood, implying an ever-increasing gap between the availability of weighted network data and our understanding of their dyadic properties. Here we introduce a general approach to the reciprocity of weighted networks, and define quantities and null models that consistently capture empirical reciprocity patterns at different structural levels. We show that, counter-intuitively, previous reciprocity measures based on the similarity of mutual weights are uninformative. By contrast, our measures allow us to consistently classify different weighted networks according to their reciprocity, track the evolution of a network's reciprocity over time, identify patterns at the level of dyads and vertices, and distinguish the effects of flux (im)balances or other (a)symmetries from a true tendency towards (anti-)reciprocation.

The Role of Distances in the World Trade Web

2015-11-05T11:48:40Z

In the economic literature, geographic distances are considered fundamental factors to be included in any theoretical model whose aim is the quantification of the trade between countries. Quantitatively, distances enter into the so-called gravity models that successfully predict the weight of non-zero trade flows. However, it has been recently shown that gravity models fail to reproduce the binary topology of the World Trade Web. In this paper a different approach is presented: the formalism of exponential random graphs is used and the distances are treated as constraints, to be imposed on a previously chosen ensemble of graphs. Then, the information encoded in the geographical distances is used to explain the binary structure of the World Trade Web, by testing it on the degree-degree correlations and the reciprocity structure. This leads to the definition of a novel null model that combines spatial and non-spatial effects. The effectiveness of spatial constraints is compared to that of nonspatial ones by means of the Akaike Information Criterion and the Bayesian Information Criterion. Even if it is commonly believed that the World Trade Web is strongly dependent on the distances, what emerges from our analysis is that distances do not play a crucial role in shaping the World Trade Web binary structure and that the information encoded into the reciprocity is far more useful in explaining the observed patterns.
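
The model comparison mentioned above relies on two standard criteria, which can be written down compactly. For a model with k free parameters, maximized log-likelihood ln L and n observations, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L; lower values indicate a better trade-off between fit and complexity. The sketch below only encodes these textbook formulas; the function names and the numbers in the usage note are hypothetical and are not results from the paper.

```python
from math import log

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_observations):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return n_params * log(n_observations) - 2 * log_likelihood

def preferred_model(scores):
    """Return the name of the model with the lowest criterion value."""
    return min(scores, key=scores.get)
```

For example, with made-up log-likelihoods, `preferred_model({"spatial": aic(-120.0, 4), "non_spatial": aic(-118.0, 3)})` selects `"non_spatial"`, since it fits better with fewer parameters.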

Science vs Conspiracy: Collective Narratives in the Age of Misinformation

2015-11-02T14:08:08Z

The large availability of user-provided content on online social media facilitates people aggregation around shared beliefs, interests, worldviews and narratives. In spite of the enthusiastic rhetoric about the so-called collective intelligence, unsubstantiated rumors and conspiracy theories (e.g., chemtrails, reptilians or the Illuminati) are pervasive in online social networks (OSN). In this work we study, on a sample of 1.2 million individuals, how information related to very distinct narratives (i.e., mainstream scientific and conspiracy news) is consumed and shapes communities on Facebook. Our results show that polarized communities emerge around distinct types of content and that usual consumers of conspiracy news turn out to be more focused and self-contained on their specific content. To test potential biases induced by the continued exposure to unsubstantiated rumors on users’ content selection, we conclude our analysis by measuring how users respond to 4,709 pieces of troll information (i.e., parodistic and sarcastic imitations of conspiracy theories). We find that 77.92% of likes and 80.86% of comments are from users usually interacting with conspiracy stories.

DebtRank: A Microscopic Foundation for Shock Propagation

2015-11-02T14:00:36Z

The DebtRank algorithm has been increasingly investigated as a method to estimate the impact of shocks in financial networks, as it overcomes the limitations of the traditional default-cascade approaches. Here we formulate a dynamical “microscopic” theory of instability for financial networks by iterating balance sheet identities of individual banks and by assuming a simple rule for the transfer of shocks from borrowers to lenders. By doing so, we generalise the DebtRank formulation, both providing an interpretation of the effective dynamics in terms of basic accounting principles and preventing the underestimation of losses on certain network topologies. Depending on the structure of the interbank leverage matrix the dynamics is either stable, in which case the asymptotic state can be computed analytically, or unstable, meaning that at least one bank will default. We apply this framework to a dataset of the top listed European banks in the period 2008–2013. We find that network effects can generate an amplification of exogenous shocks of a factor ranging between three (in normal periods) and six (during the crisis) when we stress the system with a 0.5% shock on external (i.e. non-interbank) assets for all banks.
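
The dependence of stability on the interbank leverage matrix described above can be illustrated with a hedged sketch. It assumes a simple linear transfer rule in which each bank's relative equity loss grows by the leverage-weighted increase in its borrowers' losses, capped at 1 (default), and it takes stability to mean that the spectral radius of the leverage matrix is below 1. The convention that `leverage[i, j]` is the exposure of bank i to bank j divided by the equity of bank i, the function names, and the toy numbers are all assumptions for illustration, not the paper's specification.

```python
import numpy as np

def is_stable(leverage):
    """Assumed stability criterion: spectral radius of the leverage matrix below 1."""
    eigenvalues = np.linalg.eigvals(np.asarray(leverage, dtype=float))
    return np.max(np.abs(eigenvalues)) < 1.0

def propagate_losses(leverage, first_round_loss, max_steps=10_000, tol=1e-12):
    """Iterate relative equity losses h(t) under the assumed linear rule.

    h(t+1) = min(1, h(t) + leverage @ (h(t) - h(t-1))), starting from h(0) = 0.
    """
    leverage = np.asarray(leverage, dtype=float)
    h = np.asarray(first_round_loss, dtype=float)
    h_prev = np.zeros_like(h)
    for _ in range(max_steps):
        # Only the newly incurred losses of borrowers are passed on to lenders.
        h_next = np.minimum(1.0, h + leverage @ (h - h_prev))
        if np.max(np.abs(h_next - h)) < tol:
            return h_next
        h_prev, h = h, h_next
    return h
```

In the stable case with no bank reaching the cap, the iteration converges to the solution of (I − leverage) h = first_round_loss, which is one way to see why the asymptotic state can be computed in closed form under this assumed rule.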

The Effects of Twitter Sentiment on Stock Price Returns

2015-11-02T13:37:08Z

Social media are increasingly reflecting and influencing the behavior of other complex systems. In this paper we investigate the relations between a well-known micro-blogging platform, Twitter, and financial markets. In particular, we consider, in a period of 15 months, the Twitter volume and sentiment about the 30 stock companies that form the Dow Jones Industrial Average (DJIA) index. We find a relatively low Pearson correlation and Granger causality between the corresponding time series over the entire time period. However, we find a significant dependence between the Twitter sentiment and abnormal returns during the peaks of Twitter volume. This is valid not only for the expected Twitter volume peaks (e.g., quarterly announcements), but also for peaks corresponding to less obvious events. We formalize the procedure by adapting the well-known “event study” from economics and finance to the analysis of Twitter data. The procedure allows us to automatically identify events as Twitter volume peaks, to compute the prevailing sentiment (positive or negative) expressed in tweets at these peaks, and finally to apply the “event study” methodology to relate them to stock returns. We show that sentiment polarity of Twitter peaks implies the direction of cumulative abnormal returns. The amount of cumulative abnormal returns is relatively low (about 1–2%), but the dependence is statistically significant for several days after the events.

Hyperbolicity measures democracy in real-world networks

2015-11-02T13:32:38Z

In this work, we analyze the hyperbolicity of real-world networks, a geometric quantity that measures if a space is negatively curved. We provide two improvements in our understanding of this quantity: first of all, in our interpretation, a hyperbolic network is “aristocratic”, since few elements “connect” the system, while a non-hyperbolic network has a more “democratic” structure with a larger number of crucial elements. The second contribution is the introduction of the average hyperbolicity of the neighbors of a given node. Through this definition, we outline an “influence area” for the vertices in the graph. We show that in real networks the influence area of the highest degree vertex is small in what we define “local” networks (i.e., social or peer-to-peer networks), and large in “global” networks (i.e., power grid, metabolic networks, or autonomous system networks).
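
For readers unfamiliar with the quantity, hyperbolicity of a graph is commonly defined through the four-point condition: for every four nodes, form the three pairwise distance sums, and take half the gap between the two largest; the hyperbolicity is the maximum of this value over all quadruples, with 0 corresponding to tree-like structure. The brute-force sketch below follows this standard definition for small, connected, unweighted graphs given as adjacency dictionaries. The function names are illustrative, and the paper's averaged, neighbor-based variant is not reproduced here.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest-path distances (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def hyperbolicity(adj):
    """Four-point hyperbolicity of a connected graph, checked over all quadruples."""
    dist = {u: bfs_distances(adj, u) for u in adj}
    delta = 0.0
    for x, y, z, w in combinations(adj, 4):
        sums = sorted([
            dist[x][y] + dist[z][w],
            dist[x][z] + dist[y][w],
            dist[x][w] + dist[y][z],
        ])
        # Half the gap between the two largest sums for this quadruple.
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta
```

The loop over all quadruples scales as the fourth power of the number of nodes, so this form is only practical for small graphs.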

The self-organization of meaning and the reflexive communication of information

2015-10-08T07:52:15Z

Following a suggestion of Warren Weaver, we extend the Shannon model of communication piecemeal into a complex systems model in which communication is differentiated both vertically and horizontally. This model enables us to bridge the divide between Niklas Luhmann’s theory of the self-organization of meaning in communications and empirical research using information theory. First, we distinguish between communication relations and correlations between patterns of relations. The correlations span a vector space in which relations are positioned and thus provided with meaning. Second, positions provide reflexive perspectives. Whereas the different meanings are integrated locally, each instantiation opens horizons of meaning that can be codified along eigenvectors of the communication matrix. The next-order specification of codified meaning can generate redundancies (as feedback on the forward arrow of entropy production). The horizontal differentiation among the codes of communication enables us to quantify the creation of new options as mutual redundancy. Increases in redundancy can then be measured as local reduction of prevailing uncertainty (in bits). The generation of options can also be considered as a hallmark of the knowledge-based economy: new knowledge provides new options. Both the communication-theoretical and the operational (information-theoretical) perspectives.

The interaction of ‘Supply’, ‘Demand’, and ‘Technological Capabilities’ in terms of Medical Subject Headings: A triple helix model of medical innovations

2015-10-08T07:35:04Z

We develop a model of innovation that enables us to trace the interplay among three key dimensions of the innovation process: (i) demand for and (ii) supply of innovation, and (iii) technological capabilities available to generate innovation in the forms of products, processes, and services. Building on Triple Helix research, we use entropy statistics to elaborate an indicator of mutual information among these dimensions that can indicate a reduction of uncertainty. To do so, we focus on the medical context, where uncertainty poses significant challenges to the governance of innovation. The Medical Subject Headings (MeSH) of MEDLINE/PubMed provide us with publication records classified within the categories “Diseases” (C), “Drugs and Chemicals” (D), and “Analytic, Diagnostic, and Therapeutic Techniques and Equipment” (E) as knowledge representations of demand, supply, and technological capabilities, respectively. Three case studies of medical research areas are used as representative ‘entry perspectives’ of the medical innovation process. These are: (i) Human Papilloma Virus, (ii) RNA interference, and (iii) Magnetic Resonance Imaging. We find statistically significant periods of synergy among demand, supply, and technological capabilities (C-D-E) that point to three-dimensional interactions as a fundamental perspective for the understanding and governance of the uncertainty associated with medical innovation. Among the pairwise configurations in these contexts, the demand-technological capabilities (C-E) channel provided the strongest link, followed by the supply-demand (D-C) and the supply-technological capabilities (D-E) channels.

Quantifying the impact of weak, strong, and super ties in scientific careers

2015-10-08T07:34:21Z

Scientists are frequently faced with the important decision to start or terminate a creative partnership. This process can be influenced by strategic motivations, as early career researchers are pursuers, whereas senior researchers are typically attractors, of new collaborative opportunities. Focusing on the longitudinal aspects of scientific collaboration, we analyzed 473 collaboration profiles using an egocentric perspective that accounts for researcher-specific characteristics and provides insight into a range of topics, from career achievement and sustainability to team dynamics and efficiency. From more than 166,000 collaboration records, we quantify the frequency distributions of collaboration duration and tie strength, showing that collaboration networks are dominated by weak ties characterized by high turnover rates. We use analytic extreme value thresholds to identify a new class of indispensable super ties, the strongest of which commonly exhibit >50% publication overlap with the central scientist. The prevalence of super ties suggests that they arise from career strategies based upon cost, risk, and reward sharing and complementary skill matching. We then use a combination of descriptive and panel regression methods to compare the subset of publications coauthored with a super tie to the subset without one, controlling for pertinent features such as career age, prestige, team size, and prior group experience. We find that super ties contribute to above-average productivity and a 17% citation increase per publication, thus identifying these partnerships (the analog of life partners) as a major factor in science career development.

Voting Behavior, Coalitions and Government Strength through a Complex Network Analysis

2015-05-21T07:59:17Z

We analyze the network of relations between parliament members according to their voting behavior. In particular, we examine the emergent community structure with respect to political coalitions and government alliances. We rely on tools developed in the Complex Network literature to explore the core of these communities and use their topological features to develop new metrics for party polarization, internal coalition cohesiveness and government strength. As a case study, we focus on the Chamber of Deputies of the Italian Parliament, for which we are able to characterize the heterogeneity of the ruling coalition as well as parties' specific contributions to the stability of the government over time. We find a sharp contrast in the political debate which, surprisingly, does not imply a relevant structure based on established parties. We take a closer look at changes in the community structure after parties split up and their effect on the position of single deputies within communities. Finally, we introduce a way to track the stability of the government coalition over time that is able to discern the contribution of each member along with the impact of its possible defection. While our case study relies on the Italian parliament, whose relevance has come into the international spotlight in the present economic downturn, the methods developed here are entirely general and can therefore be applied to a multitude of other scenarios.

Twitter-based analysis of the dynamics of collective attention to political parties

2015-05-19T09:56:38Z

Large-scale data from social media have significant potential to describe complex phenomena in the real world and to anticipate collective behaviors such as information spreading and social trends. One specific case study is the collective attention to the actions of political parties. Not surprisingly, researchers and stakeholders have tried to correlate parties' presence on social media with their performance in elections. Despite these many efforts, results are still inconclusive, since this kind of data is often very noisy and significant signals could be masked by (largely unknown) statistical fluctuations. In this paper we consider the number of tweets (tweet volume) of a party as a proxy for collective attention to the party, we identify the dynamics of the volume, and we show that this quantity carries some information about the election outcome. We find that the tweet volume for each party follows a log-normal distribution with a positive autocorrelation over short terms. Furthermore, by measuring the ratio of two consecutive daily tweet volumes, we find that the evolution of the daily volume of a party can be described by means of a geometric Brownian motion. Finally, we determine the optimal period for averaging tweet volume in order to reduce fluctuations and extract short-term tendencies. We conclude that the tweet volume is a good indicator of parties' success in the elections when considered over an optimal time window. Our study identifies the statistical nature of collective attention to political issues and sheds light on how to model the dynamics of collective attention in social media.
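The ratio-based analysis described above can be illustrated with a minimal Python sketch. It relies on a standard property of geometric Brownian motion: the logarithms of consecutive ratios are independent normal variables with mean (mu - sigma^2/2) and variance sigma^2 per unit time step. The function names, the daily time step of 1, and the sample volumes are assumptions made for illustration; they are not taken from the paper.

```python
import math
import statistics


def log_ratios(volumes):
    """Log of the ratio of each pair of consecutive daily volumes."""
    return [math.log(later / earlier) for earlier, later in zip(volumes, volumes[1:])]


def gbm_estimates(volumes):
    """Estimate drift mu and volatility sigma per day, assuming the
    volumes follow a geometric Brownian motion sampled once per day."""
    ratios = log_ratios(volumes)
    sigma = statistics.stdev(ratios)
    # Under GBM the mean log-ratio equals mu - sigma^2 / 2.
    mu = statistics.mean(ratios) + 0.5 * sigma ** 2
    return mu, sigma


def moving_average(volumes, window):
    """Average the volume over a sliding window to damp fluctuations."""
    return [
        statistics.mean(volumes[i - window + 1 : i + 1])
        for i in range(window - 1, len(volumes))
    ]


if __name__ == "__main__":
    daily_volume = [100, 200, 100, 200, 100]  # hypothetical tweet counts
    print(gbm_estimates(daily_volume))
    print(moving_average(daily_volume, window=2))
```

The `window` argument stands in for the averaging period; the sketch does not attempt to find the optimal window discussed in the abstract.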

Central Limit Theorems for an Indian Buffet Model with Random Weights

2015-02-23T08:34:38Z

The three-parameter Indian buffet process is generalized. The possibly different role played by customers is taken into account by suitable (random) weights. Various limit theorems are also proved for such a generalized Indian buffet process. Let $L_n$ be the number of dishes tried by the first $n$ customers, and let $\bar{K}_n = (1/n)\sum_{i=1}^n K_i$, where $K_i$ is the number of dishes tried by customer $i$. The asymptotic distributions of $L_n$ and $\bar{K}_n$, suitably centered and scaled, are obtained. The convergence turns out to be stable (and not only in distribution). As a particular case, the results apply to the standard (i.e., non-generalized) Indian buffet process.

Memory effects in stock price dynamics: evidences of technical trading

2015-02-05T10:17:38Z

Technical trading represents a class of investment strategies for financial markets based on the analysis of trends and recurrent patterns in price time series. According to standard economic theories, these strategies should not be used because they cannot be profitable. On the contrary, it is well known that technical traders exist and operate on different time scales. In this paper we investigate whether technical trading produces detectable signals in price time series and whether some kind of memory effect is introduced in the price dynamics. In particular, we focus on a specific figure called supports and resistances. We first develop a criterion to detect the potential values of supports and resistances. Then we show that memory effects in the price dynamics are associated with these selected values. In fact, we show that prices are more likely to rebound from these values than to cross them. Such an effect is quantitative evidence of the so-called self-fulfilling prophecy, that is, the self-reinforcement of agents' beliefs and sentiment about the future behavior of stock prices.

Everyday the same picture: popularity and content diversity

2015-02-02T10:18:27Z

Facebook is flooded with diverse and heterogeneous content, from kittens to music and news, passing through satirical and funny stories. Each piece of that corpus reflects the heterogeneity of the underlying social background. On the Italian Facebook we have found an interesting case: a page with more than 40K followers that every day posts the same picture of Toto Cutugno, a popular Italian singer. In this work, we use such a page as a benchmark to study and model the effects of content heterogeneity on popularity. In particular, we use that page for a comparative analysis of information consumption patterns with respect to pages posting science and conspiracy news. In total, we analyze about 2M likes and 190K comments, made by approximately 340K and 65K users, respectively. We conclude the paper by introducing a model that mimics users' selection preferences, accounting for the heterogeneity of contents.

Innovation and nested preferential growth in chess playing behavior

2014-12-04T10:56:07Z

Complexity develops via the incorporation of innovative properties. Chess is one of the most complex strategy games, where expert contenders exercise decision making by imitating old games or introducing innovations. In this work, we study innovation in chess by analyzing how different move sequences are played at the population level. It is found that the probability of exploring a new or innovative move decreases as a power law with the frequency of the preceding move sequence. Chess players also exploit already known move sequences according to their frequencies, following a preferential growth mechanism. Furthermore, innovation in chess exhibits Heaps' law, suggesting similarities with the process of vocabulary growth. We propose a robust generative mechanism based on nested Yule-Simon preferential growth processes that reproduces the empirical observations. These results, supporting the self-similar nature of innovations in chess, are important in the context of decision making in a competitive scenario, and extend the scope of relevant findings recently discovered regarding the emergence of Zipf's law in chess.

Memory and long-range correlations in chess games

2014-12-04T10:44:08Z

In this paper we report the existence of long-range memory in the opening moves of a chronologically ordered set of chess games using an extensive chess database. We used two mapping rules to build discrete time series and analyzed them using two methods for detecting long-range correlations: rescaled range analysis and detrended fluctuation analysis. We found that long-range memory is related to the level of the players. When the database is filtered according to player levels, we find differences in the persistence of the different subsets. For high-level players, correlations are stronger at long time scales, whereas for intermediate and low-level players they reach the maximum value at shorter time scales. This can be interpreted as a signature of the different strategies used by players with different levels of expertise. These results are robust against the assignation rules and the method employed in the analysis of the time series.

Tail-scope: using friends to estimate heavy tails of degree distributions in large-scale complex networks

2014-12-03T14:06:34Z

Many complex networks in natural and social phenomena have often been characterized by heavy-tailed degree distributions. However, due to the rapidly growing size of network data and privacy concerns about using these data, it is becoming more difficult to analyze complete data sets. Thus, it is crucial to devise effective and efficient estimation methods for heavy tails of degree distributions in large-scale networks using only local information from a small fraction of sampled nodes. Here we propose a tail-scope method based on the local observational bias of the friendship paradox. We show that the tail-scope method outperforms uniform node sampling for estimating heavy tails of degree distributions, while the opposite tendency is observed in the range of small degrees. In order to take advantage of both sampling methods, we devise a hybrid method that successfully recovers the whole range of degree distributions. Our tail-scope method shows how structural heterogeneities of large-scale complex networks can be used to effectively reveal the network structure with only limited local information.

Interactions of cultures and top people of Wikipedia from ranking of 24 language editions

2014-12-03T14:04:42Z

Wikipedia is a huge global repository of human knowledge that can be leveraged to investigate the intertwining of cultures. With this aim, we apply methods of Markov chains and the Google matrix to the analysis of the hyperlink networks of 24 Wikipedia language editions, and rank all their articles by the PageRank, 2DRank and CheiRank algorithms. Using automatic extraction of people's names, we obtain the top 100 historical figures for each edition and for each algorithm. We investigate their spatial, temporal, and gender distributions as a function of their cultural origins. Our study demonstrates not only the existence of skewness towards local figures, mainly recognized only in their own cultures, but also the existence of global historical figures appearing in a large number of editions. By determining the birth time and place of these persons, we perform an analysis of the evolution of such figures through 35 centuries of human history for each language, thus recovering interactions and entanglement of cultures over time. We also obtain the distributions of historical figures over world countries, highlighting geographical aspects of cross-cultural links. Considering historical figures who appear in multiple editions as interactions between cultures, we construct a network of cultures and identify the most influential cultures according to this network.

Generalized friendship paradox in networks with tunable degree-attribute correlation

2014-12-03T13:52:55Z

One of the interesting phenomena due to topological heterogeneities in complex networks is the friendship paradox: your friends have on average more friends than you do. Recently, this paradox has been generalized to arbitrary node attributes, yielding the generalized friendship paradox (GFP). The origin of the GFP at the network level has been shown to be rooted in positive correlations between degrees and attributes. However, how the GFP holds for individual nodes needs to be understood in more detail. For this, we first analyze a solvable model to characterize the paradox-holding probability of nodes for the uncorrelated case. Then we numerically study the correlated model of networks with tunable degree-degree and degree-attribute correlations. In contrast to the network level, we find at the individual level that the relevance of degree-attribute correlation to the paradox-holding probability may depend on whether the network is assortative or disassortative. These findings help us to understand the interplay between topological structure and node attributes in complex networks.

Google matrix of the citation network of Physical Review

2014-12-03T13:43:24Z

We study the statistical properties of spectrum and eigenstates of the Google matrix of the citation network of Physical Review for the period 1893–2009. The main fraction of complex eigenvalues with largest modulus is determined numerically by different methods based on high-precision computations with up to p=16384 binary digits that allow us to resolve hard numerical problems for small eigenvalues. The nearly nilpotent matrix structure allows us to obtain a semianalytical computation of eigenvalues. We find that the spectrum is characterized by the fractal Weyl law with a fractal dimension df≈1. It is found that the majority of eigenvectors are located in a localized phase. The statistical distribution of articles in the PageRank-CheiRank plane is established providing a better understanding of information flows on the network. The concept of ImpactRank is proposed to determine an influence domain of a given article. We also discuss the properties of random matrix models of Perron-Frobenius operators.

Generalized friendship paradox in complex networks: the case of scientific collaboration

2014-12-03T11:47:33Z

The friendship paradox states that your friends have on average more friends than you have. Does the paradox “hold” for other individual characteristics like income or happiness? To address this question, we generalize the friendship paradox to arbitrary node characteristics in complex networks. By analyzing two coauthorship networks of Physical Review journals and Google Scholar profiles, we find that the generalized friendship paradox (GFP) holds at the individual and network levels for various characteristics, including the number of coauthors, the number of citations, and the number of publications. The origin of the GFP is shown to be rooted in positive correlations between degree and characteristics. As a fruitful application of the GFP, we suggest effective and efficient sampling methods for identifying high-characteristic nodes in large-scale networks. Our study of the GFP can shed light on the interplay between network structure and node characteristics in complex networks.
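A minimal Python sketch of how the paradox can be checked on a small graph follows. It assumes one common pair of definitions, which may differ in detail from those used in the paper: at the individual level, the paradox holds for a node whose characteristic is below the mean characteristic of its neighbors; at the network level, it holds when the degree-weighted mean characteristic (the mean over "friends") exceeds the plain mean over nodes. The function name and the star-graph example are hypothetical.

```python
def generalized_friendship_paradox(adjacency, attribute=None):
    """Check the (generalized) friendship paradox on an undirected graph.

    adjacency: dict mapping each node to the set of its neighbors.
    attribute: dict mapping each node to a number; defaults to the degree,
               which gives the ordinary friendship paradox.
    """
    # Isolated nodes have no friends to compare with, so they are skipped.
    nodes = [n for n, neighbors in adjacency.items() if neighbors]
    degree = {n: len(adjacency[n]) for n in nodes}
    if attribute is None:
        attribute = degree

    # Individual level: is my value below the average value of my neighbors?
    holds = []
    for n in nodes:
        neighbor_mean = sum(attribute[m] for m in adjacency[n]) / degree[n]
        holds.append(attribute[n] < neighbor_mean)

    # Network level: each node is counted once per friendship it takes part in.
    friend_mean = sum(degree[n] * attribute[n] for n in nodes) / sum(degree.values())
    node_mean = sum(attribute[n] for n in nodes) / len(nodes)

    return {
        "individual_fraction": sum(holds) / len(nodes),
        "network_level": friend_mean > node_mean,
    }


if __name__ == "__main__":
    # Hypothetical star graph: one hub (node 0) connected to four leaves.
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print(generalized_friendship_paradox(star))
```

In the star graph every leaf has fewer friends than its only friend, the hub, so the paradox holds for four of the five nodes even though it fails for the hub itself.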

Time evolution of Wikipedia network ranking

2014-12-03T11:36:48Z

We study the time evolution of the ranking and spectral properties of the Google matrix of the English Wikipedia hyperlink network during the years 2003–2011. The statistical properties of the ranking of Wikipedia articles via PageRank and CheiRank probabilities, as well as the matrix spectrum, are shown to have stabilized for 2007–2011. Special emphasis is placed on the ranking of Wikipedia personalities and universities. We show that PageRank selection is dominated by politicians, while 2DRank, which combines PageRank and CheiRank, gives more weight to personalities from the arts. The Wikipedia PageRank of universities recovers 80% of the top universities of the Shanghai ranking during the considered time period.
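The two basic rankings mentioned above can be sketched in a few lines of Python. PageRank is the stationary vector of the Google matrix, and CheiRank is the PageRank of the same network with all link directions inverted. The damping factor 0.85, the uniform treatment of dangling nodes, the function names, and the four-page example are assumptions for illustration only; 2DRank is not implemented here.

```python
def pagerank(links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank.

    links: dict mapping each node to the list of nodes it links to.
    alpha: damping factor (0.85 is a conventional choice, assumed here).
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new_rank = {v: (1.0 - alpha) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = alpha * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its probability uniformly.
                share = alpha * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


def cheirank(links, **kwargs):
    """CheiRank: PageRank computed on the network with inverted links."""
    inverted = {}
    for source, targets in links.items():
        inverted.setdefault(source, [])
        for t in targets:
            inverted.setdefault(t, []).append(source)
    return pagerank(inverted, **kwargs)


if __name__ == "__main__":
    # Hypothetical four-page network: A links out widely, D is linked to widely.
    web = {"A": ["B", "C", "D"], "B": ["D"], "C": ["D"], "D": ["A"]}
    pr, ch = pagerank(web), cheirank(web)
    print(max(pr, key=pr.get), max(ch, key=ch.get))
```

In this toy network the page with the most incoming links leads the PageRank ordering, while the page with the most outgoing links leads the CheiRank ordering, which is the contrast the two measures are designed to capture.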

Highlighting entanglement of cultures via ranking of multilingual Wikipedia articles

2014-12-03T11:20:09Z

How do different cultures evaluate a person? Is an important person in one culture also important in another culture? We address these questions via ranking of multilingual Wikipedia articles. With three ranking algorithms based on the network structure of Wikipedia, we assign a ranking to all articles in 9 multilingual editions of Wikipedia and investigate the general ranking structure of PageRank, CheiRank and 2DRank. In particular, we focus on articles related to persons, identify the top 30 persons for each rank among different editions and analyze distinctions of their distributions over activity fields such as politics, art, science, religion, and sport for each edition. We find that local heroes are dominant, but global heroes also exist and create an effective network representing the entanglement of cultures. The Google matrix analysis of the network of cultures shows signs of the Zipf law distribution. This approach allows us to examine diversity and shared characteristics of knowledge organization between cultures. The developed computational, data-driven approach highlights cultural interconnections from a new perspective.

Characterizing and modeling citation dynamics

2014-12-02T15:39:26Z

Citation distributions are crucial for the analysis and modeling of the activity of scientists. We investigated bibliometric data of papers published in journals of the American Physical Society, searching for the type of function which best describes the observed citation distributions. We used the goodness of fit with Kolmogorov-Smirnov statistics for three classes of functions: log-normal, simple power law and shifted power law. The shifted power law turns out to be the most reliable hypothesis for all citation networks we derived, which correspond to different time spans. We find that citation dynamics is characterized by bursts, usually occurring within a few years since publication of a paper, and the burst size spans several orders of magnitude. We also investigated the microscopic mechanisms for the evolution of citation networks, by proposing a linear preferential attachment with time dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and accounts for the presence of citation bursts as well.

How citation boosts promote scientific paradigm shifts and Nobel Prizes

2014-12-02T15:33:39Z

Nobel Prizes are commonly seen to be among the most prestigious achievements of our times. Based on mining several million citations, we quantitatively analyze the processes driving paradigm shifts in science. We find that groundbreaking discoveries of Nobel Prize Laureates and other famous scientists are not only acknowledged by many citations of their landmark papers. Surprisingly, they also boost the citation rates of their previous publications. Given that innovations must outcompete the rich-gets-richer effect for scientific citations, it turns out that they can make their way only through citation cascades. A quantitative analysis reveals how and why they happen. Science appears to behave like a self-organized critical system, in which citation cascades of all sizes occur, from continuous scientific progress all the way up to scientific revolutions, which change the way we see our world. Measuring the “boosting effect” of landmark papers, our analysis reveals how new ideas and new players can make their way and finally triumph in a world dominated by established paradigms. The underlying “boost factor” is also useful to discover scientific breakthroughs and talents much earlier than through classical citation analysis, which by now has become a widespread method to measure scientific excellence, influencing scientific careers and the distribution of research funds. Our findings reveal patterns of collective social behavior, which are also interesting from an attention economics perspective. Understanding the origin of scientific authority may therefore ultimately help to explain how social influence comes about and why the value of goods depends so strongly on the attention they attract.

The Z-index: A geometric representation of productivity and impact which accounts for information in the entire rank-citation profile

2014-11-24T08:25:15Z

We present a simple generalization of Hirsch's h-index, $Z \equiv \sqrt{h^{2}+C}/\sqrt{5}$, where $C$ is the total number of citations. Z is aimed at correcting the potentially excessive penalty made by h on a scientist's highly cited papers, because for the majority of scientists analyzed, we find the excess citation fraction $(C-h^{2})/C$ to be distributed closely around the value 0.75, meaning that 75 percent of the author's impact is neglected. Additionally, Z is less sensitive to local changes in a scientist's citation profile, namely perturbations which increase h while only marginally affecting C. Using real career data for 476 physicist careers and 488 biologist careers, we analyze both the distribution of Z and the rank stability of Z with respect to the Hirsch index h and the Egghe index g. We analyze careers distributed across a wide range of total impact, including top-cited physicists and biologists for benchmark comparison. In practice, the Z-index requires the same information needed to calculate h and could be effortlessly incorporated within career profile databases, such as Google Scholar and ResearcherID. Because Z incorporates information from the entire publication profile while being more robust than h and g to local perturbations, we argue that Z is better suited for ranking comparisons in academic decision-making scenarios comprising a large number of scientists.
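As a worked illustration of the definition above, the following Python sketch computes h, the excess citation fraction, and Z from a list of per-paper citation counts. The function names and the citation profile are hypothetical. The profile is chosen so that the excess citation fraction is exactly 0.75; in that case C = 4h^2, so the definition gives Z = h, which shows the role of the factor 1/sqrt(5).

```python
import math


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def excess_citation_fraction(citations):
    """Fraction (C - h^2) / C of citations not accounted for by h."""
    total = sum(citations)
    h = h_index(citations)
    return (total - h * h) / total if total else 0.0


def z_index(citations):
    """Z = sqrt(h^2 + C) / sqrt(5), with C the total number of citations."""
    h = h_index(citations)
    return math.sqrt(h * h + sum(citations)) / math.sqrt(5)


if __name__ == "__main__":
    profile = [40, 10, 6, 4, 2, 1, 1, 0]  # hypothetical career: h = 4, C = 64
    print(h_index(profile), excess_citation_fraction(profile), z_index(profile))
```

Only the sorted citation counts are needed, which is the same input required to compute h.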

Science vs Conspiracy: collective narratives in the age of (mis)information

2014-09-02T11:09:25Z

The wide availability of user-provided content on online social media facilitates the aggregation of people around common interests, worldviews and narratives. However, in spite of the enthusiastic rhetoric about the so-called “wisdom of crowds”, unsubstantiated rumors (as alternative explanations to mainstream versions of complex phenomena) find on the Web a natural medium for their dissemination. In this work we study, on a sample of 1.2 million individuals, how information related to very distinct narratives (i.e., mainstream scientific news and alternative news) is consumed on Facebook. Through a thorough quantitative analysis, we show that distinct communities with similar information consumption patterns emerge around distinctive narratives. Moreover, consumers of alternative news (mainly conspiracy theories) turn out to be more focused on their contents, while scientific news consumers are more prone to comment on alternative news. We conclude our analysis by testing the response of this social system to 4709 pieces of troll information, i.e., parodistic imitations of alternative and conspiracy theories. We find that, despite the false and satirical vein of this news, usual consumers of conspiracy news are the most prone to interact with it.

The Italian primary school-size distribution and the city-size: a complex nexus

2014-06-24T12:20:11Z

We characterize the statistical law according to which the size of Italian primary schools is distributed. We find that the school size can be approximated by a log-normal distribution, with a fat lower tail that collects a large number of very small schools. The upper tail of the school-size distribution decreases exponentially and the growth rates are distributed with a Laplace PDF. These distributions are similar to those observed for firms and are consistent with a Bose-Einstein preferential attachment process. The body of the distribution features a bimodal shape, suggesting some source of heterogeneity in the school organization that we uncover by an in-depth analysis of the relation between school size and city size. We propose a novel cluster methodology and a new spatial interaction approach among schools which outline the variety of policies implemented in Italy. Different regional policies are also discussed, shedding light on the relation between policy and geographical features.

Bootstrapping topological properties and systemic risk of complex networks using the fitness model

2014-06-16T11:16:52Z

In this paper we present a novel method to reconstruct global topological properties of a complex network starting from limited information. We assume that we know, for all the nodes, a non-topological quantity that we interpret as fitness. In contrast, we assume that we know the degree, i.e. the number of connections, only for a subset of the nodes in the network. We then use a fitness model, calibrated on the subset of nodes for which degrees are known, in order to generate ensembles of networks. Here, we focus on topological properties that are relevant for processes of contagion and distress propagation in networks, i.e. network density and k-core structure, and we study how well these properties can be estimated as a function of the size of the subset of nodes utilized for the calibration. Finally, we also study how well the resilience to distress propagation in the network can be estimated using our method. We perform a first test on ensembles of synthetic networks generated with the Exponential Random Graph model, which allows us to apply common tools from statistical mechanics. We then perform a second test on empirical networks taken from economic and financial contexts. In both cases, we find that a subset as small as 10% of nodes can be enough to estimate the properties of the network along with its resilience with an error of 5%.

Network communities within and across borders

2014-04-01T12:19:37Z

We investigate the impact of borders on the topology of spatially embedded networks. Indeed, territorial subdivisions and geographical borders significantly hamper the geographical span of networks, thus playing a key role in the formation of network communities. This is especially important in scientific and technological policy-making, highlighting the interplay between the pressure for internationalization, leading towards a global innovation system, and the administrative borders imposed by national and regional institutions. In this study we introduce an outreach index to quantify the impact of borders on the community structure and apply it to the case of the European and US patent co-inventor networks. We find that (a) the US connectivity decays as a power of distance, whereas we observe a faster exponential decay for Europe; (b) European network communities essentially correspond to nations and contiguous regions, while US communities span multiple states across the whole country without any characteristic geographic scale. We confirm our findings by means of a set of simulations aimed at exploring the relationship between different patterns of cross-border community structures and the outreach index.

A Probabilistic Voting Model of Social Security: The Role of Myopic Agents

2013-07-17T10:41:54Z

This paper investigates the political incentives for the design of social security policy in competitive democracies with both far-sighted and myopic households. The social security scheme depends on both a payroll tax rate which determines the size of the pension and a Bismarckian factor that represents its redistributive component. By considering a probabilistic voting setting of electoral competition, we analyze the political game between office-seeking politicians and self-interested citizens. Politicians can win the election by targeting the voters in each group by trading off the generosity and the redistribution degree of the public pension system. In the political equilibrium, the contribution rate is U-shaped with respect to the Bismarckian factor. Moreover, the equilibrium Bismarckian factor unambiguously decreases with the proportion of myopic agents, whereas the equilibrium payroll tax rate curve is U-shaped with respect to the proportion of myopic agents.

Study of Discrete Choice Models and Adaptive Neuro-Fuzzy Inference System in the Prediction of Economic Crisis Periods in USA

2013-07-10T10:59:17Z

In this study two approaches are applied to the prediction of economic recession or expansion periods in the USA. The first approach includes Logit and Probit models and the second is an Adaptive Neuro-Fuzzy Inference System (ANFIS) with Gaussian and Generalized Bell membership functions. The in-sample period 1950-2006 is examined and the forecasting performance of the two approaches is evaluated during the out-of-sample period 2007-2010. The estimation results show that the ANFIS model outperforms the Logit and Probit models. This indicates that the neuro-fuzzy model provides a better and more reliable signal on whether or not a financial crisis will take place.

Ordinary Least Squares and Genetic Algorithms Optimization in Smoothing Transition Autoregressive (STAR) Models

2013-07-10T10:42:01Z

In this paper we present and examine additional membership functions, and we propose least squares with genetic algorithm optimization to find the optimal parameters of the fuzzy membership functions. More specifically, we present the hyperbolic tangent, Gaussian and generalized bell functions. We propose these because Smoothing Transition Autoregressive (STAR) models follow a fuzzy logic approach, so more functions should be tested. Some numerical applications to S&P 500 and FTSE 100 stock returns and to the unemployment rate are presented, and MATLAB routines are provided.
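
For orientation, the three function families named above can be sketched as follows. This is a minimal illustration using the textbook forms of the Gaussian and generalized bell membership functions; the rescaling of the hyperbolic tangent to the unit interval and all parameter names (`center`, `sigma`, `width`, `slope`, `gamma`) are assumptions made here, not the paper's own parameterization or its MATLAB routines.

```python
import math


def gaussian_mf(x, center, sigma):
    """Gaussian membership: exp(-(x - center)^2 / (2 * sigma^2))."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def generalized_bell_mf(x, center, width, slope):
    """Generalized bell membership: 1 / (1 + |(x - center) / width|^(2 * slope))."""
    return 1.0 / (1.0 + abs((x - center) / width) ** (2.0 * slope))


def tanh_transition(x, center, gamma):
    """Hyperbolic tangent rescaled from (-1, 1) to (0, 1).

    The rescaling is an assumption of this sketch, chosen so that the
    function can act as a smooth regime weight between 0 and 1.
    """
    return 0.5 * (1.0 + math.tanh(gamma * (x - center)))


if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        print(x, gaussian_mf(x, 0.0, 1.0),
              generalized_bell_mf(x, 0.0, 1.0, 2.0),
              tanh_transition(x, 0.0, 1.5))
```

In a STAR-type model the value returned by such a function would weight two autoregressive regimes; fitting its parameters (for example by least squares or a genetic algorithm) is outside this sketch.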

Application of Adaptive Neuro-Fuzzy Inference System in Interest Rates Effects on Stock Returns

2013-07-10T09:46:59Z

In the current study we examine the effects of interest rate changes on the common stock returns of the Greek banking sector. We examine the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) process and an Adaptive Neuro-Fuzzy Inference System (ANFIS). Based on the GARCH model, we find that interest rate changes have an insignificant effect on common stock returns during the period we examine. With ANFIS, on the other hand, we can extract the rules, and in each case the effects can be positive or negative depending on the conditions and the firing rules of the inputs; this information cannot be retrieved with traditional econometric modelling. Furthermore, we examine the forecasting performance of both models and conclude that ANFIS outperforms the GARCH model in both the in-sample and out-of-sample periods.

Application of a Modified Generalized Regression Neural Networks Algorithm in Economics and Finance

2013-07-10T09:32:33Z

In this paper we propose an alternative, modified Generalized Regression Neural Networks Autoregressive model (GRNN-AR) and apply it to S&P 500 and FTSE 100 index returns, as well as to the gross domestic product growth rates of Italy, the USA and the UK. We compare the forecasts with those of Generalized Autoregressive Conditional Heteroskedasticity (GARCH) and Autoregressive Integrated Moving Average (ARIMA) models. The results indicate that GRNN significantly outperforms the conventional econometric models and can be an efficient alternative tool for forecasting. The MATLAB algorithm we propose is provided in the appendix for further applications, suggestions, modifications and improvements.

Smoothing Transition Autoregressive (STAR) Models for the Day of the Week Effect: An Application to S&P 500 Stock Index

2013-07-09T13:40:34Z

MATLAB Applications of Trading Rules and GARCH with Wavelets Analysis

2013-07-08T13:58:37Z

In this paper we provide MATLAB routines for two widely used trading rules, the moving average indicator and the MACD oscillator, as well as for univariate GARCH regression with Monte Carlo simulations and wavelet decomposition, which is an update of an older algorithm.
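
The two trading indicators mentioned above can be illustrated with a short Python sketch (the paper's own routines are in MATLAB and are not reproduced here). The 12/26/9 spans are the conventional MACD defaults, and seeding the exponential average with the first price is one common convention; both are assumptions of this example.

```python
def sma(prices, window):
    """Simple moving average; returns len(prices) - window + 1 values."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [sum(prices[i - window:i]) / window
            for i in range(window, len(prices) + 1)]


def ema(prices, span):
    """Exponential moving average with alpha = 2 / (span + 1).

    The first output equals the first price (an assumed seeding convention).
    """
    alpha = 2.0 / (span + 1.0)
    out = [float(prices[0])]
    for price in prices[1:]:
        out.append(alpha * price + (1.0 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a price series."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


if __name__ == "__main__":
    rising = [100.0 + t for t in range(60)]
    line, sig, hist = macd(rising)
    print(sma(rising, 5)[:3], line[-1], sig[-1], hist[-1])
```

A typical rule built on these series would go long when the MACD line crosses above its signal line (or the price crosses above its moving average) and exit on the opposite crossing; the exact rule used in the paper is not stated in the abstract.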

MATLAB Routine for Bootstrapping Statistic Hypothesis for Calendar Effects in Stock Returns

2013-07-08T13:49:35Z

This paper presents a MATLAB programming routine for testing calendar effects, or anomalies, in stock returns. The calendar effects tested are the turn-of-the-month, day-of-the-week, month-of-the-year and semi-month effects.
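
To make the idea of a bootstrap hypothesis test for a calendar effect concrete, here is a minimal Python sketch. It tests whether the mean return of one group of days (say, Mondays) differs from that of another group by resampling under the null of equal means. The function name, the centering step and the two-sided p-value are choices made for this illustration and are not taken from the paper's MATLAB routine.

```python
import random
from statistics import mean


def bootstrap_mean_diff_pvalue(group_a, group_b, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for H0: both groups share the same mean."""
    rng = random.Random(seed)
    mean_a, mean_b = mean(group_a), mean(group_b)
    observed = mean_a - mean_b
    pooled_mean = mean(list(group_a) + list(group_b))
    # Impose the null hypothesis: shift each group so it has the pooled mean.
    a_null = [x - mean_a + pooled_mean for x in group_a]
    b_null = [x - mean_b + pooled_mean for x in group_b]
    extreme = 0
    for _ in range(n_boot):
        resample_a = [rng.choice(a_null) for _ in a_null]
        resample_b = [rng.choice(b_null) for _ in b_null]
        if abs(mean(resample_a) - mean(resample_b)) >= abs(observed):
            extreme += 1
    return extreme / n_boot


if __name__ == "__main__":
    # Hypothetical daily returns, for illustration only.
    mondays = [-0.4, -0.1, 0.2, -0.3, -0.5, 0.1]
    other_days = [0.3, 0.1, 0.4, -0.1, 0.2, 0.5, 0.0, 0.3]
    print(bootstrap_mean_diff_pvalue(mondays, other_days))
```

The same function can be reused for the other effects by changing how returns are split into two groups (for example, turn-of-the-month days versus the rest).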

ARIMA and Neural Networks: An Application to the Real GNP Growth Rate and the Unemployment Rate of U.S.A.

2013-07-08T13:33:12Z

This paper examines the estimation and forecasting performance of ARIMA models in comparison with some of the most popular and common neural network models. Specifically, we provide the estimation results of the AR-GRNN (generalized regression neural network) and the AR-RBF (radial basis function) models. We show that neural network models outperform ARIMA in forecasting. We find that the best model for real US GNP is the AR-GRNN, and for the US unemployment rate the AR-MLP.
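
As background, a standard generalized regression neural network predicts with a Gaussian-kernel weighted average of the training targets. The sketch below shows that textbook form with a single lagged value as input, which is one simple way to read "AR-GRNN"; the lag structure, the function names and the fixed smoothing parameter `sigma` are assumptions of this example, not the specification estimated in the paper.

```python
import math


def grnn_predict(train_x, train_y, x, sigma):
    """Kernel-weighted average of train_y with Gaussian weights around x."""
    weights = [math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) for xi in train_x]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total


def ar1_grnn_forecast(series, sigma):
    """One-step-ahead forecast using the previous value as the only input."""
    train_x = series[:-1]   # y[t-1]
    train_y = series[1:]    # y[t]
    return grnn_predict(train_x, train_y, series[-1], sigma)


if __name__ == "__main__":
    # Hypothetical growth-rate series, for illustration only.
    growth = [0.8, 1.1, 0.9, 0.4, -0.2, 0.3, 0.7, 1.0]
    print(ar1_grnn_forecast(growth, sigma=0.3))
```

Smaller values of `sigma` make the forecast follow the nearest past observation more closely, while larger values pull it toward the overall mean of the targets.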

Antitrust?, Grazie, abbiamo altri impegni!

2013-05-31T08:42:21Z

The paper critically analyzes the enforcement practice of the competition authority with regard to the use of commitment decisions to conclude antitrust investigations. From this perspective, and in order to draw conclusions about the use, suitability and appropriateness of this legal instrument, a quantitative and qualitative analysis is carried out, paying specific attention to its most relevant variables: the adoption rate, the comparison with EU practice, the typology of commitments and their consistency with the ‘concerns’ detected in the preliminary investigation phases, the rate of presentation and acceptance, and finally the efficiency of this process in leading to the formal conclusion of the case as compared with the ordinary procedure. On the basis of the findings of this analysis, which covers the period since the enactment of the legal rule, the paper then assesses the change in the authority's enforcement practice induced by an excessive expansion of the scope of commitment decisions, with a view to suggesting a discontinuity.

Analysis of a Hurst parameter estimator based on the modified Allan variance

2013-05-14T08:41:54Z

A log-regression estimator based on the so-called modified Allan variance (MAVAR) has recently been proposed to estimate the Hurst parameter of Internet traffic data. Simulations have shown that this estimator achieves higher accuracy and better confidence than another commonly used method based on wavelet analysis. Here we link it to the wavelet setting and stress why a different analysis for the two approaches is required. We then focus on the asymptotic analysis of the MAVAR log-regression estimator and provide new formulas for the related confidence intervals. By numerical evaluation, we analyze these formulas and compare three suitable choices of the regression weights, also optimizing over different choices of the data progression.

Evolution of controllability in interbank networks

2013-04-15T13:48:49Z

The Statistical Physics of Complex Networks has recently provided new theoretical tools for policy makers. Here we extend the notion of network controllability to detect the financial institutions, i.e. the drivers, that are most crucial to the functioning of an interbank market. The system we investigate is a paradigmatic case study for complex networks since it undergoes dramatic structural changes over time and links among nodes can be observed at several time scales. We find a scale-free decay of the fraction of drivers with increasing time resolution, implying that policies have to be adjusted to the time scales in order to be effective. Moreover, drivers are often not the most highly connected “hub” institutions, nor the largest lenders, contrary to the results of other studies. Our findings contribute quantitative indicators which can support regulators in developing more effective supervision and intervention policies.

Weighted Networks as Randomly Reinforced Urn Processes

2013-03-04T08:37:55Z

We analyze weighted networks as randomly reinforced urn processes, in which the edge-total weights are determined by a reinforcement mechanism. We develop a new statistical test and, based on it, a new procedure to study the evolution of networks over time, detecting the “dominance” of some edges with respect to the others and then assessing whether or not a given instance of the network is taken at its steady state. Distance from the steady state can be considered as a measure of the relevance of the observed properties of the network. Our results are quite general, in the sense that they are not based on a particular probability distribution or functional form of the random weights. Moreover, the proposed tool can also be applied to dense networks, which have received little attention from the network community so far since they are often problematic. We apply our procedure in the context of the International Trade Network, determining a core of “dominant edges”.

Metamodel variability analysis combining bootstrapping and validation techniques

2012-12-14T08:26:02Z

Research on metamodel-based optimization has received considerably increasing interest in recent years, and has found successful applications in solving computationally expensive problems. The joint use of computer simulation experiments and metamodels introduces a source of uncertainty that we refer to as metamodel variability. To analyze and quantify this variability, we apply bootstrapping to residuals derived as prediction errors computed from cross-validation. The proposed method can be used with different types of metamodels, especially when limited knowledge on parameters’ distribution is available or when a limited computational budget is allowed. Our preliminary experiments based on the robust version of the EOQ model show encouraging results.

Robust optimization in simulation: Taguchi and Krige combined

2012-11-29T16:39:01Z

Optimization of simulated systems is the goal of many methods, but most methods assume known environments. We, however, develop a "robust" methodology that accounts for uncertain environments. Our methodology uses Taguchi's view of the uncertain world but replaces his statistical techniques by design and analysis of simulation experiments based on Kriging (Gaussian process model); moreover, we use bootstrapping to quantify the variability in the estimated Kriging metamodels. In addition, we combine Kriging with nonlinear programming, and we estimate the Pareto frontier. We illustrate the resulting methodology through economic order quantity (EOQ) inventory models. Our results suggest that robust optimization requires order quantities that differ from the classic EOQ. We also compare our results with results we previously obtained using response surface methodology instead of Kriging.
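
For reference, the classic EOQ against which the robust order quantities are compared comes from minimizing average ordering plus holding cost, which gives Q* = sqrt(2 a K / h). The sketch below evaluates that textbook formula only; the symbol names (`demand_rate`, `setup_cost`, `holding_cost`) are chosen here for readability, and none of the Kriging, bootstrapping or Pareto-frontier steps of the paper are reproduced.

```python
import math


def eoq(demand_rate, setup_cost, holding_cost):
    """Classic economic order quantity: sqrt(2 * demand * setup / holding)."""
    return math.sqrt(2.0 * demand_rate * setup_cost / holding_cost)


def average_cost(q, demand_rate, setup_cost, holding_cost):
    """Average ordering plus holding cost per unit time for order quantity q.

    The purchase cost is omitted because it does not depend on q.
    """
    return setup_cost * demand_rate / q + holding_cost * q / 2.0


if __name__ == "__main__":
    q_star = eoq(demand_rate=1200.0, setup_cost=100.0, holding_cost=6.0)
    print(q_star, average_cost(q_star, 1200.0, 100.0, 6.0))
```

When the demand rate is uncertain, as in the robust setting described above, the cost at a fixed order quantity becomes a random variable, which is why a quantity other than the classic EOQ can be preferable.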

Asymptotic Normality of a Hurst Parameter Estimator Based on the Modified Allan Variance

2012-11-28T13:24:24Z

A log-regression estimator based on the so-called modified Allan variance (MAVAR) has recently been proposed to estimate the memory parameter of Internet traffic data. Simulations have shown that this estimator achieves higher accuracy and better confidence when compared with other methods. In this paper we present a rigorous study of the MAVAR log-regression estimator. In particular, under the assumption that the signal process is a fractional Brownian motion, we prove that it is consistent and asymptotically normally distributed. Finally, we discuss its connection with wavelet estimators.

Evolution of the Dependence of Residual Lifetimes

2012-09-19T10:28:08Z

We investigate the dependence properties of a vector of residual lifetimes by means of the copula associated with the conditional distribution function. In particular, the evolution of positive dependence properties (such as quadrant dependence and total positivity) is analyzed, and expressions for the evolution of measures of association are given.

Tomas Bjork, A geometric view of the term structure of interest rates, Cattedra Galileiana 2000 (Lecture notes written by Irene Crimaldi)

2012-08-10T08:02:12Z

This set of lecture notes is the outcome of a lecture series, given in April 2000 by Prof. Tomas Bjork while holding the "Cattedra Galileiana" at Scuola Normale Superiore in Pisa. The purpose of the lectures was to give an overview of some recent work concerning structural properties of the evolution of the forward rate curve in an arbitrage free bond market.

DebtRank: Too Central to Fail? Financial Networks, the FED and Systemic Risk

2012-08-07T07:49:34Z

Systemic risk, here meant as the risk of default of a large portion of the financial system, depends on the network of financial exposures among institutions. However, there is no widely accepted methodology to determine the systemically important nodes in a network. To fill this gap, we introduce DebtRank, a novel measure of systemic impact inspired by feedback-centrality. As an application, we analyse a new and unique dataset on the USD 1.2 trillion FED emergency loans program to global financial institutions during 2008–2010. We find that a group of 22 institutions, which received most of the funds, form a strongly connected graph where each of the nodes becomes systemically important at the peak of the crisis. Moreover, a systemic default could have been triggered even by small dispersed shocks. The results suggest that the debate on too-big-to-fail institutions should include the even more serious issue of too-central-to-fail.

Statistical agent based modelization of the phenomenon of drug abuse

2012-07-26T07:51:58Z

We introduce a statistical agent-based model to describe the phenomenon of drug abuse and its dynamical evolution at the individual and global level. The agents are heterogeneous with respect to their intrinsic inclination to drugs, their budget attitude and their social environment. The various levels of drug use were inspired by the professional description of the phenomenon, and this permits a direct comparison with all available data. We show that certain elements are of great importance in starting drug use, for example the rare events in personal experience that make it possible to overcome the barrier to occasional drug use. Analyzing how the system reacts to perturbations is very important for understanding its key elements, and it provides strategies for effective policy making. The present model represents the first step towards a realistic description of this phenomenon and can easily be generalized in various directions.

Can persistent Epstein–Barr virus infection induce chronic fatigue syndrome as a Pavlov reflex of the immune response?

2012-07-20T09:51:02Z

Chronic fatigue syndrome is a protracted illness condition (lasting even years) appearing with strong flu symptoms and systemic defiances by the immune system. Here, by means of statistical mechanics techniques, we study the most widely accepted picture for its genesis, namely a persistent acute mononucleosis infection, and we show how such an infection may drive the immune system towards an out-of-equilibrium metastable state displaying chronic activation of both humoral and cellular responses (a state of full inflammation without a direct ‘cause–effect’ reason). By exploiting a bridge with a neural scenario, we mirror killer lymphocytes TK and B cells to neurons and helper lymphocytes TH to synapses, hence showing that the immune system may experience the Pavlov conditional reflex phenomenon: if the exposure to a stimulus (Epstein–Barr virus antigens) lasts for too long, strong internal correlations among B, TK and TH may develop, ultimately resulting in a persistent activation even though the stimulus itself is removed. These outcomes are corroborated by several experimental findings.

Absorptive Capacity and Efficiency: A Comparative Stochastic Frontier Approach Using Sectoral Data

2012-06-07T10:17:05Z

In this paper, we investigate differences in and determinants of technical efficiency across three groups of OECD, Asian and Latin American countries. As technical efficiency determines the capacity with which countries absorb technology produced abroad, these differences are important for understanding differences in growth and productivity across countries, especially for developing countries, which depend to a large extent on foreign technology. Using a stochastic frontier framework and data for 22 manufacturing sectors for 1996-2005, we find notable differences in technical efficiency between the three country groups we examine. We then investigate the effect of human capital and domestic R&D, proxied by the stock of patents, on technical efficiency. We find that while human capital always has a strongly positive effect on efficiency, an increase in the stock of patents has positive effects on efficiency in high-tech sectors, but negative effects in low-tech sectors.

Ageing and risk aspects in predictive inference based on proportional Hazard Models

2012-05-15T10:59:51Z

Proportional Hazard Models arise from a straightforward generalization of the simple case of conditionally i.i.d., exponentially distributed random variables and, in a sense, can be considered as the idealized models in the statistical analysis of failure and survival data for lifetimes. For these reasons, they have been extensively studied in the literature. Despite the richness of related contributions, there are still special aspects of these models that are worth focusing on. In this discussion paper we aim to present some contributions within a Bayesian approach, using some very basic notions of stochastic ordering.

Interactions between ageing and risk properties in the analysis of burn-in problems

2012-05-08T08:52:38Z

Several relevant problems in reliability can be looked at as problems of risk management and of decisions in the face of uncertainty. In this frame, however, the so-called burn-in problem can be seen as a problem of risk taking par excellence. In this paper, we point out in particular some aspects concerning interactions between the probabilistic model for lifetimes and considerations of an economic kind. As one of the features of our work, we hinge on some unexplored connections between ageing properties of a one-dimensional survival function G and risk-aversion-type properties of the function u(t) = bG(t), b > 0, when the latter is seen as a utility function.

Warfare, Fiscal Capacity, and Performance

2012-03-13T12:56:40Z

We exploit differences in casualties sustained in pre-modern wars to estimate the impact of fiscal capacity on economic performance. In the past, states fought different amounts of external conflicts, of various lengths and magnitudes. To raise the revenues to wage wars, states made fiscal innovations, which persisted and helped to shape current fiscal institutions. Economic historians claim that greater fiscal capacity was the key long-run institutional change brought about by historical conflicts. Using casualties sustained in pre-modern wars to instrument for current fiscal institutions, we estimate substantial impacts of fiscal capacity on GDP per worker. The results are robust to a broad range of specifications, controls, and sub-samples.

Sex-oriented stable matchings of the marriage problem with correlated and incomplete information

2012-02-23T10:07:06Z

In the stable marriage problem two sets of agents must be paired according to mutual preferences, which may happen to conflict. We present two generalizations of its sex-oriented version, aiming to take into account correlations between the preferences of agents and costly information. Their effects are investigated both numerically and analytically.
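
For readers unfamiliar with the baseline problem, the classical deferred-acceptance procedure of Gale and Shapley produces the stable matching that is optimal for the proposing side, which is the natural starting point for "sex-oriented" stable matchings. The sketch below covers only that classical case with complete, strict preference lists and equally sized sets; the correlated preferences and costly information studied in the paper are not modelled, and the function name is chosen for this example.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a proposer-optimal stable matching as {proposer: receiver}.

    Both arguments map each agent to a complete list of agents on the
    other side, ordered from most to least preferred.
    """
    # rank[r][p] is the position of proposer p in receiver r's list.
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    engaged_to = {}                      # receiver -> current proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p            # r was unmatched and accepts
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p            # r trades up; old partner is free again
            free.append(current)
        else:
            free.append(p)               # r rejects p, who will try the next choice
    return {p: r for r, p in engaged_to.items()}


if __name__ == "__main__":
    men = {"A": ["X", "Y"], "B": ["X", "Y"]}
    women = {"X": ["B", "A"], "Y": ["A", "B"]}
    print(gale_shapley(men, women))
```

Swapping the two arguments yields the matching that is optimal for the other side, which in general differs from the first one.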

Social network growth with assortative mixing

2012-02-14T13:19:55Z

Networks representing social systems display specific features that set them apart from biological and technological ones. In particular, the number of links attached to a node is positively correlated with that of its nearest neighbours. We develop a model that reproduces this feature, starting from microscopic mechanisms of growth. The statistical properties arising from the simulations are in good agreement with those of the real-world social networks of scientists co-authoring papers in condensed matter physics. Moreover, the model highlights the determinant role of correlations in shaping the network's topology.

The topology of shareholding networks

2012-02-03T14:52:07Z

We study the statistical properties of the network of shareholding relationships in the Italian stock market (MIB) and in two US stock markets (NYSE and NASDAQ). The networks are found to be scale-free, a feature that does not seem to accord with classical theories of portfolio optimization. Several statistical properties differ across markets and allow for a classification. We introduce two quantities, HI and SI, analogous to in-degree and out-degree for weighted graphs. The distributions of HI and SI allow us to characterize from a statistical perspective the individual ownership concentration of stocks and the individual power of holders. They also suggest two different global pictures of the structure of the networks: the MIB seems characterized by medium-size holders controlling separate subsets of stocks, while the US markets seem characterized by very large holders sharing control over subsets of stocks.

Preferential attachment in the growth of social networks: the internet encyclopedia Wikipedia

2012-02-03T14:29:51Z

We present an analysis of the statistical properties and growth of the free on-line encyclopedia Wikipedia. By describing topics by vertices and hyperlinks between them as edges, we can represent this encyclopedia as a directed graph. The topological properties of this graph are in close analogy with those of the World Wide Web, despite the very different growth mechanism. In particular, we measure a scale-invariant distribution of the in- and out-degree, and we are able to reproduce these features by means of a simple statistical model. As a major consequence, Wikipedia growth can be described by local rules such as the preferential attachment mechanism, even though users, who are responsible for its evolution, can act globally on the network.
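
The preferential attachment mechanism referred to above can be illustrated with a deliberately simplified growth rule: each new node links to one existing node chosen with probability proportional to its current degree. The sketch below uses an undirected graph and a single link per new node, both simplifications assumed for this example; it is not the directed model fitted to Wikipedia in the paper.

```python
import random


def preferential_attachment_edges(n_nodes, seed=0):
    """Grow a graph where each new node attaches to one degree-biased target."""
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new_node, target))
        endpoints.extend((new_node, target))
    return edges


def degrees(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


if __name__ == "__main__":
    deg = degrees(preferential_attachment_edges(2000, seed=42))
    print(sorted(deg.values(), reverse=True)[:10])
```

Running the example shows a few early nodes collecting a large share of the links, the qualitative signature of a broad, heavy-tailed degree distribution.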

Temperature in complex networks

2012-02-01T16:07:06Z

Various statistical-mechanics approaches to complex networks have been proposed to describe expected topological properties in terms of ensemble averages. Here we extend this formalism by introducing the fundamental concept of graph temperature, controlling the degree of topological optimization of a network. We recover the temperature-dependent version of various important models as particular cases of our approach, and show examples where, remarkably, the onset of a percolation transition, a scale-free degree distribution, correlations and clustering can be understood as natural properties of an optimized (low-temperature) topology. We then apply our formalism to real weighted networks and we compute their temperature, finding that various techniques used to extract information from complex networks are again particular cases of our approach.

Interplay between topology and dynamics in the World Trade Web

2012-02-01T15:35:22Z

We present an empirical analysis of the network formed by the trade relationships between all world countries, or World Trade Web (WTW). Each (directed) link is weighted by the amount of wealth flowing between two countries, and each country is characterized by the value of its Gross Domestic Product (GDP). By analysing a set of year-by-year data covering the time interval 1950–2000, we show that the dynamics of all GDP values and the evolution of the WTW (trade flow and topology) are tightly coupled. The probability that two countries are connected depends on their GDP values, supporting recent theoretical models relating network topology to the presence of a 'hidden' variable (or fitness). On the other hand, the topology is shown to determine the GDP values due to the exchange between countries. This leads us to a new framework where the fitness value is a dynamical variable determining, and at the same time depending on, network topology in a continuous feedback.

Self-organized network evolution coupled to extremal dynamics

2012-02-01T14:01:47Z

The interplay between topology and dynamics in complex networks is a fundamental but widely unexplored problem. Here, we study this phenomenon on a prototype model in which the network is shaped by a dynamical variable. We couple the dynamics of the Bak–Sneppen evolution model with the rules of the so-called fitness network model for establishing the topology of a network; each vertex is assigned a 'fitness', and the vertex with minimum fitness and its neighbours are updated in each iteration. At the same time, the links between the updated vertices and all other vertices are drawn anew with a fitness-dependent connection probability. We show analytically and numerically that the system self-organizes to a non-trivial state that differs from what is obtained when the two processes are decoupled. A power-law decay of dynamical and topological quantities above a threshold emerges spontaneously, as well as a feedback between different dynamical regimes and the underlying correlation and percolation properties of the network.

The Italian interbank network: statistical properties and a simple model

2012-02-01T13:51:08Z

We use the theory of complex networks to quantitatively characterize the structure of reciprocal exposures of Italian banks in the interbank money market. We observe two main strategies among banks: small banks tend to be the lenders of the system, while large banks are borrowers. We propose a model to reproduce the main statistical features of this market. Moreover, the network analysis allows us to investigate the robustness properties of this system.

A network analysis of the Italian overnight money market

2012-02-01T12:00:55Z

The objective of this paper is to analyse the network topology of the Italian segment of the European overnight money market through methods of statistical mechanics applied to complex networks. We investigate differences in the activities of banks of different sizes and the evolution of their connectivity structure over the maintenance period. The main purpose of the analysis is to establish the potential implications of the current institutional arrangements on the stability of the banking system and to assess the efficiency of the interbank market in terms of absence of speculative and preferential trading relationships.

Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia

2012-02-01T11:47:12Z

In this paper we investigate the nature and structure of the relation between imposed classifications and real clustering in a particular case of a scale-free network given by the on-line encyclopedia Wikipedia. We find a statistical similarity in the distributions of community sizes both by using the top-down approach of the categories division present in the archive and in the bottom-up procedure of community detection given by an algorithm based on the spectral properties of the graph. Regardless of the statistically similar behaviour, the two methods provide a rather different division of the articles, thereby signaling that the nature and presence of power laws is a general feature for these systems and cannot be used as a benchmark to evaluate the suitability of a clustering method.

Folksonomies and clustering in the collaborative system CiteULike

2012-02-01T11:43:36Z

We analyze CiteULike, an online collaborative tagging system where users bookmark and annotate scientific papers. Such a system can be naturally represented as a tri-partite graph whose nodes represent papers, users and tags connected by individual tag assignments. The semantics of tags is studied here, in order to uncover the hidden relationships between tags. We find that the clustering coefficient can be used to analyze the semantical patterns among tags.
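
The clustering coefficient mentioned above has a standard local definition: for a node with k neighbours, it is the fraction of the k(k-1)/2 neighbour pairs that are themselves linked. The sketch below computes it on an undirected graph stored as a dictionary of neighbour sets; how a tag graph is derived from the tri-partite paper-user-tag data is not specified in the abstract, so the co-occurrence graph in the usage example is purely hypothetical.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of pairs of neighbours of `node` that are directly linked."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adjacency):
    """Mean of the local clustering coefficient over all nodes."""
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)


if __name__ == "__main__":
    # Hypothetical tag co-occurrence graph, for illustration only.
    tags = {
        "network": {"graph", "physics", "web"},
        "graph": {"network", "physics"},
        "physics": {"network", "graph"},
        "web": {"network"},
    }
    print({t: local_clustering(tags, t) for t in tags}, average_clustering(tags))
```

A tag whose neighbours are densely interlinked (coefficient near 1) sits inside a tight topical cluster, while a low value suggests a generic tag bridging otherwise unrelated topics.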

Quantifying the taxonomic diversity in real species communities

2012-02-01T11:39:20Z

We analyze several florae (collections of plant species populating specific areas) in different geographic and climatic regions. For every list of species we produce a taxonomic classification tree and we consider its statistical properties. We find that regardless of the geographical location, the climate and the environment all species collections have universal statistical properties that we show to be also robust in time. We then compare observed data sets with simulated communities obtained by randomly sampling a large pool of species from all over the world. We find differences in the behavior of the statistical properties of the corresponding taxonomic trees. Our results suggest that it is possible to distinguish quantitatively real species assemblages from random collections and thus demonstrate the existence of correlations between species.

Invasion percolation and the time scaling behavior of a queuing model of human dynamics

2012-01-25T13:47:35Z

In this paper we study the properties of the Barabási model of queuing under the hypothesis that the number of tasks is steadily growing in time. We map this model exactly onto an invasion percolation dynamics on a Cayley tree. This allows us to recover the correct waiting time distribution P_W(τ) ~ τ^{-3/2} at the stationary state (as observed in different realistic data) and also to characterize it as a sequence of causally and geometrically connected bursts of activity. We also find that the approach to stationarity is very slow.

Statistical regularities in the rank-citation profile of scientists

2012-01-20T08:32:48Z

Recent science of science research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate production and impact using the rank-citation profile c_i(r) of 200 distinguished professors and 100 assistant professors. For the entire range of paper rank r, we fit each c_i(r) to a common distribution function. Since two scientists with equivalent Hirsch h-index can have significantly different c_i(r) profiles, our results demonstrate the utility of the \beta_i scaling parameter in conjunction with h_i for quantifying individual publication impact. We show that the total number of citations C_i tallied from a scientist’s N_i papers scales as C_i ~ h_i^{\beta_i}. Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.
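
The point that equal h-indices can hide very different rank-citation profiles is easy to see with a small computation. The sketch below builds the profile c(r) by sorting citation counts in decreasing order and reads off the Hirsch index as the largest rank r with c(r) >= r; the two citation lists are invented for illustration, and the common distribution function fitted in the paper is not reproduced.

```python
def rank_citation_profile(citations):
    """Citation counts sorted from the most cited paper (rank 1) downwards."""
    return sorted(citations, reverse=True)


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(rank_citation_profile(citations), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    # Hypothetical scientists with the same h-index but different profiles.
    scientist_1 = [10, 8, 5, 4, 3]
    scientist_2 = [100, 50, 4, 4, 0]
    for cites in (scientist_1, scientist_2):
        print(rank_citation_profile(cites), h_index(cites), sum(cites))
```

Both hypothetical scientists have h = 4, yet their total citation counts differ by a factor of about five, which is the kind of difference a second parameter alongside h is meant to capture.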

Lo spazio di Skorohod

2011-11-02T11:27:05Z

The Skorohod space, whose introduction dates back to the 1960s, has by now become a classical tool in the study of stochastic processes whose trajectories are cadlag functions, that is, functions that are right-continuous and have a left limit at every point. It is obtained by endowing the set of all such functions with a suitable topology (called the Skorohod topology). Its usefulness shows above all in the study of convergence or compactness problems for laws of processes with cadlag trajectories. The present work is meant as an introduction to the subject. It aims to make clearer and simpler some of the proofs in [Billingsley, 1968] and to spell out certain rather delicate technical details that some expositions treat too casually. Particular attention is devoted to the Borel sigma-field of the Skorohod space.

Introduzione alla teoria delle misure aleatorie puntuali

2011-11-02T11:17:55Z

This work is an introduction to the theory of random point measures. The topics covered are the following: the concept of fundamental class, the concept of random measure, independence of random measures, random point measures and Poisson random measures, existence of a Poisson random measure with given intensity, simple random point measures, the Poisson kernel and Cox random measures, random extension of a point measure, random extension of a random point measure, and marked random measures.

A dose-finding sequential method for targeting a given mean response: Up&Down experiments

2011-10-31T15:45:47Z

Traditionally, dose-response studies are binary experiments aimed at estimating a given “quantile” of interest of a response curve. In this work we consider the case in which the observed response is a generic real random variable, not necessarily dichotomous, and the aim of the experiment is to estimate the target dose associated with a pre-assigned mean response. Building on the results of Giovagnoli and Pintacuda (1998) and Baldi Antognini et al. (2006), a randomized extension of the up-and-down algorithm is proposed and analysed, together with an estimation procedure for the mean response based on the maximum likelihood method.

Paul Lévy type inequalities for symmetric random variables

2011-10-31T15:19:03Z

We prove some inequalities for a jointly symmetric system of n random variables with values in a measurable group. These inequalities include, as a particular case, the classical Paul Lévy inequalities.

Sur la tribu borélienne de l'espace de Skorohod

2011-10-31T15:12:09Z

In Chapter VII of [Parthasarathy, 1967], devoted to the Skorohod space, the author first introduces a slightly larger space: namely, the space consisting of the regulated functions, defined on [0,1], for which each point of ]0,1[ is either a point of right continuity or a point of left continuity (the type of continuity possibly depending on the point considered). In the present article, after showing that the definition and a number of properties of the Skorohod topology extend almost automatically to this larger space, we prove that the same is not true for the properties concerning the Borel σ-field: in the new setting, not only is the Borel σ-field no longer generated by the canonical projections, but these projections are not even Borel measurable.

Convergence results for a normalized triangular array of symmetric random variables

2011-10-31T15:04:57Z

For a triangular array of symmetric random variables (without any integrability condition) we replace the classical assumption of row-wise independence by that of row-wise joint symmetry. Under this weaker assumption we prove some results concerning the convergence in distribution of a suitable sequence of randomly normalized sums to the standard normal distribution. Then we exhibit a class of row-wise independent triangular arrays for which the ordinary sums fail to converge in distribution, while our results enable us to affirm the convergence in distribution of the normalized sums.

On the behavior of the conditional expectations in Skorohod representation theorem

2011-10-31T15:01:12Z

In this paper we deal with the Skorohod representation of a given system of probability measures. More precisely, we give conditions for the existence of a Skorohod representation (X, (X_n)) with the following additional property: for each real number p ⩾ 1 and each real random variable Z in L^p, the conditional expectation E[Z|X_n] converges in L^p to the conditional expectation E[Z|X].

Two inequalities for conditional expectations and convergence results for filters

2011-10-31T14:55:36Z

In this paper we first prove two inequalities for conditional expectations, from which we easily deduce a result by Landers and Rogge. Then we prove convergence results for conditional expectations of the form P_n[f(X_n)|Y_n] to a conditional expectation of the form P[f(X)|Y]. We study, in particular, the case in which the random variables Y_n, Y are of the type h_n(X_n), h(X).

Convergence results for multivariate martingales

2011-10-31T14:48:17Z

We present a new version of the Central Limit Theorem for multivariate martingales.

Convergence results for conditional expectations

2011-10-31T14:40:32Z

Let E,F be two Polish spaces and [Xn,Yn],[X,Y] random variables with values in E×F (not necessarily defined on the same probability space). We show some conditions which are sufficient in order to assure that, for each bounded continuous function f on E×F, the conditional expectation of f(Xn,Yn) given Yn converges in distribution to the conditional expectation of f(X,Y) given Y.

Sur l'interversion de l'ordre entre deux opérations sur les tribus

2011-10-31T14:19:25Z

We characterize the measurable spaces (Ω,A) such that, for each sub-σ-field G of A and each decreasing filtered family (F_t) of sub-σ-fields of A, with F_t ↓ F_∞, we have F_t ∨ G ↓ F_∞ ∨ G. From this we obtain a characterization of the probability spaces (Ω, A, P) such that, for each sub-σ-field G of A and each decreasing sequence (F_n) of sub-σ-fields of A, with F_n ↓ F_∞, we have ⋂_n (F_n ∨ G) ∼ F_∞ ∨ G (mod P).

A strong form of stable convergence

2011-10-31T13:49:03Z

We introduce and study a strengthening of the notion of stable convergence.

Asymptotic results for a generalized Pólya urn with “multi-updating” and applications to clinical trials

2011-10-31T13:38:27Z

In this article, a new Pólya urn model is introduced and studied; in particular, a strong law of large numbers and two central limit theorems are proved. This urn generalizes a model studied in Berti et al. (2004), May et al. (2005), and in Crimaldi (2007), and it has natural applications in clinical trials. Indeed, the model includes both delayed and missing (or null) responses. Moreover, a connection with the conditional identity in distribution of Berti et al. (2004) is given.

Rate of convergence of predictive distributions for dependent data

2011-10-31T13:28:39Z

This paper deals with empirical processes of the type \[C_{n}(B)=\sqrt{n}\{\mu_{n}(B)-P(X_{n+1}\in B\mid X_{1},\ldots,X_{n})\},\] where $(X_{n})$ is a sequence of random variables and $\mu_{n}=(1/n)\sum_{i=1}^{n}\delta_{X_{i}}$ the empirical measure. Conditions for $\sup_{B}|C_{n}(B)|$ to converge stably (in particular, in distribution) are given, where $B$ ranges over a suitable class of measurable sets. These conditions apply when $(X_{n})$ is exchangeable or, more generally, conditionally identically distributed (in the sense of Berti et al. [Ann. Probab. 32 (2004) 2029–2052]). By such conditions, in some relevant situations, one obtains that $\sup_{B}|C_{n}(B)|\stackrel{P}{\rightarrow}0$ or even that $\sqrt{n}\sup_{B}|C_{n}(B)|$ converges a.s. Results of this type are useful in Bayesian statistics.

An almost sure conditional convergence result and an application to a generalized Polya urn

2011-10-31T13:16:20Z

We prove an almost sure conditional convergence result toward a Gaussian kernel and we apply it to a two-color randomly reinforced urn.

Central limit theorems for multicolor urns with dominated colors

2011-10-31T12:04:05Z

An urn contains balls of $d\geq 2$ colors. At each time $n\geq 1$, a ball is drawn and then replaced together with a random number of balls of the same color. Let $A_{n}=\mathrm{diag}(A_{n,1},\ldots,A_{n,d})$ be the $n$-th reinforce matrix. Assuming that $EA_{n,j}=EA_{n,1}$ for all $n$ and $j$, a few central limit theorems (CLTs) are available for such urns. In real problems, however, it is more reasonable to assume that $EA_{n,j}=EA_{n,1}$ whenever $n\geq 1$ and $1\leq j\leq d_{0}$, and $\liminf_{n}EA_{n,1}>\limsup_{n}EA_{n,j}$ whenever $j>d_{0}$, for some integer $1\leq d_{0}\leq d$. Under this condition, the usual weak limit theorems may fail, but it is still possible to prove the CLTs for some slightly different random quantities. These random quantities are obtained by neglecting dominated colors, i.e., colors from $d_{0}+1$ to $d$, and they allow the same inference on the urn structure. The sequence $(A_{n}: n\geq 1)$ is independent but need not be identically distributed. Some statistical applications are given as well.
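
The drawing scheme described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_urn`, the initial composition, and the uniform integer reinforcement distributions are hypothetical choices for illustration and are not taken from the paper.

```python
import random

def simulate_urn(initial, draw_reinforcement, steps, seed=0):
    """Simulate a multicolor reinforced urn.

    initial: initial number of balls of each color.
    draw_reinforcement: function (rng, n, j) -> number of balls of color j
        added when color j is drawn at time n (hypothetical interface).
    """
    rng = random.Random(seed)
    counts = list(initial)
    for n in range(1, steps + 1):
        # Draw a color with probability proportional to its current count.
        j = rng.choices(range(len(counts)), weights=counts)[0]
        # The drawn ball is replaced together with A_{n,j} balls of color j.
        counts[j] += draw_reinforcement(rng, n, j)
    return counts

def reinforcement(rng, n, j):
    # Assumed example with d = 3 and d0 = 2: colors 0 and 1 share the
    # larger mean reinforcement (3.0), color 2 is dominated (mean 1.5).
    return rng.randint(1, 5) if j < 2 else rng.randint(1, 2)

counts = simulate_urn([1, 1, 1], reinforcement, steps=5000)
fractions = [c / sum(counts) for c in counts]
```

Inspecting `fractions` for increasing `steps` gives an informal picture of how the dominated color behaves relative to the others; it is not a substitute for the limit theorems of the paper.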

Conditionally identically distributed species sampling sequences

2011-10-31T11:30:01Z

In this paper the theory of species sampling sequences is linked to the theory of conditionally identically distributed sequences in order to enlarge the set of species sampling sequences which are mathematically tractable. The conditional identity in distribution (see Berti, Pratelli and Rigo (2004)) is a new type of dependence for random variables, which generalizes the well-known notion of exchangeability. In this paper a class of random sequences, called generalized species sampling sequences, is defined and a condition to have conditional identity in distribution is given. Moreover, two types of generalized species sampling sequence that are conditionally identically distributed are introduced and studied: the generalized Poisson-Dirichlet sequence and the generalized Ottawa sequence. Some examples are discussed.

A central limit theorem and its applications to multicolor randomly reinforced urns

2011-10-31T11:23:14Z

Let $(X_{n})$ be a sequence of integrable real random variables, adapted to a filtration $(G_{n})$. Define $C_{n}=\sqrt{n}\{(1/n)\sum_{k=1}^{n}X_{k}-E(X_{n+1}\mid G_{n})\}$ and $D_{n}=\sqrt{n}\{E(X_{n+1}\mid G_{n})-Z\}$, where $Z$ is the almost-sure limit of $E(X_{n+1}\mid G_{n})$ (assumed to exist). Conditions for $(C_{n},D_{n})\rightarrow N(0,U)\times N(0,V)$ stably are given, where $U$ and $V$ are certain random variables. In particular, under such conditions, we obtain $\sqrt{n}\{(1/n)\sum_{k=1}^{n}X_{k}-Z\}=C_{n}+D_{n}\rightarrow N(0,U+V)$ stably. This central limit theorem has natural applications to Bayesian statistics and urn problems. The latter are investigated, by paying special attention to multicolor randomly reinforced urns.
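
The identity used in the abstract follows directly from the two definitions, since the conditional expectation terms cancel. A short worked version, using only the symbols defined above:

```latex
\begin{align*}
C_n + D_n
  &= \sqrt{n}\Big\{\frac{1}{n}\sum_{k=1}^{n} X_k - E(X_{n+1}\mid G_n)\Big\}
   + \sqrt{n}\big\{E(X_{n+1}\mid G_n) - Z\big\} \\
  &= \sqrt{n}\Big\{\frac{1}{n}\sum_{k=1}^{n} X_k - Z\Big\}.
\end{align*}
```

Read this way, the limit N(0, U + V) is what one would expect from adding the two components of the product limit N(0, U) × N(0, V); the precise conditions are those given in the paper.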

Robust simulation-optimization methodology

2011-08-01T13:49:28Z

This contribution summarizes a methodology for simulation optimization assuming some simulation inputs are uncertain. This methodology integrates Taguchi’s worldview (distinguishing between decision and environmental inputs), metamodeling (either Response Surface Methodology or Kriging), and mathematical programming. Instead of Taguchi’s statistical designs, this contribution uses Latin Hypercube Sampling for the environmental inputs. Mathematical programming is used to estimate the decision inputs that minimize the mean output, subject to a threshold for the standard deviation of the simulation output. Changing that threshold gives the estimated Pareto frontier. Confidence regions for the Pareto-optimal solution based on that frontier can be estimated through bootstrapping. This methodology is illustrated through Economic Order Quantity (EOQ) simulations.

Robust simulation-optimization using metamodels

2011-08-01T13:42:54Z

Optimization of simulated systems is the goal of many methods, but most methods assume known environments. In this paper we present a methodology that does account for uncertain environments. Our methodology uses Taguchi's view of the uncertain world, but replaces his statistical techniques by either Response Surface Methodology or Kriging metamodeling. We illustrate the resulting methodology through the well-known Economic Order Quantity (EOQ) model.

Threshold copulas and positive dependence

2011-07-07T08:04:30Z

Starting with a notion of positive dependence and with the family of the lower threshold copulas C_t associated with a bivariate distribution having copula C, we define different notions of positive dependence for C, reflecting the dependence properties of the copulas C_t for some t. Then, we analyze some structural aspects of lower threshold copulas and of the given definitions. Furthermore, we consider several specific cases arising from relevant special choices of the underlying notion of positive dependence (e.g., PQD, LTD, TP2 and PLR). Our analysis, in particular, allows us to present a number of relevant examples and counter-examples, which can be useful in the study of the tail dependence for a bivariate distribution.

Dynamics of Dependence Properties for Lifetimes Influenced by Unobservable Environmental Factors

2011-07-06T10:42:15Z

We consider non-negative conditionally independent and identically distributed random variables and analyze conditions for monotonicity of survival copulas of residual lifetimes. Concentrating attention on the bivariate copula, we compare its behavior at the instant of default with its evolution between two defaults. The assumptions for our results will be expressed in terms of conditional hazard rates.

Distorted Copulas: Constructions and Tail Dependence

2011-07-06T10:28:16Z

Given a copula $C$, we examine under which conditions on an order isomorphism $\psi$ of $[0,1]$ the distortion $C_{\psi}\colon[0,1]^{2}\rightarrow[0,1]$, $C_{\psi}(x,y)=\psi\{C[\psi^{-1}(x),\psi^{-1}(y)]\}$ is again a copula. In particular, when the copula $C$ is totally positive of order 2, we give a sufficient condition on $\psi$ that ensures that any distortion of $C$ by means of $\psi$ is again a copula. The presented results allow us to introduce in a more flexible way families of copulas exhibiting different behavior in the tails.
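
As a concrete instance of the distortion formula, the sketch below takes C to be the independence copula and ψ(t) = exp(−(−ln t)^(1/θ)) with θ ≥ 1. With this choice the distortion is the Gumbel-Hougaard copula, which has a closed form to compare against. The choice of C and ψ and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import math

def psi(t, theta):
    """Order isomorphism of [0, 1]: psi(t) = exp(-(-ln t)^(1/theta))."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return math.exp(-((-math.log(t)) ** (1.0 / theta)))

def psi_inv(x, theta):
    """Inverse of psi: psi_inv(x) = exp(-(-ln x)^theta)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return math.exp(-((-math.log(x)) ** theta))

def independence(u, v):
    """Independence copula C(u, v) = u * v."""
    return u * v

def distort(copula, theta):
    """Return C_psi(x, y) = psi(C(psi_inv(x), psi_inv(y)))."""
    def c_psi(x, y):
        return psi(copula(psi_inv(x, theta), psi_inv(y, theta)), theta)
    return c_psi

def gumbel(x, y, theta):
    """Closed form of the Gumbel-Hougaard copula for x, y in (0, 1)."""
    s = (-math.log(x)) ** theta + (-math.log(y)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

c_psi = distort(independence, theta=2.0)
value = c_psi(0.3, 0.7)
```

Here `value` agrees with `gumbel(0.3, 0.7, 2.0)` up to rounding, and θ = 1 leaves the independence copula unchanged.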

Aging functions and multivariate notions of NBU and IFR

2011-07-06T09:56:58Z

For $d\geq 2$, let $X=(X_{1},\ldots,X_{d})$ be a vector of exchangeable continuous lifetimes with joint survival function $\overline{F}$. For such models, we study some properties of multivariate aging of $\overline{F}$ that are described by means of the multivariate aging function $B_{\overline{F}}$, which is a useful tool for describing the level curves of $\overline{F}$. Specifically, attention is devoted to notions that generalize the univariate concepts of New Better than Used and Increasing Failure Rate. These multivariate notions are satisfied by random vectors whose components are conditionally independent and identically distributed with a univariate conditional survival function that is New Better than Used (respectively, Increasing Failure Rate). Furthermore, they also have an interpretation in terms of comparisons among conditional survival functions of residual lifetimes, given the same history of observed survivals.

A spatial mixed Poisson framework for combination of excess-of-loss and proportional reinsurance contracts

2011-07-06T09:44:42Z

In this paper a purely theoretical reinsurance model is presented, where the reinsurance contract is assumed to be simultaneously of an excess-of-loss and of a proportional type. The stochastic structure of the set of pairs (claim’s arrival time, claim’s size) is described by a Spatial Mixed Poisson Process. By using an invariance property of the Spatial Mixed Poisson Processes, we estimate the amount that the ceding company obtains in a fixed time interval under the reinsurance contract.

Quantitative relations between risk, return and firm size

2011-07-04T09:21:46Z

We analyze, for a large set of stocks comprising four financial indices, the annual logarithmic growth rate R and the firm size, quantified by the market capitalization MC. For the Nasdaq Composite and the New York Stock Exchange Composite we find that the probability density functions of growth rates are Laplace ones in the broad central region, where the standard deviation σ(R), as a measure of risk, decreases with the MC as a power law $\sigma(R)\sim(MC)^{-\beta}$. For both the Nasdaq Composite and the S&P 500, we find that the average growth rate ⟨R⟩ decreases faster than σ(R) with MC, implying that the return-to-risk ratio ⟨R⟩/σ(R) also decreases with MC. For the S&P 500, ⟨R⟩ and ⟨R⟩/σ(R) also follow power laws. For a 20-year time horizon, for the Nasdaq Composite we find that σ(R) vs. MC exhibits a functional form called a volatility smile, while for the NYSE Composite, we find power law stability between σ(R) and MC.
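
A power-law relation of this kind is commonly estimated by a least-squares line in log-log coordinates, where the slope equals −β. The sketch below shows that generic technique on synthetic, noise-free numbers; the function name and the data are made up for illustration, and nothing here reproduces the estimation procedure or the results of the paper.

```python
import math

def fit_power_law_exponent(mc, sigma):
    """Estimate beta in sigma ~ mc^(-beta) by least squares on logs."""
    xs = [math.log(m) for m in mc]
    ys = [math.log(s) for s in sigma]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the regression of log(sigma) on log(mc).
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

# Synthetic data generated exactly from sigma = 2 * mc^(-0.2) (assumed values).
mc = [1e6, 1e7, 1e8, 1e9, 1e10]
sigma = [2.0 * m ** -0.2 for m in mc]
beta_hat = fit_power_law_exponent(mc, sigma)
```

On real data the points scatter around the line, so the fitted exponent comes with an uncertainty that this sketch does not compute.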

Methods for detrending success metrics to account for inflationary and deflationary factors

2011-07-04T09:21:27Z

Time-dependent economic, technological, and social factors can artificially inflate or deflate quantitative measures for career success. Here we develop and test a statistical method for normalizing career success metrics across time-dependent factors. In particular, this method addresses the long-standing question: how do we compare the career achievements of professional athletes from different historical eras? Developing an objective approach will be of particular importance over the next decade as major league baseball (MLB) players from the steroids era become eligible for Hall of Fame induction. Some experts are calling for asterisks (*) to be placed next to the career statistics of athletes found guilty of using performance-enhancing drugs (PED). Here we address this issue, as well as the general problem of comparing statistics from distinct eras, by detrending the seasonal statistics of professional baseball players. We detrend player statistics by normalizing achievements to seasonal averages, which accounts for changes in relative player ability resulting from a range of factors. Our methods are general, and can be extended to various arenas of competition where time-dependent factors play a key role. For five statistical categories, we compare the probability density function (pdf) of detrended career statistics to the pdf of raw career statistics calculated for all player careers in the 90-year period 1920–2009. We find that the functional form of these pdfs is stationary under detrending. This stationarity implies that the statistical regularity observed in the right-skewed distributions for longevity and success in professional sports arises from both the wide range of intrinsic talent among athletes and the underlying nature of competition. We fit the pdfs for career success by the Gamma distribution in order to calculate objective benchmarks based on extreme statistics which can be used for the identification of extraordinary careers.
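
One simple reading of "normalizing achievements to seasonal averages" is to rescale each seasonal value by the ratio of a common baseline to that season's average. The sketch below implements only this simplified reading; the function name, the choice of baseline (the mean of the seasonal averages), and the toy records are assumptions for illustration and may differ from the exact procedure of the paper.

```python
from collections import defaultdict
from statistics import mean

def detrend(records):
    """Rescale seasonal values so every season has the same average.

    records: list of (player, season, value) tuples.
    Assumes every season has a nonzero average value.
    """
    by_season = defaultdict(list)
    for _, season, value in records:
        by_season[season].append(value)
    season_avg = {s: mean(vals) for s, vals in by_season.items()}
    # Common baseline: the mean of the seasonal averages (an assumed choice).
    baseline = mean(list(season_avg.values()))
    return [(p, s, v * baseline / season_avg[s]) for p, s, v in records]

# Made-up toy records, not real player statistics.
records = [("A", 1920, 10), ("B", 1920, 30), ("C", 2000, 20), ("D", 2000, 60)]
detrended = detrend(records)
```

In the toy data the 2000 season has twice the average of the 1920 season, so after detrending players A and C end up with the same value, as do B and D.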

Methods for measuring the citations and productivity of scientists across time and discipline

2011-07-04T09:21:14Z

Publication statistics are ubiquitous in the ratings of scientific achievement, with citation counts and paper tallies factoring into an individual’s consideration for postdoctoral positions, junior faculty, and tenure. Citation statistics are designed to quantify individual career achievement, both at the level of a single publication, and over an individual’s entire career. While some academic careers are defined by a few significant papers (possibly out of many), other academic careers are defined by the cumulative contribution made by the author’s publications to the body of science. Several metrics have been formulated to quantify an individual’s publication career, yet none of these metrics accounts for the collaboration group size and the time dependence of citation counts. In this paper we normalize publication metrics in order to achieve a universal framework for analyzing and comparing scientific achievement across both time and discipline. We study the publication careers of individual authors over the 50-year period 1958–2008 within six high-impact journals: CELL, the New England Journal of Medicine (NEJM), Nature, the Proceedings of the National Academy of Sciences (PNAS), Physical Review Letters (PRL), and Science. Using the normalized metrics (i) “citation shares” to quantify scientific success, and (ii) “paper shares” to quantify scientific productivity, we compare the career achievement of individual authors within each journal, where each journal represents a local arena for competition. We uncover quantifiable statistical regularity in the probability density function of scientific achievement in all journals analyzed, which suggests that a fundamental driving force underlying scientific achievement is the competitive nature of scientific advancement.

Applications of Statistical Physics to the Social and Economic Sciences

2011-07-04T09:18:53Z

This thesis applies statistical physics concepts and methods to quantitatively analyze socioeconomic systems. For each system we combine theoretical models and empirical data analysis in order to better understand the real-world system in relation to the complex interactions between the underlying human agents. This thesis is separated into three parts: (i) response dynamics in financial markets, (ii) dynamics of career trajectories, and (iii) a stochastic opinion model with quenched disorder. In Part I we quantify the response of U.S. markets to financial shocks, which perturb markets and trigger “herding behavior” among traders. We use concepts from earthquake physics to quantify the decay of volatility shocks after the “main shock.” We also find, surprisingly, that we can make quantitative statements even before the main shock. In order to analyze market behavior before as well as after “anticipated news” we use Federal Reserve interest-rate announcements, which are regular events that are also scheduled in advance. In Part II we analyze the statistical physics of career longevity. We construct a stochastic model for career progress which has two main ingredients: (a) random forward progress in the career and (b) random termination of the career. We incorporate the rich-get-richer (Matthew) effect into ingredient (a), meaning that it is easier to move forward in the career the farther along one is in the career. We verify the model predictions analyzing data on 400,000 scientific careers and 20,000 professional sports careers. Our model highlights the importance of early career development, showing that many careers are stunted by the relative disadvantage associated with inexperience. In Part III we analyze a stochastic two-state spin model which represents a system of voters embedded on a network. We investigate the role in consensus formation of “zealots”, which are agents with time-independent opinion. 
Our main result is the unexpected finding that it is the number and not the density of zealots which determines the steady-state opinion polarization. We compare our findings with results for United States Presidential elections.

Statistical properties of business firms structure and growth

2011-06-30T14:25:51Z

We analyze a database comprising quarterly sales of 55624 pharmaceutical products commercialized by 3939 pharmaceutical firms in the period 1992-2001. We study the probability density function (PDF) of growth in firms and product sales and find that the width of the PDF of growth decays with the sales as a power law with exponent β = 0.20 ± 0.01. We also find that the average sales of products scales with the firm sales as a power law with exponent α = 0.57 ± 0.02, and that the average number of products of a firm scales with the firm sales as a power law with exponent γ = 0.42 ± 0.02. We compare these findings with the predictions of the models of business firm growth proposed to date.

Reassessing the Link between Voter Heterogeneity and Political Accountability: A Latent Class Regression Model of Economic Voting

2011-02-22T15:57:22Z

While recent research has underscored the conditioning effect of individual characteristics on economic voting behavior, most empirical studies have failed to explicitly incorporate observed heterogeneity into statistical analyses linking citizens' economic evaluations to electoral choices. In order to overcome these drawbacks, we propose a latent class regression model to jointly analyze the determinants and influence of economic voting in Presidential and Congressional elections. Our modeling approach allows us to better describe the effects of individual covariates on economic voting and to test hypotheses on the existence of heterogeneous types of voters, providing an empirical basis for assessing the relative validity of alternative explanations proposed in the literature. Using survey data from the 2004 U.S. Presidential, Senate and House elections, we find that voters with college education and those more interested in political campaigns based their vote on factors other than their economic perceptions. In contrast, less educated and interested respondents assigned considerable weight to economic assessments, with sociotropic judgments strongly influencing their vote in the Presidential election and personal financial considerations affecting their vote in House elections. We conclude that the main distinction in the 2004 election was not between 'sociotropic' and 'pocketbook' voters, but rather between 'economic' and 'non-economic' voters.

A Statistical Model of Abstention under Compulsory Voting

2011-02-22T15:54:52Z

Invalid voting and electoral absenteeism are two important sources of abstention in compulsory voting systems. Previous studies in this area have not considered the correlation between the two variables and have ignored the compositional nature of the data, potentially leading to unfeasible results and discarding helpful information from an inferential standpoint. In order to overcome these problems, this paper develops a statistical model that accounts for the compositional and hierarchical structure of the data and addresses robustness concerns raised by the use of small samples that are typical in the literature. The model is applied to analyze invalid voting and electoral absenteeism in Brazilian legislative elections between 1945 and 2006 via MCMC simulations. The results show considerable differences in the determinants of both forms of non-voting; while invalid voting was strongly positively related both to political protest and to the existence of important informational barriers to voting, the influence of these variables on absenteeism is less evident. Comparisons based on posterior simulations indicate that the model developed in this paper fits the dataset better than several alternative modeling approaches and leads to different substantive conclusions regarding the effect of different predictors on both sources of abstention.

Structural cleavages, electoral competition and partisan divide: A Bayesian multinomial probit analysis of Chile's 2005 election

2011-02-22T15:52:33Z

The transformations in Chile's party structure since 1989 have led several authors to examine the main cleavages shaping partisan divisions and the impact of different factors on citizens' party preferences. Previous studies, however, failed to analyze the effect of these variables on actual vote choice and neglected the influence of election-specific factors. In order to address these issues, we implement a Bayesian multinomial probit model to analyze Chile's 2005 election. We show that, while both socio-demographic variables and attitudes towards democracy affected voter behavior, the latter were the main determinants of the choice between Chile's two main political coalitions. In addition, we find that the presence of a second conservative candidate, together with voters' strategic considerations, significantly affected candidate choice. These results cannot be accounted for by analyses focused on citizens' party identification or by methodologies that ignore the effect of substitution patterns between candidates on voters' electoral behavior.

Correcting for Survey Misreports Using Auxiliary Information with an Application to Estimating Turnout

2011-02-22T15:50:30Z

Misreporting is a problem that plagues researchers who use survey data. In this article, we develop a parametric model that corrects for misclassified binary responses using information on the misreporting patterns obtained from auxiliary data sources. The model is implemented within the Bayesian framework via Markov Chain Monte Carlo (MCMC) methods and can be easily extended to address other problems exhibited by survey data, such as missing response and/or covariate values. While the model is fully general, we illustrate its application in the context of estimating models of turnout using data from the American National Election Studies.

Measurement Error and Dynamic Nonlinear Models: (Over)Estimating the Effect of Habit

2011-02-22T14:39:24Z

Estimates from non-linear models are known to be inconsistent when the dependent variable is misclassified. Although methods have been developed to correct this inconsistency in static non-linear models, no correction exists for dynamic non-linear models. This is a serious omission from the literature. Since the lagged dependent variable is an explanatory variable in dynamic models, any inconsistency that arises from misclassification of the dependent variable in a static non-linear model will be magnified when that model is made dynamic. Here, we demonstrate this fact using the habitual voting literature and develop a parametric model to correct for this inconsistency. We find that, on average, estimates of habitual voting are approximately twice as large when using survey respondents' self-reports versus official records of their turnout decisions. When we apply our corrected model to respondents' self-reports, however, the estimates of habitual voting are significantly closer to those provided by the official records.