eprintid: 1097
rev_number: 7
eprint_status: archive
userid: 6
dir: disk0/00/00/10/97
datestamp: 2012-02-01 11:47:12
lastmod: 2012-02-01 11:47:52
status_changed: 2012-02-01 11:47:12
type: article
metadata_visibility: show
creators_name: Capocci, Andrea
creators_name: Rao, Francesco
creators_name: Caldarelli, Guido
creators_id:
creators_id:
creators_id: guido.caldarelli@imtlucca.it
title: Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia
ispublished: pub
subjects: HA
subjects: QC
divisions: EIC
full_text_status: none
keywords: PACS: 89.75.Hc Networks and genealogical trees; 89.75.Fb Structures and organization in complex systems; 89.75.-k Complex systems
abstract: In this paper we investigate the nature and structure of the relation between imposed classifications and real clustering in a particular case of a scale-free network given by the on-line encyclopedia Wikipedia. We find a statistical similarity in the distributions of community sizes both by using the top-down approach of the categories division present in the archive and in the bottom-up procedure of community detection given by an algorithm based on the spectral properties of the graph. Regardless of the statistically similar behaviour, the two methods provide a rather different division of the articles, thereby signaling that the nature and presence of power laws is a general feature for these systems and cannot be used as a benchmark to evaluate the suitability of a clustering method.
date: 2008-01
date_type: published
publication: EPL (Europhysics Letters)
volume: 81
number: 2
publisher: IOPscience
pagerange: 28006
id_number: 10.1209/0295-5075/81/28006
refereed: TRUE
issn: 0295-5075
official_url: http://dx.doi.org/10.1209/0295-5075/81/28006
citation: Capocci, Andrea and Rao, Francesco and Caldarelli, Guido Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia. EPL (Europhysics Letters), 81 (2). p. 28006. ISSN 0295-5075 (2008)