Searching the long tail: hidden structure in social tagging

Emma Tonkin

    Research output: Contribution to journal, Article, peer-review

    4 Citations (SciVal)

    Abstract

    In this paper we explore a method for decomposing compound tags found in social tagging systems and outline several results, including improved search indexes, extraction of semantic information, and benefits to usability. Analysis of tagging habits demonstrates that social tagging systems such as del.icio.us and Flickr include both formal metadata, such as geotags, and informally created metadata, such as annotations and descriptions. The majority of tags represent informal metadata; that is, they are not structured according to a formal model, nor do they correspond to a formal ontology. Statistical exploration of the main tag corpus demonstrates that searches use only a subset of the available tags; for example, many tags are composed as ad hoc compounds of terms. To improve the accuracy of searching across the data contained within these tags, a method must be employed to decompose compounds in such a way that there is a high degree of confidence in the result. An approach to decomposition of English-language compounds, designed for use within a small initial sample tagset, is described. Possible decompositions are identified from a generous wordlist, subject to selective lexicon snipping. To identify the most likely decomposition, a Bayesian classifier is used across term elements. To compensate for the limited sample set, a word classifier is employed and the results classified using a similar method, resulting in a successful classification rate of 88% and a false negative rate of only 1%.
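The two steps the abstract names, enumerating candidate decompositions from a wordlist and then ranking them probabilistically, can be illustrated with a minimal sketch. This is not the paper's implementation: the term counts below are invented toy data, the function names are hypothetical, and a plain unigram product with an independence assumption stands in for the Bayesian classifier, whose details the abstract does not give.

```python
import math

# Hypothetical term frequencies (an assumption for illustration);
# in practice these would be counted from a tag corpus.
TERM_COUNTS = {
    "web": 50, "design": 40, "social": 30, "tagging": 20,
    "tag": 25, "de": 2, "sign": 8, "so": 6,
}
TOTAL = sum(TERM_COUNTS.values())


def segmentations(tag):
    """Yield every way to split `tag` into words found in TERM_COUNTS."""
    if not tag:
        yield []
        return
    for end in range(1, len(tag) + 1):
        head = tag[:end]
        if head in TERM_COUNTS:
            for rest in segmentations(tag[end:]):
                yield [head] + rest


def log_score(words):
    """Sum of log unigram probabilities, treating terms as independent."""
    return sum(math.log(TERM_COUNTS[w] / TOTAL) for w in words)


def decompose(tag):
    """Return the highest-scoring segmentation, or None if none exists."""
    candidates = list(segmentations(tag.lower()))
    if not candidates:
        return None
    return max(candidates, key=log_score)


print(decompose("webdesign"))      # ['web', 'design']
print(decompose("socialtagging"))  # ['social', 'tagging']
```

Because every term probability is below 1, this scoring favours decompositions with fewer, more frequent terms, so "webdesign" resolves to "web" + "design" rather than "web" + "de" + "sign". A tag with no full decomposition in the wordlist returns None.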
    Original language: English
    Journal: Advances in Classification Research Online
    Volume: 17
    Publication status: Published - 2006
    Event: Proceedings of the 17th ASIS SIG/CR Classification Research Workshop
    Duration: 1 Nov 2006 → …

    Keywords

    • information retrieval
    • metadata
