Extracting Semantics from the Enron Corpus

T. MacFarlane

Research output: Book/Report › Other report


Indirect measures must be used when analysing attitudes, as individuals are unlikely to voluntarily express beliefs that are opposed by social norms. The Implicit Association Test (IAT) indirectly assesses attitudes through the automatic association of concepts and attributes; however, it requires strict control of extraneous influences. This paper proposes an alternative indirect measure of attitudes by designing a semantic space that models the way in which words are used in language. To demonstrate the use of semantic spaces, the Enron corpus is analysed to discover whether any cultural attitudes can be observed. In the preprocessing stage, the corpus is tokenised and lemmatised, and information irrelevant to semantic analysis is removed. The Enron Semantic Space is then created from the corpus, incorporating multiple features from Hyperspace Analogue to Language (HAL), Latent Semantic Analysis (LSA) and Lowe and McDonald’s Semantic Space (LMS). A free association test is then introduced to measure how accurately the system can observe direct cognitive priming. Features from LMS and LSA are selected over HAL in the optimum implementation, as they give the best accuracy of 86.86% on the free association test. The same features are also shown to be able to observe graded and mediated priming. An application is then presented that allows a user to create an Enron Semantic Space from scratch and compare the differences in similarities of concepts and attributes found in the space. Using this application, numerous attitude experiments are conducted. Life words are found to be associated with pleasant words and death words with unpleasant words. Enron is also found to be more similar to pleasant words than Dynegy is. Competence words are found to be associated with youth words and incompetence words with elderly words. Furthermore, career words are found to be associated with male words and family words with female words. Finally, we conclude that the results support the argument for using a semantic space to analyse attitudes; however, supplementary studies need to be conducted to replicate the exact experiments conducted with the IAT.
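
As a rough illustration of the idea summarised above, the following Python sketch builds a tiny word co-occurrence space and compares how similar a concept word is to two sets of attribute words. It is not the report's implementation: the sliding-window counting is only loosely in the spirit of HAL, and the toy sentences, the window size, and the helper names (`build_space`, `cosine`, `association`) are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def build_space(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    space = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    space[word][tokens[j]] += 1
    return space


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity(space, concept, attributes):
    """Average cosine similarity between one concept word and a set of attribute words."""
    sims = [cosine(space[concept], space[a]) for a in attributes]
    return sum(sims) / len(sims)


def association(space, concept, attrs_a, attrs_b):
    """Positive if `concept` is closer to attrs_a, negative if closer to attrs_b."""
    return mean_similarity(space, concept, attrs_a) - mean_similarity(space, concept, attrs_b)


# Hypothetical toy corpus, already tokenised; the real corpus is far larger.
sentences = [
    "life is a wonderful gift".split(),
    "a wonderful happy life".split(),
    "death is a terrible loss".split(),
    "a terrible sad death".split(),
]
pleasant = ["wonderful", "happy"]
unpleasant = ["terrible", "sad"]

space = build_space(sentences)
life_score = association(space, "life", pleasant, unpleasant)
death_score = association(space, "death", pleasant, unpleasant)
print("life:", life_score, "death:", death_score)
```

In this toy corpus the score for "life" comes out positive and the score for "death" negative, which mirrors the direction of the life/death finding reported in the abstract, although four hand-written sentences prove nothing about the Enron data itself.
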
Original language: English
Place of Publication: Bath, U. K.
Publisher: Department of Computer Science, University of Bath
Number of pages: 126
Publication status: Published - Nov 2013

Publication series

Name: Department of Computer Science Technical Report Series
ISSN (Print): 1740-9497

Bibliographical note

Undergraduate Dissertation

