Extracting Semantics from the Enron Corpus

T. MacFarlane

Research output: Book/Report › Other report

Abstract

Indirect measures must be used when analysing attitudes, as individuals are unlikely to voluntarily express beliefs that are opposed by social norms. The IAT indirectly assesses attitudes through the automatic association of concepts and attributes; however, it requires strict control of extraneous influences. This paper proposes an alternative indirect measure of attitudes by designing a semantic space of the way in which words are used in language. To demonstrate the use of semantic spaces, the Enron corpus is analysed to discover whether any cultural attitudes can be observed. In the preprocessing stage, the corpus is tokenised and lemmatised, and information irrelevant to semantic analysis is removed. The Enron Semantic Space is then created from the corpus, incorporating multiple features from Hyperspace Analogue to Language (HAL), Latent Semantic Analysis (LSA) and Lowe and McDonald’s Semantic Space (LMS). A free association test is then introduced to measure how accurately the system can observe direct cognitive priming. Features from LMS and LSA are selected over HAL in the optimum implementation, as they give the best accuracy of 86.86% on the free association test. The same features are also shown to be able to observe graded and mediated priming. Next, an application is presented that allows a user to create an Enron Semantic Space from scratch and compare the differences in similarities of concepts and attributes found in the space. Using this application, numerous attitude experiments are conducted. Life words are found to be associated with pleasant words and death words with unpleasant words. Enron is also found to be more similar to pleasant words than Dynegy. Competence words are found to be associated with youth words and incompetence words with elderly words. Furthermore, career words are found to be associated with male words and family words with female words. Finally, we conclude that the results support the argument for using a semantic space to analyse attitudes; however, supplementary studies need to be conducted to replicate the exact experiments conducted with the IAT.
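The comparison of concept and attribute similarities described in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the function names, the symmetric window, the raw co-occurrence counts and the cosine measure are all assumptions chosen for brevity, and the HAL, LSA and LMS features mentioned above are not reproduced. The toy corpus is far too small for its numbers to mean anything; only the mechanics are shown.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(tokens, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_similarity(vectors, concepts, attributes):
    """Average cosine similarity over all concept/attribute word pairs."""
    pairs = [(c, a) for c in concepts for a in attributes]
    return sum(cosine(vectors[c], vectors[a]) for c, a in pairs) / len(pairs)


def association_difference(vectors, concepts, attributes_a, attributes_b):
    """How much closer the concept words sit to attribute set A than to set B."""
    return (mean_similarity(vectors, concepts, attributes_a)
            - mean_similarity(vectors, concepts, attributes_b))


# Hypothetical toy corpus, already tokenised; a real run would use the
# preprocessed Enron emails instead.
tokens = "life is good death is bad".split()
space = cooccurrence_vectors(tokens, window=1)
print(association_difference(space, ["life"], ["good"], ["bad"]))
```

A positive difference would indicate that the concept words are, on average, closer to the first attribute set than to the second; on a realistic corpus the raw counts would normally be weighted and reduced before similarities are compared.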
Language: English
Place of Publication: Bath, U. K.
Publisher: Department of Computer Science, University of Bath
Number of pages: 126
Status: Published - Nov 2013

Publication series

Name: Department of Computer Science Technical Report Series
No.: CSBU-2013-08
ISSN (Print): 1740-9497

Cite this

MacFarlane, T. (2013). Extracting Semantics from the Enron Corpus. (Department of Computer Science Technical Report Series; No. CSBU-2013-08). Bath, U. K.: Department of Computer Science, University of Bath.

@book{0f2e15bcb60d4013b2e46201e4dcbb89,
title = "Extracting Semantics from the Enron Corpus",
abstract = "Indirect measures must be used when analysing attitudes, as individuals are unlikely to voluntarily express beliefs that are opposed by social norms. The IAT indirectly assesses attitudes through the automatic association of concepts and attributes; however, it requires strict control of extraneous influences. This paper proposes an alternative indirect measure of attitudes by designing a semantic space of the way in which words are used in language. To demonstrate the use of semantic spaces, the Enron corpus is analysed to discover whether any cultural attitudes can be observed. In the preprocessing stage, the corpus is tokenised and lemmatised, and information irrelevant to semantic analysis is removed. The Enron Semantic Space is then created from the corpus, incorporating multiple features from Hyperspace Analogue to Language (HAL), Latent Semantic Analysis (LSA) and Lowe and McDonald’s Semantic Space (LMS). A free association test is then introduced to measure how accurately the system can observe direct cognitive priming. Features from LMS and LSA are selected over HAL in the optimum implementation, as they give the best accuracy of 86.86{\%} on the free association test. The same features are also shown to be able to observe graded and mediated priming. Next, an application is presented that allows a user to create an Enron Semantic Space from scratch and compare the differences in similarities of concepts and attributes found in the space. Using this application, numerous attitude experiments are conducted. Life words are found to be associated with pleasant words and death words with unpleasant words. Enron is also found to be more similar to pleasant words than Dynegy. Competence words are found to be associated with youth words and incompetence words with elderly words. Furthermore, career words are found to be associated with male words and family words with female words. Finally, we conclude that the results support the argument for using a semantic space to analyse attitudes; however, supplementary studies need to be conducted to replicate the exact experiments conducted with the IAT.",
author = "T. MacFarlane",
note = "Undergraduate Dissertation",
year = "2013",
month = "11",
language = "English",
series = "Department of Computer Science Technical Report Series",
publisher = "Department of Computer Science, University of Bath",
number = "CSBU-2013-08",

}

TY - BOOK

T1 - Extracting Semantics from the Enron Corpus

AU - MacFarlane, T.

N1 - Undergraduate Dissertation

PY - 2013/11

Y1 - 2013/11

N2 - Indirect measures must be used when analysing attitudes, as individuals are unlikely to voluntarily express beliefs that are opposed by social norms. The IAT indirectly assesses attitudes through the automatic association of concepts and attributes; however, it requires strict control of extraneous influences. This paper proposes an alternative indirect measure of attitudes by designing a semantic space of the way in which words are used in language. To demonstrate the use of semantic spaces, the Enron corpus is analysed to discover whether any cultural attitudes can be observed. In the preprocessing stage, the corpus is tokenised and lemmatised, and information irrelevant to semantic analysis is removed. The Enron Semantic Space is then created from the corpus, incorporating multiple features from Hyperspace Analogue to Language (HAL), Latent Semantic Analysis (LSA) and Lowe and McDonald’s Semantic Space (LMS). A free association test is then introduced to measure how accurately the system can observe direct cognitive priming. Features from LMS and LSA are selected over HAL in the optimum implementation, as they give the best accuracy of 86.86% on the free association test. The same features are also shown to be able to observe graded and mediated priming. Next, an application is presented that allows a user to create an Enron Semantic Space from scratch and compare the differences in similarities of concepts and attributes found in the space. Using this application, numerous attitude experiments are conducted. Life words are found to be associated with pleasant words and death words with unpleasant words. Enron is also found to be more similar to pleasant words than Dynegy. Competence words are found to be associated with youth words and incompetence words with elderly words. Furthermore, career words are found to be associated with male words and family words with female words. Finally, we conclude that the results support the argument for using a semantic space to analyse attitudes; however, supplementary studies need to be conducted to replicate the exact experiments conducted with the IAT.

M3 - Other report

T3 - Department of Computer Science Technical Report Series

BT - Extracting Semantics from the Enron Corpus

PB - Department of Computer Science, University of Bath

CY - Bath, U. K.

ER -