Abstract
A little over a decade ago, Goto and van de Geijn
wrote about the importance of the treatment of the translation
lookaside buffer (TLB) on the performance of matrix multiplication
[1]. Crucially, they did not say how important, nor did
they provide results that would allow readers to make their
own judgement. In this paper, we revisit their work and look
at the effect on the performance of their algorithm when built
with different assumed data TLB sizes. Results on three different
processors, one relatively modern, two contemporary with Goto
and van de Geijn’s writings ([1] and [2]), are examined and
compared within a real-world context. Our findings show that,
although important when aiming for a place in the TOP500 [3]
list, these features have little practical effect on the architectures
we have chosen. We conclude, then, that the relative importance of
the various factors that must be taken into account when
tuning matrix multiplication (GEMM, the heart of the High
Performance LINPACK benchmark, and hence of the TOP500
table) differs dramatically from one processor to
another.
Original language | English |
---|---|
Title of host publication | 2013 12th International Symposium on Distributed Computing and Applications to Business, Engineering & Science (DCABES) |
Place of Publication | Piscataway, NJ |
Publisher | IEEE |
Pages | 110-114 |
Number of pages | 6 |
DOIs | |
Publication status | Published - Sept 2013 |
Event | DCABES 2013 - Kingston-on-Thames, United Kingdom. Duration: 2 Sept 2013 → 4 Sept 2013 |
Conference
Conference | DCABES 2013 |
---|---|
Country/Territory | United Kingdom |
City | Kingston-on-Thames |
Period | 2/09/13 → 4/09/13 |
Keywords
- Linpack performance
- TLB
ASJC Scopus subject areas
- General Computer Science
Fingerprint
Dive into the research topics of 'The changing relevance of the TLB'. Together they form a unique fingerprint.
Equipment
- High Performance Computing (HPC) Facility
  Chapman, S. (Manager)
  University of Bath
  Facility/equipment: Facility