7 Citations (SciVal)
117 Downloads (Pure)

Abstract

Semi-stream join algorithms join a fast stream input with a disk-based master data relation. A common class of these algorithms is derived from hash joins: they use the stream as build input for a main hash table, and also include a cache for frequent master data. The composition of the cache is very important for performance; however, the decision of which master data to cache has so far been solely based on heuristics. We present the first formal criterion, a cache inequality that leads to a provably optimal composition of the cache in a semi-stream many-to-many equijoin algorithm. We propose a novel algorithm, Semi-Stream Balanced Join (SSBJ), which exploits this cache inequality to achieve a given service rate with a provably minimal amount of memory for all stream distributions. We present a cost model for SSBJ and compare its service rate empirically and analytically with other related approaches.

Original languageEnglish
Pages (from-to)623-649
Number of pages27
JournalDistributed and Parallel Databases
Volume37
Issue number4
Early online date31 Aug 2018
DOIs
Publication statusPublished - 1 Dec 2019

Keywords

  • Cache optimization
  • Many-to-many semi-stream join
  • Performance evaluation

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Information Systems and Management

Fingerprint

Dive into the research topics of 'A Memory-Optimal Many-To-Many Semi-Stream Join'. Together they form a unique fingerprint.

Cite this