Type: Improvement
Resolution: Fixed
Priority: Major
Affects Version/s: None
Component/s: None
Merge is a serialised process, consisting of three phases:
- fetch from storage nodes (merge-fetch)
- backup to secondary merge nodes (merge-backup)
- distribution to frontend nodes (merge-dist)
This process, implemented in tools/merge.py, should be
parallelised.
First, break up merge.py into three programs and run them in
sequence. Let each program perform a single pass and put the loop in
the program calling them. This will make testing easier.
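A minimal sketch of such a driver, assuming the three phases have been split into standalone scripts named merge-fetch, merge-backup and merge-dist (the names and the sleep interval are assumptions, not part of this ticket):

```python
# Driver loop for the three merge phases. Each phase is a
# standalone program that performs one pass and exits; this
# caller owns the loop, which makes each phase easy to test
# in isolation.
import subprocess
import time

PHASES = ["merge-fetch", "merge-backup", "merge-dist"]

def run_once():
    for phase in PHASES:
        # A nonzero exit status raises CalledProcessError and
        # aborts this iteration.
        subprocess.run([phase], check=True)

def main():
    while True:
        run_once()
        time.sleep(10)  # merge interval; tune as needed
```

Because the loop lives here, a test can invoke run_once() once against stub phase programs instead of driving the whole daemon.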
Second, have the three programs communicate via files:
- merge-fetch maintains a single file, 'currentsize', referring to a
given entry in 'logorder' and indicating how many entries have been
fetched and sequenced so far
- merge-backup reads 'currentsize' and pushes these entries to
secondary merge nodes, maintaining one file per secondary,
'position.<secondary>', indicating how many entries have been copied
and verified at the secondary in question
- merge-dist uses a new config knob, 'backupquorum', and decides how
many entries to include in a new STH by calculating
sort(positions)[backupquorum], where positions is a list with the
contents of the 'position.<secondary>' files, one entry per file
Run the three pieces in parallel.
Third, improve merge-fetch by parallelising it: use one process per
storage node, each writing (storage-node, hash) pairs to a "queue
info" file, and a single queue-handling process reading the queue
files and writing the 'currentsize' file.