Parallellised merge

This issue belongs to an archived project. You can view it, but you can't modify it. Learn more

XMLWordPrintable

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • 0.9.0
    • Affects Version/s: None
    • Component/s: None
    • None

      Merge is a serialised process, consisting of three phases:
      - fetch from storage nodes (merge-fetch)
      - backup to secondary merge nodes (merge-backup)
      - distribution to frontend nodes (merge-dist)

      This process, implemented in tools/merge.py, should be
      parallellised.

      First, break up merge.py in three programs and run them in
      sequence. Let each program execute once and have the loop in the
      program calling them. This will make testing easier.

      Second, have the three programs communicate via files:
      - merge-fetch maintains a single file 'currentsize' referring to a
        given entry in 'logorder', indicating which entries are fetched and
        sequenced so far
      - merge-backup reads 'currentsize' and pushes these entries to
        secondary merge nodes, maintaining one file per secondary,
        'position.<secondary>', indicating how many entries have been copied
        and verified at the secondary in question
      - merge-dist uses a new config knob 'backupquorum' and decides how
        many entries to include in a new STH by calculating
        sort(positions[backupquorum]), where positions is a list with the
        contents of 'position.<secondary>' files, one entry per file
      Run the three pieces in parallell.

      Third, improve merge-fetch by parallellsing it using one process per
      storage node writing to a "queue info" file (storage-node, hash) and a
      single queue handling process reading queue files and writing the
      'currentsize' file.

            Assignee:
            Linus Nordberg [X] (Inactive)
            Reporter:
            Linus Nordberg [X] (Inactive)
            Archiver:
            Josva Kleist

              Created:
              Updated:
              Resolved:
              Archived: