#opendaylight-clustering: clustering_hackers
Meeting started by tbachman at 16:02:12 UTC
Meeting summary
- agenda (tbachman, 16:02:20)
  - https://meetings.opendaylight.org/opendaylight-clustering/2015/_clustering_hackers_/opendaylight-clustering-_clustering_hackers_.2015-01-20-17.25.html
    (last recorded meeting minutes) (tbachman, 16:02:46)
  - https://cisco.webex.com/ciscosales/e.php?MTID=mbd68cebe2f65f57084d63329e5e49e26
    (moizer, 16:02:49)
 
 
- ongoing work (tbachman, 16:12:21)
  - pantelis says that harmon has the heartbeat development work,
    and feels that we need to think about that some more (tbachman,
    16:12:49)
- moizer asks what we need to think about with
    heartbeating (tbachman,
    16:13:04)
- pantelis says that send/append re-evaluates the follower’s
    state (tbachman, 16:13:24)
- pantelis says that’s the piece that sends the
    next snapshot, too (tbachman,
    16:13:36)
- pantelis wonders why it wasn’t sent in the
    reply, and is wary of removing the send/append entries (tbachman,
    16:13:53)
- pantelis says that for a replicate, we send it out to all the
    followers, who then persist the data; but based on the leader’s
    commit, they don’t necessarily apply the log to the state machine
    until the leader gets consensus back and then commits (tbachman,
    16:14:34)
- That’s what causes the last log entry to be applied to the state
    machine on the followers (tbachman, 16:14:51)
- moizer says that this is how the algorithm
    states it (tbachman,
    16:15:00)
- pantelis says it takes 2 append entries to get this to the data
    store, as sketched below (tbachman, 16:15:13)
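    A minimal sketch in Java of the two-append-entries flow described
    above; all names here are hypothetical, not the actual ODL code:

        // The first AppendEntries replicates the entry; followers persist
        // it but only apply it once a later AppendEntries (or heartbeat)
        // carries a commitIndex covering it; hence "2 append entries".
        final class LeaderSketch {
            private final java.util.Map<Long, Integer> acks = new java.util.HashMap<>();
            private final int clusterSize;
            private long commitIndex = -1;

            LeaderSketch(int clusterSize) { this.clusterSize = clusterSize; }

            // Called for each successful follower reply for entryIndex.
            void onAppendEntriesReply(long entryIndex) {
                int count = acks.merge(entryIndex, 1, Integer::sum) + 1; // +1: the leader itself
                if (count > clusterSize / 2 && entryIndex > commitIndex) {
                    commitIndex = entryIndex; // consensus reached: leader commits and applies
                    // Followers learn the new commitIndex only from the next
                    // AppendEntries/heartbeat, and apply the entry then.
                }
            }
        }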
- moizer says we don’t want to lose the heartbeat. Previously, we
    had a heartbeat timeout of 500ms and an election timeout of double
    that (tbachman, 16:15:37)
- moizer says that when you have followers who are lagging behind,
    then the 5 second heartbeat will hurt you (tbachman, 16:15:53)
- moizer has made this factor configurable, and saw some research
    that used a 20x timeout for elections (tbachman, 16:16:18)
- moizer says the variance is a random interval
    for the election timeout. If your election timeout is 1 second, then
    this is between 1 and 1.2 seconds. (tbachman,
    16:17:14)
- This minimizes clashes, and minimizes the interval between when
    various candidates wake up; see the sketch below (tbachman,
    16:17:59)
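    A small Java sketch of the randomized election timeout described
    above; the names and the 20% variance are illustrative assumptions:

        import java.util.concurrent.ThreadLocalRandom;

        final class ElectionTimeout {
            // e.g. heartbeatMillis = 500, factor = 2 (or 20, per the
            // research moizer mentions), variance = 0.2 for a 1.0-1.2s
            // window on a 1-second base timeout.
            static long nextTimeoutMillis(long heartbeatMillis, int factor, double variance) {
                long base = heartbeatMillis * factor;
                long jitter = (long) (ThreadLocalRandom.current().nextDouble(variance) * base);
                return base + jitter; // random per election; minimizes clashes
            }
        }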
- moizer says we can have an optimization for when the
    append-entries reply is received, rather than waiting for the
    heartbeat (tbachman, 16:19:38)
- pantelis says we have to redo the snapshot
    chunking then (tbachman,
    16:19:58)
- moizer says that we can do the same thing —
    when you get the reply, you can do the snapshot chunk (tbachman,
    16:20:19)
- pantelis says what if we have a separate
    heartbeat actor that just sends the heartbeat message with no
    data (tbachman,
    16:21:21)
- moizer says the heartbeat message has the
    current term of the leader and the follower index (tbachman,
    16:21:37)
- It needs to get the append-entries from the current leader; a
    heartbeat message is sketched below (tbachman, 16:21:51)
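    A hedged sketch of why a data-free heartbeat still carries state: in
    Raft the heartbeat is an AppendEntries with an empty entry list, but
    it still conveys the leader’s term and indices (the field names
    below are illustrative, not the actual ODL message class):

        import java.util.Collections;
        import java.util.List;

        final class AppendEntries {
            final long term;            // current term of the leader
            final long prevLogIndex;    // per-follower index being confirmed
            final long leaderCommit;    // lets the follower apply entries
            final List<byte[]> entries; // empty list == heartbeat

            AppendEntries(long term, long prevLogIndex, long leaderCommit,
                          List<byte[]> entries) {
                this.term = term;
                this.prevLogIndex = prevLogIndex;
                this.leaderCommit = leaderCommit;
                this.entries = entries;
            }

            static AppendEntries heartbeat(long term, long prevLogIndex,
                                           long leaderCommit) {
                return new AppendEntries(term, prevLogIndex, leaderCommit,
                                         Collections.emptyList());
            }
        }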
- pantelis says the only time we send a new index
    is on a replicate (tbachman,
    16:24:41)
- moizer says when there’s a new follower,
    there’s a lot of entries that need to be replicated (tbachman,
    16:24:55)
- when the heartbeat reply is received from a new
    follower, then we have to send the follower a bunch of
    updates (tbachman,
    16:25:22)
- If there’s a replicate and the message gets lost, there is no way
    to resend the same message (tbachman, 16:26:30)
- pantelis says when you have 2 followers and you replicate, and one
    follower responds first, we don’t need the reply from the second
    follower to get consensus; if there’s a commit immediately after
    that, then the previous entry gets sent twice to the same follower,
    which is okay, but just not as efficient (tbachman, 16:27:20)
- moizer produced a very small simulator that you can run with a
    real controller, which can help to track down replication
    issues (tbachman, 16:33:43)
- you can connect mininet to the controller, and
    it tries to replicate the data; the simulator provides the
    acknowledgement (tbachman,
    16:34:05)
- moizer will try to check this in sometime
    today (tbachman,
    16:34:21)
- Vamsi asks why we’re doing gerrit 14658
    (tbachman,
    16:35:12)
- moizer says we’re doing this b/c timeouts
    happen (tbachman,
    16:35:36)
- Vamsi asks what causes the loss of the heartbeat
    message (tbachman, 16:36:16)
- moizer says he doesn’t know; network delays can cause it to
    arrive late, or it happens due to some sort of
    partitioning (tbachman, 16:36:40)
- pantelis says spurious re-elections happen either because the
    shard actor is busy (e.g. processing a very large pre-commit, with
    akka only processing one message at a time), so no heartbeat goes
    out, causing a follower to re-elect; or because of garbage
    collection and thread context switching latencies (tbachman,
    16:38:03)
- pantelis says that in bigger clusters (7, 9, 11, etc. nodes),
    you’ll have a lot more traffic coming out (tbachman, 16:40:37)
- rovarga says we need to think of hundreds of shards as the
    default scale factor; if we’re looking at, say, 2k switches in a
    DC, the heartbeat chatter may be prohibitive (tbachman, 16:41:38)
- rovarga says we should default the Java garbage collector to g1gc
    for clustering; example flags below (tbachman, 16:42:01)
- moizer says we produce a lot of garbage; g1gc caps the amount of
    time it spends on GC, and it allows the heap to grow (tbachman,
    16:42:52)
- rovarga says this is a triangle: one corner is the heap size,
    another is how often GC occurs, and the third is the average time a
    GC takes. You have to move within that triangle (tbachman,
    16:43:26)
- rovarga says that with the current config, you may run okay for a
    certain amount of time, but eventually hit a wall (tbachman,
    16:43:47)
- moizer says he’s worried that it doesn’t
    actually collect (tbachman,
    16:43:56)
- moizer says he’s observed that you run out of
    memory faster (tbachman,
    16:44:06)
- rovarga says he was running the in-memory data store with BGP and
    1M routes, and it almost ran out of heap; the trace showed ~3.9GB
    of heap used in oldgen, with 10 second pauses where collections
    were happening like crazy, and took this down to ~0.5GB (tbachman,
    16:45:33)
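    For reference, G1 is enabled with standard HotSpot flags; the pause
    target below is an illustrative value, not one from the meeting:

        # -XX:MaxGCPauseMillis sets the pause-time goal G1 tries to honor,
        # trading off heap size, GC frequency, and pause length (rovarga's
        # "triangle").
        -XX:+UseG1GC -XX:MaxGCPauseMillis=200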
- pantelis asks about the shard logging; why are
    we using the akka logger, as it doesn’t preserve the line number
    (i.e. it outputs the line number of the actor instead) (tbachman,
    16:47:31)
- moizer says the logging adapter is used just to make sure it’s
    asynchronous (tbachman, 16:48:01)
- pantelis says that karaf does that anyway (pax-logging to an OSGi
    service) (tbachman, 16:48:15)
- pantelis says that when you do a log.error and you have formatting
    arguments and you also want to print the exception, with slf4j you
    have to do a string format, because if you pass in e as the last
    argument, it won’t format it correctly, which the akka logger
    will (tbachman, 16:50:56)
- rovarga says if you don’t mention it in the string formatting, it
    will be picked up as the exception and formatted accordingly, as in
    the example below (tbachman, 16:51:19)
- LOGGER.warn("Foo {}", obj, ex); (rovarga,
    16:51:49)
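    A self-contained example of the behavior rovarga describes: SLF4J
    (since 1.6.0) treats a trailing Throwable with no matching {}
    placeholder as the exception and logs its stack trace:

        import org.slf4j.Logger;
        import org.slf4j.LoggerFactory;

        final class LogDemo {
            private static final Logger LOG = LoggerFactory.getLogger(LogDemo.class);

            void demo(Object obj, Exception ex) {
                // "Foo <obj>" is logged along with ex's stack trace;
                // no String.format() workaround is needed.
                LOG.warn("Foo {}", obj, ex);
            }
        }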
- pantelis asks if it’s okay to migrate to slf4j
    logging (tbachman, 16:52:13)
- moizer says there could be a lot of changes
    there (tbachman,
    16:52:23)
- pantelis says he’s willing to make these
    changes (tbachman,
    16:52:32)
- moizer says there’s a way to include the shard identifier as
    well; once we move to slf4j, we lose that ability (tbachman,
    16:53:01)
- ACTION: pantelis to
    create a bug to address logging (tbachman,
    16:55:21)
- moizer is pushing a patch today for backpressure on the creation
    of a transaction (tbachman, 16:56:16)
- moizer has seen a problem with statistics collection: in a
    multi-node cluster, it takes a long time for the commit to go
    through; the statistics manager continues to try pushing these
    through, and it eventually times out (tbachman, 16:57:04)
- rovarga says openflow doesn’t have a single writer per data tree;
    this will be addressed in the new openflow design (tbachman,
    16:58:00)
- moizer says there are 2 cases requiring backpressure: the BGP
    case, where there’s a single transaction with multiple pieces of
    data; and the stats manager case, where there is a new transaction
    per piece of data; a sketch appears below (tbachman, 16:59:17)
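    A hypothetical sketch (not moizer’s actual patch) of backpressure on
    transaction creation: bound the number of outstanding transactions
    so a slow commit path pushes back on producers such as the stats
    manager instead of letting work pile up:

        import java.util.concurrent.Semaphore;

        final class BoundedTransactionFactory {
            private final Semaphore permits = new Semaphore(1000); // illustrative limit

            Tx newTransaction() throws InterruptedException {
                permits.acquire();               // blocks callers once the store lags
                return new Tx(permits::release); // permit returned on completion
            }

            static final class Tx {
                private final Runnable onDone;
                Tx(Runnable onDone) { this.onDone = onDone; }
                void commit() { onDone.run(); }  // release the permit
            }
        }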
- moizer wants to be able to apply a transaction to a state without
    consensus for operational data (tbachman, 17:03:54)
- moizer asks for thoughts on doing that
    (tbachman,
    17:04:05)
- pantelis asks if that breaks RAFT (tbachman,
    17:04:11)
- moizer says that for operational data, we
    already said we break RAFT by not being persistent (tbachman,
    17:04:31)
- pantelis says that the assumption is that
    operational data can be recalculated (tbachman,
    17:04:51)
- moizer says as soon as the leader gets the
    commit, they instantly try to replicate the data (tbachman,
    17:05:06)
- rovarga says he’s not familiar enough with RAFT to know for sure
    yet; it sounds a bit scary, but he asks what it is that the
    applications expect, and what is involved in reproducing the
    operational data (tbachman, 17:05:53)
- rovarga says that some applications might see a
    failover rather than a graceful migration, in which case the
    applications might reproduce the data somehow (tbachman,
    17:06:23)
- moizer says we can have another flag for
    this (tbachman,
    17:06:58)
- pantelis asks if we turn it on by
    default (tbachman,
    17:07:04)
- moizer says no, and we have to see how this
    works first (tbachman,
    17:07:14)
- moizer says that our data store is more of a strongly consistent
    data store (tbachman, 17:07:50)
- the operational data store has things that change very rapidly;
    this makes for an eventual consistency model, which allows for
    better performance (tbachman, 17:08:11)
- pantelis asks if we should be putting Time
    Series Data in the data store (tbachman,
    17:08:33)
- Vamsi is deprioritizing the 2-node cluster in favor of
    stabilizing basic clustering (tbachman, 17:11:06)
- moizer asks if HP is planning to submit any
    patches (tbachman,
    17:11:15)
- Vamsi says they are looking at the order that
    they will start contributing (tbachman,
    17:11:28)
- moizer says the best thing they can do is to report the issues
    they find in bugzilla, to ensure that we don’t duplicate the
    work (tbachman, 17:11:45)
- pantelis says there’s another meeting on Thursday that mark had
    set up (tbachman, 17:12:00)
- moizer says that Dell wants to continue, but HP
    doesn’t (tbachman,
    17:12:09)
- https://bugs.opendaylight.org/show_bug.cgi?id=2667
    bug reported by GBP (tbachman, 17:13:35)
 
Meeting ended at 17:37:20 UTC
Action items
  - pantelis to create a bug to address logging
People present (lines said)
  - tbachman (97)
  - moizer (5)
  - odl_meetbot (4)
  - rovarga (2)
Generated by MeetBot 0.1.4.