16:02:12 <tbachman> #startmeeting clustering_hackers
16:02:12 <odl_meetbot> Meeting started Tue Feb  3 16:02:12 2015 UTC.  The chair is tbachman. Information about MeetBot at http://ci.openstack.org/meetbot.html.
16:02:12 <odl_meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:02:12 <odl_meetbot> The meeting name has been set to 'clustering_hackers'
16:02:16 <tbachman> #chair moizer
16:02:16 <odl_meetbot> Current chairs: moizer tbachman
16:02:20 <tbachman> #topic agenda
16:02:46 <tbachman> #link https://meetings.opendaylight.org/opendaylight-clustering/2015/_clustering_hackers_/opendaylight-clustering-_clustering_hackers_.2015-01-20-17.25.html Last recorded meeting minutes
16:02:49 <moizer> https://cisco.webex.com/ciscosales/e.php?MTID=mbd68cebe2f65f57084d63329e5e49e26
16:02:54 <tbachman> moizer: thx!
16:05:30 <tbachman> calling all cluster-ers!
16:07:21 <moizer> just sent out an email too. Let’s give it 5 mins
16:07:30 <tbachman> moizer: sounds good
16:08:03 <tbachman> moizer: is password old, or odl?
16:08:43 <moizer> odl
16:08:58 <tbachman> I think the email says old
16:09:58 <moizer> oops
16:10:42 <rovarga> will be there in about a minute
16:12:21 <tbachman> #topic ongoing work
16:12:49 <tbachman> #info pantelis says that harmon has the heartbeat development work, and feels that we need to think about it some more
16:13:04 <tbachman> #info moizer asks what we need to think about with heartbeating
16:13:24 <tbachman> #info pantelis says that sending append-entries re-evaluates the follower’s state
16:13:36 <tbachman> #info pantelis says that’s the piece that sends the next snapshot, too
16:13:53 <tbachman> #info pantelis wonders why it wasn’t sent in the reply, and is wary of removing the send/append entries
16:14:34 <tbachman> #info pantelis says that for a replicate, we send it out to all the followers, who then persist the data, but based on the leader’s commit, they don’t necessarily apply the log to the state machine until the leader gets consensus back and then commits
16:14:51 <tbachman> #info That’s what causes the last log entry to be applied to the state machine on the followers
16:15:00 <tbachman> #info moizer says that this is how the algorithm states it
16:15:13 <tbachman> #info pantelis says it takes 2 append entries to get this to the data store
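    A minimal sketch of the two-AppendEntries flow being described; the class and method names below are hypothetical and do not match the actual sal-akka-raft API:

        // Hypothetical sketch: the first AppendEntries carries the entry and the
        // followers persist it; only after a majority acks does the leader advance
        // commitIndex and apply the entry, and followers apply it when the next
        // AppendEntries (or heartbeat) advertises the new commitIndex.
        final class ReplicationSketch {
            private long commitIndex = -1;
            private long lastApplied = -1;

            // Leader side: called per follower ack for the given log index.
            void onFollowerAck(long ackedIndex, int acksIncludingLeader, int clusterSize) {
                boolean majority = acksIncludingLeader > clusterSize / 2;
                if (majority && ackedIndex > commitIndex) {
                    commitIndex = ackedIndex;        // consensus reached, leader commits
                    applyEntries();
                }
            }

            // Follower side: called when an AppendEntries/heartbeat carries the
            // leader's commit index; this is the "second" append entry of the flow.
            void onLeaderCommitIndex(long leaderCommitIndex) {
                if (leaderCommitIndex > commitIndex) {
                    commitIndex = leaderCommitIndex;
                    applyEntries();
                }
            }

            private void applyEntries() {
                while (lastApplied < commitIndex) {
                    lastApplied++;                   // apply log[lastApplied] to the state machine
                }
            }
        }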
16:15:37 <tbachman> #info moizer says we don’t want to lose the heartbeat. Previously, we had a heartbeat timeout of 500 and an election timeout of double that
16:15:53 <tbachman> #info moizer says that when you have followers who are lagging behind, then the 5 second heartbeat will hurt you
16:16:18 <tbachman> #info moizer has made this timeout factor configurable, and saw some research that used a 20x timeout for elections
16:17:14 <tbachman> #info moizer says the variance is a random interval for the election timeout. If your election timeout is 1 second, then this is between 1 and 1.2 seconds.
16:17:59 <tbachman> #info This minimizes clashes, and minimizes the interval between when the various candidates wake up
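    A small illustration of the randomized election timeout being described; the 20% variance figure and the names below are assumptions for the example, not the agreed defaults:

        import java.util.concurrent.ThreadLocalRandom;

        final class ElectionTimeoutExample {
            // With a 1 s base timeout and a 0.2 variance, each candidate picks a value
            // uniformly in [1000 ms, 1200 ms], so candidates rarely wake up at the
            // same instant and split the vote.
            static long nextTimeoutMillis(long baseMillis, double variance) {
                long spread = (long) (baseMillis * variance);
                return baseMillis + ThreadLocalRandom.current().nextLong(spread + 1);
            }

            public static void main(String[] args) {
                System.out.println(nextTimeoutMillis(1000, 0.2)); // e.g. 1137
            }
        }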
16:19:38 <tbachman> #info moizer says we can have an optimization for when the append-entries is received, rather than waiting for the heartbeat
16:19:58 <tbachman> #info pantelis says we have to redo the snapshot chunking then
16:20:19 <tbachman> #info moizer says that we can do the same thing — when you get the reply, you can do the snapshot chunk
16:20:41 * tbachman should really read up on this code/algo — just trying to parrot here :)
16:21:21 <tbachman> #info pantelis says what if we have a separate heartbeat actor that just sends the heartbeat message with no data
16:21:37 <tbachman> #info moizer says the heartbeat message has the current term of the leader and the follower index
16:21:51 <tbachman> #info It needs to get the append-entries from the current leader
16:24:41 <tbachman> #info pantelis says the only time we send a new index is on a replicate
16:24:55 <tbachman> #info moizer says when there’s a new follower, there’s a lot of entries that need to be replicated
16:25:22 <tbachman> #info when the heartbeat reply is received from a new follower, then we have to send the follower a bunch of updates
16:26:30 <tbachman> #info If there’s a replicate and the message gets lost, there is no way to resend the same message
16:27:20 <tbachman> #info pantelis says when you have 2 followers and you replicate, and one follower responds first, we don’t need the reply from the second follower to get consensus; if there’s a commit immediately after that, then the previous entry gets sent twice to the same follower, which is okay, but just not as efficient
16:27:58 * tbachman didn’t quite follow this protocol issue — sorry folks
16:33:43 <tbachman> #info moizer produced a very small simulator that you can run with a real controller, which can help to track down replication issues
16:34:05 <tbachman> #info you can connect mininet to the controller, and it tries to replicate the data; the simulator provides the acknowledgement
16:34:21 <tbachman> #info moizer will try to check this in sometime today
16:35:12 <tbachman> #info Vamsi asks why we’re doing gerrit 14658
16:35:36 <tbachman> #info moizer says we’re doing this b/c timeouts happen
16:36:16 <tbachman> #info Vamsi asks what causes the loss of the heartbeat message
16:36:40 <tbachman> #info moizer says he doesn’t know — network delays can cause it to arrive late, or happens due to some sort of partitioning
16:38:03 <tbachman> #info pantelis says spurious re-elections happen either because the shard actor is busy (e.g. processing a very large pre-commit, with akka only processing one message at a time), so no heartbeat goes out and a follower starts a re-election; or because of garbage collection and thread context-switching latencies
16:40:37 <tbachman> #info pantelis says that in bigger clusters (7, 9, 11, etc. nodes), you’ll have a lot more traffic coming out
16:41:38 <tbachman> #info rovarga says we need to think of hundreds of shards as the default scale factor; if we’re looking at say 2k switches in a DC, the heartbeat chatter may be prohibitive
16:42:01 <tbachman> #info rovarga says we should default the Java garbage collector to g1gc for clustering
16:42:52 <tbachman> #info moizer says we produce a lot of garbage; g1gc targets a specific amount of time spent on GC, and it allows the heap to grow
16:43:26 <tbachman> #info rovarga says this is a triangle: the heap size, how often GC occurs, and the average time a GC takes. You have to move within that triangle
16:43:47 <tbachman> #info rovarga with the current config, you may run okay for a certain amount of time, but eventually hit a wall
16:43:56 <tbachman> #info moizer says he’s worried that it doesn’t actually collect
16:44:06 <tbachman> #info moizer says he’s observed that you run out of memory faster
16:45:33 <tbachman> #info rovarga says he was running the in-memory data store with BGP and 1M routes, and it almost ran out of heap; the trace showed ~3.9GB of heap used in oldgen and 10-second pauses where collections were happening like crazy, and he took this down to ~0.5GB
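    For reference, the G1GC switch discussed above comes down to a couple of standard HotSpot options; the values shown and the idea of putting them in the JVM options used to launch karaf are assumptions for illustration, not agreed defaults:

        -XX:+UseG1GC -XX:MaxGCPauseMillis=250 -Xms2g -Xmx4g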
16:47:31 <tbachman> #info pantelis asks about the shard logging; why are we using the akka logger, as it doesn’t preserve the line number (i.e. it outputs the line number of the actor instead)
16:48:01 <tbachman> #info moizer says the logging adapter is used just to make sure it’s asynchronous
16:48:15 <tbachman> #info pantelis says that karaf does that anyway (pax-logging to an OSGi service)
16:50:56 <tbachman> #info pantelis says that when you do a log.error, and you have formatting arguments and you also want to print the exception, with slf4j you have to do a string format b/c if you pass in e as the last argument, it won’t format it correctly, which the akka logger will
16:51:19 <tbachman> #info rovarga says if you don’t reference the exception in the format string, slf4j will pick it up as the throwable and format it
16:51:49 <rovarga> #info LOGGER.warn("Foo {}", obj, ex);
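    Expanding rovarga’s one-liner, a minimal sketch of the slf4j behaviour being described (class and method names are made up for the example): a trailing Throwable with no matching placeholder is logged as the exception, stack trace included:

        import org.slf4j.Logger;
        import org.slf4j.LoggerFactory;

        final class LoggingExample {
            private static final Logger LOG = LoggerFactory.getLogger(LoggingExample.class);

            void onCommitFailure(String shardName, Exception ex) {
                // "{}" is filled from shardName; ex has no placeholder, so slf4j
                // logs it as the throwable with its stack trace.
                LOG.warn("Commit failed on shard {}", shardName, ex);
            }
        }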
16:51:55 <tbachman> rovarga: thx! :)
16:52:13 <tbachman> #info pantelis asks if it’s okay to migrate to slf4j logging
16:52:23 <tbachman> #info moizer says there could be a lot of changes there
16:52:32 <tbachman> #info pantelis says he’s willing to make these changes
16:53:01 <tbachman> #info moizer says there’s a way to have the shard identifier included in the log output as well; once we move to slf4j, we lose that ability
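    One possible way to keep a shard identifier in slf4j output is the MDC; this is a hypothetical sketch of that idea, not what the team decided:

        import org.slf4j.MDC;

        final class ShardLoggingContext {
            // Hypothetical: put the shard name into the MDC so the logging backend
            // (e.g. a pax-logging pattern containing %X{shardName}) can print it
            // alongside every message logged inside the block.
            static void runWithShardContext(String shardName, Runnable work) {
                MDC.put("shardName", shardName);
                try {
                    work.run();
                } finally {
                    MDC.remove("shardName");
                }
            }
        }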
16:55:21 <tbachman> #action pantelis to create a bug to address logging
16:56:16 <tbachman> #info moizer is pushing a patch today for backpressure for creation of a transaction
16:57:04 <tbachman> #info moizer has seen a problem with statistics collection, and in a multi-node cluster, this takes a long time for the commit to go through; the statistics manager continues to try pushing these through, and it eventually times out
16:58:00 <tbachman> #info rovarga says openflow doesn’t have a single writer per data tree — will be addressed in new openflow design
16:59:17 <tbachman> #info moizer says there are 2 cases requiring backpressure: the BGP case, where there’s a single transaction with many data items; and the stats manager case, where there’s a new transaction per data item
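    A hypothetical sketch of what backpressure on transaction creation could look like (not the actual patch moizer is pushing): producers must acquire a permit before creating a transaction, and permits are only returned when commits complete, so a slow cluster naturally throttles a fast writer such as the stats manager:

        import java.util.concurrent.Semaphore;
        import java.util.concurrent.TimeUnit;

        final class TransactionRateLimiter {
            // Illustrative limit on outstanding (uncommitted) transactions.
            private final Semaphore permits = new Semaphore(1000);

            boolean acquireForNewTransaction() throws InterruptedException {
                // Blocks (up to a timeout) when too many commits are still in flight.
                return permits.tryAcquire(30, TimeUnit.SECONDS);
            }

            void onCommitComplete() {
                permits.release();   // a finished commit frees capacity for new transactions
            }
        }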
17:03:54 <tbachman> #info moizer wants to be able to apply a transaction to the state without consensus for operational data
17:04:05 <tbachman> #info moizer asks for thoughts on doing that
17:04:11 <tbachman> #info pantelis asks if that breaks RAFT
17:04:31 <tbachman> #info moizer says that for operational data, we already said we break RAFT by not being persistent
17:04:51 <tbachman> #info pantelis says that the assumption is that operational data can be recalculated
17:05:06 <tbachman> #info moizer says as soon as the leader gets the commit, they instantly try to replicate the data
17:05:53 <tbachman> #info rovarga says he’s not familiar enough with RAFT to know for sure yet; sounds a bit scary, but asks what it is that the applications expect, and what is involved in reproducing the operational data
17:06:23 <tbachman> #info rovarga says that some applications might see a failover rather than a graceful migration, in which case the applications might reproduce the data somehow
17:06:58 <tbachman> #info moizer says we can have another flag for this
17:07:04 <tbachman> #info pantelis asks if we turn it on by default
17:07:14 <tbachman> #info moizer says no, and we have to see how this works first
17:07:50 <tbachman> #info moizer says that our data store is more of a strongly consistent data store
17:08:11 <tbachman> #info the operational data store has things that change very rapidly — this makes for an eventual consistency model, which allows for better performance
17:08:33 <tbachman> #info pantelis asks if we should be putting Time Series Data in the data store
17:11:06 <tbachman> #info Vamsi is deprioritizing the 2-node cluster in favor of stabilizing basic clustering
17:11:15 <tbachman> #info moizer asks if HP is planning to submit any patches
17:11:28 <tbachman> #info Vamsi says they are looking at the order that they will start contributing
17:11:45 <tbachman> #info moizer says the best thing they can do is to report issues they find in bugzilla, to ensure that we don’t duplicate the work
17:12:00 <tbachman> #info pantelis says there’s another meeting on Thursday that Mark had set up
17:12:09 <tbachman> #info moizer says that Dell wants to continue, but HP doesn’t
17:12:54 <tbachman> moizer: not sure if you saw this one, btw
17:12:55 <tbachman> https://bugs.opendaylight.org/show_bug.cgi?id=2667
17:13:35 <tbachman> #link https://bugs.opendaylight.org/show_bug.cgi?id=2667 bug reported by GBP
17:13:38 <tbachman> moizer: np!
17:36:56 <tbachman> moizer: I have to run
17:37:01 <tbachman> shall I go ahead and endmeeting?
17:37:08 <moizer> thx Tom
17:37:14 <tbachman> np!
17:37:20 <tbachman> #endmeeting