#opendaylight-clustering: clustering hackers

Meeting started by colindixon at 15:08:17 UTC (full logs).

Meeting summary

agenda bashing (colindixon, 15:08:20)
1. Jan wants an update on the progress on BUG-5421 for the singleton app in clustering (colindixon, 15:08:51)
2. ashutosh wants to cover the leader election issues (colindixon, 15:11:16)
leader elections repeatedly happening (colindixon, 15:11:52)
1. ashutosh says they're seeing spurious elections at high loads (I think 2 million txns per second) (colindixon, 15:12:27)
2. the simplest idea seems to be to split out the heartbeat actor from the RAFT actor so it doesn't get bogged down (colindixon, 15:12:59)
3. TomP says that they tried moving to one actor per shard, but that resulted in a huge performance hit (not sure if it was a 30% performance loss or a drop to 30% of performance) (colindixon, 15:15:32)
4. the result is that this prototype was abandoned (colindixon, 15:16:13)
5. rovarga notes that the performance issues we're seeing are really an issue of us not pipelining transactions (colindixon, 15:19:38)
6. rovarga says that we have an internal queue in the CDS which tracks the transactions which have been accepted but not yet replicated, persisted, and committed; the only issue is that right now the RAFT actor doesn't offer an asynchronous way to persist and replicate (colindixon, 15:28:39)
7. rovarga thinks that the only things which need to be synchronously persisted are internal to the RAFT actor (though they will also force syncing the prior data to disk to maintain order as part of Akka persistence) (colindixon, 15:29:56)
8. assuming internal synchronous persistence events are relatively rare compared to user data asks (which is real), that is likely to help performance a lot (colindixon, 15:30:30)
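A rough sketch of the pipelining idea in items 6 to 8, for readers who were not in the meeting. It is a toy model only: the class and method names (PipelinedJournal, persist_async, persist_sync) and the "raft-internal:term=2" entry are hypothetical and are not the CDS, RAFT actor, or Akka persistence APIs. It assumes user transactions can be accepted without blocking, while a RAFT-internal entry is written synchronously and first forces everything queued before it to disk so that ordering is kept.

```python
from collections import deque


class PipelinedJournal:
    """Toy journal: async entries are queued, sync entries force a flush first."""

    def __init__(self):
        # Accepted but not yet persisted: (entry, callback) pairs, oldest first.
        self.pending = deque()
        # Stand-in for durable storage, in write order.
        self.disk = []

    def persist_async(self, entry, on_done):
        # Accept the entry and return immediately; on_done fires once it is written.
        self.pending.append((entry, on_done))

    def flush(self):
        # Write everything queued so far, oldest first, then notify each caller.
        while self.pending:
            entry, on_done = self.pending.popleft()
            self.disk.append(entry)
            on_done(entry)

    def persist_sync(self, entry):
        # Earlier queued entries must reach disk first to preserve ordering.
        self.flush()
        self.disk.append(entry)


def demo():
    journal = PipelinedJournal()
    committed = []
    # Three user transactions are accepted without waiting for the disk.
    for txn in ("txn-1", "txn-2", "txn-3"):
        journal.persist_async(txn, committed.append)
    pending_before = len(journal.pending)
    # A (hypothetical) RAFT-internal entry is persisted synchronously.
    journal.persist_sync("raft-internal:term=2")
    return pending_before, committed, journal.disk
```

In this model the three transactions sit in the queue until the synchronous entry arrives, at which point they are written in order ahead of it. If such synchronous entries are rare, as item 8 assumes, most user data never waits on a blocking write.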
singleton app template progress (colindixon, 15:30:47)
1. it seems like with one comment from Robert around the API, things look good (colindixon, 15:31:04)
2. TomP says that there are two other patches he needs to do to move the EOS service to the new MD-SAL APIs, which are blocked on moving those APIs now (colindixon, 15:32:06)
3. vaclav says that the new MD-SAL APIs should be merged either now, or very shortly (colindixon, 15:32:39)
4. https://git.opendaylight.org/gerrit/#/q/owner:%22Vaclav+Demcak+%253Cvaclav.demcak%2540pantheon.sk%253E%22 the relevant patches are the ones mentioning Bug 5421 here (colindixon, 15:34:05)
5. https://bugs.opendaylight.org/show_bug.cgi?id=5421 this is the bug (colindixon, 15:35:15)
6. Jan asks when this will be done, TomP says he hopes it will be ready for apps to start using by the end of the week (colindixon, 15:36:05)
7. Jan is looking to have an example of something using the new APIs in the code by the Boron release so people can use it (colindixon, 15:36:39)
8. Jan asks how you advertise a singleton with Blueprint, TomP says right now it's up to the EOS to make sure to ignore the advertised services on the nodes where the singleton app is running but not the "owner", in the future that could be baked into Blueprint (colindixon, 15:44:10)
9. Jan and TomP agree that baking the EOS advertising of services into Blueprint makes sense to discuss at the summit and plan for Carbon (colindixon, 15:44:14)
10. there's a long discussion about the internals of clustering, TomP says that we have two rate limiters: a txn rate limiter and an operation rate limiter, Moiz said the second one was still important but less than the first (colindixon, 16:04:11)
11. TomP isn't sure if we still need the operation rate limiter now that we have batching (colindixon, 16:04:42)

Meeting ended at 16:07:56 UTC (full logs).

Action items

(none)

People present (lines said)

colindixon (28)
odl_meetbot (4)

Generated by MeetBot 0.1.4.