=======================================================
#opendaylight-devforum6: clustering and HA enhancements
=======================================================


Meeting started by dlenrow at 17:02:55 UTC. The full logs are available
at
http://meetings.opendaylight.org/opendaylight-devforum6/2014/clustering_and_ha_enhancements/opendaylight-devforum6-clustering_and_ha_enhancements.2014-09-30-17.02.log.html
.



Meeting summary
---------------
* questions about mapping cluster instances to devices and division of
  work. Answer: that's out of scope for this discussion. This is about
  the distributed data store issues  (dlenrow, 17:13:30)
* resource links at end of deck point to documentation for those who
  want to learn more  (dlenrow, 17:14:12)
* discussion about need for and feasibility of a two-node cluster
  (dlenrow, 17:21:32)
* established that most customers want N < 3 nodes for clustering. This
  is a market requirement  (dlenrow, 17:22:37)
* discussion followed about different models and limitations of a
  two-node cluster. Established that survivability is less certain and
  failure modes are worse in a two-node Akka cluster. Two nodes can work
  with master/slave, but we want a single model for apps to address
  (dlenrow, 17:25:06)
* statement made that the app designer designs the sharding strategy and
  can decide about this performance tradeoff.  (dlenrow, 17:43:42)
* suggestion that cross-shard transactions can be disabled for folks who
  don't want to pay the performance price.  (dlenrow, 17:44:35)
* Colin D.'s approach is to define shards in such a way that they don't
  require distributed transactions, so that we build bounded domains of
  consistency with no attempt at consistency across them  (dlenrow,
  17:47:36)
* question about what is meant by programmatic shard config.  (dlenrow,
  17:48:23)
* answer is that shard config is currently read only once at startup. To
  add an app/shard to a running controller we need a way to update
  config after startup.  (dlenrow, 17:49:03)
* Colin: we need to get apps running with shards to see where we have
  bottlenecks and what we need to do for optimization.  (dlenrow,
  17:52:08)
* Jan asks: should we set some performance/scaling requirements for
  Lithium?  (dlenrow, 17:56:55)
* Project team agrees to this  (dlenrow, 17:57:20)
* 10 minute break  (dlenrow, 17:57:34)
* Slide 10: Autonomous Data Replication  (dlenrow, 18:17:47)
* one of the links to resources is to the Raft consensus paper for those
  who want to learn more  (dlenrow, 18:19:36)
* slide 12 shows evolution of features from Helium to planned Lithium
  (dlenrow, 18:21:16)
* slide 13 shows how distributed execution is made transparent by Akka
  services  (dlenrow, 18:23:56)
* question: can we use any actors external to the controller to help us
  identify a partition and to recover?  (dlenrow, 18:25:13)
* answer: have looked at this idea. Not yet clear when or how to
  depend on or support it  (dlenrow, 18:25:39)
* Keith Burns talks on use cases related to GBP and performance
  requirements.  (dlenrow, 18:28:10)
* Colin D. asks for more clarification of the config of apps: a big
  distributed app scaled out versus a single instance running in a
  single cluster.  (dlenrow, 18:29:28)
* answer is both requirements exist  (dlenrow, 18:29:37)
* a bunch of discussion establishes that the answer is very app
  specific  (dlenrow, 18:32:39)
* Colin asks where do you want app instances and where do you want
  related events to go.  (dlenrow, 18:33:53)
* room says we want all of the options named. Colin states that if you
  want all, you will get crappy performance or crappy usability
  (dlenrow, 18:36:37)
* question (AT&T): what persists across a total restart?  (dlenrow,
  18:47:41)
* answer: there was some discussion during the break. Ideally we want
  this configurable per shard, and may also want consistency model and
  backing store config per shard  (dlenrow, 18:48:19)
* clarification that the Helium work is a POC, intended to get us to
  discussing the next layer of questions/answers about what we need to
  build  (dlenrow, 18:50:20)
* question (AT&T): what are the knobs we will be able to turn, and how
  will we provide feedback to designers to make sure it meets needs?
  (dlenrow, 18:52:30)
* answer: we need you to work with us to get this right. If you do
  testing with the latest code and give feedback, this will help us
  prioritize  (dlenrow, 18:54:48)
* question (AT&T): what is the deadline for input to affect Lithium
  planning?  (dlenrow, 18:59:30)
* answer: sooner is better. No hard deadline. 4-6 weeks is the likely
  window for impact on Lithium  (dlenrow, 19:00:12)
* question: does the client need to know which nodes are up/down and
  worry about which node requests are directed to?  (dlenrow, 19:04:21)
* answer: we need to supplement with load balancers and/or VRRP to deal
  with the changing physical address.  (dlenrow, 19:04:58)
* discussion of techniques to make instance addresses transparent.
  (dlenrow, 19:07:35)
* last slide has contact emails and links to background info.
  (dlenrow, 19:09:48)



Meeting ended at 19:10:07 UTC.



People present (lines said)
---------------------------
* dlenrow (51)
* odl_meetbot (3)
* dfarrell07 (2)



Generated by `MeetBot`_ 0.1.4
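


Note on the two-node cluster items (17:21:32 to 17:25:06): the minutes do not record the reasoning behind "survivability is less certain", so the following is only a sketch of the usual majority-quorum arithmetic. It assumes the data store commits through a majority of nodes, as in the Raft paper linked from the deck. The function names `quorum` and `tolerated_failures` are illustrative and are not part of any OpenDaylight or Akka API.

```python
def quorum(n: int) -> int:
    """Smallest strict majority of an n-node cluster (assumed commit rule)."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can be lost while a majority is still reachable."""
    return n - quorum(n)


if __name__ == "__main__":
    # Compare a two-node cluster with larger clusters.
    for n in (1, 2, 3, 5):
        print(f"nodes={n} quorum={quorum(n)} tolerated={tolerated_failures(n)}")
```

Under that assumption a two-node cluster needs both nodes to form a majority, so it tolerates no more failures than a single node, while three nodes tolerate one. This fits the point that two nodes can work with master/slave but not with the single majority-based model discussed in the meeting.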