=======================================================
#opendaylight-devforum6: clustering and HA enhancements
=======================================================


Meeting started by dlenrow at 17:02:55 UTC.  The full logs are available
at
http://meetings.opendaylight.org/opendaylight-devforum6/2014/clustering_and_ha_enhancements/opendaylight-devforum6-clustering_and_ha_enhancements.2014-09-30-17.02.log.html
.


Meeting summary
---------------

* questions about mapping cluster instances to devices and division of
  work. Answer: that's out of scope for this discussion. This is about
  the distributed data store issues  (dlenrow, 17:13:30)
* resource links at end of deck point to documentation for those who
  want to learn more  (dlenrow, 17:14:12)
* discussion about need and feasibility of a two node cluster  (dlenrow,
  17:21:32)
* established that most customers want N < 3 for clustering. This is a
  market requirement  (dlenrow, 17:22:37)
* discussion followed about different models and limitations of two node
  cluster. Established that survivability is less certain and failure
  modes worse in a two-node Akka cluster. Two nodes can work with
  master/slave but we want a single model for apps to address  (dlenrow,
  17:25:06)
* statement made that the app designer designs the sharding strategy and
  can decide about this performance tradeoff.  (dlenrow, 17:43:42)
* suggestion that cross-shard transactions can be disabled for folks who
  don't want to pay performance price.  (dlenrow, 17:44:35)
* Colin D.'s approach is to define shards in such a way that they don't
  require distributed transactions and that we build bounded domains of
  consistency with no attempt at consistency across them  (dlenrow,
  17:47:36)
* question about what is meant by programmatic shard config.  (dlenrow,
  17:48:23)
* answer is that shard config is currently read only once at startup. To
  add an app/shard to a running controller we need a way to update
  config after startup.  (dlenrow, 17:49:03)
* Colin: we need to get apps running with shards to see where we have
  bottlenecks and what we need to do for optimization.  (dlenrow,
  17:52:08)
* Jan asks should we set some performance/scaling requirements for
  Lithium?  (dlenrow, 17:56:55)
* Project team agrees to this  (dlenrow, 17:57:20)
* 10 minute break  (dlenrow, 17:57:34)
* Slide 10 Autonomous Data Replication  (dlenrow, 18:17:47)
* one of the links to resources is to the Raft consensus paper for those
  who want to learn more  (dlenrow, 18:19:36)
* slide 12 shows evolution of features from Helium to planned Lithium
  (dlenrow, 18:21:16)
* slide 13 shows how distributed execution is made transparent by Akka
  services  (dlenrow, 18:23:56)
* question: Can we use any actors external to the controller to help us
  identify a partition and recover?  (dlenrow, 18:25:13)
* answer: Have looked at this idea. Not yet clear when or how to
  depend/support  (dlenrow, 18:25:39)
* Keith Burns talks on use cases related to GBP and performance
  requirements.  (dlenrow, 18:28:10)
* Colin D. asks for more clarification of the config of apps. Big
  distributed app scaled out versus single instance running in a single
  cluster.  (dlenrow, 18:29:28)
* answer is both requirements exist  (dlenrow, 18:29:37)
* bunch of discussion establishes that the answer is very app specific
  (dlenrow, 18:32:39)
* Colin asks where do you want app instances and where do you want
  related events to go.  (dlenrow, 18:33:53)
* room says we want all of the options named. Colin states that if you
  want all, you will get crappy performance or crappy usability
  (dlenrow, 18:36:37)
* question from AT&T: What persists across a total restart?  (dlenrow, 18:47:41)
* answer: Was some discussion during break. Ideally we want this
  configurable per shard, and may also want consistency model and
  backstore config per shard  (dlenrow, 18:48:19)
* clarification that Helium stuff is POC and intended to get us to
  discussing the next layer of questions/answers about what we need to
  build  (dlenrow, 18:50:20)
* AT&T: What are the knobs we will be able to turn and how will we provide
  feedback to designers to make sure it meets needs  (dlenrow, 18:52:30)
* answer: We need you to work with us to get this right. If you do
  testing with latest code and give feedback this will help us
  prioritize  (dlenrow, 18:54:48)
* AT&T: what is the deadline for input to affect Lithium planning?  (dlenrow,
  18:59:30)
* answer: sooner is better. No hard deadline. 4-6 weeks likely window
  for impact on Lithium  (dlenrow, 19:00:12)
* question: does client need to know which nodes are up/down and worry
  about which node requests are directed to?  (dlenrow, 19:04:21)
* answer: We need to supplement with load balancers and/or VRRP to deal
  with the changing physical address.  (dlenrow, 19:04:58)
* discussion of techniques to make instance addresses transparent.
  (dlenrow, 19:07:35)
* last slide has contact emails and links to background info.  (dlenrow,
  19:09:48)


Meeting ended at 19:10:07 UTC.


People present (lines said)
---------------------------

* dlenrow (51)
* odl_meetbot (3)
* dfarrell07 (2)


Generated by `MeetBot`_ 0.1.4