#opendaylight-devforum6: clustering and HA enhancements
Meeting started by dlenrow at 17:02:55 UTC
(full logs).
Meeting summary
- questions about mapping cluster instances to
devices and division of work. Answer: that's out of scope for this
discussion. This is about the distributed data store issues
(dlenrow,
17:13:30)
- resource links at end of deck point to
documentation for those who want to learn more (dlenrow,
17:14:12)
- discussion about need and feasibility of a two
node cluster (dlenrow,
17:21:32)
- established that most customers want N < 3 nodes for
clustering. This is a market requirement (dlenrow,
17:22:37)
- discussion followed about different models and
limitations of a two-node cluster. Established that survivability is
less certain and failure modes are worse in a two-node Akka cluster. Two
nodes can work with master/slave but we want a single model for apps
to address (dlenrow,
17:25:06)
- statement made that the app designer designs
the sharding strategy and can decide about this performance
tradeoff. (dlenrow,
17:43:42)
- suggestion that cross-shard transaction can be
disabled for folks who don't want to pay performance price.
(dlenrow,
17:44:35)
- Colin D.'s approach is to define shards in such a
way that they don't require distributed transactions and that we
build bounded domains of consistency with no attempt at consistency
across them (dlenrow,
17:47:36)
- question about what is meant by programmatic
shard config. (dlenrow,
17:48:23)
- answer is that shard config is currently read
only once at startup. To add an app/shard to a running controller we
need a way to update config after startup. (dlenrow,
17:49:03)
- Colin: we need to get apps running with shards
to see where we have bottlenecks and what we need to do for
optimization. (dlenrow,
17:52:08)
- Jan asks should we set some performance/scaling
requirements for Lithium? (dlenrow,
17:56:55)
- Project team agrees to this (dlenrow,
17:57:20)
- 10 minute break (dlenrow,
17:57:34)
- Slide 10 Autonomous Data Replication
(dlenrow,
18:17:47)
- one of the links to resources is to the Raft
consensus paper for those who want to learn more (dlenrow,
18:19:36)
- slide 12 shows evolution of features from
Helium to planned Lithium (dlenrow,
18:21:16)
- slide 13 shows how distributed execution is
made transparent by Akka services (dlenrow,
18:23:56)
- question: Can we use any actors external to the
controller to help us identify a partition and to recover?
(dlenrow,
18:25:13)
- answer: Have looked at this idea. Not yet clear
when or how to depend on or support this (dlenrow,
18:25:39)
- Keith Burns talks on use cases related to GBP
and performance requirements. (dlenrow,
18:28:10)
- Colin D. asks for more clarification of the
config of apps. Big distributed app scaled out versus single
instance running in a single cluster. (dlenrow,
18:29:28)
- answer is both requirements exist (dlenrow,
18:29:37)
- bunch of discussion establishes that the answer
is very app specific (dlenrow,
18:32:39)
- Colin asks where do you want app instances and
where do you want related events to go. (dlenrow,
18:33:53)
- room says we want all of the options named.
Colin states that if you want all, you will get crappy performance
or crappy usability (dlenrow,
18:36:37)
- question from AT&T: What persists across a total
restart? (dlenrow,
18:47:41)
- answer: There was some discussion during the break.
Ideally we want this configurable per shard, and may also want
consistency model and backstore config per shard (dlenrow,
18:48:19)
- clarification that Helium stuff is POC and
intended to get us to discussing the next layer of questions/answers
about what we need to build (dlenrow,
18:50:20)
- AT&T: What are the knobs we will be able to turn
and how will we provide feedback to designers to make sure it meets
needs? (dlenrow,
18:52:30)
- answer: We need you to work with us to get this
right. If you do testing with latest code and give feedback this
will help us prioritize (dlenrow,
18:54:48)
- AT&T: What is the deadline for input to affect
Lithium planning? (dlenrow,
18:59:30)
- answer: sooner is better. No hard deadline. 4-6
weeks is the likely window for impact on Lithium (dlenrow,
19:00:12)
- question: does client need to know which nodes
are up/down and worry about which node requests are directed
to? (dlenrow,
19:04:21)
- answer: We need to supplement with load
balancers and/or VRRP to deal with the changing physical
address. (dlenrow,
19:04:58)
- discussion of techniques to make instance
addresses transparent. (dlenrow,
19:07:35)
- last slide has contact emails and links to
background info. (dlenrow,
19:09:48)
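
Note on the two-node discussion (17:21 to 17:25): the weaker survivability of a two-node cluster follows from majority arithmetic. The sketch below is illustrative only and assumes a majority-quorum protocol such as Raft (the paper linked from the deck). The function names are hypothetical and are not taken from the OpenDaylight or Akka code.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Number of nodes that can fail while a majority is still reachable."""
    return n - majority(n)


if __name__ == "__main__":
    # Compare cluster sizes; the two-node case is the one debated in the meeting.
    for n in (1, 2, 3, 5):
        print(f"nodes={n} majority={majority(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Under this assumption a two-node cluster needs both nodes for a majority, so it tolerates no failures, the same as a single node, while three nodes tolerate one.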
Meeting ended at 19:10:07 UTC
Action items
- (none)
People present (lines said)
- dlenrow (51)
- odl_meetbot (3)
- dfarrell07 (2)
Generated by MeetBot 0.1.4.