#opendaylight-meeting Meeting
Meeting started by colindixon at 16:07:05 UTC
(full logs).
Meeting summary
- picking up with APIC clustering (colindixon, 16:07:19)
- sharding (colindixon, 16:07:26)
- review from last time: (1) the core tree is
broken into subtrees at "context roots", (2) subtrees under context
roots are placed into shards, (3) shards are replicated 3 times with
1 leader and 2 followers, (4) this layout is static and not
adjusted except by modifying models; the assumption is that failures
are temporary and nodes will come back (colindixon,
16:10:50)
- shard layout algorithm does its best to even
out load among nodes (colindixon,
16:11:16)
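A minimal sketch of what such a static, load-evening placement could
look like; the names below are hypothetical illustrations, not the
APIC code (which is not public):

    import java.util.ArrayList;
    import java.util.List;

    /** Hypothetical static shard placement: round-robin over nodes,
     *  3 replicas per shard, computed once and never rebalanced. */
    final class StaticShardLayout {
        static final int REPLICATION_FACTOR = 3; // 1 leader + 2 followers

        /** For each context-root shard, returns the ordered list of
         *  hosting nodes; index 0 is the leader. */
        static List<List<String>> place(List<String> shards, List<String> nodes) {
            List<List<String>> placement = new ArrayList<>();
            for (int s = 0; s < shards.size(); s++) {
                List<String> replicas = new ArrayList<>();
                for (int r = 0; r < REPLICATION_FACTOR; r++) {
                    replicas.add(nodes.get((s + r) % nodes.size()));
                }
                placement.add(replicas);
            }
            return placement;
        }
    }

Offsetting each shard's replica set by its index spreads leaders and
followers roughly evenly across the nodes, which is the load-evening
property described above.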
- co-locating apps and data (colindixon, 16:11:52)
- again, review from last time: there is an
attempt to locate a service's compute close to its data (colindixon,
16:12:54)
- right now, they are trying not to dictate, but the best
practice is that apps should shard their compute in the same manner
that they shard their data (colindixon,
16:14:22)
- regardless of what happens, there is an
actor/stub process co-located with each shard which proxies remote
access if nothing else (colindixon,
16:17:35)
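A sketch of what such a co-located stub could look like, assuming a
hypothetical read/write interface (none of these names come from the
presentation):

    /** Hypothetical per-shard stub: co-located with every shard, it
     *  answers locally when this node holds the data and proxies the
     *  request to the leader's node otherwise. */
    interface ShardAccess {
        byte[] read(String path);
        void write(String path, byte[] data);
    }

    final class ShardStub implements ShardAccess {
        private final ShardAccess localShard;   // null if shard is remote
        private final ShardAccess remoteLeader; // RPC client to the leader

        ShardStub(ShardAccess localShard, ShardAccess remoteLeader) {
            this.localShard = localShard;
            this.remoteLeader = remoteLeader;
        }

        public byte[] read(String path) {
            return localShard != null ? localShard.read(path)
                                      : remoteLeader.read(path);
        }

        public void write(String path, byte[] data) {
            remoteLeader.write(path, data); // writes always go via the leader
        }
    }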
- ken watsen asks what happens when you write:
does it get routed to the right node (colindixon,
16:20:31)
- the answer is yes, it's routed to the static leader,
and if it's down, then it goes to the next follower (colindixon,
16:22:47)
- pub/sub on top of the data store (colindixon, 16:25:37)
- icbts asks: interesting setup for lead/follower
failover… so would apps follow a failover protocol similar to
https://activemq.apache.org/failover-transport-reference.html?
(colindixon,
16:27:07)
- colindixon responds: I think that it works by
(1) going to the leader all of the time unless it's marked as down,
(2) if it's marked as down, this is recorded and all subsequent
reads/writes go to the next follower (which is temporarily the new
leader) (colindixon,
16:27:27)
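A minimal sketch of that failover rule under the stated assumptions
(hypothetical names; the real protocol was only described verbally):

    import java.util.List;

    /** Route to the static leader until it is marked down; once a
     *  node is marked down, record it and pin all subsequent
     *  reads/writes to the next follower in the replica list. */
    final class ReplicaRouter {
        private final List<String> replicas; // index 0 = static leader
        private volatile int current = 0;    // advanced on mark-down

        ReplicaRouter(List<String> replicas) {
            this.replicas = replicas;
        }

        String target() {
            return replicas.get(current);
        }

        /** Called when the current target is detected as down. */
        void markDown() {
            if (current + 1 < replicas.size()) {
                current++; // next follower becomes the temporary leader
            }
        }
    }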
- w.r.t. notifications, mike dvorkin points
out that it's important to make notifications idempotent so that you
can avoid certain issues, e.g., so that you can jump states in your
state machine (colindixon,
16:28:20)
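One way to read that advice, as a hedged sketch: if each notification
carries a version and the full target state, redelivery is harmless
and a consumer that missed events can jump straight to the latest
state (all names here are illustrative):

    /** Hypothetical idempotent notification: full target state plus
     *  a version, so applying it twice changes nothing. */
    final class StateNotification {
        final long version;
        final String state;

        StateNotification(long version, String state) {
            this.version = version;
            this.state = state;
        }
    }

    final class ListeningStateMachine {
        private long appliedVersion = -1;
        private String state = "INIT";

        void onNotification(StateNotification n) {
            if (n.version <= appliedVersion) {
                return; // duplicate or stale delivery: no effect
            }
            appliedVersion = n.version;
            state = n.state; // may safely jump over missed states
        }
    }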
- shard structure (colindixon, 16:31:53)
- each shard has a "doer" that applies changes to
to an in-memory cache (colindixon,
16:32:14)
- it also has a "persister" which pushes the
in-memory cache into some persistent DB (colindixon,
16:32:34)
- a replicator pushes changes to replicas and
stores what happens into the commit log (which is also
persisted) (colindixon,
16:33:10)
- on the side, there are a set of "dictionaries"
which act as DB indices to speed up access (colindixon,
16:34:24)
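Putting the pieces named above together, a skeleton of the shard
structure might look like this (a sketch of the description only;
the interfaces and names are invented here):

    import java.util.HashMap;
    import java.util.Map;

    final class Shard {
        private final Map<String, byte[]> cache = new HashMap<>(); // in-memory state
        private final Persister persister;   // pushes the cache into a persistent DB
        private final Replicator replicator; // ships changes, appends the commit log
        private final Map<String, Map<String, String>> dictionaries =
            new HashMap<>(); // side indices to speed up access

        Shard(Persister persister, Replicator replicator) {
            this.persister = persister;
            this.replicator = replicator;
        }

        /** The "doer": applies a change to the in-memory cache, then
         *  hands it to the replicator and the persister. */
        void apply(String path, byte[] data) {
            cache.put(path, data);
            replicator.replicateAndLog(path, data); // to followers + persisted log
            persister.persistAsync(path, data);
        }

        interface Persister { void persistAsync(String path, byte[] data); }
        interface Replicator { void replicateAndLog(String path, byte[] data); }
    }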
- http://hsqldb.org + JPA/JTA - fast
in-memory relational SQL, then add transactional persistence if
required. Reasonably easy to install on an OSGi core. (icbts,
16:34:56)
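For reference, a minimal sketch of icbts's suggestion over plain
JDBC; "jdbc:hsqldb:mem:..." is HSQLDB's in-memory URL, and switching
to "jdbc:hsqldb:file:..." adds on-disk persistence (the table and
data here are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HsqldbDemo {
        public static void main(String[] args) throws Exception {
            // SA with an empty password is HSQLDB's default account
            try (Connection c = DriverManager.getConnection(
                     "jdbc:hsqldb:mem:shardcache", "SA", "");
                 Statement st = c.createStatement()) {
                st.execute("CREATE TABLE kv (k VARCHAR(256) PRIMARY KEY,"
                         + " v VARBINARY(1024))");
                st.execute("INSERT INTO kv VALUES ('node/1', X'CAFE')");
            }
        }
    }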
- (there is a slide on how write transactions
work in leaders and another on replication, which we skim over to
review later in detail with Raghu and interested parties, possibly
at an MD-SAL offsite) (colindixon,
16:35:38)
- comparison to other DBs (colindixon, 16:35:45)
- key differences seem to be (1) it works with
trees and (2) it tries to be simple and static so that people don't
have to be replicated-DB admins (colindixon,
16:37:40)
- the part about trees relies on the fact that a
whole subtree always falls within a shard, so there's low overhead
to transactions, subscriptions, etc. at this subtree level
(colindixon,
16:39:08)
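A sketch of why that property keeps overhead low: since a whole
subtree lives in one shard, resolving any path to its shard is a
longest-prefix match on the context roots, so a subtree transaction
or subscription touches exactly one shard (names invented here):

    import java.util.Map;
    import java.util.TreeMap;

    final class ShardResolver {
        // context-root path -> shard id
        private final TreeMap<String, Integer> roots = new TreeMap<>();

        void register(String contextRoot, int shardId) {
            roots.put(contextRoot, shardId);
        }

        /** Longest context-root prefix of the path wins. */
        Integer shardFor(String path) {
            Map.Entry<String, Integer> e = roots.floorEntry(path);
            while (e != null && !path.startsWith(e.getKey())) {
                e = roots.lowerEntry(e.getKey());
            }
            return e == null ? null : e.getValue();
        }
    }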
- no locking because there is only one writer for
a shard, and you just serialize the writes (colindixon,
16:42:31)
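The single-writer rule can be sketched as one queue drained by one
thread per shard, so shard state never needs a lock (illustrative
code, not the APIC implementation):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    final class SingleWriter {
        private final BlockingQueue<Runnable> writes = new LinkedBlockingQueue<>();

        SingleWriter() {
            Thread writer = new Thread(() -> {
                try {
                    while (true) {
                        writes.take().run(); // applied strictly in order
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "shard-writer");
            writer.setDaemon(true);
            writer.start();
        }

        /** Callers enqueue writes; only the writer thread applies them. */
        void submit(Runnable write) {
            writes.add(write);
        }
    }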
- general Q&A (colindixon, 16:42:40)
- colindixon says the next step would be to
figure out how our data models work and whether they are amenable to
this kind of subtree-based sharding (colindixon,
16:43:07)
- jan says that so far, yes, but that's just the
current topology and inventory models (colindixon,
16:43:36)
- colindixon asks about how the cluster
management is actually done, but this is mostly tabled for future
discussion (colindixon,
16:50:51)
- going forward (colindixon, 16:51:05)
- Jan proposes that we create some kind of
working group around starting to get implementations for the MD-SAL
datastore (colindixon,
16:51:55)
- colindixon asks when/if we can expect the APIC
cluster/datastore code to be open sourced (colindixon,
16:53:46)
- jan/mike say that it will not be; further, it's
all written in C++, so it would be less useful anyway; this
presentation is more for design ideas (colindixon,
16:54:25)
- it seems as though the "right" approach might
be to use Akka to do cluster membership and node up/down/unknown
state tracking and then using the current in-memory MD-SAL DOM data
store to store each shard (colindixon,
16:58:46)
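As a hedged sketch of that direction, using the classic Akka Cluster
API for membership and up/down/unreachable tracking (the actual
MD-SAL design may differ; only the Akka calls here are real):

    import akka.actor.AbstractActor;
    import akka.cluster.Cluster;
    import akka.cluster.ClusterEvent;

    public class MembershipListener extends AbstractActor {
        private final Cluster cluster = Cluster.get(getContext().getSystem());

        @Override
        public void preStart() {
            // Replay current membership as events, then stream changes.
            cluster.subscribe(getSelf(), ClusterEvent.initialStateAsEvents(),
                    ClusterEvent.MemberUp.class,
                    ClusterEvent.UnreachableMember.class,
                    ClusterEvent.MemberRemoved.class);
        }

        @Override
        public void postStop() {
            cluster.unsubscribe(getSelf());
        }

        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(ClusterEvent.MemberUp.class,
                            m -> System.out.println("up: " + m.member()))
                    .match(ClusterEvent.UnreachableMember.class,
                            m -> System.out.println("unknown: " + m.member()))
                    .match(ClusterEvent.MemberRemoved.class,
                            m -> System.out.println("down: " + m.member()))
                    .build();
        }
    }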
- we will pick this up with mlemay next
week (people are heads-down on stable release stuff right now);
however, we really need to make sure that this gets into Helium
(colindixon,
17:00:21)
Meeting ended at 17:00:27 UTC
(full logs).
Action items
- (none)
People present (lines said)
- colindixon (37)
- icbts (2)
- odl_meetbot (2)
Generated by MeetBot 0.1.4.