#opendaylight-clustering: clustering_hackers
Meeting started by moizer_ at 15:59:08 UTC
(full logs).
Meeting summary
- Gary Wu presenting information on Unified
Secure Channel (moizer_,
16:06:55)
- wants to support call home like netconf call
home (moizer_,
16:07:26)
- device needs to make an inbound call to
controller (moizer_,
16:07:39)
- device creates a call home connection
(moizer_,
16:08:23)
- this allows controller to talk to device
(moizer_,
16:08:57)
- assumption: any node in the cluster should be
able to respond to a request instead of bouncing it around
(moizer_,
16:11:11)
- important: an RPC request needs to be routed
to the node that holds the connection (moizer_,
16:11:58)
- Other considerations: scalability; should call
home devices be “multi-homed” to multiple controller nodes
(tbachman,
16:12:29)
- moizer_ asks gwu if the idea is that the
request to controller be bounced — is that so you don’t get a
redirect? (tbachman,
16:13:08)
- gwu says yes (tbachman,
16:13:12)
- moizer_ says that the routed RPC mechanism
should support this (tbachman,
16:14:18)
- uchau asks what happens to an OF switch in the
clustering model when that node goes down; a device ownership model is
needed so that the device can work with another node in the
controller (tbachman,
16:15:48)
- gwu says when a node goes down, the device
needs to reconnect with one of the other nodes (tbachman,
16:16:07)
- uchau asks if USC was going to have openflow also
go through the secure channel (tbachman,
16:16:26)
- gwu says yes (tbachman,
16:16:28)
- uchau is interested in developing a device
ownership concept, which helps provide failover direction
(tbachman,
16:16:58)
- uchau says in this case, whether OF connects
directly or through the secure channel, the ownership model is the
same (tbachman,
16:17:32)
- gwu asks how openflow deals with
multihoming/mastership? (tbachman,
16:17:43)
- uchau says the openflow team is implementing a
message that allows the controller to assert the role (tbachman,
16:17:58)
- uchau says that it can look at the device
ownership when a device connects, and assert the role (tbachman,
16:18:16)
- Helen says that clustering already has a
supernode concept — asks if this is related (tbachman,
16:19:33)
- moizer_ says for data, there is a concept of
leaders and followers, but that does not mean you can go to another
node to access inventory (tbachman,
16:20:53)
- Helen asks whether, w/o a load balancer, it is
possible for clustering to solve this problem (tbachman,
16:26:00)
- moizer_ recommends using virtual IPs for the
controller (tbachman,
16:26:18)
- uchau says one option is to have the device
connect to all the controllers in a team, which is similar to the
openflow model (tbachman,
16:27:13)
- moizer_ says one problem with using a virtual
IP and load balancing is how to do keep-alives (tbachman,
16:30:28)
- gwu asks what the scalability is of that model
— how many connections can a node handle (tbachman,
16:30:58)
- uchau says that jmedved was maybe targeting 5k,
but wasn’t sure whether that was per-node or per-cluster
(tbachman,
16:31:51)
- Helen says that their requirement is for 1
million devices (tbachman,
16:32:04)
- moizer_ says with clustering, we can only store
what we can fit into memory (i.e. storage can’t exceed the amount of
memory available) (tbachman,
16:33:27)
- moizer_ says that’s a lot of operational
data (tbachman,
16:33:31)
- Helen says all the other data is
stateless (tbachman,
16:33:42)
- moizer_ says 1 million devices, and suspects
that’s a lot of data in memory (tbachman,
16:34:02)
- Fabiel Zuniga says that the persistence service
may be able to help here (tbachman,
16:34:49)
- markmozolewski says devices could maintain 1
Master / 1-2 Slave (backup) connections and establish new slave
connections as failover occurs (vs. maintaining connections to all
slaves), for cluster sizes >> 3. (tbachman,
16:35:04)
- moizer_ recommends connecting a bunch of
devices and seeing how things perform (tbachman,
16:36:09)
- uchau asks if Helen wants the controller to
support the load balancing, or to use external load balancers
(tbachman,
16:37:32)
- uchau guesses that the 1 million nodes is to be
supported by the cluster, not by a single node in the cluster
(tbachman,
16:37:57)
- moizer_ says with 64 switches in openflow, it
takes about 4.5 MB in the data store (tbachman,
16:39:18)
- I need to talk about bugs/patches for 10
mins (moizer_,
16:39:38)
- catohornet asks about timeouts in the cluster;
sees issues with many nodes, and with where they’re configured
topologically (tbachman,
16:40:08)
- moizer_ says you don’t need to have every node
fully replicated; as an example, with routing logic and 5 cluster
nodes, you might choose to do replication on only 3 of the
nodes (tbachman,
16:40:39)
- gwu asks if the proposal is workable
(tbachman,
16:42:16)
- moizer_ says yes (tbachman,
16:42:18)
- gwu was thinking of presenting statistics to
the MD-SAL (e.g. bytes transferred); asks about this (e.g. effects
on data store as things scale) (tbachman,
16:42:50)
- moizer_ says if stats collection interval
isn’t too low, then it should be okay (e.g. no client will be
reading stats every 3 seconds) (tbachman,
16:43:26)
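Note on the memory figures above: the summary quotes roughly 4.5 MB of data store for 64 openflow switches, a possible target of 5k devices, and a stated requirement of 1 million devices. A rough back-of-envelope sketch follows. It assumes data store usage grows linearly with device count, which nobody in the meeting claimed, and the function name estimate_datastore_mb is hypothetical. It is only an illustration of the scale being discussed, not a measurement.

```python
# Back-of-envelope estimate only.
# ASSUMPTION (not from the meeting): data store usage scales linearly
# with the number of connected devices.

MEASURED_MB = 4.5        # figure quoted in the meeting for openflow
MEASURED_SWITCHES = 64   # number of switches behind that figure


def estimate_datastore_mb(device_count: int) -> float:
    """Extrapolate data store size in MB for device_count devices."""
    per_device_mb = MEASURED_MB / MEASURED_SWITCHES
    return per_device_mb * device_count


if __name__ == "__main__":
    # 64 is the quoted data point; 5k and 1 million are the targets mentioned.
    for count in (64, 5_000, 1_000_000):
        mb = estimate_datastore_mb(count)
        print(f"{count:>9} devices: {mb:,.1f} MB ({mb / 1024:,.2f} GB)")
```

Under that linear assumption, 1 million devices would need on the order of tens of gigabytes across the cluster, which is consistent with the concern that the data must fit in memory. Real usage would depend on what is stored per device and on how shards are replicated.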
Meeting ended at 17:58:15 UTC
(full logs).
Action items
- (none)
People present (lines said)
- tbachman (54)
- moizer_ (13)
- odl_meetbot (3)
- markmozolewski (3)
Generated by MeetBot 0.1.4.