#opendaylight-clustering: clustering_hackers

Meeting started by moizer_ at 15:59:08 UTC (full logs).

Meeting summary

1. Gary Wu presenting information on Unified Secure Channel (moizer_, 16:06:55)
2. USC wants to support call home, like NETCONF Call Home (moizer_, 16:07:26)
3. device needs to make an inbound call to controller (moizer_, 16:07:39)
4. device creates a call home connection (moizer_, 16:08:23)
5. this allows controller to talk to device (moizer_, 16:08:57)
6. assumption: any node in the cluster should be able to respond to a request instead of bouncing it around (moizer_, 16:11:11)
7. important that an RPC request is routed to the node with the connection (moizer_, 16:11:58)
8. Other considerations: scalability; should call home devices be “multi-homed” to multiple controller nodes (tbachman, 16:12:29)
9. moizer_ asks gwu if the idea is that the request to controller be bounced — is that so you don’t get a redirect? (tbachman, 16:13:08)
10. gwu says yes (tbachman, 16:13:12)
11. moizer_ says that the routed RPC mechanism should support this (tbachman, 16:14:18)
12. uchau asks what happens, in the clustering model, to an OF switch when that node goes down; a device ownership model is needed so that the device can work with another node in the controller (tbachman, 16:15:48)
13. gwu says when a node goes down, the device needs to reconnect with one of the other nodes (tbachman, 16:16:07)
14. uchau asks if USC was going to have openflow also go through the secure channel (tbachman, 16:16:26)
15. gwu says yes (tbachman, 16:16:28)
16. uchau is interested in developing a device ownership concept, which helps provide failover direction (tbachman, 16:16:58)
17. uchau says in this case, whether OF connects directly or through the secure channel, the ownership model is the same (tbachman, 16:17:32)
18. gwu asks how openflow deals with multihoming/mastership? (tbachman, 16:17:43)
19. uchau says the openflow team is implementing a message that allows the controller to assert the role (tbachman, 16:17:58)
20. uchau says that it can look at the device ownership when a device connects, and assert the role (tbachman, 16:18:16)
21. Helen says that clustering already has a supernode concept — asks if this is related (tbachman, 16:19:33)
22. moizer_ says for data, there is a concept of leaders and followers, but that does not mean you can go to another node to access inventory (tbachman, 16:20:53)
23. Helen asks whether, without a load balancer, it is possible for clustering to solve this problem (tbachman, 16:26:00)
24. moizer_ recommends using virtual IPs for the controller (tbachman, 16:26:18)
25. uchau says one option is to have the device connect to all the controllers in a team, which is similar to the openflow model (tbachman, 16:27:13)
26. moizer_ says one problem with using a virtual IP and load balancing is how to do keep-alives (tbachman, 16:30:28)
27. gwu asks what the scalability is of that model — how many connections can a node handle (tbachman, 16:30:58)
29. uchau says that jmedved was maybe targeting 5k, but wasn’t sure whether that was per-node or per-cluster (tbachman, 16:31:51)
30. Helen says that their requirement is for 1 million devices (tbachman, 16:32:04)
31. moizer_ says with clustering, we can only store what fits into memory (i.e. storage can’t exceed the amount of memory available) (tbachman, 16:33:27)
32. moizer_ says that’s a lot of operational data (tbachman, 16:33:31)
33. Helen says all the other data is stateless (tbachman, 16:33:42)
34. moizer_ says 1 million devices, and suspects that’s a lot of data in memory (tbachman, 16:34:02)
35. Fabiel Zuniga says that the persistence service may be able to help here (tbachman, 16:34:49)
36. markmozolewski says devices could maintain 1 Master / 1-2 Slave (backup) connections and establish new slave connections as failover occurs (vs. maintaining connections to all slaves), for cluster sizes >> 3. (tbachman, 16:35:04)
37. moizer_ recommends connecting a bunch of devices and seeing how things perform (tbachman, 16:36:09)
38. uchau asks if Helen wants the controller to support the load balancing, or wants to use external load balancers (tbachman, 16:37:32)
39. uchau guesses that the 1 million nodes is to be supported by the cluster, not by a single node in the cluster (tbachman, 16:37:57)
40. moizer_ says with 64 switches in openflow, it takes about 4-1/2 MB in the data store (tbachman, 16:39:18)
41. I need to talk about bugs/patches for 10 mins (moizer_, 16:39:38)
42. catohornet asks about timeouts in the cluster; sees an issue with many nodes, depending on where they’re configured topologically (tbachman, 16:40:08)
43. moizer_ says you don’t need to have every node fully replicated; as an example, with routing logic and 5 cluster nodes, you might choose to do replication on only 3 of the nodes (tbachman, 16:40:39)
44. gwu asks if the proposal is workable (tbachman, 16:42:16)
45. moizer_ says yes (tbachman, 16:42:18)
46. gwu was thinking of presenting statistics to the MD-SAL (e.g. bytes transferred); asks about this (e.g. effects on data store as things scale) (tbachman, 16:42:50)
47. moizer_ says if the stats collection interval isn’t too short, then it should be okay (e.g. no client will be reading stats every 3 seconds) (tbachman, 16:43:26)
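
Rough sizing note (not from the meeting): the summary gives two numbers that can be combined, the figure of about 4-1/2 MB in the data store for 64 openflow switches and the requirement for 1 million devices. The sketch below only extrapolates between them. It assumes data store usage grows linearly with device count and that a call home device costs about as much as an openflow switch; neither assumption was stated in the meeting, and the function and constant names are made up for illustration. moizer_'s recommendation to connect a bunch of devices and measure still applies.

```python
# Back-of-envelope extrapolation from the figures quoted in the minutes.
# ASSUMPTION (not from the meeting): data store usage scales linearly with
# device count, and every device costs about as much as one openflow switch.

MEASURED_SWITCHES = 64        # from the minutes
MEASURED_MB = 4.5             # "about 4-1/2 MB" from the minutes
TARGET_DEVICES = 1_000_000    # the stated requirement


def estimate_datastore_mb(devices: int) -> float:
    """Linearly scale the measured footprint to the given device count."""
    per_device_mb = MEASURED_MB / MEASURED_SWITCHES
    return per_device_mb * devices


if __name__ == "__main__":
    total_mb = estimate_datastore_mb(TARGET_DEVICES)
    print(f"estimated data store size: {total_mb:,.1f} MB")
```

Under those assumptions the estimate comes to roughly 70,000 MB for the whole cluster, which matters because the data store has to fit in memory. Whether the load is per-node or per-cluster, and how replication multiplies it, was left open in the discussion.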

Meeting ended at 17:58:15 UTC (full logs).

Action items

(none)

People present (lines said)

tbachman (54)
moizer_ (13)
odl_meetbot (3)
markmozolewski (3)

Generated by MeetBot 0.1.4.