#opendaylight-integration: cluster test

Meeting started by LuisGomez at 17:03:17 UTC (full logs).

Meeting summary

review of action points (LuisGomez, 17:03:34)
1. action point 1) car-people test: (LuisGomez, 17:03:55)
2. there was an action to open a task in trello (LuisGomez, 17:05:35)
3. that is done (LuisGomez, 17:05:41)
4. nobody has jumped on that yet (LuisGomez, 17:06:02)
5. Jamo says he will be helping with docker as well as the templates for cluster test (LuisGomez, 17:07:29)
6. action point 2) generic test plan for cluster test (LuisGomez, 17:09:32)
7. this one we have not started yet, but LuisGomez will try to get it rolling next week at the latest (LuisGomez, 17:10:33)
8. the test plan could be a robot suite with test cases and description (LuisGomez, 17:11:47)
9. action point 3) cluster deploy scripts (LuisGomez, 17:13:13)
10. send a mail to cluster devs to check whether they have something to deploy a cluster (LuisGomez, 17:13:53)
11. cluster devs do not have anything in mind today (LuisGomez, 17:16:15)
12. the easiest option seems to be scripts similar to the ad-sal ones (LuisGomez, 17:17:15)
13. the best is to join the cluster dev call and discuss this with them (LuisGomez, 17:19:49)
14. we recommend joining the cluster dev call going forward (LuisGomez, 17:20:11)
15. https://bugs.opendaylight.org/show_bug.cgi?id=2187 (colindixon, 17:24:30)
16. action point 4) docker for cluster (LuisGomez, 17:27:57)
17. colindixon agrees with Luis's point that cluster configuration should be easier and not require a reboot to apply, but there are reasons why it's hard (colindixon, 17:27:58)
18. jamoluhrsen is working on docker stuff right now, he's keeping his head down (colindixon, 17:28:51)
19. even though it's not a blocker, the docker work is coming to a close and we risk losing more by moving to other things than by getting it completed to a good stopping point (colindixon, 17:29:38)
20. LuisGomez says he's hoping to get a draft of the cluster test plan by the end of the week, but certainly he's planning to work on it as his dominant task next week (colindixon, 17:30:09)
21. LuisGomez says he'll reach out to moiz, TomP and other clustering developers to ask what scripts exist or need to be written to help with that (colindixon, 17:30:46)
cluster robot keywords (LuisGomez, 17:31:16)
1. PhilShea is working mostly on things other than ODL these days, but he developed some cluster robot keywords (colindixon, 17:32:34)
2. PhilShea asks if we could create a simple cluster doctor out of the curses cluster monitor tool (colindixon, 17:32:56)
3. jamoluhrsen says that's a great idea, colindixon concurs (colindixon, 17:33:17)
4. PhilShea notes that it's written in python, which means that there are people here who can work on it more easily (colindixon, 17:33:53)
5. PhilShea starts presenting robot keywords (also see webex recording) (colindixon, 17:36:54)
6. there's some tools around isolating a node from a cluster, located in ClusterKeywords.robot in test/csit/libraries (colindixon, 17:37:34)
7. this eventually comes down to using iptables rules to block traffic to/from that controller and checking by reading back the iptables rules (colindixon, 17:38:15)
8. it also has the ability to do partial reachability failures by preventing a pair of controllers from talking to each other (colindixon, 17:39:25)
9. https://wiki.opendaylight.org/view/Meetings#Cluster_Test (colindixon, 17:40:39)
10. LuisGomez asks if we have a test which tears down a controller and then brings it up so that we can test things other than iptables issues (colindixon, 17:40:54)
11. shaleen and PhilShea note that we already have keywords to bring controllers down and up, they might need slight (but very slight) modifications (colindixon, 17:41:30)
12. there are other keywords that flush iptables and help clean up (colindixon, 17:42:06)
13. PhilShea points out that another test case would be partitions (which is more interesting with 4+ controllers) (colindixon, 17:42:27)
14. https://github.com/opendaylight/integration-test/blob/master/csit/suites/controller/Clustering_Datastore/140_recovery_restart_follower.robot test suite which already tests starting/stopping of controllers (colindixon, 17:42:52)
15. colindixon points out that in the long run we're going to need to test even aggressive things like cutting off power to a server (colindixon, 17:45:00)
16. colindixon points out that this is a real thing that will happen with OpenDaylight in production, even if it's hard to test (colindixon, 17:47:51)
17. PhilShea says he likes this as a good downstream test, at least for now; colindixon agrees, it's a more advanced use case and not really amenable to our test infrastructure (colindixon, 17:48:26)
18. LuisGomez asks how can we tell when a controller is actually up? jamoluhrsen says that PhilShea wrote a keyword wait for cluster sync that's related to the 140 series (colindixon, 17:50:42)
19. https://github.com/opendaylight/integration-test/blob/master/csit/libraries/ClusterKeywords.robot this is the clusterkeywords robot file (colindixon, 17:51:23)
20. PhilShea says he's never tried the wait for cluster sync (which might help in some of this) in the sandbox or ODL CI (colindixon, 17:52:10)
21. PhilShea says that we should first try to start using the new wait for cluster sync in the 140 suite since it's likely to help a lot there (colindixon, 17:53:32)
22. PhilShea says somebody should take on the 140 test, there's a method that stopped working (colindixon, 17:54:09)
23. colindixon asks if LuisGomez was asking about knowing when the cluster component was up or the controller in general (colindixon, 17:57:59)
24. LuisGomez says both, but just the clustering for now; PhilShea says the "get controller sync status" keyword should do that (colindixon, 17:58:18)
25. colindixon adds that the general case of knowing if the controller (in its entirety) is up is really hard (colindixon, 17:58:49)
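
As an illustration of the isolation approach discussed above (blocking traffic with iptables rules, reading the rules back to check, and blocking only a pair of controllers for partial reachability failures), here is a minimal Python sketch. It is not the code in ClusterKeywords.robot. The function names, the choice of the INPUT and OUTPUT chains, and the IP addresses are assumptions made for the example, and the read-back check assumes a rule listing in the style printed by `iptables -S`.

```python
from itertools import chain


def block_pair_rules(local_ip, peer_ip):
    """Commands to run on local_ip's host to drop traffic to and from peer_ip.

    Hypothetical helper: blocking just one pair of controllers gives a
    partial reachability failure rather than a full isolation.
    """
    return [
        f"iptables -I INPUT -s {peer_ip} -d {local_ip} -j DROP",
        f"iptables -I OUTPUT -s {local_ip} -d {peer_ip} -j DROP",
    ]


def isolate_node_rules(node_ip, cluster_ips):
    """Commands that cut node_ip off from every other cluster member."""
    peers = [ip for ip in cluster_ips if ip != node_ip]
    return list(chain.from_iterable(block_pair_rules(node_ip, p) for p in peers))


def is_blocked(rule_listing, peer_ip):
    """Check a rule listing (assumed `iptables -S` style) for a DROP rule on peer_ip.

    Simplification: any DROP rule that names peer_ip as source or
    destination counts as blocked.
    """
    for line in rule_listing.splitlines():
        tokens = line.split()
        if "DROP" not in tokens:
            continue
        # Addresses are listed with a mask suffix such as 10.0.0.2/32.
        addresses = {t.split("/")[0] for t in tokens}
        if peer_ip in addresses:
            return True
    return False


if __name__ == "__main__":
    # Example addresses only; they do not come from the meeting.
    for command in isolate_node_rules("10.0.0.1", ["10.0.0.1", "10.0.0.2", "10.0.0.3"]):
        print(command)
```

The sketch only builds command strings and parses a listing, so it can be tried without root access. Actually applying the rules, and flushing them afterwards as the clean-up keywords do, would still have to happen on the controller hosts.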

Meeting ended at 18:01:57 UTC (full logs).

Action items

(none)

People present (lines said)

colindixon (36)
LuisGomez (20)
odl_meetbot (3)
mgkwill (2)
odp-gerritbot (2)
dfarrell07 (1)
jamoluhrsen (1)

Generated by MeetBot 0.1.4.