17:03:17 <LuisGomez> #startmeeting cluster test
17:03:17 <odl_meetbot> Meeting started Wed Sep 30 17:03:17 2015 UTC. The chair is LuisGomez. Information about MeetBot at http://ci.openstack.org/meetbot.html.
17:03:17 <odl_meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:03:17 <odl_meetbot> The meeting name has been set to 'cluster_test'
17:03:34 <LuisGomez> #topic review of action points
17:03:55 <LuisGomez> #info action point 1) car-people test:
17:05:35 <LuisGomez> #info there was an action to open a task in trello
17:05:41 <LuisGomez> #info that is done
17:06:02 <LuisGomez> #info nobody has jumped on that yet
17:07:29 <LuisGomez> #info Jamo says he will be helping with docker as well as the templates for cluster test
17:09:32 <LuisGomez> #info action point 2) generic test plan for cluster test
17:10:33 <LuisGomez> #info this one we have not started yet but LuisGomez will try to get this rolling next week at the latest
17:11:47 <LuisGomez> #info the test plan could be a robot suite with test cases and description
17:13:13 <LuisGomez> #info action point 3) cluster deploy scripts
17:13:53 <LuisGomez> #info send a mail to cluster devs to check whether they have something to deploy a cluster
17:16:15 <LuisGomez> #info cluster devs do not have anything in mind today
17:17:15 <LuisGomez> #info the easiest seems to be scripts similar to ad-sal
17:19:49 <LuisGomez> #info the best is to join the cluster dev call and discuss this with them
17:20:11 <LuisGomez> #info we recommend joining the cluster dev call going forward
17:24:30 <colindixon> https://bugs.opendaylight.org/show_bug.cgi?id=2187
17:24:36 <dfarrell07> colindixon: welcome :)
17:26:52 <colindixon> added to my autojoin list
17:26:56 <colindixon> no idea how it wasn't
17:27:21 <colindixon> I sit in 13 ODL channels now
17:27:57 <odp-gerritbot> Vratko Polák proposed a change to integration/test: BGP testplan and performance suite against 10 fastbgp peers https://git.opendaylight.org/gerrit/27365
17:27:57 <LuisGomez> #info action point 4) docker for cluster
17:27:58 <colindixon> #info colindixon agrees with Luis's point that cluster configuration should be easier and not require a reboot to apply, but there are reasons why it's hard
17:28:51 <colindixon> #info jamoluhrsen is working on docker stuff right now, he's keeping his head down
17:29:38 <colindixon> #info even though it's not a blocker, the docker work is coming to a close and we risk losing more by moving to other things than by getting it completed to a good stopping point
17:30:09 <colindixon> #info LuisGomez says he's hoping to get a draft for the cluster test plan by the end of the week, but certainly he's planning to work on it as his dominant task next week
17:30:46 <colindixon> #info LuisGomez says he'll reach out to moiz, TomP and other clustering developers to ask what scripts exist or need to be written to help that
17:31:16 <LuisGomez> #topic cluster robot keywords
17:32:34 <colindixon> #info PhilShea is working mostly on things other than ODL these days, but he developed some cluster robot keywords
17:32:56 <colindixon> #info PhilShea asks if we could create a simple cluster doctor out of the curses cluster monitor tool
17:33:17 <colindixon> #info jamoluhrsen says that's a great idea, colindixon concurs
17:33:48 <odp-gerritbot> Radovan Sajben proposed a change to integration/test: Introduce support for Withdrawn Routes field in BGP UPDATE message https://git.opendaylight.org/gerrit/27191
17:33:53 <colindixon> #info PhilShea notes that it's written in python which means that there are people here who can work on it more easily
17:36:54 <colindixon> #info PhilShea starts presenting robot keywords (also see webex recording)
17:37:34 <colindixon> #info there's some tools around isolating a node from a cluster, located in ClusterKeywords.robot in test/csit/libraries
17:38:15 <colindixon> #info this eventually comes down to using iptables rules to block traffic to/from that controller and checking by reading back the iptables
17:39:25 <colindixon> #info it also has the ability to do partial reachability failures by preventing a pair of controllers from talking
17:39:48 <mgkwill> anyone have the link to this meeting?
17:40:39 <colindixon> https://wiki.opendaylight.org/view/Meetings#Cluster_Test
17:40:43 <colindixon> mgkwill: ^^^^^^
17:40:54 <colindixon> #info LuisGomez asks if we have a test which tears down a controller and then brings it up so that we can test things other than iptables issues
17:41:10 <mgkwill> thanks colindixon
17:41:30 <colindixon> #info shaleen and PhilShea note that we already have keywords to bring controllers down and up, they might need slight (but very slight) modifications
17:41:43 <jamoluhrsen> LuisGomez, PhilShea, this suite does the stopping/starting of controllers: https://github.com/opendaylight/integration-test/blob/master/csit/suites/controller/Clustering_Datastore/140_recovery_restart_follower.robot
17:42:06 <colindixon> #info there are other keywords that flush iptables and help clean up
17:42:27 <colindixon> #info PhilShea points out that another test case would be partitions (which is more interesting with 4+ controllers)
17:42:52 <colindixon> #link https://github.com/opendaylight/integration-test/blob/master/csit/suites/controller/Clustering_Datastore/140_recovery_restart_follower.robot test suite which already tests starting/stopping of controllers
17:42:52 <LuisGomez> jamoluhrsen, i think these keywords are all in python
17:45:00 <colindixon> #info colindixon points out in the long-run we're going to need to test even aggressive things like cutting off power to a server
17:47:51 <colindixon> #info colindixon points out that this is a real thing that will happen with OpenDaylight in production, even if it's hard to test
17:48:26 <colindixon> #info PhilShea says he thinks this is a good downstream test, at least for now, colindixon agrees, it's a more advanced use case and not really amenable to our test infrastructure
17:50:42 <colindixon> #info LuisGomez asks how can we tell when a controller is actually up? jamoluhrsen says that PhilShea wrote a keyword wait for cluster sync that's related to the 140 series
17:51:23 <colindixon> #link https://github.com/opendaylight/integration-test/blob/master/csit/libraries/ClusterKeywords.robot this is the clusterkeywords robot file
17:52:10 <colindixon> #info PhilShea says he's never tried the wait for cluster sync (which might help in some of this) in the sandbox or ODL CI
17:53:32 <colindixon> #info PhilShea says that we should first try to start using the new wait for cluster sync in the 140 sync since it's likely to help a lot there
17:54:09 <colindixon> #info PhilShea says somebody should take on the 140 test, there's a method that stopped working
17:57:59 <colindixon> #info colindixon asks if LuisGomez was asking about knowing when the cluster component was up or the controller in general
17:58:18 <colindixon> #info LuisGomez says both, but just the clustering for now, PhilShea says the "get controller sync status" keyword should do that
17:58:49 <colindixon> #info colindixon adds that the general case of knowing if the controller (in its entirety) is up, is really hard
17:59:17 * colindixon has to leave
18:01:57 <LuisGomez> #endmeeting
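
The node-isolation approach discussed at 17:38 and 17:39 (iptables rules that block traffic to and from a controller, or between one pair of controllers) can be sketched roughly as below. This is only an illustration under stated assumptions, not the contents of ClusterKeywords.robot: the function names and the example IP addresses are hypothetical, and the sketch only builds the iptables argument lists that would be run on the affected node; it does not execute them or read the rules back.

```python
# Hypothetical sketch: build iptables commands for isolating a controller.
# Function names and addresses are illustrative assumptions, not taken from
# ClusterKeywords.robot. Nothing is executed here; the lists would be passed
# to something like subprocess.run() on the node being isolated.


def isolation_rules(peer_ips, action="-I"):
    """Return iptables argument lists that drop traffic from/to each peer.

    action="-I" inserts the DROP rules; action="-D" deletes the same rules.
    """
    rules = []
    for ip in peer_ips:
        # Drop packets arriving from the peer.
        rules.append(["iptables", action, "INPUT", "-s", ip, "-j", "DROP"])
        # Drop packets sent to the peer.
        rules.append(["iptables", action, "OUTPUT", "-d", ip, "-j", "DROP"])
    return rules


def pair_block_rules(other_ip):
    """Partial reachability failure: block only one peer, leave the rest."""
    return isolation_rules([other_ip])


def rejoin_rules(peer_ips):
    """Undo isolation_rules() by deleting the matching DROP rules."""
    return isolation_rules(peer_ips, action="-D")


if __name__ == "__main__":
    # Example: isolate the local controller from two assumed peers.
    for cmd in isolation_rules(["10.0.0.2", "10.0.0.3"]):
        print(" ".join(cmd))
```

Full isolation passes every other cluster member to `isolation_rules`, while the partial-reachability case from 17:39 passes a single peer, so the node stays reachable from the remaining controllers.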