#opendaylight-integration log

17:03:17 <LuisGomez> #startmeeting cluster test
17:03:17 <odl_meetbot> Meeting started Wed Sep 30 17:03:17 2015 UTC.  The chair is LuisGomez. Information about MeetBot at http://ci.openstack.org/meetbot.html.
17:03:17 <odl_meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:03:17 <odl_meetbot> The meeting name has been set to 'cluster_test'
17:03:34 <LuisGomez> #topic review of action points
17:03:55 <LuisGomez> #info action point 1) car-people test:
17:05:35 <LuisGomez> #info there was an action to open a task in trello
17:05:41 <LuisGomez> #info that is done
17:06:02 <LuisGomez> #info nobody has jumped on that yet
17:07:29 <LuisGomez> #info Jamo says he will be helping with docker as well as the templates for cluster test
17:09:32 <LuisGomez> #info action point 2) generic test plan for cluster test
17:10:33 <LuisGomez> #info this one we have not started yet but LuisGomez will try to get this rolling next week at the latest
17:11:47 <LuisGomez> #info the test plan could be a robot suite with test cases and description
17:13:13 <LuisGomez> #info action point 3) cluster deploy scripts
17:13:53 <LuisGomez> #info send a mail to cluster devs to check whether they have something to deploy cluster
17:16:15 <LuisGomez> #info cluster devs do not have anything in mind today
17:17:15 <LuisGomez> #info the easiest seems to be scripts similar to ad-sal
17:19:49 <LuisGomez> #info the best is to join the cluster dev call and discuss this with them
17:20:11 <LuisGomez> #info we recommend joining the cluster dev call going forward
17:24:30 <colindixon> https://bugs.opendaylight.org/show_bug.cgi?id=2187
17:24:36 <dfarrell07> colindixon: welcome :)
17:26:52 <colindixon> added to my autojoin list
17:26:56 <colindixon> no idea how it wasn't
17:27:21 <colindixon> I sit in 13 ODL channels now
17:27:57 <odp-gerritbot> Vratko Polák proposed a change to integration/test: BGP testplan and performance suite against 10 fastbgp peers  https://git.opendaylight.org/gerrit/27365
17:27:57 <LuisGomez> #info action point 4) docker for cluster
17:27:58 <colindixon> #info colindixon agrees with Luis's point that cluster configuration should be easier and not require a reboot to apply, but there are reasons why it's hard
17:28:51 <colindixon> #info jamoluhrsen is working on docker stuff right now, he's keeping his head down
17:29:38 <colindixon> #info even though it's not a blocker, the docker work is coming to a close and we risk losing more by moving to other things than by getting it completed to a good stopping point
17:30:09 <colindixon> #info LuisGomez says he's hoping to get a draft for the cluster test plan by the end of the week, but certainly he's planning to work on it as his dominant task next week
17:30:46 <colindixon> #info LuisGomez says he'll reach out to moiz, TomP and other clustering developers to ask what scripts exist or need to be written to help that
17:31:16 <LuisGomez> #topic cluster robot keywords
17:32:34 <colindixon> #info PhilShea is working mostly on things other than ODL these days, but he developed some cluster robot keywords
17:32:56 <colindixon> #info PhilShea asks if we could create a simple cluster doctor out of the curses cluster monitor tool
17:33:17 <colindixon> #info jamoluhrsen says that's a great idea, colindixon concurs
17:33:48 <odp-gerritbot> Radovan Sajben proposed a change to integration/test: Introduce support for Withdrawn Routes field in BGP UPDATE message  https://git.opendaylight.org/gerrit/27191
17:33:53 <colindixon> #info PhilShea notes that it's written in python which means that there are people here who can work on it more easily
17:36:54 <colindixon> #info PhilShea starts presenting robot keywords (also see webex recording)
17:37:34 <colindixon> #info there's some tools around isolating a node from a cluster, located in ClusterKeywords.robot in test/csit/libraries
17:38:15 <colindixon> #info this eventually comes down to using iptables rules to block traffic to/from that controller and checking by reading back the iptables
17:39:25 <colindixon> #info it also has the ability to do partial reachability failures by preventing a pair of controllers from talking
17:39:48 <mgkwill> anyone have the link to this meeting?
17:40:39 <colindixon> https://wiki.opendaylight.org/view/Meetings#Cluster_Test
17:40:43 <colindixon> mgkwill: ^^^^^^
17:40:54 <colindixon> #info LuisGomez asks if we have a test which tears down a controller and then brings it up so that we can test things other than iptables issues
17:41:10 <mgkwill> thanks colindixon
17:41:30 <colindixon> #info shaleen and PhilShea note that we already have keywords to bring controllers down and up, they might need slight (but very slight) modifications
17:41:43 <jamoluhrsen> LuisGomez, PhilShea, this suite does the stopping/starting of controllers:  https://github.com/opendaylight/integration-test/blob/master/csit/suites/controller/Clustering_Datastore/140_recovery_restart_follower.robot
17:42:06 <colindixon> #info there are other keywords that flush iptables and help clean up
17:42:27 <colindixon> #info PhilShea points out that another test case would be partitions (which is more interesting with 4+ controllers)
17:42:52 <colindixon> #link https://github.com/opendaylight/integration-test/blob/master/csit/suites/controller/Clustering_Datastore/140_recovery_restart_follower.robot test suite which already tests starting/stopping of controllers
17:42:52 <LuisGomez> jamoluhrsen, i think these keywords are all in python
17:45:00 <colindixon> #info colindixon points out in the long run we're going to need to test even aggressive things like cutting off power to a server
17:47:51 <colindixon> #info colindixon points out that this is a real thing that will happen with OpenDaylight in production, even if it's hard to test
17:48:26 <colindixon> #info PhilShea says he thinks this is a good downstream test, at least for now, colindixon agrees, it's a more advanced use case and not really amenable to our test infrastructure
17:50:42 <colindixon> #info LuisGomez asks how can we tell when a controller is actually up? jamoluhrsen says that PhilShea wrote a keyword wait for cluster sync that's related to the 140 series
17:51:23 <colindixon> #link https://github.com/opendaylight/integration-test/blob/master/csit/libraries/ClusterKeywords.robot this is the clusterkeywords robot file
17:52:10 <colindixon> #info PhilShea says he's never tried the wait for cluster sync (which might help in some of this) in the sandbox or ODL CI
17:53:32 <colindixon> #info PhilShea says that we should first try to start using the new wait for cluster sync in the 140 sync since it's likely to help a lot there
17:54:09 <colindixon> #info PhilShea says somebody should take on the 140 test, there's a method that stopped working
17:57:59 <colindixon> #info colindixon asks if LuisGomez was asking about knowing when the cluster component was up or the controller in general
17:58:18 <colindixon> #info LuisGomez says both, but just the clustering for now, PhilShea says the "get controller sync status" keyword should do that
17:58:49 <colindixon> #info colindixon adds that the general case of knowing if the controller (in its entirety) is up is really hard
17:59:17 * colindixon has to leave
18:01:57 <LuisGomez> #endmeeting