#opendaylight-integration: cluster test
Meeting started by LuisGomez at 17:03:17 UTC
(full logs).
Meeting summary
- review of action points (LuisGomez, 17:03:34)
- action point 1) car-people test: (LuisGomez,
17:03:55)
- there was an action to open a task in
trello (LuisGomez,
17:05:35)
- that is done (LuisGomez,
17:05:41)
- nobody has jumped on that yet (LuisGomez,
17:06:02)
- Jamo says he will be helping with docker as
well as the templates for cluster test (LuisGomez,
17:07:29)
- action point 2) generic test plan for cluster
test (LuisGomez,
17:09:32)
- this one we have not started yet but LuisGomez
will try to get this rolling by next week at the latest (LuisGomez,
17:10:33)
- the test plan could be a robot suite with test
cases and description (LuisGomez,
17:11:47)
- action point 3) cluster deploy scripts
(LuisGomez,
17:13:13)
- send a mail to cluster devs to check whether
they have something to deploy a cluster (LuisGomez,
17:13:53)
- cluster devs do not have anything in mind
today (LuisGomez,
17:16:15)
- the easiest option seems to be scripts similar to
ad-sal (LuisGomez,
17:17:15)
- the best option is to join the cluster dev call and
discuss this with them (LuisGomez,
17:19:49)
- we recommend joining the cluster dev call going
forward (LuisGomez,
17:20:11)
- https://bugs.opendaylight.org/show_bug.cgi?id=2187
(colindixon,
17:24:30)
- action point 4) docker for cluster (LuisGomez,
17:27:57)
- colindixon agrees with Luis's point that
cluster configuration should be easier and not require a reboot to
apply, but there are reasons why it's hard (colindixon,
17:27:58)
- jamoluhrsen is working on docker stuff right
now, he's keeping his head down (colindixon,
17:28:51)
- even though it's not a blocker, the docker work
is coming to a close and we risk losing more by moving to other
things than by getting it completed to a good stopping point
(colindixon,
17:29:38)
- LuisGomez says he's hoping to get a draft of the
cluster test plan by the end of the week, but certainly he's
planning to work on it as his dominant task next week (colindixon,
17:30:09)
- LuisGomez says he'll reach out to moiz, TomP
and other clustering developers to ask what scripts exist or need
to be written to help that (colindixon,
17:30:46)
- cluster robot keywords (LuisGomez, 17:31:16)
- PhilShea is working mostly on things other than
ODL these days, but he developed some cluster robot keywords
(colindixon,
17:32:34)
- PhilShea asks if we could create a simple
cluster doctor out of the curses cluster monitor tool (colindixon,
17:32:56)
- jamoluhrsen says that's a great idea,
colindixon concurs (colindixon,
17:33:17)
- PhilShea notes that it's written in python
which means that there are people here who can work on it more
easily (colindixon,
17:33:53)
- PhilShea starts presenting robot keywords (also
see webex recording) (colindixon,
17:36:54)
- there are some tools around isolating a node from
a cluster, located in ClusterKeywords.robot in
test/csit/libraries (colindixon,
17:37:34)
- this eventually comes down to using iptables
rules to block traffic to/from that controller and checking by
reading back the iptables (colindixon,
17:38:15)
- it also has the ability to do partial
reachability failures by preventing a pair of controllers from
talking (colindixon,
17:39:25)
- https://wiki.opendaylight.org/view/Meetings#Cluster_Test
(colindixon,
17:40:39)
- LuisGomez asks if we have a test which tears
down a controller and then brings it up so that we can test things
other than iptables issues (colindixon,
17:40:54)
- shaleen and PhilShea note that we already have
keywords to bring controllers down and up, they might need slight
(but very slight) modifications (colindixon,
17:41:30)
- there are other keywords that flush iptables
and help clean up (colindixon,
17:42:06)
- PhilShea points out that another test case
would be partitions (which is more interesting with 4+
controllers) (colindixon,
17:42:27)
- https://github.com/opendaylight/integration-test/blob/master/csit/suites/controller/Clustering_Datastore/140_recovery_restart_follower.robot
test suite which already tests starting/stopping of controllers (colindixon,
17:42:52)
- colindixon points out that in the long run we're
going to need to test even aggressive things like cutting off power
to a server (colindixon,
17:45:00)
- colindixon points out that this is a real thing
that will happen with OpenDaylight in production, even if it's hard
to test (colindixon,
17:47:51)
- PhilShea says he likes this as a good
downstream test, at least for now, colindixon agrees, it's a more
advanced use case and not really amenable to our test
infrastructure (colindixon,
17:48:26)
- LuisGomez asks how can we tell when a
controller is actually up? jamoluhrsen says that PhilShea wrote a
keyword wait for cluster sync that's related to the 140
series (colindixon,
17:50:42)
- https://github.com/opendaylight/integration-test/blob/master/csit/libraries/ClusterKeywords.robot
this is the clusterkeywords robot file (colindixon,
17:51:23)
- PhilShea says he's never tried the wait for
cluster sync (which might help in some of this) in the sandbox or
ODL CI (colindixon,
17:52:10)
- PhilShea says that we should first try to start
using the new wait for cluster sync in the 140 series since it's
likely to help a lot there (colindixon,
17:53:32)
- PhilShea says somebody should take on the 140
test, there's a method that stopped working (colindixon,
17:54:09)
- colindixon asks if LuisGomez was asking about
knowing when the cluster component was up or the controller in
general (colindixon,
17:57:59)
- LuisGomez says both, but just the clustering
for now, PhilShea says the "get controller sync status" keyword
should do that (colindixon,
17:58:18)
- colindixon adds that the general case of
knowing if the controller (in its entirety) is up is really
hard (colindixon,
17:58:49)
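Illustrative sketch (not from the meeting): the iptables-based isolation discussed above could look roughly like the Python below. It is not the code in ClusterKeywords.robot. The function names isolation_rules and is_isolated are hypothetical, and the read-back check assumes rule listings in the style printed by "iptables -S". The sketch only builds and parses rule text; it does not run iptables.

```python
def isolation_rules(isolated_ip, peer_ips):
    """Build iptables commands, to be run on the isolated controller,
    that drop all traffic between it and each peer (hypothetical helper)."""
    rules = []
    for peer in peer_ips:
        if peer == isolated_ip:
            continue
        # Drop packets arriving from the peer and packets leaving toward it.
        rules.append(["iptables", "-I", "INPUT", "-s", peer, "-j", "DROP"])
        rules.append(["iptables", "-I", "OUTPUT", "-d", peer, "-j", "DROP"])
    return rules


def is_isolated(listing, isolated_ip, peer_ips):
    """Read back a rule listing (assumed 'iptables -S' style) and check that
    every peer has both an INPUT and an OUTPUT DROP rule."""
    blocked_in, blocked_out = set(), set()
    for line in listing.splitlines():
        tokens = line.split()
        if len(tokens) < 6 or tokens[0] != "-A" or tokens[-2:] != ["-j", "DROP"]:
            continue
        chain, flag = tokens[1], tokens[2]
        address = tokens[3].split("/")[0]  # strip a trailing /32 mask if present
        if chain == "INPUT" and flag == "-s":
            blocked_in.add(address)
        elif chain == "OUTPUT" and flag == "-d":
            blocked_out.add(address)
    peers = {p for p in peer_ips if p != isolated_ip}
    return peers <= blocked_in and peers <= blocked_out
```

Passing a single peer in peer_ips gives the partial-reachability case, where only one pair of controllers is prevented from talking.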
Meeting ended at 18:01:57 UTC
(full logs).
Action items
- (none)
People present (lines said)
- colindixon (36)
- LuisGomez (20)
- odl_meetbot (3)
- mgkwill (2)
- odp-gerritbot (2)
- dfarrell07 (1)
- jamoluhrsen (1)
Generated by MeetBot 0.1.4.