15:00:03 <frankbrockners> #startmeeting OPNFV BGS weekly team meeting
15:00:03 <collabot> Meeting started Mon May 18 15:00:03 2015 UTC.  The chair is frankbrockners. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:03 <collabot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:00:03 <collabot> The meeting name has been set to 'opnfv_bgs_weekly_team_meeting'
15:00:15 <[1]JonasB> #info Jonas Bjurel
15:00:39 <frankbrockners> #info draft agenda for today's meeting: https://wiki.opnfv.org/meetings/bgs#may182015
15:00:46 <frankbrockners> #info Frank Brockners
15:01:03 <trozet> #info Tim Rozet
15:01:32 <morgan_orange> #info Morgan Richomme
15:02:06 <frankbrockners> several of us are likely in Vancouver... - so let's see how many show up
15:03:01 <pbandzi> #info Peter Bandzi
15:03:36 <frankbrockners> Let's get rolling...
15:04:10 <frankbrockners> We have the usual agenda: Updates on functest, docs, automatic deployments on PODs 1 and 2
15:04:18 <morgan_orange> #topic functest
15:04:19 <frankbrockners> anything else we should add to the agenda?
15:04:36 <frankbrockners> #topic Updates on Functest
15:04:57 <frankbrockners> ok Morgan... you're ahead of me... go ahead... :-)
15:05:03 <morgan_orange> #info since Friday the functest part has been failing in the build/deploy/test chain
15:05:29 <morgan_orange> #info problem is likely due to a misconfiguration of the rc file (retrieved manually from OpenStack, including tenant ID)
15:05:51 <morgan_orange> #info should be fixed (tenant name is enough for the tests) but needs to be re-run
15:06:05 <morgan_orange> #info today the POD2 deploy failed, so functest was not launched
15:06:22 <morgan_orange> #info tests done on POD1
15:06:26 <morgan_orange> #info vPing OK
15:07:06 <pbandzi> #info ODL still not running
15:07:37 <morgan_orange> #info Tempest ~ 20 errors, analysis started => several errors due to the test and/or config (depends on how you consider the issue). Multiple networks => test expects only one network; when several are available => test fails
15:07:38 <frankbrockners> pbandzi: Do you mean the ODL tests or ODL in general? If so on which POD?
15:08:09 <pbandzi> #info ODL tests, sorry, trying to analyze it and execute manually but still no success
15:08:49 <frankbrockners> morgan_orange: Are you planning to modify the tests so that they can run with multiple networks present?
15:08:54 <[1]JonasB> morgan_orange: Do we have Tempest result consistency across the PODs?
15:09:33 <morgan_orange> we could modify our script to be sure that only one network is available but it would probably trigger errors in other tests
15:09:44 <morgan_orange> best way would be to complete existing test
15:09:59 <morgan_orange> #link https://wiki.opnfv.org/r1_tempest
15:10:47 <morgan_orange> but we can first try with one single network - at the end of fuel installation there are already networks / at the end of foreman none so created 2 to be in line with fuel...
15:11:03 <[1]JonasB> morgan_orange: Sorry for my ignorance but what does KO mean?
15:11:19 <morgan_orange> KO = NOK
15:11:23 <morgan_orange> failed
15:11:32 <[1]JonasB> Ahh
15:11:47 <morgan_orange> I will change to NOK
15:11:54 <[1]JonasB> KO is the inverse of OK :-)
15:12:18 <morgan_orange> another interesting error is the error linked to quotas
15:12:43 <morgan_orange> it seems that Neutron default limit is 10 networks
15:13:07 <morgan_orange> and this value would be exceeded, hence the errors (parallel testing + cleanup not immediate)
15:13:44 <morgan_orange> additional errors on POD2 seem linked to a network issue (the patch could solve the problem)
15:14:14 <frankbrockners> morgan_orange: which patch?
15:14:32 <morgan_orange> foreman was working on a network patch last week
15:14:41 <trozet> for external network
15:14:53 <frankbrockners> ah ok - is this in gerrit already?
15:14:57 <morgan_orange> that could explain the delta between POD1 and POD2
15:15:03 <trozet> no still fixing/testing
15:15:14 <morgan_orange> otherwise errors are almost the same on both PODs
15:15:25 <frankbrockners> ok - let me just quickly info the above discussion...
15:16:47 <frankbrockners> #info test errors in 3 main categories: (1) multiple networks present (some tests expect single network); (2) quota exceeded (Neutron defaults are smaller than what tests create); (3) external connectivity.
15:17:09 <frankbrockners> #info category (3): resolution pending a patch - trozet is working on it
15:17:47 <frankbrockners> #info once category (3) is resolved, likely that test results on POD1 (Fuel) and POD2 (Foreman) will be similar (if not even the same).
15:17:52 <frankbrockners> does this capture it?
15:18:17 <morgan_orange> yes, though with no certainty about the patch's impact
15:18:54 <morgan_orange> #info CI/Functest: if BGS Fuel is OK, it is possible to duplicate the config on POD2
15:18:56 <frankbrockners> how do we plan to cope with the other two categories?
15:19:24 <morgan_orange> I am pretty sure the remaining errors are not reproducible..
15:19:25 <frankbrockners> (2) sounds like you need to change Neutron config
15:20:10 <morgan_orange> (2) is mentioned on the web: increasing the default Neutron value can fix it, but is it clean..
15:20:33 <frankbrockners> morgan_orange: "not reproducible" sounds like a concern - can you detail things a bit?
15:20:52 <frankbrockners> IMHO we should adjust the "defaults" to what makes sense for OPNFV
15:21:10 <morgan_orange> if I run the suite several times in a row, I am not sure to always get the error..
15:21:35 <amaged> I can check/ask if it's clean or not? It could be one thing to improve along the way in Neutron.
15:21:40 * frankbrockners not nice ...
15:22:16 <frankbrockners> amaged: agreed: OPNFV might have its own set of defaults
15:22:17 <morgan_orange> we will see with CI on fresh install if it is reproducible (or not)
15:22:30 <frankbrockners> thanks morgan_orange
15:22:36 <morgan_orange> idem for rally bench
15:23:00 <[1]JonasB> morgan_orange: Can we release POD1 for nightly CI runs?
15:23:13 <morgan_orange> yes I think so
15:23:13 <frankbrockners> that leaves things to category (1) - multiple networks. IMHO we should rather tweak the tests (so that they can cope with multiple networks) than the system config.
15:23:27 <morgan_orange> #info rally bench http://opnfv.orange.org/test/POD1/opnfv-neutron.html
15:23:58 <morgan_orange> as an example http://opnfv.orange.org/test/POD1/opnfv-neutron.html#/NeutronNetworks.create_and_delete_routers on POD1
15:24:11 <morgan_orange> 21% failed (Rally plays the scenario 100 times)
15:24:33 <morgan_orange> almost all the errors here occur during the first attempts
15:24:35 <frankbrockners> [1]JonasB, morgan_orange: Just to confirm: Did we say that we switch to auto-deploy on POD1 from today onwards?
15:24:45 <morgan_orange> ok for me
15:25:08 <[1]JonasB> frankbrockners: That was the plan, but we can change the plan
15:25:27 <[1]JonasB> Jose?
15:25:43 <frankbrockners> [1]JonasB: I don't see anyone objecting. So let's proceed with the plan.
15:26:16 <[1]JonasB> frankbrockners: Can you catch Fatih in VC and tell him?
15:26:29 <frankbrockners> #info Team agrees to move to auto-deployment on POD1 from today onwards per the original plan (as agreed last week)
15:26:55 <frankbrockners> [1]JonasB: I'm not in Vancouver... - is anyone here in Vancouver and could tell Fatih?
15:27:06 <[1]JonasB> Ahh.
15:27:23 <[1]JonasB> I'll text him
15:27:51 <morgan_orange> trozet: did you have a look at the reason why the POD2 deploy failed last night?
15:27:57 <frankbrockners> back on the failed tests: morgan_orange: Could the initial set of failures be due to the fact that the system isn't fully up yet?
15:28:37 <trozet> morgan_orange: I'll go all at once
15:29:14 <frankbrockners> ok - let's finish up with functest first
15:29:16 <morgan_orange> not clear, some tests seem to fail at the beginning (so it could be that the setup is not finished) but that is not the case for all the tests
15:31:00 <frankbrockners> [1]JonasB: Just confirmed with fdegir - he'll switch to autodeploy for POD1
15:31:01 <morgan_orange> we could ask Rally community for support (after Vancouver)
15:31:16 <[1]JonasB> frankbrockners: Thx
15:31:51 <frankbrockners> morgan_orange: Key question is: Are these failures in the "critical path" - i.e. would they block us from releasing Arno?
15:32:20 <morgan_orange> I think not
15:32:31 <morgan_orange> critical path = automatic build/deploy/test
15:32:38 <morgan_orange> we assume that some tests will fail
15:32:52 <frankbrockners> yup - as long as we can explain why they fail...
15:33:18 <morgan_orange> we could already give some explanation (Tempest), and for Rally we can say that we will work on it, given that CI, stability and robustness are a key challenge
15:33:56 <morgan_orange> but we can accept not being fully green everywhere for Arno
15:34:19 <morgan_orange> for ODL there is also still some work
15:34:51 <[1]JonasB> frankbrockners: I think we should write down what ARNO SR0 is and what it is not to set expectations right/guide us when we can release
15:35:10 <frankbrockners> ok - message would be something like "deployment works and can be validated, though system stability isn't given at all times"
15:35:53 <frankbrockners> are we done with functest? Would like to move on...
15:36:03 <[1]JonasB> frankbrockners: Exactly
15:37:01 <frankbrockners> ... let's switch topics to POD1 and POD2 autodeployment updates.
15:37:06 <amaged> frankbrockners: last point, before closure : I asked Ian Wells about increasing networks Quota and he said : There aren't, but it's better done as a config item in neutron.conf than by changing the code.
15:37:16 <frankbrockners> #topic Updates on POD1 and POD2 auto deployment
15:37:57 <frankbrockners> amaged: Agreed. And I believe morgan_orange was suggesting a change to the config
15:38:22 <frankbrockners> [1]JonasB, trozet - who'd like to go first?
15:38:33 <trozet> i can go
15:38:47 <frankbrockners> trozet: please go ahead...
15:39:22 <trozet> #info Looked into the LFPOD2 failures from last week.  There was an issue with how we were checking to see if a node had finished its puppet configuration
15:39:48 <trozet> #info I modified that check to be a more reliable check, and 10/10 deployments passed over the weekend
15:40:27 <trozet> #info Also, it cuts down on the time it takes to deploy.  Previous deploys were clocking at 2.5 hours which wasn't right, it was just the check taking forever.  Now the deploys are around 1.5 hours
15:41:15 <[1]JonasB> trozet: Better than Fuel, right now constantly 120 mins
15:41:41 <trozet> #info Failure from last deploy (this morning) was a real failure.  The filesystem failed to mount correctly on the UCS and the install just bailed.  Just powercycled the server now and it is installing fine so I believe it was a transient error
15:42:42 <trozet> #info Just looked at our neutron.conf and we aren't setting any of the limits in there so I guess it's using defaults
15:43:42 <trozet> #info  Was testing out the external network patch some more this weekend and found 2 bugs I need to fix.  Has to do with puppet ordering...so still working on that patch
15:44:09 <trozet> that's it from my end
15:44:29 <[1]JonasB> #info POD1/Fuel. We're working on 1 patch for the console access/console logging (Files are currently missing in the .iso)
15:45:13 <[1]JonasB> #info We will not have time to resolve the ODL/HA proxy issue for Arno SR0, will come in SR1 - work has started
15:45:32 <[1]JonasB> #info Documentation will be finished this week
15:45:42 <frankbrockners> [1]JonasB - what is SR0 and SR1?
15:46:18 <[1]JonasB> frankbrockners, presumably we will have service releases.
15:47:01 <[1]JonasB> frankbrockners: I call this first base release SR0, and the first service release SR1 (just like I believe ODL is doing)
15:47:06 <frankbrockners> [1]JonasB: Yup. I know. Just wanted to inspire that you say this in the notes - so that a reader isn't wondering about SRx.
15:47:40 <[1]JonasB> #info SR0=Arno base release, SR1=incremental service release
15:47:51 <frankbrockners> thanks [1]JonasB
15:48:26 <[1]JonasB> #info Question: what do we do about the user manual, should we have one that just points to OpenStack, or?
15:49:09 <[1]JonasB> Other than this question - it is all from my side
15:49:58 <frankbrockners> IMHO it would be good enough to point to OpenStack etc. for now - and just document the install/deploy specifics for Arno.
15:50:24 <[1]JonasB> frankbrockners: Thx, and agree!
15:50:47 <frankbrockners> From what I've seen, even in the install guide there is a simple recipe for "how to run a simple VNF". that should be sufficient for now
15:51:00 <trozet> Is anyone using POD2 or can I use it to try out my external network patch?
15:51:15 <morgan_orange> you can use it
15:51:29 <trozet> morgan_orange: thanks
15:51:30 <frankbrockners> [1]JonasB: On the ODL/HA proxy issue: How does the issue present itself to a user?
15:52:07 <[1]JonasB> Sorry, was cleaning my wireless keyboard in the kitchen, didn't think it would reach that far :-)
15:53:01 <[1]JonasB> frankbrockners: It miserably fails altogether with an HA deployment - very noticeable :-(
15:55:36 <frankbrockners> [1]JonasB: does this mean "no HA at all" or "no HA for ODL" (the latter is the case anyway given that the Helium ML2 plugin does not support clustering)
15:55:38 <[1]JonasB> frankbrockners: Combination of HA and ODL will not be supported (Jira ticket) until SR1, we have made a mistake :-(
15:56:10 <frankbrockners> ok - that is something we can (and should) document
15:56:36 <[1]JonasB> frankbrockners: Absolutely, both in install instructions, release notes and a Jira ticket
15:57:12 <frankbrockners> #info Additional info on ODL HA proxy issue mentioned above: Combination of HA and ODL will not be supported initially when using Fuel as an installer. Plan to support it with the first service release (SR1). Work has already started
15:57:27 <frankbrockners> ok - looks like we're done with the updates.
15:57:40 <frankbrockners> two additional points...
15:57:52 <[1]JonasB> frankbrockners: What is our verdict to TSC tomorrow?
15:58:19 <frankbrockners> one ask: The usual one: Could you update https://wiki.opnfv.org/releases/arno/releasereadiness to get ready for TSC tomorrow?
15:58:31 <[1]JonasB> I will
15:58:40 <trozet> yup
15:59:05 <frankbrockners> and one question: (Jonas already asked it): TSC will ask whether we feel ready to commit to a target release date.
15:59:15 <frankbrockners> Do we feel ready?
15:59:48 <trozet> frankbrockners: I thought there was already a release date for May 29?
15:59:56 <trozet> or no?
16:00:23 <[1]JonasB> Depends on: If the target is to have a community, WoW, toolchain up, etc as a baseline for R2 - then yes.
16:00:24 <frankbrockners> May/28 was a very implicit target so far
16:00:47 <frankbrockners> it is still in May and a Thursday. (Fridays don't work for the marketing folks)
16:00:54 <trozet> oh
16:00:57 <[1]JonasB> If the target is a well-tested, battle-proven stack - then no
16:01:23 <trozet> I'll be PTO all next week.  Radez will be holding up my end.
16:01:27 <frankbrockners> we won't get to a "well-tested, battle-proven stack" even if we gave ourselves another 4 weeks...
16:02:07 <[1]JonasB> frankbrockners: I agree, but let's get this in writing so we set the expectations accordingly
16:02:38 <trozet> do we need more time to debug the failures from the smoke testing on each pod?
16:02:45 <trozet> at least make sure those pass?
16:03:02 <trozet> or should we just release it as is
16:03:32 <frankbrockners> let's recap: what are we missing? (a) successful auto-deploys of Fuel (b) resolution and/or root-cause explanation of several test cases (c) ISO for Foreman/Quickstack (d) docs completed
16:04:08 <trozet> I think Foreman is also missing: external network patch/verification, static packages
16:04:10 <frankbrockners> trozet: IMHO we can release, as soon as we know why things fail - we can continue to fix things for a SR1, per what Jonas is saying
16:04:18 <trozet> k
16:04:21 <frankbrockners> trozet: Good point
16:04:32 <trozet> I'm hoping to fix those 2 things this week before I go on PTO
16:04:40 <morgan_orange> ODL test suite
16:05:01 <morgan_orange> is also needed
16:05:42 <frankbrockners> so updated list: (a) successful auto-deploys of Fuel (b) resolution and/or root-cause explanation of several test cases (c) ISO for Foreman/Quickstack (d) docs completed (e) patches for autodeploys (Foreman: external network connectivity) (f) ODL test suite running and automated
16:06:11 <[1]JonasB> morgan_orange: Is ODL testsuite really needed, the ODL usecases we shall support are tested through Tempest/Neutron testcases or?
16:07:16 <frankbrockners> pbandzi: any thoughts? IMHO some smoke tests on ODL should be there as well. It is one of the two key components.
16:07:32 <morgan_orange> I think it is better to have it (another tool robot + another upstream suite) it is maybe not mandatory
16:07:50 <[1]JonasB> Ok
16:08:12 <pbandzi> I am not very familiar with Tempest, so it is possible that we are duplicating tests, but Robot at least tests ODL by itself
16:08:27 <morgan_orange> we removed the vIMS testcase (which was the NFV flavor); if we remove ODL, it will look like an OpenStack installer..
16:08:37 <pbandzi> in later releases we can support all robot tests for ODL testing
16:08:49 <frankbrockners> agree morgan_orange: let's keep the robot tests
16:09:08 <frankbrockners> or at least a reasonable subset to do basic smoke testing
16:09:48 <morgan_orange> and having Robot and Rally offers good tooling to support future OPNFV scenarios
16:09:50 <frankbrockners> so back on the question: Considering the open items (a)...(f). What is the level of confidence that we can hit a May/28 release date?
16:10:18 <frankbrockners> Let's do a roundtable - where everyone throws in their percentage and then we decide...
16:10:32 <[1]JonasB> for a), there should not be an issue
16:10:37 <trozet> frankbrockners: I would say 75%
16:10:41 <frankbrockners> [1]JonasB, morgan_orange, trozet, pbandzi, ...?
16:11:18 <morgan_orange> frankbrockners: 75%
16:12:02 <[1]JonasB> 75% with ODL testing, probably more like 85% without?
16:12:22 <pbandzi> frankbrockners: I can say for odl ~ 75%
16:13:01 <[1]JonasB> pbandzi: Is that for ODL/Robot in isolation?
16:13:18 <frankbrockners> so sounds like everyone has the same view... what we'll surely be asked is: Would the level of confidence be significantly higher if we'd go to Jun/4th (one additional week)?
16:13:36 <pbandzi> [1]JonasB: I think not all but few tests for ODL we can deliver
16:14:29 <trozet> frankbrockners: I would be much more optimistic about that week since I would actually be back from PTO + more time
16:15:11 <frankbrockners> so how about we say: May/28: 75% confidence, June/4: 90% confidence?
16:15:33 <trozet> yeah
16:15:33 <frankbrockners> that gives the TSC something that they can decide on - and whether they want to take the risk or not.
16:16:01 <frankbrockners> others? [1]JonasB, morgan_orange?
16:16:08 <trozet> I think that we can tell TSC if it's June 4th we can stick to that and not slip
16:16:26 <[1]JonasB> Agree
16:16:43 <morgan_orange> +1
16:17:18 <[1]JonasB> trozet: Just curious - what is PTO?
16:17:25 <trozet> paid time off :)
16:17:31 <[1]JonasB> ;-)
16:17:33 <trozet> it's a week at the beach with my family :)
16:17:48 <[1]JonasB> trozet: You deserve it
16:17:54 <morgan_orange> be careful you could become european...
16:18:04 <frankbrockners> :-)
16:18:05 <trozet> haha!
16:18:27 <trozet> thanks [1]JonasB...would rather take it after the release, but it has been planned for a long time
16:19:38 <frankbrockners> ok... - sounds like we have a way to approach the date discussion in the TSC tomorrow.
16:20:03 <frankbrockners> Many thanks
16:20:06 <[1]JonasB> frankbrockners - see you there, need to jump off now
16:20:09 <morgan_orange> BTW I will not be on PTO tomorrow and Wednesday, but I must attend a training (planned 6 months ago..) so I may not be connected all the time, but Jose will be on the bridge
16:20:21 <frankbrockners> let's just info this briefly
16:20:27 <frankbrockners> thanks morgan_orange
16:20:53 <trozet> morgan_orange: I may drop you an email to fire off your tests once I get the external patch done
16:20:57 <frankbrockners> #info key open items for BGS to reach release readiness: (a) successful auto-deploys of Fuel (b) resolution and/or root-cause explanation of several test cases (c) ISO for Foreman/Quickstack (d) docs completed (e) patches for autodeploys (Foreman: external network connectivity) (f) ODL test suite running and automated
16:21:20 <morgan_orange> trozet: OK
16:21:36 <trozet> morgan_orange: but I can investigate the failures
16:21:45 <morgan_orange> wiki is up to date
16:21:58 <morgan_orange> guyrodrigue will investigate on his side
16:22:19 <morgan_orange> (orange colleague)
16:23:33 <frankbrockners> #info potential release date: High-level of confidence in the team to achieve a release date of June/4th (90% confidence) - and having (a)..(f) addressed. Confidence level drops significantly when May/28 would be the target for Arno (several folks out for either OpenStack, training, or vacation between now and May/28).
16:23:43 <frankbrockners> is the above ok for everyone?
16:24:13 <morgan_orange> ok for me
16:24:52 <frankbrockners> I think [1]JonasB had to run. trozet: ok for you?
16:25:04 <frankbrockners> pbandzi: ok for you?
16:25:06 <pbandzi> ok
16:26:28 <frankbrockners> trozet:?
16:26:30 <trozet> hi
16:26:48 <trozet> that sounds perfect to me
16:26:56 <frankbrockners> is the above statement on the release date that I info'ed ok for you?
16:27:02 <frankbrockners> perfect - thanks
16:27:11 <frankbrockners> finally done for today
16:27:15 <frankbrockners> sorry for running over
16:27:23 <frankbrockners> many thanks to everyone!
16:27:23 <trozet> np
16:27:26 <[1]JonasB> Ok for me
16:27:34 <frankbrockners> thanks Jonas
16:27:43 <frankbrockners> #endmeeting