15:00:03 #startmeeting OPNFV BGS weekly team meeting
15:00:03 Meeting started Mon May 18 15:00:03 2015 UTC. The chair is frankbrockners. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:03 Useful Commands: #action #agreed #help #info #idea #link #topic.
15:00:03 The meeting name has been set to 'opnfv_bgs_weekly_team_meeting'
15:00:15 <[1]JonasB> #info Jonas Bjurel
15:00:39 #info draft agenda for today's meeting: https://wiki.opnfv.org/meetings/bgs#may182015
15:00:46 #info Frank Brockners
15:01:03 #info Tim Rozet
15:01:32 #info Morgan Richomme
15:02:06 several of us are likely in Vancouver... - so let's see how many show up
15:03:01 #info Peter Bandzi
15:03:36 Let's get rolling...
15:04:10 We have the usual agenda: Updates on functest, docs, automatic deployments on PODs 1 and 2
15:04:18 #topic functest
15:04:19 anything else we should add to the agenda?
15:04:36 #topic Updates on Functest
15:04:57 ok Morgan... you're ahead of me... go ahead... :-)
15:05:03 #info since Friday the functest part has been failing in the build/deploy/test chain
15:05:29 #info problem is probably due to a misconfiguration of the rc file (retrieved manually from OpenStack, including tenant id)
15:05:51 #info should be fixed (tenant name is enough for the tests) but needs to be re-run
15:06:05 #info today the POD2 deploy failed, so functest was not launched
15:06:22 #info tests done on POD1
15:06:26 #info vPing OK
15:07:06 #info ODL still not running
15:07:37 #info Tempest ~ 20 errors, analysis started => several errors due to the test and/or config (depends how you consider the issue). Multiple networks => test expects only one network; when several are available => test fails
15:07:38 pbandzi: Do you mean the ODL tests or ODL in general? If so on which POD?
15:08:09 #info ODL tests, sorry, trying to analyze it and execute manually but still no success
15:08:49 morgan_orange: Are you planning to modify the tests so that they can run with multiple networks present?
15:08:54 <[1]JonasB> morgan_orange: Do we have tempest result consistency across the PODs?
15:09:33 we could modify our script to be sure that only one network is available but it would probably trigger errors in other tests
15:09:44 best way would be to complete the existing test
15:09:59 #link https://wiki.opnfv.org/r1_tempest
15:10:47 but we can first try with one single network - at the end of the Fuel installation there are already networks / at the end of Foreman there are none, so I created 2 to be in line with Fuel...
15:11:03 <[1]JonasB> morgan_orange: Sorry for my ignorance but what does KO mean?
15:11:19 KO = NOK
15:11:23 failed
15:11:32 <[1]JonasB> Ahh
15:11:47 I will change to NOK
15:11:54 <[1]JonasB> KO is the inverse of OK :-)
15:12:18 another interesting error is the error linked to quotas
15:12:43 it seems that the Neutron default limit is 10 networks
15:13:07 and this value would be exceeded, hence errors (parallel testing + cleanup is not immediate)
15:13:44 additional errors on POD2 seem linked to a network issue (the patch could solve the problem)
15:14:14 morgan_orange: which patch?
15:14:32 foreman was working on a network patch last week
15:14:41 for external network
15:14:53 ah ok - is this in gerrit already?
15:14:57 that could explain the delta between POD1 and POD2
15:15:03 no, still fixing/testing
15:15:14 otherwise errors are almost the same on both PODs
15:15:25 ok - let me just quickly info the above discussion...
15:16:47 #info test errors in 3 main categories: (1) multiple networks present (some tests expect single network); (2) quota exceeded (Neutron defaults are smaller than what tests create); (3) external connectivity.
15:17:09 #info category (3): resolution pending a patch - trozet is working on it
15:17:47 #info once category (3) is resolved, likely that test results on POD1 (Fuel) and POD2 (Foreman) will be similar (if not even the same).
15:17:52 does this capture it?
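Category (1) from the summary above (tests that expect a single network and fail when several exist) can be illustrated with a short Python sketch. This is an illustration only, not Tempest code: the helper name pick_network, the dict layout, and the network names are all hypothetical.

```python
from typing import Dict, List, Optional


def pick_network(networks: List[Dict[str, str]],
                 preferred_name: Optional[str] = None) -> Dict[str, str]:
    """Return the network a test should use.

    Hypothetical helper: instead of assuming exactly one network exists,
    it accepts an explicit name and only falls back to "the only network"
    when there really is just one.
    """
    if not networks:
        raise LookupError("no network available")
    if preferred_name is not None:
        # Explicit choice: works no matter how many networks are present.
        matches = [n for n in networks if n["name"] == preferred_name]
        if len(matches) != 1:
            raise LookupError(
                f"expected one network named {preferred_name!r}, "
                f"found {len(matches)}"
            )
        return matches[0]
    if len(networks) == 1:
        # Single-network deployments keep working without extra config.
        return networks[0]
    # Several networks and no hint: fail with a clear message.
    raise LookupError("multiple networks found; a name must be given")


# Example with made-up network names.
available = [{"name": "net-a"}, {"name": "net-b"}]
chosen = pick_network(available, preferred_name="net-b")
```

A test written this way keeps passing on a deployment that creates several networks, as long as the name of the network to use is supplied.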
15:18:17 yes, though with no certainty about the impact of the patch
15:18:54 #info CI/Functest if BGS Fuel OK possible to duplicate config POD2
15:18:56 how do we plan to cope with the other two categories?
15:19:24 I am pretty sure the remaining errors are not reproducible..
15:19:25 (2) sounds like you need to change Neutron config
15:20:10 on the web (2) was mentioned, increasing the default neutron value can fix it but is it clean..
15:20:33 morgan_orange: "not reproducible" sounds like a concern - can you detail things a bit?
15:20:52 IMHO we should adjust the "defaults" to what makes sense for OPNFV
15:21:10 if I run the suite several times in a row, I am not sure to always get the error..
15:21:35 I can check/ask if it's clean or not? It could be one thing to improve along the way in Neutron.
15:21:40 * frankbrockners not nice ...
15:22:16 amaged: agreed: OPNFV might have its own set of defaults
15:22:17 we will see with CI on a fresh install if it is reproducible (or not)
15:22:30 thanks morgan_orange
15:22:36 same for the rally bench
15:23:00 <[1]JonasB> morgan_orange: Can we release POD1 for nightly CI runs?
15:23:13 yes I think so
15:23:13 that leaves things to category (1) - multiple networks. IMHO we should rather tweak the tests (so that they can cope with multiple networks) than the system config.
15:23:27 #info rally bench http://opnfv.orange.org/test/POD1/opnfv-neutron.html
15:23:58 as an example http://opnfv.orange.org/test/POD1/opnfv-neutron.html#/NeutronNetworks.create_and_delete_routers on POD1
15:24:11 21% failed (Rally plays the scenario 100 times)
15:24:33 almost all the errors here occur during the first attempts
15:24:35 [1]JonasB, morgan_orange: Just to confirm: Did we say that we switch to auto-deploy on POD1 from today onwards?
15:24:45 ok for me
15:25:08 <[1]JonasB> frankbrockners: That was the plan, but we can change the plan
15:25:27 <[1]JonasB> Jose?
15:25:43 [1]JonasB: I don't see anyone objecting. So let's proceed with the plan.
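The Rally observation above (21% of 100 iterations failed, mostly in the first attempts) suggests checking whether failures cluster at the start of a run. The Python sketch below shows one way to do that. It does not use Rally's API or output format: the function failure_summary and the input list are hypothetical, and the sample data is synthetic, not the POD1 results.

```python
from typing import Dict, List, Union


def failure_summary(results: List[bool],
                    head: int = 10) -> Dict[str, Union[int, float]]:
    """Summarise per-iteration outcomes (True = success, False = failure).

    `head` is the number of leading iterations treated as "first attempts".
    """
    total = len(results)
    failed_idx = [i for i, ok in enumerate(results) if not ok]
    failed = len(failed_idx)
    # Count how many of the failures fall inside the first `head` iterations.
    in_head = sum(1 for i in failed_idx if i < head)
    return {
        "total": total,
        "failed": failed,
        "failure_rate": failed / total if total else 0.0,
        "failed_in_first_n": in_head,
    }


# Synthetic run: 100 iterations, the first 21 fail, the rest succeed.
synthetic_run = [False] * 21 + [True] * 79
summary = failure_summary(synthetic_run, head=25)
print(summary)
```

If nearly all failures land in the first iterations, that would be consistent with the idea that the system is not fully up when the run starts, which a fresh CI deployment can confirm or rule out.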
15:26:16 <[1]JonasB> frankbrockners: Can you catch Fatih in VC and tell him?
15:26:29 #info Team agrees to move to auto-deployment on POD1 from today onwards per the original plan (as agreed last week)
15:26:55 [1]JonasB: I'm not in Vancouver... - is anyone here in Vancouver and could tell Fatih?
15:27:06 <[1]JonasB> Ahh.
15:27:23 <[1]JonasB> I'll text him
15:27:51 trozet: did you have a look at the reason why the POD2 deploy failed last night?
15:27:57 back on the failed tests: morgan_orange: Could the initial set of failures be due to the fact that the system isn't fully up yet?
15:28:37 morgan_orange: I'll go all at once
15:29:14 ok - let's finish up with functest first
15:29:16 not clear, some tests seem to fail at the beginning (so it could be that the setup is not finished) but that is not the case for all the tests
15:31:00 [1]JonasB: Just confirmed with fdegir - he'll switch to autodeploy for POD1
15:31:01 we could ask the Rally community for support (after Vancouver)
15:31:16 <[1]JonasB> frankbrockners: Thx
15:31:51 morgan_orange: Key question is: Are these failures in the "critical path" - i.e. would they block us from releasing Arno?
15:32:20 I think not
15:32:31 critical path = automatic build/deploy/test
15:32:38 we assume that some tests will fail
15:32:52 yup - as long as we can explain why they fail...
15:33:18 we could already give some explanation (Tempest) and for Rally we can say that we will work on it, given that CI, stability and robustness are a key challenge
15:33:56 but we can accept not being fully green everywhere for Arno
15:34:19 for ODL there is also still some work
15:34:51 <[1]JonasB> frankbrockners: I think we should write down what ARNO SR0 is and what it is not to set expectations right/guide us when we can release
15:35:10 ok - message would be something like "deployment works and can be validated, though system stability isn't given at all times"
15:35:53 are we done with functest? Would like to move on...
15:36:03 <[1]JonasB> frankbrockners: Exactly
15:37:01 ... let's switch topics to POD1 and POD2 autodeployment updates.
15:37:06 frankbrockners: last point, before closure: I asked Ian Wells about increasing the network quota and he said: There aren't, but it's better done as a config item in neutron.conf than by changing the code.
15:37:16 #topic Updates on POD1 and POD2 auto deployment
15:37:57 amaged: Agreed. And I believe morgan_orange was suggesting a change to the config
15:38:22 [1]JonasB, trozet - who'd like to go first?
15:38:33 i can go
15:38:47 trozet: please go ahead...
15:39:22 #info Looked into the LFPOD2 failures from last week. There was an issue with how we were checking to see if a node had finished its puppet configuration
15:39:48 #info I modified that check to be a more reliable check, and 10/10 deployments passed over the weekend
15:40:27 #info Also, it cuts down on the time it takes to deploy. Previous deploys were clocking at 2.5 hours which wasn't right, it was just the check taking forever. Now the deploys are around 1.5 hours
15:41:15 <[1]JonasB> trozet: Better than Fuel, right now constantly 120 mins
15:41:41 #info Failure from last deploy (this morning) was a real failure. The filesystem failed to mount correctly on the UCS and the install just bailed. Just powercycled the server now and it is installing fine so I believe it was a transient error
15:42:42 #info Just looked at our neutron.conf and we aren't setting any of the limits in there so I guess it's using defaults
15:43:42 #info Was testing out the external network patch some more this weekend and found 2 bugs I need to fix. Has to do with puppet ordering...so still working on that patch
15:44:09 that's it from my end
15:44:29 <[1]JonasB> #info POD1/Fuel.
We're working on 1 patch for the console access/console logging (Files are currently missing in the .iso)
15:45:13 <[1]JonasB> #info We will not have time to resolve the ODL/HA proxy issue for Arno SR0, will come in SR1 - work has started
15:45:32 <[1]JonasB> #info Documentation will be finished this week
15:45:42 [1]JonasB - what is SR0 and SR1?
15:46:18 <[1]JonasB> frankbrockners, presumably we will have service releases.
15:47:01 <[1]JonasB> frankbrockners: I call this first base release SR0, and the first service release SR1 (just like I believe ODL is doing)
15:47:06 [1]JonasB: Yup. I know. Just wanted to inspire that you say this in the notes - so that a reader isn't wondering about SRx.
15:47:40 <[1]JonasB> #info SR0=Arno Base release, SR1=incremental service release
15:47:51 thanks [1]JonasB
15:48:26 <[1]JonasB> #info Question, how do we do with the user manual, should we have one just pointing into OpenStack or?
15:49:09 <[1]JonasB> Other than this question - it is all from my side
15:49:58 IMHO it would be good enough to point to OpenStack etc. for now - and just document the install/deploy specifics for Arno.
15:50:24 <[1]JonasB> frankbrockners: Thx, and agree!
15:50:46 <[1]JonasB> frankbrockners: Thx, and agree!
15:50:47 From what I've seen, even in the install guide there is a simple recipe for "how to run a simple VNF". That should be sufficient for now
15:50:48 <[1]JonasB> *-jmhhhhhhhhhhhhhhhhhhhhhhhhh
15:50:49 <[1]JonasB> *_lp
15:51:00 Is anyone using POD2 or can I use it to try out my external network patch?
15:51:15 you can use it
15:51:29 morgan_orange: thanks
15:51:30 [1]JonasB: On the ODL/HA proxy issue: How does the issue present itself to a user?
15:52:07 <[1]JonasB> Sorry, cleaning my wireless keyboard in the kitchen, didn't think it would reach that far :-)
15:53:01 <[1]JonasB> frankbrockners: It miserably fails altogether with a HA deployment - very noticeable :-(
15:54:19 does this mean "no HA at all" or "no HA for ODL" (the latter is the case anyway given that the Helium ML2 plugin does not support clustering)
15:55:36 [1]JonasB: does this mean "no HA at all" or "no HA for ODL" (the latter is the case anyway given that the Helium ML2 plugin does not support clustering)
15:55:38 <[1]JonasB> frankbrockners: Combination of HA and ODL will not be supported (Jira ticket) until SR0, we have made a mistake :-(
15:56:10 ok - that is something we can (and should) document
15:56:36 <[1]JonasB> frankbrockners: Absolutely, both in install instructions, release notes and a Jira ticket
15:57:12 #info Additional info on ODL HA proxy issue mentioned above: Combination of HA and ODL will not be supported initially when using Fuel as an installer. Plan to support it with the first service release (SR1). Work has already started
15:57:27 ok - looks like we're done with the updates.
15:57:40 two additional points...
15:57:52 <[1]JonasB> frankbrockners: What is our verdict to TSC tomorrow?
15:58:19 one ask: The usual one: Could you update https://wiki.opnfv.org/releases/arno/releasereadiness to get ready for TSC tomorrow?
15:58:31 <[1]JonasB> I will
15:58:40 yup
15:59:05 and one question: (Jonas already asked it): TSC will ask whether we feel ready to commit to a target release date.
15:59:15 Do we feel ready?
15:59:48 frankbrockners: I thought there was already a release date for May 29?
15:59:56 or no?
16:00:23 <[1]JonasB> Depends on: If the target is to have a community, WoW, toolchain up, etc as a baseline for R2 - then yes.
16:00:24 May/28 was a very implicit target so far
16:00:47 it is still in May and a Thursday.
(Fridays don't work for the marketing folks)
16:00:54 oh
16:00:57 <[1]JonasB> If the target is a well-tested, battle-proven stack - then no
16:01:23 I'll be on PTO all next week. Radez will be holding up my end.
16:01:27 we won't get to a "well-tested, battle-proven stack" even if we gave ourselves another 4 weeks...
16:02:07 <[1]JonasB> frankbrockners: I agree, but let's get this in writing so we set the expectations accordingly
16:02:38 do we need more time to debug the failures from the smoke testing on each pod?
16:02:45 at least make sure those pass?
16:03:02 or should we just release it as is
16:03:32 let's recap: what are we missing? (a) successful auto-deploys of Fuel (b) resolution and/or root-cause explanation of several test cases (c) ISO for Foreman/Quickstack (d) docs completed
16:04:08 I think Foreman is also missing: external network patch/verification, static packages
16:04:10 trozet: IMHO we can release, as soon as we know why things fail - we can continue to fix things for a SR1, per what Jonas is saying
16:04:18 k
16:04:21 trozet: Good point
16:04:32 I'm hoping to fix those 2 things this week before I go on PTO
16:04:40 ODL test suite
16:05:01 is also needed
16:05:42 so updated list: (a) successful auto-deploys of Fuel (b) resolution and/or root-cause explanation of several test cases (c) ISO for Foreman/Quickstack (d) docs completed (e) patches for autodeploys (Foreman: external network connectivity) (f) ODL test suite running and automated
16:06:11 <[1]JonasB> morgan_orange: Is ODL testsuite really needed, the ODL usecases we shall support are tested through Tempest/Neutron testcases or?
16:07:16 pbandzi: any thoughts? IMHO some smoke tests on ODL should be there as well. It is one of the two key components.
16:07:32 I think it is better to have it (another tool, Robot, + another upstream suite), though it is maybe not mandatory
16:07:50 <[1]JonasB> Ok
16:08:12 I am not very familiar with tempest, so it is possible that we are duplicating tests, but robot at least tests ODL by itself
16:08:27 we removed the vIMS testcase (which was the NFV flavor); if we remove ODL, it will look like an OpenStack installer..
16:08:37 in later releases we can support all robot tests for ODL testing
16:08:49 agree morgan_orange: let's keep the robot tests
16:09:08 or at least a reasonable subset to do basic smoke testing
16:09:48 and having Robot and Rally offers good tooling to support future OPNFV scenarios
16:09:50 so back on the question: Considering the open items (a)...(f). What is the level of confidence that we can hit a May/28 release date?
16:10:18 Let's do a roundtable - where everyone throws in their percentage and then we decide...
16:10:32 <[1]JonasB> for a), there should not be an issue
16:10:37 frankbrockners: I would say 75%
16:10:41 [1]JonasB, morgan_orange, trozet, pbandzi, ...?
16:11:18 frankbrockners: 75%
16:12:02 <[1]JonasB> 75% with ODL testing, probably more like 85% without?
16:12:22 frankbrockners: I can say for odl ~ 75%
16:13:01 <[1]JonasB> pbandzi: Is that for ODL/Robot in isolation?
16:13:18 so sounds like everyone has the same view... what we'll be asked for sure is: Would the level of confidence be significantly higher if we'd go to Jun/4th (one additional week)?
16:13:36 [1]JonasB: I think not all but a few tests for ODL we can deliver
16:14:29 frankbrockners: I would be much more optimistic about that week since I would actually be back from PTO + more time
16:15:11 so how about we say: May/28: 75% confidence, June/4: 90% confidence?
16:15:33 yeah
16:15:33 that gives the TSC something that they can decide on - and whether they want to take the risk or not.
16:16:01 others? [1]JonasB, morgan_orange?
16:16:08 I think that we can tell TSC that if it's June 4th we can stick to that and not slip
16:16:26 <[1]JonasB> Agree
16:16:43 +1
16:17:18 <[1]JonasB> trozet: Just curious - what is PTO?
16:17:25 paid time off :)
16:17:31 <[1]JonasB> ;-)
16:17:33 it's a week at the beach with my family :)
16:17:48 <[1]JonasB> trozet: You deserve it
16:17:54 be careful you could become european...
16:18:04 :-)
16:18:05 haha!
16:18:27 thanks [1]JonasB...would rather take it after the release, but it has been planned for a long time
16:19:38 ok... - sounds like we have a way to approach the date discussion in the TSC tomorrow.
16:20:03 Many thanks
16:20:06 <[1]JonasB> frankbrockners - see you there, need to jump off now
16:20:09 BTW I will not be on PTO tomorrow and Wednesday, but I must attend a training (planned for 6 months..) so I may not be connected all the time but Jose will be on the bridge
16:20:21 let's just info this briefly
16:20:27 thanks morgan_orange
16:20:53 morgan_orange: I may drop you an email to fire off your tests once I get the external patch done
16:20:57 #info key open items for BGS to reach release readiness: (a) successful auto-deploys of Fuel (b) resolution and/or root-cause explanation of several test cases (c) ISO for Foreman/Quickstack (d) docs completed (e) patches for autodeploys (Foreman: external network connectivity) (f) ODL test suite running and automated
16:21:20 trozet: OK
16:21:36 morgan_orange: but I can investigate the failures
16:21:45 wiki is up to date
16:21:58 guyrodrigue will investigate on his side
16:22:19 (orange colleague)
16:23:33 #info potential release date: High-level of confidence in the team to achieve a release date of June/4th (90% confidence) - and having (a)..(f) addressed. Confidence level drops significantly when May/28 would be the target for Arno (several folks out for either OpenStack, training, or vacation between now and May/28).
16:23:43 is the above ok for everyone?
16:24:13 ok for me
16:24:52 I think [1]JonasB had to run. trozet: ok for you?
16:25:04 pbandzi: ok for you?
16:25:06 ok
16:26:28 trozet:?
16:26:30 hi
16:26:48 that sounds perfect to me
16:26:56 is the above statement on the release date that I info'ed ok for you?
16:27:02 perfect - thanks
16:27:11 finally done for today
16:27:15 sorry for running over
16:27:23 many thanks to everyone!
16:27:23 np
16:27:26 <[1]JonasB> Ok for me
16:27:34 thanks Jonas
16:27:43 #endmeeting