15:00:01 #startmeeting BGS/Genesis weekly and Arno SR1 synch meeting
15:00:01 Meeting started Mon Sep 28 15:00:01 2015 UTC. The chair is frankbrockners. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:01 Useful Commands: #action #agreed #help #info #idea #link #topic.
15:00:01 The meeting name has been set to 'bgs_genesis_weekly_and_arno_sr1_synch_meeting'
15:00:07 #info Frank Brockners
15:00:11 <[1]JonasB> #info Jonas Bjurel
15:00:19 #info Morgan Richomme
15:00:21 #link draft agenda is posted here: https://wiki.opnfv.org/meetings/genesis
15:00:30 #info Tim Rozet
15:00:47 #info DanielSmith
15:00:52 #info Stefan Berg
15:00:55 #info Dan Radez
15:01:17 #info Narinder Gupta
15:01:35 #info Jose Lausuch
15:01:35 looks like we already have quorum for the SR1 synch
15:01:42 #info David Blaisonneau
15:01:47 #topic Arno SR1 synch and readiness
15:01:48 #info Randy Levensalor
15:02:08 #info chenshuai
15:02:14 #info Fatih Degirmenci
15:02:34 let's check whether everyone is "ready for release" and what the open items are (Jonas reported a few minor things)
15:02:44 #topic Fuel status
15:02:51 #info Ashlee Young
15:02:54 [1]JonasB?
15:04:03 [1]JonasB - r u there?
15:04:18 <[1]JonasB> #info Fuel is in seemeling good shape, all functional merges done, 20 good virtual devel. ci-pipeline runs, and a couple of manual, noone failed. Functest showing good results.
15:04:51 <[1]JonasB> #info: We will make the final touch on the release notes and installation instructions tonight.
15:05:18 [1]JonasB: you mentioned via email that stable/arno was not even with master?
15:05:38 <[1]JonasB> #info Only known issue is the config issue in LF-Pod which Szilard, Tim or Jose could enighten us about - if it still exists.
15:05:44 <[1]JonasB> That is all
15:05:53 #info Bryan Sullivan
15:06:18 #info Peter Bandzi
15:06:26 <[1]JonasB> trozet I made a git diff master..stable/arno and saw that clean.sh differed?
15:06:56 [1]JonasB: ah ok
15:07:10 what was the delta?
15:07:32 or better: what is the delta?
15:07:40 <[1]JonasB> Let me see.
15:09:18 frankbrockners, [1]JonasB: 66b527d7a94ad5be261aec39fe692a135f77b657 Make 03_install_repo.sh executable and fix typo
15:09:31 err wait
15:10:05 oh i think its jose_lausuch patch
15:10:16 to remove kill kvm instances
15:10:16 <[1]JonasB> It seems to be some code to forcefully remove VMs if they still exists
15:10:20 yeah
15:10:46 https://gerrit.opnfv.org/gerrit/#/c/1961/
15:10:54 yes, it removes any qemu- process
15:11:06 [1]JonasB, frankbrockners: i can cherry pick it to stable/arno and merge
15:11:10 since we discovered that some vm leftovers were running
15:11:24 <[1]JonasB> cool
15:11:34 trozet: +1
15:12:06 now they are even
15:12:07 assuming that the script has been tried...
15:12:07 :)
15:12:11 <[1]JonasB> jose_lausuch: What about the LF-POD config issue, resolved or still there
15:12:15 thanks trozet
15:12:26 Jonas1: it seems its still there...
15:12:36 on my question: has the script been used before on master?
15:12:38 [1]JonasB: still there I think
15:12:42 frankbrockners: yeah
15:12:43 at least, this afternoon, after fuel was deployed, when Foreman triggered, it failed due to the setup
15:13:40 <[1]JonasB> trozet, jose_lausuch: Is it you two working on that?
15:14:04 [1]JonasB: its more important to get the test results we need rather than fix CI imo
15:14:20 <[1]JonasB> trozet: +1
15:14:27 yes
15:14:30 jose_lausuch - you say that foreman deploy failure was due to vm leftovers from fuel?
15:14:37 but after jose_lausuch feels like he has all his results I will go try to debug/fix it
15:14:40 and we are getting them fixed :) pbandzi can confirm
15:14:47 frankbrockners: no, sorry
15:15:00 ah ok
15:15:01 yep ODL suite runs now on both...
15:15:05 frankbrockners: foreman deploy failure was due to not proper setup of jumphost networking
15:15:12 pbandzi: how do the results look?
15:15:15 ok.. folks
15:15:22 let's switch topic to foreman
15:15:29 #topic Foreman/status
15:15:41 about config issue there is now workaround: puting down and up two interfaces restore their required config
15:16:07 but not sure about what was cause of that misconfiguration
15:16:22 let's info some of that
15:16:41 #info foreman config issue there is now workaround: puting down and up two interfaces restore their required config
15:16:47 #info foreman deploy failures due to not proper setup of jumphost networking (per jose_lausuch)
15:16:51 pbandzi: can you give the up/down specific command?
15:17:22 trozet: ifdown enp7s0; ifdown enp6s0; ifup enp7s0 ; ifup enp6s0
15:17:32 just reseting enp6 and enp7
15:17:55 issue was that enp7 ended up without IP and foreman deploy therefore failed
15:18:24 #info (per pbandzi): issue was that enp7 ended up without IP and foreman deploy therefore failed
15:18:48 #info (per jose_lausuch): quick fix: reset the interfaces: ifdown enp7s0; ifdown enp6s0; ifup enp7s0 ; ifup enp6s0
15:18:52 jose_lausuch, pbandzi: doing ifdown removes the .300 and .0 subinterfaces?
15:19:11 trozet: no those were removed with clean script
15:19:14 well, no
15:19:17 exactly
15:19:19 ifdown will not work if the if-cfg files arent there
15:19:31 lmcdasm: they are
15:19:34 lmcdasm: but they are there
15:19:38 you should be using the "ip link delete Xyz" instead of ifdown
15:19:43 cause ifdown will only turn it off
15:19:45 not remove the vlan nic
15:19:51 (i think we discussed this last week Jose).
15:19:53 ok so the proposed solution is to: delete the subinterfaces, then bounce the regular interfaces to gain their IPs back
15:19:59 yes, we did
15:20:09 #info ok so the proposed solution is to: delete the subinterfaces, then bounce the regular interfaces to gain their IPs back
15:20:15 but we did it this way because the scripts were existing
15:20:24 #action trozet to fix the clean with proposed solution
15:20:29 and didnt want to bother about what ip goes what
15:20:50 sound ok?
15:21:20 sounds good to me - question is when
15:21:29 lmcdasm: but what the clean script does is with ip link... thats correct :)
15:21:34 after functest is done with their results - its not release blocking
15:21:46 ok
15:21:50 thx for clarifying Jose
15:21:51 on the above discussion - should we first get all tests completed and then fix CI?
15:21:55 np
15:22:03 ok
15:22:07 we're on the same page.
15:22:10 frankbrockners: yeah this is an LF pod specific problem
15:22:15 nothing to do with release imo
15:22:23 +1
15:22:28 other than hurting our progress ;)
15:22:32 for functest results
15:22:47 #info I think foreman is ready for release barring the results from functest on ODL
15:22:50 #info we'll first get all functests completed - and then look into script changes to fix the issue on POD2
15:22:56 pbandzi can you please give latest results
15:22:57 for ODL
15:23:08 #info script fixes aren't required for Sr1
15:23:16 #info ODL tests runs on both fuel and foreman
15:23:51 pbandzi: can you give the pass/fail and comparison to Arno (or if already updated on the wiki, provide #link)?
15:24:04 I will update the wiki soon
15:24:12 ideally is should go here: https://wiki.opnfv.org/functest_release_1&#arno_sr0_vs_sr1
15:24:18 yes frankbrockners
15:24:20 thanks
15:24:31 #info ODL pass/fail status remains same as it was
15:24:37 <[1]JonasB> When will it be there, need to update release notes with it.
15:24:39 #action jose_lausuch to update https://wiki.opnfv.org/functest_release_1&#arno_sr0_vs_sr1
15:25:10 pbandzi: great. Thanks
15:25:18 what else is left for functest to do jose_lausuch?
15:25:26 nothing else
15:25:48 so from a functest perspective all you need is update the docs?
15:25:48 so frankbrockners, [1]JonasB: should both installers update the tests results in the docs for SR1 before release?
15:26:01 doc has been updates
15:26:14 <[1]JonasB> trozet: Why not be lazy and point to the link?
15:26:19 +1
15:26:24 +1 to lazy
15:26:26 i mean to link..
15:26:28 :)
15:26:33 -1 to point to the link
15:26:35 <[1]JonasB> +1
15:26:47 <[1]JonasB> No to voting :-)
15:26:51 :-)
15:27:13 if you decide to use the the link, please point to a versioned wiki - not latest
15:27:52 would that work?
15:28:03 jose_lausuch: can you update the wiki with the pass/fail for ODL then send out the version link?
15:28:06 mostly a question for jose_lausuch and morgan_orange?
15:28:18 yes
15:28:20 <[1]JonasB> frankbrockners: How do I do that, and when is the final content there?
15:28:30 I can do that
15:28:39 jose_lausuch would need to rell you the when things are final
15:28:54 on the wiki you can always go to "older revisions"
15:29:10 so you create a new revision without a real edit
15:29:24 and then pick the last revision
15:29:28 frankbrockners: ok I'll try
15:29:39 <[1]JonasB> Jose_lausuch: So you send out a mail with the link to all of us, when done?
15:29:50 jose_lausuch - thanks. let me know if you have issues.
15:29:50 ok
15:30:03 frankbrockners: sure, thanks
15:30:16 #action jose_lausuch to send the link with test results to the team once results are final
15:30:52 any other updates
15:30:53 ?
15:32:26 not from my side
15:32:34 <[1]JonasB> no
15:32:48 so from the above conversation I'd deduce that we'll release SR1 tomorrow. Does everyone agree - or does anyone see blocking issues?
15:32:59 <[1]JonasB> +1
15:33:09 if you see a blocking issue, please speak up now...
15:33:22 sounds good to me
15:33:32 ok
15:34:03 next question: When do we label the release tomorrow? On Friday we said "Tuesday afternoon"
15:34:28 I need to upload a patch still for releng
15:34:38 to call the odl test
15:34:44 I'll do it asap
15:34:48 <[1]JonasB> frankbrockners: Fuel will try to be ready by midnight tonight
15:35:05 <[1]JonasB> CET
15:35:06 thanks jose_lausuch and [1]JonasB
15:35:22 would be nice to label before the TSC call tomorrow
15:35:28 a question
15:35:42 pharos, functest, octopus, opnfvdocs, releng will have tags tomorrow
15:35:54 and genesis
15:35:55 ay reason why? other than to report at the TSC?
15:35:56 early morning - assuming no updates will come to these
15:36:07 genesis might be left to afternoon
15:36:12 i mean - just to understand what type of timeline you are setting Frank since it means late night work for some of us
15:36:20 given that we can build/deploy and produce docs from same commit
15:36:35 and if we are going against some arbitrary timeline so we can make a report, then i would avoid working against that.. let us finish as best we can and let you know
15:36:37 so
15:36:41 the question is
15:36:54 who expects to have commits left for tomorrow morning?
15:36:59 or anyone expects?
15:37:16 well if i understand so far, you dont have a CI test of ODL against the SR1 builds yet
15:37:17 fdegir - good question
15:37:17 including doc updates
15:37:25 so i think its a bit early to start taggin gthings until that works no?
15:37:32 im just making 1 more commit to the linked test results for foreman
15:37:51 <[1]JonasB> Fuel will try to avoid commits tomorrow.
15:37:51 lmcdasm: that's the reason I say that we need to do a final run based on same commit
15:38:02 lmcdasm: ODL tests passed according to pbandzi
15:38:07 im with you Fatih..
15:38:15 trozet: yes, but manually
15:38:19 thats it Tim
15:38:20 not from CI
15:38:23 we dont have a CI pipeline
15:38:26 why does that matter?
15:38:27 jus t..
15:38:36 well.. thats it.. does it?
15:38:39 it is not repeatable
15:38:49 its not needed to release
15:38:59 no one is running LFpod 2 ci pipeline other than us ;)
15:39:04 well. i thought that a req for release was a integrated CI pipeline
15:39:07 if it was me, I wouldn't release this
15:39:11 I'd like to push that patch anyway, its 10 min work as soon as we are finish :)
15:39:26 yeah jose_lausuch push the patch for functest fix
15:39:32 for CI clean fix it doesnt matter for release
15:39:37 ok - can we do the following:
15:39:39 it is not about CI
15:39:49 it is about repeatability/reproducibility
15:39:53 traceability
15:39:56 its not just CI, but the whole testing of ODL consistenly
15:39:58 many bilities
15:40:02 ya.. what Fatih said
15:40:07 ya, ok
15:40:22 you are right
15:40:39 agreed on the repeatability - if it works through CI it is proven to be repeatable
15:40:46 otherwise folks might challenge it
15:40:49 so how many times should we repeat ODL test before we can release?
15:40:53 CI is just our LF lab , people dont have to use our LF lab... so
15:41:18 if we can't even repeat our stuff in our lab
15:41:21 you mean deploy+test ?
15:41:24 how we can expect people to do same in their labs
15:41:27 we said some days ago that we need 2 days of stable run (deploy/test)
15:41:35 but anyway, I
15:41:39 m here to report
15:41:48 +1 morgan - seems like we have a patch coming in from Jose for testing
15:41:51 morgan_orange.. we're abit away from the possibility
15:42:01 and then that functest needs to be frozen and then tests ran
15:42:04 but could we get 3 successful runs?
15:42:09 if we have said its 2 days after code freeze then so be it
15:42:21 so 2 dailys? so if we fix clean today and functest, then 2 day delay assuming it all passes?
15:42:26 but functest is about CI, the network configuration on the POD is about CI
15:42:42 (again - wondering at this deadline we are settings and why a day makes a difference - other than optics to the TSC)? is one pass enough for a release? I dont think so - not after a last code change.
15:43:07 hmm. .seems to me like either we stick to our own guidelines.. or not
15:43:09 lmcdasm - perception matters, so let's try to not move the date
15:43:15 if we just say "3 times" what does that prove?
15:43:31 it proves that it was repeatable for 3 times
15:43:32 Frank - perception shoud NOT trump doing the job thoroughly.
15:43:32 i dont think it proves anything
15:43:35 we didnt change any code in ODL
15:43:40 SR3->SR4
15:43:44 its tested upstream by ODL
15:44:00 ok.. well.. my point is this.. if we have guidelines for release and we have said 2 days testing once frozen and labelled
15:44:15 then that is it.. if Frank needs to communicate that we ahve a release and its in "test stability phase" then so be it.
15:44:20 I agree - so the only challenge you could run into is that you misconfigured ODL
15:44:22 otherwise why bother with anything.. call a release now then
15:44:29 and you'd likely notice that in the first run...
15:44:30 +1
15:44:36 if we don't need to see if it works
15:44:44 then today and tomorrow don't different for me
15:45:08 my point Fatih - if we arent doing it to be thorough, then dont pay lip-service to it
15:45:15 say we are done cut it and we patch with problems as they are seen
15:45:20 folks... let's be pragmatic and see what would be feasible to run between now and tomorrow to increase the comfort level
15:45:25 and not say "we did 3 tests and are ok" which is a bit 'light i would say"
15:45:31 trozet promised to have the patch done quickly
15:45:44 then we can kick another deploy/test and see whether it works
15:45:52 and if it works we could try again
15:46:08 well Fuel already passed ODL multiple times right?
15:46:12 the concern is just foreman?
15:46:14 we dont know about ODL
15:46:15 trozet: yes
15:46:22 except for pbandzi manual test report
15:46:31 and we also just had another patch submitted an merge a couple hours ago
15:46:37 <[1]JonasB> trozet: We didnt perform tests
15:47:01 lmcdasm ODL tests are running automaticaly for fuel from last week
15:47:05 <[1]JonasB> Since the testpipeline is broken it the deve ci-pipeline
15:47:13 [1]JonasB: functest was never executed on the fuel deploys?
15:47:22 guys, if ODL matters, I see a bigger problem here, the deployment as such, which didnt work today switching from Fuel to Foreman
15:47:47 ok - my last couple comments.. we have just frozen the code (for FUEL) 1 hour ago - and agree no more changes.. so all test results, while good info, dont count until you hit the freeze button
15:47:52 so how about this - jose_lausuch push the functest patch, ill push a fix for clean. I'll run it 3 times today and then if its all good we release tmrw?
15:47:55 from there, you do your tests and start to get results to make in your release report
15:47:58 <[1]JonasB> frankbrockners: In LF-lab it was, not in the 20 runs we did with the devel. pipeline
15:48:21 i dont know how we can say - with code coming that "3" is going to be enough -what happenned to our 2 day guideline?
15:48:37 are we to throw this out cause we are now against a invisible time-wall?
15:48:49 what is the problem with freezing the code and waiting to see it sit and run for a while?
15:49:07 so jose_lausuch - what you say is that we'd need to do the deploy/test cycle for both fuel and foreman - correct?
15:49:09 whats the point in it running for a while when it wont make a difference?
15:49:21 trozet - then why do 3 times?4
15:49:28 i dont think it should be done any times
15:49:30 at least for foreman
15:49:33 im confident its good
15:49:39 frankbrockners: thats what I would do
15:49:40 at all? the point is whether we want to have a standard test method for our releases and do what we say we are going to do
15:49:47 or just cowboy it each time before release.
15:50:24 i just think that with the code still in motion - while we "think" things wont happen, why bother to rush? but again - as you said trozet, if it doesnt matter to the group and to the PTL, then fine
15:50:46 deploy/test takes <3 hrs? Correct?
15:50:52 lmcdasm: i dont think its as tight for us. I haven't changed any code in a week
15:50:57 and been testing it
15:51:01 I would say 4
15:51:21 if everything is alright
15:51:27 ok - even with 4hrs - we should be able to run fuel and foreman 3 times in 24hrs
15:51:38 which means ODL would also go through testing 6 times...
15:52:03 we can label the release late on Sep/29...
15:52:07 frankbrockners: in theory yes
15:52:30 but maths sometimes dont apply with that exactitude
15:52:31 So that i undersatnd. we are saying that 12 hours before you want to release, you are gonna do more code change and run it three times - per platform and that is what we accept as a test result for a release (so we know for next time) - and with the approach if something goes wrong in the next 4 horus, you have put us all awake all night Frank
15:53:17 last time ill say it - but i dont see why this pressure for a day is after lots of people have hammered this all weekend.
15:53:46 IMHO we have two options: (1) fix things and run deploy/test in the next 24hrs. if things go well - release (2) fix things and have things run for 2 days - and release by then. which moves the release out by 1+ days.
15:54:17 Could you state your preference?
15:55:04 i think mine is obvious - finish your development, freeze everything, let it run for the 2 days that Functest group has agreed
15:55:08 then review / fix and release
15:55:46 other thoughts? Please opt for 1 or 2....
15:55:57 1 is fine with me
15:56:12 <[1]JonasB> I think 2 is the correct way
15:56:22 I vote for 2, Daniel's thinking
15:56:52 morgan_orange, fdegir?
15:57:41 its not really a democracy Frank - make a call.. i dont "care" either way - i just think tha tyou havent had a stable CI pipline for a while, and you might wanna try some interations ... thats all.
15:57:53 nope lmcdasm
15:58:05 2 makes sense for me. I am confident that if 1) is OK the extra day may be useless but we did not see any clean alternance of deploy/run so far and if we decided to have 2 days of clean run, we shall respect this decision. Is there any special need to release tomorrow?
15:58:15 we need to deal with the situation either way
15:58:37 and if we have a clear majority for (2) then the argument is far easier with the TSC
15:59:13 <[1]JonasB> ping fdegir
15:59:35 I vote 2
16:00:09 sounds as if a majority would prefer to have 2 days of stable deploy/test before we release
16:00:28 and we should really really freeze this time
16:00:37 I'll position things with the TSC accordingly tomorrow
16:00:42 <[1]JonasB> fdegir +1
16:00:53 so then when do we freeze?
16:01:00 fdegir: if we freeze now, the CI will fail again
16:01:02 fdegir - agreed, but let's get the CI patches..
16:01:07 yes
16:01:18 so let's get the CI pipeline fixed
16:01:20 that was the question: anyone has anything more?
16:01:27 <[1]JonasB> The installer code should freeze NOW
16:01:30 if so, they go and we freeze
16:01:35 including docs etc
16:01:36 yeah fdegir: docs update and clean patch
16:02:01 <[1]JonasB> fuel will freeze docs tonight
16:02:06 fdegir - we still need to update the docs to point to the test results...
16:02:33 test results go to functest
16:02:35 so could functest / fuel / foreman send a "frozen" to the team once ready?
16:02:45 k
16:02:46 as well - the label of the ISO will need to be updated once the iso is cut with the final name ( i have used the format i have seen - hope that is correct).. as well, once Pharos has done their commits i can update the links to reference arch, etc
16:02:55 fine by me
16:03:26 <[1]JonasB> Should we wait with the test update in the release notes until just before the tagging?
16:03:58 #info team decided to stay with the rule to have 2 days of successful, non-interrupted deploy/release cycles prior to releaseing SR1
16:04:23 #info this decision means that SR1 will not come out on Sep/29 but a few days later
16:04:38 we should wait to tag until the 2 days of testing are done so we know its going to stay frozen
16:04:47 #info teams working on SR1 will notify each other with a "frozen" email
16:05:07 #info once all components for SR1 are frozen - we'll start counting the two days
16:05:41 <[1]JonasB> frankbrockners: releasenotes must have an exemption
16:06:01 [1]JonasB - tests will likely be complete once we're through a deploy/test cycle
16:06:33 [1]JonasB - fine to have an exception on release notes
16:06:45 (kind of obvious=
16:06:53 not to stir the pot anymore, but when we say 2 days, how many tests does that mean? 2 dailys? or..
16:07:30 what does CI currently do fdegir?
16:07:38 trozet: even more, what do we test? stable arno only right?
16:07:48 yes stable
16:07:52 frankbrockners: it does 2 deployments a day
16:07:52 frankbrockners: build/deploy/functest
16:08:05 1 for fuel master, 1 for foreman stable
16:08:08 run for 1 branch per installer a day
16:08:20 could we run things sequentially without delay - rather than only once a day?
16:08:21 jose_lausuch: stable/arno only I think
16:08:29 now we have to change it to fuel stable and foreman stable
16:08:37 otherwise we don't gain anything with 2 days...
16:09:10 fdegir: we could decrease the separation between deployments
16:09:59 we could also just submit a bunch of daily jobs manually that will consume 2 days
16:11:22 trozet: ok
16:11:28 just during the night ...
16:11:36 we can connect foreman fuel jobs to each other
16:11:38 well, we have differnt time zones, but stil....
16:11:46 fdegir +1
16:11:47 once foreman ends, it triggers fuel and vice versa
16:11:47 can we chain them?
16:11:53 ok
16:11:54 yes
16:11:56 thats even better
16:13:05 #info once everything is frozen, we'll "chain" deployments (i.e. build/deploy/test foreman - fuel - without any delay inbetween)
16:13:26 #info that should get us sufficient deploys to have a "comfort level" for the release
16:13:36 ok... looks like we have a plan now - finally
16:14:02 should we do a quick IRC synch tomorrow
16:14:04 ?
16:14:18 just to confirm that everything works fine?
16:14:21 yep
16:14:27 <[1]JonasB> I will be in my car, so I cant make it
16:14:36 how about 9am PDT? (after debra's call)
16:14:44 fine by me
16:14:58 ok
16:15:08 ok - let's do this... should be less than 5min (keeping fingers crossed)
16:15:27 #info quick status checkin tomorrow - 9am PDT (=4pm UTC).
16:15:27 like today :p
16:15:53 jose_lausuch - time depends on where you are...
16:16:08 anyway...
16:16:11 yes, and today was needed
16:16:17 looks like we're done for today.
16:16:29 Genesis will need to wait till next week - post SR1
16:16:36 agreed
16:16:37 ok with all Genesis folks here?
16:17:12 I take this as an ok from the larger team
16:17:34 ... thanks everyone
16:17:39 #endmeeting
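
Note on the clean-up fix proposed at 15:19:53 (delete the subinterfaces, then bounce the regular interfaces to regain their IPs): the following is a minimal sketch of that ordering, not the actual clean script. The function names `run` and `reset_jumphost_net` and the subinterface name `enp6s0.300` are hypothetical; only `enp7s0`, `enp6s0`, `ifdown`, `ifup` and `ip link delete` come from the log. By default the sketch only prints the commands it would run; setting RUN=1 is assumed to execute them and would need root on the jumphost.

```shell
#!/bin/sh
# Sketch of the ordering proposed in the meeting: remove VLAN subinterfaces
# with "ip link delete", then take the parent interfaces down and up again.
# Names other than enp7s0/enp6s0 are hypothetical placeholders.

# Print the command unless RUN=1 is set, in which case execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# $1: space-separated parent interfaces, $2: space-separated subinterfaces.
reset_jumphost_net() {
    parents="$1"
    subifs="$2"

    # ifdown only deactivates an interface; the VLAN device must be deleted.
    for sub in $subifs; do
        run ip link delete "$sub"
    done
    # Same order as the workaround quoted in the log: all down, then all up.
    for nic in $parents; do
        run ifdown "$nic"
    done
    for nic in $parents; do
        run ifup "$nic"
    done
}
```

Calling `reset_jumphost_net "enp7s0 enp6s0" "enp6s0.300"` without RUN set prints the `ip link delete` line first, followed by the two `ifdown` and two `ifup` commands from the 15:17:22 workaround, so the plan can be reviewed before anything is changed on the POD.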