15:00:47 <yamahata> #startmeeting neutron_northbound
15:00:47 <odl_meetbot> Meeting started Mon Jul 24 15:00:47 2017 UTC. The chair is yamahata. Information about MeetBot at http://ci.openstack.org/meetbot.html.
15:00:47 <odl_meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:47 <odl_meetbot> The meeting name has been set to 'neutron_northbound'
15:01:00 <yamahata> #chair mkolesni rajivk_
15:01:00 <odl_meetbot> Current chairs: mkolesni rajivk_ yamahata
15:01:06 <yamahata> #topic agenda bashing and roll call
15:01:10 <mkolesni> #info mkolesni
15:01:15 <yamahata> #info yamahata
15:01:22 <yamahata> #link https://wiki.opendaylight.org/view/NeutronNorthbound:Meetings
15:01:24 <rajivk_> #info rajivk
15:01:45 <yamahata> any topics in addition to breakage and usual patches/bugs?
15:02:23 <mkolesni> id like to discuss the ci
15:02:36 <mkolesni> not the u/t, the tempest
15:02:45 <yamahata> yeah, now tempest ci is not in good shape.
15:03:07 <mkolesni> well, we're making it better but its a slow process
15:03:31 <yamahata> anything else?
15:04:15 <yamahata> ok move on
15:04:15 <mkolesni> FF is thursday
15:04:23 <mkolesni> we need to merge all non-bugs by then
15:04:34 <mkolesni> yamahata, are you cutting the branch?
15:04:49 <mkolesni> or is it not automatic since we're not in the independent release model
15:04:56 <yamahata> neutron team will do it with one patch.
15:05:07 <yamahata> Pike-2 was done that way.
15:05:17 <yamahata> So we'll review such a patch
15:05:33 <mkolesni> ok
15:05:37 <yamahata> #topic Announcements
15:05:43 <yamahata> pike-3 is this week.
15:05:44 <mkolesni> afaik its the 27th
15:05:52 <mkolesni> so that leaves ~3 days
15:05:53 <yamahata> #info Feature freeze is thursday
15:06:12 <yamahata> any other announcement?
15:06:31 <mkolesni> do you know if you're going to ptg yet?
15:06:34 <mkolesni> or the summit?
15:06:39 <yamahata> Unfortunately not yet.
15:06:44 <mkolesni> ok
15:06:52 <mkolesni> i will ask again next week :)
15:07:07 <yamahata> #topic action items from last meeting
15:07:17 <yamahata> I suppose we don't have any. (except patch review)
15:07:21 <yamahata> #topic Pike/Nitrogen planning
15:07:27 <mkolesni> rajivk_'s patch is good to go but blocked by the ci breakage :/
15:07:46 <yamahata> So for Pike-3, feature patches need to be merged
15:07:59 <yamahata> #action everyone address ci breakage
15:08:20 <mkolesni> rajivk_ mentioned it earlier
15:08:31 <mkolesni> rajivk_, do you know the necessary fix for the u/t ci?
15:08:45 <rajivk_> i will put up a patch.
15:09:09 <rajivk_> But i dont know why the ci was passing after the ceilometer patch got merged.
15:09:11 <mkolesni> ok great i havent had time to look at it today so if you have the fix we'll review it
15:09:18 <rajivk_> Maybe my findings are not correct.
15:09:38 <mkolesni> well, post the patch and we'll see :)
15:09:49 <rajivk_> ok
15:09:54 <yamahata> yeah, we'll see the result.
15:10:16 <yamahata> So what are the remaining patches?
15:10:26 <yamahata> https://review.openstack.org/#/c/474851/
15:10:46 <yamahata> https://review.openstack.org/#/q/topic:bug/1683797
15:11:04 <yamahata> Oh mkolesni you uploaded a patch to make it a neutron worker.
15:11:07 <yamahata> great
15:11:17 <mkolesni> yamahata, yes i think its a more elegant approach
15:11:31 <mkolesni> also it will allow configuring multiple workers if we have a need for it
15:11:36 <yamahata> and the dhcp patch
15:11:46 <yamahata> https://review.openstack.org/#/c/465735/
15:12:06 <yamahata> For the dhcp port patch, it would need review.
15:12:20 <mkolesni> i will review it again tomorrow
15:12:27 <mkolesni> yamahata, if you agree with https://review.openstack.org/486606
15:12:43 <mkolesni> perhaps we can abandon all the other ones on the same bug
15:12:55 <yamahata> I haven't reviewed the patch yet. But that's the direction I'd like to take it.
15:13:10 <yamahata> I think thread pooling still makes sense.
15:13:19 <yamahata> It's orthogonal to 486606.
15:13:20 <mkolesni> sure just saying there's a lot of patches there now
15:13:37 <mkolesni> no problem with that though nobody addressed my comment there from PS5
15:13:49 <yamahata> prepopulate agentdb patches are floating around.
15:14:03 <yamahata> They're bug fix patches, though.
15:14:21 <yamahata> https://review.openstack.org/#/c/465735/ and https://review.openstack.org/#/c/484446/
15:14:26 <mkolesni> all bug fixes aren't first priority so lets focus on the features first
15:14:34 <yamahata> Yeah.
15:14:42 <mkolesni> and if we have time left, on the bug fixes
15:14:46 <yamahata> we have plenty of patches for Pike-3...
15:15:03 <mkolesni> rajivk_, yamahata please see my comment here https://review.openstack.org/#/c/452647/5/networking_odl/journal/journal.py
15:15:06 <yamahata> After Pike-3, we can address bug fixes
15:15:35 <mkolesni> of course then the neutron stable team has to approve the backports to stable/pike right?
15:15:49 <yamahata> right.
15:16:01 <manjeets> hello
15:16:08 <mkolesni> btw thread pooling is also a feature so if you want it in we can focus on it too
15:16:23 <mkolesni> though i dont think its critical for Pike and could slip to Queens
15:17:23 <yamahata> https://review.openstack.org/#/q/project:openstack/networking-odl+status:open
15:17:34 <yamahata> we have many bug fix patches floating around.
15:17:47 <yamahata> After Pike-3, let's wipe them out.
15:17:58 <mkolesni> theres some cleaning required there, some of them are obviously obsolete
15:18:39 <mkolesni> so to sum it up, for this week we need to focus on:
15:18:48 <mkolesni> 1. https://review.openstack.org/474851 - done, needs to be merged
15:19:28 <mkolesni> 2. https://review.openstack.org/#/c/465735/
15:19:38 <mkolesni> 3. https://review.openstack.org/#/c/452647
15:19:43 <mkolesni> anything else?
15:20:07 <yamahata> that's the priority.
15:20:19 <yamahata> I think three is already a lot.
15:20:22 <mkolesni> yes those are the rfes
15:20:34 <mkolesni> well the first one is +2 by both of us
15:20:44 <mkolesni> its a technicality to merge it after the gate is fixed
15:21:23 <yamahata> good summary. let's move on
15:21:25 <yamahata> #topic patches/bugs
15:21:31 <yamahata> we've already discussed patches.
15:21:43 <yamahata> and we'll look into the ci breakage.
15:21:48 <yamahata> #topic tempest CI
15:21:52 <yamahata> mkolesni: you're on stage
15:22:29 <mkolesni> right
15:22:48 <mkolesni> so as you know ive been investigating the tempest ci breakage
15:23:02 <mkolesni> found some bugs here and there, all fixed now
15:23:12 <mkolesni> but the status is still dire
15:23:36 <mkolesni> so lets discuss on a per-job basis..
15:23:52 <mkolesni> first, gate-tempest-dsvm-networking-odl-boron-snapshot-v2driver which is our only voting job (and also gating)
15:24:07 <mkolesni> this job is very unstable
15:24:39 <yamahata> really unstable! It's with legacy netvirt. So I don't see much value in fixing it.
15:24:41 <mkolesni> i believe the cause is some mess-up in the setup of the DHCP so that somehow traffic slips across subnets on the DHCP nodes
15:24:48 <mkolesni> indeed
15:24:53 <mkolesni> but just to understand the cause
15:25:01 <yamahata> I suppose once we have carbon with new netvirt voting, we can retire the boron job or disable the unstable tests of boron.
15:25:13 <yamahata> Oh, great! what's that?
15:25:16 <mkolesni> so basically what you'll see when it fails is that VMs dont get an IP
15:25:32 <mkolesni> and in the VM boot log you see it got a DHCP NAK
15:26:07 <mkolesni> and you also see it in the dhcp log, where each request gets answered by the dnsmasq on that subnet (DHCP ACK)
15:26:20 <mkolesni> and also by 2 other dnsmasq on other subnets (DHCP NAK)
15:26:33 <mkolesni> so basically this sucks but i didnt investigate further
15:26:50 <yamahata> are those dhcp agents on the same network?
15:26:52 <mkolesni> because, as you said, its old netvirt so i doubt anyone's going to fix it
15:27:01 <mkolesni> no theyre on different subnets
15:27:11 <mkolesni> but somehow they get the dhcp request as well
15:27:12 <yamahata> I mean, network, not subnet
15:27:32 <rajivk_> mkolesni, i also noticed a disk write failure
15:27:34 <mkolesni> no i think theyre even on different tenants but im not sure
15:27:50 <yamahata> I see.
15:27:56 <mkolesni> rajivk_, yes there might be other failures, im just describing what i saw most of the time
15:28:07 <mkolesni> anyway old netvirt, not interesting
15:28:22 <mkolesni> ok can i move on to the next job?
15:28:29 <rajivk_> Is it a failure to acquire the lease again or just to get an IP the first time?
15:28:48 <mkolesni> rajivk_, it fails to get an ip on vm boot
15:28:49 <rajivk_> I mean, do they fail after the machine reboots or in all the test cases?
15:29:00 <mkolesni> about 10 times or something and then gives up
15:29:16 <mkolesni> from what i saw, every time an ip is requested
15:29:39 <mkolesni> its consistent, all the same tests fail each time because of this issue
15:29:49 <rajivk_> I checked one of the patch logs; it was requesting a specific ip but the server responded with a NAK.
15:30:01 <rajivk_> anyway, we can leave it as you said.
15:30:08 <mkolesni> yes lets continue
15:30:12 <mkolesni> next is gate-tempest-dsvm-networking-odl-carbon-snapshot-vpnservice-v1driver-nv
15:30:39 <mkolesni> so this one had a problem that the port status updater wasnt loaded at all, causing random failures
15:30:44 <mkolesni> that got fixed
15:31:01 <mkolesni> i didnt continue too much on it since it's the v1 driver
15:31:19 <mkolesni> but its rather unstable, though it's non voting so meh
15:31:42 <mkolesni> the only problem is that it stalls results until it times out but i guess we can live with that for now
15:32:07 <mkolesni> not sure how much value it provides so we can decide to drop it entirely once P-3 is out
15:32:24 <mkolesni> yamahata, whats the plan for V1, is it cut from the tree in Queens?
15:32:50 <yamahata> Maybe. If we can have the v2driver voting, it makes sense to retire the v1driver.
15:32:58 <manjeets> ++
15:33:04 <mkolesni> we can throw it out when we cut it out of the tree
15:33:24 <yamahata> Yeah. So far the v2driver job isn't stable enough.
15:33:27 <mkolesni> RH has no interest in the V1 driver so as far as we're concerned the sooner the better
15:33:36 <mkolesni> problem is no job is stable enough :)
15:33:52 <yamahata> so we had the v1driver job for comparison, to understand where the issue exists.
15:34:03 <yamahata> but right now they are both too unstable.
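[Editor's note: the cross-subnet DHCP NAK pattern mkolesni describes around 15:25-15:26 can be checked directly from the collected gate logs. The sketch below is a minimal helper, not something from the meeting; it assumes dnsmasq messages end up in a syslog-style file in the job artifacts (e.g. logs/syslog.txt) with lines roughly like "dnsmasq-dhcp[1234]: DHCPNAK(tap...) 10.0.0.5 fa:16:3e:aa:bb:cc". It groups replies per VM MAC, so a MAC answered by more than one dnsmasq pid shows the suspicious pattern.]

    #!/usr/bin/env python
    # Sketch: summarize which dnsmasq instance (by pid) ACKed or NAKed each VM MAC.
    import collections
    import re
    import sys

    DHCP_LINE = re.compile(
        r'dnsmasq-dhcp\[(?P<pid>\d+)\]:\s+(?P<reply>DHCP(?:ACK|NAK))\S*\s+'
        r'(?P<ip>\S+)\s+(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})',
        re.IGNORECASE)

    def summarize(path):
        # replies[mac] counts (dnsmasq pid, DHCPACK/DHCPNAK) pairs.
        replies = collections.defaultdict(collections.Counter)
        with open(path) as logfile:
            for line in logfile:
                match = DHCP_LINE.search(line)
                if match:
                    key = (match.group('pid'), match.group('reply'))
                    replies[match.group('mac')][key] += 1
        for mac, counter in sorted(replies.items()):
            # More than one pid replying to the same MAC means another
            # subnet's dnsmasq is also seeing (and NAKing) the request.
            print(mac, dict(counter))

    if __name__ == '__main__':
        summarize(sys.argv[1])

[Usage would be something like: python dhcp_reply_summary.py logs/syslog.txt, with the script name and log path chosen to match the actual artifacts.]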
15:34:31 <mkolesni> we can send an email and see if theres any objection to throwing out the job
15:34:37 <yamahata> anyway we should focus on the v2driver job.
15:34:38 <mkolesni> if not we can remove the job at lease
15:34:44 <mkolesni> *at least
15:35:19 <yamahata> Once the v2driver job is stable, it's okay to remove the v1 job.
15:36:10 <mkolesni> ok as you wish
15:36:34 <mkolesni> ok now the big guy, gate-tempest-dsvm-networking-odl-carbon-snapshot-vpnservice-v2driver-nv
15:36:51 <mkolesni> so this one also had a bug that the provisioning block wasnt created
15:37:18 <mkolesni> so the port status update failed to actually do anything and then nova would randomly time out VMs
15:37:48 <mkolesni> depending on a race there, so sometimes a VM would boot normally because the provisioning by dhcp was fast enough
15:37:52 <mkolesni> anyway that got fixed
15:38:12 <mkolesni> now the major issue im noticing with it is something i believe is a problem in ODL
15:38:20 <yamahata> now we're seeing that sometimes the carbon v2 job is passing
15:38:36 <mkolesni> i sent an email about it to netvirt-dev, let me find it
15:38:46 <yamahata> It's plausible that the issues are on the ODL side.
15:39:23 <mkolesni> #info https://lists.opendaylight.org/pipermail/netvirt-dev/2017-July/005062.html
15:39:27 <yamahata> there are many ERROR logs in the karaf log.
15:39:38 <mkolesni> so to sum it all up from the email, the FIP is sometimes broken
15:40:03 <mkolesni> now again we're seeing a situation where either the tests are all green, or all tests requiring FIP fail
15:40:16 <mkolesni> or at least the same tests fail every time
15:40:36 <mkolesni> so this leads me to believe the problem happens when the public network gets created on ODL
15:40:54 <yamahata> All pass or all fail is an interesting observation.
15:41:02 <mkolesni> problem is im not that strong on the odl side so thats why i asked for assistance
15:41:16 <mkolesni> but nobody stepped up yet and the mail saw little interest
15:41:45 <mkolesni> if you guys have better netvirt knowledge you can take a look
15:41:47 <yamahata> maybe we would like to replicate it with nitrogen.
15:42:02 <mkolesni> if not im trying to get some help from our ODL team at RH
15:42:18 <yamahata> Cool.
15:42:25 <mkolesni> hmm nitrogen jobs are all broken because its not built by the integration job yet
15:42:42 <mkolesni> so yamahata rajivk_ or manjeets do you guys have the knowledge to debug this?
15:43:07 <manjeets> mkolesni, I guess they need to have the nitrogen-snapshot available
15:43:09 <yamahata> of course we do. The issue is their bandwidth.
15:43:35 <manjeets> it fails at getting the nitrogen-snapshot
15:43:42 <yamahata> Anyway after Pike-3, I'll also look into it.
15:43:51 <mkolesni> anyways i believe that the new-netvirt job should be the voting one and the old-netvirt should be non voting, the opposite of what happens today
15:44:00 <yamahata> With nitrogen, the karaf-distribution is not created yet.
15:44:18 <mkolesni> yes we can perhaps use only the netvirt karaf
15:44:18 <yamahata> the karaf or netvirt image needs to be used.
15:44:47 <mkolesni> basically the netvirt karaf probably has everything we need
15:44:50 <mkolesni> so we can try it
15:44:55 <yamahata> So far the ODL community doesn't have an ETA for creating the karaf-distribution image.
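[Editor's note: the "provisioning block wasnt created" bug mkolesni mentions at 15:36-15:37 refers to neutron's provisioning-blocks mechanism: each interested entity registers a block on a port, and ml2 only transitions the port to ACTIVE (which unblocks nova's boot) once every entity has reported completion. The sketch below shows the intended usage pattern with the real neutron API (neutron.db.provisioning_blocks); the ODL_ENTITY name and the helper-function placement are illustrative assumptions, not the actual networking-odl code.]

    # A minimal sketch of the provisioning-block pattern the v2 port-status
    # updater depends on; assumptions are noted inline.
    from neutron.db import provisioning_blocks
    from neutron_lib.callbacks import resources

    ODL_ENTITY = 'ODL'  # hypothetical entity name, for illustration only


    def block_port_until_backend_ready(plugin_context, port_id):
        # Typically called from a mechanism driver's *_port_precommit hook.
        # While this block exists, ml2 keeps the port out of ACTIVE even if
        # other entities (e.g. DHCP) have already finished provisioning it.
        provisioning_blocks.add_provisioning_component(
            plugin_context, port_id, resources.PORT, ODL_ENTITY)


    def backend_reported_port_up(plugin_context, port_id):
        # Called when the backend reports the port as wired up. Once every
        # registered entity has completed, ml2 sets the port ACTIVE, so nova
        # gets its notification instead of timing out the VM.
        provisioning_blocks.provisioning_complete(
            plugin_context, port_id, resources.PORT, ODL_ENTITY)

[If add_provisioning_component is never called, the later provisioning_complete has nothing to release, so whether the port ever went ACTIVE ends up depending on the DHCP race mkolesni describes.]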
15:45:00 <mkolesni> i had some experimental patch to use the netvirt karaf
15:45:03 <mkolesni> this seems plausible
15:45:23 <mkolesni> https://review.openstack.org/#/c/482453/
15:45:34 <mkolesni> but i didnt dig into the test failures too much
15:46:05 <mkolesni> regarding sfc im not sure
15:47:01 <mkolesni> i think it does have sfc but dont take my word for it
15:47:19 <mkolesni> is sfc even tested by tempest though?
15:47:41 <yamahata> I guess not. I guess no one has seriously tested sfc.
15:48:04 <mkolesni> so for the gate maybe its enough to use the netvirt karaf
15:48:20 <yamahata> Probably the unit tests for sfc will be kept. tempest tests for sfc won't be enabled.
15:48:25 <mkolesni> i can rebase that patch if you want to see whats up
15:48:42 <yamahata> I'd love to see the result.
15:48:50 <mkolesni> luckily the unit tests dont care about what distribution we use :)
15:49:47 <yamahata> The ODL nitrogen cycle is short, so we should know about issues early.
15:50:36 <mkolesni> nitrogen would be targeted by queens though right?
15:50:57 <mkolesni> obviously we need to know asap but im asking regarding the "optimal versions"
15:51:22 <yamahata> In that sense, yeah, queens + nitrogen, pike + carbon.
15:51:39 <mkolesni> btw with that experimental patch obviously the old netvirt job fails cause it's not in the distribution even in boron :)
15:52:01 <yamahata> Also the netvirt folks have started a similar discussion.
15:52:24 <mkolesni> again its hard to debug cause if the gate times out then no logs are collected
15:52:47 <yamahata> Probably we'd like to disable some tests with floating ip so that we can have logs.
15:52:49 <mkolesni> also something thats been bothering me but i dont know how to solve is that these damn logs are in html
15:53:14 <yamahata> https://review.openstack.org/#/c/486177/
15:53:19 <mkolesni> and thats doubling their size, making reading them tougher
15:53:22 <yamahata> there is something wrong with that patch.
15:53:54 <mkolesni> with what patch?
15:54:13 <yamahata> the one to disable some fip tests.
15:54:48 <mkolesni> ok i didnt see that. i have some other ones to reduce the load so that logs do get collected, and thats what i've been using to debug the gate
15:54:48 <yamahata> we have 6 mins left.
15:55:10 <manjeets> yamahata, mkolesni I switched the grenade job to new netvirt, v2 driver, I still see it fail on the floating IP access tests
15:55:18 <yamahata> do we have anything else?
15:55:42 <mkolesni> so just to make sure, if FIP fails you should see some errors about GARP in karaf.log
15:55:50 <mkolesni> manjeets, please check if thats the case ^
15:55:56 <mkolesni> if so its the same as in tempest
15:56:28 <mkolesni> i got nothing else, basically we should switch to the new netvirt job asap and keep the old job as non voting for reference
15:56:48 <mkolesni> problem is right now they both seem to be failing about half of the time
15:56:54 <mkolesni> so its hard to say whats worse
15:56:59 <mkolesni> but its a true nightmare
15:57:17 <mkolesni> also on the gate queue it takes 3 hours till it fails :/
15:57:40 <manjeets> timeouts are very common these days
15:57:43 <mkolesni> maybe we should consider a slimmer tempest on the gate itself
15:57:53 <mkolesni> and keep the heavy tests only on the check queue
15:58:20 <yamahata> parallel execution is one way, and neutron did it. but it's too early for us....
15:58:33 <yamahata> anyway, anything else to discuss/complain about?
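[Editor's note: the check mkolesni suggests at 15:55:42 is to look for GARP-related errors in karaf.log when floating IP tests fail. A rough helper for scanning a downloaded karaf.log is sketched below; the exact wording and casing of the GARP messages vary between netvirt versions, so the loose match is an assumption, not taken from the meeting.]

    # Sketch: print GARP-related ERROR/WARN lines from a karaf.log copy.
    import sys

    def garp_lines(path):
        with open(path) as logfile:
            for line in logfile:
                upper = line.upper()
                # Keep only lines that mention GARP and look like a warning
                # or error; the real message text differs per netvirt version.
                if 'GARP' in upper and ('ERROR' in upper or 'WARN' in upper):
                    yield line.rstrip()

    if __name__ == '__main__':
        for line in garp_lines(sys.argv[1]):
            print(line)

[Run it against the karaf.log collected by the tempest or grenade job, e.g. python garp_scan.py karaf.log, with the script name and path chosen to match the actual artifacts.]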
15:58:41 <mkolesni> hmm yeah thats a whole other discussion :)
15:58:50 <yamahata> #topic open mike
15:58:52 <mkolesni> no im done, stick a fork in me :)
15:59:19 <mkolesni> ok thanks guys
15:59:24 <yamahata> thank you everyone
15:59:25 <mkolesni> have a good day/night
15:59:28 <manjeets> thank you
15:59:31 <yamahata> #topic cookies
15:59:31 <mkolesni> bye :)
15:59:37 <yamahata> #endmeeting