15:00:35 #startmeeting neutron_northbound
15:00:35 Meeting started Fri Aug 14 15:00:35 2015 UTC. The chair is regXboi. Information about MeetBot at http://ci.openstack.org/meetbot.html.
15:00:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:35 The meeting name has been set to 'neutron_northbound'
15:00:37 #info flaviof
15:00:43 #chair flaviof, edwarnicke
15:00:43 Current chairs: edwarnicke flaviof regXboi
15:00:52 #topic roll call and agenda bashing
15:00:54 #info regXboi
15:01:00 #info edwarnicke
15:01:10 #info mestery in lurk mode
15:01:21 #link https://wiki.opendaylight.org/view/NeutronNorthbound:Meetings agenda in its usual place
15:01:38 regXboi: mestery: maybe we can also talk about stable/kilo for networking-odl issie
15:01:38 #info regXboi left the agenda light because we need more discussion time on items
15:01:39 issue
15:02:03 flaviof: there is lots of time for that :) - it's all open mike today
15:02:09 ack
15:02:12 #info yamahata
15:02:16 #topic action items from last meeting
15:02:31 #info regXboi to consult with GBP folks about bug 3968 and why they are calling isXXX() methods directly
15:03:01 I actually did talk to alagalah yesterday and this defect isn't coming from GBP - so I'm now planning on rejecting it as theoretical
15:03:11 and not from a real code example
15:03:26 #info regXboi has consulted and is not convinced bug 3968 is real
15:03:36 #info regXboi, edwarnicke, flaviof to find a document contact vict.... volunteer
15:03:40 still looking :)
15:03:49 step right up folks - your chance to help out :)
15:04:03 yeah - we'll carry that for another week...
15:04:10 #action regXboi, edwarnicke, flaviof to find a document contact vict.... volunteer
15:04:24 flaviof: Ack
15:04:24 #info regXboi to kick off threads on networking-ODL
15:04:33 yeah I was late on this, but they are out there:
15:04:43 #link https://lists.opendaylight.org/pipermail/neutron-dev/2015-August/000275.html plugin or ML2
15:04:53 #link https://lists.opendaylight.org/pipermail/neutron-dev/2015-August/000276.html where should it live
15:05:02 #link https://lists.opendaylight.org/pipermail/neutron-dev/2015-August/000277.html two projects or one
15:05:10 and that's the action item list
15:05:12 so....
15:05:21 #topic Beryllium/Open Mike
15:05:22 nice work regXboi
15:05:35 flaviof: you have the floor :)
15:05:56 #info flaviof to talk about stable/kilo networking-odl issue
15:06:06 * regXboi nudges flaviof to wake up
15:06:08 regXboi: ack, thanks.
15:06:37 #link https://lists.opendaylight.org/pipermail/neutron-dev/2015-August/000284.html stable/kilo issue
15:07:22 basically, when we created stable/kilo for networking-odl we introduced a mis-alignment in the versions of pbr
15:08:37 flaviof: We need to get the proposal bot syncing accurate stable/kilo changes into networking-odl
15:08:38 Is that right?
15:09:06 mestery: yeah. I do not know what it means, but yes, that sounds right :)
15:09:11 :)
15:09:25 mestery: Side note: I think you just ticked up one more on the plus side for leaving networking-odl at OS ;)
15:09:37 edwarnicke: +1
15:10:40 #info problem is we need the proposal bot syncing accurate stable/kilo changes into networking-odl
15:10:49 that is all on this topic, really; just need to get done... maybe mestery or some competent o/s folk can sign up for that?
15:10:58 #info this is a +1 for leaving networking-odl at openstack
15:11:36 * flaviof is done... next topic?
15:12:11 well... I'd rather keep the other discussions to the ML threads
15:12:21 so I guess it is time to open this Pandora's box
15:12:27 #info changing the wire protocol
15:12:40 mlemay: ping ^^
15:13:13 #link https://wiki.opendaylight.org/images/8/8d/Experiences_with_Neutron.pdf Dec's summit talk
15:13:56 so folks, I'm not really convinced this isn't going to cause more trouble than it's work
15:13:59 er worth
15:14:16 regXboi: I think there are two things here
15:14:23 regXboi: 1) Figuring out HA/Scalability
15:14:33 2) The possibility that that might entail wire protocol changes
15:14:41 I think we *have* to do #1
15:14:46 how about we talk about each observation
15:14:48 ?
15:14:55 edwarnicke: do we want to step back and walk through all of the observations and how to deal with it?
15:14:57 flaviof: +1
15:14:58 otherwise it is too easy to get off topic.
15:15:00 I think #1 *might* imply #2... but could be convinced otherwise
15:15:23 flaviof: regXboi I'm fine with that if that's how you'd like to handle it :)
15:15:29 I think I do
15:15:38 try and make some progress in the 45 minutes
15:15:42 yeah, thanks edwarnicke !
15:16:21 #info current openstack ml2 odl driver is single threaded and blocks neutron server (post-commit) until ODL responds
15:16:34 um... mestery - do you know of an ml2 driver that is multi-threaded?
15:16:47 I can't think of one, and I'm not sure why I really care
15:16:49 Now ML2 folks are aware of the issue and thinking of an asynchronous one
15:16:49 so... on that one; armax, mestery may know the intricate details that explain that?
15:16:51 lol
15:16:54 there is a PoC
15:17:12 regXboi: flaviof mestery Are plugins traditionally multi-threaded?
15:17:17 and they are also aware of the synchronizing issue under discussion.
15:17:23 regXboi: well that’s not entirely accurate
15:17:24 The driver itself needs a re-write at this point.
15:17:35 The problem is due to how it handles re-sync, we had to make it single-threaded
15:17:36 mestery, yamahata, armax: is this something that would go away if odl used plugin instead of ml2?
15:17:39 neutron uses greenthreads and they are non-blocking
15:17:42 And even worse: It uses a local file lock
15:17:43 armax: which of my many statements is not accurate :)
15:17:50 armax: ++
15:18:06 the fact that the driver is single threaded blocking
15:18:08 regXboi: ^
15:18:24 Even with monolithic plugin, the same problem needs to be solved.
15:18:25 armax: I'm quoting the observation
15:18:35 regXboi: and I am not blaming you
15:19:09 help me out here - is this just about scaling the management plane?
15:19:10 sorry if it is a stupid question, why need a multi-thread driver? most of ML2 is configuring network/subnets/...
15:19:15 armax: mestery How does neutron/OS multi-threadedness work around the GIL?
15:19:28 yamahata: ack, thanks. my rush to move away from ml2 is that this is a 'greener grass on the other side illusion'. That info helps.
15:19:45 * edwarnicke admits it's been a while, but still remembers multi-threadedness in python being radically hamstrung by the GIL...
15:20:13 greenthreads are different from regular OS threads
15:21:06 I'm not sure at this point that we need to do anything other than include this in what we were already contemplating in the rewrite
15:21:15 is that a fair conclusion??? ^^^^^^
15:21:23 regXboi: +
15:21:55 armax: Understood about greenthreads... and looking deeper it seems that the GIL isn't so relevant to them, because there is never more than a single greenthread running at once... did I get that right?
15:22:04 yes
15:22:08 regXboi: ++
15:22:16 I'm going to write in an agree here folks :)
15:22:25 regXboi: Please do before we go off the rails
15:22:32 regXboi ++
15:22:37 edwarnicke: thread execution can still interleave though and that’s where the interference comes into place
15:22:46 #agree this observation should be bundled as a consideration/requirement for the rewrite ML discussion
15:22:50 regXboi +
15:23:05 regXboi: ideally we’d know what we’re rewriting into
15:23:12 regXboi: we don’t want to swap mess with mess
15:23:14 #info second observation: db synchronization issue
15:23:15 armax: Yep... if you are waiting on IO, you should yield so the next guy gets a chance :)
15:23:22 regXboi: but I guess that I am stating the obvious
15:23:52 armax: My experience has been that the best way to not swap mess with mess is small incremental steps of rewrite with continuous measuring of the resulting desired improvement
15:23:56 #info regXboi thinks this is a bit of a bogus observation (at least as presented in the slide)
15:24:06 armax: no you are not, buddy. I think it is a very valid point.
15:24:34 armax: yes - I'm saying we need that to be part of that discussion or we may a boot
15:24:41 er boat (or whatever)
15:24:42 regXboi: Say more... because I see it as being a big deal, and would like to understand why you don't
15:24:43 ?
15:25:04 so... when I look at this slide - I see multiple independent neutrons
15:25:14 therefore, they *don't* know about each other
15:25:27 and so I can't do the steps that come to the ODL controller
15:25:42 unless (and here I give my evil smile)
15:26:01 you are implying that UUIDs from the upstream neutrons can collide (gotcha :) )
15:26:28 regXboi: not sure if they are truly "independent neutrons"
15:26:41 regXboi: I think he is talking about different control nodes for the same logical neutron
15:26:48 regXboi: But sharing the same underlying DB
15:26:53 edwarnicke, flaviof: unfortunately, they are drawn that way
15:26:53 flaviof: they are independent threads
15:26:57 regXboi: can't they be O/S HA ?
15:27:17 edwarnicke, flaviof: I consider this a poor diagram because it leaves that to the reader
15:27:20 they are not aware of each other, but they work on the state from a persistent store (like DB) etc
15:27:35 Is Wojciech here?
15:27:41 don't see him
15:27:46 k
15:27:49 I'm willing to table this one for a week and get an update
15:27:59 I suspect you are correct, but I'm standing on my interpretation for now :)
15:28:05 DB is shared. e.g. UUID uniqueness is guaranteed by db unique constraint.
15:28:38 #agreed tabling this observation for a week to get clarity of assumptions
15:28:42 er
15:28:43 #unto
15:28:45 #undo
15:28:45 Removing item from minutes:
15:28:54 #agreed tabling this observation for a week to get clarity of assumptions
15:28:58 ok, so that did work
15:29:07 yamahata: understood - I just want Dec to confirm that
15:29:36 another question I have is on the fact that these slides were done using icehouse. that predates networking-odl, so keep that in mind....
15:29:44 #info observation: current ML2 ODL driver sync mechanism only runs when triggered by a new Neutron event
15:30:07 so, this one looks like more rewrite requirements - doesn't it?
15:30:51 although.... I also admit that the more I look at it, the more I don't get it
15:30:58 regXboi, this youtube link of Wojciech's presentation: https://www.youtube.com/watch?v=CL70MNgFeQs
15:31:24 #link https://www.youtube.com/watch?v=CL70MNgFeQs Dec's presentation on youtube
15:31:26 yapeng: thx
15:31:28 regXboi: agree; this is related to Neutron <-- ODL events, which are pretty much non-existent in current implementation
15:31:40 regXboi: when ML2 driver goes out of sync it sets out_of_sync flag, but it is not able to trigger resync till next event comes in.
15:31:40 yapeng: thanks,
15:31:48 regXboi: I think #3 is basically that each control node tries for each event
15:31:55 So if something goes wrong, you are getting repeat failures
15:32:10 But I think it also opens us up to bad out of order issues
15:32:15 well... I'll still admit - I don't get this at all
15:32:30 because for this to even happen, things have to go horribly wrong in the first place
15:33:05 Right. basically db transaction then sending request to ODL.
15:33:07 and so I'd rather fix what causes things to go horribly wrong
15:33:10 I'm stuck too... there is no T3 ;)
15:33:28 and then come back to this
15:33:52 the order of db transaction and the order of request to ODL can be reordered.
15:33:54 btw edwarnicke: the "things go horribly wrong" is very close/overlays the HA
15:33:56 even with single neutron server.
15:34:03 regXboi: I don't quite see it that way (which may not mean that I'm wrong)...
15:34:20 I've seen out_of_sync happen with single neutron server also
15:34:36 regXboi: I see it as sort of this issue. A db event happens, you have n control nodes. You get n calls to ODL. Nothing guarantees they are in the same order, and it means n times the load
15:34:49 vthapar: Please say more
15:34:52 vthapar: Cool. I haven't been able to see it with single neutron.
15:34:55 yamahata: if neutron is reordering events so that the events to the db are different than the events southbound, that's a horrible mistake
15:34:58 yamahata: ic, so it is 'okay' to expect subnet create before net create, as an example?
15:35:20 no it shouldn't be
15:35:23 I'm sorry
15:35:26 seen the issue when spawning VMs with multiple NICs and then clearing them out. I'll have to get more details from our test team.
15:35:47 vthapar: that's a bit different than subnet coming before network
15:35:52 flaviof: In theory, yes. But I think it should be taken care of on the neutron side somehow.
15:35:56 so let's be careful about reording
15:36:03 er reordering
15:36:04 e.g. syncing in the background.
15:36:05 vthapar: when you say single neutron server, what was your api-worker configuration
15:36:31 folks: I'm going to budge until 45 past the hour for this - we have more observations to cover
15:36:37 vthapar : even a single neutron-server can be run with n-workers which would enable concurrent handling of REST requests
15:36:44 budget that is - unless we come to a conclusion sooner :)
15:36:48 viveknarasimhan: I believe it was default in my setup, where I saw it not so frequently. can get details from test folks as they used to run into it frequently enough.
15:36:50 regXboi: In an ideal world, yes. But the difficulty comes from the fact that we have two states to be synced.
15:36:56 vthapar: better to try with single neutron-server single api-worker and see if issue reproduces
15:37:05 yamahata: ack. I think that explains all the changes in master regXboi has been doing, so nn code makes no sanity checks on things like that (i.e. net being created before subnet).
15:37:19 yamahata: I'm sorry, but that answer doesn't quite hold water
15:37:25 And ML2 team is aware of this issue.
15:37:41 telling the DB order A and telling southbound order B is *evil*
15:38:06 has been discussed on the topic. there are (unfinished?) PoC to some extent
15:38:23 actually, I'd like to table this until we see what comes out of the ML2 team
15:38:30 because that's going to be what we have to work with
15:38:42 unless somebody wants to volunteer to track it :)
15:38:54 sorry if I sound like a broken record folks: is this a ml2 only issue? is this something a plugin would have to 'worry about'?
15:39:13 flaviof: IIRC, the plugin takes this responsibility on itself
15:39:26 flaviof: so reordering of southbound requests is up to the plugin
15:39:27 regXboi: ack. thanks.
15:39:49 Yes, syncing status is a common problem among controllers.
15:39:57 flaviof: in my mind this is a +1 argument for plugin, pending where the ML2 team comes out
15:40:08 * flaviof recalls quote from spiderman: more power, more responsibilities
15:40:34 can we agree on tabling this/making it part of the rewrite?
15:40:44 even with monolithic plugin, there remains the same problem.
15:40:47 regXboi: ++
15:40:55 yamahata: regXboi I guess my more simple questions would be these:
15:41:03 regXboi: but I have a feeling we are moving problems and solving none....
15:41:04 Can events arrive out of order at an ML2 driver?
15:41:12 Can events arrive out of order at a plugin?
15:41:31 edwarnicke: define events - db events? some other events?
15:41:32 One extreme solution by opencontrail is not to use neutron db. always using controller side.
15:41:41 It *sounds* like they can arrive out of order at an ML2 driver... but I couldn't tell if that was true of a plugin
15:41:58 yamahata: Interesting, please say more
15:42:01 afaik, db events do not arrive out of order to an ML2 driver
15:42:27 uh... I'm really not interested in following the opencontrail route
15:42:38 I mean really really really
15:42:47 as in -2 really
15:43:07 regXboi: If a change comes in to the db (for many things we are listening for db changes) are we guaranteed to hear it in the order it was applied?
15:43:14 opencontrail plugin just passes through all the request to controller. It means they re-implement full neutron functionality in the controller.
15:43:18 regXboi: while I'm not interested in contrail; I think understanding a model that is proven to work well may be helpful
15:43:26 at least if we are re-writing
15:43:33 which we are, right?
15:43:42 there is no db operation (or no meaningful operation) in neutron side.
15:43:43 regXboi: I'm not necessarily disagreeing with your conclusion... but I am keeping a more open mind in the interim... sometimes an ugly solution is better than none at all
15:43:44 flaviof: note what yamahata said
15:43:53 "re-implement full neutron functionality"
15:43:58 just checked some older mails from test team, exceptions in rest calls to ODL can cause sync issues.
15:43:59 that's a BIG statement
15:44:21 edwarnicke: I have the scars from this - I'm not interested
15:44:23 any updates/deletes are "lost" on resync and cause further problems.
15:44:50 folks - we are coming up on the budgeted time
15:45:04 do we want to keep going and leave the other observations for next week?
15:45:14 #info lots of discussion about observation 3 - see the logs
15:45:23 regXboi: Other than taking responsibility for storing the data (which we have to do anyway), what additional things does yamahata 's suggestion imply we need to do?
15:45:38 vthapar: ack. that is inline with Observation#3 in Dec slides, right?
15:46:00 edwarnicke: all of the checking/tracking/state management that I've been taking out
15:46:10 regXboi: OK
15:46:11 flaviof: yes, though I'd say #2 and #3 both.
15:46:15 Another possibility is to sync intelligently. e.g. adding a sequence number to each request. and sync in the background
15:46:31 vthapar: ack.
15:46:33 yamahata: Please say more :) I've heard this suggestion before as well :)
15:46:37 honestly, vthapar's comments say the current sync mechanism needs rewriting
15:46:59 sequence number can be added to each status.
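[Editor's sketch, not part of the meeting log] The sequence-number idea raised at 15:46:15 could look roughly like the following on the receiving side. This is a minimal illustration only: the class and method names are hypothetical and do not come from networking-odl, Neutron, or OpenDaylight. It assumes the sender stamps each request with a number that increases by one.

```python
# Hypothetical sketch: none of these names come from networking-odl or ODL.
from dataclasses import dataclass, field


@dataclass
class SequenceTracker:
    """Receiver-side check for lost or reordered requests."""
    last_applied: int = 0                         # highest number applied in order
    pending: dict = field(default_factory=dict)   # out-of-order requests held back

    def receive(self, seq, request):
        """Return the requests that can now be applied, in sequence order."""
        if seq <= self.last_applied or seq in self.pending:
            return []                             # duplicate or stale: ignore
        self.pending[seq] = request
        ready = []
        # Drain every request contiguous with what was already applied.
        while self.last_applied + 1 in self.pending:
            self.last_applied += 1
            ready.append(self.pending.pop(self.last_applied))
        return ready

    def missing(self):
        """Sequence numbers not yet seen below the highest one received."""
        if not self.pending:
            return []
        top = max(self.pending)
        return [s for s in range(self.last_applied + 1, top)
                if s not in self.pending]

    def lag(self, sender_seq):
        """How far the receiver is behind the sender's latest number."""
        return sender_seq - self.last_applied
```

A gap reported by missing() is what a background resync would need to fetch, and lag() corresponds to tracking how far one side is behind the other.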
15:47:05 yamahata: one more gap in resync logic is it doesn't track failed events, and the way it does it, only adds are taken care of.
15:47:31 so that we can detect reorder or event loss.
15:47:37 any deletes that fail stay in ODL but are gone from Neutron DB.
15:48:06 whoa... vthapar
15:48:07 tracking how behind ODL is from neutron.
15:48:56 actually - I'd argue that the ML2 connection to ODL being post-commit is a bit ... well ... broken
15:49:07 regXboi: agree.
15:49:17 but that's also for the rewrite to consider
15:49:25 regXboi: ++
15:49:31 regXboi: ++
15:49:32 I'd argue that we ought to take a page out of our own book
15:49:37 regXboi: ++
15:49:39 and go the pre-commit/post-commit route
15:49:49 regXboi: when we say 'rewrite', can we be a little more specific?
15:49:49 I*Aware anyone ????
15:50:09 flaviof: I'm saying that currently we register for post-commit events
15:50:18 rewrite could be: fix sync .... to use plugin...
15:50:19 I think we have to register for pre-commit events as well
15:50:29 and use the pre-commit event to veto a change
15:50:36 and the post-commit event to make the change
15:50:37 regXboi: ack. makes sense.
15:50:46 like I said - I*aware again
15:51:08 have we run the gamut on this one for now?
15:51:19 I'm not saying we've closed it
15:51:32 with I*aware going away, we are taking a step in the wrong direction? is that what you mean regXboi ?
15:51:44 flaviof: no - in ODL it should go away
15:51:59 between OS and ODL, the pattern makes more sense
15:52:11 that's what I'm saying
15:52:27 regXboi: ok. thanks.
15:52:47 #info observation 4: no switch type independent configuration for mapping of neutron physical networks to physical interfaces
15:52:57 edwarnicke: remind me again why we care about this one?
15:53:10 is this the responsibility of the providers?
15:53:15 er isn't this ....
15:53:59 regXboi: You can make a case for that being the providers responsibility
15:54:13 regXboi: But the net-net is it's a general problem that everybody has to have some kind of hack around
15:54:35 regXboi: I think this is Dec wishing that networking-odl / neutron could be made to help more by giving hints on the mapping of neutron-port to OVS ports
15:55:03 edwarnicke, flaviof: both of those imply NN knowing more about the physical topology than I'm personally comfortable with
15:55:28 * regXboi doesn't want to get into the inventory business - other projects are doing that
15:55:32 it still needs to live in a world where there is no OVS, or no OVS ports for every neutron port object.
15:56:30 regXboi: understood. at some point, something needs to map neutron port to an object that makes sense to the netvirt.
15:56:57 flaviof: yes, but I'm not yet convinced that is *us*
15:57:15 flaviof: I think part of this is a deeper neutron thing that seems to be (right or wrong) vexing a bunch of folks...
15:57:22 whether that mapping lives in nn or another project is the real question. Sounds like nn is not it then. ;)
15:57:39 flaviof: I said, I'm not yet convinced
15:58:12 flaviof: I think the basic argument comes down to the fact that the mapping is going to be communicated different ways by different underlying infra
15:58:18 edwarnicke - can you start a ML thread on this one?
15:58:32 since (IIRC) you have a deep understanding of this
15:58:54 flaviof: That said... having a common place to stash the info has some appeal... I'm really not decided in my own mind just yet
15:59:06 regXboi: Not deep, no
15:59:10 #action edwarnicke to launch neutron-dev ML thread on observation #4 from Dec's talk
15:59:21 edwarnicke: deeper than mine
15:59:25 regXboi: But it might be good to solicit a deeper conversation on neutron-dev from Woj
15:59:38 edwarnicke: ++ that's the reason for the ML thread
15:59:42 so that Woj can chime in
16:00:02 ok folks we'll table the others for next week
16:00:09 #topic cookies
16:00:12 #undo
16:00:12 Removing item from minutes:
16:00:25 #action regXboi to update ML thread on rewrite with notes from this meeting
16:00:28 #topic cookies
16:00:31 #endmeeting
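
[Editor's sketch, not part of the meeting log] The pre-commit/post-commit pattern proposed between 15:49:39 and 15:50:36 (use the pre-commit event to veto a change, use the post-commit event to make it) can be illustrated as below. All names here are hypothetical and chosen for illustration; this is not the ML2 mechanism driver API or any ODL interface, and the list named db merely stands in for the Neutron database.

```python
# Hypothetical sketch; names are illustrative, not the ML2 or ODL API.
class Veto(Exception):
    """Raised by a pre-commit hook to reject a change before it is stored."""


class Controller:
    """Stand-in for the remote controller's view of network state."""

    def __init__(self):
        self.networks = set()
        self.subnets = {}

    def precommit(self, op, obj_id, parent=None):
        # Validate only; no state is changed here.
        if op == "create_subnet" and parent not in self.networks:
            raise Veto(f"subnet {obj_id}: unknown network {parent}")

    def postcommit(self, op, obj_id, parent=None):
        # The change is already committed upstream, so just apply it.
        if op == "create_network":
            self.networks.add(obj_id)
        elif op == "create_subnet":
            self.subnets[obj_id] = parent


def apply_change(db, controller, op, obj_id, parent=None):
    """Commit to the local store only if the controller does not veto."""
    try:
        controller.precommit(op, obj_id, parent)
    except Veto:
        return False                      # nothing written, nothing to resync
    db.append((op, obj_id, parent))       # stands in for the database commit
    controller.postcommit(op, obj_id, parent)
    return True
```

In this sketch a vetoed change never reaches the store, so the two sides cannot diverge on it; a failure inside postcommit would still need the resync handling discussed earlier in the meeting.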