15:01:35 <mkolesni_> #startmeeting neutron_northbound
15:01:35 <odl_meetbot> Meeting started Wed May 25 15:01:35 2016 UTC. The chair is mkolesni_. Information about MeetBot at http://ci.openstack.org/meetbot.html.
15:01:35 <odl_meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:35 <odl_meetbot> The meeting name has been set to 'neutron_northbound'
15:01:48 <mkolesni_> #link https://wiki.opendaylight.org/view/NeutronNorthbound:Meetings agenda page
15:02:00 <mkolesni_> #topic agenda bashing and roll call
15:02:13 <mkolesni> #info mkolesni
15:02:51 <mkolesni> do we have any suggestions for the meeting agenda?
15:04:03 <mkolesni> i don't have anything in particular, except discussing partial sync if anyone is interested
15:04:20 <mkolesni> #chair asomya
15:04:43 <mkolesni> #topic Announcements
15:04:54 <mkolesni_> #topic Announcements
15:04:58 <mkolesni_> #chair asomya
15:04:58 <odl_meetbot> Current chairs: asomya mkolesni_
15:05:09 <mkolesni_> no announcements
15:05:16 <asomya> nope, none from me
15:05:38 <asomya> mkolesni_: Discuss the partial sync here or in bluejeans?
15:06:03 <mkolesni_> asomya: wherever is more comfortable :)
15:06:09 <mkolesni_> asomya: do you have a preference?
15:06:30 <asomya> wider audience here, we can go over the high level overview and discuss the finer points in bluejeans
15:06:38 <mkolesni_> sure
15:06:53 <mkolesni_> ok let's move on and get to it later
15:07:01 <mkolesni_> #topic Open Patch sets needing review/merging
15:07:28 <mkolesni_> does anyone have patches in need of review other than those Isaku sent in the email?
15:08:40 <mkolesni_> asomya: do you want to discuss the spec process or should we wait for Isaku?
15:09:06 <asomya> wait for isaku, we haven't finalized the plan yet.. The rough idea is to have a lightweight spec for major features
15:09:29 <mkolesni_> ok
15:09:48 <mkolesni_> since there are no new bugs/patches to review today, moving on to the partial sync
15:10:16 <mkolesni_> #topic partial sync discussion
15:10:30 <mkolesni_> ok so we had some brainstorming today
15:10:49 <mkolesni_> basically we already discussed this a while ago and had some ideas
15:11:14 <mkolesni_> 1. add a hash on both sides (ODL & Neutron) to lighten the payload
15:11:27 <mkolesni_> 2. add a version column on both sides
15:11:56 <mkolesni_> both approaches are valid; the hash is a bit more complicated overall but can pay off at larger scale
15:12:16 <mkolesni_> asomya: any other option?
15:13:04 <asomya> So we will have a hash per resource in neutron?
15:13:20 <asomya> and a meta-hash of a collection of resource types, right?
15:13:26 <mkolesni_> yeah
15:13:39 <mkolesni_> it can be computed on demand or stored
15:13:56 <mkolesni_> this is more of an implementation detail, what really matters is that an API change is necessary
15:14:16 <asomya> Agreed
15:14:42 <mkolesni_> this is also true if we add a version for each resource
15:14:45 <mkolesni_> correct?
15:15:02 <asomya> a version like a sequence number?
15:15:15 <mkolesni_> yes
15:15:48 <asomya> correct, that'll work.. plus it would make the hash computation a bit less expensive
15:16:19 <asomya> who has the action to work on this?
15:16:47 <mkolesni_> currently nobody
15:17:22 <asomya> ok, i don't mind signing up for it unless you want to do it :)
15:17:48 <mkolesni_> actually we thought an API is a bit expensive to add
15:18:02 <mkolesni_> also let's think of reasons why we want partial sync
15:18:19 <mkolesni_> 1. resource went through journal but failed
15:18:29 <mkolesni_> 2. someone messed up something on ODL side
15:18:45 <mkolesni_> that's what i can think of, any more?
15:19:27 <asomya> yeah that's mostly it.
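The two partial-sync options raised above (a hash per resource plus a meta-hash per collection, or a per-resource version number) can be sketched roughly as follows. This is a hypothetical illustration only: `resource_hash`, `meta_hash` and `out_of_sync` are made-up helper names, resources are assumed to be plain dicts with an `id` key, and SHA-256 over canonical JSON is one possible choice, not something agreed in the meeting or part of networking-odl or the ODL API.

```python
import hashlib
import json


def resource_hash(resource):
    """Stable SHA-256 digest of one resource dict, independent of key order."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def meta_hash(resources):
    """Digest of a whole collection (e.g. all ports on one side).

    Built from the sorted per-resource hashes, so row order does not matter.
    """
    digest = hashlib.sha256()
    for h in sorted(resource_hash(r) for r in resources):
        digest.update(h.encode("ascii"))
    return digest.hexdigest()


def out_of_sync(neutron_resources, odl_resources):
    """IDs whose content differs, or which exist on only one side."""
    neutron = {r["id"]: resource_hash(r) for r in neutron_resources}
    odl = {r["id"]: resource_hash(r) for r in odl_resources}
    return sorted(rid for rid in neutron.keys() | odl.keys()
                  if neutron.get(rid) != odl.get(rid))
```

The idea would be to compare the cheap `meta_hash` values first and only fall back to the per-resource `out_of_sync` comparison when they differ. With a version column, each resource could presumably be hashed as just its `id` and version (e.g. `f"{r['id']}:{r['revision']}"`, with `revision` an assumed field name) instead of the full body, which is the cheaper hash computation asomya mentions.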
15:19:50 <mkolesni_> ok so the partial sync we discussed so far can solve them both
15:20:10 <asomya> agreed
15:20:18 <mkolesni_> but it also comes with a price tag: we need to have this extra info
15:20:39 <mkolesni_> which is not too awful, except it means an extra API that needs to be managed for this purpose alone
15:21:02 <mkolesni_> so what we thought is, maybe we can start smaller and simpler
15:21:33 <mkolesni_> our idea was instead of doing a partial sync that runs periodically and tries to compare both DBs all the time
15:21:43 <mkolesni_> let's use the tools that we already have
15:21:53 <mkolesni_> what do i mean? i'll explain
15:22:03 <mkolesni_> we have the journal table with entries
15:22:30 <mkolesni_> if an entry is completed that means "whoopee, everything works" (to some extent)
15:22:52 <mkolesni_> if an entry is failed that means "damn, something doesn't work"
15:22:58 <mkolesni_> so something can be:
15:23:15 <mkolesni_> 1. some error on ODL side that might be fixed if we retry later
15:23:25 <mkolesni_> 2. some misalignment
15:23:35 <mkolesni_> for example, i tried to update a port
15:23:52 <mkolesni_> on Neutron everything is peachy and the update gets saved to the DB
15:24:15 <mkolesni_> on ODL someone deleted the network for the port, so i'll always have an error
15:24:46 <mkolesni_> so this is a kind of misalignment, caused either by some bug or by human error
15:24:55 <mkolesni_> what we can do is "journal recovery"
15:25:11 <mkolesni_> this means that in the maintenance thread we add a new operation
15:25:28 <mkolesni_> this operation goes over failed rows and tries to recover them
15:25:43 <mkolesni_> so basically it will start probing ODL for data
15:26:06 <mkolesni_> so for the example i gave, the operation checks in ODL if the port exists
15:26:32 <mkolesni_> if it does, retry the update (with the new port data)
15:26:52 <mkolesni_> if it doesn't, check if the subnet & network exist
15:26:59 <mkolesni_> if they do, create the port
15:27:06 <mkolesni_> if they don't, create them
15:27:49 <mkolesni_> this is of course just a small flow for port but it can be generalized to any resource type we need to support
15:28:12 <mkolesni_> this way we don't always go looking for trouble, but we react to trouble when we find it
15:28:19 <mkolesni_> makes sense?
15:28:22 <asomya> mkolesni_: Retry is already in the driver for journal rows, this is for the maintenance run to go and rerun failed rows?
15:28:30 <mkolesni_> yeah
15:28:42 <mkolesni_> journal rows can only be retried a certain number of times
15:28:46 <asomya> how many times do we do this until we mark a resource as 'unsyncable'?
15:28:53 <asomya> or forever?
15:29:11 <mkolesni_> i'm not sure we can reach this state, since we will sync the resource tree
15:29:29 <mkolesni_> but we can set some expiration time
15:29:52 <asomya> ok, as an interim measure i'm ok with this but in the long run we need to mark the resource failed in Neutron itself
15:30:06 <mkolesni_> obviously the details need to be hashed out in a spec since this is not very simple
15:30:33 <mkolesni_> well, we can mark it failed on Neutron only if it's possible
15:30:50 <mkolesni_> currently it's not quite possible; as we discussed earlier, only Port has a status
15:31:29 <mkolesni_> but we can set some sane value like "retry for 1 day" or something
15:31:32 <asomya> yeah, that will take a while
15:32:11 <mkolesni_> and if it doesn't succeed then either delete it, move it to some other state, or keep it in failed forever
15:32:26 <mkolesni_> but this can be decided later i suppose
15:32:36 <mkolesni_> what do you think of the general idea?
15:33:02 <asomya> seems ok for now
15:33:29 <asomya> as you said, details need to be hashed out
15:33:34 <mkolesni_> i think this will be beneficial in the short-to-mid term
15:33:55 <mkolesni_> since the full-blown partial sync we discussed earlier:
15:34:05 <mkolesni_> * requires an API on ODL
15:34:19 <mkolesni_> * has some overhead in complexity
15:34:33 <asomya> from a use case perspective, i don't see the user waiting a day for his/her VMs to come up.. even if neutron says the port is up and it's down.. most likely the user is going to delete and recreate it
15:34:40 <mkolesni_> * probably wastes CPU/bandwidth
15:35:40 <mkolesni_> probably, though the maintenance thread runs every 5 minutes by default, so in theory if everything is fine except his port it will be handled rather quickly
15:36:12 <asomya> ok worth a shot i guess
15:36:44 <mkolesni_> the cloud admin can obviously tweak this value if it makes sense to him
15:37:05 <mkolesni_> do you want to discuss this further on bluejeans?
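The journal recovery flow described above (probe ODL; if the resource exists, retry the update with the current Neutron data; otherwise recreate any missing parents and then the resource; give up after an expiration time such as one day) could look roughly like the following. This is a hypothetical sketch: `JournalRow`, `FakeODL`, `parents_of`, `ensure_on_odl`, `recover_failed_rows`, the `"expired"` state and the dict standing in for the Neutron DB are all illustrative stand-ins, not the networking-odl journal model or a real ODL REST client. Only create/update rows are handled.

```python
import datetime
from dataclasses import dataclass

# Assumed policy value from the discussion ("retry for 1 day"); tunable.
RECOVERY_TIMEOUT = datetime.timedelta(days=1)


@dataclass
class JournalRow:
    """Hypothetical, simplified journal entry (not the real networking-odl model)."""
    object_type: str              # "network", "subnet" or "port"
    object_uuid: str
    operation: str                # this sketch handles "create" and "update" only
    created_at: datetime.datetime
    state: str = "failed"


class FakeODL:
    """In-memory stand-in for an ODL REST client (hypothetical)."""

    def __init__(self):
        self.store = {}

    def exists(self, rtype, rid):
        return (rtype, rid) in self.store

    def put(self, rtype, body):
        self.store[(rtype, body["id"])] = dict(body)


def parents_of(rtype, body):
    """Resources that must exist on ODL before `body` can be created there."""
    if rtype == "port":
        return [("network", body["network_id"])] + [
            ("subnet", ip["subnet_id"]) for ip in body.get("fixed_ips", [])]
    if rtype == "subnet":
        return [("network", body["network_id"])]
    return []


def ensure_on_odl(odl, neutron_db, rtype, rid):
    """If missing on ODL, create the resource from Neutron data, parents first."""
    if odl.exists(rtype, rid):
        return
    body = neutron_db[(rtype, rid)]
    for ptype, pid in parents_of(rtype, body):
        ensure_on_odl(odl, neutron_db, ptype, pid)
    odl.put(rtype, body)


def recover_failed_rows(rows, odl, neutron_db, now):
    """One maintenance pass over failed journal rows."""
    for row in rows:
        if row.state != "failed":
            continue
        if now - row.created_at > RECOVERY_TIMEOUT:
            row.state = "expired"  # final handling was left open in the meeting
            continue
        body = neutron_db[(row.object_type, row.object_uuid)]
        if odl.exists(row.object_type, row.object_uuid):
            odl.put(row.object_type, body)  # retry the update with current data
        else:
            ensure_on_odl(odl, neutron_db, row.object_type, row.object_uuid)
        row.state = "completed"
```

With the default 5-minute maintenance interval mentioned above, a one-day timeout gives a failed row roughly 288 recovery passes (24 * 60 / 5). What to do with an expired row (delete it, move it to another state, or leave it failed) was left open, so the sketch simply marks it.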
15:37:19 <asomya> sure
15:37:47 <mkolesni_> ok then let's wrap this up here
15:37:50 <asomya> paste link for bluejeans
15:38:14 <mkolesni_> #action mkolesni draft a spec for journal recovery
15:38:41 <mkolesni_> #link https://bluejeans.com/223556372
15:39:16 <mkolesni_> #endmeeting
15:39:19 <mkolesni_> #endmeeting