15:01:35 #startmeeting neutron_northbound
15:01:35 Meeting started Wed May 25 15:01:35 2016 UTC. The chair is mkolesni_. Information about MeetBot at http://ci.openstack.org/meetbot.html.
15:01:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:35 The meeting name has been set to 'neutron_northbound'
15:01:48 #link https://wiki.opendaylight.org/view/NeutronNorthbound:Meetings agenda page
15:02:00 #topic agenda bashing and roll call
15:02:13 #info mkolesni
15:02:51 do we have any suggestions for the meeting agenda?
15:04:03 i don't have anything in particular, except discussing partial sync if anyone is interested
15:04:20 #chair asomya
15:04:43 #topic Announcements
15:04:54 #topic Announcements
15:04:58 #chair asomya
15:04:58 Current chairs: asomya mkolesni_
15:05:09 no announcements
15:05:16 nope, none from me
15:05:38 mkolesni_: Discuss the partial sync here or in bluejeans?
15:06:03 asomya: wherever is more comfortable :)
15:06:09 asomya: do you have a preference?
15:06:30 wider audience here, we can go over the high level overview and discuss the finer points in bluejeans
15:06:38 sure
15:06:53 ok lets move on and get to it later
15:07:01 #topic Open Patch sets needing review/merging
15:07:28 does anyone have patches in need of review other than those Isaku sent in the email?
15:08:40 asomya: do you want to discuss the spec process or should we wait for Isaku?
15:09:06 wait for isaku, we haven't finalized the plan yet.. The rough idea is to have a lightweight spec for major features
15:09:29 ok
15:09:48 since no new bugs/patches to review today, moving on to the partial sync
15:10:16 #topic partial sync discussion
15:10:30 ok so we had some brainstorming today
15:10:49 basically we already discussed this a while ago and had some ideas
15:11:14 1. add a hash on both sides (ODL & Neutron) to lighten the payload
15:11:27 2. add a version column on both sides
15:11:56 the two approaches are valid, though the hash is a bit more complicated overall but can pay off at larger scale
15:12:16 asomya: any other option?
15:13:04 So we will have a hash per resource in neutron?
15:13:20 and a meta-hash of a collection of resource types, right?
15:13:26 yeah
15:13:39 it can be computed on demand or stored
15:13:56 this is more implementation details, what really matters is that an API change is necessary
15:14:16 Agreed
15:14:42 this is also true if we add a version for each resource
15:14:45 correct?
15:15:02 a version like a sequence number?
15:15:15 yes
15:15:48 correct, that'll work.. plus would make the hash computation a bit less expensive
15:16:19 who has the action to work on this?
15:16:47 currently nobody
15:17:22 ok, i don't mind signing up for it unless you want to do it :)
15:17:48 actually we thought an api is a bit expensive to add
15:18:02 also lets think of reasons why we want partial sync
15:18:19 1. resource went through journal but failed
15:18:29 2. someone messed up something on ODL side
15:18:45 thats what i can think of, any more?
15:19:27 yeah that's mostly it.
15:19:50 ok so the partial sync we discussed so far can solve them both
15:20:10 agreed
15:20:18 but it also comes with a pricetag that we need to have this extra info
15:20:39 which is not too awful except it means extra API that needs to be managed for this purpose alone
15:21:02 so what we thought is, maybe we can start smaller and simpler
15:21:33 our idea was instead of doing a partial sync that runs periodically and tries to compare both DBs all the time
15:21:43 lets use the tools that we already have
15:21:53 what do i mean? i'll explain
15:22:03 we have the journal table with entries
15:22:30 if an entry is completed that means "whoopee, everything works" (up to some extent)
15:22:52 if an entry is failed that means "damn, something doesnt work"
15:22:58 so something can be:
15:23:15 1. some error on ODL side that might be fixed if we retry later
15:23:25 2. some misalignment
15:23:35 for example, i tried to update a port
15:23:52 on Neutron everything is peachy and the update gets saved to the DB
15:24:15 on ODL someone deleted the network for the port, so i'll always have an error
15:24:46 so this is kind of misalignment either caused by some bug, or by human error
15:24:55 what we can do is "journal recovery"
15:25:11 this means that in the maintenance thread we add a new operation
15:25:28 this operation goes over failed rows and tries to recover them
15:25:43 so basically it will start probing ODL for data
15:26:06 so for the example i gave the operation checks in ODL if the port exists
15:26:32 if it does, retry the update (with the new port data)
15:26:52 if it doesnt, check if the subnet & network exist
15:26:59 if they do, create the port
15:27:06 if they dont, create them
15:27:49 this is of course just a small flow for port but it can be generalized to any resource type we need to support
15:28:12 this way we dont always go looking for trouble, but we react to trouble when we find it
15:28:19 makes sense?
15:28:22 mkolesni_: Retry is already in the driver for journal rows, this is for the maintenance run to go and rerun failed rows?
15:28:30 yeah
15:28:42 journal rows can only be retried a certain number of times
15:28:46 how many times do we do this until we mark a resource as 'unsyncable' ?
15:28:53 or forever?
15:29:11 im not sure we can reach this state since we will sync the resource tree
15:29:29 but we can set some expiration time
15:29:52 ok, as an interim measure i'm ok with this but in the long run we need to mark the resource failed in Neutron itself
15:30:06 obviously the details need to be hashed out in a spec since this is not very simple
15:30:33 well, we can mark it failed on Neutron only if it's possible
15:30:50 currently not quite so as we discussed earlier, only Port has a status
15:31:29 but we can set some sane value like "retry for 1 day" or something
15:31:32 yeah, that will take a while
15:32:11 and if it doesnt succeed then either delete it or move it to some other state or keep it in failed forever
15:32:26 but this can be decided later i suppose
15:32:36 what do you think of the general idea?
15:33:02 seems ok for now
15:33:29 as you said, details need to be hashed out
15:33:34 i think this will be beneficial in the short-middle term
15:33:55 since a full blown partial sync as we discussed earlier is:
15:34:05 * requiring an API on ODL
15:34:19 * has some overhead in complexity
15:34:33 from a use case perspective, i don't see the user waiting for a day for his/her VMs to come up.. even if neutron says the port is up and it's down.. most likely the user is going to delete and recreate it
15:34:40 * probably wasting CPU/bandwidth
15:35:40 probably, though the maintenance thread runs by default every 5 minutes, so in theory if everything is fine except his port it will be handled rather quickly
15:36:12 ok worth a shot i guess
15:36:44 the cloud admin can tweak this value obviously if it makes sense to him
15:37:05 do you want to discuss this further on bluejeans?
15:37:19 sure
15:37:47 ok then lets wrap this up here
15:37:50 paste link for bluejeans
15:38:14 #action mkolesni draft a spec for journal recovery
15:38:41 #link https://bluejeans.com/223556372
15:39:16 #endmeeting
15:39:19 #endmeeting
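
Note (not part of the log): the "journal recovery" flow for a failed port row, as described between 15:25 and 15:27, could be sketched roughly as below. This is only an illustration under stated assumptions, not the networking-odl implementation. FakeODL is a hypothetical in-memory stand-in for ODL, recover_port and the dict-shaped resources are invented names, and creating the port after its missing parents have been recreated is an assumed reading of "if they dont, create them".

```python
from dataclasses import dataclass, field


@dataclass
class FakeODL:
    """Hypothetical in-memory stand-in for ODL; not a real client API."""
    networks: dict = field(default_factory=dict)
    subnets: dict = field(default_factory=dict)
    ports: dict = field(default_factory=dict)


def recover_port(odl, port, subnet, network):
    """Try to recover a failed port journal row; return the actions taken."""
    actions = []
    # The port exists in ODL: retry the update with the current Neutron data.
    if port["id"] in odl.ports:
        odl.ports[port["id"]] = port
        actions.append("update_port")
        return actions
    # The port is missing: make sure its parents exist, creating them if not.
    if network["id"] not in odl.networks:
        odl.networks[network["id"]] = network
        actions.append("create_network")
    if subnet["id"] not in odl.subnets:
        odl.subnets[subnet["id"]] = subnet
        actions.append("create_subnet")
    # Parents are now in place, so the port itself can be created (assumed step).
    odl.ports[port["id"]] = port
    actions.append("create_port")
    return actions
```

A maintenance operation would call something like this for each failed row, passing the current Neutron view of the port and its parents. The same probe-then-repair shape could be repeated for other resource types, and a row could be given up on after whatever expiration time the spec settles on.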