15:01:35 <mkolesni_> #startmeeting neutron_northbound
15:01:35 <odl_meetbot> Meeting started Wed May 25 15:01:35 2016 UTC.  The chair is mkolesni_. Information about MeetBot at http://ci.openstack.org/meetbot.html.
15:01:35 <odl_meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:35 <odl_meetbot> The meeting name has been set to 'neutron_northbound'
15:01:48 <mkolesni_> #link https://wiki.opendaylight.org/view/NeutronNorthbound:Meetings agenda page
15:02:00 <mkolesni_> #topic agenda bashing and roll call
15:02:13 <mkolesni> #info mkolesni
15:02:51 <mkolesni> does anyone have suggestions for the meeting agenda?
15:04:03 <mkolesni> i don't have anything in particular, except discussing partial sync if anyone is interested
15:04:20 <mkolesni> #chair asomya
15:04:43 <mkolesni> #topic Announcements
15:04:54 <mkolesni_> #topic Announcements
15:04:58 <mkolesni_> #chair asomya
15:04:58 <odl_meetbot> Current chairs: asomya mkolesni_
15:05:09 <mkolesni_> no announcements
15:05:16 <asomya> nope, none from me
15:05:38 <asomya> mkolesni_: Discuss the partial sync here or in bluejeans?
15:06:03 <mkolesni_> asomya: wherever is more comfortable :)
15:06:09 <mkolesni_> asomya: do you have a preference?
15:06:30 <asomya> wider audience here, we can go over the high level overview and discuss the finer points in bluejeans
15:06:38 <mkolesni_> sure
15:06:53 <mkolesni_> ok let's move on and get to it later
15:07:01 <mkolesni_> #topic  Open Patch sets needing review/merging
15:07:28 <mkolesni_> does anyone have patches in need of review other than those Isaku sent in the email?
15:08:40 <mkolesni_> asomya: do you want to discuss the spec process or should we wait for Isaku?
15:09:06 <asomya> wait for isaku, we haven't finalized the plan yet.. The rough idea is to have a lightweight spec for major features
15:09:29 <mkolesni_> ok
15:09:48 <mkolesni_> since no new bugs/patches to review today, moving on to the partial sync
15:10:16 <mkolesni_> #topic partial sync discussion
15:10:30 <mkolesni_> ok so we had some brainstorming today
15:10:49 <mkolesni_> basically we already discussed this a while ago and had some ideas
15:11:14 <mkolesni_> 1. add a hash on both sides (ODL & Neutron) to lighten the payload
15:11:27 <mkolesni_> 2. add a version column on both sides
15:11:56 <mkolesni_> the two approaches are valid, though the hash is a bit more complicated overall but can pay off at larger scale
15:12:16 <mkolesni_> asomya: any other option?
15:13:04 <asomya> So we will have a hash per resource in neutron?
15:13:20 <asomya> and a meta-hash of a collection of resource types, right?
15:13:26 <mkolesni_> yeah
15:13:39 <mkolesni_> it can be computed on demand or stored
15:13:56 <mkolesni_> these are implementation details, what really matters is that an API change is necessary
15:14:16 <asomya> Agreed
15:14:42 <mkolesni_> this is also true if we add a version for each resource
15:14:45 <mkolesni_> correct?
15:15:02 <asomya> a version like a sequence number?
15:15:15 <mkolesni_> yes
15:15:48 <asomya> correct, that'll work.. plus would make the hash computation a bit less expensive
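The hash idea above (one hash per resource plus a meta-hash per collection) can be sketched as follows. This is only an illustration under assumptions: the function names, the choice of SHA-256, and the JSON canonicalization are hypothetical and were not decided in the meeting.

```python
import hashlib
import json


def resource_hash(resource):
    """Hash one resource dict; key order must not affect the digest."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def collection_hash(resources):
    """Meta-hash of a collection: hash the sorted per-resource digests."""
    digest = hashlib.sha256()
    for item in sorted(resource_hash(r) for r in resources):
        digest.update(item.encode("ascii"))
    return digest.hexdigest()


def out_of_sync_ids(neutron_side, odl_side):
    """Return ids whose hashes differ or that exist on one side only."""
    left = {r["id"]: resource_hash(r) for r in neutron_side}
    right = {r["id"]: resource_hash(r) for r in odl_side}
    all_ids = left.keys() | right.keys()
    return sorted(i for i in all_ids if left.get(i) != right.get(i))


# Hypothetical data: both sides agree on p1, but p2 has drifted on ODL.
neutron = [{"id": "p1", "name": "a"}, {"id": "p2", "name": "b"}]
odl = [{"name": "a", "id": "p1"}, {"id": "p2", "name": "changed"}]

if collection_hash(neutron) != collection_hash(odl):
    print(out_of_sync_ids(neutron, odl))
```

Comparing the collection hash first keeps the payload small when both sides agree; the per-resource comparison is only needed when the meta-hash differs.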
15:16:19 <asomya> who has the action to work on this?
15:16:47 <mkolesni_> currently nobody
15:17:22 <asomya> ok, i don't mind signing up for it unless you want to do it :)
15:17:48 <mkolesni_> actually we thought api is a bit expensive to add
15:18:02 <mkolesni_> also let's think of reasons why we want partial sync
15:18:19 <mkolesni_> 1. resource went through journal but failed
15:18:29 <mkolesni_> 2. someone messed up something on ODL side
15:18:45 <mkolesni_> that's what i can think of, any more?
15:19:27 <asomya> yeah that's mostly it.
15:19:50 <mkolesni_> ok so the partial sync we discussed so far can solve them both
15:20:10 <asomya> agreed
15:20:18 <mkolesni_> but it also comes with a price tag: we need to have this extra info
15:20:39 <mkolesni_> which is not too awful, except it means an extra API that needs to be managed for this purpose alone
15:21:02 <mkolesni_> so what we thought is, maybe we can start smaller and simpler
15:21:33 <mkolesni_> our idea was, instead of doing a partial sync that runs periodically and tries to compare both DBs all the time
15:21:43 <mkolesni_> let's use the tools that we already have
15:21:53 <mkolesni_> what do i mean? i'll explain
15:22:03 <mkolesni_> we have the journal table with entries
15:22:30 <mkolesni_> if an entry is completed that means "whoopee, everything works" (to some extent)
15:22:52 <mkolesni_> if an entry is failed that means "damn, something doesn't work"
15:22:58 <mkolesni_> so something can be:
15:23:15 <mkolesni_> 1. some error on ODL side that might be fixed if we retry later
15:23:25 <mkolesni_> 2. some misalignment
15:23:35 <mkolesni_> for example, i tried to update a port
15:23:52 <mkolesni_> on Neutron everything is peachy and the update gets saved to the DB
15:24:15 <mkolesni_> on ODL someone deleted the network for the port, so i'll always have an error
15:24:46 <mkolesni_> so this is a kind of misalignment either caused by some bug, or by human error
15:24:55 <mkolesni_> what we can do is "journal recovery"
15:25:11 <mkolesni_> this means that in the maintenance thread we add a new operation
15:25:28 <mkolesni_> this operation goes over failed rows and tries to recover them
15:25:43 <mkolesni_> so basically it will start probing ODL for data
15:26:06 <mkolesni_> so for the example i gave the operation checks in ODL if the port exists
15:26:32 <mkolesni_> if it does, retry the update (with the new port data)
15:26:52 <mkolesni_> if it doesn't, check if the subnet & network exist
15:26:59 <mkolesni_> if they do, create the port
15:27:06 <mkolesni_> if they don't, create them
15:27:49 <mkolesni_> this is of course just a small flow for port but it can be generalized to any resource type we need to support
15:28:12 <mkolesni_> this way we don't always go looking for trouble, but we react to trouble when we find it
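The port recovery flow described above might look roughly like the sketch below. Everything in it is an assumption for illustration: `FakeOdl` is an in-memory stand-in rather than a real ODL client, `recover_port` is a hypothetical name, and the port is simplified to carry top-level `network_id` and `subnet_id` keys. It also assumes the port is created once any missing parents have been created, which the discussion implies but does not state.

```python
class FakeOdl:
    """In-memory stand-in for ODL; a real client would issue REST calls."""

    def __init__(self):
        self.store = {"network": {}, "subnet": {}, "port": {}}

    def exists(self, kind, res_id):
        return res_id in self.store[kind]

    def create(self, kind, data):
        self.store[kind][data["id"]] = dict(data)

    def update(self, kind, data):
        self.store[kind][data["id"]] = dict(data)


def recover_port(odl, neutron_db, port_id):
    """Recover a failed port update using the current Neutron data.

    Returns the list of actions taken, in order.
    """
    port = neutron_db["port"][port_id]
    if odl.exists("port", port_id):
        # Port is known to ODL: retry the update with the latest data.
        odl.update("port", port)
        return [("update", "port", port_id)]

    actions = []
    # Port is missing on ODL: make sure its parents exist first.
    for kind, key in (("network", "network_id"), ("subnet", "subnet_id")):
        parent_id = port[key]
        if not odl.exists(kind, parent_id):
            odl.create(kind, neutron_db[kind][parent_id])
            actions.append(("create", kind, parent_id))
    odl.create("port", port)
    actions.append(("create", "port", port_id))
    return actions


# Hypothetical Neutron state; ODL starts empty, as if someone wiped it.
neutron_db = {
    "network": {"n1": {"id": "n1"}},
    "subnet": {"s1": {"id": "s1", "network_id": "n1"}},
    "port": {"p1": {"id": "p1", "network_id": "n1", "subnet_id": "s1"}},
}
odl = FakeOdl()
first_pass = recover_port(odl, neutron_db, "p1")
second_pass = recover_port(odl, neutron_db, "p1")
```

Walking parents before the child is what would let the same pattern extend to other resource types: each type only needs to declare which parents it depends on.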
15:28:19 <mkolesni_> makes sense?
15:28:22 <asomya> mkolesni_: Retry is already in the driver for journal rows, this is for the maintenance run to go and rerun failed rows?
15:28:30 <mkolesni_> yeah
15:28:42 <mkolesni_> journal rows can only be retried a certain number of times
15:28:46 <asomya> how many times do we do this until we mark a resource as 'unsyncable' ?
15:28:53 <asomya> or forever?
15:29:11 <mkolesni_> i'm not sure we can reach this state since we will sync the resource tree
15:29:29 <mkolesni_> but we can set some expiration time
15:29:52 <asomya> ok, as an interim measure i'm ok with this but in the long run we need to mark the resource failed in Neutron itself
15:30:06 <mkolesni_> obviously the details need to be hashed out on a spec since this is not very simple
15:30:33 <mkolesni_> well, we can mark it failed on Neutron only if it's possible
15:30:50 <mkolesni_> currently not quite so as we discussed earlier, only Port has a status
15:31:29 <mkolesni_> but we can set some sane value like "retry for 1 day" or something
15:31:32 <asomya> yeah, that will take a while
15:32:11 <mkolesni_> and if it doesn't succeed then either delete it or move it to some other state or keep it in failed forever
15:32:26 <mkolesni_> but this can be decided later i suppose
15:32:36 <mkolesni_> what do you think of the general idea?
15:33:02 <asomya> seems ok for now
15:33:29 <asomya> as you said, details need to be hashed out
15:33:34 <mkolesni_> i think this will be beneficial in the short-middle term
15:33:55 <mkolesni_> since a full-blown partial sync as we discussed earlier:
15:34:05 <mkolesni_> * requires an API on ODL
15:34:19 <mkolesni_> * has some overhead in complexity
15:34:33 <asomya> from a use case perspective, i don't see the user waiting for a day for his/her VMs to come up.. even if neutron says the port is up and it's down.. most likely the user is going to delete and recreate it
15:34:40 <mkolesni_> * probably wastes CPU/bandwidth
15:35:40 <mkolesni_> probably, though the maintenance thread runs every 5 minutes by default, so in theory if everything is fine except his port it will be handled rather quickly
15:36:12 <asomya> ok worth a shot i guess
15:36:44 <mkolesni_> the cloud admin can tweak this value obviously if it makes sense to him
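The expiration idea ("retry for 1 day") combined with the 5 minute maintenance interval can be sketched as a simple time-window check. The names and the exact policy are hypothetical; the meeting left the details to the spec.

```python
from datetime import datetime, timedelta

# Values quoted in the discussion; both are assumed to be tunable.
RETRY_WINDOW = timedelta(days=1)
MAINTENANCE_INTERVAL = timedelta(minutes=5)


def should_retry(first_failed_at, now, window=RETRY_WINDOW):
    """Keep retrying a failed journal row until the window has elapsed."""
    return now - first_failed_at < window


def max_attempts(window=RETRY_WINDOW, interval=MAINTENANCE_INTERVAL):
    """Upper bound on recovery attempts within one retry window."""
    return window // interval


failed_at = datetime(2016, 5, 25, 15, 0, 0)
still_retrying = should_retry(failed_at, failed_at + timedelta(hours=23))
expired = not should_retry(failed_at, failed_at + timedelta(days=1))
attempts = max_attempts()
```

What happens to a row once the window expires (delete it, move it to another state, or leave it failed) is deliberately not modeled here, since that was left open.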
15:37:05 <mkolesni_> do you want to discuss this further on bluejeans?
15:37:19 <asomya> sure
15:37:47 <mkolesni_> ok then let's wrap this up here
15:37:50 <asomya> paste link for bluejeans
15:38:14 <mkolesni_> #action mkolesni draft a spec for journal recovery
15:38:41 <mkolesni_> #link https://bluejeans.com/223556372
15:39:16 <mkolesni_> #endmeeting