15:59:59 <rgoulding> #startmeeting kernel projects
15:59:59 <odl_meetbot> Meeting started Tue Jul 17 15:59:59 2018 UTC.  The chair is rgoulding. Information about MeetBot at http://ci.openstack.org/meetbot.html.
15:59:59 <odl_meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:59:59 <odl_meetbot> The meeting name has been set to 'kernel_projects'
16:00:03 <rgoulding> #topic agenda bashing
16:06:26 <rgoulding> #topic clustering status quo
16:07:05 <rgoulding> #link https://jira.opendaylight.org/browse/MDSAL-362
16:07:17 <rgoulding> #info vpicard saw another occurrence of this that was just slightly different
16:07:40 <rgoulding> #info appears to be a deadlock, but slightly different from the original one
16:08:10 <rgoulding> #action rovarga likely to look at this one by the end of the week (Friday target)
16:08:14 <rgoulding> #info he has an idea why this is happening
16:08:25 <rgoulding> #link https://jira.opendaylight.org/browse/CONTROLLER-1845
16:09:01 <rgoulding> #info so far vpicard has not been able to reproduce this after he added a patch to take a thread dump when the netstat condition is met
16:09:13 <rgoulding> #info likely to deprioritize this
16:09:19 <rgoulding> #info since it is not reproducible
16:09:54 <rgoulding> #info Jamo working on two bugs in genius/netvirt/openflowplugin
16:10:17 <rgoulding> #info 1) unhealthy RESTCONF 401 unauthorized 2) cluster unhealthy
16:13:02 <rgoulding> #info likely that the first issue is solved in master and stable/oxygen
16:13:03 <rgoulding> #link https://jira.opendaylight.org/browse/CONTROLLER-1768
16:13:25 <rgoulding> #info “Shard leaders failed to settle in 90 seconds, giving up”
16:13:29 <rgoulding> #info happens intermittently
16:13:55 <rgoulding> #info jamoluhrsen can reproduce pretty reliably
16:14:41 <rgoulding> #info shague asks when do we actually think we will end up getting to tell-based?  ever?
16:14:54 <rgoulding> #info it appears we are focusing on ask-based right now
16:15:17 <rgoulding> #info the long term intention is to go to that because it promises more resiliency, but there aren’t enough cycles to get there in the short term
16:16:58 <rgoulding> #info jamoluhrsen asks what happens when we have java transaction timeouts?  it's up to the application to do the retries
16:17:09 <rgoulding> #info so there are timing and race conditions that will need to be fixed
16:17:45 <rgoulding> #info faseela asks whether you should retry a transaction when DS is unavailable (some uncertainty whether it is AskTimeoutException or DatabaseUnavailableException)
16:18:19 <rgoulding> #info tpantelis says we may want to push towards enabling tell based over fixing applications
16:18:44 <rgoulding> #info rovarga brings up that no one knows what happens during an AskTimeoutException (ATE), because it happens during 3PC and what happens next is inconsistent
16:18:54 <rgoulding> #info the state of the transaction is unknown (may be committed, may not)
16:19:09 <rgoulding> #info the only way to figure this out on App side is to do a Read and then start resyncing
16:19:18 <rgoulding> #info that is quite a bit of work that probably shouldn’t be done from the application side
16:19:53 <rgoulding> #info rovarga states we should put forth the effort to switch to tell-based protocol where these problems aren’t an issue (or as much of an issue)
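The read-then-resync recovery described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not OpenDaylight code: `Store`, `AckTimeout`, `FlakyStore`, and `writeAndReconcile` are hypothetical names invented here, and the read-back check is a naive value comparison that ignores concurrent writers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class UnknownOutcomeSketch {
    /** Hypothetical unchecked stand-in for AskTimeoutException. */
    static class AckTimeout extends RuntimeException {
        AckTimeout(String message) { super(message); }
    }

    /** Hypothetical datastore facade; not an OpenDaylight API. */
    interface Store {
        void write(String key, String value);
        Optional<String> read(String key);
    }

    enum Outcome { COMMITTED, CONFIRMED_BY_READ, RESUBMITTED }

    /** Writes once; on timeout, reads back to learn whether the write landed. */
    static Outcome writeAndReconcile(Store store, String key, String value) {
        try {
            store.write(key, value);
            return Outcome.COMMITTED;
        } catch (AckTimeout e) {
            // The outcome is unknown here: the write may or may not have been applied.
            boolean landed = store.read(key).filter(value::equals).isPresent();
            if (landed) {
                return Outcome.CONFIRMED_BY_READ;
            }
            store.write(key, value);
            return Outcome.RESUBMITTED;
        }
    }

    /** In-memory fake that times out on the first write, after or before applying it. */
    static class FlakyStore implements Store {
        private final Map<String, String> data = new HashMap<>();
        private final boolean applyBeforeTimeout;
        private boolean failedOnce;

        FlakyStore(boolean applyBeforeTimeout) {
            this.applyBeforeTimeout = applyBeforeTimeout;
        }

        @Override
        public void write(String key, String value) {
            if (!failedOnce) {
                failedOnce = true;
                if (applyBeforeTimeout) {
                    data.put(key, value);
                }
                throw new AckTimeout("no reply within timeout");
            }
            data.put(key, value);
        }

        @Override
        public Optional<String> read(String key) {
            return Optional.ofNullable(data.get(key));
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndReconcile(new FlakyStore(true), "k", "v"));
        System.out.println(writeAndReconcile(new FlakyStore(false), "k", "v"));
    }
}
```

Even in this toy form, every write path needs its own reconcile step, which is the application-side burden the discussion is pointing at.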
16:21:47 <rgoulding> #info should we lower the timeout from 30s?
16:21:54 <rgoulding> #info rovarga says this can easily happen during GC
16:22:07 <rgoulding> #info so be careful around making assumptions about this since a major collection with a huge heap can take minutes
16:22:58 <rgoulding> #info during AskTimeoutException the communication between the backend and frontend is broken
16:27:26 <rgoulding> #info depends on application
16:27:40 <rgoulding> #info rovarga brings up the fact that the data is replicated in the peer and can be recovered from that third party
16:27:54 <rgoulding> #info since it can converge in a couple of seconds, recovery takes ~1 minute without a ton of retry logic
16:28:54 <rgoulding> #info mapping uint64 to BigInteger
16:29:03 <rgoulding> #info asked on mailing list
16:29:18 <rgoulding> #info any time there is logging, counters, or stats, the toString() conversion is expensive
16:29:30 <rgoulding> #info is there a more fixed data type (yang) that they can use for this?
16:30:15 <rgoulding> #info the recommendation is to minimize conversion when possible
16:31:02 <rgoulding> #info and possibly using a separate appender for logging
16:31:29 <rgoulding> #info there are a slew of types long term that will come post-Neon that will require breaking binding-spec return types
16:31:48 <rgoulding> #info either that or incur the cost in the binding adapter
16:31:55 <rgoulding> #info but then everyone will pay the conversion cost
16:32:01 <rgoulding> #info this calls for a two-step approach
16:32:08 <rgoulding> #info BigInteger is hard to convert to/from
16:33:02 <rgoulding> #link https://lists.opendaylight.org/pipermail/yangtools-dev/2018-July/002264.html
16:33:53 <rgoulding> #info to adopt this without paying the performance price, we will have to break everyone (will require planning)
16:33:59 <rgoulding> #info it is a hard trade-off
16:34:24 <rgoulding> #info this may be easier to do when md-sal is MRI
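To make the conversion cost concrete, here is a small Java sketch of carrying a uint64 in a primitive `long` and converting to `BigInteger` only at the boundary. It uses only standard JDK calls (`Long.toUnsignedString`, `BigInteger.valueOf`, `BigInteger.longValue`); the class and method names are hypothetical, and this is not the representation proposed on the mailing list.

```java
import java.math.BigInteger;

public class Uint64Sketch {
    private static final BigInteger TWO_POW_64 = BigInteger.ONE.shiftLeft(64);

    /** Interprets the 64 bits of a long as an unsigned value. */
    static BigInteger toBigInteger(long bits) {
        BigInteger signed = BigInteger.valueOf(bits);
        // Negative longs represent values in the upper half of the uint64 range.
        return bits >= 0 ? signed : signed.add(TWO_POW_64);
    }

    /** Packs a BigInteger in [0, 2^64) into the 64 bits of a long. */
    static long fromBigInteger(BigInteger value) {
        if (value.signum() < 0 || value.bitLength() > 64) {
            throw new IllegalArgumentException("out of uint64 range: " + value);
        }
        // longValue() keeps the low-order 64 bits, which is what we want here.
        return value.longValue();
    }

    /** Renders the unsigned value for logs without allocating a BigInteger. */
    static String render(long bits) {
        return Long.toUnsignedString(bits);
    }

    public static void main(String[] args) {
        long max = -1L; // all 64 bits set
        System.out.println(render(max));
        System.out.println(toBigInteger(max));
        System.out.println(fromBigInteger(toBigInteger(max)) == max);
    }
}
```

Counters and stats can stay on the `long` path and use the `Long.*Unsigned` helpers, so the `BigInteger` allocation happens only where an API demands it.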
16:38:37 <rgoulding> #topic modular models
16:38:54 <rovarga> #link https://git.opendaylight.org/gerrit/#/q/topic:modular-models+(status:open+OR+status:merged)
16:39:31 <rgoulding> #info instead of only odl-mdsal-models (which includes 20-25 models), there are now more granular features so you can request more specific models
16:39:51 <rgoulding> #info the idea is to kill the meta-feature afterwards to help improve CSIT times
16:40:17 <rgoulding> #topic odlparent 3.1.3
16:40:23 <rgoulding> #info Oxygen is on 3.1.1
16:40:31 <rgoulding> #info yangtools 2.0.5
16:41:15 <rgoulding> #info need to roll out 2.0.7 or 2.0.8 in oxygen
16:41:23 <rgoulding> #info some models in downstreams will need to be fixed up using cherry-picks
16:41:42 <rgoulding> #info in order to do this we also need to adapt to odlparent 3.1.2 or 3.1.3 for upgraded guava dependencies
16:41:52 <rgoulding> #info skitt points out also to utilize consistent versions in our releases
16:42:43 <rgoulding> #info skitt says release notes for 3.1.3 are ready and he is running a multi-patch build
16:42:52 <rgoulding> #info he started it 4 hours ago and it still hasn’t been queued
16:43:00 <rgoulding> #info includes munging of xtend plugin
16:43:06 <rgoulding> #undo
16:43:06 <odl_meetbot> Removing item from minutes: <MeetBot.ircmeeting.items.Info object at 0x2c40290>
16:44:09 <rgoulding> #info skitt says there will be a bunch of project-specific patches to adapt to 3.1.3
16:45:22 <rgoulding> #topic odlparent 4.0.0 timeline
16:45:28 <rgoulding> #info mid-August
16:45:55 <rgoulding> #info if there is stuff you want, then get it in!
16:46:08 <rgoulding> #info it is going to include a karaf upgrade, so we will need all the runway we can get
16:46:25 <rgoulding> #topic SyncStatus stays false for more than 5 minutes after bringing 2 of 3 nodes down and back up.
16:46:33 <rgoulding> #link https://jira.opendaylight.org/browse/CONTROLLER-1768
16:46:45 <rgoulding> #info this is happening with just one node too, according to jamoluhrsen
16:48:56 <rgoulding> #info 401 was happening before we were doing the datastore read for AuthZ
16:49:02 <rgoulding> #info and jolokia one fixed now
16:49:11 <rgoulding> #info there should no longer be 401s
16:50:53 <rgoulding> #info luis is continuing to see this as of 29 minutes ago
16:51:08 <rgoulding> #info was this tried with the seed-node-timeout of 30s?
16:52:33 <rgoulding> #info tpantelis is saying that CONTROLLER-1849 401 exception may have been due to CONTROLLER-1768 and we may see more now
16:53:01 <rgoulding> #info so let's forget 401 and focus on sync status staying false
16:53:15 <rgoulding> #info actually, we still see 401
16:57:08 <rgoulding> #info two questions: 1) is the node rejoining, and has it rejoined? 2) did the CDTCL come alive?
16:57:58 <rgoulding> #link https://github.com/opendaylight/aaa/blob/7e7cd43a637a5b01510b0af9cac770b06d380d82/aaa-shiro/impl/src/main/resources/initial/aaa-app-config.xml#L313
16:59:36 <rgoulding> #action tpantelis push patch to get rid of dynamicAuthorization
17:00:23 <rgoulding> #info will unmask the issue
17:02:18 <rgoulding> #endmeeting