15:59:59 <rgoulding> #startmeeting kernel projects
15:59:59 <odl_meetbot> Meeting started Tue Jul 17 15:59:59 2018 UTC. The chair is rgoulding. Information about MeetBot at http://ci.openstack.org/meetbot.html.
15:59:59 <odl_meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:59:59 <odl_meetbot> The meeting name has been set to 'kernel_projects'
16:00:03 <rgoulding> #topic agenda bashing
16:06:26 <rgoulding> #topic clustering status quo
16:07:05 <rgoulding> #link https://jira.opendaylight.org/browse/MDSAL-362
16:07:17 <rgoulding> #info vpicard saw another occurrence of this that was just slightly different
16:07:40 <rgoulding> #info appears to be a deadlock, but slightly different than the original one
16:08:10 <rgoulding> #action rovarga likely to look at this one by the end of the week (Friday target)
16:08:14 <rgoulding> #info he has an idea why this is happening
16:08:25 <rgoulding> #link https://jira.opendaylight.org/browse/CONTROLLER-1845
16:09:01 <rgoulding> #info so far vpicard has not been able to reproduce this after he added patch to do threaddump when netstat condition met
16:09:13 <rgoulding> #info likely to deprioritize this
16:09:19 <rgoulding> #info since it is not reproducible
16:09:54 <rgoulding> #info Jamo working on two bugs in genius/netvirt/openflowplugin
16:10:17 <rgoulding> #info 1) unhealthy RESTCONF 401 unauthorized 2) cluster unhealthy
16:13:02 <rgoulding> #info likely that the first issue is solved in master and stable/oxygen
16:13:03 <rgoulding> #link https://jira.opendaylight.org/browse/CONTROLLER-1768
16:13:25 <rgoulding> #info “Shard leaders failed to settle in 90 seconds, giving up”
16:13:29 <rgoulding> #info happens intermittently
16:13:55 <rgoulding> #info jamoluhrsen can reproduce pretty reliably
16:14:41 <rgoulding> #info shague asks when do we actually think we will end up getting to tell-based? ever?
16:14:54 <rgoulding> #info it appears we are focusing on ask-based right now
16:15:17 <rgoulding> #info the long term intention is to go to that because it promises more resiliency, but there aren’t enough cycles to get there in the short term
16:16:58 <rgoulding> #info jamoluhrsen asks what happens when we have java transaction timeouts? it's up to the application to do the retries
16:17:09 <rgoulding> #info so there are timing and race conditions that will need to be fixed
16:17:45 <rgoulding> #info faseela asks whether you should retry a transaction when DS is unavailable (some uncertainty whether it is AskTimeoutException or DatabaseUnavailableException)
16:18:19 <rgoulding> #info tpantelis says we may want to push towards enabling tell-based over fixing applications
16:18:44 <rgoulding> #info rovarga brings up that no one knows what happens during ATE due to fact that it happens in 3PC and there is inconsistency in what happens
16:18:54 <rgoulding> #info the state of the transaction is unknown (may be committed, may not)
16:19:09 <rgoulding> #info the only way to figure this out on App side is to do a Read and then start resyncing
16:19:18 <rgoulding> #info that is quite a bit of work that probably shouldn’t be done from the application side
16:19:53 <rgoulding> #info rovarga states we should put forth the effort to switch to tell-based protocol where these problems aren’t an issue (or as much of an issue)
16:21:47 <rgoulding> #info should we lower the timeout from 30s?
16:21:54 <rgoulding> #info rovarga says this can easily happen during GC
16:22:07 <rgoulding> #info so be careful around making assumptions about this since a major collection with a huge heap can take minutes
16:22:58 <rgoulding> #info during AskTimeoutException the comm between the backend and frontend is broken
16:27:26 <rgoulding> #info depends on application
16:27:40 <rgoulding> #info rovarga brings up the fact that the data is replicated in the peer and can be recovered from that third party
16:27:54 <rgoulding> #info since it can converge in a couple of seconds, then the recovery is in ~1 minute without a ton of retry logic
16:28:54 <rgoulding> #info mapping uint64 to BigInteger
16:29:03 <rgoulding> #info asked on mailing list
16:29:18 <rgoulding> #info anytime there is logging, counter, stats, the conversion toString() is expensive
16:29:30 <rgoulding> #info is there a more fixed data type (yang) that they can use for this?
16:30:15 <rgoulding> #info the recommendation is to minimize conversion when possible
16:31:02 <rgoulding> #info and using a separate appender for logging possibly
16:31:29 <rgoulding> #info there are a slew of types long term that will come post-Neon that will require breaking binding-spec return types
16:31:48 <rgoulding> #info either that or incur the cost in the binding adapter
16:31:55 <rgoulding> #info but then everyone will pay the conversion cost
16:32:01 <rgoulding> #info begs a two-step approach
16:32:08 <rgoulding> #info BigInteger is hard to convert to/from
16:33:02 <rgoulding> #link https://lists.opendaylight.org/pipermail/yangtools-dev/2018-July/002264.html
16:33:53 <rgoulding> #info to adopt this and not pay performance price, then we will have to break everyone (will require planning)
16:33:59 <rgoulding> #info it is a hard trade-off
16:34:24 <rgoulding> #info this may be easier to do when md-sal is MRI
16:38:37 <rgoulding> #topic modular models
16:38:54 <rovarga> #link https://git.opendaylight.org/gerrit/#/q/topic:modular-models+(status:open+OR+status:merged)
16:39:31 <rgoulding> #info instead of odl-mdsal-models (which includes 20-25 models) now has more granular features so you can request more specific models
16:39:51 <rgoulding> #info the idea is to kill the meta-feature afterwards to help improve CSIT times
16:40:17 <rgoulding> #topic odlparent 3.1.3
16:40:23 <rgoulding> #info Oxygen is on 3.1.1
16:40:31 <rgoulding> #info yangtools 2.0.5
16:41:15 <rgoulding> #info need to roll out 2.0.7 or 2.0.8 in oxygen
16:41:23 <rgoulding> #info some models in downstreams will need to be fixed up using cherry-picks
16:41:42 <rgoulding> #info in order to do this we also need to adapt odlparent 3.1.2 or 3.1.3 for upgraded guava dependencies
16:41:52 <rgoulding> #info skitt points out also to utilize consistent versions in our releases
16:42:43 <rgoulding> #info skitt says release notes for 3.1.3 are ready and he is running a multi-patch build
16:42:52 <rgoulding> #info he started 4 hours ago and still hasn’t been queued yet
16:43:00 <rgoulding> #info includes munging of xtend plugin
16:43:06 <rgoulding> #undo
16:43:06 <odl_meetbot> Removing item from minutes: <MeetBot.ircmeeting.items.Info object at 0x2c40290>
16:44:09 <rgoulding> #info skitt there will be a bunch of project specific patches to adapt 3.1.3
16:45:22 <rgoulding> #topic odlparent 4.0.0 timeline
16:45:28 <rgoulding> #info mid-august
16:45:55 <rgoulding> #info if there is stuff you want, then get it in!
16:46:08 <rgoulding> #info it is going to include a karaf upgrade, so we will need all the runway we can get
16:46:25 <rgoulding> #topic SyncStatus stays false for more than 5 minutes after bringing 2 of 3 nodes down and back up.
16:46:33 <rgoulding> #link https://jira.opendaylight.org/browse/CONTROLLER-1768
16:46:45 <rgoulding> #info this is happening with just one node too, according to jamoluhrsen
16:48:56 <rgoulding> #info 401 was happening before we were doing the datastore read for AuthZ
16:49:02 <rgoulding> #info and jolokia one fixed now
16:49:11 <rgoulding> #info there should no longer be 401s
16:50:53 <rgoulding> #info luis is continuing to see this as of 29 minutes ago
16:51:08 <rgoulding> #info was this tried with the seed-node-timeout of 30s?
16:52:33 <rgoulding> #info tpantelis is saying that CONTROLLER-1849 401 exception may have been due to CONTROLLER-1768 and we may see more now
16:53:01 <rgoulding> #info so let's forget 401 and focus on sync status staying false
16:53:15 <rgoulding> #info actually, we still see 401
16:57:08 <rgoulding> #info two questions 1) is node rejoining and has rejoined 2) did the CDTCL come alive
16:57:58 <rgoulding> #link https://github.com/opendaylight/aaa/blob/7e7cd43a637a5b01510b0af9cac770b06d380d82/aaa-shiro/impl/src/main/resources/initial/aaa-app-config.xml#L313
16:59:36 <rgoulding> #action tpantelis push patch to get rid of dynamicAuthorization
17:00:23 <rgoulding> #info will unmask the issue
17:02:18 <rgoulding> #endmeeting
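
Editor's note on the "mapping uint64 to BigInteger" item in the log above: the minutes recommend minimizing conversions because formatting the value for logs, counters, and stats is expensive. The following is only a sketch of one way to do that with plain JDK calls, assuming the uint64 value can be carried as the 64 bits of a `long`. The class `Uint64Sketch` and its method names are hypothetical and are not OpenDaylight or yangtools APIs; the meeting did not settle on this approach.

```java
import java.math.BigInteger;

// Hypothetical helper for illustration only; not an OpenDaylight or yangtools API.
final class Uint64Sketch {
    private Uint64Sketch() {
    }

    // Format a uint64 carried in a long without allocating a BigInteger.
    static String toDecimal(long bits) {
        return Long.toUnsignedString(bits);
    }

    // Convert to BigInteger only at a boundary that really requires it.
    static BigInteger toBigInteger(long bits) {
        if (bits >= 0) {
            return BigInteger.valueOf(bits);
        }
        // Top bit is set: clear the sign bit, then add 2^63 back.
        return BigInteger.valueOf(bits & Long.MAX_VALUE).setBit(63);
    }

    // Convert back, rejecting values outside 0..2^64-1.
    static long fromBigInteger(BigInteger value) {
        if (value.signum() < 0 || value.bitLength() > 64) {
            throw new IllegalArgumentException("not a uint64: " + value);
        }
        return value.longValue();
    }

    public static void main(String[] args) {
        long max = -1L; // all 64 bits set, read as 2^64 - 1 when unsigned
        System.out.println(toDecimal(max));
        System.out.println(toBigInteger(max));
        // Unsigned comparison also stays on primitives.
        System.out.println(Long.compareUnsigned(max, 1L) > 0);
    }
}
```

With this shape, hot paths such as logging and counters would stay on primitives, and the `BigInteger` allocation would happen only where an existing signature demands it. Whether that fits the binding-spec constraints raised in the meeting is an open question that the sketch does not answer.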