======================================
#opendaylight-meeting: kernel projects
======================================


Meeting started by rgoulding at 15:59:59 UTC. The full logs are available at http://meetings.opendaylight.org/opendaylight-meeting/2018/kernel_projects/opendaylight-meeting-kernel_projects.2018-07-17-15.59.log.html



Meeting summary
---------------

* agenda bashing (rgoulding, 16:00:03)

* clustering status quo (rgoulding, 16:06:26)

  * LINK: https://jira.opendaylight.org/browse/MDSAL-362 (rgoulding, 16:07:05)
  * vpicard saw another occurrence of this that was slightly different (rgoulding, 16:07:17)
  * it appears to be a deadlock, but slightly different from the original one (rgoulding, 16:07:40)
  * ACTION: rovarga likely to look at this one by the end of the week (Friday target) (rgoulding, 16:08:10)
  * he has an idea why this is happening (rgoulding, 16:08:14)
  * LINK: https://jira.opendaylight.org/browse/CONTROLLER-1845 (rgoulding, 16:08:25)
  * so far vpicard has not been able to reproduce this after he added a patch to take a thread dump when the netstat condition is met (rgoulding, 16:09:01)
  * likely to deprioritize this (rgoulding, 16:09:13)
  * since it is not reproducible (rgoulding, 16:09:19)
  * Jamo is working on two bugs in genius/netvirt/openflowplugin (rgoulding, 16:09:54)
  * 1) unhealthy RESTCONF, 401 Unauthorized; 2) unhealthy cluster (rgoulding, 16:10:17)
  * the first issue is likely solved in master and stable/oxygen (rgoulding, 16:13:02)
  * LINK: https://jira.opendaylight.org/browse/CONTROLLER-1768 (rgoulding, 16:13:03)
  * “Shard leaders failed to settle in 90 seconds, giving up” (rgoulding, 16:13:25)
  * happens intermittently (rgoulding, 16:13:29)
  * jamoluhrsen can reproduce it pretty reliably (rgoulding, 16:13:55)
  * shague asks when, if ever, we will actually get to tell-based (rgoulding, 16:14:41)
  * it appears we are focusing on ask-based right now (rgoulding, 16:14:54)
  * the long-term intention is to move to tell-based because it promises more resiliency, but there aren’t enough cycles to get there in the short term (rgoulding, 16:15:17)
  * jamoluhrsen asks what happens when we have Java transaction timeouts; it is up to the application to do the retries (rgoulding, 16:16:58)
  * so there are timing and race conditions that will need to be fixed (rgoulding, 16:17:09)
  * faseela asks whether an application should retry a transaction when the datastore (DS) is unavailable (there is some uncertainty whether the exception is AskTimeoutException or DatabaseUnavailableException) (rgoulding, 16:17:45)
  * tpantelis says we may want to push towards enabling tell-based rather than fixing applications (rgoulding, 16:18:19)
  * rovarga points out that no one knows what happens on an AskTimeoutException (ATE), because it occurs during three-phase commit (3PC) and the behavior is inconsistent (rgoulding, 16:18:44)
  * the state of the transaction is unknown (it may or may not have been committed) (rgoulding, 16:18:54)
  * the only way to figure this out on the application side is to do a read and then start resyncing (rgoulding, 16:19:09)
  * that is quite a bit of work that probably shouldn’t be done on the application side (rgoulding, 16:19:18)
  * rovarga states we should put in the effort to switch to the tell-based protocol, where these problems are not an issue (or not as much of one) (rgoulding, 16:19:53)
  * should we lower the timeout from 30s? (rgoulding, 16:21:47)
  * rovarga says this can easily happen during GC (rgoulding, 16:21:54)
  * so be careful about making assumptions here, since a major collection with a huge heap can take minutes (rgoulding, 16:22:07)
  * when an AskTimeoutException occurs, communication between the backend and frontend is broken (rgoulding, 16:22:58)
  * depends on the application (rgoulding, 16:27:26)
  * rovarga points out that the data is replicated on a peer and can be recovered from that third party (rgoulding, 16:27:40)
  * since it can converge in a couple of seconds, recovery takes ~1 minute without a lot of retry logic (rgoulding, 16:27:54)

* mapping uint64 to BigInteger (rgoulding, 16:28:54)

  * this was asked on the mailing list (rgoulding, 16:29:03)
  * any time there is logging, counters, or stats, the toString() conversion is expensive (rgoulding, 16:29:18)
  * is there a more fixed data type (YANG) that they can use for this? (rgoulding, 16:29:30)
  * the recommendation is to minimize conversions when possible (rgoulding, 16:30:15)
  * and possibly to use a separate appender for logging (rgoulding, 16:31:02)
  * long term, a slew of types will come post-Neon that will require breaking binding-spec return types (rgoulding, 16:31:29)
  * either that or incur the cost in the binding adapter (rgoulding, 16:31:48)
  * but then everyone will pay the conversion cost (rgoulding, 16:31:55)
  * this calls for a two-step approach (rgoulding, 16:32:01)
  * BigInteger is hard to convert to/from (rgoulding, 16:32:08)
  * LINK: https://lists.opendaylight.org/pipermail/yangtools-dev/2018-July/002264.html (rgoulding, 16:33:02)
  * to adopt this without paying the performance price, we will have to break everyone (this will require planning) (rgoulding, 16:33:53)
  * it is a hard trade-off (rgoulding, 16:33:59)
  * this may be easier to do when MD-SAL is MRI (rgoulding, 16:34:24)

* modular models (rgoulding, 16:38:37)

  * LINK: https://git.opendaylight.org/gerrit/#/q/topic:modular-models+(status:open+OR+status:merged) (rovarga, 16:38:54)
  * instead of odl-mdsal-models (which includes 20-25 models), there are now more granular features so you can request more specific models (rgoulding, 16:39:31)
  * the idea is to kill the meta-feature afterwards to help improve CSIT times (rgoulding, 16:39:51)

* odlparent 3.1.3 (rgoulding, 16:40:17)

  * Oxygen is on 3.1.1 (rgoulding, 16:40:23)
  * yangtools 2.0.5 (rgoulding, 16:40:31)
  * need to roll out 2.0.7 or 2.0.8 in Oxygen (rgoulding, 16:41:15)
  * some models in downstream projects will need to be fixed up using cherry-picks (rgoulding, 16:41:23)
  * in order to do this we also need to adopt odlparent 3.1.2 or 3.1.3 for the upgraded guava dependencies (rgoulding, 16:41:42)
  * skitt also points out that we should use consistent versions in our releases (rgoulding, 16:41:52)
  * skitt says the release notes for 3.1.3 are ready and he is running a multi-patch build (rgoulding, 16:42:43)
  * he started it 4 hours ago and it still hasn’t been queued (rgoulding, 16:42:52)
  * skitt says there will be a bunch of project-specific patches to adapt to 3.1.3 (rgoulding, 16:44:09)

* odlparent 4.0.0 timeline (rgoulding, 16:45:22)

  * mid-August (rgoulding, 16:45:28)
  * if there is stuff you want, then get it in! (rgoulding, 16:45:55)
  * it is going to include a Karaf upgrade, so we will need all the runway we can get (rgoulding, 16:46:08)

* SyncStatus stays false for more than 5 minutes after bringing 2 of 3 nodes down and back up (rgoulding, 16:46:25)

  * LINK: https://jira.opendaylight.org/browse/CONTROLLER-1768 (rgoulding, 16:46:33)
  * this is happening with just one node too, according to jamoluhrsen (rgoulding, 16:46:45)
  * the 401 was happening before we were doing the datastore read for AuthZ (rgoulding, 16:48:56)
  * and the Jolokia one is fixed now (rgoulding, 16:49:02)
  * there should no longer be 401s (rgoulding, 16:49:11)
  * luis is continuing to see this as of 29 minutes ago (rgoulding, 16:50:53)
  * was this tried with the seed-node-timeout of 30s? (rgoulding, 16:51:08)
  * tpantelis is saying that the CONTROLLER-1849 401 exception may have been due to CONTROLLER-1768 and we may see more of it now (rgoulding, 16:52:33)
  * so let’s forget the 401 and focus on sync status staying false (rgoulding, 16:53:01)
  * actually, we still see the 401 (rgoulding, 16:53:15)
  * two questions: 1) is the node rejoining, and has it rejoined? 2) did the CDTCL come alive? (rgoulding, 16:57:08)
  * LINK: https://github.com/opendaylight/aaa/blob/7e7cd43a637a5b01510b0af9cac770b06d380d82/aaa-shiro/impl/src/main/resources/initial/aaa-app-config.xml#L313 (rgoulding, 16:57:58)
  * ACTION: tpantelis push patch to get rid of dynamicAuthorization (rgoulding, 16:59:36)
  * this will unmask the issue (rgoulding, 17:00:23)



Meeting ended at 17:02:18 UTC.



Action items, by person
-----------------------

* rovarga

  * rovarga likely to look at this one by the end of the week (Friday target)



People present (lines said)
---------------------------

* rgoulding (88)
* odl_meetbot (4)
* rovarga (1)



Generated by `MeetBot`_ 0.1.4

.. _`MeetBot`: http://wiki.debian.org/MeetBot
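The uint64 topic above notes that converting to and from BigInteger is costly whenever values are logged or counted, and recommends minimizing conversions. A minimal sketch of that idea, assuming the unsigned value is kept as the raw 64-bit pattern in a Java `long` and only turned into a `BigInteger` at the edge: `Uint64Counter` and its methods are hypothetical and are not part of the yangtools or MD-SAL binding API. The deferred `Supplier`-based logging through `java.util.logging` is a swapped-in technique for skipping the `toString()` cost when the level is disabled; it is not the separate-appender approach mentioned in the meeting.

```java
import java.math.BigInteger;
import java.util.logging.Logger;

/**
 * Hypothetical holder for a YANG uint64 counter. The unsigned value is kept
 * as the raw 64-bit pattern in a primitive long, so increments and logging
 * never allocate a BigInteger.
 */
final class Uint64Counter {
    private static final Logger LOG = Logger.getLogger(Uint64Counter.class.getName());

    // Unsigned 64-bit value stored in a signed long (same bit pattern).
    private long bits;

    /** Builds a counter from a BigInteger in the range [0, 2^64 - 1]. */
    static Uint64Counter fromBigInteger(BigInteger value) {
        if (value.signum() < 0 || value.bitLength() > 64) {
            throw new IllegalArgumentException("not a uint64: " + value);
        }
        Uint64Counter counter = new Uint64Counter();
        counter.bits = value.longValue(); // keeps the low-order 64 bits
        return counter;
    }

    /** Increments with unsigned wrap-around: 2^64 - 1 rolls over to 0. */
    void increment() {
        bits++;
    }

    /** Unsigned decimal form, computed without a BigInteger. */
    String toUnsignedString() {
        return Long.toUnsignedString(bits);
    }

    /** Converts to BigInteger only when a caller really needs one. */
    BigInteger toBigInteger() {
        BigInteger low63 = BigInteger.valueOf(bits & Long.MAX_VALUE);
        // Restore the top bit that was masked off above.
        return bits < 0 ? low63.setBit(63) : low63;
    }

    /** The message string is only built if FINE logging is enabled. */
    void logValue() {
        LOG.fine(() -> "counter = " + toUnsignedString());
    }
}
```

The design choice is to pay the conversion cost only at the boundary (for example where a binding-generated getter demands a `BigInteger`), so hot paths such as counters and disabled log statements never allocate one.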
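The AskTimeoutException discussion notes that after a timeout the transaction may or may not have committed, and that the only application-side remedy is to read and then resync. A minimal sketch of that read-back-before-retry idea for a single key, assuming a hypothetical `KvStore` interface (not the MD-SAL `DataBroker` API) whose timeouts surface as `java.util.concurrent.TimeoutException` rather than Akka's AskTimeoutException; `IdempotentWriter.writeWithVerify` is likewise hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.TimeoutException;

/** Hypothetical datastore facade; not the MD-SAL DataBroker API. */
interface KvStore {
    /** May throw TimeoutException even though the write was applied. */
    void put(String key, String value) throws TimeoutException;

    Optional<String> read(String key) throws TimeoutException;
}

final class IdempotentWriter {
    private IdempotentWriter() {
    }

    /**
     * Writes key=value, tolerating timeouts whose outcome is unknown: after a
     * timeout, read the key back and only retry if the value is not there.
     * Returns the attempt number on which the value was confirmed.
     */
    static int writeWithVerify(KvStore store, String key, String value, int maxAttempts)
            throws TimeoutException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                store.put(key, value);
                return attempt;
            } catch (TimeoutException e) {
                last = e;
                try {
                    // Outcome unknown: the commit may or may not have happened.
                    if (store.read(key).filter(value::equals).isPresent()) {
                        return attempt; // it did commit; do not write again
                    }
                } catch (TimeoutException readFailure) {
                    last = readFailure; // backend still unreachable; retry
                }
            }
        }
        throw last;
    }
}
```

Even this single-key case shows why the meeting leaned toward the tell-based protocol: a matching read-back does not prove that this particular write committed (another writer may have stored the same value), and transactions touching many paths would need a full resync rather than one comparison.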