#opendaylight-meeting: kernel projects

Meeting started by rgoulding at 15:59:59 UTC (full logs).

Meeting summary

  1. agenda bashing (rgoulding, 16:00:03)
  2. clustering status quo (rgoulding, 16:06:26)
    1. https://jira.opendaylight.org/browse/MDSAL-362 (rgoulding, 16:07:05)
    2. vpicard saw another occurrence of this that was just slightly different (rgoulding, 16:07:17)
    3. appears to be a deadlock, but slightly different than the original one (rgoulding, 16:07:40)
    4. ACTION: rovarga likely to look at this one by the end of the week (Friday target) (rgoulding, 16:08:10)
    5. he has an idea why this is happening (rgoulding, 16:08:14)
    6. https://jira.opendaylight.org/browse/CONTROLLER-1845 (rgoulding, 16:08:25)
    7. so far vpicard has not been able to reproduce this after he added a patch to take a thread dump when the netstat condition is met (rgoulding, 16:09:01)
    8. likely to deprioritize this (rgoulding, 16:09:13)
    9. since it is not reproducible (rgoulding, 16:09:19)
    10. Jamo working on two bugs in genius/netvirt/openflowplugin (rgoulding, 16:09:54)
    11. 1) unhealthy RESTCONF 401 unauthorized 2) cluster unhealthy (rgoulding, 16:10:17)
    12. likely that the first issue is solved in master and stable/oxygen (rgoulding, 16:13:02)
    13. https://jira.opendaylight.org/browse/CONTROLLER-1768 (rgoulding, 16:13:03)
    14. “Shard leaders failed to settle in 90 seconds, giving up” (rgoulding, 16:13:25)
    15. happens intermittently (rgoulding, 16:13:29)
    16. jamoluhrsen can reproduce pretty reliably (rgoulding, 16:13:55)
    17. shague asks when do we actually think we will end up getting to tell-based? ever? (rgoulding, 16:14:41)
    18. it appears we are focusing on ask-based right now (rgoulding, 16:14:54)
    19. the long term intention is to go to that because it promises more resiliency, but there aren’t enough cycles to get there in the short term (rgoulding, 16:15:17)
    20. jamoluhrsen asks what happens when we have Java transaction timeouts? it's up to the application to do the retries (rgoulding, 16:16:58)
    21. so there are timing and race conditions that will need to be fixed (rgoulding, 16:17:09)
    22. faseela asks whether you should retry a transaction when DS is unavailable (some uncertainty whether it is AskTimeoutException or DatabaseUnavailableException) (rgoulding, 16:17:45)
    23. tpantelis says we may want to push towards enabling tell based over fixing applications (rgoulding, 16:18:19)
    24. rovarga brings up that no one knows what happens during an ATE (AskTimeoutException), because it happens during 3PC (three-phase commit) and the outcome is inconsistent (rgoulding, 16:18:44)
    25. the state of the transaction is unknown (may be committed, may not) (rgoulding, 16:18:54)
    26. the only way to figure this out on App side is to do a Read and then start resyncing (rgoulding, 16:19:09)
    27. that is quite a bit of work that probably shouldn’t be done from the application side (rgoulding, 16:19:18)
    28. rovarga states we should put forth the effort to switch to tell-based protocol where these problems aren’t an issue (or as much of an issue) (rgoulding, 16:19:53)
    29. should we lower the timeout from 30s? (rgoulding, 16:21:47)
    30. rovarga says this can easily happen during GC (rgoulding, 16:21:54)
    31. so be careful around making assumptions about this since a major collection with a huge heap can take minutes (rgoulding, 16:22:07)
    32. during AskTimeoutException the comm between the backend and frontend is broken (rgoulding, 16:22:58)
    33. depends on application (rgoulding, 16:27:26)
    34. rovarga brings up the fact that the data is replicated in the peer and can be recovered from that third party (rgoulding, 16:27:40)
    35. since it can converge in a couple of seconds, then the recovery is in ~1 minute without a ton of retry logic (rgoulding, 16:27:54)
    36. mapping uint64 to BigInteger (rgoulding, 16:28:54)
    37. asked on mailing list (rgoulding, 16:29:03)
    38. anytime there is logging, counter, stats, the conversion toString() is expensive (rgoulding, 16:29:18)
    39. is there a more fixed data type (yang) that they can use for this? (rgoulding, 16:29:30)
    40. the recommendation is to minimize conversion when possible (rgoulding, 16:30:15)
    41. and using a separate appender for logging possibly (rgoulding, 16:31:02)
    42. there are a slew of types long term that will come post-Neon that will require breaking binding-spec return types (rgoulding, 16:31:29)
    43. either that or incur the cost in the binding adapter (rgoulding, 16:31:48)
    44. but then everyone will pay the conversion cost (rgoulding, 16:31:55)
    45. begs a two-step approach (rgoulding, 16:32:01)
    46. BigInteger is hard to convert to/from (rgoulding, 16:32:08)
    47. https://lists.opendaylight.org/pipermail/yangtools-dev/2018-July/002264.html (rgoulding, 16:33:02)
    48. to adopt this and not pay performance price, then we will have to break everyone (will require planning) (rgoulding, 16:33:53)
    49. it is a hard trade-off (rgoulding, 16:33:59)
    50. this may be easier to do when md-sal is MRI (rgoulding, 16:34:24)
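
Note on items 36 to 46 above (uint64 and BigInteger): a minimal sketch, not taken from yangtools or the binding spec, of how a uint64 value can be carried in a Java `long` and only widened to `BigInteger` when a caller needs it. The class name `Uint64Sketch` and its methods are hypothetical; only the `java.lang.Long` and `java.math.BigInteger` calls are standard JDK 8+ API. Whether this is measurably cheaper for logging and counters would need profiling.

```java
import java.math.BigInteger;

// Hypothetical helper for illustration; not an OpenDaylight class.
final class Uint64Sketch {
    private static final BigInteger TWO_POW_64 = BigInteger.ONE.shiftLeft(64);

    // Widen the 64 raw bits of a long to an unsigned BigInteger (allocates).
    static BigInteger toBigInteger(long bits) {
        BigInteger signed = BigInteger.valueOf(bits);
        return bits >= 0 ? signed : signed.add(TWO_POW_64);
    }

    // Narrow back to 64 raw bits; rejects values outside [0, 2^64 - 1].
    static long fromBigInteger(BigInteger value) {
        if (value.signum() < 0 || value.bitLength() > 64) {
            throw new IllegalArgumentException("not a uint64: " + value);
        }
        return value.longValue();
    }

    // Format for logs or counters without building a BigInteger in caller code.
    static String format(long bits) {
        return Long.toUnsignedString(bits);
    }

    public static void main(String[] args) {
        long max = Long.parseUnsignedLong("18446744073709551615");
        System.out.println(format(max));
        System.out.println(toBigInteger(max));
        System.out.println(Long.compareUnsigned(max, 1L) > 0);
    }
}
```

The trade-off matches the one recorded above: keeping the raw `long` avoids conversions on hot paths, but every consumer must remember to use the unsigned helpers rather than the signed operators.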

  3. modular models (rgoulding, 16:38:37)
    1. https://git.opendaylight.org/gerrit/#/q/topic:modular-models+(status:open+OR+status:merged) (rovarga, 16:38:54)
    2. instead of only odl-mdsal-models (which includes 20-25 models), there are now more granular features so you can request more specific models (rgoulding, 16:39:31)
    3. the idea is to kill the meta-feature afterwards to help improve CSIT times (rgoulding, 16:39:51)

  4. odlparent 3.1.3 (rgoulding, 16:40:17)
    1. Oxygen is on 3.1.1 (rgoulding, 16:40:23)
    2. yangtools 2.0.5 (rgoulding, 16:40:31)
    3. need to roll out 2.0.7 or 2.0.8 in oxygen (rgoulding, 16:41:15)
    4. some models in downstreams will need to be fixed up using cherry-picks (rgoulding, 16:41:23)
    5. in order to do this we also need to adopt odlparent 3.1.2 or 3.1.3 for upgraded guava dependencies (rgoulding, 16:41:42)
    6. skitt points out also to utilize consistent versions in our releases (rgoulding, 16:41:52)
    7. skitt says release notes for 3.1.3 are ready and he is running a multi-patch build (rgoulding, 16:42:43)
    8. he started 4 hours ago and still hasn’t been queued yet (rgoulding, 16:42:52)
    9. skitt: there will be a bunch of project-specific patches to adapt to 3.1.3 (rgoulding, 16:44:09)

  5. odlparent 4.0.0 timeline (rgoulding, 16:45:22)
    1. mid-august (rgoulding, 16:45:28)
    2. if there is stuff you want, then get it in! (rgoulding, 16:45:55)
    3. it is going to include a karaf upgrade, so we will need all the runway we can get (rgoulding, 16:46:08)

  6. SyncStatus stays false for more than 5 minutes after bringing 2 of 3 nodes down and back up. (rgoulding, 16:46:25)
    1. https://jira.opendaylight.org/browse/CONTROLLER-1768 (rgoulding, 16:46:33)
    2. this is happening with just one node too, according to jamoluhrsen (rgoulding, 16:46:45)
    3. 401 was happening before we were doing the datastore read for AuthZ (rgoulding, 16:48:56)
    4. and the jolokia one is fixed now (rgoulding, 16:49:02)
    5. there should no longer be 401s (rgoulding, 16:49:11)
    6. luis is continuing to see this as of 29 minutes ago (rgoulding, 16:50:53)
    7. was this tried with the seed-node-timeout of 30s? (rgoulding, 16:51:08)
    8. tpantelis is saying that the CONTROLLER-1849 401 exception may have been due to CONTROLLER-1768 and we may see more now (rgoulding, 16:52:33)
    9. so let's forget 401 and focus on sync status staying false (rgoulding, 16:53:01)
    10. actually, we still see 401 (rgoulding, 16:53:15)
    11. two questions: 1) is the node rejoining, and has it rejoined? 2) did the CDTCL come alive? (rgoulding, 16:57:08)
    12. https://github.com/opendaylight/aaa/blob/7e7cd43a637a5b01510b0af9cac770b06d380d82/aaa-shiro/impl/src/main/resources/initial/aaa-app-config.xml#L313 (rgoulding, 16:57:58)
    13. ACTION: tpantelis push patch to get rid of dynamicAuthorization (rgoulding, 16:59:36)
    14. will unmask the issue (rgoulding, 17:00:23)
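
Note on topic 2, items 24 to 27 (unknown transaction state after a timeout): a rough sketch of the "do a Read and then start resyncing" idea, written against a hypothetical `Store` interface rather than the MD-SAL API. `Store`, `ReadBackSketch`, and `writeWithReadBack` are invented for illustration, and `java.util.concurrent.TimeoutException` stands in for AskTimeoutException. The minutes note that this work probably should not live in applications, so this only shows what the application-side burden looks like.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a datastore client; NOT the MD-SAL API.
interface Store<K, V> {
    void write(K key, V value) throws TimeoutException;
    Optional<V> read(K key) throws TimeoutException;
}

final class ReadBackSketch {
    // Returns true once the store is known to hold the intended value.
    static <K, V> boolean writeWithReadBack(Store<K, V> store, K key, V value, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                store.write(key, value);
                return true; // commit acknowledged
            } catch (TimeoutException writeTimedOut) {
                // Outcome unknown: the write may or may not have been committed.
                try {
                    if (store.read(key).filter(value::equals).isPresent()) {
                        return true; // the commit did land, no retry needed
                    }
                } catch (TimeoutException readTimedOut) {
                    // Store still unreachable; fall through to the next attempt.
                }
            }
        }
        return false; // caller must resync by other means
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        boolean[] firstCall = {true};
        // Fake store whose first write commits but loses the acknowledgement.
        Store<String, String> flaky = new Store<String, String>() {
            @Override
            public void write(String key, String value) throws TimeoutException {
                data.put(key, value);
                if (firstCall[0]) {
                    firstCall[0] = false;
                    throw new TimeoutException("reply lost");
                }
            }

            @Override
            public Optional<String> read(String key) {
                return Optional.ofNullable(data.get(key));
            }
        };
        System.out.println(writeWithReadBack(flaky, "node-1", "up", 3));
    }
}
```

The sketch has no backoff and compares a single value; a real application would have to reconcile whole subtrees, which is the cost the discussion above argues against pushing onto applications.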


Meeting ended at 17:02:18 UTC (full logs).

Action items

  1. rovarga likely to look at this one by the end of the week (Friday target)
  2. tpantelis push patch to get rid of dynamicAuthorization


Action items, by person

  1. rovarga
    1. rovarga likely to look at this one by the end of the week (Friday target)


People present (lines said)

  1. rgoulding (88)
  2. odl_meetbot (4)
  3. rovarga (1)


Generated by MeetBot 0.1.4.