#opendaylight-meeting: kernel projects
Meeting started by rgoulding at 15:59:59 UTC
(full logs).
Meeting summary
- agenda bashing (rgoulding, 16:00:03)
 
- clustering status quo (rgoulding, 16:06:26)
  - https://jira.opendaylight.org/browse/MDSAL-362
    (rgoulding,
    16:07:05)
- vpicard saw another occurrence of this that was
    just slightly different (rgoulding,
    16:07:17)
- appears to be a deadlock, but slightly
    different than the original one (rgoulding,
    16:07:40)
- ACTION: rovarga
    likely to look at this one by the end of the week (Friday
    target) (rgoulding,
    16:08:10)
- he has an idea why this is happening
    (rgoulding,
    16:08:14)
- https://jira.opendaylight.org/browse/CONTROLLER-1845
    (rgoulding,
    16:08:25)
- so far vpicard has not been able to reproduce
    this after he added a patch to take a thread dump when the netstat
    condition is met (rgoulding,
    16:09:01)
- likely to deprioritize this (rgoulding,
    16:09:13)
- since it is not reproducible (rgoulding,
    16:09:19)
- Jamo working on two bugs in
    genius/netvirt/openflowplugin (rgoulding,
    16:09:54)
- 1) unhealthy RESTCONF 401 unauthorized 2)
    cluster unhealthy (rgoulding,
    16:10:17)
- likely that the first issue is solved in master
    and stable/oxygen (rgoulding,
    16:13:02)
- https://jira.opendaylight.org/browse/CONTROLLER-1768
    (rgoulding,
    16:13:03)
- “Shard leaders failed to settle in 90 seconds,
    giving up” (rgoulding,
    16:13:25)
- happens intermittently (rgoulding,
    16:13:29)
- jamoluhrsen can reproduce pretty
    reliably (rgoulding,
    16:13:55)
- shague asks when do we actually think we will
    end up getting to tell-based?  ever? (rgoulding,
    16:14:41)
- it appears we are focusing on ask-based right
    now (rgoulding,
    16:14:54)
- the long term intention is to go to that
    because it promises more resiliency, but there aren’t enough cycles
    to get there in the short term (rgoulding,
    16:15:17)
- jamoluhrsen asks what happens when we have Java
    transaction timeouts?  it's up to the application to do the
    retries (rgoulding,
    16:16:58)
- so there are timing and race conditions that
    will need to be fixed (rgoulding,
    16:17:09)
- faseela asks whether you should retry a
    transaction when DS is unavailable (some uncertainty whether it is
    AskTimeoutException or DatabaseUnavailableException) (rgoulding,
    16:17:45)
- tpantelis says we may want to push towards
    enabling tell based over fixing applications (rgoulding,
    16:18:19)
- rovarga brings up that no one knows what
    happens during an AskTimeoutException (ATE), because it occurs during
    3PC and the outcome is inconsistent (rgoulding,
    16:18:44)
- the state of the transaction is unknown (may be
    committed, may not) (rgoulding,
    16:18:54)
- the only way to figure this out on the application side is
    to do a read and then start resyncing (rgoulding,
    16:19:09)
- that is quite a bit of work that probably
    shouldn’t be done from the application side (rgoulding,
    16:19:18)
- rovarga states we should put forth the effort
    to switch to tell-based protocol where these problems aren’t an
    issue (or as much of an issue) (rgoulding,
    16:19:53)
- should we lower the timeout from 30s?
    (rgoulding,
    16:21:47)
- rovarga says this can easily happen during
    GC (rgoulding,
    16:21:54)
- so be careful around making assumptions about
    this since a major collection with a huge heap can take
    minutes (rgoulding,
    16:22:07)
- during an AskTimeoutException the communication between the
    backend and frontend is broken (rgoulding,
    16:22:58)
- depends on application (rgoulding,
    16:27:26)
- rovarga brings up the fact that the data is
    replicated in the peer and can be recovered from that third
    party (rgoulding,
    16:27:40)
- since it can converge in a couple of seconds,
    then the recovery is in ~1 minute without a ton of retry
    logic (rgoulding,
    16:27:54)
- mapping uint64 to BigInteger (rgoulding,
    16:28:54)
- asked on mailing list (rgoulding,
    16:29:03)
- any time there is logging, counters, or stats, the
    toString() conversion is expensive (rgoulding,
    16:29:18)
- is there a more fixed data type (yang) that
    they can use for this? (rgoulding,
    16:29:30)
- the recommendation is to minimize conversion
    when possible (rgoulding,
    16:30:15)
- and possibly using a separate appender for
    logging (rgoulding,
    16:31:02)
- there are a slew of types long term that will
    come post-Neon that will require breaking binding-spec return
    types (rgoulding,
    16:31:29)
- either that or incur the cost in the binding
    adapter (rgoulding,
    16:31:48)
- but then everyone will pay the conversion
    cost (rgoulding,
    16:31:55)
- calls for a two-step approach (rgoulding,
    16:32:01)
- BigInteger is hard to convert to/from
    (rgoulding,
    16:32:08)
- https://lists.opendaylight.org/pipermail/yangtools-dev/2018-July/002264.html
    (rgoulding,
    16:33:02)
- to adopt this without paying a performance price,
    we will have to break everyone (will require planning)
    (rgoulding,
    16:33:53)
- it is a hard trade-off (rgoulding,
    16:33:59)
- this may be easier to do when md-sal is
    MRI (rgoulding,
    16:34:24)
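
A minimal Java sketch of the "minimize conversion" recommendation from the uint64 discussion above, assuming a counter can be carried as a raw long holding the unsigned 64-bit pattern. The class and method names are hypothetical and are not part of yangtools or the binding spec; only java.lang.Long and java.math.BigInteger calls from the JDK are used.

```java
import java.math.BigInteger;

// Hypothetical helper, not an OpenDaylight API: keeps uint64 values as a
// raw long and converts to BigInteger only where an API demands it.
public final class Uint64Sketch {
    private Uint64Sketch() {
        // utility class, no instances
    }

    // Render the unsigned value for logs or stats without building a BigInteger.
    public static String formatUnsigned(long bits) {
        return Long.toUnsignedString(bits);
    }

    // Convert at the boundary where a BigInteger-typed getter or setter is required.
    public static BigInteger toBigInteger(long bits) {
        // Take the low 63 bits as a non-negative value, then restore bit 63 if it was set.
        BigInteger low63 = BigInteger.valueOf(bits & Long.MAX_VALUE);
        return bits >= 0 ? low63 : low63.setBit(63);
    }

    // Convert back, rejecting anything outside the uint64 range 0..2^64-1.
    public static long fromBigInteger(BigInteger value) {
        if (value.signum() < 0 || value.bitLength() > 64) {
            throw new IllegalArgumentException("not a uint64: " + value);
        }
        // longValue() keeps the low 64 bits, which is the unsigned bit pattern.
        return value.longValue();
    }
}
```

With this shape, hot paths such as counters and log statements touch only the long, and the BigInteger cost is paid once per call into a BigInteger-typed API.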
 
 
- modular models (rgoulding, 16:38:37)
  - https://git.opendaylight.org/gerrit/#/q/topic:modular-models+(status:open+OR+status:merged)
    (rovarga,
    16:38:54)
- instead of only odl-mdsal-models (which includes
    20-25 models), there are now more granular features so you can request
    more specific models (rgoulding,
    16:39:31)
- the idea is to kill the meta-feature afterwards
    to help improve CSIT times (rgoulding,
    16:39:51)
 
 
- odlparent 3.1.3 (rgoulding, 16:40:17)
  - Oxygen is on 3.1.1 (rgoulding,
    16:40:23)
- yangtools 2.0.5 (rgoulding,
    16:40:31)
- need to roll out 2.0.7 or 2.0.8 in
    oxygen (rgoulding,
    16:41:15)
- some models in downstreams will need to be
    fixed up using cherry-picks (rgoulding,
    16:41:23)
- in order to do this we also need to adopt
    odlparent 3.1.2 or 3.1.3 for upgraded guava dependencies
    (rgoulding,
    16:41:42)
- skitt points out also to utilize consistent
    versions in our releases (rgoulding,
    16:41:52)
- skitt says release notes for 3.1.3 are ready
    and he is running a multi-patch build (rgoulding,
    16:42:43)
- he started 4 hours ago and still hasn’t been
    queued yet (rgoulding,
    16:42:52)
- skitt: there will be a bunch of project-specific
    patches to adapt to 3.1.3 (rgoulding,
    16:44:09)
 
 
- odlparent 4.0.0 timeline (rgoulding, 16:45:22)
  - mid-august (rgoulding,
    16:45:28)
- if there is stuff you want, then get it
    in! (rgoulding,
    16:45:55)
- it is going to include a karaf upgrade, so we
    will need all the runway we can get (rgoulding,
    16:46:08)
 
 
- SyncStatus stays false for more than 5 minutes after bringing 2 of 3 nodes down and back up. (rgoulding, 16:46:25)
  - https://jira.opendaylight.org/browse/CONTROLLER-1768
    (rgoulding,
    16:46:33)
- this is happening with just one node too,
    according to jamoluhrsen (rgoulding,
    16:46:45)
- 401 was happening before we were doing the
    datastore read for AuthZ (rgoulding,
    16:48:56)
- and the jolokia one is fixed now (rgoulding,
    16:49:02)
- there should no longer be 401s (rgoulding,
    16:49:11)
- luis is continuing to see this as of 29
    minutes ago (rgoulding,
    16:50:53)
- was this tried with the seed-node-timeout of
    30s? (rgoulding,
    16:51:08)
- tpantelis is saying that CONTROLLER-1849 401
    exception may have been due to CONTROLLER-1768 and we may see more
    now (rgoulding,
    16:52:33)
- so let's forget 401 and focus on sync status
    staying false (rgoulding,
    16:53:01)
- actually, we still see 401 (rgoulding,
    16:53:15)
- two questions: 1) is the node rejoining, and has it
    rejoined? 2) did the CDTCL come alive? (rgoulding,
    16:57:08)
- https://github.com/opendaylight/aaa/blob/7e7cd43a637a5b01510b0af9cac770b06d380d82/aaa-shiro/impl/src/main/resources/initial/aaa-app-config.xml#L313
    (rgoulding,
    16:57:58)
- ACTION: tpantelis
    push patch to get rid of dynamicAuthorization (rgoulding,
    16:59:36)
- will unmask the issue (rgoulding,
    17:00:23)
 
Meeting ended at 17:02:18 UTC
(full logs).
Action items
  - rovarga likely to look at this one by the end of the week (Friday target)
- tpantelis push patch to get rid of dynamicAuthorization
Action items, by person
  -  rovarga 
    - rovarga likely to look at this one by the end of the week (Friday target)
 
People present (lines said)
  - rgoulding (88)
- odl_meetbot (4)
- rovarga (1)
Generated by MeetBot 0.1.4.