14:09:23 <abhijitkumbhare> #startmeeting OF plugin
14:09:23 <odl_meetbot> Meeting started Wed Sep 17 14:09:23 2014 UTC.  The chair is abhijitkumbhare. Information about MeetBot at http://ci.openstack.org/meetbot.html.
14:09:23 <odl_meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:09:23 <odl_meetbot> The meeting name has been set to 'of_plugin'
14:09:32 <vishnoianil> #info vishnoianil
14:10:30 <abhijitkumbhare> #info abhijitkumbhare
14:10:44 <abhijitkumbhare> hi michal_rehak
14:11:08 <abhijitkumbhare> The bugs list is at: https://wiki.opendaylight.org/view/OpenDaylight_Bugs#openflowplugin
14:11:20 <abhijitkumbhare> Under that Helium bugs
14:12:38 <abhijitkumbhare> michal_rehak - we were on https://bugs.opendaylight.org/show_bug.cgi?id=1626
14:12:47 <michal_rehak> #info michal_rehak
14:13:08 <michal_rehak> 1626 needs retest
14:15:00 <abhijitkumbhare> OK - should we reassign it to Jan?
14:15:11 <michal_rehak> no - I need to retest that
14:15:16 <michal_rehak> just doing taht
14:15:23 <abhijitkumbhare> OK
14:16:38 <abhijitkumbhare> Next one is: https://bugs.opendaylight.org/show_bug.cgi?id=1907
14:18:24 <michal_rehak> oflibMichal already did some research and it probably originates in OFPLugin
14:18:54 <michal_rehak> I will have a look
14:19:18 <abhijitkumbhare> OK
14:21:26 <abhijitkumbhare> Next one is: https://bugs.opendaylight.org/show_bug.cgi?id=1866
14:22:50 <abhijitkumbhare> as Hideyuki's message indicates - I am not sure why Tony has moved this to openflowplugin from controller
14:23:09 <michal_rehak> I need to discuss that with Hideyki and we may move it to lithium
14:23:24 <abhijitkumbhare> OK
14:23:41 <michal_rehak> effect of that issue is that VTN does not see statistics
14:23:46 <abhijitkumbhare> It may a priority for him however
14:23:51 <abhijitkumbhare> OK
14:23:52 <michal_rehak> which is not a blocking issue
14:24:30 <abhijitkumbhare> If not Helium - may be in the Helium stable in November
14:24:42 <michal_rehak> right
14:25:12 <abhijitkumbhare> Next one is: https://bugs.opendaylight.org/show_bug.cgi?id=1879
14:25:44 <michal_rehak> that also need some investigation - I will take that too
14:26:24 <abhijitkumbhare> OK
14:27:09 <abhijitkumbhare> Next one Chris O'Shea will be updating
14:27:14 <abhijitkumbhare> (https://bugs.opendaylight.org/show_bug.cgi?id=1395 )
14:27:26 <abhijitkumbhare> But this may go to Lithium
14:28:23 <abhijitkumbhare> Next one - https://bugs.opendaylight.org/show_bug.cgi?id=1756 - there seems to be a patch for it: https://git.opendaylight.org/gerrit/#/c/11288/
14:29:37 <abhijitkumbhare> Needs a review
14:29:53 <abhijitkumbhare> OK - the other topic
14:29:57 <abhijitkumbhare> testing
14:30:44 <abhijitkumbhare> We need to answer the following questions:
14:30:46 <abhijitkumbhare> 1) Features has been tested for your project:
14:31:03 <abhijitkumbhare> 2) Bugs found in RC1 test:
14:31:16 <abhijitkumbhare> 3) Patches:
14:31:37 <abhijitkumbhare> 4)  Is your project a compatible projecf:
14:33:55 <abhijitkumbhare> Also Sujatha has asked the following questions regarding karaf:
14:33:57 <abhijitkumbhare> As I need to send the OF Plugin documentation for review, I thought of adding the karaf steps for the plugin installation. Would it be possible for you to answer a couple of queries?
14:33:58 <abhijitkumbhare> 1.       Are these the karaf features for OF Plugin?
14:33:58 <abhijitkumbhare> ·         OpenDaylight Openflow Plugin SouthBound
14:33:58 <abhijitkumbhare> ·         OpenDaylight Openflow Plugin Flow Services
14:33:58 <abhijitkumbhare> 2.       What is the OF Plugin repository in the Karaf shell?
14:33:58 <abhijitkumbhare> (repo-add mvn:org.opendaylight.openflowplugin/features-openflowplugin/?-SNAPSHOT/xml/features)
14:33:58 <abhijitkumbhare> 
14:36:44 <abhijitkumbhare> Let's wrap up & continue tomorrow - about testing
14:36:51 <abhijitkumbhare> and these questions
11:37:40 <tbachman> michal_rehak: just added a comment to that bug
11:37:44 <tbachman> I haven’t seen it
11:37:50 <tbachman> and in fact thought it must have been fixed :)
11:38:06 <tbachman> I guess we can watch to see if it happens again
11:38:15 <tbachman> but am fine with closing the bug for now,  if you want
11:44:24 <michal_rehak> tbachman: ok
11:45:03 <michal_rehak> tbachman: talking about https://bugs.opendaylight.org/show_bug.cgi?id=1907  ?
11:45:21 <tbachman> yes :)
11:46:00 <michal_rehak> ok, resolved
11:46:04 <michal_rehak> thank you
11:46:29 <tbachman> np!
14:12:23 <abhijitkumbhare> Hi folks
14:13:18 <abhijitkumbhare> Sorry - a bit late
14:18:08 <abhijitkumbhare> we have a release review coming up next week:
14:18:11 <abhijitkumbhare> https://lists.opendaylight.org/pipermail/release/2014-September/000438.html
14:19:30 <abhijitkumbhare> Also we have several bugs right now
14:19:43 <abhijitkumbhare> https://bugs.opendaylight.org/buglist.cgi?bug_severity=blocker&bug_severity=critical&bug_severity=major&bug_severity=normal&bug_severity=minor&bug_severity=trivial&bug_status=UNCONFIRMED&bug_status=CONFIRMED&bug_status=IN_PROGRESS&bug_status=WAITING_FOR_REVIEW&columnlist=product%2Ccomponent%2Cassigned_to%2Cbug_status%2Cresolution%2Cshort_desc%2Cchangeddate%2Ccf_target_milestone&f1=cf_target_milestone&list_id=17007&n1=1&o1=su
14:19:43 <abhijitkumbhare> bstring&product=openflowplugin&query_format=advanced&resolution=---&v1=Lithium
14:20:13 <abhijitkumbhare> OK - easier to copy paste the following: https://wiki.opendaylight.org/view/OpenDaylight_Bugs#openflowplugin
14:26:09 <abhijitkumbhare> hi vishnoianil - did you have the chance to look at the RC1?
14:29:15 <abhijitkumbhare> https://bugs.opendaylight.org/show_bug.cgi?id=1840 should be closed - right? michal_rehak & vishnoianil
14:30:15 <abhijitkumbhare> OK - will mark it
14:30:43 <michal_rehak> abhijitkumbhare: probably - Peter can do a workaround by changing xml content to setfield in order to have closer match
14:32:00 <abhijitkumbhare> Yes - but OF 1.3 to OF 1.0 forward compatibility is not there - OF 1.0 to 1.3 backward compatibility is present.
14:32:41 <michal_rehak> tests run against of-1.3
14:33:14 <michal_rehak> and we wrote some backward compatibility hacks for vlan-id and set*port
14:35:56 <abhijitkumbhare> I think we should try to ensure the critical bugs are fixed soon
14:36:15 <abhijitkumbhare> I will not be able to make the meeting tomorrow
14:36:32 <abhijitkumbhare> So we can continue Monday
06:51:27 <michal_rehak> morning
16:46:34 <hideyuki> https://bugs.opendaylight.org/show_bug.cgi?id=1988
16:46:58 <hideyuki> Hi, VTN project has detected a blocker bug in OpenFlow plugin.
16:48:11 <hideyuki> I'm going to add more detailed information for bug 1988 in Bugzilla.
16:48:47 <abhijitkumbhare> OK
16:49:29 <hideyuki> Due to this bug, VTN Manager system test fails for OF 1.0 switches....
16:51:35 <abhijitkumbhare> OK - I think this may be another bug Michal may need to look into. vishnoianil is this something you can take a shot?
16:58:45 <hideyuki> abhijitkumbhare: Thanks. This is a blocker bug to VTN project...
21:31:17 <rvdp> I like to start testing with helium RC1 and OF1.3 REST API; from which URL should I download?
21:36:49 <tbachman> rvdp: https://lists.opendaylight.org/pipermail/release/2014-September/000463.html
21:41:48 <rvdp> Using that, but I need to load features? I loaded odl-dlux-all, odl-openflowplugin-all, odl-restconf (what will be in the base release?) Where are the REST docs?
21:44:28 <tbachman> rvdp: fwiw, there’s this: https://wiki.opendaylight.org/view/Karaf:Step_by_Step_Guide#How_to_Test_RC0
21:45:21 <rvdp> found that already; searching further
21:45:24 <tbachman> rvdp: you can see the list of all features by doing feature:list and pipe it in to grep
21:45:42 <tbachman> oh
21:45:43 <tbachman> lol
21:45:46 <tbachman> that wiki page has that :)
09:14:37 <tbachman> michal_rehak: morning :)
09:15:01 <tbachman> (althought I guess it’s closer to “afternoon” for you ;) )
09:16:01 <michal_rehak> tbachman: hi
09:16:09 <michal_rehak> close to noon
09:16:11 <michal_rehak> :-)
09:16:11 <tbachman> ah
09:16:52 <tbachman> michal_rehak: I’m going to be debugging an issue we’re having with flow programming this am
09:17:31 <michal_rehak> ok
09:17:34 <tbachman> I haven’t done much debugging with the openflowplugin yet — was hoping you might have some pointers?
09:18:13 <michal_rehak> well - you probably want to catch main entry points where the flow must pass
09:18:31 <michal_rehak> the path looks following:
09:19:02 <michal_rehak> after you write flow to datastore/config FRM is notified (onDataChangeEvent..)
09:19:21 <michal_rehak> so one brakpoint should be in FlowForwarder class in add method
09:19:36 <tbachman> k
09:19:42 <michal_rehak> then the next stop is ModelDrivenSwitchImpl.addFlow
09:20:06 <tbachman> k
09:20:09 <michal_rehak> this will lead you into asynchronous task in OFRpcTaskFactory
09:20:27 <michal_rehak> something inside of createAddFlowTask
09:21:02 <michal_rehak> and next stop is in openflowjava - there occurs encoding
09:21:02 <tbachman> okay
09:21:59 <tbachman> michal_rehak: if I wanted to make sure that the flow made it to the config store
09:22:03 <tbachman> using REST
09:22:15 <michal_rehak> starts in class SerializationFactory.messageToBuffer
09:22:16 <tbachman> do you have the URL example for that?
09:22:36 * tbachman can also go look it up in the yang model…. but is just waking up ;)
09:22:37 <michal_rehak> on how to push a flow into datastore/config via rest?
09:22:45 <tbachman> to read bcak
09:22:46 <tbachman> back
09:22:59 <tbachman> I want to see if I can see the flows in the config store
09:23:15 <tbachman> (e.g. compare config and operational, see if they’re different or the same)
09:23:22 <michal_rehak> http://localhost:8080/restconf/config/opendaylight-inventory:nodes/node/openflow:1/table/0
09:23:27 <tbachman> thx!
09:23:40 <tbachman> and operational is just change the “config” above to operational?
09:23:46 <michal_rehak> here you have DS/config for node openflow:1, table 0
09:23:50 <michal_rehak> right
09:23:55 <tbachman> k
09:23:55 <tbachman> thx
09:24:03 <tbachman> michal_rehak: got it
09:24:05 <michal_rehak> http://localhost:8080/restconf/operational/opendaylight-inventory:nodes/node/openflow:1/table/0
09:24:19 <tbachman> any stats avaiable via karaf console, perhaps?
09:24:32 <michal_rehak> in case you use curl then the xml output is awful - no indentation
09:24:34 <tbachman> or any other means?
09:24:35 <tbachman> lol
09:24:36 <tbachman> yeah
09:24:38 <tbachman> I use postman
09:24:45 <michal_rehak> ok - then you are fine
09:25:07 <michal_rehak> stats in karaf
09:25:36 <michal_rehak> I am not sure if there is a way in karaf to read datastore content into shell
09:25:42 <tbachman> okay
09:25:44 <tbachman> no worries :)
09:25:54 <tbachman> this gives me a start
09:26:01 <michal_rehak> there are internal stats of OFPlugin
09:26:23 <michal_rehak> which tell you how many messages passed important points in plugin
09:26:51 <michal_rehak> but is is started by configSubsytem config file
09:27:11 <michal_rehak> and I tried to create feature for karaf contaiing only this file, but no success
09:27:19 <tbachman> ah
09:27:29 <tbachman> so, not available via REST today :(
09:27:32 <tbachman> that’s okay
09:27:48 <michal_rehak> but if you increase log level for stats, it might go at least into log file
09:27:59 <tbachman> okay
09:28:32 <michal_rehak> <logger name="org.opendaylight.openflowplugin.statistics.MessageSpyCounterImpl" level="DEBUG"/>
09:28:50 <michal_rehak> and do not forget to start wireshark in order to see what was sent to device
09:28:58 <tbachman> michal_rehak: good idea! ;)
09:29:05 <tbachman> michal_rehak: thx for the help!
09:29:09 <tbachman> am going to try some of this
09:29:20 <tbachman> may try to ping you later in the day, if you’re around
09:29:25 <michal_rehak> sure
09:29:33 <tbachman> michal_rehak: thx for your help! :)
09:29:39 <michal_rehak> in 4 hours I have to leave for 2 hours
09:29:45 <michal_rehak> you are welcome
09:29:48 <tbachman> michal_rehak: no worries! :)
09:29:49 <tbachman> thx
11:11:36 <edwarnicke> michal_rehak: Could you provide tbachman with pointers on how to check in the config data store via RESTCONF to see if his flows are in the config data store?
11:11:52 <edwarnicke> michal_rehak: Oh... I see where you did above :)
11:11:56 <michal_rehak> :-)
11:11:56 <tbachman> :)
11:12:14 <michal_rehak> I can resend to make it more seem more important
11:12:15 * tbachman has already requested that info, and found it to be different from what’s in operational
11:12:18 <tbachman> lol
11:12:21 <tbachman> michal_rehak: ;)
11:12:37 <tbachman> michal_rehak: edwarnicke  am collecting data
11:12:43 <tbachman> and will provide some info soon
11:13:06 <tbachman> will continue the chase down the rabbit-hole why you guys enjoy the data ;)
11:13:16 <edwarnicke> tbachman: As to FRM points to put breakpoints with the debugger, this patch may provide some hints: https://git.opendaylight.org/gerrit/#/c/11382/
11:13:28 <edwarnicke> tbachman: I think the key questions are:
11:13:43 <edwarnicke> 1)  Do all the flows get to the switch (apparently not)
11:13:50 <edwarnicke> 2)  Do the missing flows get to the OFplugin
11:13:59 <edwarnicke> 3)  Do the missing flows get the FRM?
11:14:10 <tbachman> FRM is first to hit (from GBP)?
11:14:10 <edwarnicke> 4)  Do the missing flows get to the config data store?
11:14:19 <tbachman> (er, after data store of course ;) )
11:14:29 <tbachman> edwarnicke: they get to the config data store
11:14:33 <edwarnicke> GBP -> config data store -> FRM -> OFplugin -> OFjava -> wire -> switch
11:14:39 <edwarnicke> tbachman: Excellent
11:14:49 <edwarnicke> But the key point is... where is the chain breaking
11:14:54 <tbachman> exactly :)
11:14:55 <edwarnicke> Because that's certainly where the bug is :)
11:14:58 <tbachman> lol
11:15:06 <tbachman> will get there
11:15:13 * tbachman is the tortoise
11:15:29 <edwarnicke> Do look at 11382, as it will show you where to debug to find out if you are getting to the FRM and if the FRM is calling the OFplugin
11:15:36 <tbachman> edwarnicke: thx!
11:15:43 <edwarnicke> Since that's the next place in the chain :)
11:15:51 <tbachman> is it okay that it failed build?
11:16:11 <tbachman> I guess it’s just a pointer to the code
11:16:17 <tbachman> (for breakpoint purposes)
15:59:30 <edwarnicke> oflibMichal: Lets wait a moment for michal_rehak
16:00:32 <oflibMichal> sure, meanwhile going through the 1997 comments
16:00:56 <edwarnicke> michal_rehak: Welcome :)
16:01:07 <edwarnicke> michal_rehak: Do you need a moment to go through the 1997 comments?
16:01:23 <edwarnicke> michal_rehak: oflibMichal is reading the 1997 comments now
16:01:54 <michal_rehak> edwarnicke: reading
16:07:20 <edwarnicke> oflibMichal: michal_rehak Take your time, but let me know when you are finished and if you have any questions... then I will tell you the latest finding from tbachman
16:07:49 <michal_rehak> sure
16:08:05 <oflibMichal> i am ready, I am implementing some logs in openflowjava to debug easier
16:08:50 <edwarnicke> oflibMichal: Many thanks :)
16:09:00 <edwarnicke> michal_rehak: Let me know when you are ready :)
16:17:04 <michal_rehak> ok, I guess we can
16:18:35 <oflibMichal> edwarnicke: when i want to build integration karaf distribution ... i need to cd distributions/extra/karaf mvn clean install ?
16:19:02 <edwarnicke> oflibMichal: Correct
16:19:07 <oflibMichal> ok
16:20:11 <oflibMichal> edwarnicke: michal_rehak: building the distribution according to the instructions here https://lists.opendaylight.org/pipermail/ovsdb-dev/2014-September/000720.html
16:23:28 <edwarnicke> oflibMichal: That is a good set of reproduction instructions for ovsdb, yes :)
16:24:44 <edwarnicke> tbachman: Wecome
16:24:58 <edwarnicke> tbachman: Can you explain your latest finding about Groups for the GBP case?
16:25:10 <tbachman> edwarnicke: back :)
16:25:29 <tbachman> we have found that all the flowmods are present on the wire
16:25:51 <edwarnicke> tbachman: And...
16:26:00 * tbachman is good with dramatic pauses
16:26:02 <tbachman> lol
16:26:11 <tbachman> the are two vSwitches
16:26:18 <tbachman> one gets all the flows, no problems
16:26:29 <tbachman> the other gets all the flows, but there are also two error messages
16:26:37 <tbachman> there are also error messages in the vSwitchd logs
16:26:54 <tbachman> the key will be understanding why those errors are happening
16:27:36 <tbachman> unfortunately, I’ve got a learning curve for interpreting wireshark :)
16:27:37 <tbachman> lol
16:27:56 <tbachman> edwarnicke: you had a theory that the groups might be referring to things that haven’t been created yet?
16:28:02 <tbachman> (i.e. sequencing problem)
16:28:04 <tbachman> also
16:28:08 <tbachman> one thing to keep in mind
16:28:14 <tbachman> this all *used to work*
16:28:20 <tbachman> w/o any change in GBP code
16:28:44 <edwarnicke> tbachman: Could you pastebin the error messages?
16:28:53 <tbachman> the vswitchd error message?
16:29:11 <edwarnicke> tbachman: We have made significant performance gains... so race conditions are racier
16:29:22 <tbachman> edwarnicke: I see
16:29:31 <edwarnicke> tbachman: The error message on the wire (and the vswitchd error too :) )
16:29:37 <tbachman> k
16:30:39 <tbachman> http://pastebin.com/jzpRGKYn
16:30:43 <tbachman> fwiw
16:30:49 <tbachman> for those on the email that I sent
16:31:00 <tbachman> there is a text file version of the dissected OF messages
16:31:11 <tbachman> so, it should be available in there (with context) as well
16:31:27 <edwarnicke> oflibMichal1: Note that OVSDB is *not* using Groups (I checked) and so it wouldn't explain their issue
16:31:59 <edwarnicke> tbachman: What should we search for?
16:32:06 <tbachman> ERROR :)
16:32:09 <tbachman> all caps
16:32:26 * tbachman isn’t sure what edwarnicke is asking?
16:32:36 <edwarnicke> LOL
16:32:43 <edwarnicke> When you said 'with context'I thought you'd sent more stuff :)
16:33:20 <edwarnicke> tbachman: Instead you did exactly what I would have wanted :)
16:33:33 <edwarnicke> tbachman: And I would have realized that if I had looked closer :)
16:33:48 <edwarnicke> tbachman: Can you look at your capture with a filter roughly like
16:35:20 <michal_rehak> oh - do we have groups involved?
16:35:34 <michal_rehak> this looks like corresponding group is not ready
16:35:54 <oflibMichal1> michal_rehak: (6:31:30 PM) edwarnicke: oflibMichal1: Note that OVSDB is *not* using Groups (I checked) and so it wouldn't explain their issue
16:36:07 <edwarnicke> (openflow_v4.type == 14  || openflow_v4.type == 1 ||openflow_v4.type == 15 )
16:36:26 <tbachman> edwarnicke: will do
16:36:41 <edwarnicke> tbachman: And then see if you can add a filter to filter openflow_v4.type == 14 && <whatever you need to match transaction id 6)
16:37:01 <edwarnicke> tbachman: The idea is to see the sequence of groupmod and flow mod that produced the error
16:37:19 <edwarnicke> michal_rehak: Yes... for GBP... but not for ovsdb, they don't use them
16:39:29 <tbachman> oflibMichal1: michal_rehak: let me know if you’re interested in the pcapng, and will email
16:40:22 <michal_rehak> sure, that could be helpful
16:44:19 <tbachman> michal_rehak: sent :)
16:45:59 <tbachman> edwarnicke: michal_rehak oflibMichal1 : updated: http://pastebin.com/U4j328Ww
16:46:58 <edwarnicke> tbachman: Do you see any GroupMods ?
16:47:55 <tbachman> edwarnicke: with transaction ID 6?
16:49:03 <oflibMichal1> edwarnicke: how do I start karaf in -debug mode ?
16:49:54 <tbachman> Is GroupMod an operation?
16:50:17 <michal_rehak> oflibMichal1: export KARAF_DEBUG=1
16:50:20 * tbachman realizes he should know, but hasn’t reahed that level of understanding of the OF side fo things yet
16:50:31 * tbachman goes to pull up OF1.3 spec
16:50:45 <edwarnicke> openflow_v4.type == 15 matches groupmods
16:50:54 <edwarnicke> openflow_v4.type == 14  matches flowmods
16:51:03 <edwarnicke> openflow_v4.type == 1 matches errors
16:51:04 <edwarnicke> So
16:51:12 <edwarnicke> (openflow_v4.type == 14  || openflow_v4.type == 1 ||openflow_v4.type == 15 ) gives you those things
16:51:28 <edwarnicke> I am curious if we see GroupMods before or after the FlowMod that produces the errors
16:51:38 <tbachman> There is one
16:51:48 <tbachman> but
16:51:51 <tbachman> to the other switch
16:53:23 <tbachman> so, that’s confusing
16:53:36 <tbachman> the # of flows is what we expect
16:53:42 <tbachman> but we only see one GROUP_MOD
16:53:48 <tbachman> to the one switch that works
16:54:00 <tbachman> could a GROUP_MOD be accidentally sent as a FLOW_MOD?
16:55:18 <michal_rehak> tbachman: there many featuresReply indicating that there are 1.0 and 1.3 switches
16:55:34 <michal_rehak> and datapathId is often the same
16:55:47 <tbachman> michal_rehak: I should explain
16:55:51 <tbachman> the vswitches start out 10
16:56:00 <tbachman> and we do an ovs-vsctl to change them to 1.3
16:56:04 <michal_rehak> could you point us to number where the session started
16:56:05 <michal_rehak> ?
16:56:28 <michal_rehak> oh
16:56:30 <tbachman> michal_rehak: unfortunately, I don’t have something that can sequence it that well
16:56:32 <tbachman> I should say
16:56:36 <tbachman> that I started the wireshark
16:56:43 <tbachman> and then ran our python script
16:56:46 <michal_rehak> I meant the left column in wireshark
16:56:47 <tbachman> which does the ovs-vsctl
16:57:26 <michal_rehak> oh
16:59:46 <michal_rehak> ok and one switch is talking from IP 192.168.194.137 and the other one from  192.168.194.138?
17:00:04 <tbachman> michal_rehak: right
17:00:10 <tbachman> and the controller is 192.168.194.137
17:00:19 <tbachman> so, one vSwitch is on the same host as the controller
17:00:52 <michal_rehak> ok
17:02:08 <tbachman> michal_rehak: I have some old traces
17:02:14 <tbachman> one of which has the “good” case
17:02:18 <tbachman> I want to compare it to the bad one
17:02:35 <edwarnicke> oflibMichal1: Let us know when you have your debugs in place :)
17:02:40 <michal_rehak> so both switches were first changed from 1.0 to 1.3
17:02:49 <michal_rehak> and then started all the business?
17:02:54 <tbachman> michal_rehak: that’s  correct
17:03:39 <tbachman> guys
17:03:42 <tbachman> the weird thing is
17:03:54 <tbachman> I only see a single group flow even in the “good” case
17:03:55 <tbachman> guys
17:04:00 <tbachman> I have to dissappear for about 45
17:04:01 <tbachman> :(
17:04:08 <oflibMichal1> edwarnicke: already debugging
17:05:18 <michal_rehak> ok, there is group sent only to one switch
17:11:11 <michal_rehak> looks like we lost groupMod message 1 sec after featuresReply from the 137 switch
17:11:41 <edwarnicke> oflibMichal1: Be aware that the issue is intermittent, and additional logs have made it harder to reproduce
17:12:23 <oflibMichal1> sure, I do the eclipse debug
17:13:00 <oflibMichal1> now when thomas mentioned that all flow are installed and there are errors it looks like conversion / translation issue
17:13:11 <edwarnicke> oflibMichal1: Oh?
17:13:40 <michal_rehak> we have a bug in XID assigning - all flowMod messages use the same xid as it was in featureReply
17:13:47 <michal_rehak> tat should be fixed now
17:13:55 <edwarnicke> michal_rehak: OK
17:14:02 <michal_rehak> but it makes hard to pair error to flowMod
17:14:20 <oflibMichal1> edwarnicke: thomas mentioned that they do not use groups ... correct ?
17:15:13 <edwarnicke> oflibMichal1: I asked ovsdb if they used groups, they said no, and I also grepped their code and did not find any reference to Group
17:15:22 <oflibMichal1> oh, ok
17:15:37 <oflibMichal1> so than the flowMod looks incorrect ...
17:15:38 <oflibMichal1> Action            Type: OFPAT_GROUP (22)            Length: 8            Group ID: 1
17:16:46 <oflibMichal1> there should be no GROUP action ...
17:16:53 <oflibMichal1> or am i missing something ?
17:17:20 <michal_rehak> how do I check the table pipline - mentioned in the instructions?
17:19:18 <edwarnicke> oflibMichal1: https://www.opennetworking.org/images/stories/downloads/sdn-resources/onf-specifications/openflow/openflow-spec-v1.3.0.pdf page 18 at the bottom
17:20:53 <edwarnicke> michal_rehak: table pipeline ?
17:21:10 <michal_rehak> trying to replicate
17:21:21 <michal_rehak> after this: ovs-vsctl set-manager  tcp:<controllerip>:6640
17:21:35 <michal_rehak> I should see those flows somehow
17:28:22 <edwarnicke> michal_rehak: Oh... you are asking how to dump their flows?
17:28:51 <michal_rehak> I obviously failed to start something
17:28:53 <edwarnicke> michal_rehak: sudo ovs-ofctl dump-flows br-int -O OpenFlow13
17:28:56 <michal_rehak> let me retry
17:28:59 <edwarnicke> ack
17:41:46 <oflibMichal1> i need to leave ... i will try to reconnect later, probably in 45 minutes
17:44:02 <edwarnicke> oflibMichal1: Thank you, see you in about 45 minutes :)
17:47:25 <michal_rehak> alagalah: is there a fast way to clean and start replicating again?
17:47:35 <michal_rehak> https://lists.opendaylight.org/pipermail/ovsdb-dev/2014-September/000720.html
17:47:53 <michal_rehak> I used sudo ovs-vsctrl emer-reset
17:50:32 <alagalah> michal_rehak: You mean to reset ovs?
17:50:37 <alagalah> michal_rehak: sudo mn -c
17:50:44 <michal_rehak> yes
17:50:46 <michal_rehak> ok
17:50:53 <alagalah> sudo ovs-vsctl show (see stuff)
17:50:56 <alagalah> sudo mn -c
17:51:00 <alagalah> sudo ovs-vsctl show (clean)
17:53:11 <edwarnicke> alagalah: I understand you can pick up the wireshark sequencing stuff
17:53:16 <edwarnicke> alagalah: http://pastebin.com/U4j328Ww was sent by tbachman
17:53:20 <edwarnicke> It shows the error
17:53:44 <edwarnicke> (openflow_v4.type == 14  && <something that will match transaction id 6> || openflow_v4.type == 1 ||openflow_v4.type == 15 )
17:54:03 <edwarnicke> will give you GroupMods (15) Errors(1) and FlowMods(14) with the offending transaction id
17:56:00 <alagalah> edwarnicke: Ack
17:56:23 <alagalah> edwarnicke: I take it i should ditch my Priority filters then, which is fine.
17:56:39 <edwarnicke> alagalah: They will always be there for you if you need them :)
17:56:54 <edwarnicke> (which is my way of saying... I could be wrong, but probably :) )
17:57:23 <alagalah> edwarnicke: LOLZ, yes, I know... filtering is non-destructive :D
18:08:58 <alagalah> edwarnicke: Applied filter.... interesting results
18:09:06 <edwarnicke> alagalah: Do tell :)
18:09:44 <alagalah> Firstly we have to start with the assumption from our tests that OVS-s1 has "all the flows" (ie 57) and OVS-s2 does not (55).
18:09:45 <alagalah> BUT
18:10:10 <alagalah> When I apply the filter, (and I tried with xid==6 and without to get both switches) ... I only see errors on s1....
18:10:41 <alagalah> Still digging, but I want to ensure that thomas has his system setup up for .137 as s1 and .138 as s2... (which I am almost positive it is) but ... hmmm.
18:11:15 <alagalah> I am just going to filter on the Type1 (Errors) and have a dig
18:17:03 <alagalah> 4142
18:17:03 <alagalah> 4146
18:17:03 <alagalah> 4176
18:17:05 <alagalah> edwarnicke: michal_rehak Some interesting things again from the trace... packets
18:17:08 <alagalah> doh
18:17:29 <michal_rehak> jsut missing groupMod to one of the switches
18:17:29 <alagalah> edwarnicke: michal_rehak Some interesting things again from the trace... packets 4142, 4146, 4176 all have a OFPAT_GROUP (the only ones) as an action
18:17:43 <alagalah> 4142, 4146 are proceeded by the error
18:17:46 <alagalah> 4176 is not
18:18:02 <alagalah> But its likely that those are the mysterious two flows we are missing michal_rehak to your point
18:18:08 <edwarnicke> alagalah: Do we get a GroupMod between 4142 and 4176 ?
18:18:29 <alagalah> edwarnicke: That was next on my list to explain why 4176 doesnt get an error
18:18:42 * alagalah not much of an OF debugging maven so I don't know how to filter on GroupMod
18:19:36 * alagalah furiously RTFM'ing
18:21:57 <alagalah> openflow_v4.action.group.group_id only returns 4142, 4146, 4176 and a bunch from s2
18:22:44 <edwarnicke> alagalah: One moment
18:22:56 <edwarnicke> openflow_v4.type == 15 matches groupmods
18:23:05 <edwarnicke> openflow_v4.type == 14  matches flowmods
18:23:12 <edwarnicke> openflow_v4.type == 1 matches errors
18:23:52 <alagalah> edwarnicke: Thanks, I was running down the: openflow_v4.group_features.actions.all rabbit-hole
18:24:37 <alagalah> GroupMod from .137 (s1?) to .138 (s2?) @PAK: 3657
18:24:51 <alagalah> michal_rehak: Should we have a GroupMod on s1-s1 ?
18:24:57 <alagalah> (note: S1 is also the controller)
18:25:12 <alagalah> So .137 - .138 is really controller - s2 (in my tiny mind)
18:25:58 <michal_rehak> I believe so as there is reference to that group in failing flows
18:26:09 <michal_rehak> and that is what the error message told
18:26:33 <alagalah> michal_rehak: Ok, let me see where the GroupMod is built in our code... might take some time as I am not that familiar with it, but I'll dig
18:27:28 <michal_rehak> ok, great
18:28:15 <michal_rehak> and I am sorry (missing something obvious) but do we have instructions on how to replicate the scenario which was captured by the pcap session of wireshark?
18:28:32 <michal_rehak> I mean change from 1.0 to 1.3 and all the many flows with group?
18:28:52 <michal_rehak> or is this something magic what need some third party app?
18:30:15 <alagalah> michal_rehak: neither unfortunately mate
18:30:59 <alagalah> michal_rehak: We don't know the trigger other than "it happened a few weeks ago we think" but we have made strides but now see this problem (2 flows, tied to a missing GroupMod).... note it is intermittent (Heisenbug)
18:31:39 <alagalah> michal_rehak: I can run our Proof of Concept 5 times in a row with full success, then the 6 it fails... sometimes if fails continuously... no rhyme or reason for the trigger (so suspect a race condition or some such)
18:32:04 <edwarnicke> alagalah: I think the 'didn't happen a week ago' may just be related to performance bugs being fixed and thus race conditions getting racier :)
18:32:04 <alagalah> michal_rehak: but if it was a race condition, I would still expect us to be sending the GroupMod (out of sequence) but I don't
18:32:07 <alagalah> michal_rehak: Its weird
18:32:29 <alagalah> edwarnicke: I concur... other issues we have faced have been resolved but they have brought this issue to light
18:33:00 <alagalah> tbachman: ping
18:33:07 <tbachman> pong
18:33:12 <tbachman> sorry guys
18:33:17 <alagalah> tbachman: So you should be!
18:33:19 <abhijitkumbhare> are you guys talking about https://bugs.opendaylight.org/show_bug.cgi?id=1997 ?
18:33:20 * alagalah kidding
18:33:24 <alagalah> abhijitkumbhare: ack
18:33:30 <abhijitkumbhare> OK
18:34:10 <michal_rehak> need to dispose of 2 disturbances, be back in 30 mins
18:34:19 <tbachman> lol
18:34:30 * alagalah Is expecting michal_rehak to knock on his door with a shotgun to eliminate one disturbance
18:34:35 <tbachman> lol
18:37:00 * tbachman is recreating his env :(
18:37:13 <tbachman> mac crash
18:37:48 <tbachman> folks
18:37:53 <tbachman> I want to recreate the “good” case
18:38:04 <tbachman> and verify that we see the group mod to both switches
18:39:31 <edwarnicke> tbachman: That sounds like an excellent plan
18:39:44 <edwarnicke> abhijitkumbhare: Yep... welcome to #opendaylight-Bug1997 :)
18:39:48 <edwarnicke> (this channel)
18:39:49 <tbachman> If I don’t, will be worried
18:39:52 <tbachman> what’s weird
18:39:56 <tbachman> is the # of flows is the same
18:39:57 <edwarnicke> abhijitkumbhare: here's what we think we know
18:40:09 <tbachman> (on the wire)
18:40:09 <edwarnicke> abhijitkumbhare: Both OVSDB and GBP have found the following to be true:
18:40:17 <edwarnicke> a) Their flows are getting to the config datastore
18:40:32 <edwarnicke> b)  They are *sometimes* (not always, it can be hard to reproduce) missing a couple of flows on the switch
18:40:47 <edwarnicke> abhijitkumbhare: GBP has debugged further and determined that
18:40:53 <edwarnicke> c)  All of their flows are getting to OFplugin
18:41:15 <edwarnicke> d)  OFplugin stats show the same number of AddFlows coming in from MD-SAL as going out to OFjava (in the failure case)
18:41:31 <edwarnicke> Well... #a-d are also true when things work :)
18:41:57 <edwarnicke> abhijitkumbhare: OVSDB has not debugged that far, so we can only surmise its true for them too
18:42:11 <edwarnicke> abhijitkumbhare: So tbachman and alagalah have been looking at the wire with wireshark, to see what's going on
18:42:44 <edwarnicke> abhijitkumbhare: Current theory is that we may be seeing an ordering issue where a FlowMod that references a group arrives before the GroupMod that creates it
18:42:52 <edwarnicke> abhijitkumbhare: This is because we see the Error for that situation
18:42:59 <edwarnicke> abhijitkumbhare: alagalah and tbachman are digging more deeply into that
18:43:06 <edwarnicke> abhijitkumbhare: We are pretty sure that is not the issue for OVSDB
18:43:24 <edwarnicke> abhijitkumbhare: Because they claim to not be using groups, and a grep through their code fro Group reveals nothing
18:43:35 <edwarnicke> abhijitkumbhare: So michal_rehak and oflibMichal are working on debugging the OVSDB case
18:43:40 <edwarnicke> abhijitkumbhare: Does all of that make sense?
18:46:57 <edwarnicke> tbachman: Say more
18:47:03 * edwarnicke was just running off at the mouth ;)
18:47:08 <tbachman> lol
18:47:09 <tbachman> sorry
18:47:11 <tbachman> just ran a test
18:47:14 <tbachman> seems to be working
18:47:15 <tbachman> so
18:47:17 <tbachman> am looking at flows
18:47:19 <edwarnicke> Yes
18:47:23 <tbachman> and
18:47:24 <tbachman> I see it
18:47:27 <tbachman> the GROUP_MOD flow
18:47:31 <edwarnicke> Flows on OVS or FlowMods on the wire?
18:47:33 <alagalah> edwarnicke: your summary is correct
18:47:35 <tbachman> wireshark
18:47:39 <edwarnicke> Cool
18:47:40 <tbachman> let me verify that I see *both*
18:47:54 <alagalah> edwarnicke: abhijitkumbhare I would also add that between CONFIG and OFPlugin we have not only verified the count of flows but also the flowIDs
18:48:16 * alagalah point of order=temp pumpkinate in 45min
18:48:31 * alagalah Have to meet a friend
18:48:33 <edwarnicke> alagalah: Point of pumpkination ?
18:48:54 <edwarnicke> alagalah: Which was a lovely bit of work I might add :)
18:49:05 <alagalah> edwarnicke: What was?
18:49:35 <alagalah> edwarnicke: That 15min python hack ??? I found it to be profoundly ugly LOL ... could do a lot better with difflib.SequenceMatcher but more tooling for another time.
18:51:38 * tbachman is thoroughly confused
18:52:01 <tbachman> edwarnicke: abhijitkumbhare: oflibMichal: michal_rehak: is it possible to piggy-back a group flow onto another message?
18:52:08 <tbachman> b/c
18:52:09 <tbachman> again
18:52:16 <tbachman> I only see *one* group flow
18:52:20 <tbachman> this time to the other switch
18:52:27 <tbachman> (i.e. the one that wasn’t getting one before)
18:53:56 <tbachman> wiat
18:54:01 <tbachman> I had the wrong filter setting
18:54:03 <tbachman> it’s ther
18:54:05 <tbachman> there
18:54:06 <tbachman> to both
18:54:10 <edwarnicke> tbachman: You may find that there are multiple OFmessages in a wireshark summary line
18:54:12 <edwarnicke> Use
18:54:18 <tbachman> openflow_v4.type == 15
18:54:21 <tbachman> that’s what I used
18:54:26 <tbachman> interestingly
18:54:28 <tbachman> in this trace
18:54:31 <tbachman> there are 3 :)
18:54:34 <edwarnicke> That was what I would recommend
18:54:39 <edwarnicke> tbachman: Intresting
18:54:43 <tbachman> two to one switch
18:54:58 <tbachman> back-to-back
18:57:30 <tbachman> edwarnicke: it may be splitting it into two messages?
18:57:49 <tbachman> the message to the first switch has a length of 436
18:57:58 <tbachman> the two messages to the other switch are each of size 204
18:58:33 <alagalah> tbachman: edwarnicke abhijitkumbhare michal_rehak Its important to note that trace tbachman is referring to is a WORKING example of our POC. The one I referred to above is a FAILING example
18:58:57 <edwarnicke> alagalah: Thank you for being clear there :)
19:00:30 <alagalah> tbachman: So BAD: No GroupMod for s1, Single GroupMod for s2.     GOOD: One GroupMod for s1, Two GroupMods for S2 (potentially we could be seeing OFPack/TCP fragmentation)
19:01:22 <tbachman> fwiw
19:01:30 <tbachman> in one of the bad cases
19:01:35 <tbachman> I see a message of only 364
19:02:31 <tbachman> and in another failure case, it’s only 340
19:03:23 <alagalah> tbachman: Its going to be tough to use length without quite a bit of work..... I kind of started down this road to compare traces (potentially)
19:03:43 <alagalah> tbachman: The issue is for each pak you have a fixed+variable length so a) you can't add the lengths for each trace
19:04:05 <alagalah> b) you could take the lengths for each case, minus (#paks * fixed header length) and normalise both counts
19:04:30 <alagalah> All that is telling you is you are sending the same information in both, but perhaps then the order is wrong
19:04:35 <tbachman> alagalah: ack — but the point is that for a single message, it looks like we are sending 436
19:04:37 <tbachman> but
19:04:44 <tbachman> we’re seeing 340, 364
19:04:45 <alagalah> Easier is just to count manually in the good case Openflow13 parts of the TCP message
19:05:45 * tbachman wonders if this could be an issue of how group flows may be split into multiple messages
19:06:21 <tbachman> abhijitkumbhare: oflibMichal michal_rehak: do any of you know how multiple group flows may be sent (i.e. one or more messages)?
19:06:26 <tbachman> Could we have an issue
19:06:44 <tbachman> where we split across a boundary we shouldn't?
19:06:55 <tbachman> and if we get the ERROR
19:07:02 <tbachman> maybe we don’t complete those?
19:14:59 <alagalah> michal_rehak: Thoughts on tbachman's comments?
19:15:26 * alagalah I'm an idiot... forgot about kiddy nighty night time
19:15:59 * alagalah Sorry folks... I have to pumpkinate... fwiw been on a hangout w tbachman in parallel discussing ideas as its quicker...
19:16:41 <oflibMichal> so the question is ... do we split group mod message into more group mod messages ?
19:16:43 <abhijitkumbhare> Thanks for the summary edwarnicke (still going thru the other's comments on/beyond it) - will also relay/playback the conversation to vishnoianil if he is interested (over google chat)
19:17:11 <tbachman> oflibMichal: yeah — just trying to think of why these would look different
19:17:26 <tbachman> since we send the same flows every time (or at least we think we do)
19:17:39 <tbachman> I guess in general
19:17:48 <tbachman> how do you decide how many flows to send in a message?
19:18:14 <tbachman> Is it asynchronous to what’s happening in the data store (e.g. timer or threshold based, and send once either is reached)?
19:18:30 <tbachman> Or does each data store notification result in an OF message?
19:18:54 <tbachman> (where a data store notifcation may have multiple flow)
19:19:01 <oflibMichal> i believe that there is currently no mechanismi that would allow for message split
19:19:08 <tbachman> oflibMichal: okay
19:19:18 <tbachman> so — it’s tied to the asynch notify
19:19:20 <oflibMichal> so basically each rpc or notification will be transfered into of message
19:19:29 <tbachman> oflibMichal: got it
19:19:30 <tbachman> so
19:19:39 <tbachman> interesting that one switch gets two messages, and the other gets one
19:19:41 <tbachman> and in the failing case
19:19:54 <tbachman> we only see a message to one of the switches
19:20:05 <oflibMichal> there is one exception I know, but that is programatically done, - based on the need to install two flows
19:20:16 <oflibMichal> not because of the length
19:20:18 <tbachman> and it’s bigger than the normal messages (364 or 340, vs. 204)
19:20:30 <oflibMichal> and that was the vlan id discussion with hideyuki
19:20:33 <oflibMichal> from vtn
19:20:54 <tbachman> oh — slight correction on my statement
19:21:01 <tbachman> er, nm
19:21:02 <tbachman> :)
19:21:07 * tbachman managed to confuse himseelf
19:21:09 <tbachman> ;)
19:21:14 <oflibMichal> :-)
19:21:43 <edwarnicke> tbachman: the number of flows you see in a message depends on the TCP stack
19:21:54 <edwarnicke> tbachman: Its frame size and tcp window dependent
19:22:03 <edwarnicke> tbachman: Wireshark is just dissecting the frames
19:22:12 <tbachman> edwarnicke: TCP will send things in chunks, yes, but there’s a notion of OF message boundaries, no?
19:22:20 <tbachman> edwarnicke: ah
19:22:22 <tbachman> good point
19:22:25 <tbachman> so
19:22:33 <tbachman> even though there are two entries in the dissector
19:22:39 <tbachman> those are truly Ethernet packets
19:22:41 <edwarnicke> tbachman: Exactly
19:22:48 <edwarnicke> tbachman: yep
19:22:50 <tbachman> got it
19:22:51 <tbachman> k
19:22:55 <tbachman> so
19:23:00 <tbachman> so
19:23:04 <tbachman> I need to put these together
19:23:09 <tbachman> stitch them
19:23:09 <edwarnicke> tbachman: And a good TCP stack will pack in as much as it can
19:23:14 <tbachman> right :)
19:23:26 <tbachman> In the bad case
19:23:27 <edwarnicke> oflibMichal: How goes the debugging ?
19:24:14 * tbachman sees a useful feature to add to wireshark — OF message stitching
19:24:48 <oflibMichal> edwarnicke: I was talking with Michal Rehak on how to get the ovsdb usecase running
19:25:02 <oflibMichal> because I traced some wrong flow mods
19:25:10 <oflibMichal> but than he needed to go off
19:25:14 <oflibMichal> for a while
19:25:35 * vishnoianil is away: Break time, will be back soon
19:26:08 <edwarnicke> tbachman: You can use follow tcp stream :)
19:26:15 <edwarnicke> tbachman: Not *quite* what you want though
19:26:16 <tbachman> edwarnicke: :)
19:26:18 <tbachman> lol
19:26:18 <edwarnicke> tbachman: But yes :)(
19:26:19 <tbachman> so
19:26:23 <tbachman> there’s quite a bit of time
19:26:30 <tbachman> from when that first segment is sent to that switch
19:26:38 <tbachman> and when we get back the error from the vSwitch
19:27:20 <tbachman> like 3 seconds
19:27:33 <edwarnicke> tbachman: OF is allowed to be tardy :(
19:27:40 <tbachman> that sounds like a fairly long time for a window on a local IP stack
19:28:00 <edwarnicke> tbachman: OF is allowed to be more tardy than the IP stack :(
19:28:07 <tbachman> lol
19:28:22 <tbachman> hmmm
19:28:23 <tbachman> actually
19:28:25 <tbachman> the point is
19:28:36 <tbachman> that the switch throwing the error is the one that didn’t get the message
19:28:43 * tbachman forgets these things
19:29:38 <abhijitkumbhare> edwarnicke it probably takes more work to be tardy (and respond tardy) than respond immediately :)
19:29:39 <tbachman> fwiw, in the working case, these messages are sent to each switch
19:29:42 <tbachman> in succession
19:30:39 * tbachman fears that we’re going to have to go through the logs to see the actual flows as they progress from datastore to Ofplugin to Oflowjava
19:30:59 <edwarnicke> tbachman: I don't think you will have to
19:31:06 <tbachman> <whew>
19:31:06 <tbachman> :)
19:31:08 <edwarnicke> tbachman: We just need to know if the issue is succession on the wire
19:31:19 <tbachman> ?
19:31:42 <edwarnicke> tbachman: If its an issue of succession on the wire
19:31:45 <tbachman> the # of flows is also confusing
19:31:57 <tbachman> b/c they are the same in the pass/fail case
19:32:01 <edwarnicke> tbachman: Then we just have to look at how you guys are enforcing order on your flow programming
19:32:06 <tbachman> but we don’t see the group flow in the fail case
19:32:17 <tbachman> it’s like the group flow was replaced with something else
19:32:18 <edwarnicke> tbachman: But in the fail case you have the error, right?
19:32:29 <edwarnicke> tbachman: OK... *that* could be interesting
19:32:43 * tbachman is just theorizing here
19:32:46 <edwarnicke> tbachman: We may need to trace that and see if perhaps the GroupFlow is getting lost
19:32:54 <edwarnicke> tbachman: Let me see what tools we have for that
19:33:09 <tbachman> edwarnicke: it wouldn’t have to be just lost — it would have to be replaced
19:33:14 <tbachman> that’s the only way the counts could add up
19:34:37 <edwarnicke> tbachman: But a GroupMod is not a FlowMod, right?
19:34:54 <edwarnicke> tbachman: You have OFplugin stats for both working and failing , right?
19:34:55 <tbachman> edwarnicke: so, would that show up in ovs-ofctl?
19:35:05 <tbachman> edwarnicke: I don’t have working yet
19:35:09 <tbachman> but I can collect those
19:35:12 <tbachman> just a sec...
19:35:28 <edwarnicke> tbachman: I thought you collected OFplugin stats to verify OFplugin wasn't loosing flows?
19:35:35 <edwarnicke> Oh
19:35:37 <edwarnicke> read to fast
19:35:41 <edwarnicke> You only have stats for failing
19:35:44 <tbachman> correct
19:35:50 <tbachman> you’re talking about jconsole stuff, right?
19:38:38 <tbachman> here’s the jconsole from working/non-working: http://pastebin.com/6JtgCfJz
19:39:21 <tbachman> edwarnicke: would that be “AddGroupInput”?
19:39:55 <tbachman> If so, we have +4/+4 on ENQUEUED and SUBMITTED
19:40:12 <edwarnicke> Yep
19:40:26 <edwarnicke> tbachman: And how many GroupMods do you see on the wire?
19:40:38 * tbachman goes back to check
19:41:00 <tbachman> edwarnicke: you’re talking about the failure case, right?
19:43:11 <tbachman> edwarnicke: there are two in the failing case - but am worried about the length there
19:45:52 <tbachman> edwarnicke: abhijitkumbhare oflibMichal  michal_rehak: this is looking more suspicious
19:46:23 <oflibMichal> tbachman: do you have wireshark captures with groupmods for both successful and failure case ?
19:46:29 <michal_rehak> reading channel - I am currently 30 mins ago
19:46:32 <tbachman> oflibMichal: yes
19:46:41 <tbachman> michal_rehak: distractions asleep? ;)
19:46:46 <oflibMichal> can you please send me those ?
19:46:51 <tbachman> oflibMichal: sure
19:46:56 <oflibMichal> on michal.polkorab@pantheon.sk ?
19:46:59 <oflibMichal> thanks
19:47:06 <tbachman> np
19:48:28 <tbachman> http://pastebin.com/uzaNtMX6
19:48:37 <tbachman> so, in the failure case
19:48:58 <tbachman> we see a single message with 2 group flow mods, with total length of 340
19:49:24 <tbachman> hmmmm
19:49:29 <tbachman> that may be a legitimate one
19:49:29 <tbachman> but
19:49:38 <tbachman> we never see the group flow mod for the other switch
19:50:16 <tbachman> (I say the one of length 340 may be legitimate, b/c that switch appears to work — I guess it’s conceiveable that the two switches are getting slightly different group flow entries, and therefore the 340 may be okay?)
19:50:51 <tbachman> again — fwiw, in the working  case, the group flow messages are sent two both switches back-to-back-to-back
19:51:01 <tbachman> (i.e. no intervening flow mod messages)
19:51:04 <tbachman> not sure if that’s important
19:51:15 <tbachman> but, from the perspective of what we’d expect to see
19:52:16 <tbachman> oflibMichal: did you receive it?
19:52:22 <tbachman> (wireshark traces)
19:52:35 <oflibMichal> yes i did
19:52:41 <tbachman> cool :)
19:54:35 <oflibMichal> ok, so the lengths look ok
19:54:46 <oflibMichal> every group mod message is 136 bytes long
19:55:00 <tbachman> oflibMichal: in both the good and bad cases?
19:55:06 <oflibMichal> yes
19:55:29 <oflibMichal> please see the length inside the group mod message
19:55:30 * tbachman wonders why in the good case, a single switch gets a message that’s over 400 bytes in length, but in the bad case, the other switch gets a single message of 340 bytes in length
19:55:36 <oflibMichal> not in the packet header
19:55:41 <tbachman> oflibMichal: okay
19:55:46 <tbachman> good point
19:55:54 <tbachman> but — also good to make sure these line up :)
19:55:54 <oflibMichal> i can see in good case >>>
19:56:15 <michal_rehak> tbachman: distractions partially asleep
19:56:21 <tbachman> michal_rehak: lol
19:56:39 <tbachman> hmmm
19:56:44 <tbachman> I wonder how clever the OF dissector is
19:56:52 <tbachman> b/c we could have another packet
19:56:59 <tbachman> that contains the rest of the message
19:56:59 <michal_rehak> wireshark is known to miss some packages
19:57:48 <oflibMichal> one packet - size 436 with 2*136 (group mods) +96 (one flow mod) + 68 (packet header size)
19:57:54 <edwarnicke> tbachman: michal_rehak has a lot more recent experience with wireshark and OF than I do :)
19:57:59 <tbachman> lol
19:58:01 <tbachman> yes
19:58:19 * tbachman is a total dunce in this (and many other) domains
19:58:35 <michal_rehak> edwarnicke: I have 2 week positive experience with wireshark-1.12.0
19:58:43 <edwarnicke> michal_rehak: LOL
19:58:47 * tbachman is using 1.12.1
19:59:00 <edwarnicke> tbachman: You are now the most modern of us all :)
19:59:05 <tbachman> lol
19:59:13 <edwarnicke> oflibMichal: How are you doing on reproducing the ovsdb case?
19:59:46 <michal_rehak> tbachman: I shall bow in front of you
19:59:52 <tbachman> lol
19:59:58 <tbachman> have you guys been drinking?
20:00:00 <tbachman> :p
20:00:11 <edwarnicke> tbachman: I started on Pellegrino in the wee hours of the morning :0
20:00:15 <tbachman> lol
20:00:15 * edwarnicke has always been drinking :)
20:00:22 <tbachman> hydration is a good thing
20:00:33 <tbachman> that’s a centuries old bio-hack ;)
20:03:25 <tbachman> oflibMichal: so, I just realized what you were saying with one of the above messages
20:03:34 <tbachman> there are 2 group mods with one flow mod
20:03:39 <tbachman> interesting
20:04:13 <tbachman> if a group mod fails
20:04:16 <tbachman> and there’s a flow mod with it
20:04:27 <michal_rehak> then the flowMod fails tooo
20:04:28 <tbachman> does that mean all will fail?
20:04:31 <tbachman> ah
20:04:48 <tbachman> that explains why I can see all the flow messages
20:04:55 <tbachman> but we’re missing a flow in OVS
20:05:01 <tbachman> b/c it got packaged with a group mod
20:05:12 <tbachman> that failed
20:05:34 * tbachman is probably catching up to “start”, whereas michal_rehak and oflibMichal are almost at the finish
20:05:36 <michal_rehak> ovs can ignore your flow in case there is already another one installed with the same match rules and priority
20:05:52 <michal_rehak> in the same table
20:07:19 <michal_rehak> tbachman: back to failing flowMod - it fails when relying on group and corresponding groupMod failed
20:08:07 <tbachman> michal_rehak: since I excell at asking dumb questions, can I ask another?
20:08:18 <michal_rehak> I would be pleased
20:08:25 <tbachman> does ovs-ofctl dump-flows show just flow mods, or are group flows covered in that as well?
20:09:23 <michal_rehak> tbachman: that is funny question for ovs does not support groups now
20:09:36 * tbachman is alarmed
20:09:43 <tbachman> really?
20:09:55 * tbachman wonders why readams put those in there
20:10:47 <michal_rehak> there were some issues with kernel modules since some older version
20:10:56 <tbachman> I guess the thing I’m trying to undertand
20:11:03 <michal_rehak> for dumping groups there is dump-groups
20:11:09 <tbachman> is that when we do ovs-ofctl dump-flows <switch> -O OpenFlow13
20:11:16 <tbachman> and pipe that into wc -l
20:11:25 <tbachman> we get 57 (if you exclude the banner/header)
20:11:32 <tbachman> we expect each switch to have 57 flows
20:11:35 <tbachman> but
20:11:41 <tbachman> I wasn’t sure if that covers group flows as well
20:12:28 <tbachman> in the failing case, we’re short 2 flows
20:12:42 <tbachman> which could be the 2 group flows
20:15:52 <michal_rehak> groups are listed separate via ovs-ofctl dump-groups
20:15:57 <michal_rehak> regarding help
20:16:52 <tbachman> I think I am finally able to make the numbers almost add up in my head now :)
20:16:53 <tbachman> thx
20:18:05 <edwarnicke> tbachman: I still remember the point where I realized all of quantum field theory and statmech was mostly counting... and I was *really* bad at counting ;)
20:18:25 <tbachman> lol
20:18:39 <tbachman> edwarnicke: while you are applying quantum theory
20:18:49 <tbachman> I’m still trying to apply how to code in java ;)
20:18:50 <tbachman> lol
20:19:57 <tbachman> so, this adds up: alagalah@odlgbp1:~/logdata/wireshark$ egrep 'OFPT_GROUP_MOD \(15\)' wiresharkdata-working.txt | wc -l
20:19:58 <tbachman> 4
20:19:59 <tbachman> alagalah@odlgbp1:~/logdata/wireshark$ egrep 'OFPT_GROUP_MOD \(15\)' wiresharkdata.txt | wc -l
20:20:00 <tbachman> 2
20:20:02 <tbachman> yuck
20:20:03 <tbachman> sorry folks
20:20:05 * tbachman goes back to pastebin
20:21:57 <hideyuki> michal_rehak: Thank you for the patch for Bug 1988.
20:22:15 <michal_rehak> hideyuki: hi! have you tested it?
20:22:35 <hideyuki> michal_rehak: I've built the Karaf edition using the latest code of Integration Git repository, and I'm testing bug 1988.
20:22:42 <tbachman> this is what I should have done: http://pastebin.com/Hh8aRHM0
20:22:43 <michal_rehak> hideyuki: perfect
20:22:46 <hideyuki> michal_rehak: But, it seems to me that the bug is not fixed.
20:23:03 <michal_rehak> hideyuki: :-(
20:23:22 <hideyuki> michal_rehak: The result has been not changed.
20:23:28 <edwarnicke> tbachman: OK... and the OFplugin stats in both cases?
20:23:34 <edwarnicke> tbachman: For GroupMods?
20:23:46 <tbachman> they were 4 and 4 in both cases
20:23:54 * tbachman scrolls up to retrieve pastebin
20:24:23 <tbachman> http://pastebin.com/6JtgCfJz
20:25:11 * tbachman presumes these are the entries of interest:
20:25:12 <tbachman> TO_SWITCH_ENQUEUED_SUCCESS: MSG[AddGroupInput] -> +4 | 4
20:25:12 <edwarnicke> oflibMichal: How are you doing on getting debugs going?
20:25:20 <tbachman> TO_SWITCH_SUBMITTED_SUCCESS: MSG[AddGroupInput] -> +4 | 4
20:25:36 <edwarnicke> oflibMichal: Because it looks like we might be loosing GroupMods south of OFplugin as well
20:26:27 <oflibMichal> I must miss something
20:26:42 <oflibMichal> not able to reproduce by these instructions
20:26:43 <oflibMichal> https://lists.opendaylight.org/pipermail/ovsdb-dev/2014-September/000720.html
20:26:55 <oflibMichal> no flow mods are send
20:28:04 <michal_rehak> I was wrong - ovs-2.1.3 supports groups
20:28:24 * tbachman is using 2.1.2
20:29:16 <tbachman> oflibMichal: did you see his 3a/b/c?
20:29:30 <michal_rehak> tbachman: I misread changelog
20:31:17 <oflibMichal> yes i did
20:31:22 <edwarnicke> oflibMichal: Can you bounce over to #opendaylight-ovsdb with me?
20:31:24 <oflibMichal> but let me recheck
20:32:06 <tbachman> edwarnicke: been trying to summon Madhu_offline, but he may be out
20:32:25 <michal_rehak> tbachman: the amounts in pastebin are weird - 4groups+114flows = OK and 2groups+116flows is WRONG
20:32:41 <tbachman> michal_rehak: exactly ;)
20:33:05 <michal_rehak> like there would be some fallback in case without groups add 2 more flows
20:33:24 <michal_rehak> and device shows how many flows?
20:33:40 * tbachman wonders if it’s possible that the group flows were accidentally sent as flow mods
20:33:51 <tbachman> it shows 55 (out of 57) flows
20:34:16 <tbachman> b/c ovs-ofctl doesn’t appear to show group flows
20:34:25 <tbachman> two switches
20:34:29 <tbachman> one switch gets all 57
20:34:32 <tbachman> the other 55
20:34:42 <tbachman> (or I should say *programs* 55)
20:35:41 <michal_rehak> so good case is 57 per each switch -
20:35:43 <michal_rehak> .
20:35:47 <michal_rehak> 114
20:35:51 <tbachman> right
20:36:05 <tbachman> is it possible that the missing 2 groups are turned into flow mods?
20:36:17 <michal_rehak> unlikely
20:36:47 <tbachman> michal_rehak: I’ll see if I can figure out what the 2 extra flow mods are
20:37:11 <michal_rehak> that would be great
20:37:27 <tbachman> unless oflibMichal beats me to it ;)
20:37:28 <tbachman> lol
20:37:39 <tbachman> tortoise v. hare
20:37:45 <tbachman> except am going to trip the hare ;)
20:37:46 <tbachman> lol
20:41:18 <michal_rehak> tbachman: I am slightly missing context but it sounds adventurous
20:41:26 <tbachman> lol
20:42:01 <oflibMichal> tbachman: ok, i was able to reproduce the ovsdb usecase, but only once now
20:42:08 <tbachman> oflibMichal: nice!
20:42:09 <oflibMichal> how can i restart the mininet
20:42:36 <oflibMichal> do i need to shut it down and restart completely ?
20:42:49 <michal_rehak> cat reset.sh
20:42:50 <michal_rehak> #!/bin/bash
20:42:50 <michal_rehak> echo -e '\e[33;1mClean MANAGER:\e[0m'
20:42:50 <michal_rehak> echo -e '\e[32;1mSee current status - manager.\e[0m'
20:42:50 <michal_rehak> sudo ovs-vsctl get-manager
20:42:50 <michal_rehak> echo -e '\e[32;1mClean manager.\e[0m'
20:42:50 <michal_rehak> sudo ovs-vsctl del-manager
20:42:51 <michal_rehak> echo -e '\e[32;1mSee current status after clean - manager.\e[0m'
20:42:51 <michal_rehak> sudo ovs-vsctl get-manager
20:42:53 <michal_rehak> echo
20:42:53 <michal_rehak> echo -e '\e[33;1mClean CONTROLLER:\e[0m'
20:42:53 <michal_rehak> echo -e '\e[32;1mSee current status - ctrl.\e[0m'
20:42:53 <michal_rehak> sudo ovs-vsctl show
20:42:55 <michal_rehak> echo -e '\e[32;1mClean controller.\e[0m'
20:44:01 <edwarnicke> oflibMichal: You can rest mininet with
20:44:03 <edwarnicke> sudo mn -c
20:44:44 <michal_rehak> http://pastebin.com/JFUU2NaJ
20:45:14 <oflibMichal> edwarnicke: will try
20:46:47 <tbachman> michal_rehak: I’m not confident in the 116 number
20:47:33 <tbachman> b/c I used wireshark to filter on just openflow_v4.type == 14
20:47:38 <tbachman> and dumped that to a text file
20:47:42 <tbachman> and greped for OFPT_FLOW_MOD (14)
20:47:44 <tbachman> and that is 114
20:47:50 <tbachman> both good and bad cases
20:47:56 <michal_rehak> sounds promising
20:48:20 <michal_rehak> now we are searching for 2 missing groupModes only
20:48:45 <tbachman> ah
20:48:59 <tbachman> I think I figured out where the extra count of 2 came from
20:49:03 <tbachman> the error messages back
20:49:21 <tbachman> michal_rehak: correct
20:52:18 <edwarnicke> oflibMichal: Did you see steps 3a-c: https://lists.opendaylight.org/pipermail/ovsdb-dev/2014-September/000720.html
20:52:27 <oflibMichal> yes i did
20:52:29 <tbachman> edwarnicke: he was able to reproduce it
20:53:22 <djx> tbachman did you say ovs-ofctl doesn't show groups?
20:53:34 <tbachman> djx: maybe that’s not the right way to say that
20:53:45 <tbachman> we’re sending group mods to OVS 2.1.2
20:53:53 <tbachman> running the above doesn’t seem to show them
20:53:53 <tbachman> but
20:53:55 <tbachman> that may just be
20:54:01 <tbachman> b/c OVS 2.1.2 doesn’t support groups
20:54:09 <djx> yes it doe
20:54:10 * tbachman isn’t versed enough yet to know
20:54:12 <tbachman> k
20:54:13 <djx> since 2.1.0
20:54:21 <tbachman> djx: is there a separate command to dump those?
20:54:28 <djx> sudo ovs-ofctl dump-groups --protocols=OpenFlow13
20:54:32 <tbachman> ah
20:54:33 <tbachman> thx :)
20:54:46 <djx> and don't forget the switch
20:54:47 * tbachman apparrently is too lazy to do “man ovs-ofct"
20:54:52 <djx> no problemo
20:54:53 <tbachman> the protocols?
20:55:14 <djx> sudo ovs-ofctl dump-groups --protocols=OpenFlow13 s1
20:55:18 <djx> the s1 part
20:55:19 <edwarnicke> oflibMichal: Datapoint... when I follow the directions at https://lists.opendaylight.org/pipermail/ovsdb-dev/2014-September/000720.html
20:55:20 <tbachman> ah
20:55:24 <edwarnicke> oflibMichal: I am missing flow:
20:55:28 <edwarnicke> cookie=0x0, duration=16.196s, table=0, n_packets=0, n_bytes=0, send_flow_rem dl_type=0x88cc actions=CONTROLLER:65535
20:56:31 <oflibMichal> I missed the one with table:20
20:58:24 <djx> tbachman if you want to update your stuff https://wiki.opendaylight.org/view/CrossProject:Integration_Group:Create_System_Test_Environment
20:58:47 <tbachman> djx: awesome — thx! :
20:58:48 <tbachman> :)
20:59:21 * tbachman will wait to download, as he’s currently tethering, and doesn’t want to kill his monthly GB limit ;)
20:59:32 <tbachman> (test VM)
21:00:00 * djx feels sorry for tbachman GB limit
21:00:05 <tbachman> lol
21:05:09 <michal_rehak> tbachman: is this ok?
21:05:10 <michal_rehak> org.opendaylight.ovsdb.openstack.net-virt-providers - 1.0.0.SNAPSHOT | Unable to spot Bridge Identifier for br-ex in OVS|10.0.42.16:39032
21:05:32 <michal_rehak> 2014-09-21 23:03:35,115 | ERROR | otification-1342 | OF13Provider                     | 340 - org.opendaylight.ovsdb.openstack.net-virt-providers
21:05:42 <tbachman> one sec....
21:15:38 <tbachman> sorry
21:15:38 <tbachman> back
21:15:51 <tbachman> br-ex shouldn’t be critical
21:16:12 <tbachman> this is just to test out the OVSDB failure, right?
21:16:49 <michal_rehak> tbachman: I just spotted this in karaf logs
21:17:44 <tbachman> it’s probably fine
21:18:16 <tbachman> I can double check, but I think that’s only needed in order to set up external traffic
21:18:28 <tbachman> they try to install some flows on it if it’s there
21:18:46 * tbachman notes that it has been quite some time since he’s poked around OVSDB code
21:21:05 <michal_rehak> tbachman: ok, I am exhausted for today - sorry that we did not move much further
21:21:14 <tbachman> michal_rehak: understood
21:21:22 <tbachman> almost 12:30 am there, right?
21:21:42 <michal_rehak> 23:21
21:21:48 <tbachman> ah
21:21:54 <tbachman> one hr off
21:22:06 <tbachman> michal_rehak: thx for your help!
21:22:15 <djx> michal_rehak you guys are shifted from the timezone?
21:22:20 <michal_rehak> we can go on tomorrow
21:22:33 <tbachman> michal_rehak: I will be up at 5am EST
21:22:39 <tbachman> if that’s early enough
21:22:43 <tbachman> 11am your time?
21:22:49 <michal_rehak> djx: no, standard CEST: GMT+2 during summer
21:23:03 <tbachman> djx: late for you, too! ;)
21:23:37 <djx> here is only 22:23 but I thought michal had 2 hours more than me
21:24:32 <edwarnicke> michal_rehak: What is the status of hideyuki 's bug 1988 ?
21:24:37 <tbachman> michal_rehak: one other thing
21:24:53 <tbachman> If you want to try to recreate the GBP test setup
21:24:54 <tbachman> https://wiki.opendaylight.org/view/Group_Policy:Building_and_Running
21:24:56 <michal_rehak> edwarnicke: need to replicate
21:25:14 * tbachman realizes that he needs to update this to reflect running using karaf
21:25:15 <edwarnicke> michal_rehak: Where you able to get enough info from hideyuki to replicate it tomorrow morning?
21:25:20 <tbachman> oflibMichal: same: https://wiki.opendaylight.org/view/Group_Policy:Building_and_Running
21:25:43 <michal_rehak> edwarnicke: nothing new - I will try with OF-1.0
21:27:19 <edwarnicke> michal_rehak: ACK
21:27:25 <edwarnicke> oflibMichal: How are you doing?
21:30:09 <oflibMichal> edwarnicke: nothing new
21:30:17 <oflibMichal> i am like deleted
21:30:39 <tbachman> oflibMichal: like kill -9 deleted?
21:31:24 <tbachman> oflibMichal: any other info I can provide?
21:31:36 <oflibMichal> i guess we have all the info
21:31:51 <oflibMichal> just need to create som statistics inside ofjava
21:32:06 <tbachman> oflibMichal: okay
21:32:07 <tbachman> I may head out for a bit
21:32:08 <oflibMichal> but i am affraid that's for tomorrow
21:32:21 <tbachman> oflibMichal: sounds good
21:33:56 <oflibMichal> need to go off
21:34:02 <tbachman> okay
21:34:08 <edwarnicke> oflibMichal: Pick up again tomorrow morning?
21:34:10 <tbachman> will see you in the a.m.
21:34:13 <oflibMichal> sorry that I couldn't help more :-(
21:34:19 <tbachman> oflibMichal: we’ll get there!
21:34:28 <oflibMichal> edwarnicke: definitely
21:37:19 <hideyuki> Hi all, I tested sveral times on Bug 1988, and confirmed that bug 1988 is not fixed.
21:37:23 <hideyuki> https://bugs.opendaylight.org/show_bug.cgi?id=1988#c8
21:37:50 <hideyuki> It seems to me that the result of the issue is not changed.
21:40:36 <edwarnicke> hideyuki: Michal is going to try to recreate again tomorrow morning with OF10
21:40:47 <edwarnicke> hideyuki: Could you get up early and get on IRC so we can collaborate on it?
21:44:40 <hideyuki> OK.
21:44:58 <hideyuki> edwarnicke: Thank you!
08:24:15 <mbobak> hello guys
09:40:38 <tbachman> mbobak: hi :)
09:41:21 <tbachman> oflibMichal: michal_rehak hey guys :_
09:41:23 <tbachman> :)
09:48:45 * vishnoianil is away: Break time, will be back soon
09:50:19 <alagalah> Morning edwarnicke michal_rehak
09:55:20 <oflibMichal> tbachman: hi there
09:55:32 <tbachman> oflibMichal: hi! :)
09:55:50 <oflibMichal> anything new ?
09:55:54 <tbachman> lol
09:55:59 <tbachman> was going to ask the same :)
09:56:04 <oflibMichal> :-)
09:56:07 <tbachman> did you need more information?
09:56:17 <oflibMichal> not yet
09:56:18 <tbachman> I can add logs, etc.
09:56:18 <tbachman> k
09:56:29 <oflibMichal> i have implemented some logs
09:56:40 <tbachman> ah, k
09:57:06 <tbachman> I guess a key question is test env
09:57:08 <oflibMichal> and it shows that the number of flows that entered ofjava is the same as is written to the switch
09:57:15 <oflibMichal> in wrong case scenario
09:57:15 <tbachman> oflibMichal: right
09:57:18 <tbachman> right
09:57:25 <alagalah> morning oflibMichal
09:57:32 <oflibMichal> i wasn't able to reproduce successful scenario this morning
09:57:35 <tbachman> oflibMichal: keep in mind: we actually see all the flow mod messages on the wires
09:57:41 <oflibMichal> alagalah: hello
09:57:51 <tbachman> oflibMichal: ah — you set up the GBP demo/POC?
09:58:26 * alagalah has already taken note we need to update POC instructions for Karaf
09:58:28 <oflibMichal> tbachman: i tried with ovsdb instructions so far
09:58:29 <tbachman> just to clarify — it’s just the group mod message to the one switch that’s missing
09:58:34 <tbachman> oflibMichal: ah, I see
09:58:41 <tbachman> those *may* be different issues
09:58:45 <oflibMichal> will try with the gbp demo now
09:58:55 <tbachman> the GBP one seems to be a missing group message
09:59:10 <oflibMichal> i will follow these instructions
09:59:10 <oflibMichal> https://wiki.opendaylight.org/view/Group_Policy:Building_and_Running
09:59:14 <alagalah> oflibMichal: If its helpful I have the POC as OVAs exported
09:59:21 <alagalah> (dropbox)
10:00:25 <tbachman> oflibMichal: the other possibility is that I could run something for you
10:00:41 <tbachman> (i.e. either tell or send me the changes, I can build and run)
10:01:12 <oflibMichal> hmm, I will try to run the demo first
10:01:18 <tbachman> okay
10:01:24 <tbachman> you need two VMs, fwiw
10:01:50 <tbachman> b/c it uses two vSwitches
10:02:10 <oflibMichal> ou, ok
10:02:37 <oflibMichal> so we might first want to try your code modification
10:02:43 <tbachman> oflibMichal: okay
10:02:52 <tbachman> just let me know what to change/add
10:03:14 <tbachman> you could also push a “DO NOT MERGE” gerrit, that I could fetch
10:03:28 <tbachman> whatever works
10:03:34 <oflibMichal> ok, that would be the best option i guess
10:03:50 <alagalah> tbachman: Let me know if you want me to do the same...
10:03:59 <tbachman> alagalah: thx!
10:04:03 <tbachman> let me try it out first
10:04:21 <tbachman> I have all the build stuff now on a separate VM
10:04:52 * vishnoianil is back (gone 00:16:07)
10:04:52 <tbachman> oflibMichal: we’ve added in some debug messages to the openflowplugin, etc.
10:04:59 <alagalah> tbachman: Cool! We just need to iterate a lot over this since its a bit of a heisenbug.... when you say "build stuff" how is that different from my setup which is Ed's integrated logging ?
10:05:02 <tbachman> do you want/need any of those?
10:05:17 <tbachman> alagalah: did you build that integration?
10:05:24 <tbachman> Or just download it from ed?
10:05:28 <alagalah> tbachman: Just downloaded
10:05:52 <tbachman> I guess we could just build from that
10:06:04 <oflibMichal> I believe that they might be helpful
10:06:05 * tbachman goes off to check the size of that
10:06:12 <oflibMichal> *those log from openfowplugin
10:06:32 <alagalah> tbachman: oflibMichal Those are just Karaf debug levels though, right?
10:06:54 <tbachman> alagalah: we added some patches to the openflowplugin, I believe
10:07:13 <oflibMichal> alagalah: tbachman can you provide me your gerrit emails ?
10:07:15 <tbachman> alagalah: I think it’s in that gdoc for bug tracking
10:07:18 <alagalah> tbachman: Ack, I recall now, and to the controller, but that was prior to Ed sending a pre-built zip
10:07:19 <oflibMichal> i pushed a dfraft
10:07:27 <tbachman> oflibMichal: thx!
10:07:41 <alagalah> oflibMichal: alagalah@gmail.com for me
10:09:07 <oflibMichal> tbachman: and what is your gerrit email please ?
10:09:09 <tbachman> oflibMichal: tbachman@yahoo.com to add me
10:09:11 <tbachman> thx
10:09:30 <oflibMichal> ok, added both
10:09:36 <alagalah> oflibMichal: ty!
10:09:48 <oflibMichal> it just logs when any flowmod or group mod message is received in openflowjava
10:10:04 <oflibMichal> and also when these messages are about to be serialized
10:10:31 <oflibMichal> when i run the ovsdb demo ... i also used only the Error log level and only on openflowjava
10:10:56 <tbachman> oflibMichal: so…. the messages that Ed added were for debug level
10:11:00 <tbachman> do you want those?
10:11:14 <oflibMichal> we can try running with them
10:11:25 <tbachman> okay
10:11:25 <oflibMichal> and if we are unable to reproduce the issue
10:11:27 <tbachman> actually
10:11:34 <tbachman> can I do this first with your traces
10:11:34 <oflibMichal> we might use only openflowjava logs
10:11:43 <michal_rehak> morning
10:11:46 <oflibMichal> sure, it is up to you
10:11:50 <tbachman> b/c I still have to pull in and build the changes for openflowplugin
10:11:57 <tbachman> if we need it, I’ll add those as a second iteration
10:11:59 <tbachman> oflibMichal: thx
10:12:00 <oflibMichal> ok
10:12:05 <tbachman> michal_rehak: afternoon ;)
10:12:46 <alagalah> Hey michal_rehak
10:15:16 * vishnoianil is away: Break time, will be back soon
10:20:44 <tbachman> oflibMichal: building my setup
10:20:57 <tbachman> give me a bit
10:30:40 <michal_rehak> tbachman: I got an idea - how do you guys setup 2 mininets?
10:30:51 * tbachman listens in
10:30:59 <tbachman> michal_rehak: we set them up using a python script
10:31:01 <michal_rehak> tbachman: is there a parameter for OVS to force dpid?
10:31:12 <tbachman> michal_rehak: not sure
10:31:17 * tbachman goes off to consult google
10:31:30 <alagalah> michal_rehak: We use python scripts to setup everything, the mininet, the policies, and then REST calls into controller
10:37:05 <michal_rehak> tbachman: could you point me to those scripts
10:38:02 <tbachman> https://wiki.opendaylight.org/view/Group_Policy:Building_and_Running
10:38:08 <tbachman> michal_rehak: actually, wait
10:38:19 <tbachman> let me just give you a pointer in gitweb
10:38:53 <michal_rehak> tbachman: because having 2 mininets with same dpid will render into session fights
10:39:09 * tbachman is confused
10:39:13 <michal_rehak> and disconnect previous device +reconnect
10:39:13 <tbachman> you want the same dpid?
10:39:24 <tbachman> michal_rehak: I think they have different ones already?
10:39:43 <michal_rehak> no - one controller can not handle 2 nodes with same dpid
10:39:46 <tbachman> or, at least  that’s how they’re showing up int the controller
10:39:58 <tbachman> i.e. we see openflow:1 and openflow:2
10:40:34 <tbachman> michal_rehak: https://git.opendaylight.org/gerrit/gitweb?p=groupbasedpolicy.git;a=tree;f=util/testOfOverlay;h=ee972ca7dbc6a62247ac9340bde62c60ab5c865c;hb=HEAD
10:48:35 <tbachman> oflibMichal: okay, so I ran it and got the failure case
10:48:43 <tbachman> there are some ERROR messages from openflowjava in the logs
10:49:04 <oflibMichal> sure, there should be some
10:49:14 <oflibMichal> i did implement it intentionally as error
10:49:18 <tbachman> oflibMichal: okay
10:49:19 <oflibMichal> messages
10:49:25 <tbachman> do you want me to email you the log?
10:49:29 <oflibMichal> you should se two types
10:49:38 <oflibMichal> tbachman: please do
10:49:42 <tbachman> okay
10:51:50 <tbachman> sent
10:51:51 <oflibMichal> tbachman: log received ... thx
10:51:54 <tbachman> np
10:51:59 <tbachman> thank YOU :)
10:54:49 <tbachman> oflibMichal: let me know you’d like the logs from openflowplugin
11:15:22 <alagalah> oflibMichal: michal_rehak FYI, we have distinct DPIDs for each switch
11:18:07 <alagalah> alagalah@odlgbp1:~$ sudo ovs-ofctl show s1 -O OpenFlow13
11:18:08 <alagalah> OFPT_FEATURES_REPLY (OF1.3) (xid=0x2): dpid:0000000000000001
11:18:08 <alagalah> alagalah@odlgbp2:~$ sudo ovs-ofctl show s2 -O OpenFlow13
11:18:09 <alagalah> [sudo] password for alagalah:
11:18:11 <alagalah> OFPT_FEATURES_REPLY (OF1.3) (xid=0x2): dpid:0000000000000002
11:28:38 <oflibMichal> tbachman: ok, so the log you sent me, this is a failure case ?
11:28:46 <tbachman> oflibMichal: yes
11:28:54 <oflibMichal> because i can see 114 flowmods
11:28:59 <tbachman> oflibMichal: okay
11:29:03 <oflibMichal> and just 2 groupmods
11:29:08 <tbachman> interesting
11:29:13 <tbachman> not 4 groupmods?
11:29:18 <oflibMichal> can you combine with the openflowplugin statistics ?
11:29:29 <tbachman> oflibMichal: the jconsole ones?
11:29:32 <oflibMichal> no, only 2 groupmods in logs
11:30:12 <oflibMichal> hmmm ... not sure
11:30:17 <oflibMichal> those where
11:30:30 <oflibMichal> we can see the addgroupinput logs
11:30:36 <oflibMichal> from message spy
11:30:55 <oflibMichal> those that count messages that were sent into ofjava from ofplugin
11:31:10 <tbachman> oflibMichal: pastebin coming up
11:31:18 <oflibMichal> great
11:32:11 <tbachman> oflibMichal: http://pastebin.com/wQPhBEkz
11:32:32 <oflibMichal> ok ... TO_SWITCH_SUBMITTED_SUCCESS: MSG[AddGroupInput] -> +2 | 2
11:32:39 <tbachman> oflibMichal: this is interessting
11:32:40 <tbachman> in this case
11:32:41 <tbachman> we only see 2
11:32:42 <tbachman> which is different from the cases before
11:32:52 * tbachman is even more confused
11:33:04 <tbachman> TO_SWITCH_ENQUEUED_SUCCESS: MSG[AddGroupInput] -> +4 | 4
11:33:11 <tbachman> so 4 came in
11:33:13 <tbachman> and 2 went out
11:33:24 <tbachman> we have a roach motel ;)
11:33:25 <tbachman> lol
11:33:39 * tbachman would understand if oflibMichal doesn’t follow that one — joke in the u.s.
11:33:44 <tbachman> (commercial)
11:33:47 <tbachman> (in u.s.)
11:33:58 <oflibMichal> no i don't :-(
11:34:00 <tbachman> lol
11:34:01 <tbachman> so
11:34:09 <tbachman> do those counters say what I think?
11:34:20 <tbachman> i.e. that 2 came in to openflowjava
11:34:23 <tbachman> sorry
11:34:26 <tbachman> 4 came in to openflowjava
11:34:36 <tbachman> and 2 left openflowjava
11:34:41 <oflibMichal> no
11:34:54 <oflibMichal> (1:33:04 PM) tbachman: TO_SWITCH_ENQUEUED_SUCCESS: MSG[AddGroupInput] -> +4 | 4
11:35:01 <oflibMichal> where do you have this log from ?
11:35:08 <tbachman> that pastebin
11:35:13 <oflibMichal> because the pastebin you shared contains only 2 group mods
11:35:23 <tbachman> oflibMichal: edwarnicke said there were two parts to this
11:35:25 <tbachman> the enqueued
11:35:26 <oflibMichal> oh i see
11:35:28 <tbachman> and the submitted
11:35:30 <oflibMichal> my fault
11:35:31 <tbachman> np
11:35:38 <oflibMichal> so this means
11:35:44 <tbachman> oflibMichal: fwiw
11:35:45 <tbachman> before
11:35:50 <oflibMichal> that 4 groupmods were enqueued in openflowplugin
11:35:55 <oflibMichal> but only 2 reached ofjava
11:35:59 <tbachman> okay
11:36:00 <tbachman> fwiw
11:36:03 <tbachman> before
11:36:06 <tbachman> we did see 4 and 4
11:36:15 <tbachman> (should be in that tarball I sent you before)
11:36:26 <oflibMichal> yes, agree
11:36:31 <tbachman> oflibMichal: I can re-run
11:36:36 <tbachman> and make sure that’s still the case
11:36:50 <oflibMichal> tbachman: agree, it might tell us more
11:36:54 <tbachman> I probably should include a local version of the openflowplugin with those logs
11:37:02 <tbachman> to make sure we have apples-apples
11:37:09 <oflibMichal> true
11:37:22 <tbachman> okay
11:37:24 <oflibMichal> please send me the new log when you have it
11:37:24 <tbachman> I’ll do that
11:49:36 <tbachman> oflibMichal: sorry this is taking a bit
11:49:40 <tbachman> openflowplugin build for first time
11:49:42 <tbachman> :)
11:49:53 <tbachman> in karaf tests now
11:50:55 <oflibMichal> tbachman: that's ok ... no worries
11:51:15 <tbachman> oflibMichal: there were some prints in the controller
11:51:18 <tbachman> for FRM
11:51:22 <tbachman> is it okay if we don’t have those?
11:51:26 <tbachman> I can add them
11:51:29 <tbachman> but will be a bit longer
11:51:47 <tbachman> (seems like we’re okay with FRM - it’s ofplugin and ofjava that we’re focusing on?)
11:52:44 <oflibMichal> well
11:52:47 <oflibMichal> up to you
11:52:49 <tbachman> k
11:52:52 <tbachman> let’s start w/o it
11:52:55 <oflibMichal> we can try without frm
11:53:06 <tbachman> and I’ll build those while we’re testing w/o it
11:53:14 <oflibMichal> ok
11:59:16 <edwarnicke> oflibMichal: What do these logs mean (I see them in the case where I get missing flows for OVSDB):
11:59:18 <edwarnicke> https://www.irccloud.com/pastebin/tviasVNL
11:59:28 <edwarnicke> I have one such log message for every missing flow
12:00:41 <oflibMichal> it reports missing BarrierReply message for BarrierRequest send
12:00:52 <edwarnicke> oflibMichal: Could that cause flow programming failures?
12:01:00 <edwarnicke> oflibMichal: And how long are we waiting  for them?
12:03:45 <tbachman> edwarnicke: good morning :)
12:05:00 <tbachman> edwarnicke: I’m setting up my own build environment, so that you don’t have to be my build proxy :)
12:05:10 * tbachman should have thought of that before
12:05:12 <edwarnicke> tbachman: Excellent :)
12:05:39 <tbachman> the only piece I haven’t included is controller, but will do that next
12:05:44 <tbachman> then I should have the same stuff
12:05:58 <tbachman> (controller, openflowplugin, openflowjava, groupbasedpolicy, integration)
12:07:04 <oflibMichal> edwarnicke: don't know, need to check if there can be a barrier request inside a barrier request
12:07:22 <edwarnicke> ACK
12:08:10 <oflibMichal> edwarnicke: and can you see these logs in successful scenario ?
12:10:27 <alagalah> morning edwarnicke
12:13:47 <oflibMichal> edwarnicke:  Upon receipt, the switch must finish processing all previously-received messages, including sending corresponding reply or error messages, before executing any messages beyond the Barrier Request.
12:13:59 <oflibMichal> sending a lot of barriers might cause us problems
12:14:07 <oflibMichal> timeouts and so on
12:15:20 <oflibMichal> tbachman: how goes the testing ?
12:15:35 <tbachman> oflibMichal: I got it to happen
12:15:36 <tbachman> but
12:15:41 <tbachman> just verifying that I have the logs
12:15:48 <oflibMichal> ok, great
12:16:26 <tbachman> okay
12:16:49 <tbachman> I think I have them
12:17:38 <tbachman> oflibMichal: will send the logs on
12:17:44 <tbachman> I think you only need karaf.log.1
12:17:46 <tbachman> fwiw
12:18:24 <oflibMichal> yes, and also the ofplugin (message spy) statistics
12:18:29 <oflibMichal> to verify
12:19:11 <tbachman> oflibMichal: will get those now
12:20:52 <tbachman> oflibMichal: jconsole stats: http://pastebin.com/M1KkT0GR
12:21:09 <tbachman> oflibMichal: so, this case is different
12:21:10 <tbachman> 4 came in
12:21:12 <tbachman> 4 went out
12:21:17 <edwarnicke> oflibMichal: tbachman Just tried the ovsdb case, and got flows going missing in OFplugin:
12:21:27 <edwarnicke> https://www.irccloud.com/pastebin/dR38nlxX
12:21:28 <tbachman> edwarnicke: that’s good!
12:21:32 <tbachman> reproduceable
12:21:33 <tbachman> and isolated
12:21:42 <edwarnicke> TO_SWITCH_ENQUEUED_SUCCESS: MSG[AddFlowInput] -> +11 | 11
12:21:43 <tbachman> oflibMichal: not sure what to make of the 2/4 on the last run
12:21:48 <edwarnicke> TO_SWITCH_SUBMITTED_SUCCESS: MSG[AddFlowInput] -> +9 | 9
12:22:02 <tbachman> other than the fact that it wasn’t the same as what we ran before (i.e. openflowplugin wasn’t built locally)
12:22:03 <edwarnicke> tbachman: l also have wireshark
12:22:07 <tbachman> edwarnicke: nice!
12:22:08 <edwarnicke> tbachman: oflibMichal investigating
12:22:11 <tbachman> is the message on the wire?
12:22:12 <tbachman> ah, h
12:22:13 <tbachman> k
12:22:25 <edwarnicke> My gut is telling me this has something to do with barrier
12:22:48 <edwarnicke> oflibMichal: michal_rehak how are we handling barriers?
12:24:21 <oflibMichal> tbachman: there are two attachments in your email
12:24:30 <tbachman> oflibMichal: do karaf.log.1
12:24:32 <tbachman> I think it’s in there
12:24:35 <tbachman> there’s log rotation
12:24:41 <oflibMichal> what is the difference between karaf.log and .1 ?
12:24:55 <tbachman> log messages are probably in karaf.log.1
12:25:04 <tbachman> when it exceeds 1MB in size
12:25:12 <tbachman> it saves the current log as karaf.log.<n>
12:25:16 <tbachman> where <n> increments
12:25:22 <tbachman> and restarts with a clean karaf.log
12:25:28 <tbachman> so, karaf.log.1 is the oldest log messages
12:25:29 <oflibMichal> because i can see 639 flows entered ofjava in log *.1
12:25:31 <tbachman> karaf.log is new
12:25:38 <tbachman> oflibMichal: wow :)
12:25:54 <oflibMichal> ok, so i shall check only the karaf.log ... correct ?
12:26:08 <tbachman> ah — I think your messages only showe up in karaf.log.1?
12:26:25 <tbachman> oflibMichal: fwiw, these log messages are all from the same run
12:26:33 <tbachman> you could cat them into a single big file if you prefer :)
12:34:59 <tbachman> oflibMichal: http://pastebin.com/NkR6Rxrc
12:35:08 <tbachman> that log shows your message
12:38:41 <oflibMichal> tbachman: any chance that you have wireshark capture ?
12:38:51 <tbachman> oflibMichal: I didn’t get it that time
12:38:52 <tbachman> but
12:38:55 <tbachman> I can get one if you want
12:39:08 <tbachman> oflibMichal: I can probably tell you whether it’s the same error or not though
12:39:11 <tbachman> (i.e no group flow)
12:40:11 <tbachman> 2014-09-22T10:17:28.675Z|00031|connmgr|INFO|s1<->tcp:192.168.194.137:6653: sending OFPBAC_BAD_OUT_GROUP error reply to OFPT_FLOW_MOD message
12:40:12 <tbachman> 2014-09-22T10:17:28.697Z|00032|connmgr|INFO|s1<->tcp:192.168.194.137:6653: sending OFPBAC_BAD_OUT_GROUP error reply to OFPT_FLOW_MOD message
12:40:22 <tbachman> those are the ovs-vswitchd logs
12:40:33 <tbachman> this is what we see when the group flows aren’t sent
12:40:34 <tbachman> so
12:40:43 <tbachman> even though we don’t have the wireshark traces
12:40:51 <tbachman> this looks exactly like what we’ve seen before
12:40:57 <oflibMichal> hmm, ok - so the statistics probably don't work correctly
12:41:21 <tbachman> oflibMichal: I will do this again and capture the wire with wireshark
12:41:24 <tbachman> that way we’ll know for sure
12:41:34 <oflibMichal> because your log shows that there was 4 groupmods enqueued and 4 submitted
12:41:42 <tbachman> right
12:41:44 <oflibMichal> but in the karaf.log.1
12:41:55 <oflibMichal> there are only 2 ofjava logs
12:41:57 <tbachman> yeah
12:42:01 <oflibMichal> on group mod message received
12:42:02 <tbachman> so, that is certainly true
12:42:15 <tbachman> and is consistent with what we’ve seen on the failure case
12:43:45 <oflibMichal> so those groups mods ale probably lost somewhere inside ofplugin - somewhere between counting and sending ...
12:44:01 <tbachman> okay
12:44:12 <oflibMichal> at least i undestand it this way
12:44:12 <tbachman> oflibMichal: do you still want a run with the wireshark captures?
12:44:38 <oflibMichal> hmm, give me one second
12:44:43 <oflibMichal> need to check one more thing
12:45:01 <tbachman> k
12:46:44 <oflibMichal> hmm, I believe that the wireshark capture is not needed at the moment
12:47:05 <oflibMichal> ofjava clearly states that it is encoding only two group mods in last run
12:47:07 <tbachman> oflibMichal: okay
12:58:14 <edwarnicke> tbachman: Do you see all of your groupmods in the config data store?
12:58:19 <edwarnicke> tbachman: In the failed case?
12:58:31 <edwarnicke> tbachman: Or do you see from the OFplugin stats that you are loosing them in the plugin?
12:58:39 <tbachman> edwarnicke: since they show up in openflowplugin, I’m guessing they’re there
12:58:48 <tbachman> 4 in, 4 out
12:58:55 <tbachman> but I can check
12:59:00 <edwarnicke> tbachman: I thought you saw a 4 in 2 out at one point
12:59:07 <tbachman> edwarnicke: not sure about that case
12:59:08 <tbachman> b/c
12:59:16 <tbachman> I didn’t have a locally built openflowplugin
12:59:24 <tbachman> I wasn’t sure if it was truly apples-apples
12:59:29 <edwarnicke> tbachman: No worries
12:59:40 <edwarnicke> tbachman: Just keep checking the stats each run and keep an eye out
12:59:45 <tbachman> I also (unfortunately) don’t have that setup running any more
12:59:49 <tbachman> edwarnicke: fair enough
12:59:54 <tbachman> maybe that’s what I need to do here
12:59:56 <tbachman> run a few times
13:00:01 <tbachman> collect more data
13:00:32 <edwarnicke> tbachman: I'd stick with oflibMichal
13:00:40 <edwarnicke> tbachman: Just suggesting when you do do a run, check stats
13:00:47 <tbachman> k
13:09:58 <alagalah> Back
13:10:13 <alagalah> (sorry, need to head into office in 20min, but will be on line at top of the hour)
13:13:11 <tbachman> edwarnicke: I just ran 4 times, and they all failed the same way (4 in, 4 out)
13:13:21 <edwarnicke> tbachman: Good to know :)
13:13:28 <edwarnicke> oflibMichal: Any clues from your side?
13:13:35 <tbachman> okay — going to move on to other things
13:13:42 <tbachman> oflibMichal: anything I can provide or help with?
13:13:48 <tbachman> I’ll still be around
13:13:52 <tbachman> just focusing on other things
13:14:34 <oflibMichal> tbachman: nothing that comes into my mind, we have seen that the count are different - so i am trying to investigate this issue
13:14:42 <tbachman> oflibMichal: okay
13:14:50 <oflibMichal> and michal_rehak will try to debug the barrier issue
13:14:59 <tbachman> if there’s something that you’d like me to try, just push an amendment to that gerrit, and I’ll build and retest
13:15:20 <edwarnicke> oflibMichal: Is the situation that the stats say you are getting 4 groupMods but the logs only show 2 coming in ?
13:15:44 <tbachman> edwarnicke: I believe that’s the case
13:16:02 <tbachman> we only see his log message in there twice
13:16:06 <tbachman> but stats show 4 in and 4 out
13:16:14 <oflibMichal> true
13:18:01 <edwarnicke> oflibMichal: Any idea how to debug further?
13:18:37 <oflibMichal> well, i am investigating stats collection
13:18:46 <oflibMichal> when the future is set and so on
13:19:03 <oflibMichal> if there is a place where the group mod is overwritten or so
13:19:31 <edwarnicke> HA!  I found the step we are loosing flows for ovsdb :)
13:19:42 <tbachman> :)
13:19:47 <tbachman> edwarnicke: congrats!
13:19:53 * tbachman claps approvingly for edwarnicke
13:20:11 <oflibMichal> edwarnicke: wow, where's the catch ?
13:21:21 <edwarnicke> We get 11 flows that come into OFRpcTaskFactory line 195, and only 7 that go out at line 206
13:21:26 <edwarnicke> Its the barrier handling
13:22:05 <edwarnicke> oflibMichal: michal_rehak Do we do barrrier handling for Groups
13:22:06 <edwarnicke> ?
13:22:14 <edwarnicke> tbachman: Do you guys set barrier for your groups?
13:22:58 <michal_rehak> edwarnicke: by default almost all rpc do handle barrier
13:23:03 <michal_rehak> .. if modeled
13:23:32 <tbachman> edwarnicke: will check
13:23:35 <tbachman> more importantly
13:23:56 <tbachman> would this change in behavior be the result of just performance improvements in the plugin?
13:24:05 <tbachman> or was there a change in barrier handling recently
13:24:08 <edwarnicke> tbachman: Possibly
13:24:30 <edwarnicke> tbachman: So if we are just being impatient about barriers, that could explain it
13:24:37 <edwarnicke> michal_rehak: are you around?
13:25:01 <michal_rehak> edwarnicke: yes
13:25:50 <michal_rehak> edwarnicke: there is max timeout while waiting for barrier response
13:25:57 <edwarnicke> michal_rehak: I have found were we are loosing OVSDB's flows for 1997
13:26:05 <edwarnicke> michal_rehak: Where is it set?
13:26:10 <michal_rehak> 1sec
13:26:33 <michal_rehak> edwarnicke: hardcoded in ModelDrivenSwitchImpl, line 120
13:26:58 <edwarnicke> michal_rehak: So what do we do if we don't see the barrier reponse in time?
13:27:01 <michal_rehak> this is common singleton object for all rpc (similar to lightweight pattern)
13:27:16 <tbachman> wait
13:27:19 <tbachman> by hard-coded
13:27:27 <tbachman> does that mean it ignores what the user passed in?
13:27:34 <michal_rehak> the we get rpcResponse containing exception and success=false
13:27:56 <michal_rehak> rpc wont continue - no flowMod/groupMod sent
13:28:11 <tbachman> edwarnicke: it looks like we do .setBarrier(false) on all of ours
13:28:19 <michal_rehak> no, user can specify timeout
13:28:31 <michal_rehak> and rpc contains of barrier + flowMod
13:28:33 <edwarnicke> tbachman: Both flows and groups ?
13:28:41 <tbachman> edwarnicke: I believe all of them
13:28:44 <michal_rehak> those are rpc towards ofjava and both async
13:28:56 <edwarnicke> michal_rehak: But what happens if we don't see a barrier response in time?
13:29:06 <michal_rehak> so we chain 2 futures and user can specify timeout for the chain product
13:29:42 <michal_rehak> then rpc returns failure wrapped inside of RpcResult and does not continue with the message
13:30:10 <edwarnicke> michal_rehak: OK... so if the user requested a barrier
13:30:14 <edwarnicke> michal_rehak: We send a barrier
13:30:20 <edwarnicke> michal_rehak: If we don't see a barrier response in time
13:30:29 <edwarnicke> michal_rehak: We don't send the subsequent message, correct?
13:30:58 <michal_rehak> yes
13:31:08 <michal_rehak> there is 1 sec for the barrier response
13:31:13 <edwarnicke> OK... so does that mean a flowMod with a barrier could wait as much as a second and then fail?
13:31:23 <michal_rehak> so even if user specified longer timeout
13:31:30 <michal_rehak> right
13:31:44 <michal_rehak> but then we should not see it in stats as leaving ofplugin
13:31:47 <edwarnicke> michal_rehak: Question... why are we sending barriers before instead of after?
13:32:04 <edwarnicke> michal_rehak: For the ovsdb case I see in the stats 11 in and 7 out
13:32:21 <michal_rehak> good question
13:32:41 <edwarnicke> michal_rehak: How hard would it be to send it after instead of before?
13:32:46 <michal_rehak> probably better approach is to send it afterwards and add plain barrier to API
13:32:47 * edwarnicke is just exploring possibilities
13:33:12 <michal_rehak> quite simple
13:33:44 <michal_rehak> Collection<RpcError> barrierErrors = OFRpcTaskUtil.manageBarrier(getTaskContext(), getInput().isBarrier(), getCookie());
13:33:45 <michal_rehak> if (!barrierErrors.isEmpty()) {
13:33:45 <michal_rehak> OFRpcTaskUtil.wrapBarrierErrors(((SettableFuture<RpcResult<UpdateGroupOutput>>) result), barrierErrors);
13:33:45 <michal_rehak> } else { ... message } return result;
13:34:06 <michal_rehak> this pattern is all over the OFRpcTaskFactory
13:34:57 <edwarnicke> when you said 'quite simple' did you mean it would be 'quite simple' to have the barrier message sent after the flowMod instead of before?
13:37:15 <michal_rehak> yes
13:37:36 <michal_rehak> just need to attach right notification content
13:37:43 <edwarnicke> michal_rehak: Could you prepare a patch that does so?
13:41:56 <michal_rehak> edwarnicke: you mean today?
13:42:12 <edwarnicke> Yes... I mean ASAP... how long would it take to do?
13:48:45 <edwarnicke> michal_rehak: There?
13:49:33 <michal_rehak> hmm
13:49:35 <michal_rehak> 1 hour
13:50:30 <edwarnicke> michal_rehak: Excellent :)
13:50:33 <edwarnicke> michal_rehak: One other thing
13:50:49 <edwarnicke> michal_rehak: I am curious what you are planning to do with missing barrier requests?
13:51:09 <edwarnicke> michal_rehak: In terms of error handling
13:51:16 <edwarnicke> michal_rehak: For the 'after' case
13:54:00 <michal_rehak> good question
13:54:29 <michal_rehak> I guess there we wont put hardcoded timeout on the future
13:55:32 <edwarnicke> michal_rehak: So just send it after and then forget about it?
13:55:42 <edwarnicke> michal_rehak: Or something else?
13:58:48 <michal_rehak> the future would never return success
13:59:07 <michal_rehak> which does not expose the real state on device
13:59:39 <michal_rehak> but the future here is not mentioned to report truly state on device
14:00:01 <michal_rehak> so if app puts timeout then there would be exception on app side
14:00:16 <dave_tucker> hi folks - has anybody seen this before? https://gist.github.com/054171d0b26a6e580497
14:03:36 <oflibMichal> dave_tucker: hi dave, I haven't seen this log before, but it means that there is no deserializer for error message registered
14:03:49 <oflibMichal> *for experimenter error message
14:03:57 <tbachman> msgType: 65535 experimenterID: 8992
14:04:08 <dave_tucker> could it be ONFBIC_DUP_INSTRUCTION? seeing that sent from ovs -> controller
14:04:14 <dave_tucker> just making sure it's nothing serious
14:05:51 <tbachman> msgType is 0xFFFF, experimenterID is 0x2320?
14:06:06 <tbachman> 0x23 is #
14:06:08 <tbachman> 0x20 is space
14:06:10 <tbachman> ASCII
14:06:19 <tbachman> maybe not relevant
14:06:36 <edwarnicke> dave_tucker: That looks a lot to me like the switch sent an experimenter message we know nothing about
14:07:00 <dave_tucker> ok i'll treat it as an error from ovs right now. will dig down on root cause
14:07:34 <oflibMichal> edwarnicke: that's the case
14:12:24 * vishnoianil is back (gone 01:18:27)
14:12:55 <abhijitkumbhare> Hi folks
14:14:25 <abhijitkumbhare> any progress on https://bugs.opendaylight.org/show_bug.cgi?id=1997 ?
14:16:40 <edwarnicke> michal_rehak: I have verified that the barrier thing is what is causing 1997... ETA on a clean patch?
14:16:58 <abhijitkumbhare> Great edwarnicke
14:17:06 <edwarnicke> abhijitkumbhare: I have isolated the source of the failure for 1997... I have a 'machete' fix that I am using to test with... michal_rehak Is doing the 'clean' patch
14:17:20 <edwarnicke> abhijitkumbhare: Last I heard from him he was saying ETA of about an hour
14:17:26 <abhijitkumbhare> Great!
14:18:16 <edwarnicke> abhijitkumbhare: But... the downside is that GBPs issue is *not* 1997
14:18:24 <edwarnicke> abhijitkumbhare: So we still have that showstopper to fix
14:18:25 <abhijitkumbhare> mbobak also have been working  1985
14:18:29 <edwarnicke> abhijitkumbhare: But oflibMichal is on it :)
14:18:44 <abhijitkumbhare> No mbobak is on it
14:18:54 <abhijitkumbhare> that is also GBP raised
14:19:11 <edwarnicke> abhijitkumbhare: Which one?
14:19:37 <abhijitkumbhare> mbobak , vishnoianil were on https://bugs.opendaylight.org/show_bug.cgi?id=1985
14:19:52 <edwarnicke> abhijitkumbhare: This is not 1985
14:19:55 <abhijitkumbhare> they had found the solution to it during the night
14:20:00 <edwarnicke> abhijitkumbhare: Is a groupMod getting lost in OFplugin
14:20:13 <abhijitkumbhare> OK - let me check
14:22:06 <abhijitkumbhare> Is there a bug number for it? I don't see it
14:22:17 <edwarnicke> tbachman: alagalah Could you file a bug for your group issue?
14:22:33 <edwarnicke> abhijitkumbhare: We only just figured out this morning it was different from 1997
14:22:37 <tbachman> edwarnicke: sure
14:22:42 <abhijitkumbhare> OK
14:22:45 * tbachman didn’t realize that he hadn’t already ;)
14:23:08 <abhijitkumbhare> OK edwarnicke & tbachman
14:27:12 <abhijitkumbhare> mbobak - how goes it on  https://bugs.opendaylight.org/show_bug.cgi?id=1985 ?
14:28:30 <tbachman> abhijitkumbhare: edwarnicke: oflibMichal: https://bugs.opendaylight.org/show_bug.cgi?id=2023
14:28:49 <abhijitkumbhare> good
14:28:55 <abhijitkumbhare> thx tbachman
14:29:02 <tbachman> abhijitkumbhare: np!
14:29:17 <abhijitkumbhare> is oflibMichal already working on it?
14:29:40 <tbachman> abhijitkumbhare: yes :)
14:29:45 <abhijitkumbhare> great
14:30:02 <oflibMichal> abhijitkumbhare: yes, trying
14:30:10 <abhijitkumbhare> thx oflibMichal
14:31:04 <gzhao> meeting now
14:36:28 <edwarnicke> tbachman: Do you have your bug?
14:36:41 <tbachman> edwarnicke: I didn’t enter it into the spreadsheet
14:36:49 <tbachman> edwarnicke: https://bugs.opendaylight.org/show_bug.cgi?id=2023
14:36:52 <tbachman> I can add it if you want
14:37:06 <tbachman> hmmm
14:37:07 <tbachman> or not
14:37:10 <tbachman> can’t seem to edit
14:37:20 <tbachman> privileges, maybe
14:38:36 <edwarnicke> entered
14:38:46 * edwarnicke is full of magical super powers
14:43:51 <alagalah> tbachman: back
14:44:07 <tbachman> alagalah: :)
15:23:30 <tbachman> michal_rehak: question for you
15:23:35 <tbachman> we filed this bug: https://bugs.opendaylight.org/show_bug.cgi?id=1984
15:23:44 <tbachman> ttkacik addressed one point
15:24:02 <tbachman> but then he mentions “First exception sees that RegLoadNotifGroupDescStatsUpdatedCase was used
15:24:02 <tbachman> in place where it is not allowed by model.”
15:24:21 <tbachman> but I didn’t see RegLoadNotifGroupDescStatsUpdatedCase used anywhere in our code
15:24:30 <tbachman> could this be something that’s within openflowplugin?
15:26:16 <michal_rehak> tbachman: this is from nicira extensions
15:26:59 <tbachman> michal_rehak: so, this should be in our project?  Or in openflowplugin?
15:27:29 <tbachman> b/c I greped for the above and didn’t see it
15:27:34 <tbachman> but found it in openflowplugin
15:27:44 <michal_rehak> tbachman: in principle this should be somewhere outside ODL, but technically it is in OFPlugin repo
15:28:08 <tbachman> okay
15:28:13 <michal_rehak> those classes should be used when special nicira match/action is used in flow
15:28:16 <tbachman> just trying to find out what to file the bug against
15:28:25 <michal_rehak> otherwise not
15:31:07 <michal_rehak> tbachman: bug-1984 looks like you pushed nicira contaminated flow to device somehow and then statistics are trying to read that but corresponding classes are not loaded (missing nicira bundles in controller)
15:31:52 <tbachman> ah
15:32:05 <tbachman> is there a separate bundle for nicira statistics?
15:33:20 <edwarnicke> michal_rehak: How is 1997 going?
15:33:38 <michal_rehak> the barriers?
15:33:44 <edwarnicke> Yes
15:33:58 <michal_rehak> got distracted, but
15:34:08 <edwarnicke> michal_rehak: Ah... please make that your highest priority
15:34:09 <michal_rehak> I have catched some wireshark
15:34:20 <michal_rehak> ok, now all is gone, I am free to code
15:34:21 <michal_rehak> :-)
15:34:24 <edwarnicke> :)
15:34:45 <michal_rehak> should be matter of one example and then repeat until end of class
15:35:20 <edwarnicke> michal_rehak: Yep :)
15:35:36 <edwarnicke> michal_rehak: Did we loose oflibMichal ?
15:56:49 <michal_rehak> went home
15:56:57 <michal_rehak> there he will connect again
16:47:54 <edwarnicke> michal_rehak: Good to know :)
17:17:51 <edwarnicke> tbachman: Have you gotten to MessageDispatchServiceImpl ?
17:18:05 <edwarnicke> MessageDispatchServiceImpl.getGroupMod() ?
17:18:16 <tbachman> edwarnicke: just groupMod()
17:18:20 <tbachman> I used that print
17:18:21 <tbachman> saw it 4x
17:18:24 <tbachman> so, that looks good
17:18:26 <tbachman> but am narrowing
17:18:28 <edwarnicke> So lets drill in deeper
17:18:29 <tbachman> to the actual call
17:18:30 <tbachman> yeah
17:19:58 <edwarnicke> Looks like ConnectionAdaptorImpl.groupMod() could use logging
17:22:43 <edwarnicke> Looks like we already have some good logs in ConnectionAdapter to turn on
17:22:54 <tbachman> yeah, I didn’t do those
17:23:05 <edwarnicke> Yes.. but you can use them :)
17:23:09 <tbachman> lol
17:23:17 * tbachman was admitting his guilt
17:23:18 <tbachman> lol
17:38:25 <edwarnicke> oflibMichal: Welcome back ! :)
17:38:36 <edwarnicke> tbachman: Could you summarize what you know for oflibMichal
17:38:37 <edwarnicke> ?
17:38:53 <tbachman> just a sec....
17:38:56 <oflibMichal> hi guys
17:48:21 <tbachman> okay
17:48:21 <tbachman> back
17:48:43 <tbachman> so, I printed the log messages in the RPC entry point in openflowjava
17:50:56 <tbachman> hmmm
17:51:04 * tbachman needs to go back and double-check his work
17:51:24 <tbachman> I’ll be back
17:55:33 <tbachman> sorry guys
17:55:40 <tbachman> just want to make sure I have my story straight
17:56:59 <edwarnicke> tbachman: What line number in what file is the debug and could you paste in the debug
17:57:10 <tbachman> edwarnicke: sure
17:57:24 <tbachman> line 133
17:57:34 <tbachman> in MessageDispatchServiceImpl.java
17:57:39 <tbachman> there’s already a debug log there
18:02:05 <edwarnicke> tbachman: And how many GroupMods do you see get there?
18:02:17 <tbachman> edwarnicke: that’s what am double-checking
18:02:32 * tbachman has jumped to too many conclusions before :(
18:02:40 <edwarnicke> michal_rehak: How are we doing on the fix for 1997 ?
18:02:48 <michal_rehak> 50%
18:03:17 <michal_rehak> got stuck with Barrier as it is not modeled in md-sal and there is no transactionaware version
18:04:35 * tbachman prepares pastebin
18:05:19 <tbachman> http://pastebin.com/fdzGChpG
18:05:20 <tbachman> okay
18:05:23 <tbachman> so, that shows 2 things
18:05:32 <tbachman> the print statment as the RPC is received in OFjava
18:05:35 <tbachman> (2)
18:05:57 <tbachman> and the print satement at line 133 of MessageDispatchServiceImpl.java
18:06:23 <tbachman> (4)
18:06:50 <tbachman> It seems that the print statement in OFjava is pretty much at the beginning of the RPC
18:06:51 <tbachman> so
18:06:56 <tbachman> I don’t see how it can be lost in there
18:07:07 <tbachman> So, I’m going to continue chasing this in openflowplugni
18:07:11 <tbachman> openflowplugin
18:07:19 <tbachman> (sorry — had some distractions of late)
18:07:44 <oflibMichal> so it look like we are losing group mod in the groupMod() method ?
18:08:14 <tbachman> oflibMichal: it appears so
18:08:25 <oflibMichal> hmmm ... ok
18:11:07 <tbachman> oflibMichal: michal_rehak: something interesting
18:12:27 <tbachman> http://pastebin.com/s5LcFQVH
18:12:35 <tbachman> will send you guys the complete log
18:15:14 <tbachman> michal_rehak: oflibMichal: sent you guys the log
18:17:02 <oflibMichal> tbachman: log received
18:17:05 <oflibMichal> thx
18:17:19 <tbachman> np
18:17:48 <tbachman> note the error in ConnectionAdapterImpl just below Calling OFLibrary groupMod
18:18:07 <tbachman> Received groupmodinput: 53 4
18:21:30 <tbachman> oflibMichal: michal_rehak: edwarnicke: it seems there’s useful information there?
18:22:25 <oflibMichal> I believe it is
18:22:40 <oflibMichal> somehow it looks that the condition
18:22:58 <oflibMichal> (!session.isValid())
18:23:11 <oflibMichal> is true in one case and false in the other case
18:23:30 <oflibMichal> must discuss when the valid flag is set with michal
18:23:35 <tbachman> okay
18:23:57 <tbachman> oflibMichal: actually
18:24:02 <tbachman> I think we see that print for all 4 caess
18:24:03 <tbachman> cases
18:24:10 <edwarnicke> tbachman: It sounds like you guys figured something out?
18:24:13 <tbachman> but we only see 2 error messages
18:24:21 <tbachman> edwarnicke: just getting further down the rabbit hole
18:24:29 <tbachman> : http://pastebin.com/s5LcFQVH
18:25:01 <edwarnicke> tbachman: So... let me make sure I understand
18:25:09 <edwarnicke> You see 4 messages go into OFjava, right?
18:25:11 <oflibMichal> ok, so lets analyze the http://pastebin.com/s5LcFQVH
18:25:16 <oflibMichal> with the code
18:25:39 <tbachman> wrong
18:25:45 <oflibMichal> I believe that there are two behaviors for method getConnectionAdapter(SwitchConnectionDistinguisher cookie)
18:25:46 <tbachman> well
18:25:48 <tbachman> I see 2
18:26:10 <oflibMichal> first it logs "finding connecton for cookie value null."
18:26:19 <tbachman> edwarnicke: by virtue of that print statement in OFjava
18:26:26 <tbachman> oflibMichal: but, we see that 4x
18:26:29 <oflibMichal> which is at line 78
18:26:32 <tbachman> right
18:26:40 <tbachman> I think it hits that for all 4 group mods
18:26:45 <tbachman> (2x switch)
18:26:59 <tbachman> I think that’s a “red herring"
18:27:11 * tbachman wonders if they have an analog for that in slovakia
18:27:27 <oflibMichal> so the condition if (!session.isValid()) is false and therefore is skipped
18:27:39 <tbachman> not skipped
18:27:43 <tbachman> just throws an exception
18:27:55 <oflibMichal> in last two cases
18:28:03 <oflibMichal> not in the first two
18:28:05 <tbachman> oflibMichal: we see this 2x: 2014-09-22 09:07:18,307 | ERROR | OFRpc-3          | ConnectionAdapterImpl            | 210 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.5.0.SNAPSHOT | Received groupmodinput: 53 4
18:28:18 <oflibMichal> exactly
18:28:44 <tbachman> so, we see this 4x: “Calling OFLibrary groupMod”
18:29:03 <tbachman> and this 4x (right after each Calling OFLibrary groupMod”): “finding connecton for cookie value null.”
18:29:09 <oflibMichal> that's correct problem is elsewhere
18:29:23 <oflibMichal> oh, so my fault
18:29:27 <tbachman> but we only see this 2x: “Received groupmodinput: 53 4”
18:29:35 <oflibMichal> i will check the log you sent, not the one from pastebin
18:29:51 <tbachman> and that last one is thrown in ConnectionAdapterImpl
18:29:52 <oflibMichal> there are only two finding connecton for cookie value null. logs
18:29:59 <tbachman> my guess is those are the 2 that are missing
18:30:47 <tbachman> oh
18:30:50 <tbachman> sorry oflibMichal
18:30:51 <tbachman> you’re right
18:30:53 <tbachman> slight difference
18:31:18 <oflibMichal> let me check with full log
18:31:20 <tbachman> “finding connecton for cookie value null” vs. “Session for the cookie null is invalid”
18:31:21 <oflibMichal> want to be sure
18:31:21 <tbachman> my bad
18:31:36 <oflibMichal> tbachman: ok, exactly that's my point
18:31:49 <oflibMichal> one session seems to be set to valid
18:31:54 <tbachman> right, sorry
18:31:55 <oflibMichal> while the other is not
18:32:21 <tbachman> saw “cookie” and “null” and thought they were the same print :)
18:32:28 <oflibMichal> therefor it throws an connectionException
18:32:39 <oflibMichal> tbachman: no problem :-)
18:32:49 <tbachman> right
18:32:50 <tbachman> sorry
18:33:15 <oflibMichal> and that's the reason why there are 2 groupMods missing ... at least that's how my understanding goes
18:33:25 <tbachman> exactly
18:33:30 <tbachman> so, if we can answer that
18:33:36 <oflibMichal> now we should investigate why one session is valid while the other one is not
18:33:37 <tbachman> we may have an answer to the problem :)
18:33:41 <tbachman> yup
18:35:00 <edwarnicke> tbachman: oflibMichal Could this be a case of connection flapping where the groupmods are getting lost when the connection is down?
18:35:30 <tbachman> edwarnicke: would we also see flow mods lost as well?
18:35:42 <tbachman> (i.e. why just the group mods)
18:36:42 <oflibMichal> edwarnicke: connection flapping ? I don't understand
18:37:48 <edwarnicke> oflibMichal: If the switch connects... and then disconnects, and then connects that's called flapping
18:38:24 <edwarnicke> oflibMichal: So if the switch connects, we send GroupMods... before they exit OFjava the switch disconnects... we loose them... then the switch reconnects... and the other messages go out
18:38:32 <edwarnicke> oflibMichal: Does that make sense?
18:39:57 <oflibMichal> edwarnicke: hmmm ... not sure but it might be the case. The other usecase that comes to my mind is that the "valid" boolean / flag is set incorrectly
18:41:04 <oflibMichal> edwarnicke: yes it makes sense, I can imagine, that while sending messages downstream the switch disconnects and the session is set to invalid ... (i don't know if there are any timers or so)
18:41:30 <edwarnicke> tbachman: how are you guys setting up your switches to connect?
18:41:39 <tbachman> edwarnicke: oflibMichal: again, doesn’t this seem strange that it would only affect the group mod flows?
18:42:05 <edwarnicke> tbachman: Don't you push your groups first?
18:42:08 <edwarnicke> As a bunch
18:42:11 <tbachman> not first
18:42:12 <tbachman> plus
18:42:26 <oflibMichal> well, it is really strange that we would lose only groupmods
18:42:33 <tbachman> I’ve also tried this with a 5 second wait after changing the switches from 10 to 13 mode
18:43:00 <tbachman> not saying that it’s not the problem, but just doesn’t seem to make sense completely to me
18:43:15 <tbachman> plus
18:43:20 <tbachman> it’s always the same switch
18:43:29 <edwarnicke> tbachman: I agree
18:43:30 <tbachman> I’d expect it to be a bit more random if that were the case
18:43:33 <edwarnicke> oflibMichal: tbachman keep digging :)
18:43:39 <tbachman> :)
18:43:55 <tbachman> oflibMichal: if we can understand how that flag gets set better, that would be great ;)
18:45:15 <oflibMichal> will try to get more info on that
18:45:39 <edwarnicke> tbachman: What file what line are you guys seeing stuff?
18:45:44 <edwarnicke> tbachman: And what?
18:45:57 <tbachman> log file line, or java file line?
18:46:10 <tbachman> edwarnicke: I can send you the karaf log file
18:46:12 <tbachman> that may help
18:46:44 <tbachman> edwarnicke: sent
18:47:11 <tbachman> we see this print 2x: Received groupmodinput
18:47:22 <tbachman> that’s the print that happens on the groupMod RPC call in OFjava
18:47:52 <tbachman> edwarnicke: we see this one 4x: ‘Calling OFLibrary groupMod’
18:48:38 <tbachman> That’s on entry to MessageDispatchServiceImpl.java, line 201
18:48:42 <tbachman> in groupMod()
18:50:25 <edwarnicke> tbachman: Specifically... what file what line are you encountering the sesion being invalid?
18:50:29 <edwarnicke> sorry, cookie
18:50:45 <tbachman> line 74 of the same
18:50:49 <tbachman> that’s the session invalid
18:51:00 <tbachman> the cookie invalid is just a print
18:51:02 <tbachman> not an exception
18:51:12 <tbachman> getConnectionAdapter
18:52:48 <edwarnicke> And only for the groupMod call ?
18:53:09 <tbachman> yes, 2x of them
18:53:51 * tbachman follows the protocol of IRC’ing edwarnicke, to let him know he has new email :)
18:53:56 <edwarnicke> So walking up the call tree
18:55:03 <tbachman> This is set in the setValid method of SessionContextOFImpl
18:56:19 <tbachman> that gets created in OFSesionUtil
18:56:26 <tbachman> (these are all in the same package)
18:57:36 <tbachman> and it looks like its
18:57:41 <tbachman> set to true when it’s crated
18:57:42 <tbachman> created
18:58:10 <tbachman> grepping for setValid
18:58:16 <tbachman> reveals a very small set of places
18:58:43 <edwarnicke> Cool...something to grab a hold of :)
18:59:15 <tbachman> invalidateSessionContext
18:59:27 <tbachman> In SessionManagerOFImpl
18:59:35 <tbachman> (again, same package)
18:59:48 <tbachman> There’s a print for this
18:59:51 <tbachman> I should enable it :)
18:59:57 <tbachman> er
18:59:57 <tbachman> wait
19:00:01 <tbachman> for the error case only
19:00:02 <tbachman> but
19:00:05 <tbachman> maybe I should add one here
19:00:19 <tbachman> there is a TODO in there
19:00:22 <tbachman> notifyListeners ;)
19:00:23 <tbachman> lol
19:00:29 <tbachman> but probably not important
19:01:25 <tbachman> interesting
19:01:39 <tbachman> it hits htis
19:01:40 <tbachman> this
19:01:45 <tbachman> if there’s already a session context
19:01:55 <tbachman> this is in registerSession()
19:02:02 <tbachman> in OFSessionUtil.java
19:02:53 <tbachman> michal_rehak: if you’re around, can you look at registerSession in OFSessionUtil.java?
19:03:02 <tbachman> openflowplugin/src/main/java/org/opendaylight/openflowplugin/openflow/md/core/session/OFSessionUtil.java
19:03:10 <tbachman> or oflibMichal
19:03:12 <tbachman> or edwarnicke
19:03:14 <tbachman> or anyone :)
19:03:16 <tbachman> lol
19:03:26 <edwarnicke> tbachman: Let me look
19:03:46 <tbachman> line 56
19:03:50 <tbachman> invalidateSessionContext
19:03:56 <tbachman> sets the session.valid flag to false
19:04:06 <tbachman> so it hits this if it thinks there’s already a session
19:05:45 <michal_rehak> tbachman: yes, bug-1985
19:06:54 <edwarnicke> michal_rehak: Wasn't mboback fixing that?
19:07:35 <tbachman> this is a warn print
19:07:41 <edwarnicke> tbachman: So is OFSessionUtil definitely where your session is getting invalidated?
19:07:43 <tbachman> do warns show up in karaf by default?
19:07:49 <tbachman> edwarnicke: not sure
19:07:55 <tbachman> that’s why I was curious about the print
19:08:20 <edwarnicke> Did you set:log to something for that file?
19:08:30 <tbachman> nope
19:08:31 <edwarnicke> tbachman: I think you will get WARNS by defautl
19:08:37 <edwarnicke> Are you seeing a WARN from there?
19:08:40 <tbachman> edwarnicke: I would have thought so too
19:08:45 <tbachman> I don’t see that log message in there
19:08:49 <tbachman> grep’d for “duplicate datapathId occured while registering new switch session"
19:08:51 <tbachman> but not there
19:09:02 <tbachman> I can run again and set it explicitly
19:09:17 <tbachman> default is info
19:10:19 <edwarnicke> Yep, I don't either
19:10:35 <edwarnicke> invalidateDeadSessionContext is the other candidate
19:10:57 <edwarnicke> tbachman: And I do see: openflowplugin - 0.0.3.SNAPSHOT | context for invalidation not found
19:12:08 <edwarnicke> vishnoianil: Are you around?
19:12:15 <edwarnicke> vishnoianil: You were looking at 985 right?
19:14:12 <tbachman> lol
19:14:18 <tbachman> edwarnicke: you scared him offf!
19:14:59 * edwarnicke has that effect on people
19:15:32 <tbachman> lol
19:16:17 <tbachman> edwarnicke: so, I tried again
19:16:24 <tbachman> and enabled debug log level for that file
19:16:29 <tbachman> so, it’s definitely not hitting that one
19:16:32 <edwarnicke> tbachman: Results
19:16:41 <tbachman> I’ll look at the invalidateDeadSessionContext
19:16:41 <edwarnicke> not hitting invalidateDeadSessionContext
19:17:04 <edwarnicke> And its not hitting invalidateSessionContext ?
19:17:05 <tbachman> looks like it’s hitting it:
19:17:06 <tbachman> karaf.log.1:	at org.opendaylight.openflowplugin.openflow.md.core.session.SessionManagerOFImpl.invalidateDeadSessionContext(SessionManagerOFImpl.java:117)[212:org.opendaylight.openflowplugin:0.0.3.SNAPSHOT]
19:17:36 <edwarnicke> tbachman: OK
19:17:46 <edwarnicke> So that looks like we are getting the switch disconnected
19:17:54 <michal_rehak> yes
19:17:57 <edwarnicke> tbachman: Is there a failure case pcap somewhere I could look at?
19:18:21 <tbachman> yes
19:18:28 <tbachman> email it?
19:18:33 <tbachman> pull it up locally?
19:18:58 <edwarnicke> Drop it somewhere i can download it?
19:19:04 <tbachman> sure
19:19:27 <tbachman> actually
19:19:30 <tbachman> just a sec....
19:20:19 <oflibMichal> tbachman: now is probably the time for wireshark capture
19:20:27 <oflibMichal> can you get one ?
19:21:00 <tbachman> oflibMichal: that tarball has one
19:21:03 <tbachman> a failed case
19:21:22 <tbachman> logdata.tar.gz
19:21:29 <tbachman> oflibMichal: did I email that to you?
19:21:41 <oflibMichal> i don't remember such file
19:21:44 <oflibMichal> give me a sec
19:21:48 <oflibMichal> i will check it
19:21:50 <tbachman> k
19:21:53 <tbachman> if not, will send it now
19:22:20 <oflibMichal> ok, i have it
19:24:24 * tbachman appreciates the ofplugin/ofjava folks staying up so late to help
19:24:26 <tbachman> :)
19:26:35 <tbachman> oflibMichal: fwiw, the one group mod message that is sent is packet 3657
19:26:50 <oflibMichal> ok, thanks
19:26:59 <tbachman> when it works, I typically see them to each switch, back-to-back
19:29:24 <edwarnicke> tbachman: I only see *one* groupmod on the wire here though
19:29:35 <tbachman> edwarnicke: right
19:29:47 <tbachman> one message, to the one working switch, with to group mods in it
19:30:02 <tbachman> there is no message to the other switch, which is what we’re expecting
19:30:12 <tbachman> b/c of the deadswitch invalidation
19:30:39 <tbachman> (this is the non-working case)
19:30:43 <tbachman> did we want the working one?
19:30:49 * tbachman has a trace for that as well
19:31:30 <edwarnicke> tbachman: are you running the controller on one of your host nodes?
19:32:24 <tbachman> yes
19:33:08 <edwarnicke> Ah OK
19:33:12 <edwarnicke> Makes more sense
19:33:59 <edwarnicke> tbachman: OK... I've learned all I can from the pcap
19:34:11 <edwarnicke> You have no fins anywhere near your group/flowmods
19:34:17 <edwarnicke> So I don't think thats it
19:35:48 * tbachman agrees
19:36:16 <tbachman> edwarnicke: this doesn’t feel like a comms issue
19:36:26 <edwarnicke> ?
19:48:38 <edwarnicke> tbachman: OK... I just talked to Tony
19:48:51 <edwarnicke> tbachman: And as soon as Michal finishes 1997
19:48:58 <edwarnicke> tbachman: He will switch to this
19:49:06 <tbachman> edwarnicke: thx
19:49:12 <tbachman> is there anything else I can provide here?
19:49:15 <edwarnicke> tbachman: Basically the issue appears to be
19:49:32 <edwarnicke> tbachman: If the switch connects and disconnects to fast
19:49:42 <tbachman> debouncer ;)
19:49:43 <tbachman> lol
19:49:45 <edwarnicke> It can produce some nasty behavior on race conditions
19:49:57 <tbachman> edwarnicke: question there tho
19:50:02 <tbachman> why is this creeping in now?
19:50:02 <edwarnicke> tbachman: I am not completely certain thats it... but it is possible
19:50:07 <tbachman> b/c we didn’t see this before
19:50:20 <edwarnicke> tbachman: Not entirely sure... but a lot of stuff has gotten faster
19:50:24 <tbachman> again — it’s all racy, so maybe something got faster, which caused the problem
19:50:52 <edwarnicke> tbachman: I should be finished downloading ova's here in a bit
19:51:05 <tbachman> edwarnicke: cool
19:51:05 <tbachman> but
19:51:20 <tbachman> can alagalah or I help here/
19:51:20 <tbachman> ?
20:01:47 <edwarnicke> keep digging
20:01:53 <edwarnicke> I could be wrong
20:12:01 <alagalah> ping
20:12:03 <alagalah> Just back
20:12:17 <alagalah> edwarnicke: tbachman I'm catching up
20:12:32 <edwarnicke> alagalah: ACK
20:12:49 <tbachman> alagalah: howdy :)
20:12:56 <edwarnicke> alagalah: tbachman I will say this
20:13:03 <edwarnicke> Looking at other captures of OF traffic
20:13:11 <edwarnicke> Yours is the only one I see flapping like that
20:13:14 <edwarnicke> On the wire
20:13:15 <tbachman> lol
20:13:19 <edwarnicke> alagalah: How are you guys setting up?
20:13:52 <tbachman> edwarnicke: we use mininet
20:14:05 <edwarnicke> tbachman: OK... how do you invoke mininet ?
20:14:15 <tbachman> python APIs
20:14:23 <edwarnicke> Could you point me to the specifics?
20:14:30 <tbachman> in our repo
20:14:39 <tbachman> under util/testOfOverlay
20:14:45 <tbachman> there’s a file, mininet_gbp.py
20:15:03 <tbachman> we do use a system call
20:15:13 <tbachman> to set things like the OF protocll
20:15:15 <tbachman> protocol
20:15:28 <tbachman> and to create the tunnels
20:15:54 <tbachman> towards the end
20:15:55 <tbachman> there’s this:
20:15:56 <tbachman> for sw in switches:
20:15:57 <tbachman> setOFVersion(sw['name'])
20:16:00 <tbachman> after that
20:16:07 <tbachman> I put a time.sleep(5)
20:16:11 <tbachman> and that still didn’t fix it
20:16:34 * tbachman will retry, given the plethora of things that have transpired since the last time he did it
20:16:44 <tbachman> edwarnicke: the other thing
20:16:50 <tbachman> is that this conditions is transient
20:16:50 <tbachman> but
20:16:54 <tbachman> we always get the same result
20:17:01 <tbachman> that’s strange to me
20:17:12 <alagalah> edwarnicke: michal_rehak tbachman All the information about the log zip is in 1997 (look for my comments)
20:17:23 <alagalah> *Yes I know its now the wrong bug, I will update 2023*
20:18:59 <edwarnicke> alagalah: Looking
20:19:57 <edwarnicke> alagalah: Yep, tbachman directed us to that
20:20:40 <edwarnicke> tbachman: Are you setting the OF version before or after you connect to the controller?
20:20:58 * tbachman isn’t sure what’s initiating the connection
20:21:15 <edwarnicke> tbachman: Let me put it this way
20:21:18 <tbachman> edwarnicke: the protocols is one of the last things that’s done, I think
20:21:27 <edwarnicke> tbachman: Are you setting the OFVersion before or after you tell it were teh controller *is*
20:22:15 <tbachman> setting the controller IP is done way first
20:22:18 <edwarnicke> OK
20:22:25 <edwarnicke> SetOFVersion before you set the IP
20:22:27 <edwarnicke> And try that
20:22:30 <tbachman> okay
20:22:31 <edwarnicke> I bet you will be happy :)
20:22:34 <tbachman> lol
20:22:35 <tbachman> but
20:22:43 <edwarnicke> Note: This is a workaround
20:22:46 <tbachman> would this have *changed*
20:22:47 <tbachman> sorry to keep beating this horse
20:22:49 <tbachman> k
20:22:57 <edwarnicke> tbachman: race conditions
20:23:31 <edwarnicke> tbachman: Give the workaround a try :)
20:23:41 <tbachman> edwarnicke: am doing so now :)
20:28:52 <tbachman> so, I’ve tried two different ways of doing it
20:28:58 <tbachman> both have actually made it worse
20:29:24 * tbachman goes back to double-check his work
20:32:29 <edwarnicke> tbachman: Are you setting the OFversion before setting the controller IP?
20:32:39 <tbachman> edwarnicke: that’s what am trying
20:32:50 <tbachman> end result is undesirable, so far
20:33:55 * tbachman goes back to square 1, trying to sanity check his setup
20:36:29 <alagalah> edwarnicke: ping
20:37:13 <edwarnicke> tbachman: What are you seeing?
20:37:34 <tbachman> the nodes can’t ping each other at all
20:37:38 <tbachman> so, worse
20:37:43 <tbachman> let me get back to a good baseline
20:40:16 <tbachman> okay, verfied back to working
20:47:04 <abhijitkumbhare> BTW folks - vishnoianil will work on  https://bugs.opendaylight.org/show_bug.cgi?id=2028 tomorrow. I don't think we should stop the artifact cutting for that bug (from the description)
20:48:11 <michal_rehak> tbachman: may I ask you to test a patch?
20:48:20 <tbachman> michal_rehak: would love to!
20:48:22 <michal_rehak> tbachman: it fixes sessions killing
20:48:26 <tbachman> oka
20:48:27 <tbachman> okay
20:48:36 <michal_rehak> https://git.opendaylight.org/gerrit/#/c/11456/1
20:48:48 <michal_rehak> the there is another patch moving barrier at the end
20:48:56 <michal_rehak> https://git.opendaylight.org/gerrit/#/c/11457/
20:49:16 <michal_rehak> my guess is that the sessions fix is crucial
20:49:45 <tbachman> okay
20:49:57 <michal_rehak> tbachman: thank you
20:50:04 <tbachman> michal_rehak: thank YOU! :)
20:50:22 <alagalah> michal_rehak: Thanks man
20:50:22 <michal_rehak> I wrote only one of them :-)
20:50:34 <tbachman> lol
20:50:39 <tbachman> you get to be the messenger
20:50:41 <michal_rehak> tbachman: ETA?
20:50:57 <tbachman> michal_rehak: can I do a fetch on each?  Or do I have to cherry pick?
20:51:07 * tbachman hasn’t done multiple fetches in gerrit
20:51:53 <michal_rehak> tbachman: I think the best way is to try them separate
20:52:04 <tbachman> wait
20:52:10 <tbachman> just to make sure I understand
20:52:11 <michal_rehak> moving barrier is not really fixing disconnetions
20:52:15 <tbachman> just fetch the first gerrit
20:52:16 <tbachman> and test
20:52:19 <michal_rehak> right
20:52:21 <tbachman> then if that doesn’t work
20:52:23 <tbachman> try the second?
20:52:27 <michal_rehak> right
20:52:36 <tbachman> should I revert the first patch as well?
20:52:40 <michal_rehak> the first is favorite of mine
20:52:44 <michal_rehak> yes
20:52:45 <tbachman> Or just apply the second on top of the first?
20:52:46 <tbachman> okay
20:53:02 <tbachman> michal_rehak: I have to rebuild this and integration
20:53:04 <tbachman> give me about 15 mins
20:53:15 <michal_rehak> ok
20:53:18 <michal_rehak> perfect
20:53:28 <michal_rehak> exclamation mark
20:53:52 <tbachman> lol
21:00:52 * tbachman watches openflowplugin karaf tests scroll by
21:01:57 <tbachman> michal_rehak: may be a bit more — openflowplugin builds take a bit
21:02:09 <michal_rehak> :-)
21:03:58 <tbachman> michal_rehak: first patch applied, and openflowplugin is built — moving on to integration build
21:04:08 <tbachman> :(
21:04:12 <michal_rehak> fingers crossed
21:04:14 <tbachman> I built on the wrong VM
21:04:16 <tbachman> ugh
21:04:21 <tbachman> give me 15 more to get there
21:04:23 <tbachman> sorry!
21:04:29 <michal_rehak> no problem
21:04:48 <michal_rehak> http://en.wikipedia.org/wiki/Ugh!
21:05:18 <tbachman> lol
21:06:10 <edwarnicke> michal_rehak: Would https://git.opendaylight.org/gerrit/#/c/11456/1 help 2023 ?
21:07:52 <michal_rehak> I hope so - all the mysteriously missing packets could be caused by killing netty thread by this NPE
21:08:03 <edwarnicke> tbachman: are you trying https://git.opendaylight.org/gerrit/#/c/11456/1
21:08:12 <tbachman> edwarnicke: building it as we speak
21:08:17 <tbachman> (for the 2nd time :) )
21:08:18 <tbachman> lol
21:08:25 <tbachman> don’t ask
21:08:45 <edwarnicke> tbachman: apologies... fielding calls at the same time
21:08:51 <tbachman> edwarnicke: no worries :)
21:09:13 * tbachman understands edwarnicke is in constant communication with many, many parties
21:17:50 <tbachman> michal_rehak: hang in there — still building the openflowplugin, but hoping that wraps up in the next 5 mins
21:17:58 <tbachman> I think the integration build with -nsu is pretty quick
21:18:19 <tbachman> michal_rehak: headed to the summit?
21:21:42 <tbachman> okay, on to integration build
21:24:41 <tbachman> michal_rehak: integration build done :)
21:24:45 * tbachman scp’s zip file
21:26:59 <michal_rehak> tbachman: now I feel like senior progress bar specialist :-)
21:27:04 <tbachman> lol
21:27:15 <tbachman> you’re acquiring new titles ;)
21:27:26 <michal_rehak> skills :-)
21:28:35 <tbachman> we’re 1/1
21:28:51 <tbachman> michal_rehak: if we go 5/5, I’d say it’s looking pretty good
21:31:16 <tbachman> 2/2
21:31:26 <alagalah> tbachman: michal_rehak 3, 3, 3 .....
21:32:00 <oflibMichal> alagalah: 5,5,5 ... :-)
21:32:03 <alagalah> tbachman: michal_rehak  woot!!! 4,4,4 ....
21:32:12 <alagalah> oflibMichal: heheheheh nice one mate
21:32:44 * alagalah Waitings for tbachman to deliver the good news :)
21:32:46 <tbachman> 3/3
21:33:07 <michal_rehak> now my progress bar skills are failing
21:33:11 <alagalah> michal_rehak: What is the essence of the patch ?
21:34:03 <michal_rehak> alagalah: creating of session and registering of services could interfere with session removal
21:34:08 <tbachman> 4/4
21:34:09 <michal_rehak> now those steps are separated
21:35:34 <alagalah> michal_rehak: "registering of services" includes OF1.x version negotiation ?
21:35:38 <tbachman> 5/5 :)
21:35:38 <alagalah> (by any chance?)
21:35:44 <tbachman> okay
21:35:48 <tbachman> so, I’ll do about another 10 runs
21:35:49 <tbachman> but
21:35:52 <tbachman> this is looking good
21:35:59 <alagalah> tbachman: Sounds good
21:36:16 <tbachman> michal_rehak (et. al.): thx for the help!
21:36:36 <alagalah> tbachman: michal_rehak oflibMichal edwarnicke : Many thanks for all the effort and patience!!!
21:37:17 <tbachman> michal_rehak: edwarnicke oflibMichal: we need reviewers and committers :)
21:37:21 <michal_rehak> alagalah: yes - the conflicts might emerge when switching OF-1.0 to OF-1.3
21:37:23 <tbachman> will this make it in for 5pm?
21:39:28 <tbachman> for those of you still counting: 6/6
21:40:23 <michal_rehak> this one https://git.opendaylight.org/gerrit/#/c/11456/1 can be merged now
21:40:27 <tbachman> 7/7
21:40:35 <michal_rehak> .. we have the possibility
21:40:51 <tbachman> :)
21:40:53 <tbachman> bionic patch
21:43:05 <alagalah> tbachman: Did you apply both 11456 and 11457 ?
21:43:11 <tbachman> just 11456
21:43:16 <tbachman> they don’t need to go together
21:45:09 <tbachman> 8/8
21:46:56 <tbachman> 9/9
21:48:03 <tbachman> 10/10
21:48:05 <tbachman> I’m satisfied
21:48:27 <edwarnicke> tbachman: So you are fixed?
21:48:34 <tbachman> edwarnicke: it appears so :)
21:48:53 <edwarnicke> Victory!
21:49:04 <tbachman> edwarnicke: congrats were made above ^^^^^
21:49:07 <tbachman> thanks to all!
21:49:20 <edwarnicke> tbachman: Lets merged
21:49:29 * tbachman agrees
21:49:51 <tbachman> hmmm
21:49:54 <tbachman> let me verify my http
21:50:40 <edwarnicke> michal_rehak: have you and vishnoianil sorted your stuff out on 11456 ?
21:50:56 <michal_rehak> I just answered
21:51:13 <tbachman> guys
21:51:17 <tbachman> have to step away for 30
21:51:19 <tbachman> back soon
21:51:55 <edwarnicke> michal_rehak: so am I good to merge?
21:52:06 <michal_rehak> edwarnicke: I think this is minor detail what Anil suggested
21:52:12 <michal_rehak> edwarnicke: yes
21:52:32 <edwarnicke> https://git.opendaylight.org/gerrit/#/c/11456/ merged
21:53:03 <edwarnicke> michal_rehak: I am trying out https://git.opendaylight.org/gerrit/#/c/11457
21:53:34 <edwarnicke> michal_rehak: I was able to reliably reproduce the error
21:53:40 <edwarnicke> michal_rehak: So I should be able to reliably verify its gone
21:54:33 <michal_rehak> edwarnicke: ok
21:55:11 <edwarnicke> tbachman: Before you go, could you update bug 2023 indicating that its fixed?
21:55:21 <edwarnicke> tbachman: So I can close it out on the showstopper spreadsheet?
21:59:08 <oflibMichal> ok, so I am leaving for today
21:59:15 <oflibMichal> see you tomorrow
21:59:20 <oflibMichal> and good job all
21:59:35 <oflibMichal> thx for all help while debugging
22:18:28 <alagalah> edwarnicke: ping
22:18:35 <edwarnicke> pong
22:18:38 <alagalah> edwarnicke: Let me raise tbachman to close out the bug for you
22:18:43 <alagalah> edwarnicke: If its urgent
22:18:49 <edwarnicke> It would help me a lot
22:18:59 <edwarnicke> Because I could then retire it from the showstopper list
22:19:03 <edwarnicke> Or if you wish to allege
22:30:31 <tbachman> I lied
22:30:33 <tbachman> 40 minutes
22:30:35 <tbachman> am back :)
22:32:39 <gzhao> https://www.irccloud.com/pastebin/56HRtzrS
22:32:45 <tbachman> guhs
22:32:46 <tbachman> gusy
22:32:47 <tbachman> ugh
22:32:48 <tbachman> guys
22:32:48 <tbachman> :)
22:32:57 <tbachman> I have to revisit this
22:32:59 <alagalah> :)
22:33:02 <alagalah> tbachman: huh ?
22:33:04 <tbachman> my tests were only testing out pings
22:33:11 <tbachman> those are working
22:33:11 <gzhao> tbachman:  you have 90 minutes.-:)
22:33:14 <tbachman> but
22:33:16 <tbachman> 90?
22:33:18 <alagalah> tbachman: you didn't test curls ?
22:33:20 <tbachman> nope
22:33:26 <tbachman> just did now
22:33:49 <alagalah> and ?
22:34:07 <tbachman> only 46 flows now on one of th nodes
22:34:56 <tbachman> the group flows are there now
22:35:25 <tbachman> I’ll re-run
22:35:30 <tbachman> b/c I want to see if this is consistent
22:37:30 <tbachman> same
22:37:31 <tbachman> 46 flows
22:37:33 <tbachman> :(
22:37:42 <tbachman> I’ll have to go through the steps again
22:37:49 <tbachman> to figure out where things are going wrong this time
22:37:50 <tbachman> sorry folks
22:38:12 <michal_rehak> ZZZzzzz
22:38:19 <michal_rehak> see you tomorrow
22:38:21 <michal_rehak> bye
22:38:32 <tbachman> michal_rehak: will give you the update once I know it
22:39:00 <michal_rehak> thank you
22:41:46 <tbachman> only 102 flows are recorded in jconsole
22:41:48 <tbachman> we expect 114
23:24:54 <edwarnicke> coming into the plugin?
23:26:31 <alagalah> tbachman: opinion on closing 2023 and opening another one ??? This is different
23:38:08 <abhijitkumbhare> looks like I may have forgotten to end meeting :)
23:38:16 <abhijitkumbhare> #endmeeting