15:00:28 <DaveBarach> #startmeeting fdio-vpp
15:00:28 <collab-meetbot> Meeting started Tue May 11 15:00:28 2021 UTC and is due to finish in 60 minutes. The chair is DaveBarach. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:28 <collab-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:28 <collab-meetbot> The meeting name has been set to 'fdio_vpp'
15:02:59 <mackonstan> #info mackonstan
15:04:12 <DaveBarach> #chair dmarion
15:04:12 <collab-meetbot> Warning: Nick not in channel: dmarion
15:04:12 <collab-meetbot> Current chairs: DaveBarach dmarion
15:04:34 <DaveBarach> #topic CSIT (maciek reporting)
15:04:47 <mackonstan> #info Physical and virtual infrastructure updates
15:05:20 <mackonstan> #info Vexxhost DC move almost done, last four servers will be moved from MTL1 to YUL1 tomorrow, and we are done with the phy machines move.
15:05:58 <mackonstan> #info Mgmt/IPMI IPv4 addr renumbering to happen shortly, to put all hosts in the same subnet(s).
15:06:48 <mackonstan> #info Open item1: OpenStack vRouter still used for accessing LF IT VM applications left behind in MTL1 (jenkins master, gerrit, etc)
15:07:05 <mackonstan> #info Resolution1: LF IT VM apps will move to YUL1 in the next few weeks, and then all problems should go away.
15:07:18 <mackonstan> #info Open item2: intermittent (much less frequent now after we went into full daily escalation calls with involved parties) git fetch failures and jenkins connection resets.
15:07:26 <mackonstan> #link https://secure.vexxhost.com/billing/viewticket.php?tid=NOB-607778&c=4Dp0GdHT
15:07:38 <mackonstan> #info Resolution2: Continue daily 15min calls for situation review with involved parties, until all parties are satisfied and min 2-day uninterrupted operation is evident.
15:09:40 <mackonstan> #info Test breakages:
15:09:51 <mackonstan> #info NAT44ed multi-worker tests keep failing intermittently, less frequently after recent patch, but vpp still crashing.
15:10:15 <mackonstan> #info Sporadic VPP crashes in get statistics.
15:10:58 <mackonstan> #info A few others under investigation.
15:11:18 <mackonstan> #info Work highlights:
15:11:25 <mackonstan> #info CSIT in AWS - 2-node and 3-node tests running smoothly, ENA DPDK driver making VPP packets drop on tx. Moving ahead with Jenkins onboarding, will be publishing results for a subset of CSIT tests in CSIT-2106 report.
15:12:06 <mackonstan> #info Merging VPP & Linux telemetry - VPP perfmon bundles, Linux bcc/bpf tracing tools, using OpenMetrics format for storage and post-processing.
15:13:19 <mackonstan> #info Moving to json models for test oper data and results storage, querying and post-processing. Would be good to hear from the vpp-dev community what queries people would like to execute against CSIT test result data, e.g. over a specific time period or for a specific git patch period to, say, verify a specific patch(set) impact on things.
15:13:31 <mackonstan> #info Ongoing work to make TRex behave as a deterministic and reliable traffic generator at high 100GbE rates.
15:13:43 <mackonstan> #info Revamp of ipsec tests, as CSIT is suffering from test suite overload (269 tests at last count). See Maciek's recent patches for tests being axed, under review.
15:13:56 <mackonstan> #info Generic effort to reduce number of tests, remove redundant packet path testing. See Maciek's recent patches, under review.
15:14:10 <mackonstan> #info Other CSIT-2106 work, see link
15:14:17 <mackonstan> #link https://wiki.fd.io/view/CSIT/csit2106_plan
15:14:41 <DaveBarach> #topic Host Stack (Florin)
15:14:59 <DaveBarach> #info lots of patches in the last month
15:15:25 <DaveBarach> #info improvements in session layer for connect/listen APIs - Lots more config knobs
15:15:44 <DaveBarach> #info working to improve active-open performance
15:16:03 <DaveBarach> #info moving active-opens to the first worker since the main thread tends to sleep a lot
15:16:15 <DaveBarach> #info improve half-open connection tracking
15:16:57 <DaveBarach> #info bunch of TCP cleanup, bulk buffer translation
15:17:12 <DaveBarach> #info improvements in vcl test code, server
15:17:49 <DaveBarach> #info now have a DTLS vcl test
15:18:19 <DaveBarach> #topic Documentation (Ole reporting)
15:19:12 <DaveBarach> #info need to find a home for documentation, e.g. to auto-update main website docs
15:19:50 <DaveBarach> #info dwallace: LFN has a license for readthedocs
15:20:35 <DaveBarach> #info any community volunteers for maintaining / writing docs more than welcome
15:21:12 <DaveBarach> #info dwallace: need to help e.g. Google find up-to-date docs
15:21:28 <DaveBarach> #topic Release Mgmt (Andrew)
15:21:58 <DaveBarach> #info 21.06 RC1 in a few weeks
15:22:13 <DaveBarach> #info 5/25 (Weds) will pull the release throttle
15:23:02 <DaveBarach> #topic Coverity
15:23:27 <DaveBarach> #info look at list on github, broken out by owner/maintainer
15:26:20 <DaveBarach> #link https://github.com/vpp-dev/vpp-coverity-report
15:27:47 <DaveBarach> #info vppapigen "training wheels" to be removed in this release
15:28:27 <DaveBarach> #info vppapigen added message status (experimental, production, etc) to JSON
15:29:05 <DaveBarach> #topic Infra Status (DaveW)
15:29:32 <DaveBarach> #info three intermittent false failures: punt tests fixed
15:29:59 <DaveBarach> #info vpp device job fails when 2 jobs run / both try to reconfigure the i40e at the same time
15:30:18 <DaveBarach> #info intermittent vcl / ldp make test failure on the arm platform
15:30:38 <DaveBarach> #info "that one is driving me crazy..."
15:31:14 <DaveBarach> #info reenabled Naginator to (temporarily) address Jenkins comms reset problems
15:32:06 <DaveBarach> #info trying to avoid Vexxhost virtual-router baling-wire / bubble-gum to improve network reliability
15:32:40 <DaveBarach> #info DW spending hours/day updating vexxhost ticket w/ data
15:33:24 <DaveBarach> #info proposal to use vpp instead of current virtual router technology, early stage discussions
15:38:41 <DaveBarach> #topic make test (cont'd from last meeting)
15:39:34 <DaveBarach> #info short-term, move tests back to centralized location
15:39:52 <DaveBarach> #topic node enqueue improvements
15:40:19 <DaveBarach> #info currently: enqueues very fast when all pkts go to same destination
15:40:51 <DaveBarach> #info rewrote vlib_node_enqueue_to_next(...) to use SIMD instrs
15:41:46 <DaveBarach> #info significant change, but reduces 20 clocks to 2 or 3 clocks in the general case
15:43:16 <DaveBarach> #info handoff code in progress
15:43:33 <DaveBarach> #info multiple tx queue support in progress
15:44:27 <DaveBarach> #info not clear whether the two in-progress items will end up in 21.06
15:45:14 <DaveBarach> #info will try to combine handoff frames
15:48:27 <DaveBarach> #info should improve high worker count scenarios where the number of tx queues is lower than the number of workers
15:51:09 <DaveBarach> #info multiple places hash packets to queues. Want to create infra to handle the problem in a general way
15:54:22 <DaveBarach> #endmeeting
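
Editor's note on the last #info item (hashing packets to queues): the log does not describe the planned infrastructure, so the following is only a minimal sketch of the general idea, one shared helper that maps a flow to a queue index so that every caller makes the same choice. All names here (flow_hash, select_queue) are hypothetical and are not VPP APIs, and CRC32 is used purely for illustration; nothing in the log says which hash VPP uses.

```python
import zlib
from collections import Counter


def flow_hash(src_ip, dst_ip, src_port, dst_port, proto):
    """Return a deterministic 32-bit hash of a flow 5-tuple.

    CRC32 is an illustrative choice only, not what VPP uses.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) & 0xFFFFFFFF


def select_queue(flow, n_queues):
    """Map a flow to a queue index in [0, n_queues).

    Hypothetical shared helper: every caller that needs to spread
    packets over queues would use this one function, so packets of
    the same flow always land on the same queue.
    """
    if n_queues <= 0:
        raise ValueError("n_queues must be positive")
    return flow_hash(*flow) % n_queues


# Example: 1000 synthetic TCP flows spread over 4 tx queues.
flows = [("10.0.0.1", "10.0.0.2", 1024 + i, 80, 6) for i in range(1000)]
per_queue = Counter(select_queue(f, 4) for f in flows)
```

Because the mapping depends only on the flow tuple and the queue count, it also covers the case mentioned at 15:48:27 where there are fewer tx queues than workers: any number of workers can call the same helper and agree on the queue for a given flow.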