#fdio-csit: FD.io CSIT project meetings
Meeting started by mackonstan at 14:04:34 UTC
Meeting summary
- Jan Gelety (jgelety, 14:04:53)
- Agenda bashing (mackonstan, 14:05:23)
- FD.io CSIT physical labs (mackonstan, 14:06:24)
- Juraj: re 2 new ThunderX servers for vpp_device: in contact with LFN IT and Vexxhost about physical install and onboarding (mackonstan, 14:07:22)
- Juraj: will update testbed_specifications.md in the repo (mackonstan, 14:07:54)
- Dave Wallace (dwallacelf, 14:08:49)
- Ed: 1RU CLX servers (with 8280) install: never got IP addresses from LF IT/Vexxhost; re-asked, waiting for a response. Once received, will update testbed_specifications.md in the CSIT repo. (mackonstan, 14:10:38)
- Maciek: we had a ticket open for the 2RU CLX servers and now it has been closed. (mackonstan, 14:12:34)
- Maciek: Ed, please open a separate ticket for the three 1RU CLX servers for CI/CD infra and backend work. (mackonstan, 14:13:17)
- Ed: ongoing issues with vpp_device machines going "flaky" after Jenkins "adventures" (crashes, unplanned downtime). Can we use the 3 new CLX servers (originally destined for the data processing backend: plotlydash, s5ci proto) to help here? (mackonstan, 14:17:47)
- Inputs from LFN and FD.io projects (mackonstan, 14:18:12)
- VPP - Dave: no updates on vpp v19.08.2. (mackonstan, 14:21:22)
- VPP - Dave: vpp v20.01 release milestones published (mackonstan, 14:22:08)
- https://wiki.fd.io/view/Projects/vpp/Release_Plans/Release_Plan_20.01 (mackonstan, 14:22:15)
- TSC - Vratko: last meeting finished quickly, nothing CSIT-related (mackonstan, 14:22:53)
- Releases - CSIT-1908.1 report (mackonstan, 14:23:39)
- Maciek: CSIT-1908.1 report published but not announced; need to review the data and compare across 19.08, then send the announcement email (mackonstan, 14:24:18)
- Maciek/Vratko/Peter: CSIT-1908.1 - all tests have finished. No more open points. (mackonstan, 14:25:31)
- Jan: confirmed all 1908.1 jobs are finished. Need to summarise all resources used by the 1908.1 maintenance release. (mackonstan, 14:26:27)
- CSIT-2001 (mackonstan, 14:27:50)
- Vratko: improving the VPP API change process to make it more reliable and reduce false positives. (mackonstan, 14:29:00)
- Vratko: complete the VAT to PAPI migration and address API execution efficiency for scale tests. (mackonstan, 14:30:42)
- Jan: Python 2.7 to 3.x migration; .md analysis and migration plan coming to Gerrit shortly. (mackonstan, 14:32:05)
- Vratko: job for bisecting performance regressions (leveraging per-patch perf test work). (mackonstan, 14:33:51)
- Maciek/Tibor/Peter: a standalone test data processing backend (datastore, analytics/query engine). Stop relying on Nexus as the results file store. (mackonstan, 14:35:21)
- Vratko/Tibor/Peter: making use of HdrHistogram in TRex, and higher-resolution latency data for performance tests. (mackonstan, 14:36:19)
- Vratko/Maciek: reconf tests methodology - see if we can apply the back-to-back frame methodology described in the IETF BMWG draft. (mackonstan, 14:37:51)
- Peter/Maciek: per VPP node efficiency - today we store elog capturing thread barriers; for perfmon we are missing an API to capture two values for the run, and need to check whether this has been resolved. (mackonstan, 14:40:44)
- Peter: start with a new telemetry approach - per-packet path analysis, similar to how it's done in NFVbench; see how this could be applied to NFV density tests and to all other tests. (mackonstan, 14:42:10)
- Maciek/Tibor: trending regressions - add announcement emails to csit-report. (mackonstan, 14:43:38)
- Vratko: anomaly detection - still seeing some noise; more data doesn't seem to help, no pattern. Need more insider (white-box) knowledge and more telemetry data from tests to see if any correlation can be found. Affects trending anomaly detection, per-patch perf and perf bisecting. (mackonstan, 14:45:25)
- Peter/Maciek: vhost/memif - adding vpp-in-container with IPsec. (mackonstan, 14:47:24)
- Peter: seeing new Load-Balancer tests being pushed, baseline tests (mackonstan, 14:48:43)
- last LB is for Maglev (mackonstan, 14:49:25)
- Peter: seeing new tests for "NAT44 L3 DSR" (mackonstan, 14:49:30)
- Vratko: improve the suite generator for heat-map graphed tests, e.g. NFV density tests (mackonstan, 14:50:11)
- Maciek: any other work in services and L47 space? (mackonstan, 14:50:43)
- Juraj: testbeds - Arm - adding more ThunderX machines for vpp_device to run csit-vpp and vpp-csit device tests (mackonstan, 14:54:13)
- Juraj: productize per-VPP-patch (with voting?) vpp-csit device tests for Arm. (mackonstan, 14:54:48)
- Goal: add more vpp_device tests for better VPP API coverage, as those are executed per VPP patch and per CSIT patch (mackonstan, 14:56:54)
- Operational status (mackonstan, 14:57:32)
- Ed: current situation - stabilized back to normal, root cause not known. Some issues with vpp_device machines (Peter handling). A simple Registry app "stopped intelligently responding" and redundancy didn't kick in. On Registry recovery, all queued jobs kicked off and overloaded Jenkins with ~160 jobs in the queue (LFN ONAP can handle many more). Jenkins tipped over handling the number of requests to the Nomad cluster (Nomad can handle (mackonstan, 15:01:40)
- Ed: adding more health checks to prevent Registry app HA failure. (mackonstan, 15:03:01)
- Peter: 10 servers lost their mgmt IP addresses, which were configured as static without DHCP. Unclear how it happened; root cause analysis in progress (some external system interference?). ONAP servers experienced a similar situation this week. (mackonstan, 15:06:54)
Meeting ended at 15:11:10 UTC
Action items
- (none)
People present (lines said)
- mackonstan (46)
- collabot_ (4)
- jgelety (1)
- dwallacelf (1)
Generated by MeetBot 0.1.4.