12:02:15 <jki> #startmeeting CIP IRC weekly meeting
12:02:15 <collab-meetbot`> Meeting started Thu Oct 27 12:02:15 2022 UTC and is due to finish in 60 minutes.  The chair is jki. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:02:15 <collab-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:02:15 <collab-meetbot`> The meeting name has been set to 'cip_irc_weekly_meeting'
12:02:19 <jki> #topic AI review
12:02:30 <jki> 1. Add qemu-riscv to cip-kernel-config - patersonc
12:02:59 <patersonc[m]> No updates
12:03:28 <jki> too bad - we are close to zero AI ;)
12:03:40 <jki> any other AIs?
12:04:01 <patersonc[m]> Sorry!
12:04:21 <jki> np
12:04:48 <jki> then moving on...
12:04:52 <jki> 3
12:04:53 <jki> 2
12:04:56 <jki> 1
12:04:57 <jki> #topic Kernel maintenance updates
12:05:17 <uli> reviewing 5.10.150
12:05:22 <pave1> Reviewing 5.10.150, 151.
12:05:25 <masami> There are 20 CVEs reported. None of them are particularly critical.
12:05:44 <iwamatsu> I reviewed 5.10.150.
12:08:46 <jki> any other topic on maintenance?
12:10:03 <jki> then: moving on...
12:10:05 <jki> 3
12:10:07 <jki> 2
12:10:09 <jki> 1
12:10:11 <pave1> Busy week with huge reviews from -rc1, sorry.
12:10:24 <jki> ok :)
12:10:24 <pave1> We can move on.
12:10:35 <jki> #topic Kernel testing
12:11:12 <alicef_> patersonc[m]: I wrote on #lava-docker about the lava security problem
12:11:34 <patersonc[m]> Thanks, I saw it recently. We really need to upgrade our LAVA version anyway, this is a good kick
12:11:46 <alicef_> oh, I just saw your reply
12:11:46 <iwamatsu> nice
12:12:19 <patersonc[m]> I'll try and start investigating in the next few weeks, I just need to get a few internal things sorted first
12:12:56 <alicef_> thanks pave1 for sending the email with the warnings checks
12:13:06 <patersonc[m]> +1
12:13:29 <pave1> You are welcome :-).
12:14:30 <pave1> For the results, I believe a good start would be reporting a failure whenever a test result changes.
12:14:58 <pave1> I guess that will simply result in smc failures on qemu? We can disable those as a next step.
12:15:30 <pave1> Seeing one fail and investigating is better than having to go through all results every time to see if something maybe failed somewhere.
12:15:40 <alicef_> disable it only for qemu?
12:15:54 <pave1> alicef: Yes. Or.. does it cause problems elsewhere?
12:16:23 <alicef_> I don't think so
12:17:07 <pave1> (Or it can be still run and result ignored, or something, if that's easier).
12:18:25 <patersonc[m]> > For the results, I believe a good start would be reporting a failure whenever a test result changes.
12:18:25 <patersonc[m]> This would mean setting up a database to keep track of previous test results etc.
12:18:25 <patersonc[m]> Then comparing the previous result to the new result, and only failing if it changed from pass->fail.
12:18:25 <patersonc[m]> Is this what you mean?
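For reference, a minimal sketch (Python, illustrative names only; this is not the actual linux-cip-ci code) of the pass->fail tracking patersonc describes here:

    import json
    from pathlib import Path

    STATE_FILE = Path("previous_results.json")  # hypothetical cache of the last run

    def load_previous():
        """Return {(target, test): status} recorded by the previous run, if any."""
        if not STATE_FILE.exists():
            return {}
        return {tuple(k.split("|")): v
                for k, v in json.loads(STATE_FILE.read_text()).items()}

    def find_regressions(previous, current):
        """Only report tests that changed from pass to fail."""
        return [key for key, status in current.items()
                if status == "fail" and previous.get(key) == "pass"]

    def save_current(current):
        STATE_FILE.write_text(json.dumps({"|".join(k): v for k, v in current.items()}))

    if __name__ == "__main__":
        previous = load_previous()
        # Example result set; real results would come from LAVA/GitLab jobs.
        current = {("qemu-arm64", "smc"): "fail", ("hihope-rzg2m", "boot"): "pass"}
        for target, test in find_regressions(previous, current):
            print(f"REGRESSION: {test} on {target} changed from pass to fail")
        save_current(current)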
12:18:41 <pave1> No, not really.
12:19:16 <pave1> "If the target is qemu && test is smc then ignore the test result".
12:19:37 <patersonc[m]> Yes, but we get smc failures on other boards as well
12:19:49 <pave1> Ok, can I get an example?
12:20:24 <patersonc[m]> https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3141003858#L262
12:21:51 <pave1> Can we run smc on the corresponding upstream kernel, to verify we did not break anything, and then probably disable the smc test?
12:21:56 <pave1> Thanks for the example, I'll take a look.
12:22:22 <patersonc[m]> Check the email I sent on Tuesday for more links
12:24:14 * patersonc[m] checks for more links now - 2 secs
12:24:53 <patersonc[m]> Upstream: https://linux.kernelci.org/test/plan/id/635a180f43c30ac29ce7db6f/
12:25:07 <patersonc[m]> CIP: https://gitlab.com/cip-project/cip-testing/linux-cip-ci/-/jobs/3224759608#L1132
12:25:16 <patersonc[m]> Both on the hihope-rzg2m board
12:25:17 <patersonc[m]> Same CVE failure
12:25:55 <patersonc[m]> (obviously different kernels)
12:26:32 <pave1> If the test fails even on 6.1-rc2, it will probably fail in 5.10-stable and 5.10-cip, too.
12:26:35 <pave1> Let's ignore the test.
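For reference, a minimal sketch of the per-target ignore rule pave1 proposes above (the skip list entries and function names are illustrative assumptions, not the agreed configuration):

    # Known-bad target/test pairs whose failures should not turn the job red.
    KNOWN_FAILURES = {
        ("qemu", "smc"),
        ("hihope-rzg2m", "smc"),  # example only
    }

    def effective_status(target, test, status):
        """Downgrade a failure to 'ignored' for known-bad target/test pairs."""
        if status == "fail" and (target, test) in KNOWN_FAILURES:
            return "ignored"
        return status

    assert effective_status("qemu", "smc", "fail") == "ignored"
    assert effective_status("qemu", "boot", "fail") == "fail"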
12:27:14 <pave1> Going through the emails, you said: "I'd still need this job to return a green tick though if the actual job runs successfully, even if there are failed test cases."
12:27:26 <iwamatsu> But there is no issue on the xilinx MPSoC: https://lava.ciplatform.org/scheduler/job/771035#L859
12:27:42 <pave1> Actually, here is what I believe we should do.
12:27:58 <pave1> This is not a kernel issue; this is subtly broken hardware.
12:28:31 <pave1> Plus it is "security" and people love to say "this is critical"
12:28:35 <pave1> when it is not.
12:28:51 <pave1> We have a security team. Could we simply ask our security team to investigate that?
12:29:01 <pave1> ...best in cooperation with hardware vendors. :-)
12:29:42 <pave1> Why do you want to put green ticks on failing tests?
12:29:50 <pave1> Let's simply use "green" -> nothing to see here.
12:29:58 <pave1> "red" -> someone needs to look at the results.
12:30:53 <pave1> I guess it is normally me looking at the results, and I escalate if I can't understand that, but don't make me examine logs for 12 targets when there's nothing to see there.
12:31:13 <patersonc[m]> This was referring to if there are tests that always fail, such as those smc tests
12:32:06 <pave1> Ok, let's disable those on platforms where they always fail.
12:32:06 <patersonc[m]> Is a view like this better?
12:32:07 <patersonc[m]> https://linux.kernelci.org/test/job/cip/branch/linux-4.4.y-cip/kernel/v4.4.302-cip70-98-g7f7838c92740f/
12:34:04 <pave1> Not really. When I tried to investigate a failure, I got into a maze of results..
12:34:07 <pave1> ...https://linux.kernelci.org/test/plan/id/62d01083f0ebb14854a39bde/
12:34:18 <pave1> ...which seems to be caused by a flakey target failing randomly.
12:35:05 <patersonc[m]> But flakey targets are no different for kernelci compared to cip, right?
12:35:06 <pave1> ...but because results are sorted by test (not target), it is not obvious to see.
12:35:31 <patersonc[m]> okay
12:36:20 <patersonc[m]> This view should show the targets that failed: https://linux.kernelci.org/test/job/cip-gitlab/branch/ci%2Fpavel%2Flinux-test/kernel/v5.10.147-cip18-11-ge6d27ea102c3b/plan/baseline/
12:36:28 <pave1> It works as long as there are no flakey things. We are happy to have none in the gitlab.
12:36:51 <patersonc[m]> s/failed/regressed/
12:36:52 <alicef_> About the asus chromebook failures: I asked on kernelci about that, and they were looking into such problems. Not sure if they did something.
12:37:53 <alicef_> Actually the best would be to open an issue on kernelci about boards that fail often, as it can be relevant for other kernel trees as well.
12:38:57 <pave1> So the way I use the test branch (pushing 5.10, 4.19 and 4.4 into it) leads to really confusing stuff on kernelci.
12:39:54 <patersonc[m]> Could you use a different branch for each kernel version?
12:40:10 <patersonc[m]> Then the automated regression tracking would work
12:41:11 <pave1> I'll need to jump on a branch, too, during bisect for example.
12:41:23 <pave1> I'm afraid I'll get the regression tracking confused..
12:43:40 <jki> then you need a separate branch for debugging, vs. one per kernel that is moving forward only and tracks regressions
12:44:00 <pave1> Yes, that's current setup.
12:44:38 <pave1> Actually we have 4.4-st-rc and 4.4-st because more debugging is happening there.
12:45:45 <patersonc[m]> Taking a step back a bit, why don't we change our GitLab CI approach to be more for development - "instant" CI, and mainly focus on kernel builds and boot tests.
12:45:45 <patersonc[m]> Then use KernelCI for the full (pre)release testing with full build/test coverage.
12:45:45 <patersonc[m]> GitLab CI can then provide simple green/red statuses, and KernelCI can track regressions properly etc.
12:45:45 <patersonc[m]> Best of both worlds
12:46:01 <patersonc[m]> * builds and (only) boot tests.
12:46:42 <patersonc[m]> Then rather than spending resources trying to make our gitlab CI setup do the same thing as kernelci, we can spend them on improving kernelci for all, and on improving things like platform stability in the labs
12:47:21 <alicef_> updating lava also
12:48:41 <pave1> Kernelci is quite confusing to me. Gitlab CI is a bit better, but hiding failures behind green ticks is not nice.
12:49:38 <pave1> We can split testing between kernelci and gitlab any way you want, if you take over interpreting the results.
12:50:30 <pave1> I can simply ask over email, and testing team says "okay, that kernel looks good to us" or "hey there's bug you should debug".
12:51:18 <patersonc[m]> Is anyone else in the kernel team looking at test results on KernelCI/gitlab?
12:52:26 <uli> i don't, usually
12:52:38 <iwamatsu> I check sometimes.
12:54:50 <jki> is the number of reports we get on cip-dev useful then, or can it be made useful?
12:57:00 <pave1> I don't find those reports useful.
12:58:04 <patersonc[m]> From each build triggered on kernelci we get 1 email reporting on the build status, and 1 email per test suite.
12:58:04 <patersonc[m]> The test emails have a section at the top which tells you which boards had regressions. There is a link to click in order to see the logs etc.
12:58:13 <iwamatsu> Is there a lot of unnecessary information?
13:00:31 <patersonc[m]> There's a lot more information if you scroll down, but it's direct links to each board that failed, rather than the summary link at the top
13:01:14 <patersonc[m]> So you can either click the summary link and navigate the website, or click on the specific links. All useful depending on your choice
13:01:29 <alicefm> Also, KernelCI already works by regression and notifies if a previously working test failed
13:02:52 <pave1> Well, there's a bunch of reports from kernelci on cip-dev from Oct 26.
13:03:07 <alicefm> The SMC problem got notified because it was not failing on the previous 4.4-CIP builds
13:03:35 <pave1> It reports a lot of failures including regressions, but I doubt there's a real regression in the 5.10-rc tree.
13:03:57 <alicefm> Pavel, that's because KernelCI sends both a build report and board/regression reports
13:04:53 <pave1> Could we get a red cross whenever gitlab detects something that can be debugged, please?
13:05:21 <alicefm> Build reports are always sent; board/regression reports are only sent when there are problems
13:06:06 <pave1> I'm trying to get gitlab useful, first. kernelci on 5.10 will require more work.
13:08:02 <patersonc[m]> Okay
13:08:28 <patersonc[m]> Please take a look at my email from Tuesday and comment on the MR etc. Then GitLab will be red if individual test cases fail
13:08:47 <pave1> Ok.
13:09:46 <pave1> Can we get smc disabled on problematic boards?
13:10:01 <patersonc[m]> I'll have to check if any of the boards pass
13:10:23 <patersonc[m]> May be easier to disable it altogether if we're not interested in the results
13:10:39 <patersonc[m]> But it's not just smc. We'll get failures in all the ltp tests as well
13:11:06 <pave1> Can we disable those ltp cases that always fail?
13:11:32 <iwamatsu> These can be controlled with LAVA metadata.
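For reference, a minimal sketch of filtering always-failing LTP cases via a skip list, in the spirit of iwamatsu's LAVA-metadata remark (the file name and format are assumptions, not the actual lab or LAVA configuration):

    from pathlib import Path

    def load_skiplist(path="ltp-skiplist.txt"):
        """One LTP test case name per line; '#' starts a comment."""
        p = Path(path)
        if not p.exists():
            return set()
        entries = {line.split("#")[0].strip() for line in p.read_text().splitlines()}
        return entries - {""}

    def cases_to_run(all_cases, skiplist):
        """Drop any case that appears in the skip list."""
        return [case for case in all_cases if case not in skiplist]

    # Example usage with hypothetical case names:
    print(cases_to_run(["abort01", "accept01", "smc01"], load_skiplist()))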
13:12:03 <pave1> But I don't see ltp failures.
13:12:08 <pave1> I picked up random example:
13:12:09 <pave1> https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/3235839704
13:12:25 <pave1> and only see 0_spectre-meltdown-checker-test.CVE-2018-3640 [fail] there.
13:12:41 <patersonc[m]> That test job only ran boot and smc tests
13:13:54 <pave1> When are we running ltp tests?
13:14:05 <patersonc[m]> On rc testing
13:14:11 <patersonc[m]> Here's an example: https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3186281973
13:15:22 <pave1> Fun. That will need some more work.
13:15:41 <pave1> But I'd still proceed with the red cross, so we know we have work to do.
13:15:48 <patersonc[m]> Okay
13:16:09 <pave1> Thank you!
13:16:11 <patersonc[m]> Anyway, the MR is there to comment on.
13:16:11 <patersonc[m]> Let's move on because we're already well over time
13:16:24 <jki> indeed :)
13:16:37 <jki> 3
13:16:39 <jki> 2
13:16:41 <jki> 1
13:16:43 <jki> #topic AOB
13:17:02 <jki> anything else to discuss today?
13:18:03 <jki> 3
13:18:05 <jki> 2
13:18:07 <jki> 1
13:18:09 <jki> #endmeeting