12:03:40 <jki> #startmeeting CIP IRC weekly meeting
12:03:40 <collab-meetbot> Meeting started Thu Oct 20 12:03:40 2022 UTC and is due to finish in 60 minutes.  The chair is jki. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:03:40 <collab-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:03:40 <collab-meetbot> The meeting name has been set to 'cip_irc_weekly_meeting'
12:03:46 <jki> #topic AI review
12:03:56 <jki> 1. Add qemu-riscv to cip-kernel-config - patersonc
12:04:30 <patersonc[m]> No updates
12:04:54 <jki> 2. Ask Florian to support with 4.4 kernel-ci reports - jki
12:05:06 <jki> done, Florian will look into that soon
12:05:24 <jki> he said only one issue was remaining if he was recalling correctly
12:05:35 <jki> any other AIs?
12:06:01 <pave1> If 4.4 reports are useful or close to being useful... should I take another look?
12:06:21 <jki> I asked Florian to approach you on that
12:06:52 <jki> but you may also give feedback earlier, if you like
12:07:17 <pave1> Ok :-).
12:08:22 <jki> ok, moving on in...
12:08:33 <jki> 3
12:08:36 <jki> 2
12:08:38 <jki> 1
12:08:46 <jki> #topic Kernel maintenance updates
12:09:21 <pave1> I did reviews on 5.10.149 and .150.
12:09:27 <masami> This week, 23 new CVEs were reported; three of them are remote code execution vulnerabilities.
12:09:37 <masami> These vulnerabilities have been fixed in stable kernels.
12:09:40 <pave1> (.150 is huge, still a lot to do).
12:10:33 <uli> i did 5.10.150
12:13:59 <jki> anything else for this topic?
12:15:50 <jki> 3
12:15:51 <jki> 2
12:15:53 <jki> 1
12:15:56 <jki> #topic Kernel testing
12:16:21 <alicef> replying to Florian's issue
12:16:38 <patersonc[m]> It appears that a lot of our RT tests aren't completing properly. The tests run, but the Python script that collates the results at the end doesn't run.
12:16:38 <patersonc[m]> Need to investigate further.
12:17:31 <patersonc[m]> Have you noticed this before Pavel? Do you just check the latency results and not worry about the script at the end?
12:17:58 <pave1> I just watch for green ticks :-). Have not noticed that before, sorry.
12:18:45 <patersonc[m]> Does anyone check test results? Or just that the lava job ran until the end?
12:19:16 <pave1> patersonc: I suspect no one does.
12:20:30 <patersonc[m]> So how are we confirming that we don't see regressions? If we don't look at the results it's just a load of boot tests
12:20:47 <alicef> most of the issues that Florian pointed out about 4.4 look solved upstream, by the way
12:20:59 <alicef> https://github.com/kernelci/kernelci-core/issues/1053#issuecomment-1285433617
12:21:55 <pave1> Yeah, and a bunch of compile tests :-). A lot of kernel issues manifest as the kernel not booting.
12:22:00 <alicef> we have no more errors on 4.4 only some warnings
12:22:11 <pave1> alicef: Ok, thanks.
12:22:37 <jki> pavel: do we need/want to resolve those as well?
12:22:39 <alicef> pave1: sorry, I couldn't find a non-booting kernel on the last 4.4
12:22:55 <pave1> alicef: So I can just ignore the warnings, and only care if errors appear?
12:22:59 <alicef> I'm currently looking at some strange smc regression on 4.4 qemu
12:23:56 <pave1> jki: Ignoring warnings should be simple enough. Getting rid of warnings would be nice, but is not high priority.
12:24:43 <alicef> pave1: I also don't know about that. It depends on the warnings and errors. I will take a shallow look to see if any warning is interesting
12:25:50 <pave1> patersonc: ?
12:26:01 <pave1> alicef: Thanks.
12:26:57 <alicef> pave1: actually you can just take a look at the end of this page and see if there is something interesting, but I don't think there is https://linux.kernelci.org/build/cip/branch/linux-4.4.y-cip/kernel/v4.4.302-cip70-98-g7f7838c92740f/
12:27:29 <alicef> I will check whether something has not been inserted there
12:27:47 <alicef> for example, I cannot find the smc regression on that list
12:27:56 <pave1> alicef: Ok, let me do that over the week.
12:28:38 <alicef> there are really few warnings; they are mostly "suggest parentheses" and similar. thanks
12:29:00 <patersonc[m]> Great
12:29:27 <jki> cool!
12:29:33 <pave1> patersonc: Could we get three results in the gitlab-ci?
12:29:49 <pave1> patersonc: Green -- nothing to see, noone needs to look here.
12:30:16 <pave1> Red -- something is wrong in the kernel, either it failed to boot or some test failed.
12:30:38 <pave1> Yellow or something -- something is wrong in the labs. Power failed, docker stuff is acting funny, ...
12:31:06 <patersonc[m]> GitLab CI doesn't support this, sorry
12:31:32 <pave1> It's important that we don't get green when there's some problem hidden in the logs.
12:31:55 <alicef> pave1: following is the strange smc regression
12:31:59 <pave1> Ok, next best thing: can we get the last line of the log saying what it is?
12:32:02 <jki> why should gitlab not support this?
12:32:25 <jki> Lava runs can be translated into pipeline states - if they return clear results
12:32:59 <patersonc[m]> pave1: We could trawl the test case results and make the whole thing a red cross if there is a single error?
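A minimal sketch of what "trawl the test case results and go red on any error" could look like, assuming the collation step leaves a JSON list of test cases behind; the file name and layout below are assumptions for illustration, not the actual output of our scripts:

```python
#!/usr/bin/env python3
"""Illustrative sketch only: make the GitLab job fail (red cross) if any
collated LAVA test case failed. The results file name and JSON layout
are assumptions, not the format the collation script actually produces."""
import json
import sys


def main(path="lava-test-results.json"):
    with open(path) as f:
        # assumed layout: [{"name": "...", "result": "pass" | "fail" | "skip"}, ...]
        results = json.load(f)

    failed = [t["name"] for t in results if t.get("result") == "fail"]
    for name in failed:
        print(f"FAIL: {name}")
    print(f"{len(failed)} of {len(results)} test cases failed")

    # a non-zero exit makes the GitLab CI job show a red cross
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```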
12:33:02 <alicef> on v4.4.302-cip70 all cve pass https://storage.kernelci.org/cip/linux-4.4.y-cip/v4.4.302-cip70/x86_64/x86_64_defconfig/gcc-10/lab-collabora/smc-qemu_x86_64.html
12:33:09 <jki> or did you mean the yellow state?
12:33:49 <alicef> on v4.4.302-cip70-98-g7f7838c92740f we have a CVE-2020-0543: fail https://storage.kernelci.org/cip/linux-4.4.y-cip/v4.4.302-cip70-98-g7f7838c92740f/x86_64/x86_64_defconfig/gcc-10/lab-collabora/smc-qemu_x86_64.html
12:34:42 <alicef> CVE-2020-0543: VULN (Your CPU microcode may need to be updated to mitigate the vulnerability), but it is the same board
12:34:46 <patersonc[m]> Let me look into it. But last time I looked, it wasn't something we can control, other than pass (green tick) or fail (red cross)
12:35:43 <pave1> patersonc: Can we split testing into two phases? We currently have build, test.
12:35:57 <pave1> So we'd have build, tests finish, tests report passing result?
12:36:01 <alicef> looks like the same board, same toolchain, but different smc test results O_o
12:37:05 <patersonc[m]> pave1: Yea we could do something like this. And then just have to look into the collated test results in the last job
12:38:43 <pave1> alicef: If it is spectre/meltdown checker on Qemu.. then I suggest we don't have to care much.
12:39:25 <alicef> pave1: ok thanks. Then we only have the warnings I pointed you to
12:40:05 <pave1> alicef: Ok, let me look into that and report via email.
12:40:20 <alicef> ok
12:41:27 <patersonc[m]> It looks like we can set some exit codes now in GitLab CI: https://docs.gitlab.com/ee/ci/yaml/#allow_failureexit_codes
12:42:04 <jki> great to see this progress!
12:43:06 <patersonc[m]> Although actually, that looks like it's used to determine if a job fails or passes based on return codes from scripts. It's not controlling whether we can set a yellow warning in the pipelines view.
12:43:08 <patersonc[m]> So not so useful
12:44:43 <jki> yeah, a yellow state is not known to me either
12:44:51 <jki> fail or pass, that's it
12:46:55 <patersonc[m]> Bit annoying.
12:47:12 <jki> well, open an issue with gitlab.com ;)
12:47:34 <patersonc[m]> We can create some scripts to parse the results better, but at some point we may as well just use KernelCI as it is better for this kind of thing
12:47:58 <pave1> Is it possible to not finish, or finish without returning the result?
12:47:59 <patersonc[m]> This was the main drive for using KernelCI - much more advanced than our gitlab CI setup
12:48:41 <patersonc[m]> pave1: Not finishing would eventually lead to a timeout, which would be a red cross
12:48:51 <jki> if you don't finish a job, you will empty your pockets with AWS bills ;)
12:49:12 <patersonc[m]> The only option really is to fail the whole job if a single test case didn't pass
12:49:17 <patersonc[m]> jki: That too :D
12:50:13 <pave1> patersonc: That is a good option. And add a line at the end of the log saying "test returned failure" so that we know it is different from "lab does not have power".
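A hedged sketch of that final-log-line idea: one script that prints a last line telling "test returned failure" apart from a lab/infrastructure problem and encodes the distinction in the exit code. The job_status.json name and its fields are assumptions for illustration, not something the pipeline produces today:

```python
#!/usr/bin/env python3
"""Sketch: end the job log with one line that distinguishes a test failure
from a lab problem, and mirror that in the exit code. File name and field
names are hypothetical."""
import json
import sys

EXIT_OK, EXIT_TEST_FAIL, EXIT_LAB_PROBLEM = 0, 1, 2


def main(path="job_status.json"):
    with open(path) as f:
        # assumed layout: {"job_complete": bool, "failed_tests": ["name", ...]}
        status = json.load(f)

    if not status.get("job_complete", False):
        print("lab problem: LAVA job did not complete (power, docker, timeout, ...)")
        return EXIT_LAB_PROBLEM
    if status.get("failed_tests"):
        print("test returned failure: " + ", ".join(status["failed_tests"]))
        return EXIT_TEST_FAIL
    print("all tests passed")
    return EXIT_OK


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```

Distinct exit codes could in principle be fed into the allow_failure:exit_codes mechanism linked above, though whether that helps the pipeline view is exactly the limitation patersonc describes.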
12:52:06 <patersonc[m]> We pretty much have this already
12:52:06 <patersonc[m]> e.g. https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3184699360
12:52:25 <patersonc[m]> Test case results are all there
12:52:43 <patersonc[m]> We just don't have an overall fail if a single test case fails - only if the entire lava job failed
12:53:27 <patersonc[m]> Example with some fails:
12:53:28 <patersonc[m]> https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3141003858
12:54:10 <patersonc[m]> But then, in the case of SMC, some of those test cases are expected to fail.
12:54:24 <patersonc[m]> So we'd need to have a way of knowing when failures are expected...
12:54:31 <patersonc[m]> Or whether they are regressions
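For the "expected failures" question, one possible approach (just a sketch, not an agreed mechanism) is a small per-platform allowlist that the result check consults before declaring a regression; the platform key and test names below are made-up examples based on the SMC case discussed above:

```python
#!/usr/bin/env python3
"""Sketch of filtering expected failures before declaring a regression.
The allowlist contents and the command-line interface are hypothetical."""
import json
import sys

# Hypothetical allowlist: test cases known/expected to fail on a given platform.
EXPECTED_FAILURES = {
    "qemu_x86_64": {"smc.CVE-2020-0543"},
}


def unexpected_failures(platform, failed_tests):
    """Return only the failures that are not on the platform's allowlist."""
    allowed = EXPECTED_FAILURES.get(platform, set())
    return [t for t in failed_tests if t not in allowed]


if __name__ == "__main__":
    platform = sys.argv[1]
    failed = json.loads(sys.argv[2])  # e.g. '["smc.CVE-2020-0543", "ltp.foo"]'
    regressions = unexpected_failures(platform, failed)
    for t in regressions:
        print(f"unexpected failure: {t}")
    sys.exit(1 if regressions else 0)
```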
12:55:11 <pave1> Drop SMC for now :-).
12:56:02 <pave1> Or blacklist SMC from qemu targets.
12:56:13 <patersonc[m]> I'm sure that not all of the LTP test cases pass..
12:57:26 <patersonc[m]> e.g. https://lava.ciplatform.org/results/763461
13:01:27 <jki> we've reached the top of the hour - anything else on testing?
13:02:12 <jki> 3
13:02:14 <jki> 2
13:02:16 <jki> 1
13:02:19 <jki> #topic AOB
13:03:17 <jki> anyone anything?
13:04:05 <jki> 3
13:04:07 <jki> 2
13:04:09 <jki> 1
13:04:11 <jki> #endmeeting