14:00:31 <fdegir> #startmeeting OpenStack 3rd Party CI
14:00:31 <collabot> Meeting started Wed Sep 28 14:00:31 2016 UTC. The chair is fdegir. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:31 <collabot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:00:31 <collabot> The meeting name has been set to 'openstack_3rd_party_ci'
14:00:46 <fdegir> #topic Roll Call
14:00:56 <fdegir> anyone around?
14:01:00 <yolanda> hi
14:01:00 <Julien-zte> #info Julien
14:01:03 <Julien-zte> hi
14:01:06 <fdegir> hi
14:01:10 <fdegir> #info Fatih Degirmenci
14:01:15 <qiliang> #info qiliang
14:01:19 <jmorgan1> #info Jack Morgan
14:01:25 <yolanda> #info Yolanda
14:01:54 <fdegir> #topic Status Update
14:02:00 <fdegir> I think yolanda has some news for us
14:02:12 <fdegir> yolanda: stage is yours
14:02:22 <yolanda> so I got baremetal provisioning working with bifrost, on POD5
14:02:43 <yolanda> i have not been able to deploy a full cloud, but at least the bifrost part works fine
14:02:45 <fdegir> #info yolanda got baremetal provisioning working with bifrost, on LF POD5
14:02:50 <hwoarang> #info Markos
14:03:01 <Julien-zte> good
14:03:08 <yolanda> also i went to the Infra Mid-cycle last week, and exposed some of our needs
14:03:32 <fdegir> #info yolanda attended the OpenStack Infra Mid-cycle meetup last week and exposed some of OPNFV's needs
14:03:32 <yolanda> Infra is fine with us implementing HA and different network configs in the puppet-infracloud modules, as long as we put in the effort from the OPNFV side
14:03:48 <yolanda> and don't affect the current functionality of the infracloud modules, and don't make them too complex
14:03:56 <fdegir> #info OpenStack Infra is fine with us implementing HA and different network configs in the puppet-infracloud modules, as long as we put in the effort from the OPNFV side
14:04:08 <fdegir> #info without affecting the current functionality of the infracloud modules and without making them too complex
14:04:23 <fdegir> yolanda: any luck with getting help from them to make stuff more flexible?
14:04:42 <fdegir> yolanda: as you personally found out, the things in upstream are pretty tied to their needs
14:05:00 <Julien-zte> yolanda, is the OpenStack Infra Mid-cycle meetup a F2F meeting?
14:05:03 <yolanda> fdegir, they will review things, but there are no people in Infra right now who can put effort into it
14:05:11 <fdegir> yolanda: I mean, do you think they will make stuff a bit more configurable going forward?
14:05:16 <yolanda> apart from myself
14:05:41 <yolanda> fdegir, i am in the process of making these things more flexible. Mostly Ricardo Carrillo and myself are the ones working on that effort
14:06:20 <qiliang> so we can push HA related code to the openstack community when we implement it?
14:06:20 <fdegir> yep
14:06:36 <fdegir> qiliang: that 'yep' wasn't for your question
14:07:02 <fdegir> qiliang: but I think they would be happy if the ha things go there
14:07:08 <fdegir> in fact yolanda has a patch there
14:07:24 <qiliang> i think so :)
14:07:30 <yolanda> yep, well, i just added the pacemaker module
14:07:35 <yolanda> but no patch to have ha
14:07:51 <fdegir> yolanda is core there so whatever she lets in goes in :)
14:08:05 <qiliang> great
14:08:15 <yolanda> yep, as long as it is sane and doesn't add extra complexity, it can go in
14:08:30 <qiliang> i see
14:08:34 <yolanda> the same will apply for OVS instead of linuxbridge, i think it's an important feature to have
14:08:38 <fdegir> anything else you want to mention, yolanda?
14:09:04 <yolanda> i think it's all from my side, i'm trying to find time to test a full deploy but this week is difficult
14:09:15 <fdegir> one last question
14:09:29 <fdegir> sorry for asking this but how do you see our chances of having something for summit?
14:09:51 <fdegir> on baremetal I mean
14:10:08 <yolanda> so i have hopes that we can have something working, but i'm not totally sure if we can automate 100%
14:10:26 <yolanda> because there are still some problems with the bifrost playbook, and also the enroll/deploy steps are not automated
14:10:40 <fdegir> that should be fine as long as we can say we are able to provision/deploy
14:10:44 <fdegir> automation can be fixed later on
14:10:59 <fdegir> hopefully by that time we are all up to speed, helping out more
14:10:59 <yolanda> then i hope it can be done because the hardest part, that is the provisioning, is working
14:11:05 <yolanda> now it's a matter of ssh to the nodes and run puppet there
14:11:58 <fdegir> thanks yolanda
14:12:06 <fdegir> hwoarang: your turn
14:12:22 <hwoarang> right. so suse host support is there for bifrost
14:12:42 <fdegir> hwoarang: just to make sure I record it right
14:12:56 <hwoarang> sure. so you can run bifrost on suse hosts
14:12:56 <fdegir> we can run provisioning of trusty on a suse host?
14:13:02 <fdegir> or centos
14:13:23 <hwoarang> yeah, you need some tweaks to make diskimage-builder build such images on foreign hosts
14:13:27 <hwoarang> but it should be possible
14:13:30 <fdegir> #info SUSE host support is now available for bifrost
14:13:51 <hwoarang> i haven't tried that because i'm having trouble with suse vms at the moment
14:13:58 <hwoarang> tl;dr is that diskimage-builder does not support minimal opensuse images
14:14:22 <Julien-zte> fdegir, do we have any definition for the VMs' operating system?
14:14:25 <hwoarang> so this causes 2 problems. 1) the generated images are huge so it takes forever to flush them to the vdisk when you pxe boot
14:14:44 <hwoarang> 2) cloud-init and glean run in parallel on such images so you get funny results
14:15:03 <fdegir> #info diskimage-builder does not support minimal opensuse images
14:15:09 <fdegir> #info This causes 2 problems
14:15:15 <hwoarang> and glean has broken suse support, so network is broken and provisioning never completes. i have submitted a patch for that
14:15:17 <fdegir> #info 1. the generated images are huge so it takes forever to flush them to the vdisk when you pxe boot
14:15:31 <fdegir> #info 2. cloud-init and glean run in parallel on such images so you get funny results
14:15:44 <hwoarang> so vm support is nearly there but not quite
14:15:53 <fdegir> #info glean has broken suse support, so network is broken and provisioning never completes. hwoarang has submitted a patch for that
14:16:11 <hwoarang> i could perhaps try and use suse host + centos vm. if that's good enough then we can use this job for the time being if needed
14:16:12 <fdegir> hwoarang: so we'll wait until it is totally done to enable our unstable jenkins jobs
14:16:19 <hwoarang> but i'd rather keep the host to finish the work.
14:16:21 <fdegir> no point spamming
14:16:31 <fdegir> and taking your machine away :)
14:16:34 <hwoarang> indeed
14:16:48 <fdegir> Julien-zte: we plan to have centos on centos, trusty on trusty and suse on suse
14:17:05 <Julien-zte> good
14:17:19 <fdegir> hwoarang: you were planning to ask some stuff as well
14:17:26 <qiliang> fdegir: what do we have now?
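[Supplementary note on the diskimage-builder discussion above: a minimal sketch of the kind of image build being talked about. It assumes diskimage-builder is installed on the build host; the element list and output name are illustrative only, not the exact configuration of the OPNFV jobs.]

    #!/usr/bin/env python3
    """Sketch: build a small deployment image with diskimage-builder.

    Assumes the `disk-image-create` tool from diskimage-builder is on PATH.
    The "-minimal" base elements keep the resulting image small, which is
    what makes PXE booting the VMs tolerable, as noted in the discussion.
    """
    import subprocess

    elements = [
        "ubuntu-minimal",  # minimal base OS element (no minimal opensuse element at the time)
        "vm",              # partitioning/bootloader bits for a bootable disk image
        "simple-init",     # installs glean rather than cloud-init for boot-time network config
    ]

    # Build the image; raises CalledProcessError if the build fails.
    subprocess.run(
        ["disk-image-create", "-o", "deployment_image"] + elements,
        check=True,
    )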
14:17:44 <qiliang> seems suse/centos/trusty hosts are all supported
14:17:50 <fdegir> qiliang: we have trusty on trusty, centos on centos and you can run bifrost on suse
14:17:56 <hwoarang> fdegir: my questions were answered when i read the new job deployment. i wanted to ask you what kind of images you deploy but then i saw you use the -minimal elements
14:18:13 <qiliang> fdegir: got it, thx
14:18:19 <fdegir> finally I did something useful :)
14:18:31 <fdegir> before moving on to others
14:18:46 <fdegir> I want to bring up this bifrost instability issue
14:19:04 <fdegir> as you noticed in your junk mail folders, our jenkins jobs fail from time to time
14:19:13 <hwoarang> ^_^
14:19:27 <fdegir> I couldn't figure out what the problem could be after checking things manually
14:19:37 <fdegir> yolanda's permission fix patch is still in the works
14:19:51 <fdegir> but even without that patch and with our chmod 755 /httpboot
14:19:51 <yolanda> yep, i was hitting some errors
14:19:56 <fdegir> that issue shouldn't be there
14:20:04 <hwoarang> latest logs suggest that 1 out of 3 vms fails. so i don't think it's the same perm issue
14:20:16 <hwoarang> if one of them manages to grab the files, so will the rest
14:20:24 <Julien-zte> yah, currently the bifrost CI is not in a stable status either
14:20:45 <fdegir> and the other thing I noticed
14:20:53 <fdegir> when I logged in to one of the machines
14:21:02 <fdegir> the /httpboot and /tftpboot folders were empty
14:21:14 <fdegir> even though bifrost said build-dib-images completed
14:21:59 <Julien-zte> is anyone still here? or am I lost?
14:22:00 <fdegir> I'm not sure how to tackle all these different types of issues
14:22:20 <fdegir> Julien-zte: we are here
14:22:37 <fdegir> anyway, let's move on
14:22:46 <fdegir> Julien-zte: your turn
14:23:32 <yolanda> and no failures in the log for that creation?
14:23:43 <fdegir> yolanda: nope
14:23:57 <hwoarang> the last log suggests a git problem too
14:23:59 <fdegir> VMs fail pxe booting
14:24:06 <fdegir> cause the stuff is not there
14:24:25 <hwoarang> so are we sure the hardware is ok? :)
14:24:43 <fdegir> jmorgan1 has been quiet for 2 weeks
14:24:49 <jmorgan1> what would be wrong with the hardware?
14:25:04 <hwoarang> let's say the git failures could be related to upstream
14:25:05 <jmorgan1> like a bad disk?
14:25:36 <hwoarang> don't know. do you track their health somehow?
14:25:50 <Julien-zte> there have been several git failures in recent days in the ZTE pods too
14:26:03 <jmorgan1> LF-POD5 is in the Linux Foundation lab so i don't know what they do
14:26:26 <fdegir> jmorgan1: the failures we're talking about occurred on the lf pod4 jumphost and pod1
14:26:28 <jmorgan1> git failures mean a timeout?
14:26:38 <fdegir> intel pod4 sorry
14:26:41 <yolanda> those tests from fdegir are not on pod5
14:26:50 <hwoarang> missing ref in the upstream git repo
14:27:02 <yolanda> that looks like an upstream failure, right
14:27:08 <hwoarang> perhaps some caching, load balancing etc on upstream i'd say
14:27:13 <hwoarang> so i wouldn't worry much
14:27:24 <hwoarang> but the other failures sound more suspicious
14:27:32 <fdegir> agree
14:27:38 <jmorgan1> what other failures?
14:27:56 <fdegir> provisioning failures
14:28:10 <fdegir> we have more failed runs than successful runs
14:28:13 <jmorgan1> those might be hardware related?
14:28:25 <fdegir> that's what hwoarang suspects :)
14:28:43 <jmorgan1> but specifically what?
14:28:57 <jmorgan1> filesystem issue? hard disk? network related?
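[Supplementary note on the empty /httpboot issue above: a hedged sketch of the kind of sanity check a job could run right after the image-build step, before the VMs try to PXE boot. The paths are bifrost's defaults as discussed in the log; everything else is illustrative and not verified against the failing slaves.]

    #!/usr/bin/env python3
    """Sketch: verify that the image-build step actually left artifacts behind.

    /httpboot and /tftpboot are the default bifrost locations; adjust if the
    deployment uses different paths.
    """
    import sys
    from pathlib import Path

    def count_entries(path: str) -> int:
        """Return the number of entries in a directory (0 if missing/empty)."""
        p = Path(path)
        entries = list(p.iterdir()) if p.is_dir() else []
        print(f"{path}: {len(entries)} entries")
        return len(entries)

    # Check both directories so the log shows the state of each one.
    results = [count_entries(d) for d in ("/httpboot", "/tftpboot")]
    if not all(results):
        # Fail fast with a clear message instead of letting PXE boot time out later.
        sys.exit("image artifacts missing; aborting before enroll/deploy")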
14:29:35 <hwoarang> i would suspect an FS issue but i can't say much from the log. there is no obvious reason there
14:29:55 <jmorgan1> what FS are we using?
14:30:33 <hwoarang> btw i think it's only the trusty hosts that fail so badly
14:30:42 <hwoarang> the centos one just failed because of git errors
14:31:31 <hwoarang> i don't know anything about the trusty slave so no idea about the FS
14:31:39 <hwoarang> did you try to turn it off and on again? :)
14:32:11 <fdegir> I can try that
14:32:41 <hwoarang> it might trigger an fsck during boot
14:32:43 <hwoarang> anyway
14:32:51 <fdegir> we can continue talking about this later on after the reboot
14:32:56 <hwoarang> yes
14:32:59 <fdegir> Julien-zte: are you here?
14:33:05 <Julien-zte> yes
14:33:11 <fdegir> anything you want to say?
14:33:16 <Julien-zte> just got lost
14:33:41 <Julien-zte> the last issue has been resolved, and bifrost CI runs correctly in our inner lab
14:34:12 <Julien-zte> busy with other things and not enough time this week.
14:34:32 <Julien-zte> good news is that the Daisy team is interested in bifrost
14:34:42 <fdegir> Julien-zte: I noticed that
14:34:46 <Julien-zte> and they will invest some resources in Daisy
14:34:53 <Julien-zte> for bifrost
14:35:02 <fdegir> and then we get stuff for free
14:35:03 <fdegir> cool
14:35:08 <Julien-zte> yes!
14:35:12 <jmorgan1> no you get more work
14:35:21 <fdegir> that's what you think
14:35:24 <Julien-zte> I'm persuading them to be quick with the decision
14:35:51 <fdegir> #info Bifrost CI runs correctly in ZTE inner lab
14:35:59 <fdegir> #info Daisy is also looking at bifrost
14:36:15 <fdegir> Julien-zte: persuade harder
14:36:38 <fdegir> qiliang: how is it going?
14:36:44 <Julien-zte> OK, I will do it, fdegir
14:36:50 <Julien-zte> and bifrost will be used in the installer project :)
14:36:56 <fdegir> qiliang: have you been able to look at what we have been doing?
14:37:01 <fdegir> Julien-zte: :)
14:37:16 <yolanda> it's great to see more bifrost usage
14:37:23 <Julien-zte> hi qiliang, sorry for not troubleshooting your issue
14:37:27 <Julien-zte> is it resolved?
14:38:11 <fdegir> qiliang: ^
14:38:32 <qiliang> yes, i've successfully run bifrost and puppet-infracloud in the huawei lab.
14:38:33 <fdegir> moving to jmorgan1 then
14:38:44 <Julien-zte> yes, yolanda, is centos on centos currently stable?
14:38:51 <fdegir> qiliang: by following the readme files?
14:39:12 <jmorgan1> i've not done anything yet, my pod is being used by others atm
14:39:16 <fdegir> cause it is good to hear it works in other places
14:39:27 <jmorgan1> i will look to get it back next week
14:39:35 <yolanda> Julien-zte, i have not tested in the last 2 weeks but when i left, centos-on-centos was working
14:39:43 <qiliang> and taken a look at some of the bifrost and puppet related code.
14:40:04 <Julien-zte> yolanda, OK, I will assign some work for testing
14:40:24 <fdegir> thx jmorgan1
14:40:37 <fdegir> updates from me
14:40:49 <qiliang> sorry i just got lost
14:40:55 <fdegir> #info we now have jobs to verify opnfv/bifrost and openstack/bifrost patches
14:41:02 <fdegir> even though they fail, it feels good
14:41:11 <fdegir> #link https://build.opnfv.org/ci/view/3rd%20Party%20CI/
14:41:33 <fdegir> #info Once the SUSE work is done, those 2 jobs will also be enabled
14:41:35 <qiliang> fdegir: i've successfully run bifrost and puppet-infracloud in the huawei lab, and taken a look at some of the bifrost and puppet related code.
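[Supplementary note on "ssh to the nodes and run puppet there" and on qiliang's bifrost + puppet-infracloud run: a rough sketch of what that step could look like once bifrost has provisioned the hosts. The node names, manifest path, and module path are placeholders, not the actual puppet-infracloud layout used in the labs.]

    #!/usr/bin/env python3
    """Sketch: apply a puppet manifest on freshly provisioned nodes over ssh.

    Assumes key-based ssh access to the nodes and puppet already installed
    on them; "controller00"/"compute00" and the paths are illustrative only.
    """
    import subprocess

    NODES = ["controller00", "compute00"]  # placeholder hostnames

    for node in NODES:
        # Run puppet apply remotely; check=True aborts on the first failing node.
        subprocess.run(
            [
                "ssh", f"root@{node}",
                "puppet", "apply",
                "--modulepath=/etc/puppet/modules",  # assumed module location
                "/etc/puppet/manifests/site.pp",     # assumed manifest
            ],
            check=True,
        )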
14:41:38 <fdegir> having more failures
14:41:41 <hwoarang> yeah thanks for cleaning the job up
14:42:09 <fdegir> #info qiliang has successfully run bifrost and puppet-infracloud in the huawei lab, and taken a look at some of the bifrost and puppet related code.
14:42:21 <fdegir> I just got rid of the daily jobs
14:42:39 <fdegir> since we lack machines and I wasn't so sure about the use of them for the time being
14:43:02 <fdegir> we can recreate them if anyone feels it is good to have them
14:43:18 <hwoarang> the triggered ones are good for now i believe
14:43:59 <fdegir> good
14:44:07 <fdegir> I think that's all of us
14:44:09 <Julien-zte> agree
14:44:11 <Julien-zte> fdegir, I just got lost when you mentioned "finally I did something useful :)" - what is the useful thing? or I will check the meeting records after the meeting.
14:44:13 <fdegir> anyone want to bring anything up?
14:44:17 <jmorgan1> what about summit?
14:44:29 <fdegir> Julien-zte: https://build.opnfv.org/ci/view/3rd%20Party%20CI/
14:44:39 <jmorgan1> any details on the meetup?
14:44:45 <fdegir> Julien-zte: we verify both openstack/bifrost and opnfv/bifrost patches
14:44:57 <fdegir> jmorgan1: good point
14:45:03 <Julien-zte> fdegir that's great
14:45:10 <fdegir> jmorgan1: I will create an etherpad and send the link so people can sign up and add topics
14:45:20 <Julien-zte> I will study the jenkins job
14:45:33 <jmorgan1> we have the opnfv board F2F on Friday
14:45:54 <jmorgan1> so we might have our meetup early
14:46:08 <fdegir> I can add some random days/times there
14:46:18 <fdegir> and see which one is more popular
14:46:48 <hwoarang> so we don't have a room or something right? sounds like an ad-hoc f2f meeting :)
14:47:01 <fdegir> we can have it on the beach
14:47:02 <jmorgan1> find a table or cafe?
14:47:12 <hwoarang> sounds good :)
14:47:24 <fdegir> don't know how the weather would be
14:47:27 <Julien-zte> ok for me. I will be there till Saturday
14:47:42 <fdegir> before we end the meeting
14:48:03 <fdegir> there will be some discussions regarding 3rd party ci and a similar topic during the infra wg meeting in 12 minutes
14:48:03 <jmorgan1> qiliang: will you be at openstack summit?
14:48:10 <fdegir> please call in if you are interested
14:48:15 <fdegir> https://wiki.opnfv.org/display/INF/Infra+Working+Group
14:48:31 <hwoarang> ok
14:48:32 <fdegir> that's all for the day
14:48:39 <qiliang> jmorgan1: sorry, today my boss told me NO. :(
14:48:43 <fdegir> thank you for joining
14:48:46 <fdegir> #endmeeting