==================================================
#acumos-meeting: Acumos TSC Architecture Committee
==================================================


Meeting started by bryan_att at 14:04:29 UTC.  The full logs are
available at
http://ircbot.wl.linuxfoundation.org/meetings/acumos-meeting/2018/acumos-meeting.2018-06-21-14.04.log.html
.


Meeting summary
---------------

* Agenda  (bryan_att, 14:04:41)
  * New epics added in the last few days from the AT&T CMLP team
    (bryan_att, 14:05:27)
  * present: Bryan, Aimee, Anand, Chris, Chuxin, Farheen, Nat, Ofer,
    Pantelis  (bryan_att, 14:07:21)

* New AT&T CMLP team epics  (bryan_att, 14:07:46)
  * https://jira.acumos.org/browse/ACUMOS-1181 "DataSet Management
    Capabilities"  (bryan_att, 14:09:38)
  * Anwar: CMLP team has some code implementing the use case "I want to
    : be able to create and manage different datasets (training, testing
    & scoring) that contains the metadata of the datasets with
    optionally linked with the source of the dataset, so that: I can use
    it building the model and scoring the predictions using a model"
    (bryan_att, 14:10:47)
  * Pantelis: this relates to functions being considered for the
    Training project; the code is welcome; AT&T has been present,
    looking forward to CMLP team joining  (bryan_att, 14:11:41)
  * Pantelis: the training project has assumed that a PoC will be
    developed but no code released yet; some reqs will be deferred to
    Release B.  (bryan_att, 14:12:29)
  * Anwar: the CMLP team is not here today so we will ask them to join
    the next call (Community committee on Tue), give a demo etc.
    (bryan_att, 14:13:22)
  * Anwar: this will relate to the data model and to the Training
    project components  (bryan_att, 14:15:37)
  * Pantelis: we are working on drafting the component set for Training
    and should have some proposals next week  (bryan_att, 14:16:05)
  * Chris: we need to understand what type of dataset(s) are intended to
    be supported  (bryan_att, 14:16:25)
  * https://jira.acumos.org/browse/ACUMOS-1182 "Onboarding Datasets" "As
    an user, I want to : be able to have a guided workflow on creating
    different datasets that contains metadata of the dataset along with
    datasource information of the dataset so that: my experience will be
    seamless and intuitive"  (bryan_att, 14:17:15)
  * Anwar: datasets should be reusable across models  (bryan_att,
    14:17:48)
  * Pantelis: need to understand what this means, e.g. the output of
    training, or input to training  (bryan_att, 14:18:53)
  * Bryan: this could be both, a trained model dataset (as used in the
    training), or a dataset available for use in training or testing
    (bryan_att, 14:20:31)
  * Chuxin: does this also relate to sample data  (bryan_att, 14:20:54)
  * Anwar: the onboarding may provide references to the data but not the
    actual data itself; the sample data capability e.g. as used in Dev
    Challenge expects the user to upload an actual data file, vs a
    reference  (bryan_att, 14:22:20)
  * Anwar: the result would be entries in a dataset catalog, but not
    local hosting of the data  (bryan_att, 14:23:49)
  * Anwar: does the team agree this should be in scope  (bryan_att,
    14:24:18)
  * Pantelis: seems implicitly in scope  (bryan_att, 14:24:32)
  * Chuxin: how are these two Jira items related?  (bryan_att,
    14:24:53)
  * Pantelis: we could merge 1182 into 1181  (bryan_att, 14:25:20)
  * Anwar: or we could leave them as distinct capabilities supporting
    the overall training goal, since the work needed may not overlap
    entirely  (bryan_att, 14:26:06)
  * https://jira.acumos.org/browse/ACUMOS-1183 "Guided workflow for
    onboarding and operationalizing models" "As an user: I want to :
    have an guided workflow based for each of my use cases on onboarding
    and operationalizing the model so that : my user experience is
    intuitive and completely guided"  (bryan_att, 14:27:03)
  * Chris: I think this means there is some wizard-type system that
    graphically/dialog-wise guides the user through the process of
    onboarding/"operationalizing"  (bryan_att, 14:28:03)
  * Bryan: "operationalizing" is new  (bryan_att, 14:28:20)
  * Pantelis: behind this is some workflow engine  (bryan_att, 14:28:56)
  * present: Vasu Kallepalli  (bryan_att, 14:30:53)
  * Vasu: the entire end-to-end flow of a model is what's meant here; we
    have some code/UX that we can share as a demo  (bryan_att, 14:31:51)
  * Anwar: on the dataset capability; are we talking only about the
    metadata  (bryan_att, 14:32:28)
  * present: Adi Mishra (AT&T)  (bryan_att, 14:33:17)
  * Adi: CMLP team has a set of self-service portals allowing Hadoop
    access; we can abstract that platform for Acumos  (bryan_att,
    14:34:07)
  * Anwar: mapping an abstract capability to a specific implementation
    e.g. for a dataset is what we are after  (bryan_att, 14:34:40)
  * Chuxin: CMLP team uses a tool called RapidMiner  (bryan_att,
    14:35:40)
  * LINK: https://rapidminer.com/  (aimeeu, 14:35:44)
  * Adi: this is AGPL based  (bryan_att, 14:36:10)
  * Bryan: it's OK as long as we do not tightly integrate  (bryan_att,
    14:37:54)
  * Manoop: we still need to do a legal review on the contribution of
    that as a platform component  (bryan_att, 14:38:27)
  * #link https://github.com/rapidminer/rapidminer-studio  (aimeeu,
    14:38:29)
  * Adi: With RapidMiner we have end-to-end workflow support for model
    creation, training, etc  (bryan_att, 14:39:48)
  * Pantelis: we need abstraction, so the functions implementing model
    development etc can be decoupled  (bryan_att, 14:40:33)
  * Anwar: is this covered by the epics we have so far?  (bryan_att,
    14:40:54)
  * Adi: no this is additional  (bryan_att, 14:41:16)
  * Anwar: we need to create an epic for this if intended for the
    release  (bryan_att, 14:41:40)
  * Adi: will follow up  (bryan_att, 14:41:50)
  * Adi: re 1182 and 1181; these are closely related and could be
    combined; they relate to onboarding metadata and then guided
    creation of uses for the referenced data  (bryan_att, 14:44:01)
  * Adi: will enter a new feature request for streaming datasets
    (bryan_att, 14:45:00)
  * Adi: also entered https://jira.acumos.org/browse/ACUMOS-1179
    "Ability to perform Exploratory Data Analysis + Visualization +
    Sampling"  (bryan_att, 14:46:13)
  * Anwar: we will dive a little deeper into this now so Ofer can
    discuss further in the Community call  (bryan_att, 14:47:30)
  * Adi: 1179 relates to understanding the data and how a business user
    looks at the dataset, prior to starting training etc  (bryan_att,
    14:48:54)
  * Adi: e.g. analyzing application data requires understanding the
    application, what datasets it creates, what semantics apply, etc
    (bryan_att, 14:49:40)
  * Anwar: do you have support for that in CMLP?  (bryan_att, 14:49:58)
  * Adi: we currently use Zeppelin, which can help do rudimentary data
    analysis (link?)  (bryan_att, 14:50:30)
  * #link https://zeppelin.apache.org/  (aimeeu, 14:50:56)
  * Chuxin: not sure this capability fits with the scope of Acumos; it
    goes way beyond what Acumos is designed to support  (bryan_att,
    14:51:38)
  * Anwar: we are at discussion level on items that might be in scope,
    where tools exist that we can reuse  (bryan_att, 14:52:07)
  * Adi: the existing CMLP tool does not manage the data
    (bryan_att, 14:52:50)
  * Ofer: this seems highly related to training of models  (bryan_att,
    14:53:15)
  * Ofer: the initial training will be a demo but not part of the
    release  (bryan_att, 14:53:31)
  * Ofer: so we will develop a spec and a small demo; if the team can do
    more it will be discussed in those meetings  (bryan_att, 14:54:09)
  * Bryan: the visualization capability seems key to the end-to-end UX,
    and as long as we abstract it we should be able to include this to
    support the E2E experience  (bryan_att, 14:57:20)
  * Chuxin: concerned that this would require the model to run under the
    platform  (bryan_att, 14:57:42)
  * Anwar: we all agree this is a design time tool (the Acumos
    platform), not a runtime tool  (bryan_att, 14:58:07)
  * Anwar: is the onboarding of the datasets more than putting them into
    a catalog?  (bryan_att, 14:59:05)
  * Anwar: e.g. validating them etc without bringing the data into a
    running model environment  (bryan_att, 14:59:30)
  * Manoop: want to clarify that the dataset will be external
    (bryan_att, 15:01:20)
  * Adi: agree, the data is pulled into the visualization tool as
    needed, and not locally stored  (bryan_att, 15:01:39)
  * Chuxin: at design time, the design studio does not consider the data
    and does not have features  (bryan_att, 15:02:26)
  * Anwar: feedback so far is this seems related to Training, so
    recommend that the CMLP team work with Pantelis and the training
    team for deeper dives; the main purpose for today was intro to the
    CMLP team and their proposals  (bryan_att, 15:04:18)


Meeting ended at 15:05:33 UTC.


People present (lines said)
---------------------------

* bryan_att (72)
* collabot_ (4)
* aimeeu (3)


Generated by `MeetBot`_ 0.1.4