#acumos-meeting log

14:04:29 <bryan_att> #startmeeting Acumos TSC Architecture Committee
14:04:29 <collabot_> Meeting started Thu Jun 21 14:04:29 2018 UTC.  The chair is bryan_att. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:04:29 <collabot_> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:04:29 <collabot_> The meeting name has been set to 'acumos_tsc_architecture_committee'
14:04:41 <bryan_att> #topic Agenda
14:05:27 <bryan_att> #info New epics added in the last few days from the AT&T CMLP team
14:07:21 <bryan_att> #info present: Bryan, Aimee, Anand, Chris, Chuxin, Farheen, Nat, Ofer, Pantelis
14:07:46 <bryan_att> #topic New AT&T CMLP team epics
14:09:38 <bryan_att> #info https://jira.acumos.org/browse/ACUMOS-1181 "DataSet Management Capabilities"
14:10:47 <bryan_att> #info Anwar: CMLP team has some code implementing the use case "I want to: be able to create and manage different datasets (training, testing & scoring) that contain the metadata of the datasets, optionally linked with the source of the dataset, so that: I can use them in building the model and scoring the predictions using a model"
14:11:41 <bryan_att> #info Pantelis: this relates to functions being considered for the Training project; the code is welcome; AT&T has been present, looking forward to CMLP team joining
14:12:29 <bryan_att> #info Pantelis: the training project has assumed that a PoC will be developed, but no code has been released yet; some reqs will be deferred to Release B.
14:13:22 <bryan_att> #info Anwar: the CMLP team is not here today so we will ask them to join the next call (Community committee on Tue), give a demo etc.
14:15:37 <bryan_att> #info Anwar: this will relate to the data model and to the Training project components
14:16:05 <bryan_att> #info Pantelis: we are working on drafting the component set for Training and should have some proposals next week
14:16:25 <bryan_att> #info Chris: we need to understand what type of dataset(s) are intended to be supported
14:17:15 <bryan_att> #info https://jira.acumos.org/browse/ACUMOS-1182 "Onboarding Datasets" "As a user, I want to: be able to have a guided workflow for creating different datasets that contain metadata of the dataset along with datasource information of the dataset, so that: my experience will be seamless and intuitive"
14:17:48 <bryan_att> #info Anwar: datasets should be reusable across models
14:18:53 <bryan_att> #info Pantelis: need to understand what this means, e.g. the output of training, or input to training
14:20:31 <bryan_att> #info Bryan: this could be both, a trained model dataset (as used in the training), or a dataset available for use in training or testing
14:20:54 <bryan_att> #info Chuxin: does this also relate to sample data?
14:22:20 <bryan_att> #info Anwar: the onboarding may provide references to the data but not the actual data itself; the sample data capability, e.g. as used in the Dev Challenge, expects the user to upload an actual data file rather than a reference
14:23:49 <bryan_att> #info Anwar: the result would be entries in a dataset catalog, but not local hosting of the data
14:24:18 <bryan_att> #info Anwar: does the team agree this should be in scope?
14:24:32 <bryan_att> #info Pantelis: seems implicitly in scope
14:24:53 <bryan_att> #info Chuxin: how are these two Jira items related?
14:25:20 <bryan_att> #info Pantelis: we could merge 1182 into 1181
14:26:06 <bryan_att> #info Anwar: or we could leave them as distinct capabilities supporting the overall training goal, since the work needed may not overlap entirely
14:27:03 <bryan_att> #info https://jira.acumos.org/browse/ACUMOS-1183 "Guided workflow for onboarding and operationalizing models" "As a user, I want to: have a guided workflow for each of my use cases for onboarding and operationalizing the model, so that: my user experience is intuitive and completely guided"
14:28:03 <bryan_att> #info Guy: I think this means there is some wizard-type system that graphically/dialog-wise guides the user through the process of onboarding/"operationalizing"
14:28:20 <bryan_att> #info Bryan: "operationalizing" is new
14:28:56 <bryan_att> #info Pantelis: behind this is some workflow engine
14:30:01 <bryan_att> s/Chris: O/Guy: /
14:30:53 <bryan_att> #info present: Vasu Kallepalli
14:31:51 <bryan_att> #info Adi: the entire end-to-end flow of a model is what's meant here; we have some code/UX that we can share as a demo
14:32:28 <bryan_att> #info Anwar: on the dataset capability, are we talking only about the metadata?
14:33:11 <bryan_att> s/Vasu:/Adi:/
14:33:17 <bryan_att> #info Adi Mishra (AT&T)
14:34:07 <bryan_att> #info Adi: CMLP team has a set of self-service portals allowing Hadoop access; we can abstract that platform for Acumos
14:34:40 <bryan_att> #info Anwar: mapping an abstract capability to a specific implementation, e.g. for a dataset, is what we are after
14:35:40 <bryan_att> #info Chuxin: CMLP team uses a tool called RapidMiner
14:35:44 <aimeeu> #link https://rapidminer.com/
14:36:10 <bryan_att> #info Adi: this is AGPL-based
14:37:54 <bryan_att> #info Bryan: it's OK as long as we do not tightly integrate
14:38:27 <bryan_att> #info Manoop: we still need to do a legal review on the contribution of that as a platform component
14:38:29 <aimeeu> #info #link https://github.com/rapidminer/rapidminer-studio
14:39:48 <bryan_att> #info Adi: with RapidMiner we have end-to-end workflow support for model creation, training, etc.
14:40:33 <bryan_att> #info Pantelis: we need abstraction, so the functions implementing model development etc can be decoupled
14:40:54 <bryan_att> #info Anwar: is this covered by the epics we have so far?
14:41:16 <bryan_att> #info Adi: no, this is additional
14:41:40 <bryan_att> #info Anwar: we need to create an epic for this if intended for the release
14:41:50 <bryan_att> #info Adi: will follow up
14:44:01 <bryan_att> #info Adi: re 1181 and 1182: these are closely related and could be combined; they relate to onboarding metadata and then guided creation of uses for the referenced data
14:45:00 <bryan_att> #info Adi: will enter a new feature request for streaming datasets
14:46:13 <bryan_att> #info Adi: also entered https://jira.acumos.org/browse/ACUMOS-1179 "Ability to perform Exploratory Data Analysis + Visualization + Sampling"
14:47:30 <bryan_att> #info Anwar: we will dive a little deeper into this now so Ofer can discuss further in the Community call
14:48:54 <bryan_att> #info Adi: 1179 relates to understanding the data and how a business user looks at the dataset, prior to starting training etc
14:49:40 <bryan_att> #info Adi: e.g. analyzing application data requires understanding the application, what datasets it creates, what semantics apply, etc
14:49:58 <bryan_att> #info Anwar: do you have support for that in CMLP?
14:50:30 <bryan_att> #info Adi: we currently use Zeppelin, which can help do rudimentary data analysis (link?)
14:50:56 <aimeeu> #info #link https://zeppelin.apache.org/
14:51:38 <bryan_att> #info Chuxin: not sure this capability fits with the scope of Acumos; it goes way beyond what Acumos is designed to support
14:52:07 <bryan_att> #info Anwar: we are at discussion level on items that might be in scope, where tools exist that we can reuse
14:52:50 <bryan_att> #info Adi: the existing CMLP tool does not manage the data
14:53:15 <bryan_att> #info Ofer: this seems highly related to training of models
14:53:31 <bryan_att> #info Ofer: the initial training will be a demo but not part of the release
14:54:09 <bryan_att> #info Ofer: so we will develop a spec and a small demo; if the team can do more it will be discussed in those meetings
14:57:20 <bryan_att> #info Bryan: the visualization capability seems key to the end-to-end UX, and as long as we abstract it we should be able to include this to support the E2E experience
14:57:42 <bryan_att> #info Chuxin: concerned that this would require the model to run under the platform
14:58:07 <bryan_att> #info Anwar: we all agree this is a design-time tool (the Acumos platform), not a runtime tool
14:59:05 <bryan_att> #info Anwar: is the onboarding of the datasets more than putting them into a catalog?
14:59:30 <bryan_att> #info Anwar: e.g. validating them etc without bringing the data into a running model environment
15:01:20 <bryan_att> #info Manoop: want to clarify that the dataset will be external
15:01:39 <bryan_att> #info Adi: agree, the data is pulled into the visualization tool as needed and is not stored locally
15:02:26 <bryan_att> #info Chuxin: at design time, the design studio does not consider the data and does not have features
15:04:18 <bryan_att> #info Anwar: feedback so far is this seems related to Training, so recommend that the CMLP team work with Pantelis and the training team for deeper dives; the main purpose for today was intro to the CMLP team and their proposals
15:05:00 <bryan_att> #chair bryan_att, aimeeu
15:05:00 <collabot_> Current chairs: aimeeu bryan_att
15:05:33 <bryan_att> #endmeeting