14:04:29 #startmeeting Acumos TSC Architecture Committee
14:04:29 Meeting started Thu Jun 21 14:04:29 2018 UTC. The chair is bryan_att. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:04:29 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:04:29 The meeting name has been set to 'acumos_tsc_architecture_committee'
14:04:41 #topic Agenda
14:05:27 #info New epics added in the last few days from the AT&T CMLP team
14:07:21 #info present: Bryan, Aimee, Anand, Chris, Chuxin, Farheen, Nat, Ofer, Pantelis
14:07:46 #topic New AT&T CMLP team epics
14:09:38 #info https://jira.acumos.org/browse/ACUMOS-1181 "DataSet Management Capabilities"
14:10:47 #info Anwar: CMLP team has some code implementing the use case "I want to : be able to create and manage different datasets (training, testing & scoring) that contains the metadata of the datasets with optionally linked with the source of the dataset, so that: I can use it building the model and scoring the predictions using a model"
14:11:41 #info Pantelis: this relates to functions being considered for the Training project; the code is welcome; AT&T has been present, looking forward to CMLP team joining
14:12:29 #info Pantelis: the training project has assumed that a PoC will be developed but not released code yet; some reqs will be deferred to Release B.
14:13:22 #info Anwar: the CMLP team is not here today so we will ask them to join the next call (Community committee on Tue), give a demo etc.
14:15:37 #info Anwar: this will relate to the data model and to the Training project components
14:16:05 #info Pantelis: we are working on drafting the component set for Training and should have some proposals next week
14:16:25 #info Chris: we need to understand what type of dataset(s) are intended to be supported
14:17:15 #info https://jira.acumos.org/browse/ACUMOS-1182 "Onboarding Datasets" "As an user, I want to : be able to have a guided workflow on creating different datasets that contains metadata of the dataset along with datasource information of the dataset so that: my experience will be seamless and intuitive"
14:17:48 #info Anwar: datasets should be reusable across models
14:18:53 #info Pantelis: need to understand what this means, e.g. the output of training, or input to training
14:20:31 #info Bryan: this could be both, a trained model dataset (as used in the training), or a dataset available for use in training or testing
14:20:54 #info Chuxin: does this also relate to sample data
14:22:20 #info Anwar: the onboarding may provide references to the data but not the actual data itself; the sample data capability e.g. as used in Dev Challenge expects the user to upload an actual data file, vs a reference
14:23:49 #info Anwar: the result would be entries in a dataset catalog, but not local hosting of the data
14:24:18 #info Anwar: does the team agree this should be in scope
14:24:32 #info Pantelis: seems implicitly in scope
14:24:53 #info Chuxin: how are these two Jira items related ?
14:25:20 #info Pantelis: we could merge 1182 into 1181
14:26:06 #info Anwar: or we could leave them as distinct capabilities supporting the overall training goal, since the work needed may not overlap entirely
14:27:03 #info https://jira.acumos.org/browse/ACUMOS-1183 "Guided workflow for onboarding and operationalizing models" "As an user: I want to : have an guided workflow based for each of my use cases on onboarding and operationalizing the model so that : my user experience is intuitive and completely guided"
14:28:03 #info Chris: OI think this means there is some wizard type system that graphically/dialog-wise guides the user through the process of onboarding/"operationalizing"
14:28:20 #info Bryan: "operationalizing" is new
14:28:56 #info Pantelis: behind this is some workflow engine
14:30:01 s/Chris: O/Guy: /
14:30:53 #info present: Vasu Kallepalli
14:31:51 #info Vasu: the entire end-to-end flow of a model is what's meant here; we have some code/UX that we can share as a demo
14:32:28 #info Anwar: on the dataset capability; are we talking only about the metadata
14:33:11 s/Vasu:/Adi:/
14:33:17 #info Adi Mishra (AT&T)
14:34:07 #info Adi: CMLP team has a set of self-service portals allowing Hadoop access; we can abstract that platform for Acumos
14:34:40 #info Anwar: mapping an abstract capability to a specific implementation e.g. for a dataset is what we are after
14:35:40 #info Chuxin: CMLP team uses a tool called RapidMiner
14:35:44 #link https://rapidminer.com/
14:36:10 #info Adi: this is AGPL based
14:37:54 #info Bryan: it's OK as long as we do not tightly integrate
14:38:27 #info Manoop: we still need to do a legal review on the contribution of that as a platform component
14:38:29 #info #link https://github.com/rapidminer/rapidminer-studio
14:39:48 #info Adi: With RapidMiner we have end-to-end workflow support for model creating, training, etc
14:40:33 #info Pantelis: we need abstraction, so the functions implementing model development etc can be decoupled
14:40:54 #info Anwar: is this covered by the epics we have so far ?
14:41:16 #info Adi: no this is additional
14:41:40 #info Anwar: we need to create an epic for this if intended for the release
14:41:50 #info Adi: will follow up
14:44:01 #info Adi: re 1182 and 1181; these are closely related and could be combined; they relate to onboarding metadata and then guided creation of uses for the referenced data
14:45:00 #info Adi: will enter a new feature request for streaming datasets
14:46:13 #info Adi: also entered https://jira.acumos.org/browse/ACUMOS-1179 "Ability to perform Exploratory Data Analysis + Visualization + Sampling"
14:47:30 #info Anwar: we will dive a little deeper into this now so Ofer can discuss further in the Community call
14:48:54 #info Adi: 1179 relates to understanding the data and how a business user looks at the dataset, prior to starting training etc
14:49:40 #info Adi: e.g. analyzing application data requires understanding the application, what datasets it creates, what semantics apply, etc
14:49:58 #info Anwar: do you have support for that in CMLP?
14:50:30 #info Adi: we currently use Zeppelin, which can help do rudimentary data analysis (link?)
14:50:56 #info #link https://zeppelin.apache.org/
14:51:38 #info Chuxin: not sure this capability fits with the scope of Acumos; it goes way beyond what Acumos is designed to support
14:52:07 #info Anwar: we are at discussion level on items that might be in scope, where tools exist that we can reuse
14:52:50 #info Adi: the existing CMLP tool does not manage the data
14:53:15 #info Ofer: this seems highly related to training of models
14:53:31 #info Ofer: the initial training will be a demo but not part of the release
14:54:09 #info Ofer: so we will develop a spec and a small demo; if the team can do more it will be discussed in those meetings
14:57:20 #info Bryan: the visualization capability seems key to the end-to-end UX, and as long as we abstract it we should be able to include this to support the E2E experience
14:57:42 #info Chuxin: concerned that this would require the model to run under the platform
14:58:07 #info Anwar: we all agree this is a design time tool (the Acumos platform), not a runtime tool
14:59:05 #info Anwar: is the onboarding of the dataset more than putting them into a catalog?
14:59:30 #info Anwar: e.g. validating them etc without bringing the data into a running model environment
15:01:20 #info Manoop: want to clarify that the dataset will be external
15:01:39 #info Adi: agree, the data is pulled into the visualization tool as needed, and not locally stored
15:02:26 #info Chuxin: at design time, the design studio does not consider the data and does not have features
15:04:18 #info Anwar: feedback so far is this seems related to Training, so recommend that the CMLP team work with Pantelis and the training team for deeper dives; the main purpose for today was intro to the CMLP team and their proposals
15:05:00 #chair bryan_att, aimeeu
15:05:00 Current chairs: aimeeu bryan_att
15:05:33 #endmeeting