==================================================
#acumos-meeting: Acumos TSC Architecture Committee
==================================================

Meeting started by bryan_att at 14:04:29 UTC. The full logs are available at http://ircbot.wl.linuxfoundation.org/meetings/acumos-meeting/2018/acumos-meeting.2018-06-21-14.04.log.html .

Meeting summary
---------------

* Agenda (bryan_att, 14:04:41)
* New epics added in the last few days from the AT&T CMLP team (bryan_att, 14:05:27)
* present: Bryan, Aimee, Anand, Chris, Chuxin, Farheen, Nat, Ofer, Pantelis (bryan_att, 14:07:21)
* New AT&T CMLP team epics (bryan_att, 14:07:46)
* https://jira.acumos.org/browse/ACUMOS-1181 "DataSet Management Capabilities" (bryan_att, 14:09:38)
* Anwar: CMLP team has some code implementing the use case "I want to : be able to create and manage different datasets (training, testing & scoring) that contains the metadata of the datasets with optionally linked with the source of the dataset, so that: I can use it building the model and scoring the predictions using a model" (bryan_att, 14:10:47)
* Pantelis: this relates to functions being considered for the Training project; the code is welcome; AT&T has been present, and we look forward to the CMLP team joining (bryan_att, 14:11:41)
* Pantelis: the Training project has assumed that a PoC will be developed, but no code has been released yet; some reqs will be deferred to Release B. (bryan_att, 14:12:29)
* Anwar: the CMLP team is not here today, so we will ask them to join the next call (Community committee on Tue), give a demo, etc. (bryan_att, 14:13:22)
* Anwar: this will relate to the data model and to the Training project components (bryan_att, 14:15:37)
* Pantelis: we are working on drafting the component set for Training and should have some proposals next week (bryan_att, 14:16:05)
* Chris: we need to understand what type of dataset(s) are intended to be supported (bryan_att, 14:16:25)
* https://jira.acumos.org/browse/ACUMOS-1182 "Onboarding Datasets" "As an user, I want to : be able to have a guided workflow on creating different datasets that contains metadata of the dataset along with datasource information of the dataset so that: my experience will be seamless and intuitive" (bryan_att, 14:17:15)
* Anwar: datasets should be reusable across models (bryan_att, 14:17:48)
* Pantelis: need to understand what this means, e.g. the output of training or the input to training (bryan_att, 14:18:53)
* Bryan: this could be both: a trained model dataset (as used in the training), or a dataset available for use in training or testing (bryan_att, 14:20:31)
* Chuxin: does this also relate to sample data? (bryan_att, 14:20:54)
* Anwar: the onboarding may provide references to the data but not the actual data itself; the sample data capability (e.g. as used in the Dev Challenge) expects the user to upload an actual data file rather than a reference (bryan_att, 14:22:20)
* Anwar: the result would be entries in a dataset catalog, but not local hosting of the data (bryan_att, 14:23:49)
* Anwar: does the team agree this should be in scope? (bryan_att, 14:24:18)
* Pantelis: seems implicitly in scope (bryan_att, 14:24:32)
* Chuxin: how are these two Jira items related? (bryan_att, 14:24:53)
* Pantelis: we could merge 1182 into 1181 (bryan_att, 14:25:20)
* Anwar: or we could leave them as distinct capabilities supporting the overall training goal, since the work needed may not overlap entirely (bryan_att, 14:26:06)
* https://jira.acumos.org/browse/ACUMOS-1183 "Guided workflow for onboarding and operationalizing models" "As an user: I want to : have an guided workflow based for each of my use cases on onboarding and operationalizing the model so that : my user experience is intuitive and completely guided" (bryan_att, 14:27:03)
* Chris: I think this means there is some wizard-type system that guides the user, graphically or through dialogs, through the process of onboarding/"operationalizing" (bryan_att, 14:28:03)
* Bryan: "operationalizing" is new (bryan_att, 14:28:20)
* Pantelis: behind this is some workflow engine (bryan_att, 14:28:56)
* present: Vasu Kallepalli (bryan_att, 14:30:53)
* Vasu: the entire end-to-end flow of a model is what's meant here; we have some code/UX that we can share as a demo (bryan_att, 14:31:51)
* Anwar: on the dataset capability, are we talking only about the metadata? (bryan_att, 14:32:28)
* Adi Mishra (AT&T) (bryan_att, 14:33:17)
* Adi: CMLP team has a set of self-service portals allowing Hadoop access; we can abstract that platform for Acumos (bryan_att, 14:34:07)
* Anwar: mapping an abstract capability to a specific implementation, e.g. for a dataset, is what we are after (bryan_att, 14:34:40)
* Chuxin: CMLP team uses a tool called RapidMiner (bryan_att, 14:35:40)
* LINK: https://rapidminer.com/ (aimeeu, 14:35:44)
* Adi: this is AGPL-based (bryan_att, 14:36:10)
* Bryan: it's OK as long as we do not tightly integrate (bryan_att, 14:37:54)
* Manoop: we still need to do a legal review on the contribution of that as a platform component (bryan_att, 14:38:27)
* LINK: https://github.com/rapidminer/rapidminer-studio (aimeeu, 14:38:29)
* Adi: with RapidMiner we have end-to-end workflow support for model creation, training, etc. (bryan_att, 14:39:48)
* Pantelis: we need abstraction, so that the functions implementing model development etc. can be decoupled (bryan_att, 14:40:33)
* Anwar: is this covered by the epics we have so far? (bryan_att, 14:40:54)
* Adi: no, this is additional (bryan_att, 14:41:16)
* Anwar: we need to create an epic for this if it is intended for the release (bryan_att, 14:41:40)
* Adi: will follow up (bryan_att, 14:41:50)
* Adi: re 1182 and 1181: these are closely related and could be combined; they relate to onboarding metadata and then guided creation of uses for the referenced data (bryan_att, 14:44:01)
* Adi: will enter a new feature request for streaming datasets (bryan_att, 14:45:00)
* Adi: also entered https://jira.acumos.org/browse/ACUMOS-1179 "Ability to perform Exploratory Data Analysis + Visualization + Sampling" (bryan_att, 14:46:13)
* Anwar: we will dive a little deeper into this now so Ofer can discuss further in the Community call (bryan_att, 14:47:30)
* Adi: 1179 relates to understanding the data and how a business user looks at the dataset, prior to starting training etc. (bryan_att, 14:48:54)
* Adi: e.g. analyzing application data requires understanding the application, what datasets it creates, what semantics apply, etc. (bryan_att, 14:49:40)
* Anwar: do you have support for that in CMLP? (bryan_att, 14:49:58)
* Adi: we currently use Zeppelin, which can help do rudimentary data analysis (link?) (bryan_att, 14:50:30)
* LINK: https://zeppelin.apache.org/ (aimeeu, 14:50:56)
* Chuxin: not sure this capability fits with the scope of Acumos; it goes way beyond what Acumos is designed to support (bryan_att, 14:51:38)
* Anwar: we are at discussion level on items that might be in scope, where tools exist that we can reuse (bryan_att, 14:52:07)
* Adi: the existing CMLP tool does not manage the data (bryan_att, 14:52:50)
* Ofer: this seems highly related to training of models (bryan_att, 14:53:15)
* Ofer: the initial training will be a demo but not part of the release (bryan_att, 14:53:31)
* Ofer: so we will develop a spec and a small demo; if the team can do more, it will be discussed in those meetings (bryan_att, 14:54:09)
* Bryan: the visualization capability seems key to the end-to-end UX, and as long as we abstract it we should be able to include it to support the E2E experience (bryan_att, 14:57:20)
* Chuxin: concerned that this would require the model to run under the platform (bryan_att, 14:57:42)
* Anwar: we all agree this is a design-time tool (the Acumos platform), not a runtime tool (bryan_att, 14:58:07)
* Anwar: is the onboarding of the datasets more than putting them into a catalog? (bryan_att, 14:59:05)
* Anwar: e.g. validating them etc. without bringing the data into a running model environment (bryan_att, 14:59:30)
* Manoop: want to clarify that the dataset will be external (bryan_att, 15:01:20)
* Adi: agree; the data is pulled into the visualization tool as needed, and not stored locally (bryan_att, 15:01:39)
* Chuxin: at design time, the design studio does not consider the data and does not have features (bryan_att, 15:02:26)
* Anwar: feedback so far is that this seems related to Training, so we recommend that the CMLP team work with Pantelis and the Training team for deeper dives; the main purpose for today was an intro to the CMLP team and their proposals (bryan_att, 15:04:18)

Meeting ended at 15:05:33 UTC.

People present (lines said)
---------------------------

* bryan_att (72)
* collabot_ (4)
* aimeeu (3)

Generated by `MeetBot`_ 0.1.4