#acumos-meeting: Acumos TSC Architecture Committee
Meeting started by bryan_att at 14:04:29 UTC
(full logs).
Meeting summary
- Agenda (bryan_att, 14:04:41)
- New epics added in the last few days from the
AT&T CMLP team (bryan_att,
14:05:27)
- present: Bryan, Aimee, Anand, Chris, Chuxin,
Farheen, Nat, Ofer, Pantelis (bryan_att,
14:07:21)
- New AT&T CMLP team epics (bryan_att, 14:07:46)
- https://jira.acumos.org/browse/ACUMOS-1181
"DataSet Management Capabilities" (bryan_att,
14:09:38)
- Anwar: CMLP team has some code implementing the
use case "I want to : be able to create and manage different
datasets (training, testing & scoring) that contains the
metadata of the datasets with optionally linked with the source of
the dataset, so that: I can use it building the model and scoring
the predictions using a model" (bryan_att,
14:10:47)
- Pantelis: this relates to functions being
considered for the Training project; the code is welcome; AT&T
has been present, looking forward to CMLP team joining (bryan_att,
14:11:41)
- Pantelis: the training project has assumed that
a PoC will be developed but not released code yet; some reqs will be
deferred to Release B. (bryan_att,
14:12:29)
- Anwar: the CMLP team is not here today so we
will ask them to join the next call (Community committee on Tue),
give a demo etc. (bryan_att,
14:13:22)
- Anwar: this will relate to the data model and
to the Training project components (bryan_att,
14:15:37)
- Pantelis: we are working on drafting the
component set for Training and should have some proposals next
week (bryan_att,
14:16:05)
- Chris: we need to understand what type of
dataset(s) are intended to be supported (bryan_att,
14:16:25)
- https://jira.acumos.org/browse/ACUMOS-1182
"Onboarding Datasets" "As an user, I want to : be able to have a
guided workflow on creating different datasets that contains
metadata of the dataset along with datasource information of the
dataset so that: my experience will be seamless and
intuitive" (bryan_att,
14:17:15)
- Anwar: datasets should be reusable across
models (bryan_att,
14:17:48)
- Pantelis: need to understand what this means,
e.g. the output of training, or input to training (bryan_att,
14:18:53)
- Bryan: this could be both, a trained model
dataset (as used in the training), or a dataset available for use in
training or testing (bryan_att,
14:20:31)
- Chuxin: does this also relate to sample
data? (bryan_att,
14:20:54)
- Anwar: the onboarding may provide references to
the data but not the actual data itself; the sample data capability
e.g. as used in Dev Challenge expects the user to upload an actual
data file, vs a reference (bryan_att,
14:22:20)
- Anwar: the result would be entries in a dataset
catalog, but not local hosting of the data (bryan_att,
14:23:49)
- Anwar: does the team agree this should be in
scope? (bryan_att,
14:24:18)
- Pantelis: seems implicitly in scope
(bryan_att,
14:24:32)
- Chuxin: how are these two Jira items related
? (bryan_att,
14:24:53)
- Pantelis: we could merge 1182 into 1181
(bryan_att,
14:25:20)
- Anwar: or we could leave them as distinct
capabilities supporting the overall training goal, since the work
needed may not overlap entirely (bryan_att,
14:26:06)
- https://jira.acumos.org/browse/ACUMOS-1183
"Guided workflow for onboarding and operationalizing models" "As an
user: I want to : have an guided workflow based for each of my use
cases on onboarding and operationalizing the model so that : my user
experience is intuitive and completely guided" (bryan_att,
14:27:03)
- Chris: I think this means there is some wizard
type system that graphically/dialog-wise guides the user through the
process of onboarding/"operationalizing" (bryan_att,
14:28:03)
- Bryan: "operationalizing" is new (bryan_att,
14:28:20)
- Pantelis: behind this is some workflow
engine (bryan_att,
14:28:56)
- present: Vasu Kallepalli (bryan_att,
14:30:53)
- Vasu: the entire end-to-end flow of a model is
what's meant here; we have some code/UX that we can share as a
demo (bryan_att,
14:31:51)
- Anwar: on the dataset capability: are we
talking only about the metadata? (bryan_att,
14:32:28)
- present: Adi Mishra (AT&amp;T) (bryan_att,
14:33:17)
- Adi: CMLP team has a set of self-service
portals allowing Hadoop access; we can abstract that platform for
Acumos (bryan_att,
14:34:07)
- Anwar: mapping an abstract capability to a
specific implementation, e.g. for a dataset, is what we are
after (bryan_att,
14:34:40)
- Chuxin: CMLP team uses a tool called
RapidMiner (bryan_att,
14:35:40)
- https://rapidminer.com/
(aimeeu,
14:35:44)
- Adi: this is AGPL based (bryan_att,
14:36:10)
- Bryan: it's OK as long as we do not tightly
integrate (bryan_att,
14:37:54)
- Manoop: we still need to do a legal review on
the contribution of that as a platform component (bryan_att,
14:38:27)
- #link
https://github.com/rapidminer/rapidminer-studio (aimeeu,
14:38:29)
- Adi: With RapidMiner we have end-to-end
workflow support for model creation, training, etc (bryan_att,
14:39:48)
- Pantelis: we need abstraction, so the functions
implementing model development etc can be decoupled (bryan_att,
14:40:33)
- Anwar: is this covered by the epics we have so
far ? (bryan_att,
14:40:54)
- Adi: no this is additional (bryan_att,
14:41:16)
- Anwar: we need to create an epic for this if
intended for the release (bryan_att,
14:41:40)
- Adi: will follow up (bryan_att,
14:41:50)
- Adi: re 1182 and 1181; these are closely
related and could be combined; they relate to onboarding metadata
and then guided creation of uses for the referenced data
(bryan_att,
14:44:01)
- Adi: will enter a new feature request for
streaming datasets (bryan_att,
14:45:00)
- Adi: also entered
https://jira.acumos.org/browse/ACUMOS-1179 "Ability to perform
Exploratory Data Analysis + Visualization + Sampling" (bryan_att,
14:46:13)
- Anwar: we will dive a little deeper into this
now so Ofer can discuss further in the Community call (bryan_att,
14:47:30)
- Adi: 1179 relates to understanding the data and
how a business user looks at the dataset, prior to starting training
etc (bryan_att,
14:48:54)
- Adi: e.g. analyzing application data requires
understanding the application, what datasets it creates, what
semantics apply, etc (bryan_att,
14:49:40)
- Anwar: do you have support for that in
CMLP? (bryan_att,
14:49:58)
- Adi: we currently use Zeppelin, which can help
do rudimentary data analysis (link?) (bryan_att,
14:50:30)
- #link https://zeppelin.apache.org/ (aimeeu,
14:50:56)
- Chuxin: not sure this capability fits with the
scope of Acumos; it goes way beyond what Acumos is designed to
support (bryan_att,
14:51:38)
- Anwar: we are at discussion level on items that
might be in scope, where tools exist that we can reuse (bryan_att,
14:52:07)
- Adi: the existing CMLP tool does not
manage the data (bryan_att,
14:52:50)
- Ofer: this seems highly related to training of
models (bryan_att,
14:53:15)
- Ofer: the initial training will be a demo but
not part of the release (bryan_att,
14:53:31)
- Ofer: so we will develop a spec and a small
demo; if the team can do more it will be discussed in those
meetings (bryan_att,
14:54:09)
- Bryan: the visualization capability seems key
to the end-to-end UX, and as long as we abstract it we should be
able to include this to support the E2E experience (bryan_att,
14:57:20)
- Chuxin: concerned that this would require the
model to run under the platform (bryan_att,
14:57:42)
- Anwar: we all agree this is a design time tool
(the Acumos platform), not a runtime tool (bryan_att,
14:58:07)
- Anwar: is the onboarding of datasets more
than putting them into a catalog? (bryan_att,
14:59:05)
- Anwar: e.g. validating them etc without
bringing the data into a running model environment (bryan_att,
14:59:30)
- Manoop: want to clarify that the dataset will
be external (bryan_att,
15:01:20)
- Adi: agree, the data is pulled into the
visualization tool as needed, and not locally stored (bryan_att,
15:01:39)
- Chuxin: at design time, the design studio does
not consider the data and does not have features (bryan_att,
15:02:26)
- Anwar: feedback so far is this seems related to
Training, so recommend that the CMLP team work with Pantelis and the
training team for deeper dives; the main purpose for today was intro
to the CMLP team and their proposals (bryan_att,
15:04:18)
Meeting ended at 15:05:33 UTC
Action items
- (none)
People present (lines said)
- bryan_att (72)
- collabot_ (4)
- aimeeu (3)
Generated by MeetBot 0.1.4.