Pipeline

Overview #

The Pipeline is where you manage data in edgeSuite. Raw data from a source is rarely ready to be visualized when it arrives in edgeSuite; it often has to be transformed or filtered first to make an effective visualization. edgeSuite’s Data Pipeline provides a visual way to do this.

To open the Pipeline user interface, click the Pipeline icon, located near the top of the Admin Toolbar (the vertical toolbar on the left side of an administrator’s screen).

Pipeline Nodes #

The Data Pipeline is composed of differently colored objects called “nodes”. Each node has a color and icon associated with it, indicating the type of operation performed at that step. The following sections describe what the colors and icons represent:

Connection #

This node represents a connection to a data source. It contains all the configuration information required to connect to that source, and serves as a grouping mechanism for data sets in the system. The Connection type is reflected in the icon.

See Web Adapters and Connections and Feeds for more information.

Feed #

This node represents an initial data set from the associated data source. The options available in a Feed’s configuration will vary based upon the type of data source you are working with. For example:

  • A CSV Feed will have limited configuration, mainly for specifying a delimiter to use when parsing the file.
  • A Database Feed allows for an initial SQL query.
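
As a rough illustration of why a CSV Feed needs a delimiter setting, the sketch below parses delimited text with Python's standard `csv` module. It is not edgeSuite code; the `parse_feed` helper and the sample data are hypothetical.

```python
import csv
import io


def parse_feed(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Parse delimited text into a list of records keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# Hypothetical sample file that uses a semicolon instead of a comma.
sample = "asset_id;name\n1;router-a\n2;switch-b\n"

# With the right delimiter, each record has two attributes.
rows = parse_feed(sample, delimiter=";")

# With the default comma, every line is read as one single column.
wrong = parse_feed(sample)
```

With the wrong delimiter the data still loads, but the attribute names downstream nodes would reference never appear, which is why the delimiter is set on the Feed itself.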

See Web Adapters and Connections and Feeds for more information.

Transform #

This node represents a structural change to the data, meaning the attributes (columns) in the resulting data set may differ from those of the parent node. Transforms represent concepts such as GROUP BY, JOIN, or UNION. They can also transform flat tabular data into relational models.

See SQL Transform and JavaScript Transform for more information.
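
To make "structural change" concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than edgeSuite itself. The `incidents` table and its columns are hypothetical; the point is only that a GROUP BY yields a data set whose columns differ from its parent's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (asset_id INTEGER, severity TEXT)")
conn.executemany(
    "INSERT INTO incidents VALUES (?, ?)",
    [(1, "high"), (1, "low"), (2, "high")],
)

# Parent data set: one row per incident.
parent = conn.execute("SELECT * FROM incidents")
parent_columns = [col[0] for col in parent.description]

# Transform: GROUP BY produces a new structure (one row per asset, with a count).
child = conn.execute(
    "SELECT asset_id, COUNT(*) AS incident_count "
    "FROM incidents GROUP BY asset_id ORDER BY asset_id"
)
child_columns = [col[0] for col in child.description]
child_rows = child.fetchall()
```

Here the `severity` attribute disappears and `incident_count` appears, so any child node that referenced `severity` would need to be updated.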

Filter #

This node represents a data set where the records (rows) have been limited in some way. Filters leave the structure of the data intact, meaning the attribute names (columns) remain the same from parent to child.

See Filter for more information.
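
By contrast with a Transform, a Filter only removes rows. The following sketch shows that idea in plain Python; the records and the `filter_rows` helper are hypothetical and not part of edgeSuite.

```python
# Hypothetical parent data set: a list of records with identical keys.
parent = [
    {"asset_id": 1, "status": "up"},
    {"asset_id": 2, "status": "down"},
    {"asset_id": 3, "status": "down"},
]


def filter_rows(records, predicate):
    """Keep only the records that satisfy the predicate; columns are untouched."""
    return [record for record in records if predicate(record)]


# Child data set: fewer rows, same attributes.
child = filter_rows(parent, lambda record: record["status"] == "down")

parent_columns = list(parent[0].keys())
child_columns = list(child[0].keys())
```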

Visualization #

This node represents configuration required for specific visualizations. Data sets can have multiple visualizations defined, and the type of visualization is reflected in the icon.

See Visualizations for more information.

View Modes #

The Pipeline can be viewed in several different ways:

Topology: A graphical hierarchy that starts with a Connection and ends with a Visualization.
Table: A flat tabular view of the nodes in a system.
Tree Table: A tabular view that maintains the hierarchical relationships through an expand/collapse mechanic.

Pipeline Modes #

Pipeline mode is a primary filter for the pipeline. By default, the Pipeline shows “Data”, but there are other modes available.

Data: This shows how data flows from a Connection to an end Visualization.
Action: This shows Actions, which work backwards from the typical data flow. Server Actions start with a Visualization and update values on the source Connection. See Server Actions for more information.

Filtering the Data Pipeline #

A secondary filter control is located in the header of the pipeline (top right). This allows the administrator to limit their search to names of specific types of nodes, such as only Connections, or only Visualizations.

The “View All” option will show all data in the system, regardless of type. For large deployments, a warning message may appear indicating that it may take some time to generate a layout for all nodes in the system.

Constrained Filtering #

The left pull-down allows an administrator to constrain their search based on different types of objects in the pipeline. The “Group By Type” option divides search results by categories.

With Group By Type = Enabled, results are grouped by subcategory. For example, Visualizations are grouped by Visualization Type and then displayed alphabetically within those groups.

With Group By Type = Disabled, results are shown as a flat alphabetical list.
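
The difference between the two settings can be sketched as follows. This is only an illustration of the ordering described above, not edgeSuite's implementation; the result names and the "Gauge" type are hypothetical.

```python
from itertools import groupby

# Hypothetical search results as (name, type) pairs.
results = [
    ("Uptime", "Gauge"),
    ("Assets", "List"),
    ("Alerts", "List"),
    ("Load", "Gauge"),
]


def flat_list(items):
    """Group By Type disabled: one alphabetical list of names."""
    return sorted(name for name, _ in items)


def grouped_by_type(items):
    """Group By Type enabled: names sorted alphabetically within each type."""
    # groupby only merges adjacent items, so sort by type (then name) first.
    ordered = sorted(items, key=lambda item: (item[1], item[0]))
    return {
        kind: [name for name, _ in group]
        for kind, group in groupby(ordered, key=lambda item: item[1])
    }
```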

Pipeline Settings #

Pipeline settings can be accessed using the settings icon at the top right of the header.

Default View Mode #

Sets the default view mode for the pipeline:

Topology: A graphical hierarchy that starts with a Connection and ends with a Visualization.
Table: A flat tabular view of the nodes in a system.
Tree Table: A tabular view that maintains the hierarchical relationships through an expand/collapse mechanic.

Double Click Action #

Sets the default behavior for a double-click:

Edit: Double-clicking a node will open the configuration for that node.
Filter: Double-clicking a node will filter the view by that node.

Hold down ALT (or optionally CTRL on Windows) while double-clicking to use the alternate of whatever action is defined here.

Popup Menu Style #

Sets the style of the popup menu associated with nodes in the pipeline.

Radial: A circular menu containing a list of actions.
Vertical: A vertical menu containing a list of actions.

Mouse Wheel Action #

Sets the default behavior of the mouse wheel:

Pan: The mouse wheel will scroll vertically.
Zoom: The mouse wheel will zoom in and out.

Hold down ALT (or optionally CTRL on Windows) while using the mouse wheel to use the alternate of whatever action is defined here.

Link Style #

Determines the style of the links connecting each node.

Bezier: Connectors will use curved lines to help distinguish relationships.
Orthogonal: Connectors will use right angles with rounded corners to help distinguish relationships.

Layout Direction #

Determines the layout direction when in the Topology view mode.

Right: The relationships will flow from left to right.
Down: The relationships will flow from top to bottom.

Show Overview Window #

Determines whether a small overview window appears.

Yes: The overview will appear at the bottom right.
No: The overview will be hidden.

Pipeline Actions #

Base Pipeline actions are located in the footer (bottom left).

Add Connection #

This action will add a new Connection to the Data Pipeline. Connections act like “root nodes” in the data pipeline, and they typically appear at the far left of the display.  See the “Adding a Node” section below.

Edit Selected #

This action will launch the configuration for the selected node, allowing the administrator to make changes. You can also double-click a node in the Data Pipeline to edit it.

Clone

This will create a clone of a selected pipeline node. The user will be prompted to enter a suffix to ensure the new node is distinct. After cloning a node, it can be reparented to a different location in the pipeline.

Delete Selected #

This action will delete a selected node.  See the “Deleting a Node” section below.

Jobs

This will launch into the jobs view, which shows jobs for the selected pipeline.

Reparent

This provides an alternate way to reparent an item in the pipeline. This option is helpful when a target is filtered out of the current view.

Adding a Node #

To add a new node to the data pipeline, mouse over a node. If a node has available actions, then a gear icon will appear.

Clicking the gear will expose the available actions in a radial menu. The actions that are available will vary based on context. For example, you cannot reparent a Connection, and Connections do not have any associated jobs, so those options are disabled for a Connection node.

To add a node, click the add icon in the menu.

This will display a context menu of valid node types to choose from. For example, for a Filesystem Connection, the user will select from a list of relevant Feed types.

Deleting a Node #

When deleting a node that has children, the system will prompt the user with two options:

Only this node: This option will delete the selected node, but it will leave all child nodes in place. The remaining child nodes will be considered orphans.

Orphaned nodes will lose their association to a valid data source. This means that all Visualizations associated with these orphaned nodes will be broken. Orphaned nodes can be reparented to another data source; however, this may require modifying configuration to reference new or different attribute names.

See Reparenting Nodes for more information.

This node and its children: This option will delete the selected node and all of its children.
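
The two delete options can be modeled on a simple tree, as sketched below. This is a conceptual model only, not how edgeSuite stores its pipeline; the `Node` class and both functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    name: str
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def delete_only_this_node(node: Node) -> list[Node]:
    """Remove the node but keep its children; they are returned as orphans."""
    if node.parent is not None:
        node.parent.children.remove(node)
        node.parent = None
    orphans = list(node.children)
    for child in orphans:
        child.parent = None  # no longer linked to a data source
    node.children.clear()
    return orphans


def delete_node_and_children(node: Node) -> list[str]:
    """Remove the node and its whole subtree; returns the deleted names."""
    if node.parent is not None:
        node.parent.children.remove(node)
        node.parent = None
    deleted = [node.name]
    for child in list(node.children):
        child.parent = None  # detach first so the recursion skips the removal step
        deleted.extend(delete_node_and_children(child))
    node.children.clear()
    return deleted
```

In this model an orphan keeps its own children, so a Visualization below an orphaned Filter stays attached to the Filter but no longer has a path back to a Connection.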

Node Status #

In the example below, two CSV Feeds are shown – “assets” and “incidents”. These Feeds result in two distinct data sets: one with a list of assets, and one with a list of incidents. In order to show the health of the assets in a List Visualization, you first have to join those two data sets together, as shown below.

Now assume that you delete one of these feeds. The missing data set would create a problem for the JOIN node, and ultimately the List Visualization. In the example below, the “incidents” feed has been removed, and the Data Pipeline is indicating a problem with the “JoinByAssetID” SQL Query.

If you edit that JOIN transform, and take a look at the query, you will see that there is another layer of status indication in the queries themselves:

Since you deleted the “incidents” feed, the references to that missing data set are highlighted with red boxes.
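
The same failure can be reproduced outside edgeSuite. The sketch below uses Python's built-in `sqlite3` module with hypothetical column names: a JOIN that works while both data sets exist fails once the “incidents” data set is gone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (asset_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE incidents (asset_id INTEGER, severity TEXT)")
conn.execute("INSERT INTO assets VALUES (1, 'router-a')")
conn.execute("INSERT INTO incidents VALUES (1, 'high')")

# A query in the spirit of the "JoinByAssetID" transform (columns are assumed).
JOIN_BY_ASSET_ID = (
    "SELECT a.name, i.severity "
    "FROM assets AS a JOIN incidents AS i ON a.asset_id = i.asset_id"
)

# While both data sets exist, the join succeeds.
joined = conn.execute(JOIN_BY_ASSET_ID).fetchall()

# Simulate deleting the "incidents" feed: its data set no longer exists.
conn.execute("DROP TABLE incidents")

try:
    conn.execute(JOIN_BY_ASSET_ID)
    status = "ok"
except sqlite3.OperationalError as exc:
    # The query now references a data set that is missing.
    status = f"error: {exc}"
```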

Reparenting Nodes #

It is possible to reparent nodes in the Data Pipeline. The Data Pipeline uses a drag and drop gesture for this operation.

  1. Select the link between two nodes.
    This causes the link to highlight, and a magenta diamond to appear.
  2. Drag the diamond to a new parent.
  3. Drop the diamond on the new parent.

Reparenting between different data sets can cause problems.

Transform, Filter, and Visualization configurations often reference explicit attribute names. After reparenting, an attribute that a configuration references may have a different name in the new data set, so the configuration may need to be updated.
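
One way to reason about this risk before reparenting is to compare the attribute names a node references against the columns of the prospective parent. The helper below is a hypothetical sketch of that check, not an edgeSuite feature; all attribute names are invented.

```python
def missing_attributes(referenced: set[str], new_parent_columns: set[str]) -> set[str]:
    """Return the attribute names a node references that the new parent lacks."""
    return referenced - new_parent_columns


# Hypothetical example: a Filter that references two attributes.
filter_references = {"asset_id", "severity"}
old_parent = {"asset_id", "severity", "opened_at"}
new_parent = {"asset_id", "priority", "opened_at"}

# Any name in this set would need to be remapped after reparenting.
broken = missing_attributes(filter_references, new_parent)
```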

Job Status #

In order to better diagnose pipeline problems, metrics from running jobs in the pipeline are collected and reported in the Job Status user interface.

Metrics:

  • Start time and end time for each job’s task
  • Time spent in preparation to run the update, time performing the update, and time running cleanup
  • Name of the producer for the job
  • Key for the job’s task (includes all nodeVars and secVars that apply to the job)
  • Trigger, if known (like client subscription)
  • Result of the update (success, server error, etc.)
  • Size of the results (number of rows)

Reporting:

  • All publish Jobs
  • All publish Jobs for a session
  • All active tasks
  • Last completed task for all active jobs
  • All tasks in last X seconds
  • Last task for all jobs required by a session publish Jobs
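
To show how the metrics and reports above relate, here is a small sketch that models one task record and three of the listed reports. The `TaskMetric` fields and the function names are hypothetical, and "last X seconds" is assumed to be measured from each task's start time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskMetric:
    job: str                  # name of the job the task belongs to
    producer: str             # name of the producer for the job
    key: str                  # task key (nodeVars and secVars that apply)
    start: float              # start time, in seconds
    end: Optional[float]      # end time; None while the task is still active
    result: Optional[str]     # e.g. "success" or "server error"
    row_count: Optional[int]  # size of the results


def active_tasks(tasks):
    """All active tasks: those that have not finished yet."""
    return [t for t in tasks if t.end is None]


def tasks_in_last(tasks, seconds, now):
    """All tasks that started within the last `seconds` seconds (assumed meaning)."""
    return [t for t in tasks if now - t.start <= seconds]


def last_completed_per_job(tasks):
    """Last completed task for each job, keyed by job name."""
    latest = {}
    for t in tasks:
        if t.end is None:
            continue  # skip tasks that are still running
        if t.job not in latest or t.end > latest[t.job].end:
            latest[t.job] = t
    return latest
```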

The administrator can filter which types of job Results (rows) are displayed in the grid by clicking individual labels in the bottom section to toggle each type of job on or off. Initially, all types of jobs are shown in the display.

The administrator can filter which information is displayed about each job (columns) by clicking the “Toggle Column Visibility” button in the upper-right, and then checking the boxes for the desired columns to display.