The Pipeline is where you manage data in edgeSuite. Raw data from a source rarely arrives in edgeSuite ready to be visualized; it usually has to be transformed or filtered to produce an effective visualization. edgeSuite’s Data Pipeline provides a visual way to do this.
To open the Pipeline user interface, click the Pipeline icon located near the top of the Admin Toolbar, the vertical toolbar on the left side of an administrator’s screen.
The Data Pipeline is composed of differently colored objects called “nodes”. Each node’s color and icon indicate the type of operation performed at that step. The following table details what the colors and icons represent:
Connection: This node represents a connection to a data source. It contains all the configuration information required to connect to that source, and serves as a grouping mechanism for data sets in the system. The Connection type is reflected in the icon.
See Web Adapters and Connections and Feeds for more information.
Feed: This node represents an initial data set from the associated data source. The options available in a Feed’s configuration vary based on the type of data source you are working with. For example:
Transform: This node represents a structural change to the data, meaning the attributes (columns) in the resulting data set may differ from those of the parent node. Transforms represent operations such as GROUPBY, JOIN, or UNION. They can also convert flat tabular data into relational models.
Filter: This node represents a data set where the records (rows) have been limited in some way. Filters leave the structure of the data intact, meaning the attribute names (columns) remain the same from parent to child.
See Filter for more information.
Visualization: This node represents the configuration required for a specific visualization. A data set can have multiple visualizations defined, and the type of visualization is reflected in the icon.
See Visualizations for more information.
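The distinction between a Filter (rows change, columns stay the same) and a Transform (columns may change) can be illustrated outside edgeSuite. The following is a minimal Python sketch, not edgeSuite code: the records, the attribute names (asset_id, site, status), and both helper functions are invented for illustration.

```python
from collections import Counter

# Hypothetical records standing in for the data set produced by a Feed.
records = [
    {"asset_id": "a1", "site": "east", "status": "up"},
    {"asset_id": "a2", "site": "east", "status": "down"},
    {"asset_id": "a3", "site": "west", "status": "up"},
]

def filter_rows(rows, predicate):
    """Filter: keep a subset of rows; each row keeps its original attributes."""
    return [row for row in rows if predicate(row)]

def group_count(rows, key):
    """Transform (GROUPBY-style): the result has a different set of attributes."""
    counts = Counter(row[key] for row in rows)
    return [{key: value, "count": n} for value, n in counts.items()]

up_only = filter_rows(records, lambda row: row["status"] == "up")
per_site = group_count(records, "site")

print(sorted(up_only[0]))   # same attribute names as the parent data set
print(sorted(per_site[0]))  # attribute names differ from the parent data set
```

Because the filtered result keeps the parent's attribute names, anything downstream that refers to those names keeps working. After the grouping step, downstream configuration has to refer to the new attributes instead.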
The Pipeline can be viewed in several different ways.
Pipeline mode is a primary filter for the pipeline. By default, the Pipeline shows “Data”, but there are other modes available.
A secondary filter control is located in the header of the pipeline (top right). This allows the administrator to limit their search to names of specific types of nodes, such as only Connections, or only Visualizations.
The “View All” option shows all data in the system, regardless of type. For large deployments, a warning message may appear indicating that it may take some time to generate a layout for all nodes in the system.
The left pull-down allows an administrator to constrain their search based on different types of objects in the pipeline. The “Group By Type” option divides search results by categories.
Pipeline settings can be accessed using the icon at the top right of the header.
Sets the default view mode for the pipeline:
Sets the default behavior for a Double Click:
Hold down ALT (or optionally CTRL on Windows) while double-clicking to use the alternate of whatever action is defined here.
Sets the style of the popup menu associated with nodes in the pipeline.
Sets the default behavior for the mouse wheel.
Hold down ALT (or optionally CTRL on Windows) while using the mouse wheel to use the alternate of whatever action is defined here.
Determines the style of the links connecting each node.
Determines the layout direction when in topology view mode.
Determines whether a small overview appears.
Base Pipeline actions are located in the footer (bottom left).
This action will add a new Connection to the Data Pipeline. Connections act like “root nodes” in the data pipeline, and they typically appear at the far left of the display. See the “Adding a Node” section below.
This action launches the configuration for the selected node, allowing the administrator to make changes. You can also double-click a node in the Data Pipeline to edit it.
This will create a clone of a selected pipeline node. The user will be prompted to enter a suffix to ensure the new node is distinct. After cloning a node, it can be reparented to a different location in the pipeline.
This action will delete a selected node. See the “Deleting a Node” section below.
This will launch the jobs view, which shows jobs for the selected pipeline.
An alternative way to reparent an item in the pipeline. This option is helpful when a target is filtered out of the current view.
To add a new node to the data pipeline, mouse over a node. If a node has available actions, then a gear icon will appear.
Clicking the gear will expose the available actions in a radial menu. The actions that are available will vary based on context. For example, you cannot reparent a Connection, and Connections do not have any associated jobs, so those options are disabled for a Connection node.
To add a node, click the icon.
This will display a context menu of valid node types to choose from. In the case shown above, it is a Filesystem Connection, so the user will select from a list of relevant feed types:
When deleting a node that has children, the system will prompt the user with two options:
Orphaned nodes lose their association to a valid data source. This means that all Visualizations associated with these orphaned nodes will be broken. Orphaned nodes can be reparented to another data source; however, this may require modifying their configuration to reference new or different attribute names.
See Reparenting Nodes for more information.
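To make the effect of orphaning concrete, the sketch below models a pipeline as a simple parent map and reports which visualizations no longer trace back to a Connection after a node is removed. It is an illustrative model only: the node names, the dictionaries, and both functions are hypothetical and do not reflect how edgeSuite stores its pipeline.

```python
# Hypothetical pipeline: each node maps to its parent (None for a Connection).
parents = {
    "filesystem": None,
    "assets": "filesystem",
    "critical_assets": "assets",
    "asset_list": "critical_assets",
}
node_types = {
    "filesystem": "connection",
    "assets": "feed",
    "critical_assets": "filter",
    "asset_list": "visualization",
}

def has_data_source(node, parents, node_types):
    """Walk up the parent links; True only if the chain reaches a Connection."""
    seen = set()
    while node is not None and node not in seen:
        seen.add(node)
        if node not in parents:  # this ancestor was deleted from the pipeline
            return False
        if node_types[node] == "connection":
            return True
        node = parents[node]
    return False

def broken_visualizations(parents, node_types):
    """List visualizations that no longer trace back to a data source."""
    return sorted(
        name
        for name in parents
        if node_types[name] == "visualization"
        and not has_data_source(name, parents, node_types)
    )

# Delete the "assets" feed but keep its children, which become orphans.
del parents["assets"]
print(broken_visualizations(parents, node_types))
```

In this model, reparenting "critical_assets" to another feed would restore the chain, provided the new parent offers the attributes that the filter and the visualization reference.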
In the example below, two CSV Feeds are shown: “assets” and “incidents”. These Feeds result in two distinct data sets: one with a list of assets, and one with a list of incidents. In order to show the health of the assets in a List Visualization, you first have to join those two data sets together, as shown below.
Now assume that you delete one of these feeds. The missing data set would create a problem for the JOIN node, and ultimately the List Visualization. In the example below, the “incidents” feed has been removed, and the Data Pipeline is indicating a problem with the “JoinByAssetID” SQL Query.
If you edit that JOIN transform and take a look at the query, you will see that there is another layer of status indication in the queries themselves:
Since you deleted the “incidents” feed, the references to that missing data set are highlighted with red boxes.
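The same failure mode can be reproduced with any SQL engine. The sketch below uses Python's built-in sqlite3 module as an analogy. It is not edgeSuite's query engine, and the column names (asset_id, name, severity) and the query text are assumptions made for illustration, not the actual “JoinByAssetID” query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assets (asset_id TEXT, name TEXT);
    CREATE TABLE incidents (asset_id TEXT, severity TEXT);
    INSERT INTO assets VALUES ('a1', 'router'), ('a2', 'switch');
    INSERT INTO incidents VALUES ('a1', 'critical');
""")

# Hypothetical stand-in for a join of the two data sets by asset ID.
JOIN_BY_ASSET_ID = """
    SELECT a.asset_id, a.name, i.severity
    FROM assets AS a
    LEFT JOIN incidents AS i ON i.asset_id = a.asset_id
    ORDER BY a.asset_id
"""

# With both inputs present, the join succeeds.
rows = conn.execute(JOIN_BY_ASSET_ID).fetchall()
print(rows)

# Dropping one input is analogous to deleting the "incidents" feed.
conn.execute("DROP TABLE incidents")
try:
    conn.execute(JOIN_BY_ASSET_ID)
    join_error = None
except sqlite3.OperationalError as exc:
    join_error = str(exc)
print(join_error)
```

The query text is unchanged between the two runs. Only the availability of one input differs, which is why the problem surfaces at the JOIN node and at everything downstream of it.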
It is possible to reparent nodes in the Data Pipeline. The Data Pipeline uses a drag and drop gesture for this operation.
Reparenting between different data sets can cause problems.
Transform, Filter, and Visualization configurations often reference explicit attribute names. When a node is reparented, attributes in the old data set may have different names in the new data set, so those references may need to be updated.
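One way to reason about this before reparenting is to compare the attribute names a node's configuration references with the attributes the new parent provides. The helper below is a hypothetical sketch of that check. edgeSuite is not known to expose such a function, and the attribute names are invented.

```python
def missing_attributes(referenced, new_parent_attributes):
    """Return referenced attribute names that the new parent does not provide."""
    available = set(new_parent_attributes)
    return [name for name in referenced if name not in available]

# Hypothetical example: a filter configured against its old parent...
filter_references = ["asset_id", "status"]
# ...and the attributes offered by the candidate new parent.
new_parent = ["asset_id", "state", "site"]

print(missing_attributes(filter_references, new_parent))
```

Any name the check returns is a reference that would need to be updated in the node's configuration after the move.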
In order to better diagnose pipeline problems, metrics from running jobs in the pipeline are collected and reported in the Job Status user interface.
The administrator can filter which types of job Results (rows) are displayed in the grid by clicking individual labels in the bottom section to toggle each type of job on or off. Initially, all types of jobs are shown in the display.
The administrator can filter which information is displayed about each job (columns) by clicking the “Toggle Column Visibility” button in the upper-right, and then checking the boxes for the desired columns to display.