Handling Paginated Web Data Sources

Use case

The data source limits the number of rows that can be retrieved in a single request, but all of the rows are needed in edgeCore. This pattern works if there is a way to retrieve one page at a time (page 1 of x) and a way to determine the number of pages needed.
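As a rough illustration of "determining the number of pages needed", the count can be divided by the page size and rounded up. This is a minimal sketch in plain JavaScript, not part of the edgeCore script; pageCount is a hypothetical helper name, and 794 is the row total implied by the console output later in this article.

```javascript
// Hypothetical helper: number of page requests needed to cover
// `total` rows when each request returns at most `pageSize` rows.
function pageCount(total, pageSize) {
    if (pageSize <= 0) {
        throw new Error("pageSize must be positive");
    }
    return Math.ceil(total / pageSize);
}

// 794 rows at 200 rows per request: three full pages plus one partial page.
var pages = pageCount(794, 200); // 4
```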

CCI Example

Web data feed with URL: /santaba/rest/device/devices?size=<numberOfRows>&offset=<startingRow>
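To make the two query parameters concrete, the sketch below fills in the URL template for one page. It is only an illustration in plain JavaScript; buildPageUrl and BASE_PATH are hypothetical names, and in edgeCore the substitution is done through nodeVars rather than by code like this.

```javascript
// Path taken from the example URL above.
var BASE_PATH = "/santaba/rest/device/devices";

// Hypothetical helper: fill in the size and offset query parameters.
function buildPageUrl(size, offset) {
    return BASE_PATH +
        "?size=" + encodeURIComponent(size) +
        "&offset=" + encodeURIComponent(offset);
}

// Third page when requesting 200 rows at a time (rows 400 to 599).
var thirdPage = buildPageUrl(200, 400);
// "/santaba/rest/device/devices?size=200&offset=400"
```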

Setup

  1. Create a data producer (most likely a feed) to retrieve a single page
    1. URL: /santaba/rest/device/devices?size={nodeVar.PageSize}&offset={nodeVar.Offset}
    2. PageSize defaults to 1 and Offset defaults to 0 (a convenience that retrieves a sample row as efficiently as possible)
    3. JsonPath: $.data.items.*
  2. Create a data producer to retrieve the count
    1. URL: /santaba/rest/device/devices?size=1
    2. JsonPath: $.data.total
  3. Create a JS Transform off the page source
    1. The page source needs to be source index 0 for the script to work and for the attribute defs to be in the desired order (since the sample row drives the attribute defs)
    2. After the transform is created, add the count source so that it is index 1
    3. The script is below. It defines some constants so that it can be reused more easily for other data
      1. var countAttribute = "total";
        countAttribute is the name of the attribute from the data producer that returns the count
      2. var offsetVarName = "Offset";
        offsetVarName is the name of the nodeVar from the page retrieval producer that is the starting row for data
      3. var pageSizeVarName = "PageSize";
        pageSizeVarName is the name of the nodeVar for specifying the maximum number of records retrieved in a request
      4. var pageSize = 200;
        pageSize is the maximum record count for a request
      5. var pageFeedName = "LM Devices Page";
        pageFeedName is the producer name for retrieving a single page
    4. Once on the preview step, the nodeVars need default values and the transform’s offset and pageSize nodeVar(s) need to be deleted. The defaults are just the values passed to the page retrieval source so that it returns exactly 1 record (for fetching attributes using the default discovery behavior)
      1. Offset: 0
      2. PageSize: 1
    5. Testing with the print statements in the script enabled (not commented out) will show something like the following in the console:
      Console
      Fetching with vars: [Offset=0, PageSize=200]
        retrieved rows: 200
      Fetching with vars: [Offset=200, PageSize=200]
        retrieved rows: 200
      Fetching with vars: [Offset=400, PageSize=200]
        retrieved rows: 200
      Fetching with vars: [Offset=600, PageSize=200]
        retrieved rows: 194
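The console output above adds up to 794 rows (200 + 200 + 200 + 194). The following minimal sketch reproduces that sequence of offsets and row counts without calling edgeCore at all, which can help when checking what a different page size would request. planRequests is a hypothetical helper name; it mirrors only the loop bounds of the script, not the fetching.

```javascript
// Hypothetical helper: list the offset of each request and how many
// rows that request should return, for `total` rows and a given page size.
function planRequests(total, pageSize) {
    if (pageSize <= 0) {
        throw new Error("pageSize must be positive");
    }
    var requests = [];
    // Same loop bounds as the de-pagination script: start at 0 and
    // advance by pageSize until the offset reaches the total.
    for (var offset = 0; offset < total; offset += pageSize) {
        requests.push({
            offset: offset,
            expectedRows: Math.min(pageSize, total - offset)
        });
    }
    return requests;
}

var plan = planRequests(794, 200);
// offsets 0, 200, 400, 600 with 200, 200, 200, 194 expected rows
```

The last request is the only one that can come back short, so a "retrieved rows" value below the page size on any earlier request suggests that the count or the page size limit is not what the script assumes.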
Javascript De-pagination
var ParameterValueDO = Java.type("edge.server.pipeline.parameter.ParameterValueDO");
var ArrayList = Java.type("java.util.ArrayList");
var countAttribute = "total";
var offsetVarName = "Offset";
var pageSizeVarName = "PageSize";
var pageSize = 200;
var pageFeedName = "LM Devices Page";
function main(sourceBundles, outputBundle,
              jsNodeVars, jsSecVars) {
  
    var javaNodeVars = jsNodeVars._asJavaCollection();
    var javaSecVars = jsSecVars._asJavaCollection();
  
    // The "Count" feed needs to be the second source added
    var countRecord = getData(sourceBundles.get(1)).get(0);
    var count = getValue(countRecord, countAttribute);
    var outputData = getData(outputBundle);
    // Because the data is copied directly from the source results into
    // the results table, the column mapping must be copied over from the
    // first (index 0) sourceBundle for the "getPage" feed
    var sourceRs = sourceBundles.get(0).getResultSet();
    var outputRs = outputBundle.getResultSet();
    outputRs.copyColumnMappingFrom( sourceRs );
    for (var offset = 0; offset < count; offset += pageSize) {
        var pageNodeVars = getNodeVars(javaNodeVars, offsetVarName, offset, pageSizeVarName, pageSize);
        var pageResults = dataProducerService.fetch(pageFeedName, pageNodeVars, javaSecVars);
        print("Fetching with vars: " + pageNodeVars);
        outputData.addAll(pageResults.getData());
        print("  retrieved rows: " + pageResults.getData().size());
    }
    return outputBundle;
}
// This assumes the transform has nodeVars that need to be passed on to
// its sources. It should have passthrough mappings for all nodeVars
// needed upstream except for the offset and page size
function getNodeVars(incomingNodeVars, offsetName, offsetValue,
                     pageSizeName, pageSizeValue) {
    var nodeVars = new ArrayList(3);
    nodeVars.add(new ParameterValueDO(offsetName, offsetValue));
    nodeVars.add(new ParameterValueDO(pageSizeName, pageSizeValue));
    // the following will show a warning in the editor, but "for each" is
    // the way to iterate over a java collection inside javascript
    for each (var nodeVar in incomingNodeVars) {
        if (nodeVar.getName() !== offsetName &&
            nodeVar.getName() !== pageSizeName) {
            nodeVars.add(nodeVar);
        }
    }
    return nodeVars;
}
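
The override behavior of getNodeVars can be tried outside edgeCore with plain objects. The sketch below is a stand-in under stated assumptions: it uses plain {name, value} objects instead of ParameterValueDO, a regular array instead of a Java collection, and a made-up Region nodeVar to represent a passthrough variable. mergeNodeVars is a hypothetical name.

```javascript
// Hypothetical stand-in for getNodeVars using plain objects:
// the offset and page size are set first, then every other
// incoming nodeVar is passed through unchanged.
function mergeNodeVars(incoming, offsetName, offsetValue,
                       pageSizeName, pageSizeValue) {
    var merged = [
        { name: offsetName, value: offsetValue },
        { name: pageSizeName, value: pageSizeValue }
    ];
    incoming.forEach(function (nodeVar) {
        // Skip incoming offset/page size so the loop's values win.
        if (nodeVar.name !== offsetName && nodeVar.name !== pageSizeName) {
            merged.push(nodeVar);
        }
    });
    return merged;
}

// Incoming vars carry the preview defaults plus one passthrough var.
var incoming = [
    { name: "Offset", value: 0 },
    { name: "PageSize", value: 1 },
    { name: "Region", value: "east" } // hypothetical passthrough nodeVar
];
var merged = mergeNodeVars(incoming, "Offset", 400, "PageSize", 200);
// merged holds Offset=400, PageSize=200, Region=east
```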

Sample screen shots

Logic Monitor Devices JSON

LM Devices Page

LM Devices Count

LM Devices All