Extensibility of the new Mongo SQL engine

Available from release 2.2.1-hyperblast

With the new SQL engine, it is possible to modify or include new functions in it, so that its functionality can be extended.

All this can be done with a platform Administrator user, from the Configurations menu:

 

The new SQL engine must be enabled, with the Use Quasar option set to false:

Bear in mind that, depending on the value of use-legacysql, we will have activated the _id with string format (legacysql is true) or mongodb with the $oid (legacysql is false).

There are two configurations to take into account (SQLENGINE type). Any change in them will require the restart of the platform module that uses the SQLENGINE (controlpanel, dashboardengine, router, ...).

Json Dictionary

This is a json in which each entry represents the name and definition of the functionality of that function. Each entry is structured as follows:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 { "functionName": { "expression": "$mongofn", "steps": ["project",...], "argsLength": 1, "type": "inline", "argsType": "direct", "comment": "my comment about the function" "switchbyargument":{ "argument": 0, "switch": { "month": "$month", "year": "$year", "dayOfMonth": "$dayOfMonth", "hour": "$hour", "minute": "$minute", "second": "$second", "millisecond": "$millisecond", "dayOfYear": "$dayOfYear", "dayOfWeek": "$dayOfWeek", "week": "$week" } }, "preExpressionSteps": [{ "$unwind": { "path": "${0}", "preserveNullAndEmptyArrays": "${1}" } }], "unique": true, "codeFn": "tsExp" } }

There are 4 types of ways to generate functions depending on the typology of the function. Let's see first the common parameters:

  • functionName → Name of the function to define. We must take into account that only functions with letters, numbers and _ that do not begin with _ are allowed. When using the function, it will be in case-insensitive mode. It is mandatory to always define the name of the function, as it will be the determinant of the use, so all functions must have different names.

  • steps → array of places within the query where it is possible to use that function (PROJECT, WHERE, HAVING, GROUPBY, ORDERBY). For example, functions that increment the record number (e.g. a flattening of an array) make sense that they can only be used within the project stage. By default, if nothing is included, it will be allowed everywhere in the query. It will be used to validate the places. If you use this function in an unallowed place, you will get the corresponding error.

 

  • unique → true/false. False by default. If set to true, you can only have that function in the given query position. This is useful for example for projection functions that modify records and are not compatible with other functions. If included with other functions, a corresponding error will be given.

 

  • argsLength → Number of arguments the function has. Defaults to 1. If the function has a variable number of parameters, the value -1 can be used. If the function is used with another number of arguments, an error will be obtained.

  • comment → open description of the function, for information on how it will be used - parameters, results, ... Defaults to empty string.

  • type → type of the function that will determine the construction of the function. Inline, switchbyargument, customexp and customcode values are supported. Default will be inline.

Function construction types:

inline: direct switch from SQL function to mongoDB function. The following parameters are used:

  • expression → mongodb function to use which will start with $, for example "$concat".

  • argsType → type of function arguments. There are two types: direct and array.

Direct (default option) will be for one parameter functions that are included directly, e.g. $toInt:

1 2 3 "toInt": { "expression": "$toint" }

array for functions with parameters that will be taken to an array, e.g. concat function:

1 2 3 4 5 "concat": { "expression": "$concat", "argsType": "array", "argsLength": -1 }

switchbyargument: functions that, given a certain input parameter, call one or another mongo function. It is necessary to define the input with the same name; this input will have two parameters:

  • argument → parameter position that will determine the function to apply.

  • switch → key-value that will determine, given a parameter input with "argument" position, which mongo function will be called.

E.g. date_part('mask',timestamp)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 "date_part": { "type": "switchbyargument", "switchbyargument":{ "argument": 0, "switch": { "month": "$month", "year": "$year", "dayOfMonth": "$dayOfMonth", "hour": "$hour", "minute": "$minute", "second": "$second", "millisecond": "$millisecond", "dayOfYear": "$dayOfYear", "dayOfWeek": "$dayOfWeek", "week": "$week" } }, "argsType": "direct", "argsLength": 2, "comment": "date_part(\"year\",c.dateLastUpdate)" }

customexp: more complex functions than the previous ones, they require custom expressions that do not meet the previous parameters. The affected parameters would be:

  • expression → in this case it will be a json with the expression. Inside this json, it will be possible to have the parameters with "${i}" being i the position of the parameter. For example:

1 2 3 4 5 6 7 8 "to_timestamp": { "type": "customexp", "expression": { "$dateFromString": { "dateString":"${0}" } } }

The substitution, if it is a field value, will be preceded with "$" ($field). If it is another value it is substituted as is. It may happen that we want to make a small transformation of the parameter when substituting, so that we have:

“${i}|nodollar” → insert the field type parameter but do not include the previous $.

“${i}|jsonParse“ → parses the parameter when using it, so that, even if it is string, the input will be used with the specified structure.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 "geoWithin": { "steps": ["where"], "type": "customexp", "expression": { "${0}|nodollar": { "$geoWithin": { "$geometry": "${1}|jsonParse" } } }, "argsType": "direct", "argsLength": 2, "comment": "geowithin(field, geometry)" }
  • preExpressionSteps → for special type project stages, it may be necessary to do some step before the aggregate stage mode. In this case, as in "expression", you have an expression where the values of the parameters will be substituted. An example is the unzip of an array that does not have part of expression:

1 2 3 4 5 6 7 8 9 10 11 12 "unzip": { "steps": ["project"], "type": "customexp", "preExpressionSteps": [{ "$unwind": { "path": "${0}", "preserveNullAndEmptyArrays": "${1}" } }], "argsLength": 2, "unique": true }

customcode: in certain scenarios, if no previous case covers the way to generate the query, it can be generated with Java code directly. There are two parameters to use:

  • preCodeFn → in project stages that need very custom pre-processing, it is necessary to include here the name of the Java function defined in the other Java util class configuration. The function has to return a List<Document> with the necessary transformations.

  • codeFn → in project stages that need very custom treatment, it is necessary to include here the name of the Java function defined in the other configuration of Java util class. The function has to return a Document with the part of the function itself, or null if it is not necessary.

An example is the unzipts function:

1 2 3 4 5 6 7 8 "unzipts": { "steps": ["project"], "type": "customcode", "preCodeFn": "tsPreSteps", "codeFn": "tsExp", "argsLength": 2, "unique": true }

Inside the Java util class configuration, you have the two definitions that will always have the same parameterization.

final List<Expression> le, final MongoDBQueryHolder mongoDBQueryHolder

The first parameter is the list of expressions parsed with JSQLParser. The second is a complete structure of the output query that can be modified if necessary.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 import com.github.vincentrussell.query.mongodb.sql.converter.holder.MongoDBQueryHolder; import net.sf.jsqlparser.expression.Expression; import net.sf.jsqlparser.expression.StringValue; import org.bson.Document; import java.util.LinkedList; import java.util.List; public final class UtilsCode { public enum WINDOWTS { SECONDS, MINUTES, HOURS, DAYS, MONTHS } private static final int[] TIMETOSEG = new int[] {1, 60, 3600, 86400}; private static final int[] TIMETONEXT = new int[] {1, 60, 60, 24}; /** * previous steps for timeseries unzip by n levels. * This use the period and frecuency for calculating this variable steps * @param le params: 0 upper(period), 1 lower(frecuency) * @param mongoDBQueryHolder * @return list of documents for previous steps in unzip ts */ public static List<Document> tsPreSteps(final List<Expression> le, final MongoDBQueryHolder mongoDBQueryHolder) { WINDOWTS windowh = WINDOWTS.valueOf(((StringValue) le.get(0)).getValue().toUpperCase()); WINDOWTS windowl = WINDOWTS.valueOf(((StringValue) le.get(1)).getValue().toUpperCase()); List<Document> ldocs = new LinkedList<>(); ldocs.add( Document.parse("{\n" + " \"$project\": {\n" + " \"TimeSerie\": 1,\n" + " \"tmpArray\": { \"$objectToArray\": \"$TimeSerie.values.v\" }\n" + " \"tmpIndex\": { \"$literal\": 0 }\n" + " },\n" + " }") ); ldocs.add( Document.parse("{\n" + " \"$unwind\": \"$tmpArray\"\n" + " }") ); for (int i = windowh.ordinal() - 1; i > windowl.ordinal(); i--) { ldocs.add( Document.parse("{\n" + " \"$project\": {\n" + " \"TimeSerie\": 1,\n" + " \"tmpIndex\": { " + " \"$multiply\": [" + "{\"$sum\":[\"$tmpIndex\",{\"$toInt\":\"$tmpArray.k\"}]}," + TIMETONEXT[i] + " ]},\n" + " \"tmpArray\": { \"$objectToArray\": \"$tmpArray.v\" }\n" + " }\n" + " }") ); ldocs.add( Document.parse("{\n" + " \"$unwind\": \"$tmpArray\"\n" + " }") ); } ldocs.add( Document.parse("{\n" + " \"$addFields\": {\n" + " \"TimeSerie.value\": \"$tmpArray.v\",\n" + " \"TimeSerie.timestamp\": {\n" + " \"$dateFromParts\": {\n" + " \"year\": { \"$year\": \"$TimeSerie.timestamp\" },\n" + " \"month\": { \"$month\": \"$TimeSerie.timestamp\" },\n" + " \"day\": { \"$dayOfMonth\": \"$TimeSerie.timestamp\" },\n" + " \"hour\": { \"$hour\": \"$TimeSerie.timestamp\" },\n" + " \"minute\": { \"$minute\": \"$TimeSerie.timestamp\" },\n" + " \"second\": { \"$multiply\": [" + " {\"$sum\":[\"$tmpIndex\",{\"$toInt\":\"$tmpArray.k\"}]}," + TIMETOSEG[windowl.ordinal()] + "]}\n" + " }\n" + " }\n" + " }\n" + " }") ); return ldocs; } /** * include ts fields in project, combine then with previous. * @param le * @param mongoDBQueryHolder * @return null */ public static Document tsExp(final List<Expression> le, final MongoDBQueryHolder mongoDBQueryHolder) { mongoDBQueryHolder.getProjection().put("tmpArray", 0); mongoDBQueryHolder.getProjection().put("tmpIndex", 0); mongoDBQueryHolder.getProjection().put("TimeSerie.values", 0); return null; }