New SQL Engine on MongoDB

Available in version 2.2.0-hyperblast

EN | ES

1 Introduction
2 Migration from the old SQL Engine (Quasar)
- 2.1 ObjectID and _id considerations
3 Extensibility (from 2.2.1-hyperblast)
4 Sintaxis
- 4.1 Data Types
- 4.2 Clauses
- 4.3 Use Examples
  - 4.3.1 Basic Queries:
  - 4.3.2 Where:
  - 4.3.3 Group/Having:
  - 4.3.4 Limit/offset
  - 4.3.5 Case when else (from 2.2.1-hyperblast)
  - 4.3.6 Mathematical operators (from 2.2.1-hyperblast)
  - 4.3.7 Function NOW()
  - 4.3.8 Joins/Subqueries
  - 4.3.9 Union all (from 2.2.1-hyperblast)
  - 4.3.10 Functions (from 2.2.1-hyperblast)
  - 4.3.11 Count Distinct clause (From 4.2.1-predator)

Introduction

After several months of testing, the new version of the SQL engine for MongoDB has been released.

This new SQL engine is based on the opensource project https://github.com/vincentrussell/sql-to-mongo-db-query-converter to which improvements have been added and covers the part of reading queries (selects).

This project is built purely in Java code, so it is easily extensible and you can include new features or bug fixes, contributing to the community in the same github repository.

Unlike the previous engine (Quasar) that worked with mongo version 3.4, this engine works with mongo versions from 4.0 onwards. By default, in platform, it is mounted on the most current version so far, 4.4.

Another big difference is the efficiency when translating queries, making the transformation cost minimal. In Quasar, in certain cases, there were queries that caused a high translation cost.

Migration from the old SQL Engine (Quasar)

There are several changes with respect to the old Quasar engine, so that, if a migration is considered, it must be taken into account that the new engine works by maintaining a stricter SQL and avoiding the use of the map-reduce framework, avoiding collection locks and the generation of intermediate tables, prioritizing performance and reducing possible errors:

String data are with ' always
Joins are limited to the use of inner join and left join only. The syntax of multiple comma origins is not valid: select * from ontology1, ontology2,….
The timestamp from string changes from timestamp '' to to_timestamp('') function.
The $oid from string changes from OID '' to OID('') or toObjectId('') function.
Array projections or .* .0 [2] [] {} type objects are not supported, implementation with functions is possible.
You cannot concatenate with ||, you must use the concat() function.
You cannot use the alias or the name of the ontology to make the projections nor with *. These queries would be incorrect (instead you should use select * from Ontology):

SELECT c FROM Ontology AS c LIMIT 3

SELECT c.* FROM Ontology AS c LIMIT 3

SELECT Ontology FROM Ontology AS c LIMIT 3

SELECT Ontology.* FROM Ontology AS c LIMIT 3

ObjectID and _id considerations

When returning data of type objectId from mongoDB, Quasar avoided returning the _id

If explicitly instructed, Quasar returns it in string format:

In the new engine, for backward compatibility (with a certain cost due to response parsing), there is the parameter (if not indicated, it is set to true by default, as in the case of updates):

mongodb-use-legacysql

It can be disabled within the platform settings:

If disabled (default on new installations) it is returned in MongoDB format:

If a * is used in the new engine, the _id is always returned.

Extensibility (from 2.2.1-hyperblast)

There is the possibility of extending the number of functions available in this engine, so that new functionalities can be included from the controlpanel, as shown here

https://onesaitplatform.atlassian.net/wiki/pages/createpage.action?spaceKey=DOCT&title=Extensibilidad%20del%20nuevo%20motor%20SQL%20de%20mongo&linkCreation=true&fromPageId=2220830733

Sintaxis

Data Types

Type	Description	Examples

Type	Description	Examples
Null	Indicates missing information.	`null`
Boolean	true or false	`true`, `false`
Integer	Whole numbers (no fractional component)	`1`, `-2`
Decimal	Decimal numbers (optional fractional components)	`1.0`, `-2.19743`
String	Text	`'221B Baker Street'`
DateTime	Date and time, in ISO8601 format	`to_timestamp('2004-10-19T10:23:54Z')`
Object ID	Unique object identifier.	`OID('507f1f77bcf86cd799439011')`

Clauses

The following clauses are supported:

Type	Clauses

Type	Clauses
Basic	`SELECT`, `AS`, `FROM`
Joins	`LEFT JOIN`, `INNER JOIN`, `JOIN`
Filtering	`WHERE`
Grouping	`GROUP BY`, `HAVING`
Subquery	`FROM (SELECT ...) AS`
Paging	`LIMIT`, `OFFSET`
Sorting	`ORDER BY` , `DESC`, `ASC`
Conditional	`CASE WHEN ELSE`

The following operators are supported:

Type	Operators

Type	Operators
String	`LIKE`, `NOT LIKE`
Relational	`=`, `>=`, `<=`, `<>`, `IN`, `NOT IN`
Boolean	`AND`, `OR`, `NOT`
Null	`IS NULL`, `IS NOT NULL`
Projection	`foo.bar`
Mathematical	`+`, `-`, `*`, `/`, `%`

The following functions are supported by default (this can be extended with new functions). Those are case insensitive:

Type	Functions

Type	Functions
String	`CONCAT`, `LOWER`, `UPPER`, `SUBSTRING`, `LENGTH`
DateTime	`DATE_PART`, `TO_TIMESTAMP`
Arrays	`UNZIP (only in project)`, `ELEMAT`
Geo	`GEOWITHIN (only in where)`
Set-Level	`DISTINCT`
Aggregation	`COUNT`, `SUM`, `MIN`, `MAX`, `AVG, FIRST, LAST`
Conversion	`TOINT, TOBOOL, TOSTRING, TOOBJECTID (or OID), TODOUBLE,TOLONG, TODATE, TODECIMAL`
Timeserie	`UNZIPTS (only in project)`

Use Examples

Basic Queries:

select * from ISO3166_1 returns the collection, along with the _id and contextData:

select c.ISO3166 from ISO3166_1 as c returns only the collection:

Where:

Filters can be used in the where clause with different operators:

Relational: =, >=, <=, <>, IN, NOT IN, LIKE, NOT LIKE
Boolean: AND, OR, NOT

select c.ISO3166 from ISO3166_1 as c where c.ISO3166.name='Zambia':

select c.ISO3166 from ISO3166_1 as c where c.ISO3166.name='Zambia' and c.ISO3166.language='EN':

select c.ISO3166.name as name from ISO3166_1 as c where c.ISO3166.name like 'S%' and c.ISO3166.language='EN':

select c.ISO3166.name as name from ISO3166_1 as c where c.ISO3166.name in ('Sudan','Suriname') and c.ISO3166.language='EN':

Group/Having:

select c.ISO3166.language as name, count(*) as c from ISO3166_1 as c group by c.ISO3166.language

or

select c.ISO3166.language as name, count(*) as c from ISO3166_1 as c group by name:

select c.ISO3166.language as name, count(*) as c from ISO3166_1 as c group by c.ISO3166.language having count(*) > 250

or

select c.ISO3166.language as name, count(*) as c from ISO3166_1 as c group by name having c > 250:

Limit/offset

select c.ISO3166.name as name, count(*) as c from ISO3166_1 as c where c.ISO3166.name like 'S%' group by name limit 1:

select c.ISO3166.name as name, count(*) as c from ISO3166_1 as c where c.ISO3166.name like 'S%' group by name offset 1:

select c.ISO3166.name as name, count(*) as c from ISO3166_1 as c where c.ISO3166.name like 'S%' group by name limit 3 offset 1

or

select c.ISO3166.name as name, count(*) as c from ISO3166_1 as c where c.ISO3166.name like 'S%' group by name offset 1 limit 3 (retrocompatibility):

Case when else (from 2.2.1-hyperblast)

It is possible to use the syntax case when default as projection or in the group by part (to make conditional groupings although as in any group by it has to be a field that is in the projection or the alias of the same one). It is mandatory to use the default clause.

select h.Helsinki.population as population, case when h.Helsinki.population between 0 and 50000 then 'low' else 'high' end from HelsinkiPopulation as h

select case when h.Helsinki.population between 0 and 50000 then 'low' else 'high' end as grouprange, count(*) as c from HelsinkiPopulation as h group by cca

Mathematical operators (from 2.2.1-hyperblast)

It is possible to use mathematical operators in SQL operations

select h.Helsinki.population as pop, h.Helsinki.population+1000 as pop2 from HelsinkiPopulation as h

select toInt(c.Helsinki.year/1000) as m, sum(c.Helsinki.population) as s, sum(c.Helsinki.population_men) as sm, sum(c.Helsinki.population_women) as sw, concat(toString(toInt(10000*sum(c.Helsinki.population_men)/sum(c.Helsinki.population))/100),'%') as percentm, concat(toString(toInt(10000*sum(c.Helsinki.population_women)/sum(c.Helsinki.population))/100),'%') as percentw from HelsinkiPopulation as c group by m

Function NOW()

This platform function returns the system date and time, it can be used anywhere in the query and will add this date in string format. If we want to parse it to mongo date and operate with it we can make a toDate(now())

NOW(“format“,'unitTime', amount)

“format“: formatting of the date, the default is to use"yyyy-MM-dd'T'HH: mm: ss'Z '"
'unitTime': uunit of time to increase or decrease the number of hours, days, ... possible values of unitTime are: 'year', 'month', 'date', 'hour', 'minute', 'second ',' millisecond '
"amount": positive or negative integer, with the amount of unitTime to be added or subtracted on the system date

select Helsinki, now() as nowstr, toDate(now()) as nowdate from HelsinkiPopulation LIMIT 3

Joins/Subqueries

It is possible to make joins of type inner join and left join and even join them with subqueries.

select re.countrysrc as countrysrc,re.countrydest as countrydest,re.count, iso.ISO3166.latitude as latitude, iso.ISO3166.longitude as longitude from ( select rx.routesexten.countrysrc As countrysrc, rx.routesexten.countrydest As countrydest, count(re.routesexten.countrysrc) As count from routesexten as rx group by rx.routesexten.countrysrc, rx.routesexten.countrydest order by count desc) As re inner join ISO3166_1 As iso on re.countrydest = iso.ISO3166.name

Union all (from 2.2.1-hyperblast)

With this clause you can add data of different queries (duplicated wouldn’t be removed)

select 1 as v from Restaurants limit 1 union all

select 2 as v from Restaurants limit 1 union all

select 3 as v from Restaurants limit 1

Due to restrictions of mongodb, this is limited to “union all” not “union” clause. So if you need to have distincts result of some union of data you need to group them after union them.

Functions (from 2.2.1-hyperblast)

Several functions are included in the engine by default (these can be extended by configuration). They are used in case insensitive mode so they can be typed case insensitive:

Type	Name	Params	Places	Example	Comments

Type	Name	Params	Places	Example	Comments
String	CONCAT	1..* String or column with type string	all	`select concat('Cuisine ', c.Restaurant.cuisine) from Restaurants as c`	Return combination of all strings
String	LOWER, UPPER	1 String or column with type string	all	`select concat('Cuisine ', upper(c.Restaurant.cuisine)) from Restaurants as c`	lowercase/uppercase of string
String	SUBSTRING	3 String or column with type string, index (integer), length (integer)	all	`select concat('Cuisine ', substring(c.Restaurant.cuisine,2,3)) from Restaurants as c`	substring of string
String	LENGTH	1 String or column with type string	all	`select length(c.Restaurant.cuisine) from Restaurants as c`	length of string
String	TRIM	1 String or column with type string	all	`SELECT trim(' aaa '), trim(c.Restaurant.borough) FROM Restaurants AS c LIMIT 300`	trim a string
DateTime	DATE_PART	2 part type ("month","year","dayOfMonth","hour","minute","second", "millisecond","dayOfYear","dayOfWeek","week"), timestamp or field with timestamp type	all	`SELECT date_part('year',timestamp) as year, date_part('month',timestamp) as month FROM QA_DETAIL AS c LIMIT 3` `SELECT date_part('dayOfMonth',timestamp) as day, count(*) FROM QA_DETAIL AS c group by day`	get date part from timestamp
DateTime	TO_TIMESTAMP/TIMESTAMP	1 timestamp or field with timestamp type	all	`SELECT timestamp FROM QA_DETAIL AS c where timestamp > to_timestamp('2018-10-24T23:00:00.000Z') LIMIT 3` `SELECT timestamp FROM QA_DETAIL AS c where timestamp > timestamp('2018-10-24T23:00:00.000Z') LIMIT 3`	convert string to timestamp
Arrays	UNZIP	2 array type field, boolean (preserve null and empty arrays in unzip)	project	`select unzip(Restaurant.grades,true) from Restaurants`	unzip array in multiple instances. Second param enable/disable preserve null and empty arrays. This function can only be used alone in project because of the changing of the number of records. All filters applied are before unzip. You can use this in a subquery for continue operating over array
Arrays	ELEMAT	1 array type field	all	`select elemat(Restaurant.grades,1) from Restaurants`	get element by position in array
Geo	GEOWITHIN	2 geometry point field, geometry	where	`select * from Supermarkets where geoWithin(Supermarkets.geometry, '{type : "Polygon" , coordinates: [ [ [-15.44488, 28.137924], [-15.423848, 28.137924], [-15.423848, 28.144054], [-15.44488, 28.144054], [-15.44488, 28.137924] ] ] }')`	find some geometry field point in geometry structure as the second arguments. Only in where because mongoDB limitation
Aggregation	COUNT	1 field or *	project or having	`select count(*) from Supermarkets`	count
Aggregation	SUM, MIN, MAX, AVG, FIRST, LAST	1 field (type depending on operation)	project or having	`select min(c.Supermarkets.status) as maxs from Supermarkets as c`	aggregation functions. Last and first get first or last of a group
Conversion	TOINT, TOBOOL, TOSTRING, TOOBJECTID (or OID), TODOUBLE,TOLONG, TODATE, TODECIMAL	1 field (type depending on operation)	all	`select concat('Cuisine ', UPPER(c.Restaurant.cuisine), ' of length ',toString(length(UPPER(c.Restaurant.cuisine)))) as complex from Restaurants as c`	conversion between type
Timeserie	UNZIPTS	2 window type in String (uppercase), window frequency in String (uppercase)	project	`SELECT unzipts('DAYS','MINUTES') FROM timeserie AS c LIMIT 3`	unzip timeserie into plain strcture returning the instances like when they're inserted. This function can only be used alone in project because of the changing of the number of records. All filters applied are before unzipts. You can use this in a subquery for continue operating over timeserie

Count Distinct clause (From 4.2.1-predator)

With this clause, you can perform a count operation of distinct values of a group (the group inside distinct clause) for another group (the fields outside of the distinct clause)

select count(distinct Restaurant.borough) from Restaurants

select Restaurant.cuisine, count(distinct Restaurant.borough, Restaurant.name) from Restaurants

Due to restrictions of mongodb query languaje, it’s not allowed the combination with other elements different than columns