New SQL Engine on MongoDB
Available in version 2.2.0-hyperblast
EN | ES
- 1 Introduction
- 2 Migration from the old SQL Engine (Quasar)
- 3 Extensibility (from 2.2.1-hyperblast)
- 4 Sintaxis
- 4.1 Data Types
- 4.2 Clauses
- 4.3 Use Examples
- 4.3.1 Basic Queries:
- 4.3.2 Where:
- 4.3.3 Group/Having:
- 4.3.4 Limit/offset
- 4.3.5 Case when else (from 2.2.1-hyperblast)
- 4.3.6 Mathematical operators (from 2.2.1-hyperblast)
- 4.3.7 Function NOW()
- 4.3.8 Joins/Subqueries
- 4.3.9 Union all (from 2.2.1-hyperblast)
- 4.3.10 Functions (from 2.2.1-hyperblast)
- 4.3.11 Count Distinct clause (From 4.2.1-predator)
Introduction
After several months of testing, the new version of the SQL engine for MongoDB has been released.
This new SQL engine is based on the opensource project https://github.com/vincentrussell/sql-to-mongo-db-query-converter to which improvements have been added and covers the part of reading queries (selects).
This project is built purely in Java code, so it is easily extensible and you can include new features or bug fixes, contributing to the community in the same github repository.
Unlike the previous engine (Quasar) that worked with mongo version 3.4, this engine works with mongo versions from 4.0 onwards. By default, in platform, it is mounted on the most current version so far, 4.4.
Another big difference is the efficiency when translating queries, making the transformation cost minimal. In Quasar, in certain cases, there were queries that caused a high translation cost.
Migration from the old SQL Engine (Quasar)
There are several changes with respect to the old Quasar engine, so that, if a migration is considered, it must be taken into account that the new engine works by maintaining a stricter SQL and avoiding the use of the map-reduce framework, avoiding collection locks and the generation of intermediate tables, prioritizing performance and reducing possible errors:
String data are with ' always
Joins are limited to the use of inner join and left join only. The syntax of multiple comma origins is not valid: select * from ontology1, ontology2,….
The timestamp from string changes from timestamp '' to to_timestamp('') function.
The $oid from string changes from OID '' to OID('') or toObjectId('') function.
Array projections or .* .0 [2] [] {} type objects are not supported, implementation with functions is possible.
You cannot concatenate with ||, you must use the concat() function.
You cannot use the alias or the name of the ontology to make the projections nor with *. These queries would be incorrect (instead you should use select * from Ontology):
SELECT c FROM Ontology AS c LIMIT 3
SELECT c.* FROM Ontology AS c LIMIT 3
SELECT Ontology FROM Ontology AS c LIMIT 3
SELECT Ontology.* FROM Ontology AS c LIMIT 3
ObjectID and _id considerations
When returning data of type objectId from mongoDB, Quasar avoided returning the _id
If explicitly instructed, Quasar returns it in string format:
In the new engine, for backward compatibility (with a certain cost due to response parsing), there is the parameter (if not indicated, it is set to true by default, as in the case of updates):
mongodb-use-legacysql
It can be disabled within the platform settings:
If disabled (default on new installations) it is returned in MongoDB format:
If a * is used in the new engine, the _id is always returned.
Extensibility (from 2.2.1-hyperblast)
There is the possibility of extending the number of functions available in this engine, so that new functionalities can be included from the controlpanel, as shown here
Sintaxis
Data Types
Type | Description | Examples |
---|---|---|
Null | Indicates missing information. |
|
Boolean | true or false |
|
Integer | Whole numbers (no fractional component) |
|
Decimal | Decimal numbers (optional fractional components) |
|
String | Text |
|
DateTime | Date and time, in ISO8601 format |
|
Object ID | Unique object identifier. |
|
Clauses
The following clauses are supported:
Type | Clauses |
---|---|
Basic |
|
Joins |
|
Filtering |
|
Grouping |
|
Subquery |
|
Paging |
|
Sorting |
|
Conditional |
|
The following operators are supported:
Type | Operators |
---|---|
String |
|
Relational |
|
Boolean |
|
Null |
|
Projection |
|
Mathematical |
|
The following functions are supported by default (this can be extended with new functions). Those are case insensitive:
Type | Functions |
---|---|
String |
|
DateTime |
|
Arrays |
|
Geo |
|
Set-Level |
|
Aggregation |
|
Conversion |
|
Timeserie |
|
Use Examples
Basic Queries:
select * from ISO3166_1
returns the collection, along with the _id and contextData:
select c.ISO3166 from ISO3166_1 as c
returns only the collection:
Where:
Filters can be used in the where clause with different operators:
Relational:
=
,>=
,<=
,<>
,IN
,NOT IN, LIKE, NOT LIKE
Boolean:
AND
,OR
,NOT
select c.ISO3166 from ISO3166_1 as c where c.ISO3166.name='Zambia':
select c.ISO3166 from ISO3166_1 as c where c.ISO3166.name='Zambia' and c.ISO3166.language='EN':
select c.ISO3166.name as name from ISO3166_1 as c where c.ISO3166.name like 'S%' and c.ISO3166.language='EN':
select c.ISO3166.name as name from ISO3166_1 as c where c.ISO3166.name in ('Sudan','Suriname') and c.ISO3166.language='EN':
Group/Having:
select c.ISO3166.language as name, count(*) as c from ISO3166_1 as c group by c.ISO3166.language
or
select c.ISO3166.language as name, count(*) as c from ISO3166_1 as c group by name:
select c.ISO3166.language as name, count(*) as c from ISO3166_1 as c group by c.ISO3166.language having count(*) > 250
or
select c.ISO3166.language as name, count(*) as c from ISO3166_1 as c group by name having c > 250:
Limit/offset
select c.ISO3166.name as name, count(*) as c from ISO3166_1 as c where c.ISO3166.name like 'S%' group by name limit 1:
select c.ISO3166.name as name, count(*) as c from ISO3166_1 as c where c.ISO3166.name like 'S%' group by name offset 1:
select c.ISO3166.name as name, count(*) as c from ISO3166_1 as c where c.ISO3166.name like 'S%' group by name limit 3 offset 1
or
select c.ISO3166.name as name, count(*) as c from ISO3166_1 as c where c.ISO3166.name like 'S%' group by name offset 1 limit 3 (retrocompatibility)
:
Case when else (from 2.2.1-hyperblast)
It is possible to use the syntax case when default as projection or in the group by part (to make conditional groupings although as in any group by it has to be a field that is in the projection or the alias of the same one). It is mandatory to use the default clause.
select h.Helsinki.population as population, case when h.Helsinki.population between 0 and 50000 then 'low' else 'high' end from HelsinkiPopulation as h
select case when h.Helsinki.population between 0 and 50000 then 'low' else 'high' end as grouprange, count(*) as c from HelsinkiPopulation as h group by cca
Mathematical operators (from 2.2.1-hyperblast)
It is possible to use mathematical operators in SQL operations
select h.Helsinki.population as pop, h.Helsinki.population+1000 as pop2 from HelsinkiPopulation as h
select toInt(c.Helsinki.year/1000) as m, sum(c.Helsinki.population) as s, sum(c.Helsinki.population_men) as sm, sum(c.Helsinki.population_women) as sw, concat(toString(toInt(10000*sum(c.Helsinki.population_men)/sum(c.Helsinki.population))/100),'%') as percentm, concat(toString(toInt(10000*sum(c.Helsinki.population_women)/sum(c.Helsinki.population))/100),'%') as percentw from HelsinkiPopulation as c group by m
Function NOW()
This platform function returns the system date and time, it can be used anywhere in the query and will add this date in string format. If we want to parse it to mongo date and operate with it we can make a toDate(now())
NOW(“format“,'unitTime', amount)
“format“: formatting of the date, the default is to use"yyyy-MM-dd'T'HH: mm: ss'Z '"
'unitTime': uunit of time to increase or decrease the number of hours, days, ... possible values of unitTime are: 'year', 'month', 'date', 'hour', 'minute', 'second ',' millisecond '
"amount": positive or negative integer, with the amount of unitTime to be added or subtracted on the system date
select Helsinki, now() as nowstr, toDate(now()) as nowdate from HelsinkiPopulation LIMIT 3
Joins/Subqueries
It is possible to make joins of type inner join and left join and even join them with subqueries.
select re.countrysrc as countrysrc,re.countrydest as countrydest,re.count, iso.ISO3166.latitude as latitude, iso.ISO3166.longitude as longitude from ( select rx.routesexten.countrysrc As countrysrc, rx.routesexten.countrydest As countrydest, count(re.routesexten.countrysrc) As count from routesexten as rx group by rx.routesexten.countrysrc, rx.routesexten.countrydest order by count desc) As re inner join ISO3166_1 As iso on re.countrydest = iso.ISO3166.name
Union all (from 2.2.1-hyperblast)
With this clause you can add data of different queries (duplicated wouldn’t be removed)
select 1 as v from Restaurants limit 1 union all
select 2 as v from Restaurants limit 1 union all
select 3 as v from Restaurants limit 1
Due to restrictions of mongodb, this is limited to “union all” not “union” clause. So if you need to have distincts result of some union of data you need to group them after union them.
Functions (from 2.2.1-hyperblast)
Several functions are included in the engine by default (these can be extended by configuration). They are used in case insensitive mode so they can be typed case insensitive:
Type | Name | Params | Places | Example | Comments |
---|---|---|---|---|---|
String | CONCAT | 1..* String or column with type string | all |
| Return combination of all strings |
String | LOWER, UPPER | 1 String or column with type string | all |
| lowercase/uppercase of string |
String | SUBSTRING | 3 String or column with type string, index (integer), length (integer) | all |
| substring of string |
String | LENGTH | 1 String or column with type string | all |
| length of string |
String | TRIM | 1 String or column with type string | all |
| trim a string |
DateTime | DATE_PART | 2 part type ("month","year","dayOfMonth","hour","minute","second", | all |
| get date part from timestamp |
DateTime | TO_TIMESTAMP/TIMESTAMP | 1 timestamp or field with timestamp type | all |
| convert string to timestamp |
Arrays | UNZIP | 2 array type field, boolean (preserve null and empty arrays in unzip) | project |
| unzip array in multiple instances. Second param enable/disable preserve null and empty arrays. This function can only be used alone in project because of the changing of the number of records. All filters applied are before unzip. You can use this in a subquery for continue operating over array |
Arrays | ELEMAT | 1 array type field | all |
| get element by position in array |
Geo | GEOWITHIN | 2 geometry point field, geometry | where |
| find some geometry field point in geometry structure as the second arguments. Only in where because mongoDB limitation |
Aggregation | COUNT | 1 field or * | project or having |
| count |
Aggregation | SUM, MIN, MAX, AVG, FIRST, LAST | 1 field (type depending on operation) | project or having |
| aggregation functions. Last and first get first or last of a group |
Conversion | TOINT, TOBOOL, TOSTRING, TOOBJECTID (or OID), TODOUBLE,TOLONG, TODATE, TODECIMAL | 1 field (type depending on operation) | all |
| conversion between type |
Timeserie | UNZIPTS | 2 window type in String (uppercase), window frequency in String (uppercase) | project |
| unzip timeserie into plain strcture returning the instances like when they're inserted. This function can only be used alone in project because of the changing of the number of records. All filters applied are before unzipts. You can use this in a subquery for continue operating over timeserie |
Count Distinct clause (From 4.2.1-predator)
With this clause, you can perform a count operation of distinct values of a group (the group inside distinct clause) for another group (the fields outside of the distinct clause)
select count(distinct Restaurant.borough) from Restaurants
select Restaurant.cuisine, count(distinct Restaurant.borough, Restaurant.name) from Restaurants
Due to restrictions of mongodb query languaje, it’s not allowed the combination with other elements different than columns