New SQL Engine on MongoDB

Available in version 2.2.0-hyperblast

EN | ES

Introduction

After several months of testing, the new version of the SQL engine for MongoDB has been released.

This new SQL engine is based on the opensource project https://github.com/vincentrussell/sql-to-mongo-db-query-converter to which improvements have been added and covers the part of reading queries (selects).

This project is built purely in Java code, so it is easily extensible and you can include new features or bug fixes, contributing to the community in the same github repository.

Unlike the previous engine (Quasar) that worked with mongo version 3.4, this engine works with mongo versions from 4.0 onwards. By default, in platform, it is mounted on the most current version so far, 4.4.

Another big difference is the efficiency when translating queries, making the transformation cost minimal. In Quasar, in certain cases, there were queries that caused a high translation cost.

Migration from the old SQL Engine (Quasar)

There are several changes with respect to the old Quasar engine, so that, if a migration is considered, it must be taken into account that the new engine works by maintaining a stricter SQL and avoiding the use of the map-reduce framework, avoiding collection locks and the generation of intermediate tables, prioritizing performance and reducing possible errors:

  • String data are with ' always

  • Joins are limited to the use of inner join and left join only. The syntax of multiple comma origins is not valid: select * from ontology1, ontology2,….

  • The timestamp from string changes from timestamp '' to to_timestamp('') function.

  • The $oid from string changes from OID '' to OID('') or toObjectId('') function.

  • Array projections or .* .0 [2] [] {} type objects are not supported, implementation with functions is possible.

  • You cannot concatenate with ||, you must use the concat() function.

  • You cannot use the alias or the name of the ontology to make the projections nor with *. These queries would be incorrect (instead you should use select * from Ontology):

SELECT c FROM Ontology AS c LIMIT 3

SELECT c.* FROM Ontology AS c LIMIT 3

SELECT Ontology FROM Ontology AS c LIMIT 3

SELECT Ontology.* FROM Ontology AS c LIMIT 3

ObjectID and _id considerations

When returning data of type objectId from mongoDB, Quasar avoided returning the _id

If explicitly instructed, Quasar returns it in string format:

In the new engine, for backward compatibility (with a certain cost due to response parsing), there is the parameter (if not indicated, it is set to true by default, as in the case of updates):

mongodb-use-legacysql

It can be disabled within the platform settings:

If disabled (default on new installations) it is returned in MongoDB format:

If a * is used in the new engine, the _id is always returned.

Extensibility (from 2.2.1-hyperblast)

There is the possibility of extending the number of functions available in this engine, so that new functionalities can be included from the controlpanel, as shown here

https://onesaitplatform.atlassian.net/wiki/spaces/OP/pages/1189937153

Sintaxis

Data Types

Type

Description

Examples

Type

Description

Examples

Null

Indicates missing information.

null

Boolean

true or false

truefalse

Integer

Whole numbers (no fractional component)

1-2

Decimal

Decimal numbers (optional fractional components)

1.0-2.19743

String

Text

'221B Baker Street'

DateTime

Date and time, in ISO8601 format

to_timestamp('2004-10-19T10:23:54Z')

Object ID

Unique object identifier.

OID('507f1f77bcf86cd799439011')

Clauses

The following clauses are supported:

Type

Clauses

Type

Clauses

Basic

SELECTASFROM

Joins

LEFT JOININNER JOINJOIN

Filtering

WHERE

Grouping

GROUP BYHAVING

Subquery

FROM (SELECT ...) AS

Paging

LIMITOFFSET

Sorting

ORDER BY , DESCASC

Conditional

CASE WHEN ELSE

The following operators are supported:

Type

Operators

Type

Operators

String

LIKENOT LIKE

Relational

=>=<=<>INNOT IN

Boolean

ANDORNOT

Null

IS NULLIS NOT NULL

Projection

foo.bar

Mathematical

+, -, *, /, %

The following functions are supported by default (this can be extended with new functions). Those are case insensitive:

Type

Functions

Type

Functions

String

CONCATLOWERUPPERSUBSTRINGLENGTH

DateTime

DATE_PARTTO_TIMESTAMP

Arrays

UNZIP (only in project)ELEMAT

Geo

GEOWITHIN (only in where)

Set-Level

DISTINCT

Aggregation

COUNTSUMMINMAXAVG, FIRST, LAST

Conversion

TOINT, TOBOOL, TOSTRING, TOOBJECTID (or OID), TODOUBLE,TOLONG, TODATE, TODECIMAL

Timeserie

UNZIPTS (only in project)

Use Examples

Basic Queries:

select * from ISO3166_1 returns the collection, along with the _id and contextData:

select c.ISO3166 from ISO3166_1 as c returns only the collection:

Where:

Filters can be used in the where clause with different operators:

  • Relational: =>=<=<>INNOT IN, LIKE, NOT LIKE

  • Boolean: ANDORNOT

select c.ISO3166 from ISO3166_1 as c where c.ISO3166.name='Zambia':

select c.ISO3166 from ISO3166_1 as c where c.ISO3166.name='Zambia' and c.ISO3166.language='EN':

 

select c.ISO3166.name as name from ISO3166_1 as c where c.ISO3166.name like 'S%' and c.ISO3166.language='EN':

select c.ISO3166.name as name from ISO3166_1 as c where c.ISO3166.name in ('Sudan','Suriname') and c.ISO3166.language='EN':

Group/Having:

select c.ISO3166.language as name, count(*) as c from ISO3166_1 as c group by c.ISO3166.language

or

select c.ISO3166.language as name, count(*) as c from ISO3166_1 as c group by name:

select c.ISO3166.language as name, count(*) as c from ISO3166_1 as c group by c.ISO3166.language having count(*) > 250

or

select c.ISO3166.language as name, count(*) as c from ISO3166_1 as c group by name having c > 250:

Limit/offset

select c.ISO3166.name as name, count(*) as c from ISO3166_1 as c where c.ISO3166.name like 'S%' group by name limit 1:

select c.ISO3166.name as name, count(*) as c from ISO3166_1 as c where c.ISO3166.name like 'S%' group by name offset 1:

select c.ISO3166.name as name, count(*) as c from ISO3166_1 as c where c.ISO3166.name like 'S%' group by name limit 3 offset 1

or

select c.ISO3166.name as name, count(*) as c from ISO3166_1 as c where c.ISO3166.name like 'S%' group by name offset 1 limit 3 (retrocompatibility):

Case when else (from 2.2.1-hyperblast)

It is possible to use the syntax case when default as projection or in the group by part (to make conditional groupings although as in any group by it has to be a field that is in the projection or the alias of the same one). It is mandatory to use the default clause.

select h.Helsinki.population as population, case when h.Helsinki.population between 0 and 50000 then 'low' else 'high' end from HelsinkiPopulation as h

select case when h.Helsinki.population between 0 and 50000 then 'low' else 'high' end as grouprange, count(*) as c from HelsinkiPopulation as h group by cca

Mathematical operators (from 2.2.1-hyperblast)

It is possible to use mathematical operators in SQL operations

select h.Helsinki.population as pop, h.Helsinki.population+1000 as pop2 from HelsinkiPopulation as h

select toInt(c.Helsinki.year/1000) as m, sum(c.Helsinki.population) as s, sum(c.Helsinki.population_men) as sm, sum(c.Helsinki.population_women) as sw, concat(toString(toInt(10000*sum(c.Helsinki.population_men)/sum(c.Helsinki.population))/100),'%') as percentm, concat(toString(toInt(10000*sum(c.Helsinki.population_women)/sum(c.Helsinki.population))/100),'%') as percentw from HelsinkiPopulation as c group by m

Function NOW()

This platform function returns the system date and time, it can be used anywhere in the query and will add this date in string format. If we want to parse it to mongo date and operate with it we can make a toDate(now())

NOW(“format“,'unitTime', amount)

  • format“: formatting of the date, the default is to use"yyyy-MM-dd'T'HH: mm: ss'Z '"

  • 'unitTime': uunit of time to increase or decrease the number of hours, days, ... possible values of unitTime are: 'year', 'month', 'date', 'hour', 'minute', 'second ',' millisecond '

  • "amount": positive or negative integer, with the amount of unitTime to be added or subtracted on the system date

select Helsinki, now() as nowstr, toDate(now()) as nowdate from HelsinkiPopulation LIMIT 3

Joins/Subqueries

It is possible to make joins of type inner join and left join and even join them with subqueries.

select re.countrysrc as countrysrc,re.countrydest as countrydest,re.count, iso.ISO3166.latitude as latitude, iso.ISO3166.longitude as longitude from ( select rx.routesexten.countrysrc As countrysrc, rx.routesexten.countrydest As countrydest, count(re.routesexten.countrysrc) As count from routesexten as rx group by rx.routesexten.countrysrc, rx.routesexten.countrydest order by count desc) As re inner join ISO3166_1 As iso on re.countrydest = iso.ISO3166.name

Union all (from 2.2.1-hyperblast)

With this clause you can add data of different queries (duplicated wouldn’t be removed)

select 1 as v from Restaurants limit 1 union all

select 2 as v from Restaurants limit 1 union all

select 3 as v from Restaurants limit 1

Due to restrictions of mongodb, this is limited to “union all” not “union” clause. So if you need to have distincts result of some union of data you need to group them after union them.

Functions (from 2.2.1-hyperblast)

Several functions are included in the engine by default (these can be extended by configuration). They are used in case insensitive mode so they can be typed case insensitive:

Type

Name

Params

Places

Example

Comments

Type

Name

Params

Places

Example

Comments

String

CONCAT

1..* String or column with type string

all

select concat('Cuisine ', c.Restaurant.cuisine) from Restaurants as c

Return combination of all strings

String

LOWER, UPPER

1 String or column with type string

all

select concat('Cuisine ', upper(c.Restaurant.cuisine)) from Restaurants as c

lowercase/uppercase of string

String

SUBSTRING

3 String or column with type string, index (integer), length (integer)

all

select concat('Cuisine ', substring(c.Restaurant.cuisine,2,3)) from Restaurants as c

substring of string

String

LENGTH

1 String or column with type string

all

select length(c.Restaurant.cuisine) from Restaurants as c

length of string

String

TRIM

1 String or column with type string

all

SELECT trim(' aaa '), trim(c.Restaurant.borough) FROM Restaurants AS c LIMIT 300

trim a string

DateTime

DATE_PART

2 part type ("month","year","dayOfMonth","hour","minute","second",
"millisecond","dayOfYear","dayOfWeek","week"), timestamp or field with timestamp type

all

SELECT date_part('year',timestamp) as year, date_part('month',timestamp) as month FROM QA_DETAIL AS c LIMIT 3

SELECT date_part('dayOfMonth',timestamp) as day, count(*) FROM QA_DETAIL AS c group by day

get date part from timestamp

DateTime

TO_TIMESTAMP/TIMESTAMP

1 timestamp or field with timestamp type

all

SELECT timestamp FROM QA_DETAIL AS c where timestamp > to_timestamp('2018-10-24T23:00:00.000Z') LIMIT 3

SELECT timestamp FROM QA_DETAIL AS c where timestamp > timestamp('2018-10-24T23:00:00.000Z') LIMIT 3

convert string to timestamp

Arrays

UNZIP

2 array type field, boolean (preserve null and empty arrays in unzip)

project

select unzip(Restaurant.grades,true) from Restaurants

unzip array in multiple instances. Second param enable/disable preserve null and empty arrays. This function can only be used alone in project because of the changing of the number of records. All filters applied are before unzip. You can use this in a subquery for continue operating over array

Arrays

ELEMAT

1 array type field

all

select elemat(Restaurant.grades,1) from Restaurants

get element by position in array

Geo

GEOWITHIN

2 geometry point field, geometry

where

select * from Supermarkets where geoWithin(Supermarkets.geometry, '{type : "Polygon" , coordinates: [ [ [-15.44488, 28.137924], [-15.423848, 28.137924], [-15.423848, 28.144054], [-15.44488, 28.144054], [-15.44488, 28.137924] ] ] }')

find some geometry field point in geometry structure as the second arguments. Only in where because mongoDB limitation

Aggregation

COUNT

1 field or *

project or having

select count(*) from Supermarkets

count

Aggregation

SUM, MIN, MAX, AVG, FIRST, LAST

1 field (type depending on operation)

project or having

select min(c.Supermarkets.status) as maxs from Supermarkets as c

aggregation functions. Last and first get first or last of a group

Conversion

TOINT, TOBOOL, TOSTRING, TOOBJECTID (or OID), TODOUBLE,TOLONG, TODATE, TODECIMAL

1 field (type depending on operation)

all

select concat('Cuisine ', UPPER(c.Restaurant.cuisine), ' of length ',toString(length(UPPER(c.Restaurant.cuisine)))) as complex from Restaurants as c

conversion between type

Timeserie

UNZIPTS

2 window type in String (uppercase), window frequency in String (uppercase)

project

SELECT unzipts('DAYS','MINUTES') FROM timeserie AS c LIMIT 3

unzip timeserie into plain strcture returning the instances like when they're inserted. This function can only be used alone in project because of the changing of the number of records. All filters applied are before unzipts. You can use this in a subquery for continue operating over timeserie