SQL on Mongo (legacy)

IMPORTANT

This guide applies to the Quasar-based SQL query engine, which worked for Mongo versions up to 3.4. 

If you are using the new SQL engine or a Mongo 4.X version you should read this guide: New SQL Engine on MongoDB


The Semantic Data Hub has among other objectives to decouple the ontology from the underlying database. 

For that, we propose to use SQL query language, so the platform offers a SQL Query Engine for each of the supported databases.

Here you can find Reference Guides for this language:

SELECT Operations

Data Types

TypeDescriptionExamplesMongoDB native-supported
NullIndicates missing information.nullYes
Booleantrue or false.true, falseYes
IntegerWhole numbers (no fractional component).1, -2Yes
DecimalDecimal numbers (optional fractional components).1.0, -2.19743Yes
StringText.'221B Baker Street'Yes
DateTimeDate and time, in ISO8601 format.TIMESTAMP '2004-10-19 10:23:54'Yes
TimeTime in the format HH:MM:SS.TIME '10:23:54'No
DateDate in the format YYYY-MM-DD.DATE '2004-10-19'No
IntervalTime interval, in ISO8601 format.INTERVAL 'P3DT4H5M6S'No
Object IDUnique object identifier.OID '507f1f77bcf86cd799439011'Yes
Ordered SetOrdered list with no duplicates allowed.(1, 2, 3)No
ArrayOrdered list with duplicates allowed.[1, 2, 2]Yes

Clauses

The following clauses are supported:

TypeClauses
BasicSELECT, AS, FROM
JoinsLEFT OUTER JOIN, RIGHT OUTER JOIN, INNER JOIN, FULL JOIN, CROSS
FilteringWHERE
GroupingGROUP BY, HAVING
ConditionalCASE , WHEN, DEFAULT
PagingLIMIT, OFFSET
SortingORDER BY , DESC, ASC

The following operators are supported:

TypeOperators
Numeric+, -, *, /, %
String~ , ~*, !~, !~*, LIKE, ||
Array||, [ ... ]
Relational=, >=, <=, <>, BETWEEN, IN, NOT IN
BooleanAND, OR, NOT
Projectionfoo.bar, foo[2], foo{*}, foo[*]
Date/TimeTIMESTAMP, DATE, INTERVAL, TIME
IdentityOID

Note: ~ , ~*, !~, and !~* are regular expression operators. ~*, !~, and !~* are preliminary and may not work in the current release.

Note: The || operator for strings will concatenate two strings; for example, you can create a full name from a first and last name property: c.firstName || ' ' || c.lastName. The || operator for arrays will concatenate two arrays; for example, if xy is an array with two values, then c.xy || [0] will create an array with three values, where the third value is zero.

The following functions are supported:

TypeFunctions
StringCONCAT, LOWER, UPPER, SUBSTRING, LENGTH, SEARCH
DateTimeDATE_PART, TO_TIMESTAMP
NullsCOALESCE
ArraysARRAY_LENGTH, FLATTEN_ARRAY
ObjectsFLATTEN_OBJECT
Set-LevelDISTINCT, DISTINCT_BY
AggregationCOUNT, SUM, MIN, MAX, AVG
IdentitySQUASH

Basic Queries

select * from ISO3166_1 as c returns info and ContextData:

select c.ISO3166 as ISO3166 from ISO3166_1 as c returns only Data:

select _id,c from ISO3166_1  returns * including id from Registry:


Filtering

You can filter a result set using the WHERE clause. The following operators are supported:

  • Relational: -, =, >=, <=, <>, BETWEEN, IN, NOT IN
  • Boolean: AND, OR, NOT

Examples:

select c.ISO3166 as ISO3166 from ISO3166_1 as c where c.ISO3166.name="Zambia"

Numeric and String Operations

You can use any of the operators or functions listed in the Clauses, Operators, and Functions section on numbers and strings. Some common string operators and functions include:

Operator or FunctionDescription
||Concatenates.
LOWERConverts to lowercase.
UPPERConverts to uppercase.
SUBSTRINGReturns a substring.
LENGTHReturns length of string.

Examples:

select c.ISO3166.number/100+1000 as calculated from ISO3166_1 as c where c.ISO3166.name="Zambia" and c.ISO3166.language="EN"

select c.ISO3166.name as name from ISO3166_1 as c where c.ISO3166.name like "S%" and c.ISO3166.language="EN"


Dates and Times

Filter by dates and times using the TIMESTAMP, TIME, and DATE operators. Also, you can also use the DATEPART operator for selection to select part of a date, such as the day.

Note: Some databases will automatically convert strings into dates or date/times. SlamData does not perform this conversion, since the underlying database has no schema and no fixed type for any field. As a result, an expression like WHERE ts > '2015-02-10' compares string-valued ts fields with the string '2015-02-10' instead of a date comparison. If you want to embed literal dates, timestamps, etc. into your SQL queries, you should use the time conversion operators, which accept a string and return value of the appropriate type. For example, the above snippet could be converted to WHERE ts > DATE '2015-02-10', which looks for date-valued ts fields and compares them with the date 2015-02-10.

Note: If your database data does not use a native date/time type, and instead, you store your timestamps as epoch milliseconds in a numeric value, then you should either compare numbers or use the TO_TIMESTAMP function.

Filter based on a timestamp (date and time).

Use the TIMESTAMP operator to convert a string into a date and time. The string should have the format YYYY-MM-DDTHH:MM:SS.

select c.type,c.name,c.dateLastUpdate,DATE_PART("year",c.dateLastUpdate) as year from CLM_BusStop as c

select c from CLM_BusStop as c WHERE c.CLM_BusStop.lastUpdateMillis > TO_TIMESTAMP(1446335999)

select c from CLM_BusStop as c WHERE c.CLM_BusStop.lastUpdate > TIMESTAMP('2018-11-06T12:00:00.000Z')


Function NOW()

This function returns the current system date, you can use it in the WHERE of your queries to bring data.

NOW(“format“,'unitTime', amount)

  • “format“: we will give the format that we need to date, for example "yyyy-MM-dd'T'HH: mm: ss'Z '".
  • 'unitTime': to date we can increase or decrease a number of hours, days, ... the possible values for 'unitTime' are: 'year', 'month', 'date', 'hour', 'minute', 'second', 'millisecond'.
  • "amount": is a positive or negative integer value that is added or subtracted depending on the unitTime selected.

Examples:

select * from MyOntology as c where c.date.timestamp>NOW()

Grouping

SQL on Mongo allows you to group data by fields and by date parts.

Group based on a single field.

Use GROUP BY to group results by a field.

Example:

SELECT c.age, COUNT(*) AS cnt
  FROM Users as c GROUP BY c.age

SELECT c.age, c.gender, COUNT(*) AS cnt
  FROM Users as c GROUP BY c.age, c.gender

Group based on date part.

Use the DATE_PART function to group by a part of a date, such as the month.

Example:

SELECT DATE_PART('day', c.ts) AS day, COUNT(*) AS cnt
  FROM Events as c
  GROUP BY DATE_PART('day', c.ts)

Filter within a group.

Filter results within a group by adding a HAVING clause followed by a Boolean predicate.

Example:

SELECT DATE_PART('day', c.ts) AS day, COUNT(*) AS cnt
  FROM Events as c
  GROUP BY DATE_PART('day', c.ts)
  HAVING c.gender = 'female'

Double grouping

Perform double-grouping operations by putting operators inside other operators. The inside operator will be performed on each group created by the GROUP BY clause, and the outside operator will be performed on the results of the inside operator.

Example:

This query returns the average population of states. The outer aggregation function (AVG) operates on the results of the inner aggregation (SUM) and GROUP BY clause.

SELECT AVG(SUM(p.pop)) FROM Population as p
  GROUP BY p.state

Nested Data and arrays

Unlike a relational database, many NoSQL databases allows data to be nested (that is, data can be objects) and to contain arrays.

Nesting

Nesting is represented by levels separated by a period (.).

Arrays

Array elements are represented by the array index in square brackets ([n]).

Example:

SELECT c.profile.allAddress[0].street.number
  FROM Users as c

Flattening

You can extract all elements of an array or all field values simultaneously, essentially removing levels and flattening the data. Use the asterisk in square brackets ([*]) to extract all array elements.

Example:

SELECT c.profile.allAddresses[*]
  FROM Users as c

Use the asterisk in curly brackets ({*}) to extract all field values.

Example:

SELECT c.profile.{*} FROM Users as c

Filtering using arrays

You can filter using data in all array elements by using the asterisk in square brackets ([*]) in a WHERE clause.

Example:

SELECT DISTINCT * FROM Users as c
  WHERE c.profile.allAddresses[*].street.number = '221B'

Pagination and Sorting

Pagination

Pagination is used to break large return results into smaller chunks. Use the LIMIT operator to set the number of results to be returned and the OFFSET operator to set the index at which the results should start.

Example (Limit results to 20 entries):

SELECT u FROM Users as u LIMIT 20

Example (Limit results to 20 entries empezando en el 10):

SELECT u FROM Users as u OFFSET 10 LIMIT 20

Sorting

Use the ORDER BY clause to sort the results. You can specify one or more fields for sorting, and you can use operators in the ORDER BY arguments. Use ASC for ascending sorting and DESC for decending sorting.

Example (Sort users by ascending age):

SELECT u 
  FROM Users as u
  ORDER BY u.age ASC

Joins

Use the JOIN operator to join different collections.

Examples:

This example returns the names of employees and the names of the departments they belong to by matching up the employee department ID with the department’s ID, where both IDs are ObjectID types.

SELECT emp.name, dept.name
  FROM Employee as emp
  JOIN Department as dept
  ON dept._id = emp.departmentId

Conditional and Nulls

Conditionals

Use the CASE expression to provide if-then-else logic to SQL². The CASE sytax is:

SELECT (CASE <field>
  WHEN <value1> THEN <result1>
  WHEN <value2> THEN <result2>
  ...
  [ELSE <elseResult>
  END)
FROM "<path>"

Example:

The following example translates string number values into actual numbers.

select

    (CASE c.Ticket.number

      WHEN 'one' THEN 1

      WHEN 'two' THEN 2

      END)

as nameParsed

from Ticket as c

Considerations on ID

MongoDB has special rules about fields called _id. For example, they must remain unique, which means that some queries (such as SELECT myarray[*] FROM foo) will introduce duplicates that MongoDB won’t allow. In addition, other queries change the value of _id (such as grouping).


To filter on `_id`, you must first convert a string to an object ID, by using the `OID` function. For example:

SELECT * FROM Ticket WHERE _id = OID( 'abc123')


By default, the `_id` field will not appear in a result set. However, you can specify it by selecting the `_id` field. For example:

SELECT _id, u AS email FROM users as u

Considerations on wildcard *

For performance reasons, we encourage you to avoid the use of the wildcard '*' on SELECTs, instead replace it with either projections or explicit fields:

SELECT * FROM Ticket  -> SELECT t FROM Ticket as t

SELECT * FROM Ticket  -> SELECT t.Ticket.name as name, t.Ticket.number as number FROM Ticket as t


Complex examples

Here we show some examples of queries that we've used in different projects:

select 
t.tempo, e.lotto, count(1)
from TempoDesc as t
inner join
(select
c.lotto, c.epochEnd - d.minStart as durataTotale
from
(select
min(epochStart) as minStart, lotto
from Eventi1
group by lotto) as d
inner join Eventi1 as c
on c.lotto=d.lotto ) as e
on e.durataTotale <= t.tempo
group by t.tempo, e.lotto
select 
      re.countrysrc,re.countrydest,re.count, iso.ISO3166.latitude , iso.ISO3166.longitude
     from
       ( select rx.routesexten.countrysrc as countrysrc, rx.routesexten.countrydest as countrydest, count(re.routesexten.countrysrc) as count
from routesexten as rx
group by rx.routesexten.countrysrc, rx.routesexten.countrydest
order by count desc) as re
     inner join
        ISO3166_1 As iso on re.countrydest = iso.ISO3166.name
     order by
     re.count desc
     limit 10

UPDATE Operations


You can also perform UPDATE operations in SQL. Here we will explain the peculiarities of this type of SQL directives.

Simple syntax

As in standard SQL, the update directive is built as follows:

UPDATE column SET attributes WHERE attributesFilter


Here are some examples of basic Update operations:

Let the Restaurants ontology be our working example with the following structure:

update Restaurants set Restaurant.borough="Queens" where Restaurant.borough="Manhattan"

update Restaurants set Restaurant.borough="Queens", Restaurant.restaurant_id=5400 where Restaurant.borough="Manhattan" and Restaurant.borough="Dj Reynolds Pub And Restaurant"

ObjectId

If you want to filter by Mongo ObjectId, you don't need to use any special function, you just need to indicate the quoted value of the ObjectId, for example:

update Restaurants set Restaurant.borough="Manhattan", Restaurant.restaurant_id=5400 where _id="5d14883e29a91330a9625d0c"

ISODates

As for the ObjectId, if you want work with Mongo dates, you just need to put the quoted value of the date in ISODate Format (yyyy-MM-dd'T'HH:mm:ss.SSS'Z'):

update ThermometerData set ThermometerData.value=60 where ThermometerData.timestamp="2019-06-26T13:57:01.000Z"

The compiled query:

 Result:

Booleans

If you want to set or filter boolean values, you have to use the SQL function Boolean() as follows:

update ThermometerData set ThermometerData.bool=Boolean(true) where ThermometerData.timestamp="2019-06-26T13:57:01.000Z"

Arrays

If you want to add elements to an array field, you have to use the APPEND() function.

APPEND(i, value) accepts two parameters:

  • i: This is optional (default is last position in array). It indicates the position where the value is going to be pushed.
  • value: This is the value being pushed to the array, it can be an object, string, number... This parameter can be also an array of values.

For example:

update ThermometerData set ThermometerData.array=APPEND("{\"Property\":\"value\"}")

update ThermometerData set ThermometerData.array=APPEND([true, false])

update ThermometerData set ThermometerData.array=APPEND([true, false, 45, "somestring", "2019-06-26T13:57:01.000Z"])

The result is:


Important note: If you want to push an ISODate (Mongo date) to the array, you want to surround the isodate value by ISODate function:

update ThermometerData set ThermometerData.array=APPEND([true, false, 45, "somestring", ISODate("2019-06-26T13:57:01.000Z")])

 


DELETE operations

Delete directive works as in standard SQL.

DELETE FROM column WHERE attributes

ObjectId, ISODate, and Boolean operations works as with the updates. You can't use any type of array operations yet in this directive.

Examples:

DELETE FROM Restaurants WHERE _id="5d14883e29a91330a9625d0c"

DELETE FROM Restaurants WHERE timestamp="2019-06-26T13:57:01.000Z"

DELETE FROM Restaurants WHERE isClosed=Boolean(true)