A Trail Guide To GemFire OQL: Object Query Language
Overview
VMware GemFire supports a query language known as Object Query Language (OQL). OQL was originally developed between 1993 and 2001 by the Object Data Management Group (ODMG) for object databases. The most recent specification for OQL (ODMG 3.0) was published by the ODMG in book form¹. OQL was later used as the basis for the query language used in the Java Data Objects (JDO) specification, known as JDOQL.
For more information about the particular dialect of OQL used in GemFire, please consult the Querying section of the GemFire Documentation for the specific version of GemFire that you are using. This article assumes GemFire 10.0, but most of the information here also applies to earlier versions.
This article is meant as a supplement to the GemFire documentation, to highlight some of the unique aspects of OQL and provide some guidance on writing queries in OQL with GemFire.
Footnote 1: The Object Data Standard: ODMG 3.0. Edited by R.G.G. Cattell and Douglas K. Barry, with contributions by Mark Berler, Jeff Eastman, David Jordan, Craig L. Russell, Olaf Schadow, Torsten Stanienda, and Fernando Velez. Morgan Kaufmann Publishers, Inc., 2000. ISBN 1-55860-647-5.
Differences between SQL and OQL
The basic advantages of OQL are given in the GemFire docs as:
- You can query on any arbitrary object
- You can navigate object collections
- You can invoke methods and access the behavior of objects
- Data mapping is supported (i.e. map/dictionary lookups, and mapping from region values to region keys using indexes); a map lookup sketch follows this list
- You are not required to declare types. Since you do not need type definitions, you can work across multiple languages
- You are not constrained by a schema
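To illustrate the data mapping point above, here is a minimal sketch of a map/dictionary lookup in OQL. The region and field names are hypothetical: it assumes a region /offices whose values have a map-valued locations field keyed by office name.

SELECT * FROM /offices o WHERE o.locations['hq'].state = 'OR'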
Composable Expressions
A basic difference between OQL and SQL is that OQL consists entirely of expressions, including the SELECT expression, which is the core expression for querying.
Since SELECT is an expression and not a statement, it can be embedded in more complex expressions anywhere a collection value is needed. This makes the language highly composable: expressions can be nested within each other, as long as the required (runtime) types are in alignment.
As an example of the flexibility this composability provides, consider a query that gets the number of employees whose manager has a lower salary than they do. Assuming an employees region with employee objects that have a managerId attribute, and given that a select expression can be used anywhere a collection is expected, you could evaluate an OQL query such as:
select count(*)
from /employees as emp
where emp.managerId != 0
and emp.salary >
element(select mgr.salary
from /employees as mgr
where emp.managerId = mgr.emplNumber)
The ELEMENT function applies to a collection that is known to have exactly one element and extracts that single element from it.
Here are some other expressions that are valid in OQL, which would not be valid in SQL:
'Hello World'
-- evaluates to Hello World
2 + 2
-- evaluates to 4
ELEMENT(SELECT * FROM /employees WHERE emplNumber=10006).hoursPerWeek >= 30
-- evaluates to true (or false)
Method invocation on Objects
Another difference from SQL is that OQL allows methods, simple attributes (which can map to “getter” methods), and arbitrarily complex chains of method calls and attributes to be evaluated on objects. In Java, a simple attribute written in dot notation, such as myObject.x, is resolved to either a public field or a call to a getter method such as myObject.getX(). Methods that take parameters can also be invoked in OQL.
Care should be taken when working with methods implemented in languages like Java where arbitrary methods (and even “getters”) can cause data mutation and other side effects. Since OQL is declarative, the order of expression evaluation is not always predictable or prescribed; it works best when the methods invoked in the query language are pure “read-only” functions, returning values without causing side effects.
Evaluating queries in a REPL
This article includes some sample query expressions for demonstration purposes. You can evaluate these queries yourself by using the GemFire command-line tool gfsh as a “REPL” (Read-Evaluate-Print-Loop) tool.
The syntax for executing a query in gfsh is:
query --query=<Select_Expression>
Note (1): When evaluating OQL expressions in gfsh, you are currently limited to SELECT expressions that query GemFire regions, so the expression examples given earlier won’t work in gfsh, but they do work when using the QueryService API in a server (peer) application. This may be improved in a future version of GemFire.
Note (2): The limitation to SELECT expressions also applies when querying partitioned regions, whereas other types of expressions can be used when querying replicated (or local) regions.
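For reference, here is a minimal sketch of evaluating an arbitrary expression with the QueryService API in a server (peer) application. The class and method names are hypothetical, and the package names assume the org.apache.geode-based GemFire API:

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.query.Query;
import org.apache.geode.cache.query.QueryService;

public class ExpressionDemo {
  // Evaluates an arbitrary OQL expression (not just SELECT) via the QueryService
  public static Object evaluate(Cache cache, String oql) throws Exception {
    QueryService queryService = cache.getQueryService();
    Query query = queryService.newQuery(oql); // e.g. "2 + 2" or an ELEMENT(SELECT ...) expression
    return query.execute();                   // returns the value the expression evaluates to
  }
}

For example, evaluate(cache, "2 + 2") would return 4.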
Setting up an example cluster
If you would like to follow along with evaluating these example queries, here are step-by-step instructions using gfsh to start a GemFire cluster and populate the example data. After this is done you will be able to evaluate the queries shown in the rest of this article.
1. If you haven’t already done so, install GemFire and run gfsh to get the gfsh prompt:
% gfsh
gfsh>
2. Start a locator:
gfsh> start locator --name=locator
3. Start a server:
gfsh> start server --name=server1
4. Compile the (3) domain classes (see the source code at the end of this article) into a jar file, say employees.jar, then deploy the jar to the cluster:
gfsh> deploy --jars=employees.jar
5. Run the query.gfsh script. The source for this script is also given at the end of this article. It creates the employees and departments regions and populates them with sample data.
gfsh> run --file="query.gfsh"
You can test that the data is there with:
gfsh> query --query="select * from /employees"
You should see output showing a table with 14 rows of employee data, something like:
firstName | lastName | emplNumber | email | salary | hoursPerWeek | managerId | deptId
--------- | ---------- | ---------- | ---------------------------- | ------ | ------------ | --------- | ------
"Frankie" | "Forth" | 10004 | "Frankie.Forth@example.com" | 100000 | 30 | 0 | 101
"Jamie" | "Jive" | 10005 | "Jamie.Jive@example.com" | 60000 | 20 | 0 | 102
"Pat" | "Puts" | 10007 | "Pat.Puts@example.com" | 75000 | 40 | 0 | 101
… (etc)
Likewise, test that the /departments data is there with:
gfsh> query --query="select * from /departments"
You should see a table with 3 department rows:
deptId | name | location
------ | ----------------- | ---------------------------------
101 | "Human Resources" | {"city":"Beaverton","state":"OR"}
100 | "Accounting" | {"city":"New York","state":"NY"}
102 | "Engineering" | {"city":"Beaverton","state":"OR"}
Elements of a SELECT expression
The SELECT expression has the basic form:
SELECT projectionList FROM fromClause [WHERE booleanExpression]

projectionList := (* | projection {, projection})
projection := attribute | expr | aggregationExpr
aggregationExpr := aggOp( projection )
aggOp := MIN | MAX | SUM | AVG | COUNT
fromClause := collectionExpr {, collectionExpr}
By convention, this grammar definition includes these symbols:
:= : a grammar production rule showing a breakdown of syntactical parts
{ } : zero or more repetitions
( ) : a group of elements
| : alternatives.
A collectionExpr is any expression that evaluates to a value of type collection, and a booleanExpression is any expression that evaluates to a value of type boolean.
Note: This is a simplified form of a select expression, as it can also be followed by ORDER BY and/or GROUP BY clauses. This grammar also doesn’t show the optional binding of expressions to identifiers (with or without AS). The complete grammar of GemFire OQL can be found in the current GemFire documentation under the heading Query Language Grammar.
Subqueries
A select expression itself evaluates to a value of type collection, so it is possible to nest select expressions within select expressions, as long as the types line up. Some examples follow (the backslash \ denotes line continuation in gfsh).
QUERY (1)
Select expression used in an IN operator, in a WHERE clause.
Gets the names of the employees in departments located in the state of New York.
SELECT firstName, lastName FROM /employees
WHERE deptId IN (SELECT dept.deptId FROM /departments dept
WHERE dept.location.state = 'NY')
gfsh> query --query="\
SELECT firstName, lastName FROM /employees \
WHERE deptId IN (SELECT dept.deptId FROM /departments dept \
WHERE dept.location.state = 'NY')"
Result : true
Limit : 100
Rows : 5
firstName | lastName
--------- | --------
Morgan | Minnow
Taylor | Tack
Dale | Driver
Alex | Able
Ryan | Redo
Query (2)
Select expression used in a FROM clause.
Gets the average of the department average salaries.
SELECT AVG(average_salary) FROM
(SELECT AVG(salary) AS average_salary, deptId
FROM /employees
GROUP BY deptId)
gfsh> query --query="\
SELECT AVG(average_salary) FROM \
(SELECT AVG(salary) AS average_salary, deptId \
FROM /employees \
GROUP BY deptId)"
Result : true
Limit : 100
Rows : 1
Result
-----------------
79416.66666666667
Query (3)
Select expression used in a projection.
Calculates the difference between salary and average salary.
SELECT emplNumber, firstName, lastName, salary,
(salary - ELEMENT(SELECT AVG(e.salary)
FROM /employees e)) AS diff
FROM /employees
gfsh> query --query=" \
SELECT emplNumber, firstName, lastName, salary, \
(salary - ELEMENT(SELECT AVG(e.salary) \
FROM /employees e)) AS diff \
FROM /employees"
Result : true
Limit : 100
Rows : 14
emplNumber | firstName | lastName | salary | diff
---------- | --------- | -------- | ------ | -------------------
10001 | Bertie | Bell | 80000 | 357.14285714285506
10010 | Casey | Catch | 60000 | -19642.857142857145
(etc.)
Query (4)
Method invocation without arguments.
Shows hourly wage in USD by calling the hourlyWage() method on Employee and returns name and hourly wage ordered by last name.
Note that a method call without any arguments can be written as an attribute without parentheses.
SELECT lastName, firstName, hourlyWage
FROM /employees
ORDER BY lastName
gfsh> query --query="\
select lastName , firstName, hourlyWage \
from /employees order by lastName"
Result : true
Limit : 100
Rows : 14
lastName | firstName | hourlyWage
-------- | --------- | ----------
Able | Alex | 28.85
Bell | Bertie | 38.46
(etc.)
Query (5)
Method invocation with argument.
Shows the hourly wage in EUR by calling the hourlyWage(double) method on Employee with a currency conversion factor, returning the name and hourly wage in EUR, ordered by last name.
SELECT lastName, firstName, hourlyWage(0.91) AS hourlyWageInEuros
FROM /employees
ORDER BY lastName
gfsh> query --query="\
select lastName , firstName, \
hourlyWage(0.91) AS hourlyWageInEuros \
from /employees order by lastName"
Result : true
Limit : 100
Rows : 14
lastName | firstName | hourlyWageInEuros
-------- | --------- | -----------------
Able | Alex | 26.25
Bell | Bertie | 35.0
(etc.)
Joins
Queries that read from more than one region, or the same region more than once (a self-join) can often be simplified and in some cases evaluated more efficiently by using joins instead of subqueries. In OQL, a join between regions is indicated by referencing more than one region in the FROM clause.
A basic join between the employees and departments regions:
SELECT
e.firstName, e.lastName, d.name as deptName,
d.location.city as deptCity, d.location.state as deptState
FROM /employees e, /departments d
WHERE e.deptId = d.deptId
gfsh> query --query="\
SELECT \
e.firstName, e.lastName, d.name as deptName, \
d.location.city as deptCity, d.location.state as deptState \
FROM /employees e, /departments d \
WHERE e.deptId = d.deptId"
Result : true
Limit : 100
Rows : 14
firstName | lastName | deptName | deptCity | deptState
--------- | -------- | --------------- | --------- | ---------
Ryan | Redo | Accounting | New York | NY
Kris | Call | Engineering | Beaverton | OR
Skyler | Skip | Human Resources | Beaverton | OR
(etc.)
Query (1) shown above could be rewritten as a join:
SELECT emp.firstName, emp.lastName FROM /employees emp, /departments dept \
WHERE emp.deptId = dept.deptId AND dept.location.state = 'NY'
Object Serialization in Regions
GemFire supports the PDX (Portable Data Exchange) format for storing domain objects or JSON documents in serialized form in a region, and BSON as another format for JSON objects. These objects can be queried like any other objects using OQL, with the added advantage that the entire object does not need to be deserialized in order to be queried; only the fields the query needs are read.
Using PDX serialization also enables read and query access for REST clients.
For more information about PDX serialization, JSON documents, and the REST API, see the GemFire Documentation.
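As a minimal sketch of storing a JSON document so that it can be queried with OQL, the following uses the JSONFormatter from the org.apache.geode.pdx package. The region, key, and JSON values are hypothetical:

import org.apache.geode.cache.Region;
import org.apache.geode.pdx.JSONFormatter;
import org.apache.geode.pdx.PdxInstance;

public class JsonDocumentExample {
  // Converts a JSON document to a PdxInstance and stores it in a region;
  // OQL can then query its fields without deserializing the whole document.
  public static void putJsonDepartment(Region<Integer, PdxInstance> departments) {
    String json = "{\"deptId\": 103, \"name\": \"Sales\","
        + " \"location\": {\"city\": \"Denver\", \"state\": \"CO\"}}";
    departments.put(103, JSONFormatter.fromJSON(json));
  }
}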
Optimizing OQL Queries
The examples in this section are shown as gfsh commands, but they are illustrative only and may require additional data or commands to execute without error.
Indexing
Types of indexes and when to use them
Range Index
Example gfsh command to create:
gfsh> create index --name=myIndex --expression=location.coordinates.latitude --region=/exampleRegion
This is the default type of index; if you don’t specify an index type in gfsh, it defaults to a range index. Range indexes support comparisons, both equality and inequality (=, !=, <, <=, >, >=). These indexes used to be called functional indexes because the index is created on an arbitrary OQL expression that is evaluated on the values of the region; the expression defines a function on the region values. The expression is not limited to a single attribute or field: it can be a chained dot-notation path that drills down into the object, a map index, or a method invocation with specific parameters. It uses the composable expression syntax of OQL to define a function. Generally speaking, a query will only use the index if it contains precisely the same expression that the index was created with.
For example, with an index created on the expression locations['hq'].coordinates.latitude, the following query would be supported by the index:
SELECT * FROM /exampleRegion WHERE locations['hq'].coordinates.latitude >= 42
Key Index
Example gfsh command to create:
gfsh> create index --name=myKeyIndex --expression=orderId --region=region1 --type=key
A key index maps region values back to region keys, enabling the query engine to recognize when a query restricts the values to specific region keys (using equality on the key expression). This is useful when querying partitioned regions, as it enables the query to be sent only to the partition where the data resides. For example, if the key expression is orderId, then the following query could be routed to the specific partition:
SELECT * FROM /Orders WHERE orderId = '12345' AND amount > 1000
Using index hints
Index hints can be embedded into the query string to give preference to one or more specific indexes by name. This can help to optimize query evaluation when multiple indexes could be used for a query but the query developer knows that a particular index will prune the results much faster than the others. A query hint looks like:
<HINT 'IdIndex'> SELECT * FROM /Portfolios p WHERE p.ID > 10 AND p.owner = 'XYZ'
Pruning Results
When a query combines multiple filters with AND conditions, the order of the filters matters: placing the most selective filter (ideally one backed by an index) first produces a smaller intermediate result set for the remaining filters to evaluate, pruning results earlier.
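For example, assuming an index on deptId (hypothetical here), the indexed equality filter is placed first so that the salary comparison only runs against that department’s employees:

SELECT * FROM /employees e WHERE e.deptId = 101 AND e.salary > 70000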
Using Bind Parameters
Performance may be improved for queries that are reused multiple times by using bind parameters and re-evaluating the same query each time with different parameter values. Bind parameters can be used anywhere in an OQL query that a value is expected. The placeholder for a bind parameter starts with a dollar sign ($) followed by the sequence number of the parameter, starting with $1. To pass parameters to a query with bind parameters, use the Query.execute(Object[]) method, passing in the parameters in order.
The following example query uses a bind parameter in the FROM clause as well as a comparison operand in the WHERE clause. It could be reused for multiple regions and different datetime comparisons:
SELECT * FROM $1 WHERE datetime > $2
When executed, the Region itself and a java.sql.Timestamp for comparison would be passed to the execute method:
// Obtain the region, comparison timestamp, and QueryService (e.g. cache.getQueryService())
Region r = ...;
java.sql.Timestamp ts = ...;
QueryService qs = ...;
Query query = qs.newQuery("SELECT * FROM $1 WHERE datetime > $2");
// $1 is bound to the region and $2 to the timestamp, in parameter order
query.execute(r, ts);
Debugging OQL Queries
Tracing queries
When the <trace> tag is added to a query string, some information about the query will be logged to system.log. This can help in diagnosing why a query might be slow, or show how much time a query actually takes to be evaluated. Reported information includes the timestamp when the query finished being evaluated, how much time it took to evaluate, how many rows were in the result, and how many indexes were used in the query evaluation. Adding the trace tag to a query looks like:
<TRACE> select * from /exampleRegion
Logging in methods
For debugging purposes, adding logging to methods that are invoked in a query can be helpful. Since methods can take parameters, logging named values or the results of expression evaluation can provide insight into what values expressions are being evaluated to, whether they are being evaluated at all, how many times they are being evaluated, and so on.
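As a minimal, illustrative sketch (using java.util.logging to avoid extra dependencies), the hourlyWage(double) method from the Employee class in the appendix could log each invocation. The LOGGER field and the log line are additions to that class:

import java.util.logging.Logger;

// Added inside the Employee class (see the appendix)
private static final Logger LOGGER = Logger.getLogger(Employee.class.getName());

public double hourlyWage(final double usdCurrencyConversion) {
  // Log each invocation so the server log shows when and how often queries call this method
  LOGGER.info(() -> "hourlyWage(" + usdCurrencyConversion + ") called for emplNumber=" + emplNumber);
  final double fullTimeEquivalent = (double) hoursPerWeek / 40;
  return Math.round(salary / 52.0 / (40.0 * fullTimeEquivalent) * usdCurrencyConversion * 100.0)
      / 100.0;
}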
Query-related Statistics
GemFire collects various statistics on virtually everything happening inside the cluster, and querying is no exception. Statistics related to querying can also provide insight that could help with diagnosing issues. These statistics can be viewed visually using the VSD (Visual Statistics Display) utility that comes with GemFire.
Just to give you an idea, statistics that are specifically related to querying include:
CachePerfStats
- queryExecutions
- queryExecutionTime
CacheServerStats
- processQueryTime
- queryRequests
- queryResponses
- readQueryRequestTime
- writeQueryResponseTime
ClientStats
- querys
- queryFailures
- querySuccesses
- querysInProgress
- queryTime
- queryTimeouts
IndexStats
- numKeys
- numUpdates
- numValues
- updatesInProgress
- updateTime
- numUses
- usesInProgress
- useTime
For specific explanations for the meaning of these statistics, and for more information on using VSD, see the GemFire documentation for the version of GemFire that you are using.
Appendix: Source Code
gfsh script
query.gfsh
Run this in gfsh after the cluster is started (e.g. a locator and the REPL tool running as a peer), to create and populate the regions with data.
connect
create region --name=employees --type=REPLICATE --if-not-exists
create region --name=departments --type=REPLICATE --if-not-exists
put --region=employees --key=10000 --value="('firstName':'Alex','lastName':'Able','emplNumber':10000,'email':'Alex.Able@example.com','salary':60000,'hoursPerWeek':40,'deptId':100)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10001 --value="('firstName':'Bertie','lastName':'Bell','emplNumber':10001,'email':'Bertie.Bell@example.com','salary':80000,'hoursPerWeek':40,'managerId':10000,'deptId':101)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10002 --value="('firstName':'Kris','lastName':'Call','emplNumber':10002,'email':'Kris.Call@example.com','salary':75000,'hoursPerWeek':40,'deptId':102)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10003 --value="('firstName':'Dale','lastName':'Driver','emplNumber':10003,'email':'Dale.Driver@example.com','salary':90000,'hoursPerWeek':40,'deptId':100)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10004 --value="('firstName':'Frankie','lastName':'Forth','emplNumber':10004,'email':'Frankie.Forth@example.com','salary':100000,'hoursPerWeek':30,'deptId':101)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10005 --value="('firstName':'Jamie','lastName':'Jive','emplNumber':10005,'email':'Jamie.Jive@example.com','salary':60000,'hoursPerWeek':20,'deptId':102)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10006 --value="('firstName':'Morgan','lastName':'Minnow','emplNumber':10006,'email':'Morgan.Minnow@example.com','salary':80000,'hoursPerWeek':40,'deptId':100)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10007 --value="('firstName':'Pat','lastName':'Puts','emplNumber':10007,'email':'Pat.Puts@example.com','salary':75000,'hoursPerWeek':40,'deptId':101)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10008 --value="('firstName':'Ricky','lastName':'Reliable','emplNumber':10008,'email':'Ricky.Reliable@example.com','salary':90000,'hoursPerWeek':40,'deptId':102)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10009 --value="('firstName':'Taylor','lastName':'Tack','emplNumber':10009,'email':'Taylor.Tack@example.com','salary':100000,'hoursPerWeek':40,'deptId':100)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10010 --value="('firstName':'Casey','lastName':'Catch','emplNumber':10010,'email':'Casey.Catch@example.com','salary':60000,'hoursPerWeek':30,'deptId':101)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10011 --value="('firstName':'Jessie','lastName':'Jam','emplNumber':10011,'email':'Jessie.Jam@example.com','salary':80000,'hoursPerWeek':20,'deptId':102)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10012 --value="('firstName':'Ryan','lastName':'Redo','emplNumber':10012,'email':'Ryan.Redo@example.com','salary':75000,'hoursPerWeek':40,'deptId':100)" --value-class=com.vmware.query.blog.Employee
put --region=employees --key=10013 --value="('firstName':'Skyler','lastName':'Skip','emplNumber':10013,'email':'Skyler.Skip@example.com','salary':90000,'hoursPerWeek':40,'deptId':101)" --value-class=com.vmware.query.blog.Employee
put --region=departments --key=100 --value="('deptId':100,'name':'Accounting','location':{'city':'New York','state':'NY'})" --value-class=com.vmware.query.blog.Department
put --region=departments --key=101 --value="('deptId':101,'name':'Human Resources','location':{'city':'Beaverton','state':'OR'})" --value-class=com.vmware.query.blog.Department
put --region=departments --key=102 --value="('deptId':102,'name':'Engineering','location':{'city':'Beaverton','state':'OR'})" --value-class=com.vmware.query.blog.Department
Domain Classes
The domain classes need to be deployed to the servers in the cluster.
Employee.java
package com.vmware.query.blog;
import java.io.Serializable;
import java.util.StringJoiner;
public class Employee implements Serializable {
public String firstName;
public String lastName;
public int emplNumber;
public String email;
public int salary;
public int hoursPerWeek;
public int managerId;
public int deptId;
public Employee() {}
@Override
public String toString() {
return new StringJoiner(", ", Employee.class.getSimpleName() + "[", "]")
.add("firstName='" + firstName + "'")
.add("lastName='" + lastName + "'")
.add("emplNumber=" + emplNumber)
.add("email='" + email + "'")
.add("salary=" + salary)
.add("hoursPerWeek=" + hoursPerWeek)
.add("managerId=" + managerId)
.add("deptId=" + deptId)
.toString();
}
/** @return hourlyWage in USD rounded to the nearest cent. */
public double hourlyWage() {
return hourlyWage(1.0);
}
/**
* Return hourlyWage given a currency conversion factor, rounded to nearest 0.01.
* @param usdCurrencyConversion factor for target currency relative to USD
* @return hourlyWage in target currency
*/
public double hourlyWage(final double usdCurrencyConversion) {
final double fullTimeEquivalent = (double)hoursPerWeek / 40;
return Math.round(salary / 52.0 / (40.0 * fullTimeEquivalent) * usdCurrencyConversion * 100.0)
/ 100.0;
}
}
Department.java
package com.vmware.query.blog;
import java.io.Serializable;
import java.util.StringJoiner;
public class Department implements Serializable {
public int deptId;
public String name;
public Location location;
public Department() { }
@Override
public String toString() {
return new StringJoiner(", ", Department.class.getSimpleName() + "[", "]")
.add("deptId=" + deptId)
.add("name='" + name + "'")
.add("location=" + location)
.toString();
}
}
Location.java
package com.vmware.query.blog;
import java.io.Serializable;
import java.util.StringJoiner;
public class Location implements Serializable {
public String city;
public String state;
@Override
public String toString() {
return new StringJoiner(",")
.add(city)
.add(state)
.toString();
}
}