LibreOffice Database Handbook 5
Queries
General information on queries
Queries to a database are the most powerful tool that we have to use databases in a practical way.
They can bring together data from different tables, calculate results where necessary, and quickly
filter a specific record from a mass of data. The large Internet databases that people use every day
exist mainly to deliver a quick and practical result for the user from a huge amount of information
through a thoughtful selection of keywords, including the search-related advertisements that
encourage people to make purchases.
Entering queries
Queries can be entered both in the GUI and directly as SQL code. In both cases a window opens,
where you can create a query and also correct it if necessary.
From the tables available, select the Loan table. This window allows multiple tables (and also
views and queries) to be combined. To select a table, click its name and then click the Add button.
Alternatively, double-click the table's name. Either method adds the table to the graphical area of the Query
Design dialog.
When all necessary tables have been selected, click the Close button. Additional tables and
queries can be added later if required. However, no query can be created without at least one table,
so a selection must be made at the beginning.
The selected field designation Loan.* has a special meaning. Here one click allows you to add all
fields from the underlying table to the query. When you use this field designation with the wildcard
* for all fields, the query becomes indistinguishable from the table.
In the above test, special attention should be paid to the first column of the query result. The active
record marker (green arrow) always appears on the left side of the table, here pointing to the first
record as the active record. While the first field of the first record in Figure 43 is highlighted, the
corresponding field in Figure 44 shows only a dashed border. The highlight indicates that this field
can be modified. The records, in other words, are editable. The dashed border indicates that this
field cannot be modified. Figure 43 also contains an extra line for the entry of a new record, with
the ID field already marked as <AutoField>. This also shows that new entries are possible.
Tip
A basic rule is that no new entries are possible if the primary key in the queried
table is not included in the query.
The Loan_Date and Return_Date fields are given aliases. This does not cause them to be
renamed but only to appear under these names for the user of the query.
The table view above shows how the aliases replace the actual field names.
The Return_Date field is given not just an alias but also a search criterion, which will cause only
those records to be displayed for which the Return_Date field is empty. (Enter IS EMPTY in the
Criterion row of the Return_Date field.) This exclusion criterion will cause only those records to be
displayed that relate to media that have not yet been returned from loan.
Here the SQL formula created by our previous choices is revealed. To make it easier to read, some
line breaks have been included. Unfortunately the editor does not store these line breaks, so when
the query is called up again, it will appear as a single continuous line breaking at the window edge.
SELECT begins the selection criteria. AS specifies the field aliases to be used. FROM shows the
table which is to be used as the source of the query. WHERE gives the conditions for the query,
namely that the Return_Date field is to be empty (IS NULL). ORDER BY defines the sort criteria,
namely ascending order (ASC) for the two fields Reader_ID and Loan date. This sort
specification illustrates how the alias for the Loan_Date field can be used within the query itself.
Tip
When working in Design View Mode, use IS EMPTY to require that a field be empty.
When working in SQL Mode, use IS NULL, which is what SQL (Structured Query
Language) requires.
When you want to sort in descending order using SQL, use DESC instead of ASC.
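The clauses described above can be tried outside Base. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the HSQLDB engine embedded in Base. The sample rows are hypothetical; only the table name, field names and alias come from the text.

```python
import sqlite3

# In-memory database with a simplified Loan table (sample rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Loan" ("ID" INTEGER PRIMARY KEY, "Media_ID" INTEGER, '
    '"Reader_ID" INTEGER, "Loan_Date" TEXT, "Return_Date" TEXT)'
)
conn.executemany(
    'INSERT INTO "Loan" VALUES (?, ?, ?, ?, ?)',
    [
        (0, 0, 1, "2012-03-05", None),
        (1, 1, 0, "2012-03-01", None),
        (2, 2, 0, "2012-02-10", "2012-02-20"),  # already returned
        (3, 3, 0, "2012-02-15", None),
    ],
)

# AS sets the alias, WHERE ... IS NULL keeps unreturned media,
# ORDER BY sorts by reader and then by the alias "Loan date".
query = '''
    SELECT "ID", "Media_ID", "Reader_ID", "Loan_Date" AS "Loan date"
    FROM "Loan"
    WHERE "Return_Date" IS NULL
    ORDER BY "Reader_ID" ASC, "Loan date" ASC
'''
open_loans = conn.execute(query).fetchall()
for row in open_loans:
    print(row)
```

Replacing ASC with DESC in the ORDER BY clause reverses the corresponding sort direction.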
So far the Media_ID and Reader_ID fields are only visible as numeric fields. The readers' names
are unclear. To show these in a query, the Reader table must be included. For this purpose we
return to Design Mode. Then a new table can be added to the Design view.
If a link is absent, it can be created at this point by dragging the mouse from "Loan"."Reader_ID" to
"Reader"."ID".
Now fields from the Reader table can be entered into the tabular area. The fields are initially added
to the end of the query.
The position of the fields can be corrected in the tabular area of the editor using the mouse. So for
example, the First_name field has been dragged into position directly before the Loan_date field.
Now the names are visible. The Reader_ID has become superfluous. Also sorting by Surname and
First_name makes more sense than sorting by Reader_ID.
This query is no longer suitable for use as a query that allows new entries into the resulting table,
since it lacks the primary key for the added Reader table. Only if this primary key is built in does
the query become editable again. In fact it then becomes completely editable, so that the readers'
names can also be altered. For this reason, making query results editable is a facility that should
be used with extreme caution, if necessary under the control of a form.
Caution
Having a query that you can edit can create problems. Editing data in the query also
edits data in the underlying table and the records contained in the table. The data
may not have the same meaning. For example, change the name of the reader, and
you have also changed what books the reader has borrowed and returned.
If you have to edit data, do so in a form so you can see the effects of editing data.
If we now switch back to SQL View, we see that all fields are now shown in double quotes together
with their table names: "Table_name"."Field_name". This is necessary so that the database knows
which table the previously selected fields come from. After all, fields in different tables can easily
have the same field names. In the above table structure this is particularly true of the ID field.
The following query works without putting table names in front of the field names:
SELECT "ID", "Number", "Price" FROM "Stock", "Dispatch"
WHERE "Dispatch"."stockID" = "Stock"."ID"
Note
Here the ID is taken from the table which comes first in the FROM definition. The
table definition in the WHERE formula is also superfluous, because stockID only
occurs once (in the Dispatch table) and ID was clearly taken from the Stock table
(from the position of the table in the query).
If a field in the query has an alias, it can be referred to for example in sorting by this alias
without a table name being given. Sorting is carried out in the graphical user interface according to
the sequence of fields in the tabular view. If instead you want to sort first by "Loan date" and then
by "Loan"."Reader_ID", that can be done if:
The sequence of fields in the table area of the graphical user interface is changed (drag
and drop "Loan date" to the left of "Loan"."Reader_ID"), or
An additional field is added, set to be invisible, just for sorting (however, the editor will
register this only temporarily if no alias was defined for it) [add another "Loan date" field just
before "Loan"."Reader_ID" or add another "Loan"."Reader_ID" field just after "Loan date"],
or
The text for the ORDER BY command in the SQL editor is altered correspondingly (ORDER
BY "Loan date", "Loan"."Reader_ID").
Specifying the sort order may not be completely error-free, depending on the LibreOffice version.
From version 3.5.3, sorting from the SQL view is correctly registered and displayed in the graphical
user interface, including the fields that are used in the query but are not visible in the query output.
(These fields do not have a check in the Visible row.)
Tip
A query may require a field that is not part of the query output. In the graphic in the
next section, Return_Date is an example. This query is searching for records that do
not contain a return date. This field provides a criterion for the query but no useful
visible data.
The result of the query shows that Reader_ID '0' has a total of 3 media on loan. If the Count
function had been assigned to the Return_Date instead of the ID, every Reader_ID would have '0'
media on loan, since the query only selects records in which Return_Date is NULL and the Count
function ignores NULL fields.
The corresponding Formula in SQL code is shown above.
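The difference between counting the ID and counting the Return_Date can be reproduced with a small test. This is a sketch only, assuming Python's built-in sqlite3 module in place of HSQLDB and hypothetical sample rows. It relies on the standard SQL rule that COUNT applied to a field skips NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Loan" ("ID" INTEGER PRIMARY KEY, '
    '"Reader_ID" INTEGER, "Return_Date" TEXT)'
)
# Hypothetical rows: reader 0 has three open loans, reader 1 has one.
conn.executemany(
    'INSERT INTO "Loan" VALUES (?, ?, ?)',
    [(0, 0, None), (1, 0, None), (2, 0, None), (3, 1, None),
     (4, 1, "2012-02-20")],
)

# COUNT("ID") counts every selected record of a reader,
# COUNT("Return_Date") counts only records where the field is not NULL.
counts = conn.execute(
    'SELECT "Reader_ID", COUNT("ID"), COUNT("Return_Date") '
    'FROM "Loan" WHERE "Return_Date" IS NULL '
    'GROUP BY "Reader_ID" ORDER BY "Reader_ID"'
).fetchall()
print(counts)
```

Because the WHERE condition keeps only records without a return date, the third column is 0 for every reader.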
Altogether the graphical user interface provides the following functions, which
correspond to functions in the underlying HSQLDB.
For an explanation of the functions, see Query enhancement using SQL Mode
on page 134.
A somewhat free translation would be: The following expression contains no aggregate function or
grouping.
Tip
When using Design View Mode, a field is only visible if the Visible row contains a
check mark for the field. When using SQL Mode, a field is only visible when it follows
the keyword SELECT.
Note
When a field is not associated with a function, the number of rows in the query
output is determined by the search conditions. When a field is associated with a
function, the number of rows in the query output is determined by whether there is
any grouping or not. If there is no grouping, there is only one row in the query
output. If there is grouping, the number of rows matches the number of distinct
values that the grouping field has. So, all of the visible fields must either be
associated with a function or not be associated with a function to prevent this
conflict in the query output.
After this, the complete query is listed in the error message, but unfortunately
without the offending field being named specifically. In this case the field
Return_Date has been added as a displayed field. This field has no function
associated with it and is not included in the grouping statement either.
The information provided by using the More button is not very illuminating for the
normal database user. It just displays the SQL error code.
To correct the error, remove the check mark in the Visible row for the Return_Date field. Its search
condition (Criterion) is applied when the query is run, but it is not visible in the query output.
Using the GUI, basic calculations and additional functions can be used.
Caution
Only for people who use a comma for their decimal separator:
If you wish to enter numbers with decimal places using the graphical user interface,
you must ensure that a decimal point rather than a comma is used to separate the
decimal places within the final SQL statement. Commas are used as field
separators, so new query fields are created for the decimal part.
An entry with a comma in the SQL view always leads to a further field containing the
numerical value of the decimal part.
The query now yields for each medium still on loan the fines that have accrued, based on the recall
notices issued and the additional multiplication field. The following query structure will also be
useful for calculating the fines due from individual users.
The "Loan"."ID" and "Loan"."Media_ID" fields have been removed. They were used in the previous
query to create by grouping a separate record for each medium. Now we will be grouping only by
the reader. The result of the query looks like this:
The simple query for the Title field from the Media table shows the test entries for this table, 9
records in all. But if you enter Subtitle into the query table, the record content of the Media table is
reduced to only 2 Titles. Only for these two Titles are there also Subtitles in the table. For all the
other Titles, no subtitles exist. This corresponds to the join condition that only those records for
which the Media_ID field in the Subtitle table is equal to the ID field in the Media table should be
shown. All other records are excluded.
By default, relationships are set as Inner Joins. The window provides information on the way this
type of join works in practice.
The two previously selected tables are listed as Tables Involved. They are not selectable here. The
relevant fields from the two tables are read from the table definitions. If there is no relationship
specified in the table definition, one can be created at this point for the query. However, if you have
planned your database in an orderly manner using HSQLDB, there should be no need to alter
these fields.
The most important setting is the Join option. Here relationships can be chosen so that all records
from the Subtitle table are selected, but only those records from Media which have a subtitle
entered in the Subtitle table.
Or you can choose the opposite: that in any case all records from the table Media are displayed,
regardless of whether they have a subtitle.
The Natural option specifies that the linked fields in the tables are treated as equal. You can also
avoid having to use this setting by defining your relationships properly at the very start of planning
your database.
For the type Right join, the description shows that all records from the Media table will be displayed
(Subtitle RIGHT JOIN Media). As there is no Subtitle that lacks a title in Media but there are
certainly Titles in Media that lack a Subtitle, this is the right choice.
Direct use of SQL commands is also accessible using the graphical user interface, as the above
figure shows. Click the icon highlighted (Run SQL command directly) to turn the Design View
Off/On icon off. Now when you click the Run icon, the query runs the SQL commands directly.
Here is an example of the extensive possibilities available for posing questions to the database
and specifying the type of result required:
SELECT [{LIMIT <offset> <limit> | TOP <limit>}][ALL | DISTINCT]
{ <Select-Formulation> | "Table_name".* | * } [, ...]
[INTO [CACHED | TEMP | TEXT] "new_Table"]
FROM "Table_list"
[WHERE SQL-Expression]
[GROUP BY SQL-Expression [, ...]]
[HAVING SQL-Expression]
[{ UNION [ALL | DISTINCT] | {MINUS [DISTINCT] | EXCEPT [DISTINCT] } |
INTERSECT [DISTINCT] } Query statement]
[ORDER BY Order-Expression [, ...]]
[LIMIT <limit> [OFFSET <offset>]];
[ALL | DISTINCT]
SELECT ALL is the default. All records are displayed that fulfill the search conditions. Example:
SELECT ALL "Name" FROM "Table_name" yields all names; if "Peter" occurs three times
and "Egon" four times in the table, these names are displayed three and four times
respectively. SELECT DISTINCT "Name" FROM "Table_name" suppresses duplicate query results,
so each name is displayed only once.
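The Peter and Egon example can be checked with a few lines of code. This is a minimal sketch, assuming Python's built-in sqlite3 module rather than HSQLDB. The table content is built to match the counts given in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Table_name" ("Name" TEXT)')
# "Peter" occurs three times and "Egon" four times, as in the text.
conn.executemany(
    'INSERT INTO "Table_name" VALUES (?)',
    [("Peter",)] * 3 + [("Egon",)] * 4,
)

# ALL (the default) keeps every record that fulfills the conditions.
all_names = conn.execute(
    'SELECT ALL "Name" FROM "Table_name" ORDER BY "Name"'
).fetchall()

# DISTINCT suppresses results that are identical to another result row.
distinct_names = conn.execute(
    'SELECT DISTINCT "Name" FROM "Table_name" ORDER BY "Name"'
).fetchall()

print(len(all_names), distinct_names)
```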
<Select-Formulation>
{ Expression | COUNT(*) |
{ COUNT | MIN | MAX | SUM | AVG | SOME | EVERY | VAR_POP | VAR_SAMP |
STDDEV_POP | STDDEV_SAMP }
([ALL | DISTINCT] Expression) } [[AS] "display_name"]
Field names, calculations, record totals are all possible entries. In addition different functions
are available for the field shown. Except for COUNT(*) (which counts all the records) none of
these functions access NULL fields.
COUNT | MIN | MAX | SUM | AVG | SOME | EVERY | VAR_POP | VAR_SAMP |
STDDEV_POP | STDDEV_SAMP
COUNT("Name") counts all entries for the field Name.
MIN("Name") shows the first name alphabetically. The result of this function is always
formatted just as it occurs in the field. Text is shown as text, integers as integers, decimals as
decimals and so on.
MAX("Name") shows the last name alphabetically.
SUM("Number") can add only the values in numerical fields. The function fails for date fields.
AVG("Number") shows the average of the contents of a column. This function too is limited to
numerical fields.
SOME("Field_Name"), EVERY("Field_Name"): Fields used with these functions must
have the Yes/No [BOOLEAN] field type (contains only 0 or 1). Furthermore, they produce a
summary of the field content to which they are applied.
SOME returns TRUE (or 1) if at least one entry for the field is 1, and it returns FALSE (or 0) only
if all the entries are 0. EVERY returns 1 only if every entry for the field is 1, and returns FALSE
if at least one entry is 0.
Tip
The Boolean field type is Yes/No [BOOLEAN]. However, this field contains only 0 or
1. In query search conditions, use either TRUE, 1, FALSE or 0. For the Yes
condition, you can use either TRUE or 1. For the No condition, use either FALSE or
0. If you try to use either Yes or No instead, you get an error message. Then you
will have to correct your error.
Example:
SELECT "Class", EVERY("Swimmer") FROM "Table1" GROUP BY "Class";
Class contains the names of the swimming class. Swimmer is a Boolean field describing
whether a student can swim or not (1 or 0). Students contains the names of the students.
Table1 contains these fields: its primary key, Class, Swimmer, and Students. Only Class and
Swimmer are needed for this query.
Because the query is grouped by the entries of the field Class, EVERY will return a value for
the field, Swimmer, for each class. When every person in a swimming class can swim, EVERY
returns TRUE. Otherwise EVERY returns FALSE because at least one student of the class
cannot swim. Since the output for the Swimmer field is a checkbox, a check mark indicates TRUE,
and no check mark indicates FALSE.
VAR_POP | VAR_SAMP | STDDEV_POP | STDDEV_SAMP are statistical functions and affect
only integer and decimal fields.
All these functions return 0 if the values within the group are all equal.
"Table_name".* | * [, ...]
Each field to be displayed is given with its field names, separated by commas. If fields from
several tables are entered into the query, a combination of the field name with the table name
is necessary: "Table_name"."Field_name".
Instead of a detailed list of all the fields of a table, its total content can be displayed. For this
you use the symbol "*". It is then unnecessary to use the table name, if the results will only
apply to the one table. However, if the query includes all of the fields of one table and at least
one field from a second table, use:
"Table_name 1".*, "Table_name 2"."Field_name".
FROM <Table_list>
"Table_name 1" [{CROSS | INNER | LEFT OUTER | RIGHT OUTER} JOIN
"Table_name 2" ON Expression] [, ...]
The tables which are to be jointly searched are usually in a list separated by commas. The
relationship of the tables to one another is then additionally defined by the keyword WHERE.
If the tables are bound through a JOIN rather than a comma, their relationship is defined by the
term beginning with ON which occurs directly after the second table.
A simple JOIN has the effect that only those records are displayed for which the conditions in
both the tables apply.
Example:
SELECT "Table1"."Name", "Table2"."Class" FROM "Table1", "Table2" WHERE
"Table1"."ClassID" = "Table2"."ID"
is equivalent to:
SELECT "Table1"."Name", "Table2"."Class" FROM "Table1" JOIN "Table2"
ON "Table1"."ClassID" = "Table2"."ID"
Here the names and the corresponding classes are displayed. If a name has no class listed for
it, that name is not included in the display. If a class has no names, it is also not displayed. The
addition of INNER does not alter this.
SELECT "Table1"."Name", "Table2"."Class" FROM "Table1" LEFT JOIN
"Table2" ON "Table1"."ClassID" = "Table2"."ID"
If LEFT is added, all "Names" from "Table1" are displayed even if they have no "Class". If, on
the contrary, RIGHT is added, all Classes are displayed even if they have no names in them.
Addition of OUTER need not be shown here. (Right Outer Join is the same thing as Right Join;
Left Outer Join is the same thing as Left Join.)
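The join variants described above can be compared side by side. This is a sketch under stated assumptions: Python's built-in sqlite3 module stands in for HSQLDB, and the rows are hypothetical. RIGHT JOIN is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Table1" ("Name" TEXT, "ClassID" INTEGER)')
conn.execute('CREATE TABLE "Table2" ("ID" INTEGER PRIMARY KEY, "Class" TEXT)')
# Hypothetical rows: Ben has no class, and class 5b has no names.
conn.executemany('INSERT INTO "Table1" VALUES (?, ?)',
                 [("Anna", 1), ("Ben", None)])
conn.executemany('INSERT INTO "Table2" VALUES (?, ?)',
                 [(1, "5a"), (2, "5b")])

select = 'SELECT "Table1"."Name", "Table2"."Class" FROM '
condition = '"Table1"."ClassID" = "Table2"."ID"'
order = ' ORDER BY "Table1"."Name"'

# Comma-separated table list, relationship defined with WHERE.
inner_where = conn.execute(
    select + '"Table1", "Table2" WHERE ' + condition + order).fetchall()
# The same relationship defined with JOIN ... ON.
inner_join = conn.execute(
    select + '"Table1" JOIN "Table2" ON ' + condition + order).fetchall()
# LEFT JOIN keeps every name from Table1, with NULL for a missing class.
left_join = conn.execute(
    select + '"Table1" LEFT JOIN "Table2" ON ' + condition + order).fetchall()

print(inner_where, inner_join, left_join)
```

In the LEFT JOIN result, Ben appears with an empty class, while both inner variants show only Anna.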
[WHERE SQL-Expression]
The standard introduction for conditions to request a more accurate filtering of the data. Here
too the relationships between tables are usually defined if they are not linked together with
JOIN.
[GROUP BY SQL-Expression [, ...]]
Use this when you want to divide the query data into groups before applying the functions to
each one of the groups separately. The division is based upon the values of the field or fields
contained in the GROUP BY term.
Example:
SELECT "Name", SUM("Input"-"Output") AS "Balance" FROM "Table1" GROUP
BY "Name";
Records with the same name are summed. In the query result, the sum of Input - Output
is given for each person. This field is to be called Balance. Each row of the query result
contains a value from the Name field and the calculated balance for that specific value.
Tip
When fields are processed using a particular function (for example COUNT, SUM),
all fields that are not processed with a function but should be displayed are
grouped together using GROUP BY.
[HAVING SQL-Expression]
The HAVING formula closely resembles the WHERE formula. The difference is that the WHERE
formula applies to the values of selected fields in the query. The HAVING formula applies to
selected calculated values. Specifically, the WHERE formula cannot use an aggregate function
as part of a search condition; the HAVING formula can.
The HAVING formula serves two purposes as shown in the two examples below. In the first
one, the search condition requires that the minimum run-time be less than 40 minutes. In the
second example, the search condition requires that an individual's balance must be positive.
The query results for the first one list the names of people whose run-time has been less than
40 minutes at least one time, together with the minimum run-time. People whose run-times have
all been 40 minutes or more are not listed.
The query results for the second one list the names of people who have a greater total input
than output, together with their balance. People whose balance is 0 or less are not listed.
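The second case, a positive balance per person, might look like the following. This is a minimal sketch, not the handbook's own example. It assumes Python's built-in sqlite3 module in place of HSQLDB, reuses the Name, Input and Output fields from the GROUP BY example, and uses hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Table1" ("Name" TEXT, "Input" INTEGER, "Output" INTEGER)'
)
# Hypothetical rows: Anna ends at +50, Ben at -40, Cleo at exactly 0.
conn.executemany(
    'INSERT INTO "Table1" VALUES (?, ?, ?)',
    [("Anna", 100, 40), ("Anna", 20, 30), ("Ben", 10, 50), ("Cleo", 30, 30)],
)

# HAVING filters on the aggregated value of each group.
positive_balances = conn.execute(
    'SELECT "Name", SUM("Input" - "Output") AS "Balance" FROM "Table1" '
    'GROUP BY "Name" HAVING SUM("Input" - "Output") > 0'
).fetchall()

# The same condition in WHERE is rejected, because WHERE is evaluated
# for single records before any grouping takes place.
try:
    conn.execute(
        'SELECT "Name" FROM "Table1" '
        'WHERE SUM("Input" - "Output") > 0 GROUP BY "Name"'
    )
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True

print(positive_balances, where_rejected)
```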
[SQL Expression]
SQL expressions are combined according to the following scheme:
[NOT] condition [{ OR | AND } condition]
Example:
SELECT * FROM "Table_name" WHERE NOT "Return_date" IS NULL AND
"ReaderID" = 2;
The records read from the table are those for which a Return_date has been entered and the
"ReaderID" is 2. In practice this means that all media loaned to a specific person and returned
can be retrieved. The conditions are only linked with AND. The NOT refers only to the first
condition.
SELECT * FROM "Table_name" WHERE NOT ("Return_date" IS NULL AND
"ReaderID" = 2);
Parentheses around the condition, with NOT outside them, show only those records that do not
fulfill the condition in parentheses completely. This would cover all records, except for those for
"ReaderID" number 2, which have not yet been returned.
Example:
SELECT "Name" FROM "Table1" WHERE EXISTS (SELECT "First_name" FROM
"Table2" WHERE "Table2"."First_name" = "Table1"."Name")
The names from Table1 are displayed for which first names are given in Table2.
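The effect of NOT with and without parentheses, as described for the two "ReaderID" queries above, can be verified with a small test. This is a sketch assuming Python's built-in sqlite3 module instead of HSQLDB; the table content is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Table_name" ("ID" INTEGER PRIMARY KEY, '
    '"ReaderID" INTEGER, "Return_date" TEXT)'
)
# Hypothetical loans: each reader has one returned and one open loan.
conn.executemany(
    'INSERT INTO "Table_name" VALUES (?, ?, ?)',
    [(1, 2, "2012-01-10"), (2, 2, None), (3, 5, "2012-01-12"), (4, 5, None)],
)

# NOT applies only to the first condition: returned media of reader 2.
returned_by_reader_2 = [row[0] for row in conn.execute(
    'SELECT "ID" FROM "Table_name" '
    'WHERE NOT "Return_date" IS NULL AND "ReaderID" = 2 ORDER BY "ID"'
)]

# NOT applies to the whole parenthesized condition: everything except
# the media of reader 2 that have not yet been returned.
everything_else = [row[0] for row in conn.execute(
    'SELECT "ID" FROM "Table_name" '
    'WHERE NOT ("Return_date" IS NULL AND "ReaderID" = 2) ORDER BY "ID"'
)]

print(returned_by_reader_2, everything_else)
```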
| Value BETWEEN Value AND Value
BETWEEN value1 AND value2 yields all values from value1 up to and including value2. If
the values are letters, an alphabetic sort is used in which lower-case letters have the same
value as the corresponding upper-case ones.
[ORDER BY Ordering-Expression [, ...]]
The expression can be a field name, a column number (beginning with 1 from the left), an alias
(formulated with AS for example) or a composite value expression (see [SQL Expression]:
values). The sort order is usually ascending (ASC). If you want a descending sort you must
specify DESC explicitly.
SELECT "First_name", "Surname" AS "Name" FROM "Table1" ORDER BY "Surname";
is identical to
SELECT "First_name", "Surname" AS "Name" FROM "Table1" ORDER BY 2;
is identical to
SELECT "First_name", "Surname" AS "Name" FROM "Table1" ORDER BY "Name";
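That the three ORDER BY variants above produce the same result can be confirmed as follows. This is a sketch only, assuming Python's built-in sqlite3 module as a substitute for HSQLDB and a few hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Table1" ("First_name" TEXT, "Surname" TEXT)')
conn.executemany(
    'INSERT INTO "Table1" VALUES (?, ?)',
    [("Anna", "Zeller"), ("Ben", "Adams"), ("Cleo", "Meyer")],
)

base = 'SELECT "First_name", "Surname" AS "Name" FROM "Table1" ORDER BY '
by_field = conn.execute(base + '"Surname"').fetchall()   # field name
by_position = conn.execute(base + '2').fetchall()        # column number
by_alias = conn.execute(base + '"Name"').fetchall()      # alias set with AS

print(by_field)
```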
What is hidden behind the numbers cannot be made visible by using a list box, as the foreign key
is input directly using the barcode. In the same way, it is impossible to use a list box next to the
item to show at least the unit price.
Here a query can help.
SELECT "Checkout"."Receipt_ID", "Checkout"."Total", "Stock"."Item",
"Stock"."Unit_price", "Checkout"."Total"*"Stock"."Unit_price" AS
"Total_Price" FROM "Checkout", "Stock" WHERE "Stock"."ID" =
"Checkout"."Item_ID";
Now at least after the information has been entered, we know how much needs to be paid for
3 * Item'17'. In addition only the information relevant to the corresponding Receipt_ID needs to be
filtered through the form. What is still lacking is what the customer needs to pay overall.
SELECT "Checkout"."Receipt_ID",
SUM("Checkout"."Total"*"Stock"."Unit_price") AS "Sum" FROM "Checkout",
"Stock" WHERE "Stock"."ID" = "Checkout"."Item_ID" GROUP BY
"Checkout"."Receipt_ID";
Design the form to show one record of the query at a time. Since the query is grouped by
Receipt_ID, the form shows information about one customer at a time.
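Both checkout queries can be tried on a small data set. The following is a minimal sketch, assuming Python's built-in sqlite3 module rather than HSQLDB. The rows and the whole-number prices are hypothetical; the table and field names follow the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Stock" ("ID" INTEGER PRIMARY KEY, "Item" TEXT, '
    '"Unit_price" INTEGER)'
)
conn.execute(
    'CREATE TABLE "Checkout" ("ID" INTEGER PRIMARY KEY, '
    '"Receipt_ID" INTEGER, "Total" INTEGER, "Item_ID" INTEGER)'
)
# Hypothetical stock items and two receipts.
conn.executemany('INSERT INTO "Stock" VALUES (?, ?, ?)',
                 [(17, "Item 17", 2), (18, "Item 18", 5)])
conn.executemany('INSERT INTO "Checkout" VALUES (?, ?, ?, ?)',
                 [(1, 1, 3, 17), (2, 1, 1, 18), (3, 2, 2, 18)])

# Price per line: quantity ("Total") multiplied by the unit price.
line_items = conn.execute(
    'SELECT "Checkout"."Receipt_ID", "Stock"."Item", '
    '"Checkout"."Total" * "Stock"."Unit_price" AS "Total_Price" '
    'FROM "Checkout", "Stock" WHERE "Stock"."ID" = "Checkout"."Item_ID" '
    'ORDER BY "Checkout"."ID"'
).fetchall()

# Amount due per receipt: the line prices summed and grouped by receipt.
receipt_sums = conn.execute(
    'SELECT "Checkout"."Receipt_ID", '
    'SUM("Checkout"."Total" * "Stock"."Unit_price") AS "Sum" '
    'FROM "Checkout", "Stock" WHERE "Stock"."ID" = "Checkout"."Item_ID" '
    'GROUP BY "Checkout"."Receipt_ID" ORDER BY "Checkout"."Receipt_ID"'
).fetchall()

print(line_items, receipt_sums)
```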
Subqueries
Subqueries built into fields can always only return one record. The field can also return only one
value.
SELECT "ID", "Income", "Expenditure", ( SELECT SUM( "Income" ) -
SUM( "Expenditure" ) FROM "Checkout") AS "Balance" FROM "Checkout";
This query allows data entry (primary key included). The subquery yields precisely one value,
namely the total balance. This allows the balance at the till to be read after each entry. This is still
not comparable with the supermarket checkout form described in Queries as a basis for additional
information in forms. Naturally it lacks the individual calculations of Total * Unit_price, but also the
presence of the receipt number. Only the total sum is given. At least the receipt number can be
included by using a query parameter:
SELECT "ID", "Income", "Expenditure", ( SELECT SUM( "Income" ) -
SUM( "Expenditure" ) FROM "Checkout" WHERE "Receipt_ID" =
:Receipt_Number) AS "Balance" FROM "Checkout" WHERE "Receipt_ID" =
:Receipt_Number;
Note
Subforms based on queries are not automatically updated on the basis of their
parameters. It is more appropriate to pass on the parameter directly from the main
form.
Correlated subqueries
Using a still more refined query, an editable query allows one to even carry the running balance for
the till:
SELECT "ID", "Income", "Expenditure", ( SELECT SUM( "Income" ) -
SUM( "Expenditure" ) FROM "Checkout" WHERE "ID" <= "a"."ID" ) AS
"Balance" FROM "Checkout" AS "a" ORDER BY "ID" ASC
The Checkout table is the same as Table "a". "a" however yields only the relationship to the current
values in this record. In this way the current value of ID from the outer query can be evaluated
within the subquery. Thus, depending on the ID, the previous balance at the corresponding time is
determined, if you start from the fact that the ID, which is an autovalue, increments by itself.
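The running balance produced by the correlated subquery can be followed on a few records. This is a sketch assuming Python's built-in sqlite3 module in place of HSQLDB; the income and expenditure values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Checkout" ("ID" INTEGER PRIMARY KEY, '
    '"Income" INTEGER, "Expenditure" INTEGER)'
)
conn.executemany(
    'INSERT INTO "Checkout" VALUES (?, ?, ?)',
    [(1, 100, 0), (2, 0, 30), (3, 50, 20)],
)

# For each record of the outer query (alias "a"), the subquery sums all
# records whose ID is less than or equal to the ID of that outer record.
running = conn.execute(
    'SELECT "ID", "Income", "Expenditure", '
    '(SELECT SUM("Income") - SUM("Expenditure") FROM "Checkout" '
    'WHERE "ID" <= "a"."ID") AS "Balance" '
    'FROM "Checkout" AS "a" ORDER BY "ID" ASC'
).fetchall()

for row in running:
    print(row)
```

The last column grows and shrinks record by record: 100 after the first entry, 70 after the second, and 100 again after the third.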
The Design View Mode cannot find the field contained in the inner query "Loan_ID", which governs
the relationship between the inner and outer queries.
When the query is run in SQL Mode, the corresponding content from the subquery is reproduced
without error. Therefore you do not have to use direct SQL mode in this case.
The outer query used the results of the inner query to produce the final results. These are a list
of the "Loan_ID" values that should be locked and why. If you want to further limit the final results,
use the sort and filter functions of the graphical user interface.
Note
The normal linking of tables, after all tables have been listed, follows the keyword
WHERE.
If there is a LEFT JOIN or a RIGHT JOIN, the assignment is defined directly
after the two table names using ON. The sequence is therefore always
Table1 LEFT JOIN Table2 ON Table1.Field1 = Table2.Field1
LEFT JOIN Table3 ON Table2.Field1 = Table3.Field1 ...
Two titles of the Media table do not yet have an author entry or a Subtitle. At the same time one
Title has a total of three Authors. If the Author Table is linked without a LEFT JOIN, the two
Media without an Author will not be shown. But as one medium has three authors instead of one,
the total number of records displayed will still be 15.
Only by using LEFT JOIN will the query be instructed to use the Media table to determine which
records to show. Now the records without Subtitle or Author appear again, giving a total of 17
records.
Using appropriate Joins usually increases the amount of data displayed. But this enlarged data set
can easily be scanned, since authors and subtitles are displayed in addition to the titles. In the
example database, all of the media-dependent tables can be accessed.
Tip
A query created using Create Query in SQL View has the disadvantage that it
cannot be sorted or filtered using the GUI. There are therefore limits to its use.
A View, on the other hand, can be managed in Base just like a normal table, with the
exception that no change in the data is possible. Therefore, all possibilities for sorting
and filtering remain available even with direct SQL commands.
Views are a solution for many queries, if you want to get any results at all. If for example a
Subselect is to be used on the results of a query, create a View that gives you these results. Then
use the subselect on the View. Corresponding examples are to be found in Chapter 8, Database
Tasks.
Creating a View from a query is easy and straightforward.
1) Click the Table object in the Database section.
2) Click Create View.
3) Close the Add Table dialog.
Caution
When using the Report Builder, you should frequently save your work during editing.
In addition to saving within the Report Builder itself after each significant step, you
should also save the whole database.
Depending on the version of LibreOffice that you are using, the Report Builder can
sometimes crash during editing.
The functionality of completed reports is not affected even if they were created under
another version, in which the problem does not occur.