IFQL and the future of
InfluxData
Paul Dix

Founder & CTO

@pauldix

paul@influxdata.com
Evolution of a query
language…
REST API
SQL-ish
Vaguely Familiar
select percentile(90, value) from cpu
where time > now() - 1d and
"host" = 'serverA'
group by time(10m)
0.8 -> 0.9
Breaking API change, addition of tags
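Roughly what that break looked like (the 0.8 form here is an illustrative sketch with hypothetical naming, not exact 0.8 syntax): 0.8 typically encoded dimensions in the series name, while 0.9 made them first-class tags.
0.8-era, dimensions baked into the series name:
select percentile(90, value) from cpu.serverA.usage_user
0.9+, dimensions as tags, filterable in the where clause:
select percentile(90, value) from cpu where "host" = 'serverA'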
Functional or SQL?
Afraid to switch…
Difficult to improve & change
It’s not SQL!
Kapacitor
Fall of 2015
Kapacitor’s TICKscript
stream
|from()
.database('telegraf')
.measurement('cpu')
.groupBy(*)
|window()
.period(5m)
.every(5m)
.align()
|mean('usage_idle')
.as('usage_idle')
|influxDBOut()
.database('telegraf')
.retentionPolicy('autogen')
.measurement('mean_cpu_idle')
.precision('s')
Hard to debug
Steep learning curve
Not Recomposable
Second Language
Rethinking Everything
Kapacitor is Background
Processing
Stream or Batch
InfluxDB is batch interactive
IFQL and unified API
Building towards 2.0
Project Goals
Photo by Glen Carrie on Unsplash
One Language to Unite!
Feature Velocity
Decouple storage from
compute
Iterate & deploy
more frequently
Scale
independently
Workload
Isolation
Decouple language from
engine
{
"operations": [
{
"id": "select0",
"kind": "select",
"spec": {
"database": "foo",
"hosts": null
}
},
{
"id": "where1",
"kind": "where",
"spec": {
"expression": {
"root": {
"type": "binary",
"operator": "and",
"left": {
"type": "binary",
"operator": "and",
"left": {
"type": "binary",
"operator": "==",
"left": {
"type": "reference",
"name": "_measurement",
"kind": "tag"
},
"right": {
"type": "stringLiteral",
"value": "cpu"
}
},
Query represented as DAG in JSON
A Data Language
Design Philosophy
UI for Many
because no one wants to actually write a query
Readability
over terseness
Flexible
add to language easily
Testable
new functions and user queries
Easy to Contribute
inspiration from Telegraf
Code Sharing & Reuse
no code > code
A few examples
// get the last value written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> last()
// get the last value written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> last()
Result: _result
Block: keys: [_field, _measurement, host, region] bounds: [1677-09-21T00:12:43.145224192Z, 2018-02-12T15:53:04.361902250Z)
_time _field _measurement host region _value
------------------------------ --------------- --------------- --------------- --------------- ----------------------
2018-02-12T15:53:00.000000000Z usage_system cpu server0 east 60.6284
Block: keys: [_field, _measurement, host, region] bounds: [1677-09-21T00:12:43.145224192Z, 2018-02-12T15:53:04.361902250Z)
_time _field _measurement host region _value
------------------------------ --------------- --------------- --------------- --------------- ----------------------
2018-02-12T15:53:00.000000000Z usage_user cpu server0 east 39.3716
// get the last minute of data from a specific
// measurement & field & host
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
// get the last minute of data from a specific
// measurement & field & host
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
Result: _result
Block: keys: [_field, _measurement, host, region] bounds: [2018-02-12T16:01:45.677502014Z, 2018-02-12T16:02:45.677502014Z)
_time _field _measurement host region _value
------------------------------ --------------- --------------- --------------- --------------- ----------------------
2018-02-12T16:01:50.000000000Z usage_user cpu server0 east 50.549
2018-02-12T16:02:00.000000000Z usage_user cpu server0 east 35.4458
2018-02-12T16:02:10.000000000Z usage_user cpu server0 east 30.0493
2018-02-12T16:02:20.000000000Z usage_user cpu server0 east 44.3378
2018-02-12T16:02:30.000000000Z usage_user cpu server0 east 11.1584
2018-02-12T16:02:40.000000000Z usage_user cpu server0 east 46.712
// get the mean in 15m intervals of the last hour
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu")
|> range(start:-1h)
|> window(every:15m)
|> mean()
Result: _result
Block: keys: [_field, _measurement, host, region] bounds: [2018-02-12T15:05:06.708945484Z, 2018-02-12T16:05:06.708945484Z)
_time _field _measurement host region _value
------------------------------ --------------- --------------- --------------- --------------- ----------------------
2018-02-12T15:28:41.128654848Z usage_user cpu server0 east 50.72841444444444
2018-02-12T15:43:41.128654848Z usage_user cpu server0 east 51.19163333333333
2018-02-12T15:13:41.128654848Z usage_user cpu server0 east 45.5091088235294
2018-02-12T15:58:41.128654848Z usage_user cpu server0 east 49.65145555555555
2018-02-12T16:05:06.708945484Z usage_user cpu server0 east 46.41292368421052
Block: keys: [_field, _measurement, host, region] bounds: [2018-02-12T15:05:06.708945484Z, 2018-02-12T16:05:06.708945484Z)
_time _field _measurement host region _value
------------------------------ --------------- --------------- --------------- --------------- ----------------------
2018-02-12T15:28:41.128654848Z usage_system cpu server0 east 49.27158555555556
2018-02-12T15:58:41.128654848Z usage_system cpu server0 east 50.34854444444444
2018-02-12T16:05:06.708945484Z usage_system cpu server0 east 53.58707631578949
2018-02-12T15:13:41.128654848Z usage_system cpu server0 east 54.49089117647058
2018-02-12T15:43:41.128654848Z usage_system cpu server0 east 48.808366666666664
Elements of IFQL
Functional
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
Functional
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
built in functions
Functional
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
anonymous functions
Functional
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
pipe forward operator
Named Parameters
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
named parameters only!
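A quick illustration of the rule (the exact failure mode is an assumption here, not IFQL's actual error text):
from(db:"mydb") |> range(start:-1m) // named parameter: accepted
from(db:"mydb") |> range(-1m) // positional argument: rejected by the parser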
Readability
Flexibility
Functions have inputs &
outputs
Testability
Builder
Inputs
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
no input
Outputs
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
output is entire db
Outputs
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
pipe that output to filter
Filter function input
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
anonymous filter function
input is a single record
{"_measurement":"cpu", "_field":"usage_user", "host":"server0", "region":"west", "_value":23.2}
Filter function input
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
A record looks like a flat object
or row in a table
{"_measurement":"cpu", "_field":"usage_user", "host":"server0", "region":"west", "_value":23.2}
Record Properties
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
tag key
{"_measurement":"cpu", "_field":"usage_user", "host":"server0", "region":"west", "_value":23.2}
Record Properties
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r.host == "server0")
|> range(start:-1m)
same as before
{"_measurement":"cpu", "_field":"usage_user", "host":"server0", "region":"west", "_value":23.2}
Special Properties
starts with _
reserved for system
attributes
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
{"_measurement":"cpu", "_field":"usage_user", "host":"server0", "region":"west", "_value":23.2}
Special Properties
dot access works the other way too
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r._measurement == "cpu" and
r._field == "usage_user")
|> range(start:-1m)
|> max()
{"_measurement":"cpu", "_field":"usage_user", "host":"server0", "region":"west", "_value":23.2}
Special Properties
_measurement and _field
present for all InfluxDB data
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
{"_measurement":"cpu", "_field":"usage_user", "host":"server0", "region":"west", "_value":23.2}
Special Properties
_value exists in all series
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == “usage_user" and
r[“_value"] > 50.0)
|> range(start:-1m)
|> max()
{"_measurement":"cpu", "_field":"usage_user", "host":"server0", "region":"west", "_value":23.2}
Filter function output
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
filter function output
is a boolean that determines whether the record is included in the set
Filter Operators
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
!=
=~
!~
in
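A sketch putting a couple of these operators to work, assuming regex literal syntax like /pattern/:
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] =~ /server[0-9]+/ and
r["region"] != "west")
|> range(start:-1m)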
Filter Boolean Logic
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => (r["host"] == "server0" or
r["host"] == "server1") and
r["_measurement"] == "cpu")
|> range(start:-1m)
parens for precedence
Function with explicit return
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => {return r["host"] == "server0"})
|> range(start:-1m)
longhand function definition
Outputs
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
filter output
is set of data matching filter function
Outputs
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
piped to range
which further filters by a time range
Outputs
// get the last 1 minute written for anything from a given host
from(db:"mydb")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1m)
range output is the final query result
Function Isolation
(but the planner may do otherwise)
Does order matter?
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
from(db:"mydb")
|> range(start:-1m)
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
Does order matter?
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
from(db:"mydb")
|> range(start:-1m)
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
range and filter switched
Does order matter?
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
from(db:"mydb")
|> range(start:-1m)
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
results the same
Result: _result
Block: keys: [_field, _measurement, host, region] bounds: [2018-02-12T17:52:02.322301856Z, 2018-02-12T17:53:02.322301856Z)
_time _field _measurement host region _value
------------------------------ --------------- --------------- --------------- --------------- ----------------------
2018-02-12T17:53:02.322301856Z usage_user cpu server0 east 97.3174
Does order matter?
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
from(db:"mydb")
|> range(start:-1m)
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
is this the same as the top two?
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
|> range(start:-1m)
Does order matter?
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
from(db:"mydb")
|> range(start:-1m)
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
moving max to here
changes semantics
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
|> range(start:-1m)
Does order matter?
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
from(db:"mydb")
|> range(start:-1m)
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
here it operates on
only the last minute of data
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
|> range(start:-1m)
Does order matter?
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
from(db:"mydb")
|> range(start:-1m)
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
here it operates on
data for all time
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
|> range(start:-1m)
Does order matter?
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
from(db:"mydb")
|> range(start:-1m)
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
then that result
is filtered down to
the last minute
(which will likely be empty)
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
|> range(start:-1m)
Planner Optimizes
maintains query semantics
Optimization
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
from(db:"mydb")
|> range(start:-1m)
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
Optimization
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
from(db:"mydb")
|> range(start:-1m)
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
this is more efficient
Optimization
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> range(start:-1m)
|> max()
from(db:"mydb")
|> range(start:-1m)
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == "usage_user")
|> max()
query DAG different
plan DAG same as one on left
Optimization
from(db:"mydb")
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == “usage_user”
r[“_value"] > 22.0)
|> range(start:-1m)
|> max()
from(db:"mydb")
|> range(start:-1m)
|> filter(fn: (r) =>
r["host"] == "server0" and
r["_measurement"] == "cpu" and
r["_field"] == “usage_user"
r[“_value"] > 22.0)
|> max()
this does a full table scan
Variables & Closures
db = "mydb"
measurement = "cpu"
from(db:db)
|> filter(fn: (r) => r._measurement == measurement and
r.host == "server0")
|> last()
Variables & Closures
db = "mydb"
measurement = "cpu"
from(db:db)
|> filter(fn: (r) => r._measurement == measurement and
r.host == "server0")
|> last()
anonymous filter function
closure over surrounding context
User Defined Functions
db = "mydb"
measurement = "cpu"
fn = (r) => r._measurement == measurement and
r.host == "server0"
from(db:db)
|> filter(fn: fn)
|> last()
assign function to variable fn
User Defined Functions
from(db:"mydb")
|> filter(fn: (r) =>
r["_measurement"] == "cpu" and
r["_field"] == "usage_user" and
r["host"] == "server0")
|> range(start:-1h)
User Defined Functions
from(db:"mydb")
|> filter(fn: (r) =>
r["_measurement"] == "cpu" and
r["_field"] == "usage_user" and
r["host"] == "server0")
|> range(start:-1h)
get rid of some common boilerplate?
User Defined Functions
select = (db, m, f) => {
return from(db:db)
|> filter(fn: (r) => r._measurement == m and r._field == f)
}
User Defined Functions
select = (db, m, f) => {
return from(db:db)
|> filter(fn: (r) => r._measurement == m and r._field == f)
}
select(db: "mydb", m: "cpu", f: "usage_user")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1h)
User Defined Functions
select = (db, m, f) => {
return from(db:db)
|> filter(fn: (r) => r._measurement == m and r._field == f)
}
select(m: "cpu", f: "usage_user")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1h)
throws error
error calling function "select": missing required keyword argument "db"
Default Arguments
select = (db="mydb", m, f) => {
return from(db:db)
|> filter(fn: (r) => r._measurement == m and r._field == f)
}
select(m: "cpu", f: "usage_user")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1h)
Default Arguments
select = (db="mydb", m, f) => {
return from(db:db)
|> filter(fn: (r) => r._measurement == m and r._field == f)
}
select(m: "cpu", f: "usage_user")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1h)
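The default is just a fallback; passing db explicitly overrides it:
select(db: "otherdb", m: "cpu", f: "usage_user")
|> filter(fn: (r) => r["host"] == "server0")
|> range(start:-1h)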
Multiple Results to Client
data = from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu" and
r._field == "usage_user")
|> range(start: -4h)
|> window(every: 5m)
data |> min() |> yield(name: "min")
data |> max() |> yield(name: "max")
data |> mean() |> yield(name: "mean")
Multiple Results to Client
data = from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu" and
r._field == "usage_user")
|> range(start: -4h)
|> window(every: 5m)
data |> min() |> yield(name: "min")
data |> max() |> yield(name: "max")
data |> mean() |> yield(name: "mean")
Result: min
Block: keys: [_field, _measurement, host, region] bounds: [2018-02-12T16:55:55.487457216Z, 2018-02-12T20:55:55.487457216Z)
_time _field _measurement host region _value
------------------------------ --------------- --------------- --------------- --------------- ----------------------
the yield name appears as the result name
User Defined Pipe Forwardable Functions
mf = (m, f, table=<-) => {
return table
|> filter(fn: (r) => r._measurement == m and
r._field == f)
}
from(db:"mydb")
|> mf(m: "cpu", f: "usage_user")
|> filter(fn: (r) => r.host == "server0")
|> last()
User Defined Pipe Forwardable Functions
mf = (m, f, table=<-) => {
return table
|> filter(fn: (r) => r._measurement == m and
r._field == f)
}
from(db:"mydb")
|> mf(m: "cpu", f: "usage_user")
|> filter(fn: (r) => r.host == "server0")
|> last()
takes a table
from a pipe forward
by default
User Defined Pipe Forwardable Functions
mf = (m, f, table=<-) => {
return table
|> filter(fn: (r) => r._measurement == m and
r._field == f)
}
from(db:"mydb")
|> mf(m: "cpu", f: "usage_user")
|> filter(fn: (r) => r.host == "server0")
|> last()
calling it, then chaining
Passing as Argument
mf = (m, f, table=<-) => {
return table
|> filter(fn: (r) => r._measurement == m and
r._field == f)
}
sending the from as argument
mf(m: "cpu", f: "usage_user", table: from(db:"mydb"))
|> filter(fn: (r) => r.host == "server0")
|> last()
Passing as Argument
mf = (m, f, table=<-) =>
filter(fn: (r) => r._measurement == m and r._field == f,
table: table)
rewrite the function to use argument
mf(m: "cpu", f: "usage_user", table: from(db:"mydb"))
|> filter(fn: (r) => r.host == "server0")
|> last()
Any pipe forward function can use arguments
min(table:
range(start: -1h, table:
filter(fn: (r) => r.host == "server0", table:
from(db: "mydb"))))
Make you a Lisp
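For comparison, the same query in pipe-forward form reads top to bottom:
from(db: "mydb")
|> filter(fn: (r) => r.host == "server0")
|> range(start: -1h)
|> min()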
Easy to add Functions
like plugins in Telegraf
code file
test file
package functions
import (
"fmt"
"github.com/influxdata/ifql/ifql"
"github.com/influxdata/ifql/query"
"github.com/influxdata/ifql/query/execute"
"github.com/influxdata/ifql/query/plan"
)
const CountKind = "count"
type CountOpSpec struct {
}
func init() {
ifql.RegisterFunction(CountKind, createCountOpSpec)
query.RegisterOpSpec(CountKind, newCountOp)
plan.RegisterProcedureSpec(CountKind, newCountProcedure, CountKind)
execute.RegisterTransformation(CountKind, createCountTransformation)
}
func createCountOpSpec(args map[string]ifql.Value, ctx ifql.Context) (query.OperationSpec, error) {
if len(args) != 0 {
return nil, fmt.Errorf(`count function requires no arguments`)
}
return new(CountOpSpec), nil
}
func newCountOp() query.OperationSpec {
return new(CountOpSpec)
}
func (s *CountOpSpec) Kind() query.OperationKind {
return CountKind
}
type CountProcedureSpec struct {
}
func newCountProcedure(query.OperationSpec) (plan.ProcedureSpec, error) {
return new(CountProcedureSpec), nil
}
func (s *CountProcedureSpec) Kind() plan.ProcedureKind {
return CountKind
}
func (s *CountProcedureSpec) Copy() plan.ProcedureSpec {
return new(CountProcedureSpec)
}
func (s *CountProcedureSpec) PushDownRule() plan.PushDownRule {
return plan.PushDownRule{
Root: SelectKind,
Through: nil,
}
}
func (s *CountProcedureSpec) PushDown(root *plan.Procedure, dup func() *plan.Procedure) {
selectSpec := root.Spec.(*SelectProcedureSpec)
if selectSpec.AggregateSet {
root = dup()
selectSpec = root.Spec.(*SelectProcedureSpec)
selectSpec.AggregateSet = false
selectSpec.AggregateType = ""
return
}
selectSpec.AggregateSet = true
selectSpec.AggregateType = CountKind
}
type CountAgg struct {
count int64
}
func createCountTransformation(id execute.DatasetID, mode execute.AccumulationMode, spec plan.ProcedureSpec, ctx execute.Context) (execute.Transformation, execute.Dataset, error) {
t, d := execute.NewAggregateTransformationAndDataset(id, mode, ctx.Bounds(), new(CountAgg))
return t, d, nil
}
func (a *CountAgg) DoBool(vs []bool) {
a.count += int64(len(vs))
}
func (a *CountAgg) DoUInt(vs []uint64) {
a.count += int64(len(vs))
}
func (a *CountAgg) DoInt(vs []int64) {
a.count += int64(len(vs))
}
func (a *CountAgg) DoFloat(vs []float64) {
a.count += int64(len(vs))
}
func (a *CountAgg) DoString(vs []string) {
a.count += int64(len(vs))
}
func (a *CountAgg) Type() execute.DataType {
return execute.TInt
}
func (a *CountAgg) ValueInt() int64 {
return a.count
}
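The test file mentioned earlier can stay just as small; a minimal sketch, exercising only the CountAgg type from the code above:
package functions

import "testing"

// verifies that CountAgg accumulates counts across typed batches
func TestCountAgg(t *testing.T) {
	agg := new(CountAgg)
	agg.DoFloat([]float64{1.5, 2.5, 3.5})
	agg.DoInt([]int64{1, 2})
	if got := agg.ValueInt(); got != 5 {
		t.Fatalf("expected count 5, got %d", got)
	}
}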
Defines parser, validation,
execution
Imports and Namespaces
from(db:"mydb")
|> filter(fn: (r) => r.host == "server0")
|> range(start: -1h)
// square the value
|> map(fn: (r) => r._value * r._value)
shortcut for this?
Imports and Namespaces
from(db:"mydb")
|> filter(fn: (r) => r.host == "server0")
|> range(start: -1h)
// square the value
|> map(fn: (r) => r._value * r._value)
square = (table=<-) =>
table |> map(fn: (r) => r._value * r._value)
Imports and Namespaces
import "github.com/pauldix/ifqlmath"
from(db:"mydb")
|> filter(fn: (r) => r.host == "server0")
|> range(start: -1h)
|> ifqlmath.square()
Imports and Namespaces
import "github.com/pauldix/ifqlmath"
from(db:"mydb")
|> filter(fn: (r) => r.host == "server0")
|> range(start: -1h)
|> ifqlmath.square()
namespace
MOAR EXAMPLES!
Math across measurements
foo = from(db: "mydb")
|> filter(fn: (r) => r._measurement == "foo")
|> range(start: -1h)
bar = from(db: "mydb")
|> filter(fn: (r) => r._measurement == "bar")
|> range(start: -1h)
join(
tables: {foo:foo, bar:bar},
fn: (t) => t.foo._value + t.bar._value)
|> yield(name: "foobar")
Having Query
from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu")
|> range(start:-1h)
|> window(every:10m)
|> mean()
// this is the having part
|> filter(fn: (r) => r._value > 90)
Grouping
// group - average utilization across regions
from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu" and
r._field == "usage_system")
|> range(start: -1h)
|> group(by: ["region"])
|> window(every:10m)
|> mean()
Get Metadata
from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu")
|> range(start: -48h, stop: -47h)
|> tagValues(key: "host")
Get Metadata
from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu")
|> range(start: -48h, stop: -47h)
|> group(by: ["measurement"], keep: ["host"])
|> distinct(column: "host")
Get Metadata
tagValues = (key, table=<-) =>
table
|> group(by: ["_measurement"], keep: [key])
|> distinct(column: key)
Get Metadata
from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu")
|> range(start: -48h, stop: -47h)
|> tagValues(key: "host")
|> count()
Functions Implemented as IFQL
// _sortLimit is a helper function, which sorts
// and limits a table.
_sortLimit = (n, desc, cols=["_value"], table=<-) =>
table
|> sort(cols:cols, desc:desc)
|> limit(n:n)
// top sorts a table by cols and keeps only the top n records.
top = (n, cols=["_value"], table=<-) =>
_sortLimit(table:table, n:n, cols:cols, desc:true)
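Its mirror image follows directly; a sketch, assuming a bottom function defined the same way:
// bottom sorts a table by cols and keeps only the bottom n records.
bottom = (n, cols=["_value"], table=<-) =>
_sortLimit(table:table, n:n, cols:cols, desc:false)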
Project Status and Timeline
API 2.0 Work
Lock down query request/response format
Apache Arrow
We’re contributing the Go
implementation!
https://github.com/influxdata/arrow
Finalize Language
(a few months or so)
Ship with Enterprise 1.6
(summertime)
Hack & workshop day
tomorrow!
Ask the registration desk today
Thank you!
Paul Dix

paul@influxdata.com

@pauldix

More Related Content

What's hot (20)

PDF
Inside the InfluxDB storage engine
InfluxData
 
PPTX
InfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard
InfluxData
 
PDF
Creating and Using the Flux SQL Datasource | Katy Farmer | InfluxData
InfluxData
 
PDF
Extending Flux to Support Other Databases and Data Stores | Adam Anthony | In...
InfluxData
 
PPTX
Taming the Tiger: Tips and Tricks for Using Telegraf
InfluxData
 
PDF
A Deeper Dive into EXPLAIN
EDB
 
PDF
Time Series Data with InfluxDB
Turi, Inc.
 
PDF
Finding OOMS in Legacy Systems with the Syslog Telegraf Plugin
InfluxData
 
PDF
Introduction to InfluxDB
Jorn Jambers
 
PPTX
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
InfluxData
 
PDF
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
InfluxData
 
PDF
Obtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and Telegraf
InfluxData
 
PDF
Flux and InfluxDB 2.0 by Paul Dix
InfluxData
 
PDF
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxData
 
PPTX
Scott Anderson [InfluxData] | InfluxDB Tasks – Beyond Downsampling | InfluxDa...
InfluxData
 
PDF
Downsampling your data October 2017
InfluxData
 
PPTX
9:40 am InfluxDB 2.0 and Flux – The Road Ahead Paul Dix, Founder and CTO | ...
InfluxData
 
PPTX
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
InfluxData
 
PDF
Write your own telegraf plugin
InfluxData
 
PDF
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
InfluxData
 
Inside the InfluxDB storage engine
InfluxData
 
InfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard
InfluxData
 
Creating and Using the Flux SQL Datasource | Katy Farmer | InfluxData
InfluxData
 
Extending Flux to Support Other Databases and Data Stores | Adam Anthony | In...
InfluxData
 
Taming the Tiger: Tips and Tricks for Using Telegraf
InfluxData
 
A Deeper Dive into EXPLAIN
EDB
 
Time Series Data with InfluxDB
Turi, Inc.
 
Finding OOMS in Legacy Systems with the Syslog Telegraf Plugin
InfluxData
 
Introduction to InfluxDB
Jorn Jambers
 
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
InfluxData
 
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
InfluxData
 
Obtaining the Perfect Smoke By Monitoring Your BBQ with InfluxDB and Telegraf
InfluxData
 
Flux and InfluxDB 2.0 by Paul Dix
InfluxData
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxData
 
Scott Anderson [InfluxData] | InfluxDB Tasks – Beyond Downsampling | InfluxDa...
InfluxData
 
Downsampling your data October 2017
InfluxData
 
9:40 am InfluxDB 2.0 and Flux – The Road Ahead Paul Dix, Founder and CTO | ...
InfluxData
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
InfluxData
 
Write your own telegraf plugin
InfluxData
 
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
InfluxData
 

Similar to InfluxData Platform Future and Vision (20)

PDF
Flux and InfluxDB 2.0
InfluxData
 
PDF
Observability of InfluxDB IOx: Tracing, Metrics and System Tables
InfluxData
 
PDF
Influxdb and time series data
Marcin Szepczyński
 
PPTX
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
Cloudera, Inc.
 
PDF
OPTIMIZING THE TICK STACK
InfluxData
 
PDF
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB
 
PDF
Optimizing InfluxDB Performance in the Real World | Sam Dillard | InfluxData
InfluxData
 
PPTX
OPTIMIZING THE TICK STACK
InfluxData
 
PPTX
MongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB
 
PDF
5-minute Practical Streaming Techniques that can Save You Millions
HostedbyConfluent
 
PDF
Fluentd meetup #3
Treasure Data, Inc.
 
PDF
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
MongoDB
 
PDF
Dynamodb
Jean-Tiare LE BIGOT
 
KEY
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
larsgeorge
 
PPTX
High Performance Applications with MongoDB
MongoDB
 
PDF
IOT with PostgreSQL
EDB
 
PPTX
System insight without Interference
Tony Tam
 
PDF
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
festival ICT 2016
 
PDF
MongoDB Solution for Internet of Things and Big Data
Stefano Dindo
 
PDF
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
InfluxData
 
Flux and InfluxDB 2.0
InfluxData
 
Observability of InfluxDB IOx: Tracing, Metrics and System Tables
InfluxData
 
Influxdb and time series data
Marcin Szepczyński
 
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
Cloudera, Inc.
 
OPTIMIZING THE TICK STACK
InfluxData
 
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB
 
Optimizing InfluxDB Performance in the Real World | Sam Dillard | InfluxData
InfluxData
 
OPTIMIZING THE TICK STACK
InfluxData
 
MongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB
 
5-minute Practical Streaming Techniques that can Save You Millions
HostedbyConfluent
 
Fluentd meetup #3
Treasure Data, Inc.
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
MongoDB
 
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
larsgeorge
 
High Performance Applications with MongoDB
MongoDB
 
IOT with PostgreSQL
EDB
 
System insight without Interference
Tony Tam
 
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
festival ICT 2016
 
MongoDB Solution for Internet of Things and Big Data
Stefano Dindo
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
InfluxData
 
Ad

More from InfluxData (20)

PPTX
Announcing InfluxDB Clustered
InfluxData
 
PDF
Best Practices for Leveraging the Apache Arrow Ecosystem
InfluxData
 
PDF
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
InfluxData
 
PDF
Power Your Predictive Analytics with InfluxDB
InfluxData
 
PDF
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
InfluxData
 
PDF
Build an Edge-to-Cloud Solution with the MING Stack
InfluxData
 
PDF
Meet the Founders: An Open Discussion About Rewriting Using Rust
InfluxData
 
PDF
Introducing InfluxDB Cloud Dedicated
InfluxData
 
PDF
Gain Better Observability with OpenTelemetry and InfluxDB
InfluxData
 
PPTX
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
InfluxData
 
PDF
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
InfluxData
 
PPTX
Introducing InfluxDB’s New Time Series Database Storage Engine
InfluxData
 
PDF
Start Automating InfluxDB Deployments at the Edge with balena
InfluxData
 
PDF
Understanding InfluxDB’s New Storage Engine
InfluxData
 
PDF
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
InfluxData
 
PPTX
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
InfluxData
 
PDF
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
InfluxData
 
PDF
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
InfluxData
 
PDF
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
InfluxData
 
PDF
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
InfluxData
 
Announcing InfluxDB Clustered
InfluxData
 
Best Practices for Leveraging the Apache Arrow Ecosystem
InfluxData
 
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
InfluxData
 
Power Your Predictive Analytics with InfluxDB
InfluxData
 
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
InfluxData
 
Build an Edge-to-Cloud Solution with the MING Stack
InfluxData
 
Meet the Founders: An Open Discussion About Rewriting Using Rust
InfluxData
 
Introducing InfluxDB Cloud Dedicated
InfluxData
 
Gain Better Observability with OpenTelemetry and InfluxDB
InfluxData
 
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
InfluxData
 
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
InfluxData
 
Introducing InfluxDB’s New Time Series Database Storage Engine
InfluxData
 
Start Automating InfluxDB Deployments at the Edge with balena
InfluxData
 
Understanding InfluxDB’s New Storage Engine
InfluxData
 
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
InfluxData
 
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
InfluxData
 
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
InfluxData
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
InfluxData
 
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
InfluxData
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
InfluxData
 
Ad

Recently uploaded (20)

PPTX
Presentation on Social Media1111111.pptx
tanamlimbu
 
PPTX
1.10-Ruta=1st Term------------------------------1st.pptx
zk7304860098
 
PPTX
Slides ZPE - QFS Eco Economic Epochs.pptx
Steven McGee
 
PPTX
Finally, My Best IPTV Provider That Understands Movie Lovers Experience IPTVG...
Rafael IPTV
 
PDF
Digital Security in 2025 with Adut Angelina
The ClarityDesk
 
PDF
The Complete Guide to Chrome Net Internals DNS – 2025
Orage Technologies
 
PDF
How to Fix Error Code 16 in Adobe Photoshop A Step-by-Step Guide.pdf
Becky Lean
 
PPTX
Simplifying and CounFounding in egime.pptx
Ryanto10
 
PPTX
InOffensive Security_cybersecurity2.pptx
wihib17507
 
PDF
AiDAC – Custody Platform Overview for Institutional Use.pdf
BobPesakovic
 
PDF
DORA - MobileOps & MORA - DORA for Mobile Applications
Willy ROUVRE
 
PPTX
ipv6 very very very very vvoverview.pptx
eyala75
 
PDF
Azure Devops Introduction for CI/CD and agile
henrymails
 
PPTX
Template Timeplan & Roadmap Product.pptx
ImeldaYulistya
 
PPTX
Internet_of_Things_Presentation_KaifRahaman.pptx
kaifrahaman27593
 
PDF
Real Cost of Hiring a Shopify App Developer_ Budgeting Beyond Hourly Rates.pdf
CartCoders
 
PDF
The Power and Impact of Promotion most useful
RajaBilal42
 
PPTX
Random Presentation By Fuhran Khalil uio
maniieiish
 
PPTX
Internet Basics for class ix. Unit I. Describe
ASHUTOSHKUMAR1131
 
PPTX
Birth-after-Previous-Caesarean-Birth (1).pptx
fermann1
 
Presentation on Social Media1111111.pptx
tanamlimbu
 
1.10-Ruta=1st Term------------------------------1st.pptx
zk7304860098
 
Slides ZPE - QFS Eco Economic Epochs.pptx
Steven McGee
 
Finally, My Best IPTV Provider That Understands Movie Lovers Experience IPTVG...
Rafael IPTV
 
Digital Security in 2025 with Adut Angelina
The ClarityDesk
 
The Complete Guide to Chrome Net Internals DNS – 2025
Orage Technologies
 
How to Fix Error Code 16 in Adobe Photoshop A Step-by-Step Guide.pdf
Becky Lean
 
Simplifying and CounFounding in egime.pptx
Ryanto10
 
InOffensive Security_cybersecurity2.pptx
wihib17507
 
AiDAC – Custody Platform Overview for Institutional Use.pdf
BobPesakovic
 
DORA - MobileOps & MORA - DORA for Mobile Applications
Willy ROUVRE
 
ipv6 very very very very vvoverview.pptx
eyala75
 
Azure Devops Introduction for CI/CD and agile
henrymails
 
Template Timeplan & Roadmap Product.pptx
ImeldaYulistya
 
Internet_of_Things_Presentation_KaifRahaman.pptx
kaifrahaman27593
 
Real Cost of Hiring a Shopify App Developer_ Budgeting Beyond Hourly Rates.pdf
CartCoders
 
The Power and Impact of Promotion most useful
RajaBilal42
 
Random Presentation By Fuhran Khalil uio
maniieiish
 
Internet Basics for class ix. Unit I. Describe
ASHUTOSHKUMAR1131
 
Birth-after-Previous-Caesarean-Birth (1).pptx
fermann1
 

InfluxData Platform Future and Vision

  • 1. IFQL and the future of InfluxData Paul Dix Founder & CTO @pauldix paul@influxdata.com
  • 2. Evolution of a query language…
  • 5. Vaguely Familiar select percentile(90, value) from cpu where time > now() - 1d and “host” = ‘serverA’ group by time(10m)
  • 6. 0.8 -> 0.9 Breaking API change, addition of tags
  • 26. InfluxDB is batch interactive
  • 27. IFQL and unified API Building towards 2.0
  • 28. Project Goals Photo by Glen Carrie on Unsplash
  • 29. One Language to Unite!
  • 32. Iterate & deploy more frequently
  • 37. { "operations": [ { "id": "select0", "kind": "select", "spec": { "database": "foo", "hosts": null } }, { "id": "where1", "kind": "where", "spec": { "expression": { "root": { "type": "binary", "operator": "and", "left": { "type": "binary", "operator": "and", "left": { "type": "binary", "operator": "==", "left": { "type": "reference", "name": "_measurement", "kind": "tag" }, "right": { "type": "stringLiteral", "value": "cpu" } }, Query represented as DAG in JSON
  • 41. UI for Many because no one wants to actually write a query
  • 46. Code Sharing & Reuse no code > code
  • 48. // get the last value written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> last()
  • 49. // get the last value written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> last() Result: _result Block: keys: [_field, _measurement, host, region] bounds: [1677-09-21T00:12:43.145224192Z, 2018-02-12T15:53:04.361902250Z) _time _field _measurement host region _value ------------------------------ --------------- --------------- --------------- --------------- ---------------------- 2018-02-12T15:53:00.000000000Z usage_system cpu server0 east 60.6284 Block: keys: [_field, _measurement, host, region] bounds: [1677-09-21T00:12:43.145224192Z, 2018-02-12T15:53:04.361902250Z) _time _field _measurement host region _value ------------------------------ --------------- --------------- --------------- --------------- ---------------------- 2018-02-12T15:53:00.000000000Z usage_user cpu server0 east 39.3716
  • 50. // get the last minute of data from a specific // measurement & field & host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m)
  • 51. // get the last minute of data from a specific // measurement & field & host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) Result: _result Block: keys: [_field, _measurement, host, region] bounds: [2018-02-12T16:01:45.677502014Z, 2018-02-12T16:02:45.677502014Z) _time _field _measurement host region _value ------------------------------ --------------- --------------- --------------- --------------- ---------------------- 2018-02-12T16:01:50.000000000Z usage_user cpu server0 east 50.549 2018-02-12T16:02:00.000000000Z usage_user cpu server0 east 35.4458 2018-02-12T16:02:10.000000000Z usage_user cpu server0 east 30.0493 2018-02-12T16:02:20.000000000Z usage_user cpu server0 east 44.3378 2018-02-12T16:02:30.000000000Z usage_user cpu server0 east 11.1584 2018-02-12T16:02:40.000000000Z usage_user cpu server0 east 46.712
  • 52. // get the mean in 10m intervals of last hour from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu") |> range(start:-1h) |> window(every:15m) |> mean() Result: _result Block: keys: [_field, _measurement, host, region] bounds: [2018-02-12T15:05:06.708945484Z, 2018-02-12T16:05:06.708945484Z) _time _field _measurement host region _value ------------------------------ --------------- --------------- --------------- --------------- ---------------------- 2018-02-12T15:28:41.128654848Z usage_user cpu server0 east 50.72841444444444 2018-02-12T15:43:41.128654848Z usage_user cpu server0 east 51.19163333333333 2018-02-12T15:13:41.128654848Z usage_user cpu server0 east 45.5091088235294 2018-02-12T15:58:41.128654848Z usage_user cpu server0 east 49.65145555555555 2018-02-12T16:05:06.708945484Z usage_user cpu server0 east 46.41292368421052 Block: keys: [_field, _measurement, host, region] bounds: [2018-02-12T15:05:06.708945484Z, 2018-02-12T16:05:06.708945484Z) _time _field _measurement host region _value ------------------------------ --------------- --------------- --------------- --------------- ---------------------- 2018-02-12T15:28:41.128654848Z usage_system cpu server0 east 49.27158555555556 2018-02-12T15:58:41.128654848Z usage_system cpu server0 east 50.34854444444444 2018-02-12T16:05:06.708945484Z usage_system cpu server0 east 53.58707631578949 2018-02-12T15:13:41.128654848Z usage_system cpu server0 east 54.49089117647058 2018-02-12T15:43:41.128654848Z usage_system cpu server0 east 48.808366666666664
  • 54. Functional // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m)
  • 55. Functional // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) built in functions
  • 56. Functional // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) anonymous functions
  • 57. Functional // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) pipe forward operator
  • 58. Named Parameters // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) named parameters only!
  • 64. Inputs // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) no input
  • 65. Outputs // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) output is entire db
  • 66. Outputs // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) pipe that output to filter
  • 67. Filter function input // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) anonymous filter function input is a single record {“_measurement”:”cpu”, ”_field”:”usage_user", “host":"server0", “region":"west", "_value":23.2}
  • 68. Filter function input // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) A record looks like a flat object or row in a table {“_measurement”:”cpu”, ”_field”:”usage_user", “host":"server0", “region":"west", "_value":23.2}
  • 69. Record Properties // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) tag key {“_measurement”:”cpu”, ”_field”:”usage_user", “host":"server0", “region":"west", "_value":23.2}
  • 70. Record Properties // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r.host == "server0") |> range(start:-1m) same as before {“_measurement”:”cpu”, ”_field”:”usage_user", “host":"server0", “region":"west", "_value":23.2}
  • 71. Special Properties starts with _ reserved for system attributes from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() {“_measurement”:”cpu”, ”_field”:”usage_user", “host":"server0", “region":"west", "_value":23.2}
  • 72. Special Properties works other way from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r._measurement == "cpu" and r._field == "usage_user") |> range(start:-1m) |> max() {“_measurement”:”cpu”, ”_field”:”usage_user", “host":"server0", “region":"west", "_value":23.2}
  • 73. Special Properties _measurement and _field present for all InfluxDB data from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() {“_measurement”:”cpu”, ”_field”:”usage_user", “host":"server0", “region":"west", "_value":23.2}
  • 74. Special Properties _value exists in all series from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == “usage_user" and r[“_value"] > 50.0) |> range(start:-1m) |> max() {“_measurement”:”cpu”, ”_field”:”usage_user", “host":"server0", “region":"west", "_value":23.2}
  • 75. Filter function output // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) filter function output is a boolean to determine if record is in set
  • 76. Filter Operators // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) != =~ !~ in
  • 77. Filter Boolean Logic // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => (r[“host"] == “server0" or r[“host"] == “server1") and r[“_measurement”] == “cpu") |> range(start:-1m) parens for precedence
  • 78. Function with explicit return // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => {return r[“host"] == “server0"}) |> range(start:-1m) long hand function definition
  • 79. Outputs // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) filter output is set of data matching filter function
  • 80. Outputs // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) piped to range which further filters by a time range
  • 81. Outputs // get the last 1 hour written for anything from a given host from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1m) range output is the final query result
  • 82. Function Isolation (but the planner may do otherwise)
  • 83. Does order matter? from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() from(db:"mydb") |> range(start:-1m) |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max()
  • 84. Does order matter? from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() from(db:"mydb") |> range(start:-1m) |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() range and filter switched
  • 85. Does order matter? from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() from(db:"mydb") |> range(start:-1m) |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() results the same Result: _result Block: keys: [_field, _measurement, host, region] bounds: [2018-02-12T17:52:02.322301856Z, 2018-02-12T17:53:02.322301856Z) _time _field _measurement host region _value ------------------------------ --------------- --------------- --------------- --------------- ---------------------- 2018-02-12T17:53:02.322301856Z usage_user cpu server0 east 97.3174
  • 86. Does order matter? from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() from(db:"mydb") |> range(start:-1m) |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() is this the same as the top two? from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() |> range(start:-1m)
  • 87. Does order matter? from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() from(db:"mydb") |> range(start:-1m) |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() moving max to here changes semantics from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() |> range(start:-1m)
  • 88. Does order matter? from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() from(db:"mydb") |> range(start:-1m) |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() here it operates on only the last minute of data from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() |> range(start:-1m)
  • 89. Does order matter? from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() from(db:"mydb") |> range(start:-1m) |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() here it operates on data for all time from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() |> range(start:-1m)
  • 90. Does order matter? from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() from(db:"mydb") |> range(start:-1m) |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() then that result is filtered down to the last minute (which will likely be empty) from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() |> range(start:-1m)
  • 92. Optimization from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() from(db:"mydb") |> range(start:-1m) |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max()
  • 93. Optimization from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() from(db:"mydb") |> range(start:-1m) |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() this is more efficient
  • 94. Optimization from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> range(start:-1m) |> max() from(db:"mydb") |> range(start:-1m) |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == "usage_user") |> max() query DAG different plan DAG same as one on left
  • 95. Optimization from(db:"mydb") |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == “usage_user” r[“_value"] > 22.0) |> range(start:-1m) |> max() from(db:"mydb") |> range(start:-1m) |> filter(fn: (r) => r["host"] == "server0" and r["_measurement"] == "cpu" and r["_field"] == “usage_user" r[“_value"] > 22.0) |> max() this does a full table scan
  • 96. Variables & Closures db = "mydb" measurement = "cpu" from(db:db) |> filter(fn: (r) => r._measurement == measurement and r.host == "server0") |> last()
  • 97. Variables & Closures db = "mydb" measurement = "cpu" from(db:db) |> filter(fn: (r) => r._measurement == measurement and r.host == "server0") |> last() anonymous filter function closure over surrounding context
  • 98. User Defined Functions db = "mydb" measurement = “cpu" fn = (r) => r._measurement == measurement and r.host == "server0" from(db:db) |> filter(fn: fn) |> last() assign function to variable fn
  • 99. User Defined Functions from(db:"mydb") |> filter(fn: (r) => r["_measurement"] == "cpu" and r["_field"] == "usage_user" and r["host"] == "server0") |> range(start:-1h)
  • 100. User Defined Functions from(db:"mydb") |> filter(fn: (r) => r["_measurement"] == "cpu" and r["_field"] == "usage_user" and r["host"] == "server0") |> range(start:-1h) get rid of some common boilerplate?
  • 101. User Defined Functions select = (db, m, f) => { return from(db:db) |> filter(fn: (r) => r._measurement == m and r._field == f) }
  • 102. User Defined Functions select = (db, m, f) => { return from(db:db) |> filter(fn: (r) => r._measurement == m and r._field == f) } select(db: "mydb", m: "cpu", f: "usage_user") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1h)
  • 103. User Defined Functions select = (db, m, f) => { return from(db:db) |> filter(fn: (r) => r._measurement == m and r._field == f) } select(m: "cpu", f: "usage_user") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1h) throws error error calling function "select": missing required keyword argument "db"
  • 104. Default Arguments select = (db="mydb", m, f) => { return from(db:db) |> filter(fn: (r) => r._measurement == m and r._field == f) } select(m: "cpu", f: "usage_user") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1h)
  • 105. Default Arguments select = (db="mydb", m, f) => { return from(db:db) |> filter(fn: (r) => r._measurement == m and r._field == f) } select(m: "cpu", f: "usage_user") |> filter(fn: (r) => r["host"] == "server0") |> range(start:-1h)
  • 106. Multiple Results to Client data = from(db:"mydb") |> filter(fn: (r) r._measurement == "cpu" and r._field == "usage_user") |> range(start: -4h) |> window(every: 5m) data |> min() |> yield(name: "min") data |> max() |> yield(name: "max") data |> mean() |> yield(name: "mean")
  • 107. Multiple Results to Client data = from(db:"mydb") |> filter(fn: (r) r._measurement == "cpu" and r._field == "usage_user") |> range(start: -4h) |> window(every: 5m) data |> min() |> yield(name: "min") data |> max() |> yield(name: "max") data |> mean() |> yield(name: "mean") Result: min Block: keys: [_field, _measurement, host, region] bounds: [2018-02-12T16:55:55.487457216Z, 2018-02-12T20:55:55.487457216Z) _time _field _measurement host region _value ------------------------------ --------------- --------------- --------------- --------------- ---------------------- name
  • 108. User Defined Pipe Forwardable Functions
mf = (m, f, table=<-) => {
  return table
    |> filter(fn: (r) => r._measurement == m and r._field == f)
}

from(db:"mydb")
|> mf(m: "cpu", f: "usage_user")
|> filter(fn: (r) => r.host == "server0")
|> last()
  • 109. User Defined Pipe Forwardable Functions
mf = (m, f, table=<-) => {
  return table
    |> filter(fn: (r) => r._measurement == m and r._field == f)
}

from(db:"mydb")
|> mf(m: "cpu", f: "usage_user")
|> filter(fn: (r) => r.host == "server0")
|> last()

table=<- means the function takes a table from a pipe forward by default
  • 110. User Defined Pipe Forwardable Functions
mf = (m, f, table=<-) => {
  return table
    |> filter(fn: (r) => r._measurement == m and r._field == f)
}

from(db:"mydb")
|> mf(m: "cpu", f: "usage_user")
|> filter(fn: (r) => r.host == "server0")
|> last()

calling it, then chaining off of it
  • 111. Passing as Argument
mf = (m, f, table=<-) => {
  return table
    |> filter(fn: (r) => r._measurement == m and r._field == f)
}

mf(m: "cpu", f: "usage_user", table: from(db:"mydb"))
|> filter(fn: (r) => r.host == "server0")
|> last()

the from() is passed as an argument
  • 112. Passing as Argument
mf = (m, f, table=<-) =>
  filter(fn: (r) => r._measurement == m and r._field == f, table: table)

mf(m: "cpu", f: "usage_user", table: from(db:"mydb"))
|> filter(fn: (r) => r.host == "server0")
|> last()

the function rewritten to pass the table on as an argument itself
  • 113. Any pipe forward function can use arguments
min(table:
  range(start: -1h,
    table: filter(fn: (r) => r.host == "server0",
      table: from(db: "mydb"))))
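Since |> simply feeds the table on its left into the table argument of the function on its right, the nested call above is equivalent to this pipe-forward chain:

from(db: "mydb")
|> filter(fn: (r) => r.host == "server0")
|> range(start: -1h)
|> min()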
  • 114. Make you a Lisp
  • 115. Easy to add Functions like plugins in Telegraf
  • 118. package functions

import (
	"fmt"

	"github.com/influxdata/ifql/ifql"
	"github.com/influxdata/ifql/query"
	"github.com/influxdata/ifql/query/execute"
	"github.com/influxdata/ifql/query/plan"
)

const CountKind = "count"

type CountOpSpec struct{}

func init() {
	ifql.RegisterFunction(CountKind, createCountOpSpec)
	query.RegisterOpSpec(CountKind, newCountOp)
	plan.RegisterProcedureSpec(CountKind, newCountProcedure, CountKind)
	execute.RegisterTransformation(CountKind, createCountTransformation)
}

func createCountOpSpec(args map[string]ifql.Value, ctx ifql.Context) (query.OperationSpec, error) {
	if len(args) != 0 {
		return nil, fmt.Errorf(`count function requires no arguments`)
	}
	return new(CountOpSpec), nil
}

func newCountOp() query.OperationSpec {
	return new(CountOpSpec)
}

func (s *CountOpSpec) Kind() query.OperationKind {
	return CountKind
}
  • 119.
type CountProcedureSpec struct{}

func newCountProcedure(query.OperationSpec) (plan.ProcedureSpec, error) {
	return new(CountProcedureSpec), nil
}

func (s *CountProcedureSpec) Kind() plan.ProcedureKind {
	return CountKind
}

func (s *CountProcedureSpec) Copy() plan.ProcedureSpec {
	return new(CountProcedureSpec)
}

func (s *CountProcedureSpec) PushDownRule() plan.PushDownRule {
	return plan.PushDownRule{
		Root:    SelectKind,
		Through: nil,
	}
}

func (s *CountProcedureSpec) PushDown(root *plan.Procedure, dup func() *plan.Procedure) {
	selectSpec := root.Spec.(*SelectProcedureSpec)
	if selectSpec.AggregateSet {
		// an aggregate is already pushed down into this select,
		// so operate on a duplicate of the procedure instead
		root = dup()
		selectSpec = root.Spec.(*SelectProcedureSpec)
		selectSpec.AggregateSet = false
		selectSpec.AggregateType = ""
		return
	}
	selectSpec.AggregateSet = true
	selectSpec.AggregateType = CountKind
}
  • 120.
type CountAgg struct {
	count int64
}

func createCountTransformation(id execute.DatasetID, mode execute.AccumulationMode, spec plan.ProcedureSpec, ctx execute.Context) (execute.Transformation, execute.Dataset, error) {
	t, d := execute.NewAggregateTransformationAndDataset(id, mode, ctx.Bounds(), new(CountAgg))
	return t, d, nil
}

// each Do method just counts the values it is handed, regardless of type
func (a *CountAgg) DoBool(vs []bool)     { a.count += int64(len(vs)) }
func (a *CountAgg) DoUInt(vs []uint64)   { a.count += int64(len(vs)) }
func (a *CountAgg) DoInt(vs []int64)     { a.count += int64(len(vs)) }
func (a *CountAgg) DoFloat(vs []float64) { a.count += int64(len(vs)) }
func (a *CountAgg) DoString(vs []string) { a.count += int64(len(vs)) }

func (a *CountAgg) Type() execute.DataType { return execute.TInt }
func (a *CountAgg) ValueInt() int64        { return a.count }
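Once registered via the init function (a sketch, assuming this functions package is compiled into the IFQL build), the new aggregate is callable like any built-in:

from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu")
|> range(start: -1h)
|> count()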
  • 122. Imports and Namespaces
from(db:"mydb")
|> filter(fn: (r) => r.host == "server0")
|> range(start: -1h)
// square the value
|> map(fn: (r) => r._value * r._value)

shortcut for this?
  • 123. Imports and Namespaces
from(db:"mydb")
|> filter(fn: (r) => r.host == "server0")
|> range(start: -1h)
// square the value
|> map(fn: (r) => r._value * r._value)

square = (table=<-) =>
  table |> map(fn: (r) => r._value * r._value)
  • 124. Imports and Namespaces
import "github.com/pauldix/ifqlmath"

from(db:"mydb")
|> filter(fn: (r) => r.host == "server0")
|> range(start: -1h)
|> ifqlmath.square()
  • 125. Imports and Namespaces
import "github.com/pauldix/ifqlmath"

from(db:"mydb")
|> filter(fn: (r) => r.host == "server0")
|> range(start: -1h)
|> ifqlmath.square()

ifqlmath is the namespace
  • 127. Math across measurements
foo = from(db: "mydb")
  |> filter(fn: (r) => r._measurement == "foo")
  |> range(start: -1h)
bar = from(db: "mydb")
  |> filter(fn: (r) => r._measurement == "bar")
  |> range(start: -1h)

join(
  tables: {foo:foo, bar:bar},
  fn: (t) => t.foo._value + t.bar._value)
|> yield(name: "foobar")
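The same join shape works for any binary operation across the joined tables; as a hypothetical variation (the measurement names here are made up), a ratio of two series:

errors = from(db: "mydb")
  |> filter(fn: (r) => r._measurement == "errors")
  |> range(start: -1h)
requests = from(db: "mydb")
  |> filter(fn: (r) => r._measurement == "requests")
  |> range(start: -1h)

join(
  tables: {errors:errors, requests:requests},
  fn: (t) => t.errors._value / t.requests._value)
|> yield(name: "error_rate")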
  • 128. Having Query
from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu")
|> range(start:-1h)
|> window(every:10m)
|> mean()
// this is the having part
|> filter(fn: (r) => r._value > 90)
  • 129. Grouping
// group - average utilization across regions
from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_system")
|> range(start: -1h)
|> group(by: ["region"])
|> window(every:10m)
|> mean()
  • 130. Get Metadata
from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu")
|> range(start: -48h, stop: -47h)
|> tagValues(key: "host")
  • 131. Get Metadata
from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu")
|> range(start: -48h, stop: -47h)
|> group(by: ["measurement"], keep: ["host"])
|> distinct(column: "host")
  • 132. Get Metadata
tagValues = (key, table=<-) =>
  table
  |> group(by: ["measurement"], keep: [key])
  |> distinct(column: key)
  • 133. Get Metadata
from(db:"mydb")
|> filter(fn: (r) => r._measurement == "cpu")
|> range(start: -48h, stop: -47h)
|> tagValues(key: "host")
|> count()
  • 134. Functions Implemented as IFQL
// _sortLimit is a helper function, which sorts
// and limits a table.
_sortLimit = (n, desc, cols=["_value"], table=<-) =>
  table
  |> sort(cols:cols, desc:desc)
  |> limit(n:n)

// top sorts a table by cols and keeps only the top n records.
top = (n, cols=["_value"], table=<-) =>
  _sortLimit(table:table, n:n, cols:cols, desc:true)
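Presumably the complement follows the same pattern; a sketch of a bottom function under that assumption (bottom is not shown in the deck):

// bottom sorts a table by cols and keeps only the bottom n records.
bottom = (n, cols=["_value"], table=<-) =>
  _sortLimit(table:table, n:n, cols:cols, desc:false)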
  • 135. Project Status and Timeline
  • 136. API 2.0 Work: lock down the query request/response format
  • 138. We’re contributing the Go implementation! https://github.com/influxdata/arrow
  • 139. Finalize Language (a few months or so)
  • 140. Ship with Enterprise 1.6 (summertime)
  • 141. Hack & workshop day tomorrow! Ask the registration desk today