Jennifer Widom:
~~~~~~~~~~~~~~
There is a stream query repository at
http://www-db.cs.berkeley.edu/stream/sqr. There are about 40
queries in four areas: auctions, squirrels, networks and
bird nests.

Jennifer then presented the four "challenge" queries in
English, and in CQL. Here are the queries in English. The
queries in CQL can be found at the above site.

///////////////begin queries/////////////////////////

Query 1: windowing and aggregation, self-join or subquery

  Stream: Packets(pID, length, time)
    // time may be explicit or implicit

Generate the stream of packets whose length is greater than twice the
average packet length over the last 1 hour.
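
A minimal Python sketch of Query 1 (not CQL, and not taken from the
repository). It assumes packets arrive in timestamp order, time is in
seconds, and the current packet is included in the average; the name
long_packets is made up for illustration.

```python
from collections import deque

def long_packets(packets, window=3600.0):
    """Yield (pID, length, time) for packets whose length exceeds twice the
    average length of the packets in the last `window` seconds."""
    recent = deque()   # (time, length) pairs currently inside the window
    total = 0          # running sum of the lengths held in `recent`
    for pid, length, time in packets:
        recent.append((time, length))
        total += length
        # Evict packets that fell out of the window (time - window, time].
        while recent[0][0] <= time - window:
            _, old_length = recent.popleft()
            total -= old_length
        if length > 2 * total / len(recent):
            yield (pid, length, time)
```

Keeping a running sum means each packet is added and evicted once, so
the average never has to be recomputed over the whole window.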
--------
Query 2: windowing, substreams, stored relation

  Stream: SquirrelSensors(sID, region, time)
    // time may be explicit or implicit
  Relation: SquirrelType(sID, type)

Create an alert when more than 20 type 'A' squirrels are in Jennifer's
backyard.  (I've purposely underspecified whether alert() occurs once
when more than 20 squirrels are first detected, or at every time step
with more than 20 squirrels, or something else.  Perhaps not
important, or perhaps a point worth discussing.)
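
The underspecified alert semantics can be made concrete with a small
Python sketch. It assumes readings arrive in time order, that a
squirrel is "in" the region named by its latest reading, and that the
backyard is identified by the region value "backyard"; the function
name and the edge_triggered flag are made up for illustration.

```python
def backyard_alerts(readings, squirrel_type, region="backyard",
                    threshold=20, edge_triggered=True):
    """Yield the times at which an alert fires.

    readings      -- iterable of (sID, region, time), in time order
    squirrel_type -- dict mapping sID to type (the stored relation)
    """
    in_region = set()   # type 'A' squirrels whose latest reading is in `region`
    was_over = False    # was the count above the threshold after the last reading?
    for sid, reg, time in readings:
        if squirrel_type.get(sid) != "A":
            continue
        if reg == region:
            in_region.add(sid)
        else:
            in_region.discard(sid)
        over = len(in_region) > threshold
        # Edge-triggered: fire only when the count first crosses the threshold.
        # Level-triggered: fire on every reading while the count is above it.
        if over and (not edge_triggered or not was_over):
            yield time
        was_over = over
```

With edge_triggered=True the alert fires once per crossing; with False
it fires at every reading (standing in for "every time step") while
more than `threshold` squirrels are present.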
----------
Query 3: stream self-joins

  SquirrelChirps(sID, loc, time)
    // time may be explicit or implicit

Stream an event each time 3 different squirrels within a pairwise
distance of 5 meters from each other chirp within 10 seconds of each
other.
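
A Python sketch of Query 3, written so that the group size k is a
parameter rather than fixed at 3. It assumes chirps arrive in time
order, loc is an (x, y) pair in meters, and "within" means less than or
equal; chirp_groups and its parameter names are made up for
illustration.

```python
from collections import deque
from itertools import combinations
from math import dist

def chirp_groups(chirps, k=3, max_dist=5.0, max_gap=10.0):
    """Yield k-tuples of chirps from k different squirrels that are pairwise
    within `max_dist` meters and within `max_gap` seconds of each other.

    chirps -- iterable of (sID, (x, y), time), in time order
    """
    recent = deque()  # chirps from the last `max_gap` seconds
    for chirp in chirps:
        time = chirp[2]
        while recent and time - recent[0][2] > max_gap:
            recent.popleft()
        # Pair the new chirp with every (k-1)-subset of the recent chirps.
        # Everything in `recent` is within max_gap of the newest chirp, so
        # the time condition holds pairwise for the whole group.
        for others in combinations(recent, k - 1):
            group = others + (chirp,)
            if len({c[0] for c in group}) < k:
                continue  # the same squirrel appears more than once
            if all(dist(a[1], b[1]) <= max_dist
                   for a, b in combinations(group, 2)):
                yield group
        recent.append(chirp)
```

Enumerating subsets is exponential in k, so this is only meant to pin
down the semantics, not to suggest an evaluation strategy.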
---------
Super-Bonus Query 4: windowing, stream transformations

  Packets(pID, src, dest, length, time)
    // time may be explicit or implicit

Create a log of flow information from a stream of packets.  A flow
(simple definition) from a source S to a destination D ends when no
packet from S to D is seen for at least 2 minutes after the last
packet from S to D. The next packet from S to D starts a new flow.
The flow log contains the source, destination, count of packets, and
total length of packets for each flow.
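
A Python sketch of the flow definition above. It assumes packets arrive
in time order and time is in seconds. A flow is closed lazily, when the
next packet for the same (src, dest) pair shows up, and flows still
open at the end of the input are flushed; a live system would need a
timer instead. The name flow_log is made up for illustration.

```python
def flow_log(packets, timeout=120.0):
    """Return a list of (src, dest, packet_count, total_length), one per flow.

    packets -- iterable of (pID, src, dest, length, time), in time order
    """
    open_flows = {}  # (src, dest) -> [count, total_length, last_time]
    log = []
    for pid, src, dest, length, time in packets:
        key = (src, dest)
        flow = open_flows.get(key)
        # A gap of at least `timeout` since the last packet ends the flow.
        if flow is not None and time - flow[2] >= timeout:
            log.append((src, dest, flow[0], flow[1]))
            flow = None
        if flow is None:
            flow = open_flows[key] = [0, 0, time]
        flow[0] += 1
        flow[1] += length
        flow[2] = time
    # End of input: flush the flows that are still open.
    for (src, dest), (count, total, _) in open_flows.items():
        log.append((src, dest, count, total))
    return log
```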

//////////////////end queries////////////////////////


Dave Meier asked whether streams are ordered by time, and
whether the language should be aware of this ordering.
Dennis Shasha thinks that the language should be aware of
it.

Sirish said maybe the timestamp should be an explicit
attribute that can be referred to in the Where clause.
Jennifer thinks this is unpleasant syntactically and could
also make implementation harder: a less restricted language
is harder to build optimizers for. However, Alex Buchmann
pointed out that without explicit timestamps it is hard to
express sequence queries such as "A came before B came
before C" (this is the kind of query that SQL-TS should be
able to support).

Coming back to CQL: it can only express windows that end at
NOW. That is, whether the window is sliding or landmark, the
later end of the window is always at the latest timestamp.
Is that the right decision? This does not allow queries that
correlate current with historical data (e.g., join the latest
hour of data on Vehicle Identification Number with the same
hour of data for every day in the last year).

Jennifer pointed out one last thing: CQL uses an
application-defined notion of time.

Dennis Shasha:
~~~~~~~~~~~~~
The language defines queries over the Arrable type (an
ordered data set).
Their language has a clause [assuming order] that tells you
if the tuples are entering the system in timestamp order.

There is a difference (from CQL and StreaQuel) in the way
they solve the squirrels in Jennifer's backyard problem:
they have moving counts of squirrels with latest readings
in Jennifer's backyard, and they check when the moving
count > 20.

*Dave Meier comment*: All the languages proposed so far have
the number of streams in Query 3 (the chirping one) = 3 =
number of squirrels in the query. Is there a way to fix the
languages so the number of streams in the From Clause is
independent of the number of squirrels involved in the
query?
 
Stan Zdonik:
~~~~~~~~~~~
Aurora does not have an SQL-like language. Instead it has a GUI.

Why did they make this choice? They claim that it is hard to
optimize common subexpressions. Instead, let the users do it
for you.

In the Aurora GUI, everything is a stream; there are no
relations (unlike CQL).

What goes in their GUIs? Boxes. What are the boxes?
Regular operators: Filter, Map, Union, Join, Aggregate
New operators: WSort, Resample (Hmm... WSort in
CQL/StreaQuel??)

Finally: add windows.
Like StreaQuel and CQL, they also have the notion of window
size and window hop size (range and slide in CQL speak).
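
To make size and hop concrete, here is a small Python sketch that
enumerates the windows these two parameters define. The half-open
[lo, hi) convention and the name hopping_windows are assumptions for
illustration, not Aurora's definition.

```python
def hopping_windows(start, end, size, hop):
    """Yield (lo, hi) intervals of length `size` whose starts are `hop`
    apart, covering the time range from `start` to `end`."""
    lo = start
    while lo + size <= end:
        yield (lo, lo + size)
        lo += hop
```

With hop smaller than size the windows overlap (sliding); with hop
equal to size they tile the stream (tumbling).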

The timestamp of the derived tuple is the minimum of the
timestamps of the inputs but it can be changed by the
application.

Jennifer had a question: how do you keep things going even
when you don't have new tuples arriving. Answer: use a
heartbeat.

* Aurora cares about QoS, and believes it should be part of
the language.
For blocking, they allow you to specify a timeout. 

* Aurora cares about how to deal with lost and out-of-order
tuples. They believe this has to do with knowing when to
close out windows (see Jennifer's question above).
For disorder, they allow you to specify a slack in the query.
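
One plausible reading of slack, sketched in Python: hold back a bounded
number of tuples and always release the one with the smallest
timestamp. This is an assumption about the semantics, not Aurora's
actual operator; reorder_with_slack and the rule of dropping tuples
that arrive too late are made up for illustration.

```python
import heapq

def reorder_with_slack(tuples, slack):
    """Yield (time, payload) tuples in timestamp order, holding back at
    most `slack` tuples between arrivals to absorb disorder."""
    heap = []                        # min-heap ordered by (time, arrival number)
    last_emitted = float("-inf")     # timestamp of the last tuple released
    for seq, (time, payload) in enumerate(tuples):
        if time < last_emitted:
            continue                 # arrived too late to be placed in order; drop
        heapq.heappush(heap, (time, seq, payload))
        if len(heap) > slack:
            t, _, p = heapq.heappop(heap)
            last_emitted = t
            yield (t, p)
    while heap:                      # end of input: drain what is left
        t, _, p = heapq.heappop(heap)
        yield (t, p)
```

A larger slack tolerates more disorder at the cost of more delay
before each tuple is released.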


Mehul wants to know whether a visual query is actually
easier for general users to compose than, for example, a
textual query. Hari says it is. Some databasy person in the
room said the database industry was proof that it was not.

Mike Stonebraker thinks that the reason for the workflow
style is that there is lots of signal processing in the
front end. And he believes the Aurora GUI is at a higher
level than, say, CQL (Jennifer thinks it is the other way
around: a language is higher level than boxes and arrows).

Carlo Zaniolo:
~~~~~~~~~~~~~~
UDAs are the key. You have init, next and close methods
within the UDA, and you can write arbitrary code for the
three stages. The language is Turing complete. They made a
conscious decision not to join streams - that is because
they cannot join windows.
Interesting question raised by someone: can we view XML
documents as streams?
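
The init/next/close structure of a UDA described above can be sketched
in Python. This only illustrates the three-stage shape; it is not
Carlo's syntax, and AvgUDA and run_uda are made-up names.

```python
class AvgUDA:
    """A user-defined aggregate computing an average in three stages."""

    def init(self):
        # Called once before the first tuple.
        self.count = 0
        self.total = 0.0

    def next(self, value):
        # Called once per tuple; arbitrary code could go here.
        self.count += 1
        self.total += value

    def close(self):
        # Called at the end of the input; returns the result.
        return self.total / self.count if self.count else None


def run_uda(uda, values):
    """Drive a UDA over a finite sequence of values."""
    uda.init()
    for value in values:
        uda.next(value)
    return uda.close()
```

On an unbounded stream close would never be reached, so a streaming
UDA would have to produce its output from next instead.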

Franklin:
~~~~~~~~
He showed a flavor of StreaQuel - the language for
TelegraphCQ. The language allows combining different notions
of time and accesses both historical and newly arriving data.
(TelegraphCQ claims you need a rich model of windows over
both historical and newly arriving data, especially
since they care about archiving streams and querying the
archive)
The language essentially has a for-loop construct to declare
a set of windows of data over which the query is to be
executed. The construct captures all the kinds of windows on
Mike's slide (personal note: the for-loop is used to
*declare* a set of windows; the language is *not*
procedural).
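
The idea of using a loop only to declare a set of windows can be
sketched in Python, using the vehicle example from the CQL discussion
(the latest hour plus the same hour on each earlier day). This is not
StreaQuel syntax; same_hour_windows and the time unit of seconds are
assumptions for illustration.

```python
HOUR = 3600
DAY = 24 * HOUR

def same_hour_windows(now, days):
    """Declare one (start, end) window per day: the hour ending at `now`
    and the same hour on each of the previous `days` days."""
    return [(now - d * DAY - HOUR, now - d * DAY) for d in range(days + 1)]
```

The loop produces a description of the windows and nothing else; how
and when the query runs over them is left to the system.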

Mike Stonebraker's comment: There is a tradeoff between
simplicity and expressiveness.