$ ./configure --prefix=${HOME}/tcq-tools
$ make
$ make test
$ make install
$ ./configure --prefix=${HOME}/tcq-tools
$ make
$ make check
$ make install
$ ./configure-tcq
$ make
$ make install
initdb initializes the data directory and the shared catalogs, and performs other administrivia such as setting the locale and character set encoding. As an alternative to the -D option, you can set PGDATA to point to the data directory on the file system. By default, initdb inherits the locale from its environment; passing the --no-locale flag makes it use the default "C" locale instead.
$ initdb --no-locale -D ${PGDATA}
createdb creates a new PostgreSQL database. pg_ctl start|stop|restart starts, stops, or restarts a PostgreSQL server. logfile-name specifies the file to which all status messages are logged.
$ pg_ctl start -D ${PGDATA} -l <logfile-name>
$ createdb ${DBNAME}
$ pg_ctl stop -D ${PGDATA}
pg_ctl passes all the options within the -o flag to the postmaster, which is the PostgreSQL multi-user database server.
$ pg_ctl start -D ${PGDATA} -l logfile-name -o " -t database-name -u \
user-name -G -i -Q 64 -d 1"
psql is an interactive client for PostgreSQL. It allows you to pose queries and see their results.
$ psql -C database-name
tcqtime is the default timestamp column, declared with the TIMESTAMPCOLUMN constraint. It records the creation time of a tuple; these timestamps are assumed to be monotonically increasing. Tuples entering TelegraphCQ can either be externally time-stamped or time-stamped by the data source. If they are time-stamped by a TelegraphCQ data source such as the CSV wrapper, they will be in monotonically increasing order.
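A data source can sanity-check this ordering assumption before feeding tuples in. A small illustrative sketch in Python (the helper name is ours, not part of TelegraphCQ):

```python
from datetime import datetime

def is_monotonic(timestamps):
    """Return True if the tcqtime values never decrease."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))

# Timestamps in the format used by the tcpdump sample later in this document.
ts = [datetime.fromisoformat(s) for s in (
    "2003-06-06 18:50:20.856709",
    "2003-06-06 18:50:21.856709",
    "2003-06-06 18:50:22.856709",
)]
print(is_monotonic(ts))  # True: the sample is in increasing order
```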
Welcome to psql 0.2, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit
starting in CQ Query mode. All queries will be submitted using cursors.
sample=# CREATE SCHEMA traffic;
sample=# CREATE STREAM traffic.measurements (stationid INT,
speed REAL,
tcqtime TIMESTAMP TIMESTAMPCOLUMN
) TYPE ARCHIVED;
Wrappers are user-defined data acquisition functions in TelegraphCQ. ALTER STREAM associates the stream with a CSV wrapper. A CSV wrapper, which handles input in the comma-separated value (CSV) file format, comes bundled with TelegraphCQ.
sample=# ALTER STREAM traffic.measurements ADD WRAPPER csvwrapper;
The window clause [RANGE BY ... SLIDE BY ... START AT ...] defines a window over a potentially infinite input data stream, thereby placing a bound on it and enabling results to be returned without waiting for the entire data stream to be processed. The RANGE parameter defines the size of the window as a time interval, the SLIDE parameter defines the interval after which the window is recalculated, and the START AT parameter defines the time at which the first window begins.
sample=# SELECT dst, COUNT(*), wtime(*) AS c FROM network.tcpdump AS st
[RANGE BY '5 seconds' SLIDE BY '1 second' START AT '2003-06-06 18:50:20']
GROUP BY dst;
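To make the window semantics concrete, here is an illustrative Python model of one plausible reading of RANGE/SLIDE/START AT (hop-by-SLIDE windows of length RANGE; a sketch only, not TelegraphCQ code):

```python
from datetime import datetime, timedelta

def window_counts(events, start, range_s, slide_s, n_windows):
    """Count, for each window, the events whose timestamp falls in
    [window_start, window_start + range_s); the first window begins
    at `start`, and each subsequent one starts slide_s seconds later."""
    rng = timedelta(seconds=range_s)
    slide = timedelta(seconds=slide_s)
    counts = []
    for i in range(n_windows):
        lo = start + i * slide
        counts.append(sum(1 for t in events if lo <= t < lo + rng))
    return counts

# One event per second from 18:50:20 to 18:50:25, as in the tcpdump sample.
base = datetime(2003, 6, 6, 18, 50, 20)
events = [base + timedelta(seconds=i) for i in range(6)]
print(window_counts(events, base, 5, 1, 3))  # [5, 5, 4]
```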
tcpdump.log is a CSV file containing a dump of TCP message headers, produced by the UNIX command tcpdump, a snippet of which is given below:
128.32.37.185,32797,64.174.7.0,80,S,06/06/2003 18:50:20.856709
128.32.37.185,32797,64.174.7.1,80,S,06/06/2003 18:50:21.856709
128.32.37.185,32797,64.174.7.2,80,S,06/06/2003 18:50:22.856709
128.32.37.185,32797,64.174.7.3,80,S,06/06/2003 18:50:23.856709
128.32.37.185,32797,64.174.7.4,80,S,06/06/2003 18:50:24.856709
128.32.37.185,32797,64.174.7.5,80,S,06/06/2003 18:50:25.856709
$ cat tcpdump.log | source.pl localhost 5533 csvwrapper,network.tcpdump
source.pl is a Perl script that connects to and feeds data to the wrapper clearinghouse, a dedicated process that listens for connection requests from new data sources. By default, it listens on port 5533. On the command line we also specify the wrapper type and the schema of the stream we are providing input to. wtime is a special aggregate over windows that returns the latest timestamp of all tuples in that window. A snippet of the query result is given below:
dst | count | c
---------------+-------+---------------------
64.174.7.0/32 | 1 | 2003-06-06 18:50:25
64.174.7.1/32 | 1 | 2003-06-06 18:50:25
64.174.7.2/32 | 1 | 2003-06-06 18:50:25
64.174.7.3/32 | 1 | 2003-06-06 18:50:25
64.174.7.4/32 | 1 | 2003-06-06 18:50:25
64.174.7.1/32 | 1 | 2003-06-06 18:50:26
64.174.7.2/32 | 1 | 2003-06-06 18:50:26
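The feeding step that source.pl performs amounts to opening a TCP connection to the clearinghouse port and sending a header identifying the wrapper and stream, followed by the CSV lines. A minimal Python stand-in (the line-based framing shown here is an assumption; source.pl's actual wire protocol may differ):

```python
import socket

def feed_lines(host, port, header, lines):
    """Connect to the wrapper clearinghouse and send a header line
    (e.g. "csvwrapper,network.tcpdump") followed by CSV data lines."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall((header + "\n").encode())
        for line in lines:
            sock.sendall((line + "\n").encode())

# Hypothetical usage, mirroring the source.pl invocation above:
# feed_lines("localhost", 5533, "csvwrapper,network.tcpdump",
#            open("tcpdump.log").read().splitlines())
```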
SELECT
R.i, R.j, count(*)
FROM
R [RANGE BY 't1 seconds' SLIDE BY 't2 seconds'],
S [RANGE BY 't3 seconds' SLIDE BY 't4 seconds']
WHERE
R.k = S.k
GROUP BY
R.i, R.j
HAVING
R.j > C;
WITH
StreamOne AS
(
SELECT R.i, sum(R.j) as sum, wtime(*)
FROM R [RANGE BY 't1 seconds' SLIDE BY 't2 seconds']
),
StreamTwo AS
(
SELECT S.k, sum(S.l) as sum, wtime(*)
FROM S [RANGE BY 't3 seconds' SLIDE BY 't4 seconds']
)
(SELECT * FROM StreamOne S1, StreamTwo S2
WHERE S1.i = S2.k);
SELECT
SUM(R.i), AVG(R.j), COUNT(*), wtime(*)
FROM
R [RANGE BY 't1 seconds' SLIDE BY 't2 seconds'];
Filters               | These represent single-table qualifications in the query. They are always of the form column OP constant. These filter modules are merged together by the eddy to form a grouped filter, which can evaluate filter predicates for multiple queries simultaneously.
SteMs                 | The building blocks for continuous query joins. A SteM holds tuples for a particular base relation or stream, and can be probed efficiently to find data tuples that match a probe tuple.
Table or stream scans | These scans are used to obtain data from TelegraphCQ streams or regular PostgreSQL relations.
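The build/probe behaviour of SteMs can be illustrated with a simplified symmetric hash join in Python (a sketch only; real SteMs also handle windows and adaptive tuple routing by the eddy):

```python
from collections import defaultdict

class SteM:
    """Stores tuples of one stream, indexed on the join key,
    and answers probes with all tuples matching a key value."""
    def __init__(self, key):
        self.key = key
        self.index = defaultdict(list)

    def insert(self, tup):
        self.index[tup[self.key]].append(tup)

    def probe(self, value):
        return self.index.get(value, [])

def symmetric_join(r_stem, s_stem, tup, side):
    """Insert tup into its own SteM, probe the other, emit join results."""
    own, other = (r_stem, s_stem) if side == "R" else (s_stem, r_stem)
    own.insert(tup)
    matches = other.probe(tup[own.key])
    return [(tup, m) if side == "R" else (m, tup) for m in matches]
```

Each arriving tuple is first built into its own SteM (so later arrivals on the other side can find it), then probed against the opposite SteM, which is what lets the join run incrementally over unbounded streams.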