-------------------------------- Notes from the Engine Panel --------------------------------
Goals (stated by MF)
________________________
- Identify key areas where DB technology falls short
- Give a succinct justification for each area
- Identify research areas
- Road map
- Point out possible solutions and non-solutions
- CIDR paper - Dieter Gawlick and co., Oracle
  - Does a lot of the fancy stuff that we think cannot be done in traditional systems. What gives?
- Mike Franklin
  - Sharing and adaptivity are really closely related
  - Sharing is fundamental .. benefit: scalability
    - With standing queries we have all the queries, unlike traditional MQO
    - Also, queries are long-lived
    - Need to avoid duplicate work and keep up with the dataflow rate
    - Need to reduce query entry cost
  - Adaptivity
    - How dynamic is the environment, really?
      - Statistics, query mix, and state all fluctuate
    - CSE - traditional MQO - is static .. over time (with query insertion/deletion) the plan degrades
  - Approaches: YFilter and TelegraphCQ
    - YFilter
      - An NFA fragment for each XPath expression
      - Combine the NFA fragments -> create a single state machine to process them ==> shared (machine) state and processing
      - A trie over XPath expressions
      - Can create one big uber-query .. doable even for XML
      - Adding queries is still easy
    - TelegraphCQ
      - Eddies and SteMs
      - Purest form of eddy: routing on a per-tuple basis
      - Adding new queries is easy, no need to recompile
        - Just add to the eddy; the routing decisions take care of it
        - Over time the routing adjusts to accommodate the new query
      - Makes it much easier to do sharing .. in a dynamic environment
  - What degree of dynamism are we targeting? (Hari)
    - Input changing is the classical idea of the Eddy benefit
    - However, the queries might change too
    - Ability to add queries over time without degrading
  - What's the beef? How many queries, changing how often? (BrianB/JW)
- Big problem (Stonebraker)
  - The Oracle guys have a scalable trigger system .. that's going to solve the standing query problem.
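The YFilter-style sharing sketched above (a trie over path expressions, so common prefixes are matched once for all standing queries) can be illustrated roughly as follows. The class and method names are assumptions for illustration, not YFilter's actual API; real YFilter builds an NFA that also handles `//` and `*`, which this simple exact-path trie does not.

```python
# Minimal sketch of shared matching over linear XPath-like expressions:
# each query's path is folded into one trie, so shared prefixes are
# stored and traversed once. Illustrative only, not YFilter's code.

class SharedPathTrie:
    def __init__(self):
        self.root = {}    # step name -> child node (a nested dict)
        self.accept = {}  # id(node) -> set of query ids accepted there

    def add_query(self, qid, path):
        node = self.root
        for step in path.strip("/").split("/"):
            node = node.setdefault(step, {})  # reuse shared prefixes
        self.accept.setdefault(id(node), set()).add(qid)

    def match(self, path):
        """Return the ids of all queries whose expression equals this path."""
        node = self.root
        for step in path.strip("/").split("/"):
            if step not in node:
                return set()
            node = node[step]
        return self.accept.get(id(node), set())

trie = SharedPathTrie()
trie.add_query("Q1", "/catalog/book/title")
trie.add_query("Q2", "/catalog/book/price")
trie.add_query("Q3", "/catalog/book/title")  # shares the whole path with Q1
matched = trie.match("/catalog/book/title")  # == {"Q1", "Q3"}
```

Adding a query is just a trie insertion, which is why query entry stays cheap even as the number of standing queries grows.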
  - The overhead of routing a tuple through an eddy is comparable to the cost of processing it through a join or a filter. The cost of an eddy is a factor of 2; adaptivity is going to cost 100%.
  - Wake up every 500 ms and change the routing policy
  - Batch tuples through the eddy to reduce routing cost
  - Same as what Oracle does (or Oracle could do this too)
- Mike Franklin's response
  - Only a factor of 2 is great
  - Query optimization gets you a lot of benefit
  - Still don't know how dynamic things really are, of course
- Alex B.
  - Not sure things are really that dynamic
  - Use streaming queries for complex events that are going to trigger some action. There is a finite set of queries in any given application - lots of instances of these, but it's rare that you are going to get more and more of them randomly and dynamically.
  - MF response: eBay example: millions of people coming in ..
  - MS: only one query ..
  - Well, the predicates vary and have different selectivities (Sam)
- CZ - Which killer apps have this property?
  - Perhaps not a lot of such applications
- AA - User-defined operators .. we don't know how they commute? (Are the benefits of an Eddy primarily with commutative operators?)
- MS - With 100K standing queries ..
  we won't use something like Eddies
  - Will use an indexing structure
  - Forgy's Rete network paper
  - The GroupedSelectionFilter is the TelegraphCQ response
- Alex Buchmann:
  - DREAM (Distributed Reliable Event-based Application Management)
  - Stepping outside the DB box
    - Use some DB techniques, but more about tweaking DB techniques
  - Applications
    - ATIS: Air Traffic Info Systems
      - A new generation is being developed in Europe currently
      - Prototype on the gate-to-runway overflow
    - Internet-enabled cars
      - Traffic management
        - Want to do individualized routing
        - Don't want everybody to head in a new direction
          - That's what happens when it gets broadcast on the radio: people take a detour and that causes a fresh traffic jam
      - Sensor data from a car (aggregated locally)
        - Failure data are piped out for analysis by the manufacturer
        - Failure data can also be used to trigger workflows
          - Set up appointments at a repair shop, etc.
      - Personalization of a car
        - Take your car personalizations with you .. a car portal
        - When you rent a car you get to take your settings with you
        - Car and driver both have preferences
        - You can store the personalizations in a PDA .. privacy issues
  - Different event generators
    - Streams of events that have to be pushed to consumers
    - Brokers .. there is more than one broker
      - E.g., ATC - more than one ATC center
  - Self-stabilizing pub/sub
    - Two correctness conditions: liveness and safety
    - When there are random failures, things should eventually be delivered. Done through leases: you must resubscribe from time to time (i.e., renew the lease periodically)
  - System has 2 parts
    - Pub/sub system
      - Loosely couples a bunch of clients
      - Add new consumers easily .. they subscribe to interesting notifications
      - Messages are routed through a notification service
        - A lot of hairy stuff happens in the notification service
      - Lots of simple filters .. not a lot of complex aggregation
    - Reactive functionality
      - Event-Condition-Action style rules
      - Individual services ..
        that can be composed as needed
        - event detection
        - notification
        - condition evaluation
        - purging
        - action execution
        - logging
      - Need a mechanism for composing individual events
        - Just a trigger for some reaction that must be generated
        - Event composition can be done through streaming queries .. but you don't really need to do it that way
          - Can be done with an event algebra
          - Can be compiled into an event graph .. when a complex event gets created in the graph, it pops out
        - Sharma Chakravarthy's consumption modes .. part of Snoop; they use the same for their version
        - With sensors, ordering and correlation are important
      - Queries really run in the clients
        - Very simple clients are those that generate events, like sensors
        - Complex clients are those nodes that do the complex stuff
      - How to reduce the number of events pushed through a pub/sub system, so that you don't have to do flooding
        - Magdalena B.: Does the system take into account that clients have limited bandwidth?
        - AB: intelligent processing in the network ..
  - Problem sources
    - Scale
      - filter placement ..
      - Roaming and resuming; scopes for security
      - dynamic subscriptions
    - Distribution - a source of problems for time and synchronization
    - Heterogeneity - with heterogeneous events, how do you make them compatible?
    - The old s-word
      - Less control in these systems
      - QoS in loosely coupled systems?
  - Addressing in a pub/sub system
    - Tibco does it on the subject keyword .. very strict, with a rigid subject hierarchy
    - JMS: (topic) content-based model .. a simple, homogeneous model
    - Content-based addressing - simple predicates
      - Problems with the above: can do only simple predicates, and assumes a homogeneous environment
    - One level deeper - concept-based addressing
      - If the concept identifiers match, the match is easy
      - If not, we need appropriate conversion functions defined within a context. Relatively efficient because it gets reduced to a simple context.
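The concept-based addressing step just described might look roughly like this in code: match directly when the concept/context IDs agree, otherwise apply a registered conversion function before evaluating the subscriber's predicate. The registry, names, and the USD-to-EUR rate are all hypothetical, not the DREAM implementation.

```python
# Sketch of concept-based addressing (all names hypothetical):
# same context ID -> match directly; different IDs -> look up a
# conversion function for the (source, target) context pair.

# Hypothetical conversion registry; the 0.9 USD->EUR rate is made up.
conversions = {
    ("NYSE", "FrankfurtSE"): lambda e: {**e,
                                        "price": e["price"] * 0.9,
                                        "currency": "EUR"},
}

def deliver(event, event_ctx, subscriber_ctx, predicate):
    """Return the (possibly converted) event if it matches, else None."""
    if event_ctx != subscriber_ctx:
        convert = conversions.get((event_ctx, subscriber_ctx))
        if convert is None:
            return None  # no conversion function defined: cannot match
        event = convert(event)
    return event if predicate(event) else None

quote = {"symbol": "SAP", "price": 100.0, "currency": "USD"}
matched = deliver(quote, "NYSE", "FrankfurtSE",
                  lambda e: e["currency"] == "EUR" and e["price"] < 95.0)
```

The point of the design is that after the one conversion-function lookup, matching reduces to the same simple predicate evaluation as plain content-based addressing.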
      - User-defined context with an ontology (MIX)
        - 3 levels: representation, infrastructure, domain-specific
      - The event notification carries a context ID. An example with the NYSE and the Frankfurt SE.
  - Use of a reliable main-memory cache (replicate stuff in the main memory of other nodes, instead of on disk?)
  - MikeF: Is the heterogeneity problem different from a data-integration environment?
    - Not particularly, but you've got to resolve it on the fly
    - So you've got to build it into the infrastructure
- Ugur Cetintemel
  - Overload management in stream processing engines
    - Aurora's main concern is how to do overload management
    - Different in push-based systems and in stream processing in general
  - Aurora
    - The scheduler is the brain of the system
    - The router gets tuples either from outside the network or from boxes as the result of processing
    - The scheduler and the load shedder are the main components
    - "Control is good," despite what Brewer said at CIDR
  - Push-based systems
    - Things are pushed asynchronously
    - Got to react to things .. including dealing with load beyond the capacity of the system
    - Here we need graceful degradation
    - Without overload management, throughput is going to decrease and then response time is going to increase
  - What is the right execution model?
    - Thread-driven execution model
      - Each task (a query or an operator) is assigned a thread
      - The OS manages the threads
      - Not scalable because of scheduling overhead
      - Destroys cache locality; TLB misses
      - Resource-management decisions are made by the OS in general
      - Want to gain control back from the OS
    - Our execution model - tuple-driven
      - Small number of threads (one per CPU)
      - They loop continuously
      - They make the scheduling decisions
      - Full control means we can keep tuples from entering the system and clogging the network
    - Difference between thread-driven and "message-based"
      - Do you lose control? What if you have sharing, with different tuples for different queries passing through an operator? In the thread-driven model you have the view of a single query.
        In a tuple-driven model, when you merge operators, does this view get diluted?
    - Challenge: how to do scheduling, and how to do it in a lightweight fashion
      - Sharing makes things more interesting
      - Scheduling must be made lightweight
  - Load shedding for overload management
    - Drop tuples from the system .. where to place the drop boxes, etc.
    - Dynamic approaches may be too expensive .. especially when the system is overloaded
    - Dave Maier: Where to place drop boxes? At the edge of the network?
      - You don't want to drop tuples after they have entered the system and you have committed resources to them .. but sharing complicates things. Drop right after a split, in one branch.
    - Assumption from UC: we don't have control over the sources
      - DM: it's a bad assumption
    - Others mentioned that in some cases it might be better to take a quick look at tuples before dropping them, to decide which ones to drop (as opposed to dropping them without any processing)
    - MF: In a classical system you over-configure the system to handle the peak load. Why can't you do this in such a system? UC's response: you cannot size your system based on the worst case.
- Ted Johnson (Gigascope)
  - How to process 1M packets/sec and run a non-trivial query
    - Look at the queries, analyze them, and push them down as far into the system as possible
  - Gigascope is a stream-only model .. similar to boxes-and-arrows
    - Why is this different? (JW)
    - The router is a high-volume data source .. a stream
  - Use a costly mirror to reproduce the packets passing through the router
    - The mirror is a select stream
    - The mirror is attached to a network interface card (NIC)
  - What kinds of operations can be pushed down into the mirror (select stream)?
    - E.g., selection and projection
    - The selections are pushed down by setting registers in the select stream
      - Like a DB operator
    - Pushed a select/project/agg into a NIC
    - Different cards have different capabilities
      - Snap create .. 134 bytes (lets you do projections)
      - Bitmasking ..
        lets you do approximate selections
  - The NIC flows data into a circular buffer .. when it overloads, we drop tuples
  - Sometimes we can use BPF (Berkeley Packet Filter) .. can get locality by pushing projection operations into the kernel
  - From the buffer the data goes into the box of low-level queries (no data copying happens here), which is the packet-filter layer
    - Johannes: Is it a push-pull interface? Queries pulling, data pushing?
      - Queries pull from the circular buffer
    - Queries run in user space
    - Queries can run selection, projection, aggregation, etc.
      - For aggregates they use a direct-mapped cache
    - The first copy is from the NIC into the buffer. The next copy is from the lower-level query layer to the higher-level one.
  - How it works
    - Take the query set (Q1, .. Qn)
    - Extract stuff from it (q1, q2, ... qm)
    - Push them to the BPF below .. got to do some MQO
    - Push Bloom filters down so that approximate filtering can be done
      - Need to drop tuples early whenever possible
    - The low-level queries are just scheduled by the OS
    - If there are sampling filters (state-determining), you've got to be extremely careful in commuting them
- Jennifer Widom (STREAM)
  - Issues in building the system
    - Maintaining intermediate state .. in memory or on disk?
      - Is there an accuracy-latency-storage tradeoff?
      - AB .. an intermediate way
        - A reliable main-memory cache .. sending things off to another machine for main-memory caching might be faster than local disk caching
        - Asynchronously writing to disk
        - Less latency, with very high communication
      - When we run out of memory, we approximate
        - Stonebraker: write to virtual memory
    - What is the objective of a scheduler?
      - What are the performance metrics for continuous queries?
    - How important is approximation?
  - Are streams really going to be all that fast?
    - Can't the filters be pushed to the source?
    - Dennis Shasha: Filtering might be possible only after fusion
  - How adaptive do we need to be?
    - Are they really going to vary that much?
    - Eddies are one extreme ..
      where do you want to be in the spectrum?
    - A military environment is more varying than network monitoring ..
    - Adapting adaptivity
  - API issues??
----------------------------------------------------------------------------------
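The "push Bloom filters down for approximate filtering" idea in the Gigascope discussion can be sketched as follows: the low-level layer tests set membership cheaply and may let a few false positives through, but it never drops a packet the high-level queries need. This is a generic Bloom filter in plain Python; the flow-key format and parameters are illustrative assumptions, not Gigascope's NIC/kernel-level code.

```python
# Generic Bloom filter sketch for early, approximate drop decisions.
# False positives are possible (a flow may survive the filter and be
# discarded later); false negatives are not (added flows always pass).

import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, kept as one big integer

    def _positions(self, item):
        # Derive k bit positions by salting a cryptographic hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# Flows the standing queries care about (hypothetical addr:port keys):
interesting = BloomFilter()
for flow in ["10.0.0.1:80", "10.0.0.2:443"]:
    interesting.add(flow)

keep = interesting.might_contain("10.0.0.1:80")       # always True
maybe = interesting.might_contain("192.168.1.9:22")   # usually False; false positives possible
```

Packets failing the test can be dropped at the lowest layer, which is exactly the "drop tuples early whenever possible" goal in the notes.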