===============================================================================
Approximation and QoS panel
Scribe: Abhinandan Das
===============================================================================
Coordinator: Mitch Cherniack
Participants: Dave Maier, Rajeev Motwani, Johannes Gehrke, Hari Balakrishnan
-------------------------------------------------------------------------------

Mitch:
------
* Three main talking points to be addressed by the panelists:
  1) Whether QoS is appropriate in the context of stream processing
  2) If it is, how to express QoS constraints for streams (how does this
     differ from other domains?)
  3) QoS processing issues in streams

* Definition of QoS: resource management is driven by application-specific
  requirements:
  - e.g. multimedia applications: error tolerance may be specified as
    acceptable jitter
  Utility function graph on slide:
  - QoS specified by utility functions (value between 0 and 1)
  - x axis: error, e.g. jitter/delay/inaccuracy
  - y axis: utility

* In traditional network apps, QoS specs are used to drive:
  - Admission control
  - Resource allocation
  - Load shedding
  Best effort may not be sufficient for network apps with real-time
  requirements.

* A1 (answer to talking point 1): Why QoS is appropriate for streams:
  Example: battlefield monitoring scenario.
  Two main characteristics:
  - Resource management
  - Non-uniform error tolerance: different requirements for different apps
  Examples of the latter characteristic:
    Remote triage:         low latency required, some inaccuracy OK
    Remote enemy tracking: some latency OK, needs to be accurate
    Missile detection:     low latency and no inaccuracy

  Shirish: Why not throw more machines at the problem to meet high loads?
           The key issue is actually unpredictability and variability.
  Brian:   Have different systems handle queries with different error
           requirements, rather than the same system: e.g. separate systems
           for missile detection and remote triage.
  Shirish: Sharing complicates QoS handling.
  Mitch:   A separate machine for each query => hardcoding resource
           allocation!
  Buchmann: QoS semantics for message delivery -- in distributed systems
           there exist multiple delivery semantics such as "at most once",
           "at least once"...
  Mitch:   As pointed out earlier, error tolerances depend on the
           application.
  Buchmann: Mitch is only looking at QoS in the query processing engine, not
           at other levels of messaging and/or networking.

* Why QoS is appropriate for streams (cont'd)
  Processing QoS specs for streams drives:
  - Operator scheduling
  - Load shedding
  - Query optimization
  - Tuple reordering
  - Load balancing

* Why stream and packet networks are dissimilar:
  1) In-flight generation of tuples: tuples at the output of a stream
     network are not necessarily a subset of the input tuples -- tuples are
     not just routed, but also generated.
  2) Semantic richness of tuples: we can take advantage of the semantics of
     tuples.

* Specifying QoS (differences between stream and packet networks):
  For 1):
  - Packet networks: a reasonable measure of inaccuracy is
    packets dropped / total packets [anywhere in the network?]
  - Stream networks: depends on whether the dropped tuple affects the
    result (output)
    + If aggregation is present at the end, it affects accuracy.
    + One tuple dropped in the stream scenario may result in multiple
      tuples dropped in the output (non-linearity, unlike the network
      scenario).
    + Thus it is hard to determine the impact of dropped tuples; it depends
      on when/where the tuple was dropped. (A small sketch of this
      non-linearity follows below.)
  Hence:
    + Multiple flavours of inaccuracy
    + Different levels of impact of dropped tuples
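[Scribe's note: the following is a minimal sketch, not from the slides, of
the non-linearity point above. A hypothetical two-stream query joins sensor
readings with sensor metadata and counts readings per region; dropping a
single input tuple before the join perturbs several output tuples, so the
packet-style "dropped/total" ratio understates the impact. All names and
values are made up.]

    # Scribe's illustration: impact of dropping ONE tuple upstream of a join
    # followed by an aggregation (hypothetical data).
    from collections import Counter

    readings = [("s1", 10), ("s1", 12), ("s1", 9), ("s2", 7)]  # (sensor, value)
    metadata = [("s1", "north"), ("s2", "south")]              # (sensor, region)

    def region_counts(readings, metadata):
        region_of = dict(metadata)  # join readings with metadata on sensor id
        return Counter(region_of[s] for (s, _) in readings if s in region_of)

    exact  = region_counts(readings, metadata)       # {'north': 3, 'south': 1}
    approx = region_counts(readings, metadata[1:])   # drop one metadata tuple
    print(exact, approx)                             # the 'north' group vanishes
    print("input drop ratio:", 1 / (len(readings) + len(metadata)))  # ~0.17

One dropped tuple out of six inputs (about 17% by the packet-style metric)
wipes out an entire output group, which is exactly the "different levels of
impact" point above.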
  For 2) Semantic richness:
    Value-based QoS (a la Aurora):
    - Load shedding: drop less valuable tuples
    - Scheduling: reorder -- process more valuable tuples first

  Alin:  How do you come up with QoS graphs? In the case of joins, the
         "value" of a tuple in one stream depends on its frequency in the
         joining stream.
  Mitch: Difficult to come up with QoS graphs. QoS is attached to
         applications, not boxes.
  Alin:  What about propagating QoS?
  Mitch: Not easy. There are issues with propagating QoS requirements...
         One can randomly drop tuples, or perform some sort of predictive
         analysis maximizing QoS.
  Mehul: Any example where it is impossible to propagate QoS specs across
         operators?
  Mitch: We assume that all QoS specs are feasible.
  Stan:  Depends on data rates and resource availability.

  Conclusion: QoS is appropriate for streams! :)

===============================================================================
Dave Maier:
-----------
* QoS lessons from multimedia (streaming media instead of streaming data):
  - If you have to degrade, do so in a thrifty/graceful manner: the least
    perceived reduction in quality for the maximum reduction in resource
    usage.
  - Error is multifaceted: different error models apply to different tasks,
    e.g. lower resolution vs. lower frame rate.
  - Software adaptivity is difficult to tune and to converge to a stable
    state.
  - The model may or may not apply to data streams.

* More than one way to explain error:
  - Shift, drift, quantization error, phase lag: shift error may be
    preferable to drift error, depending on the application, even if the
    magnitude of the error is the same.
  - Error interpretation: an error can have more than one interpretation:
    lag + amplitude error can be thought of as (a composition of) a little
    lag + a large amplitude error, or vice versa.

* Uses of an error model:
  - Combined quality bound: error = a function of the different types of
    errors, or a min or max over all interpretations.
  - Define permissible limits on individual components.
  - Accommodate user preferences: degrade shift error before introducing
    quantization error.

===============================================================================
Rajeev Motwani:
---------------
QoS in Data Stream Systems

* Example killer app for a stream system:
  - IT infrastructure management:
    + Several devices such as web servers, routers, app servers, and
      firewalls generate streams of data (e.g. logs).
    + Monitoring streams, figuring out why servers crashed, and intrusion
      detection can potentially be posed as continuous queries.

* Need for QoS -- characteristics of stream apps:
  + Variable/bursty load
  + Real-time requirements
  + Event-driven adaptivity: if you detect an event such as an intrusion,
    you would like to completely change resource allocation, e.g. stop
    monitoring the application server and reallocate resources to intrusion
    tracking.

* Key goals of QoS in streams:
  - QoS mechanisms: techniques for load shedding.
  - QoS models: enable the user to specify, control and understand the
    impact of load shedding.
  - QoS delivery: management of system resources to optimize a global QoS
    metric.

1) Load shedding mechanisms:
   - Sampling: input data reduction. Useful only for very simple queries.
     Jennifer (Widom): A join of samples is not a sample of the join.
     Sampling is OK for foreign-key joins, e.g. those involving the fact
     table in a star schema. (A small sketch of this point follows the list
     below.)
   - Alerts: sampling could be used to do load shedding for alerts, but
     sampling is not a good idea here because it may miss critical events.
   - Approximation: incomplete processing. What are the right mechanisms
     for specifying the semantics of the impact of approximations? What are
     the right metrics for set-valued outputs? For aggregations, one has a
     pretty good idea right now of what the semantics might be. Even for
     the "bad" metrics that currently exist, we don't have good techniques
     for approximation.
   - Focusing: value-based load shedding. Certain values could be of
     interest to the user. The user specifies: when the system is loaded,
     focus on a particular set of tuples.
   - Skipping: output reduction. Throttle the output rate in continuous
     query systems. Don't process a tuple if it is not used to produce one
     of the (say) every 10th output tuples that the user is interested in.
   - Jumping windows instead of sliding windows.
   - Latency: reducing timeliness.
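[Scribe's note: a minimal sketch, not from the talk, of Jennifer Widom's
point about sampling and joins, assuming simple independent Bernoulli
sampling at rate p = 0.5; relation contents are made up.]

    # Joining 50% samples of R and S retains each join result with
    # probability ~0.25, so it is NOT a 50% sample of the join. Sampling
    # only the fact side of a foreign-key join behaves as expected.
    import random
    random.seed(0)
    p = 0.5

    def sample(rel):                      # Bernoulli sample at rate p
        return [t for t in rel if random.random() < p]

    def join(r, s):                       # tuples are (key, payload) pairs
        return [(a, b) for (ka, a) in r for (kb, b) in s if ka == kb]

    # Many-to-many join: |R join S| = 100 * 100 = 10000.
    R = [("k", i) for i in range(100)]
    S = [("k", j) for j in range(100)]
    print(len(join(R, S)), len(join(sample(R), sample(S))))   # 10000 vs ~2500

    # Foreign-key join: each fact tuple matches exactly one dimension tuple.
    dim  = [(k, "d%d" % k) for k in range(10)]
    fact = [(random.randrange(10), i) for i in range(1000)]
    print(len(join(fact, dim)), len(join(sample(fact), dim))) # 1000 vs ~500

The many-to-many case keeps roughly p^2 = 25% of the join results instead of
50%, while sampling only the fact table of the foreign-key join does yield a
true p-sample of the join.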
* Granularity/level at which load shedding mechanisms are applied:
  - Input/output level
  - Subquery or operator level

* Expressing QoS requirements:
  - What form of load shedding is acceptable for the application
  - Relative priorities of queries/streams: weights on streams/queries
  - Penalty for load shedding:
    + Trade-off curve: Aurora model
    + Rate chart: allowable error/latency vs. input rate
    + Threshold: specify the most important queries and give the maximum
      tolerable error for those queries; for the remaining queries, do best
      effort.

* Sample QoS model:
  - Latency bounds and query weights are specified: minimize the weighted
    error subject to meeting the latency bounds.

* Delivering QoS:
  - MQO (multi-query optimization) could constrain QoS delivery.

===============================================================================
Johannes Gehrke:
----------------
* The Why/What/How for QoS:

* Semantic approximations:
  - If you approximate, you need to inform the user what this means.
  - It is not clear that the user wants approximation.
  - There will be competing axes: errors will need to be combined.
    + Combining errors is hard.
    + Machine learning techniques may be needed.

* Two models:
  - Fast CPU: not enough main memory, and writing to disk is too slow.
  - Slow CPU: cannot keep up with arrival rates.

* What works -- approximating aggregates:
  - Problem: records are streaming in, and we need to compute second
    frequency moments.
  - Approximations with space and (probabilistic) error guarantees using
    sketches. (A small code example of such a sketch appears at the end of
    this section.)
  Remarks on aggregate queries:
  - Approximation errors increase exponentially with multi-way joins.

* Approximating set-valued queries:
  - No universally accepted error metric.
  - Some proposals:
    + Match and compare
    + Earth Mover's Distance
    + Max subset / symmetric set difference
    + Archive metric: a performance-based metric

* Formulation of joins as a bipartite graph:
  - Minimum number of edges to be deleted = join load shedding
  - Semantic load shedding
  - A hard problem

* Open problems:
  - Meaningful metrics for set-valued results
  - User interface for specifying approximations
  - Composability of operators that approximate
    + Aurora operators need to be composed carefully to prevent cascading
      effects.

* Multiple queries:
  - Resource allocation across queries
  - The right metrics, and how to specify them

* Summary:
  + Main problem: we need ways to specify QoS.
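[Scribe's note: a small code example of the kind of sketch mentioned in the
"approximating aggregates" bullet above -- an AMS-style estimator for the
second frequency moment F2 = sum_v f_v^2. This is the scribe's simplified
illustration: Python's built-in hash of a salted tuple stands in for the
4-wise independent +/-1 hash functions a real implementation would use, and
all parameters are made up.]

    import random
    from collections import Counter

    def make_sign_hash(seed):
        salt = random.Random(seed).getrandbits(64)
        return lambda v: 1 if hash((salt, v)) & 1 else -1   # pseudo +/-1 hash

    def ams_f2_estimate(stream, counters_per_group=40, groups=5):
        """Median of means of squared +/-1-signed counters estimates F2.
        Space is O(groups * counters_per_group), independent of the number
        of distinct values; one pass over the stream."""
        hashes = [[make_sign_hash((g, c)) for c in range(counters_per_group)]
                  for g in range(groups)]
        counts = [[0] * counters_per_group for _ in range(groups)]
        for v in stream:
            for g in range(groups):
                for c in range(counters_per_group):
                    counts[g][c] += hashes[g][c](v)
        means = [sum(x * x for x in row) / counters_per_group for row in counts]
        return sorted(means)[groups // 2]    # median over groups

    rng = random.Random(1)
    stream = [rng.randrange(20) for _ in range(2000)]
    exact = sum(f * f for f in Counter(stream).values())
    print(exact, ams_f2_estimate(stream))    # rough, probabilistic estimate

Each squared counter is an unbiased estimator of F2; averaging counters
within a group tightens the variance, and taking the median across groups
boosts the confidence, which is the space/error trade-off referred to above.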
===============================================================================
Hari Balakrishnan:
------------------
* QoS First/Later/Never?! (a la the CIDR '03 panel! :)
  - Why QoS didn't work on the Internet for the first 15-odd years after it
    was proposed, and why it worked later.
  - Assumption: you cannot provision enough capacity to handle peak load.
  - Design QoS in such a way that it is practical and can be used to make
    money (a very important consideration that is often overlooked).

* Why QoS?
  - Provides a principled way to:
    + Schedule
    + Shed load
    + Approximate results
  - To make money for service providers (the main financial driving force
    behind Internet QoS today).
  - Handle overload: if the QoS specs are simple, they will tell us how to
    decide what to do when overloaded.

  Rajeev: QoS or SLA? QoS helps applications, SLAs help make money for
          service providers?

* Router packet forwarding vs. data stream processing:
  - Both systems have pipelined stages and queues, and both see bursts.
  - But in routers the stages operate in parallel, while in streams the
    stages share physical resources (e.g. one CPU).
  - Two sharing problems in streams: inter-operator and inter-stream.
    In the router packet forwarding domain there is one shared FIFO queue.
  - Data stream processing needs to handle many different resources:
    network, disk I/O, CPU.

* Resource sharing:
  - Macrostream: a group of streams sharing a queue in front of an
    operator, e.g. gold customers -> queue 1, silver customers -> queue 2.
    Who performs the mapping?

* How to share resources between macrostreams:
  - Scheduling
  - Queue management
  - Approximations
  NOTE: All of the following discussion refers to macrostreams.

* Key networking results:
  - Work-conserving scheme: the scheduler is never idle if a job is
    waiting. (Not true in a broadcast scenario.)
  - All work-conserving schemes have the same mean delay (NOT the same
    per-packet delay): if you reduce the delay for one stream, you have to
    add it to someone else's.

* Weighted fair queuing (see the sketch at the end of this section):
  - Bandwidth or rate sharing guarantee.
  - Bounds worst-case delay under linearization assumptions.

* A single FIFO queue minimizes "jitter" (the mean linear deviation of
  inter-arrival delays). WFQ can have large jitter.

* QoS service models:
  - Absolute vs. relative? E.g. X messages/sec vs. Gold/Silver/Bronze
    classes (the latter is usually easier).
  - End-to-end or per-hop delay guarantees? (End-to-end is harder.)
  - Most existing systems deal with optimizing the latter. This is where
    the money is!

* Utility functions:
  - Very hard to express.
  - Economic principles: weights on queues are set according to financial
    considerations.

* Two classes of utility functions:
  - Elastic vs. drop-dead real-time.
  - File transfer falls into the first category: no delay requirement, but
    the more resources you throw at it, the smaller the marginal utility.
  - Drop-dead real-time: e.g. a missile is coming in and you need to
    allocate resources absolutely urgently, in real time. You need adequate
    provisioning in this case: e.g. if 5 missiles come in and you can
    process only 3 at a time, 2 will not be attended to.

* Utility functions:
  - Utility vs. resource: usually concave everywhere.
  - The elastic curve is smooth; the drop-dead real-time curve is a step
    function. (Resource ~ 1/delay.)

* Aggregate utility is nearly maximised with simple methods in the elastic
  scenario.

  Rajeev: The aggregate utility example may not carry over from networks to
          the stream scenario, because in networks either my packet goes
          through or a COMPETING packet does. In streams, several queries
          could be issued from one source and all AGREE that the missile
          query has high priority: there is no competition.

* Medusa:
  - A federation of computers in different administrative domains
    participating in the system.
  - Load multiplexing in the federation.
  - Pushing feedback to sources about acceptable incoming rates.

* Summary:
  - Difference between the networking and data stream scenarios: joint
    management of resources (CPU, disk I/O, network) in data streams --
    different from the networking scenario.
  - Simplicity of the service model is essential.
  - QoS schemes need tangible benefits to service providers.
  - Put the application in the control loop! Feedback to sources is
    important so they can adapt rates and queries whenever possible.
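[Scribe's note: a minimal sketch, not from the talk, of the weighted fair
queuing idea referred to above, applied to two hypothetical macrostreams.
It is a simplified variant: all jobs are assumed backlogged at time 0, so
the virtual-time term of true WFQ drops out and only per-flow finish tags
matter. Names, weights and job sizes are made up.]

    def wfq_order(jobs, weights):
        """jobs: list of (flow, size), all backlogged at t=0.
        Serve in increasing order of per-flow finish tags: each flow's tag
        advances by size/weight, so service is shared in proportion to the
        weights."""
        last_finish = {f: 0.0 for f in weights}
        tagged = []
        for seq, (flow, size) in enumerate(jobs):
            last_finish[flow] += size / weights[flow]     # finish tag
            tagged.append((last_finish[flow], seq, flow))
        return [flow for _, _, flow in sorted(tagged)]

    weights = {"gold": 3.0, "silver": 1.0}                # 3:1 service ratio
    jobs = [("gold", 1.0)] * 6 + [("silver", 1.0)] * 6
    print(" ".join(wfq_order(jobs, weights)))
    # gold gold gold silver gold gold gold silver silver ... -- roughly
    # three gold services per silver while both are backlogged, unlike
    # plain FIFO, which would drain all the gold work first here.

A full WFQ/GPS implementation would also maintain a virtual clock to handle
flows that go idle and late arrivals; the sketch omits that for brevity.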
===============================================================================