------------------------------------------------------------------------------
APPROXIMATION and QoS PANEL
------------------------------------------------------------------------------
Coordinator: Mitch Cherniack
Participants: Dave Maier, Rajeev Motwani, Johannes Gehrke, Hari Balakrishnan
------------------------------------------------------------------------------

MITCH CHERNIACK

Mitch emphasized three talking points:
1. Justify whether QoS is appropriate for streams.
   (At CIDR, QoS also came up in Telegraph - Franklin called it "graceful degradation".)
2. If the answer to 1 is "yes", how to specify QoS for streams.
3. How to process QoS specifications.

Definition of QoS:
- example: multimedia -> error tolerance for jitter
  -> x-axis: error metric; y-axis: merit factor (i.e., how happy you are with a result, ranging from 0 to 1)
- traditional network applications
  -> admission control
  -> resource reservation
  -> load shedding
- for real-time applications, best effort is not enough

Mitch's answer to point 1:
- two characteristics:
  -> resource management
  -> different applications have different requirements
- examples of different applications with different tolerances:
  -> remote triage - every two seconds
  -> personnel info - low tolerance for latency
  -> where the enemy is - high accuracy, can wait longer

Sirish: Why not use more machines? The real issue is unpredictability and variability.
Brian: Is a missile attack equivalent to remote triage? There are separate ways of doing things in separate systems.
Sirish: When there is sharing, QoS is harder.
Mitch: If there is a separate machine for each query, that is hard-coded resource allocation. We looked at different places where we can use QoS in Aurora.
Buchmann: QoS of message delivery? In distributed systems, there are different semantics like "at most once", "at least once".
Mitch: Errors and tolerance depend on the application.
Buchmann: QoS not at the query-processor level, but at other levels of messaging/networking.

Answer to points 2 and 3:
- two differences between stream networks and packet networks justify point 1:
  1. Output tuples are not a subset of input tuples: not just routing, but producing new ones - this makes a difference.
  2. Semantic richness: one can take advantage of knowing about tuples.
- for the first difference:
  -> In packet networks, packets either make it there or are dropped.
     Inaccuracy = dropped/total, anywhere within the network.
  -> In stream networks, a tuple is lost at a SELECT_p operator depending on whether p(t) or ~p(t) holds.
     If a tuple is lost before a join, several tuples may be lost from the output.
     There is a non-linearity; this is different from packet networks.
  -> It is hard to determine the impact of dropped tuples on the output; it depends heavily on where they are dropped.
     (See the sketch at the end of this section.)
- for semantic richness:
  -> Some tuples are more important. In Aurora, this affects:
     load shedding - dropping
     scheduling - reordering

Alin: How do you come up with the graphs?
Mitch: It is hard. Characterizing the abnormal ones is important.
Alin: In a join, the importance of values in one stream depends on values in the other.
Mitch: QoS is attached to applications, not boxes.
Alin: Propagating QoS?
Mitch: Processing QoS is not easy. Dropping randomly vs. predictive analysis that maximizes QoS.
Mehul: When is it impossible to meet QoS? Any analysis?
Mitch: The assumption is that every QoS specification is feasible.
Stan: Depends on rates and resources.
Johannes: Sampling over joins.
Carlo: Several applications?
Mitch: Within an app, stale tuples in the queue vs. latency requirements - it may be better to drop them.
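A minimal sketch of the non-linearity point above, assuming a simple equi-join over two synthetic streams (the stream contents, key distribution, and drop rates are made up for illustration): dropping a fraction of tuples uniformly at random before a join loses a larger fraction of the join's output than the input drop rate itself.

```python
import random

random.seed(0)
# Two synthetic streams of (key, value) tuples with 10 distinct join keys.
left  = [("k%d" % random.randint(0, 9), i) for i in range(1000)]
right = [("k%d" % random.randint(0, 9), j) for j in range(1000)]

def join_size(l, r):
    # Cardinality of the equi-join on the first field (simple hash join).
    counts = {}
    for key, _ in r:
        counts[key] = counts.get(key, 0) + 1
    return sum(counts.get(key, 0) for key, _ in l)

full = join_size(left, right)
for drop in (0.1, 0.3, 0.5):
    # Drop tuples uniformly at random from both inputs before joining.
    kept_l = [t for t in left if random.random() > drop]
    kept_r = [t for t in right if random.random() > drop]
    lost = 1 - join_size(kept_l, kept_r) / full
    print("input drop %.0f%% -> output loss about %.0f%%" % (drop * 100, lost * 100))
# With a 30% drop on each input, roughly 1 - 0.7 * 0.7 = 51% of the output
# disappears, illustrating the non-linear impact of dropping tuples before a join.
```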
------------------------------------------------------------------------------

DAVE MAIER

Multimedia QoS
- streaming media: streaming data rather than data streams
- lessons learned
  -> non-linearity between doing random and doing thoughtful things
  -> the error model differs
  -> control theory: control parameters to avoid oscillations;
     hard to apply since it is a discontinuous system
  -> hardest is the instrumentation for watching the system in order to tune it; make sure it exists
- the model might apply to streams, but that is not guaranteed

Ways to explain error
- quantization - cut down on resolution
- quantization and drift are worse than shift and lag

Error model
- completeness: can you express an error with it?
- more than one interpretation
  -> how they compose
  -> how to deal with them depends on the app

Use of the error model
- you don't want to measure the quality of the error
- instead, precompute the effect of different load-shedding options and switch between them based on the current resources

------------------------------------------------------------------------------

RAJEEV MOTWANI

Disclaimer
- common-sense info
- no QoS in the STREAM system

Should we have QoS in stream systems? Sure, why not?

Start with an example
- an IT organization
- network devices, servers, apps
- each generating streams at the level of SNMP, packets, syslogs
- put all of them into a centralized management system to do
  -> security
  -> why something crashed
  -> if PeopleSoft is down, why?
  -> continuous queries and ad-hoc analysis

Characteristics of such apps / killer apps:
- variable, bursty load
- large number of continuous queries
- real-time, online
- event-driven
  -> example: if an intrusion is detected, then do totally different things
  -> QoS becomes important
  -> change resource allocation

QoS guarantees
- load shedding
- prioritization
- degree of load shedding
- react to run-time events
QoS must be simple and effective.

Mechanisms
- sampling
  -> problem: effective for simple queries; the effect on complex queries is hard to characterize
- alertness
  -> boolean output (false positives and false negatives)
  -> haystack problem
  -> security alerts - may miss important events
- approximation
  -> approximate processing mechanisms in general - not much progress even outside the streaming context
  -> simple queries are ok

Jennifer: Input sampling is bad for interesting things like joins. Load shedding may eliminate everything - it eliminates the needle in the haystack.
Dennis: Can use a fact table.
Alin: Foreign-key joins.
Jennifer: They are the same.
Rajeev: The situation for streams is different, not as in the fact-table case.
Magda: There are multiple techniques. Based on the app, we can use an appropriate technique.
Rajeev: We don't have a metric to decide on that.

- focusing
  -> value-based load shedding
  -> focus on certain domain values (see the sketch at the end of this section)
  Jennifer: example: incoming missiles vs. squirrels
  Johannes: gold customers
- skipping
  -> dropping output?
  -> sampling at a different level
  -> accumulating and processing later
- jumping windows
- shrinking windows
- latency
  -> does not shed load, postpones it
  -> for burstiness?
  -> batching for more throughput
- level at which to apply load shedding
  -> input/output/subquery

Expressing QoS
1. What form is acceptable? Where to do it?
2. Weights, partial order, run-time, compile-time.
3. Penalty, utility
   -> Aurora - utility
   -> rate chart - the app specifies based on rate
   -> threshold: "these are critical queries, here is the max for them; for the rest, do the best you can"

Sample models
- Aurora

Delivering
- MQO (multi-query optimization) is a problem; it constrains things.
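A minimal sketch, not taken from Aurora or STREAM, contrasting uniform random load shedding with the value-based "focusing" shedding discussed above: the shedder keeps every tuple for an important domain value (here, a hypothetical "gold" customer class) and drops only among the rest to meet the target rate. The tuple schema, the 5% share of gold customers, and the 50% shedding target are all illustrative assumptions.

```python
import random

random.seed(1)
stream = [{"cust": "gold" if random.random() < 0.05 else "regular", "val": i}
          for i in range(10_000)]

def random_shed(tuples, keep_fraction):
    # Content-blind shedding: keep each tuple with probability keep_fraction.
    return [t for t in tuples if random.random() < keep_fraction]

def value_based_shed(tuples, keep_fraction):
    # Focusing: keep every gold-customer tuple, shed only among the rest
    # so the overall output rate still meets the target.
    gold = [t for t in tuples if t["cust"] == "gold"]
    rest = [t for t in tuples if t["cust"] != "gold"]
    budget = max(0, int(keep_fraction * len(tuples)) - len(gold))
    return gold + random.sample(rest, min(budget, len(rest)))

for shed in (random_shed, value_based_shed):
    out = shed(stream, 0.5)
    gold_kept = sum(t["cust"] == "gold" for t in out)
    print(shed.__name__, "kept", len(out), "tuples,", gold_kept, "gold")
# Both meet roughly the same output rate, but only the value-based shedder
# keeps all of the "needle in the haystack" gold tuples.
```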
------------------------------------------------------------------------------

JOHANNES GEHRKE

Why? How? What?

Competing axes
- combining them is hard for the user
- machine-learning techniques may be needed

Two models
- fast CPU - not enough main memory
- slow CPU - can't keep up
Control QoS by how much space you give.

Stan: This is not reactive. You pre-compute. Not for spikes.
Johannes: You can't do the adjustment quickly enough for spikes.

Sketches
- not a very good method for stream systems

Set-valued queries
- use a synopsis?
- subset approximation - which subset?
- element frequencies, values
- metric from Stanford (EMD)
- max subset ~ symmetric set difference

Sampling
- constructing sets from samples

Histograms
- computation in histogram space
- problem: high-dimensional histograms, joins

Semantic load shedding
- minimum number of edges deleted (in a join)
- window-based?
- hard problems

Open problems
- classes of metrics
- an interface to specify them

Two mini points
- composing techniques
- multiple queries
  -> space across all the sketches
  -> care needed
- hardest: specifying metrics and giving feedback to the user - the right metrics and how to state them

Question: It is hard - the consequences for the result, the effect on the metric. Because of this, some techniques might be more desirable than others.

------------------------------------------------------------------------------

HARI BALAKRISHNAN

QoS-first, QoS-later, or QoS-never?
- networking QoS - why didn't it work on the Internet before, and why is it working now?
- design QoS such that it is practical and money can be made from it (so that it gets used)

Rajeev: SLA or QoS?
Hari: SLAs don't have QoS, but they are starting to have rate specifications. There is a focus on delay.

Handling overload
- what to drop
- ISPs - what to accept and what not to; today this is for protection purposes

Sharing resources
- macrostreams: gold -> queue1, silver -> queue2; who does the mapping?
- work-conserving scheme
  -> the scheduler is not idle if there is a job waiting
  -> same mean delay (per-queue delay, not per-packet delay)
  -> trades off delay between different streams

WFQ (Weighted Fair Queuing)
- bandwidth or rate-sharing guarantee
- bounds on worst-case delay
- bounded delay on streams -> use WFQ
- makes bursts look like a linear process
(a small scheduler sketch follows at the end of this section)

Franklin: In a broadcast environment, it is not clear whether this works.

SQ FIFO (single queue, FIFO)
- everybody on the same queue
- minimizes jitter
- preserves bursts - may be important for applications
- WFQ vs. SQ FIFO - they push in opposite directions

Dennis: What is jitter?
Hari: A burst of packets arriving; deviation from a smooth, linear arrival of packets.

QoS models
- relative: who is Gold/Silver changes over time
- end-to-end or per-hop
  -> end-to-end is harder - optimizes app performance
  -> per-hop - where the money is

Utility functions
- hard to express, except in two cases

Economic principles
- what brings benefit to providers, not to apps

Inverse of Aurora's utility functions
- elastic app -> like file transfer (weighted round-robin works well)
- drop-dead -> has to respond within a certain time
  -> needs adequate provisioning
  -> no real way to bound the number of missiles (can't solve the problem)
- elastic
  -> concave is easier
  -> concave then convex is harder (as in multimedia apps)
  -> convex portion - maximize the slope

Rajeev: Aggregate utility? Stream networks vs. packet networks?
Hari: utility x degree of commonality - one more dimension

Medusa claims
- differences from networks
  -> joint management of multiple resources
  -> a simple service model is needed
  -> should be practical
  -> feedback to sources (not present in conventional databases)
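A minimal sketch of the weighted fair queuing idea above, under assumptions not stated in the talk: a self-clocked, simplified WFQ variant that serves two macrostreams ("gold" with weight 2, "silver" with weight 1) from a single work-conserving scheduler by ordering items on virtual finish times. The weights, stream names, and fixed per-item cost are illustrative, and the scheme ignores the system virtual clock, so it only approximates true WFQ while both queues stay backlogged.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Item:
    finish: float                       # virtual finish time (sort key)
    stream: str = field(compare=False)
    payload: int = field(compare=False)

class WFQScheduler:
    def __init__(self, weights):
        self.weights = weights                        # e.g. {"gold": 2.0, "silver": 1.0}
        self.last_finish = {s: 0.0 for s in weights}
        self.heap = []

    def enqueue(self, stream, payload, cost=1.0):
        # Higher weight -> smaller finish-time increments -> a proportionally
        # larger share of service for that stream.
        finish = self.last_finish[stream] + cost / self.weights[stream]
        self.last_finish[stream] = finish
        heapq.heappush(self.heap, Item(finish, stream, payload))

    def dequeue(self):
        # Work-conserving: never idle while something is waiting;
        # always serve the lowest virtual finish time first.
        return heapq.heappop(self.heap) if self.heap else None

sched = WFQScheduler({"gold": 2.0, "silver": 1.0})
for i in range(4):
    sched.enqueue("gold", i)
    sched.enqueue("silver", i)
served = []
while (item := sched.dequeue()) is not None:
    served.append(item.stream)
print(served)  # gold items drain roughly twice as fast as silver ones
```
------------------------------------------------------------------------------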