A Cost Driven Approach to Information Collection for Mobile ...

A Cost Driven Approach to Information Collection for Mobile ...

Time Sensitive Computation of Aggregate Functions over Distributed Imprecise Data Qi Han, Matthew Ba Nguyen Sandy Irani, Nalini Venkatasubramanian Distributed Systems Middleware Group http://www.ics.uci.edu/~dsm School of Information and Computer Science University of California-Irvine Motivation Real time applications often make decisions based on timely results aggregated over distributed data Example: real-time fault localization and problem diagnosis How many nodes are overloaded? (count) What is the total latency along the path N1N2 Nk? (sum) What is the bottleneck (minimum link bandwidth) along a path N1N2 Nk? (min) Challenges Continuous stream of fast changing source data Diverse user requirements in terms of data

accuracy and service timeliness Effective utilization of underlying computation, communication and storage resources competing goals of timeliness, accuracy and cost-effectiveness This paper Previous research Tradeoff between accuracy and cost storing data in ranges Tradeoffs between accuracy, cost and timeliness (single data item) RTSS2003 This paper addresses Tradeoffs between accuracy, cost and timeliness for aggregate functions (count, sum, min) Probing part of the sources might be sufficient How the server selects an appropriate subset of sources to probe so that the overall probing cost is minimized without violating accuracy

and timeliness requirements of aggregate queries Problem Characterization Queries: f(s1,s2,,si,sn), A, D Objective : minimize ci A: accuracy constraint D: time constraint Example: min(s1,s2,sn), 3, 1 minute iP (1) f (r1P , , rnP ) A subject to ( 2)T f D server c 2 /d 2 1 /d 1 /d cn di c i/

c Database [L1,U1],[L2,U2] [Ln,Un] n s1: E1 [L1,U1] s2: E2 [L2,U2] source 1 source 2 ... si: Ei [Li,Ui] source i ... Sn: En [Ln,Un] source n Time-sensitive Computation of Aggregate Functions Compute the function based on stored values in

the database If the answer meets accuracy constraint: done Otherwise: select probing set Only consider SD: the subset of sources whose diD Batch selection The entire set of sources to probe is selected before the probes actually occur Iterative selection The source is probed one at a time Batch Selection of Source Probing Set For Count (Batch_COUNT) Problem: calculate the number of source values that fall inside the range r=[l,u]: fcount=|{si|si[l,u]}| l u

r s1 s2 s3 s4 s5 s6 Inside | I | f count | I | | U | Outside Uncertain e.g. f count [1,4] Algorithm: if |U|>A: we must probe |U|-A sources to determine the function within the desired accuracy If |USD||U|-A: order the sources in |USD| according to increasing cost and probe the first |U|-A sources in this ordering o/w: cannot determine the the function to within the desired accuracy Batch Selection of Source

Probing Set for Sum (BATCH_SUM) n Problem : f sum si i 1 e.g. s1 [1,5] before probing : f sum li , ui f sum (ui li ) si s D si sD si s D s2 [2,4] f sum [3,9], f sum 6 probing si P ui li 0 P : probing set (u i li ) A, where P S P si P can be reduced to 0 / 1 knapsack problem The objective is to maximize ci si P Batch Selection of Source

Probing Set for Min (BATCH_MIN) n Problem : f min min si i 1 Algorithm : 1. identify upper and lower bound of the min function before probing : u min ui , l min li ; iS 2.if u-l A, decide subset L : L i | li u A ; 3. probe each source in L S D A Example: s1 s2 s3 u l s4 L s1 , s2 , s3 iS Issues in Computing min In the case of computing count or sum Know in advance exactly the benefit of probing any particular source, thus can decide in advance which sources to probe

count: probing any source decreases the uncertainty by 1 sum: probing source si decreases the uncertainty by ui-li In the case of computing min The number of probes required may vary depending on the values of the sources Example: s1=[0,5], s2=[1,6], A=1 If probe s1 and e1=2, then min=[1,2], done If probe s1 and e1=5, then min=[1,5], so s2 must be probed Iterative Selection of Source Probing Set for Min define current upper bound : u min min ui iS after probing si , the new upper bound will be min ei , u min Assuming ei is uniformly distributed over the range li , ui , expected amount by which the range for the minimum decreases when si is probed : u min ben( si ) li (li umin ) 2 umin x dx ui li

2(ui li ) probe sources that maximizes BCR ben( si ) ci d i Performance Evaluation Baseline policies compared with GREEDY: probe all LAZY: probe none RANDOM Performance metrics Cost ci iP Accuracy ratio (a/A): measures how close the answer interval a matches the accuracy constraint A Latency ratio (d/D): measure how close the time d spent answering the query matches the time constraint D

Accuracy satisfaction ratio: the percentage of queries with their accuracy constraints met Deadline satisfaction ratio: the percentage of queries with their time constraints met Basic Performance Results for Computing Count count 450 latency ratio 400 350 cost 300 250 200 150 100 50 0 RANDOM BATCH LAZY 4 10 9 8 7 6 5

4 3 2 1 0 3.5 accuracy ratio 500 GREEDY count count 3 2.5 2 1.5 1 0.5 0 GREEDY RANDOM BATCH LAZY GREEDY RANDOM BATCH

LAZY Basic Performance Results for Computing Min min min 1.2 1 accuracy satisfaction ratio deadline satisfaction ratio 1.2 0.8 0.6 0.4 0.2 0 1 0.8 0.6 0.4 0.2 0 GREEDY RANDOM BATCH ITERATIVE

LAZY GREEDY RANDOM BATCH ITERATIVE LAZY Performance of Computing Count under varying accuracy constraints count 400 350 4 3.5 300 250 200 150 3 2.5 accuracy ratio cost count

100 50 0 2 1.5 1 0.5 0 5 10 15 20 accuracy constraints 25 30 5 10 15 20 accuracy constraints 25 30

Performance of Computing Count under varying time constraints count count 3.5 250 3 accuracy ratio 300 cost 200 150 100 2.5 2 1.5 1 50 0.5 0 0 1 3 5

7 deadline 9 11 1 3 5 7 deadline 9 11 Conclusions The worst case analysis (batch selection algorithms) provides a bound on the cost to satisfy queries regardless of the exact values of sources More sophisticated models such as Gaussian distribution can be used to capture the change of source values Also interesting to conduct competitive analysis of algorithms for answering aggregate queries

Recently Viewed Presentations

  • Credit Exposure Analysis using ICE Prices

    Credit Exposure Analysis using ICE Prices

    Credit Exposure Analysis using ICE Prices. Suggested methodology by CRIM to use ICE Futures prices: Utilize Daily Peak futures (END) and Daily Off-Peak futures (NED) contract prices from ICE for determining projected forward price
  • Flood Hazard Risk Reduction and Resiliency Grant Program

    Flood Hazard Risk Reduction and Resiliency Grant Program

    Flood Hazard Risk Reduction And Resiliency Grant Program . Funded by the US Dept. of Housing and Urban Development. Community Development Block Grant-Disaster Recovery Program . DEP selects and awards grants to Municipalities/Counties as subrecipients. Flood Hazard Risk Reduction and...
  • How Section 511 of the Rehabilitation act Applies to Schools ...

    How Section 511 of the Rehabilitation act Applies to Schools ...

    We are familiar with Section 504. Section 504 of the Rehabilitation Act is a civil rights law that prohibits discrimination on the basis of disability.This law applies to public elementary and secondary schools, among other entities.
  • Chapter 1: Introduction to Corporate Finance

    Chapter 1: Introduction to Corporate Finance

    Incremental cash flow analysis looks at how the cash flows change by taking a particular project instead of another project. Using IRR and PI correctly when projects are mutually exclusive and are of differing scales IRR and PI analysis of...
  • What do we know about Price and Income Elasticities of Demand ...

    What do we know about Price and Income Elasticities of Demand ...

    What do we know about Price and Income Elasticities of Demand for Wine? Leslie Butler, UC Davis. Sara Savastano, Univ. of Rome. Marianne Wolf, Cal Poly SLO. Presentation to the AAWE/AARES Conference Workshop "The World's Wine Markets by 2030", February...
  • Australian Curriculum - The Curriculum Place

    Australian Curriculum - The Curriculum Place

    In Years 5 and 6, students develop awareness of key aspects of Australia's Anglo-Celtic heritage, including the Westminster system, and knowledge and understanding of the key features and processes of Australia's system of government.
  • Unit 3 : the solar system

    Unit 3 : the solar system

    Understand how our view of the solar system has changed over time and how discoveries made have led to our changing our view of the solar system. Learn planetary characteristics such as number of moons, size, composition, type of atmosphere,...
  • Map Reduce for data-intensive computing

    Map Reduce for data-intensive computing

    "The Google Filesystem" by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung in SOSP '03. CACM Papers on MapReduce and Parallel DBMSs. MapReduce and Parallel DBMSs: Friends or Foes? MapReduce: A Flexible Data Processing Tool