Posts in papers

Napa - Adaptive Partitioning for Distributed Queries

Efficient query execution in distributed data warehouses depends on how well the workload is balanced across nodes. Napa improves performance by dynamically partitioning data at query time, adapting to each query’s needs instead of relying on fixed partitions. It uses a progressive approach designed to be “good enough”, trading partitioning time against query performance.

written in data-paritioning, databases, papers Read on →

Pivot Tracing

Pivot Tracing lets users define arbitrary metrics over trace data at runtime. It does so by combining two powerful techniques:

  1. A happened-before operator that lets users write queries based on the causal relationships between events.
  2. The ability to instrument code dynamically without having to redeploy.
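
To make the first point concrete, here is a minimal sketch of a happened-before join. It is not Pivot Tracing's query language or implementation: the event tuples, the `"client"` and `"read"` kinds, and the function name are hypothetical, and the sketch assumes each request's events already appear in causal order.

```python
from collections import defaultdict


def sum_reads_by_client(events):
    """Sum "read" values per client, counting a read only if a "client"
    event happened before it within the same request.

    events: iterable of (request_id, kind, value) tuples, assumed to be
    listed in causal order for each request (hypothetical format).
    """
    first_client = {}          # request_id -> first client name seen
    totals = defaultdict(int)  # client name -> summed read values
    for request_id, kind, value in events:
        if kind == "client":
            # Remember only the first client event of each request.
            first_client.setdefault(request_id, value)
        elif kind == "read" and request_id in first_client:
            # The client event causally precedes this read, so join them.
            totals[first_client[request_id]] += value
    return dict(totals)


events = [
    (1, "client", "HGet"),
    (1, "read", 100),
    (2, "read", 50),        # no client event precedes this read: dropped
    (2, "client", "Scan"),
    (1, "read", 20),
    (2, "read", 7),
]
print(sum_reads_by_client(events))
```

The read of 50 in request 2 is left out because no client event precedes it, which is the kind of causal filter a plain group-by over the same events would not apply.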

written in distributed-tracing, observability, papers Read on →

Sifter: Scalable Sampling for Distributed Tracing

Distributed tracing can be ridiculously expensive if you try to trace a hundred percent of requests. A common way to reduce costs is to sample only a small portion of the traffic. But naive techniques like uniform sampling will inevitably capture mostly common-case executions and might miss the more interesting edge cases. Instead, [Sifter’s approach][1] is to bias sampling decisions towards outliers and anomalous traces. This way, anomalous traces have a higher chance of being sampled, and uninteresting common-case traces are more likely to be discarded.
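
The difference between uniform and biased sampling can be sketched in a few lines. This does not reproduce the scoring method from the Sifter paper: as a stand-in for "anomalous", it simply treats a trace shape as more interesting the less often it occurs. The function names and the `budget` parameter are made up for illustration.

```python
import random
from collections import Counter


def sampling_probabilities(traces, budget):
    """Give each trace a sampling probability inversely proportional to how
    often its shape occurs, scaled so the expected number of sampled traces
    is at most `budget`."""
    counts = Counter(traces)
    weights = [1.0 / counts[t] for t in traces]
    total = sum(weights)
    # Clamp at 1.0 so very rare traces never get an invalid probability.
    return [min(1.0, budget * w / total) for w in weights]


def biased_sample(traces, budget, rng=random):
    """Keep each trace independently with its rarity-based probability."""
    probs = sampling_probabilities(traces, budget)
    return [t for t, p in zip(traces, probs) if rng.random() < p]


# 98 common-case traces and 2 rare ones, with room to keep about 2 traces.
traces = ["GET /home"] * 98 + ["GET /home -> retry -> timeout"] * 2
probs = sampling_probabilities(traces, budget=2)
print(probs[0], probs[-1])
print(biased_sample(traces, budget=2, rng=random.Random(0)))
```

With a budget of 2 out of 100 traces, uniform sampling would keep every trace with probability 0.02. In this sketch each common trace is kept with probability of roughly 0.01 and each rare trace with probability 0.5, so the rare executions are far more likely to survive.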

written in distributed-tracing, observability, papers, sampling Read on →