Category: papers

Napa - Adaptive Partitioning for Distributed Queries

Efficient query execution in distributed data warehouses depends on how well the workload is balanced across nodes. Napa improves performance by dynamically partitioning data at query time, adapting to each query’s needs instead of relying on fixed partitions. It uses a progressive approach designed to be “good enough”, balancing the time spent partitioning against query performance.

written January 30, 2025 in data-paritioning, databases, papers Read on →

Pivot Tracing

Pivot Tracing lets users define arbitrary metrics over trace data at runtime. It does so by combining two powerful techniques:

A happened-before operator that lets users query events based on the causal relationships between them.
The ability to instrument code dynamically without having to redeploy.

written February 14, 2022 in distributed-tracing, observability, papers Read on →

Sifter: Scalable Sampling for Distributed Tracing

Distributed tracing can be ridiculously expensive if you try to trace a hundred percent of requests. A common technique to reduce costs is to sample only a small portion of the traffic. But naive techniques like uniform sampling will mostly capture common-case executions and might miss the more interesting edge cases. Instead, [Sifter’s approach][1] is to bias sampling decisions towards outliers and anomalous traces. This way, anomalous traces have a higher chance of being sampled, and uninteresting traces are more likely to be discarded.
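
To make the idea of biased sampling concrete, here is a minimal sketch. It is not Sifter's actual mechanism: it simply assumes each trace can be reduced to a hashable "signature" (for example, the sequence of services it touched) and keeps rare signatures with higher probability. The class name `RarityBiasedSampler`, the `base_rate` parameter, and the signature format are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


class RarityBiasedSampler:
    """Toy sampler: the rarer a trace's signature, the likelier it is kept.

    Illustrative assumption only; this is a frequency-based stand-in,
    not the model described in the Sifter paper.
    """

    def __init__(self, base_rate=0.05, seed=None):
        # Hypothetical knob: the rate applied to a signature that makes up
        # all of the traffic seen so far.
        self.base_rate = base_rate
        self.counts = Counter()
        self.total = 0
        self.rng = random.Random(seed)

    def probability(self, signature):
        """Sampling probability for a signature, given what has been seen."""
        seen = self.counts[signature]
        if seen == 0:
            return 1.0  # never-seen signatures are always kept
        frequency = seen / self.total
        # Inverse-frequency weighting, capped at 1.0.
        return min(1.0, self.base_rate / frequency)

    def observe(self, signature):
        """Record a trace and decide whether to keep it."""
        p = self.probability(signature)
        self.counts[signature] += 1
        self.total += 1
        return self.rng.random() < p


sampler = RarityBiasedSampler(base_rate=0.05, seed=0)
common = ("frontend", "db")
rare = ("frontend", "db", "retry")

for _ in range(99):
    sampler.observe(common)
kept_rare = sampler.observe(rare)  # first sighting, so it is kept
```

With uniform sampling both signatures would be kept at the same rate. Here the common signature settles near `base_rate`, while the rare one stays at or near probability 1 until it becomes frequent.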

written July 28, 2021 in distributed-tracing, observability, papers, sampling Read on →
