Instrumenting a Search Query Path with OpenTelemetry

When a search request is slow, the question is always which hop: query parsing, the engine round-trip, the reranker, or render. Without a single trace spanning all four, you correlate four logs by wall-clock timestamp and guess. This guide instruments one query path end-to-end with OpenTelemetry — discrete spans for parse, engine round-trip, rerank, and render, a latency histogram with query attributes (query, hit_count, took_ms), and OTLP export to a collector. It is the implementation detail behind the Observability & SRE for Search area, which itself fits the broader Search Engine Selection & Architecture framework.

Prerequisites

  1. Python 3.10+ or Node 18+ query service you control.
  2. opentelemetry-sdk>=1.24 and opentelemetry-exporter-otlp>=1.24 (Python), or @opentelemetry/sdk-node>=1.21 (Node).
  3. An OTLP collector reachable at otel-collector:4317 (gRPC) or :4318 (HTTP).
  4. A search engine on localhost:9200 (Elasticsearch), localhost:7700 (Meilisearch), or localhost:8108 (Typesense).

Diagnosis / Context

The default of a single timing log (query took 312ms) tells you the total but not the split. Here is the same request as an uninstrumented log versus what you actually need:

2026-06-20T11:04:22Z  search  q="wireless headphones"  total=312ms  hits=240

That 312ms could be 30ms of engine time and 270ms of reranking, or the reverse — the log cannot say. The fix is structured spans whose durations are nested under one parent, so the trace viewer renders a waterfall and the largest self-time span names the culprit. Spans also carry attributes, so you can filter to “slow queries with zero hits” or “queries where took_ms is fine but the parent span is not” — the two signatures that separate an engine problem from a service problem.

Solution Steps

1. Configure the tracer, meter, and OTLP export

Set up the SDK once at process start. Both a TracerProvider (spans) and a MeterProvider (the latency histogram) export over OTLP to the collector.

# telemetry.py — one-time SDK bootstrap
from opentelemetry import trace, metrics
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

resource = Resource.create({"service.name": "search-query-api"})

# traces -> OTLP gRPC collector
tp = TracerProvider(resource=resource)
tp.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(endpoint="otel-collector:4317", insecure=True)))
trace.set_tracer_provider(tp)

# metrics -> same collector, flushed every 10s
reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="otel-collector:4317", insecure=True),
    export_interval_millis=10_000)
metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[reader]))

tracer = trace.get_tracer("search.query")
meter = metrics.get_meter("search.query")

# explicit buckets must straddle your latency SLO threshold (here ~150ms)
query_latency = meter.create_histogram("search.query.latency", unit="ms")

2. Wrap the query path in nested spans

One parent span per request; child spans for each stage. Record attributes on the parent so a single span carries the full request shape.

# search.py — spans for parse -> engine -> rerank -> render
from opentelemetry import trace
from telemetry import tracer, query_latency
import time, httpx

def handle_search(raw_q: str, query_class: str = "search"):
    start = time.perf_counter()
    with tracer.start_as_current_span("search.request") as root:
        with tracer.start_as_current_span("parse"):
            parsed = build_engine_query(raw_q)   # tokenize, apply filters

        with tracer.start_as_current_span("engine.query") as eng:
            resp = httpx.post("http://localhost:9200/_search", json=parsed).json()
            took = resp.get("took", 0)           # engine-reported ms
            hits = resp["hits"]["total"]["value"]
            eng.set_attribute("engine.took_ms", took)   # engine time only

        with tracer.start_as_current_span("rerank"):
            ranked = rerank(resp["hits"]["hits"])       # custom scoring pass

        with tracer.start_as_current_span("render"):
            payload = to_response(ranked)

        latency_ms = (time.perf_counter() - start) * 1000
        # attributes: keep query as a low-cardinality hash if PII is a concern
        attrs = {"query.class": query_class, "hit_count": hits}
        root.set_attribute("query", raw_q)
        root.set_attribute("hit_count", hits)
        root.set_attribute("engine.took_ms", took)
        query_latency.record(latency_ms, attrs)   # histogram for p50/p95/p99
        return payload

The same shape in Node uses the @opentelemetry/api tracer with tracer.startActiveSpan; the span names and attributes stay identical so dashboards are portable across services.

// search.js — equivalent nested spans (Node SDK)
const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('search.query');

async function handleSearch(rawQ) {
  return tracer.startActiveSpan('search.request', async (root) => {
    const parsed = await tracer.startActiveSpan('parse', (s) => {
      const p = buildEngineQuery(rawQ); s.end(); return p;
    });
    const resp = await tracer.startActiveSpan('engine.query', async (s) => {
      const r = await engineSearch(parsed);
      s.setAttribute('engine.took_ms', r.took); s.end(); return r;
    });
    root.setAttribute('hit_count', resp.hits.total.value);
    root.end();
    return toResponse(resp);
  });
}

3. Run a collector to receive OTLP

The collector terminates OTLP and re-exports to your backends. This minimal config accepts gRPC + HTTP and exposes metrics for Prometheus to scrape.

# otel-collector.yaml
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }
exporters:
  prometheus: { endpoint: 0.0.0.0:8889 }   # scrape target for metrics
  otlphttp/traces: { endpoint: http://tempo:4318 }
service:
  pipelines:
    traces:  { receivers: [otlp], exporters: [otlphttp/traces] }
    metrics: { receivers: [otlp], exporters: [prometheus] }
# start the collector
docker run --rm -p 4317:4317 -p 4318:4318 -p 8889:8889 \
  -v "$PWD/otel-collector.yaml":/etc/otel/config.yaml \
  otel/opentelemetry-collector-contrib --config /etc/otel/config.yaml

Verification

Send one query, then confirm both the trace and the histogram landed.

# 1) drive a request through the instrumented path
curl -s -X POST localhost:8000/search -d '{"q":"wireless headphones"}'
# 2) the histogram must appear at the collector's Prometheus endpoint
curl -s localhost:8889/metrics | grep search_query_latency_ms_count
# expected: search_query_latency_ms_count{query_class="search"} 1
# 3) the trace must have all four child spans under one trace_id
curl -s 'localhost:3200/api/search?tags=service.name%3Dsearch-query-api' \
  | jq '.traces[0].spanSets[0].spans | length'
# expected: 4  (parse, engine.query, rerank, render)

If the count is 4 and the histogram increments, the path is fully traced. Open the trace in the viewer and read the waterfall: the widest bar is your latency budget’s owner.

Common Pitfalls

High-cardinality attributes blow up metric storage

Recording the raw query string as a metric label (not a span attribute) creates one series per unique query — millions of series. Keep query on the span only; on the histogram use bounded labels like query.class. If you must aggregate by query, hash it or bucket by length.

# WRONG: unbounded label cardinality
query_latency.record(ms, {"query": raw_q})
# RIGHT: bounded labels on the metric, full query on the span
query_latency.record(ms, {"query.class": query_class})
engine.took_ms and the engine.query span disagree

The engine.query span includes network round-trip and client deserialization; took_ms is engine-internal only. A large gap is not a bug — it is your network/serialization overhead, and seeing both is the point. Always record took_ms as a separate attribute so you can subtract it.

# span duration ~85ms but engine.took_ms=30ms => 55ms is transport/deserialize
Spans never reach the collector

Usually a BatchSpanProcessor that never flushes because the process exits first, or a wrong endpoint/TLS setting. Force a flush on shutdown and confirm the endpoint. This is the same propagation discipline needed before you can split metrics by ranker variant for canary deploys.

from opentelemetry import trace
trace.get_tracer_provider().force_flush()   # call in your shutdown hook