Instrumenting a Search Query Path with OpenTelemetry
When a search request is slow, the question is always which hop: query parsing, the engine round-trip, the reranker, or render. Without a single trace spanning all four, you correlate four logs by wall-clock timestamp and guess. This guide instruments one query path end-to-end with OpenTelemetry — discrete spans for parse, engine round-trip, rerank, and render, a latency histogram with query attributes (query, hit_count, took_ms), and OTLP export to a collector. It is the implementation detail behind the Observability & SRE for Search area, which itself fits the broader Search Engine Selection & Architecture framework.
Prerequisites
- Python 3.10+ or Node 18+ query service you control.
opentelemetry-sdk>=1.24andopentelemetry-exporter-otlp>=1.24(Python), or@opentelemetry/sdk-node>=1.21(Node).- An OTLP collector reachable at
otel-collector:4317(gRPC) or:4318(HTTP). - A search engine on
localhost:9200(Elasticsearch),localhost:7700(Meilisearch), orlocalhost:8108(Typesense).
Diagnosis / Context
The default of a single timing log (query took 312ms) tells you the total but not the split. Here is the same request as an uninstrumented log versus what you actually need:
2026-06-20T11:04:22Z search q="wireless headphones" total=312ms hits=240
That 312ms could be 30ms of engine time and 270ms of reranking, or the reverse — the log cannot say. The fix is structured spans whose durations are nested under one parent, so the trace viewer renders a waterfall and the largest self-time span names the culprit. Spans also carry attributes, so you can filter to “slow queries with zero hits” or “queries where took_ms is fine but the parent span is not” — the two signatures that separate an engine problem from a service problem.
Solution Steps
1. Configure the tracer, meter, and OTLP export
Set up the SDK once at process start. Both a TracerProvider (spans) and a MeterProvider (the latency histogram) export over OTLP to the collector.
# telemetry.py — one-time SDK bootstrap
from opentelemetry import trace, metrics
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
resource = Resource.create({"service.name": "search-query-api"})
# traces -> OTLP gRPC collector
tp = TracerProvider(resource=resource)
tp.add_span_processor(BatchSpanProcessor(
OTLPSpanExporter(endpoint="otel-collector:4317", insecure=True)))
trace.set_tracer_provider(tp)
# metrics -> same collector, flushed every 10s
reader = PeriodicExportingMetricReader(
OTLPMetricExporter(endpoint="otel-collector:4317", insecure=True),
export_interval_millis=10_000)
metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[reader]))
tracer = trace.get_tracer("search.query")
meter = metrics.get_meter("search.query")
# explicit buckets must straddle your latency SLO threshold (here ~150ms)
query_latency = meter.create_histogram("search.query.latency", unit="ms")
2. Wrap the query path in nested spans
One parent span per request; child spans for each stage. Record attributes on the parent so a single span carries the full request shape.
# search.py — spans for parse -> engine -> rerank -> render
from opentelemetry import trace
from telemetry import tracer, query_latency
import time, httpx
def handle_search(raw_q: str, query_class: str = "search"):
start = time.perf_counter()
with tracer.start_as_current_span("search.request") as root:
with tracer.start_as_current_span("parse"):
parsed = build_engine_query(raw_q) # tokenize, apply filters
with tracer.start_as_current_span("engine.query") as eng:
resp = httpx.post("http://localhost:9200/_search", json=parsed).json()
took = resp.get("took", 0) # engine-reported ms
hits = resp["hits"]["total"]["value"]
eng.set_attribute("engine.took_ms", took) # engine time only
with tracer.start_as_current_span("rerank"):
ranked = rerank(resp["hits"]["hits"]) # custom scoring pass
with tracer.start_as_current_span("render"):
payload = to_response(ranked)
latency_ms = (time.perf_counter() - start) * 1000
# attributes: keep query as a low-cardinality hash if PII is a concern
attrs = {"query.class": query_class, "hit_count": hits}
root.set_attribute("query", raw_q)
root.set_attribute("hit_count", hits)
root.set_attribute("engine.took_ms", took)
query_latency.record(latency_ms, attrs) # histogram for p50/p95/p99
return payload
The same shape in Node uses the @opentelemetry/api tracer with tracer.startActiveSpan; the span names and attributes stay identical so dashboards are portable across services.
// search.js — equivalent nested spans (Node SDK)
const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('search.query');
async function handleSearch(rawQ) {
return tracer.startActiveSpan('search.request', async (root) => {
const parsed = await tracer.startActiveSpan('parse', (s) => {
const p = buildEngineQuery(rawQ); s.end(); return p;
});
const resp = await tracer.startActiveSpan('engine.query', async (s) => {
const r = await engineSearch(parsed);
s.setAttribute('engine.took_ms', r.took); s.end(); return r;
});
root.setAttribute('hit_count', resp.hits.total.value);
root.end();
return toResponse(resp);
});
}
3. Run a collector to receive OTLP
The collector terminates OTLP and re-exports to your backends. This minimal config accepts gRPC + HTTP and exposes metrics for Prometheus to scrape.
# otel-collector.yaml
receivers:
otlp:
protocols:
grpc: { endpoint: 0.0.0.0:4317 }
http: { endpoint: 0.0.0.0:4318 }
exporters:
prometheus: { endpoint: 0.0.0.0:8889 } # scrape target for metrics
otlphttp/traces: { endpoint: http://tempo:4318 }
service:
pipelines:
traces: { receivers: [otlp], exporters: [otlphttp/traces] }
metrics: { receivers: [otlp], exporters: [prometheus] }
# start the collector
docker run --rm -p 4317:4317 -p 4318:4318 -p 8889:8889 \
-v "$PWD/otel-collector.yaml":/etc/otel/config.yaml \
otel/opentelemetry-collector-contrib --config /etc/otel/config.yaml
Verification
Send one query, then confirm both the trace and the histogram landed.
# 1) drive a request through the instrumented path
curl -s -X POST localhost:8000/search -d '{"q":"wireless headphones"}'
# 2) the histogram must appear at the collector's Prometheus endpoint
curl -s localhost:8889/metrics | grep search_query_latency_ms_count
# expected: search_query_latency_ms_count{query_class="search"} 1
# 3) the trace must have all four child spans under one trace_id
curl -s 'localhost:3200/api/search?tags=service.name%3Dsearch-query-api' \
| jq '.traces[0].spanSets[0].spans | length'
# expected: 4 (parse, engine.query, rerank, render)
If the count is 4 and the histogram increments, the path is fully traced. Open the trace in the viewer and read the waterfall: the widest bar is your latency budget’s owner.
Common Pitfalls
High-cardinality attributes blow up metric storage
Recording the raw query string as a metric label (not a span attribute) creates one series per unique query — millions of series. Keep query on the span only; on the histogram use bounded labels like query.class. If you must aggregate by query, hash it or bucket by length.
# WRONG: unbounded label cardinality
query_latency.record(ms, {"query": raw_q})
# RIGHT: bounded labels on the metric, full query on the span
query_latency.record(ms, {"query.class": query_class})
engine.took_ms and the engine.query span disagree
The engine.query span includes network round-trip and client deserialization; took_ms is engine-internal only. A large gap is not a bug — it is your network/serialization overhead, and seeing both is the point. Always record took_ms as a separate attribute so you can subtract it.
# span duration ~85ms but engine.took_ms=30ms => 55ms is transport/deserialize
Spans never reach the collector
Usually a BatchSpanProcessor that never flushes because the process exits first, or a wrong endpoint/TLS setting. Force a flush on shutdown and confirm the endpoint. This is the same propagation discipline needed before you can split metrics by ranker variant for canary deploys.
from opentelemetry import trace
trace.get_tracer_provider().force_flush() # call in your shutdown hook
Related
- Observability & SRE for Search Systems — the area this instrumentation feeds: golden signals, SLOs, and gating deploys.
- Alerting on indexing lag with SLOs — apply the same metric pipeline to the write path’s freshness budget.
- Ranking Algorithms & Relevance Tuning — the rerank span here wraps the scoring logic tuned in that area.