
Observability for Parca

Parca supports four pillars of observability: monitoring, logging, tracing, and profiling.

Profiling

Parca by default scrapes its own pprof endpoints on localhost:7070/debug/pprof. The different types of profiles are available via Parca's UI.
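
The same endpoints can also be queried by hand. The following is a minimal sketch that only builds the URLs; it assumes plain HTTP and the standard Go pprof handler names (heap, goroutine, profile with a seconds parameter), which this page does not list. The helper name pprof_url is hypothetical.

```python
from urllib.parse import urlencode

# Base address taken from the text above; the http scheme is an assumption.
BASE = "http://localhost:7070/debug/pprof"


def pprof_url(profile, seconds=None, base=BASE):
    """Build the URL of one pprof endpoint.

    profile is the handler name, for example "heap" or "goroutine".
    seconds is only meaningful for time-based profiles such as "profile".
    """
    url = f"{base}/{profile}"
    if seconds is not None:
        url += "?" + urlencode({"seconds": seconds})
    return url


heap_url = pprof_url("heap")
cpu_url = pprof_url("profile", seconds=10)
```

The resulting URLs can be passed to any HTTP client or to go tool pprof.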

Logging

Parca logs are structured by default. Parca's log level can be configured with --log-level="info".

Tracing

Parca traces some of its functionality. Set the address of the OpenTelemetry collector that receives the traces with --otlp-address=STRING.

For a local setup with Jaeger, see scripts/local-tracing/.

Monitoring

Parca exposes Prometheus metrics on localhost:7070/metrics.

Parca's monitoring focuses mostly on read and write requests to Parca. For that reason, grpc_server_handled_total is the most important metric for Parca, and most of the other metrics are helpful for general debugging.

There are some FrostDB-specific metrics that are useful to monitor as well.
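
As an illustration, the sketch below totals grpc_server_handled_total per method and gRPC code from the text served at the metrics endpoint. It assumes the label names grpc_method and grpc_code used by the common gRPC Prometheus interceptors and sample lines without timestamps. The sample excerpt, its counter values, and the function name are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical excerpt of the text served at localhost:7070/metrics.
SAMPLE = """\
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="OK",grpc_method="WriteRaw",grpc_type="unary"} 9990
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="WriteRaw",grpc_type="unary"} 10
grpc_server_handled_total{grpc_code="OK",grpc_method="Query",grpc_type="unary"} 500
"""

# One sample line: metric name, label set in braces, then the value.
LINE = re.compile(r'^grpc_server_handled_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def handled_by_method_and_code(text):
    """Sum the counter per (grpc_method, grpc_code) pair."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # comments, other metrics, blank lines
        labels = dict(LABEL.findall(match.group("labels")))
        key = (labels.get("grpc_method", ""), labels.get("grpc_code", ""))
        totals[key] += float(match.group("value"))
    return dict(totals)


totals = handled_by_method_and_code(SAMPLE)
```

In a real setup Prometheus would scrape the endpoint and the same grouping would be expressed as a query; the parser here only shows which labels carry the information.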

| Metric | Description | Type |
| --- | --- | --- |
| frostdb_granules_compactions_total | Number of per table compaction events that have occurred. | counter |
| frostdb_granules_created_total | Number of per table granules that have been created. | counter |
| frostdb_blocks_rotated_total | Number of per table blocks that have rotated out. | counter |
| frostdb_active_table_block_size | The estimated size in bytes of each table's active block. | gauge |
| frostdb_row_insert_size | Per table tracking of the size in bytes of rows inserted. | histogram |
| frostdb_thanos_objstore_bucket_last_successful_upload_time | If bucket storage is enabled, the last successful upload to the bucket. | gauge |
| frostdb_thanos_objstore_bucket_operation_failures_total | Total number of bucket operations that have failed. | counter |
| frostdb_tx_high_watermark | The transaction number that is safe to read up to. | gauge |
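
Two of these metrics lend themselves to simple derived values. The sketch below computes the mean inserted row size from the histogram's _sum and _count series (the standard Prometheus histogram convention) and the age of the last bucket upload. It assumes the upload gauge holds a Unix timestamp in seconds, which this page does not state. All sample values and function names are hypothetical.

```python
import time


def histogram_mean(sum_value, count_value):
    """Mean observation of a Prometheus histogram from its _sum and _count."""
    if count_value == 0:
        return None  # no observations recorded yet
    return sum_value / count_value


def seconds_since_upload(last_upload_unix, now=None):
    """Age of the last successful bucket upload in seconds.

    Assumes the gauge value is a Unix timestamp in seconds.
    """
    if now is None:
        now = time.time()
    return now - last_upload_unix


# Hypothetical readings of frostdb_row_insert_size_sum and _count.
mean_row_bytes = histogram_mean(2_048_000.0, 4_000.0)

# Hypothetical gauge value and a fixed "now" to keep the example deterministic.
upload_age = seconds_since_upload(1_700_000_000.0, now=1_700_000_900.0)
```

A steadily growing upload age while bucket storage is enabled is the kind of signal worth alerting on, together with frostdb_thanos_objstore_bucket_operation_failures_total.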

Service Level Objectives

Here are some example Service Level Objectives for Parca.

| API | Category | SLI | SLO |
| --- | --- | --- | --- |
| Write | Availability | The proportion of successful ProfileStoreService.WriteRaw requests (such as from Parca agent), as measured by Parca's gRPC metrics interceptor. | 99.9% in 4w |
| Write | Latency | The proportion of sufficiently fast requests to ProfileStoreService.WriteRaw, as measured by Parca's gRPC metrics interceptor. | 95% of requests in < 100ms in 4w |
| DebugInfo | Availability | The proportion of successful DebugInfoService.Upload requests, as measured by Parca's gRPC interceptor. | 99% in 4w |
| DebugInfo | Latency | The proportion of sufficiently fast requests to DebugInfoService.Upload, as measured by Parca's gRPC metrics interceptor. | 95% of requests in < 30s in 4w |
| DebugInfo | Availability | The proportion of successful DebugInfoService.Exists requests, as measured by Parca's gRPC interceptor. | 99% in 4w |
| DebugInfo | Latency | The proportion of sufficiently fast requests to DebugInfoService.Exists, as measured by Parca's gRPC metrics interceptor. | 95% of requests in < 100ms in 4w |
| Query | Availability | The proportion of successful QueryService.Query requests, as measured by Parca's gRPC interceptor. | 99% in 2w |
| Query | Latency | The proportion of sufficiently fast requests to QueryService.Query, as measured by Parca's gRPC metrics interceptor. | 95% of requests in < 1s in 2w |
| QueryRange | Availability | The proportion of successful QueryService.QueryRange requests, as measured by Parca's gRPC interceptor. | 99% in 2w |
| QueryRange | Latency | The proportion of sufficiently fast QueryService.QueryRange requests, as measured by Parca's gRPC metrics interceptor. | 95% of requests in < 100ms in 2w |
| QueryLabels | Availability | The proportion of successful Query Labels requests to Parca, as measured by Parca's gRPC interceptor. | 99% in 2w |
| QueryLabels | Latency | The proportion of sufficiently fast Query Labels requests handled by Parca, as measured by Parca's gRPC metrics interceptor. | 95% of requests in < 100ms in 2w |
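
To make the targets concrete, the sketch below converts an objective into an error budget, that is, the number of requests allowed to miss the objective within the window. The request volume of one million is a made-up figure and the function name is hypothetical.

```python
from fractions import Fraction


def error_budget(target_percent, total_requests):
    """Number of requests that may miss the objective within the window.

    target_percent is passed as a string such as "99.9" so the arithmetic
    stays exact instead of picking up floating-point noise.
    """
    allowed_fraction = 1 - Fraction(target_percent) / 100
    return allowed_fraction * total_requests


# Hypothetical volume: 1,000,000 WriteRaw requests in the 4w window.
write_availability_budget = error_budget("99.9", 1_000_000)  # failed requests allowed
write_latency_budget = error_budget("95", 1_000_000)         # slow requests allowed
```

With these numbers, the 99.9% availability objective tolerates 1,000 failed requests, and the 95% latency objective tolerates 50,000 requests slower than the threshold.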

Parca treats the gRPC codes Aborted, Unavailable, Internal, Unknown, Unimplemented, and DataLoss as errors. Every other gRPC code counts as successful.
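
This classification can be sketched as a small availability calculation. The example assumes request counts keyed by the gRPC code name as it appears in the grpc_code label; the counts and the function name are hypothetical.

```python
# Codes counted as errors, as listed above; every other code is a success.
ERROR_CODES = frozenset(
    {"Aborted", "Unavailable", "Internal", "Unknown", "Unimplemented", "DataLoss"}
)


def availability(counts_by_code):
    """Proportion of successful requests, or None if there were no requests."""
    total = sum(counts_by_code.values())
    if total == 0:
        return None
    good = sum(
        count for code, count in counts_by_code.items() if code not in ERROR_CODES
    )
    return good / total


# Hypothetical WriteRaw counts for one window. InvalidArgument is a client-side
# problem and does not count against the service.
counts = {"OK": 9985, "InvalidArgument": 10, "Unavailable": 5}
sli = availability(counts)
meets_write_slo = sli >= 0.999  # 99.9% target from the Write availability row
```

Here 5 of 10,000 requests count as errors, so the measured availability is 99.95% and the objective is met.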

Pyrra

Parca contains Service Level Objective files for Pyrra. Pyrra sets up alerting based on the given SLOs and provides a UI for the most important aspects of these SLOs.

[Image: Parca Pyrra example SLO]

Grafana Dashboard

Parca ships with a Grafana dashboard that gives an overview of Parca's state and is a good entry point when troubleshooting. The dashboard is intended to follow the SLOs given above.

[Image: Parca Grafana dashboard]