Metrics

StatsD Metrics in Socorro

Socorro uses StatsD with DogStatsD extensions.
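To make the metric types below concrete, here is a minimal sketch of the DogStatsD wire format: plain StatsD `name:value|type` plus the DogStatsD `|#tag:value` extension. It is illustrative only and is not Socorro's client code. The helper name `format_datagram` and the `TYPE_CODES` mapping are hypothetical.

```python
# Sketch of the DogStatsD datagram format (plain StatsD plus the "|#tags"
# extension). Illustrative only: this is not Socorro's client code.

# Hypothetical mapping from the "Type" names used in this document to
# StatsD type codes.
TYPE_CODES = {
    "incr": "c",       # counter
    "gauge": "g",      # gauge
    "timing": "ms",    # timer, value in milliseconds
    "histogram": "h",  # histogram (DogStatsD extension)
}


def format_datagram(key, value, metric_type, tags=None):
    """Return a DogStatsD datagram such as 'name:1|c|#result:hit'."""
    line = f"{key}:{value}|{TYPE_CODES[metric_type]}"
    if tags:
        # DogStatsD appends tags after "|#" as comma-separated key:value pairs.
        line += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    return line


print(format_datagram("socorro.processor.betaversionrule.cache", 1, "incr", {"result": "hit"}))
# socorro.processor.betaversionrule.cache:1|c|#result:hit
```

A client library normally builds and sends these datagrams over UDP. The tag syntax is what lets one key such as `socorro.processor.betaversionrule.cache` be split by `result` in dashboards.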

Table of metrics:

Key                                                     Type
------------------------------------------------------  ---------
socorro.cron.job_run                                    timing
socorro.cron.verifyprocessed.missing_processed          gauge
socorro.processor.betaversionrule.cache                 incr
socorro.processor.betaversionrule.lookup                incr
socorro.processor.cache_manager.evict                   incr
socorro.processor.cache_manager.q_overflow              incr
socorro.processor.cache_manager.usage                   gauge
socorro.processor.cache_manager.file_sizes.avg          gauge
socorro.processor.cache_manager.file_sizes.median       gauge
socorro.processor.cache_manager.file_sizes.ninety_five  gauge
socorro.processor.cache_manager.file_sizes.max          gauge
socorro.processor.cache_manager.files.count             gauge
socorro.processor.cache_manager.files.gt_500            gauge
socorro.processor.denonerule.had_nones                  incr
socorro.processor.denullrule.has_nulls                  incr
socorro.processor.dest1.save_processed_crash            timing
socorro.processor.es.crash_document_size                histogram
socorro.processor.es.index                              histogram
socorro.processor.es.indexerror                         incr
socorro.processor.es.save_processed_crash               timing
socorro.processor.ingestion_timing                      timing
socorro.processor.minidumpstackwalk.run                 incr
socorro.processor.process_crash                         timing
socorro.processor.rule.act.timing                       timing
socorro.processor.save_processed_crash                  incr
socorro.processor.storage.save_processed_crash          timing
socorro.processor.telemetry.save_processed_crash        timing
socorro.processor.truncatestackrule.stack_size          gauge
socorro.processor.truncatestackrule.truncated           incr
socorro.sentry_scrub_error                              incr
socorro.submitter.accept                                incr
socorro.submitter.ignore                                incr
socorro.submitter.process                               timing
socorro.submitter.unknown_finished_func_error           incr
socorro.submitter.unknown_process_error                 incr
socorro.submitter.unknown_submit_error                  incr
socorro.webapp.crashstats.models.cache_set_error        incr
socorro.webapp.view.pageview                            timing

Metrics details:

socorro.cron.job_run

Type: timing

Timer for how long it took to run the cron job.

Tags:

  • job: short string identifying the job that ran

  • result: success or failure
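The two tags above can be sketched as follows: time the job, and report `result` as success or failure even when the job raises. This is a minimal sketch, not Socorro's cron code; `run_job_with_metric`, the `emit(key, value_ms, tags)` callable standing in for a StatsD client, and the job name are all hypothetical.

```python
import time


def run_job_with_metric(job_name, job_func, emit):
    """Run job_func and report its duration as socorro.cron.job_run.

    `emit(key, value_ms, tags)` is a hypothetical stand-in for a StatsD
    client's timing call.
    """
    start = time.monotonic()
    result = "success"
    try:
        job_func()
    except Exception:
        result = "failure"
        raise  # the metric is still emitted by the finally block
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        emit("socorro.cron.job_run", elapsed_ms, {"job": job_name, "result": result})


# Usage: collect emitted metrics in a list instead of sending them.
emitted = []
run_job_with_metric("hypothetical-job", lambda: None, lambda *args: emitted.append(args))
print(emitted[0][2])  # {'job': 'hypothetical-job', 'result': 'success'}
```

Emitting from `finally` means failed runs are timed too, so the `result` tag can separate slow failures from slow successes.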

socorro.cron.verifyprocessed.missing_processed

Type: gauge

Gauge of the number of crash reports for which there was no processed crash file.

socorro.processor.betaversionrule.cache

Type: incr

Counter for whether the BetaVersionRule pulled version information from cache or not.

Tags:

  • result: hit or miss

socorro.processor.betaversionrule.lookup

Type: incr

Counter for whether the BetaVersionRule did a lookup using the Crash Stats VersionString API.

Tags:

  • result: success or fail
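The relationship between the `cache` and `lookup` counters can be sketched as a cache-then-fetch function: every call counts a hit or a miss, and only misses trigger a lookup that counts success or fail. This is a hypothetical sketch, not the BetaVersionRule implementation. `lookup_version`, `fetch` (standing in for the VersionString API call) and `incr(name, tags)` are assumptions, as is treating an exception from `fetch` as a failed lookup.

```python
def lookup_version(key, cache, fetch, incr):
    """Return version info for key, counting cache hits/misses and API lookups.

    `fetch(key)` is a hypothetical stand-in for the VersionString API call;
    `incr(name, tags)` stands in for a StatsD counter.
    """
    if key in cache:
        incr("socorro.processor.betaversionrule.cache", {"result": "hit"})
        return cache[key]

    incr("socorro.processor.betaversionrule.cache", {"result": "miss"})
    try:
        value = fetch(key)
    except Exception:
        # Assumption: any exception from the lookup counts as a failure.
        incr("socorro.processor.betaversionrule.lookup", {"result": "fail"})
        return None
    incr("socorro.processor.betaversionrule.lookup", {"result": "success"})
    cache[key] = value
    return value


# Usage: the second call is served from the cache, so no second lookup.
counts = []
cache = {}
for _ in range(2):
    lookup_version("hypothetical-key", cache, lambda k: "1.0b3", lambda n, t: counts.append((n, t["result"])))
print(counts)
```

Under this sketch, `lookup` success plus fail should equal `cache` misses, which makes a useful sanity check on dashboards.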

socorro.processor.cache_manager.evict

Type: incr

Counter for file evictions.

socorro.processor.cache_manager.q_overflow

Type: incr

Counter for inotify Q_OVERFLOW events in the cache manager.

socorro.processor.cache_manager.usage

Type: gauge

Gauge for the total size of the cache, in bytes.

socorro.processor.cache_manager.file_sizes.avg

Type: gauge

Gauge for the average size of files in the cache, in bytes.

socorro.processor.cache_manager.file_sizes.median

Type: gauge

Gauge for the median size of files in the cache, in bytes.

socorro.processor.cache_manager.file_sizes.ninety_five

Type: gauge

Gauge for the 95th percentile size of files in the cache, in bytes.

socorro.processor.cache_manager.file_sizes.max

Type: gauge

Gauge for the maximum size of files in the cache, in bytes.

socorro.processor.cache_manager.files.count

Type: gauge

Total number of files in the cache.

socorro.processor.cache_manager.files.gt_500

Type: gauge

Total number of files in the cache larger than 500 MB.
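The cache_manager gauges above can all be derived from a list of file sizes. The sketch below is hypothetical, not the cache manager's code. In particular, these choices are all assumptions: "usage" as the sum of file sizes, the nearest-rank method for the 95th percentile, and 500 MB as 500 × 2^20 bytes.

```python
import statistics

MB = 1024 * 1024  # assumption: "500 MB" means 500 * 2**20 bytes


def cache_gauges(file_sizes):
    """Compute cache_manager gauge values from file sizes given in bytes."""
    sizes = sorted(file_sizes)
    gauges = {
        "usage": sum(sizes),  # assumption: usage == total of file sizes
        "files.count": len(sizes),
        "files.gt_500": sum(1 for size in sizes if size > 500 * MB),
    }
    if sizes:
        # Nearest-rank 95th percentile: rank = ceil(0.95 * n), computed with
        # integer math to avoid float rounding.
        rank = (95 * len(sizes) + 99) // 100
        gauges.update({
            "file_sizes.avg": statistics.mean(sizes),
            "file_sizes.median": statistics.median(sizes),
            "file_sizes.ninety_five": sizes[rank - 1],
            "file_sizes.max": sizes[-1],
        })
    return gauges


print(cache_gauges([10, 20, 30, 40, 1000]))
```

Skipping the size statistics for an empty cache avoids `statistics.StatisticsError` and avoids reporting a misleading zero average.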

socorro.processor.denonerule.had_nones

Type: incr

Counter for how many crash annotation values were None.

All crash annotation values should be strings, so None isn’t valid and usually comes from a bug in the crash reporter.

socorro.processor.denullrule.has_nulls

Type: incr

Counter for how many nulls were in keys and values for crash annotations.
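The two counters above can be sketched as simple scans over a crash annotation dict. This is a hypothetical sketch, not the DenoneRule or DenullRule code. Reading "nulls" as NUL characters (`"\0"`) embedded in keys and string values is an assumption.

```python
def count_nones(annotations):
    """Number of annotation values that are None (valid values are strings)."""
    return sum(1 for value in annotations.values() if value is None)


def count_nulls(annotations):
    """Number of NUL characters ("\\0") across annotation keys and string values.

    Assumption: "nulls" here means NUL characters embedded in strings.
    """
    total = 0
    for key, value in annotations.items():
        total += key.count("\0")
        if isinstance(value, str):
            total += value.count("\0")
    return total


annotations = {"ProductName": "Firefox", "Bad\0Key": "va\0lue\0", "Missing": None}
print(count_nones(annotations), count_nulls(annotations))  # 1 3
```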

socorro.processor.dest1.save_processed_crash

Type: timing

Used in tests.

socorro.processor.es.crash_document_size

Type: histogram

Size of the crash document, in bytes.

socorro.processor.es.index

Type: histogram

Total time it took to index the crash document in Elasticsearch.

socorro.processor.es.indexerror

Type: incr

Counter for errors when indexing a document in Elasticsearch.

Tags:

  • error: the error code indicating what happened

socorro.processor.es.save_processed_crash

Type: timing

Timer for how long it takes to save the processed crash to Elasticsearch.

socorro.processor.ingestion_timing

Type: timing

Timer for how long it took for a crash report to be ingested. This is the time from the submitted timestamp until processing completed.

This uses the submitted_timestamp from the collector as the start time.
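The ingestion time is the difference between the collector's `submitted_timestamp` and the moment processing finishes, reported in milliseconds. This is a minimal sketch under two assumptions: `submitted_timestamp` is an ISO 8601 string with a UTC offset, and the hypothetical `ingestion_time_ms` helper is not part of Socorro.

```python
from datetime import datetime, timezone


def ingestion_time_ms(submitted_timestamp, completed=None):
    """Milliseconds from submitted_timestamp until processing completed.

    Assumes submitted_timestamp is an ISO 8601 string with a UTC offset.
    """
    submitted = datetime.fromisoformat(submitted_timestamp)
    if completed is None:
        completed = datetime.now(timezone.utc)
    return (completed - submitted).total_seconds() * 1000


done = datetime(2024, 1, 1, 0, 0, 2, 500000, tzinfo=timezone.utc)
print(ingestion_time_ms("2024-01-01T00:00:00+00:00", done))  # 2500.0
```

Both datetimes must be timezone-aware; subtracting a naive datetime from an aware one raises `TypeError`.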

socorro.processor.minidumpstackwalk.run

Type: incr

Counter for minidump stackwalk executions.

Tags:

  • outcome: either success or fail

  • exitcode: the exit code of the minidump stackwalk process
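The `outcome` and `exitcode` tags can be derived from the subprocess result. This is a hypothetical sketch, not Socorro's stackwalker wrapper. The assumptions are that exit code 0 means success and that `incr(name, tags)` stands in for a StatsD counter. The usage example runs a Python one-liner in place of the real stackwalker binary.

```python
import subprocess
import sys


def run_stackwalker(cmd, incr):
    """Run a stackwalker command and count the run with outcome/exitcode tags.

    Assumption: exit code 0 means success. `incr(name, tags)` is a hypothetical
    stand-in for a StatsD counter.
    """
    proc = subprocess.run(cmd, capture_output=True)
    outcome = "success" if proc.returncode == 0 else "fail"
    incr(
        "socorro.processor.minidumpstackwalk.run",
        {"outcome": outcome, "exitcode": str(proc.returncode)},
    )
    return proc


# Usage: a Python one-liner stands in for the real stackwalker binary.
calls = []
run_stackwalker([sys.executable, "-c", "raise SystemExit(3)"], lambda n, t: calls.append(t))
print(calls[0])  # {'outcome': 'fail', 'exitcode': '3'}
```

Tagging the raw exit code alongside the coarse outcome lets dashboards distinguish different failure modes without adding new metric keys.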

socorro.processor.process_crash

Type: timing

Timer for how long it takes to process a crash report.

Tags:

  • ruleset: the ruleset used for processing

socorro.processor.rule.act.timing

Type: timing

Timer for how long it takes a rule to run.

Tags:

  • rule: rule class name
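Per-rule timing tagged with the rule's class name can be sketched with a small context manager wrapped around each rule's `act` call. This is a hypothetical sketch. `timed`, `apply_rules`, `UppercaseProductRule` and the `timing(key, ms, tags)` callable are illustrative assumptions, not Socorro's processor API.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(key, tags, timing):
    """Report elapsed milliseconds via timing(key, ms, tags), even if the body raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timing(key, (time.perf_counter() - start) * 1000, tags)


class UppercaseProductRule:
    """Hypothetical rule used only for this example."""

    def act(self, crash):
        crash["product"] = crash["product"].upper()


def apply_rules(rules, crash, timing):
    for rule in rules:
        # Tag each timing with the rule's class name.
        with timed("socorro.processor.rule.act.timing", {"rule": type(rule).__name__}, timing):
            rule.act(crash)
    return crash


timings = []
crash = apply_rules([UppercaseProductRule()], {"product": "firefox"}, lambda *a: timings.append(a))
print(crash, timings[0][2])  # {'product': 'FIREFOX'} {'rule': 'UppercaseProductRule'}
```

Using one metric key with a `rule` tag, rather than one key per rule, keeps the metric list in the table above stable as rules are added or removed.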

socorro.processor.save_processed_crash

Type: incr

Counter for number of crash reports successfully processed and saved to storage.

socorro.processor.storage.save_processed_crash

Type: timing

Timer for how long it takes to save the processed crash to the storage bucket.

socorro.processor.telemetry.save_processed_crash

Type: timing

Timer for how long it takes to save the processed crash to the Telemetry storage bucket.

socorro.processor.truncatestackrule.stack_size

Type: gauge

Gauge for stack sizes.

socorro.processor.truncatestackrule.truncated

Type: incr

Counter for stacks that were truncated because they were too large.

socorro.sentry_scrub_error

Type: incr

Emitted when there are errors scrubbing Sentry events. Monitor this metric, because errors mean we’re missing Sentry event data.

Tags:

  • service: webapp, submitter, processor or cache_manager

socorro.submitter.accept

Type: incr

Counter for how many destinations the crash report was resubmitted to.

socorro.submitter.ignore

Type: incr

Counter for how many destinations were ignored for resubmitting the crash report.

socorro.submitter.process

Type: timing

Timer for how long it takes to process a crash report. Processing involves figuring out where the crash report should be sent, downloading the data, creating the payload, and submitting it.

socorro.submitter.unknown_finished_func_error

Type: incr

Counter for how many unknown finished func errors were encountered.

socorro.submitter.unknown_process_error

Type: incr

Counter for how many unknown process errors were encountered.

socorro.submitter.unknown_submit_error

Type: incr

Counter for how many unknown submit errors were encountered.

socorro.webapp.crashstats.models.cache_set_error

Type: incr

Counter for errors when caching middleware model request results.

socorro.webapp.view.pageview

Type: timing

Timer for how long it takes to handle an HTTP request.

Tags:

  • ajax: whether or not the request was an AJAX request

  • api: whether or not the request was an API request (path starts with /api/)

  • path: the path of the request

  • status: the HTTP response code
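The four pageview tags can be sketched as a function of the request path, response status, and headers. This is a hypothetical sketch, not the webapp's middleware. The assumptions are that a request counts as AJAX when `X-Requested-With` is `XMLHttpRequest`, and that booleans are reported as `"true"`/`"false"` strings. The `/api/` prefix check comes from the description above.

```python
def pageview_tags(path, status, headers):
    """Build tags for socorro.webapp.view.pageview.

    Assumptions: a request counts as AJAX when the X-Requested-With header is
    "XMLHttpRequest", and boolean tags are reported as "true"/"false" strings.
    """
    is_ajax = headers.get("X-Requested-With") == "XMLHttpRequest"
    return {
        "ajax": "true" if is_ajax else "false",
        "api": "true" if path.startswith("/api/") else "false",
        "path": path,
        "status": str(status),
    }


print(pageview_tags("/api/example/", 200, {}))
```

Tagging by full path can produce many distinct tag values on sites with per-item URLs, which may matter for the metrics backend's cardinality limits.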