Metrics
StatsD Metrics in Socorro
Socorro uses StatsD with DogStatsD extensions.
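As a rough illustration of what the DogStatsD extensions add, the sketch below formats a metric as a DogStatsD datagram, where tags are appended after `|#`. The `format_metric` helper and the `TYPE_CODES` mapping are hypothetical names used only for this example and are not Socorro's implementation; the `job:verifyprocessed` tag value is likewise illustrative.

```python
# Hypothetical helper, not part of Socorro: formats one metric as a
# DogStatsD datagram of the form "<key>:<value>|<type>|#<tag>,<tag>".

# Maps the metric types used on this page to DogStatsD type codes.
TYPE_CODES = {
    "incr": "c",        # counter
    "gauge": "g",
    "timing": "ms",     # timer, in milliseconds
    "histogram": "h",
}


def format_metric(key, metric_type, value, tags=None):
    """Return the datagram string for a single metric."""
    line = f"{key}:{value}|{TYPE_CODES[metric_type]}"
    if tags:
        # Tags are the DogStatsD extension: "key:value" pairs joined by commas.
        line += "|#" + ",".join(tags)
    return line


print(
    format_metric(
        "socorro.cron.job_run",
        "timing",
        1250,
        tags=["job:verifyprocessed", "result:success"],
    )
)
# prints "socorro.cron.job_run:1250|ms|#job:verifyprocessed,result:success"
```

Plain StatsD has no tag segment, so a metric emitted without tags is just `key:value|type`.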
Table of metrics:
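| Key | Type |
|---|---|
| socorro.cron.job_run | timing |
| socorro.cron.verifyprocessed.missing_processed | gauge |
| socorro.processor.betaversionrule.cache | incr |
| socorro.processor.betaversionrule.lookup | incr |
| socorro.processor.cache_manager.evict | incr |
| socorro.processor.cache_manager.q_overflow | incr |
| socorro.processor.cache_manager.usage | gauge |
| socorro.processor.cache_manager.file_sizes.avg | gauge |
| socorro.processor.cache_manager.file_sizes.median | gauge |
| socorro.processor.cache_manager.file_sizes.ninety_five | gauge |
| socorro.processor.cache_manager.file_sizes.max | gauge |
| socorro.processor.cache_manager.files.count | gauge |
| socorro.processor.cache_manager.files.gt_500 | gauge |
| socorro.processor.denonerule.had_nones | incr |
| socorro.processor.denullrule.has_nulls | incr |
| socorro.processor.dest1.save_processed_crash | timing |
| socorro.processor.es.crash_document_size | histogram |
| socorro.processor.es.index | histogram |
| socorro.processor.es.indexerror | incr |
| socorro.processor.es.save_processed_crash | timing |
| socorro.processor.ingestion_timing | timing |
| socorro.processor.minidumpstackwalk.run | incr |
| socorro.processor.process_crash | timing |
| socorro.processor.rule.act.timing | timing |
| socorro.processor.save_processed_crash | incr |
| socorro.processor.storage.save_processed_crash | timing |
| socorro.processor.telemetry.save_processed_crash | timing |
| socorro.processor.truncatestackrule.stack_size | gauge |
| socorro.processor.truncatestackrule.truncated | incr |
| socorro.sentry_scrub_error | incr |
| socorro.submitter.accept | incr |
| socorro.submitter.ignore | incr |
| socorro.submitter.process | timing |
| socorro.submitter.unknown_finished_func_error | incr |
| socorro.submitter.unknown_process_error | incr |
| socorro.submitter.unknown_submit_error | incr |
| socorro.webapp.crashstats.models.cache_set_error | incr |
| socorro.webapp.view.pageview | timing |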
Metrics details:
- socorro.cron.job_run
  Type: timing
  How long it took to run the cron job.
  Tags:
  - job: short string for the job that was run
  - result: `success` or `failure`
- socorro.cron.verifyprocessed.missing_processed
  Type: gauge
  Gauge of crash reports for which there was no processed crash file.
- socorro.processor.betaversionrule.cache
  Type: incr
  Counter for whether the BetaVersionRule pulled version information from cache or not.
  Tags:
  - result: `hit` or `miss`
- socorro.processor.betaversionrule.lookup
  Type: incr
  Counter for whether the BetaVersionRule did a lookup using the Crash Stats VersionString API.
  Tags:
  - result: `success` or `fail`
- socorro.processor.cache_manager.evict
  Type: incr
  Counter for file evictions.
- socorro.processor.cache_manager.q_overflow
  Type: incr
  Counter for inotify Q_OVERFLOW events in the cache manager.
- socorro.processor.cache_manager.usage
  Type: gauge
  Gauge for the total size of the cache, in bytes.
- socorro.processor.cache_manager.file_sizes.avg
  Type: gauge
  Gauge for the average size of files in the cache, in bytes.
- socorro.processor.cache_manager.file_sizes.median
  Type: gauge
  Gauge for the median size of files in the cache, in bytes.
- socorro.processor.cache_manager.file_sizes.ninety_five
  Type: gauge
  Gauge for the 95th percentile size of files in the cache, in bytes.
- socorro.processor.cache_manager.file_sizes.max
  Type: gauge
  Gauge for the maximum size of files in the cache, in bytes.
- socorro.processor.cache_manager.files.count
  Type: gauge
  Total number of files in the cache.
- socorro.processor.cache_manager.files.gt_500
  Type: gauge
  Total number of files in the cache larger than 500 MB.
- socorro.processor.denonerule.had_nones
  Type: incr
  Counter for how many crash annotation values were `None`. All crash annotation values should be strings, so `None` isn’t valid and usually comes from a bug in the crash reporter.
- socorro.processor.denullrule.has_nulls
  Type: incr
  Counter for how many nulls were in keys and values for crash annotations.
- socorro.processor.dest1.save_processed_crash
  Type: timing
  Used in tests.
- socorro.processor.es.crash_document_size
  Type: histogram
  Size of the crash document, in bytes.
- socorro.processor.es.index
  Type: histogram
  Total time it took to index the crash document in Elasticsearch.
- socorro.processor.es.indexerror
  Type: incr
  Counter for errors when indexing a document in Elasticsearch.
  Tags:
  - error: the error code indicating what happened
- socorro.processor.es.save_processed_crash
  Type: timing
  Timer for how long it takes to save the processed crash to Elasticsearch.
- socorro.processor.ingestion_timing
  Type: timing
  Timer for how long it took for a crash report to be ingested. This is the time from when the crash report was submitted until processing was completed.
  This uses the `submitted_timestamp` from the collector as the start time.
- socorro.processor.minidumpstackwalk.run
  Type: incr
  Counter for minidump stackwalk executions.
  Tags:
  - outcome: either `success` or `fail`
  - exitcode: the exit code of the minidump stackwalk process
- socorro.processor.process_crash
  Type: timing
  Timer for how long it takes to process a crash report.
  Tags:
  - ruleset: the ruleset used for processing
- socorro.processor.rule.act.timing
  Type: timing
  Timer for how long it takes for the rule to run.
  Tags:
  - rule: rule class name
- socorro.processor.save_processed_crash
  Type: incr
  Counter for the number of crash reports successfully processed and saved to storage.
- socorro.processor.storage.save_processed_crash
  Type: timing
  Timer for how long it takes to save the processed crash to the storage bucket.
- socorro.processor.telemetry.save_processed_crash
  Type: timing
  Timer for how long it takes to save the processed crash to the Telemetry storage bucket.
- socorro.processor.truncatestackrule.stack_size
  Type: gauge
  Gauge for stack sizes.
- socorro.processor.truncatestackrule.truncated
  Type: incr
  Counter for stacks that were truncated because they were too large.
- socorro.sentry_scrub_error
  Type: incr
  Emitted when there are errors scrubbing Sentry events. Monitor these because it means we’re missing Sentry event data.
  Tags:
  - service: `webapp`, `submitter`, `processor`, or `cache_manager`
- socorro.submitter.accept
  Type: incr
  Counter for how many destinations the crash report was resubmitted to.
- socorro.submitter.ignore
  Type: incr
  Counter for how many destinations were ignored when resubmitting the crash report.
- socorro.submitter.process
  Type: timing
  Timer for how long it takes to process a crash report, which involves figuring out where the crash report should be sent, downloading the data, creating the payload, and submitting it.
- socorro.submitter.unknown_finished_func_error
  Type: incr
  Counter for how many unknown finished func errors were encountered.
- socorro.submitter.unknown_process_error
  Type: incr
  Counter for how many unknown process errors were encountered.
- socorro.submitter.unknown_submit_error
  Type: incr
  Counter for how many unknown submit errors were encountered.
- socorro.webapp.crashstats.models.cache_set_error
  Type: incr
  Counter for errors when caching middleware model request results.
- socorro.webapp.view.pageview
  Type: timing
  Timer for how long it takes to handle an HTTP request.
  Tags:
  - ajax: whether or not the request was an AJAX request
  - api: whether or not the request was an API request (path starts with `/api/`)
  - path: the path of the request
  - status: the HTTP response code
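
To make the `socorro.processor.cache_manager` gauges listed above more concrete, here is a minimal sketch of how such summary values could be derived from a list of file sizes in bytes. This is not the cache manager's actual code: `summarize_file_sizes` and `GT_500_THRESHOLD` are hypothetical names, the nearest-rank percentile method is an assumption, and so is treating 500 MB as 500,000,000 bytes.

```python
import math
import statistics

# Assumption: "500 MB" is interpreted as 500 * 1000 * 1000 bytes.
GT_500_THRESHOLD = 500 * 1000 * 1000


def summarize_file_sizes(sizes):
    """Return gauge values (keyed by metric suffix) for a list of sizes in bytes."""
    if not sizes:
        return {}

    ordered = sorted(sizes)
    # Assumption: nearest-rank method for the 95th percentile.
    rank = max(1, math.ceil(0.95 * len(ordered)))

    return {
        "usage": sum(ordered),
        "file_sizes.avg": statistics.mean(ordered),
        "file_sizes.median": statistics.median(ordered),
        "file_sizes.ninety_five": ordered[rank - 1],
        "file_sizes.max": ordered[-1],
        "files.count": len(ordered),
        "files.gt_500": sum(1 for size in ordered if size > GT_500_THRESHOLD),
    }


summary = summarize_file_sizes([100, 200, 300, 400, 600_000_000])
for suffix, value in summary.items():
    print(f"socorro.processor.cache_manager.{suffix} = {value}")
```

With a small number of files, one large file dominates the average and the 95th percentile while the median stays low, which is why it can help to look at these gauges together rather than individually.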