Signature Generation
Overview
When processing a crash report, Socorro generates a crash signature. The signature is a short string that lets us group crash reports that likely have a common cause together.
Signature generation typically starts with a string based on the stack of the crashing thread. Various rules are then applied to adjust the signature, and when they're done, the result is the Socorro crash signature.
Signature generation is finicky. When it generates too coarse a signature, then crash reports that have nothing to do with one another end up grouped together. When it generates too fine a signature, then crash reports end up in very small groups which are unlikely to be looked at. Since technologies are constantly changing, we’re constantly honing signature generation.
Anyone can suggest changes to signature generation. It’s the part of the crash ingestion pipeline that’s maintained by non-Socorro developers.
Signature generation code is here:
https://github.com/mozilla-services/socorro/tree/main/socorro/signature
The lists for configuring the C signature generation class are here:
https://github.com/mozilla-services/socorro/tree/main/socorro/signature/siglists
How to make a signature generation change
Signature generation changes are typically self-service. Code reviews and deployments are handled by the Socorro maintainers, but we ask you to file a pull request on GitHub with the desired change.
To make a change to signature generation:
Write up a bug in the Socorro product in Bugzilla and include the following:
an explanation of the problem you want to solve
URLs of example crash reports that have the problem you’re trying to solve
Examples of signature generation change bugs:
If you’ve made changes to signature generation before or you’re confident in the change you’re making, you can make changes directly using the GitHub interface:
https://github.com/mozilla-services/socorro/tree/main/socorro/signature/siglists
If you want to test your changes or experiment with them, then you’ll need to set up a local development environment and make the changes with a GitHub pull request.
See Development for setting up a local development environment.
Read through the rest of this chapter which describes how signature generation works, what files are involved, and how to test changes.
How to review signature generation changes
This is done by the Socorro maintainers.
Make sure the PR has a corresponding bug in Bugzilla and references the bug in the commit summary.
This is important because signature generation is tricky and we need the historical data for what changes we made, for whom, why, and how it affected signature generation.
Verify there are no typos in the change.
We have a unit test that verifies there are no syntax errors in those files, but that (obviously) doesn’t cover typos.
Run the pull request changes through signature generation using the command line interface in your local dev environment. See Signature generation module.
Verify with the author that the changes occur as intended.
Merge the PR and verify the example crashes on -stage.
The easiest way to do that is to use Super Search and search for a signature. The most common change is an addition to the prefix list, in which case you want to search for the frame signature that was added, and verify that in recent signatures there is something following it.
If you don’t want to wait for new crash reports to arrive, you can find an existing one and send it to reprocessing. That can be done on the report/index page directly, or via the admin panel.
Note that after a signature change has been pushed to production, it may be useful to reprocess the affected signatures. We can help with this if the change author requests it.
Philosophy on signature generation
Signatures should be such that they group like crash reports together. Signatures that are too coarse or too fine are unhelpful.
We can make changes to signature generation and then reprocess affected crashes. We often do this to better analyze specific kinds of crashes, for example to break up a signature into smaller groups.
Sometimes, when focusing on a specific class of crashes, we tweak signature generation to highlight interesting things. Using siggen can make experimenting easier.
When you’re adding a symbol to a list so that signature generation will continue past a certain frame and you’re deciding between whether to add a symbol to the “prefix list” or the “irrelevant list”, use the following to help guide you:
If it’s a symbol that has platform variants and the symbol isn’t helpful in summarizing the cause of the crash, then put it in the irrelevant list.
If it’s a symbol that’s part of panic/error/crash handling code that kicks off after the cause of the crash to handle the crash, then put it in the irrelevant list.
Otherwise, put it in the prefix list.
If you have questions, please ask in the bug comments.
Signature generation module
This Python module covers crash signature generation.
command line interface
This module defines a command line interface for signature generation. Given a crash id, it pulls the raw and processed data from Socorro -prod, generates a signature using the code in this module, and then tells you the original signature and the newly generated one.
This can be used for testing signature generation changes, regression testing, and astounding your friends at parties.
You need to run this inside a Socorro environment. For example, you could run this in the processor Docker container. You can start such a container like this:
$ make shell
Once you’re in your Socorro environment, you can run signature generation. You can pass it crash ids via the command line as arguments:
$ socorro-cmd signature CRASHID [CRASHID...]
It can also take crash ids from stdin.
Examples:
getting crash ids from the file crashids.txt:
$ cat crashids.txt | socorro-cmd signature
getting crash ids from another command:
$ socorro-cmd fetch_crashids --num=10 | socorro-cmd signature
Note
fetch_crashids defaults to Firefox. If you want a different product, use the --product argument. See socorro-cmd fetch_crashids --help for options.
getting crash ids for crash reports with a specific signature and then checking to see if the signatures have changed:
$ socorro-cmd fetch_crashids --signature='js::NativeGetProperty' --num=5 | socorro-cmd signature
outputting results in CSV format, which makes it easier to analyze signature generation results for multiple crashes:
$ cat crashids.txt | socorro-cmd signature --format=csv
For more argument help, see:
$ socorro-cmd signature --help
library
This code is also available as a library that’s updated periodically by WillKG.
If you’re interested in using it, let us know.
Signatures Utilities Lists
This folder contains lists that are used to configure the C signature generation process. Each .txt file contains a list of signatures, or regular expressions matching signatures, that are used at various steps of our algorithm. Regular expressions use Python syntax.
Signature Generation Algorithm
When generating a C/Rust signature, 5 steps are involved.
We walk the crashing thread’s stack, looking for things that would match the Signature Sentinels. The first matching element, if any, becomes the top of the sub-stack we’ll consider going forward.
We walk the stack, ignoring everything that matches the Irrelevant Signatures. We consider the first non-matching element the top of the new sub-stack.
We rewrite DLL frame signatures to contain only the module name and merge consecutive ones.
We accumulate signatures that match the Prefix Signatures, until something doesn’t match.
We normalize each signature we accumulated. Signatures that match the Signatures With Line Numbers have their associated code line number added to them, like this: signature:42.
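Steps 3 and 5 are the least obvious ones. Here’s a minimal sketch of both in Python, not Socorro’s implementation: it assumes that DLL frames without symbols have signatures like module.dll@0x<offset>, and LINE_NUMBER_RE is a hypothetical one-entry stand-in for the Signatures With Line Numbers list.

```python
import re

# Hypothetical single-entry stand-in for signatures_with_line_numbers_re.txt.
LINE_NUMBER_RE = re.compile(r"js_Interpret")


def collapse_dll_frames(signatures):
    """Step 3 (sketch): reduce "module.dll@0x..." to "module.dll" and merge repeats.

    Assumes DLL frames without symbols look like "module.dll@0x<offset>".
    """
    result = []
    for sig in signatures:
        if "@0x" in sig and ".dll" in sig.lower():
            sig = sig.split("@", 1)[0]
            # Merge consecutive frames from the same DLL into a single entry.
            if result and result[-1] == sig:
                continue
        result.append(sig)
    return result


def add_line_number(function, line):
    """Step 5 (sketch): append ":<line>" when the function matches the line-number list."""
    if line is not None and LINE_NUMBER_RE.match(function):
        return f"{function}:{line}"
    return function


print(collapse_dll_frames(["KERNELBASE.dll@0x1a2b", "KERNELBASE.dll@0x3c4d", "BaseThreadInitThunk"]))
# → ['KERNELBASE.dll', 'BaseThreadInitThunk']
print(add_line_number("js_Interpret", 42))
# → js_Interpret:42
```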
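The generated signature is a concatenation of all the accumulated signatures, separated with " | " (a pipe sign surrounded by spaces).
The entries of each regular expression list are joined with a pipe sign (|) and compiled into a single regular expression. Signature generation then uses .match() to match frames against it.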
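Here’s a small sketch of what joining a list and using .match() means in practice. compile_siglist is a hypothetical helper, the entries are illustrative, and skipping blank lines and lines starting with # is an assumption about the file format rather than documented behavior.

```python
import re


def compile_siglist(lines):
    """Join the entries of one list into a single regular expression (sketch).

    Assumption: blank lines and lines starting with "#" are not entries.
    """
    entries = [line.strip() for line in lines]
    entries = [e for e in entries if e and not e.startswith("#")]
    return re.compile("|".join(entries))


# Hypothetical list contents, as they might appear in a .txt file.
prefix_re = compile_siglist([
    "# prefix list",
    "JSAutoCompartment::JSAutoCompartment.*",
    "RtlpWaitOnCriticalSection",
    "",
])

# .match() is anchored at the start of the frame, but not at the end.
print(bool(prefix_re.match("RtlpWaitOnCriticalSection")))        # True
print(bool(prefix_re.match("RtlpWaitOnCriticalSectionEx")))      # True: no end anchor
print(bool(prefix_re.match("ntdll!RtlpWaitOnCriticalSection")))  # False: not at the start
```

The second case is worth remembering: an entry also matches any frame that merely starts with it, so add an explicit $ when you need an exact match.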
Because of that, when changing these lists, make sure you keep the following things in mind:
Make sure you’re using valid regular expression syntax and escape special characters like (, ), ., and $.
There’s no need to add a trailing .* since signature generation uses .match(), which will match from the beginning of the string.
Try to keep it roughly in alphabetical order so as to make it easier to skim through later.
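These checks are easy to automate when editing a list. The following lint_siglist function is a hypothetical helper, not the unit test Socorro runs: it flags entries that don’t compile, entries with a redundant trailing .*, and entries that are out of (case-insensitive) alphabetical order. It also shows re.escape(), which escapes special characters in a literal frame name; the frame names are made up.

```python
import re


def lint_siglist(entries):
    """Return human-readable problems found in one signature list (sketch)."""
    problems = []
    previous = None
    for number, entry in enumerate(entries, start=1):
        try:
            re.compile(entry)
        except re.error as exc:
            problems.append(f"entry {number}: invalid regex {entry!r}: {exc}")
        if entry.endswith(".*"):
            problems.append(f"entry {number}: trailing .* is redundant with .match()")
        if previous is not None and entry.casefold() < previous.casefold():
            problems.append(f"entry {number}: {entry!r} is out of alphabetical order")
        previous = entry
    return problems


# re.escape() turns a literal (made-up) frame name into a safe pattern.
literal = re.escape("js::Foo$Bar()")
print(literal)  # js::Foo\$Bar\(\)

for problem in lint_siglist(["abort", "CrashReporter::.*", "MOZ_CRASH(", literal]):
    print(problem)
```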
Signature Sentinels
File: signature_sentinels.txt
Signature Sentinels are signatures (not regular expressions) that should be used as the top of the stack if present. Everything before the first matching signature will be ignored.
The code iterates through the stack frames, throwing away everything it finds until it encounters a frame that matches an entry in this list or reaches the end of the stack. If it finds a match, it passes the matching frame and all the frames after it to the next step. If it finds no match, it passes the whole list of frames to the next step.
A typical line might be _purecall.
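A minimal sketch of the sentinel step as described above. apply_sentinels and the example frames are hypothetical; the sketch compares frames to sentinels as exact strings and keeps the matching frame as the new top of the stack.

```python
def apply_sentinels(frames, sentinels):
    """Drop every frame above the first sentinel; keep all frames if none match."""
    sentinel_set = set(sentinels)
    for index, frame in enumerate(frames):
        if frame in sentinel_set:
            # The sentinel itself becomes the new top of the stack.
            return frames[index:]
    return list(frames)


stack = ["KiFastSystemCallRet", "_purecall", "nsFoo::Bar", "main"]
print(apply_sentinels(stack, ["_purecall"]))
# → ['_purecall', 'nsFoo::Bar', 'main']
print(apply_sentinels(stack, ["_not_present"]))
# → ['KiFastSystemCallRet', '_purecall', 'nsFoo::Bar', 'main']
```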
Prefix Signatures
File: prefix_signature_re.txt
Prefix Signatures are regular expressions of signatures that will be combined with the following frame’s signature. Signature generation stops at the first non-matching signature it finds.
A typical rule might be JSAutoCompartment::JSAutoCompartment.*.
Note
These are regular expressions. Dollar signs and other regexp characters need to be escaped with a \.
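A sketch of prefix accumulation on its own, using the example rule above. accumulate_prefixes and the frames other than the JSAutoCompartment one are hypothetical, and the irrelevant list is left out to keep it short: prefix frames are kept, and the first frame that isn’t a prefix is added and ends the signature.

```python
import re

# The example rule from this section; a real list has many entries.
PREFIX_RE = re.compile(r"JSAutoCompartment::JSAutoCompartment.*")


def accumulate_prefixes(frames):
    """Collect prefix frames plus the first frame that isn't a prefix (sketch)."""
    signature = []
    for frame in frames:
        signature.append(frame)
        if not PREFIX_RE.match(frame):
            # A non-prefix frame is combined with the prefixes and ends the walk.
            break
    return " | ".join(signature)


frames = [
    "JSAutoCompartment::JSAutoCompartment",
    "nsGlobalWindow::RunTimeout",
    "nsTimerImpl::Fire",
]
print(accumulate_prefixes(frames))
# → JSAutoCompartment::JSAutoCompartment | nsGlobalWindow::RunTimeout
```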
Irrelevant Signatures
File: irrelevant_signature_re.txt
Irrelevant Signatures are regular expressions of signatures that will be ignored while going through the stack. Anything that matches this list will not be added to the overall signature.
Add symbols to this list that:
have platform variants that prevent crash signatures from being the same across platforms
are involved in panic, error, or crash handling code that happens after the actual crash
A typical rule might be (Nt|Zw)?WaitForSingleObject(Ex)?.
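A quick check of how the example rule covers several variants with one entry, and how .match() anchors at the start of the frame. The frame names and the drop_irrelevant helper are illustrative; this simplified filter processes a whole list at once, whereas the steps above skip irrelevant frames while walking the stack.

```python
import re

IRRELEVANT_RE = re.compile(r"(Nt|Zw)?WaitForSingleObject(Ex)?")


def drop_irrelevant(frames):
    """Remove frames that match the irrelevant regex (simplified sketch)."""
    return [frame for frame in frames if not IRRELEVANT_RE.match(frame)]


for frame in [
    "WaitForSingleObject",
    "WaitForSingleObjectEx",
    "NtWaitForSingleObject",
    "ZwWaitForSingleObject",
    "RtlWaitForSingleObject",  # no match: .match() starts at the beginning
    "mozilla::Monitor::Wait",
]:
    print(frame, bool(IRRELEVANT_RE.match(frame)))
```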
Signatures With Line Numbers
File: signatures_with_line_numbers_re.txt
Signatures With Line Numbers are regular expressions of signatures that will be combined with their associated source code line numbers.
Signature generation ruleset
This is the signature generation ruleset defined at socorro.signature.generator.DEFAULT_RULESET:
Rule: SignatureGenerationRule
Generates a signature based on stack frames.
For Java crashes, this generates a basic signature using stack frames.
For C/C++/Rust crashes, this generates a more robust signature using normalized versions of stack frames augmented by the contents of the signature lists.
Rough signature list rules (there are more details in the siglists README):
Walk the frames looking for a “signature sentinel” which becomes the first item in the signature.
Continue walking frames.
If the frame is in the “irrelevant” list, ignore it and continue.
If the frame is in the “prefix” list, add it to the signature and continue.
If the frame isn’t in either list, stop walking frames.
The signature is generated by joining those frames with “ | ” between them.
If it’s a C/C++/Rust signature, this rule also adds the following to result.extra:
normalized_frames: the list of normalized frames
proto_signature: a " | " delimited string of the normalized frames
param signature_list_dir: path to the directory with the signature lists to use, or None if you want to use the included ones
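Putting the rough rules together, here’s a simplified sketch of the walk and of the two result.extra values. It’s not Socorro’s code: it takes already-normalized frame strings, uses tiny hypothetical lists, skips the DLL and line-number steps, and returns a plain dict instead of a result object.

```python
import re

# Hypothetical, tiny stand-ins for the signature lists.
SENTINELS = {"_purecall"}
IRRELEVANT_RE = re.compile(r"(Nt|Zw)?WaitForSingleObject(Ex)?|KiFastSystemCallRet")
PREFIX_RE = re.compile(r"mozalloc_abort|moz_xmalloc")


def generate(normalized_frames):
    """Return a signature plus the extra fields described above (sketch)."""
    frames = list(normalized_frames)
    # Start at the first sentinel, if there is one.
    for index, frame in enumerate(frames):
        if frame in SENTINELS:
            frames = frames[index:]
            break
    # Skip irrelevant frames, keep prefixes, stop after the first other frame.
    parts = []
    for frame in frames:
        if IRRELEVANT_RE.match(frame):
            continue
        parts.append(frame)
        if not PREFIX_RE.match(frame):
            break
    return {
        "signature": " | ".join(parts),
        "normalized_frames": list(normalized_frames),
        "proto_signature": " | ".join(normalized_frames),
    }


result = generate(["KiFastSystemCallRet", "moz_xmalloc", "NtWaitForSingleObject", "nsFoo::Bar", "main"])
print(result["signature"])  # → moz_xmalloc | nsFoo::Bar
```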
Rule: StackwalkerErrorSignatureRule
Appends minidump-stackwalker error to signature.
Rule: BadHardware
Prepends bad hardware to signatures that are from bad hardware.
See bug #1733904.
Rule: OOMSignature
Prepends OOM | <size> to signatures for OOM crashes.
See bug #1007530.
Rule: AbortSignature
Prepends abort message to signature.
See bug #803779.
Rule: SignatureShutdownTimeout
Replaces signature with async_shutdown_timeout message.
This supports AsyncShutdownTimeout annotation values with the following structure:
{ "phase": <str>, "conditions": [ { "name": <str>, ... } ] }
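A sketch of reading an annotation with that structure using the standard json module. shutdown_timeout_signature is hypothetical, and the AsyncShutdownTimeout | <phase> | <condition names> layout of the replacement signature is an assumption for illustration, not necessarily the exact format the rule produces.

```python
import json


def shutdown_timeout_signature(annotation):
    """Build a replacement signature from an AsyncShutdownTimeout value (sketch).

    The "AsyncShutdownTimeout | phase | names" layout is an assumption.
    """
    parts = ["AsyncShutdownTimeout"]
    try:
        data = json.loads(annotation)
        parts.append(data["phase"])
        names = sorted(condition["name"] for condition in data["conditions"])
        parts.append(",".join(names) if names else "(none)")
    except (ValueError, KeyError, TypeError):
        # Malformed annotation: keep a recognizable placeholder.
        parts.append("UNKNOWN")
    return " | ".join(parts)


value = json.dumps({
    "phase": "profile-before-change",
    "conditions": [{"name": "StorageB"}, {"name": "StorageA"}],
})
print(shutdown_timeout_signature(value))
# → AsyncShutdownTimeout | profile-before-change | StorageA,StorageB
print(shutdown_timeout_signature("not json"))
# → AsyncShutdownTimeout | UNKNOWN
```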
Rule: SignatureRunWatchDog
Prepends shutdownhang to signature for shutdown hang crashes.
Rule: SignatureIPCChannelError
Stomps on signature with shutdownkill signature.
Either IPCError-browser | ShutDownKill or IPCError-content | ShutDownKill.
Rule: SignatureIPCMessageName
Appends ipc_message_name value to signature.
Rule: StackOverflowSignature
Prepends stackoverflow to signature.
See bug #1796389.
Rule: HungProcess
Prepends hang: <hang_type> to the signature of hangs.
See bug #1826703.
Rule: SigFixWhitespace
Fix whitespace in signatures.
This does the following:
trims leading and trailing whitespace
converts all non-space whitespace characters to space
reduces consecutive spaces to a single space
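The three steps map directly onto a couple of string operations. fix_whitespace is a hypothetical sketch of them, not the rule’s actual code.

```python
import re


def fix_whitespace(signature):
    """Apply the three whitespace fixes listed above (sketch)."""
    # Trim leading and trailing whitespace.
    signature = signature.strip()
    # Convert every whitespace character (tabs, newlines, ...) to a plain space.
    signature = re.sub(r"\s", " ", signature)
    # Collapse runs of spaces into one.
    return re.sub(r" {2,}", " ", signature)


print(repr(fix_whitespace("  OOM |\tsmall\n|  js::Foo  ")))
# → 'OOM | small | js::Foo'
```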
Rule: SigTruncate
Truncates signatures down to SIGNATURE_MAX_LENGTH characters.
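The rules run in order, and each one can adjust the signature left by the ones before it. The sketch below models that with a hypothetical rule shape (a predicate and an action); the rule names, the toy frames rule, the hang prefix format, and MAX_LENGTH = 255 are assumptions for illustration, not Socorro’s API or its SIGNATURE_MAX_LENGTH value.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_LENGTH = 255  # hypothetical stand-in for SIGNATURE_MAX_LENGTH


@dataclass
class SketchRule:
    """Hypothetical rule shape: run action() only when predicate() holds."""

    name: str
    predicate: Callable[[dict, str], bool]
    action: Callable[[dict, str], str]


def run_ruleset(rules: List[SketchRule], crash: dict) -> str:
    """Apply each rule in order, passing the current signature along."""
    signature = ""
    for rule in rules:
        if rule.predicate(crash, signature):
            signature = rule.action(crash, signature)
    return signature


RULES = [
    # Toy stand-in for SignatureGenerationRule: join the first two frames.
    SketchRule("frames", lambda c, s: True, lambda c, s: " | ".join(c["frames"][:2])),
    # Toy stand-in for HungProcess; the exact prefix format is assumed.
    SketchRule("hang", lambda c, s: "hang_type" in c, lambda c, s: f"hang: {c['hang_type']} | {s}"),
    # Stand-in for SigTruncate, kept last so it sees the final signature.
    SketchRule("truncate", lambda c, s: len(s) > MAX_LENGTH, lambda c, s: s[:MAX_LENGTH]),
]

print(run_ruleset(RULES, {"frames": ["js::Foo", "js::Bar", "main"], "hang_type": "chrome"}))
# → hang: chrome | js::Foo | js::Bar
```

Keeping truncation as the last rule means every earlier rule can prepend or append freely without pushing the final signature past the limit.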