7. Signature Generation

7.1. Introduction

During processing of a crash, Socorro creates a signature using the signature generation module. Signature generation typically starts with a string based on the stack of the crashing thread. Various rules are then applied, and the result is the Socorro crash signature.

The signature generation code is here:

https://github.com/mozilla-services/socorro/tree/master/socorro/signature

7.2. Signature generation module

This Python module covers crash signature generation.

7.2.1. command line interface

This module defines a command line interface for signature generation. Given a crash id, it pulls the raw and processed data from Socorro -prod, generates a signature using the code in this module, and then tells you the original signature and the newly generated one.

This can be used for testing signature generation changes, regression testing, and astounding your friends at parties.

To use:

$ python -m socorro.signature CRASHID [CRASHID ...]

Pulling crash ids from the file crashids.txt:

$ cat crashids.txt | python -m socorro.signature

Pulling crash ids from another script:

$ ./scripts/fetch_crashids.py --num=10 | python -m socorro.signature

Writing output in CSV format, which makes it easier to analyze the results when generating signatures for multiple crashes:

$ cat crashids.txt | python -m socorro.signature --format=csv

For more argument help, see:

$ python -m socorro.signature --help

Note

You need to run this inside a Socorro environment. For example, you could do this:

$ docker-compose run processor bash
app@.../app$ python -m socorro.signature --help

7.2.2. library

This code can sort of be used as a library. It’s been decoupled from many of Socorro’s bits, but still has some requirements. Roughly, it requires:

  • requests
  • socorro.siglists
  • socorro.lib.treelib
  • ujson

The main class is socorro.signature.SignatureGenerator. It takes a pipeline of rules to use to generate signatures.

Rough usage:

from socorro.signature import SignatureGenerator

generator = SignatureGenerator()

raw_crash = get_raw_crash_from_somewhere()
processed_crash = get_processed_crash_from_somewhere()

ret = generator.generate(raw_crash, processed_crash)
print(ret['signature'])

Note

If you’re interested in using this library, write up a bug and let us know the use case and we’ll work with you to make it more library-friendly to meet your needs.

7.3. Signatures Utilities Lists

This folder contains lists that are used to configure the C signature generation process. Each .txt file contains a list of signatures, or of regular expressions matching signatures, that is used at various steps of our algorithm. Regular expressions use Python syntax.

7.3.1. Signature Generation Algorithm

When generating a C signature, five steps are involved:

  1. We walk the crashing thread’s stack, looking for things that would match the Signature Sentinels. The first matching element, if any, becomes the top of the sub-stack we’ll consider going forward.
  2. We walk the stack, ignoring everything that matches the Irrelevant Signatures. We consider the first non-matching element the top of the new sub-stack.
  3. We rewrite every signature missing symbols that matches the Trim DLL Signatures to be the module only (the part before the first @ sign). We also merge them so only one of those frames makes it to the final signature.
  4. We accumulate signatures that match the Prefix Signatures, until something doesn’t match.
  5. We normalize each signature we accumulated. Signatures that match the Signatures With Line Numbers have their associated code line number added to them, like this: signature:42.

The generated signature is a concatenation of all the accumulated signatures, separated with a pipe sign (|).
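The five steps above can be sketched roughly as follows. This is a simplified illustration, not Socorro's actual implementation: the function names, the frame dictionaries, the sample rules, and the choice of whole-string matching with re.fullmatch are all assumptions made for the example.

```python
import re


def compile_rules(patterns):
    """Combine one regular expression per line into a single compiled pattern."""
    return re.compile("|".join(f"(?:{p})" for p in patterns))


def generate_c_signature(frames, sentinels, irrelevant, trim_dll, prefix, with_line):
    """Build a signature from frames listed top of stack first.

    Each frame is a dict with a "signature" key and an optional "line" key
    (a hypothetical shape chosen for this sketch).
    """
    # Step 1: if a sentinel is present, it becomes the top of the stack.
    for index, frame in enumerate(frames):
        if frame["signature"] in sentinels:
            frames = frames[index:]
            break

    accumulated = []
    for frame in frames:
        name = frame["signature"]

        # Step 2: skip irrelevant frames entirely.
        if irrelevant.fullmatch(name):
            continue

        # Step 3: keep only the module name, and merge consecutive duplicates.
        if trim_dll.fullmatch(name):
            name = name.split("@")[0]
            if accumulated and accumulated[-1] == name:
                continue

        # Step 5: append the source line number where the rules ask for it.
        entry = name
        if with_line.fullmatch(name) and frame.get("line") is not None:
            entry = f"{name}:{frame['line']}"
        accumulated.append(entry)

        # Step 4: keep walking only while the frame is a prefix signature.
        if not prefix.fullmatch(name):
            break

    return " | ".join(accumulated)


# Hypothetical stack and rules, for illustration only.
stack = [
    {"signature": "RaiseException"},
    {"signature": "_purecall"},
    {"signature": "NtWaitForSingleObject"},
    {"signature": "foo32.dll@0x2131"},
    {"signature": "foo32.dll@0x1943"},
    {"signature": "js::Foo", "line": 42},
    {"signature": "main"},
]
signature = generate_c_signature(
    stack,
    sentinels={"_purecall"},
    irrelevant=compile_rules([r"(Nt|Zw)?WaitForSingleObject(Ex)?"]),
    trim_dll=compile_rules([r"foo32\.dll.*"]),
    prefix=compile_rules([r"_purecall", r"foo32\.dll.*"]),
    with_line=compile_rules([r"js::Foo"]),
)
print(signature)  # _purecall | foo32.dll | js::Foo:42
```

In this sketch the walk stops at js::Foo because it is the first accumulated frame that is not a prefix signature, so main never reaches the signature.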

7.3.1.1. Signature Sentinels

File: signature_sentinels.txt

Signature Sentinels are signatures (not regular expressions) that should be used as the top of the stack if present. Everything before the first matching signature will be ignored.

The code iterates through the stack frames, throwing away everything it finds until it encounters a frame that matches an entry in this list or reaches the end of the stack. If it finds a match, it passes the matching frame and all the frames after it to the next step. If it finds no match, it passes the whole list of frames to the next step.

A typical line might be _purecall.

7.3.1.2. Irrelevant Signatures

File: irrelevant_signature_re.txt

Irrelevant Signatures are regular expressions of signatures that will be ignored while going through the stack. Anything that matches this list will not be added to the overall signature.

A typical rule might be (Nt|Zw)?WaitForSingleObject(Ex)?.
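To see which frames such a rule covers, here is a small check using Python's re module. The frame names are made up for the example, and matching the whole name with fullmatch is an assumption of this sketch rather than a statement about how Socorro applies the list.

```python
import re

# The example rule from this section.
rule = re.compile(r"(Nt|Zw)?WaitForSingleObject(Ex)?")

# Hypothetical frame signatures.
names = [
    "WaitForSingleObject",
    "NtWaitForSingleObject",
    "ZwWaitForSingleObjectEx",
    "WaitForMultipleObjects",
]

# Frames the rule would cause to be ignored.
ignored = [name for name in names if rule.fullmatch(name)]
print(ignored)
```

The optional groups let one rule cover the plain, Nt- or Zw-prefixed, and Ex-suffixed variants, while WaitForMultipleObjects is left alone.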

7.3.1.3. Prefix Signatures

File: prefix_signature_re.txt

Prefix Signatures are regular expressions of signatures that will be combined with the following frame’s signature. Signature generation stops at the first non-matching signature it finds.

A typical rule might be JSAutoCompartment::JSAutoCompartment.*.

Note: These are regular expressions. Dollar signs and other regexp characters need to be escaped with a \.
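The effect of a missing escape can be checked quickly in Python before adding a rule. The frame name below is hypothetical. re.escape is a standard library helper that escapes a literal string for you.

```python
import re

# Hypothetical frame signature containing a dollar sign.
frame = "nsFoo$Bar::Run(int)"

# Unescaped, "$" means "end of string", so the rule never matches this frame.
unescaped = re.match(r"nsFoo$Bar::Run.*", frame)

# Escaped with a backslash, "$" is treated as a literal character.
escaped = re.match(r"nsFoo\$Bar::Run.*", frame)

# re.escape produces an escaped literal that can be extended with ".*".
generated_rule = re.escape("nsFoo$Bar::Run") + ".*"
generated = re.match(generated_rule, frame)

print(unescaped is None, escaped is not None, generated is not None)
```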

7.3.1.4. Trim DLL Signatures

File: trim_dll_signature_re.txt

Trim DLL Signatures are regular expressions of signatures that will be trimmed down to only their module name. For example, if the list contains foo32\.dll.* and a stack trace looks like this:

0x0
foo32.dll@0x2131
foo32.dll@0x1943
myFavoriteSig()

The generated signature will be: 0x0 | foo32.dll | myFavoriteSig().

7.3.1.5. Signatures With Line Numbers

File: signatures_with_line_numbers_re.txt

Signatures With Line Numbers are regular expressions of signatures that will be combined with their associated source code line numbers.

7.3.2. How to edit these lists

The first thing we will ask you to do is to file a bug. We keep track of every change in Socorro via bugs, so it’s important that each commit has one associated with it. File a bug in the Socorro::General component, describe the changes you want to make, and assign it to yourself.

Then proceed to making those changes...

7.3.2.1. Using the command line

If you are a git power user, you probably don’t need us to explain how to do this! :)

If you are not, you’re probably better off using GitHub’s interface. Read on!

7.3.2.2. Using GitHub’s interface

First, you need to be logged in to GitHub. Open the file you want to edit, and then click the little pen in the top right corner of the page, the one that says Fork this project and edit the file, or Edit the file in your fork of this project if you already have a fork of it.

That will take you to an editor, where you can write any changes you want. Once you are done editing the file, enter a commit description. We have some conventions, and a bot that will automatically close bugs, so please write your commit message following this pattern: Fixes bug XYZ - Description of the change. Once you are ready, click Propose file change.

That will create a branch in your fork of the socorro project, and take you to the commit you just created. You can verify that the changes you made are correct, and then click Create pull request, and then Create pull request again. Once the pull request is opened, Travis CI will automatically start running our test suite, which includes sanity checks for those signature lists. You can see the status of those tests in the pull request, and click the Details link to see logs in case of a failure.

That’s it! You have proposed a change, and we have been notified about it. Someone from the Socorro team will review your changes and merge them if they are appropriate. Thank you for contributing to Socorro!

7.3.3. Watching only the siglists folder

If you are interested in watching what’s changing in the siglists directory in the repository, but don’t care much about what happens in the rest of the Socorro repo, you can easily set a filter in your email client to do that. Here’s an example filter you can use today:

to:(socorro@noreply.github.com) ("A socorro/siglists/" OR "M socorro/siglists/" OR "D socorro/siglists/")

7.3.4. How to review a siglist change

The first step is to verify that there is no typo in the change (usually, the bug contains examples of crash reports that should be impacted, look at their frames). Note that we have a unit test that verifies there are no syntax errors in those files.

Run the pull request changes through signature generation using the command line interface in your local dev environment.

Verify with the author that the changes occur as intended.

Then merge it and verify the example crashes on -stage. The easiest way to do that is to use Super Search and search for a signature. The most common change is an addition to the prefix list, in which case you want to search for the frame signature that was added, and verify that in recent signatures there is something following it.

If you don’t want to wait for new crash reports to arrive, you can find an existing one and send it to reprocessing. That can be done on the report/index page directly, or via the admin panel.

Note that after a signature change has been pushed to production, you might want to reprocess the affected signatures.