Development¶
This chapter covers getting started with Socorro using Docker for a local development environment.
Setup quickstart¶
1. Install required software: Docker, just, and git.
Linux:
Use your package manager.
OSX:
Install Docker for Mac, which installs Docker.
Use homebrew to install just and git:
$ brew install just git
Other:
Install Docker, just, and git by whatever means works for your platform.
2. Clone the repository so you have a copy on your host machine.
Instructions for cloning are on the Socorro page in GitHub.
3. (Optional for Linux users) Set UID and GID for the Docker container user.
If you’re on Linux or you want to set the UID/GID of the app user that runs in the Docker containers, run:
$ just _env
Then edit the generated .env file and set the USE_UID and USE_GID variables. These get used when creating the app user in the base image.
If you ever want different values, change them in .env and re-run just build.
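For example, a .env on Linux typically sets these to match your host user so that files created inside the container are owned by you. Check your values with id -u and id -g; 1000 below is an assumed common default, not something Socorro requires:
USE_UID=1000
USE_GID=1000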
4. Build Docker images for Socorro services.
From the root of this repository, run:
$ just build
That will build the app Docker image required for development.
5. Initialize Postgres, Elasticsearch, Pub/Sub, S3, and SQS.
To do that, run:
$ just setup
This creates the Postgres database and sets up tables, stored procedures, integrity rules, types, and a bunch of other things. It also adds a bunch of static data to lookup tables.
For Elasticsearch, it sets up Super Search fields and the index for processed crash data.
For S3, this creates the required buckets.
For Pub/Sub, this creates the required topics and subscriptions.
For SQS, this creates queues.
6. Populate data stores with required data.
Next, you need to fetch product build data and normalization data that Socorro relies on. This data comes from external systems and changes day-to-day.
To do that, run:
$ just update-data
At this point, you should have a basic functional Socorro development environment that has no crash data in it.
Note
You can run just setup and just update-data any time you want to throw out all state and re-initialize services.
See also
- Make changes to signature generation!
If you need to make changes to signature generation, see Signature Generation.
- Run the processor and get some crash data!
If you need crash data, see Processor for additional setup, fetching crash data, and running the processor.
- Update your local development environment!
See Updating data in a dev environment for how to maintain and update your local development environment.
- Learn about configuration!
See Configuration for how configuration works and about my.env.
- Run the webapp!
See Crash Stats Webapp for additional setup and running the webapp.
- Run scheduled tasks!
See Crontabber for additional setup and running cronrun.
Bugs / Issues¶
We use Bugzilla for bug tracking.
If you want to do work for which there is no bug, please write up a bug first so we can work out the problem and how to approach a solution.
Code workflow¶
Bugs¶
Either write up a bug or find a bug to work on.
Assign the bug to yourself.
Work out any questions about the problem, the approach to fix it, and any additional details by posting comments in the bug.
Pull requests¶
Pull request summary should indicate the bug the pull request addresses. Use a hyphen between “bug” and the bug ID(s). For example:
bug-nnnnnnn: removed frog from tree class
For multiple bugs fixed within a single pull request, list the bugs out individually. For example:
bug-nnnnnnn, bug-nnnnnnn: removed frog from tree class
Pull request descriptions should cover at least some of the following:
- what is the issue the pull request is addressing?
- why does this pull request fix the issue?
- how should a reviewer review the pull request?
- what did you do to test the changes?
- any steps-to-reproduce for the reviewer to use to test the changes
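For example, a hypothetical pull request description (the bug number and details are made up) might look like this:
bug-1234567: removed frog from tree class
The tree class instantiated a frog on init, which leaked memory on every render.
To review: verify that the tree class no longer references the frog.
To test: run just test and load a tree page in the webapp.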
After creating a pull request, attach the pull request to the relevant bugs.
We use the rob-bugson Firefox add-on. If the pull request has “bug-nnnnnnn: …” or “bug-nnnnnnn, bug-nnnnnnn: …” in the summary, then rob-bugson will see that and create an “Attach this PR to bug …” link.
Then ask someone to review the pull request. If you don’t know who to ask, look at other pull requests to see who’s currently reviewing things.
Code reviews¶
Pull requests should be reviewed before merging.
Style nits should be covered by linting as much as possible.
Code reviews should review the changes in the context of the rest of the system.
Landing code¶
Once the code has been reviewed and all tasks in CI pass, the pull request author should merge the code.
This makes it easier for the author to coordinate landing the changes with other things that need to happen like landing changes in another repository, data migrations, configuration changes, and so on.
We use “Rebase and merge” in GitHub.
Conventions¶
For conventions, see: https://github.com/mozilla-services/socorro/blob/main/.editorconfig
Python code conventions¶
All Python code files should have an MPL v2 header at the top:
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
To lint the code:
$ just lint
If you hit issues, use # noqa.
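For example, to keep a line the linter complains about (a hypothetical case; F401 is the check code for an unused import):
# hypothetical example: keep an unused import without tripping the linter
import os  # noqa: F401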
To run the reformatter:
$ just lint --fix
We’re using:
ruff: linting and code formatting
HTML conventions¶
2-space indentation.
JavaScript code conventions¶
2-space indentation.
We’re using:
eslint: linting
Git conventions¶
First line is a summary of the commit. It should start with the bug number. Use a hyphen between “bug” and the bug ID(s). For example:
bug-nnnnnnn: summary
For multiple bugs fixed within a single commit, list the bugs out individually. For example:
bug-nnnnnnn, bug-nnnnnnn: summary
After that, the commit should explain why the changes are being made and any notes that future readers should know for context.
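For example, a hypothetical commit message (the bug numbers and details are made up):
bug-1234567: removed frog from tree class

The frog was added in bug-1111111 to work around a rendering issue that
no longer exists, and keeping it around slowed down tree builds.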
Migrations¶
Database migrations (Django)¶
We use Django’s ORM and thus we do database migrations using Django’s migration system.
Do this:
$ just shell
app@socorro:/app$ cd webapp
app@socorro:/app/webapp$ ./manage.py makemigrations --name "BUGID_desc" APP
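For example, a hypothetical migration for the crashstats app (the bug number and migration name are made up):
app@socorro:/app/webapp$ ./manage.py makemigrations --name "1234567_add_crash_field" crashstats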
Elasticsearch migrations (Elasticsearch)¶
We don’t do migrations of Elasticsearch data. The system creates a new index every week, so any new fields or mapping changes will be reflected the next time it creates an index.
Dependencies¶
Python Dependencies¶
Python dependencies for all parts of Socorro are in requirements.in. They are compiled using pip-compile into the requirements.txt file, which includes hashes and the dependencies of dependencies.
For example, to add foobar version 5:
1. Add foobar==5 to requirements.in.
2. Run the following to apply the updates to requirements.txt:
$ just rebuild-reqs
3. Rebuild your Docker environment:
$ just build
If there are problems, it’ll tell you.
In some cases, you might want to update the primary and all the secondary dependencies. To do this, run:
$ just rebuild-reqs --update
JavaScript Dependencies¶
Frontend dependencies for the webapp are in webapp/package.json. They must be pinned and included in package-lock.json.
You can add new dependencies using npm (you must use version 5 or higher):
$ npm install --save-exact foobar@1.0.0
Then rebuild your Docker environment:
$ just build
If there are problems, it’ll tell you.
Documentation¶
Documentation for Socorro is built with Sphinx and is available on ReadTheDocs. API documentation is automatically extracted from docstrings in the code.
To build the docs, run this:
$ just docs
Testing¶
Running tests¶
The Socorro tests are in socorro/tests/.
The webapp tests are in webapp/.
Both sets of tests use pytest.
To run the tests, do:
$ just test
That runs the /app/bin/test.sh script in the test container using test configuration.
To run specific tests or specify arguments, you’ll want to start a shell in the test container:
$ just test-shell
Then you can run pytest on the Socorro tests or the webapp tests.
Running the Socorro tests:
app@socorro:/app$ pytest
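You can also pass pytest a specific file or directory. For example, using a test module that exists in this repository:
app@socorro:/app$ pytest socorro/tests/processor/test_processor_app.py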
Running the webapp tests (make sure you run ./manage.py collectstatic first):
app@socorro:/app$ cd webapp
app@socorro:/app/webapp$ ./manage.py collectstatic
app@socorro:/app/webapp$ pytest
Note
For the webapp tests, you have to run ./manage.py collectstatic before running the tests.
Writing tests¶
For Socorro tests, put them in socorro/tests/ in a subdirectory parallel to the thing you’re testing.
For webapp tests, put them in the tests/ directory of the appropriate app in the webapp/ directory tree.
Repository structure¶
If you clone our git repository, you will find the following folders.
Here is what each of them contains:
- bin/
Scripts for building Docker images, running Docker containers, deploying, and supporting development in a local development environment.
- docker/
Docker environment related scripts, configuration, and other bits.
- docs/
Documentation of the Socorro project (you’re reading it right now).
- socorro/
The bulk of the Socorro source code.
- webapp/
The webapp source code.
Updating data in a dev environment¶
Updating the code¶
Any time you want to update the code in the repository, run something like this from the main branch:
$ git pull
After you do that, you’ll need to update other things.
If there were changes to the requirements files or setup scripts, you’ll need to build new images:
$ just build
If there were changes to the database tables, stored procedures, types, migrations, Super Search schema, or anything like that, you’ll need to wipe state and re-initialize services:
$ just setup
$ just update-data
Wiping crash storage and state¶
Any time you want to wipe all the crash storage destinations, remove all the data, and reset the state of the system, run:
$ just setup
$ just update-data
Updating release data¶
Release data comes from running archivescraper. This is used by the BetaVersionRule in the processor.
Run:
$ just update-data
Configuration¶
Configuration is pulled from three sources:
1. Environment variables.
2. ENV files located in /app/docker/config/. See docker-compose.yml for which ENV files are used in which containers, and their precedence.
3. Defaults: for the processor, these are in socorro/processor/processor_app.py in CONFIG_DEFAULTS; for the webapp, they are in webapp/crashstats/settings/.
The sources above are ordered by precedence, i.e. configuration values defined by environment variables will override values from ENV files or defaults.
The following ENV files can be found in /app/docker/config/:
local_dev.env
This holds secrets and environment-specific configuration required to get services to work in a Docker-based local development environment.
This should NOT be used for server environments, but you could base configuration for a server environment on this file.
test.env
This holds configuration specific to running the tests. It has some configuration value overrides because the tests are “interesting”.
This ENV file is found in the repository root:
my.env
This file lets you override any environment variables set in other ENV files as well as set variables that are specific to your instance.
It is your personal file for your specific development environment; it doesn’t get checked into version control.
The template for this is in docker/config/my.env.dist.
In this way:
- environmental configuration, which covers secrets, hosts, ports, and infrastructure-specific things, can be set up for every environment
- behavioral configuration, which covers how the code behaves and which classes it uses, is versioned alongside the code, making it easy to deploy and revert behavioral changes along with the code that depends on them
- my.env lets you set configuration specific to your development environment as well as override any configuration, and is not checked into version control
Setting configuration specific to your local dev environment¶
There are some variables you need to set that are specific to your local dev environment. Put them in my.env.
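For example, a my.env might contain something like this (the token value is a placeholder; see Processing crashes below):
# personal overrides for this dev environment; not checked into version control
SOCORRO_API_TOKEN=apitokenhere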
Overriding configuration¶
If you want to override configuration temporarily for your local development environment, put it in my.env.
Setting up a development container for VS Code¶
The repository contains configuration files to build a development container in the .devcontainer directory. If you have the “Dev Containers” extension installed in VS Code, you should be prompted whether you want to reopen the folder in a container on startup. You can also use the “Dev Containers: Reopen in Container” command from the command palette.
The container has all Python requirements installed. IntelliSense, type checking, code formatting with Ruff, and running the tests from the test browser are all set up to work without further configuration.
VS Code should automatically start the container, but it may need to be built on first run:
$ just build devcontainer
Additionally, on macOS, running git from inside any container that mounts the current directory to /app (such as the development container) may fail with “fatal: detected dubious ownership in repository at ‘/app’”. This is likely related to mozilla-services/tecken#2872. It can be fixed by running the following command from inside the development container; it will probably report errors on some read-only git objects that are already owned by app:app, which is fine:
$ chown -R app:app /app
If you change settings in my.env, you may need to restart the container to pick up the changes:
$ just run devcontainer
Upgrading to a new Python version¶
To upgrade Python to a new minor or major version, you need to change the version in these files:
.devcontainer/Dockerfile
.github/dependabot.yml
.readthedocs.yaml
docker/Dockerfile
docker/Dockerfile.fakesentry
docker/set_up_legacy_es_requirements.sh
pyproject.toml
socorro/tests/processor/test_processor_app.py
webapp/crashstats/crashstats/tests/test_sentry.py
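One way to double-check that you caught them all is to grep those files for the old version string. A sketch, assuming an upgrade from 3.11 to 3.12:
$ grep -n "3\.11" .devcontainer/Dockerfile .github/dependabot.yml \
    .readthedocs.yaml docker/Dockerfile docker/Dockerfile.fakesentry \
    docker/set_up_legacy_es_requirements.sh pyproject.toml \
    socorro/tests/processor/test_processor_app.py \
    webapp/crashstats/crashstats/tests/test_sentry.py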
Processing crashes¶
Running the processor is pretty uninteresting since it’ll just sit there until you give it something to process.
In order to process something, you first need to acquire raw crash data and put the data in the S3 container in the appropriate place, and then add the crash id to the standard queue.
We have helper scripts for these steps.
All helper scripts run in the shell in the container:
$ just shell
Some of the scripts require downloading production data from crash-stats.mozilla.org, and it is useful to add an API token with higher permissions before entering the shell.
Adding an API Token¶
By default, the download scripts will fetch anonymized crash data, which does not include personally identifiable information (PII). This anonymized data can be used to test some workflows, but the processor will not be able to analyze memory dumps or generate signatures.
If you have access to memory dumps, you can fetch those with the crash data by using an API token with these permissions:
- View Personal Identifiable Information
- View Raw Dumps
You can generate API tokens at https://crash-stats.mozilla.org/api/tokens/.
Note
Make sure you treat any data you pull from production in accordance with our data policies that you agreed to when granted access to it.
Add the API token value to your my.env file:
SOCORRO_API_TOKEN=apitokenhere
The API token is used by the download scripts (run inside just shell), but not directly by the processor.
bin/process_crashes.sh¶
You can use the bin/process_crashes.sh script, which will fetch crash data, sync it with the S3 bucket, and publish the crash ids to the queue for processing. If you have access to memory dumps and use a valid API token, then memory dumps will be fetched for processing as well.
It takes one or more crash ids as arguments.
For example:
app@socorro:/app$ bin/process_crashes.sh ed35821d-3af5-4fe9-bfa3-dc4dc0181128
You can also use it with fetch_crashids:
app@socorro:/app$ socorro-cmd fetch_crashids --num=1 | bin/process_crashes.sh
Run the processor and webapp with just run to process the crash reports.
If you find this doesn’t meet your needs, you can write a shell script using the commands and scripts that process_crashes.sh uses. They are described below.
socorro-cmd fetch_crashids¶
This will generate a list of crash ids from crash-stats.mozilla.org that meet specified criteria. Crash ids are printed to stdout, so you can use this in conjunction with other scripts or redirect to a file.
This pulls 100 crash ids from yesterday for the Firefox product:
app@socorro:/app$ socorro-cmd fetch_crashids
This pulls 5 crash ids from 2017-09-01:
app@socorro:/app$ socorro-cmd fetch_crashids --num=5 --date=2017-09-01
This pulls 100 crash ids for criteria specified with a Super Search url that we copy and pasted:
app@socorro:/app$ socorro-cmd fetch_crashids "--url=https://crash-stats.mozilla.org/search/?product=Firefox&date=%3E%3D2017-09-05T15%3A09%3A00.000Z&date=%3C2017-09-12T15%3A09%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform"
You can get command help:
app@socorro:/app$ socorro-cmd fetch_crashids --help
socorro-cmd fetch_crash_data¶
This will fetch raw crash data from crash-stats.mozilla.org and save it in the appropriate directory structure rooted at outputdir. If you have access to memory dumps and use a valid API token, then memory dumps will be fetched for processing as well.
Usage:
app@socorro:/app$ socorro-cmd fetch_crash_data <outputdir> <crashid> [<crashid> ...]
For example (assumes this crash exists):
app@socorro:/app$ socorro-cmd fetch_crash_data ./testdata 5c9cecba-75dc-435f-b9d0-289a50170818
Use with fetch_crashids to fetch crash data for 100 crashes from yesterday for Firefox:
app@socorro:/app$ socorro-cmd fetch_crashids | socorro-cmd fetch_crash_data ./testdata
You can get command help:
app@socorro:/app$ socorro-cmd fetch_crash_data --help
bin/socorro_aws_s3.sh¶
This script is a convenience wrapper around the AWS CLI s3 subcommand that uses Socorro environment variables to set the credentials and endpoint.
For example, this creates an S3 bucket named dev-bucket:
app@socorro:/app$ bin/socorro_aws_s3.sh mb s3://dev-bucket/
This copies the contents of ./testdata into the dev-bucket:
app@socorro:/app$ bin/socorro_aws_s3.sh sync ./testdata s3://dev-bucket/
This lists the contents of the bucket:
app@socorro:/app$ bin/socorro_aws_s3.sh ls s3://dev-bucket/
Since this is just a wrapper, you can get help:
app@socorro:/app$ bin/socorro_aws_s3.sh help
socorro-cmd sqs¶
This script can manipulate the AWS SQS emulator and also publish crash ids to AWS SQS queues.
Typically, you’d use this to publish crash ids to the AWS SQS standard queue for processing.
For example:
app@socorro:/app$ socorro-cmd sqs publish local-dev-standard \
ed35821d-3af5-4fe9-bfa3-dc4dc0181128
For help:
app@socorro:/app$ socorro-cmd sqs publish --help
Note
Processing will fail unless the crash data is in the S3 container first!
Example using all the scripts¶
Let’s process crashes for Firefox from yesterday. We’d do this:
# Set up dev environment resources
$ just setup
# Set SOCORRO_API_TOKEN in my.env
# Start bash in the socorro container
$ just shell
# Generate a file of crashids--one per line
app@socorro:/app$ socorro-cmd fetch_crashids > crashids.txt
# Pull raw crash data from -prod for each crash id and put it in the
# "crashdata" directory on the host
app@socorro:/app$ cat crashids.txt | socorro-cmd fetch_crash_data ./crashdata
# Copy that data from the host into the gcs emulator
app@socorro:/app$ gcs-cli upload ./crashdata gs://dev-bucket/
# Add all the crash ids to the pubsub topic
app@socorro:/app$ cat crashids.txt | pubsub-cli publish test local-standard-topic
# Then exit the container
app@socorro:/app$ exit
# Run the processor to process all those crashes
$ docker compose up processor
Processing crashes from the collector¶
Antenna is the collector of the Socorro crash ingestion pipeline. It was originally part of the Socorro repository, but we extracted and rewrote it and now it lives in its own repository and infrastructure.
Antenna deployments are based on images pushed to Docker Hub.
To run Antenna in the Socorro local dev environment, do:
$ docker compose up collector
It will listen on http://localhost:8888/ for incoming crashes from a breakpad crash reporter. It will save crash data to the dev-bucket in the local S3, which is where the processor looks for it. It will publish the crash ids to the standard queue.
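As a rough sketch, you can hand-submit a crash with curl. This assumes Antenna’s /submit endpoint and breakpad-style multipart form fields, with fake.dmp as a placeholder minidump file:
$ curl -v \
    -F 'ProductName=Firefox' \
    -F 'Version=1.0' \
    -F 'upload_file_minidump=@fake.dmp' \
    http://localhost:8888/submit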
Connect to PostgreSQL database¶
The local development environment’s PostgreSQL database exposes itself on a non-standard port when run with docker compose. You can connect to it with the client of your choice using the following connection settings:
Username: postgres
Password: postgres
Port: 8574
Database: socorro
For example:
PGPASSWORD=postgres psql -h localhost -p 8574 -U postgres --no-password socorro
You can also connect with just:
$ just psql
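As a minimal sketch, you can also connect programmatically using the same settings (assuming psycopg2 is installed; any PostgreSQL client library works):
import psycopg2

# connection settings for the local dev database (see above)
conn = psycopg2.connect(
    host="localhost",
    port=8574,
    user="postgres",
    password="postgres",
    dbname="socorro",
)
with conn, conn.cursor() as cursor:
    cursor.execute("SELECT version()")
    print(cursor.fetchone()[0])
conn.close()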