update docs · 1dc0f8cf
Tim O'Donnell authored 7 years ago

commandline_tutorial.rst 4.82 KiB

Command-line tutorial

Downloading models

Most users will use pre-trained MHCflurry models that we release. These models are distributed separately from the pip package and may be downloaded with the :ref:`mhcflurry-downloads` tool:

$ mhcflurry-downloads fetch models_class1

Files downloaded with :ref:`mhcflurry-downloads` are stored in a platform-specific directory. To get the path to downloaded data, you can use:

$ mhcflurry-downloads path models_class1

We also release a few other "downloads," such as curated training data and some experimental models. To see what's available and what you have downloaded, run:

$ mhcflurry-downloads info

Note

The code we use for generating the downloads is in the downloads_generation directory in the repository.

Generating predictions

The :ref:`mhcflurry-predict` command generates predictions from the command line. By default it uses the pre-trained models you downloaded above; to use other models, specify the --models argument.

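Running:

$ mhcflurry-predict \
    --alleles HLA-A0201 HLA-A0301 \
    --peptides SIINFEKL SIINFEKD SIINFEKQ \
    --out /tmp/predictions.csv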

results in a file like this:

The predictions are given as affinities (KD) in nM in the mhcflurry_prediction column. The other fields give the 5th and 95th percentile predictions across the models in the ensemble, and the quantile of the affinity prediction among a large number of random peptides tested on that allele.
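As a rough illustration of working with such a file, the sketch below reads prediction rows with Python's standard csv module and keeps those at or below an affinity cutoff. Only the mhcflurry_prediction column name comes from the text above. The allele and peptide column names, the 500 nM cutoff (a commonly used but arbitrary convention), and every value in the example rows are assumptions, not real MHCflurry output.

```python
import csv
import io

# Placeholder rows for illustration only. Only the mhcflurry_prediction column
# name is taken from the tutorial; the other names and all values are made up.
EXAMPLE_CSV = """allele,peptide,mhcflurry_prediction
HLA-A0201,SIINFEKL,4500.0
HLA-A0201,AAAAAAAAV,120.5
HLA-A0301,SIINFEKL,30000.0
"""


def strong_binders(csv_text, max_affinity_nm=500.0):
    """Return (allele, peptide, affinity) tuples with affinity <= the cutoff."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        affinity = float(row["mhcflurry_prediction"])
        # A lower KD means tighter binding, so keep rows at or below the cutoff.
        if affinity <= max_affinity_nm:
            hits.append((row["allele"], row["peptide"], affinity))
    return hits


binders = strong_binders(EXAMPLE_CSV)
print(binders)
```

For a real predictions file, replace the io.StringIO wrapper with an open file handle and check the actual header row first.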

The predictions shown above were generated with MHCflurry |version|. Different versions of MHCflurry can give considerably different results. Even on the same version, exact predictions may vary (up to about 1 nM) depending on the Keras backend and other details.

In most cases you'll want to specify the input as a CSV file instead of passing peptides and alleles as command-line arguments. See the :ref:`mhcflurry-predict` docs.
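One way to prepare such an input file is sketched below. It writes one row per allele and peptide pair using only the Python standard library. The column names allele and peptide are an assumption here; check the :ref:`mhcflurry-predict` docs for the names the command actually expects.

```python
import csv
import io
from itertools import product


def build_input_csv(alleles, peptides):
    """Return CSV text with one row for every (allele, peptide) combination."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed header names; adjust them to whatever the predictor expects.
    writer.writerow(["allele", "peptide"])
    for allele, peptide in product(alleles, peptides):
        writer.writerow([allele, peptide])
    return buffer.getvalue()


csv_text = build_input_csv(["HLA-A0201", "HLA-A0301"], ["SIINFEKL", "SIINFEKD"])
print(csv_text)
```

Writing csv_text to a file gives an input with four data rows, one for each combination of the two alleles and two peptides.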

Fitting your own models

The :ref:`mhcflurry-class1-train-allele-specific-models` command is used to fit models to training data. The models we release with MHCflurry are trained with a command like:

$ mhcflurry-class1-train-allele-specific-models \
    --data TRAINING_DATA.csv \
    --hyperparameters hyperparameters.yaml \
    --min-measurements-per-allele 75 \
    --out-models-dir models
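Judging by its name, the --min-measurements-per-allele option restricts training to alleles with enough data. The Python sketch below shows that kind of filter on toy rows. It is an illustration of the idea, not MHCflurry's implementation: the allele column name, the use of "at least 75" rather than "more than 75", and the toy data are all assumptions.

```python
from collections import Counter


def alleles_with_enough_data(rows, min_measurements=75):
    """Return the set of alleles having at least min_measurements rows."""
    counts = Counter(row["allele"] for row in rows)
    return {allele for allele, n in counts.items() if n >= min_measurements}


def filter_rows(rows, min_measurements=75):
    """Keep only rows whose allele passes the measurement-count threshold."""
    keep = alleles_with_enough_data(rows, min_measurements)
    return [row for row in rows if row["allele"] in keep]


# Toy training rows (hypothetical): 80 measurements for one allele, 10 for another.
rows = [{"allele": "HLA-A0201", "peptide": "PEP%d" % i} for i in range(80)]
rows += [{"allele": "HLA-B0702", "peptide": "PEP%d" % i} for i in range(10)]

kept = filter_rows(rows)
print(len(kept), sorted(alleles_with_enough_data(rows)))
```

With the default threshold, only the allele with 80 rows survives the filter.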

MHCflurry predictors are serialized to disk as many files in a directory. The command above will write the models to the output directory specified by the --out-models-dir argument. This directory has files like:

The manifest.csv file gives metadata for all the models used in the predictor. There is a weights_... file for each model giving its weights (the parameters of the neural network). The percent_ranks.csv file stores a histogram of model predictions for each allele over a large number of random peptides. It is used to generate the percent ranks at prediction time.
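To make the percent rank idea concrete, here is a minimal Python sketch of how a histogram of predictions on random peptides could be turned into a percent rank for a new prediction. It is not MHCflurry's code and does not read percent_ranks.csv; the bin layout, the function name, and the numbers are hypothetical. The rank is coarse because it counts only whole bins and does not interpolate within a bin.

```python
from bisect import bisect_right


def percent_rank(affinity_nm, bin_edges, counts):
    """Percent of random-peptide predictions in bins at or below affinity_nm.

    bin_edges holds len(counts) + 1 ascending values; counts[i] is the number
    of random peptides whose prediction fell between bin_edges[i] and
    bin_edges[i + 1].
    """
    total = sum(counts)
    # Number of bins whose upper edge is <= the query value.
    full_bins = bisect_right(bin_edges, affinity_nm) - 1
    full_bins = max(0, min(full_bins, len(counts)))
    return 100.0 * sum(counts[:full_bins]) / total


# Hypothetical histogram for one allele: few random peptides bind tightly.
edges = [1, 10, 100, 1000, 10000]
counts = [5, 15, 30, 50]

print(percent_rank(50, edges, counts))
print(percent_rank(100, edges, counts))
```

A low percent rank means the prediction is stronger than most predictions for random peptides on the same allele.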

To call :ref:`mhcflurry-class1-train-allele-specific-models` you'll need some training data. The data we use for our released predictors can be downloaded with :ref:`mhcflurry-downloads`:

$ mhcflurry-downloads fetch data_curated

It looks like this:

Scanning protein sequences for predicted epitopes

The mhctools package provides support for scanning protein sequences to find predicted epitopes. It supports MHCflurry as well as other binding predictors. Here is an example.

First, install mhctools if it is not already installed:

$ pip install mhctools

We'll generate predictions across example.fasta, a FASTA file with two short sequences:

Here's the mhctools invocation. See mhctools -h for more information.

This will write a file giving predictions for all subsequences of the specified lengths:
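To show what "all subsequences of the specified lengths" means, the Python sketch below parses FASTA text and enumerates every window of each requested length. It is a plain-Python illustration, not mhctools code. The function names are hypothetical, and the two sequences are made up rather than taken from example.fasta.

```python
def parse_fasta(text):
    """Return a dict mapping each record name to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def subsequences(sequence, lengths):
    """Yield (offset, peptide) for every window of each requested length."""
    for length in lengths:
        for offset in range(len(sequence) - length + 1):
            yield offset, sequence[offset:offset + length]


# Made-up sequences for illustration only.
FASTA_TEXT = """>protein1
MDSKGSSQKG
>protein2
VTVKVGKA
"""

windows = [
    (name, offset, peptide)
    for name, seq in parse_fasta(FASTA_TEXT).items()
    for offset, peptide in subsequences(seq, [8, 9])
]
for window in windows:
    print(window)
```

Each (name, offset, peptide) tuple corresponds to one candidate peptide that a binding predictor would then score against each allele.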