Tim O'Donnell authored
Command-line tutorial
Downloading models
Most users will use pre-trained MHCflurry models that we release. These models are distributed separately from the pip package and may be downloaded with the :ref:`mhcflurry-downloads` tool:
$ mhcflurry-downloads fetch models_class1
Files downloaded with :ref:`mhcflurry-downloads` are stored in a platform-specific directory. To get the path to downloaded data, you can use:
We also release a few other "downloads," such as curated training data and some experimental models. To see what's available and what you have downloaded, run:
Note
The code we use for generating the downloads is in the ``downloads_generation`` directory in the repository.
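
The download directory can also be looked up from a script. The following is a minimal sketch, assuming that :ref:`mhcflurry-downloads` accepts a ``path`` subcommand followed by a download name; the helper name ``downloads_path`` is hypothetical and not part of MHCflurry. It returns ``None`` when the tool is not installed or the lookup fails.

```python
import shutil
import subprocess


def downloads_path(download_name="models_class1"):
    """Return the local directory for a download, or None if unavailable.

    Assumption: `mhcflurry-downloads path NAME` prints the directory.
    """
    if shutil.which("mhcflurry-downloads") is None:
        # The command-line tool is not on PATH.
        return None
    completed = subprocess.run(
        ["mhcflurry-downloads", "path", download_name],
        capture_output=True,
        text=True,
        check=False,
    )
    if completed.returncode != 0:
        return None
    return completed.stdout.strip()


models_dir = downloads_path()
print(models_dir)
```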
Generating predictions
The :ref:`mhcflurry-predict` command generates predictions from the command line.
By default it will use the pre-trained models you downloaded above; other
models can be used by specifying the ``--models`` argument.
Running:
results in a file like this:
The predictions are given as affinities (KD) in nM in the ``mhcflurry_prediction``
column. The other fields give the 5th and 95th percentile predictions across
the models in the ensemble, and the quantile of the affinity prediction among
a large number of random peptides tested on that allele.
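
To illustrate how an ensemble of per-model predictions can be reduced to a central value with a 5th to 95th percentile range, here is a minimal sketch. It assumes a geometric mean for the central value and linear interpolation for the percentiles; the aggregation MHCflurry actually uses is not described here, and the affinity values are made up.

```python
import math


def percentile(values, q):
    """Linearly interpolated percentile of values, with q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return ordered[lower]
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize_ensemble(predictions_nm):
    """Summarize per-model affinities (nM); geometric mean is an assumption."""
    log_mean = sum(math.log(v) for v in predictions_nm) / len(predictions_nm)
    return {
        "prediction": math.exp(log_mean),
        "low": percentile(predictions_nm, 5),
        "high": percentile(predictions_nm, 95),
    }


# Hypothetical predictions from five models for one peptide and allele.
ensemble = [4000.0, 4500.0, 5200.0, 6100.0, 7000.0]
summary = summarize_ensemble(ensemble)
print(summary)
```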
The predictions shown above were generated with MHCflurry |version|. Different versions of MHCflurry can give considerably different results. Even on the same version, exact predictions may vary (up to about 1 nM) depending on the Keras backend and other details.
In most cases you'll want to specify the input as a CSV file instead of passing peptides and alleles as command-line arguments. See the :ref:`mhcflurry-predict` documentation for details.
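
As a sketch of preparing such an input, the snippet below builds a small CSV with one allele and peptide pair per row. The column names ``allele`` and ``peptide`` are assumed to match the tool's defaults, and the rows are illustrative only; check the :ref:`mhcflurry-predict` documentation for the options that control column names.

```python
import csv
import io


def write_prediction_input(rows, handle):
    """Write (allele, peptide) pairs as CSV with a header row."""
    writer = csv.writer(handle)
    # Assumed default column names; adjust if your setup differs.
    writer.writerow(["allele", "peptide"])
    writer.writerows(rows)


# Illustrative allele and peptide pairs.
rows = [
    ("HLA-A0201", "SIINFEKL"),
    ("HLA-A0201", "SIINFEKD"),
    ("HLA-A0301", "SIINFEKQ"),
]

buffer = io.StringIO()
write_prediction_input(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with ``open("input.csv", "w", newline="")`` as ``handle``.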
Fitting your own models
The :ref:`mhcflurry-class1-train-allele-specific-models` command is used to fit models to training data. The models we release with MHCflurry are trained with a command like:
$ mhcflurry-class1-train-allele-specific-models \
--data TRAINING_DATA.csv \
--hyperparameters hyperparameters.yaml \
--min-measurements-per-allele 75 \
--out-models-dir models
MHCflurry predictors are serialized to disk as many files in a directory. The
command above will write the models to the output directory specified by the
``--out-models-dir`` argument. This directory has files like:
The ``manifest.csv`` file gives metadata for all the models used in the predictor.
There will be a ``weights_...`` file for each model giving its weights
(the parameters for the neural network). The ``percent_ranks.csv`` file stores a
histogram of model predictions for each allele over a large number of random
peptides. It is used for generating the percent ranks at prediction time.
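
The idea of turning a histogram of random-peptide predictions into a percent rank can be sketched as follows. This is an illustration under stated assumptions, not MHCflurry's implementation: the histogram is represented as ascending bin edges plus per-bin counts (the actual layout of ``percent_ranks.csv`` is not shown here), values are interpolated linearly within a bin, and the numbers are made up.

```python
import bisect


def percent_rank(affinity_nm, bin_edges, counts):
    """Percent of random peptides predicted to bind more tightly (lower nM).

    bin_edges must be ascending, with len(bin_edges) == len(counts) + 1.
    """
    total = sum(counts)
    index = bisect.bisect_right(bin_edges, affinity_nm) - 1
    if index < 0:
        return 0.0  # tighter than every random peptide
    if index >= len(counts):
        return 100.0  # weaker than every random peptide
    below = sum(counts[:index])
    left, right = bin_edges[index], bin_edges[index + 1]
    # Assume predictions are spread evenly within the bin.
    fraction = (affinity_nm - left) / (right - left)
    return 100.0 * (below + fraction * counts[index]) / total


# Hypothetical histogram for one allele (1000 random peptides).
edges = [1, 10, 100, 1000, 10000, 100000]
counts = [1, 9, 40, 150, 800]
rank = percent_rank(55, edges, counts)
print(rank)
```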
To call :ref:`mhcflurry-class1-train-allele-specific-models` you'll need some training data. The data we use for our released predictors can be downloaded with :ref:`mhcflurry-downloads`:
$ mhcflurry-downloads fetch data_curated
It looks like this:
Scanning protein sequences for predicted epitopes
The mhctools package provides support for scanning protein sequences to find predicted epitopes. It supports MHCflurry as well as other binding predictors. Here is an example.
First, install ``mhctools`` if it is not already installed:
$ pip install mhctools
We'll generate predictions across ``example.fasta``, a FASTA file with two short
sequences:
Here's the ``mhctools`` invocation. See ``mhctools -h`` for more information.
This will write a file giving predictions for all subsequences of the specified lengths:
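
To make "all subsequences of the specified lengths" concrete, the sketch below enumerates every window of the requested lengths from a FASTA text. It does not call ``mhctools`` and does not reproduce its output format; the FASTA content, sequence names, and helper functions are hypothetical and stand in for ``example.fasta``.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def enumerate_peptides(sequences, lengths):
    """Yield (name, offset, peptide) for every window of each length."""
    for name, sequence in sequences.items():
        for length in lengths:
            for offset in range(len(sequence) - length + 1):
                yield name, offset, sequence[offset:offset + length]


# Made-up FASTA with two short sequences.
fasta_text = ">seq1\nMDSKGSSQKG\n>seq2\nNLVPMVATV\n"
sequences = parse_fasta(fasta_text)
peptides = list(enumerate_peptides(sequences, lengths=[8, 9]))
for record in peptides:
    print(record)
```

Each enumerated peptide would then be scored against the chosen alleles by the binding predictor.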