Downloading models
------------------
Most users will use pre-trained MHCflurry models that we release. These models
are distributed separately from the pip package and may be downloaded with the
:ref:`mhcflurry-downloads` tool:
.. code-block:: shell

    $ mhcflurry-downloads fetch models_class1

Files downloaded with :ref:`mhcflurry-downloads` are stored in a platform-specific
directory. To get the path to downloaded data, you can use:

.. command-output:: mhcflurry-downloads path models_class1
    :nostderr:

We also release a few other "downloads," such as curated training data and some
experimental models. To see what's available and what you have downloaded, run:

.. command-output:: mhcflurry-downloads info
    :nostderr:

The code we use for *generating* the downloads is in the
``downloads_generation`` directory in the repository.
The :ref:`mhcflurry-predict` command generates predictions from the command line.
By default it will use the pre-trained models you downloaded above; other
models can be used by specifying the ``--models`` argument.
Running:

.. command-output::
    mhcflurry-predict
        --alleles HLA-A0201 HLA-A0301
        --peptides SIINFEKL SIINFEKD SIINFEKQ
        --out /tmp/predictions.csv

The predictions are given as affinities (KD) in nM in the ``mhcflurry_prediction``
column. The other fields give the 5th and 95th percentile predictions across
the models in the ensemble, and the quantile of the affinity prediction among
a large number of random peptides tested on that allele.
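
As a rough sketch of how such a file might be post-processed, the snippet below
reads a predictions CSV with Python's standard ``csv`` module and keeps the rows
whose predicted affinity falls below a cutoff (a lower KD means tighter binding).
Only the ``mhcflurry_prediction`` column name comes from the text above. The
``allele`` and ``peptide`` column names, the 500 nM cutoff, and the sample values
are assumptions made for illustration, not real MHCflurry output.

.. code-block:: python

    import csv
    import io

    # Made-up rows for illustration only; a real file comes from mhcflurry-predict.
    SAMPLE = """allele,peptide,mhcflurry_prediction
    HLA-A0201,SIINFEKL,4000.0
    HLA-A0201,SIINFEKD,20000.0
    HLA-A0301,SIINFEKL,250.0
    """.replace("    ", "")

    def strong_binders(handle, cutoff_nm=500.0):
        """Return the rows whose predicted affinity (nM) is below cutoff_nm."""
        reader = csv.DictReader(handle)
        return [row for row in reader
                if float(row["mhcflurry_prediction"]) < cutoff_nm]

    for row in strong_binders(io.StringIO(SAMPLE)):
        print(row["allele"], row["peptide"], row["mhcflurry_prediction"])

To use it on a real file, pass ``open("/tmp/predictions.csv")`` in place of the
in-memory sample, after checking the column names in the file's header.
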
The predictions shown above were generated with MHCflurry |version|. Different versions of
MHCflurry can give considerably different results. Even
on the same version, exact predictions may vary (up to about 1 nM).
In most cases you'll want to specify the input as a CSV file instead of passing
peptides and alleles as command-line arguments. See the :ref:`mhcflurry-predict`
documentation for details.
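
For larger jobs, the input CSV can be generated programmatically. The sketch
below writes every allele/peptide pair to a file using Python's standard ``csv``
module. The column names ``allele`` and ``peptide`` and the file name
``input.csv`` are assumptions made for illustration; check the
:ref:`mhcflurry-predict` documentation for the column names it expects.

.. code-block:: python

    import csv
    import itertools

    alleles = ["HLA-A0201", "HLA-A0301"]
    peptides = ["SIINFEKL", "SIINFEKD", "SIINFEKQ"]

    def write_input_csv(path, alleles, peptides):
        """Write one row per (allele, peptide) pair and return the row count."""
        pairs = list(itertools.product(alleles, peptides))
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["allele", "peptide"])  # assumed column names
            writer.writerows(pairs)
        return len(pairs)

    print(write_input_csv("input.csv", alleles, peptides))
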
Fitting your own models
-----------------------
The :ref:`mhcflurry-class1-train-allele-specific-models` command is used to
fit models to training data. The models we release with MHCflurry are trained
with a command like:

.. code-block:: shell

    $ mhcflurry-class1-train-allele-specific-models \
        --data TRAINING_DATA.csv \
        --hyperparameters hyperparameters.yaml \
        --min-measurements-per-allele 75 \
        --out-models-dir models

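
The ``--min-measurements-per-allele`` option suggests that alleles with too few
measurements are left out of training. A minimal sketch of that kind of filter
follows, assuming the training data has been loaded as a list of rows with an
``allele`` key. The row layout, the function name, and the inclusive threshold
are hypothetical and may not match what the training command does internally.

.. code-block:: python

    from collections import Counter

    def alleles_with_enough_data(rows, min_measurements=75):
        """Return the sorted alleles with at least min_measurements rows.

        Treating the threshold as inclusive is an assumption.
        """
        counts = Counter(row["allele"] for row in rows)
        return sorted(a for a, n in counts.items() if n >= min_measurements)

    # Toy rows: 80 measurements for one allele, 10 for another.
    rows = [{"allele": "HLA-A0201"}] * 80 + [{"allele": "HLA-A0301"}] * 10
    print(alleles_with_enough_data(rows))
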
MHCflurry predictors are serialized to disk as many files in a directory. The
command above will write the models to the output directory specified by the
``--out-models-dir`` argument. This directory has files like:

.. program-output::
    ls "$(mhcflurry-downloads path models_class1)/models"
    :shell:
    :nostderr:

The ``manifest.csv`` file gives metadata for all the models used in the predictor.
There will be a ``weights_...`` file for each model giving its weights
(the parameters for the neural network). The ``percent_ranks.csv`` file stores a
histogram of model predictions for each allele over a large number of random
peptides. It is used for generating the percent ranks at prediction time.
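
To illustrate the idea behind percent ranks, the sketch below ranks one
prediction against a list of predictions for random peptides. It is a conceptual
illustration only: MHCflurry stores a histogram rather than raw values, and the
function name and background values here are hypothetical.

.. code-block:: python

    import bisect

    def percent_rank(prediction_nm, background_nm):
        """Percent of background predictions that are stronger (lower nM)."""
        ordered = sorted(background_nm)
        stronger = bisect.bisect_left(ordered, prediction_nm)
        return 100.0 * stronger / len(ordered)

    # Hypothetical background predictions (nM) for random peptides on one allele.
    background = [50.0, 500.0, 5000.0, 20000.0, 30000.0]
    print(percent_rank(400.0, background))

Under this definition a lower percent rank means the peptide is predicted to
bind more tightly than most random peptides for that allele.
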
To call :ref:`mhcflurry-class1-train-allele-specific-models` you'll need some
training data. The data we use for our released predictors can be downloaded with
:ref:`mhcflurry-downloads`:

.. code-block:: shell

    $ mhcflurry-downloads fetch data_curated

It looks like this:

.. command-output::
    bzcat "$(mhcflurry-downloads path data_curated)/curated_training_data.no_mass_spec.csv.bz2" | head -n 3

Scanning protein sequences for predicted epitopes
-------------------------------------------------
The `mhctools <https://github.com/hammerlab/mhctools>`__ package
provides support for scanning protein sequences to find predicted
epitopes. It supports MHCflurry as well as other binding predictors.
Here is an example.

First, install ``mhctools`` if it is not already installed:

.. code-block:: shell

    $ pip install mhctools

We'll generate predictions across ``example.fasta``, a FASTA file with two short
sequences:
.. literalinclude:: /example.fasta
Here's the ``mhctools`` invocation. See ``mhctools -h`` for more information.

.. command-output::
    mhctools
        --mhc-predictor mhcflurry
        --input-fasta-file example.fasta
        --mhc-alleles A02:01,A03:01
        --mhc-peptide-lengths 8,9,10,11
        --extract-subsequences
    :ellipsis: 2,-2
    :nostderr:

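
As a rough illustration of what extracting subsequences involves (this is not
the ``mhctools`` implementation), the following sketch enumerates every
subsequence of the requested lengths from one protein sequence. The function
name and the example sequence are hypothetical.

.. code-block:: python

    def subsequences(sequence, lengths=(8, 9, 10, 11)):
        """Yield (offset, peptide) for every window of each requested length."""
        for length in lengths:
            for offset in range(len(sequence) - length + 1):
                yield offset, sequence[offset:offset + length]

    protein = "MSIINFEKLTEW"  # hypothetical 12-residue sequence
    peptides = list(subsequences(protein))
    print(len(peptides))

A sequence of length ``n`` yields ``n - k + 1`` peptides of each length ``k``, so
the number of predictions grows roughly linearly with protein length.
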
The ``mhctools`` command above writes a file giving predictions for all subsequences of the specified lengths:
.. command-output::