Commit d47def2d authored by Tim O'Donnell

fix

parent 7a050eb3
@@ -62,9 +62,9 @@ The binding affinity predictions are given as affinities (KD) in nM in the
``mhcflurry_affinity`` column. Lower values indicate stronger binders. A commonly-used
threshold for peptides with a reasonable chance of being immunogenic is 500 nM.
The ``mhcflurry_affinity_percentile`` gives the quantile of the affinity
prediction among a large number of random peptides tested on that allele. Lower
is stronger. Two percent is a commonly-used threshold.
The ``mhcflurry_affinity_percentile`` gives the percentile of the affinity
prediction among a large number of random peptides tested on that allele (range
0 - 100). Lower is stronger. Two percent is a commonly-used threshold.
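As a rough illustration of applying the two thresholds mentioned above, here is a minimal Python sketch. Only the ``mhcflurry_affinity`` and ``mhcflurry_affinity_percentile`` column names come from the text; the ``peptide`` column name and all values are invented for the example and are not real predictions. Requiring both thresholds at once is a choice made for this sketch, not a recommendation from the documentation.

```python
import csv
import io

# Invented example rows. Only the two mhcflurry_* column names come from the
# documentation; the "peptide" column name and the numbers are assumptions.
EXAMPLE_CSV = """peptide,mhcflurry_affinity,mhcflurry_affinity_percentile
SIINFEKL,120.5,0.4
SIINFEQL,480.0,3.1
AAAWYLWEV,9500.0,45.0
"""


def select_binders(rows, max_affinity_nm=500.0, max_percentile=2.0):
    """Keep rows that pass both thresholds (lower is stronger for both)."""
    selected = []
    for row in rows:
        affinity = float(row["mhcflurry_affinity"])
        percentile = float(row["mhcflurry_affinity_percentile"])
        if affinity <= max_affinity_nm and percentile <= max_percentile:
            selected.append(row)
    return selected


rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
binders = select_binders(rows)
print([row["peptide"] for row in binders])
```

Depending on the application, filtering on only one of the two columns may be more appropriate.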
The last two columns give the antigen processing and presentation scores,
respectively. These range from 0 to 1 with higher values indicating more
@@ -72,13 +72,13 @@ favorable processing or presentation.
.. note::
The processing predictor is experimental and under
development. It models allele-independent effects that influence whether a
The processing predictor is experimental. It models allele-independent
effects that influence whether a
peptide will be detected in a mass spec experiment. The presentation score is
a simple logistic regression model that combines the (log) binding affinity
prediction with the processing score to give a composite prediction. The resulting
prediction is appropriate for prioritizing potential epitopes to test, but no
thresholds have yet been established for what constitutes a "high enough"
prediction may be useful for prioritizing potential epitopes, but no
thresholds have been established for what constitutes a "high enough"
presentation score.
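To make the idea of a logistic regression over the (log) affinity and the processing score concrete, here is a minimal Python sketch. The function name, the coefficients, and the use of a plain natural log are all assumptions chosen for illustration; they are not the fitted values or the exact feature transformation used by MHCflurry.

```python
import math

# Hypothetical coefficients, for illustration only. These are NOT the
# values fitted in MHCflurry.
INTERCEPT = 6.0
W_LOG_AFFINITY = -1.0  # negative: lower nM (stronger binding) raises the score
W_PROCESSING = 2.0     # positive: higher processing score raises the score


def presentation_score_sketch(affinity_nm, processing_score):
    """Combine log affinity and processing score with a logistic function."""
    z = (
        INTERCEPT
        + W_LOG_AFFINITY * math.log(affinity_nm)
        + W_PROCESSING * processing_score
    )
    # The logistic function maps any real z into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


strong = presentation_score_sketch(affinity_nm=50.0, processing_score=0.8)
weak = presentation_score_sketch(affinity_nm=5000.0, processing_score=0.2)
print(strong, weak)
```

With coefficients of these signs, the composite score rises as binding gets stronger and as the processing score increases, which is the qualitative behavior the note describes.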
In most cases you'll want to specify the input as a CSV file instead of passing
@@ -122,20 +122,65 @@ a few options. If you have data for only one or a few MHC I alleles, the best
approach is to use the
:ref:`mhcflurry-class1-train-allele-specific-models` command to fit an
"allele-specific" predictor, in which separate neural networks are used for
each allele. Here's an example:
each allele.
To call :ref:`mhcflurry-class1-train-allele-specific-models` you'll need some
training data. The data we use for our released predictors can be downloaded with
:ref:`mhcflurry-downloads`:
.. code-block:: shell
$ mhcflurry-downloads fetch data_curated
It looks like this:
.. command-output::
bzcat "$(mhcflurry-downloads path data_curated)/curated_training_data.csv.bz2" | head -n 3
:shell:
:nostderr:
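If you would rather peek at the file from Python than from the shell, the following sketch uses only the standard library. The helper name ``head_bz2_csv`` is hypothetical, and the demo file, its column names, and its values are stand-ins rather than the real curated data.

```python
import bz2
import csv
import itertools
import os
import tempfile


def head_bz2_csv(path, n_rows=2):
    """Return (header, first n_rows data rows) from a bz2-compressed CSV."""
    # Text mode lets the csv module parse lines as they are decompressed,
    # so the whole file is never loaded into memory.
    with bz2.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        rows = list(itertools.islice(reader, n_rows))
    return header, rows


# Demo on a tiny stand-in file. The column names and values are
# hypothetical and are not taken from the real curated data.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv.bz2")
with bz2.open(demo_path, mode="wt", newline="") as handle:
    handle.write("col_a,col_b\n1,2\n3,4\n5,6\n")

header, rows = head_bz2_csv(demo_path)
print(header)
print(rows)
```

For the real data, pass the path to ``curated_training_data.csv.bz2`` inside the directory printed by ``mhcflurry-downloads path data_curated``.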
Here's an example invocation to fit a predictor:
.. code-block:: shell
$ mhcflurry-class1-train-allele-specific-models \
--data TRAINING_DATA.csv \
--data curated_training_data.csv.bz2 \
--hyperparameters hyperparameters.yaml \
--min-measurements-per-allele 75 \
--out-models-dir models
The ``hyperparameters.yaml`` file gives the list of neural network architectures
to train models for. Here's an example specifying a single architecture:
.. code-block:: yaml
- activation: tanh
dense_layer_l1_regularization: 0.0
dropout_probability: 0.0
early_stopping: true
layer_sizes: [8]
locally_connected_layers: []
loss: custom:mse_with_inequalities
max_epochs: 500
minibatch_size: 128
n_models: 4
output_activation: sigmoid
patience: 20
peptide_amino_acid_encoding: BLOSUM62
random_negative_affinity_max: 50000.0
random_negative_affinity_min: 20000.0
random_negative_constant: 25
random_negative_rate: 0.0
validation_split: 0.1
The available hyperparameters for binding predictors are defined in
`~mhcflurry.Class1NeuralNetwork`. To see exactly how
these are used you will need to read the source code.
.. note::
MHCflurry predictors are serialized to disk as many files in a directory. The
command above will write the models to the output directory specified by the
model training command above will write the models to the output directory specified by the
``--out-models-dir`` argument. This directory has files like:
.. program-output::
@@ -150,27 +195,19 @@ each allele. Here's an example:
histogram of model predictions for each allele over a large number of random
peptides. It is used for generating the percent ranks at prediction time.
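The idea behind a percent rank can be sketched in a few lines of Python. This is not MHCflurry's implementation: it keeps a sorted list of background predictions instead of a histogram, and the function name and the background values are invented for the example.

```python
import bisect


def percent_rank(affinity_nm, sorted_background):
    """Percent of background predictions at least as strong as affinity_nm.

    sorted_background holds predicted affinities (nM) for random peptides,
    sorted ascending. Lower affinity means stronger binding.
    """
    # bisect_right counts the background values <= affinity_nm.
    n_at_least_as_strong = bisect.bisect_right(sorted_background, affinity_nm)
    return 100.0 * n_at_least_as_strong / len(sorted_background)


# Stand-in background of 1000 evenly spaced values. Real predictions for
# random peptides would not be distributed like this.
background = [float(value) for value in range(1, 1001)]

rank = percent_rank(20.0, background)
print(rank)
```

A histogram trades a little resolution for a much smaller file than the full list of background predictions, which is one plausible reason to store the distribution that way.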
To call :ref:`mhcflurry-class1-train-allele-specific-models` you'll need some
training data. The data we use for our released predictors can be downloaded with
:ref:`mhcflurry-downloads`:
.. code-block:: shell
$ mhcflurry-downloads fetch data_curated
It looks like this:
.. command-output::
bzcat "$(mhcflurry-downloads path data_curated)/curated_training_data.csv.bz2" | head -n 3
:shell:
:nostderr:
To fit pan-allele models like the ones released with MHCflurry, you can use
a similar tool, ``mhcflurry-class1-train-pan-allele-models``. You'll probably
a similar tool, :ref:`mhcflurry-class1-train-pan-allele-models`. You'll probably
also want to take a look at the scripts used to generate the production models,
which are available in the *downloads-generation* directory in the MHCflurry
repository. The production MHCflurry models were fit using a cluster with several
dozen GPUs over a period of about two days.
repository. The scripts in the *models_class1_pan* subdirectory show how the
fitting and model selection were done for the models currently distributed with MHCflurry.
.. note::
The production MHCflurry models were fit using a cluster with several
dozen GPUs over a period of about two days. If you perform model selection over
fewer architectures, however, it should be possible to fit a predictor using
fewer resources.
Environment variables
@@ -151,7 +151,7 @@ useful methods.
Lower level interfaces
----------------------------------
The `~mhcflurry.Class1PresentationPredictor` predictor delegates to a
The `~mhcflurry.Class1PresentationPredictor` delegates to a
`~mhcflurry.Class1AffinityPredictor` instance for binding affinity predictions.
If all you need are binding affinities, you can use this instance directly.