Commit d47def2d authored by Tim O'Donnell

fix

parent 7a050eb3
The binding affinity predictions are given as affinities (KD) in nM in the
``mhcflurry_affinity`` column. Lower values indicate stronger binders. A commonly-used
threshold for peptides with a reasonable chance of being immunogenic is 500 nM.

The ``mhcflurry_affinity_percentile`` gives the percentile of the affinity
prediction among a large number of random peptides tested on that allele (range
0 - 100). Lower is stronger. Two percent is a commonly-used threshold.
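The two thresholds above can be applied directly to rows of the command's CSV output. A minimal sketch in plain Python (the example rows are made up for illustration, not actual MHCflurry output):

```python
# Sketch: apply the commonly-used cutoffs to prediction rows.
# Each tuple is (peptide, mhcflurry_affinity in nM, mhcflurry_affinity_percentile).
# These values are invented examples, not real predictions.
rows = [
    ("SIINFEKL", 120.0, 0.6),
    ("AAAAAAAAA", 8500.0, 35.0),
    ("LLFGYPVYV", 45.0, 0.2),
]

# Keep peptides passing both commonly-used thresholds:
# affinity < 500 nM and percentile < 2.
strong = [
    peptide
    for peptide, affinity, percentile in rows
    if affinity < 500.0 and percentile < 2.0
]
print(strong)  # ['SIINFEKL', 'LLFGYPVYV']
```

In practice you would read these columns from the ``mhcflurry-predict`` output CSV rather than hard-coding them.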
The last two columns give the antigen processing and presentation scores,
respectively. These range from 0 to 1 with higher values indicating more
favorable processing or presentation.
.. note::

    The processing predictor is experimental. It models allele-independent
    effects that influence whether a peptide will be detected in a mass spec
    experiment. The presentation score is a simple logistic regression model
    that combines the (log) binding affinity prediction with the processing
    score to give a composite prediction. The resulting prediction may be
    useful for prioritizing potential epitopes, but no thresholds have been
    established for what constitutes a "high enough" presentation score.
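The composite described above can be sketched as a logistic regression over the log affinity and the processing score. The coefficients below are illustrative placeholders, not MHCflurry's fitted values:

```python
import math

def presentation_score(affinity_nm, processing_score,
                       w_affinity=-1.0, w_processing=1.0, bias=0.0):
    """Toy composite score: logistic regression on log-affinity and the
    processing score. The weights are made-up placeholders, not the
    coefficients fitted in MHCflurry."""
    # Lower affinity (stronger binding) should raise the score, hence the
    # negative weight on the log-affinity term.
    z = (w_affinity * math.log(affinity_nm)
         + w_processing * processing_score
         + bias)
    return 1.0 / (1.0 + math.exp(-z))

# A strong binder with good processing should outscore a weak binder
# with poor processing.
print(presentation_score(50.0, 0.8) > presentation_score(5000.0, 0.2))
```

This only illustrates the shape of the model; the actual score comes from the trained predictor.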
In most cases you'll want to specify the input as a CSV file instead of passing
peptides and alleles as command-line arguments.
a few options. If you have data for only one or a few MHC I alleles, the best
approach is to use the
:ref:`mhcflurry-class1-train-allele-specific-models` command to fit an
"allele-specific" predictor, in which separate neural networks are used for
each allele.
To call :ref:`mhcflurry-class1-train-allele-specific-models` you'll need some
training data. The data we use for our released predictors can be downloaded with
:ref:`mhcflurry-downloads`:

.. code-block:: shell

    $ mhcflurry-downloads fetch data_curated

It looks like this:

.. command-output::
    bzcat "$(mhcflurry-downloads path data_curated)/curated_training_data.csv.bz2" | head -n 3
    :shell:
    :nostderr:
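If you want to experiment with your own measurements, the training input is a CSV in the same shape. As a sketch, here is how a miniature training file might be assembled; the column names (``allele``, ``peptide``, ``measurement_value``) are assumptions for illustration, so check the curated file above for the actual full set of columns:

```python
import csv
import io

# Hypothetical miniature training file. measurement_value is an affinity
# in nM. Column names are assumed, not verified against the curated
# download.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["allele", "peptide", "measurement_value"])
writer.writerow(["HLA-A*02:01", "SIINFEKL", 120.0])
writer.writerow(["HLA-A*02:01", "AAAAAAAAA", 25000.0])

# Read it back the way a consumer would.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(len(rows), rows[0]["allele"])
```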
Here's an example invocation to fit a predictor:

.. code-block:: shell

    $ mhcflurry-class1-train-allele-specific-models \
        --data curated_training_data.csv.bz2 \
        --hyperparameters hyperparameters.yaml \
        --min-measurements-per-allele 75 \
        --out-models-dir models
The ``hyperparameters.yaml`` file gives the list of neural network architectures
to train models for. Here's an example specifying a single architecture:

.. code-block:: yaml

    - activation: tanh
      dense_layer_l1_regularization: 0.0
      dropout_probability: 0.0
      early_stopping: true
      layer_sizes: [8]
      locally_connected_layers: []
      loss: custom:mse_with_inequalities
      max_epochs: 500
      minibatch_size: 128
      n_models: 4
      output_activation: sigmoid
      patience: 20
      peptide_amino_acid_encoding: BLOSUM62
      random_negative_affinity_max: 50000.0
      random_negative_affinity_min: 20000.0
      random_negative_constant: 25
      random_negative_rate: 0.0
      validation_split: 0.1
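Since this file is plain YAML, you can generate or sanity-check it programmatically before launching a long training run. A small sketch using PyYAML (assuming it is installed; it is a dependency of mhcflurry):

```python
import yaml

# Build the architecture list in Python, mirroring the YAML example above
# (abbreviated to a few keys for brevity).
hyperparameters = [
    {
        "activation": "tanh",
        "layer_sizes": [8],
        "max_epochs": 500,
        "minibatch_size": 128,
        "n_models": 4,
        "output_activation": "sigmoid",
        "patience": 20,
        "peptide_amino_acid_encoding": "BLOSUM62",
    },
]

# Round-trip through YAML: the trainer expects a list of architecture dicts.
dumped = yaml.safe_dump(hyperparameters)
loaded = yaml.safe_load(dumped)
print(len(loaded), loaded[0]["layer_sizes"])
```

Writing ``dumped`` to ``hyperparameters.yaml`` produces a file suitable for the ``--hyperparameters`` argument.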
The available hyperparameters for binding predictors are defined in
`~mhcflurry.Class1NeuralNetwork`. To see exactly how
these are used you will need to read the source code.
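One consequence of ``output_activation: sigmoid`` is that affinities must be mapped into (0, 1) for training and back to nM at prediction time. MHCflurry uses a log transform anchored at 50000 nM for this; the helpers below are a sketch of that mapping, not the library's own functions:

```python
import math

MAX_IC50 = 50000.0  # nM anchor used in MHCflurry's affinity transform

def affinity_to_target(ic50_nm):
    """Map an affinity in nM to (0, 1); stronger binders map near 1."""
    x = 1.0 - math.log(ic50_nm) / math.log(MAX_IC50)
    return min(1.0, max(0.0, x))

def target_to_affinity(x):
    """Inverse mapping from a network output back to nM."""
    return MAX_IC50 ** (1.0 - x)

# Round trip: 500 nM -> target in (0, 1) -> 500 nM
print(round(target_to_affinity(affinity_to_target(500.0)), 6))
```

This also explains the ``random_negative_affinity_min``/``max`` values in the example architecture: random negative peptides are assigned weak affinities near the 50000 nM anchor.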
.. note::

    MHCflurry predictors are serialized to disk as many files in a directory. The
    model training command above will write the models to the output directory
    specified by the ``--out-models-dir`` argument. This directory has files like:
.. program-output::
histogram of model predictions for each allele over a large number of random
peptides. It is used for generating the percent ranks at prediction time.
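The percent-rank calibration described above can be approximated with a sorted reference distribution: the rank of a new prediction is the fraction of random-peptide predictions that are at least as strong. A sketch using only the standard library (the reference values are invented, not taken from an actual calibration):

```python
import bisect

# Hypothetical calibration data: predicted affinities (nM) for random
# peptides on one allele, kept sorted ascending (strongest first).
reference = sorted([
    35.0, 80.0, 150.0, 400.0, 900.0,
    2500.0, 8000.0, 20000.0, 35000.0, 48000.0,
])

def percent_rank(affinity_nm):
    """Percent of random peptides predicted stronger (lower nM) than the
    query; lower ranks therefore indicate stronger binders."""
    stronger = bisect.bisect_left(reference, affinity_nm)
    return 100.0 * stronger / len(reference)

print(percent_rank(100.0))  # 20.0: two of ten reference peptides are stronger
```

The real calibration uses a large number of random peptides per allele, but the lookup at prediction time follows this pattern.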
To fit pan-allele models like the ones released with MHCflurry, you can use
a similar tool, :ref:`mhcflurry-class1-train-pan-allele-models`. You'll probably
also want to take a look at the scripts used to generate the production models,
which are available in the *downloads-generation* directory in the MHCflurry
repository. See the scripts in the *models_class1_pan* subdirectory for how the
fitting and model selection were done for the models currently distributed with
MHCflurry.
.. note::

    The production MHCflurry models were fit using a cluster with several
    dozen GPUs over a period of about two days. If you model select over fewer
    architectures, however, it should be possible to fit a predictor using
    fewer resources.
Environment variables
---------------------
Lower level interfaces
----------------------------------
The `~mhcflurry.Class1PresentationPredictor` delegates to a
`~mhcflurry.Class1AffinityPredictor` instance for binding affinity predictions.
If all you need are binding affinities, you can use this instance directly.