Commit d47def2d authored by Tim O'Donnell

fix

parent 7a050eb3

The binding affinity predictions are given as affinities (KD) in nM in the
``mhcflurry_affinity`` column. Lower values indicate stronger binders. A
commonly used threshold for peptides with a reasonable chance of being
immunogenic is 500 nM.

The ``mhcflurry_affinity_percentile`` column gives the percentile rank of the
affinity prediction among a large number of random peptides tested on that
allele (range 0 to 100). Lower is stronger. Two percent is a commonly used
threshold.

The last two columns give the antigen processing and presentation scores,
respectively. These range from 0 to 1, with higher values indicating more
favorable processing or presentation.
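
The two cutoffs above can be applied to a predictions CSV with a few lines of
standard-library Python. This is a minimal sketch, not part of MHCflurry: it
assumes the file has the ``mhcflurry_affinity`` and
``mhcflurry_affinity_percentile`` columns described above, the helper name
``likely_binders`` is hypothetical, and keeping a peptide when *either* cutoff
is met is our own choice.

.. code-block:: python

    import csv
    import io

    # Conventional cutoffs quoted in the text; neither is a hard rule.
    AFFINITY_THRESHOLD_NM = 500.0
    PERCENTILE_THRESHOLD = 2.0


    def likely_binders(csv_text):
        """Return the rows that pass the affinity or the percentile cutoff.

        Hypothetical helper. Assumes the ``mhcflurry_affinity`` (nM) and
        ``mhcflurry_affinity_percentile`` (0 to 100) columns are present.
        """
        selected = []
        for row in csv.DictReader(io.StringIO(csv_text)):
            affinity = float(row["mhcflurry_affinity"])
            percentile = float(row["mhcflurry_affinity_percentile"])
            # Lower is stronger for both quantities.
            if affinity < AFFINITY_THRESHOLD_NM or percentile < PERCENTILE_THRESHOLD:
                selected.append(row)
        return selected


    # Made-up rows for illustration only; these are not real predictions.
    example = """peptide,allele,mhcflurry_affinity,mhcflurry_affinity_percentile
    pep1,HLA-A0201,12000.0,15.3
    pep2,HLA-A0201,35.0,0.2
    pep3,HLA-A0201,900.0,1.5
    """

    print([row["peptide"] for row in likely_binders(example)])  # ['pep2', 'pep3']

To filter a real output file, pass it the file's contents, for example
``open(path).read()``.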

.. note::

    The processing predictor is experimental. It models allele-independent
    effects that influence whether a peptide will be detected in a mass
    spectrometry experiment. The presentation score is a simple logistic
    regression model that combines the (log) binding affinity prediction with
    the processing score to give a composite prediction. The resulting
    prediction may be useful for prioritizing potential epitopes, but no
    thresholds have been established for what constitutes a "high enough"
    presentation score.
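
The note describes the presentation score as a logistic regression over the
(log) binding affinity and the processing score. A minimal sketch of that
functional form follows. The intercept and weights are made-up placeholders
rather than MHCflurry's fitted coefficients, ``presentation_score`` is a
hypothetical name, and MHCflurry may transform its inputs differently before
combining them.

.. code-block:: python

    import math


    def presentation_score(affinity_nm, processing_score,
                           intercept=5.0, w_log_affinity=-1.0, w_processing=2.0):
        """Hypothetical logistic regression on log affinity and processing score.

        The default intercept and weights are placeholders for illustration,
        NOT MHCflurry's fitted coefficients.
        """
        # Linear predictor. The negative weight on log affinity means a stronger
        # binder (lower nM value) gets a higher score.
        z = (intercept
             + w_log_affinity * math.log(affinity_nm)
             + w_processing * processing_score)
        # The logistic (sigmoid) function maps z into the open interval (0, 1).
        return 1.0 / (1.0 + math.exp(-z))


    strong = presentation_score(50.0, 0.9)     # strong binder
    weak = presentation_score(20000.0, 0.9)    # weak binder, same processing score
    print(strong > weak)  # True

Because the output is a probability-like number rather than a calibrated
cutoff, it is most naturally used to rank candidates.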

In most cases you'll want to specify the input as a CSV file instead of passing

......

a few options. If you have data for only one or a few MHC I alleles, the best
approach is to use the
:ref:`mhcflurry-class1-train-allele-specific-models` command to fit an
"allele-specific" predictor, in which separate neural networks are used for
each allele.

To run :ref:`mhcflurry-class1-train-allele-specific-models`, you'll need some
training data. The data we use for our released predictors can be downloaded
with :ref:`mhcflurry-downloads`:

.. code-block:: shell

    $ mhcflurry-downloads fetch data_curated

It looks like this:

.. command-output::
    bzcat "$(mhcflurry-downloads path data_curated)/curated_training_data.csv.bz2" | head -n 3
    :shell:
    :nostderr:
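
To inspect the same file from Python instead of the shell, the standard
library's ``bz2`` module can decompress it on the fly. This is a minimal
sketch: ``head_of_bz2_csv`` is a hypothetical helper, not part of MHCflurry,
and you would pass it the directory printed by
``mhcflurry-downloads path data_curated`` joined with
``curated_training_data.csv.bz2``.

.. code-block:: python

    import bz2
    import csv
    import itertools


    def head_of_bz2_csv(path, n=2):
        """Return the header and the first ``n`` data rows of a .csv.bz2 file.

        Rough Python counterpart of ``bzcat FILE | head -n 3`` (header + 2 rows).
        """
        # Text-mode bz2.open decompresses as it reads, so the whole file is
        # never loaded into memory at once.
        with bz2.open(path, "rt", newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader)
            rows = list(itertools.islice(reader, n))
        return header, rows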

Here's an example invocation to fit a predictor:

.. code-block:: shell

    $ mhcflurry-class1-train-allele-specific-models \
        --data "$(mhcflurry-downloads path data_curated)/curated_training_data.csv.bz2" \
        --hyperparameters hyperparameters.yaml \
        --min-measurements-per-allele 75 \
        --out-models-dir models
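
Since the allele-specific approach trains separate networks per allele, it can
help to check how much data each allele has before training. The sketch below
counts rows per allele and keeps the alleles that meet a minimum. This is our
reading of what ``--min-measurements-per-allele`` controls, based only on its
name, not a description of MHCflurry's implementation; the ``allele`` column
name and the ``alleles_with_enough_data`` helper are also assumptions, so
compare against the header printed above and the command's ``--help`` output.

.. code-block:: python

    import collections
    import csv
    import io


    def alleles_with_enough_data(csv_text, min_measurements=75):
        """Map each allele with >= ``min_measurements`` rows to its row count.

        Hypothetical helper; assumes one measurement per row and an
        ``allele`` column.
        """
        counts = collections.Counter(
            row["allele"] for row in csv.DictReader(io.StringIO(csv_text))
        )
        return {
            allele: count
            for allele, count in sorted(counts.items())
            if count >= min_measurements
        }


    # Made-up data: 3 rows for allele A, 1 row for allele B.
    toy = "allele,peptide,measurement_value\n" + "\n".join(
        ["A,pep1,100", "A,pep2,200", "A,pep3,300", "B,pep4,400"]
    )
    print(alleles_with_enough_data(toy, min_measurements=2))  # {'A': 3}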

The ``hyperparameters.yaml`` file gives the list of neural network architectures
for which to train models. Here's an example specifying a single architecture:

.. code-block:: yaml

    - activation: tanh
      dense_layer_l1_regularization: 0.0
      dropout_probability: 0.0
      early_stopping: true
      layer_sizes: [8]
      locally_connected_layers: []
      loss: custom:mse_with_inequalities
      max_epochs: 500
      minibatch_size: 128
      n_models: 4
      output_activation: sigmoid
      patience: 20
      peptide_amino_acid_encoding: BLOSUM62
      random_negative_affinity_max: 50000.0
      random_negative_affinity_min: 20000.0
      random_negative_constant: 25
      random_negative_rate: 0.0
      validation_split: 0.1

The available hyperparameters for binding predictors are defined in
`~mhcflurry.Class1NeuralNetwork`. To see exactly how they are used, you will
need to read the source code.
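
If you want to model-select over several architectures, one option is to
generate the list rather than write it by hand. The sketch below copies the
example architecture, varies ``layer_sizes`` and ``dropout_probability`` over a
small grid, and writes the result as JSON. YAML 1.2 is a superset of JSON, and
in practice most YAML loaders accept output like this. The helper names
(``architecture_grid``, ``write_architectures``) and the grid values are
illustrative assumptions, not recommended settings.

.. code-block:: python

    import itertools
    import json

    # Base architecture copied from the YAML example above.
    BASE_ARCHITECTURE = {
        "activation": "tanh",
        "dense_layer_l1_regularization": 0.0,
        "dropout_probability": 0.0,
        "early_stopping": True,
        "layer_sizes": [8],
        "locally_connected_layers": [],
        "loss": "custom:mse_with_inequalities",
        "max_epochs": 500,
        "minibatch_size": 128,
        "n_models": 4,
        "output_activation": "sigmoid",
        "patience": 20,
        "peptide_amino_acid_encoding": "BLOSUM62",
        "random_negative_affinity_max": 50000.0,
        "random_negative_affinity_min": 20000.0,
        "random_negative_constant": 25,
        "random_negative_rate": 0.0,
        "validation_split": 0.1,
    }


    def architecture_grid(base, layer_sizes_options, dropout_options):
        """Return one architecture dict per (layer_sizes, dropout) combination."""
        architectures = []
        for layer_sizes, dropout in itertools.product(layer_sizes_options, dropout_options):
            # Shallow copy: the varied keys are overwritten with fresh values,
            # and the other values are only read when serializing.
            architecture = dict(base)
            architecture["layer_sizes"] = list(layer_sizes)
            architecture["dropout_probability"] = dropout
            architectures.append(architecture)
        return architectures


    def write_architectures(path, architectures):
        """Write the architecture list as indented JSON (readable as YAML)."""
        with open(path, "w") as handle:
            json.dump(architectures, handle, indent=2)


    # Illustrative grid: 3 layer sizes x 2 dropout rates = 6 architectures.
    grid = architecture_grid(BASE_ARCHITECTURE, [[8], [16], [32]], [0.0, 0.5])

Calling ``write_architectures("hyperparameters.yaml", grid)`` would produce a
six-architecture file for the ``--hyperparameters`` argument. Each extra
architecture adds training time.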

.. note::

    MHCflurry predictors are serialized to disk as many files in a directory. The
    model training command above will write the models to the output directory
    specified by the ``--out-models-dir`` argument. This directory has files like:

    .. program-output::

......

    histogram of model predictions for each allele over a large number of random
    peptides. It is used for generating the percent ranks at prediction time.
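
The idea behind these percent ranks can be sketched as follows: given a
histogram of predicted affinities for random peptides on one allele, the
percentile of a new prediction is the share of random peptides predicted to
bind at least as strongly (that is, with an affinity at or below it in nM).
This is an illustration only: ``percentile_from_histogram`` and its arguments
are hypothetical, the linear interpolation within a bin is our own choice, and
the layout of the file MHCflurry actually stores is not reproduced here.

.. code-block:: python

    import bisect


    def percentile_from_histogram(affinity_nm, bin_edges, counts):
        """Percent of random-peptide predictions with affinity <= ``affinity_nm``.

        ``bin_edges`` holds len(counts) + 1 increasing values; ``counts[i]`` is
        the number of random peptides whose prediction fell in
        [bin_edges[i], bin_edges[i + 1]). Hypothetical helper.
        """
        total = sum(counts)
        if affinity_nm <= bin_edges[0]:
            return 0.0
        if affinity_nm >= bin_edges[-1]:
            return 100.0
        # Index of the bin containing the query value.
        i = bisect.bisect_right(bin_edges, affinity_nm) - 1
        # Every prediction in an earlier bin binds at least as strongly.
        below = sum(counts[:i])
        # Assume predictions are spread uniformly within bin i.
        fraction = (affinity_nm - bin_edges[i]) / (bin_edges[i + 1] - bin_edges[i])
        return 100.0 * (below + fraction * counts[i]) / total


    # Made-up histogram of 100 random peptides; most are weak binders.
    edges = [1.0, 10.0, 100.0, 1000.0, 50000.0]
    counts = [1, 4, 15, 80]
    print(percentile_from_histogram(100.0, edges, counts))  # 5.0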

To fit pan-allele models like the ones released with MHCflurry, you can use
a similar tool, :ref:`mhcflurry-class1-train-pan-allele-models`. You'll probably
also want to take a look at the scripts used to generate the production models,
which are available in the *downloads-generation* directory in the MHCflurry
repository. The scripts in its *models_class1_pan* subdirectory show how
fitting and model selection were done for the models currently distributed with
MHCflurry.

.. note::

    The production MHCflurry models were fit using a cluster with several dozen
    GPUs over a period of about two days. If you model select over fewer
    architectures, however, it should be possible to fit a predictor using
    fewer resources.

Environment variables

......

Lower level interfaces
----------------------------------

The `~mhcflurry.Class1PresentationPredictor` delegates to a
`~mhcflurry.Class1AffinityPredictor` instance for binding affinity predictions.
If all you need are binding affinities, you can use this instance directly.