Commit d47def2d
authored 5 years ago by Tim O'Donnell

Commit message: fix

Parent: 7a050eb3
Showing 2 changed files with 66 additions and 29 deletions:

docs/commandline_tutorial.rst: 65 additions, 28 deletions
docs/python_tutorial.rst: 1 addition, 1 deletion

docs/commandline_tutorial.rst (+65, -28)
@@ -62,9 +62,9 @@ The binding affinity predictions are given as affinities (KD) in nM in the

``mhcflurry_affinity`` column. Lower values indicate stronger binders. A commonly-used
threshold for peptides with a reasonable chance of being immunogenic is 500 nM.
The ``mhcflurry_affinity_percentile`` gives the percentile of the affinity
prediction among a large number of random peptides tested on that allele (range
0 - 100). Lower is stronger. Two percent is a commonly-used threshold.
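As a quick illustration of applying both thresholds, here is a minimal pandas sketch; ``predictions.csv`` is a placeholder for whatever output CSV you generated, and only the ``mhcflurry_affinity`` and ``mhcflurry_affinity_percentile`` columns described above are assumed.

.. code-block:: python

    import pandas as pd

    # "predictions.csv" is a placeholder for the prediction output CSV.
    df = pd.read_csv("predictions.csv")

    # Keep peptides that pass both commonly-used thresholds:
    # affinity stronger than 500 nM and percentile rank below 2%.
    strong = df[
        (df["mhcflurry_affinity"] < 500)
        & (df["mhcflurry_affinity_percentile"] < 2.0)
    ]
    print(strong)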
The last two columns give the antigen processing and presentation scores,
respectively. These range from 0 to 1 with higher values indicating more
@@ -72,13 +72,13 @@ favorable processing or presentation.

.. note::

    The processing predictor is experimental. It models allele-independent
    effects that influence whether a peptide will be detected in a mass spec
    experiment. The presentation score is a simple logistic regression model
    that combines the (log) binding affinity prediction with the processing
    score to give a composite prediction. The resulting prediction may be
    useful for prioritizing potential epitopes, but no thresholds have been
    established for what constitutes a "high enough" presentation score.
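To make the composite score concrete, the following is a minimal sketch of the idea only, not MHCflurry's actual implementation: the feature values and the hit/decoy labels are illustrative assumptions.

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Illustrative inputs, one row per peptide: log of the predicted binding
    # affinity (nM) and the processing score.
    affinities_nm = np.array([25.0, 480.0, 5000.0, 21000.0])
    processing_scores = np.array([0.9, 0.6, 0.4, 0.1])
    X = np.column_stack([np.log(affinities_nm), processing_scores])

    # Hypothetical labels, e.g. detected vs. not detected by mass spec.
    y = np.array([1, 1, 0, 0])

    model = LogisticRegression().fit(X, y)
    presentation_scores = model.predict_proba(X)[:, 1]  # values in [0, 1]
    print(presentation_scores)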
In most cases you'll want to specify the input as a CSV file instead of passing
@@ -122,20 +122,65 @@ a few options. If you have data for only one or a few MHC I alleles, the best

approach is to use the
:ref:`mhcflurry-class1-train-allele-specific-models` command to fit an
"allele-specific" predictor, in which separate neural networks are used for
each allele.

To call :ref:`mhcflurry-class1-train-allele-specific-models` you'll need some
training data. The data we use for our released predictors can be downloaded with
:ref:`mhcflurry-downloads`:

.. code-block:: shell

    $ mhcflurry-downloads fetch data_curated

It looks like this:

.. command-output::
    bzcat "$(mhcflurry-downloads path data_curated)/curated_training_data.csv.bz2" | head -n 3
    :shell:
    :nostderr:
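If you prefer to inspect the curated training data from Python, here is a minimal sketch; it assumes only that the bz2-compressed CSV has been copied into the working directory and makes no assumptions about its column names.

.. code-block:: python

    import pandas as pd

    # pandas reads bz2-compressed CSVs directly; the path printed by
    # "mhcflurry-downloads path data_curated" can also be used here.
    df = pd.read_csv("curated_training_data.csv.bz2")
    print(df.columns.tolist())
    print(len(df), "rows")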
Here's an example invocation to fit a predictor:

.. code-block:: shell

    $ mhcflurry-class1-train-allele-specific-models \
        --data curated_training_data.csv.bz2 \
        --hyperparameters hyperparameters.yaml \
        --min-measurements-per-allele 75 \
        --out-models-dir models
The ``hyperparameters.yaml`` file gives the list of neural network architectures
to train models for. Here's an example specifying a single architecture:

.. code-block:: yaml

    - activation: tanh
      dense_layer_l1_regularization: 0.0
      dropout_probability: 0.0
      early_stopping: true
      layer_sizes: [8]
      locally_connected_layers: []
      loss: custom:mse_with_inequalities
      max_epochs: 500
      minibatch_size: 128
      n_models: 4
      output_activation: sigmoid
      patience: 20
      peptide_amino_acid_encoding: BLOSUM62
      random_negative_affinity_max: 50000.0
      random_negative_affinity_min: 20000.0
      random_negative_constant: 25
      random_negative_rate: 0.0
      validation_split: 0.1

The available hyperparameters for binding predictors are defined in
`~mhcflurry.Class1NeuralNetwork`. To see exactly how
these are used you will need to read the source code.
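As a starting point, the defaults can also be inspected programmatically. This sketch assumes a ``hyperparameter_defaults`` class attribute exposing a ``defaults`` dict; the attribute names may differ across MHCflurry versions, so consult the source if they do.

.. code-block:: python

    from mhcflurry import Class1NeuralNetwork

    # Print the default value for each recognized hyperparameter.
    # The attribute names below are assumptions, not a documented API.
    defaults = Class1NeuralNetwork.hyperparameter_defaults.defaults
    for name, value in sorted(defaults.items()):
        print(name, "=", value)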
.. note::

    MHCflurry predictors are serialized to disk as many files in a directory. The
    model training command above will write the models to the output directory
    specified by the ``--out-models-dir`` argument. This directory has files like:

.. program-output::
@@ -150,27 +195,19 @@ each allele. Here's an example:

histogram of model predictions for each allele over a large number of random
peptides. It is used for generating the percent ranks at prediction time.

To fit pan-allele models like the ones released with MHCflurry, you can use
a similar tool, :ref:`mhcflurry-class1-train-pan-allele-models`. You'll probably
also want to take a look at the scripts used to generate the production models,
which are available in the *downloads-generation* directory in the MHCflurry
repository. See the scripts in the *models_class1_pan* subdirectory for how the
fitting and model selection were done for the models currently distributed with
MHCflurry.

.. note::

    The production MHCflurry models were fit using a cluster with several
    dozen GPUs over a period of about two days. If you model-select over fewer
    architectures, however, it should be possible to fit a predictor using fewer
    resources.
Environment variables

docs/python_tutorial.rst (+1, -1)
@@ -151,7 +151,7 @@ useful methods.

Lower level interfaces
----------------------------------

The `~mhcflurry.Class1PresentationPredictor` delegates to a
`~mhcflurry.Class1AffinityPredictor` instance for binding affinity predictions.
If all you need are binding affinities, you can use this instance directly.
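For illustration, here is a minimal sketch of using the affinity predictor on its own. It assumes a model set has already been downloaded (for example via ``mhcflurry-downloads``) and uses the ``load``/``predict`` interface; the allele name and peptides shown are just examples.

.. code-block:: python

    from mhcflurry import Class1AffinityPredictor

    # Load a previously downloaded (or locally trained) model directory.
    predictor = Class1AffinityPredictor.load()

    # Predicted binding affinities (KD) in nM; lower is stronger.
    affinities = predictor.predict(
        peptides=["SIINFEKL", "SIINFEKQ"],
        allele="HLA-A*02:01")
    print(affinities)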