diff --git a/docs/README.md b/docs/README.md index de9abd942f5f8ade4178a6b1d1f3cd0b16059955..e7b6e9539acfb51f11816e180f7001a05d0c2981 100644 --- a/docs/README.md +++ b/docs/README.md @@ -7,8 +7,16 @@ $ pip install -r requirements.txt # for the first time you generate docs $ make generate html ``` -The above command will also regenerate `docs/package_readme/readme.generated.md`. The main -[README.md](../README.md) is symlinked to this file. +Documentation is written to the _build/ directory. These files should not be +checked into the repo. + +We use the documentation system to generate the mhcflurry package level README. +To build this file, run: + +``` +$ make readme +``` + +This will write `docs/package_readme/readme.generated.rst`. The main +[README.rst](../README.rst) is symlinked to this file. -Many other files are also generated by the command above and output to -the _build/ directory. These files should not be checked into the repo. \ No newline at end of file diff --git a/docs/package_readme/readme.generated.md b/docs/package_readme/readme.generated.md deleted file mode 100644 index 8280de9d8083ce885113acaaf11f351062b55ce9..0000000000000000000000000000000000000000 --- a/docs/package_readme/readme.generated.md +++ /dev/null @@ -1,316 +0,0 @@ -[](https://travis-ci.org/hammerlab/mhcflurry) [](https://coveralls.io/github/hammerlab/mhcflurry?branch=master) - -# mhcflurry -Open source neural network models for peptide-MHC binding affinity prediction - -MHCflurry is a Python package for peptide/MHC I binding affinity -prediction. It provides competitive accuracy with a fast, documented, -open source implementation. - -You can download pre-trained MHCflurry models fit to affinity -measurements deposited in IEDB. See the -"downloads\_generation/models\_class1" directory in the repository for -the workflow used to train these predictors. Users with their own data -can also fit their own MHCflurry models. - -Currently only allele-specific prediction is implemented, in which -separate models are trained for each allele. The released models -therefore support a fixed set of common class I alleles for which -sufficient published training data is available. - -MHCflurry supports Python versions 2.7 and 3.4+. It uses the Keras -neural network library via either the Tensorflow or Theano backends. -GPUs may optionally be used for a generally modest speed improvement. - -If you find MHCflurry useful in your research please cite: - -> O'Donnell, T. et al., 2017. MHCflurry: open-source class I MHC binding -> affinity prediction. bioRxiv. Available at: -> <http://www.biorxiv.org/content/early/2017/08/09/174243>. - -## Installation (pip) - -Install the package: - -> pip install mhcflurry - -Then download our datasets and trained models: - -> mhcflurry-downloads fetch - -From a checkout you can run the unit tests with: - -> pip install nose nosetests . - -## Using conda - -You can alternatively get up and running with a conda environment as -follows. Some users have reported that this can avoid problems -installing tensorflow. - -> conda create -q -n mhcflurry-env python=3.6 'tensorflow\>=1.1.2' -> source activate mhcflurry-env - -Then continue as above: - -> pip install mhcflurry mhcflurry-downloads fetch - -### Command-line usage - -## Downloading models - -Most users will use pre-trained MHCflurry models that we release. These -models are distributed separately from the source code and may be -downloaded with the following command: - -We also release other "downloads," such as curated training data and -some experimental models. To see what you have downloaded, run: - -## mhcflurry-predict - -The "mhcflurry-predict" command generates predictions from the -command-line. It defaults to using the pre-trained models you downloaded -above but this can be customized with the "--models" argument. See -"mhcflurry-predict -h" for details. - -``` -> $ mhcflurry-predict --alleles HLA-A0201 HLA-A0301 --peptides SIINFEKL -> SIINFEKD SIINFEKQ -> allele,peptide,mhcflurry\_prediction,mhcflurry\_prediction\_low,mhcflurry\_prediction\_high -> HLA-A0201,SIINFEKL,5326.541919062165,3757.86675352994,7461.37693353508 -> HLA-A0201,SIINFEKD,18763.70298522213,13140.82000240037,23269.82139560844 -> HLA-A0201,SIINFEKQ,18620.10057358322,13096.425874678192,23223.148184869413 -> HLA-A0301,SIINFEKL,24481.726678691946,21035.52779725433,27245.371837497867 -> HLA-A0301,SIINFEKD,24687.529360239587,21582.590014592537,27749.39869616437 -> HLA-A0301,SIINFEKQ,25923.062203902562,23522.5793450799,28079.456657427705 -``` - -The predictions returned are affinities (KD) in nM. The -"prediction\_low" and "prediction\_high" fields give the 5-95 percentile -predictions across the models in the ensemble. The predictions above -were generated with MHCflurry 0.9.2. - -Your exact predictions may vary slightly from these (up to about 1 nM) -depending on the Keras backend in use and other numerical details. -Different versions of MHCflurry can of course give results considerably -different from these. - -You can also specify the input and output as CSV files. Run -"mhcflurry-predict -h" for details. - -## Fitting your own models - -### Library usage - -The MHCflurry Python API exposes additional options and features beyond -those supported by the commandline tools. This tutorial gives a basic -overview of the most important functionality. See the API Documentation -for further details. - -The "Class1AffinityPredictor" class is the primary user-facing -interface. - -> \>\>\> import mhcflurry \>\>\> print("MHCflurry version: %s" % -> (mhcflurry.\_\_version\_\_)) MHCflurry version: 1.0.0 \>\>\> \>\>\> \# -> Load downloaded predictor \>\>\> predictor = -> mhcflurry.Class1AffinityPredictor.load() \>\>\> -> print(predictor.supported\_alleles) \['BoLA-6\*13:01', -> 'Eqca-1\*01:01', 'H-2-Db', 'H-2-Dd', 'H-2-Kb', 'H-2-Kd', 'H-2-Kk', -> 'H-2-Ld', 'HLA-A\*01:01', 'HLA-A\*02:01', 'HLA-A\*02:02', -> 'HLA-A\*02:03', 'HLA-A\*02:05', 'HLA-A\*02:06', 'HLA-A\*02:07', -> 'HLA-A\*02:11', 'HLA-A\*02:12', 'HLA-A\*02:16', 'HLA-A\*02:17', -> 'HLA-A\*02:19', 'HLA-A\*02:50', 'HLA-A\*03:01', 'HLA-A\*11:01', -> 'HLA-A\*23:01', 'HLA-A\*24:01', 'HLA-A\*24:02', 'HLA-A\*24:03', -> 'HLA-A\*25:01', 'HLA-A\*26:01', 'HLA-A\*26:02', 'HLA-A\*26:03', -> 'HLA-A\*29:02', 'HLA-A\*30:01', 'HLA-A\*30:02', 'HLA-A\*31:01', -> 'HLA-A\*32:01', 'HLA-A\*32:07', 'HLA-A\*33:01', 'HLA-A\*66:01', -> 'HLA-A\*68:01', 'HLA-A\*68:02', 'HLA-A\*68:23', 'HLA-A\*69:01', -> 'HLA-A\*80:01', 'HLA-B\*07:01', 'HLA-B\*07:02', 'HLA-B\*08:01', -> 'HLA-B\*08:02', 'HLA-B\*08:03', 'HLA-B\*14:02', 'HLA-B\*15:01', -> 'HLA-B\*15:02', 'HLA-B\*15:03', 'HLA-B\*15:09', 'HLA-B\*15:17', -> 'HLA-B\*15:42', 'HLA-B\*18:01', 'HLA-B\*27:01', 'HLA-B\*27:03', -> 'HLA-B\*27:04', 'HLA-B\*27:05', 'HLA-B\*27:06', 'HLA-B\*27:20', -> 'HLA-B\*35:01', 'HLA-B\*35:03', 'HLA-B\*35:08', 'HLA-B\*37:01', -> 'HLA-B\*38:01', 'HLA-B\*39:01', 'HLA-B\*40:01', 'HLA-B\*40:02', -> 'HLA-B\*42:01', 'HLA-B\*44:01', 'HLA-B\*44:02', 'HLA-B\*44:03', -> 'HLA-B\*45:01', 'HLA-B\*45:06', 'HLA-B\*46:01', 'HLA-B\*48:01', -> 'HLA-B\*51:01', 'HLA-B\*53:01', 'HLA-B\*54:01', 'HLA-B\*57:01', -> 'HLA-B\*58:01', 'HLA-B\*73:01', 'HLA-B\*83:01', 'HLA-C\*03:03', -> 'HLA-C\*03:04', 'HLA-C\*04:01', 'HLA-C\*05:01', 'HLA-C\*06:02', -> 'HLA-C\*07:01', 'HLA-C\*07:02', 'HLA-C\*08:02', 'HLA-C\*12:03', -> 'HLA-C\*14:02', 'HLA-C\*15:02', 'Mamu-A\*01:01', 'Mamu-A\*02:01', -> 'Mamu-A\*02:0102', 'Mamu-A\*07:01', 'Mamu-A\*07:0103', -> 'Mamu-A\*11:01', 'Mamu-A\*22:01', 'Mamu-A\*26:01', 'Mamu-B\*01:01', -> 'Mamu-B\*03:01', 'Mamu-B\*08:01', 'Mamu-B\*10:01', 'Mamu-B\*17:01', -> 'Mamu-B\*17:04', 'Mamu-B\*39:01', 'Mamu-B\*52:01', 'Mamu-B\*66:01', -> 'Mamu-B\*83:01', 'Mamu-B\*87:01', 'Patr-A\*01:01', 'Patr-A\*03:01', -> 'Patr-A\*04:01', 'Patr-A\*07:01', 'Patr-A\*09:01', 'Patr-B\*01:01', -> 'Patr-B\*13:01', 'Patr-B\*24:01'\] -> -> \# coding: utf-8 -> -> \# In\[22\]: -> -> import pandas import numpy import seaborn import logging from -> matplotlib import pyplot -> -> import mhcflurry -> -> \# \# Download data and models -> -> \# In\[2\]: -> -> get\_ipython().system('mhcflurry-downloads fetch') -> -> \# \# Making predictions with Class1AffinityPredictor -> -> \# In\[3\]: -> -> help(mhcflurry.Class1AffinityPredictor) -> -> \# In\[4\]: -> -> downloaded\_predictor = mhcflurry.Class1AffinityPredictor.load() -> -> \# In\[5\]: -> -> downloaded\_predictor.predict(allele="HLA-A0201", -> peptides=\["SIINFEKL", "SIINFEQL"\]) -> -> \# In\[6\]: -> -> downloaded\_predictor.predict\_to\_dataframe(allele="HLA-A0201", -> peptides=\["SIINFEKL", "SIINFEQL"\]) -> -> \# In\[7\]: -> -> downloaded\_predictor.predict\_to\_dataframe(alleles=\["HLA-A0201", -> "HLA-B\*57:01"\], peptides=\["SIINFEKL", "SIINFEQL"\]) -> -> \# In\[8\]: -> -> - downloaded\_predictor.predict\_to\_dataframe( -> allele="HLA-A0201", peptides=\["SIINFEKL", "SIINFEQL"\], -> include\_individual\_model\_predictions=True) -> -> \# In\[9\]: -> -> - downloaded\_predictor.predict\_to\_dataframe( -> allele="HLA-A0201", peptides=\["SIINFEKL", "SIINFEQL", -> "TAAAALANGGGGGGGG"\], throw=False) \# Without throw=False, you'll -> get a ValueError for invalid peptides or alleles -> -> \# \# Instantiating a Class1AffinityPredictor from a saved model on -> disk -> -> \# In\[10\]: -> -> models\_dir = mhcflurry.downloads.get\_path("models\_class1", -> "models") models\_dir -> -> \# In\[11\]: -> -> \# This will be the same predictor we instantiated above. We're just -> being explicit about what models to load. downloaded\_predictor = -> mhcflurry.Class1AffinityPredictor.load(models\_dir) -> downloaded\_predictor.predict(\["SIINFEKL", "SIQNPEKP", "SYNFPEPI"\], -> allele="HLA-A0301") -> -> \# \# Fit a model: first load some data -> -> \# In\[12\]: -> -> \# This is the data the downloaded models were trained on data\_path = -> mhcflurry.downloads.get\_path("data\_curated", -> "curated\_training\_data.csv.bz2") data\_path -> -> \# In\[13\]: -> -> data\_df = pandas.read\_csv(data\_path) data\_df -> -> \# \# Fit a model: Low level Class1NeuralNetwork interface -> -> \# In\[14\]: -> -> \# We'll use mostly the default hyperparameters here. Could also -> specify them as kwargs. new\_model = -> mhcflurry.Class1NeuralNetwork(layer\_sizes=\[16\]) -> new\_model.hyperparameters -> -> \# In\[16\]: -> -> - train\_data = data\_df.loc\[ -> (data\_df.allele == "HLA-B\*57:01") & (data\_df.peptide.str.len() -> \>= 8) & (data\_df.peptide.str.len() \<= 15) -> -> \] get\_ipython().magic('time -> new\_model.fit(train\_data.peptide.values, -> train\_data.measurement\_value.values)') -> -> \# In\[17\]: -> -> new\_model.predict(\["SYNPEPII"\]) -> -> \# \# Fit a model: high level Class1AffinityPredictor interface -> -> \# In\[18\]: -> -> affinity\_predictor = mhcflurry.Class1AffinityPredictor() -> -> \# This can be called any number of times, for example on different -> alleles, to build up the ensembles. -> affinity\_predictor.fit\_allele\_specific\_predictors( n\_models=1, -> architecture\_hyperparameters={"layer\_sizes": \[16\], "max\_epochs": -> 10}, peptides=train\_data.peptide.values, -> affinities=train\_data.measurement\_value.values, -> allele="HLA-B\*57:01", ) -> -> \# In\[19\]: -> -> affinity\_predictor.predict(\["SYNPEPII"\], allele="HLA-B\*57:01") -> -> \# \# Save and restore the fit model -> -> \# In\[20\]: -> -> get\_ipython().system('mkdir /tmp/saved-affinity-predictor') -> affinity\_predictor.save("/tmp/saved-affinity-predictor") -> get\_ipython().system('ls /tmp/saved-affinity-predictor') -> -> \# In\[21\]: -> -> affinity\_predictor2 = -> mhcflurry.Class1AffinityPredictor.load("/tmp/saved-affinity-predictor") -> affinity\_predictor2.predict(\["SYNPEPII"\], allele="HLA-B\*57:01") - -### Supported alleles and peptide lengths - -Models released with the current version of MHCflurry (1.0.0) support -peptides of length 8-15 and the following 124 alleles: - -> BoLA-6\*13:01, Eqca-1\*01:01, H-2-Db, H-2-Dd, H-2-Kb, H-2-Kd, H-2-Kk, -> H-2-Ld, HLA-A\*01:01, HLA-A\*02:01, HLA-A\*02:02, HLA-A\*02:03, -> HLA-A\*02:05, HLA-A\*02:06, HLA-A\*02:07, HLA-A\*02:11, HLA-A\*02:12, -> HLA-A\*02:16, HLA-A\*02:17, HLA-A\*02:19, HLA-A\*02:50, HLA-A\*03:01, -> HLA-A\*11:01, HLA-A\*23:01, HLA-A\*24:01, HLA-A\*24:02, HLA-A\*24:03, -> HLA-A\*25:01, HLA-A\*26:01, HLA-A\*26:02, HLA-A\*26:03, HLA-A\*29:02, -> HLA-A\*30:01, HLA-A\*30:02, HLA-A\*31:01, HLA-A\*32:01, HLA-A\*32:07, -> HLA-A\*33:01, HLA-A\*66:01, HLA-A\*68:01, HLA-A\*68:02, HLA-A\*68:23, -> HLA-A\*69:01, HLA-A\*80:01, HLA-B\*07:01, HLA-B\*07:02, HLA-B\*08:01, -> HLA-B\*08:02, HLA-B\*08:03, HLA-B\*14:02, HLA-B\*15:01, HLA-B\*15:02, -> HLA-B\*15:03, HLA-B\*15:09, HLA-B\*15:17, HLA-B\*15:42, HLA-B\*18:01, -> HLA-B\*27:01, HLA-B\*27:03, HLA-B\*27:04, HLA-B\*27:05, HLA-B\*27:06, -> HLA-B\*27:20, HLA-B\*35:01, HLA-B\*35:03, HLA-B\*35:08, HLA-B\*37:01, -> HLA-B\*38:01, HLA-B\*39:01, HLA-B\*40:01, HLA-B\*40:02, HLA-B\*42:01, -> HLA-B\*44:01, HLA-B\*44:02, HLA-B\*44:03, HLA-B\*45:01, HLA-B\*45:06, -> HLA-B\*46:01, HLA-B\*48:01, HLA-B\*51:01, HLA-B\*53:01, HLA-B\*54:01, -> HLA-B\*57:01, HLA-B\*58:01, HLA-B\*73:01, HLA-B\*83:01, HLA-C\*03:03, -> HLA-C\*03:04, HLA-C\*04:01, HLA-C\*05:01, HLA-C\*06:02, HLA-C\*07:01, -> HLA-C\*07:02, HLA-C\*08:02, HLA-C\*12:03, HLA-C\*14:02, HLA-C\*15:02, -> Mamu-A\*01:01, Mamu-A\*02:01, Mamu-A\*02:0102, Mamu-A\*07:01, -> Mamu-A\*07:0103, Mamu-A\*11:01, Mamu-A\*22:01, Mamu-A\*26:01, -> Mamu-B\*01:01, Mamu-B\*03:01, Mamu-B\*08:01, Mamu-B\*10:01, -> Mamu-B\*17:01, Mamu-B\*17:04, Mamu-B\*39:01, Mamu-B\*52:01, -> Mamu-B\*66:01, Mamu-B\*83:01, Mamu-B\*87:01, Patr-A\*01:01, -> Patr-A\*03:01, Patr-A\*04:01, Patr-A\*07:01, Patr-A\*09:01, -> Patr-B\*01:01, Patr-B\*13:01, Patr-B\*24:01 diff --git a/docs/package_readme/readme_header.md b/docs/package_readme/readme_header.md deleted file mode 100644 index 00ecddc608a47394478f68a0f9be3ce3cfa72e3d..0000000000000000000000000000000000000000 --- a/docs/package_readme/readme_header.md +++ /dev/null @@ -1,5 +0,0 @@ -[](https://travis-ci.org/hammerlab/mhcflurry) [](https://coveralls.io/github/hammerlab/mhcflurry?branch=master) - -# mhcflurry -Open source neural network models for peptide-MHC binding affinity prediction - diff --git a/docs/readme.template.rst b/docs/readme.template.rst deleted file mode 100644 index 674dd28e996c5fde81355e9b9f0c29fefe6d77f3..0000000000000000000000000000000000000000 --- a/docs/readme.template.rst +++ /dev/null @@ -1,315 +0,0 @@ -MHCflurry is a Python package for peptide/MHC I binding affinity -prediction. It provides competitive accuracy with a fast, documented, -open source implementation. - -You can download pre-trained MHCflurry models fit to affinity -measurements deposited in IEDB. See the -"downloads_generation/models_class1" directory in the repository for -the workflow used to train these predictors. Users with their own data -can also fit their own MHCflurry models. - -Currently only allele-specific prediction is implemented, in which -separate models are trained for each allele. The released models -therefore support a fixed set of common class I alleles for which -sufficient published training data is available. - -MHCflurry supports Python versions 2.7 and 3.4+. It uses the Keras -neural network library via either the Tensorflow or Theano backends. -GPUs may optionally be used for a generally modest speed improvement. - -If you find MHCflurry useful in your research please cite: - - O'Donnell, T. et al., 2017. MHCflurry: open-source class I MHC - binding affinity prediction. bioRxiv. Available at: - http://www.biorxiv.org/content/early/2017/08/09/174243. - - -Installation (pip) -****************** - -Install the package: - - pip install mhcflurry - -Then download our datasets and trained models: - - mhcflurry-downloads fetch - -From a checkout you can run the unit tests with: - - pip install nose - nosetests . - - -Using conda -*********** - -You can alternatively get up and running with a conda environment as -follows. Some users have reported that this can avoid problems -installing tensorflow. - - conda create -q -n mhcflurry-env python=3.6 'tensorflow>=1.1.2' - source activate mhcflurry-env - -Then continue as above: - - pip install mhcflurry - mhcflurry-downloads fetch - - -Command-line usage -================== - - -Downloading models -****************** - -Most users will use pre-trained MHCflurry models that we release. -These models are distributed separately from the source code and may -be downloaded with the following command: - -We also release other "downloads," such as curated training data and -some experimental models. To see what you have downloaded, run: - - -mhcflurry-predict -***************** - -The "mhcflurry-predict" command generates predictions from the -command-line. It defaults to using the pre-trained models you -downloaded above but this can be customized with the "--models" -argument. See "mhcflurry-predict -h" for details. - - $ mhcflurry-predict --alleles HLA-A0201 HLA-A0301 --peptides SIINFEKL SIINFEKD SIINFEKQ - allele,peptide,mhcflurry_prediction,mhcflurry_prediction_low,mhcflurry_prediction_high - HLA-A0201,SIINFEKL,5326.541919062165,3757.86675352994,7461.37693353508 - HLA-A0201,SIINFEKD,18763.70298522213,13140.82000240037,23269.82139560844 - HLA-A0201,SIINFEKQ,18620.10057358322,13096.425874678192,23223.148184869413 - HLA-A0301,SIINFEKL,24481.726678691946,21035.52779725433,27245.371837497867 - HLA-A0301,SIINFEKD,24687.529360239587,21582.590014592537,27749.39869616437 - HLA-A0301,SIINFEKQ,25923.062203902562,23522.5793450799,28079.456657427705 - -The predictions returned are affinities (KD) in nM. The -"prediction_low" and "prediction_high" fields give the 5-95 percentile -predictions across the models in the ensemble. The predictions above -were generated with MHCflurry 0.9.2. - -Your exact predictions may vary slightly from these (up to about 1 nM) -depending on the Keras backend in use and other numerical details. -Different versions of MHCflurry can of course give results -considerably different from these. - -You can also specify the input and output as CSV files. Run -"mhcflurry-predict -h" for details. - - -Fitting your own models -*********************** - - -Library usage -============= - -The MHCflurry Python API exposes additional options and features -beyond those supported by the commandline tools. This tutorial gives a -basic overview of the most important functionality. See the API -Documentation for further details. - -The "Class1AffinityPredictor" class is the primary user-facing -interface. - - - >>> import mhcflurry - >>> print("MHCflurry version: %s" % (mhcflurry.__version__)) - MHCflurry version: 1.0.0 - >>> - >>> # Load downloaded predictor - >>> predictor = mhcflurry.Class1AffinityPredictor.load() - >>> print(predictor.supported_alleles) - ['BoLA-6*13:01', 'Eqca-1*01:01', 'H-2-Db', 'H-2-Dd', 'H-2-Kb', 'H-2-Kd', 'H-2-Kk', 'H-2-Ld', 'HLA-A*01:01', 'HLA-A*02:01', 'HLA-A*02:02', 'HLA-A*02:03', 'HLA-A*02:05', 'HLA-A*02:06', 'HLA-A*02:07', 'HLA-A*02:11', 'HLA-A*02:12', 'HLA-A*02:16', 'HLA-A*02:17', 'HLA-A*02:19', 'HLA-A*02:50', 'HLA-A*03:01', 'HLA-A*11:01', 'HLA-A*23:01', 'HLA-A*24:01', 'HLA-A*24:02', 'HLA-A*24:03', 'HLA-A*25:01', 'HLA-A*26:01', 'HLA-A*26:02', 'HLA-A*26:03', 'HLA-A*29:02', 'HLA-A*30:01', 'HLA-A*30:02', 'HLA-A*31:01', 'HLA-A*32:01', 'HLA-A*32:07', 'HLA-A*33:01', 'HLA-A*66:01', 'HLA-A*68:01', 'HLA-A*68:02', 'HLA-A*68:23', 'HLA-A*69:01', 'HLA-A*80:01', 'HLA-B*07:01', 'HLA-B*07:02', 'HLA-B*08:01', 'HLA-B*08:02', 'HLA-B*08:03', 'HLA-B*14:02', 'HLA-B*15:01', 'HLA-B*15:02', 'HLA-B*15:03', 'HLA-B*15:09', 'HLA-B*15:17', 'HLA-B*15:42', 'HLA-B*18:01', 'HLA-B*27:01', 'HLA-B*27:03', 'HLA-B*27:04', 'HLA-B*27:05', 'HLA-B*27:06', 'HLA-B*27:20', 'HLA-B*35:01', 'HLA-B*35:03', 'HLA-B*35:08', 'HLA-B*37:01', 'HLA-B*38:01', 'HLA-B*39:01', 'HLA-B*40:01', 'HLA-B*40:02', 'HLA-B*42:01', 'HLA-B*44:01', 'HLA-B*44:02', 'HLA-B*44:03', 'HLA-B*45:01', 'HLA-B*45:06', 'HLA-B*46:01', 'HLA-B*48:01', 'HLA-B*51:01', 'HLA-B*53:01', 'HLA-B*54:01', 'HLA-B*57:01', 'HLA-B*58:01', 'HLA-B*73:01', 'HLA-B*83:01', 'HLA-C*03:03', 'HLA-C*03:04', 'HLA-C*04:01', 'HLA-C*05:01', 'HLA-C*06:02', 'HLA-C*07:01', 'HLA-C*07:02', 'HLA-C*08:02', 'HLA-C*12:03', 'HLA-C*14:02', 'HLA-C*15:02', 'Mamu-A*01:01', 'Mamu-A*02:01', 'Mamu-A*02:0102', 'Mamu-A*07:01', 'Mamu-A*07:0103', 'Mamu-A*11:01', 'Mamu-A*22:01', 'Mamu-A*26:01', 'Mamu-B*01:01', 'Mamu-B*03:01', 'Mamu-B*08:01', 'Mamu-B*10:01', 'Mamu-B*17:01', 'Mamu-B*17:04', 'Mamu-B*39:01', 'Mamu-B*52:01', 'Mamu-B*66:01', 'Mamu-B*83:01', 'Mamu-B*87:01', 'Patr-A*01:01', 'Patr-A*03:01', 'Patr-A*04:01', 'Patr-A*07:01', 'Patr-A*09:01', 'Patr-B*01:01', 'Patr-B*13:01', 'Patr-B*24:01'] - - # coding: utf-8 - - # In[22]: - - import pandas - import numpy - import seaborn - import logging - from matplotlib import pyplot - - import mhcflurry - - - - # # Download data and models - - # In[2]: - - get_ipython().system('mhcflurry-downloads fetch') - - - # # Making predictions with `Class1AffinityPredictor` - - # In[3]: - - help(mhcflurry.Class1AffinityPredictor) - - - # In[4]: - - downloaded_predictor = mhcflurry.Class1AffinityPredictor.load() - - - # In[5]: - - downloaded_predictor.predict(allele="HLA-A0201", peptides=["SIINFEKL", "SIINFEQL"]) - - - # In[6]: - - downloaded_predictor.predict_to_dataframe(allele="HLA-A0201", peptides=["SIINFEKL", "SIINFEQL"]) - - - # In[7]: - - downloaded_predictor.predict_to_dataframe(alleles=["HLA-A0201", "HLA-B*57:01"], peptides=["SIINFEKL", "SIINFEQL"]) - - - # In[8]: - - downloaded_predictor.predict_to_dataframe( - allele="HLA-A0201", - peptides=["SIINFEKL", "SIINFEQL"], - include_individual_model_predictions=True) - - - # In[9]: - - downloaded_predictor.predict_to_dataframe( - allele="HLA-A0201", - peptides=["SIINFEKL", "SIINFEQL", "TAAAALANGGGGGGGG"], - throw=False) # Without throw=False, you'll get a ValueError for invalid peptides or alleles - - - # # Instantiating a `Class1AffinityPredictor` from a saved model on disk - - # In[10]: - - models_dir = mhcflurry.downloads.get_path("models_class1", "models") - models_dir - - - # In[11]: - - # This will be the same predictor we instantiated above. We're just being explicit about what models to load. - downloaded_predictor = mhcflurry.Class1AffinityPredictor.load(models_dir) - downloaded_predictor.predict(["SIINFEKL", "SIQNPEKP", "SYNFPEPI"], allele="HLA-A0301") - - - # # Fit a model: first load some data - - # In[12]: - - # This is the data the downloaded models were trained on - data_path = mhcflurry.downloads.get_path("data_curated", "curated_training_data.csv.bz2") - data_path - - - # In[13]: - - data_df = pandas.read_csv(data_path) - data_df - - - # # Fit a model: Low level `Class1NeuralNetwork` interface - - # In[14]: - - # We'll use mostly the default hyperparameters here. Could also specify them as kwargs. - new_model = mhcflurry.Class1NeuralNetwork(layer_sizes=[16]) - new_model.hyperparameters - - - # In[16]: - - train_data = data_df.loc[ - (data_df.allele == "HLA-B*57:01") & - (data_df.peptide.str.len() >= 8) & - (data_df.peptide.str.len() <= 15) - ] - get_ipython().magic('time new_model.fit(train_data.peptide.values, train_data.measurement_value.values)') - - - # In[17]: - - new_model.predict(["SYNPEPII"]) - - - # # Fit a model: high level `Class1AffinityPredictor` interface - - # In[18]: - - affinity_predictor = mhcflurry.Class1AffinityPredictor() - - # This can be called any number of times, for example on different alleles, to build up the ensembles. - affinity_predictor.fit_allele_specific_predictors( - n_models=1, - architecture_hyperparameters={"layer_sizes": [16], "max_epochs": 10}, - peptides=train_data.peptide.values, - affinities=train_data.measurement_value.values, - allele="HLA-B*57:01", - ) - - - # In[19]: - - affinity_predictor.predict(["SYNPEPII"], allele="HLA-B*57:01") - - - # # Save and restore the fit model - - # In[20]: - - get_ipython().system('mkdir /tmp/saved-affinity-predictor') - affinity_predictor.save("/tmp/saved-affinity-predictor") - get_ipython().system('ls /tmp/saved-affinity-predictor') - - - # In[21]: - - affinity_predictor2 = mhcflurry.Class1AffinityPredictor.load("/tmp/saved-affinity-predictor") - affinity_predictor2.predict(["SYNPEPII"], allele="HLA-B*57:01") - - -Supported alleles and peptide lengths -===================================== - -Models released with the current version of MHCflurry (1.0.0) support -peptides of length 8-15 and the following 124 alleles: - - BoLA-6*13:01, Eqca-1*01:01, H-2-Db, H-2-Dd, H-2-Kb, H-2-Kd, H-2-Kk, - H-2-Ld, HLA-A*01:01, HLA-A*02:01, HLA-A*02:02, HLA-A*02:03, - HLA-A*02:05, HLA-A*02:06, HLA-A*02:07, HLA-A*02:11, HLA-A*02:12, - HLA-A*02:16, HLA-A*02:17, HLA-A*02:19, HLA-A*02:50, HLA-A*03:01, - HLA-A*11:01, HLA-A*23:01, HLA-A*24:01, HLA-A*24:02, HLA-A*24:03, - HLA-A*25:01, HLA-A*26:01, HLA-A*26:02, HLA-A*26:03, HLA-A*29:02, - HLA-A*30:01, HLA-A*30:02, HLA-A*31:01, HLA-A*32:01, HLA-A*32:07, - HLA-A*33:01, HLA-A*66:01, HLA-A*68:01, HLA-A*68:02, HLA-A*68:23, - HLA-A*69:01, HLA-A*80:01, HLA-B*07:01, HLA-B*07:02, HLA-B*08:01, - HLA-B*08:02, HLA-B*08:03, HLA-B*14:02, HLA-B*15:01, HLA-B*15:02, - HLA-B*15:03, HLA-B*15:09, HLA-B*15:17, HLA-B*15:42, HLA-B*18:01, - HLA-B*27:01, HLA-B*27:03, HLA-B*27:04, HLA-B*27:05, HLA-B*27:06, - HLA-B*27:20, HLA-B*35:01, HLA-B*35:03, HLA-B*35:08, HLA-B*37:01, - HLA-B*38:01, HLA-B*39:01, HLA-B*40:01, HLA-B*40:02, HLA-B*42:01, - HLA-B*44:01, HLA-B*44:02, HLA-B*44:03, HLA-B*45:01, HLA-B*45:06, - HLA-B*46:01, HLA-B*48:01, HLA-B*51:01, HLA-B*53:01, HLA-B*54:01, - HLA-B*57:01, HLA-B*58:01, HLA-B*73:01, HLA-B*83:01, HLA-C*03:03, - HLA-C*03:04, HLA-C*04:01, HLA-C*05:01, HLA-C*06:02, HLA-C*07:01, - HLA-C*07:02, HLA-C*08:02, HLA-C*12:03, HLA-C*14:02, HLA-C*15:02, - Mamu-A*01:01, Mamu-A*02:01, Mamu-A*02:0102, Mamu-A*07:01, - Mamu-A*07:0103, Mamu-A*11:01, Mamu-A*22:01, Mamu-A*26:01, - Mamu-B*01:01, Mamu-B*03:01, Mamu-B*08:01, Mamu-B*10:01, Mamu-B*17:01, - Mamu-B*17:04, Mamu-B*39:01, Mamu-B*52:01, Mamu-B*66:01, Mamu-B*83:01, - Mamu-B*87:01, Patr-A*01:01, Patr-A*03:01, Patr-A*04:01, Patr-A*07:01, - Patr-A*09:01, Patr-B*01:01, Patr-B*13:01, Patr-B*24:01