From 883e106b036aa70b38af6d489522174d0aca6e7b Mon Sep 17 00:00:00 2001 From: Tim O'Donnell <timodonnell@gmail.com> Date: Fri, 22 Dec 2017 01:52:58 -0500 Subject: [PATCH] remove docs/package_readme --- docs/README.md | 11 - docs/package_readme/readme.generated.txt | 628 ----------------------- docs/package_readme/readme.template.rst | 183 ------- docs/package_readme/readme_header.rst | 13 - 4 files changed, 835 deletions(-) delete mode 100644 docs/package_readme/readme.generated.txt delete mode 100644 docs/package_readme/readme.template.rst delete mode 100644 docs/package_readme/readme_header.rst diff --git a/docs/README.md b/docs/README.md index 78fdc827..b6330f59 100644 --- a/docs/README.md +++ b/docs/README.md @@ -12,14 +12,3 @@ $ make generate html Documentation is written to the _build/ directory. These files should not be checked into the repo. - -We have experimented with using the documentation system to generate the mhcflurry -package level README, but this is not currently in use. To build the readme, run: - -``` -$ make readme -``` - -This will write `docs/package_readme/readme.generated.txt`. The main -[README.rst](../README.rst) could be symlinked to this file at a later point. - diff --git a/docs/package_readme/readme.generated.txt b/docs/package_readme/readme.generated.txt deleted file mode 100644 index 2ba05d58..00000000 --- a/docs/package_readme/readme.generated.txt +++ /dev/null @@ -1,628 +0,0 @@ -:orphan: - -.. image:: https://travis-ci.org/hammerlab/mhcflurry.svg?branch=master - :target: https://travis-ci.org/hammerlab/mhcflurry - -.. image:: https://coveralls.io/repos/github/hammerlab/mhcflurry/badge.svg?branch=master - :target: https://coveralls.io/github/hammerlab/mhcflurry - -mhcflurry -=================== - -Open source neural network models for peptide-MHC binding affinity prediction - -MHCflurry is an open source package for peptide/MHC I binding affinity -prediction. It provides competitive accuracy with a fast and -documented implementation. - -You can download pre-trained MHCflurry models fit to affinity -measurements deposited in IEDB or train a MHCflurry predictor on your -own data. - -Currently only allele-specific prediction is implemented, in which -separate models are trained for each allele. The released models -therefore support a fixed set of common class I alleles for which -sufficient published training data is available (see Supported alleles -and peptide lengths). - -MHCflurry supports Python versions 2.7 and 3.4+. It uses the keras -neural network library via either the Tensorflow or Theano backends. -GPUs may optionally be used for a generally modest speed improvement. - -If you find MHCflurry useful in your research please cite: - - O’Donnell, T. et al., 2017. MHCflurry: open-source class I MHC - binding affinity prediction. bioRxiv. Available at: - http://www.biorxiv.org/content/early/2017/08/09/174243. - - -Installation (pip) -****************** - -Install the package: - - $ pip install mhcflurry - -Then download our datasets and trained models: - - $ mhcflurry-downloads fetch - -From a checkout you can run the unit tests with: - - $ pip install nose - $ nosetests . - - -Using conda -*********** - -You can alternatively get up and running with a conda environment as -follows. Some users have reported that this can avoid problems -installing tensorflow. - - $ conda create -q -n mhcflurry-env python=3.6 'tensorflow>=1.1.2' - $ source activate mhcflurry-env - -Then continue as above: - - $ pip install mhcflurry - $ mhcflurry-downloads fetch - - -Command-line tutorial -===================== - - -Downloading models -****************** - -Most users will use pre-trained MHCflurry models that we release. -These models are distributed separately from the pip package and may -be downloaded with the mhcflurry-downloads tool: - - $ mhcflurry-downloads fetch models_class1 - -Files downloaded with mhcflurry-downloads are stored in a platform- -specific directory. To get the path to downloaded data, you can use: - - $ mhcflurry-downloads path models_class1 - /Users/tim/Library/Application Support/mhcflurry/4/1.0.0/models_class1/ - -We also release a few other “downloads,” such as curated training data -and some experimental models. To see what’s available and what you -have downloaded, run: - - $ mhcflurry-downloads info - Environment variables - MHCFLURRY_DATA_DIR [unset or empty] - MHCFLURRY_DOWNLOADS_CURRENT_RELEASE [unset or empty] - MHCFLURRY_DOWNLOADS_DIR [unset or empty] - - Configuration - current release = 1.0.0 - downloads dir = '/Users/tim/Library/Application Support/mhcflurry/4/1.0.0' [exists] - - DOWNLOAD NAME DOWNLOADED? DEFAULT? URL - models_class1 YES YES http://github.com/hammerlab/mhcflurry/releases/download/pre-1.0/models_class1.tar.bz2 - models_class1_experiments1 NO NO http://github.com/hammerlab/mhcflurry/releases/download/pre-1.0/models_class1_experiments1.tar.bz2 - cross_validation_class1 YES NO http://github.com/hammerlab/mhcflurry/releases/download/pre-1.0/cross_validation_class1.tar.bz2 - data_iedb NO NO https://github.com/hammerlab/mhcflurry/releases/download/pre-1.0/data_iedb.tar.bz2 - data_kim2014 NO NO http://github.com/hammerlab/mhcflurry/releases/download/0.9.1/data_kim2014.tar.bz2 - data_curated YES YES https://github.com/hammerlab/mhcflurry/releases/download/pre-1.0/data_curated.tar.bz2 - -Note: The code we use for *generating* the downloads is in the - "downloads_generation" directory in the repository. - - -Generating predictions -********************** - -The mhcflurry-predict command generates predictions from the command- -line. By default it will use the pre-trained models you downloaded -above; other models can be used by specifying the "--models" argument. - -Running: - - $ mhcflurry-predict - --alleles HLA-A0201 HLA-A0301 - --peptides SIINFEKL SIINFEKD SIINFEKQ - --out /tmp/predictions.csv - Wrote: /tmp/predictions.csv - -results in a file like this: - - $ cat /tmp/predictions.csv - allele,peptide,mhcflurry_prediction,mhcflurry_prediction_low,mhcflurry_prediction_high,mhcflurry_prediction_percentile - HLA-A0201,SIINFEKL,4899.04784343,2767.76365395,7269.68364294,6.5097875 - HLA-A0201,SIINFEKD,21050.420243,16834.6585914,24129.0460917,34.297175 - HLA-A0201,SIINFEKQ,21048.4726578,16736.5612549,24111.0131144,34.297175 - HLA-A0301,SIINFEKL,28227.2989092,24826.3079098,32714.285974,33.9512125 - HLA-A0301,SIINFEKD,30816.7212184,27685.5084708,36037.3259046,41.225775 - HLA-A0301,SIINFEKQ,24183.0210465,19346.154182,32263.7124753,24.8109625 - -The predictions are given as affinities (KD) in nM in the -"mhcflurry_prediction" column. The other fields give the 5-95 -percentile predictions across the models in the ensemble and the -quantile of the affinity prediction among a large number of random -peptides tested on that allele. - -The predictions shown above were generated with MHCflurry 1.0.0. -Different versions of MHCflurry can give considerably different -results. Even on the same version, exact predictions may vary (up to -about 1 nM) depending on the Keras backend and other details. - -In most cases you’ll want to specify the input as a CSV file instead -of passing peptides and alleles as commandline arguments. See -mhcflurry-predict docs. - - -Fitting your own models -*********************** - -The mhcflurry-class1-train-allele-specific-models command is used to -fit models to training data. The models we release with MHCflurry are -trained with a command like: - - $ mhcflurry-class1-train-allele-specific-models \ - --data TRAINING_DATA.csv \ - --hyperparameters hyperparameters.yaml \ - --percent-rank-calibration-num-peptides-per-length 1000000 \ - --min-measurements-per-allele 75 \ - --out-models-dir models - -MHCflurry predictors are serialized to disk as many files in a -directory. The command above will write the models to the output -directory specified by the "--out-models-dir" argument. This directory -has files like: - - manifest.csv - percent_ranks.csv - weights_BOLA-6*13:01-0-1e6e7c0610ac68f8.npz - ... - weights_PATR-B*24:01-0-e12e0ee723833110.npz - weights_PATR-B*24:01-0-ec4a36529321d868.npz - weights_PATR-B*24:01-0-fd5a340098d3a9f4.npz - -The "manifest.csv" file gives metadata for all the models used in the -predictor. There will be a "weights_..." file for each model giving -its weights (the parameters for the neural network). The -"percent_ranks.csv" stores a histogram of model predictions for each -allele over a large number of random peptides. It is used for -generating the percent ranks at prediction time. - -To call mhcflurry-class1-train-allele-specific-models you’ll need some -training data. The data we use for our released predictors can be -downloaded with mhcflurry-downloads: - - $ mhcflurry-downloads fetch data_curated - -It looks like this: - - $ bzcat "$(mhcflurry-downloads path data_curated)/curated_training_data.csv.bz2" | head -n 3 - allele,peptide,measurement_value,measurement_type,measurement_source,original_allele - BoLA-1*21:01,AENDTLVVSV,7817.0,quantitative,Barlow - purified MHC/competitive/fluorescence,BoLA-1*02101 - BoLA-1*21:01,NQFNGGCLLV,1086.0,quantitative,Barlow - purified MHC/direct/fluorescence,BoLA-1*02101 - - -Scanning protein sequences for predicted epitopes -************************************************* - -The mhctools package provides support for scanning protein sequences -to find predicted epitopes. It supports MHCflurry as well as other -binding predictors. Here is an example. - -First, install "mhctools" if it is not already installed: - - $ pip install mhctools - -We’ll generate predictions across "example.fasta", a FASTA file with -two short sequences: - - >protein1 - MDSKGSSQKGSRLLLLLVVSNLLLCQGVVSTPVCPNGPGNCQV - EMFNEFDKRYAQGKGFITMALNSCHTSSLPTPEDKEQAQQTHH - >protein2 - VTEVRGMKGAPDAILSRAIEIEEENKRLLEGMEMIFGQVIPGA - ARYSAFYNLLHCLRRDSSKIDTYLKLLNCRIIYNNNC - -Here’s the "mhctools" invocation. See "mhctools -h" for more -information. - - $ mhctools - --mhc-predictor mhcflurry - --input-fasta-file example.fasta - --mhc-alleles A02:01,A03:01 - --mhc-peptide-lengths 8,9,10,11 - --extract-subsequences - --output-csv /tmp/subsequence_predictions.csv - 2017-12-22 01:12:44,974 - mhctools.cli.args - INFO - Building MHC binding prediction type for alleles ['HLA-A*02:01', 'HLA-A*03:01'] and epitope lengths [8, 9, 10, 11] - 2017-12-22 01:12:48,868 - mhctools.mhcflurry - INFO - BindingPrediction(peptide='AARYSAFY', allele='HLA-A*03:01', affinity=5744.3443, percentile_rank=None, source_sequence_name=None, offset=0, prediction_method_name='mhcflurry') - ... - - [1192 rows x 8 columns] - -This will write a file giving predictions for all subsequences of the -specified lengths: - - $ head -n 3 /tmp/subsequence_predictions.csv - ,source_sequence_name,offset,peptide,allele,affinity,percentile_rank,prediction_method_name,length - 0,protein2,42,AARYSAFY,HLA-A*03:01,5744.3442744,,mhcflurry,8 - 1,protein2,42,AARYSAFYN,HLA-A*03:01,10576.5364408,,mhcflurry,9 - - -Python library tutorial -======================= - - -Predicting -********** - -The MHCflurry Python API exposes additional options and features -beyond those supported by the commandline tools. This tutorial gives a -basic overview of the most important functionality. See the API -Documentation for further details. - -The "Class1AffinityPredictor" class is the primary user-facing -interface. Use the "load" static method to load a trained predictor -from disk. With no arguments this method will load the predictor -released with MHCflurry (see Downloading models). If you pass a path -to a models directory, then it will load that predictor instead. - - >>> from mhcflurry import Class1AffinityPredictor - >>> predictor = Class1AffinityPredictor.load() - >>> predictor.supported_alleles[:10] - ['BoLA-6*13:01', 'Eqca-1*01:01', 'H-2-Db', 'H-2-Dd', 'H-2-Kb', 'H-2-Kd', 'H-2-Kk', 'H-2-Ld', 'HLA-A*01:01', 'HLA-A*02:01'] - -With a predictor loaded we can now generate some binding predictions: - - >>> predictor.predict(allele="HLA-A0201", peptides=["SIINFEKL", "SIINFEQL"]) - /Users/tim/miniconda3/envs/py2k/lib/python2.7/site-packages/h5py/__init__.py:34: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 80 - from ._conv import register_converters as _register_converters - /Users/tim/miniconda3/envs/py2k/lib/python2.7/site-packages/h5py/__init__.py:43: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 80 - from . import h5a, h5d, h5ds, h5f, h5fd, h5g, h5r, h5s, h5t, h5p, h5z - /Users/tim/miniconda3/envs/py2k/lib/python2.7/site-packages/h5py/_hl/group.py:21: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 80 - from .. import h5g, h5i, h5o, h5r, h5t, h5l, h5p - Using TensorFlow backend. - array([ 4899.04784343, 5685.25682682]) - -Note: MHCflurry normalizes allele names using the mhcnames package. - Names like "HLA-A0201" or "A*02:01" will be normalized to - "HLA-A*02:01", so most naming conventions can be used with methods - such as "predict". - -For more detailed results, we can use "predict_to_dataframe". - - >>> predictor.predict_to_dataframe(allele="HLA-A0201", peptides=["SIINFEKL", "SIINFEQL"]) - allele peptide prediction prediction_low prediction_high \ - 0 HLA-A0201 SIINFEKL 4899.047843 2767.763654 7269.683643 - 1 HLA-A0201 SIINFEQL 5685.256827 3815.923563 7476.714466 - - prediction_percentile - 0 6.509787 - 1 7.436687 - -Instead of a single allele and multiple peptides, we may need -predictions for allele/peptide pairs. We can predict across pairs by -specifying the "alleles" argument instead of "allele". The list of -alleles must be the same length as the list of peptides (i.e. it is -predicting over pairs, *not* taking the cross product). - - >>> predictor.predict(alleles=["HLA-A0201", "HLA-B*57:01"], peptides=["SIINFEKL", "SIINFEQL"]) - array([ 4899.04794216, 26704.22011499]) - - -Training -******** - -Let’s fit our own MHCflurry predictor. First we need some training -data. If you haven’t already, run this in a shell to download the -MHCflurry training data: - - $ mhcflurry-downloads fetch data_curated - -We can get the path to this data from Python using -"mhcflurry.downloads.get_path": - - >>> from mhcflurry.downloads import get_path - >>> data_path = get_path("data_curated", "curated_training_data.csv.bz2") - >>> data_path - '/Users/tim/Library/Application Support/mhcflurry/4/1.0.0/data_curated/curated_training_data.csv.bz2' - -Now let’s load it with pandas and filter to reasonably-sized peptides: - - >>> import pandas - >>> df = pandas.read_csv(data_path) - >>> df = df.loc[(df.peptide.str.len() >= 8) & (df.peptide.str.len() <= 15)] - >>> df.head(5) - allele peptide measurement_value measurement_type \ - 0 BoLA-1*21:01 AENDTLVVSV 7817.0 quantitative - 1 BoLA-1*21:01 NQFNGGCLLV 1086.0 quantitative - 2 BoLA-2*08:01 AAHCIHAEW 21.0 quantitative - 3 BoLA-2*08:01 AAKHMSNTY 1299.0 quantitative - 4 BoLA-2*08:01 DSYAYMRNGW 2.0 quantitative - - measurement_source original_allele - 0 Barlow - purified MHC/competitive/fluorescence BoLA-1*02101 - 1 Barlow - purified MHC/direct/fluorescence BoLA-1*02101 - 2 Barlow - purified MHC/direct/fluorescence BoLA-2*00801 - 3 Barlow - purified MHC/direct/fluorescence BoLA-2*00801 - 4 Barlow - purified MHC/direct/fluorescence BoLA-2*00801 - -We’ll make an untrained "Class1AffinityPredictor" and then call -"fit_allele_specific_predictors" to fit some models. - - >>> new_predictor = Class1AffinityPredictor() - >>> single_allele_train_data = df.loc[df.allele == "HLA-B*57:01"].sample(100) - >>> new_predictor.fit_allele_specific_predictors( - ... n_models=1, - ... architecture_hyperparameters={ - ... "layer_sizes": [16], - ... "max_epochs": 5, - ... "random_negative_constant": 5, - ... }, - ... peptides=single_allele_train_data.peptide.values, - ... affinities=single_allele_train_data.measurement_value.values, - ... allele="HLA-B*57:01") - Train on 112 samples, validate on 28 samples - Epoch 1/1 - - 112/112 [==============================] - 0s 3ms/step - loss: 0.3730 - val_loss: 0.3472 - Epoch 0 / 5: loss=0.373015. Min val loss (None) at epoch None - Train on 112 samples, validate on 28 samples - Epoch 1/1 - - 112/112 [==============================] - 0s 38us/step - loss: 0.3508 - val_loss: 0.3345 - Train on 112 samples, validate on 28 samples - Epoch 1/1 - - 112/112 [==============================] - 0s 37us/step - loss: 0.3375 - val_loss: 0.3218 - Train on 112 samples, validate on 28 samples - Epoch 1/1 - - 112/112 [==============================] - 0s 36us/step - loss: 0.3227 - val_loss: 0.3092 - Train on 112 samples, validate on 28 samples - Epoch 1/1 - - 112/112 [==============================] - 0s 37us/step - loss: 0.3104 - val_loss: 0.2970 - [<mhcflurry.class1_neural_network.Class1NeuralNetwork object at 0x11e28ad10>] - -The "fit_allele_specific_predictors" method can be called any number -of times on the same instance to build up ensembles of models across -alleles. The "architecture_hyperparameters" we specified are for -demonstration purposes; to fit real models you would usually train for -more epochs. - -Now we can generate predictions: - - >>> new_predictor.predict(["SYNPEPII"], allele="HLA-B*57:01") - array([ 610.30706541]) - -We can save our predictor to the specified directory on disk by -running: - - >>> new_predictor.save("/tmp/new-predictor") - -and restore it: - - >>> new_predictor2 = Class1AffinityPredictor.load("/tmp/new-predictor") - >>> new_predictor2.supported_alleles - ['HLA-B*57:01'] - - -Lower level interface -********************* - -The high-level "Class1AffinityPredictor" delegates to low-level -"Class1NeuralNetwork" objects, each of which represents a single -neural network. The purpose of "Class1AffinityPredictor" is to -implement several important features: - -ensembles - More than one neural network can be used to generate each - prediction. The predictions returned to the user are the geometric - mean of the individual model predictions. This gives higher - accuracy in most situations - -multiple alleles - A "Class1NeuralNetwork" generates predictions for only a single - allele. The "Class1AffinityPredictor" maps alleles to the relevant - "Class1NeuralNetwork" instances - -serialization - Loading and saving predictors is implemented in - "Class1AffinityPredictor". - -Sometimes it’s easiest to work directly with "Class1NeuralNetwork". -Here is a simple example of doing so: - - >>> from mhcflurry import Class1NeuralNetwork - >>> network = Class1NeuralNetwork() - >>> network.fit( - ... single_allele_train_data.peptide.values, - ... single_allele_train_data.measurement_value.values, - ... verbose=0) - Epoch 0 / 500: loss=0.533378. Min val loss (None) at epoch None - Early stopping at epoch 124 / 500: loss=0.0115427. Min val loss (0.0719302743673) at epoch 113 - >>> network.predict(["SIINFEKLL"]) - array([ 23004.58985458]) - - -Supported alleles and peptide lengths -===================================== - -Models released with the current version of MHCflurry (1.0.0) support -peptides of length 8-15 and the following 124 alleles: - - BoLA-6*13:01, Eqca-1*01:01, H-2-Db, H-2-Dd, H-2-Kb, H-2-Kd, H-2-Kk, - H-2-Ld, HLA-A*01:01, HLA-A*02:01, HLA-A*02:02, HLA-A*02:03, - HLA-A*02:05, HLA-A*02:06, HLA-A*02:07, HLA-A*02:11, HLA-A*02:12, - HLA-A*02:16, HLA-A*02:17, HLA-A*02:19, HLA-A*02:50, HLA-A*03:01, - HLA-A*11:01, HLA-A*23:01, HLA-A*24:01, HLA-A*24:02, HLA-A*24:03, - HLA-A*25:01, HLA-A*26:01, HLA-A*26:02, HLA-A*26:03, HLA-A*29:02, - HLA-A*30:01, HLA-A*30:02, HLA-A*31:01, HLA-A*32:01, HLA-A*32:07, - HLA-A*33:01, HLA-A*66:01, HLA-A*68:01, HLA-A*68:02, HLA-A*68:23, - HLA-A*69:01, HLA-A*80:01, HLA-B*07:01, HLA-B*07:02, HLA-B*08:01, - HLA-B*08:02, HLA-B*08:03, HLA-B*14:02, HLA-B*15:01, HLA-B*15:02, - HLA-B*15:03, HLA-B*15:09, HLA-B*15:17, HLA-B*15:42, HLA-B*18:01, - HLA-B*27:01, HLA-B*27:03, HLA-B*27:04, HLA-B*27:05, HLA-B*27:06, - HLA-B*27:20, HLA-B*35:01, HLA-B*35:03, HLA-B*35:08, HLA-B*37:01, - HLA-B*38:01, HLA-B*39:01, HLA-B*40:01, HLA-B*40:02, HLA-B*42:01, - HLA-B*44:01, HLA-B*44:02, HLA-B*44:03, HLA-B*45:01, HLA-B*45:06, - HLA-B*46:01, HLA-B*48:01, HLA-B*51:01, HLA-B*53:01, HLA-B*54:01, - HLA-B*57:01, HLA-B*58:01, HLA-B*73:01, HLA-B*83:01, HLA-C*03:03, - HLA-C*03:04, HLA-C*04:01, HLA-C*05:01, HLA-C*06:02, HLA-C*07:01, - HLA-C*07:02, HLA-C*08:02, HLA-C*12:03, HLA-C*14:02, HLA-C*15:02, - Mamu-A*01:01, Mamu-A*02:01, Mamu-A*02:0102, Mamu-A*07:01, - Mamu-A*07:0103, Mamu-A*11:01, Mamu-A*22:01, Mamu-A*26:01, - Mamu-B*01:01, Mamu-B*03:01, Mamu-B*08:01, Mamu-B*10:01, Mamu-B*17:01, - Mamu-B*17:04, Mamu-B*39:01, Mamu-B*52:01, Mamu-B*66:01, Mamu-B*83:01, - Mamu-B*87:01, Patr-A*01:01, Patr-A*03:01, Patr-A*04:01, Patr-A*07:01, - Patr-A*09:01, Patr-B*01:01, Patr-B*13:01, Patr-B*24:01 - - -mhcflurry -========= - -Open source neural network models for peptide-MHC binding affinity -prediction - -The adaptive immune system depends on the presentation of protein -fragments by MHC molecules. Machine learning models of this -interaction are used in studies of infectious diseases, autoimmune -diseases, vaccine development, and cancer immunotherapy. - -MHCflurry supports Class I peptide/MHC binding affinity prediction -using ensembles of allele-specific models. You can fit MHCflurry -models to your own data or download models that we fit to data from -IEDB and Kim 2014. Our combined dataset is available for download -here. - -Pan-allelic prediction is supported in principle but is not yet -performing accurately. Infrastructure for modeling other aspects of -antigen processing is also implemented but experimental. - -If you find MHCflurry useful in your research please cite: - - O’Donnell, T. et al., 2017. MHCflurry: open-source class I MHC - binding affinity prediction. bioRxiv. Available at: - http://www.biorxiv.org/content/early/2017/08/09/174243. - - -Setup (pip) -*********** - -Install the package: - - pip install mhcflurry - -Then download our datasets and trained models: - - mhcflurry-downloads fetch - -From a checkout you can run the unit tests with: - - nosetests . - -The MHCflurry predictors are implemented in Python using keras. - -MHCflurry works with both the tensorflow and theano keras backends. -The tensorflow backend gives faster model-loading time but is -undergoing more rapid development and sometimes hits issues. If you -encounter tensorflow errors running MHCflurry, try setting this -environment variable to switch to the theano backend: - - export KERAS_BACKEND=theano - -You may also needs to "pip install theano". - - -Setup (conda) -************* - -You can alternatively get up and running with a conda environment as -follows: - - conda create -q -n mhcflurry-env python=3.6 'tensorflow>=1.1.0' - source activate mhcflurry-env - -Then continue as above: - - pip install mhcflurry - mhcflurry-downloads fetch - -If you wish to test your installation, you can install "nose" and run -the tests from a checkout: - - pip install nose - nosetests . - - -Making predictions from the command-line -**************************************** - - $ mhcflurry-predict --alleles HLA-A0201 HLA-A0301 --peptides SIINFEKL SIINFEKD SIINFEKQ - allele,peptide,mhcflurry_prediction,mhcflurry_prediction_low,mhcflurry_prediction_high - HLA-A0201,SIINFEKL,5326.541919062165,3757.86675352994,7461.37693353508 - HLA-A0201,SIINFEKD,18763.70298522213,13140.82000240037,23269.82139560844 - HLA-A0201,SIINFEKQ,18620.10057358322,13096.425874678192,23223.148184869413 - HLA-A0301,SIINFEKL,24481.726678691946,21035.52779725433,27245.371837497867 - HLA-A0301,SIINFEKD,24687.529360239587,21582.590014592537,27749.39869616437 - HLA-A0301,SIINFEKQ,25923.062203902562,23522.5793450799,28079.456657427705 - -The predictions returned are affinities (KD) in nM. The -"prediction_low" and "prediction_high" fields give the 5-95 percentile -predictions across the models in the ensemble. The predictions above -were generated with MHCflurry 0.9.2. Your exact predictions may vary -slightly from these (up to about 1 nM) depending on the Keras backend -in use and other numerical details. Different versions of MHCflurry -can of course give results considerably different from these. - -You can also specify the input and output as CSV files. Run -"mhcflurry-predict -h" for details. - - -Making predictions from Python -****************************** - - >>> from mhcflurry import Class1AffinityPredictor - >>> predictor = Class1AffinityPredictor.load() - >>> predictor.predict_to_dataframe(peptides=['SIINFEKL'], allele='A0201') - - - allele peptide prediction prediction_low prediction_high - A0201 SIINFEKL 6029.084473 4474.103253 7771.297702 - -See the class1_allele_specific_models.ipynb notebook for an overview -of the Python API, including fitting your own predictors. - - -Scanning protein sequences for predicted epitopes -************************************************* - -The mhctools package provides support for scanning protein sequences -to find predicted epitopes. It supports MHCflurry as well as other -binding predictors. Here is an example: - - # First install mhctools if needed: - pip install mhctools - - # Now generate predictions for protein sequences in FASTA format: - mhctools \ - --mhc-predictor mhcflurry \ - --input-fasta-file INPUT.fasta \ - --mhc-alleles A02:01,A03:01 \ - --mhc-peptide-lengths 8,9,10,11 \ - --extract-subsequences \ - --out RESULT.csv - - -Details on the downloadable models -********************************** - - -Environment variables -********************* - -The path where MHCflurry looks for model weights and data can be set -with the "MHCFLURRY_DOWNLOADS_DIR" environment variable. This -directory should contain subdirectories like “models_class1”. diff --git a/docs/package_readme/readme.template.rst b/docs/package_readme/readme.template.rst deleted file mode 100644 index 52d542e4..00000000 --- a/docs/package_readme/readme.template.rst +++ /dev/null @@ -1,183 +0,0 @@ -.. include:: /intro.rst - :start-line: 3 - -.. include:: /commandline_tutorial.rst - -.. include:: /python_tutorial.rst - -.. include:: /models_supported_alleles.rst - -mhcflurry -========= - -Open source neural network models for peptide-MHC binding affinity -prediction - -The `adaptive immune -system <https://en.wikipedia.org/wiki/Adaptive_immune_system>`__ depends -on the presentation of protein fragments by -`MHC <https://en.wikipedia.org/wiki/Major_histocompatibility_complex>`__ -molecules. Machine learning models of this interaction are used in -studies of infectious diseases, autoimmune diseases, vaccine -development, and cancer immunotherapy. - -MHCflurry supports Class I peptide/MHC binding affinity prediction using -ensembles of allele-specific models. You can fit MHCflurry models to -your own data or download models that we fit to data from -`IEDB <http://www.iedb.org/home_v3.php>`__ and `Kim -2014 <http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-241>`__. -Our combined dataset is available for download -`here <https://github.com/hammerlab/mhcflurry/releases/download/pre-1.0.0-alpha/data_curated.tar.bz2>`__. - -Pan-allelic prediction is supported in principle but is not yet -performing accurately. Infrastructure for modeling other aspects of -antigen processing is also implemented but experimental. - -If you find MHCflurry useful in your research please cite: - - O'Donnell, T. et al., 2017. MHCflurry: open-source class I MHC - binding affinity prediction. bioRxiv. Available at: - http://www.biorxiv.org/content/early/2017/08/09/174243. - -Setup (pip) ------------ - -Install the package: - -:: - - pip install mhcflurry - -Then download our datasets and trained models: - -:: - - mhcflurry-downloads fetch - -From a checkout you can run the unit tests with: - -:: - - nosetests . - -The MHCflurry predictors are implemented in Python using -`keras <https://keras.io>`__. - -MHCflurry works with both the tensorflow and theano keras backends. The -tensorflow backend gives faster model-loading time but is undergoing -more rapid development and sometimes hits issues. If you encounter -tensorflow errors running MHCflurry, try setting this environment -variable to switch to the theano backend: - -:: - - export KERAS_BACKEND=theano - -You may also needs to ``pip install theano``. - -Setup (conda) -------------- - -You can alternatively get up and running with a -`conda <https://conda.io/docs/>`__ environment as follows: - -:: - - conda create -q -n mhcflurry-env python=3.6 'tensorflow>=1.1.0' - source activate mhcflurry-env - -Then continue as above: - -:: - - pip install mhcflurry - mhcflurry-downloads fetch - -If you wish to test your installation, you can install ``nose`` and run -the tests from a checkout: - -:: - - pip install nose - nosetests . - -Making predictions from the command-line ----------------------------------------- - -.. code:: shell - - $ mhcflurry-predict --alleles HLA-A0201 HLA-A0301 --peptides SIINFEKL SIINFEKD SIINFEKQ - allele,peptide,mhcflurry_prediction,mhcflurry_prediction_low,mhcflurry_prediction_high - HLA-A0201,SIINFEKL,5326.541919062165,3757.86675352994,7461.37693353508 - HLA-A0201,SIINFEKD,18763.70298522213,13140.82000240037,23269.82139560844 - HLA-A0201,SIINFEKQ,18620.10057358322,13096.425874678192,23223.148184869413 - HLA-A0301,SIINFEKL,24481.726678691946,21035.52779725433,27245.371837497867 - HLA-A0301,SIINFEKD,24687.529360239587,21582.590014592537,27749.39869616437 - HLA-A0301,SIINFEKQ,25923.062203902562,23522.5793450799,28079.456657427705 - -The predictions returned are affinities (KD) in nM. The -``prediction_low`` and ``prediction_high`` fields give the 5-95 -percentile predictions across the models in the ensemble. The -predictions above were generated with MHCflurry 0.9.2. Your exact -predictions may vary slightly from these (up to about 1 nM) depending on -the Keras backend in use and other numerical details. Different versions -of MHCflurry can of course give results considerably different from -these. - -You can also specify the input and output as CSV files. Run -``mhcflurry-predict -h`` for details. - -Making predictions from Python ------------------------------- - -.. code:: python - - >>> from mhcflurry import Class1AffinityPredictor - >>> predictor = Class1AffinityPredictor.load() - >>> predictor.predict_to_dataframe(peptides=['SIINFEKL'], allele='A0201') - - - allele peptide prediction prediction_low prediction_high - A0201 SIINFEKL 6029.084473 4474.103253 7771.297702 - -See the -`class1_allele_specific_models.ipynb <https://github.com/hammerlab/mhcflurry/blob/master/examples/class1_allele_specific_models.ipynb>`__ -notebook for an overview of the Python API, including fitting your own -predictors. - -Scanning protein sequences for predicted epitopes -------------------------------------------------- - -The `mhctools <https://github.com/hammerlab/mhctools>`__ package -provides support for scanning protein sequences to find predicted -epitopes. It supports MHCflurry as well as other binding predictors. -Here is an example: - -:: - - # First install mhctools if needed: - pip install mhctools - - # Now generate predictions for protein sequences in FASTA format: - mhctools \ - --mhc-predictor mhcflurry \ - --input-fasta-file INPUT.fasta \ - --mhc-alleles A02:01,A03:01 \ - --mhc-peptide-lengths 8,9,10,11 \ - --extract-subsequences \ - --out RESULT.csv - -Details on the downloadable models ----------------------------------- - -Environment variables ---------------------- - -The path where MHCflurry looks for model weights and data can be set -with the ``MHCFLURRY_DOWNLOADS_DIR`` environment variable. This -directory should contain subdirectories like "models_class1". - -.. |Build Status| image:: https://travis-ci.org/hammerlab/mhcflurry.svg?branch=master - :target: https://travis-ci.org/hammerlab/mhcflurry -.. |Coverage Status| image:: https://coveralls.io/repos/github/hammerlab/mhcflurry/badge.svg?branch=master - :target: https://coveralls.io/github/hammerlab/mhcflurry?branch=master diff --git a/docs/package_readme/readme_header.rst b/docs/package_readme/readme_header.rst deleted file mode 100644 index dcffcad6..00000000 --- a/docs/package_readme/readme_header.rst +++ /dev/null @@ -1,13 +0,0 @@ -:orphan: - -.. image:: https://travis-ci.org/hammerlab/mhcflurry.svg?branch=master - :target: https://travis-ci.org/hammerlab/mhcflurry - -.. image:: https://coveralls.io/repos/github/hammerlab/mhcflurry/badge.svg?branch=master - :target: https://coveralls.io/github/hammerlab/mhcflurry - -mhcflurry -=================== - -Open source neural network models for peptide-MHC binding affinity prediction - -- GitLab