Commit a84ef111 authored by Tim O'Donnell

working on docs

parent a6875fad
......@@ -49,21 +49,19 @@ help:
@echo " coverage to run coverage check of the documentation (if enabled)"
@echo " dummy to check syntax errors of document sources"
# Added by Tim:
# Added by Tim
.PHONY: generate
generate:
sphinx-apidoc -M -f -o _build/ ../mhcflurry
mhcflurry-downloads fetch models_class1_pan
python generate_class1_pan.py --out-dir model-info
# Added by Tim:
.PHONY: readme
readme: text
rm -f package_readme/readme.generated.txt
cat package_readme/readme_header.rst \
_build/text/package_readme/readme.template.txt \
> package_readme/readme.generated.txt
chmod 444 package_readme/readme.generated.txt # read only
# Added by Tim
.PHONY: generate_model_info
generate_model_info:
sphinx-apidoc -M -f -o _build/ ../mhcflurry
mhcflurry-downloads fetch models_class1_pan
python generate_class1_pan.py --out-dir model-info
.PHONY: clean
clean:
......@@ -72,6 +70,10 @@ clean:
mv $(BUILDDIR)/html /tmp/html-bk
rm -rf $(BUILDDIR)/*
mv /tmp/html-bk $(BUILDDIR)/html
# Added by Tim
.PHONY: clean_model_info
clean_model_info:
rm -rf model-info
.PHONY: html
......
# MHCflurry documentation
Due to our use of `sphinxcontrib-autorun2` we unfortunately require Python 2.7
to build the docs. Python 3 is not supported.
To generate Sphinx documentation, from this directory run:
```
......
......@@ -8,24 +8,14 @@ See also the :ref:`tutorial <commandline_tutorial>`.
.. autoprogram:: mhcflurry.predict_command:parser
:prog: mhcflurry-predict
.. _mhcflurry-predict-scan:
.. autoprogram:: mhcflurry.predict_scan_command:parser
:prog: mhcflurry-predict-scan
.. _mhcflurry-downloads:
.. autoprogram:: mhcflurry.downloads_command:parser
:prog: mhcflurry-downloads
.. _mhcflurry-class1-train-allele-specific-models:
.. autoprogram:: mhcflurry.train_allele_specific_models_command:parser
:prog: mhcflurry-class1-train-allele-specific-models
.. _mhcflurry-calibrate-percentile-ranks:
.. autoprogram:: mhcflurry.calibrate_percentile_ranks_command:parser
:prog: mhcflurry-calibrate-percentile-ranks
.. _mhcflurry-class1-select-allele-specific-models:
.. autoprogram:: mhcflurry.select_allele_specific_models_command:parser
:prog: mhcflurry-class1-select-allele-specific-models
......@@ -14,12 +14,12 @@ are distributed separately from the pip package and may be downloaded with the
.. code-block:: shell
$ mhcflurry-downloads fetch models_class1
$ mhcflurry-downloads fetch models_class1_presentation
Files downloaded with :ref:`mhcflurry-downloads` are stored in a platform-specific
directory. To get the path to downloaded data, you can use:
.. command-output:: mhcflurry-downloads path models_class1
.. command-output:: mhcflurry-downloads path models_class1_presentation
:nostderr:
We also release a few other "downloads," such as curated training data and some
......@@ -28,6 +28,10 @@ experimental models. To see what's available and what you have downloaded, run:
.. command-output:: mhcflurry-downloads info
:nostderr:
Most users will only need ``models_class1_presentation``, however, as the
presentation predictor includes a peptide / MHC I binding affinity (BA) predictor
as well as an antigen processing (AP) predictor.
.. note::
The code we use for *generating* the downloads is in the
......@@ -37,8 +41,9 @@ experimental models. To see what's available and what you have downloaded, run:
Generating predictions
----------------------
The :ref:`mhcflurry-predict` command generates predictions from the command-line.
By default it will use the pre-trained models you downloaded above; other
The :ref:`mhcflurry-predict` command generates predictions for individual peptides
(as opposed to scanning protein sequences for epitopes).
By default it will use the pre-trained models you downloaded above. Other
models can be used by specifying the ``--models`` argument.
Running:
......@@ -68,6 +73,38 @@ on the Keras backend and other details.
In most cases you'll want to specify the input as a CSV file instead of passing
peptides and alleles as commandline arguments. See :ref:`mhcflurry-predict` docs.
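As a rough illustration of the CSV input mentioned above, the sketch below writes a two-column file using only the Python standard library. The column names ``allele`` and ``peptide`` are an assumption about what ``mhcflurry-predict`` expects by default, and the allele and peptide values are placeholders; check the :ref:`mhcflurry-predict` docs before relying on either.

```python
import csv
import io


def write_predict_input(rows, handle):
    """Write (allele, peptide) pairs as CSV with a header row.

    The column names are assumed, not taken from the MHCflurry source.
    """
    writer = csv.DictWriter(handle, fieldnames=["allele", "peptide"])
    writer.writeheader()
    for allele, peptide in rows:
        writer.writerow({"allele": allele, "peptide": peptide})


# Placeholder inputs for illustration only.
rows = [("HLA-A0201", "SIINFEKL"), ("HLA-A0301", "SIINFEKD")]

# An in-memory buffer stands in for a real file opened with newline="".
buffer = io.StringIO()
write_predict_input(rows, buffer)
csv_text = buffer.getvalue()
```

Writing to a real file works the same way: open it with ``newline=""`` and pass the handle instead of the buffer.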
Scanning protein sequences for predicted MHC I ligands
------------------------------------------------------
Starting in version 1.6.0, MHCflurry supports scanning proteins for MHC I binding
peptides using the ``mhcflurry-predict-scan`` command.
We'll generate predictions across ``example.fasta``, a FASTA file with two short
sequences:
.. literalinclude:: /example.fasta
Here's the ``mhctools`` invocation.
.. command-output::
mhctools
--mhc-predictor mhcflurry
--input-fasta-file example.fasta
--mhc-alleles A02:01,A03:01
--mhc-peptide-lengths 8,9,10,11
--extract-subsequences
--output-csv /tmp/subsequence_predictions.csv
:ellipsis: 2,-2
:nostderr:
This will write a file giving predictions for all subsequences of the specified lengths:
.. command-output::
head -n 3 /tmp/subsequence_predictions.csv
See the :ref:`mhcflurry-predict-scan` docs for more options.
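To make "all subsequences of the specified lengths" concrete, here is a minimal sketch of the sliding-window enumeration that a scan implies. It is not the code used by ``mhcflurry-predict-scan`` or ``mhctools``; the function name ``extract_subsequences`` and the example sequence are hypothetical.

```python
def extract_subsequences(sequence, lengths=(8, 9, 10, 11)):
    """Yield (start_offset, peptide) for every window of each length.

    Lengths longer than the sequence simply produce no windows.
    """
    for length in lengths:
        for start in range(len(sequence) - length + 1):
            yield start, sequence[start:start + length]


# Hypothetical 12-residue sequence, for illustration only.
protein = "ACDEFGHIKLMN"
peptides = list(extract_subsequences(protein))
```

A sequence of length ``n`` yields ``n - k + 1`` peptides for each length ``k``, so the number of predictions grows roughly linearly with protein length for a fixed set of peptide lengths.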
Fitting your own models
-----------------------
......@@ -115,42 +152,6 @@ It looks like this:
:nostderr:
Scanning protein sequences for predicted epitopes
-------------------------------------------------
The `mhctools <https://github.com/hammerlab/mhctools>`__ package
provides support for scanning protein sequences to find predicted
epitopes. It supports MHCflurry as well as other binding predictors.
Here is an example.
First, install ``mhctools`` if it is not already installed:
.. code-block:: shell
$ pip install mhctools
We'll generate predictions across ``example.fasta``, a FASTA file with two short
sequences:
.. literalinclude:: /example.fasta
Here's the ``mhctools`` invocation. See ``mhctools -h`` for more information.
.. command-output::
mhctools
--mhc-predictor mhcflurry
--input-fasta-file example.fasta
--mhc-alleles A02:01,A03:01
--mhc-peptide-lengths 8,9,10,11
--extract-subsequences
--output-csv /tmp/subsequence_predictions.csv
:ellipsis: 2,-2
:nostderr:
This will write a file giving predictions for all subsequences of the specified lengths:
.. command-output::
head -n 3 /tmp/subsequence_predictions.csv
Environment variables
......
......@@ -54,7 +54,7 @@ extensions = [
'sphinx.ext.viewcode',
'sphinx.ext.githubpages',
'numpydoc',
'sphinxcontrib.autorun2',
'sphinx_autorun',
'sphinxcontrib.programoutput',
'sphinxcontrib.autoprogram',
'sphinx.ext.githubpages',
......@@ -76,7 +76,7 @@ master_doc = 'index'
# General information about the project.
project = 'MHCflurry'
copyright = '2019, Timothy O\'Donnell'
copyright = 'Timothy O\'Donnell'
author = 'Timothy O\'Donnell'
# The version info for the project you're documenting, acts as replacement for
......
......@@ -12,36 +12,24 @@ from os import mkdir
import pandas
import logomaker
import tqdm
from matplotlib import pyplot
from mhcflurry.downloads import get_path
from mhcflurry.amino_acid import COMMON_AMINO_ACIDS
from mhcflurry.class1_affinity_predictor import Class1AffinityPredictor
AMINO_ACIDS = sorted(COMMON_AMINO_ACIDS)
parser = argparse.ArgumentParser(usage=__doc__)
parser.add_argument(
"--class1-models-dir-with-ms",
"--class1-models",
"--class1-models-dir",
metavar="DIR",
default=get_path(
"models_class1_pan", "models.combined", test_exists=False),
help="Class1 models. Default: %(default)s",
)
parser.add_argument(
"--class1-models-dir-no-ms",
metavar="DIR",
default=get_path(
"models_class1_pan", "models.no_mass_spec", test_exists=False),
help="Class1 models. Default: %(default)s",
)
parser.add_argument(
"--class1-models-dir-refined",
metavar="DIR",
default=get_path(
"models_class1_pan_refined", "models.affinity", test_exists=False),
help="Class1 refined models. Default: %(default)s",
)
parser.add_argument(
"--logo-cutoff",
default=0.01,
......@@ -84,6 +72,9 @@ parser.add_argument(
def model_info(models_dir):
allele_to_sequence = Class1AffinityPredictor.load(
models_dir).allele_to_sequence
length_distributions_df = pandas.read_csv(
join(models_dir, "length_distributions.csv.bz2"))
frequency_matrices_df = pandas.read_csv(
......@@ -104,10 +95,21 @@ def model_info(models_dir):
normalized_frequency_matrices.loc[:, AMINO_ACIDS] = (
normalized_frequency_matrices[AMINO_ACIDS] / distribution)
sequence_to_alleles = defaultdict(list)
for allele in normalized_frequency_matrices.allele.unique():
sequence = allele_to_sequence[allele]
sequence_to_alleles[sequence].append(allele)
allele_equivalance_classes = sorted([
sorted(equivalence_group)
for equivalence_group in sequence_to_alleles.values()
], key=lambda equivalence_group: equivalence_group[0])
return {
'length_distributions': length_distributions_df,
'normalized_frequency_matrices': normalized_frequency_matrices,
'observations_per_allele': observations_per_allele,
'allele_equivalance_classes': allele_equivalance_classes,
}
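The grouping added to ``model_info`` in this hunk can be tried in isolation. The sketch below repeats the same idea (alleles that map to an identical sequence land in one group, and groups are ordered by their first member) on toy data; the function name ``equivalence_classes`` and the allele and sequence values are made up for illustration.

```python
from collections import defaultdict


def equivalence_classes(allele_to_sequence):
    """Group alleles that share a sequence.

    Each group is sorted, and groups are ordered by their first allele.
    """
    sequence_to_alleles = defaultdict(list)
    for allele, sequence in allele_to_sequence.items():
        sequence_to_alleles[sequence].append(allele)
    return sorted(
        (sorted(group) for group in sequence_to_alleles.values()),
        key=lambda group: group[0],
    )


# Toy mapping: names and sequences are placeholders, not real data.
toy = {
    "allele-B1": "SEQ2",
    "allele-A2": "SEQ1",
    "allele-A1": "SEQ1",
}
classes = equivalence_classes(toy)
```

Unlike the diff, which iterates over the alleles present in the frequency matrices, this sketch groups every allele in the mapping it is given.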
......@@ -191,7 +193,7 @@ def go(argv):
mkdir(args.out_dir)
predictors = [
("combined", args.class1_models_dir_with_ms),
("combined", args.class1_models_dir),
]
info_per_predictor = OrderedDict()
alleles = set()
......@@ -224,7 +226,6 @@ def go(argv):
w(".. contents:: :local:", "")
def image(name):
if name is None:
return ""
......@@ -234,7 +235,7 @@ def go(argv):
if args.max_alleles:
alleles = alleles[:args.max_alleles]
for allele in alleles:
for allele in tqdm.tqdm(alleles):
w(allele, "-" * 80, "")
for (label, info) in info_per_predictor.items():
length_distribution = info["length_distributions"]
......
......@@ -5,7 +5,8 @@ MHCflurry is an open source package for peptide/MHC I binding affinity predictio
provides competitive accuracy with a fast and documented implementation.
You can download pre-trained MHCflurry models fit to affinity measurements
deposited in IEDB or train a MHCflurry predictor on your own data.
deposited in IEDB (and a few other sources)
or train a MHCflurry predictor on your own data.
Currently only allele-specific prediction is implemented, in which separate models
are trained for each allele. The released models therefore support a fixed set of common
......
sphinx
sphinxcontrib-autorun2
sphinxcontrib-autorun
sphinxcontrib-programoutput
sphinxcontrib-autoprogram
sphinx-rtd-theme
......@@ -9,3 +9,4 @@ mhctools
pydot
tabulate
logomaker
tqdm