Commit 95833a79 authored by Tim O'Donnell

Improve docs

parent e3fa457d
@@ -10,9 +10,9 @@ Open source peptide/MHC I binding affinity prediction
 Introduction and setup
 ----------------------
-MHCflurry is a peptide/MHC I binding affinity prediction package written in Python. It aims to provide state of the art accuracy in a documented, fast, and open source implementation.
+MHCflurry is a peptide/MHC I binding affinity prediction package written in Python. It aims to provide state of the art accuracy with a documented, fast, and open source implementation.
-MHCflurry users may download trained predictors fit to affinity measurements deposited in IEDB. The complete workflow to generate these models is available in the "downloads\_generation/models\_class1" directory in the repository. It is also easy for users with their own data to fit their own models.
+MHCflurry users may download trained predictors fit to affinity measurements deposited in IEDB. See the "downloads\_generation/models\_class1" directory in the repository for the workflow used to train these predictors. It is also easy for users with their own data to fit their own models.
 Currently only allele-specific prediction is implemented, in which separate models are trained for each allele. The released models therefore support a fixed set of common class I alleles for which sufficient published training data is available.
......
@@ -2,13 +2,13 @@ Introduction and setup
 =======================
 MHCflurry is a peptide/MHC I binding affinity prediction package written in
-Python. It aims to provide state of the art accuracy in a documented, fast, and
+Python. It aims to provide state of the art accuracy with a documented, fast, and
 open source implementation.
 MHCflurry users may download trained predictors fit to affinity measurements
-deposited in IEDB. The complete workflow to generate these models
-is available in the "downloads_generation/models_class1" directory in the
-repository. It is also easy for users with their own data to fit their own models.
+deposited in IEDB. See the "downloads_generation/models_class1" directory in the
+repository for the workflow used to train these predictors. It is also easy
+for users with their own data to fit their own models.
 Currently only allele-specific prediction is implemented, in which separate models
 are trained for each allele. The released models therefore support a fixed set of common
......
-# Copyright (c) 2016. Mount Sinai School of Medicine
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+"""
+Functions for encoding fixed length sequences of amino acids into various
+vector representations, such as one-hot and BLOSUM62.
+"""
 from __future__ import (
     print_function,
@@ -118,8 +109,12 @@ def vector_encoding_length(name):
 def index_encoding(sequences, letter_to_index_dict):
     """
-    Given a sequence of n strings all of length k, return a k * n array where
-    the (i, j)th element is letter_to_index_dict[sequence[i][j]].
+    Encode a sequence of same-length strings to a matrix of integers of the
+    same shape. The map from characters to integers is given by
+    `letter_to_index_dict`.
+    Given a sequence of `n` strings all of length `k`, return a `k * n` array where
+    the (`i`, `j`)th element is `letter_to_index_dict[sequence[i][j]]`.
     Parameters
     ----------
@@ -128,7 +123,7 @@ def index_encoding(sequences, letter_to_index_dict):
     Returns
     -------
-    numpy.array of integers with shape (k, n)
+    numpy.array of integers with shape (`k`, `n`)
     """
     df = pandas.DataFrame(iter(s) for s in sequences)
     result = df.replace(letter_to_index_dict)
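
To make the `index_encoding` docstring concrete, here is a minimal sketch of the same idea using NumPy only. The name `index_encoding_sketch` and the three-letter `toy_alphabet` are hypothetical and are not part of MHCflurry. The sketch returns one row per sequence, that is shape (`n`, `k`); this matches the `n` x `k` input that `fixed_vectors_encoding` documents and is what the `pandas.DataFrame` construction shown above appears to produce, although the docstring describes the result as `k * n`.

```python
import numpy as np


def index_encoding_sketch(sequences, letter_to_index_dict):
    """Map n same-length strings to an (n, k) array of integer indices.

    Hypothetical helper for illustration; not the MHCflurry implementation.
    """
    sequences = list(sequences)
    # All sequences must share one length so the result is rectangular.
    lengths = {len(s) for s in sequences}
    if len(lengths) > 1:
        raise ValueError("All sequences must have the same length")
    # Row i holds the indices of the characters of sequences[i].
    return np.array(
        [[letter_to_index_dict[char] for char in s] for s in sequences],
        dtype=int,
    )


# Toy alphabet, assumed for the example only.
toy_alphabet = {"A": 0, "C": 1, "D": 2}
encoded = index_encoding_sketch(["ACD", "DDA"], toy_alphabet)
print(encoded.shape)  # (2, 3): two sequences, three positions each
```

Element (`i`, `j`) of `encoded` is `toy_alphabet[sequences[i][j]]`, which is the relationship the docstring states.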
@@ -137,19 +132,22 @@ def index_encoding(sequences, letter_to_index_dict):
 def fixed_vectors_encoding(index_encoded_sequences, letter_to_vector_df):
     """
-    Given a sequence of n strings all of length k, and a dataframe mapping each
-    character to an arbitrary vector, return a n * k * m array where
-    the (i, j)th element is letter_to_vector_df.loc[sequence[i][j]].
+    Given a `n` x `k` matrix of integers such as that returned by `index_encoding()` and
+    a dataframe mapping each index to an arbitrary vector, return a `n * k * m`
+    array where the (`i`, `j`)'th element is `letter_to_vector_df.iloc[sequence[i][j]]`.
+    The dataframe index and columns names are ignored here; the indexing is done
+    entirely by integer position in the dataframe.
     Parameters
     ----------
-    sequences : list of length n of strings of length k
+    index_encoded_sequences : `n` x `k` array of integers
-    letter_to_vector_df : pandas.DataFrame of shape (alphabet size, m)
-        The index of the dataframe should be amino acid characters.
+    letter_to_vector_df : pandas.DataFrame of shape (`alphabet size`, `m`)
     Returns
     -------
-    numpy.array of integers with shape (n, k, m)
+    numpy.array of integers with shape (`n`, `k`, `m`)
     """
     (num_sequences, sequence_length) = index_encoded_sequences.shape
     target_shape = (
......
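
The positional lookup described in the updated `fixed_vectors_encoding` docstring can be sketched with NumPy integer-array indexing. This is an illustration under stated assumptions, not the MHCflurry implementation: `fixed_vectors_encoding_sketch` is a hypothetical name, and a plain 2-D array stands in for `letter_to_vector_df`, which is reasonable only because the docstring says the dataframe's index and column names are ignored.

```python
import numpy as np


def fixed_vectors_encoding_sketch(index_encoded_sequences, letter_to_vector):
    """Replace each integer index with its row of `letter_to_vector`.

    Hypothetical helper for illustration; not the MHCflurry implementation.
    index_encoded_sequences : (n, k) array of integers
    letter_to_vector : (alphabet size, m) array; row position is the index
    Returns an (n, k, m) array.
    """
    indices = np.asarray(index_encoded_sequences)
    table = np.asarray(letter_to_vector)
    # Indexing a 2-D table with an (n, k) integer array selects one row of
    # length m per entry, giving shape (n, k, m).
    return table[indices]


# One-hot vectors for a three-letter toy alphabet (assumed for the example).
one_hot_table = np.eye(3, dtype=int)
indices = np.array([[0, 1, 2], [2, 2, 0]])
vectors = fixed_vectors_encoding_sketch(indices, one_hot_table)
print(vectors.shape)  # (2, 3, 3)
```

Here `vectors[i, j]` equals `one_hot_table[indices[i, j]]`, the array analogue of the `letter_to_vector_df.iloc[...]` lookup in the docstring. Passing `letter_to_vector_df.values` from a real dataframe would give the same positional behaviour.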