diff --git a/docs/commandline_tutorial.rst b/docs/commandline_tutorial.rst
index a2a6d22e505c09c0bc39e569bfb4904af22c9c1c..e9118abde664fc2e71488adbccdd548e31f2d561 100644
--- a/docs/commandline_tutorial.rst
+++ b/docs/commandline_tutorial.rst
@@ -62,9 +62,9 @@ The binding affinity predictions are given as affinities (KD) in nM in the
 ``mhcflurry_affinity`` column. Lower values indicate stronger binders. A commonly-used
 threshold for peptides with a reasonable chance of being immunogenic is 500 nM.
 
-The ``mhcflurry_affinity_percentile`` gives the quantile of the affinity
-prediction among a large number of random peptides tested on that allele. Lower
-is stronger. Two percent is a commonly-used threshold.
+The ``mhcflurry_affinity_percentile`` gives the percentile of the affinity
+prediction among a large number of random peptides tested on that allele (range
+0-100). Lower is stronger. Two percent is a commonly-used threshold.
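+
+The 500 nM and two percent thresholds above can be applied with a quick
+post-processing step. A minimal sketch (``predictions.csv`` here stands in for
+whatever output file you wrote; the column is looked up by name so its position
+doesn't matter):
+
+.. code-block:: shell
+
+    $ awk -F, '
+        NR == 1 {for (i = 1; i <= NF; i++) if ($i == "mhcflurry_affinity") c = i}
+        NR == 1 || $c < 500' predictions.csv
+
+The same pattern works for ``mhcflurry_affinity_percentile`` with a threshold
+of 2 instead of 500.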
 
 The last two columns give the antigen processing and presentation scores,
 respectively. These range from 0 to 1 with higher values indicating more
@@ -72,13 +72,13 @@ favorable processing or presentation.
 
 .. note::
 
-    The processing predictor is experimental and under
-    development. It models allele-independent effects that influence whether a
+    The processing predictor is experimental. It models allele-independent
+    effects that influence whether a
     peptide will be detected in a mass spec experiment. The presentation score is
     a simple logistic regression model that combines the (log) binding affinity
     prediction with the processing score to give a composite prediction. The resulting
-    prediction is appropriate for prioritizing potential epitopes to test, but no
-    thresholds have yet been established for what constitutes a "high enough"
+    prediction may be useful for prioritizing potential epitopes, but no
+    thresholds have been established for what constitutes a "high enough"
     presentation score.
 
 In most cases you'll want to specify the input as a CSV file instead of passing
@@ -122,20 +122,65 @@ a few options. If you have data for only one or a few MHC I alleles, the best
 approach is to use the
 :ref:`mhcflurry-class1-train-allele-specific-models` command to fit an
 "allele-specific" predictor, in which separate neural networks are used for
-each allele. Here's an example:
+each allele.
+
+To call :ref:`mhcflurry-class1-train-allele-specific-models` you'll need some
+training data. The data we use for our released predictors can be downloaded with
+:ref:`mhcflurry-downloads`:
+
+.. code-block:: shell
+
+    $ mhcflurry-downloads fetch data_curated
+
+It looks like this:
+
+.. command-output::
+    bzcat "$(mhcflurry-downloads path data_curated)/curated_training_data.csv.bz2" | head -n 3
+    :shell:
+    :nostderr:
+
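+Since training skips alleles with fewer than ``--min-measurements-per-allele``
+measurements (75 in the example below), it can be useful to check how many
+measurements each allele has. A minimal sketch, assuming the allele identifier
+is in the first column as in the excerpt above:
+
+.. code-block:: shell
+
+    $ bzcat "$(mhcflurry-downloads path data_curated)/curated_training_data.csv.bz2" | \
+        awk -F, 'NR > 1 {n[$1]++} END {for (a in n) print n[a], a}' | sort -rn | head
+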
+Here's an example invocation to fit a predictor:
 
 .. code-block:: shell
 
     $ mhcflurry-class1-train-allele-specific-models \
-        --data TRAINING_DATA.csv \
+        --data "$(mhcflurry-downloads path data_curated)/curated_training_data.csv.bz2" \
         --hyperparameters hyperparameters.yaml \
         --min-measurements-per-allele 75 \
         --out-models-dir models
 
+The ``hyperparameters.yaml`` file lists the neural network architectures to
+train models for. Here's an example specifying a single architecture:
+
+.. code-block:: yaml
+
+    - activation: tanh
+      dense_layer_l1_regularization: 0.0
+      dropout_probability: 0.0
+      early_stopping: true
+      layer_sizes: [8]
+      locally_connected_layers: []
+      loss: custom:mse_with_inequalities
+      max_epochs: 500
+      minibatch_size: 128
+      n_models: 4
+      output_activation: sigmoid
+      patience: 20
+      peptide_amino_acid_encoding: BLOSUM62
+      random_negative_affinity_max: 50000.0
+      random_negative_affinity_min: 20000.0
+      random_negative_constant: 25
+      random_negative_rate: 0.0
+      validation_split: 0.1
+
+The available hyperparameters for binding predictors are defined in
+`~mhcflurry.Class1NeuralNetwork`. To see exactly how they are used, you will
+need to read the source code.
+
 .. note::
 
     MHCflurry predictors are serialized to disk as many files in a directory. The
-    command above will write the models to the output directory specified by the
+    model training command above will write the models to the output directory specified by the
     ``--out-models-dir`` argument. This directory has files like:
 
     .. program-output::
@@ -150,27 +195,19 @@ each allele. Here's an example:
     histogram of model predictions for each allele over a large number of random
     peptides. It is used for generating the percent ranks at prediction time.
 
-To call :ref:`mhcflurry-class1-train-allele-specific-models` you'll need some
-training data. The data we use for our released predictors can be downloaded with
-:ref:`mhcflurry-downloads`:
-
-.. code-block:: shell
-
-    $ mhcflurry-downloads fetch data_curated
-
-It looks like this:
-
-.. command-output::
-    bzcat "$(mhcflurry-downloads path data_curated)/curated_training_data.csv.bz2" | head -n 3
-    :shell:
-    :nostderr:
-
 To fit pan-allele models like the ones released with MHCflurry, you can use
-a similar tool, ``mhcflurry-class1-train-pan-allele-models``. You'll probably
+a similar tool, :ref:`mhcflurry-class1-train-pan-allele-models`. You'll probably
 also want to take a look at the scripts used to generate the production models,
 which are available in the *downloads-generation* directory in the MHCflurry
-repository. The production MHCflurry models were fit using a cluster with several
-dozen GPUs over a period of about two days.
+repository. The scripts in the *models_class1_pan* subdirectory show how the
+fitting and model selection were done for the models currently distributed with
+MHCflurry.
+
+.. note::
+
+    The production MHCflurry models were fit using a cluster with several
+    dozen GPUs over a period of about two days. If you select over fewer
+    architectures, however, it should be possible to fit a predictor using
+    fewer resources.
 
 
 Environment variables
diff --git a/docs/python_tutorial.rst b/docs/python_tutorial.rst
index b906c29fe57c534ef3afdd63f8871e8ee0c948ff..9e74e77b0a4354d97199a9260dcc67d7cacff8e8 100644
--- a/docs/python_tutorial.rst
+++ b/docs/python_tutorial.rst
@@ -151,7 +151,7 @@ useful methods.
 Lower level interfaces
 ----------------------------------
 
-The `~mhcflurry.Class1PresentationPredictor` predictor delegates to a
+The `~mhcflurry.Class1PresentationPredictor` delegates to a
 `~mhcflurry.Class1AffinityPredictor` instance for binding affinity predictions.
 If all you need are binding affinities, you can use this instance directly.