Workflow for generating phenotype score combinations and correlating them to biofilm.
There is one rule: no Excel. Every time I use Excel, I have to rename the file, the copies get lost, and I can't retrace my steps. With no Excel, I can see every step and fix it where I need to.
First things first:
1. Generate normalized scores from the sorted scores.
* A sorted score is the average of the raw scores from the biological replicates. An individual photo is a biological replicate.
* `score_wrangler.R` takes in the un-normalized scores and generates a normalized column using the `preProcess()` function from the `caret` package.
* This program will also remove data that we do not want (we removed certain non-albicans *Candida* species that didn't grow under certain conditions).
* After this, the files are modified with `column_clean.py` (called inside the R script) to remove the leading column and to clean up the column content if necessary.
* Finally, the program makes a file with all the score data in it. Repeatability. No Excel.
* I also had it combine all the scores. That just made things a lot easier.
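To make step 1 concrete, here is a minimal Python sketch of the averaging and normalizing. It is not the code in `score_wrangler.R`. It assumes `preProcess()` is called with its default method, which as far as I know centers each column to mean 0 and scales it to standard deviation 1. The strain names and raw scores are made up for illustration.

```python
from statistics import mean, stdev


def sorted_score(replicate_scores):
    """Average the raw scores from the biological replicates (one per photo)."""
    return mean(replicate_scores)


def normalize(scores):
    """Center to mean 0 and scale to standard deviation 1.

    Assumption: this mirrors caret's default center-and-scale behavior.
    """
    mu = mean(scores)
    sd = stdev(scores)  # sample standard deviation (n - 1), like R's sd()
    return [(s - mu) / sd for s in scores]


# Hypothetical raw scores: three strains, two replicate photos each.
raw = {"strain_a": [1.0, 3.0], "strain_b": [4.0, 4.0], "strain_c": [5.0, 7.0]}

sorted_scores = {strain: sorted_score(reps) for strain, reps in raw.items()}
normalized = dict(zip(sorted_scores, normalize(list(sorted_scores.values()))))
```

With these made-up numbers the sorted scores are 2, 4, and 6, so the normalized scores come out to -1, 0, and 1.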
2. Correlate all the normalized sum scores with biofilm.
* I need a table for these that includes which scores make up each composite score, the media, and the temperature, as well as the correlation metrics.
* `additive_correlator.R` does this using the `cor.test()` function, as described by [STHDA](http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r).
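For reference, here is a rough Python sketch of what one row of that table could hold. It is not `additive_correlator.R`. It computes the Pearson coefficient and its t statistic, which is what `cor.test()` reports with its default method. The p-value is left out because the Python standard library has no t distribution. The score names, media, temperature, and data values are all hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against 0, with n - 2 degrees of freedom.

    Undefined (division by zero) when r is exactly 1 or -1.
    """
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical data: one composite score and biofilm for five strains.
composite = [1.0, 2.0, 3.0, 4.0, 5.0]
biofilm = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(composite, biofilm)
row = {
    "scores": "score_1 + score_2",  # hypothetical composite label
    "media": "media_x",             # hypothetical
    "temperature_c": 30,            # hypothetical
    "r": r,
    "t": t_statistic(r, len(composite)),
    "df": len(composite) - 2,
}
```

One dictionary per combination of composite score, media, and temperature can then be written out as a single table.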
# Gross header : MAY.Strain.. Species Soll.Clade Isolation.Site MTL.Genotype Media Temperature..C. MJD.Phenotype.Score MJD.Score.St..Dev. RJF.Phenotype.Score RJF.Score.St..Dev. Total.Average.Phenotype.Score Total.Phenotype.Score.St..Dev. Normalized.Scores
new_header="May Strain, Species, Soll Clade, Isolation Site, MTL Genotype, Media, Temperature ("+u"\N{DEGREE SIGN}"+"C), MJD Phenotype Score, MJD Score St. Dev., RJF Phenotype Score, RJF Score St. Dev., Total Average Phenotype Score, Total Phenotype Score St. Dev., Normalized Scores"
biofilm_header="May Strain, Species, Soll Clade, Isolation Site, Media, Temperature ("+u"\N{DEGREE SIGN}"+"C), Total Average Phenotype Score, Total Phenotype Score St. Dev., Normalized Scores"
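As an illustration of how a header string like the ones above might be used, here is a small sketch that drops the leading column and swaps in a readable header. It is not the actual `column_clean.py`. It assumes the leading column is the unnamed row-name column that R's `write.csv()` adds by default, and the function name `clean_columns` and the sample data are made up.

```python
import csv
import io


def clean_columns(csv_text, new_header):
    """Drop the first column of every row and replace the header line."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = [name.strip() for name in new_header.split(",")]
    body = [row[1:] for row in rows[1:]]  # skip old header, drop leading column
    for row in body:
        if len(row) != len(header):
            raise ValueError("row width does not match the new header")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body)
    return out.getvalue()


# Hypothetical input shaped like write.csv() output with row names.
example_header = "May Strain, Species, Media"
example_csv = (
    '"","MAY.Strain..","Species","Media"\n'
    '"1","MAY1","C. albicans","media_x"\n'
)
cleaned = clean_columns(example_csv, example_header)
```

The width check is there so that a header string with the wrong number of fields fails loudly instead of silently mislabeling columns.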