Workflow for generating phenotype score combinations and correlating them to biofilm.
There is one rule: no Excel. Every time I use Excel, I have to rename the file, the copies get lost, and I can't retrace my steps. Without Excel, I can see every step and fix it where I need to.
First things first:
1. Generate normalized scores from the sorted scores.
* A sorted score is the average of the raw scores from the biological replicates. An individual photo is a biological replicate.
* `score_wrangler.R` takes in the un-normalized scores and generates a normalized column using the `preProcess()` function from the `caret` package.
* This program also removes data that we do not want (we removed certain non-albicans *Candida* species that didn't grow under certain conditions).
* After this, the files are modified with `column_clean.py` (called inside the R script) to remove the leading column and to clean up the column content if necessary.
* Finally, the program makes a file with all the score data in it. Repeatability. No Excel.
* I also had it combine all the scores. That just made things a lot easier.
2. Correlate all the normalized sum scores with biofilm.
* I need a table for these that includes which scores are in each composite score, the media, and the temperature, as well as the correlation metrics.
* `additive_correlator.R` does this using the `cor.test()` function described by [STHDA](http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r).
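
The two steps above can be sketched roughly as follows. This is a minimal Python illustration, not the actual R scripts, and it rests on several assumptions. Normalization is taken to be centering and scaling (the default method of `preProcess()` in `caret`). A composite score is taken to be the sum of the normalized scores in a combination. The correlation is Pearson's r (the default method of `cor.test()`), without the p-value that `cor.test()` also reports. The function names, score names, and numbers are all made up for illustration, and the media and temperature columns are left out to keep it short.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def zscore(values):
    """Center and scale a list of scores (mean 0, sample SD 1)."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)


def correlate_combinations(scores, biofilm):
    """Sum every combination of normalized scores and correlate it with biofilm.

    scores: dict mapping score name -> list of values (one per strain)
    biofilm: list of biofilm measurements in the same strain order
    Returns a list of (score names in the composite, r) rows.
    """
    normalized = {name: zscore(vals) for name, vals in scores.items()}
    rows = []
    for k in range(1, len(normalized) + 1):
        for combo in combinations(sorted(normalized), k):
            # Add the normalized scores strain by strain to get the composite.
            composite = [sum(vals) for vals in zip(*(normalized[n] for n in combo))]
            rows.append((combo, pearson_r(composite, biofilm)))
    return rows


# Hypothetical values, for illustration only (not real data).
example_scores = {
    "score_a": [1.0, 2.0, 3.0, 4.0],
    "score_b": [4.0, 3.0, 2.0, 1.5],
}
example_biofilm = [0.2, 0.4, 0.6, 0.8]

for combo, r in correlate_combinations(example_scores, example_biofilm):
    print("+".join(combo), round(r, 3))
```

Each row records which scores went into the composite next to its correlation, which is the core of the table described in step 2. A real version would need one such table per media and temperature condition.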