Workflow for generating phenotype score combinations and correlating them to biofilm. 

Robert Fillinger's avatar
Robert Fillinger committed
There is one rule: no Excel. Every time I use Excel, I have to rename the file, the copies get lost, and I can't retrace my steps. With no Excel, I can see every step and fix each one where I need to. 

First things first: 

1. Generate normalized scores from the sorted scores. 
	* A sorted score is the average of the raw scores from the biological replicates. An individual photo is a biological replicate. 

	* `score_wrangler.R` takes in the un-normalized scores and generates a normalized column using the `preProcess()` function from the `caret` package. 
	
	* This program will also remove data that we do not want (we removed certain non-albicans *Candida* species that didn't grow under certain conditions). 
	* After this, the files are modified with `column_clean.py` (called inside the R script) to remove the leading column and to clean up the column content if necessary. 
	* Finally, the program makes a file with all the score data in it. Repeatability. No Excel. 
	* I also had it combine all the scores. That just made things a lot easier. 
	
2. Correlate all the normalized sum scores with biofilm.  
	* I need a table for these that includes which scores are in each composite score, the media, and the temperature, as well as the correlation metrics. 

	* `additive_correlator.R` does this using the `cor.test()` function described by [STHDA](http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r).
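
The two steps above can be sketched end to end in plain Python. This is an illustration only, not the code in this repository: the real work is done in R by `score_wrangler.R` and `additive_correlator.R`. The strain names, phenotype names, score values, condition labels, and function names below are all made up. Centering and scaling (z-scores) is assumed as the normalization because those are the `preProcess()` defaults; the method the script actually uses is not stated here. Pearson's r stands in for the correlation metric, and unlike `cor.test()` this sketch reports no p-value.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

# Hypothetical raw scores: strain -> phenotype -> one score per photo
# (each photo is a biological replicate). All values are made up.
raw_scores = {
    "strain_A": {"phenotype_1": [1, 2, 3], "phenotype_2": [0, 1, 2]},
    "strain_B": {"phenotype_1": [3, 3, 3], "phenotype_2": [2, 2, 2]},
    "strain_C": {"phenotype_1": [4, 5, 6], "phenotype_2": [1, 1, 1]},
    "strain_D": {"phenotype_1": [0, 1, 2], "phenotype_2": [4, 5, 3]},
}
# Hypothetical biofilm measurement per strain.
biofilm = {"strain_A": 0.4, "strain_B": 0.9, "strain_C": 1.5, "strain_D": 0.7}


def sorted_scores(raw):
    """Average the replicate scores for each strain and phenotype."""
    return {s: {p: mean(v) for p, v in phen.items()} for s, phen in raw.items()}


def normalize(scores, phenotype):
    """Center and scale one phenotype across strains (z-scores, sample SD)."""
    values = [scores[s][phenotype] for s in scores]
    mu, sd = mean(values), stdev(values)
    return {s: (scores[s][phenotype] - mu) / sd for s in scores}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def correlate_combinations(raw, biofilm_by_strain, media, temperature):
    """Correlate every additive combination of normalized scores with biofilm."""
    scores = sorted_scores(raw)
    strains = sorted(scores)
    phenotypes = sorted(next(iter(scores.values())))
    normalized = {p: normalize(scores, p) for p in phenotypes}
    biofilm_values = [biofilm_by_strain[s] for s in strains]
    rows = []
    for size in range(1, len(phenotypes) + 1):
        for combo in combinations(phenotypes, size):
            # Composite score: sum of the normalized scores in the combination.
            composite = [sum(normalized[p][s] for p in combo) for s in strains]
            rows.append({
                "scores": "+".join(combo),
                "media": media,
                "temperature": temperature,
                "r": round(pearson_r(composite, biofilm_values), 3),
            })
    return rows


# "medium_X" and "temp_Y" are placeholder condition labels.
table = correlate_combinations(raw_scores, biofilm, "medium_X", "temp_Y")
for row in table:
    print(row)
```

Each row records which scores went into the composite, the condition labels, and the correlation, which is the shape of table described in step 2. Writing those rows to a CSV file instead of printing them would keep every step retraceable without Excel.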