More changes

Commit 1213dada authored 4 years ago by Robert Fillinger
--- a/README.md
+++ b/README.md
 Workflow for generating phenotype score combinations and correlating them to biofilm. 

-There is one rule: no Excel. 
+There is one rule: no Excel. Every time I use Excel, I have to rename the file, the files get lost, and I can't retrace my steps. Without Excel, I can see every step and fix it where I need to. 

 First things first: 

 1. Generate normalized scores from the sorted scores. 
 	* A sorted score is the average of the raw scores from the biological replicates. Each individual photo is one biological replicate. 

-	`score_wrangler.R` takes in the un-normalized scores and generates a normalized column using the `preProcess()` function from the `caret` package. 
+	* `score_wrangler.R` takes in the un-normalized scores and generates a normalized column using the `preProcess()` function from the `caret` package. 
+	
+	* This program also removes data we do not want (we removed certain non-*albicans* *Candida* species that did not grow under some conditions). 

-	This program will also remove data that we do not want (we removed certain non-albicans *Candida* species that didn't grow under certain conditions. 
+	* After this, the files are modified with `column_clean.py` (called inside the R script) to remove the leading column and to clean up the column content if necessary. 

-	After this, the files are modified with `column_clean.py` to remove the leading column and to clean up the column content if necessary. 
+	* Finally, the program writes a single file with all the score data in it (`compiled_data.csv`). Repeatability. No Excel. 

-	Make a file with all the scores in it. 
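+	* It also computes the combination (summed) scores and writes them, along with everything else, to `compiled_and_combination_scores_data.csv`. That made the later steps a lot easier. 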

 2. Add normalized scores in different combinations.
 	* Adhesion, filamentation, and invasion scores need to be summed together in all combinations of pairs and once all together for *each* condition. 
 	* There are 6 conditions (3 different media and 2 temperatures, which don't match across the biofilm assays).

-
+	The correlations in step 3 use the `cor.test()` function, as described by [STHDA](http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r).

 3. Correlate all the normalized sum scores with biofilm.  
 	* I need a table for these that records which scores make up each composite score, the medium, and the temperature, as well as the correlation metrics. 
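
A minimal base-R sketch of the first part of step 1, on made-up toy data: it averages replicate scores into a sorted score for each strain and condition, then rescales them to [0, 1]. The column names (`Strain`, `Temp`, `Adhesion`) are hypothetical, and min-max scaling (what `caret::preProcess(method = "range")` computes) is an assumption; `score_wrangler.R` may use a different `preProcess()` method.

```r
# Toy raw scores: two photos (biological replicates) per strain in one condition.
# Column names and values are hypothetical, for illustration only.
raw <- data.frame(
  Strain   = rep(c("A", "B", "C"), each = 2),
  Media    = "YPD",
  Temp     = 30,
  Adhesion = c(1, 3, 4, 4, 0, 2)
)

# Sorted score: mean of the replicate scores per strain, medium, and temperature.
sorted <- aggregate(Adhesion ~ Strain + Media + Temp, data = raw, FUN = mean)

# Min-max scaling to [0, 1], the transformation caret::preProcess(method = "range")
# applies by default. Whether score_wrangler.R uses this method is an assumption.
range_scale <- function(x) (x - min(x)) / (max(x) - min(x))
sorted$Adhesion.norm <- range_scale(sorted$Adhesion)

print(sorted)
```

Strain A averages to 2, B to 4, and C to 1, so after scaling B becomes 1, C becomes 0, and A lands at 1/3.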

--- a/additive_correlations.csv
+++ b/additive_correlations.csv
+"Media","Temperature(°C)","Adhesion Pearson t-statistic","Adhesion P-value","Filamentation Pearson t-statistic","Filamentation P-value","Invasion Pearson t-statistic","Invasion P-value","Adh. + Fil. Pearson t-statistic","Adh. + Fil. P-value","Adh. + Inv. Pearson t-statistic","Adh. + Inv. P-value","Fil. + Inv. Pearson t-statistic","Fil. + Inv. P-value","Adh. + Fil. + Inv. Pearson t-statistic","Adh. + Fil. + Inv. P-value"
+"LEE","30","-1.25016287101517","0.224992461286585","-0.207243834094401","0.837815880814208","-0.706141723587309","0.487860936567474","-1.01955033214067","0.319544593487795","-1.00860222054796","0.324646373820887","-0.718953443190146","0.480093974695464","-1.06860049184066","0.297380018574805"
+"LEE","37","2.45954085389506","0.0226704726604204","0.480277032992017","0.635995473365843","2.51115659055467","0.0202830422795272","1.85119392146459","0.0782577186354199","2.87852323322219","0.00899114701841621","1.93346734137877","0.0667725169091707","2.52329342970287","0.0197568540413681"
+"SPI","30","-0.516940171399622","0.610599801630142","-1.49645753678255","0.149413542849774","-0.867243634891197","0.395610433603792","-1.06936374854033","0.297044038550212","-0.76134349614126","0.454917296954279","-1.24973644633469","0.225145145329053","-1.05146543939314","0.304994525896292"
+"SPI","37","2.05011972367224","0.0530443851695684","0.566624643850994","0.576975754476763","2.43912710706865","0.0236850076580968","1.48087892979267","0.153487620828728","2.6285717593594","0.0157003241550514","1.83046253248039","0.0814110075732709","2.14262553510982","0.0440164751028548"
+"YPD","30","-1.10030182522826","0.283654024886279","1.28314352316422","0.213422503084265","1.16094353017772","0.258692214127873","0.0768972727468174","0.939433203222128","-0.0161548158873173","0.987263383826116","1.37355922449991","0.184065757202811","0.708750323800643","0.486273626156987"
+"YPD","37","-2.09862136324188","0.0481221781344565","-1.24779631599426","0.225840823438716","0.175862236935268","0.862086966608244","-2.85524081355217","0.00947636153292339","-1.2398243569074","0.228716631262432","-0.671956918543493","0.5089369791254","-1.94280674405923","0.065567621722181"
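
Several statistic columns in this table exceed 1 in magnitude (for example 2.88 for LEE at 37 °C), so they cannot be correlation coefficients: `cor.test()` puts the t-statistic in `$statistic` and Pearson's r in `$estimate`. The sketch below, on made-up data, shows the relationship r = t / sqrt(t^2 + df), with df = n - 2, that links the two.

```r
# Made-up paired scores, for illustration only.
x <- c(0.10, 0.35, 0.20, 0.80, 0.55, 0.90, 0.40)
y <- c(0.05, 0.30, 0.45, 0.70, 0.50, 0.95, 0.20)

ct <- cor.test(x, y, method = "pearson")

r      <- unname(ct$estimate)   # Pearson's r, always within [-1, 1]
t_stat <- unname(ct$statistic)  # what `$statistic` holds: the t-statistic
dof    <- unname(ct$parameter)  # degrees of freedom, n - 2

# Recover r from t and the degrees of freedom: r = t / sqrt(t^2 + df)
r_from_t <- t_stat / sqrt(t_stat^2 + dof)

cat(sprintf("r = %.4f, t = %.4f, df = %.0f, r from t = %.4f\n",
            r, t_stat, dof, r_from_t))
```

Recovering r for a row of this table would also need the number of strains per condition, which the CSV does not record.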
--- a/additive_correlator.R
+++ b/additive_correlator.R
+# Reference: http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r
+
+media = c( "LEE", "SPI", "YPD" )
+temps = c( "30", "37" )
+
+data <- as.data.frame(read.csv( "compiled_and_combination_scores_data.csv" ))
+new_file_name <- "additive_correlations.csv"
+header = c("Media", "Temperature(°C)", "Adhesion Pearson t-statistic", "Adhesion P-value", "Filamentation Pearson t-statistic", "Filamentation P-value", "Invasion Pearson t-statistic", "Invasion P-value", "Adh. + Fil. Pearson t-statistic", "Adh. + Fil. P-value", "Adh. + Inv. Pearson t-statistic", "Adh. + Inv. P-value", "Fil. + Inv. Pearson t-statistic", "Fil. + Inv. P-value", "Adh. + Fil. + Inv. Pearson t-statistic", "Adh. + Fil. + Inv. P-value")
+
+write.table( t(header), new_file_name, sep = ",", col.names = FALSE, row.names = FALSE, append = FALSE )
+
+for ( medium in media ){
+
+	med_data <- data[ data$Media == medium, ]
+
+	for ( temp in temps ){
+		
+		d <- med_data[ (med_data$Temperature...C. == temp), ]
+
+		biofilm <- d$Biofilm.norm
+
+		# Pearson correlation of each single and summed score with biofilm.
+		# Note: cor.test()$statistic is the t-statistic; $estimate holds r.
+		score_cols <- c( "Adhesion.norm", "Filamentation.norm", "Invasion.norm", "adh_fil", "adh_inv", "fil_inv", "all_sum" )
+		line <- unlist( lapply( score_cols, function( col ) {
+			ct <- cor.test( d[[ col ]], biofilm, method = "pearson" )
+			c( ct$statistic, ct$p.value )
+		} ) )
+
+		printable <- c(medium, temp, line)
+		write.table( t(printable), new_file_name, sep = ",", col.names = FALSE, row.names = FALSE, append = TRUE )
+
+	}
+
+}
\ No newline at end of file
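
A variant of the loop in `additive_correlator.R`, sketched on random toy data: it also records Pearson's r (`$estimate`) and collects everything into one long table (one row per condition and score) before a single `write.csv()` call. The toy columns mirror the script's names, the output file name is hypothetical, and only `all_sum` is included among the composites for brevity.

```r
# Toy stand-in for compiled_and_combination_scores_data.csv; values are random.
set.seed(1)
n <- 24
scores <- data.frame(
  Media              = rep(c("LEE", "YPD"), each = n / 2),
  Temperature        = rep(c(30, 37), times = n / 2),
  Adhesion.norm      = runif(n),
  Filamentation.norm = runif(n),
  Invasion.norm      = runif(n),
  Biofilm.norm       = runif(n)
)
scores$all_sum <- scores$Adhesion.norm + scores$Filamentation.norm + scores$Invasion.norm

# Score columns to correlate with biofilm (the pair sums would be added the same way).
score_cols <- c("Adhesion.norm", "Filamentation.norm", "Invasion.norm", "all_sum")
conditions <- unique(scores[, c("Media", "Temperature")])

rows <- list()
for (i in seq_len(nrow(conditions))) {
  medium <- conditions$Media[i]
  temp   <- conditions$Temperature[i]
  d <- scores[scores$Media == medium & scores$Temperature == temp, ]
  for (col in score_cols) {
    ct <- cor.test(d[[col]], d$Biofilm.norm, method = "pearson")
    rows[[length(rows) + 1]] <- data.frame(
      Media = medium, Temperature = temp, Score = col,
      r = unname(ct$estimate), t = unname(ct$statistic), p.value = ct$p.value
    )
  }
}

# One long table, written in a single call (file name is hypothetical).
results <- do.call(rbind, rows)
out_file <- file.path(tempdir(), "additive_correlations_long.csv")
write.csv(results, out_file, row.names = FALSE)
print(results)
```

The long layout keeps r, t, and p side by side for each score and avoids maintaining a 16-column header by hand; it can still be reshaped to a wide table afterwards if needed.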
--- a/compiled_and_combination_scores_data.csv
+++ b/compiled_and_combination_scores_data.csv
--- a/score_wrangler.R
+++ b/score_wrangler.R
@@ -94,6 +94,11 @@ big_df$Biofilm.norm <- bio_sorted$Normalized.Scores
 # New file
 write.csv( big_df, "compiled_data.csv", row.names = FALSE )

+# Composite scores: each pair of phenotypes, then all three together
+big_df$adh_fil <- big_df$Adhesion.norm + big_df$Filamentation.norm
+big_df$adh_inv <- big_df$Adhesion.norm + big_df$Invasion.norm
+big_df$fil_inv <- big_df$Filamentation.norm + big_df$Invasion.norm
+big_df$all_sum <- big_df$Adhesion.norm + big_df$Filamentation.norm + big_df$Invasion.norm

+write.csv( big_df, "compiled_and_combination_scores_data.csv", row.names = FALSE )
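
The four sums in `score_wrangler.R` cover every pair and the full triple. A minimal sketch of generating the same columns with `combn()`, so a fourth phenotype would not need hand-written lines; the short prefixes (`adh`, `fil`, `inv`) reproduce the existing column names, and the toy values are hypothetical.

```r
# Toy normalized scores (hypothetical values) using the same column names as big_df.
big_df <- data.frame(
  Adhesion.norm      = c(0.2, 0.5, 0.9),
  Filamentation.norm = c(0.1, 0.4, 0.3),
  Invasion.norm      = c(0.6, 0.0, 0.7)
)

# Short prefixes used to name the composite columns (adh_fil, adh_inv, fil_inv).
phenos <- c(adh = "Adhesion.norm", fil = "Filamentation.norm", inv = "Invasion.norm")

# Every pair: combn() returns one column per combination of prefix names.
pairs <- combn(names(phenos), 2)
for (k in seq_len(ncol(pairs))) {
  p <- pairs[, k]
  big_df[[paste(p, collapse = "_")]] <- rowSums(big_df[phenos[p]])
}

# All phenotypes together.
big_df$all_sum <- rowSums(big_df[phenos])

print(big_df)
```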