Commit 1213dada authored by Robert Fillinger

More changes

parent c31a5e3a
Workflow for generating phenotype score combinations and correlating them to biofilm.
There is one rule: no Excel. Every time I use Excel, I have to rename the file, and the files get lost and I can't retrace my steps. With Excel ruled out, I can see every step and fix things where I need to.
First things first:
1. Generate normalized scores from the sorted scores.
* A sorted score is the average of the raw scores from the biological replicates. An individual photo is a biological replicate.
* `score_wrangler.R` takes in the un-normalized scores and generates a normalized column using the `preProcess()` function from the `caret` package.
* This program will also remove data that we do not want (we removed certain non-albicans *Candida* species that didn't grow under certain conditions).
* After this, the files are modified with `column_clean.py` (called inside the R script) to remove the leading column and to clean up the column content if necessary.
* Finally, the program makes a file with all the score data in it. Repeatability. No Excel.
* I also had it combine all the scores. That just made things a lot easier.
2. Add normalized scores in different combinations.
* Adhesion, filamentation, and invasion scores need to be summed together in all combinations of pairs and once all together for *each* condition.
* There are 6 conditions (3 different media and 2 temperatures, which don't match across the biofilm assays).
3. Correlate all the normalized sum scores with biofilm.
* This uses the `cor.test()` function described by [STHDA](http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r).
* I need a table for these that includes which scores make up each composite score, the media, and the temperature, as well as the correlation metrics.
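
Step 1 above (average the replicates, then normalize) can be sketched roughly as follows. This is only an illustration: the toy data, the `Strain` and `Score` column names, and the center-and-scale normalization are assumptions, since the README does not say which `preProcess()` method `score_wrangler.R` uses. Base R `scale()` stands in for `caret` here so the sketch runs without extra packages.

```r
# Hypothetical raw scores: one row per photo (biological replicate).
raw <- data.frame(
  Strain = rep(c("s1", "s2", "s3"), each = 2),
  Score  = c(1, 3, 2, 4, 5, 7)
)

# "Sorted" score: mean of the raw scores across the replicates of each strain.
sorted <- aggregate(Score ~ Strain, data = raw, FUN = mean)

# Stand-in normalization (assumption): center and scale with base R.
# With caret, the equivalent would look something like:
#   pp <- caret::preProcess(sorted["Score"], method = c("center", "scale"))
#   sorted$Normalized.Scores <- predict(pp, sorted["Score"])$Score
sorted$Normalized.Scores <- as.numeric(scale(sorted$Score))

print(sorted)
```

Whatever method the real script uses, the point is the same: the normalized column is computed in code from the averaged scores, so the step can be rerun and inspected.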
......
"Media","Temperature(°C)","Adhesion Pearson Statistic","Adhesion P-value","Filamentation Pearson Statistic","Filamentation P-value","Invasion Pearson Statistic","Invasion P-value","Adh. + Fil. Pearson Statistic","Adh. + Fil. P-value","Adh. + Inv. Pearson Statistic","Adh. + Inv. P-value","Fil. + Inv. Pearson Statistic","Fil. + Inv. P-value","Adh. + Fil. + Inv. Pearson Statistic","Adh. + Fil. + Inv. P-value"
"LEE","30","-1.25016287101517","0.224992461286585","-0.207243834094401","0.837815880814208","-0.706141723587309","0.487860936567474","-1.01955033214067","0.319544593487795","-1.00860222054796","0.324646373820887","-0.718953443190146","0.480093974695464","-1.06860049184066","0.297380018574805"
"LEE","37","2.45954085389506","0.0226704726604204","0.480277032992017","0.635995473365843","2.51115659055467","0.0202830422795272","1.85119392146459","0.0782577186354199","2.87852323322219","0.00899114701841621","1.93346734137877","0.0667725169091707","2.52329342970287","0.0197568540413681"
"SPI","30","-0.516940171399622","0.610599801630142","-1.49645753678255","0.149413542849774","-0.867243634891197","0.395610433603792","-1.06936374854033","0.297044038550212","-0.76134349614126","0.454917296954279","-1.24973644633469","0.225145145329053","-1.05146543939314","0.304994525896292"
"SPI","37","2.05011972367224","0.0530443851695684","0.566624643850994","0.576975754476763","2.43912710706865","0.0236850076580968","1.48087892979267","0.153487620828728","2.6285717593594","0.0157003241550514","1.83046253248039","0.0814110075732709","2.14262553510982","0.0440164751028548"
"YPD","30","-1.10030182522826","0.283654024886279","1.28314352316422","0.213422503084265","1.16094353017772","0.258692214127873","0.0768972727468174","0.939433203222128","-0.0161548158873173","0.987263383826116","1.37355922449991","0.184065757202811","0.708750323800643","0.486273626156987"
"YPD","37","-2.09862136324188","0.0481221781344565","-1.24779631599426","0.225840823438716","0.175862236935268","0.862086966608244","-2.85524081355217","0.00947636153292339","-1.2398243569074","0.228716631262432","-0.671956918543493","0.5089369791254","-1.94280674405923","0.065567621722181"
# Reference: http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r
media = c( "LEE", "SPI", "YPD" )
temps = c( "30", "37" )
data <- as.data.frame(read.csv( "compiled_and_combination_scores_data.csv" ))
new_file_name <- "additive_correlations.csv"
header = c("Media", "Temperature(°C)", "Adhesion Pearson Statistic", "Adhesion P-value", "Filamentation Pearson Statistic", "Filamentation P-value", "Invasion Pearson Statistic", "Invasion P-value", "Adh. + Fil. Pearson Statistic", "Adh. + Fil. P-value", "Adh. + Inv. Pearson Statistic", "Adh. + Inv. P-value", "Fil. + Inv. Pearson Statistic", "Fil. + Inv. P-value", "Adh. + Fil. + Inv. Pearson Statistic", "Adh. + Fil. + Inv. P-value")
write.table( t(header), new_file_name, sep = ",", col.names = FALSE, row.names = FALSE, append = FALSE )
for ( medium in media ){
  med_data <- data[ data$Media == medium, ]
  for ( temp in temps ){
    # Subset to a single medium/temperature condition.
    d <- med_data[ (med_data$Temperature...C. == temp), ]
    # Correlate each single and composite score with normalized biofilm.
    # Note: cor.test()$statistic is the t statistic of the test; the Pearson
    # coefficient r itself is stored in $estimate.
    adh.corr <- cor.test(d$Adhesion.norm, d$Biofilm.norm, method = "pearson")
    fil.corr <- cor.test(d$Filamentation.norm, d$Biofilm.norm, method = "pearson")
    inv.corr <- cor.test(d$Invasion.norm, d$Biofilm.norm, method = "pearson")
    adh_fil.corr <- cor.test(d$adh_fil, d$Biofilm.norm, method = "pearson")
    adh_inv.corr <- cor.test(d$adh_inv, d$Biofilm.norm, method = "pearson")
    fil_inv.corr <- cor.test(d$fil_inv, d$Biofilm.norm, method = "pearson")
    all_sum.corr <- cor.test(d$all_sum, d$Biofilm.norm, method = "pearson")
    line <- c( adh.corr$statistic, adh.corr$p.value, fil.corr$statistic, fil.corr$p.value, inv.corr$statistic, inv.corr$p.value, adh_fil.corr$statistic, adh_fil.corr$p.value, adh_inv.corr$statistic, adh_inv.corr$p.value, fil_inv.corr$statistic, fil_inv.corr$p.value, all_sum.corr$statistic, all_sum.corr$p.value)
    printable <- c(medium, temp, line)
    # The header row was already written above, so column names are never emitted here.
    write.table( t(printable), new_file_name, sep = ",", col.names = FALSE, row.names = FALSE, append = TRUE )
  }
}
\ No newline at end of file
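
A note on reading the output table: the columns labelled "Pearson Statistic" are filled from `$statistic`, which for `cor.test(..., method = "pearson")` is the t statistic of the test, while the correlation coefficient r is returned in `$estimate`. That is why values in the table can fall outside [-1, 1]. The sketch below uses made-up data (not the biofilm scores) to show how the two are related:

```r
# Toy data only; the sample size and values are arbitrary assumptions.
set.seed(1)
x <- rnorm(24)
y <- 0.5 * x + rnorm(24)

ct <- cor.test(x, y, method = "pearson")

r      <- unname(ct$estimate)   # Pearson correlation coefficient
t_stat <- unname(ct$statistic)  # t statistic that the script writes out
n      <- length(x)

# For Pearson's test, t = r * sqrt(n - 2) / sqrt(1 - r^2).
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

print(c(r = r, t = t_stat, t_from_r = t_from_r))
```

If the coefficient itself is wanted in the table, `$estimate` could be written alongside or instead of `$statistic`; the p-values are unaffected either way.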
@@ -94,6 +94,11 @@ big_df$Biofilm.norm <- bio_sorted$Normalized.Scores
# New file
write.csv( big_df, "compiled_data.csv", row.names = FALSE )
big_df$adh_fil <- big_df$Adhesion.norm + big_df$Filamentation.norm
big_df$adh_inv <- big_df$Adhesion.norm + big_df$Invasion.norm
big_df$fil_inv <- big_df$Filamentation.norm + big_df$Invasion.norm
big_df$all_sum <- big_df$Adhesion.norm + big_df$Filamentation.norm + big_df$Invasion.norm
write.csv( big_df, "compiled_and_combination_scores_data.csv", row.names = FALSE )
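
The four composite columns above are written out by hand, which is fine for three traits. As a possible alternative, the same sums could be generated with `combn()`, sketched here on a toy stand-in for `big_df` (the values and the `traits` lookup vector are assumptions made for illustration, not part of the commit):

```r
# Toy stand-in for big_df; the real columns come from score_wrangler.R.
big_df <- data.frame(
  Adhesion.norm      = c(0.1, 0.5, 0.9),
  Filamentation.norm = c(0.2, 0.4, 0.6),
  Invasion.norm      = c(0.3, 0.3, 0.3)
)

# Short labels (used in the composite column names) mapped to source columns.
traits <- c(adh = "Adhesion.norm", fil = "Filamentation.norm", inv = "Invasion.norm")

# Every pair of traits: adh_fil, adh_inv, fil_inv.
for (pair in combn(names(traits), 2, simplify = FALSE)) {
  big_df[[paste(pair, collapse = "_")]] <- rowSums(big_df[, traits[pair]])
}

# All three traits together.
big_df$all_sum <- rowSums(big_df[, traits])

print(names(big_df))
```

This yields the same column names the correlation script expects (`adh_fil`, `adh_inv`, `fil_inv`, `all_sum`), and would extend to more traits without extra lines.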