Commit 0b113eda authored by Robert Fillinger

Updated README and file removal

parent 82ff2c76
Workflow for generating phenotype score combinations and correlating them to biofilm.
There is one rule: no Excel. Every time I use Excel, I have to rename the files, they get lost, and I can't retrace my steps. With no Excel, I can see every step and fix it where I need to.
First things first:
1. Generate normalized scores from the sorted scores.
* A sorted score is the average of the raw scores from the biological replicates. An individual photo is a biological replicate.
* `score_wrangler.R` takes in the un-normalized scores and generates a normalized column using the `preProcess()` function from the `caret` package.
* This program will also remove data that we do not want (we removed certain non-albicans *Candida* species that didn't grow under certain conditions).
* After this, the files are modified with `column_clean.py` (called inside the R script) to remove the leading column and to clean up the column content if necessary.
* Finally, the program makes a file with all the score data in it. Repeatability. No Excel.
* I also had it combine all the scores. That just made things a lot easier.
2. Correlate all the normalized sum scores with biofilm.
* I need a table for these that includes which scores make up each composite score, the media, and the temperature, as well as the correlation metrics.
* `additive_correlator.R` computes these correlations with the `cor.test()` function, as described by [STHDA](http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r).
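Step 1 can be sketched in plain Python to show the arithmetic. This is a minimal illustration, not the R code in `score_wrangler.R`. It assumes each record holds one strain's raw replicate scores (one per photo) and averages them into a sorted score. It then drops excluded species and z-scores the result. The z-scoring matches what caret's `preProcess()` does with its default `method = c("center", "scale")`, but whether the script uses those defaults is an assumption. The records, species names, and helper names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical records: (strain, species, raw replicate scores, one per photo)
raw_scores = [
    ("S1", "C. albicans", [2, 3, 3]),
    ("S2", "C. albicans", [1, 1, 2]),
    ("S3", "C. glabrata", [0, 0, 0]),
    ("S4", "C. albicans", [4, 3, 5]),
]

# Hypothetical exclusion list: species that did not grow under some conditions
EXCLUDED_SPECIES = {"C. glabrata"}


def sorted_scores(records, excluded):
    """Average each strain's biological replicates, skipping excluded species."""
    return {
        strain: mean(replicates)
        for strain, species, replicates in records
        if species not in excluded
    }


def center_and_scale(scores):
    """Z-score: subtract the mean, then divide by the sample standard deviation."""
    values = list(scores.values())
    mu = mean(values)
    sd = stdev(values)  # n - 1 denominator, like R's sd()
    return {strain: (value - mu) / sd for strain, value in scores.items()}


averaged = sorted_scores(raw_scores, EXCLUDED_SPECIES)
normalized = center_and_scale(averaged)
for strain, z in normalized.items():
    print(f"{strain}\t{averaged[strain]:.3f}\t{z:.3f}")
```

The excluded strain never reaches the normalization step, so it cannot shift the mean or standard deviation of the remaining strains.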
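The composite "sum scores" in step 2 can be pictured as every non-empty combination of the normalized phenotype scores, summed strain by strain. The three phenotypes below come from the sorted score file names in this commit (adhesion, filamentation, invasion). The per-strain values and the `composite_scores` helper are hypothetical. It is also an assumption that the R scripts build composites exactly this way.

```python
from itertools import combinations

# Hypothetical normalized scores per strain for the three phenotypes
normalized = {
    "adhesion": {"S1": 0.5, "S2": -1.0, "S4": 0.5},
    "filamentation": {"S1": 1.0, "S2": 0.0, "S4": -1.0},
    "invasion": {"S1": -0.5, "S2": 1.5, "S4": -1.0},
}


def composite_scores(scores_by_phenotype):
    """Sum every non-empty combination of phenotypes, strain by strain."""
    phenotypes = sorted(scores_by_phenotype)
    strains = sorted(next(iter(scores_by_phenotype.values())))
    composites = {}
    for size in range(1, len(phenotypes) + 1):
        for combo in combinations(phenotypes, size):
            # The label records which scores went into the composite
            label = "+".join(combo)
            composites[label] = {
                strain: sum(scores_by_phenotype[p][strain] for p in combo)
                for strain in strains
            }
    return composites


for label, per_strain in composite_scores(normalized).items():
    print(label, per_strain)
```

With three phenotypes this yields 2^3 - 1 = 7 composites. The label string doubles as the "which scores are included" column of the correlation table.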
......
import sys

# Takes the score file name as the only command-line argument, swaps its
# R-mangled header for a readable one, and writes "nc_" + the input name
# minus its first five characters (nc = normalized clean).
file_name = sys.argv[1]
# Biofilm files have fewer columns, so they get a shorter header
biofilm = "biofilm" in file_name
# Gross header : MAY.Strain.. Species Soll.Clade Isolation.Site MTL.Genotype Media Temperature..C. MJD.Phenotype.Score MJD.Score.St..Dev. RJF.Phenotype.Score RJF.Score.St..Dev. Total.Average.Phenotype.Score Total.Phenotype.Score.St..Dev. Normalized.Scores
new_header = "May Strain, Species, Soll Clade, Isolation Site, MTL Genotype, Media, Temperature ("+ u"\N{DEGREE SIGN}" + "C), MJD Phenotype Score, MJD Score St. Dev., RJF Phenotype Score, RJF Score St. Dev., Total Average Phenotype Score, Total Phenotype Score St. Dev., Normalized Scores"
biofilm_header = "May Strain, Species, Soll Clade, Isolation Site, Media, Temperature ("+ u"\N{DEGREE SIGN}" + "C), Total Average Phenotype Score, Total Phenotype Score St. Dev., Normalized Scores"
# UTF-8 so the degree sign in the headers can be written on every platform
with open(file_name, "r", encoding="utf-8") as file, \
        open("nc_" + file_name[5:], "w", encoding="utf-8") as new_file:
    header = True
    for lines in file:
        if header:
            # Replace the original first line with the readable header
            if biofilm:
                print(biofilm_header, file=new_file)
            else:
                print(new_header, file=new_file)
            header = False
            continue
        line_list = lines.split(",")
        if biofilm:
            new_line_list = line_list[0:]
        else:
            new_line_list = line_list[0:]
            # new_line_list = line_list[1:7] # Include these two lines if you want to remove the per-person scoring
            # new_line_list.extend(line_list[11:])
        # strip() removes the trailing newline; print() adds one back
        line_str = ",".join(new_line_list).strip()
        print(line_str, file=new_file)
# The sorted score files are opened and closed here without being read
adh_file = open("adhesion_scores_sorted.csv", "r")
fil_file = open("filamentation_scores_sorted.csv", "r")
inv_file = open("invasion_scores_sorted.csv", "r")
adh_file.close()
fil_file.close()
inv_file.close()