Home •Search •Dr. Hain •Clinic website •Information for Dizzy Patients •Music •FLW • Various and Sundry
In databases, one usually structures data as a series of records, indexed by time. For example, with audiograms, one might get a new one every year. Thus a new record every year.
This produces what R calls a "wide" data series -- In the columns (cells) going across the X axis, there is each measurement made on that particular occasion.
For example, for our audiogram database, here is a headers and a record (made up)
8250,s00001558,1/16/2017 0:00,Smith,1/30/1984,outside,WNL bilaterally. L PTA=5; R PTA=7,,5,10,0,5,10,10,,15,10,10,0,15,0,,,,,,,,,randi,4/24/2018
A table of data (such as from a csv file), can be read into a variable in R using something like this:
audio=read.table("yourfile.csv", header=TRUE, sep=",",stringsAsFactors=FALSE).
This is a "wide" format table.
A piece of it (from the "fix(audio)" function in R, looks like:
R mostly requires "long" datasets. What this means is that the records as above are split into many rows consisting of an identifier and a value.
For the above data, to plot PTA (pure tone average) for each ear over time, you might want something like this:
This format has just the the ID, the Dates (DOS), the side being tested, and the values of the PTA. The ID column are useless here, but we keep it in for generality.
There are many ways to do this. Starting with the more general method,
Use the "melt" function. (this requires the libraries tidyr, and reshape2).
audio_melt=melt(audio, id.vars=c("ID", "DOS"), measure.vars=c("L_PTA", "R_PTA"))
audio_melt$DOS=as.Date(audio_melt$DOS) ## get rid of time stamps
audio_sep = separate(audio_melt, variable, c("side", NA)) ## create "side, get rid of PTA.
This method is a 3 liner, no matter how many variables you have.
Next is another method -- that takes about as much space -- just pull out two columns and stick them together.
Select out the left and right columns and rbind them together
L_PTA= data.frame(side=rep("L", length(audio$L_PTA)), date=as.Date(audio$DOS), PTA=audio$L_PTA)
R_PTA= data.frame(side=rep("R", length(audio$R_PTA)), date=as.Date(audio$DOS), PTA=audio$R_PTA)
PTA=rbind(L_PTA,R_PTA) # Sticks two data frames together vertically.
One would need one more line for each additional column, so this method could take a lot of code..
Both of these produce "long" format output. The melt function produced the data above.
|© Copyright August 5, 2021 , Timothy C. Hain, M.D. All rights reserved.|