home
HomeSearchDr. Hain Clinic website Information for Dizzy Patients MusicFLWVarious and Sundry

Dr. Hain's various/sundry has helpful information about things that Dr. Hain has discovered by trial and error.

Restructuring to plot audiometric data using R

What the input looks like:

In databases, one usually structures data as a series of records, indexed by time. For example, with audiograms, one might get a new one every year. Thus a new record every year.

This produces what R calls a "wide" data series -- In the columns (cells) going across the X axis, there is each measurement made on that particular occasion.

For example, for our audiogram database, here is a headers and a record (made up)

ID,SSN,DOS,LastName,Birth,TEST,RESULT,L125,L250,L500,L1000,L2000,L4000,L8000,R125,R250,R500,R1000,R2000,R4000,R8000, R_SRT,R_dB,L_SRT,L_dB,L_PTA,R_PTA,Audio_Comments,Speech_comments,user,DateEntered


8250,s00001558,1/16/2017 0:00,Smith,1/30/1984,outside,WNL bilaterally. L PTA=5; R PTA=7,,5,10,0,5,10,10,,15,10,10,0,15,0,,,,,,,,,randi,4/24/2018

A table of data (such as from a csv file), can be read into a variable in R using something like this:

audio=read.table("yourfile.csv", header=TRUE, sep=",",stringsAsFactors=FALSE).

This is a "wide" format table.

A piece of it (from the "fix(audio)" function in R, looks like:

Wide input

What we want the output to look like:

R mostly requires "long" datasets. What this means is that the records as above are split into many rows consisting of an identifier and a value.

For the above data, to plot PTA (pure tone average) for each ear over time, you might want something like this:

Long output

This format has just the the ID, the Dates (DOS), the side being tested, and the values of the PTA. The ID column are useless here, but we keep it in for generality.

How to convert from wide to long

There are many ways to do this. Starting with the more general method,

Use the "melt" function. (this requires the libraries tidyr, and reshape2).

audio_melt=melt(audio, id.vars=c("ID", "DOS"), measure.vars=c("L_PTA", "R_PTA"))
audio_melt$DOS=as.Date(audio_melt$DOS) ## get rid of time stamps
audio_sep = separate(audio_melt, variable, c("side", NA)) ## create "side, get rid of PTA.

This method is a 3 liner, no matter how many variables you have.

Next is another method -- that takes about as much space -- just pull out two columns and stick them together.

Select out the left and right columns and rbind them together

L_PTA= data.frame(side=rep("L", length(audio$L_PTA)), date=as.Date(audio$DOS), PTA=audio$L_PTA)
R_PTA= data.frame(side=rep("R", length(audio$R_PTA)), date=as.Date(audio$DOS), PTA=audio$R_PTA)
PTA=rbind(L_PTA,R_PTA) # Sticks two data frames together vertically.

One would need one more line for each additional column, so this method could take a lot of code..

Both of these produce "long" format output. The melt function produced the data above.

Copyright August 5, 2021 , Timothy C. Hain, M.D. All rights reserved.
Dr Hain's CV Clinic dizziness-and-hearing.com FLW Various and Sundry Dr. Hain's CV