
R is an immensely popular and powerful programming language that seems to be a viable alternative to Matlab for many data analysis tasks. Dr. Hain decided to learn R, mainly to produce graphics from his large collection of clinical data stored in a MySQL database.

Some general observations.

- R is open source. The beauty of R is that it is open source. The problem of R is that it is open source. To expand on this, R is IMMENSE: there are a gigantic number of contributed packages.
- There are MANY ways to do almost ANYTHING in R. For example, R has 4 different graphics "libraries".
- R is lightly documented -- the way you do things in R is to google them up.
- R is complicated -- it operates on vectors (or matrices), and these sorts of programs are tricky.

- R is a "Matlab" competitor. Matlab is a similar language, sold by MathWorks, that also operates on vectors (or matrices). Matlab is not open source. It is pretty much locked down. You pay lots of money to keep it running. Even if you bought the program, the license manager stops you from using it on two different computers (e.g. your work computer and home computer).
- More will follow on this page.

The way things work with R is that you put stuff into R, you fiddle around with it for a while, and then you save some sort of graph or numerical output.

My data is in MySQL, and it comes out in rows, with one row for each combination of test:dos:patient_id (DOS means date of service).

I save my data (using PHP) to a "csv" file, including the MySQL field names in a header line, and then I import the data using an R function. This produces a data frame with one row per record and one named column per field.
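The import step might look roughly like the sketch below. The file contents and the column names (`patient_id`, `dos`, `test`, `pta_right`, `pta_left`) are made up for illustration and are not the actual database fields; `read.csv()` is one of several base R functions that could do this job.

```r
# Hypothetical export: the rows and column names are invented for this sketch.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "patient_id,dos,test,pta_right,pta_left",
  "101,2018-03-14,audiogram,25,30",
  "101,2019-02-20,audiogram,30,35",
  "102,2019-05-02,audiogram,15,10"
), csv_path)

# read.csv() treats the first line as column names (header = TRUE by default).
audio <- read.csv(csv_path, stringsAsFactors = FALSE)

# Dates arrive as text; convert them so they sort and plot correctly.
audio$dos <- as.Date(audio$dos)

# Show the resulting structure: one row per record, one column per field.
str(audio)
```

In real use, `csv_path` would simply point at the file written by the PHP export.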

R mainly wants a "data frame" type construct. The workflow described above works fine if you want to process data based on field values -- for example, plot PTA (average hearing) values vs date of service.
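As a minimal sketch of that kind of plot, assuming a data frame with a date column `dos` and a numeric column `pta` (both names and all values are hypothetical), base graphics can draw it directly from the columns and save the result to a file:

```r
# Made-up visits for one patient; pta is the pure-tone average in dB HL.
visits <- data.frame(
  dos = as.Date(c("2017-06-01", "2018-03-14", "2019-02-20")),
  pta = c(20, 25, 30)
)

# Save the graph to a PDF in the session's temporary directory.
out_file <- file.path(tempdir(), "pta_vs_dos.pdf")
pdf(out_file, width = 6, height = 4)

# Each axis is one column of the data frame; type = "b" draws points and lines.
plot(visits$dos, visits$pta, type = "b",
     xlab = "Date of service", ylab = "PTA (dB HL)")

invisible(dev.off())  # close the device so the file is written
```

This works without any reshaping because each plotted variable already lives in its own column.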

An audiogram is basically two scatterplots, one for each ear, plotting the threshold against frequency.

The typical audiogram record has a test:dos:patient_id identifier, and 7 threshold measurements for each ear.

Because R graphics functions expect each plotted variable to be a column of a data frame, this doesn't work very well for audiograms: the seven thresholds for each ear sit side by side in a single row, one field per frequency. This means you have to flip the data on its side, so that the thresholds become a column rather than a subset of a row.
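One way to do the flip in base R is sketched below. The seven test frequencies and the wide column names (`r_250`, `l_250`, and so on) are assumptions made for the example; the thresholds are invented. Contributed packages offer other ways to reshape data, so this is only one of several options.

```r
# Assumed test frequencies in Hz; the text only says there are 7 per ear.
freqs <- c(250, 500, 1000, 2000, 4000, 6000, 8000)

# One wide record with made-up thresholds (dB HL); column names are hypothetical.
wide <- data.frame(
  patient_id = 101, dos = as.Date("2019-02-20"),
  r_250 = 10, r_500 = 15, r_1000 = 20, r_2000 = 25,
  r_4000 = 40, r_6000 = 45, r_8000 = 50,
  l_250 = 15, l_500 = 15, l_1000 = 25, l_2000 = 30,
  l_4000 = 35, l_6000 = 50, l_8000 = 55
)

# Flip the row on its side: one row per ear and frequency.
long <- data.frame(
  ear       = rep(c("right", "left"), each = length(freqs)),
  frequency = rep(freqs, times = 2),
  threshold = c(unlist(wide[1, paste0("r_", freqs)], use.names = FALSE),
                unlist(wide[1, paste0("l_", freqs)], use.names = FALSE))
)

# Two scatterplots, one per ear, saved to a PDF in the temporary directory.
out_file <- file.path(tempdir(), "audiogram.pdf")
pdf(out_file, width = 8, height = 4)
par(mfrow = c(1, 2))
for (side in c("right", "left")) {
  one_ear <- long[long$ear == side, ]
  # Log-scaled frequency axis; ylim is reversed so that worse hearing
  # plots lower, as on a conventional audiogram.
  plot(one_ear$frequency, one_ear$threshold, log = "x", type = "b",
       ylim = c(120, -10), main = paste(side, "ear"),
       xlab = "Frequency (Hz)", ylab = "Threshold (dB HL)")
}
invisible(dev.off())
```

After the reshape, `frequency` and `threshold` are ordinary columns, so the plotting call looks the same as for any other scatterplot.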

© Copyright October 1, 2019, Timothy C. Hain, M.D. All rights reserved.