Lies, Damn Lies: Experimental design

Thursday, November 19, 2009

Experimental design

I enjoy making long blog posts about subjects which no-one will understand. So, for your (lack of) entertainment, here's a mini-post on experimental design, the field of my research:

Most scientists, at some point, will need to perform experiments. Typically the process they are looking at will not be free of error. That is, when you measure someone's height, you are unlikely to (a) get the same answer each time and (b) get the correct answer (most people would round to the nearest 1cm at best, which makes your measurements inexact). Such errors mean that even if you are certain what drives the process, your predictions will be out by some amount.

Such errors can usually be given a probability distribution, and we want to do our best to minimise them, so our predictions can be as accurate as possible. If we have a model to describe a process, we might know its form but not its parameters. Let's suppose we have a model for the temperature of a meal. We say that

temperature = a + b * (time cooked for)

We can choose the cooking time, and we can measure the temperature, but for us to make predictions we need the values of a and b. Well, we can run experiments, cooking our items for certain amounts of time and then measuring the temperature. If there were no error, there would be no need to run more than two experiments, as two points would pin down both parameters exactly. However, we exist in the real world, and experimental error is a fact of existence. So let's suppose our food can be baked for between 10 and 200 minutes. If we had 10 cakes to test this with, a naive approach might be to space the cooking times evenly across that range. In fact, if you use statistical theory you can do much better by taking 5 observations at 10 minutes and 5 at 200. Why? We only have two things to estimate, a and b, so two distinct cooking times are enough to fit the line. What is left to control is the amount of error in our estimates. The uncertainty in b shrinks as the cooking times are spread further apart, so, provided the straight-line model is right, pushing the observations out to the two extremes and repeating them there keeps that error as small as possible.
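To put a rough number on that comparison, here is a small sketch. It assumes the usual least-squares setting (independent errors with the same variance at every cooking time), in which the variance of the estimated slope b is the error variance divided by the sum of squared deviations of the cooking times from their mean. The function name and the two design lists are my own, purely for illustration.

```python
def slope_variance_factor(times):
    """Return 1 / sum((t - mean)^2).

    Multiplying this by the error variance gives Var(b) for a
    straight-line fit by ordinary least squares (assumed setting).
    """
    mean_t = sum(times) / len(times)
    sxx = sum((t - mean_t) ** 2 for t in times)
    return 1.0 / sxx


# Ten cakes spread evenly between 10 and 200 minutes.
even_design = [10 + k * (200 - 10) / 9 for k in range(10)]

# Five cakes at each extreme of the range.
endpoint_design = [10] * 5 + [200] * 5

even_factor = slope_variance_factor(even_design)
endpoint_factor = slope_variance_factor(endpoint_design)

# How many times larger the slope variance is under the even design.
ratio = even_factor / endpoint_factor

print(f"evenly spaced design: {even_factor:.3e}")
print(f"endpoint design:      {endpoint_factor:.3e}")
print(f"variance ratio:       {ratio:.2f}")
```

Under these assumptions the evenly spaced design leaves the slope estimate with roughly two and a half times the variance of the endpoint design, using exactly the same ten cakes.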

This result is counter-intuitive, and important, because it extends. If we had two things we could vary (time cooked and weight of cake), the instinct of many experimenters would be to vary only one thing at a time: change the time while holding the weight fixed, then hold the time fixed and change the weight. The better thing to do is to vary both at once, because this allows you to see whether weight and time interact in any way, which the one-at-a-time approach cannot show.
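As a toy illustration of the point about interactions, the sketch below compares a one-at-a-time plan with a design that tries every combination of low and high time and weight. The response function and its coefficients are invented for the example, there is no noise, and the levels are coded as -1 (low) and +1 (high).

```python
from itertools import product


def response(time, weight):
    # Hypothetical noise-free response in coded units (-1 = low, +1 = high).
    # The 3 * time * weight term is the interaction we would like to detect.
    return 50 + 10 * time - 4 * weight + 3 * time * weight


# Vary both factors at once: all four low/high combinations.
factorial_runs = list(product([-1, 1], repeat=2))
n = len(factorial_runs)

# Each coefficient is recovered by a signed average of the responses.
time_effect = sum(t * response(t, w) for t, w in factorial_runs) / n
weight_effect = sum(w * response(t, w) for t, w in factorial_runs) / n
interaction = sum(t * w * response(t, w) for t, w in factorial_runs) / n

# One at a time: time is only ever changed while weight sits at its low level,
# so the time effect and the interaction are mixed together in one number.
ofat_time_effect = (response(1, -1) - response(-1, -1)) / 2

print("vary both at once:", time_effect, weight_effect, interaction)
print("one at a time, apparent time effect:", ofat_time_effect)
```

With all four combinations, the time, weight and interaction coefficients each come out separately. The one-at-a-time plan reports a time effect that is only valid at the low weight, and gives no hint that an interaction exists.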

Now this is an extremely simplistic look at the subject, with lots of the subtleties glossed over, but some important things to note are:

Experimental design has been demonstrated to be effective in many situations. Given a set of goals the experimenter wants to achieve, statisticians can almost always find a design that will do better than the one the experimenter currently uses.
A huge number of scientists are completely ignorant of this field.

It's an interesting field, with many problems left to solve, and one that most people, including many mathematicians, are entirely ignorant of.

Labels: phd, statistics

2 Comments:

At 5:47 pm, Unknown said...: A critique: if you are WRONG about what varies, or both things you vary depend on a third variable, then varying both at once could give you some very confusing results.
At 1:39 pm, Mr K said...: Oh good. Adverts. Yes, you are entirely correct, this approach assumes you are correct about the model.

You can usually control against unknown variables, however: experimental design is for situations in which we can make changes in a controlled manner (and we can guard against unknown variables by running the experiments in a random order).
