The ‘aha’ moment

I dived into machine learning by buying the excellent book by Kevin Murphy. It is comprehensive, and therefore daunting, but I was given a huge boost of confidence to continue when I realised that I, like most other scientists, and indeed anyone else who has taken high school science, was already familiar with a rudimentary form of machine learning. It turns out, for me at least, to be a wonderful starting point for turning on the lights inside the black box. Remember

y = mx + c,

the equation that fits a straight line through some data? We measure some property, y, as we vary some other quantity, x, and attempt to describe the relationship using m and c. So how does this help me to start demystifying AI? Before computers became so commonplace, we were taught at school how to find the line of best fit by eye, which is OK for rough estimates but not very good if you are planning on writing a research article. These days we use computer software to do the work for us. It is easier and much more reliable, but it also means I’ve stopped thinking about the math that lies behind finding the line of best fit. For such a simple equation, the math turns out to be quite sophisticated, as much in terms of the concepts as in terms of the calculations themselves. Reading about it also reminded me that the way we find a line of best fit requires us to make assumptions about the nature of our measurements. The most common assumption is that the scatter we see in our data, whatever its causes, has a Gaussian distribution. Under that assumption, the best m and c turn out to be the ones that minimise the sum of squared differences between the measured y values and the line, which is the familiar least-squares fit. This tends to work well for most scientific data, but there are other reasonable assumptions.
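To make that concrete, here is a minimal sketch in Python of the least-squares fit that sits behind most line-of-best-fit software. The data values are made up purely for illustration; the formulas for m and c are the standard least-squares estimates.

```python
import numpy as np

# Hypothetical example data: x is the quantity we vary, y is what we measure.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 10.9])  # noisy measurements near y = 2x + 1

# Least-squares estimates of m and c, which is what "line of best fit" means
# if we assume Gaussian scatter in the measurements:
#   m = cov(x, y) / var(x),   c = mean(y) - m * mean(x)
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean

print(f"m = {m:.3f}, c = {c:.3f}")  # should land close to m = 2, c = 1
```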

So how does this amount to an ‘aha’ moment? I’ve used this equation, and more complicated variants of it, on a number of occasions with little thought about the algorithm the software uses. But the point is that there is an algorithm, and it is relatively straightforward to work through the math for anyone who has studied high school statistics, so I can know what is going on inside the black box, even though most of the time I’m not particularly interested. In other words, I’ve already used a rudimentary form of machine learning! And what is more, I’ve happily been doing so for years in complete ignorance. The machine takes my data and churns out a description of how it thinks future data might behave. The best guess for y if I take another measurement at a different point x? Well, it just uses the m and the c that it learnt from previous data. If I like, I can also use the new data to improve my guesses for m and c.
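Continuing the sketch above (with the same made-up data), a prediction is just the learnt m and c applied to a new x, and one simple way to use new data is to refit on everything measured so far. The extra measurements below are again hypothetical.

```python
import numpy as np

# Same hypothetical data as in the sketch above.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 10.9])

# "Learning" is just storing m and c; NumPy's polyfit performs the same
# least-squares fit as the hand-written version above.
m, c = np.polyfit(x, y, deg=1)

def predict(x_new, m, c):
    """Best guess for y at a new point x, using the m and c learnt so far."""
    return m * x_new + c

print(predict(6.0, m, c))  # roughly 13 for this data

# When new measurements arrive, one simple option is to refit on all the
# data, so the guesses for m and c improve as the data set grows.
x_all = np.append(x, [6.0, 7.0])
y_all = np.append(y, [13.2, 14.8])   # hypothetical new measurements
m, c = np.polyfit(x_all, y_all, deg=1)
print(f"updated estimates: m = {m:.3f}, c = {c:.3f}")
```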

With hindsight this all seems rather simple and obvious, but it turns out to be a great entry point for understanding the foundations of machine learning and this particular form of artificial intelligence.