The Hidden Difficulties of Machine Learning


On-campus presentation to 30 local sixth form students who intend to study Engineering at university. This presentation immediately followed the AI & ML: Cutting Through The Hype talk and was used to show how ML tasks are often not as straightforward as they may seem. The talk is very interactive, with the aim that the students discover the problems for themselves and see why certain solutions may not be sufficient.

I discuss topics such as “what counts as a difficult problem?”, where I go through common (and often comical) image classification problems such as “Hot Dogs or Legs” and “Chihuahua or Muffin”. I end this section with a set of 8 images, some of which are real photos and the others randomly selected images that are not, and take the students' majority vote on whether each image is real. The difficulty of the problem was nicely demonstrated when they correctly identified only 1 image.

I guide them through an example problem of identifying cars in a dataset full of pixelated vehicles, a task that is analogous to my current research in galaxy classification. Topics such as splitting the dataset, feature selection, overfitting, accuracy vs F1, and incorrect labels are all explored as I progressively introduce each problem into the task (always with the aim that the students discover these problems on their own).
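To give a flavour of the accuracy vs F1 point, here is a minimal sketch in plain Python. The dataset (5 cars among 100 vehicles) and the "always say not a car" classifier are made-up assumptions for illustration, not the data or code used in the talk.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical, heavily imbalanced dataset: 1 = car, 0 = any other vehicle.
y_true = [1] * 5 + [0] * 95

# A "lazy" classifier that never predicts car.
always_not_car = [0] * 100

acc = accuracy(y_true, always_not_car)
f1 = f1_score(y_true, always_not_car)
print(f"accuracy = {acc:.2f}, F1 = {f1:.2f}")
```

The lazy classifier scores 95% accuracy while finding no cars at all, which its F1 of 0 makes obvious. This is the kind of trap that is easy to miss when only accuracy is reported.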

My hope is that this talk gives them a feel for “real-world” machine learning, as well as an introduction to what research is like.