Mathematical & Computer Sciences
Permanent URI for this community: https://dspace-upgrade.is.ed.ac.uk/handle/10399/20
Item: Training sets for algorithm tuning: exploring the impact of structural distribution (Heriot-Watt University, 2024-05)
Alkhalifah, Mashal Saleh; Corne, Professor David Wolfe; Paechter, Professor Ben; Vallejo, Doctor Marta

Abstract: Algorithm tuning is a growing area of research and practice. In particular, it is developing in the context of metaheuristic algorithms, where tuning involves configuring an algorithm's parameters on the basis of a "training set" of problem instances. There has been significant interest (but to date, limited research) in the relationship between the training set and generalisation performance, with the frequent assumption that if an algorithm is targeted at problems of a certain type, then the training set should be a diverse collection of problems of that type. The current research investigates this assumption, focusing on how certain structural aspects of training-set problem instances affect generalisation performance.

In our first study, using travelling salesman problem (TSP) instances drawn from both uniform and Gaussian distributions, we sought evidence regarding a widely held assumption about training sets and generalisation: that training sets should be drawn from the same distribution as the target problem instances. We found evidence against this assumption: algorithms evaluated on problem instances from the same distribution as their training set did not perform better than those trained on a different distribution; algorithms did, however, appear to perform consistently better when the TSP distributions were Gaussian rather than uniform.

In our second study, we replicated the findings of the first, showing again that algorithms evaluated on problem instances from the same distribution as their training set performed no better. Extending these findings, we explored whether problem difficulty plays a role in generalisation, finding that greater training-set difficulty, as measured by fitness-distance correlation, appears to be linked to better performance, but only in two-region problems.

Finally, the third study explored these effects in the context of vehicle routing problems (VRPs). Here we again found evidence against the widely held assumption, and even found that algorithms trained on instances from a different distribution seemed to perform better than those trained on the same distribution. We further showed that distribution difficulty, again measured by fitness-distance correlation, was tied to higher performance, confirming the finding of the second study.

Overall, the primary finding across both TSP and VRP contexts is that the common prior assumption, that an algorithm must be tuned on instances drawn from a distribution similar to the target problem instances, is not supported. One significant limitation is that the data were generated synthetically and simplified to an extent that differentiates them from real-world data. However, the findings do have implications for the development of real-world optimisers for specific problem classes, such as VRPs with particular depot and customer distribution setups.
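
To make the methodology described in the abstract concrete, the following is a minimal, illustrative sketch (not taken from the thesis) of how one might generate synthetic TSP instances from uniform and Gaussian spatial distributions and estimate fitness-distance correlation as a difficulty measure. It assumes Python with NumPy; the Gaussian spread, the random-tour sampling, and the use of the best sampled tour as a proxy for the global optimum are all assumptions made for illustration, not details from the thesis.

    import numpy as np

    def make_tsp_instance(n_cities, distribution="uniform", rng=None):
        """Sample city coordinates for a synthetic TSP instance.

        "uniform" places cities uniformly in the unit square; "gaussian"
        clusters them around the centre (the spread of 0.15 is an assumed
        value, not one taken from the thesis).
        """
        rng = np.random.default_rng() if rng is None else rng
        if distribution == "uniform":
            return rng.uniform(0.0, 1.0, size=(n_cities, 2))
        return rng.normal(loc=0.5, scale=0.15, size=(n_cities, 2))

    def tour_length(cities, tour):
        """Total Euclidean length of a closed tour visiting each city once."""
        ordered = cities[tour]
        diffs = ordered - np.roll(ordered, -1, axis=0)
        return float(np.sqrt((diffs ** 2).sum(axis=1)).sum())

    def edge_distance(tour_a, tour_b):
        """Distance between two tours: number of edges of A not shared with B."""
        def edges(t):
            return {frozenset((t[i], t[(i + 1) % len(t)])) for i in range(len(t))}
        return len(edges(tour_a) - edges(tour_b))

    def fitness_distance_correlation(cities, n_samples=500, rng=None):
        """Estimate FDC from random tours, using the best sampled tour as a
        proxy for the global optimum (an assumption; the true optimum would
        normally be required)."""
        rng = np.random.default_rng() if rng is None else rng
        n = len(cities)
        tours = [rng.permutation(n) for _ in range(n_samples)]
        lengths = np.array([tour_length(cities, t) for t in tours])
        best = tours[int(np.argmin(lengths))]
        dists = np.array([edge_distance(t, best) for t in tours])
        return float(np.corrcoef(lengths, dists)[0, 1])

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        for dist in ("uniform", "gaussian"):
            cities = make_tsp_instance(50, dist, rng)
            print(dist, round(fitness_distance_correlation(cities, rng=rng), 3))

Under this sketch, a higher (more positive) correlation between tour length and distance from the proxy optimum indicates a more "correlated", and typically easier, landscape; instances could then be grouped or selected by this score when assembling training sets of differing difficulty.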