Author

Jonas Bühler

Jonas is a software engineer at preML and writes about computer vision and machine learning research projects.

Advances in artificial intelligence (AI) are opening up new opportunities for automating visual quality inspection. However, training an AI model requires a large amount of data, which must first be collected and then annotated manually. Generating synthetic images significantly reduces the amount of real data required and therefore makes AI models easier to deploy.

Image 1: Example of performance improvement with synthetic training data.

Context: AI in quality control

Thanks to innovations in computing power and algorithms, AI has developed rapidly in recent years and has proven to be a disruptive technology, particularly in image processing.

preML GmbH recognized the advantages of AI for automating visual quality inspection at a very early stage. The founders, who first worked with AI in autonomous driving and satellite data analysis, established the company at the Karlsruhe Institute of Technology (KIT) in 2020 and now provide solutions for automating visual inspection in various industries. Typical applications include reliably detecting cracks in concrete products, inspecting aluminum surfaces, checking assembly kits for completeness and analyzing tunnel segments. While the hardware relies on commercially available sensors, the innovation lies in the company’s software. Its in-house software infrastructure allows AI models to be trained quickly and cost-effectively and delivered as real-time applications. For customers, this means inspection systems that are more cost-effective, more customizable and more flexible than conventional systems.

Because AI is an active research field with regular breakthroughs, new topics are constantly emerging. One of them is the use of artificial, often called “synthetic”, images to train AI models. The following sections use a real application from the injection molding sector to illustrate how AI models can be trained with synthetic images.

Problem: Collecting and annotating training data

One of the biggest hurdles in using AI is the large amount of training data it needs, especially for deep learning, a subfield of machine learning that enables complex pattern recognition and interpretation. Even when pre-trained networks are used, i.e. neural networks that have already been trained on large image datasets and serve as a starting point for specific tasks, a considerable amount of training data is still required.

In practice, this poses several problems. First, it is difficult and expensive to obtain enough components covering as many known defect types as possible. Second, collecting and annotating training data is time-consuming and error-prone: many images have to be taken with a camera setup and then annotated individually and as consistently as possible. This manual work cannot be avoided, and there are always edge cases in which human annotators or experts disagree on whether a defect is present. Because AI models are only as good as the data they are trained on, such inconsistencies degrade their performance.

Solution: Generation of synthetic images

For preML’s customer-specific use cases, mostly at medium-sized companies, the high cost of creating training datasets is often prohibitive and causes projects to be abandoned. In recent years, the company has therefore been looking for ways to reduce the cost of introducing a new AI model. One approach that is still rarely used in practice is creating synthetic datasets: graphics software generates images of the components with the desired defects based on their CAD files. This significantly reduces the number of real images required.

Workflow: Application example from the inspection of injection molded parts

This technology can be used, for example, to automate the quality control of injection molded parts. In this specific case, a felt pad is encapsulated in plastic during injection molding to produce high-quality furniture glides. This can lead to so-called overmolding, i.e. excess plastic protruding beyond the intended contour. In the worst case, such glides can damage floors. An AI model is to detect and localize these overmolding defects so that faulty glides can be sorted out automatically.

Image 2: Inverted furniture glide, side view, with minor (acceptable) overmolding

Image 3: Inverted furniture glide, side view, with clear overmolding (marked with a red box)

To generate synthetic images, the existing CAD model of the furniture glide is first converted into a format that can be imported into the 3D graphics software Blender. There, textures for the plastic and the felt that resemble the real materials as closely as possible are applied to the 3D model. Other parameters, such as lighting and background, can also be customized.

Image 4: Synthetically generated image of a furniture glide

Various types of defects can then be rendered, in our case randomly generated overmolding. Depending on the defect type and the object, the way the defect is modeled may have to be adapted or implemented from scratch for each case.


Image 5: Synthetically generated image with problematic overmolding
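The article does not describe how the random overmolding is modeled in Blender. As a rough illustration only, the following Python sketch generates a random 2D outline of excess plastic bulging out from the rim of a circular part. The function name, the circular rim and all value ranges are hypothetical assumptions, not preML’s implementation.

```python
import math
import random


def random_overmolding_blob(cx, cy, rim_radius, rng, max_extent=0.15, n_vertices=16):
    """Return a closed polygon [(x, y), ...] for a burr of excess plastic
    bulging outward from the rim of a circular part centered at (cx, cy).

    Hypothetical 2D stand-in for the defect geometry modeled in Blender.
    """
    # Random start angle on the rim and angular width of the defect (radians).
    theta0 = rng.uniform(0.0, 2.0 * math.pi)
    width = rng.uniform(0.1, 0.6)
    # Peak protrusion beyond the rim, as a random fraction of the rim radius.
    depth = rng.uniform(0.02, max_extent) * rim_radius

    thetas = [theta0 + (i / (n_vertices - 1) - 0.5) * width for i in range(n_vertices)]

    outer = []
    for i, theta in enumerate(thetas):
        t = i / (n_vertices - 1)
        # Bump profile: zero at both ends, largest in the middle, with random jitter.
        bump = math.sin(math.pi * t) * depth * rng.uniform(0.7, 1.0)
        r = rim_radius + bump
        outer.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))

    # Walk back along the rim itself to close the shape.
    rim = [(cx + rim_radius * math.cos(th), cy + rim_radius * math.sin(th))
           for th in reversed(thetas)]
    return outer + rim
```

Passing a seeded `random.Random` instance, e.g. `random_overmolding_blob(320, 240, 150, random.Random(42))`, makes every generated defect reproducible, which helps when a particular synthetic image has to be regenerated later.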

This process has several advantages. First, any number of images can be generated in a short time. Second, the images are annotated automatically and consistently during generation, so no manual post-processing is needed. Third, rare defects can be generated in large quantities and many variations.
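Because the renderer knows exactly where it placed the defect, labels can be derived directly from the defect geometry instead of being drawn by hand. The sketch below assumes the defect has already been projected to a 2D polygon in pixel coordinates and writes its bounding box in the YOLO text label format (class, x-center, y-center, width, height, each normalized to the image size). The label format and the helper name are assumptions; the article does not name the detector or annotation format used.

```python
def polygon_to_yolo_label(polygon, image_width, image_height, class_id=0):
    """Convert a defect polygon [(x, y), ...] in pixels into one YOLO label line.

    Returns None if the defect lies completely outside the rendered image.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]

    # Clip to the image so a partly visible defect still gets a valid box.
    x_min = max(0.0, min(xs))
    x_max = min(float(image_width), max(xs))
    y_min = max(0.0, min(ys))
    y_max = min(float(image_height), max(ys))
    if x_max <= x_min or y_max <= y_min:
        return None  # defect not visible in this render

    # Normalize center and size to [0, 1] relative to the image dimensions.
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

For a square defect from (100, 50) to (200, 150) in a 400 × 200 image, this yields the line `0 0.375000 0.500000 0.250000 0.500000`, produced without any human judgment and therefore identical every time.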

To exploit the full potential of synthetic images, a large number of parameters can be varied as required. Even variations that are difficult to reproduce in practice, e.g. different material colors, can often be adjusted easily in synthetic images.
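Varying scene parameters randomly from image to image is often called domain randomization. A minimal sketch, assuming one random scene configuration is drawn per rendered image; the parameter names and ranges are hypothetical, and applying the values to the Blender scene is not shown.

```python
import random
from dataclasses import dataclass

# Hypothetical ranges for illustration; real values depend on the part,
# camera and lighting of the actual inspection station.
LIGHT_POWER_W = (200.0, 1200.0)
LIGHT_ELEVATION_DEG = (20.0, 80.0)
CAMERA_DISTANCE_M = (0.25, 0.40)
BACKGROUNDS = ("gray_conveyor", "black_matte", "white_paper")
PLASTIC_COLORS_RGB = ((0.05, 0.05, 0.05), (0.90, 0.90, 0.90), (0.55, 0.35, 0.20))


@dataclass(frozen=True)
class SceneParams:
    light_power_w: float
    light_elevation_deg: float
    camera_distance_m: float
    background: str
    plastic_rgb: tuple
    felt_roughness: float


def sample_scene_params(rng: random.Random) -> SceneParams:
    """Draw one random scene configuration (one per rendered image)."""
    return SceneParams(
        light_power_w=rng.uniform(*LIGHT_POWER_W),
        light_elevation_deg=rng.uniform(*LIGHT_ELEVATION_DEG),
        camera_distance_m=rng.uniform(*CAMERA_DISTANCE_M),
        background=rng.choice(BACKGROUNDS),
        plastic_rgb=rng.choice(PLASTIC_COLORS_RGB),
        felt_roughness=rng.uniform(0.7, 1.0),
    )
```

Seeding one generator per image, e.g. `[sample_scene_params(random.Random(seed)) for seed in range(250)]`, keeps each render reproducible on its own, so a single image can be re-rendered without regenerating the whole dataset.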

Evaluation: More variation pays off in the application example

Using the final dataset of 147 real and 250 synthetic images, several experiments were carried out to determine how performance changes with the number of real and synthetic images.
Two thirds of the real images were used for training and validation and one third for measuring performance, while the synthetic images were used exclusively for training.
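The data split described above can be sketched as follows. With 147 real images, holding out one third leaves 98 images for training and validation and 49 for testing. The image IDs, helper names and the list of real-image counts are hypothetical, and carving a validation subset out of the training/validation pool is omitted.

```python
import random


def split_real_images(real_ids, seed):
    """Shuffle the real images and hold out one third as the test set.

    Returns (train_val, test). Synthetic images never enter the test set.
    """
    ids = list(real_ids)
    random.Random(seed).shuffle(ids)
    n_test = len(ids) // 3
    return ids[n_test:], ids[:n_test]


def build_training_set(train_val_real, synthetic, n_real, use_synthetic, seed):
    """Sample n_real real images and optionally add all synthetic images."""
    real_subset = random.Random(seed).sample(train_val_real, n_real)
    return real_subset + (list(synthetic) if use_synthetic else [])


def experiment_grid(real_counts, n_runs=10):
    """Yield (n_real, use_synthetic, run) for the with/without-synthetic comparison."""
    for n_real in real_counts:
        for use_synthetic in (False, True):
            if n_real == 0 and not use_synthetic:
                continue  # nothing to train on
            for run in range(n_runs):
                yield n_real, use_synthetic, run
```

Keeping the test set purely real ensures that the reported numbers reflect performance on actual production images, regardless of how many synthetic images went into training.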

The experiments showed that training on synthetic images alone already produced promising initial results, but these are not yet sufficient for practical use. In this application, real data is required for reliable and stable results. However, even a few real images improve the results significantly.
Once some real images have been collected, adding synthetic data further improves both the detection and the localization of defects.

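Figure 1: Comparison of performance with and without synthetic images, for varying numbers of real images in the training data. mAP@50 (mean average precision at an intersection-over-union threshold of 0.5) is a standard performance metric for object detection; higher is better. Each value is the average of 10 experiments.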
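In mAP@50, a predicted box counts as a hit only if its intersection over union (IoU) with a ground-truth box is at least 0.5, and each ground-truth box can be matched only once. With a single defect class such as overmolding, mAP equals the average precision (AP) of that class. The sketch below uses Pascal VOC-style greedy matching and all-point interpolation. Detection frameworks differ in such details (COCO tooling, for instance, interpolates at 101 recall points), so this is an illustration, not the evaluation code behind Figure 1.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(predictions, ground_truth, iou_threshold=0.5):
    """AP for one class.

    predictions: list of (image_id, score, box)
    ground_truth: dict image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}

    # Greedy matching in order of decreasing confidence.
    hits = []
    for img, _score, box in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_threshold and not matched[img][best_j]:
            matched[img][best_j] = True
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive: no overlap or duplicate detection

    # Precision and recall after each prediction in ranked order.
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

A false positive ranked below all correct detections does not lower AP, while one ranked above them does. The per-run AP values could then be averaged, for example with `statistics.mean`, as in the 10 experiments per configuration in Figure 1.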

Outlook: The future of AI quality control with synthetic data

The generation of synthetic images has the potential to further improve visual quality inspection and make it even more cost-effective. The images can be used in the initial phase of a project to quickly produce initial results, even with very little real data. The data can also be useful in a later project phase to further improve the results or to retrain the AI model for new product variants before a product change.

Whether this method is worthwhile in a project depends, among other things, on how easily the defects in particular can be modeled and varied. Another decisive factor is how time-consuming it is to record and annotate real defect images. In the future, preML plans to use synthetic data in other projects as well. However, because generating it requires expertise in computer graphics and the defects to be generated vary widely, synthetic data will initially remain an internal tool that preML uses for suitable applications or offers as a development service.

This article first appeared in QZ issue 7/2024 and online.

Some developments in the field of synthetic data at preML GmbH were funded in the CAD2SYNTH research project in the InvestBW funding program of the Baden-Württemberg Ministry of Economic Affairs, Labour and Tourism.
