Christoph Wagner is the CEO of Scanbot SDK, a software development company specializing in data capture software for mobile and web applications.
Recent leaps in generative AI have demonstrated the disruptive power of machine learning. Understandably, business leaders are currently focused on how this particular technology could affect their companies. Even so, other areas of applied artificial intelligence that are already at a much more mature stage should not be overlooked. A perfect example of this is computer vision.
With computer vision technology, machines can understand visual information from the world around them through image processing techniques that approximate how human vision works. The software compares features extracted from images or videos with standards it already knows. If there is a match, it initiates an appropriate predefined action. For example, an autonomous car can respond to a stop sign by braking.
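To make the match-then-act idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the feature vectors are made-up numbers, and the names `KNOWN_STANDARDS`, `ACTIONS` and `decide` are hypothetical rather than taken from any real system. It compares extracted features with stored references and triggers a predefined action only when the similarity is high enough.

```python
import math

# Hypothetical reference features ("standards") and the actions tied to them.
# The numbers are invented for illustration only.
KNOWN_STANDARDS = {
    "stop_sign": [0.9, 0.1, 0.8],
    "speed_limit": [0.2, 0.9, 0.3],
}
ACTIONS = {"stop_sign": "brake", "speed_limit": "adjust_speed"}


def cosine_similarity(a, b):
    """Similarity of two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def decide(features, threshold=0.95):
    """Return the predefined action for the best match, or None if nothing matches."""
    best_label, best_score = None, 0.0
    for label, reference in KNOWN_STANDARDS.items():
        score = cosine_similarity(features, reference)
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return ACTIONS[best_label]
    return None


print(decide([0.88, 0.12, 0.79]))  # features close to the stored stop sign
print(decide([0.5, 0.5, 0.5]))     # ambiguous features, no action triggered
```

The threshold is the design choice that matters here: set it too low and the system acts on false matches, set it too high and it ignores signs seen under imperfect conditions.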
While useful, the potential of this technology was long held back by its reliance on rule-based algorithms. Computer vision systems could only handle what was explicitly programmed into them. Because real-world conditions are rarely ideal, their performance would drop significantly when lighting was poor or objects were partially obscured.
The advent of machine learning changed that. Modern software no longer relies solely on pre-programmed rules, but can instead learn specific tasks by extracting patterns from training data and applying them to information it has never encountered before. By repeatedly updating the parameters of its model, the software improves its performance incrementally.
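The principle of learning through incremental parameter updates can be sketched in a few lines. This is a deliberately tiny toy, not how a production vision model is trained: it assumes a one-variable linear model with two parameters, made-up training data, and a hypothetical `train` function that applies plain gradient descent.

```python
def train(samples, learning_rate=0.1, epochs=200):
    """Fit y = w * x + b by nudging w and b after every training example."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = (w * x + b) - y          # how far off the current guess is
            w -= learning_rate * error * x   # small correction to each parameter
            b -= learning_rate * error
    return w, b


# Made-up training data that follows the hidden pattern y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train(samples)

# Apply the learned pattern to an input that was not in the training data.
unseen_x = 3.0
print(round(w, 2), round(b, 2), round(w * unseen_x + b, 2))
```

No rule for the pattern was ever written down; the parameters drift toward it one small correction at a time, which is the same idea real models apply across millions of parameters.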
The Silent Revolution
One of the most popular real-time object detection systems is YOLO, short for “You Only Look Once”. When first introduced in 2015, it demonstrated the feasibility of object detection in a single pass: instead of first locating objects in an image and then identifying them, YOLO bundles these steps into one, making it extremely fast. Some versions can handle up to 155 frames per second, at the expense of some accuracy. This means you can analyze a typical movie shot at 24 FPS more than six times faster than real time, provided you have the necessary computing power.
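The arithmetic behind that comparison is simple enough to check. The helper names below are hypothetical, and the 120-minute film length is an assumed example rather than a figure from any benchmark.

```python
def realtime_speedup(processing_fps, playback_fps=24.0):
    """How many times faster than playback speed the footage is processed."""
    return processing_fps / playback_fps


def processing_minutes(film_minutes, processing_fps, playback_fps=24.0):
    """Minutes needed to analyze footage of the given length."""
    return film_minutes / realtime_speedup(processing_fps, playback_fps)


speedup = realtime_speedup(155)            # 155 / 24
minutes = processing_minutes(120, 155)     # assumed 120-minute film
print(f"{speedup:.2f}x real time, {minutes:.1f} minutes for a 120-minute film")
```

At 155 frames per second, the ratio works out to roughly 6.5, so a two-hour film would take a little under 19 minutes to analyze under these assumptions.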
With such powerful technology, it’s no wonder that new applications for computer vision systems are constantly being developed. Drones equipped with high-resolution cameras are now surveying acres of farmland, detecting even the smallest anomalies that indicate plant disease or nutrient-poor soil. In manufacturing, computer vision models count output and detect defective products. To enforce safety regulations, smart cameras automatically check that all workers are wearing helmets.
Healthcare has also benefited immensely from advances in computer vision: while X-ray and MRI diagnostics previously depended on the judgment of a medical expert, machine learning models trained on large numbers of scans can now accurately classify even the smallest deviations.
Even in the operating room, computer vision systems help surgeons make precise incisions and alert them to any unusual visual information they might otherwise miss. Afterwards, hours-long recordings of procedures are automatically analyzed, segmented and annotated to provide material for research and education.
Of course, there are also dangers. Surveillance systems equipped with facial recognition software, for example, are now more effective than ever at identifying individuals. As with generative AI, we must consider the ramifications of these technological advances, carefully weigh the benefits against the risks, and work towards effective regulation.
Giving New Life To Old Technologies
Many well-established technologies have also made leaps in performance through machine learning-enhanced computer vision systems. If you used OCR software in the 1990s or early 2000s and compare it to today’s tools, you’ll notice that the difference in text recognition quality is like night and day, all thanks to machine learning.
Barcodes are another excellent example: these simple data carriers have been around for nearly 50 years. For much of that time, only specialized scanners could read them. Today, any smartphone can.
Powerful barcode scanning software with computer vision and machine learning components can read large numbers of barcodes in record time. It’s also robust, easily handling poor lighting and damaged codes. We are already seeing robots that make manual inventory counts obsolete, gliding down aisles and scanning entire shelves of products at once.
This technology doesn’t require powerful hardware or a lot of storage space: a mobile app like this is less than 100 megabytes, runs smoothly on budget phones, and doesn’t even require Internet access.
Speaking of mobile devices, the ubiquity of smartphones is another reason companies should reevaluate their workflows. Many tasks that required specialized hardware just a few years ago can now be performed with the same device we carry with us wherever we go. Switching from hardware to software solutions can lead to significant long-term cost savings.
The time has come for companies to take a close look at their business processes and identify any inefficiencies that can now be eliminated with technology. Computer vision is just one example of a technology whose capabilities are being underestimated, but it is a prominent one: since most of us rely heavily on our vision in our daily lives, visual automation has enormous potential for process optimization. Now is the perfect time to take advantage of this opportunity.
The Forbes Technology Council is an invite-only community for world-class CIOs, CTOs and technology executives.