Ambarella, Inc. - Experts & Thought Leaders

Latest Ambarella, Inc. news & announcements

Ambarella’s latest 5nm AI SoC family runs vision-language models and AI-based image processing

Ambarella, Inc., an edge AI semiconductor company, announced the continued expansion of its AI system-on-chip (SoC) portfolio with the 5nm CV75S family during the ISC West security expo. These new SoCs provide the industry's most power- and cost-efficient option for running the latest multi-modal vision-language models (VLMs) and vision-transformer networks. This efficiency makes these cutting-edge AI technologies feasible for a broad range of cost- and power-constrained devices, including security cameras for enterprises, smart cities and retail; industrial robotics and access control; and a host of AI-enabled consumer video devices, such as sports and conferencing cameras.

Integrating the latest technology

"With the CV75S family, we are enabling mass-market product designers with the ability to integrate the latest vision-transformer technologies, including VLMs that allow zero-shot image classification and multi-modal inferencing for real-time visual analytics without the need for training," said Chris Day, VP of Marketing and Business Development at Ambarella. "We're also bringing our advanced AI-based image processing technology to cameras with a wide range of price points, offering significantly greater image quality for a broad spectrum of applications."

Utilising the CV75S

A typical example of how the CV75S will be used to run VLMs in enterprise cameras is a natural-language search that is processed within the camera to look for any object or scene among the content it has captured. A multi-modal VLM, such as the contrastive language–image pre-training (CLIP) model, can scour the footage and provide instantaneous results without being trained on that specific object or context.
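The CLIP-style search described above rests on a shared embedding space: camera frames and text queries are both mapped to vectors, and frames are ranked by their similarity to the query, with no task-specific training. A minimal sketch of that matching step follows; the toy vectors stand in for embeddings that a real VLM would produce on the camera's AI engine.

```python
# Sketch of the core CLIP-style matching step: frames and text queries live
# in a shared vector space and are ranked by cosine similarity. The toy
# 2-D embeddings below are illustrative stand-ins for real model outputs.
import numpy as np

def rank_frames(frame_embeddings: np.ndarray, query_embedding: np.ndarray) -> list:
    """Return frame indices sorted by similarity to a natural-language query."""
    frames = frame_embeddings / np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
    query = query_embedding / np.linalg.norm(query_embedding)
    scores = frames @ query                      # cosine similarity per frame
    return np.argsort(scores)[::-1].tolist()     # best match first

emb = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
print(rank_frames(emb, np.array([0.6, 0.8])))    # → [1, 2, 0] (frame 1 ranks first)
```

Because the ranking is purely a similarity lookup, any new query (e.g. "a person carrying a ladder") works without retraining, which is the zero-shot property the article highlights.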
This opens a whole new range of AI capabilities for enterprise cameras, which can now run AI tasks tailored to their installation and user needs without retraining and deploying new AI models for each task.

Additional integration

This is Ambarella's first mass-market SoC family to integrate its latest CVflow® 3.0 AI engine, which provides 3x the performance of the prior generation, with support for VLMs and vision transformers as well as advanced AI-based image processing. Additionally, the CV75S integrates the latest generation of Ambarella's industry-leading image signal processor, 4KP30 H.264/H.265 video encoding, dual Arm® Cortex-A76 1.6GHz cores, and USB 3.2 connectivity.

Ambarella's Cooper™ Developer Platform

To accelerate time to market, the CV75S family is supported by Ambarella's Cooper™ Developer Platform. This recently introduced platform provides comprehensive hardware and software solutions for creating edge AI systems, including powerful, safe, and secure compute and software capabilities. It consists of industrial-grade hardware tools, collectively called Cooper Metal, along with Cooper Foundry, a multi-layer software stack that supports Ambarella's entire portfolio of AI SoCs.

The CV75S is sampling now and will be demonstrated at Ambarella's invitation-only exhibition at ISC West in Las Vegas this week.


IDS NXT malibu camera combines advanced consumer image processing and AI technology from Ambarella and industrial quality from IDS

IDS NXT Malibu marks a new class of intelligent industrial cameras that act as edge devices and generate AI overlays in live video streams. For the new camera series, IDS Imaging Development Systems has collaborated with Ambarella, a pioneering developer of visual AI products, making consumer technology available for demanding applications in industrial quality.

CVflow® AI vision system

The camera features Ambarella's CVflow® AI vision system on chip and takes full advantage of the SoC's advanced image processing and on-camera AI capabilities. Consequently, image analysis can be performed at high speed (>25 fps) and displayed as live overlays in compressed video streams delivered to end devices via the RTSP protocol.

SoC's integrated image signal processor

Due to the SoC's integrated image signal processor (ISP), the information captured by the light-sensitive onsemi AR0521 image sensor is processed directly on the camera and accelerated by its integrated hardware. The camera also offers helpful automatic features, such as brightness, noise, and colour correction, which significantly improve image quality.

Real-time image analysis

"With IDS NXT Malibu, we have developed an industrial camera that can analyse images in real-time and incorporate results directly into video streams," explained Kai Hartmann, Product Innovation Manager at IDS. "The combination of on-camera AI with compression and streaming is a novelty in the industrial setting, opening up new application scenarios for intelligent image processing."

Industrial-grade edge AI cameras

These on-camera capabilities were made possible through close collaboration between IDS and Ambarella, leveraging the companies' strengths in industrial camera and consumer technology.
"We are proud to work with IDS, a leading company in industrial image processing," said Jerome Gigot, Senior Director of Marketing at Ambarella. "The IDS NXT Malibu represents a new class of industrial-grade edge AI cameras, achieving fast inference times and high image quality via our CVflow AI vision SoC."

IDS NXT all-in-one AI system

IDS NXT Malibu has entered series production. The camera is part of the IDS NXT all-in-one AI system, in which optimally coordinated components accompany the entire workflow, from the camera to the AI vision studio. This includes the acquisition of images and their labelling, through to the training of a neural network and its execution on the IDS NXT series of cameras.
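The AI-overlay idea described above, drawing inference results into the live video before it is compressed and streamed, can be sketched as a simple pixel operation. The NumPy version below only illustrates the concept; on the IDS NXT Malibu this step runs in the SoC's hardware-accelerated pipeline ahead of the encoder, and the frame size and detection box are hypothetical.

```python
# Sketch of burning a detection-box overlay into a video frame before
# encoding. The frame is a plain H x W x 3 NumPy array for illustration.
import numpy as np

RED = np.array([255, 0, 0], dtype=np.uint8)

def draw_box(frame: np.ndarray, box: tuple, thickness: int = 2) -> np.ndarray:
    """Draw a rectangle outline (x1, y1, x2, y2) onto the frame, in place."""
    x1, y1, x2, y2 = box
    frame[y1:y1 + thickness, x1:x2] = RED      # top edge
    frame[y2 - thickness:y2, x1:x2] = RED      # bottom edge
    frame[y1:y2, x1:x1 + thickness] = RED      # left edge
    frame[y1:y2, x2 - thickness:x2] = RED      # right edge
    return frame

frame = np.zeros((480, 640, 3), dtype=np.uint8)        # one blank 640x480 frame
annotated = draw_box(frame, (120, 80, 220, 200))       # hypothetical detection
print(annotated[80, 150])                               # a top-edge pixel is now red
```

Because the overlay is written into the pixels themselves, every RTSP client sees the annotations without needing any AI-aware software on the receiving end.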

Insights & Opinions from thought leaders at Ambarella, Inc.

Newest systems-on-chips (SoCs) to expand AI inside cameras, says Ambarella

When it comes to security cameras, the end user always wants more: more resolution, more artificial intelligence (AI), and more sensors. However, the cameras themselves do not change much from generation to generation; that is, they have the same power budgets, form factors, and price. To achieve "more," the systems-on-chips (SoCs) inside the video cameras must pack more features and integrate systems that would have been separate components in the past.

For an update on the latest capabilities of SoCs inside video cameras, we turned to Jérôme Gigot, Senior Director of Marketing for AIoT at Ambarella, a manufacturer of SoCs. AIoT refers to the artificial intelligence of things, the combination of AI and IoT.

"The AI performance on today's cameras matches what was typically done on a server just a generation ago," says Gigot. "And doing AI on-camera provides the threefold benefits of being able to run algorithms on a higher-resolution input before the video is encoded and transferred to a server, with a faster response time, and with complete privacy."

Added features of the new SoC

Ambarella's latest SoC is the CV72S, which provides 6x the AI performance of the previous generation and supports the newer transformer neural networks. Even with its extra features, the CV72S maintains the same power envelope as the previous-generation SoCs. The CV72S is now available, sampling is underway by camera manufacturers, and Ambarella expects the first cameras with the SoC to emerge on the market during the early part of 2024.

Examples of the added features of the new SoC include image processing, video encoders, AI engines, de-warpers for fisheye lenses, and general compute cores, along with functions such as processing multiple imagers on a single SoC, fusion among different types of sensors, and more.
This article summarises the new AI capabilities, based on information provided by Ambarella.

AI inside the cameras

Gigot says AI is by far the most in-demand feature of new security camera SoCs. Customers want to run the latest neural network architectures; run more of them in parallel to achieve more functions (e.g., identifying pedestrians while simultaneously flagging suspicious behaviour); and run them at higher resolutions in order to pick out objects that are farther away from the camera. And they want to do it all faster.

Most AI tasks can be split among object detection, object recognition, segmentation, and higher-level "scene understanding" types of functions, he says. The latest AI engines support transformer network architectures (versus the currently used convolutional neural networks). With enough AI horsepower, all objects in a scene can be uniquely identified and classified with a set of attributes, tracked across time and space, and fed into higher-level AI algorithms that can detect and flag anomalies.

However, everything depends on which scene is within the camera's field of view. "It might be an easy task for a camera in an office corridor to track a person passing by every couple of minutes, while a ceiling camera in an airport might be looking at thousands of people, all constantly moving in different directions and carrying a wide variety of bags," Gigot says.

Changing the configuration of video systems

Even with more computing capability inside the camera, central video servers still have their place in the overall AI deployment, as they can more easily aggregate and understand information across multiple cameras. Additionally, low-level AI number crunching would typically be done on camera (at the source of the data).
However, the increasing performance capabilities of transformer neural network AI inside the camera will reduce the need for a central video server over time. Even so, a server could still be used for higher-level decisions, to provide a representation of the world, and to offer a user interface that lets the user make sense of all the data.

Overall, AI-enabled security cameras with transformer network-based functionality will greatly reduce the use of central servers in security systems. This trend will contribute to a reduction in the greenhouse gases produced by data centres. These server farms consume a lot of energy, due to their power-hungry GPU and CPU chips, and those server processors also need to be cooled using air conditioning that emits additional greenhouse gases.

New capabilities of transformer neural networks

New kinds of AI architectures are being deployed inside cameras. Newer SoCs can accommodate the latest transformer neural networks (NNs), which now outperform the currently used convolutional NNs for many vision tasks, although transformer NNs require more AI processing power to run than most convolutional NNs.

Transformers are great for natural language processing (NLP), as they have mechanisms to "make sense" of a seemingly random arrangement of words. Those same properties, when applied to video, make transformers very efficient at understanding the world in 3D. For example, imagine a multi-imager camera where an object needs to be tracked from one camera to the next. Transformer networks are also great at focussing their attention on specific parts of the scene: just as some words are more important than others in a sentence, some parts of a scene might be more significant from a security perspective.
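The attention mechanism behind these transformer networks can be sketched in a few lines: each token (a word in NLP, an image patch in vision) scores its relevance against every other token, and the values are mixed according to those normalised scores. The NumPy illustration below shows the generic scaled dot-product attention operation, not Ambarella's implementation; the token count and embedding size are arbitrary.

```python
# Minimal scaled dot-product attention: the building block that lets a
# transformer weight "important" tokens (words or image patches) more heavily.
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Each query attends to all keys; values are mixed by softmax weights."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                               # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)              # softmax per query
    return weights @ V                                          # weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))       # 4 tokens (e.g. image patches), 8-dim embeddings
out = attention(x, x, x)          # self-attention over the token sequence
print(out.shape)                  # → (4, 8)
```

The softmax weights are exactly the "attention" the article refers to: for a video scene, high weights pick out the patches that matter most to each query patch.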
"I believe that we are currently just scratching the surface of what can be done with transformer networks in video security applications," says Gigot. The first use cases are mainly object detection and recognition; however, neural network research is increasingly focussed on these new transformer architectures and their applications.

Expanded use cases for multi-imager and fisheye cameras

For multi-imager cameras, again, the strategy is "less is more." For example, if you need to build a multi-imager camera with four 4K sensors, then, in essence, you need four cameras in one. That means four imaging pipelines, four encoders, four AI engines, and four sets of CPUs to run the higher-level software and streaming. Of course, for cost, size, and power reasons, it would be extremely inefficient to use four SoCs for all this processing. Therefore, the latest SoCs for security need to integrate four times the performance of the last generation's single-imager 4K cameras, in order to process four sensors on a single SoC with all the associated AI algorithms, and they need to do this within a reasonable size and power budget.

The challenge is very similar for fisheye cameras, where the SoC needs to accept very high-resolution sensors (i.e., 12MP, 16MP and higher) in order to maintain high resolution after de-warping. Additionally, that same SoC must create all the virtual views needed to make one fisheye camera look like multiple physical cameras, and it has to do all of this while running the AI algorithms on every one of those virtual streams at high resolution.

The power of 'sensor fusion'

Sensor fusion is the ability to process multiple sensor types at the same time (e.g., visual, radar, thermal, and time of flight) and correlate all that information.
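One simple way to correlate information across two sensor types is to match detections whose bounding boxes overlap in the shared field of view. The sketch below shows that matching step with intersection-over-union (IoU); the sensor names, box coordinates, and threshold are all illustrative.

```python
# Sketch of correlating detections from two sensor types (e.g. visual and
# thermal) by bounding-box overlap. Boxes are (x1, y1, x2, y2) tuples.
def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def fuse(visual: list, thermal: list, threshold: float = 0.5) -> list:
    """Pair up detections that overlap enough to be the same physical object."""
    return [(i, j) for i, v in enumerate(visual)
                   for j, t in enumerate(thermal) if iou(v, t) >= threshold]

print(fuse([(0, 0, 10, 10)], [(1, 1, 11, 11)]))   # → [(0, 0)]: one matched object
```

This corresponds to object-level fusion, where each sensor is processed separately before matching; deep fusion instead runs AI directly on the combined raw sensor data.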
Performing sensor fusion provides an understanding of the world that is greater than the information that could be obtained from any one sensor type in isolation. In terms of chip design, this means that SoCs must be able to interface with, and natively process, inputs from multiple sensor types.

Additionally, they must have the AI and CPU performance required to do either object-level fusion (i.e., matching the different objects identified through the different sensors) or even deep-level fusion. Deep fusion takes the raw data from each sensor and runs AI on that unprocessed data. The result is machine-level insights that are richer than those provided by systems that must first go through an intermediate object representation. In other words, deep fusion eliminates the information loss that comes from preprocessing each individual sensor's data before fusing it with the data from other sensors, which is what happens in object-level fusion.

Better image quality

AI can be trained to dramatically improve the quality of images captured by camera sensors in low-light conditions, as well as in high dynamic range (HDR) scenes with widely contrasting dark and light areas. Typical image sensors are very noisy at night, and AI algorithms can be trained to remove this noise and provide a clear colour picture, even down to 0.1 lux or below. This is called neural network-based image signal processing, or AISP for short.

Achieving high image quality under difficult lighting conditions is always a balance among removing noise, not introducing excessive motion blur, and recovering colours. AI can be trained to perform all these functions with much better results than traditional video processing methods can achieve.
A key point for video security is that these types of AI algorithms do not "create" data; they just remove noise and clean up the signal. This process allows AI to provide clearer video, even in challenging lighting conditions. The results are better footage for the humans monitoring video security systems, as well as better input for the AI algorithms analysing that footage, particularly at night and under high dynamic range conditions.

A typical example would be a camera that needs to switch to night mode (black and white) when the environmental light falls below a certain lux level. By applying these specially trained AI algorithms, that same camera would be able to stay in colour mode and at full frame rate, even at night. This has many advantages, including the ability to see much farther than a typical external illuminator would normally allow, and reduced power consumption.

'Straight to cloud' architecture

For the cameras themselves, going to the cloud or to a video management system (VMS) might seem like it doesn't matter, as this is all just streaming video. However, the reality is more complex, especially for cameras going directly to the cloud.

When cameras stream to the cloud, there is usually a mix of local, on-camera storage and streaming, in order to save on bandwidth and cloud storage costs. To accomplish this hybrid approach, multiple video-encoding qualities and resolutions are produced and sent to different places at the same time. The ability to support all these different streams in parallel, and to encode each at the lowest bitrate possible, is usually guided by AI algorithms that constantly analyse the video feeds and orchestrate the different streams. These are just some of the key components needed to accommodate this "straight to cloud" architecture.
Keeping cybersecurity top-of-mind

Ambarella's SoCs always implement the latest security mechanisms, in both hardware and software. They accomplish this through a mix of well-known security features, such as Arm TrustZone and encryption algorithms, plus another layer of proprietary mechanisms, such as dynamic random access memory (DRAM) scrambling and key management policies.

"We take these measures because cybersecurity is of utmost importance when you design an SoC targeted to go into millions of security cameras across the globe," says Gigot.

'Eyes of the world' – and more brains

Cameras are "the eyes of the world," and visual sensors provide by far the largest portion of that information compared to other types of sensors. With AI, most security cameras now have a brain behind those eyes. As such, security cameras can morph from a reactive, security-focused apparatus into a global sensing infrastructure that can do everything from regulating the air conditioning in offices based on occupancy, to detecting forest fires before anyone sees them, to following weather and world events. AI is the essential ingredient for the innovation that is bringing all those new applications to life, and hopefully leading to a safer and better world.

Top 10 articles of 2021 reflect a changing security marketplace

Our most popular articles in 2021 provide a good reflection of the state of the industry. Taken together, the Top 10 Articles of 2021, as measured by reader clicks, cover big subjects such as smart cities and cybersecurity. They address new innovations in video surveillance, including systems that are smarter and more connected, and a new generation of computer chips that improves capabilities at the edge. A recurring theme in 2021 is cybersecurity's impact on physical security, embodied by a high-profile hack of 150,000 cameras and an incident at a Florida water plant. There is also an ongoing backlash against facial recognition technology, despite promising technology trends.

Our top articles also touch on subjects that have received less exposure, including the use of artificial intelligence (AI) for fraud detection and the problem of cable theft in South Africa. Here is a review of the Top 10 Articles of 2021, based on reader clicks, including links to the original content:

Safety in Smart Cities: How Video Surveillance Keeps Security Front and Center

The main foundations that underpin smart cities are 5G, artificial intelligence (AI), the Internet of Things (IoT), and the cloud. Each is equally important, and together, these technologies enable city officials to gather and analyse more detailed insights than ever before. For public safety in particular, having IoT and cloud systems in place will be one of the biggest factors in improving the quality of life for citizens. Smart cities have come a long way in the last few decades, but to truly make a smart city safe, real-time situational awareness and cross-agency collaboration are key areas that must be developed as a priority.
How AI is Revolutionising Fraud Detection

Fraud detection technology has advanced rapidly over the years, making it easier for security professionals to detect and prevent fraud. Artificial intelligence (AI) is revolutionising fraud detection. Banks can use AI software to gain an overview of a customer's spending habits online. Having this level of insight allows an anomaly detection system to determine whether a transaction is normal or not. Suspicious transactions can be flagged for further investigation and verified by the customer. If the transaction is not fraudulent, then the information can be fed back into the anomaly detection system to learn more about the customer's online spending behaviour.

Remote Monitoring Technology: Tackling South Africa's Cable Theft Problem

For decades, cable theft has caused disruption to infrastructure across South Africa, and it's an issue that permeates the whole supply chain. In November 2020, Nasdaq reported that, "When South Africa shut large parts of its economy and transport network during its COVID-19 lockdown, organised, sometimes armed, gangs moved into its crumbling stations to steal the valuable copper from the lines. Now, more than two months after that lockdown ended, the commuter rail system, relied on by millions of commuters, is barely operational."

Hack of 150,000 Verkada Cameras: It Could Have Been Worse

When 150,000 video surveillance cameras get hacked, it's big news. The target of the hack was Silicon Valley startup Verkada, which has collected a massive trove of security-camera data from its 150,000 surveillance cameras inside hospitals, companies, police departments, prisons, and schools. The data breach was accomplished by an international hacker collective and was first reported by Bloomberg.
Water Plant Attack Emphasises Cyber's Impact on Physical Security

At an Oldsmar, Fla., water treatment facility on Feb. 5, an operator watched a computer screen as someone remotely accessed the system monitoring the water supply and increased the amount of sodium hydroxide from 100 parts per million to 11,100 parts per million. The chemical, also known as lye, is used in small concentrations to control acidity in the water. The incident is the latest example of how cybersecurity attacks can translate into real-world, physical security consequences, even deadly ones.

Video Surveillance is Getting Smarter and More Connected

The global pandemic has triggered considerable innovation and change in the video surveillance sector. Last year, organisations around the globe embraced video surveillance technologies to manage social distancing, monitor occupancy levels in internal and external settings, and enhance their return-to-work processes. Forced to reimagine nearly every facet of their operations for a new post-COVID reality, companies were quick to seize on the possibilities offered by today's next-generation video surveillance systems.

The Post-Pandemic Mandate for Entertainment Venues: Digitally Transform Security Guards

At sporting venues, a disturbing new trend has hit the headlines: poor fan behaviour. At the same time, security directors are reporting a chronic security guard shortage. Combining surveillance video with AI-based advanced analytics can automatically identify fan disturbances or other operational issues and notify guards in real time, eliminating the need for large numbers of guards to monitor video feeds and patrons. The business benefits of digitally transformed guards are compelling.
Why Access Control Is Important

In a workspace, access control is particularly crucial in tracking the movement of employees should an incident occur, as well as making life much easier for your team by allowing them to move between spaces without security personnel and site managers present. It can also reduce a business's outgoings by reducing the need to hire security staff to remain on site.

Baltimore Is the Latest U.S. City to Target Facial Recognition Technology

The city of Baltimore has banned the use of facial recognition systems by residents, businesses, and the city government (except for police). The criminalisation, in a major U.S. city, of an important emerging technology in the physical security industry is an extreme example of the continuing backlash against facial recognition throughout the United States. Several localities, from Portland, Oregon, to San Francisco, and from Oakland, California, to Boston, have moved to limit use of the technology, and privacy groups have even proposed a national moratorium on the use of facial recognition.

Next Wave of SoCs Will Turbocharge Camera Capabilities at The Edge

A new generation of video cameras is poised to boost capabilities dramatically at the edge of the IP network, including more powerful artificial intelligence (AI) and higher resolutions, paving the way for new applications that would have previously been too expensive or complex. The technologies at the heart of the coming generation of video cameras are Ambarella's newest systems-on-chips (SoCs). Ambarella's CV5S and CV52S product families bring a new level of on-camera AI performance and integration to multi-imager and single-imager IP cameras.

Next wave of SoCs will turbocharge camera capabilities at the edge

A new generation of video cameras is poised to boost capabilities dramatically at the edge of the IP network, including more powerful artificial intelligence (AI) and higher resolutions, paving the way for new applications that would have previously been too expensive or complex. The technologies at the heart of this coming generation of video cameras are Ambarella's newest systems-on-chips (SoCs): the CV5S and CV52S product families, which bring a new level of on-camera AI performance and integration to multi-imager and single-imager IP cameras. Both SoCs are manufactured in a 5nm process, bringing performance improvements and power savings compared to the previous generation of SoCs manufactured at 10nm.

CV5S and CV52S AI-powered SoCs

The CV5S, designed for multi-imager cameras, is able to process, encode, and perform advanced AI on up to four imagers at 4Kp30 resolution, simultaneously and at less than 5 watts. This enables multi-headed camera designs with up to four 4K imagers looking at different portions of a scene, as well as very high-resolution, single-imager cameras of 32 MP resolution and beyond.

The CV52S, designed for single-imager cameras with very powerful onboard AI, is the next generation of the company's successful CV22S mainstream 4K camera AI chip. This new SoC family quadruples the AI processing performance, while keeping the same low power consumption of less than 3 watts for 4Kp60 encoding with advanced AI processing.

Faster and more ubiquitous AI capabilities

"Security system designers desire higher resolutions, increasing channel counts, and ever faster and more ubiquitous AI capabilities," explains John Lorenz, Senior Technology and Market Analyst, Computing, at Yole Développement (Yole), a French market research firm.
Lorenz adds, "Ambarella's newest AI vision SoCs for security, the CV5S and CV52S, are competitive solutions for meeting the growing demands of the security IC (integrated circuit) sector, which our latest report forecasts to exceed US$ 4 billion by 2025, with two-thirds of that being chips with AI capabilities."

Edge AI vision processors

Ambarella's new CV5S and CV52S edge AI vision processors enable, with a single SoC architecture, new classes of cameras that would not have been possible in the past. For example, implementing a 4x 4K multi-imager with AI would traditionally have required at least two SoCs (at least one for encoding and one for AI), and the overall power consumption would have made those designs bulky and prohibitively expensive. By reducing the number of required SoCs, the CV5S enables advanced camera designs, such as AI-enabled 4x 4K imagers, at price points much lower than would previously have been possible.

"What we are usually trying to do with our SoCs is to keep the price points similar to the previous generations, given that camera retail prices tend to be fairly fixed," said Jerome Gigot, Ambarella's Senior Director of Marketing.

4K multi-imager cameras

"However, higher-end 4K multi-imager cameras tend to retail for thousands of dollars, and so even though there will be a small premium on the SoC for the 2x improvement in performance, this will not make a significant impact on the final MSRP of the camera," adds Gigot. In addition, the overall system cost might go down, Gigot notes, compared to what could be built today, because there is no longer a need for external chips to perform AI, or for extra components for power dissipation.

The new chips will be available in the second half of 2021, and it typically takes about 12 to 18 months for Ambarella's customers (camera manufacturers) to produce final cameras. Therefore, the first cameras based on these new SoCs should hit the market sometime in the second half of 2022.
Reference boards for camera manufacturers

As with Ambarella’s previous generations of edge AI vision SoCs for security, the company will soon make reference boards available to camera manufacturers, allowing them to develop cameras based on the new CV5S and CV52S SoC families. “The software on these new SoCs is an evolution of our unified Linux SDK that is already available on our previous-generation SoCs, which makes the transition easy for our customers,” said Jerome Gigot.

Better crime detection

Detecting criminals in a crowd, using face recognition and/or licence plate recognition, has been a daunting challenge for security, and one the new chips will help to address. “Actually, these applications are one of the main reasons why Ambarella is introducing these two new SoC families,” said Jerome Gigot. Typically, resolutions of 4K and higher have been a smaller portion of the security market, given that they came with a premium price tag for the high-end optics, image sensor and SoC. Also, the cost and extra bandwidth of storing and streaming 4K video were not always worth it for the benefit of just viewing video at higher resolution.

4K AI processing on-camera

The advent of on-camera AI at 4K changes the paradigm. By enabling 4K AI processing on-camera, smaller objects at longer distances can now be detected and analysed without having to go to a server, and with much higher detail and accuracy than is possible on a 2 MP or 5 MP camera. This means that fewer false alarms will be generated, and each camera will now be able to cover a longer distance and a wider area, offering more meaningful insights without necessarily having to stream and store that 4K video to a back-end server.
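The resolution benefit can be made concrete with rough pixels-on-target arithmetic. The sketch below is an illustration only; the field of view and target size are assumptions, not Ambarella specifications:

```python
# Rough pixels-on-target comparison: how much horizontal detail each
# sensor places on a 0.5 m-wide licence plate at a given distance.
# Assumes a 60-degree horizontal field of view for every camera.
import math

FOV_DEG = 60.0          # assumed horizontal field of view
TARGET_WIDTH_M = 0.5    # approximate licence-plate width

def pixels_on_target(h_res: int, distance_m: float) -> float:
    """Horizontal pixels covering the target at the given distance."""
    scene_width = 2 * distance_m * math.tan(math.radians(FOV_DEG / 2))
    return h_res * TARGET_WIDTH_M / scene_width

for name, h_res in [("2 MP (1920 px)", 1920), ("4K (3840 px)", 3840)]:
    px = pixels_on_target(h_res, distance_m=50)
    print(f"{name}: {px:.0f} px across the plate at 50 m")
```

Doubling the horizontal pixel count doubles the linear detail on the target, so an analytic that needs a minimum number of pixels across an object can meet that threshold at roughly twice the distance.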
“This is valuable, for example, for traffic cameras mounted on top of high poles, which need to be able to see very far out and identify cars and licence plates that are hundreds of metres away,” said Jerome Gigot.

Enhanced video analytics and wider coverage

“Ambarella’s new CV5S and CV52S SoCs truly allow the industry to take advantage of higher resolution on-camera for better analytics and wider coverage, but without all the costs typically incurred by having to stream high-quality 4K video out 24/7 to a remote server for offline analytics,” said Jerome Gigot. He adds, “So, next-generation cameras will now be able to identify more criminals, faces and licence plates, at longer distances, for an overall lower cost and with faster response times by doing it all locally on-camera.”

Deployment in retail applications

Retail applications are another big selling point. Retail environments can be some of the toughest, as the cameras may be looking at hundreds of people at once (e.g., in a mall), to provide not only security features, but also other business analytics, such as foot traffic and occupancy maps that can be used later to improve product placement. The higher resolution and higher AI performance enabled by the new Ambarella SoCs provide a leap forward in addressing those scenarios. In a store setup, a ceiling-mounted camera with four 4K imagers can simultaneously watch the cashier line on one side of the store, sending alerts when a line is getting too long and a new cashier needs to be deployed, while at the same time watching the entrance on the other side of the store to count the people coming in and out. This leaves two additional 4K imagers for monitoring specific product aisles and generating real-time business analytics.
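The cashier-line scenario can be sketched as a simple per-imager rule running on-camera. This is a hypothetical illustration, assuming a person detector that outputs centre points; the zone, threshold and helper names are invented for the sketch and are not part of Ambarella's SDK:

```python
# Hypothetical on-camera rule: alert when the cashier queue grows too long.
# `detections` stands in for the output of an on-chip person detector;
# each detection is an (x, y) centre point in frame coordinates.
from typing import List, Tuple

QUEUE_ZONE = (0, 0, 800, 2160)   # assumed frame region covering the queue
MAX_QUEUE = 5                    # assumed staffing threshold

def in_zone(point: Tuple[int, int], zone: Tuple[int, int, int, int]) -> bool:
    """True when the point lies inside the (x0, y0, x1, y1) rectangle."""
    x, y = point
    x0, y0, x1, y1 = zone
    return x0 <= x < x1 and y0 <= y < y1

def queue_alert(detections: List[Tuple[int, int]]) -> bool:
    """Return True when more than MAX_QUEUE people stand in the queue zone."""
    count = sum(1 for p in detections if in_zone(p, QUEUE_ZONE))
    return count > MAX_QUEUE
```

Only the boolean alert (or a people count) needs to leave the camera; the 4K video itself can stay local.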
Use in cashier-less stores

Another retail application is the cashier-less store. Here, a CV5S- or CV52S-based camera mounted on the ceiling has enough resolution and AI performance to track goods as the customer grabs them and puts them in their cart, and to automatically track which customer is purchasing which item. In a warehouse scenario, items and boxes moving across the floor could also be followed locally, on a single ceiling-mounted camera that covers a wide area of the warehouse. Additionally, these items and boxes could be tracked across the different imagers in a multi-headed camera setup, without the video having to be sent to a server to perform the tracking.

Updating on-camera AI networks

Another feature of Ambarella’s SoCs is that their on-camera AI networks can be updated on-the-fly, without having to stop the video recording and without losing any video frames. For example, in the case of a search for a missing vehicle, the characteristics of that vehicle (make, model, colour, licence plate) can be sent to a cluster of cameras in the general area where the vehicle is thought to be, and all those cameras can be automatically updated to run a live search for that specific vehicle. If any of the cameras gets a match, a remote operator can be notified and receive a picture, or even a live video feed of the scene.

Efficient traffic management

Relating to traffic congestion, most big cities have thousands of intersections that they need to monitor and manage. Trying to do this from one central location is costly and difficult, as there is so much video data to process and analyse in order to make traffic decisions (to control the traffic lights, reverse lanes, etc.). With the CV52S edge AI vision SoC, those decisions can be made locally at each intersection by the camera itself.
The camera would then take actions autonomously (for example, adjust traffic-light timing) and only report a status update to the main traffic control centre. So now, instead of having one central location trying to manage 1,000 intersections, a city can have 1,000 smart AI cameras, each managing its own location and providing updates and metadata to a central server.

Superior privacy

Privacy is always a concern with video. In this case, doing AI on-camera is inherently more private than streaming the video to a server for analysis. Less data transmission means fewer points of entry for a hacker trying to access the video. On Ambarella’s CV5S and CV52S SoCs, the video can be analysed locally and then discarded, with just a signature or metadata of the face being used to find a match. No actual video needs to be stored or transmitted, which greatly strengthens privacy. In addition, the chips contain a secure hardware cyber-security block, including OTP memory, Arm TrustZones, DRAM scrambling and I/O virtualisation. This makes it very difficult for a hacker to replace the firmware on the camera, providing another level of security and privacy at the system level.

Privacy masking

Another privacy feature is privacy masking. This feature enables portions of the video (say, a door or a window) to be blocked out before being encoded into the video stream. The blocked portions of the scene are not present in the recorded video, providing a privacy option for cameras that face private areas.
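Privacy masking can be illustrated with a minimal sketch, assuming frames arrive as NumPy arrays and masking simply blacks out fixed rectangles before the frame reaches the encoder. The region list and helper function are hypothetical, not Ambarella's API:

```python
# Minimal privacy-masking sketch: black out fixed regions of each frame
# BEFORE encoding, so the masked pixels never enter the recorded stream.
import numpy as np

# Assumed mask regions as (x0, y0, x1, y1) in pixel coordinates,
# e.g. a window and a doorway visible in the scene.
PRIVACY_REGIONS = [(100, 50, 300, 200), (1500, 0, 1920, 400)]

def apply_privacy_masks(frame: np.ndarray) -> np.ndarray:
    """Zero out the configured regions of an H x W x 3 frame in place."""
    for x0, y0, x1, y1 in PRIVACY_REGIONS:
        frame[y0:y1, x0:x1] = 0
    return frame

# The returned frame is what would be handed to the video encoder;
# the original pixels in those regions are never stored or streamed.
```

Because the masking happens before encoding, the privacy guarantee does not depend on any downstream recorder or viewer honouring the mask.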
“With on-camera AI, each device becomes its own smart endpoint, and can be reconfigured at will to serve the specific physical security needs of its installation,” said Jerome Gigot, adding, “The possibilities are endless, and our mission as an SoC maker is really to provide a powerful and easy-to-use platform, complete with computer-vision tools, that enables our customers and their partners to easily deploy their own AI software on-camera.”

Physical security in parking lots

One example is physical security in a parking lot. A camera today might be used just to record part of the parking lot, so that an operator can go back and look at the video if a car were broken into or some other incident occurred. With a CV5S or CV52S AI-enabled camera, the camera will first of all be able to cover a much wider portion of the parking lot. Additionally, it will be able to detect the licence plates of all the cars going in and out, to automatically bill the owners. If there is a special event, the camera can be reprogrammed to identify VIP vehicles and automatically redirect them to the VIP portion of the lot, while reporting to the entrance station or sign how many parking spots are available. It can even tell cars approaching the lot where to go.

Advantages of using edge AI vision SoCs

Jerome Gigot said, “The possibilities are endless and they span across many verticals. The market is primed to embrace these new capabilities. Recent advances in edge AI vision SoCs have brought about a period of change in the physical security space.
Companies that would have, historically, only provided security cameras are now getting into adjacent verticals, such as smart retail, smart cities and smart buildings.”

He adds, “These changes are providing a great opportunity for all the camera makers and software providers to really differentiate themselves, by providing full systems that offer a new level of insights and efficiencies not only to the physical security manager, but now also to the store owner and the building manager.”

He adds, “All of these new applications are extremely healthy for the industry, as they are growing the available market for cameras, while also increasing their value and the economies of scale they can provide. Ambarella is looking forward to seeing all the innovative products that our customers will build with this new generation of SoCs.”
