As the new generation of vehicles evolves, so do their features and connected capabilities, which inevitably leads to data overload, heavily burdened networks and extreme computing complexity in the on-board system. There is a need for higher-capacity, high-performance storage optimized for the myriad demands of the automotive environment.
On the road today there are an estimated 21 million connected vehicles, gathering endless terabytes of data. Gartner predicts that by 2020 connected vehicles will generate over 280 petabytes of data annually, with a single vehicle producing at least 4 TB per day. That data comes from the on-board hardware, which includes cameras (generating 20-60 MB/s), sonar (10-100 kB/s), radar (10 kB/s), LIDAR systems (10-70 MB/s) and GPS (50 kB/s). That is a lot of data to be stored, transferred and secured across many different endpoints over various delivery networks.
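To put those per-sensor rates in perspective, a rough back-of-the-envelope sketch (using the mid-points of the ranges above and an assumed eight-hour drive day, for a single instance of each sensor) shows how quickly per-vehicle volumes add up:

```python
# Rough estimate of daily raw data volume per vehicle, using the
# per-sensor rates quoted above (illustrative assumptions only).
SENSOR_RATES_MB_S = {
    "camera": 40.0,   # mid-point of 20-60 MB/s, one camera
    "lidar": 40.0,    # mid-point of 10-70 MB/s
    "sonar": 0.055,   # mid-point of 10-100 kB/s
    "radar": 0.01,    # 10 kB/s
    "gps": 0.05,      # 50 kB/s
}

HOURS_PER_DAY = 8  # assumed hours of operation per day

def daily_volume_tb(rates_mb_s: dict, hours: float) -> float:
    """Total raw data volume in terabytes for one day of operation."""
    total_mb = sum(rates_mb_s.values()) * hours * 3600
    return total_mb / 1_000_000  # MB -> TB (decimal units)

if __name__ == "__main__":
    print(f"~{daily_volume_tb(SENSOR_RATES_MB_S, HOURS_PER_DAY):.1f} TB/day")
```

Even with one camera and one LIDAR unit, this works out to roughly 2.3 TB per day; a production vehicle carrying multiple cameras and sensors quickly reaches the multi-terabyte-per-day figures cited above.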
When it comes to data inside vehicles, we need to think differently. Autonomous vehicles are driven by data intelligence, both figuratively and literally. More than ever, the growing volume of data produced by new cars, and during the development of new innovations, is driving automotive companies to improve how they capture and manage that data. Overall, the focus on data inside connected vehicles is geared towards retaining the right data and enabling efficient ingest of that data into AI, analytics, development workflows and other applications. The characteristics and demands of these environments often depend on the type of car being driven. Over the coming years we will see various types of connected vehicles, each with different data storage requirements.
New storage architectures for autonomous vehicles
Development fleet vehicles are going to have their own storage architecture to manage their data, while a consumer mid-range vehicle will focus heavily on infotainment systems (streaming of video and audio content), which require an altogether different type of architecture. Automakers will have a host of considerations for how they architect the systems that receive and retain in-vehicle captured data. Primary among them is how to create an infrastructure that enables a smooth transition of data through all stages of its life, from initial in-vehicle capture all the way through long-term retention of data that will be critical for future development, reference, compliance or litigation. This infrastructure likely needs to provide active access and search across all data, extreme durability including DR and air-gap protection, and tools to ensure GDPR compliance or, depending on the region, compliance with local data protection laws.
In all cases, one of the most proven approaches combines a high-performance front end, based on SSD or disk, with low-cost storage for massive, scalable retention based on tape or object storage. Public cloud infrastructures are also useful for cloud-burst analysis, when demand for processing resources spikes beyond what is typically required or available on premises. For massive archives beyond a few petabytes, tape is the media of choice. The larger the scale, the more compelling the economics of tape become, particularly as today's tape solutions are front-ended with high-performance cache and virtually all handling is automated and can be managed remotely. The key in infrastructure design is how easily these storage and processing resources can be made available to data users for their specific workflows during any phase of the data's life.
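As a simplified illustration of how such a tiered design might decide where a dataset lives, the sketch below routes data by age and access frequency. The tier names, thresholds and fields are hypothetical, not any particular vendor's API:

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    FRONT_END = "high-performance SSD/disk front end"
    OBJECT = "scalable object storage"
    TAPE = "low-cost tape archive"

@dataclass
class Dataset:
    name: str
    days_since_last_access: int
    accesses_last_30_days: int

def place(ds: Dataset) -> Tier:
    """Hypothetical placement policy: hot data stays on the fast front end,
    warm data moves to object storage, cold data lands on tape."""
    if ds.days_since_last_access <= 7 or ds.accesses_last_30_days > 10:
        return Tier.FRONT_END
    if ds.days_since_last_access <= 90:
        return Tier.OBJECT
    return Tier.TAPE

# Example: a test-drive capture untouched for six months goes to tape.
print(place(Dataset("drive_2019_03_12", days_since_last_access=180,
                    accesses_last_30_days=0)))
```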
We recently worked with an award-winning developer of next-generation electronics for active safety systems and autonomous vehicle technology that needed a solution for a workflow bottleneck caused by an explosion of data and a more complex workflow. We recommended a combination of high-performance disk with tape archive under a single point of management. Picture this: an automaker’s research data from their recent AI vehicle testing is ingested into high-performance disk arrays. The engineering teams’ HPC systems access the data directly using high-speed Fibre Channel storage area network (SAN) connections. As soon as data lands on disk, automatically the solution makes two copies on a tape archive. The copies serve as a backup initially, and later one becomes part of an active archive. When the data on high performance storage is no longer actively being worked on, the disk space is reclaimed, but one of the tape copies remains in the archive where it can be accessed. The second copy of data on tape has two uses. It provides an off-site insurance copy for disaster recovery (DR) protection and ransomware, but it has also become a very effective way for design and engineering teams in different parts of the world to share data.
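In greatly simplified form, that ingest-then-archive flow could be modeled as follows. The mount points and function names are purely illustrative; a real deployment would rely on the storage vendor's data-management software rather than hand-rolled scripts:

```python
import shutil
from pathlib import Path

# Hypothetical mount points for illustration only.
PRIMARY_DISK = Path("/mnt/primary")        # high-performance disk array
TAPE_ARCHIVE = Path("/mnt/tape/active")    # on-site active archive (tape)
TAPE_OFFSITE = Path("/mnt/tape/offsite")   # off-site DR / ransomware copy

def ingest(dataset: Path) -> None:
    """On ingest, land data on primary disk and immediately cut two tape copies."""
    landed = PRIMARY_DISK / dataset.name
    shutil.copytree(dataset, landed)
    shutil.copytree(landed, TAPE_ARCHIVE / dataset.name)   # copy 1: active archive
    shutil.copytree(landed, TAPE_OFFSITE / dataset.name)   # copy 2: DR and sharing

def reclaim(dataset_name: str) -> None:
    """When active work on the data is finished, free the primary disk space;
    the tape copies remain accessible in the archive."""
    shutil.rmtree(PRIMARY_DISK / dataset_name)
```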
The future of data
Without a doubt, the future of the automotive industry is heading full speed towards artificial intelligence. As the technology required to store data from sensors, connectivity, mapping and security rapidly expands, vehicles are becoming increasingly data-dependent. Automakers are aware of the importance of data and, specifically, of the volume of data needed to deliver on the full intended user experience. Behind that sits a complex web of platforms and algorithms that must work in unison to realize a truly autonomous system and change consumers' relationship with vehicles forever.
The big question facing automakers in the future is: which storage approach are they going to use to manage their vehicles' data? This question becomes even more interesting as the 5G age fast approaches. Storage solutions have already proven that they can address the requirements of an end-to-end workflow. The onus will be on flexible storage solutions that can scale performance and capacity independently, and allow secondary storage to grow while keeping the primary active storage work area as small as possible to control costs. Automakers are best served by following tiered storage principles: moving data to the most appropriate storage option according to its access requirements. Each tier is tuned for a specific cost and performance profile based on the use cases of each vehicle, the number of sensors it carries and its data retention requirements. With this approach, data can be accessed, processed and managed at the needed performance and at the lowest cost to achieve the desired results.
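A rough worked example shows why keeping the primary work area small matters. Assume a hypothetical 10 PB retained archive of which only 5% is active at any time; the per-terabyte prices below are placeholder assumptions chosen only to illustrate the shape of the economics, not quotes:

```python
# Hypothetical per-TB monthly costs to illustrate tiered-storage economics.
COST_PER_TB_MONTH = {"primary_disk": 25.0, "tape_archive": 1.0}  # placeholder $

TOTAL_TB = 10_000          # 10 PB retained overall
ACTIVE_FRACTION = 0.05     # share of data being actively worked on

active_tb = TOTAL_TB * ACTIVE_FRACTION
cold_tb = TOTAL_TB - active_tb

tiered = (active_tb * COST_PER_TB_MONTH["primary_disk"]
          + cold_tb * COST_PER_TB_MONTH["tape_archive"])
all_primary = TOTAL_TB * COST_PER_TB_MONTH["primary_disk"]

print(f"Tiered:      ${tiered:,.0f}/month")    # ~$22,000/month
print(f"All-primary: ${all_primary:,.0f}/month")  # ~$250,000/month
```

Under these assumed prices, the tiered design costs roughly a tenth of keeping everything on primary storage, which is the economic argument behind scaling secondary capacity while holding the active work area small.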
By Mark Pastor
Mark Pastor is product marketing director for archive products at Quantum. Mark represents Quantum within the Active Archive Alliance, the LTO Consortium and the Object Storage Alliance. He regularly blogs on topics relating to data protection and archiving.