Edge computing is shaping up as the most practical way to manage the growing volume of data being generated by remote sources such as IoT and 5G devices.
A key benefit of edge computing is that it provides greater computation, network access, and storage capabilities closer to the source of the data, allowing organizations to reduce latency. As a result, enterprises are embracing the model: Gartner estimates that 50% of enterprise data will be generated at the edge by 2023, and PricewaterhouseCoopers predicts the global market for edge data centers will reach $13.5 billion in 2024, up from $4 billion in 2017.
Since edge computing’s basic mission is rapid data delivery, edge data centers require fast and reliable storage capabilities, but that’s not without challenges.
“Data-center storage is typically uniformly formed, but edge computing is unique for each application and requires a non-standard approach to the selection of file systems, data formats, and methods of data transfer,” says Pavel Savyhin, head of embedded-systems engineering for Klika Tech, an IoT and cloud-native product and solutions development company.
When compared to data stored in a traditional data center, most data arriving at the edge is either ephemeral or raw. This means a significant amount of edge data will only remain in storage temporarily before being replaced by fresh incoming data. “Edge data either needs to be passed … to the central data center or processed locally, with only the analytical summary being sent to the central data center and the original raw data then discarded,” says Tong Zhang, a professor at the Rensselaer Polytechnic Institute and chief scientist for ScaleFlux, a computational storage provider.
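The second path Zhang describes, processing locally and forwarding only a summary, can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's implementation: the EdgeBuffer class, its capacity parameter, and the summary fields are hypothetical names chosen for the example.

```python
from collections import deque
from statistics import mean


class EdgeBuffer:
    """Hypothetical fixed-size buffer: fresh readings displace the oldest ones."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item once it is full.
        self.readings = deque(maxlen=capacity)

    def add(self, value):
        self.readings.append(value)

    def summarize_and_discard(self):
        """Build a compact summary for the central data center, then drop raw data."""
        if not self.readings:
            return None
        summary = {
            "count": len(self.readings),
            "min": min(self.readings),
            "max": max(self.readings),
            "mean": mean(self.readings),
        }
        self.readings.clear()  # the original raw data is discarded
        return summary


buf = EdgeBuffer(capacity=4)
for reading in [20.0, 21.0, 22.0, 23.0, 24.0]:  # the fifth reading evicts the first
    buf.add(reading)
summary = buf.summarize_and_discard()
```

Only the small summary dictionary would travel upstream; which statistics belong in it depends entirely on the application.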
When various data types are involved, there may be a need for storage tiering at the edge. “This depends on the lifespan of the data at the edge,” Zhang says. Temporary data, such as raw sensor data with frequent writes, may require one level of tiering. Meanwhile, data with a longer anticipated lifespan – particularly types with a “read-often, overwrite occasionally” usage need, such as an AI model that’s only updated at predefined intervals – can live on a different storage tier, he says.
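One way to picture that decision is a small rule that maps a data type's expected lifespan and write rate to a tier. The sketch below is only an illustration under stated assumptions: the DataProfile fields, the tier labels, and both thresholds are hypothetical, and a real deployment would derive them from the application.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    name: str
    expected_lifespan_hours: float  # how long the data stays useful at the edge
    writes_per_hour: float          # how often it is overwritten


def choose_tier(profile, short_lived_hours=24.0, frequent_writes=60.0):
    """Pick a tier label from lifespan and write rate (thresholds are assumptions)."""
    if (profile.expected_lifespan_hours <= short_lived_hours
            and profile.writes_per_hour >= frequent_writes):
        return "write-optimized"  # temporary, write-heavy data such as raw sensor readings
    return "read-optimized"       # longer-lived, read-often data such as an AI model


raw_sensor = DataProfile("raw sensor data", expected_lifespan_hours=1, writes_per_hour=3600)
ai_model = DataProfile("AI model", expected_lifespan_hours=24 * 30, writes_per_hour=0.01)
tiers = {p.name: choose_tier(p) for p in (raw_sensor, ai_model)}
```

With these made-up numbers, the raw sensor data lands on the write-optimized tier and the AI model on the read-optimized one. A two-tier rule is the simplest case; more tiers would mean more thresholds.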
Each edge-computing application is unique, so planning storage tiers for them is different from a traditional data-center approach. But it’s not necessarily more difficult. “The edge computing application itself will dictate what data-storage tier is required,” Savyhin notes.
Capacity, power, connectivity
Edge data centers are generally small-scale facilities that have the same components as traditional data centers but are squeezed into a much smaller footprint.
In terms of capacity, determining edge storage requirements is similar to estimating the storage needs of a traditional data center, but workloads can be difficult to predict, says Jason Shepherd, a vice president at distributed edge-computing startup Zededa.
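A back-of-the-envelope estimate shows the kind of arithmetic involved. The function below is a rough sketch, not a sizing method attributed to Shepherd: every parameter, including the headroom multiplier meant to absorb unpredictable workloads, is an assumption made for illustration.

```python
def required_storage_gb(devices, bytes_per_reading, readings_per_second,
                        retention_hours, headroom=1.5):
    """Raw bytes retained at the edge, scaled by a safety margin for workload spikes."""
    seconds_retained = retention_hours * 3600
    raw_bytes = devices * bytes_per_reading * readings_per_second * seconds_retained
    return raw_bytes * headroom / 1e9  # decimal gigabytes


# Hypothetical site: 200 devices, 100-byte readings, 10 readings per second each,
# kept for 24 hours before being summarized or overwritten.
estimate = required_storage_gb(devices=200, bytes_per_reading=100,
                               readings_per_second=10, retention_hours=24)
# About 25.92 GB with the default 1.5x headroom.
```

The harder part in practice is choosing the inputs, since device counts and data rates at the edge tend to shift after deployment.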
Edge-computing adopters also need to be aware of the cost of upgrading or expanding storage resources, which can be substantial given size and speed constraints. “This will change a bit over time as grid-based edge storage solutions evolve, because there will be more leeway to elastically scale storage in the field by pooling resources across discrete devices,” Shepherd predicts.
A more recent option for expanding edge-storage capacity independently from edge-compute capacity is computational storage-drive devices that feature transparent compression. They provide compute services within the storage system while not requiring any modifications to the existing storage I/O software stack or I/O interface protocols, such as NVMe or SATA.
“These drives can expand the effective storage capacity of edge devices in a very power-efficient manner without taxing the CPU and without adding any components to the systems,” says Zhang, who helped develop the technology for ScaleFlux.
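The capacity gain comes down to simple arithmetic: if data compresses at a given ratio, the same physical flash holds that many times more logical data. The sketch below illustrates only that arithmetic. It uses Python's zlib on the host CPU as a stand-in, whereas the drives described here compress inside the device; the sample data and the 512 GB drive size are hypothetical.

```python
import zlib


def compression_ratio(sample: bytes) -> float:
    """Raw size divided by compressed size for one sample of data."""
    return len(sample) / len(zlib.compress(sample))


def effective_capacity_gb(physical_gb: float, ratio: float) -> float:
    """Logical data that fits if all data compressed at the given ratio."""
    return physical_gb * ratio


# Hypothetical, highly repetitive sensor log; real data will compress differently.
sample = b"sensor_id=17,temp_c=21.5,status=OK\n" * 1000
ratio = compression_ratio(sample)
capacity = effective_capacity_gb(512, ratio)
```

Repetitive logs like this sample compress very well, while already-compressed media such as video will see little or no gain, so the ratio should be measured on representative data.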
When selecting memory and storage options for an edge data center, project leaders must carefully analyze power requirements for both consistency and the likely maximum wattage demands. When total system power is limited, memory and storage efficiency become crucial in that only limited resources are available to achieve performance goals, Zhang says. “If the power supply is inconsistent, then the memory and storage need to be able to handle surprise power loss without losing or corrupting data,” he says.
Continuous power availability is just as important in an edge data center as it is in a conventional data facility. “For use cases where power failure is a high probability, an electrically erasable, programmable, read-only memory (EEPROM) would be a top choice to store critical data, rather than a low-quality battery-powered device that may not be a reliable back-up,” Savyhin says.
On the downside, EEPROM costs more than flash.
Additional critical data storage options, according to Savyhin, include journaling file systems, embedded failsafe data formats, and flash wear leveling.
These capabilities can be important at the edge. A journaling file system keeps a log of changes made to the file system during disk writing, so that in the event of a power failure or system shutdown, the log can be used to repair corruptions that may have occurred. Embedded failsafe data formats are designed for accurately storing data that doesn’t need to be maintained for extended lengths of time, and wear leveling is a technique used to control how data is written to blocks on a flash device, with a goal of extending the life of solid-state storage.
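The journaling idea can be shown at application level with a tiny write-ahead log. This is a simplified analogy, not how any particular file system implements its journal: the function names, file names, and JSON format are hypothetical, and production journals add checksums and operate on file-system blocks and metadata.

```python
import json
import os
import tempfile


def journaled_write(data_path, journal_path, new_state):
    """Record the intended change durably, apply it, then clear the journal."""
    with open(journal_path, "w") as j:
        json.dump(new_state, j)
        j.flush()
        os.fsync(j.fileno())         # the intent is on disk before data is touched
    with open(data_path, "w") as d:  # a power loss here leaves the journal behind
        json.dump(new_state, d)
        d.flush()
        os.fsync(d.fileno())
    os.remove(journal_path)          # cleared only after the write completes


def recover(data_path, journal_path):
    """On startup, replay a leftover journal entry to repair a half-finished write."""
    if os.path.exists(journal_path):
        try:
            with open(journal_path) as j:
                pending = json.load(j)
        except ValueError:
            pass  # torn journal entry: the change never committed, keep old data
        else:
            with open(data_path, "w") as d:
                json.dump(pending, d)
        os.remove(journal_path)
    with open(data_path) as d:
        return json.load(d)


workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "state.json")
journal_path = os.path.join(workdir, "state.journal")
journaled_write(data_path, journal_path, {"threshold": 10})

# Simulate a power loss mid-update: the journal is complete, the data file is truncated.
with open(journal_path, "w") as j:
    json.dump({"threshold": 25}, j)
with open(data_path, "w") as d:
    d.write('{"thresh')

recovered = recover(data_path, journal_path)
```

In the simulated failure, the data file is left corrupt, but the intact journal entry lets recovery restore the intended state.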
Something else to consider when equipping an edge data center, particularly a facility dependent on cloud storage, is the need for redundant network access. “Outages do happen, and that means lost productivity without a backup path,” says Jeff Looman, vice president of engineering at cloud service provider FileShadow. The extra cost of a second line usually pales in comparison to the loss of service and employee time due to an outage, he says.
Device size matters at the edge
Edge data centers come in many different forms, most only a fraction of the size of a traditional installation and positioned in locations as varied as spare offices, storefronts, cell towers, and even lampposts.
To conserve space, many edge servers are tiny, using form factors that one wouldn’t ordinarily see in a traditional data center, Zhang says. Mini-servers, such as those in the Intel NUC line, can conserve edge data center space to an amazing degree, but such stripped-down devices will also limit available choices for storage and other components.
“Larger edge servers utilize more traditional 1U, 2U or even larger form factors, broadening storage options as well as graphics and networking components.”
Storage device physical sizes and capacities also vary widely. “For installations that use standard platforms, the physical size of storage will be the same as [for the] enterprise and cloud—standard 2.5-inch and 3.5-inch form factors for SSDs and HDDs, respectively,” says Gary Kotzur, storage business group CTO at chipmaker Marvell.
“For installations that have restricted space constraints, smaller SSD form factors, like M.2 and E1.S may be required, or even custom storage form factors specifically designed for a unique hardware platform,” he adds.
Physical size considerations are typically determined by the nature of the industry, data replacement cost, and data security requirements. “Highly sensitive data, especially in small volumes, can be stored on something as simple as a flash drive,” Looman says. “This type of data may require only a wall safe for storage.”
Retention requirements also play an important role in how data is stored. “Some industries must comply with data-retention regulations that specify data is to be ‘audit ready’ for several years, so the type of long-term physical storage plays a role in the consideration,” Looman says.
Off-the-shelf vs. vendor-supplied appliances
As when constructing any type of data center, enterprises face the choice between committing to a single vendor’s master plan or deploying off-the-shelf products from multiple suppliers. The selection between off-the-shelf hardware – typically in the form of white box systems with user-supplied and configured system and support software – and vendor appliances depends on an organization’s goals and internal IT skills.
“Off-the-shelf hardware may provide more flexibility, but more investment in application compatibility testing,” Kotzur cautions. “Vendor appliances integrate a total solution, including software, but typically cost more for a given function.” The downside is that while vendor-supplied appliances can be attractive for their compatibility and simplicity, they also bring the risk of lock-in.
Looman advises IT leaders and their teams not to be intimidated by the challenge of following an off-the-shelf approach. “You will need to wade through some industry jargon in setting things up, but it is possible with a bit of homework,” he states. “You may also have to orchestrate communication between different vendors to ensure success.”
Copyright © 2021 IDG Communications, Inc.