TY - CONF
T1 - Sketching volume capacities in deduplicated storage
AU - Harnik, Danny
AU - Hershcovitch, Moshik
AU - Shatsky, Yosef
AU - Epstein, Amir
AU - Kat, Ronen
PY - 2019/1/1
Y1 - 2019/1/1
N2 - The adoption of deduplication in storage systems has introduced significant new challenges for storage management. Specifically, the physical capacities associated with volumes are no longer readily available. In this work we introduce a new approach to analyzing capacities in deduplicated storage environments. We provide sketch-based estimations of fundamental capacity measures required for managing a storage system: How much physical space would be reclaimed if a volume or group of volumes were to be removed from a system (the reclaimable capacity) and how much of the physical space should be attributed to each of the volumes in the system (the attributed capacity). Our methods also support capacity queries for volume groups across multiple storage systems, e.g., how much capacity would a volume group consume after being migrated to another storage system? We provide analytical accuracy guarantees for our estimations as well as empirical evaluations. Our technology is integrated into a prominent all-flash storage array and exhibits high performance even for very large systems. We also demonstrate how this method opens the door for performing placement decisions at the data center level and obtaining insights on deduplication in the field.
UR - http://www.scopus.com/inward/record.url?scp=85077078496&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85077078496
T3 - Proceedings of the 17th USENIX Conference on File and Storage Technologies, FAST 2019
SP - 107
EP - 119
BT - Proceedings of the 17th USENIX Conference on File and Storage Technologies, FAST 2019
PB - USENIX Association
T2 - 17th USENIX Conference on File and Storage Technologies, FAST 2019
Y2 - 25 February 2019 through 28 February 2019
ER -