
BLOG: ✨ StarWind SAN & NAS over Fibre Channel: MDRAID vs Graid Technology

StarWind | Graid Partnership - "Graid shows outstanding results with high data values: it was capable of receiving the seemingly impossible number of 10,8M IOPS." - StarWind

(StarWind Software, Volodymyr Khrystenko, July 20, 2022)

Introduction

The new build of StarWind SAN & NAS has just been released, and it brings the long-awaited FC (Fibre Channel) ecosystem support. StarWind SAN & NAS is designed to give new life to your existing hardware. Installed either on top of the hypervisor of your choice or on bare metal, it turns your server into a fast shared storage pool that can be accessed over iSCSI, SMB, or NFS and, in the new build, over FC as well. It uses the same SDS engine as StarWind vSAN, which means high performance, and it adds new features such as ZFS support for building a highly resilient storage system on commodity hardware.

This was a great chance to test how fast StarWind SAN & NAS can go over FC. The folks at StorageReview kindly provided the testing infrastructure on which we ran the benchmark. Thanks once again to the StorageReview team!

Testing scope

We tested the performance of shared storage presented over FC to client nodes from a dedicated storage node full of NVMe drives, with StarWind SAN & NAS on top. We decided to include only good old FCP (Fibre Channel Protocol) benchmark results in this article, since the NVMe-FC results were at the same level (and on certain patterns even lower than FCP). To combine the NVMe drives into a redundant storage array, we used MDRAID and Graid and tested them separately. MDRAID is the Linux software RAID that ships as part of StarWind SAN & NAS. Graid is an extremely fast NVMe/NVMe-oF RAID card designed to deliver the full potential of PCIe Gen4 systems.

It is worth mentioning that Graid SupremeRAID is, as of now, the only NVMe RAID card capable of delivering the highest possible SSD performance while removing performance bottlenecks altogether. What is the difference, you may wonder? Graid SupremeRAID SR-1010 is based on an NVIDIA A2000 GPU. In most respects, that does not make the solution anything special, but when it comes to NVMe RAID bottlenecks, the GPU gives it a head start over many alternatives. In particular, SupremeRAID is capable of processing all the I/O operations directly, which frees up a great deal of CPU resources. Standard RAID cards are simply no match for the computing potential of a GPU. Even though the Graid solution is a software RAID, the NVIDIA GPU card is essential to many of the benefits that Graid has to offer. Additionally, thanks to the specifics of the Graid software architecture, data can flow directly from the CPU to the storage, bypassing the Graid card.

Traditionally, NVIDIA cards serve various purposes. They are in demand for gaming, video acceleration, cryptocurrency mining, and professional workloads such as VDI. NVIDIA also produces GPUs for vehicles. And now NVIDIA hardware powers storage appliances. This is nothing less than a breakthrough in applying the computing potential of the GPU to a whole new field.

Testing bed

Here is the list of the hardware and software used in our benchmark.

Storage node:

Hardware
sw-sannas-01 Dell PowerEdge R750
CPU Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
Sockets 2
Cores/Threads 80/160
RAM 1024GB
Storage 8x (NVMe) – PBlaze6 6926 12.8TB
GRAID SupremeRAID SR-1010
HBAs 4x Marvell® QLogic® 2772 Series Enhanced 32GFC Fibre Channel Adapters
Software
StarWind SAN&NAS
Version 1.0.2 (build 2175 – FC)

Client nodes:

Hardware
win-cli-{01..04} PowerEdge R740xd
CPU Intel® Xeon® Gold 6130 Processor 2.10GHz
Sockets 2
Cores/Threads 32/64
RAM 256GB
HBAs 1x Marvell® QLogic® 2772 Series Enhanced 32GFC Fibre Channel Adapters
Software
OS Windows Server 2019 Standard Edition

Testbed architecture overview:

StarWind Testbed Architecture

Communication between the storage node and the client nodes was carried out over a 32GFC Fibre Channel fabric. The storage node had 4x Marvell® QLogic® 2772 Series Enhanced 32GFC Fibre Channel Adapters, while each client node had one. The storage and client nodes were connected through two Brocade G620 Fibre Channel switches to ensure resilience.

The interesting thing about Marvell QLogic 2772 Fibre Channel adapters is that their ports are independently resourced, which gives an additional layer of resilience. The complete port-level isolation across the FC controller architecture prevents errors and firmware crashes from propagating across all ports. Find out more about Marvell QLogic 2772 Fibre Channel adapters in terms of high availability and reliability.

Marvell QLogic Architecture vs Other Company's Architecture comparison diagram
Marvell QLogic ports act independently from each other giving more flexibility in terms of resilience. More details are here

Storage connection diagram:

StarWind Storage Connection Diagram

We combined the 8 NVMe drives on the storage node into a RAID5 array:

First, using MDRAID:
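For readers who want to reproduce the MDRAID step, here is a minimal sketch of how such an array is typically created with mdadm. The device names, the array name /dev/md0, and the absence of any tuning options are assumptions made for illustration; the exact command used in the benchmark is not listed in this article.

```python
def mdadm_create_cmd(array, devices, level=5):
    """Build an mdadm command line that creates a software RAID array."""
    return [
        "mdadm", "--create", array,
        f"--level={level}",
        f"--raid-devices={len(devices)}",
        *devices,
    ]

# Hypothetical device names for the 8 NVMe drives (assumption).
drives = [f"/dev/nvme{i}n1" for i in range(8)]
cmd = mdadm_create_cmd("/dev/md0", drives)
print(" ".join(cmd))
```

The sketch only builds and prints the command; running it against real drives destroys their contents, so it is left to the reader to execute deliberately.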

And then with Graid:

Once the RAID arrays were ready, we sliced each of them into 32 LUNs of 1TB each and distributed these 8 LUNs per client node. We did this because a single LUN has a performance limit, and we wanted to squeeze the maximum out of our storage.
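The layout described above can be sketched in a few lines. The LUN names and the contiguous assignment order are hypothetical; only the counts (32 LUNs, 4 clients, 8 drives of 12.8TB in RAID5) come from this article.

```python
NUM_LUNS = 32
LUN_SIZE_TB = 1
CLIENTS = [f"win-cli-{i:02d}" for i in range(1, 5)]

def distribute(luns, clients):
    """Assign LUNs to clients in contiguous, equally sized groups."""
    per_client = len(luns) // len(clients)
    return {
        client: luns[i * per_client:(i + 1) * per_client]
        for i, client in enumerate(clients)
    }

luns = [f"lun{i:02d}" for i in range(NUM_LUNS)]  # hypothetical names
mapping = distribute(luns, CLIENTS)
for client, assigned in mapping.items():
    print(client, len(assigned), assigned[0], "..", assigned[-1])

# RAID5 keeps one drive's worth of capacity for parity.
DRIVES, DRIVE_TB = 8, 12.8
usable_tb = (DRIVES - 1) * DRIVE_TB
print(f"{NUM_LUNS * LUN_SIZE_TB} TB exported out of {usable_tb:.1f} TB usable")
```

The exported 32TB is well below the usable RAID5 capacity, so the LUN count here is driven by performance rather than by space.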

RAID arrays sliced

This is an example of 8 LUNs connected to one client node:

Example image of the Windows Device Manager with StarWind hardware installed

Testing Methodology

The benchmark was run using the fio utility. fio is a cross-platform, industry-standard benchmark tool used to test both local and shared storage.

Testing patterns:

  • 4k random read;
  • 4k random write;
  • 4k random read/write 70/30;
  • 64K random read;
  • 64K random write;
  • 1M sequential read;
  • 1M sequential write.

Test duration:

  • Single test duration = 600 seconds;
  • Before starting the write benchmarks, the storage was first warmed up for 2 hours.
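As an illustration of how the patterns and the 600-second duration above map onto fio options, here is a minimal sketch that builds fio command lines. The job names, the target path /dev/md0, the libaio engine, and direct I/O are assumptions; the exact job files used in the benchmark are not given in this article. On the Windows client nodes a different engine (for example windowsaio) would be needed.

```python
# Patterns from the list above, expressed as fio options.
PATTERNS = {
    "4k-randread": {"rw": "randread", "bs": "4k"},
    "4k-randwrite": {"rw": "randwrite", "bs": "4k"},
    "4k-randrw-70-30": {"rw": "randrw", "rwmixread": 70, "bs": "4k"},
    "64k-randread": {"rw": "randread", "bs": "64k"},
    "64k-randwrite": {"rw": "randwrite", "bs": "64k"},
    "1m-read": {"rw": "read", "bs": "1m"},
    "1m-write": {"rw": "write", "bs": "1m"},
}

def fio_args(pattern, target, numjobs, iodepth, runtime=600, ioengine="libaio"):
    """Return the fio command line for one pattern as a list of arguments."""
    opts = {
        "name": pattern,
        "filename": target,
        **PATTERNS[pattern],
        "numjobs": numjobs,
        "iodepth": iodepth,
        "ioengine": ioengine,  # assumption: Linux asynchronous I/O
        "direct": 1,           # assumption: bypass the page cache
        "runtime": runtime,
    }
    args = ["fio"] + [f"--{key}={value}" for key, value in opts.items()]
    return args + ["--time_based", "--group_reporting"]

args = fio_args("4k-randrw-70-30", "/dev/md0", numjobs=16, iodepth=16)
print(" ".join(args))
```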

Testing stages

  1. Testing single NVMe drive performance to get reference numbers;
  2. Testing MDRAID and Graid RAID5 array performance locally;
  3. Running benchmark remotely from client nodes.

1. Testing single NVMe drive performance:

A table comparing specifications of the PBlaze6 6920, D6290, and D6926 series
The values for NVMe drive speed according to the vendor

An interesting thing about these NVMe SSDs is that they support flexible power management from 10W to 35W, with a 25W power mode by default. Basically, Memblaze's NVMe drives can increase sequential write performance by consuming more power, which gives a flexible way to tune drive performance for a specific workload.

We obtained the optimal (speed/latency) performance of a single NVMe drive with the following Numjobs and IOdepth values:

1 NVMe PBlaze6 D6926 Series 12.8TB
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms)
4k random read 16 32 1514000 5914 0,337
4k random write 4 4 537000 2096 0,029
64k random read 4 8 103000 6467 0,308
64k random write 2 2 42000 2626 0,094
1M read 1 4 6576 6576 0,607
1M write 1 2 5393 5393 0,370
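The columns in this table are related to each other, and a short sketch makes the relationship explicit. Throughput follows from IOPs and block size, and the average latency is roughly what Little's law predicts when Numjobs × IOdepth requests are kept in flight. The Little's law reading is our own approximation, not something fio reports, and it assumes the queues stay full for the whole run.

```python
def mib_per_s(iops, block_kib):
    """Throughput in MiB/s for a given IOPs figure and block size in KiB."""
    return iops * block_kib / 1024

def littles_law_latency_ms(iops, numjobs, iodepth):
    """Average latency implied by Little's law: outstanding I/Os / IOPs."""
    return numjobs * iodepth / iops * 1000

# 4k random read row of the single-drive table.
throughput = mib_per_s(1_514_000, 4)
latency = littles_law_latency_ms(1_514_000, numjobs=16, iodepth=32)
print(f"{throughput:.0f} MiB/s, {latency:.3f} ms")
```

For the 4k random read row this gives about 5914 MiB/s and about 0.338 ms, close to the measured 5914 and 0,337. Rows where the IOPs figure is rounded more coarsely match less exactly.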

Before running the actual tests, we have determined the time needed to warm up the NVMe drives to Steady State:

A graph of the PBlaze D6926 12.8 TB 4k random write performance (IOPs/time)

P.S. More information about Performance States

The graph shows that the NVMe drives should be warmed up for around 2 hours.
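Reading the steady state off a graph can also be automated. The sketch below is one possible check, loosely modeled on the common idea that a measurement window is steady when both its spread and its trend are small relative to its mean. The thresholds (20% spread, 10% drift), the window length, and the sample series are assumptions for illustration, not values taken from our measurements.

```python
def is_steady(samples, max_range=0.20, max_drift=0.10):
    """True if the spread and the linear trend of the window are small."""
    n = len(samples)
    mean = sum(samples) / n
    x_mean = (n - 1) / 2
    # Least-squares slope of the samples over their index.
    slope = sum((x - x_mean) * (y - mean) for x, y in enumerate(samples)) / sum(
        (x - x_mean) ** 2 for x in range(n)
    )
    drift = abs(slope) * (n - 1)  # change of the fitted line across the window
    spread = max(samples) - min(samples)
    return spread <= max_range * mean and drift <= max_drift * mean

def first_steady_index(series, window=5):
    """Index where the first steady window starts, or None if there is none."""
    for end in range(window, len(series) + 1):
        if is_steady(series[end - window:end]):
            return end - window
    return None

# Synthetic 4k random write IOPs (in thousands), one sample per interval.
series = [1500, 1200, 900, 700, 600, 560, 545, 540, 538, 541, 539]
start = first_steady_index(series)
print("steady from sample", start)
```

With real data, each sample would be the average IOPs over a fixed interval of the warm-up run, and the index multiplied by the interval length gives the warm-up time.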

2. Testing MDRAID and Graid RAID array performance locally:

Fewer words, more numbers. Let's move on to the MDRAID and Graid local performance tests.

4k random read:

Table result:

MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
4k random read 16 16 2670000 10430 0,095 7% 1281000 5004 0,198 3% 48% 48% 208% 46%
4k random read 16 32 3591000 14027 0,141 10% 2451000 9574 0,207 6% 68% 68% 147% 60%
4k random read 32 32 4049000 15816 0,250 20% 4474000 17477 0,227 10% 110% 110% 91% 50%
4k random read 32 64 4032000 15750 0,504 30% 7393000 28879 0,275 16% 183% 183% 55% 53%
4k random read 64 64 4061000 15863 0,998 40% 10800000 42188 0,377 25% 266% 266% 38% 63%
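The Comparison columns in these tables appear to express each GRAID5 value as a percentage of the corresponding MDRAID5 value. That reading is inferred from the numbers rather than stated explicitly, and the sketch below checks it against the first row. The CPU column is left out because the published CPU figures are rounded too coarsely to reproduce.

```python
def comparison(mdraid, graid):
    """Express each GRAID5 metric as a percentage of the MDRAID5 value."""
    return {key: round(graid[key] / mdraid[key] * 100) for key in mdraid}

# First row of the 4k random read table (Numjobs 16, IOdepth 16).
mdraid_row = {"iops": 2_670_000, "latency_ms": 0.095}
graid_row = {"iops": 1_281_000, "latency_ms": 0.198}
result = comparison(mdraid_row, graid_row)
print(result)
```

For IOPs and MiB\s, a value above 100% favors Graid. For latency and CPU usage, a value below 100% favors Graid.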

Graphs:

A graph of 4k random read - local IOPs (IOPs over Numjobs/IOdepth)
A graph of 4k random read - local CPU Usage (percentage) (CPU usage (percentage) over Numjobs/IOdepth)

4k random write:

The per-configuration rows for this pattern are not reproduced here. The best local 4k random write results, as listed in the local vs. FCP comparison tables below, were 478000 IOPs (1867 MiB\s) at 0,535 ms for MDRAID5 and 975000 IOPs (3809 MiB\s) at 2,100 ms for GRAID5*.

* – in order to get the maximum performance of 1.5M IOPs with Graid SR-1010, you need a PCIe Gen4 x16 slot. Our server, however, had only PCIe Gen4 x8 slots.

Graphs:

A graph of 4k random write - Local latency (ms) (Latency (ms) over Numjobs/IOdepth)
A graph of 4k random write - local IOPs (IOPs over Numjobs/IOdepth)
A graph of 4k random write - local CPU usage (percentage) (CPU usage (percentage) over Numjobs/IOdepth)

4k random read/write 70/30:

Table result:

MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
4k random read/write 70/30 8 16 765000 2988 0,202 5% 429000 1676 0,344 1% 56% 56% 170% 31%
4k random read/write 70/30 16 16 1078000 4211 0,285 14% 776000 3031 0,382 2% 72% 72% 134% 14%
4k random read/write 70/30 16 32 1100000 4297 0,518 17% 1253000 4895 0,470 3% 114% 114% 91% 18%
4k random read/write 70/30 32 32 1147000 4480 0,960 30% 1944000 7594 0,608 5% 169% 169% 63% 15%
4k random read/write 70/30 32 64 1154000 4508 1,847 30% 2686000 10492 0,882 6% 233% 233% 48% 20%
4k random read/write 70/30 64 64 1193000 4660 5,298 49% 3140000 12266 1,529 8% 263% 263% 29% 15%

Graphs:

4k random read/write 70/30 - local IOPS (IOPs over Numjobs/IOdepth)
4k random read/write 70/30 - local Latency (ms) (Latency (ms) over Numjobs/IOdepth)
4k random read/write 70/30 - local CPU usage (%) (CPU usage (%) over Numjobs/IOdepth)

64k random read:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
64k random read 8 8 186000 11625 0,343 5% 175000 10938 0,364 1% 94% 94% 106% 16%
64k random read 8 16 188000 11750 0,679 5% 292000 18250 0,438 2% 155% 155% 65% 30%
64k random read 16 16 196000 12250 1,309 10% 461000 28813 0,554 2% 235% 235% 42% 20%
64k random read 16 32 195000 12188 2,624 10% 646000 40375 0,792 3% 331% 331% 30% 30%
64k random read 32 32 195000 12188 5,242 20% 740000 46250 1,382 3% 379% 379% 26% 15%

Graphs:

64k random read - MiB\s (MiB\s over Numjobs/IOdepth)
64k random read - Local Latency (ms) (Local Latency (ms) over Numjobs/IOdepth)
64k random read - local CPU usage (%) (CPU usage (%) over Numjobs/IOdepth)

64k random write:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
64k random write 8 8 92200 5763 0,693 7% 67400 4213 0,948 1% 73% 73% 137% 10%
64k random write 8 16 118000 7375 1,081 14% 104000 6500 1,229 1% 88% 88% 114% 10%
64k random write 16 16 117000 7313 2,179 16% 135000 8438 1,895 2% 115% 115% 87% 11%
64k random write 16 32 117000 7313 4,369 16% 146000 9125 3,496 2% 125% 125% 80% 13%

Graphs:

64k random write - MiB\s (MiB\s over Numjobs/IOdepth)
64k random write - local Latency (ms) (Latency (ms) over Numjobs/IOdepth)
64k random write - local CPU usage (%) (CPU usage (%) over Numjobs/IOdepth)

1M read:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
1M read 4 4 10000 10000 1,592 3% 18200 18200 0,880 0% 182% 182% 55% 12%
1M read 8 4 11000 11000 2,673 5% 28600 28600 1,120 1% 260% 260% 42% 10%
1M read 8 8 11900 11900 5,393 5% 39400 39400 1,623 1% 331% 331% 30% 10%
1M read 8 16 12100 12100 10,563 5% 44700 44700 2,865 1% 369% 369% 27% 12%
1M read 16 16 12100 12100 21,156 10% 47000 47000 5,442 1% 388% 388% 26% 6%

Graphs:

1M read - local MiB\s (MiB\s over Numjobs/IOdepth)
1M read - local Latency (ms) (Latency (ms) over Numjobs/IOdepth)
1M read - local CPU usage (%) (CPU usage (%) over Numjobs/IOdepth)

1M write:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
1M write 4 4 6938 6938 2,300 9% 5363 5363 2,981 1% 77% 77% 130% 9%
1M write 8 4 6730 6730 4,753 11% 8251 8251 3,876 1% 123% 123% 82% 12%
1M write 8 8 6782 6782 9,434 12% 10100 10100 6,312 2% 149% 149% 67% 17%
1M write 8 16 6780 6780 18,870 12% 11100 11100 11,530 2% 164% 164% 61% 17%
1M write 16 16 7071 7071 36,182 17% 11400 11400 22,490 3% 161% 161% 62% 15%

Graphs:

1M write - local MiB\s (MiB\s over Numjobs/IOdepth)
1M write - local Latency (ms) (Latency (ms) over Numjobs/IOdepth)
1M write - local CPU usage (%) (CPU usage (%) over Numjobs/IOdepth)

Results:

MDRAID shows decent performance at low Numjobs and IOdepth values, but as the workload increases, latency grows and performance stops scaling. Graid, on the other hand, gives better results at high Numjobs and IOdepth values: on the 4k random read pattern, we got an incredible 10,8M IOPs at a latency of just 0,377 ms. That is basically the speed of 7 NVMe drives out of 8. On large-block reads (64k/1M), Graid reaches a throughput of about 40/47GiB/s, while MDRAID hits a ceiling at about 12GiB/s.
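The "7 drives out of 8" figure can be checked with simple arithmetic, using the single-drive 4k random read result from stage 1 as the reference. This is a rough sketch: it assumes that array performance would scale linearly with the number of drives, which is an idealization.

```python
SINGLE_DRIVE_IOPS = 1_514_000  # 4k random read, single PBlaze6 D6926
DRIVES = 8

def drive_equivalents(array_iops):
    """How many single drives' worth of 4k random read IOPs the array delivers."""
    return array_iops / SINGLE_DRIVE_IOPS

best_local = {"MDRAID5": 4_061_000, "GRAID5": 10_800_000}
for name, iops in best_local.items():
    share = iops / (DRIVES * SINGLE_DRIVE_IOPS) * 100
    print(f"{name}: {drive_equivalents(iops):.1f} drives, {share:.0f}% of 8 drives")
```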

3. Running benchmark remotely from client nodes:

Having received such impressive local storage results, we were ready to give FCP a try and see whether it could deliver comparable performance to the client nodes.
In the results below, the Numjobs parameter is stated for all 32 LUNs.

4k random read:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
4k random read 16 16 1664285 6501 0,132 1067226 4169 0,230 64% 64% 174%
4k random read 32 16 3184359 12439 0,141 2104438 8221 0,233 66% 66% 165%
4k random read 64 16 3531393 13795 0,274 3687970 14406 0,264 104% 104% 96%
4k random read 128 16 3544646 13847 0,563 4563635 17827 0,430 129% 129% 76%
4k random read 16 32 1783060 6965 0,199 1772981 6926 0,261 99% 99% 131%
4k random read 32 32 3500411 13674 0,253 3475477 13576 0,268 99% 99% 106%
4k random read 64 32 3532084 13797 0,563 4459783 17421 0,436 126% 126% 77%
4k random read 128 32 3549901 13867 1,139 4578663 17886 0,873 129% 129% 77%

Graphs:

4k random read - FCP IOPs (IOPs over Numjobs/IOdepth)
4k random read - FCP Latency (ms) (Latency (ms) over Numjobs/IOdepth)

4k random write:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
4k random write 16 16 204612 799 1,241 304228 1188 0,833 149% 149% 67%
4k random write 32 16 238109 930 2,143 513988 2008 0,988 216% 216% 46%
4k random write 64 16 271069 1059 3,769 514719 2011 1,980 190% 190% 53%
4k random write 128 16 331108 1294 6,176 511970 2000 3,991 155% 155% 65%
4k random write 16 32 247398 966 2,059 307504 1201 1,657 124% 124% 80%
4k random write 32 32 285527 1115 3,578 512118 2001 1,992 179% 179% 56%
4k random write 64 32 341017 1332 5,996 491534 1920 4,157 144% 144% 69%
4k random write 128 32 385361 1506 10,617 498065 1946 8,212 129% 129% 77%

Graphs:

4k random write - FCP IOPs (IOPs over Numjobs/IOdepth)
4k random write - FCP Latency (ms) (Latency (ms) over Numjobs/IOdepth)

4k random read/write 70/30:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
4k random read/write 70/30 16 16 538622 2104 0,683 646787 2527 0,470 120% 120% 69%
4k random read/write 70/30 32 16 670407 2619 1,136 1109071 4332 0,554 165% 165% 49%
4k random read/write 70/30 64 16 805986 3149 1,955 1072219 4188 1,370 133% 133% 70%
4k random read/write 70/30 128 16 927080 3622 3,493 1089414 4256 2,912 118% 118% 83%
4k random read/write 70/30 16 32 700225 2735 1,065 644987 2520 1,133 92% 92% 106%
4k random read/write 70/30 32 32 817516 3194 1,928 1103024 4309 1,329 135% 135% 69%
4k random read/write 70/30 64 32 933090 3645 3,471 1098277 4290 2,888 118% 118% 83%
4k random read/write 70/30 128 32 997943 3899 6,616 1061938 4149 6,202 106% 106% 94%

Graphs:

4k random read/write 70/30 - FCP IOPs (IOPs over Numjobs/IOdepth)
4k random read/write 70/30 - FCP Latency (ms) (Latency (ms) over Numjobs/IOdepth)

64k random read:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
64k random read 8 8 192015 12001 0,326 149755 9360 0,420 78% 78% 129%
64k random read 8 16 193967 12123 0,652 260821 16302 0,483 134% 134% 74%
64k random read 8 32 194089 12131 1,311 397736 24859* 0,634 205% 205% 48%

* – throughput limitation of our FC adapters (3200MB\s * 8 ports = 25600MB\s).
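The ceiling in this footnote is simple arithmetic, sketched below together with how close the measured results come to it. The figure of 2 ports per adapter is implied by the footnote (8 ports across the 4 adapters in the storage node). The sketch follows the article in comparing the table values directly with the MB\s ceiling, without converting between MB and MiB.

```python
PORT_MB_S = 3200       # nominal 32GFC throughput per port, per the footnote
ADAPTERS = 4           # adapters in the storage node
PORTS_PER_ADAPTER = 2  # implied by "8 ports" in the footnote

ceiling = PORT_MB_S * ADAPTERS * PORTS_PER_ADAPTER

def utilization(measured):
    """Measured throughput as a percentage of the FC fabric ceiling."""
    return measured / ceiling * 100

print(f"ceiling: {ceiling} MB/s")
print(f"64k random read: {utilization(24859):.1f}% of the ceiling")
print(f"1M read: {utilization(25150):.1f}% of the ceiling")
```

Both large-block read results sit within a few percent of the ceiling, which is why the fabric rather than the array is the limit for these patterns.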

Graphs:

64k random read - FCP MiB\s (MiB\s over Numjobs/IOdepth)
64k random read - FCP Latency (ms) (Latency (ms) over Numjobs/IOdepth)

64k random write:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
64k random write 8 8 37343 2334 1,705 61839 3865 1,027 166% 166% 60%
64k random write 8 16 51048 3191 2,497 100093 6256 1,269 196% 196% 51%
64k random write 16 16 65517 4095 3,895 132669 8292 1,915 202% 202% 49%
64k random write 16 32 85255 5330 5,992 138609 8664 3,677 163% 163% 61%

Graphs:

64k random write - FCP MiB\s (MiB\s over Numjobs/IOdepth)
64k random write - FCP Latency (ms) (Latency (ms) over Numjobs/IOdepth)

1M read:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
1M read 4 2 9690 9690 0,803 8542 8542 0,915 88% 88% 114%
1M read 4 4 10495 10495 1,503 14799 14799 1,059 141% 141% 70%
1M read 4 8 11018 11018 2,874 19841 19841 1,584 180% 180% 55%
1M read 4 16 11713 11713 5,442 25150 25150* 2,520 215% 215% 46%

* – throughput limitation of our FC adapters (3200MB\s * 8 ports = 25600MB\s).

Graphs:

1M read - FCP MiB\s (MiB\s over Numjobs/IOdepth)
1M read - FCP Latency (ms) (Latency (ms) over Numjobs/IOdepth)

1M write:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
1M write 4 2 6028 6028 1,284 2991 2991 2,633 50% 50% 205%
1M write 4 4 7222 7222 2,167 4497 4497 3,509 62% 62% 162%
1M write 4 8 6992 6992 4,521 6748 6748 4,684 97% 97% 104%
1M write 4 16 6819 6819 9,310 8902 8902 7,125 131% 131% 77%
1M write 8 16 7144 7144 17,832 10493 10493 12,117 147% 147% 68%

Graphs:

1M write - FCP MiB\s (MiB\s over Numjobs/IOdepth)
1M write - FCP Latency (ms) (Latency (ms) over Numjobs/IOdepth)

Comparing local and remote performance results:

In the tables below, we list the best result from each test in terms of the performance/latency ratio. The full benchmark results are provided above.

MDRAID:

  MDRAID5 – local MDRAID5 – FCP Comparison
Pattern IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
4k random read 4049000 15816 0,250 3531393 13795 0,274 87% 87% 110%
4k random write 478000 1867 0,535 341017 1332 5,996 71% 71% 1121%
4k random read/write 70/30 1078000 4211 0,285 927080 3622 3,493 86% 86% 1226%
64k random read 186000 11625 0,343 192015 12001 0,326 103% 103% 95%
64k random write 118000 7375 1,081 85255 5330 5,992 72% 72% 554%
1M read 11900 11900 5,393 11709 11709 5,442 98% 98% 101%
1M write 6938 6938 2,300 7221 7221 2,167 104% 104% 94%

GRAID:

  GRAID5 – local GRAID5 – FCP Comparison
Pattern IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
4k random read 10800000 42188 0,377 4563635 17827 0,430 42% 42% 114%
4k random write 975000 3809 2,100 514719 2011 1,980 53% 53% 94%
4k random read/write 70/30 3140000 12266 1,529 1109071 4332 0,554 35% 35% 36%
64k random read 740000 46250 1,382 397736 24859 0,634 54% 54% 46%
64k random write 135000 8438 1,895 132669 8292 1,915 98% 98% 101%
1M read 47000 47000 5,442 25150 25150* 2,520 54% 54% 46%
1M write 11100 11100 11,530 10493 10493 12,117 95% 95% 105%

* – throughput limitation of our FC adapters (3200MB\s * 8 ports = 25600MB\s).
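To summarize the GRAID table in one number, the sketch below computes how much of the local IOPs survives the trip over FCP for each pattern and takes the median. The choice of the median as the summary statistic is ours; the input values are copied from the table above.

```python
import statistics

# (local IOPs, FCP IOPs) per pattern, from the GRAID5 comparison table.
graid = {
    "4k random read": (10_800_000, 4_563_635),
    "4k random write": (975_000, 514_719),
    "4k random read/write 70/30": (3_140_000, 1_109_071),
    "64k random read": (740_000, 397_736),
    "64k random write": (135_000, 132_669),
    "1M read": (47_000, 25_150),
    "1M write": (11_100, 10_493),
}

retained = {pattern: fcp / local * 100 for pattern, (local, fcp) in graid.items()}
for pattern, pct in retained.items():
    print(f"{pattern}: {pct:.0f}% of local IOPs over FCP")

median_retained = statistics.median(retained.values())
print(f"median: {median_retained:.1f}%")
```

The write-heavy large-block patterns keep almost all of their local performance, while the read patterns that approach the FC bandwidth ceiling lose the most.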

Conclusions

Essentially, the most impressive shared storage performance was delivered by a redundant Graid storage array full of PBlaze6 6920 Series NVMe SSDs, with StarWind SAN & NAS on top, running over Fibre Channel to the client nodes through Marvell QLogic 2772 Fibre Channel adapters. Graid delivers probably the highest performance that software-defined shared storage can get as of now. Over FCP, the Graid build achieved around 50% of the local RAID array performance at approximately the same latency as local storage. The only reason the results on 64k/1M large-block reads differ is the natural technical limitation of running at or near the maximum bandwidth of a 32G Fibre Channel environment.

Locally, Graid shows outstanding results at high Numjobs and IOdepth values: it delivered the seemingly impossible 10,8M IOPs at a latency of just 0,377 ms on the 4k random read pattern. Also, since Graid offloads I/O request processing to the GPU, the CPU usage on the storage node is 2-10 times lower than with MDRAID, which leaves CPU resources free for other tasks. With MDRAID, we managed to get practically the full local RAID array performance over FCP, but at the cost of significantly higher latency.

If you want to unleash the full Graid performance potential, we would advise looking into NVMe-oF and RDMA, which will be added in subsequent StarWind SAN & NAS builds. You can find more about NVMe-oF and StarWind NVMe-oF initiator performance in one of the following articles.


About GRAID

Graid Technology™️ is headquartered in Silicon Valley, California, with an office in Ontario, California, and an R&D center in Taipei, Taiwan. Named one of the Ten Hottest Data Storage Startups of 2021 by CRN, Graid SupremeRAID is breaking world records as the first NVMe and NVMe-oF RAID card to unlock the full potential of your SSD performance: a single SupremeRAID card delivers 19 million IOPS and 110GB/s of throughput. For more information on Graid Technology, visit Graid Technology or connect with us on Twitter or LinkedIn.
