Saturday 28 March 2020

Understanding Disk Performance and IOPS in Cloud Platforms

Over the past several years, apart from working as a DBA, I have worked with multiple cloud technologies and been actively involved in migrations to cloud infrastructure. Along the way, I got certified as an AWS Solutions Architect (Associate).

While working with database operations in the cloud, the most challenging part was understanding storage/disk performance. In this post, we will discuss IOPS and disk performance on the AWS and Azure platforms.

Before jumping into that, let us understand the size of an IO. It depends on two factors: the allocation unit size of the disk and the application requesting the IO operation. If the application's IO request is larger than the allocation unit size (block size), the OS splits the request into smaller IO operations. For example, if an application requests to open a 10 MB file residing on a disk with an allocation unit size of 64 KB, the request is divided into 160 requests of 64 KB each (10 MB = 10240 KB / 64 KB = 160 IO operations). If the allocation unit size of the disk is 8 KB, the same request requires 1280 IO operations. To identify the allocation unit size of an existing disk, run the command below from the command prompt. In the result, look for Bytes Per Cluster, which is the allocation unit size of the disk. This is the value you set while formatting the drive.

C:\>fsutil fsinfo ntfsinfo D:  (Where D stands for the drive letter of your drive)
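The splitting arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation; the function name `split_io_count` is made up for this example and does not correspond to any OS or cloud API.

```python
import math

def split_io_count(request_kb: float, allocation_unit_kb: float) -> int:
    """Number of IO operations after the OS splits one request
    into pieces no larger than the allocation unit size."""
    return math.ceil(request_kb / allocation_unit_kb)

file_size_kb = 10 * 1024  # the 10 MB file from the example

print(split_io_count(file_size_kb, 64))  # 160
print(split_io_count(file_size_kb, 8))   # 1280
```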

Now let us understand the concepts of IOPS and throughput. Let us assume we have a disk that supports 3000 IOPS with an allocation unit size of 64 KB. The maximum throughput of the disk is:

Throughput of the disk = 3000 IOPS x 64 KB (size of the IO) = 192000 KB = 187.5 MB

As IOPS means IO operations per second, the throughput is 187.5 MB/sec. Whether the disk can reach this maximum throughput depends on the application requesting the IO operation. If the application's requests are smaller than the allocation unit size, we can't make use of the maximum throughput of the disk. For example, if the application requests IO operations with a size of 8 KB, the maximum throughput will be:

Throughput of the disk = 3000 IOPS x 8 KB (size of the IO operation) = 24000 KB = 23.44 MB/sec
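As a quick check of the two throughput figures, here is a minimal Python sketch of the same multiplication for the 64 KB and 8 KB cases. The function name `max_throughput_mb_per_sec` is hypothetical and used only for illustration.

```python
def max_throughput_mb_per_sec(iops: int, io_size_kb: float) -> float:
    """Throughput = IOPS x IO size, converted from KB/sec to MB/sec."""
    return iops * io_size_kb / 1024

# 3000 IOPS with 64 KB IOs
print(max_throughput_mb_per_sec(3000, 64))  # 187.5

# 3000 IOPS with 8 KB IOs (about 23.44 MB/sec)
print(max_throughput_mb_per_sec(3000, 8))   # 23.4375
```

The IOPS limit stays the same in both cases; only the IO size changes, and the throughput drops with it.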

Let us keep these three concepts (IO size, IOPS, and throughput) in mind while discussing the performance of disks in the cloud environment. The performance of a disk is controlled by IOPS and throughput limits. These limits exist at both the disk level and the instance (VM) level.

Before going into more detail, let us look into the different types of disk available on both platforms.
AWS provides two major types of disks:
  • General Purpose SSD (gp2)
  • Provisioned IOPS SSD (io1)
Apart from those, AWS provides a couple of other disk types that are not relevant for our discussion. A gp2 disk comes with predefined IOPS based on the size of the disk: Amazon offers 3 IOPS per GB. A single volume can have a maximum of 16000 IOPS (16 KiB IO) and 250 MiB/s throughput, depending on the disk size. To make it clear, both a 5.333 TB volume (5333 GB x 3 IOPS = ~16000) and a 10 TB volume (10240 GB x 3 IOPS = 30720, capped at the per-volume maximum of 16000) provide 16000 IOPS and a maximum throughput of 250 MiB/s. If the maximum throughput is 250 MiB/s with 16000 IOPS, what is the IO size?

IO size = 250 MiB (= 256000 KiB) / 16000 IOPS = 16 KiB
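The gp2 numbers above can be reproduced with a short Python sketch. It uses only the figures quoted in this post (3 IOPS per GB, a cap of 16000 IOPS, 250 MiB/s) and is a simplification: it ignores other gp2 details such as burst behaviour. The constant and function names are made up for this example.

```python
GP2_IOPS_PER_GB = 3      # baseline IOPS per GB, as quoted in the post
GP2_MAX_IOPS = 16000     # per-volume IOPS cap, as quoted in the post
GP2_MAX_THROUGHPUT_MIB = 250

def gp2_iops(size_gb: int) -> int:
    """Baseline IOPS of a gp2 volume, capped at the per-volume maximum."""
    return min(size_gb * GP2_IOPS_PER_GB, GP2_MAX_IOPS)

def implied_io_size_kib(throughput_mib: float, iops: int) -> float:
    """IO size implied by a throughput limit and an IOPS limit."""
    return throughput_mib * 1024 / iops

print(gp2_iops(5333))    # 15999 (roughly 16000)
print(gp2_iops(10240))   # 16000 (30720 capped at the maximum)
print(implied_io_size_kib(GP2_MAX_THROUGHPUT_MIB, GP2_MAX_IOPS))  # 16.0
```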

That tells us that one IO operation can read or write a maximum of 16 KB. When reading the IOPS figure of a disk, it is very important to also look at the IO size the disk supports. Let us take the same example of opening a 10 MB file that resides on a disk with an allocation unit size of 64 KB. Because the disk handles at most 16 KB per IO, this requires 640 IO operations (10240 KB / 16 KB). If the same file resides on a disk with an allocation unit size of 8 KB, it requires 1280 IO operations (10240 KB / 8 KB). So the number of IO operations required to complete one IO request is:

Number of IO operations = Size of the IO request (in KB) / MIN(size of the IO request, size of the IO supported by the disk, allocation unit size of the disk)

In this formula:
size of the IO request: decided by the application design
size of the IO supported by the disk: decided by the manufacturer (provider) of the disk
allocation unit size of the disk: decided by the user who configures the disk in the VM

All three play a vital role in the performance of the disk.
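The formula can be written as a small Python function to try different combinations. This is a sketch of the calculation only; the name `io_operations_needed` and its parameters are hypothetical, and the 16 KB disk IO size is the gp2 value derived earlier in this post.

```python
import math

def io_operations_needed(request_kb: float,
                         disk_io_kb: float,
                         allocation_unit_kb: float) -> int:
    """IO operations needed for one request: the request size divided by
    the smallest of the three sizes that limit a single IO."""
    effective_io_kb = min(request_kb, disk_io_kb, allocation_unit_kb)
    return math.ceil(request_kb / effective_io_kb)

file_size_kb = 10 * 1024  # 10 MB file

# 64 KB allocation unit, disk supports 16 KB per IO
print(io_operations_needed(file_size_kb, 16, 64))  # 640

# 8 KB allocation unit, disk supports 16 KB per IO
print(io_operations_needed(file_size_kb, 16, 8))   # 1280
```

Whichever of the three sizes is smallest decides how many IO operations a request consumes from the IOPS budget.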

With a Provisioned IOPS (io1) disk, the user can define the IOPS of the disk. The user pays an extra amount for the IOPS of an io1 disk, whereas for gp2 disks this is included in the cost of the disk. An io1 disk supports a maximum of 64000 IOPS (16 KiB IO) and 1000 MiB/s throughput. The maximum throughput is nothing but 64000 x 16 KiB = 1024000 KiB = 1000 MiB.

Apart from this, IOPS and throughput throttling at the instance (EC2) level also comes into the picture and affects the performance of the disk. For example, i3.xlarge supports a maximum throughput of 106.25 MiB/s and 6000 IOPS. If you attach a 5 TB gp2 disk to this instance, you will not be able to use the maximum IOPS and throughput of the disk, because throttling happens at the instance level. The disk can support 5 TB = 5120 GB x 3 IOPS = 15360 IOPS and a maximum throughput of 15360 x 16 KB = 240 MiB/s, but we will not get the full benefit of this due to throttling at the instance level. To get the maximum benefit of the disk's performance, we need to attach it to an i3.4xlarge, which supports a maximum throughput of 437.5 MiB/s and 16000 IOPS.
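A simple way to think about this is that the effective limit is the lower of the disk limit and the instance limit. The Python sketch below applies that idea to the figures quoted in this post; the function name `effective_limits` is made up for illustration, and the instance numbers are the ones given above rather than values fetched from AWS.

```python
def effective_limits(disk_iops: float, disk_mib_s: float,
                     instance_iops: float, instance_mib_s: float):
    """Effective (IOPS, MiB/s): the lower of the disk and instance limits."""
    return min(disk_iops, instance_iops), min(disk_mib_s, instance_mib_s)

# 5 TB gp2 disk: 5120 GB x 3 IOPS, with 16 KiB per IO
disk_iops = 5120 * 3                 # 15360
disk_mib_s = disk_iops * 16 / 1024   # 240.0

# Instance limits as quoted in the post
print(effective_limits(disk_iops, disk_mib_s, 6000, 106.25))   # (6000, 106.25)  i3.xlarge
print(effective_limits(disk_iops, disk_mib_s, 16000, 437.5))   # (15360, 240.0)  i3.4xlarge
```

On the i3.xlarge the instance is the bottleneck; on the i3.4xlarge the disk's own limits apply.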

I have tried my best to explain the elements that affect the performance of disks on cloud platforms. Keep these points in mind while selecting EC2 instances and disks for your workload. We will discuss the Azure side in another blog post.