ARM-backed Servers - Better Performance for Less Money

Arseny Yankovski

Lead Architect @ eMarketeer

This year Apple changed the game for desktop CPUs by announcing Apple Silicon. A similar shift happened a year ago in cloud computing: AWS released a new instance type backed by its custom-built ARM processor, AWS Graviton2. These instances are supposed to offer up to 40% better price-performance than their x86 counterparts. Another major recent update is the introduction of Graviton2-based Amazon RDS instances. Let's run a couple of benchmarks and load-test a real-world backend application to see how good ARM servers are and how easy they are to use.

Performance

I compared a t4g.small (ARM) EC2 instance to a t3.small (x86) EC2 instance. At the time of writing, the on-demand hourly price in the us-east-1 region is $0.0208 for t3.small (x86) and $0.0168 for t4g.small (ARM), so the ARM-backed instance is already about 19% cheaper.

First, I ran a load test with wrk against a fresh recap.dev setup.

The setup is a docker-compose template running 4 processes. A handler process puts every request into a RabbitMQ queue, and a separate background process inserts traces into a PostgreSQL database in batches of 1000.
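The batching step described above can be illustrated with a small in-memory sketch. This is not recap.dev's actual code: the standard-library `queue.Queue` stands in for RabbitMQ, the `flush` callback stands in for a PostgreSQL bulk insert, and the name `drain_in_batches` is hypothetical. Only the batch size of 1000 comes from the text.

```python
import queue
from typing import Any, Callable, List

BATCH_SIZE = 1000  # batch size mentioned in the text


def drain_in_batches(
    q: "queue.Queue[Any]",
    flush: Callable[[List[Any]], None],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Empty the queue, passing items to `flush` in lists of at most
    `batch_size`. Returns the number of batches flushed."""
    batch: List[Any] = []
    batches = 0
    while True:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
        if len(batch) >= batch_size:
            flush(batch)
            batches += 1
            batch = []  # start a fresh list; the flushed one is left untouched
    if batch:  # flush the final, partially filled batch
        flush(batch)
        batches += 1
    return batches


# Hypothetical usage: 2500 queued traces become batches of 1000, 1000 and 500.
traces: "queue.Queue[dict]" = queue.Queue()
for i in range(2500):
    traces.put({"id": i})

stored: List[dict] = []
print(drain_in_batches(traces, stored.extend), len(stored))  # prints: 3 2500
```

Batching like this trades a little latency for far fewer database round trips, which is presumably why the handler only enqueues and leaves the inserts to a background process.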

A typical recap.dev setup consists of 4 processes

I ran wrk on a t3.2xlarge instance in the same region using the following command:

wrk -t24 -c1000 -d300s -s ./post.lua <hostname>

It bombarded the target instance with trace requests for 5 minutes using 24 threads and 1000 HTTP connections.

This is the result I got for the t4g.small (ARM) instance:

24 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 473.53ms 53.06ms 1.96s 81.33%
Req/Sec 115.83 96.65 494.00 71.32%
620751 requests in 5.00m, 85.84MB read
Socket errors: connect 0, read 0, write 0, timeout 225
Requests/sec: 2068.48
Transfer/sec: 292.90KB

For the t3.small (x86) instance:

24 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 600.28ms 70.23ms 2.00s 72.53%
Req/Sec 92.77 82.25 404.00 70.26%
488218 requests in 5.00m, 67.51MB read
Socket errors: connect 0, read 0, write 0, timeout 348
Requests/sec: 1626.87
Transfer/sec: 230.37KB

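The ARM-backed instance served 27% more requests per second, and its average latency was about 21% lower (473.53 ms versus 600.28 ms). It also hit fewer socket timeouts (225 versus 348).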
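The comparison can be recomputed from the two wrk summaries above. The sketch below uses only the numbers printed by wrk; the helper name `pct_change` is made up for illustration. How much "faster" one side is depends on the baseline, so the latency gap is shown both ways.

```python
# Figures copied from the wrk summaries above.
arm = {"rps": 2068.48, "latency_ms": 473.53, "timeouts": 225}
x86 = {"rps": 1626.87, "latency_ms": 600.28, "timeouts": 348}


def pct_change(new: float, old: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# Throughput of ARM relative to x86.
throughput_gain = pct_change(arm["rps"], x86["rps"])
# Average latency of ARM relative to x86 (negative means lower latency).
latency_vs_x86 = pct_change(arm["latency_ms"], x86["latency_ms"])
# The same gap seen from the other side: x86 latency relative to ARM.
latency_vs_arm = pct_change(x86["latency_ms"], arm["latency_ms"])

print(f"ARM requests/sec vs x86: {throughput_gain:+.1f}%")   # +27.1%
print(f"ARM avg latency vs x86:  {latency_vs_x86:+.1f}%")    # -21.1%
print(f"x86 avg latency vs ARM:  {latency_vs_arm:+.1f}%")    # +26.8%
```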

ARM-backed instance served 27% more requests per second

Then I ran a couple of benchmarks from the Phoronix Test Suite.

pts/compress-7zip-1.7.1 gave 6833 MIPS on t4g.small (ARM) versus 5029 MIPS on t3.small (x86), a roughly 36% higher result on the ARM processor.

The ARM-backed instance got a roughly 36% better result in the pts/compress-7zip benchmark

The ARM-backed server finished the pts/c-ray benchmark more than 2 times faster on average: 458 seconds versus 958 seconds for x86.

The ARM-backed instance was more than 2 times faster in the pts/c-ray benchmark

I also ran a bunch of RAM speed tests from pts/ramspeed that measure memory throughput on different operations.

Benchmark Type            t4g.small (ARM)    t3.small (x86)
Add/Integer               50000 MB/s         13008 MB/s
Copy/Integer              58650 MB/s         11772 MB/s
Scale/Integer             31753 MB/s         11989 MB/s
Triad/Integer             36869 MB/s         12818 MB/s
Average/Integer           44280 MB/s         12314 MB/s
Add/Floating Point        49775 MB/s         12750 MB/s
Copy/Floating Point       58749 MB/s         11694 MB/s
Scale/Floating Point      58721 MB/s         11765 MB/s
Triad/Floating Point      49667 MB/s         12809 MB/s
Average/Floating Point    54716 MB/s         12260 MB/s

Memory throughput on the Graviton2 instance was roughly 2.6 to 5 times higher than on its x86 counterpart

In short, memory throughput on the t4g.small with its Graviton2 processor was roughly 2.6 to 5 times higher, depending on the operation.
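The per-operation speedups behind that summary can be worked out from the pts/ramspeed table above. This sketch only divides the reported figures; the two Average rows are left out so that each operation counts once, and the variable names are illustrative.

```python
# (ARM MB/s, x86 MB/s) per operation, copied from the pts/ramspeed table.
ramspeed = {
    "Add/Integer": (50000, 13008),
    "Copy/Integer": (58650, 11772),
    "Scale/Integer": (31753, 11989),
    "Triad/Integer": (36869, 12818),
    "Add/Floating Point": (49775, 12750),
    "Copy/Floating Point": (58749, 11694),
    "Scale/Floating Point": (58721, 11765),
    "Triad/Floating Point": (49667, 12809),
}

# Speedup of ARM over x86 for each operation.
speedups = {name: arm / x86 for name, (arm, x86) in ramspeed.items()}

for name, ratio in speedups.items():
    print(f"{name:<22} {ratio:.2f}x")

slowest = min(speedups, key=speedups.get)
fastest = max(speedups, key=speedups.get)
print(f"smallest gap: {slowest} ({speedups[slowest]:.2f}x)")  # Scale/Integer (2.65x)
print(f"largest gap:  {fastest} ({speedups[fastest]:.2f}x)")  # Copy/Floating Point (5.02x)
```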

Looking only at performance and instance price, switching to ARM-based instances is a no-brainer: you get more power for less money.
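To put "more power for less money" into one number, the load-test throughput can be combined with the hourly prices quoted earlier. This is a back-of-the-envelope sketch for this one workload only: it assumes the instance sustains the measured requests per second for a full hour, and the helper name `requests_per_dollar` is hypothetical.

```python
# On-demand hourly prices (us-east-1) and wrk throughput quoted in this post.
ARM_PRICE, X86_PRICE = 0.0168, 0.0208  # USD per hour
ARM_RPS, X86_RPS = 2068.48, 1626.87    # requests per second

SECONDS_PER_HOUR = 3600


def requests_per_dollar(rps: float, hourly_price: float) -> float:
    """Requests served per dollar, assuming `rps` is sustained for an hour."""
    return rps * SECONDS_PER_HOUR / hourly_price


price_saving = (X86_PRICE - ARM_PRICE) / X86_PRICE * 100.0
ratio = requests_per_dollar(ARM_RPS, ARM_PRICE) / requests_per_dollar(X86_RPS, X86_PRICE)

print(f"ARM instance is {price_saving:.1f}% cheaper per hour")    # 19.2%
print(f"ARM serves {ratio:.2f}x the requests per dollar")         # 1.57x
```

For this particular load test, that works out to about 57% more requests per dollar, which is in the same ballpark as the "up to 40%" price-performance figure quoted for Graviton2.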

Compatibility

The big question when switching processor architectures is compatibility.

I found that a lot of software was already built for ARM processors. Most importantly, Docker was available as .rpm and .deb packages, and so were most of the images (yes, images need to be built separately for each architecture). Docker-compose, however, wasn't, which was a huge bummer for me. I had to jump through some hoops and build several dependencies from source to make it work. The situation will hopefully improve as ARM adoption on servers grows, but right now you might pay more in working hours than you save by migrating.
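Because images and packages are built per architecture, a provisioning script may need to know which one it is running on. The sketch below is one way to do that with Python's standard `platform.machine()` and to map the result to the `linux/arm64` and `linux/amd64` platform strings that Docker uses. The function name `docker_platform` and the alias table are illustrative assumptions, not part of any tool mentioned in this post.

```python
import platform

# Common values reported by platform.machine(), mapped to Docker platform strings.
# The table is illustrative and deliberately covers only 64-bit ARM and x86.
_ALIASES = {
    "aarch64": "linux/arm64",  # 64-bit ARM on Linux, e.g. Graviton2
    "arm64": "linux/arm64",    # 64-bit ARM as reported on macOS
    "x86_64": "linux/amd64",   # 64-bit x86 on Linux and macOS
    "amd64": "linux/amd64",    # 64-bit x86 as reported on Windows ("AMD64")
}


def docker_platform(machine: str) -> str:
    """Translate a machine name into a Docker platform string.

    Raises ValueError for architectures not in the table, so an unexpected
    host fails loudly instead of silently pulling the wrong image.
    """
    try:
        return _ALIASES[machine.lower()]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine!r}") from None


# Hypothetical usage: report the host architecture, falling back gracefully.
host = platform.machine()
try:
    print(f"{host} -> {docker_platform(host)}")
except ValueError as err:
    print(err)
```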

Amazon RDS (AWS's managed relational database service) on Graviton2 is where I think the real win-win is. You don't have to do any setup, and you still get all the benefits of an ARM processor on your database server.

We also made sure recap.dev is easy to run on ARM processors: we introduced multi-arch Docker images and made pre-built ARM AMIs available on AWS.