Discussion:
Rather poor (36x) performance in AES-GCM mode?
Sid Shetye
2013-01-29 03:44:32 UTC
Permalink
Hi guys,



I was seeing some very poor performance in AES-GCM mode. I tried profiling
the code but want into a wall (posted on stackoverflow at
http://stackoverflow.com/questions/14573972/code-profiling-to-improve-perfor
mance-see-cpu-cycles-inside-mscorlib-dll)



In particular, it seems that AES-GCM should be faster than AES-CBC; at least
based on literature:



"The Galois/Counter Mode of Operation (GCM)" paper by David A. McGrew, I
read "Multiplication in a binary field can use a variety of time-memory
tradeoffs. It can be implemented with no key-dependent memory, in which case
it will generally run several times slower than AES. Implementations that
are willing to sacrifice modest amounts of memory can easily realize speeds
greater than that of AES."



To test that theory, I tried the same on OpenSSL. I found AES-GCM is about
2x the speed of AES-CBC. What I did was measure (100 times=> average it)

* Time to only in invoking OpenSSL (=15ms)

* Time to OpenSSL AES-GCM encrypt a 10MB file (=32ms = > just
encrypt = 17ms)

* Time to OpenSSL AES-CBC encrypt a 10MB file (=45ms => just encrypt
= 30ms)

The code itself is rough but tiny - listed at
https://gist.github.com/4661615.



This experiment seems to corroborate the theory in that paper. However in
BouncyCastle (C#, 1.7) I'm finding the opposite. AES-GCM is about 18x slower
compared to AES-CBC mode! Altogether, the difference is about 2x fast in
OpenSSL + 18x slow in BouncyCastle => 36x slow from it's potential!



Question: Does anyone know why that is the case? Is anyone interested in
poking deeper? I think there might be a lot of room for improvement.
Unfortunately, I don't think I'm qualified to implement any such improvement
(not without substantial investment of time!)



Thanks

Sid
Peter Dettman
2013-01-29 11:35:49 UTC
Permalink
Hi Sid,
I've made an initial comment on your stackoverflow post, and perhaps it
would be best to just continue the discussion there, but a few comments
here for those interested.

It appears the 1.7 release was used, which would be expected to give
very poor results for many small encryptions with the same key. This is
almost certainly due to the precomputation that is done (the
key-dependent memory McGrew is referring to). Recent changes ensure the
precomputation is not repeated unless the key actually changed, but it
was previously done for every Init call.

The default precomputation is Tables8KGcmMultiplier, which is a fairly
good default choice. The Tables1K- or Tables64K- alternatives give
different tradeoffs, and might be considered for specific use cases. One
of the GcmBlockCipher constructors allows this to be configured.

The stackoverflow results use very small texts. It's not clear if you
made an equivalent test (to the OpenSSL one below) for BC with 10MB
file. I would be very surprised if BC is 18x slower than OpenSSL on a
single 10MB encryption (even with the 1.7 code)! That also sounds like a
case where the Tables64KGcmMultiplier might be worth the extra setup time.

Nevertheless if there are still performance issues, we'll be glad to
investigate them further.

Regards,
Pete Dettman
Post by Sid Shetye
Hi guys,
I was seeing some very poor performance in AES-GCM mode. I tried
profiling the code but want into a wall (posted on stackoverflow at
http://stackoverflow.com/questions/14573972/code-profiling-to-improve-performance-see-cpu-cycles-inside-mscorlib-dll)
In particular, it seems that AES-GCM should be faster than AES-CBC; at
"/The Galois/Counter Mode of Operation (GCM)/" paper by David A.
McGrew, I read "/Multiplication in a binary field can use a variety of
time-memory tradeoffs. It can be implemented with no key-dependent
memory, in which case it will generally run several times slower than
AES. Implementations that are willing to sacrifice modest amounts of
memory can easily realize///*/speeds greater than that of AES/*/./"
To test that theory, I tried the same on OpenSSL. I found AES-GCM is
about 2x the speed of AES-CBC. What I did was measure (100 times=>
average it)
·Time to only in invoking OpenSSL (=15ms)
·Time to OpenSSL AES-GCM encrypt a 10MB file (=32ms = > just encrypt =
17ms)
·Time to OpenSSL AES-CBC encrypt a 10MB file (=45ms => just encrypt =
30ms)
The code itself is rough but tiny -- listed at
https://gist.github.com/4661615.
This experiment seems to corroborate the theory in that paper. However
in BouncyCastle (C#, 1.7) I'm finding the opposite. AES-GCM is about
_18x slower_ compared to AES-CBC mode! Altogether, the difference is
about 2x fast in OpenSSL + 18x slow in BouncyCastle => 36x slow from
it's potential!
*Question: Does anyone know why that is the case? Is anyone interested
in poking deeper?*I think there might be a lot of room for
improvement. Unfortunately, I don't think I'm qualified to implement
any such improvement (not without substantial investment of time!)
Thanks
Sid
Peter Dettman
2013-01-30 07:38:51 UTC
Permalink
Sid,
Thanks for the new numbers. The first thing to get sorted out is
avoiding precomputation for every encryption/decryption. For the BC
implementation, if you are encrypting many small texts (with the same
key), you should keep the same GcmBlockCipher object and just call Init
for each iteration (important to change the IV). Ideally you'd pass a
null KeyParameter to indicate key re-use, though the cipher also will
detect that the key hasn't changed, so the penalty might not be too bad.
I've sent a pull request on github for my suggested changes along these
lines.

Another minor improvement is to simply increment the IV (e.g. the last 8
bytes, considered as an Int64) by one for each encryption instead of a
fully random IV each time, but I haven't tried that yet.

This will still leave us to explain the difference with OpenSSL, since
the result for 10MB/Tables64K is unlikely to improve much.

Let me add that I am not reading any complaining into your posts; this
sort of analysis is very welcome.

Regards,
Pete Dettman
Sid Shetye
2013-02-04 06:37:28 UTC
Permalink
Peter, this is great. The speedup is *tremendous* since the high-cost
initialization is not (re)run everytime. Sorry it took longer than usual
for the pull request.

Regards,
Sid
Ray Kelly
2013-02-04 14:06:04 UTC
Permalink
Does this effect any other AES Modes?
Thanks
Ray
Peter, this is great. The speedup is tremendous since the high-cost initialization is not (re)run everytime. Sorry it took longer than usual for the pull request.
Regards,
Sid
Sid Shetye
2013-02-04 17:12:59 UTC
Permalink
Short version: Not really.

Long version: BounchBench compared symmetric ciphers in different byte
sizes in a loop. To simulate dB style usage, it defaults to "small size,
many writes" (eg: 200 writes, 10 bytes each) and each write is encrypted
individually. To facilitate comparing different ciphers, everything is
against a standardized against a higher order interface (than IBlockCipher or
IAeadBlockCipher) that has stuff like byte[] Encrypt(...) and byte[]
Decrypt(...) meaning there is a small wrapper layer around them all. All
ciphers are instantiated inside the test loop and for AES-GCM this meant
incurring a heavy initialization penalty inside the loop, so what Peter did
is
1) moved some portion of the AES-GCM init code into the class' constructor
( this portion: GcmBlockCipher cipher = new GcmBlockCipher(new
AesFastEngine(), new Tables8kGcmMultiplier()))
2) Tweaked the test by instantiating the cipher objects outside the test
loop (i.e. the new part) and keeping only the cipher object initialization
(at Init(Key)) inside the loop with another check *inside *Init to skip key
init if it's the same.

#2 affects all benchmarks and after the above AES-GCM is far more
competitive as,say, AES-CBC. Yeah, you could argue we're cheating the
benchmark because the other ciphers can run full steam without that sort of
"tuning" or demanding object reuse. At the same time it's a good coding
technique anyway, so I pulled that change into BouncyBench. If I had more
time, I would have opened profiled it better and seen if deeper/core
optimizations are possible and compared to OpenSSL's AES-GCM code. Sadly, I
don't have that time with real world commitments ...

Regards,
Sid

Loading...