Calculating a hash while processing a stream

Recently I needed to calculate the hash of an arbitrary Stream while processing its data. Along the way I found three additional ways to calculate a hash, all suitable when you can't rely on seeking in the stream.

The direct approach

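using System.IO;
using System.Security.Cryptography;

Stream s = GetStreamFromSomewhere();

byte[] hash;
using (MD5 md5 = MD5.Create())
{
  hash = md5.ComputeHash(s); // reads the stream to its end
}
s.Seek(0, SeekOrigin.Begin); // rewind; only possible when s.CanSeek is true
...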

This is the direct approach to calculating the hash of the data in a stream. Its drawback is that once the hash has been calculated, the stream has been read to the end. If the source stream is seekable (FileStream, MemoryStream…), you can simply seek back and process the stream normally, but what if the stream you are processing cannot seek (a NetworkStream, for example)?
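A minimal sketch of the direct approach that at least guards the seek, assuming you want to rewind to the position where hashing started rather than to offset 0. The names DirectHashing and HashAndRewind are illustrative only. A non-seekable stream is rejected up front, before any of its data has been consumed by the hash.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class DirectHashing
{
    // Hashes s from its current position to the end, then rewinds s to that
    // position so the caller can process the same data again.
    public static byte[] HashAndRewind(Stream s)
    {
        // Fail before reading anything: without seeking there is no way back.
        if (!s.CanSeek)
            throw new NotSupportedException("Stream cannot seek back after hashing.");

        long start = s.Position; // remember where hashing starts
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(s); // reads s to its end
            s.Seek(start, SeekOrigin.Begin);  // rewind to the remembered position
            return hash;
        }
    }
}
```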

The block approach

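You can calculate the hash on the go with the TransformBlock and TransformFinalBlock methods of the MD5 class. Both are defined on the HashAlgorithm base class (which implements ICryptoTransform), so SHA1, SHA256 and the other hash classes work the same way.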

using System.IO;
using System.Security.Cryptography;

Stream s = GetStreamFromSomewhere();

using (MD5 md5 = MD5.Create())
{
  byte[] data = new byte[4096];
  int byteCount = 0;
  while ((byteCount = s.Read(data, 0, data.Length)) > 0)
  {
    md5.TransformBlock(data, 0, byteCount, null, 0); // feed the data to the MD5 algorithm

    // do something useful with the actual read data here
  }
  md5.TransformFinalBlock(data, 0, 0); // tell the algorithm that all data has been read

  byte[] hash = md5.Hash;
}

This lets you process the stream and calculate the hash of its data at the same time: the stream is read only once and never has to be seekable.
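The block approach can be packaged as a reusable helper. This is a sketch, not code from the original post: the name CopyAndHash, and copying to a second stream as the "something useful", are assumptions. Because it takes a HashAlgorithm parameter, the same loop works for MD5, SHA256 and so on.

```csharp
using System.IO;
using System.Security.Cryptography;

public static class BlockHashing
{
    // Copies input to output while feeding every chunk to the hash algorithm,
    // so the data is read exactly once and input never has to be seekable.
    public static byte[] CopyAndHash(Stream input, Stream output, HashAlgorithm algorithm)
    {
        byte[] buffer = new byte[4096];
        int byteCount;
        while ((byteCount = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            algorithm.TransformBlock(buffer, 0, byteCount, null, 0); // hash this chunk
            output.Write(buffer, 0, byteCount);                      // process this chunk
        }
        algorithm.TransformFinalBlock(buffer, 0, 0); // no more data: finalize the hash
        return algorithm.Hash;
    }
}
```

The caller owns the HashAlgorithm instance, so it decides which algorithm to use and when to dispose it.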

The nice approach – CryptoStream scenario #1

using System.IO;
using System.Security.Cryptography;

Stream s = GetStreamFromSomewhere();

using (MD5 md5 = MD5.Create())
{
  using (CryptoStream cs = new CryptoStream(s, md5, CryptoStreamMode.Read))
  {
    int byteCount;
    byte[] data = new byte[4096];
    while ((byteCount = cs.Read(data, 0, data.Length)) > 0)
    {
      // do something useful with the actual read data here
    }
    byte[] hash = md5.Hash; // already finalized: cs.Read returned 0 at the end of s
  }
}

As you can see, you can wrap the source stream in a CryptoStream, which calculates the hash value as the data is read through it. The read loop no longer has to feed the hash algorithm itself, which results in cleaner code.

Note: do not call CryptoStream.FlushFinalBlock() when using CryptoStreamMode.Read. Once the source stream reaches its end, the CryptoStream detects it and finalizes the hash by itself, and calling FlushFinalBlock() afterwards throws a NotSupportedException.
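A sketch of scenario #1 as a helper, assuming the processing step is copying the data to another stream (the class and method names are hypothetical). It reads md5.Hash straight after the loop, with no FlushFinalBlock() call, relying on the CryptoStream having finalized the hash when Read returned 0.

```csharp
using System.IO;
using System.Security.Cryptography;

public static class CryptoStreamReadHashing
{
    // Reads source through a CryptoStream in Read mode and copies the data to
    // destination. The hash transform passes the bytes through unchanged.
    // Disposing the CryptoStream also disposes source.
    public static byte[] CopyAndHash(Stream source, Stream destination)
    {
        using (MD5 md5 = MD5.Create())
        using (CryptoStream cs = new CryptoStream(source, md5, CryptoStreamMode.Read))
        {
            byte[] data = new byte[4096];
            int byteCount;
            while ((byteCount = cs.Read(data, 0, data.Length)) > 0)
            {
                destination.Write(data, 0, byteCount); // process the plain data
            }
            // End of source reached: the final block has already been hashed.
            return md5.Hash;
        }
    }
}
```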

The nice approach – CryptoStream scenario #2

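Scenario #1 is useful when you have a single source stream. When you want to calculate the hash of your algorithm's output instead, and your data does not arrive as an input stream, you can use CryptoStreamMode.Write: the CryptoStream hashes everything written to it and passes the data on to the underlying output stream.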

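using System.IO;
using System.Security.Cryptography;

Stream s = GetOutputStreamFromSomewhere();

using (MD5 md5 = MD5.Create())
{
  using (CryptoStream cs = new CryptoStream(s, md5, CryptoStreamMode.Write))
  {
    while (!finished) // "finished" is set by your own processing logic
    {
      // ...
      byte[] data = getNextDataChunk(); // the next piece of your algorithm's output
      // ...
      cs.Write(data, 0, data.Length); // hashed, then written on to s
    }
    cs.FlushFinalBlock(); // finalize the hash (required in Write mode)
    byte[] hash = md5.Hash;
  }
}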

Note: when using CryptoStreamMode.Write, you need to call FlushFinalBlock() to tell the hash algorithm that all data has been written; md5.Hash is not valid until the final block has been processed.
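A sketch of scenario #2, assuming the output of your algorithm can be represented as a sequence of byte arrays (the IEnumerable<byte[]> parameter and all names here are hypothetical). The CryptoStream hashes each chunk and passes it on to the destination stream.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class CryptoStreamWriteHashing
{
    // Writes each generated chunk to destination through a CryptoStream in
    // Write mode and returns the MD5 hash of everything written.
    public static byte[] WriteAndHash(IEnumerable<byte[]> chunks, Stream destination)
    {
        using (MD5 md5 = MD5.Create())
        using (CryptoStream cs = new CryptoStream(destination, md5, CryptoStreamMode.Write))
        {
            foreach (byte[] chunk in chunks)
            {
                cs.Write(chunk, 0, chunk.Length); // hashed, then passed to destination
            }
            cs.FlushFinalBlock(); // signal end of data so the hash is finalized
            return md5.Hash;
        }
    }
}
```

Disposing the CryptoStream also disposes the destination stream; a MemoryStream can still be read with ToArray() after that.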
