Monthly Archives: September 2011

Calculating hash while processing stream

Recently I needed to calculate the hash of an arbitrary Stream while I was processing its data. Along the way I discovered 3 additional methods to calculate a hash – all suitable when you can’t rely on seeking in the stream.

The direct approach

Stream s = GetStreamFromSomewhere();
 
byte[] hash;
using (MD5 md5 = MD5.Create()) {
  hash = md5.ComputeHash(s);
}
s.Seek(0, SeekOrigin.Begin); // rewind for normal processing; works only if s.CanSeek is true
...

This is the direct approach to calculating a hash from the data in a stream, but its drawback is that computing the hash reads the stream to the end. If the source stream is seekable (FileStream, MemoryStream…), you can just seek back and process the stream normally, but what if you can’t seek in the stream you are processing?

The block approach

You can calculate the hash on the go, using the TransformBlock and TransformFinalBlock methods of the MD5 class (or of any other HashAlgorithm; these methods come from the ICryptoTransform interface that HashAlgorithm implements).

Stream s = GetStreamFromSomewhere();
 
using (MD5 md5 = MD5.Create())
{
  byte[] data = new byte[4096];
  int byteCount = 0;
  while ((byteCount = s.Read(data, 0, data.Length)) > 0) 
  {
    md5.TransformBlock(data, 0, byteCount, null, 0); // feed the data to MD5 algorithm
 
    // do something useful with the actual read data here
  }
  md5.TransformFinalBlock(data, 0, 0); // tell the algorithm that all data is read
 
  byte[] hash = md5.Hash;
}

This allows you to process the stream and calculate the hash of its data at the same time, so you neither have to read the entire stream twice nor seek in it.

The nice approach – CryptoStream scenario #1

Stream s = GetStreamFromSomewhere();
 
using (MD5 md5 = MD5.Create())
{
  using (CryptoStream cs = new CryptoStream(s, md5, CryptoStreamMode.Read))
  {
    int byteCount;
    byte[] data = new byte[4096];
    while ((byteCount = cs.Read(data, 0, data.Length)) > 0)
    {
      // do something useful with the actual read data here
    }
    byte[] hash = md5.Hash;
  }
}

As you can see, you can wrap the source stream in a CryptoStream, which calculates the hash value as the stream is read. This results in cleaner code.

Note – do not call CryptoStream.FlushFinalBlock() when using CryptoStreamMode.Read, because the CryptoStream itself detects that the source stream has reached its end and has already finalized the hash by then.
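
The same idea also works when the "processing" is just copying the data somewhere else. Here is a minimal sketch, assuming .NET 4.0 or later (for Stream.CopyTo); the class and method names are placeholders for illustration only.

```csharp
using System.IO;
using System.Security.Cryptography;

public static class HashingCopySketch
{
    // Copies everything readable from source into destination and returns
    // the MD5 hash of the copied bytes. The source does not need to be seekable.
    // Note: disposing the CryptoStream also closes the source stream.
    public static byte[] CopyAndHash(Stream source, Stream destination)
    {
        using (MD5 md5 = MD5.Create())
        using (CryptoStream cs = new CryptoStream(source, md5, CryptoStreamMode.Read))
        {
            // CopyTo reads until the source ends; at that point the
            // CryptoStream has already finalized the hash, so no
            // FlushFinalBlock() call is needed here.
            cs.CopyTo(destination);
            return md5.Hash;
        }
    }
}
```

Calling HashingCopySketch.CopyAndHash(input, output) should give the same bytes as MD5.ComputeHash over the same data, while reading the input only once.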

The nice approach – CryptoStream scenario #2

Scenario #1 is useful when you have a single source stream, but when you want to calculate the hash of the output of your algorithm and do not have the input in the form of a stream, you can use CryptoStreamMode.Write.

Stream s = GetOutputStreamFromSomewhere();
 
using (MD5 md5 = MD5.Create())
{
  using (CryptoStream cs = new CryptoStream(s, md5, CryptoStreamMode.Write))
  {
    while (!finished)
    {
      ...
      byte[] data = getNextDataChunk();
      ...
      cs.Write(data, 0, data.Length);
    }
    cs.FlushFinalBlock();
    byte[] hash = md5.Hash;
  }
}

Note – when using CryptoStreamMode.Write, you need to call FlushFinalBlock() to tell the hash algorithm that all data has been written; only then is md5.Hash available.

WCF Streaming

In a recent project I needed to use WCF streaming to transfer files between server and client, and I decided to share what I learned so you can use this technique in your own project more easily.

I will start with some more basic stuff, so feel free to skip some sections if you are sure you already know that topic.

1. If you plan to transmit more complex data, you need to control MessageContracts

WCF streaming supports only one stream per request or response message (that means: one in the request; one in the response; or one in the request and one in the response), but that is not all. In fact, the message body should contain only the stream and nothing else. Thus you need to define message contracts so that all data other than the stream is transmitted in the message headers.
This is quite easy, even for a programmer who has only a passing familiarity with message contracts.

What this means for the host class definition:

Instead of specifying the transmitted content directly like this…

[ServiceContract(Namespace = "http://blog.monogram.sk/pokojny/example5/")]
public interface IMyAwesomeInterface {
  [OperationContract]
  bool PushFile(string filename, Stream data);
 
  [OperationContract]
  Stream PullFile(string filename);
}

… you should create classes representing the transmitted messages …

[MessageContract]
public class PushFileRequest
{
  [MessageHeader]
  public string Filename
  { get; set; }
 
  [MessageBodyMember]
  public Stream Data
  { get; set; }
}
 
[MessageContract]
public class PushFileResponse
{
  [MessageHeader]
  public bool Result
  { get; set; }
}

… and modify your service interface (and its implementation(s)) to use these messages.

[ServiceContract(Namespace = "http://blog.monogram.sk/pokojny/example5/")]
public interface IMyAwesomeInterface {
  [OperationContract]
  PushFileResponse PushFile(PushFileRequest request);
 
  [OperationContract]
  PullFileResponse PullFile(PullFileRequest request);
}
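
The PullFileRequest and PullFileResponse classes used in the interface are not shown above. One plausible shape, assuming they simply mirror the original Stream PullFile(string filename) signature, could look like this (a sketch, not the only way to lay them out):

```csharp
using System.IO;
using System.ServiceModel;

// Assumed shape: the request carries only the file name, in a header.
[MessageContract]
public class PullFileRequest
{
    [MessageHeader]
    public string Filename
    { get; set; }
}

// Assumed shape: the stream is the only body member of the response.
[MessageContract]
public class PullFileResponse
{
    [MessageBodyMember]
    public Stream Data
    { get; set; }
}
```

Any extra metadata you want to return alongside the file would go into additional [MessageHeader] properties, so that the stream stays the only member of the body.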

2. Don’t even try wsHttpBinding

wsHttpBinding does not support streaming: the whole message is buffered on the server, transferred to the client and buffered again on the client. You should use basicHttpBinding or netTcpBinding instead.
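
Keep in mind that both of these bindings use buffered transfer by default, so streaming has to be switched on through the TransferMode property. Here is a minimal sketch of doing that in code; the class name and the size limits are illustrative assumptions, and the same settings can be made in the configuration file instead.

```csharp
using System.ServiceModel;

public static class StreamingBindingSketch
{
    // basicHttpBinding with streaming enabled in both directions.
    public static BasicHttpBinding CreateStreamedHttpBinding()
    {
        return new BasicHttpBinding
        {
            TransferMode = TransferMode.Streamed,
            // Upper limit for a whole message (illustrative value: 1 GiB).
            MaxReceivedMessageSize = 1024L * 1024L * 1024L,
            // How much of a message is held in memory at once (illustrative value).
            MaxBufferSize = 64 * 1024
        };
    }

    // netTcpBinding with the same settings.
    public static NetTcpBinding CreateStreamedTcpBinding()
    {
        return new NetTcpBinding
        {
            TransferMode = TransferMode.Streamed,
            MaxReceivedMessageSize = 1024L * 1024L * 1024L,
            MaxBufferSize = 64 * 1024
        };
    }
}
```

TransferMode also offers StreamedRequest and StreamedResponse if only one direction carries a stream.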

3. Testing if the streaming really works

I had to be sure that I understood the differences between consuming and providing streamed vs. buffered requests, so I wrote some test services and clients. Returning a MemoryStream or FileStream was not enough – I felt the need to see that I had access to the response before the whole stream was on the client side. This got me frustrated, because as I found out later (by trial and error), when using basicHttpBinding, the service proxy method won’t return control until:
1. the whole message, including the whole content of the stream, is on the client side
or 2. the receive buffer fills up

I implemented a simple stream that could be read once a second, returned the current time and stopped after 20 iterations. At first this simple test suggested that something was wrong with the streaming, because the proxy method returned its result only after the whole stream had been read on the server. But after some modifications (lowering maxBufferSize, returning more data from the time stream) I found out that the message really was being streamed.
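
A test stream along those lines could be sketched as follows. The class name, the timestamp format and the constructor parameters are assumptions for illustration, not the exact code used in the test.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;

// Read-only stream that hands out one timestamp line per Read() call,
// waits before each one, and ends after a fixed number of iterations.
public class SlowTimeStream : Stream
{
    private readonly int _iterations;
    private readonly TimeSpan _delay;
    private int _served;

    public SlowTimeStream(int iterations, TimeSpan delay)
    {
        _iterations = iterations;
        _delay = delay;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (_served >= _iterations)
            return 0; // end of stream

        Thread.Sleep(_delay);
        _served++;

        byte[] line = Encoding.ASCII.GetBytes(DateTime.Now.ToString("o") + "\r\n");
        // If the caller's buffer is smaller than the line, the rest is dropped.
        int n = Math.Min(count, line.Length);
        Buffer.BlockCopy(line, 0, buffer, offset, n);
        return n;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

Returning new SlowTimeStream(20, TimeSpan.FromSeconds(1)) from a service operation and logging the time at which each chunk arrives on the client makes it fairly obvious whether the response is being streamed or buffered.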

When using netTcpBinding on the other hand, the proxy method returned response even before the receive buffer was full.

So when you are testing your application, I advise you to lower maxBufferSize and use netTcpBinding if possible, so you can get a better feel for what is actually happening.

4. How to implement processing of streamed messages

Client sending Stream – beware of disposing

This is actually straightforward; you just need to bear one thing in mind: WCF does not stop using the stream you sent at the moment control returns from the proxy class. In fact, WCF could be reading the sent stream for quite a while after you got the response from the server. And here comes the problem with disposing: you should dispose of all the FileStreams, MemoryStreams etc. that you used, but you don’t really know when. You should not use “using” blocks for this, as the stream could be disposed before it has been read completely.

I came up with a solution: wrap the actual stream in an object that closes and disposes the stream once WCF has finished reading it. This class must be derived from the Stream class and should just forward method calls to the actual source stream. But when the source stream’s Read() method returns 0, the wrapper should close and dispose the source stream.
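
A minimal sketch of such a wrapper is shown below. The name AutoDisposingStream is a placeholder, and to keep the example short it forwards only reading, deliberately reporting itself as non-seekable and non-writable rather than forwarding every member.

```csharp
using System;
using System.IO;

// Wraps a source stream and disposes it as soon as it has been read to the end.
public class AutoDisposingStream : Stream
{
    private readonly Stream _inner;
    private bool _closed;

    public AutoDisposingStream(Stream inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        _inner = inner;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (_closed) return 0;

        int read = _inner.Read(buffer, offset, count);
        // Read() returning 0 for a non-empty request means end of stream.
        if (read == 0 && count > 0) CloseInner();
        return read;
    }

    private void CloseInner()
    {
        if (_closed) return;
        _closed = true;
        _inner.Dispose();
    }

    // Also release the source if the wrapper itself gets disposed early.
    protected override void Dispose(bool disposing)
    {
        if (disposing) CloseInner();
        base.Dispose(disposing);
    }

    public override bool CanRead { get { return !_closed && _inner.CanRead; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

On the sending side you would then pass something like new AutoDisposingStream(File.OpenRead(path)) as the stream member of the message, without a “using” block around it.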

Service receiving Stream – not all Streams are meant to be closed

Beware of closing the WCF Stream the client sent you. If you close it, you also close the WCF connection (at least when using netTcpBinding), which is definitely something you should leave to WCF.

Secondly, you can take advantage of the fact that you can return a result even before you have read the whole sent stream. You can achieve this by reading the stream in another thread and returning the response normally.

Service returning Stream – disposing #2

The same disposing problem as in “Client sending Stream” also occurs in this scenario, but it is more critical here. Let’s say that you want to stream the contents of a file. You can do that just by returning an opened FileStream (as the WCF sample from MS does :-| ), but the file remains open even after it has been read, which could be a problem for a service that should run for long periods of time. (Other scenarios are possible too, and none of them is good.)

Thankfully, you can use the same trick as in the “Client sending Stream” scenario: wrap the actual Stream and close it once it has been read to the end.

Client receiving Stream – do not close that Stream

When the client receives a stream, the same problem occurs as in the “Service receiving Stream” scenario. Just remember not to close the WCF Stream, even after you have processed or copied all of its contents.

Final words

And that’s it, these are my basic hints for anyone trying to implement WCF Streaming. If you have additional questions or if you need example source code, just write so in the comments.