MIME type sniffing in C#

If you create a webpage where your users can upload files to your website, you probably want to filter what types of files can be uploaded. If you want to resize the user’s avatar, then the uploaded file must be an image file, and if you want to extract data from an uploaded XLSX file, then it must really be an Excel file.

The classic solution to this problem is to examine the extension of the uploaded file. Please always keep in mind that the file name, which contains the file extension, is posted to the webserver in the HTTP request, so you can trust it only as much as you trust the rest of the request. Not at all.

A better solution is to read the raw contents of the file and look for the bytes that characterize the given file type. These bytes are called file type signatures, and you can find them on Gary Kessler’s webpage, along with additional links to articles, databases and tools.

For us the question is how to implement this in C#. A natural solution is to open the file, seek to the given offset, read the required number of bytes and compare them with the expected signature. This works perfectly as long as the vendor doesn’t change the file format.
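The seek, read and compare steps described above can be sketched roughly as follows. This is only an illustration, not a complete detector: the class name SignatureChecker, its methods and the choice of four signatures are assumptions made for this example, and the stream is assumed to be seekable. The byte values themselves are the commonly documented signatures for PNG, JPEG, PDF and ZIP, all at offset 0.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for this example; not part of any library.
public static class SignatureChecker
{
    // A few well-known signatures, all located at offset 0.
    private static readonly Dictionary<string, byte[]> Signatures =
        new Dictionary<string, byte[]>
        {
            { "image/png",       new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A } },
            { "image/jpeg",      new byte[] { 0xFF, 0xD8, 0xFF } },
            { "application/pdf", new byte[] { 0x25, 0x50, 0x44, 0x46 } }, // "%PDF"
            { "application/zip", new byte[] { 0x50, 0x4B, 0x03, 0x04 } }  // "PK" 03 04
        };

    // Returns true if the bytes at the given offset match the signature.
    // Assumes a seekable stream (a file or a MemoryStream).
    public static bool HasSignature( Stream stream, long offset, byte[] signature )
    {
        if( stream.Length < offset + signature.Length )
        {
            return false; // the file is too short to contain the signature
        }

        stream.Seek( offset, SeekOrigin.Begin );

        byte[] buffer = new byte[signature.Length];
        int total = 0;
        while( total < buffer.Length )
        {
            int read = stream.Read( buffer, total, buffer.Length - total );
            if( read == 0 )
            {
                return false; // unexpected end of stream
            }
            total += read;
        }

        for( int i = 0; i < signature.Length; i++ )
        {
            if( buffer[i] != signature[i] )
            {
                return false;
            }
        }
        return true;
    }

    // Returns the first matching MIME type from the table, or null if none match.
    public static string Detect( Stream stream )
    {
        foreach( KeyValuePair<string, byte[]> pair in Signatures )
        {
            if( HasSignature( stream, 0, pair.Value ) )
            {
                return pair.Key;
            }
        }
        return null;
    }
}
```

A table like this is easy to extend, but it is also exactly the part you would have to maintain yourself whenever a format changes.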

A somewhat more “official” solution is to use the FindMimeFromData Win32 API function, which serves as the basis of the MIME type sniffing feature of Internet Explorer. Using P/Invoke, you can call this function from .NET, as you can see in the following example:

using System;
using System.Runtime.InteropServices;

/// <summary>
/// Helper class to detect the MIME type based on the file header signature.
/// </summary>
public static class MimeSniffer
{
    /// <summary>
    /// Internet Explorer 9. Returns image/png and image/jpeg instead of image/x-png and image/pjpeg.
    /// </summary>
    private const uint FMFD_RETURNUPDATEDIMGMIMES = 0x20;

    /// <summary>
    /// The zero (0) value for Reserved parameters.
    /// </summary>
    private const uint RESERVED = 0;

    /// <summary>
    /// The value that is returned when the MIME type cannot be recognized.
    /// </summary>
    private const string UNKNOWN = "unknown/unknown";

    /// <summary>
    /// The return value which indicates that the operation completed successfully.
    /// </summary>
    private const uint S_OK = 0;

    /// <summary>
    /// Determines the MIME type from the data provided.
    /// </summary>
    /// <param name="pBC">A pointer to the IBindCtx interface. Can be set to NULL.</param>
    /// <param name="pwzUrl">A pointer to a string value that contains the URL of the data. Can be set to NULL if <paramref name="pBuffer"/> contains the data to be sniffed.</param>
    /// <param name="pBuffer">A pointer to the buffer that contains the data to be sniffed. Can be set to NULL if <paramref name="pwzUrl"/> contains a valid URL.</param>
    /// <param name="cbSize">An unsigned long integer value that contains the size of the buffer.</param>
    /// <param name="pwzMimeProposed">A pointer to a string value that contains the proposed MIME type. This value is authoritative if type cannot be determined from the data. If the proposed type contains a semi-colon (;) it is removed. This parameter can be set to NULL.</param>
    /// <param name="dwMimeFlags">The flags which modify the behavior of the function.</param>
    /// <param name="ppwzMimeOut">The address of a string value that receives the suggested MIME type.</param>
    /// <param name="dwReserved">Reserved. Must be set to 0.</param>
    /// <returns>S_OK, E_FAIL, E_INVALIDARG or E_OUTOFMEMORY.</returns>
    /// <remarks>
    /// Read more: http://msdn.microsoft.com/en-us/library/ms775107(v=vs.85).aspx
    /// The native function takes wide (UTF-16) strings, hence LPWStr. Pointer-sized
    /// values are declared as IntPtr so the signature is also valid in 64-bit processes.
    /// </remarks>
    [DllImport( @"urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true )]
    private extern static uint FindMimeFromData(
        IntPtr pBC,
        [MarshalAs( UnmanagedType.LPWStr )] string pwzUrl,
        [MarshalAs( UnmanagedType.LPArray )] byte[] pBuffer,
        uint cbSize,
        [MarshalAs( UnmanagedType.LPWStr )] string pwzMimeProposed,
        uint dwMimeFlags,
        out IntPtr ppwzMimeOut,
        uint dwReserved );

    /// <summary>
    /// Returns the MIME type for the specified file header.
    /// </summary>
    /// <param name="header">The header to examine.</param>
    /// <returns>The MIME type or "unknown/unknown" if the type cannot be recognized.</returns>
    /// <remarks>
    /// NOTE: This method recognizes only 26 types used by IE.
    /// http://msdn.microsoft.com/en-us/library/ms775147(VS.85).aspx#Known_MimeTypes
    /// </remarks>
    public static string GetMime( byte[] header )
    {
        try
        {
            IntPtr mimeTypePtr;
            uint result = FindMimeFromData( IntPtr.Zero,
                                            null,
                                            header,
                                            (uint) header.Length,
                                            null,
                                            FMFD_RETURNUPDATEDIMGMIMES,
                                            out mimeTypePtr,
                                            RESERVED );
            if( result != S_OK )
            {
                return UNKNOWN;
            }

            // The returned string is allocated by the native side; copy it, then free it.
            string mime = Marshal.PtrToStringUni( mimeTypePtr );
            Marshal.FreeCoTaskMem( mimeTypePtr );
            return mime;
        }
        catch
        {
            return UNKNOWN;
        }
    }
}
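GetMime expects only the beginning of the file, so the caller has to read that header first. A minimal sketch of such a helper follows. The name HeaderReader and the default of 256 bytes are assumptions made for this example; as far as I know, the sniffing logic looks only at the first few hundred bytes, but check the documentation before relying on that number.

```csharp
using System;
using System.IO;

// Hypothetical helper for this example; not part of the MimeSniffer class above.
public static class HeaderReader
{
    // Reads at most maxBytes from the current position of the stream and returns
    // exactly the bytes that were read (fewer if the stream is shorter).
    public static byte[] ReadHeader( Stream stream, int maxBytes = 256 )
    {
        byte[] buffer = new byte[maxBytes];
        int total = 0;

        // Stream.Read may return fewer bytes than requested, so loop until
        // the buffer is full or the end of the stream is reached.
        while( total < maxBytes )
        {
            int read = stream.Read( buffer, total, maxBytes - total );
            if( read == 0 )
            {
                break;
            }
            total += read;
        }

        Array.Resize( ref buffer, total );
        return buffer;
    }
}

// Possible usage on Windows, together with the MimeSniffer class:
//   using( FileStream file = File.OpenRead( path ) )
//   {
//       string mime = MimeSniffer.GetMime( HeaderReader.ReadHeader( file ) );
//   }
```

Reading only the header keeps memory use small even when users upload large files.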

This function is reliable; however, it recognizes only the 26 most frequent file types. That’s not a small number, but a more painful problem is that it reports Office Open XML files such as XLSX as ZIP compressed files (which is technically true, by the way).
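Because these Office files really are ZIP packages, one way to tell them apart from ordinary archives is to look at the entry names inside. The sketch below is an assumption-laden illustration, not a validator: the class name OfficeZipInspector and the returned labels are made up for this example, it relies on the conventional package layout ([Content_Types].xml at the root and a word/, xl/ or ppt/ folder), and it needs the System.IO.Compression assembly (.NET 4.5 or later).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper for this example; not part of any library.
public static class OfficeZipInspector
{
    // Returns "docx", "xlsx" or "pptx" for the matching Office family,
    // "zip" for any other ZIP archive, and null if the data is not a ZIP at all.
    public static string Classify( Stream stream )
    {
        try
        {
            using( ZipArchive archive = new ZipArchive( stream, ZipArchiveMode.Read, true ) )
            {
                List<string> names = archive.Entries.Select( e => e.FullName ).ToList();

                // Office Open XML packages carry this part at the root.
                if( !names.Contains( "[Content_Types].xml" ) )
                {
                    return "zip";
                }

                // Conventional folder names of the main document parts.
                if( names.Any( n => n.StartsWith( "word/", StringComparison.OrdinalIgnoreCase ) ) )
                {
                    return "docx";
                }
                if( names.Any( n => n.StartsWith( "xl/", StringComparison.OrdinalIgnoreCase ) ) )
                {
                    return "xlsx";
                }
                if( names.Any( n => n.StartsWith( "ppt/", StringComparison.OrdinalIgnoreCase ) ) )
                {
                    return "pptx";
                }
                return "zip";
            }
        }
        catch( InvalidDataException )
        {
            return null; // not a readable ZIP archive
        }
    }
}
```

Note that this only checks names: a crafted archive can contain these entries without being a valid document, so treat the result as a hint and let the real parser have the final word.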

With some googling you can find other projects on the net. For example, FileTypeDetective on CodePlex recognizes fewer file types, but it detects the Office file formats specifically, and you can see how it works in the source code.

Whichever solution you choose, don’t forget that you have just introduced an external dependency into your project, and you can’t know how future-proof that solution is.

