I just found out that a file created by Flexcel is now sending the standard HEX signature, so I can recognize it as an XLSX document.
This is the standard HEX signature to recognize a Microsoft Office 2007 document:
' If this is a Microsoft Office 2007 document
' (bytes 1-4: PKZip signature; bytes 5-14: version, flags, compression method, time, date)
If Mid(lcString, 1, 14) = Chr(80) + Chr(75) + Chr(3) + Chr(4) + Chr(20) + Chr(0) + Chr(6) + Chr(0) + _
    Chr(8) + Chr(0) + Chr(0) + Chr(0) + Chr(33) + Chr(0) Then
Can you explain?
...I meant "NOT" sending the standard HEX signature
If you look at this page, you will see that the bytes from the 7th onward are not what they should be:
http://www.garykessler.net/library/file_sigs.html
For the 7th byte, you put 0 instead of 6.
For the 8th byte, you put 8 instead of 0.
Etc.
Is this a new protocol?
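For reference, the 14 bytes compared above line up with the fixed fields at the start of a ZIP local file header: a 4-byte signature followed by five little-endian 16-bit values. The following is a minimal VB.NET sketch that labels those fields so that two headers can be compared side by side. The module and function names, and the FoxPro-style parameter prefixes, are hypothetical and not part of the original class.

```vb
Module ZipHeaderFields
    ' Reads a little-endian 16-bit value starting at the given zero-based offset
    Private Function ReadWord(ByVal taBytes As Byte(), ByVal tnOffset As Integer) As Integer
        Return CInt(taBytes(tnOffset)) Or (CInt(taBytes(tnOffset + 1)) << 8)
    End Function

    ' Hypothetical helper: labels the fields held in the first 14 bytes of a file
    Function DescribeHeader(ByVal taBytes As Byte()) As String
        If taBytes Is Nothing OrElse taBytes.Length < 14 Then Return "Too short"

        ' Bytes 1-4 must be "P", "K", 3, 4 for a ZIP local file header
        If Not (taBytes(0) = 80 AndAlso taBytes(1) = 75 AndAlso _
                taBytes(2) = 3 AndAlso taBytes(3) = 4) Then
            Return "Not a ZIP local file header"
        End If

        ' Bytes 5-14: version needed, general purpose flags, compression method,
        ' last modification time, last modification date
        Return String.Format("version={0} flags={1} method={2} time={3} date={4}", _
            ReadWord(taBytes, 4), ReadWord(taBytes, 6), ReadWord(taBytes, 8), _
            ReadWord(taBytes, 10), ReadWord(taBytes, 12))
    End Function
End Module
```

Called with the 14 bytes from the check above (80, 75, 3, 4, 20, 0, 6, 0, 8, 0, 0, 0, 33, 0), this returns "version=20 flags=6 method=8 time=0 date=33". A header whose 7th byte is 0 and 8th byte is 8 would read as flags=2048 instead, which is bit 11 of the flags field (the ZIP specification uses that bit to mark UTF-8 entry names). If that is what is happening here, it would be an ordinary ZIP header written with different options rather than a new format.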
Hi,
Yes, this is exactly what my class does. I mentioned that header for Microsoft Office 2007. In reality, I should have said that this header (PKZip) is only the first condition; the class then goes through the unzipped version, opens the XML, and so on to detect XLSX, DOCX, PPTX, etc.
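To illustrate that second stage, here is a minimal VB.NET sketch, not the actual class. It assumes .NET 4.5 or later (System.IO.Compression, with a reference to System.IO.Compression.FileSystem on .NET Framework), and it assumes the main part sits at its conventional location inside the package. A stricter reader would follow the package relationships instead of relying on those paths. DetectOfficeType and tcFileName are hypothetical names.

```vb
Imports System.IO.Compression

Module OfficeDetector
    ' Hypothetical helper: returns "XLSX", "DOCX", "PPTX", or "" for any other ZIP
    Function DetectOfficeType(ByVal tcFileName As String) As String
        ' ZipFile.OpenRead throws InvalidDataException if the file is not a valid ZIP
        Using loArchive As ZipArchive = ZipFile.OpenRead(tcFileName)
            ' Conventional main-part locations (an assumption, see above)
            If loArchive.GetEntry("xl/workbook.xml") IsNot Nothing Then Return "XLSX"
            If loArchive.GetEntry("word/document.xml") IsNot Nothing Then Return "DOCX"
            If loArchive.GetEntry("ppt/presentation.xml") IsNot Nothing Then Return "PPTX"
        End Using
        Return ""
    End Function
End Module
```

Because the decision is made from the contents of the archive, it does not depend on the flags, time, or date bytes that follow the PKZip signature.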
Now, if I understand correctly, it has worked so far in my class because all my tests, as well as all files processed so far in production, had a PKZip header that I was able to recognize. You mentioned that checking the start of the file might not work in some exceptional cases.
From that, to make it work, it seems I would need to grab only the first 6 bytes of the file to detect whether it is a zip. It would then go through my condition code, unzip, open the XML, etc. The goal is simply to have the first condition pick it up. So, instead of verifying the first 14 bytes, I should verify the first 6 only. If I do that, it would recognize your file as a zip as well and would eventually identify it as an XLSX. Is this what you recommend?
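A minimal sketch of that relaxed first condition, written in the same style as the original check. One assumption to flag: the ZIP signature proper is only the first 4 bytes ("PK", 3, 4), and bytes 5 and 6 hold the "version needed" field, which can differ between writers. This sketch therefore compares 4 bytes rather than 6. IsZipContainer is a hypothetical name, and lcString is assumed to hold the start of the file, one character per byte.

```vb
Module ZipSignature
    ' Hypothetical helper: True when the data starts with the ZIP local file header signature
    Function IsZipContainer(ByVal lcString As String) As Boolean
        ' Compare only the signature; everything after it varies with the writer's options
        Return Len(lcString) >= 4 AndAlso _
            Mid(lcString, 1, 4) = Chr(80) + Chr(75) + Chr(3) + Chr(4)
    End Function
End Module
```

With this as the first condition, any ZIP-based file passes through to the unzip step, and the XLSX, DOCX, or PPTX decision is left to the inspection of the unzipped contents.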
the zip file works like this:
Thanks, I will try to get more info on the 64 KB detection for that header. For now, the routine works with the XLSX file created by Flexcel.