In short: some .xlsx files are named .xls and, conversely, some .xls files are named .xlsx.
There are also files whose extension matches their actual format.
I need to go through the files and rename them correctly, but I could not find a way to determine the file format without relying on the extension in the file name.
Answer 1, Authority 100%
By the header. New Excel files (XLSX) begin with the ZIP signature PK: 50 4B
. DOCX files and ordinary ZIP archives carry the same signature, so to tell them apart you need to read the list of files packed inside the archive.
.xls files come in two kinds. The "new" XLS is the one you usually encounter.
It has the signature D0 CF 11 E0 A1 B1 1A E1
(in fact, this is the general Office document signature, so doc and xls files share it; for precise identification you need to find the "main" stream inside the container that wraps the BIFF data and check its signature, or go by the list of streams in the container). The old XLS (up to version 7; version 6 was not in use) has the signature 09 08
For example, you can do it like this:
byte[] xlsdoc = f(); // the document loaded as bytes
if ((xlsdoc[0] == 0x50) && (xlsdoc[1] == 0x4B)) {
    // new Excel (XLSX, or another ZIP-based file)
}
else if ((xlsdoc[0] == 0xD0) && (xlsdoc[1] == 0xCF) /*...*/) {
    // Office 97
}
else if ((xlsdoc[0] == 0x09) && (xlsdoc[1] == 0x08)) {
    // old format (09 08 signature)
} else {
    // in all other cases I treat the file as garbage
}
Because the file can be large, it is better to read only the first 16 bytes, analyze them, and then reset stream.Position to zero.
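The advice above (read only the header, then rewind) could be sketched roughly as follows. The names HeaderSniffer, ExcelKind and Detect are made up for this illustration, and the sketch assumes a seekable stream; it only classifies the container by signature and does not prove that the file is really an Excel workbook.

```csharp
using System;
using System.IO;

// Hypothetical names, used only for this illustration.
public enum ExcelKind { Zip, OleCompound, OldBiff, Unknown }

public static class HeaderSniffer
{
    // Compound document signature shared by .xls, .doc and other Office files.
    private static readonly byte[] Ole =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public static ExcelKind Detect(Stream stream)
    {
        if (!stream.CanSeek)
            throw new ArgumentException("Stream must be seekable.", nameof(stream));

        var header = new byte[16];
        stream.Position = 0;
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // the file is shorter than 16 bytes
            read += n;
        }
        stream.Position = 0; // rewind so later readers see the whole file

        if (read >= 2 && header[0] == 0x50 && header[1] == 0x4B)
            return ExcelKind.Zip;          // XLSX, DOCX or a plain ZIP archive
        if (read >= Ole.Length && StartsWith(header, Ole))
            return ExcelKind.OleCompound;  // XLS, DOC and so on
        if (read >= 2 && header[0] == 0x09 && header[1] == 0x08)
            return ExcelKind.OldBiff;      // old XLS, per the answer above
        return ExcelKind.Unknown;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }
}
```

With a file, this might be called as HeaderSniffer.Detect(File.OpenRead(path)), with the stream wrapped in a using statement.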
With Excel I have also run into other surprises:
- XML: the file starts with <?xml. Excel produces such a file when you save a workbook as XML.
- HTML Excel: the file starts with <html. Advice on the Internet often recommends generating such a file with JavaScript, PHP and so on; it is a table built from <td> and <tr> tags with some specific attributes.
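The text-based variants mentioned above could be recognised with a check along these lines. This is only a sketch: TextFormatSniffer and Detect are hypothetical names, the header is assumed to be UTF-8 or ASCII (UTF-16 files are not handled), and the extra markers beyond <?xml and <html (a doctype line or a bare <table) are my own assumption about what such generated files may start with.

```csharp
using System;
using System.Text;

// Hypothetical helper, used only for this illustration.
public static class TextFormatSniffer
{
    // head: the first bytes of the file (a few hundred bytes are plenty).
    public static string Detect(byte[] head)
    {
        // Decode as UTF-8 and skip a BOM and leading whitespace, if any.
        string text = Encoding.UTF8.GetString(head)
            .TrimStart('\uFEFF', ' ', '\t', '\r', '\n');

        if (text.StartsWith("<?xml", StringComparison.OrdinalIgnoreCase))
            return "xml";

        // "<html" comes from the answer; the other two markers are assumptions.
        if (text.StartsWith("<html", StringComparison.OrdinalIgnoreCase) ||
            text.StartsWith("<!doctype html", StringComparison.OrdinalIgnoreCase) ||
            text.StartsWith("<table", StringComparison.OrdinalIgnoreCase))
            return "html";

        return "unknown";
    }
}
```

A file that starts with <?xml is not necessarily an Excel XML spreadsheet, so this only narrows the candidates.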
References to signatures
- http://uk.wikipedia.org/wiki/%D0%A1%D0%B8%D0%B3%D0%BD%D0%B0%D1%82%D1%83%D1%80%D0%B0_%D1%84%D0%B0%D0%B9%D0%BB%D1%83_(%D0%BF%D0%B5%D1%80%D0%B5%D0%BB%D1%96%D0%BA)
- http://www.filesignatures.net/index.php?page=search&search=XLS&mode=EXT
- http://www.filesignatures.net/index.php?page=search&search=XLSX&mode=EXT
- http://en.wikipedia.org/wiki/Microsoft_excel
Answer 2, Authority 67%
Alternatively, on Windows you can use the Structured Storage API to determine whether a file is a valid XLS file. According to the specification, an XLS file is a Structured Storage file that contains a stream named Workbook.
MS-XLS: Excel Binary File Format Structure, section 2.1.2:
A file of the type specified by this document consists of storages and streams as specified in [MS-CFB] …
A workbook MUST contain the Workbook Stream …
Based on this rule, you can use the following code to check whether a file is XLS:
using System;
using System.Collections;
using System.Runtime.InteropServices;

namespace ConsoleApplication1
{
    class Program
    {
        [DllImport("ole32.dll")]
        static extern int StgOpenStorageEx(
            [MarshalAs(UnmanagedType.LPWStr)] string pwcsName,
            uint grfMode,
            uint stgfmt,
            uint grfAttrs,
            IntPtr pStgOptions,
            IntPtr reserved2,
            [In] ref Guid riid,
            out IStorage ppObjectOpen);

        const uint STGM_DIRECT = 0;
        const uint STGM_READ = 0;
        const uint STGM_SHARE_EXCLUSIVE = 0x10;
        const uint STGFMT_STORAGE = 0;
        const uint PID_FIRST_USABLE = 2;
        const uint STGC_DEFAULT = 0;

        // Only OpenStream is called; the other members keep the vtable order.
        // a, b, c and d are placeholders for methods that are not declared in full.
        [Guid("0000000B-0000-0000-C000-000000000046")]
        [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
        public interface IStorage
        {
            void a();
            [PreserveSig]
            int OpenStream(string pwcsName,
                IntPtr reserved1,
                uint grfMode,
                uint reserved2,
                [MarshalAs(UnmanagedType.Interface)] out object ppstm);
            void CreateStorage(string pwcsName, uint grfMode, uint reserved1, uint reserved2, out IStorage ppstg);
            void OpenStorage(string pwcsName, IStorage pstgPriority, uint grfMode, IntPtr snbExclude, uint reserved, out IStorage ppstg);
            void CopyTo(uint ciidExclude, Guid[] rgiidExclude, IntPtr snbExclude, IStorage pstgDest);
            void MoveElementTo(string pwcsName, IStorage pstgDest, string pwcsNewName, uint grfFlags);
            void Commit(uint grfCommitFlags);
            void Revert();
            void b();
            void DestroyElement(string pwcsName);
            void RenameElement(string pwcsOldName, string pwcsNewName);
            void c();
            void SetClass(ref Guid clsid);
            void SetStateBits(uint grfStateBits, uint grfMask);
            void d();
        }

        public static bool IsXls(string path)
        {
            IStorage pStorage = null;
            object o = null;
            int hr;
            Guid guidStorage = typeof(IStorage).GUID;
            try
            {
                // Open the file
                hr = StgOpenStorageEx(path, STGM_READ | STGM_SHARE_EXCLUSIVE, STGFMT_STORAGE,
                    0, IntPtr.Zero, IntPtr.Zero, ref guidStorage, out pStorage);
                if (hr != 0) return false; // not a Structured Storage file
                // Open the stream
                hr = pStorage.OpenStream("Workbook", IntPtr.Zero, STGM_DIRECT | STGM_READ | STGM_SHARE_EXCLUSIVE, 0, out o);
                return hr == 0;
            }
            finally
            {
                // Release resources
                if (pStorage != null) Marshal.ReleaseComObject(pStorage);
                if (o != null) Marshal.ReleaseComObject(o);
            }
        }
    }
}
Since an XLSX file is a ZIP archive with a specific structure, you can apply the same logic to check it, using any library for working with ZIP archives (.NET 4.5+ has the built-in System.IO.Compression).
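A minimal sketch of that idea with System.IO.Compression follows. XlsxChecker and IsXlsx are hypothetical names. The check assumes the workbook part sits at its conventional location, xl/workbook.xml; the packaging format allows other locations, so treat a negative result as "probably not XLSX" rather than as proof. Related formats such as .xlsm would pass this check too.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, used only for this illustration.
public static class XlsxChecker
{
    public static bool IsXlsx(Stream stream)
    {
        try
        {
            // leaveOpen: true so the caller keeps ownership of the stream.
            using (var zip = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
            {
                // Assumption: the workbook part is stored at the conventional path.
                return zip.GetEntry("xl/workbook.xml") != null;
            }
        }
        catch (InvalidDataException)
        {
            return false; // not a readable ZIP archive
        }
    }
}
```

On .NET Framework this needs a reference to the System.IO.Compression assembly.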