Writing CSV format limitations

Hello,

we experienced a serious issue for one of our current projects. For a semiconductor customer we need to export a huge amount of numeric data for a wafer inspection system. Samples with 70-100 dies, each die contains several thousand flatbumps. For the CSV text export, there seems to be an Int16 (=65536) limit for writing rows.

Bild

In one of our test scenarious we are having 150.000-200.000 values we need to write, and more values are possible, depending on the sample size. The report itself runs without problem, also saving the report in Xlsx format, but this is not an option.

Any idea how to overcome this limitation?

With best regards - Ulrich Groffy (NanoFocus AG)

Hi,
The only format limited to 65k is xls, because xls files can't handle more than that (indeed, they store the row in an int16, so you can't write a number bigger than 65k in the file).

But both xlsx or csv are limited to 1,048,576 rows by 16,384 columns (see Excel specifications and limits )

Csv works completely fine with more than 65k, but I've just tested it now just to be 100% sure and there aren't issues. Also the message you see only happens when saving as xls. So my guess is that somehow you are saving the file as xls, not csv. What is your code to save? Are you using Save with TFileFormats.Text? Or if using reports, calling Run with the overload that takes a TFileFormats variable? Save will automatically detect between xls and xlsx depending on the extension, but it can't automatically choose csv because there are too many variants of csv and you need to specify other stuff like the encoding and the field separator (see Understanding CSV files. | FlexCel Studio for the .NET Framework documentation )

Hi,

thank you for the quick response. We are using an Excel file template in xls format, that seems to be the problem...

The format of the template should't be the problem, as FlexCel will internally update the file to have 1 million rows on loading. The problem is when you save the file, you need to specify explicitly that you want a CSV file.

The best way is to run your report like this:

var xls = new XlsFile("template.xls", true); 
using (FlexCelReport report = new FlexCelReport(true))
 {     
  report.Run(xls);     
  xls.Save("result.csv", TFileFormats.Text, ',', Encoding.UTF8); 
} 

So far, I can't make it work

I think, there is another issue with an invalid cell format, the exception is a different one.

For invalid results we are writing Double.NaN, can this cause this kind of exception?

Any additional informations I can provide?

This looks like a different issue. It should not at first glance be caused by NaN (I just tested it here and Double.NaN will just write NaN in the csv file)

Does it work if you save it to xlsx? If it does, can you send me the xlsx file to adrian@tmssoftware.com so I can investigate if there is something on it that can't be exported to CSV?

Saving the report in xlsx format is no problem. I just sent you the file.

I think the Double.NaN values are causing the exception. I changed the code to replace it with a different value, and the writing to text format works.

I found the code which caused this 65k overflow exception. I tested saving the report file in a separate thread might bring some advantages, therefore I implemented a XlsFile clone function:

    public static XlsFile Clone(this XlsFile ReportFile)
    {
      using (MemoryStream stream = new MemoryStream())
      {
        ReportFile.Save(stream, TFileFormats.Xlsx);

        stream.Seek(0, SeekOrigin.Begin);
        stream.Flush();

        return new XlsFile(stream, ReportFile.AllowOverwritingFiles);
      }
    }

In a previous version of this code I didn't save with an explicit file format (which seems to be xls by default), assigning the xlsx format solved the issue.

Yes, if you call save without parameters it defaults to xls. This is because we couldn't break old code: Originally there was only xls files, so Save would save as xls. When xlsx appeared, we couldn't change Save to save by default as xlsx without breaking all existing code that assumed xls was the default.

You can change the default with TExcelFile.DefaultFileFormat Property | FlexCel Studio for VCL and FireMonkey documentation but it is not really worth it, it is better to be explicit when you call Save() which format you use.

About the NaN issue, I've tried it here in .NET 4.8 and 6 and I can't reproduce it, but I imagine it can depend in the flags used by the fpu. I'll see to add some guard there for nan. (but you really shouldn't set nan numbers in Excel, since those don't make sense: Excel stores them as #Num! errors, and when exporting to csv you can't export a Nan)

I agree, that this wasn't the best design decision for the Excel reporting/exporting. Anyway, I replaced it with value 0, and its running as expected. Thanks to FlexCel we can create template based customized csv exports which are running pretty fast, regarding the amount of data.

Thank you for your help Adrian, I really appreciate.

Best regards.