Save speed and progress

I've recently purchased FlexCel and currently implementing it into our projects, and so far I'm finding it an excellent product and very easy to use. I do have a couple of questions:

Firstly, I have a spreadsheet with approx 100 columns and 70000 rows. When I save this to a new .xlsx file, it takes about 90 seconds to save. If I load this into Excel and then save from there, it only takes about 10 seconds. Just wondering if that is what you'd expect, or if there's some setting I've missed.

Secondly, is there a way to display a progress bar while saving? Ideally, some events like OnSaveStart, OnSaveProgress and OnSaveDone that I can hook into.

Thanks.

Hi,
90 seconds seems a little too much of time, but then that might depend on the type of file you are creating. Do you have lots of formulas? It might be that a lot of time is spent calculating the file before saving it.

I tried the following code here:


program Project26;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils, VCL.FlexCel.Core, FlexCel.XlsAdapter, System.Diagnostics;

var
  xls: TXlsFile;
  sw: TStopWatch;
  r, c: integer;
begin
  xls := TXlsFile.Create(1, true);
  try
    for r := 1 to 100000 do
    begin
      for c := 1 to 100 do
      begin
        xls.SetCellValue(r, c, r);
      end;
    end;

    sw := TStopWatch.Create;
    sw.Start;
    xls.Save('r:\test.xlsx');
    sw.Stop;

  finally
    xls.Free;
  end;
  WriteLn(sw.ElapsedMilliseconds);
  Readln;
end.



And it takes about 12 seconds here in my machine to save the file. For the record Excel is faster, taking about 5 seconds in my machine to save the file, but this is in part caused because Excel uses the "fastest" setting for zip compression while we use "default" which is slower but gets smaller files. (the file saved by Excel is 27mb, while the one created by FlexCel is 20 mb).

By setting the compression level to "fastest" in FlexCel the seconds go down to 7, which is still slower than 5, but not that much. (and by using fastest the size grows to 27mb, same as Excel)

But I wonder why you are getting about an order of magnitude slower times. How long it takes to run the code above in your machine? If it is near 90 seconds, can you check if disabling the antivirus speeds up the saving?
If this file saves faster but yours not, can you send me a file showing the issue to adrian@tmssoftware.com ? While saving to xlsx is slower than xls because we have to compress the file and generate mb of xml, it is normally fast enough that you don't really need to worry much about it.

About the progress events, we sadly don't have them for that same reasons that saving should be fast enough. Adding an even would actually slow down the saving process, and calculating when to fire it is difficult. I mean, if we fire a "progress" event for every cell written, as we are writing 10 million cells, it would be too slow. But the correct number to call depends in the speed of the machine running the process. If we call the progress say every 1000 cells written, it might slow down anyway the fastest machines, and be too jerky for the slowest machines which might need a more granular progress.

For that reason we normally recommend running the code that can take long in a thread, and show some spinning progress in the UI + the ellapsed time in the UI thread. You can see an example of this in the "Custom preview" demo when you export to pdf: You can try it with one of those big files to see it. There are no events firing when exporting to pdf, but we can show you the progress anyway (And allow you to cancel). Maybe something similar can be added for saving process, but as said, the idea is that saving should be fast enough so you don't have to care.


Hi Adrian,

I've been doing some timing tests this morning! On my machine your program saves the file in around 17 seconds, and Excel takes around 8.

I've just written another small test app to just load and save a file. When I load and directly save the file created by your test app, it saves in about 17 seconds again, ie same time as above. However, if I load and directly save my file, it saves in about 12 seconds!! (rather than 90 when it's generated first in code!)

I'm assuming, therefore, that after I've generated my FlexCel file in code, when I call Save, before the actual writing to disk something else is happening! (maybe checking if the sheet needs to be recalculated etc?)

In this case, my file doesn't have any formulas. It's basically a dump of a database table from our system. The columns are the database fields and the rows are database records. When I generate the file, i'm simply looping through the database table writing each field into the appropriate cell, and I also set a column format for each column based on the database field type. Finally, I just call Save.

Given I know that I dont need to recalculate the sheet before saving, is there a setting somewhere to tell FlexCel to just save the file directly?

Thanks for your help
Steve

Hi Adrian,

Many apologies, I've just been doing further checks and I've just realised that before I save I'm doing an AutoFit, and it's obviously that that's taking the time! The save itself is about 15 seconds (just a few seconds longer than the save/load test I described earlier).

Could you just confirm, though, if I don't have any formalae and dont ever need to recalculate before saving a file, is there some setting to turn this off, or is the time FlexCel takes to check this negligable anyway?

Also, you mentioned in your original reply that FlexCel uses a 'default' compression setting, but can be changed to 'fastest' - could you point me to the property to change this so I can see what effect it has on my system.

Finally, thanks for the comments about the progress, though now I know the saving is not as long this point is probably moot now anyway.

Kind Regards
Steve

Hi,
Thanks for all the extra information. About the questions:
1)Indeed, if there are no formulas the time spent recalculating is negligible. FlexCel has a linked list of all the cells that need to be recalculated, and if there are no formulas, the list is empty, so it won't lose time with that.
If you wanted (for a case where there are lots of formulas), you can set recalculation to manual with:
            xls.RecalcMode = TRecalcMode.Manual;
(you need to change this mode just right after creating the TXlsFile object).
But as said, in this case it won't make any difference as the list of cells to recalculate is empty and the time to check list.count = 0 is not measurable.

About the compression setting, sadly this is not exposed as a property, even when it should be simple to expose it. You could change the source code (it is in the line "inherited Create(aZipStream, zcDefault, -15);" in zippy.pas ) but then you would have to recompile the packages and it is probably not worth. I have added a new static property:
TExcelFile.XlsxCompressionLevel
which you can set to zcNone, zcFastest, zcDefault or zcMax
We should be publishing 6.13 very soon (I hope tomorrow) an it will have this property.

About autofitting, it is on our todo list to make it faster: The current autofit algorithm is very focused in correctness, using GDI+ which is like 10 times slower than GDI but gets more exact results, and it also considers a lot of stuff like Right to left text, etc. But as a side effect of all of that, it is slow.

I have some ideas to make it faster without losing correctness, but I haven't got time to implement them yet. But one thing you can normally do is to limit the autofit to say the first 1000 rows. Reality is that if you have a larger string below it won't really matter much anyway if it is truncated, and normally you would even want to truncate that single really big string at row 40,000 and not make clumn B too big for the rest of the 39,999 strings anyway. There are overloads in autofit that will allow you to limit the first and last row used in autofit, and autofitting only the first rows will speed up the process a lot.

Hi Adrian, thanks for that info. I have indeed already changed my AutoFit to limit to the first 100 rows, which is sufficient for my needs here. I'll look forward to getting 6.13 to check compression timings!