Out of memory with lots of rows

I try to get started with FlexCel - so bear with me if I'm doing something stupid.
I need to be able to generate large Excel files so I wrote a little test project that fills an Excel sheet to its limits. With 500 columns I'm running out of memory.
Could you give me some hints or tips how I can avoid that.

The code I'm using looks like this

  Xls := TXlsFile.Create(True);
    i := 1;
    while True do
      if i = 1048577 then // End of Excel 2007
      for j := 1 to 500 do
        Xls.SetCellValue(i, j, 'Test');



FlexCel, same as Excel, stores the file you are creating into memory, in order to be able to calculate it. So if you write cells you will eventually go out of memory. But not only in FlexCel, in Excel too.

If you are writing 1 million rows x 500 columns, you are writing 500 million cells. While FlexCel is very optimized in memory usage (so for example it won't keep in memory 500 million of strings "test", but keep it only once), we can't do magic.

The maximum memory that a 32bit process can have is 2GB, or 2^32 bytes. 2^32/(500* 1048577) = 8.2 bytes per cell, which seems correct. We need 2 bytes to store the format of a cell + 4 bytes for a pointer to the string, + 2 extra bytes to store the row and column of the cell, which means 8 bytes per cell.

And even if we could magically reduce the memory from cell to say 4 bytes per cell (remember just a pointer is 4 bytes in 32 bits), you would still be able to run out of memory by writing 1000 columns. And even if you managed to save that file, a 32 bit Excel would run out of memory trying to open it, because Excel has the same limitations than us. (that's why until Excel 2007 you couldn't write more than 65535 rows, Excel 2003 was limited to use 1gb of memory, so it didn't made sense to allow much more than that).

So after all the "theory talk", if you want a fast and simple solution, just use 64 bits. The 64 bit app will use more memory than the 32 bit one (every pointer in 64 bits is 8 bytes), but it won't be limited to 2 gb, or to the available memory in your machine, since the memory will be paged if if you need more memory. Remember that to open such file, your users will likely need Excel 64 bits, or they will run out of memory too opening it.

For a more complex solution, you should really rethink what data you are writing to the spreadsheet. Excel isn't a database (even if many of us like to use it like that), and for massive amounts of data it doesn't work well. After all, you are still limited to 1 million rows, which would be ridiculous in a database. And no human is ever going to read those 500 million cells anyway, it is impossible.

Sometimes when looking at those big numbers our mind plays tricks to us and they don't seem that big, after all it is "only" 1 million rows and 500 columns. But to get a better idea of the magnitude we are speaking here, let's think you want to read that information (if you are writing it, we can assume someone will read it). If you spend 1 second looking at every cell, you would spend 500 million seconds, or 500* 1048577/3600/24/365 = more than 16 years (!) Of course FlexCel will write the cells much faster than 1 per second, but anyway, expect it to take a long time to generate that file. And expect Excel to take a long time to read all the cells too. For that kind of information, it doesn't really make sense to have it in a spreadsheet, since nobody has 16 years to look at it (16 years for those speed readers who can read 1 cell per second). This is information you will probably need to query to get the information you want. The spreadsheets you generate should have this queried data, so someone opening them will be able to actually read them. To query the data, the best is to have a database which is designed for that. If you want that data to be consumed by another app and not by an human, then the best is to send say an sqllite file and not an xlsx file, it will work much better. Spreadsheets in general should be "human size", not "machine size".

It is actually in our plans to allow you to write spreadhseets not limited by memory by paging out to disk as needed (as said at the beginning the main issue being to calculate formulas), but even if we do implement that, you should probably not use it. Your example is trivial, you are writing the same string "Test" everywhere, so both Excel and FlexCel are able to figure that out, and keep a single string in memory and pointers to that. But a real file would have real data, and use even more memory. So even if you were able to create this file:
1)Excel 32 bits will not be able to open it.
2)Excel 64 bits will take forever to open it.
3)FlexCel will take forever to generate it.
4)The size of the xlsx file will be huge. Remember you are hitting a 2GB memory. While the data in an xlsx file is zipped, with actual data (not a test string repeated), the resulting file might be say 1gb. You don't want to email that. 

Adrian Gallero2014-09-11 07:20:00

Hi Adrian,
thanks for your quick and very comprehensive answer. It's appreciated!
This helps me explaining my customers why they can't export their whole database to an Excel file. ,-)