Fastest way reading very large file


Let say for example I have an excel file with 25 sheets.
Every sheet has 1mln records
I want to read only one sheet row by row.
Eg read the row and pass it the function.

Some of the rows have no data
Some of the cells have no data
But I still want to read all of them.

If I do not use virtual mode entire excel file will be loaded into memory plus it will load the sheets I don't need.

If I use virtual mode empty cells/rows will be skipped.
Yes, I can use a sparse array but again the entire sheet will be loaded into memory.

What would be the best approach from a performance/resources point of view?


Virtual mode reports only cells with values, because that's what is stored in the file (and in memory in Excel/FlexCel). Normal mode works the same way, indeed the cells are loaded into a sparse array, and when you call GetCellValue, if a cell isn't in the array then GetCellValue will return empty.

I am not sure on what your exact use case is, but this seems like a use-case for virtual mode. I also think an sparse array wouldn't be necessary, but again, I am not sure how your setup is. But let's imagine for example that you are reading the files and creating a CSV. You could use virtual mode, then read say cell A1, then call WriteCSV(A1, A1Value). when the next cell arrives and it is B5, you call WriteCSV(cellRef, Empty) from cells A1 to B5, then call WriteCSV( B5, B5Value). And so on.

If your case truly needs an sparse array it might not be that different from reading it directly (after all, that's what FlexCel does in normal mode), but well, you do avoid reading 24 sheets. Depending on the size of those sheets, that might make a difference.

Thanks for clarification

I Have another silly question
Is reading always sequential?

EG from top to bottom and from left to right.

A1>B1>C1 and not B1>C1>A1

I guess it depends on the internal structure of the excel file which I do not know much about

Indeed Excel files go from left to right, then top to bottom. That is A1->B1->C1->A2->B2... etc.
Another order would cause Excel to report the file as invalid