I have two excels (which are actually database table dumps from old customers system) from which I need to convert them to our database structure. So in practice I need to read every row excel one and check if there exists records for matching by 3 columns in excel 2.
rowcounts in both are between 100k and 300k (depending on customer)
Is there faster searching method build to flexcell (by one column atleast) than simple stepping trough row by row. I can sort the data. (just asking to avoid writing my own binary search cause I'm doing this FlexCell which is integratated to TMS Scripter)
FlexCel has a Find command (see https://doc.tmssoftware.com/flexcel/vcl/api/FlexCel.Core/TExcelFile/Find.html ) but I will not be faster than doing a linear search. That's because Find just does a linear search (it can't assume the workbook is sorted).
I think the best and fastest solution here is to use a Dictionary: Load Excel 2 into the Dictionary (not the full Excel 2: Use the 3 columns for the Dictionary key, and an integer with the row number as a value)
So after loading all file 2 in the dictionary, run over all the rows in Excel 1, and for each row see if the row is in the dictionary. The method will use more memory, but will be way faster than doing linear searches all the time in Excel2.
Another option if the Dictionary consumes too much memory would be to sort Excel2 (you can use https://doc.tmssoftware.com/flexcel/vcl/api/FlexCel.Core/TExcelFile/Sort.html ) and then do a binary search (sorry, we don't have one built-in). But that will be probably slower. The fastest should be a dictionary and the extra memory shouldn't be that much.
Thanks. I'm not sure if I can access dictionary from TMS Scripter directly but I realized that I could sort excel (the primary column for searching is string) and read them to stringlist, put assign sorted to true.
I'll get index where to start ..
This topic was automatically closed 60 minutes after the last reply. New replies are no longer allowed.