Flexcel, Exel, Delphi, long strings

Hi,

I use Flexcel and have a problem with cells having strings longer than 255 chars. 
As I use Delphi, I wondered how to manage these cells ?
My Excel file doesn't have any special types, just general type. I have to use it as it is, despite I can create a copy with cells formatted. 
Too, I want to put a CRC32 count with one of these cells, and as I get the same CRC32 with strings shorter than 255, I suspect that the strings doesn't use one byte per char (when the CRC are the same, that's always when the end of the string doesn't have the same final words).
I could switch to C++ but it would spend time, as RAD doesn't permit to mix Delphi and C in the same project.
Could you help me ? The simplier way for me should be to replace the strings with an array of chars as I doesn't need fancy trings, but I doesn't know how to do it, and neither how Flexcel works with these long strings.
Thanks for your help,
Milos

Hi, 

I found why the CRC was wrong, that's my old habit to pass the address of s[1] as the first of an array of char, guessing that all that follows in memory is the string (for i:=s[0] to s[length(s)].
When I pass the adress of s instead of s[0] the CRC aren't identical.
But I wonder what I'm checking really with (s,length(s)) ? Do I check in fact antything in memory instead of the string ?
That would be worrying as this CRC is done to be used as a secondary index and should be always the same for the same string.. 
Let me know if I should copy all the s[x] in an array[1.255] of char.
I saw too that in my file, there is no string longer than 255, some strings are truncated. As this file is an export of a HFSQL file, I guess that the original file contains strings longer than 255, the truncated strings are cropped in the middle of words...
Does it come from the feature used by the guy who gave me the file (he's using a feature of a program done by an author who is dead now), or is it a limitation of Excel?
Thanks to you,
Milos

Hi,

Sorry to waste your time, I see that if s is a string its elements aren't a byte but a char, thus I added a function that uses an array[1.255] of byte. That's seem to give the CRC32 correctly (I hope)..
I still wonder if a string in Excel can exceed 255 char or if it is a bug from the export from HFSQL ?
Thanks,
Milos

Hi,

Excel strings can indeed exceed 255 characters, and FlexCel should report them right. But there are some concepts we need to know about strings in Delphi to further understand the problem.
Since Delphi 2009, there are 3 main types of strings in delphi:

1. ShortStrings. Those strings seems to be the ones you are using. They are array of bytes (or ASCII chars) with a maximum length of 255. The length of the string is stored in s[0], and being a byte, it can't be longer than 256.

You can declare a short string like
var
 a: string[255];

2. AnsiStrings. Those strings are stored in the heap, and can have any length. In those strings the length is not stored at s[0] and accessing s[0] is an error. But each character of the string is still a byte.
You declare AnsiStrings like:
 var
   a: AnsiString;

3. Unicode Strings. These are the ones FlexCel uses everywhere, and the ones you get if you declare
var
   a: String;

They have the following characteristics:
   3.1.  Their length is not limited to 255 characters. s[0] does not have the length of the string and it is an error to read s[0]
  3.2.  Each character in the string is 2 bytes, not one. A "Char" in delphi since 2009 is 2 bytes, holding unicode characters from 0 to 65535. If you declare:
var
  c: Char;
This is not a byte, but 2 bytes.
 c:AnsiChar would be 1 byte.

You can read more about strings here:
http://docwiki.embarcadero.com/RADStudio/Tokyo/en/String_Types_(Delphi)

Now, about your problem. FlexCel returns an UnicodeString when it reads the file. If you want to convert them to an array of ASCII bytes, you can use
  TEncoding.ASCII.GetBytes(s);

This will return an array of bytes (not limited to 255 characters) with the ASCII characters. Any special unicode character (like ñ or é) will be converted to ?

As an alternative, 
TEncoding.ANSI.GetBytes(s);
will return a string where character are still bytes, but using your codepage to encode special characters like é. The problem of this is that other apps with a different encoding will read all those special characters wrong. For example, if your codepage is 1252 (most common in western locales), é is character 233:
https://en.wikipedia.org/wiki/Windows-1252
But if some other machine with Cyrillic locale (1251, used in Russian countries) opens the file. he will see an й instead of an é since й is char 233 in cyrillic:
https://en.wikipedia.org/wiki/Windows-1251

If you have special characters, it is best to use UTF8 (and you can use TEncoding.UTF8 to convert strings to utf8 array of bytes and array of bytes to strings). If you don't have special characters, you can use ASCII, which is the same everywhere. ANSI is an ASCII extension that depends in the locale of the machine for chars > 127, so it is not great to use it, unless the HFSQL export expects the results in ANSI.

Hi Adrian,


I soon saw that string was made with more than one byte per element,:

function CalcCRC(s:string): DWORD;
var t: array[1..255] of byte; 
    i: integer;
    CRC : DWORD;
begin
  CRC := 0;
  for i:= 1 to length(s) do t[i] := byte(s[i]);  
  if length(s)>0 then begin
    CalcCRC32(Addr(t),Length(s),CRC);
    CalcCRC:= CRC;
  end
  else CalcCRC := $FFFFFFFF;
end;

The compiler gave me an error, that's why I added byte ( byte(s[i] instead of s[i] )
I wonder now what this cast gives : the hi or low byte of the chat, I mean the correct value I wish to have ?
Your functions are obliouvsly more sure and doen't look dirty as my byte(char).. I wonder where you always find such jewels.

Aside, I should look on the topics, but I wondered if XlsFile is persistent.
When I write smething as

var xlsin: XlsFile,

procedure Open_File;
begin
..
end;

procedure GetInFile(var s: string; row, col:integer);
begin
  .. GetCellValue
end;

procedure CloseFile,
begin
..
end;

it raises an exception when I try to get a value from a cell. 
So I put OpenFile, GetInFile, CloseFile in one procedure only and it works, that's not really a waste of time as the speed of your functions is amazing.

But as I will have to access 40 times to get each time the value of about 8 cells in the same row of a file Excell (2500 rows), that won't take many time, I only ask  as the code wouldn't look very pretty.

Kind regards,

Milos

Hi Adrian,


I have a question more : can an Excel file contain Rich Text Format, and what should be the type of variable to use ?

Thanks agains, best regards,

Milos

Hi,

The cast from char to byte keeps the low part. That is, if your number is $ABCD, byte($ABCD) will return $CD. The UTF16 used in Delphi is Little-endian, so casting the char to a byte will work, as long as the characters are <= 127. For bigger characters, the only way is to use TEncoding since you can't cast an UTF16 char to a Windows 1252 ansi byte.

About the persistence I am not sure I understood exactly what you are doing. Can you post a more complete example? There should be no need to close and reopen the same file.

I've attached a simple example here:
http://www.tmssoftware.biz/flexcel/samples/openclose.zip

As you can see, you can press the "open" button, then click many times in "read" and each read will read a new value from a new row. When you press close (or when you close the app) the XlsFile is destroyed.

About RTF, Excel has very basic RTF support: basically what you can do is to change the font of part of the cell. The simplest way to access it is with GetHtmlFromCell and SetCellFromHtml, if you can work with html. For example:
xlsin.SetCellFromHtml(row, col, 'Hello <b>world</b>');
Will enter the text "Hello world" into the cell at row, col.
If you can't use html, then FlexCel will return TRichStrings and you can set a TRcihString into a cell too. TRichStrings are strings with information for the font of the different sections on it.

Hi,

Thanks for your answer. Even if byte(word) works, encoding is better. I would have preferred for the CRC 'é' or 'è' beeing computed, at least with an 'e', but the orginal would be better. Do you know how to do the calculus with an usigned byte that could be greater than 127 ?
'Peristence' is a french word ? It's the opposite of volatile. I destroyed my source, I guess that i put xls in a wrong place. Btw, if I did place xls in the implementation of the library instead of the header, could I have an instable xls ? I will retry. But as I'm installing XE 10.2, I get some errors eache time (here, TMSDiagram and TMSWlow, I believe, did'nt compile) and uninstalling an reinstalling the essential TMS can take 2 days.. 
I would especially use Flexcel to do a "Excel-like" that could add the CRC cell, and allow the administrator to edit, modify, add lines. But I still don't achieve installing Grids and the free component binding them.
Many thanks again
Milos

Hi,

Thanks again for the RTF informations. Doesn't it will be easy to use TRichString ? I'm thinking about the administror, who will be a final user too and isn't familiar wiith HTML. Why did you give the HTML option first, does TRichString complicate the source, or anything else like using a separate window to edit the cell ?
Thx,
Milos

Hi Adrian,

I still doesn't find how to convert an Unicode to UIT8, as I still try fo find a CRC32 with the part >127 (in french, '?', '?', ..)
Could you please give me a clue ? The samples of Delphi give only something as "if myText is Unicode then.. " but don't show how to force a conversion from String to another UTF8 string
Kind regards,
Milos

Hi,

This code sounds womewhat rough and I got a warning, but do you think if will do the job for a CRC32 ?

function CalcCRC(S:String): DWORD;
var i, size: integer;
    CRC : DWORD;
    t : array[0..1047] of byte;
    s1 : String;
begin
  s1 := UTF8String(S);
  size := s1.Length;
  for i := 0 to size do t:=byte(s1);

  CRC := 0;
//   CRCvalue := (CRCvalue SHR 8) XOR
//    Table[ q^ XOR (CRCvalue AND $000000FF) ];
  for i:= 0 to size do CRC := (CRC SHR 8) XOR const_table[t XOR (CRC AND $000000FF)];
  CalcCRC := CRC;
end;

Sincerly yours,

Milos

Hi, 


Sorry it is in fact

function CalcCRC(S:String): DWORD;
var i, size: integer;
    CRC : DWORD;
    t : array[0..1047] of byte;
    s1 : String;
begin
  s1 := UTF8String(S);
  size := s1.Length;
  for i := 0 to size do t[i]:=byte(s1[i]);

  CRC := 0;
//   CRCvalue := (CRCvalue SHR 8) XOR
//    Table[ q^ XOR (CRCvalue AND $000000FF) ];
  for i:= 0 to size do CRC := (CRC SHR 8) XOR const_table[t[i] XOR (CRC AND $000000FF)];
  CalcCRC := CRC;
end;

Hi,

String is always UTF16. So if you have your text in a string, you always have it in 2 bytes per char. What I would use is:



function CalcCRC(S:String): DWORD;
var i: integer;
    CRC : DWORD;
    t : TArray<byte>;


begin
  t := TEncoding.Unicode.GetBytes(S);


  CRC := 0;
//   CRCvalue := (CRCvalue SHR 8) XOR
//    Table[ q^ XOR (CRCvalue AND $000000FF) ];
  for i:= 0 to Length(t) do CRC := (CRC SHR 8) XOR const_table[t[i] XOR (CRC AND $000000FF)];
  CalcCRC := CRC;
end;




That is, use TEncoding.GetBytes to get the bytes representing the string, then run the CRC in those bytes. There is no need to convert to UTF8

Edit: I forgot to remove size in the code example.
Edit2: The formatting of the text was wrong and t[i] was replaced by t



Adrian Gallero2018-04-08 13:08:28

Hi,


Many thanks. I tried how to get this array without success, your help is always very precious.

Milos