LoadFromCSVStream

I find that this function reads only the first byte if you pass it a stream containing Unicode characters (in Delphi XE2).  Yet the DELPHI_UNICODE version of LoadFromCSV (with a TEncoding parameter) does exactly that by saving a Unicode stringlist into the stream.


So it would appear that LoadFromCSVStream does not handle Unicode, but LoadFromCSV assumes that it does.

Would it be possible to enhance these functions by providing a LoadFromCSVStream overload which takes an encoding parameter and is Unicode-aware?  And then, of course, fix the Unicode-aware LoadFromCSV to use this overload.

In the absence of any comment from TMS I have had to solve this issue myself.  LoadFromCSV seems to work only because (unlike all other Delphi 'loadfrom' functions) it leaves the BOM in the stream. And LoadFromCSVStream seems to require a BOM embedded at the start of the stream if you pass it a stream in anything other than the system default encoding.  It's been nearly five years since streams with different encodings became widely used in Delphi, and in most of the VCL an optional encoding parameter has been added to stream functions.  There is nothing to indicate that requiring a BOM in the stream is your alternative solution.  Now that I know this, all is well, but it would be great if you could make a note to document this feature.

I could not reproduce a problem. From my tests, I can only see that LoadFromCSVStream loads data as expected, with and without Unicode.


Test with 2 default grids on the form:

var
  ms: TMemoryStream;
begin
  AdvStringGrid1.Randomfill(false);
  AdvStringGrid1.Cells[1,1] := '漢語';
  Advstringgrid1.SaveToCSV('c:\temp\stream.csv',false);  // tested with both false & true

  ms := TMemoryStream.Create;
  try
    ms.LoadFromFile('c:\temp\stream.csv');
    advstringgrid2.LoadFromCSVStream(ms);
  finally
    ms.Free;
  end;
end;

When SaveToCSV() is called with unicode = true, the unicode chars in cell 1,1 are correctly loaded, otherwise not, which imho is the expected result.

You are correct when you say that the LoadFromCSVStream function works correctly with and without Unicode.  The only failure occurs when the Unicode data in the stream does not begin with a BOM.  If you construct such a stream by program (not by loading it) you will find that only the first byte is loaded, because the stream is terminated at the next zero byte.  This was my initial issue.


My point is that Unicode streams in Delphi in general are not required to start with a BOM. More usually the functions to which one passes such a stream are equipped with an encoding parameter.  And stringlist 'loadfromfile' functions normally strip the BOM: for example TStringlist.LoadfromFile does not pass the BOM into the first list item.
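
To illustrate the encoding-parameter convention mentioned above, here is a minimal console sketch using only standard RTL calls (the TStrings.LoadFromStream overload that takes a TEncoding). The program name and variable names are illustrative only, and nothing here is TMS code. The stream holds UTF-16 text with no BOM, and the caller names the encoding explicitly:

```pascal
program StreamEncodingDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;
var
  s: string;
  ms: TMemoryStream;
  sl: TStringList;
begin
  s := 'ABC,DEF,GHI';
  ms := TMemoryStream.Create;
  try
    // Write the raw UTF-16 LE characters, without any BOM
    ms.WriteBuffer(s[1], Length(s) * SizeOf(Char));
    ms.Position := 0;

    sl := TStringList.Create;
    try
      // The caller states the encoding, so no BOM is needed for detection
      sl.LoadFromStream(ms, TEncoding.Unicode);
      Writeln(sl[0]);  // should show the whole line rather than just 'A'
    finally
      sl.Free;
    end;
  finally
    ms.Free;
  end;
end.
```

Without the second argument, the same call has to fall back on BOM detection or the default encoding, which is where the single-character result comes from.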

The fact that your internal stringlist loadfromfile function retains the BOM is unusual, but that's fine because it is internal.  But the fact that it does so means that the LoadFromCSVStream function gets the BOM so works correctly.  Now that I know this I am able to use these functions with no issues.  My only concern is that the behaviour is somewhat un-Delphi-like and at the same time also undocumented.  Nowhere do you explain how the LoadFromCSVStream is able to accept streams with different encodings: the answer is that it requires a BOM at the start of the stream to do this.

In the absence of this knowledge I had to go to the source to work out how it worked.  And I found the internal function names and guessed, wrongly, that the delphi-like names meant that the functions operated like similarly named Delphi functions.  I was wrong, and as a result wasted a bit of time because the documentation was incomplete.  Now that I understand it, my remaining goal was to save someone else from wasting their time in the same way; hopefully it will show up in a search here, even if the documentation is not clarified.

I'm sorry but I still can't reproduce this.

Starting from an ANSI CSV file without BOM and the code:

var
  ms: TMemoryStream;
begin
  ms := TMemoryStream.Create;
  try
    ms.LoadFromFile('c:\tmssoftware\cars.csv');
    ms.Position := 0;
    advstringgrid1.LoadFromCSVStream(ms);
  finally
    ms.Free;  // release the stream even if loading raises an exception
  end;
end;

this still loads the CSV file correctly in the grid.

Thanks for your answer; but how the function behaves with a stream containing ANSI text is irrelevant to this issue.  There is no BOM involved at all, nor is one needed, because you have system default encoding in the stream.  Please try this test instead:

procedure TForm1.Button1Click(Sender: TObject);
var
  s: Unicodestring;
  ms: TMemoryStream;
begin
  advstringgrid1.clear;
  s := chr($feff) + 'ABC,DEF,GHI';
  ms := Tmemorystream.Create;
  ms.write(s[1], 24);  // write BOM plus Unicode chars
  ms.Position := 0;
  advstringgrid1.Delimiter := ',';
  advstringgrid1.LoadFromCSVStream(ms);  // shows ABC DEF GHI
  ms.Free;
end;

procedure TForm1.Button2Click(Sender: TObject);
var
  s: Unicodestring;
  ms: TMemoryStream;
begin
  advstringgrid1.clear;
  s := 'ABC,DEF,GHI';
  ms := Tmemorystream.Create;
  ms.write(s[1], 22);  // write valid Unicode chars to stream
  ms.Position := 0;
  advstringgrid1.Delimiter := ',';
  advstringgrid1.LoadFromCSVStream(ms);  // shows  A
  ms.Free;
end;

I really don't know how else to get my point across to you.  Hopefully the code above will demonstrate that in order to work correctly with a Unicode stream content, LoadFromCSVStream REQUIRES a BOM at the start.  Without it only a single letter is loaded.  The existing Unicode version of LoadFromCSV works PERFECTLY, because it silently preserves any existing BOM in the file.  I have NO COMPLAINT about the operation of the functions.  I have found NO BUGS.

The ONLY request I have is that you DOCUMENT how the LoadFromCSVStream is able to accept Unicode streams without an explicit parameter to tell it what the encoding is.  The documentation should explain that a BOM is REQUIRED if the stream encoding is Unicode.  The fact that this is not stated forced me to investigate how it worked in the source, and this led me astray (through my own initial error).
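
For anyone who finds this thread later, here is a minimal sketch of one way to build a Unicode stream that starts with a BOM, using only standard RTL calls (TEncoding.Unicode.GetPreamble and GetBytes). WriteUnicodeWithBOM is a hypothetical helper name, not part of the VCL or of any TMS unit, and the claim that LoadFromCSVStream then accepts the stream rests only on the Button1Click test above:

```pascal
program BomPrefixDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

// Hypothetical helper (not VCL, not TMS): writes the UTF-16 LE BOM
// followed by the UTF-16 LE bytes of Text.
procedure WriteUnicodeWithBOM(Stream: TStream; const Text: string);
var
  Preamble, Bytes: TBytes;
begin
  Preamble := TEncoding.Unicode.GetPreamble;  // the bytes $FF $FE
  Bytes := TEncoding.Unicode.GetBytes(Text);
  if Length(Preamble) > 0 then
    Stream.WriteBuffer(Preamble[0], Length(Preamble));
  if Length(Bytes) > 0 then
    Stream.WriteBuffer(Bytes[0], Length(Bytes));
end;

var
  ms: TMemoryStream;
begin
  ms := TMemoryStream.Create;
  try
    WriteUnicodeWithBOM(ms, 'ABC,DEF,GHI');
    ms.Position := 0;
    Writeln(ms.Size);  // 24: 2 BOM bytes + 11 characters * 2 bytes
    // At this point ms has the same content as the stream in the
    // Button1Click test and could be passed to LoadFromCSVStream.
  finally
    ms.Free;
  end;
end.
```

Computing the byte counts from the arrays avoids the hard-coded 22 and 24 used in the tests above.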

This "problem" is actually due to the way this is handled in the VCL.
You will see the same behavior when you do the following:


var
  s: unicodestring;
  ms: TMemoryStream;
  sl: TStringList;

begin
  s := 'ABC,DEF,GHI';
  ms := Tmemorystream.Create;
  ms.write(s[1], 22);  // write valid Unicode chars to stream
  ms.Position := 0;

  sl := TStringList.Create;
  sl.LoadFromStream(ms);
  showmessage(sl.Strings[0]);
  sl.Free;
end;

This will output A instead of ABC,DEF,GHI.


Bruno Fierens 2014-03-12 06:21:14

I give up. You will be pleased to know that this is my last posting to this thread.  One last time, there is no 'problem'.  I fully understand how streams work with content in different encodings.  But the TMS function is really clever and magically handles streams of different types without any obvious parameter to specify the content.  Unlike VCL functions which have such a parameter, it simply takes any stream, Unicode or ANSI, and works perfectly in both cases.  How can this be, since the streams have completely different content?  Nobody knows, because it is not documented!  So before using the function, your users have to either guess, or experiment and find only one byte of data is read, or eventually work out that a BOM is essential for passing in Unicode stream content.  All I am suggesting, to benefit your other users, is a dozen words of explanation, even only in the source code, of how this function was designed to work and how smart it is.  It is by no means obvious, even to someone with long experience of the VCL and TMS products.  Would this not be a worthwhile product improvement?

Sorry, I thought all along that you had a problem with this and were insisting it was a bug.

We'll make a note about this.