RTC Forums
Topic: UTF8 encoded JSON response from host
JeffW
RTC License+


« on: June 23, 2021, 12:21:32 AM »

We have implemented RtcHttpClient to communicate with a web URL. For the most part this works, but it breaks down when handling accented characters: for example, é is returned as C3 83 C2 A9 (which displays as Ã©), which points to a problem with UTF-8 encoding/decoding. When I send the same payload using Talend API Tester, it returns é. We are using
Code:
procedure TMonerisCloudCommunications.RtcDataRequestDataReceived(Sender: TRtcConnection);
var
  DataClient: TRtcDataClient absolute Sender;

  strReceived: ansiString;
  utf8Received: String;
begin
  if DataClient.Response.Started then
  begin
    // This occurs until data from the request is fully received
  end;

  utf8Received := DataClient.Read;
  strReceived := Utf8Decode(utf8Received);

as per the forum entry with the subject "UniCode". This adds extra characters rather than interpreting what is there. The problem seems to be that the incoming JSON is not treated as UTF-8. Talend API Tester shows a content type of application/json; charset=utf-8. Is there a way to get around this and preserve the text as UTF-8? I also experimented with the UNICODE compiler directives, with no success.
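The bytes C3 83 C2 A9 are the typical signature of UTF-8 being encoded twice. A minimal sketch that reproduces this with the plain Delphi RTL (a standalone console program, assuming Delphi XE2 or later; it does not use the RTC units, and the Latin-1 code page 28591 is an assumption standing in for whatever single-byte interpretation happens in between):

```pascal
program DoubleEncodingDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Formats a byte array as space-separated hex pairs, e.g. 'C3 A9'
function BytesToHex(const B: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(B) do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(B[I], 2);
  end;
end;

var
  Latin1: TEncoding;
  Original, Misread: string;
  Utf8Bytes, DoubleBytes: TBytes;
begin
  Original := #$00E9;                     // 'é' (U+00E9)
  Latin1 := TEncoding.GetEncoding(28591); // ISO-8859-1; GetEncoding returns an instance the caller must free
  try
    Utf8Bytes := TEncoding.UTF8.GetBytes(Original);  // correct UTF-8: C3 A9
    Misread := Latin1.GetString(Utf8Bytes);          // bytes misread as Latin-1: 'Ã©' (2 chars)
    DoubleBytes := TEncoding.UTF8.GetBytes(Misread); // encoded again: C3 83 C2 A9
    Writeln('UTF-8 once:  ', BytesToHex(Utf8Bytes));
    Writeln('UTF-8 twice: ', BytesToHex(DoubleBytes));
  finally
    Latin1.Free;
  end;
end.
```

On Windows the second line printed is C3 83 C2 A9, the same bytes described above, so somewhere between the socket and the final string the UTF-8 data is being treated as single-byte text and then UTF-8 encoded again.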
D.Tkalcec (RTC)
Administrator


« Reply #1 on: June 23, 2021, 07:31:46 AM »

I see that you are using a Unicode String to read the content you have received, while at the same time using an AnsiString to decode that content into Unicode text. Please keep in mind that all data sent and received over TCP/IP and HTTP is always 8-bit (like AnsiString), but Unicode characters are 16- or 32-bit and require a UnicodeString or WideString in Delphi to be stored correctly in memory in decoded form (that is, after decoding and before encoding).

In other words, try this ...

procedure TMonerisCloudCommunications.RtcDataRequestDataReceived(Sender: TRtcConnection);
var
  DataClient: TRtcDataClient absolute Sender;

  dataReceived: AnsiString; // Data received over HTTP (always 8-bit), can be UTF-8 encoded text
  textReceived: UnicodeString; // Unicode String, required to store Unicode Text (after decoding it from UTF-8)

begin
  if DataClient.Response.Done then
  begin
    dataReceived := DataClient.Read;          // receive 8-bit data, in this case a UTF-8 encoded string
    textReceived := Utf8Decode(dataReceived); // decode UTF-8 encoded data to a native Unicode string
  end;
end;
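To make the 8-bit versus 16-bit distinction concrete, a small sketch (plain Delphi RTL, not part of RTC; it assumes Delphi XE2 or later, so that string is UTF-16 and the unit is named System.SysUtils) compares a string's size in memory with the size of its UTF-8 form on the wire:

```pascal
program ByteVsCharDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Original: string;   // UnicodeString (UTF-16) in Delphi 2009 and later
  Utf8Bytes: TBytes;  // 8-bit bytes, the form in which text travels over TCP/IP and HTTP
begin
  Original := 'RELEV' + #$00C9;                      // 'RELEVÉ', 6 characters
  Utf8Bytes := TEncoding.UTF8.GetBytes(Original);    // encode before sending
  Writeln('Chars in string: ', Length(Original));     // 6
  Writeln('Bytes per Char:  ', SizeOf(Char));         // 2
  Writeln('Bytes in memory: ', ByteLength(Original)); // 12
  Writeln('UTF-8 bytes:     ', Length(Utf8Bytes));    // 7 (É needs two bytes, C3 89)
end.
```

The same text occupies 12 bytes as a native string but 7 bytes as UTF-8, which is why the received 8-bit data has to be decoded into a Unicode string type rather than stored in an AnsiString.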


Best Regards,
Danijel Tkalcec
JeffW
RTC License+


« Reply #2 on: June 25, 2021, 02:50:31 AM »

Actually, that was a transposition of the variable types, sorry; I was doing it the way you showed, with no success. I'm unclear about which data type DataClient.Read actually returns. It is declared as RtcString, but how RtcString is defined in rtcTypes varies depending on the compiler directives, and I can't set a breakpoint there to see which definition is in effect. I am assuming it returns an AnsiString.

I also used UTF8ToString, as Utf8Decode/Utf8Encode are deprecated, and got different results. UTF8ToString does produce é from the received string, but does not decode É correctly. Using:

procedure TMonerisCloudCommunications.RtcDataRequestDataReceived(Sender: TRtcConnection);
var
  DataClient: TRtcDataClient absolute Sender;

  strReceived: UnicodeString;
  utf8Received: AnsiString;
begin
  if DataClient.Response.Done then
  begin
    // This is the final part of the request when it is done
    utf8Received := DataClient.Read;
    strReceived := UTF8ToString(utf8Received);
    // ...
  end;
end;


'RÃ©ponse' in the AnsiString becomes (correctly) 'Réponse'
but
'RELEVÃ?' becomes 'RELEV�?' when it should be 'RELEVÉ'

and using Utf8Decode instead produces:

'Réponse' is missing from the result, because the string is cut short at the point where the first É appears
'RELEVÃ?' becomes 'RELEV' and the string ends at that point.

It seems to be a decoding issue as such. I also tried RawByteString instead of AnsiString, with no change.
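One plausible explanation for why é survives while É does not (an assumption, not something confirmed in this thread) is a code-page conversion hidden in the assignment to AnsiString. If Read hands back each received byte as one Char, the second byte of É (0x89) becomes U+0089, which Windows-1252 has no byte for, while the second byte of é (0xA9) becomes ©, which it does have. The sketch below simulates that path with the plain Delphi RTL (Delphi XE2 or later); both steps and the code pages 28591 and 1252 are assumptions chosen for illustration, not a description of what RTC actually does internally:

```pascal
program LossyAnsiConversionDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Latin1, Win1252: TEncoding;
  Source, ByteChars: string;
  Utf8Bytes, AnsiBytes: TBytes;
  I: Integer;
begin
  Latin1 := TEncoding.GetEncoding(28591);  // ISO-8859-1: byte n <-> U+00nn, one to one
  Win1252 := TEncoding.GetEncoding(1252);  // assumed system ANSI code page (Western European)
  try
    Source := #$00E9#$00C9;                       // 'éÉ'
    Utf8Bytes := TEncoding.UTF8.GetBytes(Source); // C3 A9 C3 89
    // Assumed step 1: each received byte is held as one Char -> U+00C3 U+00A9 U+00C3 U+0089
    ByteChars := Latin1.GetString(Utf8Bytes);
    // Assumed step 2: assigning to AnsiString converts through code page 1252.
    // U+00A9 has a 1252 byte (A9); U+0089 has none, so Windows may substitute a default character.
    AnsiBytes := Win1252.GetBytes(ByteChars);
    for I := 0 to High(AnsiBytes) do
      Write(IntToHex(AnsiBytes[I], 2), ' ');
    Writeln;
  finally
    Win1252.Free;
    Latin1.Free;
  end;
end.
```

If the last byte comes out as 3F ('?') rather than 89, a UTF-8 decoder then sees C3 3F, which is not a valid sequence; that would be consistent with the 'RELEV�?' result above, and it would also explain why avoiding string types altogether fixes the problem.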
D.Tkalcec (RTC)
Administrator


« Reply #3 on: June 25, 2021, 09:02:14 AM »

Okay, then try this ...

uses // ...
   rtcTypes, rtcSystem; // make sure these two units are at the end of your "uses" clause
// ...
procedure TMonerisCloudCommunications.RtcDataRequestDataReceived(Sender: TRtcConnection);
var
  DataClient: TRtcDataClient absolute Sender;
  dataReceived: RtcByteArray;   // raw byte array
  textReceived: RtcWideString;  // closest match to a Unicode String type (=String in Delphi 2009+)
begin
  if DataClient.Response.Done then
  begin
    // This is the final part of the request when it is done
    dataReceived := DataClient.ReadEx;
    textReceived := Utf8DecodeEx(dataReceived);
    // ... process textReceived ...
  end;
end;

ReadEx returns a byte array and Utf8DecodeEx works on that byte array, so the content you receive should be preserved, regardless of the compiler you use. Let me know if this gives you the correct result.
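ReadEx and Utf8DecodeEx are RTC functions. As a rough analogue using only the Delphi RTL (TEncoding.UTF8.GetString is swapped in here purely for illustration, assuming Delphi XE2 or later), decoding straight from a byte array looks like this; the hard-coded bytes are a hypothetical stand-in for what ReadEx would return:

```pascal
program DecodeFromBytesDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Received: TBytes;  // stands in for the byte array returned by DataClient.ReadEx
  Decoded: string;
begin
  // 'RELEVÉ' as UTF-8, exactly as a server would send it: the É is C3 89
  Received := TBytes.Create($52, $45, $4C, $45, $56, $C3, $89);
  // Decode directly from bytes to UTF-16; no AnsiString or code-page conversion in between
  Decoded := TEncoding.UTF8.GetString(Received);
  Writeln(Length(Received), ' bytes -> ', Length(Decoded), ' chars');
  Writeln('Decoded correctly: ', Decoded = 'RELEV' + #$00C9);
end.
```

Because the bytes never pass through a string type before decoding, there is no step at which a code page can replace 0x89, which matches the reasoning above for using ReadEx with Utf8DecodeEx.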

Best Regards,
Danijel Tkalcec
JeffW
RTC License+


« Reply #4 on: June 26, 2021, 02:18:56 AM »

Thanks Danijel. That works absolutely fine. Appreciated.