Need some help debugging a strange condition

Andrew.W

RTC License

Posts: 43

« on: October 25, 2010, 08:35:13 PM »

Hi Daniel

I'm trying to debug a strange and infrequent problem with our server application. I don't think this is a bug in RTC, but I'm trying to find out what could cause the conditions I see.

Our application has a 'console' that is very simple, served as a web page via an RTC web server - all this web server does is serve simple console HTML. The application has 2 other RTC web servers handling user requests.

The problem I see is that the 'console' web pages stop returning HTML. If you try to browse, you get an empty page returned, or so it seems.

This is the code that sometimes returns a blank. What conditions might be arising to cause this? There is no proxy involved. Today I saw this whilst using a browser on the same machine as my server (ie, localhost).

Code:

procedure TConsoleIndex.handleCheckRequest(asoSender: TRtcConnection);
begin
	with asoSender as TRtcDataServer do
		if lowercase(Request.FileName)=WEBPAGE_INDEX then
			Accept;
end;

procedure TConsoleIndex.handleDataReceived(asoSender: TRtcConnection);
var
	lsPage: string;
begin
	with TRtcDataServer(asoSender) do
	if Request.Complete then
	begin
		lsPage := '<p><strong>' + loadString(1028) + '</strong></p>' + #$D#$A +
		'<p>' + makeHRef(WEBPAGE_MESSAGELOG, loadString(1029)) + '</p>' + #$D#$A +
		'<p>' + makeHRef(WEBPAGE_STATISTICS, loadString(1030)) + '</p>' + #$D#$A +
		'<p>' + makeHRef(WEBPAGE_QOS, loadString(1108)) + '</p>' + #$D#$A +
		'<p>' + makeHRef(WEBPAGE_APPSERVICES, loadString(1031)) + '</p>' + #$D#$A +
		'<p>' + makeHRef(WEBPAGE_ACCOUNTS, loadString(1083)) + '</p>' + #$D#$A +
		'<p>' + makeHRef(WEBPAGE_CONNECTIONS, loadString(1045)) + '</p>' + #$D#$A +
		'<p>' + makeHRef(WEBPAGE_ENGAGESESSIONS, loadString(1185)) + '</p>' + #$D#$A +
		'<p>' + makeHRef(WEBPAGE_DRSTATUS, loadString(1199)) + '</p>';

		Write(pageStart() + #$D#$A +
			lsPage + pageEnd());
	end;



end;


procedure TConsoleIndex.startup;
begin
	osoModule.osoIndex.OnCheckRequest := handleCheckRequest;
	osoModule.osoIndex.OnDataReceived := handleDataReceived;
end;



D.Tkalcec (RTC)

Administrator

Posts: 1881

Re: Need some help debugging a strange condition

« Reply #1 on: October 25, 2010, 08:47:55 PM »

You aren't catching exceptions in your "handleDataReceived" method (I guess it's the "OnDataReceived" event). If an exception happens there, the Server will close the connection and the Browser will get to see an empty page.

To find out what is causing these exceptions, place your code inside a try/except block and write the exception class and message out with Write(). Here is an example of how I would write the "handleDataReceived" method ...

Code:

procedure TConsoleIndex.handleDataReceived(asoSender: TRtcConnection);
  var
    lsPage: string;
  begin
  with TRtcDataServer(asoSender) do
    if Request.Complete then
      begin
      try
        lsPage := '<p><strong>' + loadString(1028) + '</strong></p>' + #$D#$A +
            '<p>' + makeHRef(WEBPAGE_MESSAGELOG, loadString(1029)) + '</p>' + #$D#$A +
            '<p>' + makeHRef(WEBPAGE_STATISTICS, loadString(1030)) + '</p>' + #$D#$A +
            '<p>' + makeHRef(WEBPAGE_QOS, loadString(1108)) + '</p>' + #$D#$A +
            '<p>' + makeHRef(WEBPAGE_APPSERVICES, loadString(1031)) + '</p>' + #$D#$A +
            '<p>' + makeHRef(WEBPAGE_ACCOUNTS, loadString(1083)) + '</p>' + #$D#$A +
            '<p>' + makeHRef(WEBPAGE_CONNECTIONS, loadString(1045)) + '</p>' + #$D#$A +
            '<p>' + makeHRef(WEBPAGE_ENGAGESESSIONS, loadString(1185)) + '</p>' + #$D#$A +
            '<p>' + makeHRef(WEBPAGE_DRSTATUS, loadString(1199)) + '</p>';
        Write(pageStart() + #$D#$A + lsPage + pageEnd());
      except
        on E:Exception do
          Write('<html><body>'+E.ClassName+': '+E.Message+'</body></html>');
        end;
      end;
  end;

This won't fix the exceptions, but it should help you narrow down possible problem sources. You can also use the "Log()" or "xLog()" method from the "rtcLog.pas" unit to write the exception into a LOG file (stored in the "LOG" sub-folder inside the folder where the Server EXE lies). This will help you notice exceptions raised when someone else is using the Server ...

Code:

uses ...
   ,rtcLog;
...
      except
        on E:Exception do
          begin
          Log('handleDataReceived', E);
          Write('<html><body>'+E.ClassName+': '+E.Message+'</body></html>');
          end;
        end;

Best Regards,
Danijel Tkalcec



Andrew.W

RTC License

Posts: 43

Re: Need some help debugging a strange condition

« Reply #2 on: October 28, 2010, 03:18:12 PM »

Thanks for the reply, I'm making changes to the app.

I have a follow-up question.

Suppose one of my OnDataReceived handlers blocks, tying up an RTC thread, and clients keep calling, tying up more threads.

What will happen? Is there some event I can connect to so that I will know if, for example, RTC has run out of threads or other resources?

BTW, I am trying very hard to make sure nothing blocks, but am looking for ways of understanding what is happening.



D.Tkalcec (RTC)

Administrator

Posts: 1881

Re: Need some help debugging a strange condition

« Reply #3 on: October 28, 2010, 05:45:34 PM »

If your own code running inside an event called by the RTC SDK takes a long time to execute, it will block the RTC thread. If new clients come to the Server and the RTC thread pool has free threads, everything will be fine and you don't have to do anything special. But if you get more clients executing long-running code inside RTC events and blocking threads, all threads from the thread pool could become busy. If that happens, no new client connections will be accepted by the RTC SDK until at least one thread becomes idle.

You can set global variables in the rtcThrPool unit to your preferred thread limits, but keep these values low: threads are a resource shared among all processes, and a single app using up all threads could crash Windows or make other apps stop responding. The Windows NT thread limit is somewhere around 2000, but your app should NOT try to use more than 1500 threads, even if it is the only app running on the PC.
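
One way to see whether a handler is holding an RTC thread for too long is to time it and log only the slow calls. The following is a minimal sketch, not part of the RTC SDK: the threshold constant and the buildIndexPage helper are hypothetical, and it assumes rtcLog offers a Log() overload that takes a single string.

```pascal
uses
  Windows, SysUtils, rtcConn, rtcDataSrv, rtcLog;

const
  SLOW_HANDLER_MS = 2000; // hypothetical threshold, tune to your system

procedure TConsoleIndex.handleDataReceived(asoSender: TRtcConnection);
var
  lnStart, lnElapsed: Cardinal;
begin
  with TRtcDataServer(asoSender) do
    if Request.Complete then
    begin
      lnStart := GetTickCount;
      try
        // buildIndexPage is a hypothetical helper returning the page HTML
        Write(buildIndexPage());
      finally
        // Unsigned subtraction; assumes overflow checking ($Q) is off,
        // so the result stays correct when the tick counter wraps
        lnElapsed := GetTickCount - lnStart;
        if lnElapsed >= SLOW_HANDLER_MS then
          // Assumption: a single-string Log() overload exists in rtcLog
          Log('handleDataReceived took ' + IntToStr(lnElapsed) + ' ms');
      end;
    end;
end;
```

Because only calls above the threshold are written, the log stays small on a healthy system while still showing which handlers tie up threads.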

Best Regards,
Danijel Tkalcec



Andrew.W

RTC License

Posts: 43

Re: Need some help debugging a strange condition

« Reply #4 on: November 15, 2010, 08:30:57 PM »

I'm still no nearer debugging this. I have spent a long time creating a logging service, and have put exception handling into all my RTC data functions.

Daniel, can you help me with the following: what can I monitor in RTC?

I'm interested in the size of the thread pool so I can write it to file, and any other key events I can write to file so I can understand what is going on.



D.Tkalcec (RTC)

Administrator

Posts: 1881

Re: Need some help debugging a strange condition

« Reply #5 on: November 15, 2010, 09:02:15 PM »

If you declare the RTC_DEBUG compiler directive for your project and rebuild it, RTC SDK will be writing logs in case anything unexpected happens in the RTC SDK. You will still need to log your own exceptions (as I've explained in my other post).

Best Regards,
Danijel Tkalcec



Andrew.W

RTC License

Posts: 43

Re: Need some help debugging a strange condition

« Reply #6 on: November 15, 2010, 09:09:56 PM »

Thanks for the speedy reply, but I know I won't get permission to do that as it's too big a change. It's something that will be applied to a live system.

We have a logging system, so what I'm hoping is that there is some way I can monitor things using the existing logging system. For instance, how many threads are in use.



D.Tkalcec (RTC)

Administrator

Posts: 1881

Re: Need some help debugging a strange condition

« Reply #7 on: November 15, 2010, 09:41:16 PM »

Even if you had access to the number of active threads (which you don't), logging the number of active threads in real time, even by writing only the changes, would result in a really HUGE log file.

If you think the problem is caused by the RTC SDK, you should either declare the RTC_DEBUG compiler directive or set one or all of the following global variables to true ...

rtcCliModule unit:
- LOG_CLIENTMODULE_ERRORS

rtcConn unit:
- LOG_TIMEOUT_DISCONNECTS

rtcInfo unit:
- LOG_INFO_ERRORS

rtcLog unit:
- LOG_AV_ERRORS

rtcSockBase unit:
- LOG_SOCKET_ERRORS
- LOG_MESSAGE_ERRORS
- LOG_EVENT_ERRORS
- LOG_REFUSED_CONNECTIONS
- LOG_PLUGIN_ERRORS

rtcSockBaseSrvProv unit:
- LOG_REFUSED_CONNECTIONS

rtcThrPool unit:
- LOG_THREAD_EXCEPTIONS

rtcTimer unit:
- LOG_TIMER_EXCEPTIONS

rtcWInetHttpCliProv unit:
- LOG_WINET_ERRORS

rtcWinHttpCliProv unit:
- LOG_WINHTTP_ERRORS

And ... if you think the problem is somewhere in your code, you should use try/except blocks around any code which you think might be causing problems and log all exceptions you catch there.
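
As a rough sketch of the second option, the flags can be switched on once at startup, before the Servers start listening. The procedure name is hypothetical, only a server-side subset of the variables listed above is shown, and the unit that declares each variable should be checked against your RTC SDK version.

```pascal
uses
  rtcConn, rtcInfo, rtcLog, rtcThrPool, rtcTimer;

// Hypothetical helper: call once during application startup.
procedure EnableRtcDiagnostics;
begin
  LOG_TIMEOUT_DISCONNECTS := True; // rtcConn
  LOG_INFO_ERRORS := True;         // rtcInfo
  LOG_AV_ERRORS := True;           // rtcLog
  LOG_THREAD_EXCEPTIONS := True;   // rtcThrPool
  LOG_TIMER_EXCEPTIONS := True;    // rtcTimer
end;
```

Setting variables this way does not require the RTC_DEBUG compiler directive, which may make it easier to justify on a live system.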

Best Regards,
Danijel Tkalcec



D.Tkalcec (RTC)

Administrator

Posts: 1881

Re: Need some help debugging a strange condition

« Reply #8 on: November 15, 2010, 09:49:42 PM »

Btw ... did you try setting MultiThreaded to FALSE (on the TRtcHttpServer component on the Server and TRtcHttpClient on the Client) to see if your problem persists? If your problems vanish after you set MultiThreaded to FALSE, chances are high that your code (methods called from RTC events) isn't thread-safe.

I don't know what you are doing, but accessing global objects from RTC events in a multi-threaded Server will result in serious problems. For example, this happens if you are using a single Data Module for all database access, or if you are calling code which reads and writes global variables.
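
To illustrate the kind of protection shared state needs, here is a small stand-alone console sketch that guards a global counter with a critical section from the standard SyncObjs unit. All names in it are hypothetical and it does not use the RTC SDK; the same pattern would apply to any global touched from RTC events.

```pascal
program SharedCounterDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, SyncObjs;

var
  goLock: TCriticalSection;     // guards gnRequestCount
  gnRequestCount: Integer = 0;  // shared global state

// Safe to call from any thread: the lock serialises access to the global.
function NextRequestNumber: Integer;
begin
  goLock.Acquire;
  try
    Inc(gnRequestCount);
    Result := gnRequestCount;
  finally
    goLock.Release;
  end;
end;

begin
  goLock := TCriticalSection.Create;
  try
    WriteLn(NextRequestNumber);
    WriteLn(NextRequestNumber);
  finally
    goLock.Free;
  end;
end.
```

The try/finally around the protected section matters: if an exception escaped while the lock was held, every later caller would block on Acquire, which would look very much like a Server that silently stops responding.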

Best Regards,
Danijel Tkalcec



Andrew.W

RTC License

Posts: 43

Re: Need some help debugging a strange condition

« Reply #9 on: November 17, 2010, 05:52:06 PM »

Hi Daniel

This is the best summary of the situation I can give:

- System is live and for the most part working well over many days. I cannot make significant changes.
- Reliability is good. Typically 1,000,000 transactions per week.
- The system has 3 RTC web servers. A main web server, a console used for monitoring, and an 'internal' web server used for requests made on the server machine itself.
- The problem is infrequent, but does happen. For example, it can be 3 weeks between such episodes.
- For no reason that I can see, the console web server can just stop working. By working, I mean it does not process any requests. I have put exception handling around every piece of code, and this is linked to a disk logging system. However, no exceptions are raised. RTC just says 'no'. Even a single page that says 'hello world' is not returned.
- No proxy issues. If you log onto the server and access directly via the web server port you get the same thing.

What I need is information - something to tell me what RTC is doing, and what has happened to the server so that I can make adjustments. I cannot try different things on a live system, I need information.



D.Tkalcec (RTC)

Administrator

Posts: 1881

Re: Need some help debugging a strange condition

« Reply #10 on: November 17, 2010, 06:12:05 PM »

Does the RTC Server continue receiving connections and calling the "OnCheckRequest" event for each request but refuse to respond to them, or does it stop receiving connections and/or requests?

Add logging to OnConnect, OnClientConnect, OnDisconnect and OnClientDisconnect events and write out Sender.PeerAddr, Sender.PeerPort as well as Sender.TotalConnectionCount properties there. That will tell you how many connections are currently in use and will show you if clients are able to connect or not.
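
A minimal sketch of such logging could look like the code below. The helper and handler names are hypothetical, and it assumes that PeerAddr and PeerPort are string-typed, that TotalConnectionCount returns an integer, and that rtcLog offers a Log() overload taking a single string. Replace Log() with your own logging service if you prefer.

```pascal
uses
  SysUtils, rtcConn, rtcLog;

// Hypothetical helper: writes one line per connection event.
procedure LogConnEvent(const asEvent: string; asoSender: TRtcConnection);
begin
  Log(asEvent + ' peer=' + asoSender.PeerAddr + ':' + asoSender.PeerPort +
    ' total=' + IntToStr(asoSender.TotalConnectionCount));
end;

procedure TConsoleIndex.handleClientConnect(asoSender: TRtcConnection);
begin
  LogConnEvent('OnClientConnect', asoSender);
end;

procedure TConsoleIndex.handleClientDisconnect(asoSender: TRtcConnection);
begin
  LogConnEvent('OnClientDisconnect', asoSender);
end;

// Assign the handlers to the OnClientConnect and OnClientDisconnect
// events of your Server component, next to the assignments in "startup".
```

If the log keeps showing new OnClientConnect lines while pages come back empty, clients are still getting through and the problem lies in request handling; if the lines stop, the Server is no longer accepting connections.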

I also strongly recommend that you enable the RTC_DEBUG compiler directive to catch possible exceptions inside the RTC SDK. If you are completely clueless about what is happening, enabling RTC debugging could shed some light on the problem. If your system is stable, RTC should not generate a large LOG file.

Best Regards,
Danijel Tkalcec



D.Tkalcec (RTC)

Administrator

Posts: 1881

Re: Need some help debugging a strange condition

« Reply #11 on: November 20, 2010, 01:58:10 AM »

Another thing ... did I understand you right that you are having problems with the console Server and that you are using that Server for monitoring the status of other Servers? If that is correct, are you doing this by sending Client requests from the console Server (maybe to your other Servers or to another PC)?

If you are using Client connections from a Server (console or any other), you need to be careful to keep your Client connection count low. Use permanent direct connections to the Server(s) you are working with. In other words, avoid opening and closing Client connections frequently. Also avoid forcing a Server-side connection close wherever possible (for example by using very long Timeout values on the Server).

Should you open and close Client connections to other Servers very often, or force-close connections received on the Server too often, you can get into a situation where WinSock runs out of port numbers and your Server as well as Client connections simply stop working. This condition will usually fix itself within 2 hours (once WinSock decides to free up unused ports), but during these 2 hours your Server app and any other app using WinSock on the same PC will stop accepting new incoming connections and be unable to open new outgoing connections.

If you are using Client connections inside a Server app, do NOT create Client connections dynamically for every request you make, but instead create a fixed pool of connections to the Server(s) you need to talk to, use the AutoConnect:=True property on the rtcHttpClient component and high Timeout values (on the Client as well as Server) to keep these connections open for as long as possible.

If you aren't using Client connections inside your Server, then the problem could be that the Server is force-closing idle connections very often (connections stay idle longer than your Timeout periods), or that your Clients are opening and closing connections too often (maybe even for each request). In that case, you should change your Clients to use AutoConnect:=True (do NOT call Connect explicitly) and set relatively high Timeout values on the Client and the Server (at least 120 seconds).
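
For the case where Client connections are used inside a Server app, a fixed pool might be set up roughly as sketched below. The pool size, variable and procedure names are hypothetical, and the sketch assumes TRtcHttpClient lives in the rtcHttpCli unit and exposes string-typed ServerAddr and ServerPort properties; check both against your RTC SDK version. Choosing a free connection from the pool and setting the Timeout values are left out.

```pascal
uses
  SysUtils, rtcHttpCli;

const
  POOL_SIZE = 4; // hypothetical, keep the connection count low

var
  gaoClients: array[0..POOL_SIZE - 1] of TRtcHttpClient;

// Create the connections once at startup and reuse them for every request.
procedure CreateClientPool(const asAddr, asPort: string);
var
  i: Integer;
begin
  for i := Low(gaoClients) to High(gaoClients) do
  begin
    gaoClients[i] := TRtcHttpClient.Create(nil);
    gaoClients[i].ServerAddr := asAddr; // assumed property name
    gaoClients[i].ServerPort := asPort; // assumed property name
    gaoClients[i].AutoConnect := True;  // connect on demand, never call Connect
  end;
end;

// Release the connections once at shutdown.
procedure FreeClientPool;
var
  i: Integer;
begin
  for i := Low(gaoClients) to High(gaoClients) do
    FreeAndNil(gaoClients[i]);
end;
```

The point of the design is that the number of sockets opened by the Server app stays constant, so WinSock ports are not consumed by connections that are opened and closed for each request.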

Best Regards,
Danijel Tkalcec


