
Thursday 23 October 2008

Undocumented - WatchFTP stops after 10 fails

Please note: the current WatchFTP release (2.2.6, released February 14, 2010) has a new window where you can change the settings mentioned in this blog post.




FTP Server connections are unreliable

Connections to any Internet-based service, including FTP servers, are quite unreliable if a program expects to "look at it" 24 hours a day.

Your PC may be down, your internet connection may be down, the DNS server that translates "ftp.helloworld.com" to "208.xx.16.38" may be down, or the FTP server software itself may have crashed (or be overloaded).

So you see, a program such as our WatchFTP that intends to monitor an FTP server 24/7 needs to take care of this problem. It must not just "crash" when a connection or download fails, or should it?

Why the program *SHOULD* crash

I've listed a few reasons above why a connection to the FTP server may fail. I am quite sure there are a few hundred others, most notably: you entered the wrong userid/password for the FTP server. In such cases (wrong password), do you want the program to keep running? Trying again and again to log in with the wrong password? Giving you a nice green icon, suggesting everything is OK - no need to look in the logs?
Surely not! You only want to see the green icon if everything is OK.
Green Icon = No Action Required from you
Red Icon = Something is wrong - You must investigate!

When the program should *NOT* crash

When connection failures are temporary, you want WatchFTP to retry. Most connection problems (not, for example, password problems) will solve themselves after a few seconds and work can continue.

How WatchFTP solves this for "new tasks"

When you create a new task, or change an existing task, the WatchFTP user interface (wfcc.exe) writes a "bit" to the configuration file, telling the runtime component (wfrun.exe) that this is an "untrusted" task: a task that has never run successfully before.

The very first run of a "new task"

When you start the new task and wfrun.exe encounters an internet error, it looks at that "bit". If the bit is set (this is a new/untrusted task), wfrun writes the error to the log and exits. 99.9% of the time, the error will be caused by an invalid setting, such as a wrong password.

If the first run is successful, wfrun resets the "untrusted" bit, so the next time it connects to your FTP server, wfrun will behave as described in the next section.
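The decision described above can be sketched as follows. This is a minimal illustration, not WatchFTP's actual code: the `Task` class, its `untrusted` field and the `handle_run` function are hypothetical names, and `OSError` merely stands in for whatever "internet error" wfrun really detects.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """Hypothetical stand-in for a WatchFTP task and its "untrusted" bit."""
    name: str
    untrusted: bool = True  # set when the task is created or changed
    log: List[str] = field(default_factory=list)


def handle_run(task: Task, run_once: Callable[[], None]) -> str:
    """Return "ok", "stopped" (untrusted task failed) or "retry" (trusted task failed)."""
    try:
        run_once()
    except OSError as err:  # stand-in for "an internet error"
        if task.untrusted:
            # Never run successfully before: most likely a bad setting, so log and exit.
            task.log.append(f"{task.name}: {err}")
            return "stopped"
        # The task has proved itself before: hand over to the retry logic.
        return "retry"
    # First successful run: clear the bit so later errors are retried.
    task.untrusted = False
    return "ok"
```

The same failure therefore stops a brand-new task but only triggers a retry once the task has completed one successful run.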

Next Runs - The task already exists

This task has proved itself. A previous run has connected to your FTP server and downloaded some files. If this task encounters an error, it should ignore the error and retry.

Well, almost. Maybe the administrator of the FTP server has changed the permissions (password)? Should WatchFTP try to connect again and again?

Undocumented Settings

(Note that these settings are currently not available in the user interface of WatchFTP. Future versions may change this.)

On connection failures (for existing tasks, not for new tasks), wfrun retries several times (with a delay in between) to connect to the FTP server and download new/changed files.

The number of times to retry is a setting inside the "config file" (the file that contains all settings of this task; see below). If it is not explicitly set (and remember, the user interface doesn't have an option to set it), it defaults to 10 retries. If it fails 10 times, the task will stop.

Between each retry, the task will pause a while (to give the problem time to resolve itself). This pause is also read from the "config file" and, if not present, defaults to 60 seconds.
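A minimal sketch of this retry behaviour, written in Python rather than taken from WatchFTP itself. Only the setting names (`maxfailures`, `sleepafterfailure`) and their defaults come from this post; `read_retry_settings` and `run_with_retries` are hypothetical helpers, and the sketch assumes the config file parses as plain INI text and that failures are counted consecutively.

```python
import configparser
import time
from typing import Callable, Tuple

# Defaults described in this post.
DEFAULT_MAX_FAILURES = 10
DEFAULT_SLEEP_AFTER_FAILURE = 60  # seconds


def read_retry_settings(config_text: str) -> Tuple[int, int]:
    """Read maxfailures/sleepafterfailure from the [FTP] section, with defaults.

    Assumes the task's config file parses as plain INI text.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)
    max_failures = parser.getint("FTP", "maxfailures", fallback=DEFAULT_MAX_FAILURES)
    sleep_s = parser.getint("FTP", "sleepafterfailure", fallback=DEFAULT_SLEEP_AFTER_FAILURE)
    return max_failures, sleep_s


def run_with_retries(run_once: Callable[[], None], max_failures: int,
                     sleep_s: int, sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call run_once until it succeeds or fails max_failures times in a row.

    Returns True on success, False when the task should stop.
    """
    failures = 0
    while True:
        try:
            run_once()
            return True
        except OSError:  # stand-in for connection/download errors
            failures += 1
            if failures >= max_failures:
                return False  # give up: the task stops
            sleep(sleep_s)  # pause to give the problem time to resolve itself
```

Passing a fake `sleep` function makes it easy to test the loop without actually waiting.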

The above defaults give your internet connection/FTP server 10 minutes (10 retries * 60 seconds) to resolve its problems.
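In general, the time a task keeps retrying is roughly the product of the two settings. This is an approximation: it ignores how long each failed connection attempt itself takes (for example, waiting for a timeout), and whether a pause also follows the last failure is not documented here.

```latex
T_{\text{window}} \approx \texttt{maxfailures} \times \texttt{sleepafterfailure}
  = 10 \times 60\,\text{s} = 600\,\text{s} = 10\,\text{min}
```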

Changing the Defaults

Like I said earlier, these settings currently cannot be changed using the user interface of WatchFTP. You need to change the file containing the settings directly. Below, I describe how to change both settings for a task called MyTask.

  • Stop the task.
  • Right-click it - a popup menu opens.
  • In the popup menu, select Explore Task Directory
    (actually, the menu-entry below it showing MyTask as the last part of the path) - Windows Explorer opens inside the directory with log-files of MyTask.
  • This is one directory too deep; select the containing ("higher") directory in Explorer.
  • In this directory, you will find a file called MyTask.config. This is the file that contains all settings for the MyTask task.
  • Open this file with Notepad (not with a rich-text editor like MS-Word).

In Notepad, find the line that says

[FTP]

Immediately after it, on new lines, enter these 2 lines:

maxfailures=123
sleepafterfailure=10

so it becomes

[FTP]
maxfailures=123
sleepafterfailure=10
... the rest that was here before ...

(If you read this in a blog reader such as Google Reader, the formatting above may be wrong; please see the original post.)
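If you have many tasks to change, the manual edit can be scripted. The sketch below is a hypothetical helper, not part of WatchFTP: `add_retry_settings` inserts the two lines directly after the `[FTP]` line, just as done by hand above. It assumes a plain-text config file, does not check whether the settings are already present (so run it only once per file), and the task should be stopped first.

```python
from pathlib import Path


def add_retry_settings(config_path: Path, max_failures: int, sleep_s: int) -> None:
    """Insert maxfailures/sleepafterfailure right after the [FTP] line."""
    lines = config_path.read_text().splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == "[FTP]":
            if not line.endswith("\n"):
                lines[i] = line + "\n"  # [FTP] was the last line without a newline
            new_lines = [f"maxfailures={max_failures}\n",
                         f"sleepafterfailure={sleep_s}\n"]
            lines[i + 1:i + 1] = new_lines  # insert immediately after [FTP]
            config_path.write_text("".join(lines))
            return
    raise ValueError(f"no [FTP] line found in {config_path}")
```

For example, `add_retry_settings(Path("MyTask.config"), 123, 10)`, run from the directory that contains MyTask.config, makes the same change as the manual steps.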

With the settings above, wfrun retries a lot more (123 times instead of 10) and waits 10 seconds between retries (instead of the default 60 seconds). This allows about 20 minutes of downtime for your FTP server (123 retries * 10 seconds = 1,230 seconds, ~20 minutes).

So, if your FTP server is often unreachable for short periods, you will want to set sleepafterfailure to a low value (retry with short intervals).

If your FTP server is sometimes not reachable for long periods, you will want to set sleepafterfailure to a high value (retry with long intervals).

The maximum value for both settings is ~30,000: 30,000 retries, 30,000 seconds. Unless I miscalculate, that will give your FTP server ~28 years to recover ;-)
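The arithmetic behind that estimate, using a 365.25-day year:

```latex
30\,000 \times 30\,000\,\text{s} = 9 \times 10^{8}\,\text{s}
\qquad
\frac{9 \times 10^{8}\,\text{s}}{365.25 \times 86\,400\,\text{s/yr}}
  = \frac{9 \times 10^{8}}{31\,557\,600}\,\text{yr} \approx 28.5\,\text{yr}
```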
