I'm in the process of completing an all-new program for use with Microsoft Outlook 2002 and Outlook
2000 called WOPR Junk Mail Remover. As its name implies, the program is an enhanced junk mail filtering program that is
designed specifically for use with Microsoft Outlook 2002/2000.
The program uses Bayesian statistics incorporated into some
fairly simple algorithms to generate what is called a "Bayesian" or "Content-Based" e-mail filter. As new e-mail arrives in
your Inbox, it is broken down into tokens (i.e. all of its individual words), including any HTML tags, RTF tags, embedded
JavaScript, and Internet Headers, and then all of those tokens/words are compared statistically against a corpus/dictionary
of known "good" (i.e. "Non-Spam") words and known "bad" (i.e. "Spam") words. The combined probability of the most interesting
"good" and "bad" words in the e-mail message is then used to determine if the newly arrived e-mail is junk mail or not. The
filtering process works so well that it's almost scary!
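
To make the scoring step above concrete, here is a rough Python sketch of one common way to combine token probabilities in a content-based filter. It is only an illustration and not the program's actual code: the probability table, the 0.4 value for unknown words, and the limit of 15 "interesting" tokens are all assumptions made up for the example.

```python
import math
import re

# Hypothetical per-token spam probabilities. A real filter builds these by training.
TOKEN_SPAM_PROB = {
    "viagra": 0.99, "free": 0.90, "click": 0.85,
    "meeting": 0.05, "report": 0.10,
}
UNKNOWN_PROB = 0.4     # assumed probability for tokens never seen before
MAX_INTERESTING = 15   # assumed number of "most interesting" tokens to keep


def tokenize(text):
    """Split raw message text into lowercase word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def spam_probability(text):
    """Combine the most interesting token probabilities into one score."""
    probs = [TOKEN_SPAM_PROB.get(t, UNKNOWN_PROB) for t in set(tokenize(text))]
    # "Interesting" means far from the neutral value of 0.5, in either direction.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:MAX_INTERESTING]
    spam_product = math.prod(probs)
    good_product = math.prod(1.0 - p for p in probs)
    return spam_product / (spam_product + good_product)


if __name__ == "__main__":
    print(spam_probability("Click here for FREE viagra"))
    print(spam_probability("Meeting report attached"))
```

A message with no tokens at all scores exactly 0.5 in this sketch, since both products are then 1.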
Since the Bayesian Filter is customized to each user's
individual e-mail habits, it must be trained (or taught) to know what e-mails are Non-Spam and which are Spam on each user's
system. Thus, when you first install the program the Bayesian Filter will need to be trained by selecting a "Mark Message as
Junk Mail" or "Mark Message as Non-Junk Mail" toolbar button when new e-mail arrives. After you get about five or six
messages trained/marked, the program will start to take off and it will begin filtering your newly arrived mail. However, the
more you train the filter, the smarter it gets. I suggest a minimum of 50 Spam and 50 Non-Spam trainings to start with, but
more than that is better still (I currently have around 300 of each trained and sitting in my corpus/dictionary).
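
As a rough picture of what "training" means here, the sketch below keeps two word counts (Spam and Non-Spam) and derives a per-word probability from them. The class name `Corpus`, its methods, and the clamping to the range 0.01 to 0.99 are assumptions for illustration only, not a description of the program's real database.

```python
from collections import Counter


class Corpus:
    """Hypothetical training store: counts how many messages contain each token."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.good_tokens = Counter()
        self.spam_messages = 0
        self.good_messages = 0

    def train(self, tokens, is_spam):
        """Record one marked message. Each token counts once per message."""
        if is_spam:
            self.spam_tokens.update(set(tokens))
            self.spam_messages += 1
        else:
            self.good_tokens.update(set(tokens))
            self.good_messages += 1

    def token_spam_probability(self, token):
        """Estimate how strongly a token suggests Spam, from the counts so far."""
        if not self.spam_messages or not self.good_messages:
            return 0.5  # not enough training yet to say anything
        spam_freq = self.spam_tokens[token] / self.spam_messages
        good_freq = self.good_tokens[token] / self.good_messages
        if spam_freq + good_freq == 0:
            return 0.5  # token never seen in either kind of mail
        p = spam_freq / (spam_freq + good_freq)
        # Clamp so that one word can never decide the outcome on its own.
        return min(0.99, max(0.01, p))
```

With only a handful of messages trained, most words fall back to the neutral 0.5, which is one way to see why more training makes the filter sharper.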
To back up the Bayesian Filter, there is a complete set of secondary filters that can be used to filter an incoming
e-mail message based on all of the normal items (actually the program was originally written around these filters and I only
recently added in the Bayesian filter. However, since the Bayesian Filter worked so well it quickly became the focal point of
the program). For example, you can define custom filters (using the included "Create New Filter Wizard" tool) to filter
incoming messages based on the sender's user name, the sender's domain name, the message's subject content, the message's
text/body content, the message's Internet Header fields, the message's country of origin, etc. All of the standard (i.e.
"built-in") and "user-defined" filters are managed via the program's Options dialog box (which is accessed via a button on the
program's main toolbar).
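
The secondary filters described above amount to simple rules tested against one field of the message. A minimal sketch of that idea follows. The `Filter` class, the field names, and the dictionary message format are hypothetical and only stand in for whatever the program stores internally.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    """Hypothetical user-defined rule: match text within one message field."""
    name: str
    field: str      # e.g. "sender", "domain", "subject", "body"
    contains: str   # text to look for, compared case-insensitively

    def matches(self, message):
        value = message.get(self.field, "")
        return self.contains.lower() in value.lower()


def first_matching_filter(message, filters):
    """Return the name of the first filter that traps the message, or None."""
    for rule in filters:
        if rule.matches(message):
            return rule.name
    return None


if __name__ == "__main__":
    rules = [
        Filter("Blocked domain", "domain", "spam.example"),
        Filter("Subject keyword", "subject", "lose weight"),
    ]
    msg = {"domain": "mail.example", "subject": "LOSE WEIGHT fast"}
    print(first_matching_filter(msg, rules))
```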
The program also allows you to define a white list or a "Friends" list where anyone on
that list automatically bypasses all of the filters so you are assured of getting their e-mail without it getting trapped by
one of the filters (this also has the advantage of speeding up the program since mail from friends doesn't have to be
filtered). The program alerts you when new e-mail arrives and keeps a running total of the numbers of mails that have arrived
(i.e. accepted e-mail, e-mail from friends, possible junk mail, and confirmed junk mail). The e-mail is flagged in Outlook
(using standard red and white Outlook flags) with the reason why it was filtered and its mail status icon is changed
accordingly so that you can easily identify filtered mail. Confirmed junk mail is automatically moved into a "Junk Mail"
folder and the folder can be set up to automatically purge its contents after a set number of days have passed since the message
was first received/filtered (and the Junk Mail folder's location can be changed as well).
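
The flow described above (friends bypass the filters, everything else is sorted by score, and old junk is purged) can be sketched roughly as follows. The function names, the two score thresholds, and the dictionary message format are assumptions chosen for the example, not the program's real settings.

```python
from datetime import datetime, timedelta

# Assumed thresholds for illustration only.
JUNK_THRESHOLD = 0.9
POSSIBLE_THRESHOLD = 0.5


def classify(sender, spam_score, friends):
    """Sort a message into one of the four running-total categories."""
    if sender.strip().lower() in friends:
        return "friend"  # friends skip all filtering
    if spam_score >= JUNK_THRESHOLD:
        return "confirmed junk"
    if spam_score >= POSSIBLE_THRESHOLD:
        return "possible junk"
    return "accepted"


def purge_expired(junk_folder, now, max_age_days):
    """Keep only the junk messages received within the last max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [m for m in junk_folder if m["received"] >= cutoff]


if __name__ == "__main__":
    friends = {"bob@example.com"}
    print(classify("Bob@Example.com", 0.99, friends))
    junk = [{"subject": "old", "received": datetime(2003, 1, 1)}]
    print(purge_expired(junk, datetime(2003, 1, 31), 14))
```

Checking the Friends list first is what gives the speed benefit mentioned above: a friend's message never reaches the scoring step at all.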
There's even a neat
little "Message Details" tool that lets you view the plain text content of the message along with all of its Internet
Headers. The tool even has an option for showing you a "Word Analysis" of the message where the individual tokens/words in
the message are colorized based on their probability of being Spam or not. That way you can get a bird's eye view of how the
Bayesian filter sees the message. I personally hate opening Spam because 99% of it is written in HTML format and Outlook
always tries to access the Internet in order to download the graphics for the HTML messages. This little tool stops that from
happening as all you see is the plain text of the message with all of the HTML stripped out (which is great for looking at
those messages that you don't really know if they are Spam or not).
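
For readers curious how a plain-text view like this can work, here is a small sketch using Python's standard `html.parser` module. It only pulls the visible text out of the markup and never fetches anything from the Internet. The `TextExtractor` class and the `shade` color bands are hypothetical and are not taken from the program.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring tags and script/style contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the message text with all HTML stripped and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def shade(spam_prob):
    """Pick a display color for a token (assumed bands, for illustration)."""
    if spam_prob >= 0.8:
        return "red"
    if spam_prob <= 0.2:
        return "green"
    return "gray"


if __name__ == "__main__":
    sample = '<h1>Big Sale</h1><img src="http://ads.example/x.gif"><p>Buy <b>now</b>!</p>'
    print(html_to_text(sample))
```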
The program has another useful little tool
that sends an "Unknown User" error message back to the originator of the junk mail message informing the Spammer that the
e-mail address they have sent their Spam to is invalid (even though it really isn't). That way, the Spammer will think that
your e-mail address is invalid and will hopefully remove you from their list. A standard error message is provided, but it
can be fully customized, and the default action of any filter can be set up to automatically send the error message to the
sender or you can do it manually from the program's main toolbar at any time.
If all of this sounds like a lot, it
is... The program is quite a piece of work in my opinion and I wouldn't want to be without it. It's very addicting...
Anyway, as I've already mentioned, the program will run under Windows 98 and Outlook 2000, but it really shines when
you install it on a system running Windows Me, Windows 2000, or Windows XP (Home Edition or Pro) and Outlook 2002. The reason
for this is that the program does some pretty fancy API work to display information messages via the Windows System Tray and
that feature only works on systems with version 5.0 or greater of the Windows Shell Library (i.e. any of Microsoft's O.S.'s
later than Windows 98 and NT 4.0). It also looks a lot better if you are running more than 256 colors for your screen
display (although it has been tested and will run just fine on 16 color displays and Shell versions less than 5.0. In those
cases, the info messages are displayed via the Office Assistant instead.).
The program includes an Installer and
an Uninstaller so installation should be a total breeze (just follow the on-screen prompts). After you've installed the
program, and you run Outlook for the first time, the program will build its backend database and then prompt you to create
an account (since there's no programmatic way of getting the account info from Outlook itself). The account info is simply
used by the program to determine your e-mail addresses (for filtering purposes) and to send the "Unknown User" error messages
that I told you about above. The "Unknown User" Error Message feature requires that you have a POP3 or IMAP account in order
to deliver the error message anonymously (again, since Outlook itself doesn't allow such things). Once you've set up your
accounts, the program will prompt you to import your address books into your "Friends" list. After that, you are ready to
begin training the program (which is done by selecting one or more messages and clicking on the "Mark Message as Junk Mail"
or "Mark Message as Non-Junk Mail" buttons on the program's main toolbar). Once you've trained it enough, just sit back and
watch it filter all of that junk mail. It's absolutely amazing!
I've tried really hard to keep the filtering process
as fast as possible, but as you might expect, filtering a large number of messages can take some time (since each incoming
message must be broken down into its individual words and then compared against the "good" and "bad" word dictionaries,
etc.). Thus, it works best if you let Outlook continually grab your mail every so often.
Anyway, after many months
of hard work I think that I'm finally at a stage where I could use some other folks looking at the new "WOPR Junk Mail
Remover" program for me. While it still has some minor flaws and still needs a few more loose ends tied up, all in all, I
feel that it is working quite spectacularly. In the last week alone, it has removed over 1,000 pieces of junk mail from my
Inbox with less than 1% false positives (and the only reason I think I'm getting the false positives is because I don't
receive enough "good" e-mail to train the filter as well as I can with all of the "bad" e-mail I get. Once I can get an even
balance of "good" and "bad" e-mail trained, I think it will be near flawless in its filtering).
Thus, I'm looking
for a very limited number of beta testers who would be interested in helping me test out the new program and get it ready for
final release. I'm only looking for around 10 or so serious testers who would have the time available to test the program
over the next two weeks and would be willing to provide me with some useful feedback (I'm not only interested in finding
bugs, but I'd like to know how you feel about the user interface, if there is anything I can do to make the program easier to
use/understand, etc.). If you are interested in testing the program, please drop me a line at firstname.lastname@example.org letting me know
your current operating system (including any installed service packs), your current version of Office/Outlook (including any
installed service releases), and what mode you are running Outlook in if you are running Outlook 2000 (i.e. Internet Mail
Only (IMO) mode or Corporate or Workgroup (C/W) mode). Please remember that I can't accept everyone as the available testing
slots are very limited.
I'm looking for one or two of the following testers:
1. Someone running Outlook 2002/2000 on Microsoft Exchange Server (under any O.S.).
2. Someone running Outlook 2000 in Corporate or Workgroup mode (under any O.S.).
3. Someone running Outlook 2000/2002 under Windows 98.
4. Someone running Outlook 2000/2002 under Windows 98 SE.
5. Someone running Outlook 2000/2002 under Windows XP Home Edition.
6. Someone running Outlook 2000/2002 under Windows XP Home Edition SP1.
7. Someone running Outlook 2000/2002 under Windows XP Pro.
8. Someone running Outlook 2000/2002 under Windows XP Pro SP1.
9. Someone running Outlook 2002 SP2 (under any O.S.).
10. Someone running Outlook 2000 with WordMail enabled (under any O.S.).
11. Someone running Outlook 2002/2000 with a dial-up connection (under any O.S.).
I'm also looking for feedback on how the program looks at a variety of screen colors and screen resolutions, so if you don't mind
flipping your system between 16, 256, high color, and true color, or between 640x480, 800x600, and/or 1024x768 or higher
screen resolutions, then you'd make a good testing candidate as well.
Thanks for your interest in the new program
and I look forward to hearing your comments...