7ff2d050f3
which will merge into the main branch fairly easily
546 lines
16 KiB
Perl
546 lines
16 KiB
Perl
#!perl -w
|
|
|
|
=head1 NAME
|
|
|
|
dspam - dspam integration for qpsmtpd
|
|
|
|
=head1 DESCRIPTION
|
|
|
|
Uses dspam to classify messages. Use B<spamassassin>, B<karma>, and B<naughty>
|
|
to train dspam.
|
|
|
|
Adds the X-DSPAM-Result and X-DSPAM-Signature headers to messages. The latter is essential for
|
|
training dspam and the former is useful to MDAs, MUAs, and humans.
|
|
|
|
Adds a transaction note to the qpsmtpd transaction. The note is a hashref
|
|
with at least the 'class' field (Spam,Innocent,Whitelisted). It will normally
|
|
contain a probability and confidence rating.
|
|
|
|
=head1 TRAINING DSPAM
|
|
|
|
Do not just enable dspam! Its false positive rate when untrained is high. The
|
|
good news is; dspam learns very, very fast.
|
|
|
|
To get dspam into a useful state, it must be trained. The best method way to
|
|
train dspam is to feed it two large equal sized corpuses of spam and ham from
|
|
your mail server. The dspam authors suggest avoiding public corpuses. I train
|
|
dspam as follows:
|
|
|
|
=over 4
|
|
|
|
=item learn from SpamAssassin
|
|
|
|
See the SPAMASSASSIN section.
|
|
|
|
=item periodic training
|
|
|
|
I have a script that searches the contents of every users maildir. Any read
|
|
messages that have changed since the last processing run are learned as ham
|
|
or spam.
|
|
|
|
The ham message list consists of read messages in any folder not named like
|
|
Spam, Junk, Trash, or Deleted. This catches messages that users have read
|
|
and left in their inbox or filed away into subfolders.
|
|
|
|
=item on-the-fly training
|
|
|
|
The dovecot IMAP server has an antispam plugin that will train dspam when
|
|
messages are moved to/from the Spam folder.
|
|
|
|
=back
|
|
|
|
=head1 CONFIG
|
|
|
|
=head2 dspam_bin
|
|
|
|
The path to the dspam binary. If yours is installed somewhere other
|
|
than /usr/local/bin/dspam, set this.
|
|
|
|
=head2 autolearn [ naughty | karma | spamassassin | any ]
|
|
|
|
=over 4
|
|
|
|
=item naughty
|
|
|
|
learn naughty messages as spam (see plugins/naughty)
|
|
|
|
=item karma
|
|
|
|
learn messages with negative karma as spam (see plugins/karma)
|
|
|
|
=item spamassassin
|
|
|
|
learn from spamassassins messages with autolearn=(ham|spam)
|
|
|
|
=item any
|
|
|
|
all of the above, and any future tests too!
|
|
|
|
=back
|
|
|
|
=head2 reject
|
|
|
|
Set to a floating point value between 0 and 1.00 where 0 is no confidence
|
|
and 1.0 is 100% confidence.
|
|
|
|
If dspam's confidence is greater than or equal to this threshold, the
|
|
message will be rejected. The default is 1.00.
|
|
|
|
dspam reject .95
|
|
|
|
To only reject mail if dspam and spamassassin both think the message is spam,
|
|
set I<reject agree>.
|
|
|
|
=head2 reject_type
|
|
|
|
reject_type [ perm | temp | disconnect ]
|
|
|
|
By default, rejects are permanent (5xx). Set I<reject_type temp> to
|
|
defer mail instead of rejecting it.
|
|
|
|
Set I<reject_type disconnect> if you'd prefer to immediately disconnect
|
|
the connection when a spam is encountered. This prevents the remote server
|
|
from issuing a reset and attempting numerous times in a single connection.
|
|
|
|
=head1 dspam.conf
|
|
|
|
dspam must be configured and working properly. I had to modify the following
|
|
settings on my system:
|
|
|
|
=over 4
|
|
|
|
=item mysql storage
|
|
|
|
=item Trust smtpd
|
|
|
|
=item TrainingMode tum
|
|
|
|
=item Tokenizer osb
|
|
|
|
=item Preference "trainingMode=TOE"
|
|
|
|
=item Preference "spamAction=deliver"
|
|
|
|
=item Preference "signatureLocation=headers"
|
|
|
|
=item TrainPristine off
|
|
|
|
=item ParseToHeaders off
|
|
|
|
=back
|
|
|
|
Of those changes, the most important is the signature location. This plugin
|
|
only supports storing the signature in the headers. If you want to train dspam
|
|
after delivery (ie, users moving messages to/from spam folders), then the
|
|
dspam signature must be in the headers.
|
|
|
|
When using the dspam MySQL backend, use InnoDB tables. DSPAM training
|
|
is dramatically slowed by MyISAM table locks and dspam requires lots
|
|
of training. InnoDB has row level locking and updates are much faster.
|
|
|
|
=head1 DSPAM periodic maintenance
|
|
|
|
Install this cron job to clean up your DSPAM database.
|
|
|
|
http://dspam.git.sourceforge.net/git/gitweb.cgi?p=dspam/dspam;a=tree;f=contrib/dspam_maintenance;hb=HEAD
|
|
|
|
|
|
|
|
=head1 SPAMASSASSIN
|
|
|
|
DSPAM can be trained by SpamAssassin. This relationship between them requires
|
|
attention to several important details:
|
|
|
|
=over 4
|
|
|
|
=item 1
|
|
|
|
dspam must be listed B<after> spamassassin in the config/plugins file.
|
|
Because SA runs first, I set the SA reject_threshold up above 100 so that
|
|
all spam messages will be used to train dspam.
|
|
|
|
Once dspam is trained and errors are rare, I plan to run dspam first and
|
|
reduce the SA load.
|
|
|
|
=item 2
|
|
|
|
Autolearn must be enabled and configured in SpamAssassin. SA autolearn will
|
|
determine if a message is learned by dspam. The settings to pay careful
|
|
attention to in your SA local.cf file are I<bayes_auto_learn_threshold_spam>
|
|
and I<bayes_auto_learn_threshold_nonspam>. Make sure they are set to
|
|
conservative values that will yield no false positives.
|
|
|
|
If you are using I<autolearn spamassassin> and reject, messages that exceed
|
|
the SA threshholds will cause dspam to reject them. Again I say, make sure
|
|
the SA autolearn threshholds are set high enough to avoid false positives.
|
|
|
|
=back
|
|
|
|
=head1 MULTIPLE RECIPIENT BEHAVIOR
|
|
|
|
For messages with multiple recipients, the user that dspam is running as will
|
|
be the dspam username.
|
|
|
|
When messages have a single recipient, the recipient address is used as the
|
|
dspam username. For dspam to trust qpsmtpd with modifying the username, you
|
|
B<must> add the username that qpsmtpd is running to to the dspamd.conf file.
|
|
|
|
ie, (Trust smtpd).
|
|
|
|
=head1 CHANGES
|
|
|
|
2012-06 - Matt Simerson - added karma & naughty learning support
|
|
- worked around the DESTROY bug in dspam_process
|
|
|
|
=head1 AUTHOR
|
|
|
|
2012 - Matt Simerson
|
|
|
|
=cut
|
|
|
|
use strict;
|
|
use warnings;
|
|
|
|
use Qpsmtpd::Constants;
|
|
use Qpsmtpd::DSN;
|
|
use IO::Handle;
|
|
use Socket qw(:DEFAULT :crlf);
|
|
|
|
sub register {
|
|
my ($self, $qp) = shift, shift;
|
|
|
|
$self->log(LOGERROR, "Bad parameters for the dspam plugin") if @_ % 2;
|
|
|
|
$self->{_args} = { @_ };
|
|
$self->{_args}{reject} = 1 if ! defined $self->{_args}{reject};
|
|
$self->{_args}{reject_type} ||= 'perm';
|
|
|
|
$self->register_hook('data_post', 'data_post_handler');
|
|
}
|
|
|
|
sub data_post_handler {
|
|
my $self = shift;
|
|
my $transaction = shift || $self->qp->transaction;
|
|
|
|
$self->autolearn( $transaction );
|
|
return (DECLINED) if $self->is_immune();
|
|
|
|
if ( $transaction->data_size > 500_000 ) {
|
|
$self->log(LOGINFO, "skip, too big (" . $transaction->data_size . ")" );
|
|
return (DECLINED);
|
|
};
|
|
|
|
my $username = $self->select_username( $transaction );
|
|
my $filtercmd = $self->get_filter_cmd( $transaction, $username );
|
|
$self->log(LOGDEBUG, $filtercmd);
|
|
|
|
my $response = $self->dspam_process( $filtercmd, $transaction );
|
|
if ( ! $response ) {
|
|
$self->log(LOGWARN, "skip, no dspam response. Check logs for errors.");
|
|
return (DECLINED);
|
|
};
|
|
|
|
$self->attach_headers( $response, $transaction );
|
|
|
|
return $self->log_and_return( $transaction );
|
|
};
|
|
|
|
sub select_username {
|
|
my ($self, $transaction) = @_;
|
|
|
|
my $recipient_count = scalar $transaction->recipients;
|
|
$self->log(LOGDEBUG, "Message has $recipient_count recipients");
|
|
|
|
if ( $recipient_count > 1 ) {
|
|
$self->log(LOGINFO, "skipping user prefs, $recipient_count recipients detected.");
|
|
return getpwuid($>);
|
|
};
|
|
|
|
# use the recipients email address as username. This enables user prefs
|
|
my $username = ($transaction->recipients)[0]->address;
|
|
return lc($username);
|
|
};
|
|
|
|
sub assemble_message {
|
|
my ($self, $transaction) = @_;
|
|
|
|
$transaction->body_resetpos;
|
|
|
|
my $message = "X-Envelope-From: "
|
|
. $transaction->sender->format . "\n"
|
|
. $transaction->header->as_string . "\n\n";
|
|
|
|
while (my $line = $transaction->body_getline) { $message .= $line; };
|
|
|
|
$message = join(CRLF, split/\n/, $message);
|
|
return $message . CRLF;
|
|
};
|
|
|
|
sub dspam_process {
|
|
my ( $self, $filtercmd, $transaction ) = @_;
|
|
|
|
return $self->dspam_process_backticks( $filtercmd );
|
|
#return $self->dspam_process_open2( $filtercmd, $transaction );
|
|
|
|
# yucky. This method (which forks) exercises a bug in qpsmtpd. When the
|
|
# child exits, the Transaction::DESTROY method is called, which deletes
|
|
# the spooled file from disk. The contents of $self->qp->transaction
|
|
# needed to spool it again are also destroyed. Don't use this.
|
|
my $message = $self->assemble_message( $transaction );
|
|
my $in_fh;
|
|
if (! open($in_fh, '-|')) { # forks child for writing
|
|
open(my $out_fh, "|$filtercmd") or die "Can't run $filtercmd: $!\n";
|
|
print $out_fh $message;
|
|
close $out_fh;
|
|
exit(0);
|
|
};
|
|
my $response = <$in_fh>;
|
|
close $in_fh;
|
|
chomp $response;
|
|
$self->log(LOGDEBUG, $response);
|
|
return $response;
|
|
};
|
|
|
|
sub dspam_process_backticks {
|
|
my ( $self, $filtercmd ) = @_;
|
|
|
|
my $filename = $self->qp->transaction->body_filename;
|
|
#my $response = `cat $filename | $filtercmd`; chomp $response;
|
|
my $response = `$filtercmd < $filename`; chomp $response;
|
|
$self->log(LOGDEBUG, $response);
|
|
return $response;
|
|
};
|
|
|
|
sub dspam_process_open2 {
|
|
my ( $self, $filtercmd, $transaction ) = @_;
|
|
|
|
my $message = $self->assemble_message( $transaction );
|
|
|
|
# not sure why, but this is not as reliable as I'd like. What's a dspam
|
|
# error -5 mean anyway?
|
|
use FileHandle;
|
|
use IPC::Open2;
|
|
my ($dspam_in, $dspam_out);
|
|
my $pid = open2($dspam_out, $dspam_in, $filtercmd);
|
|
print $dspam_in $message;
|
|
close $dspam_in;
|
|
#my $response = join('', <$dspam_out>); # get full response
|
|
my $response = <$dspam_out>; # get first line only
|
|
waitpid $pid, 0;
|
|
chomp $response;
|
|
$self->log(LOGDEBUG, $response);
|
|
return $response;
|
|
};
|
|
|
|
sub log_and_return {
|
|
my $self = shift;
|
|
my $transaction = shift || $self->qp->transaction;
|
|
|
|
my $d = $self->get_dspam_results( $transaction ) or return DECLINED;
|
|
|
|
if ( ! $d->{class} ) {
|
|
$self->log(LOGWARN, "skip, no dspam class detected");
|
|
return DECLINED;
|
|
};
|
|
|
|
my $status = "$d->{class}, $d->{confidence} c.";
|
|
my $reject = $self->{_args}{reject} or do {
|
|
$self->log(LOGINFO, "skip, reject disabled ($status)");
|
|
return DECLINED;
|
|
};
|
|
|
|
if ( $reject eq 'agree' ) {
|
|
return $self->reject_agree( $transaction, $d );
|
|
};
|
|
|
|
if ( $d->{class} eq 'Innocent' ) {
|
|
$self->log(LOGINFO, "pass, $status");
|
|
return DECLINED;
|
|
};
|
|
if ( $self->qp->connection->relay_client ) {
|
|
$self->log(LOGINFO, "skip, allowing spam, user authenticated ($status)");
|
|
return DECLINED;
|
|
};
|
|
if ( $d->{probability} <= $reject ) {
|
|
$self->log(LOGINFO, "pass, $d->{class} probability is too low ($d->{probability} < $reject)");
|
|
return DECLINED;
|
|
};
|
|
if ( $d->{confidence} != 1 ) {
|
|
$self->log(LOGINFO, "pass, $d->{class} confidence is too low ($d->{confidence})");
|
|
return DECLINED;
|
|
};
|
|
|
|
# dspam is more than $reject percent sure this message is spam
|
|
$self->log(LOGINFO, "fail, $d->{class}, ($d->{confidence} confident)");
|
|
my $deny = $self->get_reject_type();
|
|
return Qpsmtpd::DSN->media_unsupported($deny, 'dspam says, no spam please');
|
|
}
|
|
|
|
sub reject_agree {
|
|
my ($self, $transaction, $d ) = @_;
|
|
|
|
my $sa = $transaction->notes('spamassassin' );
|
|
|
|
my $status = "$d->{class}, $d->{confidence} c";
|
|
|
|
if ( ! $sa->{is_spam} ) {
|
|
$self->log(LOGINFO, "pass, cannot agree, SA results missing ($status)");
|
|
return DECLINED;
|
|
};
|
|
|
|
if ( $d->{class} eq 'Spam' ) {
|
|
if ( $sa->{is_spam} eq 'Yes' ) {
|
|
if ( defined $self->connection->notes('karma') ) {
|
|
$self->connection->notes('karma', $self->connection->notes('karma') - 2);
|
|
};
|
|
$self->log(LOGINFO, "fail, agree, $status");
|
|
my $reject = $self->get_reject_type();
|
|
return ($reject, 'we agree, no spam please');
|
|
};
|
|
|
|
$self->log(LOGINFO, "fail, disagree, $status");
|
|
return DECLINED;
|
|
};
|
|
|
|
if ( $d->{class} eq 'Innocent' ) {
|
|
if ( $sa->{is_spam} eq 'No' ) {
|
|
if ( $d->{confidence} > .9 ) {
|
|
if ( defined $self->connection->notes('karma') ) {
|
|
$self->connection->notes('karma', $self->connection->notes('karma') + 2);
|
|
};
|
|
};
|
|
$self->log(LOGINFO, "pass, agree, $status");
|
|
return DECLINED;
|
|
};
|
|
$self->log(LOGINFO, "pass, disagree, $status");
|
|
};
|
|
|
|
$self->log(LOGINFO, "pass, other $status");
|
|
return DECLINED;
|
|
};
|
|
|
|
sub get_dspam_results {
|
|
my $self = shift;
|
|
my $transaction = shift || $self->qp->transaction;
|
|
|
|
if ( $transaction->notes('dspam') ) {
|
|
return $transaction->notes('dspam');
|
|
};
|
|
|
|
my $string = $transaction->header->get('X-DSPAM-Result') or do {
|
|
$self->log(LOGWARN, "get_dspam_results: failed to find the header");
|
|
return;
|
|
};
|
|
|
|
my @bits = split(/,\s+/, $string); chomp @bits;
|
|
my $class = shift @bits;
|
|
my %d;
|
|
foreach (@bits) {
|
|
my ($key,$val) = split(/=/, $_);
|
|
$d{$key} = $val;
|
|
};
|
|
$d{class} = $class;
|
|
|
|
my $message = $d{class};
|
|
if ( defined $d{probability} && defined $d{confidence} ) {
|
|
$message .= ", prob: $d{probability}, conf: $d{confidence}";
|
|
};
|
|
$self->log(LOGDEBUG, $message);
|
|
$transaction->notes('dspam', \%d);
|
|
return \%d;
|
|
};
|
|
|
|
sub get_filter_cmd {
|
|
my ($self, $transaction, $user) = @_;
|
|
|
|
my $dspam_bin = $self->{_args}{dspam_bin} || '/usr/local/bin/dspam';
|
|
my $default = "$dspam_bin --user $user --mode=tum --process --deliver=summary --stdout";
|
|
|
|
my $learn = $self->{_args}{autolearn} or return $default;
|
|
return $default if ( $learn ne 'spamassassin' && $learn ne 'any' );
|
|
|
|
$self->log(LOGDEBUG, "attempting to learn from SA");
|
|
|
|
my $sa = $transaction->notes('spamassassin' );
|
|
if ( ! $sa || ! $sa->{is_spam} ) {
|
|
$self->log(LOGERROR, "SA results missing");
|
|
return $default;
|
|
};
|
|
|
|
if ( ! $sa->{autolearn} ) {
|
|
$self->log(LOGERROR, "SA autolearn unset");
|
|
return $default;
|
|
};
|
|
|
|
if ( $sa->{is_spam} eq 'Yes' && $sa->{autolearn} eq 'spam' ) {
|
|
return "$dspam_bin --user $user --mode=tum --source=corpus --class=spam --deliver=summary --stdout";
|
|
}
|
|
elsif ( $sa->{is_spam} eq 'No' && $sa->{autolearn} eq 'ham' ) {
|
|
return "$dspam_bin --user $user --mode=tum --source=corpus --class=innocent --deliver=summary --stdout";
|
|
};
|
|
|
|
return $default;
|
|
};
|
|
|
|
sub attach_headers {
|
|
my ($self, $response, $transaction) = @_;
|
|
$transaction ||= $self->qp->transaction;
|
|
|
|
# X-DSPAM-Result: user@example.com; result="Spam"; class="Spam"; probability=1.0000; confidence=1.00; signature=N/A
|
|
# X-DSPAM-Result: smtpd; result="Innocent"; class="Innocent"; probability=0.0023; confidence=1.00; signature=4f8dae6a446008399211546
|
|
my ($result,$prob,$conf,$sig) = $response =~ /result=\"(Spam|Innocent)\";.*?probability=([\d\.]+); confidence=([\d\.]+); signature=(.*)/;
|
|
my $header_str = "$result, probability=$prob, confidence=$conf";
|
|
$self->log(LOGDEBUG, $header_str);
|
|
my $name = 'X-DSPAM-Result';
|
|
$transaction->header->delete($name) if $transaction->header->get($name);
|
|
$transaction->header->add($name, $header_str, 0);
|
|
|
|
# the signature header is required if you intend to train dspam later.
|
|
# In dspam.conf, set: Preference "signatureLocation=headers"
|
|
$transaction->header->add('X-DSPAM-Signature', $sig, 0);
|
|
};
|
|
|
|
sub learn_as_ham {
|
|
my $self = shift;
|
|
my $transaction = shift;
|
|
|
|
my $user = $self->select_username( $transaction );
|
|
my $dspam_bin = $self->{_args}{dspam_bin} || '/usr/local/bin/dspam';
|
|
my $cmd = "$dspam_bin --user $user --mode=tum --source=corpus --class=innocent --deliver=summary --stdout";
|
|
$self->dspam_process( $cmd, $transaction );
|
|
};
|
|
|
|
sub learn_as_spam {
|
|
my $self = shift;
|
|
my $transaction = shift;
|
|
|
|
my $user = $self->select_username( $transaction );
|
|
my $dspam_bin = $self->{_args}{dspam_bin} || '/usr/local/bin/dspam';
|
|
my $cmd = "$dspam_bin --user $user --mode=tum --source=corpus --class=spam --deliver=summary --stdout";
|
|
$self->dspam_process( $cmd, $transaction );
|
|
};
|
|
|
|
sub autolearn {
|
|
my ( $self, $transaction ) = @_;
|
|
|
|
my $learn = $self->{_args}{autolearn} or return;
|
|
|
|
if ( $learn eq 'naughty' || $learn eq 'any' ) {
|
|
if ( $self->connection->notes('naughty') ) {
|
|
$self->log(LOGINFO, "training naughty as spam");
|
|
$self->learn_as_spam( $transaction );
|
|
};
|
|
};
|
|
if ( $learn eq 'karma' || $learn eq 'any' ) {
|
|
my $karma = $self->connection->notes('karma');
|
|
if ( defined $karma && $karma <= -1 ) {
|
|
$self->log(LOGINFO, "training poor karma as spam");
|
|
$self->learn_as_spam( $transaction );
|
|
};
|
|
if ( defined $karma && $karma >= 1 ) {
|
|
$self->log(LOGINFO, "training good karma as ham");
|
|
$self->learn_as_ham( $transaction );
|
|
};
|
|
};
|
|
};
|