#!perl -w

=head1 NAME

spamassassin - SpamAssassin integration for qpsmtpd

=head1 DESCRIPTION

Plugin that checks if the mail is spam by using the "spamd" daemon
from the SpamAssassin package.  F<http://www.spamassassin.org>

SpamAssassin 2.6 or newer is required.

Stores the results in a note named spamassassin (for other plugins). The note
is a hashref with whatever fields are defined in your spamassassin config.
These are the common ones: score,required,autolearn,tests,version

=head1 CONFIG

Configured in the plugins file without any parameters, the
spamassassin plugin will add relevant headers from the spamd
(X-Spam-Status etc).

The format goes like

  spamassassin  option value  [option value]

Options being those listed below and the values being parameters to
the options.  Confused yet?  :-)  It looks like this in practice:

  spamassassin reject 7 leave_old_headers keep

=over 4

=item reject [threshold]

Set the threshold where the plugin will reject the mail.  Some
mail servers are so useless that they ignore 55x responses not coming
after RCPT TO, so they might just keep retrying and retrying and
retrying until the mail expires from their queue.

Depending on your spamassassin configuration a reasonable setting is
typically somewhere between 12 to 20.

The default is to never reject mail based on the SpamAssassin score.

=item munge_subject_threshold [threshold]

Set the threshold where the plugin will prefix the subject with the
value of C<subject_prefix>. A modified subject is easier to filter on
than the other headers for many people with not so clever mail
clients.  You might want to make another plugin that does this on a
per user basis.

The default is to never munge the subject based on the SpamAssassin score.

=item subject_prefix [prefix]

What to prefix the subject with if the message is detected as spam
(i.e. if score is greater than C<munge_subject_threshold>.  Defaults to
C<*** SPAM ***>

=item spamd_socket [/path/to/socket|spamd.host:port]

Beginning with Mail::SpamAssassin 2.60, it is possible to use Unix
domain sockets for spamd. This is faster and more secure than using a
TCP connection, but if you run spamd on a remote machine, you need to
use a TCP connection.

=item leave_old_headers [drop|rename|keep]

Another mail server before might have checked this mail already and may have
added X-Spam-Status, X-Spam-Flag and X-Spam-Check-By lines. Normally you can
not trust such headers and should either rename them to X-Old-... (default,
parameter 'rename') or have them removed (parameter 'drop'). If you know
what you are doing, you can also leave them intact (parameter 'keep').

=item spamd_user [username]

The username to pass to spamd, if different from the user qpsmtpd runs as.

=item relayclient skip

What special treatment is offered to connection with relay permission? Relay
permissions are granted when the connecting IP is listed in the relayclients
file and/or when the user has authenticated. The only valid option at present
is 'skip', which skips SA scoring.

If SpamAssasin has certain network tests enabled, users may get elevated spam
scores because their dynamic IP space is properly listed on DUL blocking lists.
If the user is authenticated or coming from a trusted IP, odds are we don't
want to be reject their messages. Especially when running qpsmtpd on port 587.

=back

With both of the first options the configuration line will look like the following

 spamasssasin  reject 18  munge_subject_threshold 8


=head1 MULTIPLE RECIPIENT BEHAVIOR

This plugin supports per-user SpamAssassin preferences. When per-user SA prefs
are enabled (by setting spamd_user = vpopmail), the message recipient is used
as the spamd username. If SpamAssassin has per-user preferences enabled, it
will consult the users spam preferences when scoring the message.

When a message has multiple recipients, we do not change the spamd username.
The message is still scored by SA, but per-user preferences are not
consulted. To aid in debugging, messages with multiple recipents will
have an X-Spam-User header inserted. Admins and savvy users can look for
that header to confirm the reason their personal prefs were not consulted.

To get per-user SA prefs to work for messages with multiple recipients, the
LDA should be configured to check for the presence of the X-Spam-User header.
If the X-Spam-User header is present, the LDA should submit the message to
spamd for re-processing with the recipients address.


=head1 TODO

Make the "subject munge string" configurable

=head1 CHANGES

2012.04.02 - Matt Simerson

  * refactored for ease of maintenance
  * added support for per-user SpamAssassin preferences
  * updated get_spam_results so that score=N.N works (as well as hits=N.N)
  * rewrote the X-Spam-* header additions so that SA generated headers are
    not discarded. Admin can alter SA headers with add_header in their SA
    config. Subverting their changes there is unexpected. Making them read
    code to figure out why is an unnecessary hurdle.
  * added assemble_message, so we can calc content size which spamd wants

=cut

use strict;
use warnings;

use Qpsmtpd::Constants;
use Qpsmtpd::DSN;

use Socket qw(:DEFAULT :crlf);
use IO::Handle;

sub register {
    my ($self, $qp, %args) = @_;

    $self->log(LOGERROR, "Bad parameters for the spamassassin plugin") if @_ % 2;

    $self->{_args} = { %args };

    # backwards compatibility with previous config syntax
    if ( ! defined $self->{_args}{reject} && defined $self->{_args}{reject_threshold} ) {
        $self->{_args}{reject} = $self->{_args}{reject_threshold};
    };
    if ( ! defined $self->{_args}{reject_type} ) {
        $self->{_args}{reject_type} = 'perm';
    };

    $self->register_hook('data_post', 'data_post_handler');
}

sub data_post_handler {
    my ($self, $transaction) = @_;

    return (DECLINED) if $self->is_immune();

    if ( $transaction->data_size > 500_000 ) {
        $self->log(LOGINFO, "skip: too large (".$transaction->data_size.")");
        return (DECLINED);
    };

    my $SPAMD    = $self->connect_to_spamd() or return (DECLINED);
    my $username = $self->select_spamd_username( $transaction );
    my $message  = $self->assemble_message($transaction);
    my $length   = length $message;

    $self->print_to_spamd( $SPAMD, $message, $length, $username );
    shutdown($SPAMD, 1); # close our side of the socket (tell spamd we're done)
    my $headers = $self->parse_spamd_response( $SPAMD ) or return (DECLINED);

    $self->insert_spam_headers( $transaction, $headers, $username );
    $self->munge_subject( $transaction );
    return $self->reject( $transaction );
};

sub select_spamd_username {
    my ($self, $transaction) = @_;

    my $username = $self->{_args}{spamd_user} || getpwuid($>);

    my $recipient_count = scalar $transaction->recipients;
    if ( $recipient_count > 1 ) {
        $self->log(LOGDEBUG, "Message has $recipient_count recipients");
        return $username;
    };

    if ( $username eq 'vpopmail' ) {
# use the recipients email address as username. This enables per-user SA prefs
        $username = ($transaction->recipients)[0]->address;
    }
    else {
        $self->log(LOGDEBUG, "skipping per-user SA prefs");
    };

    return $username;
};

sub parse_spamd_response {
    my ( $self, $SPAMD ) = @_;

    my $line0 = <$SPAMD>; # get the first protocol line
        if ( $line0 !~ /EX_OK/ ) {
            $self->log(LOGERROR, "invalid response from spamd: $line0");
            return;
        };

    my (%new_headers, $last_header);
    while (<$SPAMD>) {
        s/[\r\n]//g;
        if ( m/^(X-Spam-.*?): (.*)?/ ) {
            $new_headers{$1} = $2 || '';
            $last_header = $1;
            next;
        }
        if ( $last_header && m/^(\s+.*)/ ) { # a folded line, append to last
            $new_headers{$last_header} .= CRLF . "\t" . $1;
            next;
        }
        $last_header = undef;
    }
    close $SPAMD;
    $self->log(LOGDEBUG, "finished reading from spamd");

    return scalar keys %new_headers ? \%new_headers : undef;
};

sub insert_spam_headers {
    my ( $self, $transaction, $new_headers, $username ) = @_;

    my $recipient_count = scalar $transaction->recipients;

    $self->_cleanup_spam_header($transaction, 'X-Spam-User'); # always clean up
    if ( $recipient_count > 1 ) {                 # add for multiple recipients
        $transaction->header->add('X-Spam-User', $username . ", $recipient_count recipients", 0);
    };

    foreach my $name ( keys %$new_headers ) {
        next if $name eq 'X-Spam-Prev-Subject'; # might exist if SA rewrote subject
        if ( $name eq 'X-Spam-Report' ) {
            next;   # Mail::Header mangles this prefolded header
#           $self->log(LOGDEBUG, $new_headers->{$name} );
        };
        if ( $name eq 'X-Spam-Status' ) {
            $self->parse_spam_header( $new_headers->{$name} );
        };
        $new_headers->{$name} =~ s/\015//; # hack for outlook
        $self->_cleanup_spam_header($transaction, $name);
        $transaction->header->add($name, $new_headers->{$name}, 0);
    };
}

sub assemble_message {
    my ($self, $transaction) = @_;

    $transaction->body_resetpos;

    my $message = "X-Envelope-From: "
        . $transaction->sender->format . "\n"
        . $transaction->header->as_string . "\n\n";

    while (my $line = $transaction->body_getline) { $message .= $line; };

    $message = join(CRLF, split/\n/, $message);
    return $message . CRLF;
};

sub connect_to_spamd {
    my $self = shift;
    my $socket = $self->{_args}{spamd_socket};
    my $SPAMD;
    if ( $socket && $socket =~ /\// ) {  # file path
        $SPAMD = $self->connect_to_spamd_socket( $socket );
    }
    else {
        $SPAMD = $self->connect_to_spamd_tcpip( $socket );
    };

    return if ! $SPAMD;
    $SPAMD->autoflush(1);
    return $SPAMD;
};

sub connect_to_spamd_socket {
    my ( $self, $socket ) = @_;

    if ( ! $socket || $socket !~ /^([\w\/.-]+)$/ ) {   # Unix Domain Socket
        $self->log(LOGERROR, "not a valid path");
        return;
    };

    socket(my $SPAMD, PF_UNIX, SOCK_STREAM, 0) or do {
        $self->log(LOGERROR, "Could not open socket: $!");
        return;
    };
    my $paddr = sockaddr_un( $socket );

    connect($SPAMD, $paddr) or do {
        $self->log(LOGERROR, "Could not connect to spamd socket: $!");
        return;
    };

    $self->log(LOGDEBUG, "connected to spamd");
    return $SPAMD;
};

sub connect_to_spamd_tcpip {
    my ( $self, $socket ) = @_;

    my $remote  = 'localhost';
    my $port    = 783;

    if (defined $socket && $socket =~ /^([\w.-]+):(\d+)$/) {
        $remote  = $1;
        $port    = $2;
    }
    if ($port =~ /\D/) { $port = getservbyname($port, 'tcp') };
    if ( ! $port ) {
        $self->log(LOGERROR, "No spamd port, check your spamd_socket config.");
        return;
    };
    my $iaddr = inet_aton($remote) or do {
        $self->log(LOGERROR, "Could not resolve host: $remote");
        return;
    };
    my $paddr = sockaddr_in($port, $iaddr);
    my $proto = getprotobyname('tcp');

    socket(my $SPAMD, PF_INET, SOCK_STREAM, $proto) or do {
        $self->log(LOGERROR, "Could not open socket: $!");
        return;
    };

    connect($SPAMD, $paddr) or do {
        $self->log(LOGERROR, "Could not connect to spamd: $!");
         return;
    };

    $self->log(LOGDEBUG, "connected to spamd");
    return $SPAMD;
};

sub print_to_spamd {
    my ( $self, $SPAMD, $message, $length, $username ) = @_;

    print $SPAMD "HEADERS SPAMC/1.4" . CRLF;
    print $SPAMD "Content-length: $length" . CRLF;
    print $SPAMD "User: $username" . CRLF;
    print $SPAMD CRLF;
    print $SPAMD $message or $self->log(LOGWARN, "Could not print to spamd: $!");

    $self->log(LOGDEBUG, "check_spam: finished sending to spamd");
};

sub reject {
    my ($self, $transaction) = @_;

    my $sa_results = $self->get_spam_results($transaction) or do {
        $self->log(LOGNOTICE, "skip, no results");
        return DECLINED;
    };
    my $score  = $sa_results->{score} or do {
        $self->log(LOGERROR, "skip, error getting score");
        return DECLINED;
    };

    my $ham_or_spam = $sa_results->{is_spam} eq 'Yes' ? 'Spam' : 'Ham';
    my $status = "$ham_or_spam, $score";
    my $learn;
    if ( $sa_results->{autolearn} ) {
        $learn = "learn=". $sa_results->{autolearn};
    };

    my $reject = $self->{_args}{reject} or do {
        $self->log(LOGERROR, "skip, reject disabled ($status, $learn)");
        return DECLINED;
    };

    if ( $score < $reject ) {
        if ( $ham_or_spam eq 'Spam' ) {
            $self->log(LOGINFO, "fail, $status < $reject, $learn");
            return DECLINED;
        }
        else {
            $self->log(LOGINFO, "pass, $status < $reject, $learn");
            return DECLINED;
        }
    }

    $self->connection->notes('karma', $self->connection->notes('karma') - 1);
# default of media_unsupported is DENY, so just change the message
    $self->log(LOGINFO, "deny, $status, > $reject, $learn");
    return ($self->get_reject_type(), "spam score exceeded threshold");
}

sub munge_subject {
    my ($self, $transaction) = @_;

    my $sa = $self->get_spam_results($transaction) or return;
    my $qp_num = $self->{_args}{munge_subject_threshold};

    my $required = $sa->{required} || $qp_num or do {
        $self->log(LOGDEBUG, "skipping munge, no user or qpsmtpd pref set");
        return;
    };
    return unless $sa->{score} > $required;

    my $subject_prefix = $self->qp->config('subject_prefix') || '*** SPAM ***';
    my $subject = $transaction->header->get('Subject') || '';
    $transaction->header->replace('Subject', "$subject_prefix $subject");
}

sub get_spam_results {
    my ($self, $transaction) = @_;

    if ( defined $transaction->notes('spamassassin') ) {
        return $transaction->notes('spamassassin');
    };

    my $header = $transaction->header->get('X-Spam-Status') or return;
    my $r = $self->parse_spam_header( $header );

    $self->log(LOGDEBUG, "$r->{is_spam}, $r->{score}");
    $transaction->notes('spamassassin', $r);

    return $r;
}

sub parse_spam_header {
    my ($self, $string) = @_;

# the X-Spam-Score header contents vary based on the settings in
# the spamassassin *.cf files. Rather than parse via regexp, split
# on the consistent whitespace and = delimiters. More reliable and
# likely faster.
    my @parts = split(/\s+/, $string);
    my $is_spam = shift @parts;
    chomp @parts;
    chop $is_spam;  # remove trailing ,

    my %r;
    foreach ( @parts ) {
        my ($key,$val) = split(/=/, $_);
        $r{$key} = $val;
    }
    $r{is_spam} = $is_spam;

    # backwards compatibility for SA versions < 3
    if ( defined $r{hits} && ! defined $r{score} ) {
        $r{score} = delete $r{hits};
    };
    return \%r;
};

sub _cleanup_spam_header {
    my ($self, $transaction, $header_name) = @_;

    my $action = 'rename';
    if ( $self->{_args}->{leave_old_headers} ) {
        $action = lc($self->{_args}->{leave_old_headers});
    };

    return unless $action eq 'drop' || $action eq 'rename';

    my $old_header_name = $header_name;
    $old_header_name = ($old_header_name =~ s/^X-//) ? "X-Old-$old_header_name" : "Old-$old_header_name";

    for my $header ( $transaction->header->get($header_name) ) {
        $transaction->header->add($old_header_name, $header, 0) if $action eq 'rename';
        $transaction->header->delete($header_name);
    }
}