Showing posts with label blacklists.

Sunday, June 8, 2025

Should I Stop Caring and Let IP Address Reputation Sort Them Out?

© 2025 Peter N. M. Hansteen

How long does data on misbehaving hosts on the Internet stay relevant in an IP Address Reputation context?

[Figure: log of POP3 gropers in action, failing of course]

My main presence on the Internet also serves, for all practical purposes, as a honeypot, and is seen mainly as that by what appears to be a small but significant number of people who run IP reputation services (yes, the link goes to Spamhaus because apparently Wikipedia does not have a page dedicated to the topic yet). The article Badness, Enumerated by Robots (also here) describes the data collected by the honeypot, with links to the data as well as to other relevant resources.

But there is a thing about putting something on the Internet for free so anybody at all can download it: People will not necessarily read the instructions.


Note: This piece is also available without trackers but classic formatting only here.


The honeypot service has been collecting and sharing data for years, as will be clear from the articles linked in a previous paragraph. All the lists have their consumers and get downloaded regularly.

Although there are signs that the list data is further processed into various services, including those that provide IP reputation rankings, the only people who seem to care enough to actually contact me about specific entries in the data are people who own one or more IP addresses that have for one reason or other been included in the lists.

Recently I was contacted by somebody who claimed that some of their traffic seemed to be filtered due to IP reputation, and they had tracked down the problem to the POP3 Gropers list we publish. The big one, that is.

I was a bit surprised by this, given that I had provided a fairly clear description of the lists and their expiry times in the published material. That material also clearly stated, in my view at least, that the big POP3 gropers list does not have an expiry set, and should for that reason be used with caution, if at all.

But apparently one or more operators of IP address reputation services did not actually read that far.

I am still pondering what the correct action here is, so I created a fediverse poll:

[Embedded post by @pitrh@mastodon.social - View on Mastodon]

(For those without working fediverse links, the question posed is "Should I", with the options "Stop publishing the BIG pop3gropers list", "Stop caring and let IP reputation sort them out", or "No opinion, show results", prefaced with a shorter version of the description in the first part of this article.) The poll may of course have run its course by the time you read this.

If the poll has run its course and you don't get to vote, you are of course welcome to contact me or comment where you find reference to this article.

I am genuinely interested in hearing informed opinions on how to deal with data collected for the purpose of contributing to IP address reputation.

In addition to the poll, I added a note to the Badness, Enumerated by Robots (also here) article as well as to the exported text files:


NOTE: The BIG pop3 gropers list is for history only, use the sixweeks one for IP reputation evaluations instead
As stated earlier, the big list of pop3 gropers was intended as a collection of all hosts that had ever tried and failed in guessing passwords (see Password Gropers Take the Spamtrap Bait for background). This means that the list only exists as a historical collection of sorts, and if you are interested in seeing when a particular host entered the data set, you can look it up in the pop3 gropers archive directory.

For any reasonably current IP Reputation purposes, you will be better served with the pop3 gropers during the last six weeks list, which conveniently is also archived for those who wish to study developments.

For what it's worth, there is an archive of the greytrapped hosts list available too, along with a separate archive of the SSH bruteforcers list, all kept around for as long as I find it at least a little useful to do so.

For reference on just what triggered the inclusion, see the log extracts preserved in the pop3logs directory, which has entries going back to February of 2024.


I hope at least some of the relevant people -- people running IP address reputation services -- take the time to read that little piece of text and have a think.

For my own part, I will be pondering the ethics and practicalities of blocklists much along the same lines as I wrote about in the 2013 piece Maintaining A Publicly Available Blacklist - Mechanisms And Principles (also here).

If you found this piece to be useful, informative, annoying or would for some other reason like to contact me or comment, please do.

You might also be interested in reading selected pieces via That Grumpy BSD Guy: A Short Reading List (also here).


Upcoming events:

Ottawa, Canada: BSDCan 2025 has tutorials June 11-12, 2025 and talks June 13-14. A new version of Network Management with the OpenBSD Packet Filter Toolset will go ahead there.

A little later on in 2025, the EuroBSDcon 2025 conference is still accepting submissions for papers and tutorials, so if you have an interesting BSD-related topic you want the world to know about, your submissions will be welcome at the EuroBSDcon submissions system, where the deadline is 2025-06-21, or June 21st, 2025 (full disclosure: I'm on the program committee). This year's conference is set in beautiful Zagreb, Croatia in late September.

At EuroBSDcon 2025, there will be a Network Management with the OpenBSD Packet Filter Toolset session, a full day tutorial starting at 2025-09-25 10:30 CET. You can register for the conference and tutorial by following the links from the conference Registration and Prices page.

Separately, pre-orders of The Book of PF, 4th edition are now open. For a little background, see the blog post Yes, The Book of PF, 4th Edition Is Coming Soon. We are hoping to have physical copies of the book available in time for the conference, and hopefully you will be able to find it in good book stores by then.


Monday, August 13, 2018

Badness, Enumerated by Robots

A condensed summary of the blocklist data generated from traffic hitting bsdly.net and cooperating sites.

After my runbsd.info entry (previously bsdjobs.com) was posted, there has been an uptick in interest about the security related data generated at the bsdly.net site. I have written quite extensively about these issues earlier so I'll keep this piece short. If you want to go deeper, the field note-like articles I reference and links therein will offer some further insights.

There are three separate sets of downloadable data, all automatically generated and with only very occasional manual intervention.


Known spam sources during the last 24 hours

This is the list directly referenced in the runbsd.info piece.

This is a greytrapping based list, where the conditions for inclusion are simple: Attempts at delivery to known-bad addresses (download link here) in domains we handle mail for have happened within the last 24 hours.

In addition, cron jobs I run will occasionally add the IP addresses of hosts whose mail made it through greylisting performed by our spamd(8) but did not pass the subsequent spamassassin or clamav treatment. The bsdly.net system is part of the bgp-spamd cooperation.

The traplist has a home page and at one point was furnished with a set of guidelines.

A partial history (the log starts 2017-05-20) of when spamtraps were added and from which sources can be found in this log directory (or at this alternate location). Read on for a bit of information on the alternate sources.

Note: The list is generated at ten past every full hour by a script that uses essentially the one-liner

    spamdb | grep TRAPPED | awk -F\| '{print $2}' >bsdly.net.traplist

to generate the body of the list.
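
For the curious, here is a minimal sketch of what such an hourly cron-driven export might look like; the output path, header text and crontab entry are assumptions, not the actual production script:

    #!/bin/sh
    # sketch: dump TRAPPED addresses from spamdb into the published traplist
    OUT=/var/www/htdocs/bsdly.net.traplist
    TMP=$(mktemp) || exit 1
    {
        echo "# greytrapped hosts, generated $(date -u '+%Y-%m-%dT%H:%M:%SZ')"
        spamdb | grep TRAPPED | awk -F\| '{print $2}'
    } > "$TMP" && mv "$TMP" "$OUT"

    # crontab(5) entry, ten past every full hour:
    # 10 * * * * /usr/local/sbin/export-traplist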


For those interested in the entire history of the greytrapping-based blocklist, the 2025 article Eighteen Years of Greytrapping - Is the Weirdness Finally Paying Off? (also available tracked, prettified) will be of interest.

Misc other bots: SSH Password bruteforcing, malicious web activity, POP3 Password Bruteforcing.

The bruteforcers list is really a combination of several things, delivered as one file but with minimal scripting ability you should be able to dig out the distinct elements, described in this piece.

The (usually) largest chunk is a list of hosts that hit the rate limit for SSH connections described in the article, or that were caught trying to log on as a non-existent user or engaging in other undesirable activity aimed at my sshd(8) service. Some as yet unpublished scriptery helps me feed the miscreants that the automatic processes do not catch into the table after a manual quality check. For a more thorough treatment of SSH bruteforcers, see The Hail Mary Cloud and the Lessons Learned overview article, which links to several other articles in the sequence.
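
For readers who want to see the mechanism rather than my exact ruleset, a generic PF sketch of that kind of rate limiting looks roughly like this (the table name, limits and interface are assumptions):

    table <bruteforce> persist
    block quick from <bruteforce>
    # allow ssh in, but overload rapid-fire connectors into the bruteforce table
    pass in on egress proto tcp to any port ssh \
        keep state (max-src-conn 15, max-src-conn-rate 5/3, \
            overload <bruteforce> flush global)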

The second part is a list of IP addresses that tried to access our web service in undesirable ways, including trying for specific URLs or files that will never be found at any world-facing part of our site.

After years of advocating short lifetimes (typically 24 hours) for blocklist entries only to see my logs fill up with attempts made at slightly slower speeds, I set the lifetime for entries in this data set to 28 days (since expanded to 2419200 seconds, or if you will, six weeks). The background, including some war stories of monitoring SSH password groping, can be found in this piece, while the more recent piece here covers some of the weeding out of bad web activity.
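
For reference, expiring old entries from such a table boils down to a one-liner run from cron, along these lines (the table name is an assumption):

    # drop entries whose statistics were last cleared more than six weeks
    # (2419200 seconds) ago; see pfctl(8) for the exact semantics
    pfctl -t bruteforce -T expire 2419200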

The POP3 gropers list comes in two variations. Again these are lists of IP addresses caught trying to access a service; most of those accesses are to non-existent user names that overlap almost perfectly with the local parts (the part before the @ sign) of the spamtraps list.

The big list is a complete corpus of IP addresses that have tried these kinds of accesses since I started recording and trapping them (see this piece for some early experience and this one for the start of the big collection).

There is also a smaller set, produced from the longterm table described in this piece. For much the same reason I did not stick to 24-hour expiry for the SSH list, this one has six-week expiry. With some minimal scriptery I run by hand once or twice per day, any invalid POP3 accesses to valid accounts get their IP addresses added to the longterm table and the exported list.


NOTE: The BIG pop3 gropers list is for history only, use the sixweeks one for IP reputation evaluations instead
As stated earlier, the big list of pop3 gropers was intended as a collection of all hosts that had ever tried and failed in guessing passwords (see Password Gropers Take the Spamtrap Bait for background). This means that the list only exists as a historical collection of sorts, and if you are interested in seeing when a particular host entered the data set, you can look it up in the pop3 gropers archive directory.

For any reasonably current IP Reputation purposes, you will be better served with the pop3 gropers during the last six weeks list, which conveniently is also archived for those who wish to study developments.

For what it's worth, there is an archive of the greytrapped hosts list available too, along with a separate archive of the SSH bruteforcers list, all kept around for as long as I find it at least a little useful to do so.

For reference on just what triggered the inclusion, see the log extracts preserved in the pop3logs directory, which has entries going back to February of 2024.

The most recent exports of all lists generated here can be found in this directory. Before making any inquiries about removal from any of the lists, check all files in this directory for any occurrences of the IP address in question.
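
A quick way to do that check against a local copy of the export directory might be the following; the IP address is an example only:

    # list which of the exported files, if any, mention the address
    grep -rl '192.0.2.77' .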



Note: The lists exported from tables are generated by variations on pfctl's show table subcommand. At ruleset reload, such as at reboots after a sysupgrade, the tables are re-initialized from these same exported files.
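
In other words, the plumbing is roughly as sketched below; the table and file names are assumptions, not the actual ones in use here:

    # export the live table contents to the file pf.conf loads it from ...
    pfctl -t bruteforce -T show > /etc/mail/bruteforce
    # ... where pf.conf declares the table with that file as its initial
    # contents, so a ruleset reload or reboot repopulates it from the export:
    # table <bruteforce> persist file "/etc/mail/bruteforce"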

If you're wondering about the title, the term "enumerating badness" stems from Marcus Ranum's classic piece The Six Dumbest Ideas in Computer Security. Please do read that one.

Here are a few other references, beyond those already mentioned in the paragraphs above, that you might find useful:

The Book of PF, 3rd edition
Hey, spammer! Here's a list for you! which contains the announcement of the bsdly.net traplist.
Effective Spam and Malware Countermeasures, a more complete treatment of those keywords

If you're interested in further information on any of this, the most useful contact information is in the comment blocks in the exported lists.

Update 2020-07-29: I added a direct link to the complete list of spamtraps, since the web page seemed a bit crowded to at least one visitor. Direct link again here for your convenience.


Update 2021-01-15: Note that at some point after the article was written I cranked up expiry for the bruteforce tables to six weeks (sorry, I forgot to note the exact date).

Update 2021-03-11: In light of recent Microsoft Exchange exploits it might interest some that any request to bsdly.net for "GET /owa/" lands the source in the webtrash table, exported as part of the bruteforcers list.

Update 2021-08-03: Added notes about how the lists are generated and table maintenance.

Update 2025-03-23: Addresses matching a jumble of regexps for "silly web things" are now also exported separately as webtrash.

Update 2025-07-19: If you found this piece to be useful, informative, annoying or would for some other reason like to contact me or comment, please do.

You might also be interested in reading selected pieces via That Grumpy BSD Guy: A Short Reading List (also here).


Saturday, April 23, 2016

Does Your Email Provider Know What A "Joejob" Is?

Anecdotal evidence seems to indicate that Google and possibly other mail service providers are either quite ignorant of history when it comes to email and spam, or are applying unsavoury tactics to capture market dominance.

The first inklings that Google had reservations about delivering mail coming from my bsdly.net domain came earlier this year, when I was contacted by friends who have left their email service in the hands of Google, and it turned out that my replies to their messages did not reach their recipients, even when my logs showed that the Google mail servers had accepted the messages for delivery.


Note: This piece is also available without trackers but classic formatting only here.

Contacting Google about matters like these means you first need to navigate some web forums. In this particular case (I won't give a direct reference, but a search on the likely keywords will probably turn up the relevant exchange), the denizens of that web forum appeared to be more interested in demonstrating their BOFHishness than actually providing input on debugging and resolving an apparent misconfiguration that was making valid mail disappear without a trace after it had entered Google's systems.

The forum is primarily intended as a support channel for people who host their mail at Google (this becomes very clear when you try out some of the web accessible tools to check domains not hosted by Google), so the only practical result was that I finally set up DKIM signing for outgoing mail from the domain, in addition to the SPF records that were already in place. I'm in fact less than fond of either of these SMTP addons, but there were anyway other channels for contact with my friends, and I let the matter rest there for a while.

If you've read earlier instalments in this column, you will know that I've operated bsdly.net with an email service since 2004, along with a handful of other domains from some years before the bsdly.net domain was set up, sharing to varying extents the same infrastructure. One feature of the bsdly.net and associated domains setup is that in 2006 we started publishing a list of known bad addresses in our domains that we use as spamtrap addresses, as well as publishing the blacklist that the greytrapping generates.

Over the years the list of spamtrap addresses -- harvested almost exclusively from records in our logs and greylists of apparent bounces of messages sent with forged From: addresses in our domains -- has grown to a total of 29757 spamtraps, a full 7387 in the bsdly.net domain alone. At the time I'm writing this 31162 hosts have attempted to deliver mail to one of those spamtrap addresses during the last 24 hours. The exact numbers will likely change by the time you read this -- blacklisted addresses expire 24 hours after last contact, and a few more new spamtrap addresses generally turn up each week. With some simple scriptery, we pick them out of logs and greylists as they appear, and sometimes entire days pass without new candidates appearing. For a more general overview of how I run the blacklist, see this post from 2013.

In addition to the spamtrap addresses, the bsdly.net domain has some valid addresses including my own, and I've set up a few addresses for specific purposes (actually aliases), mainly set up so I can filter them into relevant mailboxes at the receiving end. Despite all our efforts to stop spam, occasionally spam is delivered to those aliases too (see eg the ethics of running the traplist page for some amusing examples).

Then this morning a piece of possibly well intended but actually quite clearly unwanted commercial email turned up, addressed to one of those aliases. For no actually good reason, I decided to send an answer to the message, telling them that whoever sold them the address list they were using was ripping them off.

That message bounced, and it turns out that the domain was hosted at Google.

Reading that bounce message is quite interesting, because if you read the page they link to, it looks very much like whoever runs Google Mail doesn't know what a joejob is.

The page, which again is intended mainly for Google's own customers, specifies that you should set up SPF and DKIM for domains. But looking at the headers, the message they reject passes both those criteria:

Received-SPF: pass (google.com: domain of peter@bsdly.net designates 2001:16d8:ff00:1a9::2 as permitted sender) client-ip=2001:16d8:ff00:1a9::2;
Authentication-Results: mx.google.com;
       dkim=pass (test mode) header.i=@bsdly.net;
       spf=pass (google.com: domain of peter@bsdly.net designates 2001:16d8:ff00:1a9::2 as permitted sender) smtp.mailfrom=peter@bsdly.net
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=bsdly.net; s=x;
 h=Content-Transfer-Encoding:Content-Type:In-Reply-To:MIME-Version:Date:Message-ID:From:References:To:Subject; bh=OonsF8beQz17wcKmu+EJl34N5bW6uUouWw4JVE5FJV8=;
 b=hGgolFeqxOOD/UdGXbsrbwf8WuMoe1vCnYJSTo5M9W2k2yy7wtpkMZOmwkEqZR0XQyj6qoCSriC6Hjh0WxWuMWv5BDZPkOEE3Wuag9+KuNGd7RL51BFcltcfyepBVLxY8aeJrjRXLjXS11TIyWenpMbtAf1yiNPKT1weIX3IYSw=;

Then for reasons known only to themselves, or most likely due to the weight they assign to some unknown data source, they reject the message anyway.

We do not know what that data source is. But with more than seven thousand bogus addresses that have generated bounces we've registered, it's likely that the number of invalid bsdly.net From: addresses Google's systems have seen is far larger than the number of valid ones. The actual number of bogus addresses is likely higher, though: in the early days the collection process had enough manual steps that we're bound to have missed some. Valid bsdly.net addresses that do not eventually resolve to a mailbox I read are rare if not entirely non-existent. But the 'bulk mail' classification is bizarre if you even consider checking Received: headers.

The reason Google's systems most likely has seen more bogus bsdly.net From: addresses than valid ones is that by historical accident faking sender email addresses in SMTP dialogues is trivial.

Anecdotal evidence indicates that if a domain exists it will sooner or later be used in the from: field of some spam campaign where the messages originate somewhere else completely, and for that very reason the SPF and DKIM mechanisms were specified. I find both mechanisms slightly painful and inelegant, but used in their proper context, they do have their uses.

For the domains I've administered, we started seeing log entries -- and, in the cases where the addresses were actually deliverable, actual bounce messages -- for messages that definitely did not originate at our site and never went through our infrastructure, long before bsdly.net was created. We didn't even think about recording those addresses until a practical use for them suddenly appeared with the greytrapping feature in OpenBSD 3.3 in 2003.

A little while after upgrading the relevant systems to OpenBSD 3.3, we had a functional greytrapping system going, and at some point before the 2007 blog post I started publishing the generated blacklist. The rest is, well, what got us to where we are today.

From the data we see here, mail sent with faked sender addresses happens continuously and most likely to all domains, sooner or later. Joejobs that actually hit deliverable addresses happen too. Raw data from a campaign in late 2014 that used my main address as the purported sender is preserved here, collected with a mind to writing an article about the incident and comparing to a similar batch from 2008. That article could still be written at some point, and in the meantime the messages and specifically their headers are worth looking into if you're a bit like me. (That is, if you get some enjoyment out of such things as discovering the mindbogglingly bizarre and plain wrong mail configurations some people have apparently chosen to live with. And unless your base64 sightreading skills are far better than mine, some of the messages may require a bit of massaging with MIME tools to extract text you can actually read.)

Anyone who runs a mail service and bothers even occasionally to read mail server logs will know that joejobs and spam campaigns with fake and undeliverable return addresses happen all the time. If Google's mail admins are not aware of that fact, well, I'll stop right there and refuse to believe that they can be that incompetent.

The question then becomes, why are they doing this? Are they giving other independent operators the same treatment? Is this part of some kind of intimidation campaign (think "sign up for our service and we'll get your mail delivered, but if you don't, delivering to domains that we host becomes your problem")? I would think a campaign of intimidation would be a less than useful strategy when there are already antitrust probes underway; these things can change direction as discoveries dictate.

Normally I would put oddities like the ones I saw in this case down to a silly misconfiguration, some combination of incompetence and arrogance and, quite possibly, some out of control automation thrown in. But here we are seeing clearly wrong behavior from a company that prides itself on hiring only the smartest people they can find. That doesn't totally rule out incompetence or plain bad luck, but it makes for a very strange episode. (And lest we forget, here is some data on a previous episode involving a large US corporation, spam and general silliness. Now with the external link fixed via the wayback machine.)

One other interesting question is whether other operators, big and small behave in any similar ways. If you have seen phenomena like this involving Google or other operators, I would like to hear from you by email (easily found in this article) or in comments.

Update 2016-05-03: With Google silent on all aspects of this story, it is not possible to pinpoint whether it was the public whining or something else that made the difference, but today a message aimed at one of those troublesome domains made it through to its intended recipient.

Much like the mysteriously disappeared messages, my logs show a "Message accepted for delivery" response from the Google servers, but unlike the earlier messages, this one actually appeared in the intended inbox. In the meantime, one of the useful bits of feedback I got on the article was that my IPv6 reverse DNS was in fact lacking. Today I fixed that, or at least made a plausible attempt to (you're very welcome to check with tools available to you). And for possibly unrelated reasons, my test message made it through all the way to the intended recipient's mailbox.

[Opinions offered here are my own and may or may not reflect the views of my employer.]

Update 2016-11-21: Since this piece was originally written, we've seen several episodes of Gmail.com or other Google domains disappearing or bouncing mail from bsdly.net, and at some point I got around to setting up proper DMARC records as well.

In a separate development, I also set up the script that generates the blacklist for export to send a copy of its output to my gmail.com addresses. This has led to a few fascinating bounces, all of them archived here in order to preserve a record of incidents and developments.

It should be no surprise that even the SPF-DKIM-DMARC trinity is not actually enough to avoid bounces and disappearances, driving Google to come up with a separate DNS TXT record of their own, the google-site-verification record, which contains a key they will generate for your domain if you manage to find the correct parts of their website.

Update 2017-02-12 - 2017-02-13: At semi-random intervals since this piece was originally written, Gmail has had a number of short periods of bouncing my hourly spamtrap reports sent to myself at bsdly.net with a Cc: to my gmail account.

At most times these episodes last for a couple of hours only, possibly ending when a human intervenes to correct the situation. However, this weekend we have seen more of this nonsense than usual: Gmail started bouncing my hourly reports on Saturday evening CET, with the first bounce for the 2017-02-11 21:10 report, keeping up the bounces through the 2017-02-12 05:10 report.

Then after a brief respite the bounces resumed with the 2017-02-12 08:10 report and have persisted, with every hourly message bouncing (except the 2017-02-13 03:10, 04:10 and 13:10 messages), the most recent bounce received being for the 2017-02-13 21:10 report. I archive all bounces as they arrive, mod whatever time I have on my hands, in the googlefail archive.

Update 2020-10-21: Recent events have led me to suspect that Google may be relying very much, thank you, on SPF information, and that their systems may have been seeing timeouts on SPF lookups due to my BIND not switching to TCP answers soon enough. One other incident is almost certain to have been caused by a lack of correct reverse lookup for the sending system's IPv6 address.


Update 2026-01-08: Max Stucchi and I will be giving a PF tutorial at AsiaBSDCon 2026, and I welcome your questions now that I'm revising the material for that session. See this blog post for some ideas.

For a broader overview and retrospective, you may be interested in reading Eighteen Years of Greytrapping - Is the Weirdness Finally Paying Off?, which links to this piece and a number of other related resources.

You might also be interested in reading selected pieces via That Grumpy BSD Guy: A Short Reading List (also here).

Separately, pre-orders of The Book of PF, 4th edition are now open. For a little background, see the blog post Yes, The Book of PF, 4th Edition Is Coming Soon (also here). We are hoping to have physical copies of the book available in time for the conference, and hopefully you will be able to find it in good book stores by then. The latest information I have is that physical copies should be ready to ship by the end of January 2026.


Sunday, April 14, 2013

Maintaining A Publicly Available Blacklist - Mechanisms And Principles

When you publicly assert that somebody sent spam, you need to ensure that your data is accurate. Your process needs to be simple and verifiable, and to compensate for any errors, you want your process to be transparent to the public with clear points of contact and line of responsibility. Here are some pointers from the operator of the bsdly.net greytrap-based blacklist.

Regular readers will be aware that bsdly.net started sharing our locally generated blacklist of known spam senders back in July 2007, and that we've offered hourly updates for free since then.

Note: This piece is also available without trackers but classic formatting only here.

The mechanics of maintaining a list boil down to a few simple steps, as described in the original article and the various web pages it references, as well as several followups, but probably the most informative recipe for how it's all done was this one, written in May 2012 in response to (as usual) a heated exchange on openbsd-misc.

As I've explained in earlier articles, once the basic spamd(8) setup is in place, maintaining the blacklist starts with defining your list of known bad, never to become deliverable addresses in domains you control. It is worth noting that you can run spamd on any OpenBSD computer even if you do not run a real mail service (several of my correspondents do just that, and do evil things like cranking up the time between response bytes to 10 seconds for entertainment), but as it happens we have a few real mail servers behind our spamd equipped gateways, so it seemed natural to restrict our pool of trap addresses to the domains that are actually served by our kit here.

Collecting addresses for the spamtraps list started with a totally manual process of fishing out addresses from the mail server logs, grepping for log entries for delivery attempts to non-existent addresses in our domains. Spammers would do (as they still do) Joe jobs on one or more of our domains, making up or generating fake addresses to use as From: or Reply-to: addresses on their spam messages, and messages that for one reason or the other were not deliverable would end up generating bounce messages that our mail service would need to deal with. But a manual process is error prone and we're bound to have missed a few, so not too long after I'd written the script that generates the downloadable blacklist, I had it checking the active greylist for any addresses not already in the pool of known bad addresses.
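
As an illustration of that greylist check (not the actual script), a sketch along these lines would pull out candidate addresses, assuming the spamdb(8) field layout and some hypothetical file names; any candidates would still get a manual once-over before being added as traps with spamdb -T -a:

    #!/bin/sh
    # sketch: envelope-to addresses in our domain seen in the active greylist
    # that are not already in the sorted list of known spamtraps
    spamdb | awk -F\| '$1 == "GREY" { print tolower($5) }' | tr -d '<>' | \
        grep '@bsdly\.net$' | sort -u > /tmp/greylist-recipients
    comm -23 /tmp/greylist-recipients /etc/mail/spamtraps.sorted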

This is the process that has helped generate the current list of 'imaginary friends', now 24,324 entries long, with a growth of usually a handful per day (though there have been whole days without a single new entry) and, in rare cases, up to a few hundred in a single run of the script. I assume there will be more entries arriving as I write and post this article, but right now the latest entry so far, received 13 Apr 2013 15:10 CEST, was pfpeter@bsdly.net (which mildly suggests that somebody is having a bit of fun with my address and obvious keywords -- if you get the trap address list, you'll see that grep peter@bsdly.net sortlist turns up close to a hundred entries, mostly combinations of well-known keywords and my email address).

You could argue that fishing bounce-to addresses out of the greylist this quickly for trapping purposes runs the risk of unfairly penalizing innocent third parties with badly configured mail services, and I must admit that risk exists. However, my experiment of planting my own made-up addresses in the spamtraps list reveals that the list is indeed read and used by spammers, and after all, early sufferers would be blacklisted here for only 24 hours after their last attempt at bouncing the worthless stuff back to us.

And once an address is in the spamtrap list, attempts at delivering mail to that address turn up in the logs looking something like this:

Apr 14 15:19:22 skapet spamd[1733]: (GREY) 201.215.127.126: <switchbackiwh0@google.com> -> <aramforbess@bsdly.net>
Apr 14 15:19:22 skapet spamd[31358]: Trapping 201.215.127.126 for tuple 201.215.127.126 pc-126-127-215-201.cm.vtr.net <switchbackiwh0@google.com> <aramforbess@bsdly.net>
Apr 14 15:19:22 skapet spamd[31358]: sync_trap 201.215.127.126


the sync_trap line indicates that this spamd is set up to synchronize with a sister site, like I described in the In The Name Of Sane Email... article. When the miscreant returns, it looks something like this:

Apr 14 15:28:01 skapet spamd[30256]: 201.215.127.126: connected (3/3), lists: spamd-greytrap
Apr 14 15:30:15 skapet spamd[30256]: 201.215.127.126: disconnected after 134 seconds. lists: spamd-greytrap


most likely with repeat attempts until the sender gives up.

That's the basic mechanism. Now for the principles. I outlined some of the operating principles in a kind of terms of service statement here, but I'll offer a rehash here with a tiny sprinkling of tweaks I've made to the process in order to make the quality of the data I offer better.

First, as I already pointed out in the ingress, you want your process to be simple and verifiable. Run of the mill spamd greytrapping passes the first test with flying colors; after all, any host that ends up in the blacklist verifiably tried to send mail to a known bad address. Keep your logs around for a while, and you should be in good shape to verify what happened.

You also want your data to be accurate, with each entry representing a host that verifiably sent spam. This means watching out for errors of any kind, including but not limited to finding and removing false positives. The automatic 24-hour expiry that's part of the whole greytrapping experience helps a lot here. Any perpetrator or unlucky victim will be out of harm's or blockage's way within 24 hours of the last undesired action we register from their side. There is no requirement that the system administrator track down a web form and swear on their grandmother's pituitary gland that they have 'cleaned up the system'. We (perhaps naively) assume that anyone we don't hear from is no longer our problem.

However, spamd was designed to be a solution to a relatively simple and limited set of problems. Every day some spam messages will manage to get past the outer defenses and face the content filtering that in most cases makes the right decision and drops any spam messages that reaches it on the floor. And there is a small, but not entirely non-existent body of messages that are spam of some kind that will end up in users' inboxes.

For the case where messages are dropped by the content filtering, I found that it was fairly simple to extract the IP addresses of the last hop before entering our network from the logs generated by the content filtering, and at regular intervals these IP addresses are collected from the mail servers with the content filtering in place, and fed into the local greytrap via spamdb(8). It took more than a few dry runs before I trusted the process, but setting up something similar for your environment should be within any sysadmin's scripting skills. We use spamassassin and clamav here, but you should be able to extract fairly easily the information you need to fit the behavior of your particular combination of software. We also offer our users the option of saving messages in spam and not-spam folders on a network drive to train spamassassin's Bayesian engine, indirectly helping the quality of the generated blacklist via more accurate detection of spam. In addition, a so-minded administrator can even extract IP addresses from any headers the user had a mind to conserve and use spamdb(8) to manually insert offending IP addresses in the local greytrap list.
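
A bare-bones sketch of that feed-into-the-greytrap step follows; the input file name is made up, and the extraction of last-hop addresses from the content filter's logs is site specific and left out here:

    #!/bin/sh
    # feed last-hop IP addresses rejected by the content filtering into the
    # local greytrap as TRAPPED entries
    while read -r ip; do
        spamdb -t -a "$ip"
    done < /tmp/filter-rejected-ips.txt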

And finally to compensate for any errors, you want your process to be transparent to the public with clear points of contact and line of responsibility. In other words, make sure that you have people in place who are indeed accessible and responsive when somebody tries to contact you via any of the RFC 2142 required addresses. And post something like this article to somewhere reachable. At bsdly.net and associated domains, it's a distinct advantage that contact attempts happen from hosts not currently in the blacklists, but as far as I am aware any errors in the published list have been dealt with before anybody else noticed, and we have avoided being party to the blocklist vendettas and web forum flame wars that have plagued other blacklist maintainers (it has been suggested that the December 2012 DDOS incident could have been part of somebody's revenge, but we do not have sufficient evidence to point any fingers).

In short, you need to keep things simple, act responsibly and be responsive to anyone contacting you about your (mostly automatically generated) work product.

Good night and good luck.

2013-04-15 update: Clarified that manual spamdb(8) manipulation can be used to insert IP addresses in the blacklist too.

2013-04-16 update: It is also possible to fetch the hourly dump from the NUUG mirror here: https://home.nuug.no/~peter/bsdly.net.traplist. In fact, fetching from there should under most circumstances be faster than getting it from the original location. The file is copied at 15 minutes past the hour, while the generation starts at 10 past the hour.
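
For consumers who simply want the list in a PF table of their own rather than feeding it to spamd, fetching and loading the dump could look roughly like this (the table name and local path are assumptions):

    #!/bin/sh
    # fetch the hourly dump from the NUUG mirror and replace the table contents
    ftp -o /etc/mail/bsdly.net.traplist \
        https://home.nuug.no/~peter/bsdly.net.traplist && \
        pfctl -t spamd-greytrap -T replace -f /etc/mail/bsdly.net.traplist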

In addition to the techniques described here, it is useful to know that OpenBSD developer Peter Hessler is working on distributing spamd data via BGP, as described in his AsiaBSDCon 2013 paper. Not part of the base distribution yet, but work continues and could come in useful in addition to the batch import of exported lists like the bsdly.net hourly dump.

If you're interested in setting up your own spamd, your main source of information is included in your OpenBSD or FreeBSD installation: the man pages such as the one I refer to here. Recommended secondary sources include my own The Book of PF and the PF tutorial (or here) it grew out of. This has since been largely superseded by Network Management with the OpenBSD Packet Filter Toolset, by Peter N. M. Hansteen, Massimiliano Stucchi and Tom Smyth (a PF tutorial; this is the BSDCan 2024 edition). In addition, you can support the OpenBSD project by buying merchandise from the OpenBSD shop.

If you're interested in OpenBSD in general, you have a real treat coming up in the form of Michael W. Lucas' Absolute OpenBSD, 2nd edition.


You might also be interested in reading selected pieces via That Grumpy BSD Guy: A Short Reading List (also here).

At EuroBSDcon 2025, there will be a Network Management with the OpenBSD Packet Filter Toolset session, a full day tutorial starting at 2025-09-25 10:30 CET. You can register for the conference and tutorial by following the links from the conference Registration and Prices page.

Separately, pre-orders of The Book of PF, 4th edition are now open. For a little background, see the blog post Yes, The Book of PF, 4th Edition Is Coming Soon. We are hoping to have physical copies of the book available in time for the conference, and hopefully you will be able to find it in good book stores by then.


Tuesday, December 25, 2012

DDOS Bots Are People! (Or Manned By Some, At Least)

Mitigating a DDOS attack against your infrastructure involves both people skills and tech skills. Whining won't cut it at all. The underlying problem remains the sad fact that the botnet herders are able to find fresh hosts for their malware. Should we start publishing more information about those pesky DDOS participants?

I have a confession to make. For a while and up until recently, one of my name servers was in fact an open resolver. The way I discovered and fixed the problem was by way of a rather crude DNS based DDOS.

Regular readers (Hi, Bert!) will be aware that I haven't actually published anything noteworthy for a while. So I was a bit surprised to find in early December 2012 that bsdly.net and associated domains were under a DNS based distributed denial of service (DDOS) attack. The attack itself appeared to be nothing special -- just a bunch of machines sending loads and loads of rubbish DNS requests directed at the IP addresses listed as authoritative masters for a few selected domains.

The targets were on relatively modest connections (think SOHO grade), so their pipes were flooded by the traffic and the people who were relying on that connectivity were not getting much network-related done. The sites weren't totally offline, but just about anything would time out without completing and life would be miserable. I've made a graph of the traffic available here, in a weekly view of that approximate period that nicely illustrates normal vs abnormal levels for those networks, generated by nfsen from pflow(4) data.

The networks under attack were in this instance either part of my personal lab or equipment used and relied upon by old friends, so I set out to make things liveable again as soon as I was able. Read on for field notes on practical incident response.

Under Attack? Just Block Them Then!
My early impulse was of course to adjust the PF rules that take care of rapid-fire brute force attacks (see eg the tutorial or the book for info) to swallow up the rapid-fire DNS as well. That was unfortunately only partially successful. We only achieved small dips in the noise level.

Looking at the traffic via tcpdump(8) and OpenBSD's excellent systat states view revealed that the floods were incoming at a fairly quick pace and were consistently filling up the state table on each of the firewalls, so all timeouts were effectively zero for longish periods. A pfctl -k directed at known attackers would typically show a few thousand states killed, only to see the numbers rise quickly again to the max number of states limit. Even selectively blocking by hand or rate-limiting via pf tricks was only partially effective.

The traffic graphs showed some improvement, but the tcpdump output didn't slow at all. At this point it was getting fairly obvious that the requests were junk -- no sane application will request the same information several thousand times in the space of a few seconds.

It Takes People Skills. Plus whois. And A Back Channel.
So on to the boring part. In most cases what does help, eventually, is contacting the people responsible for the security of the networks where the noise traffic originates. On Unixish systems, you have at your fingertips the whois(1) command, which is designed for that specific purpose. Use it. Feeding a routeable IP address to whois will in most circumstances turn up useful contact data. In most cases, the address you're looking for is abuse@ or the security officer role for the network or domain.

If you're doing this research while you're the target of a DDOS, you will be thanking yourself for installing a back channel to somewhere that will give you enough connectivity to run whois and send email to the abuse@ addresses. If your job description includes dealing with problems of this type and you don't have that in place already, drop what you're doing and start making arrangements to get a back channel, right now.

Next up, take some time to draft a readable message text you can reuse quickly to convey all relevant information to the persons handling abuse@ mail at the other end.

Be polite (I've found that starting with a "Dear Colleague" helps), to the point, offer relevant information up front and provide links to more (such as in my case tcpdump output) for followup. Stress the urgency of the matter, but do not make threats of any kind, and save the expletives for some other time.

The issue here is to provide enough information to make the other party start working on the problem at their end and preferably inspire them to make that task a high priority one. Do offer an easy point of contact, make sure you can actually read and respond to email at the address you're sending from, and if necessary include the phone number where you are most easily reachable.

When you have a useful template message, get ready to send however many carefully amended copies of that message to however many colleagues (aka abuse@) it takes. Take care to cut and paste correctly; if there's a mismatch between your subject and your message body on anything essential, or inconsistencies within your message, more likely than not your message will be discarded as a probable prank. Put any address you think relevant in your Cc: field, but do not hold out any high hopes of a useful response from law enforcement. Only directly affected parties will display any interest whatsoever.

Fill in any blanks or template fields with the output from your monitoring tools. But remember, your main assets at this point are your people skills. If the volume is too high or you find the people part difficult, now is the time to enlist somebody to handle communications while you deal with the technical and analysis parts.

You will of course find that there are abuse contact addresses that are in fact black holes (despite the RFC stating basic requirements), and unless you are a name they've heard about you should expect law enforcement to be totally useless. But some useful information may turn up.

Good Tools Help, Beware Of Snake Oil
I've already mentioned monitoring tools for collecting and analyzing your traffic. There is no question you need to have useful tools in place. What I have ended up doing is to collect NetFlow traffic metadata via OpenBSD's pflow(4) and similar means and to monitor the data via NFSen. Other tools exist, and if you're interested in network traffic monitoring in general and NetFlow tools in particular, you could do worse than pick up a copy of Michael W. Lucas' recent book Network Flow Analysis.

Michael chose to work with the flow-tools family of utilities, but he does an outstanding job of explaining the subject both in theory and in the context of practical applications. What you read in Michael's book can easily be transferred to other toolsets once you get a grip on the matter.

Unfortunately, (as you will see from the replies you get to your messages) if you do take an interest in your network traffic and start measuring, you will be one of a very select minority. One rather perverse side effect of 'anti-terror' or 'anti-anythingyouhate' legislation such as the European Union Data Retention Directive and similar log data retention legislation in the works elsewhere is that logging anything not directly associated with the health of your own equipment is likely to become legally burdensome and potentially expensive, so operators will only start logging with a granularity that would be useful in our context once there are clear indications that an incident is underway.

Combine this with the general naive optimism people tend to exhibit (aka 'it won't happen here'), and the result is that very few system operators actually have a clue about their network traffic.

Those who do measure their traffic and respond to your queries may turn up useful information - one correspondent was adamant that the outgoing traffic graph for the IP address I had quoted to them was flat, and claimed that what I was likely seeing was my servers being utilized in a DNS amplification attack (very well described by Cloudflare in this blog post). The main takeaway here is that since UDP is basically 'fire and forget', unless your application takes special care, it is possible to spoof the source address and target the return traffic at someone else.

My minor quarrel with the theory was that the vast majority of requests were not recursive queries (a rough count based on grep -c on tcpdump output preserved here says that "ANY" queries for domains we did legitimately answer for at the start of the incident outnumbered recursive queries by a ratio better than 10,000 to 1). So DNS amplification may have been a part of the problem, but a rather small one (but do read the Cloudflare article anyway, it contains quite a bit of useful information).

And to make a long story slightly shorter, the most effective means of fighting the attack proved also to be almost alarmingly simple. First off, I moved the authority for the noise generating domains off elsewhere (the domains were essentially dormant anyway, reserved on behalf of a friend of mine some years ago for plans that somehow failed to move forward). That did not have the expected effect: the queries for those domains kept coming beyond the zone files' stated timeouts, aimed at the very same IP addresses as before. The only difference was that those queries were now met with a 'denied' response, as were (after I had found the config error on one host and fixed it) any recursive queries originating from the outside.

The fact that the noisemakers kept coming anyway led me to a rather obvious conclusion: Any IP address that generates a 'denied' response from our name server is up to no good, and can legitimately be blackhole routed at the Internet-facing interface. Implementing the solution was (no surprise) a matter of cooking up some scriptery, including one script that tails the relevant logs closely and greps out the relevant information, and another that issues a simple route add -host $offendingip 127.0.0.1 -blackhole for each offending IP address.
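
A rough sketch of the log-watching half of that scriptery; the log location and the 'client a.b.c.d#port' pattern depend entirely on your named(8) logging configuration, so treat both as assumptions:

    #!/bin/sh
    # watch the name server log for 'denied' responses and blackhole the client
    tail -f /var/log/messages | while read -r line; do
        case "$line" in
        *denied*)
            ip=$(echo "$line" | sed -n 's/.*client \([0-9.][0-9.]*\)#.*/\1/p')
            # repeat additions for the same host fail harmlessly
            [ -n "$ip" ] && route add -host "$ip" 127.0.0.1 -blackhole 2>/dev/null
            ;;
        esac
    done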

My users reported vastly improved network conditions almost immediately, while the number of blackhole routed IP addresses at each site quickly shot up to a 24-hour average somewhere in the low thousands before dropping rather sharply to at first a few hundreds, through a few dozen to, at last count, a total of 5.

There are a number of similar war stories out there, and good number of them end up with a recommendation to buy 'DDOS protection' from some vendor or other (more often than not some proprietary solution where you get no clue about the innards), or to 'provision your stuff to infrastructure that's too big to be DDOSed'. Of these options I would prefer the latter, but this story I think shows that correct use of the tools OpenBSD and other sane operating systems provide for free will go a long way. More kit helps if you're lucky, but smarts win the day.

Should we publish, or 'name and shame'?
I strongly suspect that most of the handful of boxes that are currently blackhole routed by my setup here belong to a specific class of 'security consultant' who for reasons of their own want a piece of the sniffing for recursive resolvers action. But I really can't be certain: I have no way except whois and guesswork to determine who mans the scanning boxes and for what purpose (will they alert owners of any flaws found or save it all for their next attack -- there just is no way to tell). Scans like those (typically involving a query for './A/IN' or the textbook 'isc.org/ANY/IN') are of course annoying, but whoever operates those boxes are very welcome to contact me in any way they can with data on their legitimate purposes.

During the attack I briefly published a list of the IP addresses that had been active during the last 24 hours to the bsdly.net web site, and for a short while I even included them as a separate section in the bsdly.net blacklist for good measure (an ethically questionable move, since that list is generated for a different and quite specific purpose). I am toying with the idea of publishing the current list of blackholed hosts in some way, possibly dumping to somewhere web-accessible every hour or so, if feedback on this column indicates it would be a useful measure. Please let me know what you think in comments or via email.

For the rest of you out there, please make an effort to keep your systems in trim, well configured with no services running other than for the specific purposes you picked. Keeping your boxes under your own control does take an effort, but it's worth your trouble. Of course there are entire operating environments worth avoiding, and if you're curious about whether any system in your network was somehow involved in the incident, I will answer reasonable requests for specific digging around my data (netflow and other). As a side note, the story I had originally planned to use as an illustration of how useful netflow data is in monitoring and capacity planning involves a case of astoundingly inane use of a Microsoft product in a high dollar density environment, but I'll let that one lie for now.

Good night and good luck.

Update 2019-12-18: Man page links updated to modern-style man.openbsd.org links (no other content change).

Monday, May 28, 2012

In The Name Of Sane Email: Setting Up OpenBSD's spamd(8) With Secondary MXes In Play - A Full Recipe


© 2012 Peter N. M. Hansteen


Recipes in our field are all too often offered with little or no commentary to help the user understand the underlying principles of how a specific configuration works. To counter the trend and offer some free advice on a common configuration, here is my recipe for a sane mail setup.

Mailing lists can be fun. Most of the time the discussions on lists like openbsd-misc are useful, entertaining or both.

But when your battle with spam fighting technology ends up blocking your source of information and entertainment (like in the case of the recent thread titled "spamd greylisting: false positives" - starting with this message), frustration levels can run high, and in the process it emerged that some readers out there place way too much trust in a certain site offering barely commented recipes (named after a rare chemical compound Cl-Hg-Hg-Cl).

Note: This piece is also available without trackers but more basic formatting here.



I did pitch in at various points in that thread, but then it turned out that the real problem was a misconfigured secondary MX, and I thought I'd offer my own recipe, in the true spirit of sharing works for me(tm) content. So without further ado, here is

Setting Up OpenBSD's spamd(8) With Secondary MXes In Play in Four Easy Steps

Yes, it really is that simple. The four steps are:
  1. Make sure your MXes (both primary and secondary) are able to receive mail for your domains
  2. Set up content filtering for all MXes, since some spambots actually speak SMTP
  3. Set up spamd in front of all MXes
  4. Set up synchronization between your spamds
These are the basic steps. If you want to go even further, you can supplement your greylisting and publicly available blacklists with your own greytrapping, but greytrapping is by no means required.

For steps 1) and 2), please consult the documentation for your MTA of choice and the content filtering options you have available.

If you want an overview article to get you started, you could take a peek at my longish Effective spam and malware countermeasures article (originally a BSDCan paper - if you feel the subject could bear reworking into a longer form, please let me know). Once you have made sure that your mail exchangers will accept mail for your domains (checking that secondaries do receive and spool mail when you stop the SMTP service on the primary), it's time to start setting up the content filtering.

At this point you will more likely than not discover that any differences in filtering setups between the hosts that accept and deliver mail will let spam through via the weakest link. Tune accordingly, or at least until you are satisfied that you have a fairly functional configuration.

When you're done, leave top or something similar running on each of the machines doing the filtering and occasionally note the system load numbers.

Before you start on step 3), please take some time to read relevant man pages (pf.conf, spamd, spamd.conf and spamlogd come to mind), or you could take a peek at the relevant parts of the PF FAQ, or my own writings such as The Book of PF, the somewhat shorter Firewalling with PF online tutorial or the most up to date tutorial slides with slightly less text per HTML page.

The book and tutorial both contain material relevant to the FreeBSD version and other versions based on the older syntax too (really only minor tweaks needed). In the following I will refer to the running configuration at the pair of sites that serve as my main lab for these things (and provided quite a bit of the background for The Book of PF and subsequent columns here).

As you will have read by now in the various sources I cited earlier, you need to set up rules to redirect traffic to your spamd as appropriate. Now let's take a peek at what I have running at my primary site's gateway. Grepping for rules that reference smtp should do the trick:

peter@primary $ doas grep smtp /etc/pf.conf

which yields

pass in log quick on egress proto tcp from <nospamd> to port smtp
pass in log quick on egress proto tcp from <spamd-white> to port smtp
pass in log on egress proto tcp to port smtp rdr-to 127.0.0.1 port spamd queue spamd
pass out log on egress proto tcp to port smtp


Hah. But these rules differ both from the example in the spamd man page and from the other sources! Why?

Well, to tell you the truth, the only thing we achieve by doing the quick dance here is to make sure that SMTP traffic from any host that's already in the nospamd or spamd-white tables is never redirected to spamd, while traffic from anywhere else will match the first non-quick rule quoted here and will be redirected.

I do not remember the exact circumstances, but this particular construct was probably the result of a late night debugging session where the root cause of the odd behavior was something else entirely. But anyway, this recipe is offered in a true it works for me spirit, and I can attest that this configuration works.

The queue spamd part shows that this gateway also has a queue based traffic shaping regime in place. The final pass out rule is there to make sure spamlogd records outgoing mail traffic and maintains whitelist entries.
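The queue spamd keyword assumes a matching queue definition exists elsewhere in the rule set, so for completeness, here is a minimal sketch of the sort of queue hierarchy it could refer to. The interface name and bandwidth figures are made up for illustration, not taken from my actual configuration:

queue main on em0 bandwidth 20M
queue std parent main bandwidth 19M default
queue spamd parent main bandwidth 64K max 128K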


Update 2017-05-25: At some point after this was originally written, I revised that rule set. The rules now read, with no quick dance:

pass in on egress inet proto tcp from any to any port smtp \
    divert-to 127.0.0.1 port spamd set queue spamd set prio 0
pass in log(all) on egress proto tcp from <nospamd> to port smtp 
pass in log(all) on egress proto tcp from <spamd-white> to port smtp 
pass out on egress proto tcp from { self $mailservers } to any port smtp


And of course for those rules to load, you need to define the tables before you use them by putting these two lines

table <spamd-white> persist
table <nospamd> persist file "/etc/mail/nospamd"


somewhere early in your /etc/pf.conf file.
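The nospamd file itself is just a list of addresses and networks, one per line, for hosts that should never be subjected to greylisting, typically the outgoing ranges of large mail providers that retry from varying addresses. A tiny sketch with placeholder networks only (see the 2015-08-01 update at the end of this article for a pointer to the file I actually use):

# /etc/mail/nospamd -- never greylist these (placeholder networks)
192.0.2.0/24
198.51.100.25
2001:db8:beef::/48

After editing the file you can reload just that table with doas pfctl -t nospamd -T replace -f /etc/mail/nospamd, without touching the rest of the rule set.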

Now let's see what the rules on the site with the secondary MX look like. We type:

$ doas grep smtp /etc/pf.conf

and get

pass in log on egress proto tcp to port smtp rdr-to 127.0.0.1 port spamd
pass log proto tcp from <moby> to port smtp
pass log proto tcp from <spamd-white> to port smtp
pass log proto tcp from $lan to port smtp


which is almost to the letter (barring only an obscure literature reference for one of the table names) the same as the published sample configurations.

Pro tip: Stick as close as possible to the recommended configuration from the spamd(8) man page. The first version here produced some truly odd results on occasion.

Once again the final rule is there to make sure spamlogd records outgoing mail traffic and maintains whitelist entries. The tables, again earlier on in the /etc/pf.conf file, are:

table <spamd-white> persist counters
table <moby> file "/etc/mail/nospamd"


At this point, you have seen how to set up two spamds, each running in front of a mail exchanger. You can choose to run with the default spamd.conf, or you can edit in your own customizations.

The next works for me item is bsdly.net's very own spamd.conf file, which automatically makes you a user of my greytrapping based blacklist.
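If you would rather roll your own, the spamd.conf format is compact enough. Here is a sketch of the kind of entry that pulls in an external traplist over HTTP; the list name, message text and URL below are illustrative only, so lift the real entry from the linked file or from the default /etc/mail/spamd.conf:

all:\
	:mytraplist:

mytraplist:\
	:black:\
	:msg="Your address %A has recently attempted delivery to spamtrap addresses":\
	:method=http:\
	:file=www.example.com/path/to/traplist.txt

spamd-setup(8) reads this file, fetches the lists and feeds the resulting addresses to spamd.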

Once you have edited the /etc/rc.conf.local files on both machines so that spamd_flags= no longer contains NO (change it to spamd_flags="" for now), you can start spamd by running /usr/libexec/spamd and /usr/libexec/spamlogd, then run /usr/libexec/spamd-setup manually.

Note (update 2021-03-19): On modern OpenBSD versions, the easiest way to enable and start spamd is (assuming you have configured doas to allow your user to run rcctl):

$ doas rcctl enable spamd
$ doas rcctl start spamd

Or if you want, reboot the system and look for the spamlogd and spamd startup lines in the /etc/rc output.
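If you prefer to keep the flags out of /etc/rc.conf.local altogether, rcctl can manage those as well. A sketch, using the synchronization flags we will get to in a moment (adjust the interface and partner host names to your own setup):

$ doas rcctl set spamd flags "-v -G 2:8:864 -w 1 -y bge0 -Y secondary-gw.secondary.com"
$ doas rcctl restart spamd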

The fourth and final required step for a spamd setup with backup mail exchangers is to set up synchronization between the spamds. The synchronization keeps your greylists in sync and transfers information on any greytrapped entries to the partner spamds. As the spamd man page explains, the synchronization options -y and -Y are command line options to spamd.

So let's see what the /etc/rc.conf.local on the primary has in its spamd_flags options line:

peter@primary-gw $ doas grep spamd /etc/rc.conf.local
spamd_flags="-v -G 2:8:864 -w 1 -y bge0 -Y secondary.com -Y secondary-gw.secondary.com "


Here we see that I've turned up verbose logging (-v) and, for some reason, fiddled with the greylisting parameters (-G). But more significantly, I've also set up this spamd to listen for synchronization messages on the bge0 interface (-y) and to send its own synchronization messages to the hosts designated by the -Y options.

On the secondary, the configuration is almost identical. The only difference is the interface name and that the synchronization partner is the primary gateway.

$ doas grep spamd /etc/rc.conf.local
spamd_flags="-v -G 2:8:864 -w 1 -y xl0 -Y primary-gw.primary.com -Y primary.com"

With these settings in place, you have more or less completed step four of our recipe. But if you want to make sure you get all spamd log messages in a separate log file, add these lines to your /etc/syslog.conf:

# spamd
!!spamd
daemon.err;daemon.warn;daemon.info;daemon.debug /var/log/spamd
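
Note that syslogd only writes to log files that already exist, so create the file and have syslogd reread its configuration before you expect anything to show up there:

$ doas touch /var/log/spamd
$ doas rcctl restart syslogd

You will probably also want a matching entry in /etc/newsyslog.conf so the new log gets rotated.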


After noting the system load on your content filtering machines, restart your spamds. Then watch the system load values on the content filterers and take a note of them from time to time, say every 30 minutes or so.

Step 4) is the last required step for building a multi-MX configuration. You may want to just leave the system running for a while and watch any messages that turn up in the spamd logs or the mail exchanger's logs.

The final embellishment is to set up local greytrapping. The principle is simple: If you have one or more addresses in your domain that you know will never be valid, you add them to your list of trapping addresses with a command such as

$ doas spamdb -T -a noreply@mydomain.nx

and any host that tries to deliver mail to noreply@mydomain.nx will be added to the local blacklist spamd-greytrap to be stuttered at for as long as it takes.
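Once a trap address is armed, spamdb is also the tool for checking what it has caught. The database dump tags entries by type, so something like the following will show your configured trap addresses and the hosts they have snared (exact output format may vary a little between OpenBSD versions):

$ doas spamdb | grep SPAMTRAP
$ doas spamdb | grep TRAPPED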

Greytrapping can be fun; you can search for posts here tagged with the obvious keywords. To get you started, I offer up my published list of trap addresses, built mainly from logs of unsuccessful delivery attempts here, at The BSDly.net traplist page, while the raw list of trap email addresses is available here. If you want to use that list in a similar manner for your site, please do, only remember to replace the domain names with one or more that you will be receiving mail for.
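One way to do that replacement wholesale is a small pipeline; this is a sketch only, with the list URL and the target domain as placeholders for the real ones linked above:

$ ftp -o - https://www.example.com/traplist.txt | \
    sed 's/@.*/@mydomain.nx/' | sort -u | \
    while read addr; do doas spamdb -T -a "$addr"; done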

This is the list that is used to trap the addresses I publish here, with a faster mirror here. The list is already in the spamd.conf file I offered you earlier.

If you want more background on the BSDly.net list, please see the How I Run This List, Or The Ethics of Running a Greytrapping Based Blacklist page or search this blog on the obvious keywords.

By the way, what happened to the load on those content filtering machines?

Update 2012-05-30: Titles updated to clarify that the main feature here is the spamd(8) spam deferral daemon from the OpenBSD base system, not the identically-named program from the SpamAssassin content filtering suite.

Update 2013-04-16: Added the Pro tip: Stick as close as possible to the recommended configuration from the spamd(8) man page. The first version here produced some truly odd results on occasion.

Update 2015-01-23: Changed the Book of PF links to point to the most recent edition.

Update 2015-08-01: Several correspondents have asked me for a useful nospamd file. Here's the one I use at bsdly.net (http://www.bsdly.net/~peter/nospamd), collected over the years from various incidents and from SPF records extracted via dig -t txt domain.tld.

Update 2017-05-25: In an act of procrastination while preparing slides for the upcoming BSDCan PF and networking tutorial, I added man.openbsd.org links for man page references, and edited in some minor fixes such as up to date rules.

Update 2021-03-19: Added a note about using rcctl to enable and start spamd on recent OpenBSD versions. A tangentially related incident had me review this article, and I found that it would be more useful to readers to be pointed at the recommended way to run a system. Returning readers may also be interested in the care and feeding activities for the traplist and other data we offer as described in the more recent articles Goodness, Enumerated by Robots. Or, Handling Those Who Do Not Play Well With Greylisting and Badness, Enumerated by Robots (both 2018).

And of course, if you're still wondering why OpenBSD is good for you, the slides from my OpenBSD and you presentation might help.

You might also be interested in reading selected pieces via That Grumpy BSD Guy: A Short Reading List (also here).

At EuroBSDcon 2025, there will be a Network Management with the OpenBSD Packet Filter Toolset session, a full day tutorial starting at 2025-09-25 10:30 CET. You can register for the conference and tutorial by following the links from the conference Registration and Prices page.

Separately, pre-orders of The Book of PF, 4th edition are now open. For a little background, see the blog post Yes, The Book of PF, 4th Edition Is Coming Soon. We are hoping to have physical copies of the book available in time for the conference, and hopefully you will be able to find it in good book stores by then.


Saturday, August 9, 2008

Is one of your machines secretly a spambot?

Sometimes we just need facts on the table, automated.

In my previous blog post, I wondered aloud about publishing data about the machines that verifiably tried to spam us. The response was less than overwhelming, and with the script running once per day anyway, I now publish the results via the Name And Shame Robot page.

The announcement below is very close to the text there, so by way of explanation, here is a gift to all my fellow spamd junkies out there:

We started actively greytrapping and publishing our list of greytrap addresses (almost exclusively addresses generated or made up elsewhere and harvested from our logs) during July 2007. The list of greytrap addresses is published on the Traplist page along with some commentary. You can find related comments in this blog post and its followups.

One byproduct of the greytrapping is a list of IP addresses that have tried to deliver mail to one or more of our greytrap addresses during the last 24 hours. The reasoning is, none of these addresses are valid, and any attempt at delivering to those addresses is more likely than not spam. You can download that list here as a raw list of IP addresses, or as a DNS zone file intended as a DNS blacklist here.

In early August 2008, I wrote a small script that copies (rsyncs, actually) the current list of trapped IP addresses as well as the spamd log off the firewall and for each IP address collects all log entries from the spamd log. The resulting file is rsynced back to the webserver, and you can view the latest version here.
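The script itself is nothing fancy. For the curious, here is a minimal sketch of the idea rather than the actual script; the host names, paths and the log matching are simplified placeholders:

#!/bin/sh
# Sketch of the daily report generation described above (placeholders only)
FW=fw.example.com
WEB=web.example.com
WORK=/var/nameandshame
REPORT=$WORK/report.txt

# fetch the current list of trapped addresses and the spamd log
rsync -a $FW:/var/db/trapped.ips $WORK/trapped.ips
rsync -a $FW:/var/log/spamd $WORK/spamd.log

date > $REPORT
while read ip; do
	echo "=== $ip ===" >> $REPORT
	grep -F " $ip:" $WORK/spamd.log >> $REPORT
done < $WORK/trapped.ips

# push the finished report to the web server
rsync -a $REPORT $WEB:/var/www/htdocs/report.txt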

The material here is useful mainly to the system administrators responsible for the machines that appear in it, or people who are interested in studying spammer or spambot behavior. Times are given according to the Europe/Oslo time zone (CET or CEST according to season), and if a date appears several times for an IP address entry, the reason is simply that the log data spans several years. The default syslog settings do not record the year.

In the data you will find several kinds of entries, most of them pretty obvious and straightforward, others less so. The likely FAQ is, "what are the entries with no log data?". The answer is, the spamd here synchronizes with spamds at other sites. The entries without log data entered our traplist through a sync operation, but the host did not attempt direct contact here.

The other likely question is, "what is that becks list?". It's what the rest of the world refers to as uatraps. I copied the data for that list into my config from Bob Beck's message on OpenBSD-misc and didn't notice that the list had an official name until much later.

Please note that this is not an up-to-the minute list. Depending on the number of hosts currently in the list of trapped addresses, the script's run time could be anything up to several hours. For that reason, the script starts at the time stated at the beginning of the report file and runs until it finishes generating. The last thing the script does is to rsync the report file to the webserver. For the time being, I archive older versions off-line.

This is now a totally hands-off, automated operation. The report is currently generated on a Pentium IV-class computer with few and only occasional other duties. If you have any comments or concerns, the address in the next sentence is the one I use for day to day email. If you find this data useful, donations of faster hardware or money (paypal to peter@bsdly.net or contact me for bank information) are of course welcome.

Wednesday, August 1, 2007

On the business end of a blacklist. Oh the hilarity.

I had planned to write about something else for my next blog entry, but life came back and bit me with another spam related episode. Next time, I promise, I'll do something interesting.

In the meantime, I've discovered that a) very few people actually use SPF to reject mail, b) the SPF syntax looks simple but is hard to get right, and c) there are still blacklists which routinely block whole /24 nets.

This morning I got a message from somebody I met at BSDCan in May, asking me to do something LinkedIn-related. Naturally, since I felt I needed some more details to do what this person wanted, I sent a short email message. That message got rejected,

SMTP error from remote mail server after MAIL FROM: SIZE=2240:
host mailstore1.secureserver.net [64.202.166.11]:
554 refused mailfrom because of SPF policy

which means that the SPF record

datadok.no. IN TXT "v=spf1 ip4:194.54.103.64/26
ip4:194.54.107.16/29 -all"

does not do what you think it does. Mail sent from 194.54.103.66 was not let through.

OK, the checking tool at the OpenSPF site seems to agree with secureserver.net, and I seriously cannot blame them for the choice to trust SPF absolutely.
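Incidentally, if you want to see what record the rest of the world actually retrieves, a plain DNS lookup is enough; it should simply return the record quoted above:

$ dig +short -t txt datadok.no
"v=spf1 ip4:194.54.103.64/26 ip4:194.54.107.16/29 -all"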

At the moment it seems that listing each host name explicitly is what does the trick. Weird. Anyway, next up in my attempt to communicate with my overseas friend, I tried sending a message from bsdly.net instead. That bounced too, but for a slightly different reason:

SMTP error from remote mail server after RCPT TO::
host smtp.where.secureserver.net [64.202.166.12]:
553 Dynamic pool 194.54.107.142.

If you look up the data for bsdly.net, you will find that valid mail from there gets sent mainly from 194.54.107.19, which is in the tiny /29 our ISP set aside for my home net when I told them I wanted a fixed IP address.

I'm not sure if the rest of the "ip=194.54.107.*" network is actually a pool of dynamically allocated addresses these days, but what I do know is that 194.54.107.16/29 has not been dynamically allocated for quite a number of years.

Going to the URL gave me this picture:



This really gives me no useful information at all. Except, of course, that at secureserver.net they think that putting entire /24 nets on their blacklist is useful. Some of us tend to disagree with that notion.

Anyway, I filled in the form with a terse but hopefully polite message, and clicked Submit.

I was rewarded with this message:



If I read this correctly, they think mail from 194.54.107.19 is spam because BGNett or MTULink have not set up reverse lookup for 194.54.107.142. OR because they think the entire /24 is dynamically allocated. OR somebody in that subnet may have sent spam at one time. I can only guess at the real reason, and repeat over and over that blocking entire subnets will give you a generous helping of false positives.

Never mind that; the SPF record which made my mail from datadok.no go through to my overseas friend included a:hostname.domain.tld entries for all allowed senders.
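In other words, something along these lines, with placeholder names rather than the actual record:

example.com. IN TXT "v=spf1 a:smtp.example.com a:gw.example.com -all"

With a: mechanisms, the receiving side checks the sending IP address against the address records of the listed host names, which sidesteps guessing at network boundaries.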

And in other news, the PF tutorial saw its visitor number 15000 since EuroBSDCon 2006 on Saturday; the last count is 15220.