[RFC] location-importer.in: Add Spamhaus DROP lists

Message ID 07e4fb71-b852-216d-0ea3-534f428eaa9b@ipfire.org
State Superseded
Series [RFC] location-importer.in: Add Spamhaus DROP lists

Commit Message

Peter Müller Sept. 25, 2021, 1:31 p.m. UTC
A while ago, we discussed whether or not libloc should become an
"opinionated database", i.e. one that includes information on a
network's reputation.

In general, this idea was dismissed as libloc is neither intended nor
suitable for such tasks, and we do not want to make (political?)
decisions like these for various reasons. All we do is provide a
useful location database in a neutral way, and leave it up to our users
to decide how to react to certain results.

However, there is a problematic area. Take AS55303 as an example: We
_know_ this to be a dirty network, tampering with RIR data and
hijacking IP space, and we strongly recommend against processing any
connection originating from or directed to it.

Since it appears to be loaded with proxies used by miscreants for
abusive purposes, all we can do at the time of writing is flag it
as an "anonymous proxy", but we have no way of telling our users
something like "this is not a safe area". The same goes for known
bulletproof ISPs, IP hijackers, and so forth.

This patch therefore suggests populating the "is_drop" flag introduced
in libloc 0.9.8 (albeit currently unused in production) with the
contents of Spamhaus' DROP lists (https://www.spamhaus.org/drop/), so
that at least the worst of the worst are covered. The very same lists
are, in fact, included in popular IPS rulesets as well - a decent
number of IPFire users are therefore likely to have them enabled
already, but in a very costly way.
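
For illustration, the feeds are plain text files with one entry per
line: comments start with a semicolon, and each entry consists of a
network or an ASN, optionally followed by an annotation. The lines
below are illustrative only, not verbatim copies of the current feeds:

  ; Spamhaus DROP List
  1.10.16.0/20 ; SBL256894

  ; Spamhaus ASN-DROP List
  AS55303 ; ...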

There are no plans to go further than this, partly because there is no
other publicly available feed that comes with the same intent,
volatility, and FP rate.

Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
---
 src/python/location-importer.in | 104 ++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)
  

Comments

Michael Tremer Sept. 30, 2021, 1:05 p.m. UTC | #1
Hello,

> On 25 Sep 2021, at 14:31, Peter Müller <peter.mueller@ipfire.org> wrote:
> 
> A while ago, we discussed whether or not libloc should become an
> "opinionated database", i.e. one that includes information on a
> network's reputation.
> 
> In general, this idea was dismissed as libloc is neither intended nor
> suitable for such tasks, and we do not want to make (political?)
> decisions like these for various reasons. All we do is provide a
> useful location database in a neutral way, and leave it up to our users
> to decide how to react to certain results.

I agree that the database should not become political, although I personally do not feel that those DROP decisions fall into the political category.

If we were to decide to drop certain traffic *just because* it comes from a certain country, that would be a different story. If anything, we are blocking organisations for continued malicious behaviour. Coding that kind of information into the database simply makes a lot of sense to me.

> However, there is a problematic area. Take AS55303 as an example: We
> _know_ this to be a dirty network, tampering with RIR data and
> hijacking IP space, and we strongly recommend against processing any
> connection originating from or directed to it.
> 
> Since it appears to be loaded with proxies used by miscreants for
> abusive purposes, all we can do at the time of writing is flag it
> as an "anonymous proxy", but we have no way of telling our users
> something like "this is not a safe area". The same goes for known
> bulletproof ISPs, IP hijackers, and so forth.
> 
> This patch therefore suggests populating the "is_drop" flag introduced
> in libloc 0.9.8 (albeit currently unused in production) with the
> contents of Spamhaus' DROP lists (https://www.spamhaus.org/drop/), so
> that at least the worst of the worst are covered. The very same lists
> are, in fact, included in popular IPS rulesets as well - a decent
> number of IPFire users are therefore likely to have them enabled
> already, but in a very costly way.
> 
> There are no plans to go further than this, partly because there is no
> other publicly available feed that comes with the same intent,
> volatility, and FP rate.
> 
> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
> ---
> src/python/location-importer.in | 104 ++++++++++++++++++++++++++++++++
> 1 file changed, 104 insertions(+)
> 
> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
> index da058d3..0f06465 100644
> --- a/src/python/location-importer.in
> +++ b/src/python/location-importer.in
> @@ -1059,6 +1059,9 @@ class CLI(object):
> 			# network allocation lists in a machine-readable format...
> 			self._update_overrides_for_aws()
> 
> +			# Update overrides for Spamhaus DROP feeds...
> +			self._update_overrides_for_spamhaus_drop()
> +
> 			for file in ns.files:
> 				log.info("Reading %s..." % file)
> 
> @@ -1243,6 +1246,107 @@ class CLI(object):
> 				)
> 
> 
> +	def _update_overrides_for_spamhaus_drop(self):
> +		downloader = location.importer.Downloader()
> +
> +		ip_urls = [
> +					"https://www.spamhaus.org/drop/drop.txt",
> +					"https://www.spamhaus.org/drop/edrop.txt",
> +					"https://www.spamhaus.org/drop/dropv6.txt"
> +				]
> +
> +		asn_urls = [
> +					"https://www.spamhaus.org/drop/asndrop.txt"
> +				]
> +
> +		for url in ip_urls:
> +			try:
> +				with downloader.request(url, return_blocks=False) as f:
> +					fcontent = f.body.readlines()
> +			except Exception as e:
> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
> +				return
> +
> +			# Iterate through every line, filter comments and add remaining networks to
> +			# the override table in case they are valid...
> +			with self.db.transaction():
> +				for sline in fcontent:
> +
> +					# XXX: Is there a better way to do this?
> +					sline = sline.decode("utf-8")

No, what is wrong with it? You are reading bytes and you assume that it has been encoded as UTF-8. Couldn’t be any easier.
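
That said, if you wanted to avoid the per-line decode, you could wrap the body once instead - a minimal sketch, assuming f.body is an ordinary binary file-like object:

	import io

	with downloader.request(url, return_blocks=False) as f:
		# Wrapping the binary body once makes iteration yield decoded str lines
		for sline in io.TextIOWrapper(f.body, encoding="utf-8"):
			...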

> +
> +					# Comments start with a semicolon, followed by a space...
> +					if sline.startswith("; "):
> +						continue

I don’t know where you get the confidence that there will always be a space. I would have been satisfied with just finding ‘;’.
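
Something like this is what I would have expected, which also skips blank lines - a small sketch:

	sline = sline.strip()

	# Comments start with a semicolon - do not rely on a trailing space
	if not sline or sline.startswith(";"):
		continue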

> +
> +					# Extract network and ignore anything afterwards...
> +					try:
> +						network = ipaddress.ip_network(sline.split()[0], strict=False)
> +					except ValueError:
> +						log.error("Unable to parse line: %s" % sline)
> +						continue
> +
> +					# Sanitize parsed networks...
> +					if not self._check_parsed_network(network):
> +						log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
> +							(url, network))
> +						continue
> +
> +					# Conduct SQL statement...
> +					self.db.execute("""
> +						INSERT INTO network_overrides(
> +							network,
> +							source,
> +							is_drop
> +						) VALUES (%s, %s, %s)
> +						ON CONFLICT (network) DO NOTHING""",
> +						"%s" % network,
> +						"Spamhaus DROP lists",
> +						True
> +					)
> +
> +		for url in asn_urls:
> +			try:
> +				with downloader.request(url, return_blocks=False) as f:
> +					fcontent = f.body.readlines()
> +			except Exception as e:
> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
> +				return
> +
> +			# Iterate through every line, filter comments and add remaining ASNs to
> +			# the override table in case they are valid...
> +			with self.db.transaction():
> +				for sline in fcontent:
> +
> +					# XXX: Is there a better way to do this?
> +					sline = sline.decode("utf-8")
> +
> +					# Comments start with a semicolon, followed by a space...
> +					if sline.startswith("; "):
> +						continue
> +
> +					# Extract ASN and ignore anything afterwards...
> +					asn = int(sline.split()[0].strip("AS"))
> +
> +					# Sanitize parsed ASNs...
> +					if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
> +						log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
> +							(url, asn))
> +						continue

Should we not have a simple function for this that we can use repeatedly?
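
Something along these lines, perhaps - a hypothetical _check_parsed_asn() helper next to the existing _check_parsed_network(), using the same ranges as the check above:

	def _check_parsed_asn(self, asn):
		"""
			Assistive function to check whether a parsed ASN falls
			into one of the valid public 16- or 32-bit ranges
		"""
		if (1 <= asn <= 23455) or (23457 <= asn <= 64495) or (131072 <= asn <= 4199999999):
			return True

		return False

The call sites would then boil down to "if not self._check_parsed_asn(asn): continue".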

> +
> +					# Conduct SQL statement...
> +					self.db.execute("""
> +						INSERT INTO autnum_overrides(
> +							number,
> +							source,
> +							is_drop
> +						) VALUES (%s, %s, %s)
> +						ON CONFLICT (number) DO NOTHING""",
> +						"%s" % asn,
> +						"Spamhaus ASN-DROP list",
> +						True
> +					)
> +
> 	@staticmethod
> 	def _parse_bool(block, key):
> 		val = block.get(key)
> -- 
> 2.26.2
  

Patch

diff --git a/src/python/location-importer.in b/src/python/location-importer.in
index da058d3..0f06465 100644
--- a/src/python/location-importer.in
+++ b/src/python/location-importer.in
@@ -1059,6 +1059,9 @@  class CLI(object):
 			# network allocation lists in a machine-readable format...
 			self._update_overrides_for_aws()
 
+			# Update overrides for Spamhaus DROP feeds...
+			self._update_overrides_for_spamhaus_drop()
+
 			for file in ns.files:
 				log.info("Reading %s..." % file)
 
@@ -1243,6 +1246,107 @@  class CLI(object):
 				)
 
 
+	def _update_overrides_for_spamhaus_drop(self):
+		downloader = location.importer.Downloader()
+
+		ip_urls = [
+					"https://www.spamhaus.org/drop/drop.txt",
+					"https://www.spamhaus.org/drop/edrop.txt",
+					"https://www.spamhaus.org/drop/dropv6.txt"
+				]
+
+		asn_urls = [
+					"https://www.spamhaus.org/drop/asndrop.txt"
+				]
+
+		for url in ip_urls:
+			try:
+				with downloader.request(url, return_blocks=False) as f:
+					fcontent = f.body.readlines()
+			except Exception as e:
+				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
+				return
+
+			# Iterate through every line, filter comments and add remaining networks to
+			# the override table in case they are valid...
+			with self.db.transaction():
+				for sline in fcontent:
+
+					# XXX: Is there a better way to do this?
+					sline = sline.decode("utf-8")
+
+					# Comments start with a semicolon, followed by a space...
+					if sline.startswith("; "):
+						continue
+
+					# Extract network and ignore anything afterwards...
+					try:
+						network = ipaddress.ip_network(sline.split()[0], strict=False)
+					except ValueError:
+						log.error("Unable to parse line: %s" % sline)
+						continue
+
+					# Sanitize parsed networks...
+					if not self._check_parsed_network(network):
+						log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
+							(url, network))
+						continue
+
+					# Conduct SQL statement...
+					self.db.execute("""
+						INSERT INTO network_overrides(
+							network,
+							source,
+							is_drop
+						) VALUES (%s, %s, %s)
+						ON CONFLICT (network) DO NOTHING""",
+						"%s" % network,
+						"Spamhaus DROP lists",
+						True
+					)
+
+		for url in asn_urls:
+			try:
+				with downloader.request(url, return_blocks=False) as f:
+					fcontent = f.body.readlines()
+			except Exception as e:
+				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
+				return
+
+			# Iterate through every line, filter comments and add remaining ASNs to
+			# the override table in case they are valid...
+			with self.db.transaction():
+				for sline in fcontent:
+
+					# XXX: Is there a better way to do this?
+					sline = sline.decode("utf-8")
+
+					# Comments start with a semicolon, followed by a space...
+					if sline.startswith("; "):
+						continue
+
+					# Extract ASN and ignore anything afterwards...
+					asn = int(sline.split()[0].strip("AS"))
+
+					# Sanitize parsed ASNs...
+					if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
+						log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
+							(url, asn))
+						continue
+
+					# Conduct SQL statement...
+					self.db.execute("""
+						INSERT INTO autnum_overrides(
+							number,
+							source,
+							is_drop
+						) VALUES (%s, %s, %s)
+						ON CONFLICT (number) DO NOTHING""",
+						"%s" % asn,
+						"Spamhaus ASN-DROP list",
+						True
+					)
+
 	@staticmethod
 	def _parse_bool(block, key):
 		val = block.get(key)