location-importer: Fix Spamhaus ASN-DROP parsing

Message ID fd183979-30db-463a-a381-4894cf4c8033@ipfire.org
State Superseded
Series location-importer: Fix Spamhaus ASN-DROP parsing

Commit Message

Peter Müller Nov. 4, 2023, 2:04 p.m. UTC
  The format of this list has changed from a plain text file with a
custom schema to JSON. Adjust our routines accordingly to make use of
this list again.

Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
Tested-by: Peter Müller <peter.mueller@ipfire.org>
---
 src/scripts/location-importer.in | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)
  

Comments

Michael Tremer Nov. 5, 2023, 1:47 p.m. UTC | #1
Hello Peter,

> On 4 Nov 2023, at 14:04, Peter Müller <peter.mueller@ipfire.org> wrote:
> 
> The format of this list has changed from a plain text file with a
> custom schema to JSON. Adjust our routines accordingly to make use of
> this list again.
> 
> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
> Tested-by: Peter Müller <peter.mueller@ipfire.org>
> ---
> src/scripts/location-importer.in | 21 +++++++--------------
> 1 file changed, 7 insertions(+), 14 deletions(-)
> 
> diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
> index 28a4f6c..8b0c676 100644
> --- a/src/scripts/location-importer.in
> +++ b/src/scripts/location-importer.in
> @@ -3,7 +3,7 @@
> #                                                                             #
> # libloc - A library to determine the location of someone on the Internet     #
> #                                                                             #
> -# Copyright (C) 2020-2022 IPFire Development Team <info@ipfire.org>           #
> +# Copyright (C) 2020-2023 IPFire Development Team <info@ipfire.org>           #
> #                                                                             #
> # This library is free software; you can redistribute it and/or               #
> # modify it under the terms of the GNU Lesser General Public                  #
> @@ -1686,7 +1686,7 @@ class CLI(object):
> ]
> 
> asn_lists = [
> - ("SPAMHAUS-ASNDROP", "https://www.spamhaus.org/drop/asndrop.txt")
> + ("SPAMHAUS-ASNDROP", "https://www.spamhaus.org/drop/asndrop.json")
> ]
> 
> for name, url in ip_lists:
> @@ -1759,23 +1759,16 @@ class CLI(object):
> 
> # Iterate through every line, filter comments and add remaining ASNs to
> # the override table in case they are valid...
> - for sline in f.readlines():
> + for sline in fcontent:
> # The response is assumed to be encoded in UTF-8...
> sline = sline.decode("utf-8")
> 
> - # Comments start with a semicolon...
> - if sline.startswith(";"):
> + # Load every line as a JSON object and try to obtain an ASN from it...
> + try:
> + asn = json.loads(sline)["asn"]
> + except KeyError:
> continue

It would be nicer if you didn’t do so many things in one line, because it will get difficult to catch the correct exception.

I believe that this should be split into the json.loads() operation where you should catch any JSON decoding issues. This would make it clear that some kind of download issue or similar has happened.

Fetching the “asn” field should be a second step, because an error here means that you have received a valid JSON object, but the format has changed.
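
Something along these lines, perhaps (untested sketch; it reuses sline, name and log from the surrounding code and assumes the json module is already imported):

	# Parse the line as JSON first, so that a download problem or a
	# truncated response shows up as a decoding error...
	try:
		obj = json.loads(sline)
	except json.JSONDecodeError as e:
		log.warning("Unable to parse line retrieved from %s: %s" % (name, e))
		continue

	# ... and fetch the ASN as a separate step, so that a missing key
	# clearly means the format has changed again.
	try:
		asn = obj["asn"]
	except KeyError:
		continue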

> - # Throw away anything after the first space...
> - sline = sline.split()[0]
> -
> - # ... strip the "AS" prefix from it ...
> - sline = sline.strip("AS")
> -
> - # ... and convert it into an integer. Voila.
> - asn = int(sline)
> -
> # Filter invalid ASNs...
> if not self._check_parsed_asn(asn):
> log.warning("Skipping bogus ASN found in %s (%s): %s" % \
> -- 
> 2.35.3
  
Peter Müller Nov. 5, 2023, 4:59 p.m. UTC | #2
Hello Michael,

thank you for your reply.

> Hello Peter,
> 
>> On 4 Nov 2023, at 14:04, Peter Müller <peter.mueller@ipfire.org> wrote:
>>
>> The format of this list has changed from a plain text file with a
>> custom schema to JSON. Adjust our routines accordingly to make use of
>> this list again.
>>
>> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
>> Tested-by: Peter Müller <peter.mueller@ipfire.org>
>> ---
>> src/scripts/location-importer.in | 21 +++++++--------------
>> 1 file changed, 7 insertions(+), 14 deletions(-)
>>
>> diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
>> index 28a4f6c..8b0c676 100644
>> --- a/src/scripts/location-importer.in
>> +++ b/src/scripts/location-importer.in
>> @@ -3,7 +3,7 @@
>> #                                                                             #
>> # libloc - A library to determine the location of someone on the Internet     #
>> #                                                                             #
>> -# Copyright (C) 2020-2022 IPFire Development Team <info@ipfire.org>           #
>> +# Copyright (C) 2020-2023 IPFire Development Team <info@ipfire.org>           #
>> #                                                                             #
>> # This library is free software; you can redistribute it and/or               #
>> # modify it under the terms of the GNU Lesser General Public                  #
>> @@ -1686,7 +1686,7 @@ class CLI(object):
>> ]
>>
>> asn_lists = [
>> - ("SPAMHAUS-ASNDROP", "https://www.spamhaus.org/drop/asndrop.txt")
>> + ("SPAMHAUS-ASNDROP", "https://www.spamhaus.org/drop/asndrop.json")
>> ]
>>
>> for name, url in ip_lists:
>> @@ -1759,23 +1759,16 @@ class CLI(object):
>>
>> # Iterate through every line, filter comments and add remaining ASNs to
>> # the override table in case they are valid...
>> - for sline in f.readlines():
>> + for sline in fcontent:
>> # The response is assumed to be encoded in UTF-8...
>> sline = sline.decode("utf-8")
>>
>> - # Comments start with a semicolon...
>> - if sline.startswith(";"):
>> + # Load every line as a JSON object and try to obtain an ASN from it...
>> + try:
>> + asn = json.loads(sline)["asn"]
>> + except KeyError:
>> continue
> 
> It would be nicer if you didn’t do so many things in one line, because it will get difficult to catch the correct exception.
> 
> I believe that this should be split into the json.loads() operation where you should catch any JSON decoding issues. This would make it clear that some kind of download issue or similar has happened.
> 
> Fetching the “asn” field should be a second step, because an error here means that you have received a valid JSON object, but the format has changed.

Yes, that makes sense. I'll submit a second version of this patch.

Having the json.loads() operation and the extraction of listed ASNs separated carries
the benefit of being able to extract the AS's name as well. I was thinking that we
might want to update the autnums table accordingly, if it does not contain the AS
already.
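
Roughly what I have in mind (untested sketch; the "asname" field name, the (number, name, source) columns and the unique index on autnums(number) are assumptions on my side):

	# Field name is an assumption; use whatever Spamhaus actually calls it...
	asname = obj.get("asname")

	# Only add the AS if the autnums table does not contain it already...
	if asname:
		self.db.execute("""
			INSERT INTO autnums(number, name, source)
			VALUES(%s, %s, %s) ON CONFLICT (number) DO NOTHING""",
			asn, asname, "SPAMHAUS-ASNDROP",
		)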

This would allow us to have names for ASN-DROP'ed ASNs in the LACNIC space. However,
there are only two listed at the moment, so I'm not sure if this is worth the effort.
But then, having something is better than having nothing at all. :-)

What do you think?

Thanks, and best regards,
Peter Müller

> 
>> - # Throw away anything after the first space...
>> - sline = sline.split()[0]
>> -
>> - # ... strip the "AS" prefix from it ...
>> - sline = sline.strip("AS")
>> -
>> - # ... and convert it into an integer. Voila.
>> - asn = int(sline)
>> -
>> # Filter invalid ASNs...
>> if not self._check_parsed_asn(asn):
>> log.warning("Skipping bogus ASN found in %s (%s): %s" % \
>> -- 
>> 2.35.3
>
  
Michael Tremer Nov. 5, 2023, 6:02 p.m. UTC | #3
Hello,

> On 5 Nov 2023, at 16:59, Peter Müller <peter.mueller@ipfire.org> wrote:
> 
> Hello Michael,
> 
> thank you for your reply.
> 
>> Hello Peter,
>> 
>>> On 4 Nov 2023, at 14:04, Peter Müller <peter.mueller@ipfire.org> wrote:
>>> 
>>> The format of this list has changed from a plain text file with a
>>> custom schema to JSON. Adjust our routines accordingly to make use of
>>> this list again.
>>> 
>>> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
>>> Tested-by: Peter Müller <peter.mueller@ipfire.org>
>>> ---
>>> src/scripts/location-importer.in | 21 +++++++--------------
>>> 1 file changed, 7 insertions(+), 14 deletions(-)
>>> 
>>> diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
>>> index 28a4f6c..8b0c676 100644
>>> --- a/src/scripts/location-importer.in
>>> +++ b/src/scripts/location-importer.in
>>> @@ -3,7 +3,7 @@
>>> #                                                                             #
>>> # libloc - A library to determine the location of someone on the Internet     #
>>> #                                                                             #
>>> -# Copyright (C) 2020-2022 IPFire Development Team <info@ipfire.org>           #
>>> +# Copyright (C) 2020-2023 IPFire Development Team <info@ipfire.org>           #
>>> #                                                                             #
>>> # This library is free software; you can redistribute it and/or               #
>>> # modify it under the terms of the GNU Lesser General Public                  #
>>> @@ -1686,7 +1686,7 @@ class CLI(object):
>>> ]
>>> 
>>> asn_lists = [
>>> - ("SPAMHAUS-ASNDROP", "https://www.spamhaus.org/drop/asndrop.txt")
>>> + ("SPAMHAUS-ASNDROP", "https://www.spamhaus.org/drop/asndrop.json")
>>> ]
>>> 
>>> for name, url in ip_lists:
>>> @@ -1759,23 +1759,16 @@ class CLI(object):
>>> 
>>> # Iterate through every line, filter comments and add remaining ASNs to
>>> # the override table in case they are valid...
>>> - for sline in f.readlines():
>>> + for sline in fcontent:
>>> # The response is assumed to be encoded in UTF-8...
>>> sline = sline.decode("utf-8")
>>> 
>>> - # Comments start with a semicolon...
>>> - if sline.startswith(";"):
>>> + # Load every line as a JSON object and try to obtain an ASN from it...
>>> + try:
>>> + asn = json.loads(sline)["asn"]
>>> + except KeyError:
>>> continue
>> 
>> It would be nicer if you didn’t do so many things in one line, because it will get difficult to catch the correct exception.
>> 
>> I believe that this should be split into the json.loads() operation where you should catch any JSON decoding issues. This would make it clear that some kind of download issue or similar has happened.
>> 
>> Fetching the “asn” field should be a second step, because an error here means that you have received a valid JSON object, but the format has changed.
> 
> Yes, that makes sense. I'll submit a second version of this patch.
> 
> Having the json.loads() operation and the extraction of listed ASNs separated carries
> the benefit of being able to extract the AS's name as well. I was thinking that we
> might want to update the autnums table accordingly, if it does not contain the AS
> already.

I believe that this is a good idea, because the ASN provided might potentially be fake, and debugging probably gets easier if we are looking at the same thing.

> This would allow us to have names for ASN-DROP'ed ASNs in the LACNIC space. However,
> there are only two listed at the moment, so I'm not sure if this is worth the effort.
> But then, having something is better than having nothing at all. :-)

Agreed.

> What do you think?

-Michael

> 
> Thanks, and best regards,
> Peter Müller
> 
>> 
>>> - # Throw away anything after the first space...
>>> - sline = sline.split()[0]
>>> -
>>> - # ... strip the "AS" prefix from it ...
>>> - sline = sline.strip("AS")
>>> -
>>> - # ... and convert it into an integer. Voila.
>>> - asn = int(sline)
>>> -
>>> # Filter invalid ASNs...
>>> if not self._check_parsed_asn(asn):
>>> log.warning("Skipping bogus ASN found in %s (%s): %s" % \
>>> -- 
>>> 2.35.3
  

Patch

diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index 28a4f6c..8b0c676 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -3,7 +3,7 @@ 
 #                                                                             #
 # libloc - A library to determine the location of someone on the Internet     #
 #                                                                             #
-# Copyright (C) 2020-2022 IPFire Development Team <info@ipfire.org>           #
+# Copyright (C) 2020-2023 IPFire Development Team <info@ipfire.org>           #
 #                                                                             #
 # This library is free software; you can redistribute it and/or               #
 # modify it under the terms of the GNU Lesser General Public                  #
@@ -1686,7 +1686,7 @@  class CLI(object):
 				]
 
 		asn_lists = [
-					("SPAMHAUS-ASNDROP", "https://www.spamhaus.org/drop/asndrop.txt")
+					("SPAMHAUS-ASNDROP", "https://www.spamhaus.org/drop/asndrop.json")
 				]
 
 		for name, url in ip_lists:
@@ -1759,23 +1759,16 @@  class CLI(object):
 
 				# Iterate through every line, filter comments and add remaining ASNs to
 				# the override table in case they are valid...
-				for sline in f.readlines():
+				for sline in fcontent:
 					# The response is assumed to be encoded in UTF-8...
 					sline = sline.decode("utf-8")
 
-					# Comments start with a semicolon...
-					if sline.startswith(";"):
+					# Load every line as a JSON object and try to obtain an ASN from it...
+					try:
+						asn = json.loads(sline)["asn"]
+					except KeyError:
 						continue
 
-					# Throw away anything after the first space...
-					sline = sline.split()[0]
-
-					# ... strip the "AS" prefix from it ...
-					sline = sline.strip("AS")
-
-					# ... and convert it into an integer. Voila.
-					asn = int(sline)
-
 					# Filter invalid ASNs...
 					if not self._check_parsed_asn(asn):
 						log.warning("Skipping bogus ASN found in %s (%s): %s" % \