Process LACNIC geofeed as well

Message ID bdc5ec85-963b-25b4-3c0a-a420a263a883@ipfire.org
State Accepted
Commit 845da57719980b5a72c0b42704ab0fb5ddf85604
Headers
Series Process LACNIC geofeed as well |

Commit Message

Peter Müller Dec. 11, 2021, 9:59 p.m. UTC
  This improves country code accurarcy for suballocations within IP space
managed by LACNIC, as the delegated-extended-latest file only provides
country code information at the top level of an allocated network.

Sadly, lacnic.db.gz does not contain descriptions or names of Autonomous
Systems within the space maintained by LACNIC.

Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
---
 src/python/importer.py          |  4 ++-
 src/python/location-importer.in | 43 +++++++++++++++++++++++++++------
 2 files changed, 39 insertions(+), 8 deletions(-)
  

Comments

Michael Tremer Dec. 13, 2021, 6:54 p.m. UTC | #1
Hello Peter,

I merged this patch, but I wanted to give you some guidance on how I would have written the Python code.

I am not saying my way is the better way. I just find it a lot easier to read.

> On 11 Dec 2021, at 21:59, Peter Müller <peter.mueller@ipfire.org> wrote:
> 
> This improves country code accurarcy for suballocations within IP space
> managed by LACNIC, as the delegated-extended-latest file only provides
> country code information at the top level of an allocated network.
> 
> Sadly, lacnic.db.gz does not contain descriptions or names of Autonomous
> Systems within the space maintained by LACNIC.
> 
> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
> ---
> src/python/importer.py          |  4 ++-
> src/python/location-importer.in | 43 +++++++++++++++++++++++++++------
> 2 files changed, 39 insertions(+), 8 deletions(-)
> 
> diff --git a/src/python/importer.py b/src/python/importer.py
> index de340b4..dee36ed 100644
> --- a/src/python/importer.py
> +++ b/src/python/importer.py
> @@ -53,7 +53,9 @@ WHOIS_SOURCES = {
> 		],
> 
> 	# Latin America and Caribbean Network Information Centre
> -	# XXX ???
> +	"LACNIC": [
> +		"https://ftp.lacnic.net/lacnic/dbase/lacnic.db.gz"
> +		],
> 
> 	# Réseaux IP Européens
> 	"RIPE": [
> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
> index b791b4d..78bb846 100644
> --- a/src/python/location-importer.in
> +++ b/src/python/location-importer.in
> @@ -681,13 +681,42 @@ class CLI(object):
> 				# Strip any excess space
> 				start_address, end_address = start_address.rstrip(), end_address.strip()
> 
> -				# Convert to IP address
> -				try:
> -					start_address = ipaddress.ip_address(start_address)
> -					end_address   = ipaddress.ip_address(end_address)
> -				except ValueError:
> -					log.warning("Could not parse line: %s" % line)
> -					return
> +				# Handle "inetnum" formatting in LACNIC DB (e.g. "24.152.8/22" instead of "24.152.8.0/22")

I would have checked in the beginning whether the string contains “/“ and then branched off to somewhere that handles this.

That would have been more intuitive to read than the following if statement:

> +				if start_address and not (delim or end_address):
> +					try:
> +						start_address = ipaddress.ip_network(start_address, strict=False)
> +					except ValueError:
> +						start_address = start_address.split("/")
> +						ldigits = len(start_address[0].split("."))

You can use start_address[0].count(“.”) here which is shorter and easier to read. Hopefully it is implemented faster, too.

> +						# How many octets do we need to add?
> +						# (LACNIC does not seem to have a /8 or greater assigned, so the following should suffice.)
> +						if ldigits == 2:
> +							start_address = start_address[0] + ".0.0/" + start_address[1]
> +						elif ldigits == 3:
> +							start_address = start_address[0] + ".0/" + start_address[1]
> +						else:
> +							log.warning("Could not recover IPv4 address from line in LACNIC DB format: %s" % line)
> +							return

I would consider these inputs (e.g. 24.152.8/22) valid IP addresses. They are not very popular, confusing and very easy to get wrong, but inet_ntop should parse this correctly.

However, since we are in this mess already, maybe a regular expression or substitution would have been more simple to write.

In the end, this solution seems to work. The reason why I am going on about this is that we run through this millions of times. Even adding an extra if statement is becoming a performance penalty which I would like to avoid as much as possible. The other factor is that code should be easy to audit and that is why I would like to pass on my patterns because I believe they are good :)

-Michael

> +
> +						try:
> +							start_address = ipaddress.ip_network(start_address, strict=False)
> +						except ValueError:
> +							log.warning("Could not parse line in LACNIC DB format: %s" % line)
> +							return
> +
> +					# Enumerate first and last IP address of this network
> +					end_address = start_address[-1]
> +					start_address = start_address[0]
> +
> +				else:
> +					# Convert to IP address
> +					try:
> +						start_address = ipaddress.ip_address(start_address)
> +						end_address   = ipaddress.ip_address(end_address)
> +					except ValueError:
> +						log.warning("Could not parse line: %s" % line)
> +						return
> 
> 				inetnum["inetnum"] = list(ipaddress.summarize_address_range(start_address, end_address))
> 
> -- 
> 2.26.2
  

Patch

diff --git a/src/python/importer.py b/src/python/importer.py
index de340b4..dee36ed 100644
--- a/src/python/importer.py
+++ b/src/python/importer.py
@@ -53,7 +53,9 @@  WHOIS_SOURCES = {
 		],
 
 	# Latin America and Caribbean Network Information Centre
-	# XXX ???
+	"LACNIC": [
+		"https://ftp.lacnic.net/lacnic/dbase/lacnic.db.gz"
+		],
 
 	# Réseaux IP Européens
 	"RIPE": [
diff --git a/src/python/location-importer.in b/src/python/location-importer.in
index b791b4d..78bb846 100644
--- a/src/python/location-importer.in
+++ b/src/python/location-importer.in
@@ -681,13 +681,42 @@  class CLI(object):
 				# Strip any excess space
 				start_address, end_address = start_address.rstrip(), end_address.strip()
 
-				# Convert to IP address
-				try:
-					start_address = ipaddress.ip_address(start_address)
-					end_address   = ipaddress.ip_address(end_address)
-				except ValueError:
-					log.warning("Could not parse line: %s" % line)
-					return
+				# Handle "inetnum" formatting in LACNIC DB (e.g. "24.152.8/22" instead of "24.152.8.0/22")
+				if start_address and not (delim or end_address):
+					try:
+						start_address = ipaddress.ip_network(start_address, strict=False)
+					except ValueError:
+						start_address = start_address.split("/")
+						ldigits = len(start_address[0].split("."))
+
+						# How many octets do we need to add?
+						# (LACNIC does not seem to have a /8 or greater assigned, so the following should suffice.)
+						if ldigits == 2:
+							start_address = start_address[0] + ".0.0/" + start_address[1]
+						elif ldigits == 3:
+							start_address = start_address[0] + ".0/" + start_address[1]
+						else:
+							log.warning("Could not recover IPv4 address from line in LACNIC DB format: %s" % line)
+							return
+
+						try:
+							start_address = ipaddress.ip_network(start_address, strict=False)
+						except ValueError:
+							log.warning("Could not parse line in LACNIC DB format: %s" % line)
+							return
+
+					# Enumerate first and last IP address of this network
+					end_address = start_address[-1]
+					start_address = start_address[0]
+
+				else:
+					# Convert to IP address
+					try:
+						start_address = ipaddress.ip_address(start_address)
+						end_address   = ipaddress.ip_address(end_address)
+					except ValueError:
+						log.warning("Could not parse line: %s" % line)
+						return
 
 				inetnum["inetnum"] = list(ipaddress.summarize_address_range(start_address, end_address))