[v2] location-importer.in: Import (technical) AS names from ARIN

Message ID 20210608170307.623-1-peter.mueller@ipfire.org
State Accepted
Commit 92403f3910c7a1aa576fc56953ec931ebfffd107
Series [v2] location-importer.in: Import (technical) AS names from ARIN

Commit Message

Peter Müller June 8, 2021, 5:03 p.m. UTC
ARIN and LACNIC, unfortunately, do not seem to publish data containing
human-readable AS names. For the former, we at least have a list of
technical names, which this patch fetches and inserts into the autnums
table.

While some of them do not seem to be suitable for human consumption
(i.e. they are very cryptic), providing this data might be helpful
nevertheless.

The second version of this patch incorporates Michael's remarks on
efficient Python coding style, making the code more "pythonic".

Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
---
 src/python/location-importer.in | 55 +++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)
  

Comments

Michael Tremer June 10, 2021, 8:52 a.m. UTC | #1
Hello,

> On 8 Jun 2021, at 18:03, Peter Müller <peter.mueller@ipfire.org> wrote:
> 
> ARIN and LACNIC, unfortunately, do not seem to publish data containing
> human-readable AS names. For the former, we at least have a list of
> technical names, which this patch fetches and inserts into the autnums
> table.
> 
> While some of them do not seem to be suitable for human consumption
> (i.e. they are very cryptic), providing this data might be helpful
> nevertheless.
> 
> The second version of this patch incorporates Michael's remarks on
> efficient Python coding style, making the code more "pythonic".
> 
> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
> ---
> src/python/location-importer.in | 55 +++++++++++++++++++++++++++++++++
> 1 file changed, 55 insertions(+)
> 
> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
> index aa3b8f7..6ccee3b 100644
> --- a/src/python/location-importer.in
> +++ b/src/python/location-importer.in
> @@ -505,6 +505,9 @@ class CLI(object):
> 						for line in f:
> 							self._parse_line(line, source_key, validcountries)
> 
> +		# Download and import (technical) AS names from ARIN
> +		self._import_as_names_from_arin()
> +
> 	def _check_parsed_network(self, network):
> 		"""
> 			Assistive function to detect and subsequently sort out parsed
> @@ -775,6 +778,58 @@ class CLI(object):
> 			"%s" % network, country, [country], source_key,
> 		)
> 
> +	def _import_as_names_from_arin(self):
> +		downloader = location.importer.Downloader()
> +
> +		# XXX: Download AS names file from ARIN (note that these names appear to be quite
> +		# technical, not intended for human consumption, as description fields in
> +		# organisation handles for other RIRs are - however, this is what we have got,
> +		# and in some cases, it might be still better than nothing)
> +		with downloader.request("https://ftp.arin.net/info/asn.txt", return_blocks=False) as f:
> +			for line in f:
> +				# Convert binary line to string...
> +				line = str(line)
> +
> +				# ... valid lines start with a space, followed by the number of the Autonomous System ...
> +				if not line.startswith(" "):
> +					continue
> +
> +				# Split line and check if there is a valid ASN in it...
> +				asn, name = line.split()[0:2]
> +
> +				try:
> +					asn = int(asn)
> +				except ValueError:
> +					log.debug("Skipping ARIN AS names line not containing an integer for ASN")
> +					continue
> +
> +				if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
> +					log.debug("Skipping ARIN AS names line not containing a valid ASN: %s" % asn)
> +					continue
> +
> +				# Skip any AS name that appears to be a placeholder for a different RIR or entity...
> +				if re.match(r"^(ASN-BLK|)(AFCONC|AFRINIC|APNIC|ASNBLK|DNIC|LACNIC|RIPE|IANA)(\d?$|\-.*)", name):
> +					continue

This is still not entirely optimal. It does not matter too much, so I will merge it, but…

* You added a capturing group that you do not need, so you could have written (?:…) instead of (…).

* \-.* matches a literal dash and then anything after it. You do not care about what comes after the dash, so \- alone would have been enough. That saves a couple of CPU cycles because the regex engine does not have to read the rest of the string.
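
To illustrate the two points above, here is a minimal sketch. It is not the code that was merged. The sample names are made up for illustration and are not taken from ARIN's asn.txt, and is_placeholder() is a hypothetical helper. The leading ^ is also dropped in the simplified variant, since re.match() already anchors at the start of the string.

```python
import re

# Pattern as submitted in the patch: capturing groups and a trailing ".*".
ORIGINAL = re.compile(
    r"^(ASN-BLK|)(AFCONC|AFRINIC|APNIC|ASNBLK|DNIC|LACNIC|RIPE|IANA)(\d?$|\-.*)"
)

# Possible simplification: non-capturing groups, and stop right after the dash.
SIMPLIFIED = re.compile(
    r"(?:ASN-BLK|)(?:AFCONC|AFRINIC|APNIC|ASNBLK|DNIC|LACNIC|RIPE|IANA)(?:\d?$|-)"
)


def is_placeholder(name, pattern=SIMPLIFIED):
    """Return True if the AS name looks like a placeholder for another RIR."""
    return pattern.match(name) is not None


# Made-up sample names (assumption: not real entries from asn.txt).
SAMPLES = [
    "RIPE",
    "RIPE-ASNBLOCK-1234",
    "APNIC7",
    "ASN-BLKLACNIC",
    "IANA-RSVD",
    "RIPENCC",
    "EXAMPLE-AS",
    "APNIC12",
]

if __name__ == "__main__":
    for name in SAMPLES:
        print(name, is_placeholder(name, ORIGINAL), is_placeholder(name, SIMPLIFIED))
```

Both patterns should accept and reject the same names. The simplified one just avoids storing groups that are never read and does not scan past the dash.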

> +
> +				# Bail out in case the AS name contains anything we do not expect here...
> +				if re.search(r"[^a-zA-Z0-9-_]", name):
> +					log.debug("Skipping ARIN AS name for %s containing invalid characters: %s" % \
> +							(asn, name))
> +
> +				# Things look good here, run INSERT statement and skip this one if we already have
> +				# a (better?) name for this Autonomous System...
> +				self.db.execute("""
> +					INSERT INTO autnums(
> +						number,
> +						name,
> +						source
> +					) VALUES (%s, %s, %s)
> +					ON CONFLICT (number) DO NOTHING""",
> +					asn,
> +					name,
> +					"ARIN",
> +				)
> +
> 	def handle_update_announcements(self, ns):
> 		server = ns.server[0]
> 
> -- 
> 2.20.1
> 

-Michael
  

Patch

diff --git a/src/python/location-importer.in b/src/python/location-importer.in
index aa3b8f7..6ccee3b 100644
--- a/src/python/location-importer.in
+++ b/src/python/location-importer.in
@@ -505,6 +505,9 @@  class CLI(object):
 						for line in f:
 							self._parse_line(line, source_key, validcountries)
 
+		# Download and import (technical) AS names from ARIN
+		self._import_as_names_from_arin()
+
 	def _check_parsed_network(self, network):
 		"""
 			Assistive function to detect and subsequently sort out parsed
@@ -775,6 +778,58 @@  class CLI(object):
 			"%s" % network, country, [country], source_key,
 		)
 
+	def _import_as_names_from_arin(self):
+		downloader = location.importer.Downloader()
+
+		# XXX: Download AS names file from ARIN (note that these names appear to be quite
+		# technical, not intended for human consumption, as description fields in
+		# organisation handles for other RIRs are - however, this is what we have got,
+		# and in some cases, it might be still better than nothing)
+		with downloader.request("https://ftp.arin.net/info/asn.txt", return_blocks=False) as f:
+			for line in f:
+				# Convert binary line to string...
+				line = str(line)
+
+				# ... valid lines start with a space, followed by the number of the Autonomous System ...
+				if not line.startswith(" "):
+					continue
+
+				# Split line and check if there is a valid ASN in it...
+				asn, name = line.split()[0:2]
+
+				try:
+					asn = int(asn)
+				except ValueError:
+					log.debug("Skipping ARIN AS names line not containing an integer for ASN")
+					continue
+
+				if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
+					log.debug("Skipping ARIN AS names line not containing a valid ASN: %s" % asn)
+					continue
+
+				# Skip any AS name that appears to be a placeholder for a different RIR or entity...
+				if re.match(r"^(ASN-BLK|)(AFCONC|AFRINIC|APNIC|ASNBLK|DNIC|LACNIC|RIPE|IANA)(\d?$|\-.*)", name):
+					continue
+
+				# Bail out in case the AS name contains anything we do not expect here...
+				if re.search(r"[^a-zA-Z0-9-_]", name):
+					log.debug("Skipping ARIN AS name for %s containing invalid characters: %s" % \
+							(asn, name))
+
+				# Things look good here, run INSERT statement and skip this one if we already have
+				# a (better?) name for this Autonomous System...
+				self.db.execute("""
+					INSERT INTO autnums(
+						number,
+						name,
+						source
+					) VALUES (%s, %s, %s)
+					ON CONFLICT (number) DO NOTHING""",
+					asn,
+					name,
+					"ARIN",
+				)
+
 	def handle_update_announcements(self, ns):
 		server = ns.server[0]