From patchwork Sun Dec 10 19:37:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Peter_M=C3=BCller?= X-Patchwork-Id: 7389 Return-Path: Received: from mail01.ipfire.org (mail01.haj.ipfire.org [172.28.1.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384 client-signature ECDSA (secp384r1) client-digest SHA384) (Client CN "mail01.haj.ipfire.org", Issuer "R3" (verified OK)) by web04.haj.ipfire.org (Postfix) with ESMTPS id 4SpFX22tnsz3x25 for ; Sun, 10 Dec 2023 19:37:38 +0000 (UTC) Received: from mail02.haj.ipfire.org (mail02.haj.ipfire.org [172.28.1.201]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384 client-signature ECDSA (secp384r1) client-digest SHA384) (Client CN "mail02.haj.ipfire.org", Issuer "R3" (verified OK)) by mail01.ipfire.org (Postfix) with ESMTPS id 4SpFX128kRzC6; Sun, 10 Dec 2023 19:37:37 +0000 (UTC) Received: from mail02.haj.ipfire.org (localhost [127.0.0.1]) by mail02.haj.ipfire.org (Postfix) with ESMTP id 4SpFX10Hnzz2xlN; Sun, 10 Dec 2023 19:37:37 +0000 (UTC) Received: from mail01.ipfire.org (mail01.haj.ipfire.org [172.28.1.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384 client-signature ECDSA (secp384r1) client-digest SHA384) (Client CN "mail01.haj.ipfire.org", Issuer "R3" (verified OK)) by mail02.haj.ipfire.org (Postfix) with ESMTPS id 4SpFWy4YFRz2xlN for ; Sun, 10 Dec 2023 19:37:34 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by mail01.ipfire.org (Postfix) with ESMTPSA id 4SpFWx11w1zC6 for ; Sun, 10 Dec 2023 19:37:32 +0000 (UTC) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=ipfire.org; s=202003ed25519; t=1702237053; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=YO7qKCmkworUhwD/PEf0VD4UCFKHgvsa/u/AhKUdrQU=; b=DYJPKeA6wldEIZiWWDJ7Dl7+iGNg+X5d6ATYNJXC8NOOcV46/Ye0ZCwiYYGnMDnxTgMI7S SWAsQiOExRuWZZBg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ipfire.org; s=202003rsa; t=1702237053; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=YO7qKCmkworUhwD/PEf0VD4UCFKHgvsa/u/AhKUdrQU=; b=sYP9FbZB9nqqrgebScrjgn2oEVY/o2Z2c+mRA8MC32ZJgEvXdemgC2sdQ1OkCaniejWRSj gUzvyBAM7Vk9Y+rfbFMbQoLFZSpQ79JA28VGUl7hhtkSQP3tlJHCSZBmyvl/ZahGeEsDXt aCdZ5qaJu8RjGQt5v++lSMcDk3bNvmh4IodJHg7Qjc9Vjv8MwlDj9xSGOokhElqPfz8Pvf mhDmEBI1VPwy5aj/vB5lD/hwOBo/JYQp5rCXBt7tJ/dO5sDZHknrhkIHLSykNn1/0Dvgdd n3bF4GAHvUuU9r7DBt6betMCy1/XN9Asft4z5eqqou1Q3tfDvZlu4mQ/O0elJQ== Message-ID: Date: Sun, 10 Dec 2023 19:37:00 +0000 MIME-Version: 1.0 To: "IPFire: Location" From: =?utf-8?q?Peter_M=C3=BCller?= Subject: [PATCH] location-importer: Replace ARIN AS names source with one that offers human-readable names Message-ID-Hash: SZRZM37NU5Q2VX76W2QUKT65BY6WME27 X-Message-ID-Hash: SZRZM37NU5Q2VX76W2QUKT65BY6WME27 X-MailFrom: peter.mueller@ipfire.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.8 Precedence: list List-Id: "IPFire Location development/database maintainance talk." Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: This patch replaces our previous source for AS names in ARIN's realms with another file provided by ARIN that contains human-readable names for organizations ASNs have been allocated to. Please note that a TRUNCATE autnums; is necessary on machines previously running the old version of location-importer, in order to make use of this changed data source. Signed-off-by: Peter Müller --- src/scripts/location-importer.in | 47 ++++++++++++++++++-------------- 1 file changed, 27 insertions(+), 20 deletions(-) diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in index 28a4f6c..96b3a20 100644 --- a/src/scripts/location-importer.in +++ b/src/scripts/location-importer.in @@ -3,7 +3,7 @@ # # # libloc - A library to determine the location of someone on the Internet # # # -# Copyright (C) 2020-2022 IPFire Development Team # +# Copyright (C) 2020-2023 IPFire Development Team # # # # This library is free software; you can redistribute it and/or # # modify it under the terms of the GNU Lesser General Public # @@ -19,6 +19,7 @@ import argparse import concurrent.futures +import csv import http.client import ipaddress import json @@ -1033,36 +1034,42 @@ class CLI(object): def _import_as_names_from_arin(self): downloader = location.importer.Downloader() - # XXX: Download AS names file from ARIN (note that these names appear to be quite - # technical, not intended for human consumption, as description fields in - # organisation handles for other RIRs are - however, this is what we have got, - # and in some cases, it might be still better than nothing) - for line in downloader.request_lines("https://ftp.arin.net/info/asn.txt"): - # Valid lines start with a space, followed by the number of the Autonomous System ... - if not line.startswith(" "): + # Download AS names file from ARIN and load it into CSV parser + for line in downloader.request_lines("https://ftp.arin.net/pub/resource_registry_service/asns.csv"): + + # Valid lines start with a " ... + if not line.startswith("\""): continue # Split line and check if there is a valid ASN in it... - asn, name = line.split()[0:2] + for row in csv.reader([line]): + orgname = row[0] + orghandle = row[1] + firstasn = row[3] + lastasn = row[4] try: - asn = int(asn) + firstasn = int(firstasn.strip("\"")) + lastasn = int(lastasn.strip("\"")) except ValueError: - log.debug("Skipping ARIN AS names line not containing an integer for ASN") + log.debug("Skipping ARIN AS names line not containing valid integers for ASN") continue # Filter invalid ASNs... - if not self._check_parsed_asn(asn): + if not self._check_parsed_asn(firstasn): continue - # Skip any AS name that appears to be a placeholder for a different RIR or entity... - if re.match(r"^(ASN-BLK|)(AFCONC|AFRINIC|APNIC|ASNBLK|LACNIC|RIPE|IANA)(?:\d?$|\-)", name): + if firstasn > lastasn: + continue + + # Filter any bulk AS assignments, since these are present for other RIRs where + # we get better data from elsewhere. + if not firstasn == lastasn: continue - # Bail out in case the AS name contains anything we do not expect here... - if re.search(r"[^a-zA-Z0-9-_]", name): - log.debug("Skipping ARIN AS name for %s containing invalid characters: %s" % \ - (asn, name)) + # Skip any AS name that appears to be a placeholder for a different RIR or entity... + if re.match(r"^(AFRINIC|APNIC|LACNIC|RIPE)$", orghandle.strip("\"")): + continue # Things look good here, run INSERT statement and skip this one if we already have # a (better?) name for this Autonomous System... @@ -1073,8 +1080,8 @@ class CLI(object): source ) VALUES (%s, %s, %s) ON CONFLICT (number) DO NOTHING""", - asn, - name, + firstasn, + orgname.strip("\""), "ARIN", )