From patchwork Tue Mar 30 15:47:10 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Peter Müller
X-Patchwork-Id: 4005
To: "IPFire: Location"
From: Peter Müller
Subject: [PATCH v2] location-importer.in: skip networks with unknown country codes
Message-ID: <5d6a3267-6b61-fa2e-b35c-0fc42c713e33@ipfire.org>
Date: Tue, 30 Mar 2021 17:47:10 +0200
Content-Language: en-US
X-BeenThere: location@lists.ipfire.org
X-Mailman-Version: 2.1.29
Precedence: list
Errors-To: location-bounces@lists.ipfire.org
Sender: "Location"

There is no sense in parsing and storing networks whose country codes
cannot be found in the ISO-3166-x country code table. Skipping them
avoids side effects in applications using the location database, and
introduces another sanity check to compensate for bogus RIR data.

On location02, this affects some networks from APNIC (country code: ZZ)
as well as a bunch of smaller allocations within the RIPE region still
tagged to CS or YU (Yugoslavia).
To my surprise, no network tagged as SU (Soviet Union) was found - while
the NIC for the .su TLD is still operational. :-)

Applying this patch causes the countries to be processed before
update_whois() is called. In case no countries are present in the SQL
table, this check is silently omitted.

Fixes: #12510

Signed-off-by: Peter Müller
---
 src/python/location-importer.in | 38 ++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/src/python/location-importer.in b/src/python/location-importer.in
index e2f201b..1e08458 100644
--- a/src/python/location-importer.in
+++ b/src/python/location-importer.in
@@ -388,10 +388,17 @@ class CLI(object):
 				TRUNCATE TABLE networks;
 			""")
 
+			# Fetch all valid country codes to check parsed networks against...
+			rows = self.db.query("SELECT * FROM countries ORDER BY country_code")
+			validcountries = []
+
+			for row in rows:
+				validcountries.append(row.country_code)
+
 			for source in location.importer.WHOIS_SOURCES:
 				with downloader.request(source, return_blocks=True) as f:
 					for block in f:
-						self._parse_block(block)
+						self._parse_block(block, validcountries)
 
 			# Process all parsed networks from every RIR we happen to have access to,
 			# insert the largest network chunks into the networks table immediately...
@@ -467,7 +474,7 @@ class CLI(object):
 				# Download data
 				with downloader.request(source) as f:
 					for line in f:
-						self._parse_line(line)
+						self._parse_line(line, validcountries)
 
 	def _check_parsed_network(self, network):
 		"""
@@ -532,7 +539,7 @@ class CLI(object):
 		# be suitable for libloc consumption...
 		return True
 
-	def _parse_block(self, block):
+	def _parse_block(self, block, validcountries = None):
 		# Get first line to find out what type of block this is
 		line = block[0]
 
@@ -542,7 +549,7 @@ class CLI(object):
 
 		# inetnum
 		if line.startswith("inet6num:") or line.startswith("inetnum:"):
-			return self._parse_inetnum_block(block)
+			return self._parse_inetnum_block(block, validcountries)
 
 		# organisation
 		elif line.startswith("organisation:"):
@@ -573,7 +580,7 @@ class CLI(object):
 			autnum.get("asn"), autnum.get("org"),
 		)
 
-	def _parse_inetnum_block(self, block):
+	def _parse_inetnum_block(self, block, validcountries = None):
 		log.debug("Parsing inetnum block:")
 
 		inetnum = {}
@@ -616,10 +623,10 @@ class CLI(object):
 		if not inetnum or not "country" in inetnum:
 			return
 
-		# Skip objects with bogus country code 'ZZ'
-		if inetnum.get("country") == "ZZ":
-			log.warning("Skipping network with bogus country 'ZZ': %s" % \
-				(inetnum.get("inet6num") or inetnum.get("inetnum")))
+		# Skip objects with unknown country codes
+		if validcountries and inetnum.get("country") not in validcountries:
+			log.warning("Skipping network with bogus country '%s': %s" % \
+				(inetnum.get("country"), inetnum.get("inet6num") or inetnum.get("inetnum")))
 			return
 
 		# Iterate through all networks enumerated from above, check them for plausibility and insert
@@ -652,7 +659,7 @@ class CLI(object):
 			org.get("organisation"), org.get("org-name"),
 		)
 
-	def _parse_line(self, line):
+	def _parse_line(self, line, validcountries = None):
 		# Skip version line
 		if line.startswith("2"):
 			return
@@ -667,8 +674,15 @@ class CLI(object):
 			log.warning("Could not parse line: %s" % line)
 			return
 
-		# Skip any lines that are for stats only
-		if country_code == "*":
+		# Skip any lines that are for stats only or do not have a country
+		# code at all (avoids log spam below)
+		if not country_code or country_code == '*':
+			return
+
+		# Skip objects with unknown country codes
+		if validcountries and country_code not in validcountries:
+			log.warning("Skipping line with bogus country '%s': %s" % \
+				(country_code, line))
 			return
 
 		if type in ("ipv6", "ipv4"):
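
For readers who want to try the check outside the importer, the following is a minimal standalone sketch of the filtering logic described above. It is not part of location-importer: the helper names load_valid_countries and should_skip are hypothetical, the rows are plain dicts standing in for database rows, and the sketch collects the codes in a set rather than the list built in the patch (an assumption made here so that membership tests do not scan every entry). As in the patch, an empty or missing collection of country codes disables the check.

```python
import logging

log = logging.getLogger("country-filter-sketch")


def load_valid_countries(rows):
    """Collect country codes from row-like dicts (hypothetical helper).

    In the patch the rows come from the countries SQL table; plain dicts
    are used here so the sketch runs without a database.
    """
    return {row["country_code"] for row in rows}


def should_skip(country_code, validcountries=None):
    """Return True if a record with this country code should be dropped."""
    # Records without a country code, or stats-only lines marked '*',
    # are dropped without logging (mirrors the check in _parse_line).
    if not country_code or country_code == "*":
        return True

    # If no valid codes are known, the check is omitted, as in the patch.
    if validcountries and country_code not in validcountries:
        log.warning("Skipping record with bogus country '%s'", country_code)
        return True

    return False
```

With valid = load_valid_countries([{"country_code": "DE"}, {"country_code": "FR"}]), should_skip("YU", valid) is true, while should_skip("YU", set()) is false because the check is omitted when no countries are loaded.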