Message ID | ba77e5d0-4250-7801-bd13-92a2fcfa8441@ipfire.org |
---|---|
State | Superseded |
Headers |
From: Peter Müller <peter.mueller@ipfire.org> To: "IPFire: Location" <location@lists.ipfire.org> Subject: [PATCH] location-importer.in: Conduct sanity checks per DROP list Date: Thu, 18 Aug 2022 21:42:22 +0000 Message-ID: <ba77e5d0-4250-7801-bd13-92a2fcfa8441@ipfire.org> List-Id: <location.lists.ipfire.org> |
Series |
location-importer.in: Conduct sanity checks per DROP list
Commit Message
Peter Müller
Aug. 18, 2022, 9:42 p.m. UTC
Previously, the lack of distinction between different DROP lists caused
only the last one to be persisted.
Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
---
src/scripts/location-importer.in | 58 +++++++++++++++++++++-----------
1 file changed, 39 insertions(+), 19 deletions(-)
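In short, the patch pairs each DROP list with its own label and uses that label as the `source` value for the corresponding database rows, so overrides from one list can no longer overwrite another's. A minimal standalone sketch of that pairing (not the libloc code itself):

```python
# Each list now carries its own source label, taken from the patch.
ip_lists = [
    ("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"),
    ("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"),
    ("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt"),
]

# Every list contributes a distinct 'source' value for its DB rows,
# which is what lets all three lists be persisted side by side.
sources = [name for name, url in ip_lists]
```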
Comments
Hi, being curious I wanted to test this patch, but couldn't find 'location-importer.in' in 'src/scripts'. No chance to apply this patch. Finally I found '.../build/usr/bin/location-importer'. Hm. Ok. But how do I apply this patch? Still being curious... ;-) Matthias On 18.08.2022 23:42, Peter Müller wrote: > Previously, the lack of distinction between different DROP lists caused > only the last one to be persisted. > > Signed-off-by: Peter Müller <peter.mueller@ipfire.org> > --- > src/scripts/location-importer.in | 58 +++++++++++++++++++++----------- > 1 file changed, 39 insertions(+), 19 deletions(-) > > diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in > index 8d47497..e4a9ab0 100644 > --- a/src/scripts/location-importer.in > +++ b/src/scripts/location-importer.in > @@ -1427,18 +1427,21 @@ class CLI(object): > def _update_overrides_for_spamhaus_drop(self): > downloader = location.importer.Downloader() > > - ip_urls = [ > - "https://www.spamhaus.org/drop/drop.txt", > - "https://www.spamhaus.org/drop/edrop.txt", > - "https://www.spamhaus.org/drop/dropv6.txt" > + ip_lists = [ > + ("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"), > + ("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"), > + ("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt") > ] > > - asn_urls = [ > - "https://www.spamhaus.org/drop/asndrop.txt" > + asn_lists = [ > + ("Spamhaus ASN-DROP list", "https://www.spamhaus.org/drop/asndrop.txt") > ] > > - for url in ip_urls: > - # Fetch IP list > + for list in ip_lists: > + name = list[0] > + url = list[1] > + > + # Fetch IP list from given URL > f = downloader.retrieve(url) > > # Split into lines > @@ -1448,11 +1451,11 @@ class CLI(object): > # downloads. 
> if len(fcontent) > 10: > self.db.execute(""" > - DELETE FROM autnum_overrides WHERE source = 'Spamhaus ASN-DROP list'; > - DELETE FROM network_overrides WHERE source = 'Spamhaus DROP lists'; > - """) > + DELETE FROM network_overrides WHERE source = '%s'; > + """ % name, > + ) > else: > - log.error("Spamhaus DROP URL %s returned likely bogus file, ignored" % url) > + log.error("%s (%s) returned likely bogus file, ignored" % (name, url)) > continue > > # Iterate through every line, filter comments and add remaining networks to > @@ -1475,8 +1478,8 @@ class CLI(object): > > # Sanitize parsed networks... > if not self._check_parsed_network(network): > - log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \ > - (url, network)) > + log.warning("Skipping bogus network found in %s (%s): %s" % \ > + (name, url, network)) > continue > > # Conduct SQL statement... > @@ -1488,14 +1491,31 @@ class CLI(object): > ) VALUES (%s, %s, %s) > ON CONFLICT (network) DO UPDATE SET is_drop = True""", > "%s" % network, > - "Spamhaus DROP lists", > + name, > True > ) > > - for url in asn_urls: > + for list in asn_lists: > + name = list[0] > + url = list[1] > + > # Fetch URL > f = downloader.retrieve(url) > > + # Split into lines > + fcontent = f.readlines() > + > + # Conduct a very basic sanity check to rule out CDN issues causing bogus DROP > + # downloads. > + if len(fcontent) > 10: > + self.db.execute(""" > + DELETE FROM autnum_overrides WHERE source = '%s'; > + """ % name, > + ) > + else: > + log.error("%s (%s) returned likely bogus file, ignored" % (name, url)) > + continue > + > # Iterate through every line, filter comments and add remaining ASNs to > # the override table in case they are valid... > with self.db.transaction(): > @@ -1518,8 +1538,8 @@ class CLI(object): > > # Filter invalid ASNs... 
> if not self._check_parsed_asn(asn): > - log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \ > - (url, asn)) > + log.warning("Skipping bogus ASN found in %s (%s): %s" % \ > + (name, url, asn)) > continue > > # Conduct SQL statement... > @@ -1531,7 +1551,7 @@ class CLI(object): > ) VALUES (%s, %s, %s) > ON CONFLICT (number) DO UPDATE SET is_drop = True""", > "%s" % asn, > - "Spamhaus ASN-DROP list", > + name, > True > ) >
Hello Matthias, This patch is against libloc: https://git.ipfire.org/?p=location/libloc.git;a=summary -Michael > On 19 Aug 2022, at 09:21, Matthias Fischer <matthias.fischer@ipfire.org> wrote: > > Hi, > > being curious I wanted to test this patch, but couldn't find > 'location-importer.in' in 'src/scripts'. No chance to apply this patch. > > Finally I found '.../build/usr/bin/location-importer'. Hm. Ok. > > But how do I apply this patch? > > Still being curious... ;-) > > Matthias > > On 18.08.2022 23:42, Peter Müller wrote: >> Previously, the lack of distinction between different DROP lists caused >> only the last one to be persisted. >> >> Signed-off-by: Peter Müller <peter.mueller@ipfire.org> >> --- >> src/scripts/location-importer.in | 58 +++++++++++++++++++++----------- >> 1 file changed, 39 insertions(+), 19 deletions(-) >> >> diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in >> index 8d47497..e4a9ab0 100644 >> --- a/src/scripts/location-importer.in >> +++ b/src/scripts/location-importer.in >> @@ -1427,18 +1427,21 @@ class CLI(object): >> def _update_overrides_for_spamhaus_drop(self): >> downloader = location.importer.Downloader() >> >> - ip_urls = [ >> - "https://www.spamhaus.org/drop/drop.txt", >> - "https://www.spamhaus.org/drop/edrop.txt", >> - "https://www.spamhaus.org/drop/dropv6.txt" >> + ip_lists = [ >> + ("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"), >> + ("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"), >> + ("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt") >> ] >> >> - asn_urls = [ >> - "https://www.spamhaus.org/drop/asndrop.txt" >> + asn_lists = [ >> + ("Spamhaus ASN-DROP list", "https://www.spamhaus.org/drop/asndrop.txt") >> ] >> >> - for url in ip_urls: >> - # Fetch IP list >> + for list in ip_lists: >> + name = list[0] >> + url = list[1] >> + >> + # Fetch IP list from given URL >> f = downloader.retrieve(url) >> >> # Split into lines >> @@ -1448,11 +1451,11 
@@ class CLI(object): >> # downloads. >> if len(fcontent) > 10: >> self.db.execute(""" >> - DELETE FROM autnum_overrides WHERE source = 'Spamhaus ASN-DROP list'; >> - DELETE FROM network_overrides WHERE source = 'Spamhaus DROP lists'; >> - """) >> + DELETE FROM network_overrides WHERE source = '%s'; >> + """ % name, >> + ) >> else: >> - log.error("Spamhaus DROP URL %s returned likely bogus file, ignored" % url) >> + log.error("%s (%s) returned likely bogus file, ignored" % (name, url)) >> continue >> >> # Iterate through every line, filter comments and add remaining networks to >> @@ -1475,8 +1478,8 @@ class CLI(object): >> >> # Sanitize parsed networks... >> if not self._check_parsed_network(network): >> - log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \ >> - (url, network)) >> + log.warning("Skipping bogus network found in %s (%s): %s" % \ >> + (name, url, network)) >> continue >> >> # Conduct SQL statement... >> @@ -1488,14 +1491,31 @@ class CLI(object): >> ) VALUES (%s, %s, %s) >> ON CONFLICT (network) DO UPDATE SET is_drop = True""", >> "%s" % network, >> - "Spamhaus DROP lists", >> + name, >> True >> ) >> >> - for url in asn_urls: >> + for list in asn_lists: >> + name = list[0] >> + url = list[1] >> + >> # Fetch URL >> f = downloader.retrieve(url) >> >> + # Split into lines >> + fcontent = f.readlines() >> + >> + # Conduct a very basic sanity check to rule out CDN issues causing bogus DROP >> + # downloads. >> + if len(fcontent) > 10: >> + self.db.execute(""" >> + DELETE FROM autnum_overrides WHERE source = '%s'; >> + """ % name, >> + ) >> + else: >> + log.error("%s (%s) returned likely bogus file, ignored" % (name, url)) >> + continue >> + >> # Iterate through every line, filter comments and add remaining ASNs to >> # the override table in case they are valid... >> with self.db.transaction(): >> @@ -1518,8 +1538,8 @@ class CLI(object): >> >> # Filter invalid ASNs... 
>> if not self._check_parsed_asn(asn): >> - log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \ >> - (url, asn)) >> + log.warning("Skipping bogus ASN found in %s (%s): %s" % \ >> + (name, url, asn)) >> continue >> >> # Conduct SQL statement... >> @@ -1531,7 +1551,7 @@ class CLI(object): >> ) VALUES (%s, %s, %s) >> ON CONFLICT (number) DO UPDATE SET is_drop = True""", >> "%s" % asn, >> - "Spamhaus ASN-DROP list", >> + name, >> True >> ) >> >
Hello Peter, > On 18 Aug 2022, at 22:42, Peter Müller <peter.mueller@ipfire.org> wrote: > > Previously, the lack of distinction between different DROP lists caused > only the last one to be persisted. > > Signed-off-by: Peter Müller <peter.mueller@ipfire.org> > --- > src/scripts/location-importer.in | 58 +++++++++++++++++++++----------- > 1 file changed, 39 insertions(+), 19 deletions(-) > > diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in > index 8d47497..e4a9ab0 100644 > --- a/src/scripts/location-importer.in > +++ b/src/scripts/location-importer.in > @@ -1427,18 +1427,21 @@ class CLI(object): > def _update_overrides_for_spamhaus_drop(self): > downloader = location.importer.Downloader() > > - ip_urls = [ > - "https://www.spamhaus.org/drop/drop.txt", > - "https://www.spamhaus.org/drop/edrop.txt", > - "https://www.spamhaus.org/drop/dropv6.txt" > + ip_lists = [ > + ("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"), > + ("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"), > + ("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt") > ] Would it not be better to have this as a dict? Or if you prefer something to iterate over, a tuple? Would it not be better/easier if we would use a handle like “SPAMHAUS-DROP”, “SPAMHAUS-EDROP”, … instead of a string with spaces? It does not matter that much to me, but since this is a machine-readable string, should it not be less human formatted? > > - asn_urls = [ > - "https://www.spamhaus.org/drop/asndrop.txt" > + asn_lists = [ > + ("Spamhaus ASN-DROP list", "https://www.spamhaus.org/drop/asndrop.txt") > ] > > - for url in ip_urls: > - # Fetch IP list > + for list in ip_lists: > + name = list[0] > + url = list[1] It would be more Pythonic if you wrote it like this: for name, url in ip_lists: ... > + > + # Fetch IP list from given URL > f = downloader.retrieve(url) > > # Split into lines > @@ -1448,11 +1451,11 @@ class CLI(object): > # downloads. 
> if len(fcontent) > 10: > self.db.execute(""" > - DELETE FROM autnum_overrides WHERE source = 'Spamhaus ASN-DROP list'; > - DELETE FROM network_overrides WHERE source = 'Spamhaus DROP lists'; > - """) > + DELETE FROM network_overrides WHERE source = '%s'; No need for the ‘’ around the %s. > + """ % name, > + ) > else: > - log.error("Spamhaus DROP URL %s returned likely bogus file, ignored" % url) > + log.error("%s (%s) returned likely bogus file, ignored" % (name, url)) > continue > > # Iterate through every line, filter comments and add remaining networks to > @@ -1475,8 +1478,8 @@ class CLI(object): > > # Sanitize parsed networks... > if not self._check_parsed_network(network): > - log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \ > - (url, network)) > + log.warning("Skipping bogus network found in %s (%s): %s" % \ > + (name, url, network)) > continue > > # Conduct SQL statement... > @@ -1488,14 +1491,31 @@ class CLI(object): > ) VALUES (%s, %s, %s) > ON CONFLICT (network) DO UPDATE SET is_drop = True""", > "%s" % network, > - "Spamhaus DROP lists", > + name, > True > ) > > - for url in asn_urls: > + for list in asn_lists: > + name = list[0] > + url = list[1] > + See above. > # Fetch URL > f = downloader.retrieve(url) > > + # Split into lines > + fcontent = f.readlines() > + > + # Conduct a very basic sanity check to rule out CDN issues causing bogus DROP > + # downloads. > + if len(fcontent) > 10: > + self.db.execute(""" > + DELETE FROM autnum_overrides WHERE source = '%s'; > + """ % name, > + ) > + else: > + log.error("%s (%s) returned likely bogus file, ignored" % (name, url)) > + continue Is there a reason why this DELETE statement is not part of the transaction? If something goes wrong later on, you would have lost the previous data. Would it also not be programmatically better to start the import and then count how many lines you are importing? The “f.readlines()” call is something that I would consider a bit ugly. 
> + > # Iterate through every line, filter comments and add remaining ASNs to > # the override table in case they are valid... > with self.db.transaction(): > @@ -1518,8 +1538,8 @@ class CLI(object): > > # Filter invalid ASNs... > if not self._check_parsed_asn(asn): > - log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \ > - (url, asn)) > + log.warning("Skipping bogus ASN found in %s (%s): %s" % \ > + (name, url, asn)) > continue > > # Conduct SQL statement... > @@ -1531,7 +1551,7 @@ class CLI(object): > ) VALUES (%s, %s, %s) > ON CONFLICT (number) DO UPDATE SET is_drop = True""", > "%s" % asn, > - "Spamhaus ASN-DROP list", > + name, > True > ) > > -- > 2.35.3
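Michael's "more Pythonic" suggestion can be shown side by side with the index-based version from the patch; both walk the same list of pairs, but unpacking in the `for` statement names each element directly (a standalone sketch, not libloc code):

```python
pairs = [
    ("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"),
    ("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"),
]

# Patch's version: index into each tuple.
names_indexed = []
for entry in pairs:
    name = entry[0]
    url = entry[1]
    names_indexed.append(name)

# Suggested version: unpack name and url directly in the for statement.
names_unpacked = []
for name, url in pairs:
    names_unpacked.append(name)
```

Both loops produce identical results; the second simply avoids the intermediate indexing.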
Hello Michael, better late than never: Thanks for your reply and the comments. > Hello Peter, > >> On 18 Aug 2022, at 22:42, Peter Müller <peter.mueller@ipfire.org> wrote: >> >> Previously, the lack of distinction between different DROP lists caused >> only the last one to be persisted. >> >> Signed-off-by: Peter Müller <peter.mueller@ipfire.org> >> --- >> src/scripts/location-importer.in | 58 +++++++++++++++++++++----------- >> 1 file changed, 39 insertions(+), 19 deletions(-) >> >> diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in >> index 8d47497..e4a9ab0 100644 >> --- a/src/scripts/location-importer.in >> +++ b/src/scripts/location-importer.in >> @@ -1427,18 +1427,21 @@ class CLI(object): >> def _update_overrides_for_spamhaus_drop(self): >> downloader = location.importer.Downloader() >> >> - ip_urls = [ >> - "https://www.spamhaus.org/drop/drop.txt", >> - "https://www.spamhaus.org/drop/edrop.txt", >> - "https://www.spamhaus.org/drop/dropv6.txt" >> + ip_lists = [ >> + ("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"), >> + ("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"), >> + ("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt") >> ] > > Would it not be better to have this as a dict? > > Or if you prefer something to iterate over, a tuple? > > Would it not be better/easier if we would use a handle like “SPAMHAUS-DROP”, “SPAMHAUS-EDROP”, … instead of a string with spaces? It does not matter that much to me, but since this is a machine-readable string, should it not be less human formatted? Hm, good point. I have changed that in the upcoming second version of the patch. 
> >> >> - asn_urls = [ >> - "https://www.spamhaus.org/drop/asndrop.txt" >> + asn_lists = [ >> + ("Spamhaus ASN-DROP list", "https://www.spamhaus.org/drop/asndrop.txt") >> ] >> >> - for url in ip_urls: >> - # Fetch IP list >> + for list in ip_lists: >> + name = list[0] >> + url = list[1] > > It would be more Pythonic if you wrote it like this: > > for name, url in ip_lists: > ... > >> + >> + # Fetch IP list from given URL >> f = downloader.retrieve(url) >> >> # Split into lines >> @@ -1448,11 +1451,11 @@ class CLI(object): >> # downloads. >> if len(fcontent) > 10: >> self.db.execute(""" >> - DELETE FROM autnum_overrides WHERE source = 'Spamhaus ASN-DROP list'; >> - DELETE FROM network_overrides WHERE source = 'Spamhaus DROP lists'; >> - """) >> + DELETE FROM network_overrides WHERE source = '%s'; > > No need for the ‘’ around the %s. Yes there is, the SQL command results in an exception otherwise. Second version of the patch is coming right up. Thanks, and best regards, Peter Müller > >> + """ % name, >> + ) >> else: >> - log.error("Spamhaus DROP URL %s returned likely bogus file, ignored" % url) >> + log.error("%s (%s) returned likely bogus file, ignored" % (name, url)) >> continue >> >> # Iterate through every line, filter comments and add remaining networks to >> @@ -1475,8 +1478,8 @@ class CLI(object): >> >> # Sanitize parsed networks... >> if not self._check_parsed_network(network): >> - log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \ >> - (url, network)) >> + log.warning("Skipping bogus network found in %s (%s): %s" % \ >> + (name, url, network)) >> continue >> >> # Conduct SQL statement... >> @@ -1488,14 +1491,31 @@ class CLI(object): >> ) VALUES (%s, %s, %s) >> ON CONFLICT (network) DO UPDATE SET is_drop = True""", >> "%s" % network, >> - "Spamhaus DROP lists", >> + name, >> True >> ) >> >> - for url in asn_urls: >> + for list in asn_lists: >> + name = list[0] >> + url = list[1] >> + > > See above. 
> >> # Fetch URL >> f = downloader.retrieve(url) >> >> + # Split into lines >> + fcontent = f.readlines() >> + >> + # Conduct a very basic sanity check to rule out CDN issues causing bogus DROP >> + # downloads. >> + if len(fcontent) > 10: >> + self.db.execute(""" >> + DELETE FROM autnum_overrides WHERE source = '%s'; >> + """ % name, >> + ) >> + else: >> + log.error("%s (%s) returned likely bogus file, ignored" % (name, url)) >> + continue > > Is there a reason why this DELETE statement is not part of the transaction? If something goes wrong later on, you would have lost the previous data. > > Would it also not be programmatically better to start the import and then count how many lines you are importing? > > The “f.readlines()” call is something that I would consider a bit ugly. > >> + >> # Iterate through every line, filter comments and add remaining ASNs to >> # the override table in case they are valid... >> with self.db.transaction(): >> @@ -1518,8 +1538,8 @@ class CLI(object): >> >> # Filter invalid ASNs... >> if not self._check_parsed_asn(asn): >> - log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \ >> - (url, asn)) >> + log.warning("Skipping bogus ASN found in %s (%s): %s" % \ >> + (name, url, asn)) >> continue >> >> # Conduct SQL statement... >> @@ -1531,7 +1551,7 @@ class CLI(object): >> ) VALUES (%s, %s, %s) >> ON CONFLICT (number) DO UPDATE SET is_drop = True""", >> "%s" % asn, >> - "Spamhaus ASN-DROP list", >> + name, >> True >> ) >> >> -- >> 2.35.3 >
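Michael's two remaining suggestions, quoted above, can be sketched together: run the DELETE inside the same transaction as the INSERTs, and count lines while iterating instead of slurping the file with `readlines()`. `FakeDB` below is a hypothetical stand-in for libloc's database wrapper, included only so the control flow is runnable; the names are not from libloc:

```python
class BogusDownload(Exception):
    """Raised when a download looks truncated (e.g. a CDN issue)."""

class FakeDB:
    """Hypothetical stand-in for the importer's database wrapper."""
    def __init__(self):
        self.rows = []        # committed (source, network) rows
        self._pending = None  # working copy while a transaction is open

    def transaction(self):
        db = self
        class _Tx:
            def __enter__(self):
                db._pending = list(db.rows)
            def __exit__(self, exc_type, exc, tb):
                if exc_type is None:
                    db.rows = db._pending  # commit
                db._pending = None         # otherwise: roll back
                return False
        return _Tx()

    def delete_source(self, source):
        self._pending = [r for r in self._pending if r[0] != source]

    def insert(self, source, network):
        self._pending.append((source, network))

def import_drop_list(db, name, lines, threshold=10):
    """Import one DROP list. The DELETE and the INSERTs share one
    transaction, so a bogus (too short) download rolls everything
    back and the previously persisted rows survive."""
    count = 0
    try:
        with db.transaction():
            db.delete_source(name)
            for line in lines:     # count while iterating, no readlines()
                count += 1
                db.insert(name, line.strip())
            if count <= threshold: # same very basic sanity threshold
                raise BogusDownload(name)
    except BogusDownload:
        return False
    return True
```

With this shape, a truncated download leaves the override table exactly as it was, instead of having already deleted the old rows.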
Hello, > On 26 Sep 2022, at 19:26, Peter Müller <peter.mueller@ipfire.org> wrote: > > Hello Michael, > > better late than never: Thanks for your reply and the comments. > >> Hello Peter, >> >>> On 18 Aug 2022, at 22:42, Peter Müller <peter.mueller@ipfire.org> wrote: >>> >>> Previously, the lack of distinction between different DROP lists caused >>> only the last one to be persisted. >>> >>> Signed-off-by: Peter Müller <peter.mueller@ipfire.org> >>> --- >>> src/scripts/location-importer.in | 58 +++++++++++++++++++++----------- >>> 1 file changed, 39 insertions(+), 19 deletions(-) >>> >>> diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in >>> index 8d47497..e4a9ab0 100644 >>> --- a/src/scripts/location-importer.in >>> +++ b/src/scripts/location-importer.in >>> @@ -1427,18 +1427,21 @@ class CLI(object): >>> def _update_overrides_for_spamhaus_drop(self): >>> downloader = location.importer.Downloader() >>> >>> - ip_urls = [ >>> - "https://www.spamhaus.org/drop/drop.txt", >>> - "https://www.spamhaus.org/drop/edrop.txt", >>> - "https://www.spamhaus.org/drop/dropv6.txt" >>> + ip_lists = [ >>> + ("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"), >>> + ("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"), >>> + ("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt") >>> ] >> >> Would it not be better to have this as a dict? >> >> Or if you prefer something to iterate over, a tuple? >> >> Would it not be better/easier if we would use a handle like “SPAMHAUS-DROP”, “SPAMHAUS-EDROP”, … instead of a string with spaces? It does not matter that much to me, but since this is a machine-readable string, should it not be less human formatted? > > Hm, good point. I have changed that in the upcoming second version of the patch. 
> >> >>> >>> - asn_urls = [ >>> - "https://www.spamhaus.org/drop/asndrop.txt" >>> + asn_lists = [ >>> + ("Spamhaus ASN-DROP list", "https://www.spamhaus.org/drop/asndrop.txt") >>> ] >>> >>> - for url in ip_urls: >>> - # Fetch IP list >>> + for list in ip_lists: >>> + name = list[0] >>> + url = list[1] >> >> It would be more Pythonic if you wrote it like this: >> >> for name, url in ip_lists: >> ... >> >>> + >>> + # Fetch IP list from given URL >>> f = downloader.retrieve(url) >>> >>> # Split into lines >>> @@ -1448,11 +1451,11 @@ class CLI(object): >>> # downloads. >>> if len(fcontent) > 10: >>> self.db.execute(""" >>> - DELETE FROM autnum_overrides WHERE source = 'Spamhaus ASN-DROP list'; >>> - DELETE FROM network_overrides WHERE source = 'Spamhaus DROP lists'; >>> - """) >>> + DELETE FROM network_overrides WHERE source = '%s'; >> >> No need for the ‘’ around the %s. > > Yes there is, the SQL command results in an exception otherwise. Err, you are allowing some SQL command injection here. You do not want to use “%s” as a string format parameter for Python strings. It is more of a placeholder for the value that has to be escaped first - SQLite is using ? which is probably a better choice for Python. The database binding is doing this automatically, but the arguments need to be passed after the query. Under no circumstances whatsoever you want to chance the query dynamically. Never ever. I fixed this here: https://git.ipfire.org/?p=location/libloc.git;a=commitdiff;h=b56bf4bf065130b38968b580e0bea6db809783d8 Best, -Michael > > Second version of the patch is coming right up. 
> > Thanks, and best regards, > Peter Müller > >> >>> + """ % name, >>> + ) >>> else: >>> - log.error("Spamhaus DROP URL %s returned likely bogus file, ignored" % url) >>> + log.error("%s (%s) returned likely bogus file, ignored" % (name, url)) >>> continue >>> >>> # Iterate through every line, filter comments and add remaining networks to >>> @@ -1475,8 +1478,8 @@ class CLI(object): >>> >>> # Sanitize parsed networks... >>> if not self._check_parsed_network(network): >>> - log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \ >>> - (url, network)) >>> + log.warning("Skipping bogus network found in %s (%s): %s" % \ >>> + (name, url, network)) >>> continue >>> >>> # Conduct SQL statement... >>> @@ -1488,14 +1491,31 @@ class CLI(object): >>> ) VALUES (%s, %s, %s) >>> ON CONFLICT (network) DO UPDATE SET is_drop = True""", >>> "%s" % network, >>> - "Spamhaus DROP lists", >>> + name, >>> True >>> ) >>> >>> - for url in asn_urls: >>> + for list in asn_lists: >>> + name = list[0] >>> + url = list[1] >>> + >> >> See above. >> >>> # Fetch URL >>> f = downloader.retrieve(url) >>> >>> + # Split into lines >>> + fcontent = f.readlines() >>> + >>> + # Conduct a very basic sanity check to rule out CDN issues causing bogus DROP >>> + # downloads. >>> + if len(fcontent) > 10: >>> + self.db.execute(""" >>> + DELETE FROM autnum_overrides WHERE source = '%s'; >>> + """ % name, >>> + ) >>> + else: >>> + log.error("%s (%s) returned likely bogus file, ignored" % (name, url)) >>> + continue >> >> Is there a reason why this DELETE statement is not part of the transaction? If something goes wrong later on, you would have lost the previous data. >> >> Would it also not be programmatically better to start the import and then count how many lines you are importing? >> >> The “f.readlines()” call is something that I would consider a bit ugly. 
>> >>> + >>> # Iterate through every line, filter comments and add remaining ASNs to >>> # the override table in case they are valid... >>> with self.db.transaction(): >>> @@ -1518,8 +1538,8 @@ class CLI(object): >>> >>> # Filter invalid ASNs... >>> if not self._check_parsed_asn(asn): >>> - log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \ >>> - (url, asn)) >>> + log.warning("Skipping bogus ASN found in %s (%s): %s" % \ >>> + (name, url, asn)) >>> continue >>> >>> # Conduct SQL statement... >>> @@ -1531,7 +1551,7 @@ class CLI(object): >>> ) VALUES (%s, %s, %s) >>> ON CONFLICT (number) DO UPDATE SET is_drop = True""", >>> "%s" % asn, >>> - "Spamhaus ASN-DROP list", >>> + name, >>> True >>> ) >>> >>> -- >>> 2.35.3
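Michael's injection point in miniature: interpolating a value into the SQL text with `%` lets that value rewrite the query, while binding it as a parameter lets the driver escape it, and the placeholder then needs no surrounding quotes. In the linked libloc fix the query becomes `DELETE FROM network_overrides WHERE source = %s` with `name` passed as a separate argument; the sketch below uses stdlib `sqlite3` only to be runnable, and its placeholder is `?`:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE network_overrides (network TEXT, source TEXT)")
db.execute("INSERT INTO network_overrides VALUES ('192.0.2.0/24', 'Spamhaus DROP list')")

name = "Spamhaus DROP list"

# Unsafe: formatting the value into the SQL text. A crafted name such as
# "x'; DROP TABLE network_overrides; --" would change the query itself:
#   db.execute("DELETE FROM network_overrides WHERE source = '%s'" % name)

# Safe: bind the value after the query; the driver escapes it, and the
# placeholder carries no quotes of its own.
db.execute("DELETE FROM network_overrides WHERE source = ?", (name,))
remaining = db.execute("SELECT COUNT(*) FROM network_overrides").fetchone()[0]
```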
Hello Michael,

> Hello,
>
>> On 26 Sep 2022, at 19:26, Peter Müller <peter.mueller@ipfire.org> wrote:
>>
>> Hello Michael,
>>
>> better late than never: Thanks for your reply and the comments.
>>
>>> Hello Peter,
>>>
>>>> On 18 Aug 2022, at 22:42, Peter Müller <peter.mueller@ipfire.org> wrote:
>>>>
>>>> Previously, the lack of distinction between different DROP lists caused
>>>> only the last one to be persisted.
>>>>
>>>> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
>>>> ---
>>>> src/scripts/location-importer.in | 58 +++++++++++++++++++++-----------
>>>> 1 file changed, 39 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
>>>> index 8d47497..e4a9ab0 100644
>>>> --- a/src/scripts/location-importer.in
>>>> +++ b/src/scripts/location-importer.in
>>>> @@ -1427,18 +1427,21 @@ class CLI(object):
>>>>  	def _update_overrides_for_spamhaus_drop(self):
>>>>  		downloader = location.importer.Downloader()
>>>>
>>>> -		ip_urls = [
>>>> -			"https://www.spamhaus.org/drop/drop.txt",
>>>> -			"https://www.spamhaus.org/drop/edrop.txt",
>>>> -			"https://www.spamhaus.org/drop/dropv6.txt"
>>>> +		ip_lists = [
>>>> +			("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"),
>>>> +			("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"),
>>>> +			("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt")
>>>>  		]
>>>
>>> Would it not be better to have this as a dict?
>>>
>>> Or if you prefer something to iterate over, a tuple?
>>>
>>> Would it not be better/easier if we would use a handle like “SPAMHAUS-DROP”, “SPAMHAUS-EDROP”, … instead of a string with spaces? It does not matter that much to me, but since this is a machine-readable string, should it not be less human formatted?
>>
>> Hm, good point. I have changed that in the upcoming second version of the patch.
>>
>>>
>>>>
>>>> -		asn_urls = [
>>>> -			"https://www.spamhaus.org/drop/asndrop.txt"
>>>> +		asn_lists = [
>>>> +			("Spamhaus ASN-DROP list", "https://www.spamhaus.org/drop/asndrop.txt")
>>>>  		]
>>>>
>>>> -		for url in ip_urls:
>>>> -			# Fetch IP list
>>>> +		for list in ip_lists:
>>>> +			name = list[0]
>>>> +			url = list[1]
>>>
>>> It would be more Pythonic if you wrote it like this:
>>>
>>> 	for name, url in ip_lists:
>>> 		...
>>>
>>>> +
>>>> +			# Fetch IP list from given URL
>>>>  			f = downloader.retrieve(url)
>>>>
>>>>  			# Split into lines
>>>> @@ -1448,11 +1451,11 @@ class CLI(object):
>>>>  			# downloads.
>>>>  			if len(fcontent) > 10:
>>>>  				self.db.execute("""
>>>> -					DELETE FROM autnum_overrides WHERE source = 'Spamhaus ASN-DROP list';
>>>> -					DELETE FROM network_overrides WHERE source = 'Spamhaus DROP lists';
>>>> -				""")
>>>> +					DELETE FROM network_overrides WHERE source = '%s';
>>>
>>> No need for the ‘’ around the %s.
>>
>> Yes there is, the SQL command results in an exception otherwise.
>
> Err, you are allowing some SQL command injection here.
>
> You do not want to use “%s” as a string format parameter for Python strings. It is more of a placeholder for the value that has to be escaped first - SQLite is using ? which is probably a better choice for Python.
>
> The database binding is doing this automatically, but the arguments need to be passed after the query.
>
> Under no circumstances whatsoever do you want to change the query dynamically. Never ever.
>
> I fixed this here: https://git.ipfire.org/?p=location/libloc.git;a=commitdiff;h=b56bf4bf065130b38968b580e0bea6db809783d8

better late than never: Many thanks for spotting and fixing this before it went into
anything stable or productive! Guess I'm not a good programmer after all...

All the best,
Peter Müller

>
> Best,
> -Michael
>
>>
>> Second version of the patch is coming right up.
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>>
>>>> +				""" % name,
>>>> +				)
>>>>  			else:
>>>> -				log.error("Spamhaus DROP URL %s returned likely bogus file, ignored" % url)
>>>> +				log.error("%s (%s) returned likely bogus file, ignored" % (name, url))
>>>>  				continue
>>>>
>>>>  			# Iterate through every line, filter comments and add remaining networks to
>>>> @@ -1475,8 +1478,8 @@ class CLI(object):
>>>>
>>>>  				# Sanitize parsed networks...
>>>>  				if not self._check_parsed_network(network):
>>>> -					log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
>>>> -						(url, network))
>>>> +					log.warning("Skipping bogus network found in %s (%s): %s" % \
>>>> +						(name, url, network))
>>>>  					continue
>>>>
>>>>  				# Conduct SQL statement...
>>>> @@ -1488,14 +1491,31 @@ class CLI(object):
>>>>  				) VALUES (%s, %s, %s)
>>>>  				ON CONFLICT (network) DO UPDATE SET is_drop = True""",
>>>>  				"%s" % network,
>>>> -				"Spamhaus DROP lists",
>>>> +				name,
>>>>  				True
>>>>  			)
>>>>
>>>> -		for url in asn_urls:
>>>> +		for list in asn_lists:
>>>> +			name = list[0]
>>>> +			url = list[1]
>>>> +
>>>
>>> See above.
>>>
>>>>  			# Fetch URL
>>>>  			f = downloader.retrieve(url)
>>>>
>>>> +			# Split into lines
>>>> +			fcontent = f.readlines()
>>>> +
>>>> +			# Conduct a very basic sanity check to rule out CDN issues causing bogus DROP
>>>> +			# downloads.
>>>> +			if len(fcontent) > 10:
>>>> +				self.db.execute("""
>>>> +					DELETE FROM autnum_overrides WHERE source = '%s';
>>>> +				""" % name,
>>>> +				)
>>>> +			else:
>>>> +				log.error("%s (%s) returned likely bogus file, ignored" % (name, url))
>>>> +				continue
>>>
>>> Is there a reason why this DELETE statement is not part of the transaction? If something goes wrong later on, you would have lost the previous data.
>>>
>>> Would it also not be programmatically better to start the import and then count how many lines you are importing?
>>>
>>> The “f.readlines()” call is something that I would consider a bit ugly.
>>>
>>>> +
>>>>  			# Iterate through every line, filter comments and add remaining ASNs to
>>>>  			# the override table in case they are valid...
>>>>  			with self.db.transaction():
>>>> @@ -1518,8 +1538,8 @@ class CLI(object):
>>>>
>>>>  				# Filter invalid ASNs...
>>>>  				if not self._check_parsed_asn(asn):
>>>> -					log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
>>>> -						(url, asn))
>>>> +					log.warning("Skipping bogus ASN found in %s (%s): %s" % \
>>>> +						(name, url, asn))
>>>>  					continue
>>>>
>>>>  				# Conduct SQL statement...
>>>> @@ -1531,7 +1551,7 @@ class CLI(object):
>>>>  				) VALUES (%s, %s, %s)
>>>>  				ON CONFLICT (number) DO UPDATE SET is_drop = True""",
>>>>  				"%s" % asn,
>>>> -				"Spamhaus ASN-DROP list",
>>>> +				name,
>>>>  				True
>>>>  			)
>>>>
>>>> --
>>>> 2.35.3
>
This is the reason why we are doing code review...

> On 2 Oct 2022, at 12:04, Peter Müller <peter.mueller@ipfire.org> wrote:
>
> […]
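The injection problem Michael flags in the thread above is easy to demonstrate. Below is a minimal, self-contained sketch using Python's stdlib `sqlite3` module; the table name mirrors the importer's `network_overrides`, but the schema and values here are illustrative only (libloc itself talks to PostgreSQL, whose driver binds values via `%s` placeholders passed after the query):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE network_overrides (network TEXT, source TEXT)")
db.executemany(
    "INSERT INTO network_overrides VALUES (?, ?)",
    [("192.0.2.0/24", "SPAMHAUS-DROP"), ("2001:db8::/32", "SPAMHAUS-DROPV6")],
)

# UNSAFE: interpolating the value into the SQL text. A crafted "name"
# breaks out of the quoting and the WHERE clause matches every row.
evil = "x' OR '1'='1"
db.execute("DELETE FROM network_overrides WHERE source = '%s'" % evil)
rows_after_unsafe = db.execute("SELECT count(*) FROM network_overrides").fetchone()[0]

# SAFE: let the driver bind the value. The string is escaped as data and
# never parsed as SQL, so nothing matches and nothing is deleted.
db.execute("INSERT INTO network_overrides VALUES ('192.0.2.0/24', 'SPAMHAUS-DROP')")
db.execute("DELETE FROM network_overrides WHERE source = ?", (evil,))
rows_after_safe = db.execute("SELECT count(*) FROM network_overrides").fetchone()[0]

print(rows_after_unsafe, rows_after_safe)  # 0 1
```

The unsafe variant wipes the whole table because the crafted value rewrites the query itself; the bound variant treats the same string as plain data.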
diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index 8d47497..e4a9ab0 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -1427,18 +1427,21 @@ class CLI(object):
 	def _update_overrides_for_spamhaus_drop(self):
 		downloader = location.importer.Downloader()
 
-		ip_urls = [
-			"https://www.spamhaus.org/drop/drop.txt",
-			"https://www.spamhaus.org/drop/edrop.txt",
-			"https://www.spamhaus.org/drop/dropv6.txt"
+		ip_lists = [
+			("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"),
+			("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"),
+			("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt")
 		]
 
-		asn_urls = [
-			"https://www.spamhaus.org/drop/asndrop.txt"
+		asn_lists = [
+			("Spamhaus ASN-DROP list", "https://www.spamhaus.org/drop/asndrop.txt")
 		]
 
-		for url in ip_urls:
-			# Fetch IP list
+		for list in ip_lists:
+			name = list[0]
+			url = list[1]
+
+			# Fetch IP list from given URL
 			f = downloader.retrieve(url)
 
 			# Split into lines
@@ -1448,11 +1451,11 @@ class CLI(object):
 			# downloads.
 			if len(fcontent) > 10:
 				self.db.execute("""
-					DELETE FROM autnum_overrides WHERE source = 'Spamhaus ASN-DROP list';
-					DELETE FROM network_overrides WHERE source = 'Spamhaus DROP lists';
-				""")
+					DELETE FROM network_overrides WHERE source = '%s';
+				""" % name,
+				)
 			else:
-				log.error("Spamhaus DROP URL %s returned likely bogus file, ignored" % url)
+				log.error("%s (%s) returned likely bogus file, ignored" % (name, url))
 				continue
 
 			# Iterate through every line, filter comments and add remaining networks to
@@ -1475,8 +1478,8 @@ class CLI(object):
 
 				# Sanitize parsed networks...
 				if not self._check_parsed_network(network):
-					log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
-						(url, network))
+					log.warning("Skipping bogus network found in %s (%s): %s" % \
+						(name, url, network))
 					continue
 
 				# Conduct SQL statement...
@@ -1488,14 +1491,31 @@ class CLI(object):
 				) VALUES (%s, %s, %s)
 				ON CONFLICT (network) DO UPDATE SET is_drop = True""",
 				"%s" % network,
-				"Spamhaus DROP lists",
+				name,
 				True
 			)
 
-		for url in asn_urls:
+		for list in asn_lists:
+			name = list[0]
+			url = list[1]
+
 			# Fetch URL
 			f = downloader.retrieve(url)
 
+			# Split into lines
+			fcontent = f.readlines()
+
+			# Conduct a very basic sanity check to rule out CDN issues causing bogus DROP
+			# downloads.
+			if len(fcontent) > 10:
+				self.db.execute("""
+					DELETE FROM autnum_overrides WHERE source = '%s';
+				""" % name,
+				)
+			else:
+				log.error("%s (%s) returned likely bogus file, ignored" % (name, url))
+				continue
+
 			# Iterate through every line, filter comments and add remaining ASNs to
 			# the override table in case they are valid...
 			with self.db.transaction():
@@ -1518,8 +1538,8 @@ class CLI(object):
 
 				# Filter invalid ASNs...
 				if not self._check_parsed_asn(asn):
-					log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
-						(url, asn))
+					log.warning("Skipping bogus ASN found in %s (%s): %s" % \
+						(name, url, asn))
 					continue
 
 				# Conduct SQL statement...
@@ -1531,7 +1551,7 @@ class CLI(object):
 				) VALUES (%s, %s, %s)
 				ON CONFLICT (number) DO UPDATE SET is_drop = True""",
 				"%s" % asn,
-				"Spamhaus ASN-DROP list",
+				name,
 				True
 			)
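For reference, the review comments combine into roughly the following shape of loop: tuple unpacking instead of `list[0]`/`list[1]`, and the list name bound by the driver rather than `%`-formatted into the SQL text. This is a sketch, not the real importer: `db` and `fetch_lines` stand in for libloc's database handle and `Downloader`, and the machine-readable handles are the ones suggested in review, not confirmed final values.

```python
ip_lists = [
    # Machine-readable handles as suggested in review (illustrative values)
    ("SPAMHAUS-DROP",   "https://www.spamhaus.org/drop/drop.txt"),
    ("SPAMHAUS-EDROP",  "https://www.spamhaus.org/drop/edrop.txt"),
    ("SPAMHAUS-DROPV6", "https://www.spamhaus.org/drop/dropv6.txt"),
]

def refresh_network_overrides(db, fetch_lines):
    """Delete stale override rows per list. `db` is any DB-API connection;
    `fetch_lines(url)` returns the downloaded file as a list of lines."""
    for name, url in ip_lists:  # Pythonic unpacking, no list indexing
        lines = fetch_lines(url)

        # Very basic sanity check against truncated or bogus CDN responses
        if len(lines) <= 10:
            print("%s (%s) returned likely bogus file, ignored" % (name, url))
            continue

        # The value is bound by the driver ("?" for SQLite here; psycopg2
        # uses "%s") and is never formatted into the SQL string itself.
        db.execute("DELETE FROM network_overrides WHERE source = ?", (name,))
```

Per the review, the `DELETE` would also belong inside the same transaction as the subsequent inserts, so a failed import cannot leave the table emptied.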