From patchwork Sun Sep 20 19:21:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Peter_M=C3=BCller?= X-Patchwork-Id: 3477 Return-Path: Received: from mail01.ipfire.org (mail01.haj.ipfire.org [172.28.1.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) client-signature ECDSA (P-384)) (Client CN "mail01.haj.ipfire.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by web04.haj.ipfire.org (Postfix) with ESMTPS id 4Bvcqt74hvz3x48 for ; Sun, 20 Sep 2020 19:21:14 +0000 (UTC) Received: from mail02.haj.ipfire.org (mail02.haj.ipfire.org [172.28.1.201]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) client-signature ECDSA (P-384)) (Client CN "mail02.haj.ipfire.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mail01.ipfire.org (Postfix) with ESMTPS id 4Bvcqt64xbzsQ; Sun, 20 Sep 2020 19:21:14 +0000 (UTC) Received: from mail02.haj.ipfire.org (localhost [127.0.0.1]) by mail02.haj.ipfire.org (Postfix) with ESMTP id 4Bvcqt4vwwz2xny; Sun, 20 Sep 2020 19:21:14 +0000 (UTC) Received: from mail01.ipfire.org (mail01.haj.ipfire.org [172.28.1.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) client-signature ECDSA (P-384)) (Client CN "mail01.haj.ipfire.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mail02.haj.ipfire.org (Postfix) with ESMTPS id 4Bvcqs42rZz2xny for ; Sun, 20 Sep 2020 19:21:13 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384)) (Client did not present a certificate) by mail01.ipfire.org (Postfix) with ESMTPSA id 4Bvcqp1nWzzhS; Sun, 20 Sep 2020 19:21:09 +0000 (UTC) DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=ipfire.org; s=202003ed25519; t=1600629672; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=siey5TjWMGrfUAzGwb32kgi8X8iMLxLtpIeZ2XHJN4g=; b=7uIjqOwt2kfuQNIA/BtCsSYJgMY2ViRlFgEsj2ZSQ/rXpQlx8QkwSR6aJCznVluXP0pzcm CDvwu9qz8SuKZ1Dg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ipfire.org; s=202003rsa; t=1600629672; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=siey5TjWMGrfUAzGwb32kgi8X8iMLxLtpIeZ2XHJN4g=; b=O+/kkfJdvO7fhsCyYZsCG/hpIDzLo6hENFXRRjRhYug/Jhoyv3fE9CpWMKdm5DEsWiRCJL mgHHCsgxyn5/jpbCoJyO9tU17q15k3GhubDBg+gtn3XPD2WDfeZsIEM0UDo8AlydpNSVL2 RYFeZ3ccSJ4QZgLVSSXl0qdZSga0EQMWo2LBtSdGx15yabH32E5qQcv+LRXL/m3juEcfn/ gtgg6Lybfmo8IuX+5q3cO3RVr0b33eF346lkE7lP6FednC5Was5MUkdWryBcVetzCS4U2p 0AjJ3PYFr9WLr9+HY3QK3vG+p06sQWg5U+3ZU1Ly3tX3hH+jilVwSMUfZ+lxpw== Subject: [PATCH v2 3/3] importer: Purge any redundant entries To: location@lists.ipfire.org References: <54b93c52-9578-3390-d8d4-e889766dcb84@ipfire.org> From: =?utf-8?q?Peter_M=C3=BCller?= Message-ID: <8a59a82a-ba19-d1d2-0a1d-87fc72195f16@ipfire.org> Date: Sun, 20 Sep 2020 19:21:03 +0000 MIME-Version: 1.0 In-Reply-To: <54b93c52-9578-3390-d8d4-e889766dcb84@ipfire.org> Content-Language: en-US Authentication-Results: mail01.ipfire.org; auth=pass smtp.mailfrom=peter.mueller@ipfire.org X-BeenThere: location@lists.ipfire.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: location-bounces@lists.ipfire.org Sender: "Location" When importing inetnums, we might import various small networks which are not relevant for us as long as they do not have a different country code than their parent network. Therefore we delete all these entries to keep the database smaller without losing any information. The second version of this patch introduces a SQL statement parallelised across all CPUs available, while the DELETE-statement of the first version literally took ages to complete. However, cleaning up those data still takes about 26 hours (!) on our location02 testing machine, making daily updates of the location database impossible to the current knowledge. real 1521m30.620s user 38m45.521s sys 9m6.027s Special thanks goes to Michael for spending numerous hours on this, setting up a testing environment, doing PostgreSQL magic and providing helpful advice while debugging. Partially fixes: #12458 Cc: Michael Tremer Signed-off-by: Peter Müller --- src/python/location-importer.in | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/src/python/location-importer.in b/src/python/location-importer.in index e3a07a0..1467923 100644 --- a/src/python/location-importer.in +++ b/src/python/location-importer.in @@ -374,7 +374,27 @@ class CLI(object): INSERT INTO autnums(number, name) SELECT _autnums.number, _organizations.name FROM _autnums JOIN _organizations ON _autnums.organization = _organizations.handle - ON CONFLICT (number) DO UPDATE SET name = excluded.name; + ON CONFLICT (number) DO UPDATE SET name = excluded.name + """) + + self.db.execute(""" + --- Purge any redundant entries + CREATE TEMPORARY TABLE _garbage ON COMMIT DROP + AS + SELECT network FROM networks candidates + WHERE EXISTS ( + SELECT FROM networks + WHERE + networks.network << candidates.network + AND + networks.country = candidates.country + ); + + CREATE UNIQUE INDEX _garbage_search ON _garbage USING BTREE(network); + + DELETE FROM networks WHERE EXISTS ( + SELECT FROM _garbage WHERE networks.network = _garbage.network + ); """) # Download all extended sources