X-Patchwork-Submitter: Justin Luth
X-Patchwork-Id: 1590
To: development@lists.ipfire.org
From: Justin Luth
Subject: [PATCH] Fix bug 11558 updxlrator: use mirror mode for SHA1, filenames
Date: Sat, 30 Dec 2017 22:12:01 +0300

Most Microsoft updates now contain an SHA1 hash in the filename. Since these files are uniquely identifiable, use mirror mode (which hashes just the filename instead of the entire URL) to cache them. The URL cache is still checked first, in case the file has already been downloaded and cached by URL.

This is a HUGELY needed fix. Windows 10 updates are 5+ GB per month, and we lose several days of bandwidth downloading duplicates from different mirrors. Sometimes a single client will even request the same patch from multiple mirrors. This patch will save a lot of bandwidth and disk space.

The patch limits the SHA1 test to Microsoft only, but it could easily be expanded to other vendors if there is a need.

Signed-off-by: Justin Luth
---
This is a slight hack, because the fix is tucked away in a somewhat obscure function. Someone could completely redesign this with more modular functions that create the hash, check whether the file exists, and so on. But this patch is neatly contained in one section of the code and doesn't modify anything else, so I think its simplicity outweighs the hackiness.
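To illustrate why mirror mode helps here, the following is a minimal Perl sketch of the two cache keys. It assumes the cache directory name is an MD5 digest of either the full URL (unique mode) or just the filename (mirror mode); the digest choice and the helpers `cache_key` and `has_sha1_name` are hypothetical and not taken from updxlrator. Only the SHA1 regex is copied from the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical mode flags, mirroring the $unique/$mirror values passed to check_cache.
my $UNIQUE = 0;
my $MIRROR = 1;

# Same test the patch uses: 40 hex digits followed by a file extension.
sub has_sha1_name {
    my ($url) = @_;
    return $url =~ m@.*[0-9a-f]{40}\.[^\.]+@i ? 1 : 0;
}

# Assumed key derivation (not updxlrator's code): hash the whole URL in
# unique mode, or only the last path component (the filename) in mirror mode.
sub cache_key {
    my ($url, $mode) = @_;
    my $file = substr($url, rindex($url, '/') + 1);
    return md5_hex($mode == $MIRROR ? $file : $url);
}

# The test URL from the notes, served from three different mirrors.
my $path = 'download.windowsupdate.com/d/msdownload/update/others/2015/03/'
         . '16743052_f84687743a71a750edef8ffedd978602a2592000.cab';
my @urls = map { "http://$_$path" } ('7.au.', 'au.', '');

for my $url (@urls) {
    printf "%s\n  sha1 name: %d\n  unique key: %s\n  mirror key: %s\n",
        $url, has_sha1_name($url), cache_key($url, $UNIQUE), cache_key($url, $MIRROR);
}
```

Each mirror produces a different unique-mode key but the same mirror-mode key, which is why a filename-based lookup can recognize the file no matter which mirror served it.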
Because the fix is tucked away in the check_cache function, I added a comment in the Microsoft section to alert future programmers to the change. Originally I had put my SHA1 test in the Microsoft section itself, but doing so required pre-processing the existing caches and renaming their hash identifiers. This patch avoids that ugly business.

This patch works well because it never downloads anything the current code wouldn't. If the URL is already cached, the file won't be downloaded again. If you now hit a different mirror, the file is downloaded one more time (as it would be today), and after that every other mirror will be "satisfied" from the filename cache.

The bug report includes a script that can be tweaked to RENAME the URL hash into a filename hash, in case a site really wants to avoid re-downloading a file it already has. But since I haven't seen anyone else complaining about this problem, I doubt anyone would be interested.

A good test URL (a small file, not 1+ GB) is

7.au.download.windowsupdate.com/d/msdownload/update/others/2015/03/16743052_f84687743a71a750edef8ffedd978602a2592000.cab

You can replace the 7 with other numbers, or drop "7." or "7.au.", to access different mirrors of the same file.
---
 config/updxlrator/updxlrator | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator
index 5baaaae58..ff23b3a95 100644
--- a/config/updxlrator/updxlrator
+++ b/config/updxlrator/updxlrator
@@ -86,6 +86,8 @@ while (<>) {
 	    && ($source_url !~ m@\&@)
 	   )
 	{
+		# NOTE: check_cache will change to $mirror instead of $unique if the filename contains an SHA1 hash
+		# and the URL is not found in cache!
 		$xlrator_url = &check_cache($source_url,$hostaddr,$username,"Microsoft",$unique);
 	}
 
@@ -400,6 +402,17 @@ sub check_cache
 		&debuglog("Retrieving file from cache ($updsource) for $hostaddr");
 		&setcachestatus("$updcachedir/$vendorid/$uuid/access.log",time);
 		$cacheurl="http://$netsettings{'GREEN_ADDRESS'}:$http_port/updatecache/$vendorid/$uuid/$updfile";
+	}
+	elsif (
+		($cfmirror == $unique) &&
+		($vendorid eq "microsoft") &&
+		($sourceurl =~ m@.*[0-9a-f]{40}\.[^\.]+@i)
+	      )
+	{
+		# Most Microsoft updates now have an SHA1 hash in the name. These should be treated as unique files.
+		# Since it wasn't found in the URL cache, switch to mirror mode and try again using just the filename.
+		&debuglog("SHA1: $vendorid $uuid not cached. Reprocessing as mirror $sourceurl");
+		$cacheurl = &check_cache($sourceurl,$hostaddr,$username,$vendorid,$mirror);
 	}
 	else
 	{
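The lookup order described in the notes (URL cache first, then a filename cache for SHA1-named files) can be simulated with an in-memory hash standing in for the on-disk cache. This is a hedged sketch: `%cache`, `lookup`, `filename_of` and the download counter are hypothetical stand-ins rather than updxlrator code, the vendor check is left out for brevity, and the model only tracks which requests would trigger a download.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the on-disk update cache.
# Keys are prefixed with the mode so URL and filename entries never collide.
my %cache;
my $downloads = 0;

sub filename_of {
    my ($url) = @_;
    return substr($url, rindex($url, '/') + 1);
}

# Models the patched decision order:
# 1. look up the full URL (unique mode);
# 2. on a miss, if the name carries an SHA1 hash, look up the filename (mirror mode);
# 3. on a second miss, count a download and store it under the key tried last.
sub lookup {
    my ($url) = @_;
    return 'url hit' if $cache{"url:$url"};
    if ($url =~ m@.*[0-9a-f]{40}\.[^\.]+@i) {
        my $file = filename_of($url);
        return 'filename hit' if $cache{"file:$file"};
        $downloads++;
        $cache{"file:$file"} = 1;
        return 'downloaded (filename key)';
    }
    $downloads++;
    $cache{"url:$url"} = 1;
    return 'downloaded (url key)';
}

my $path = 'download.windowsupdate.com/d/msdownload/update/others/2015/03/'
         . '16743052_f84687743a71a750edef8ffedd978602a2592000.cab';
my ($url_a, $url_b, $url_c) = map { "http://$_$path" } ('7.au.', 'au.', '');

# Assume mirror A's copy was already cached by URL before the patch.
$cache{"url:$url_a"} = 1;

my @results;
for my $url ($url_a, $url_b, $url_c, $url_b) {
    my $r = lookup($url);
    push @results, $r;
    printf "%s -> %s\n", $url, $r;
}
printf "downloads: %d\n", $downloads;
```

In this model the existing URL entry still serves mirror A, the first request from mirror B downloads the file once more, and every later request from any mirror hits the filename entry, which matches the behavior the notes describe.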