Message ID | 9fdfa42e-cb04-be00-3605-1284d294f5c5@mail.com |
---|---|
State | Dropped |
Headers |
Return-Path: <development-bounces@lists.ipfire.org> Received: from mail01.ipfire.org (mail01.haj.ipfire.org [172.28.1.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384 client-signature ECDSA (P-384) client-digest SHA384) (Client CN "mail01.haj.ipfire.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by web04.haj.ipfire.org (Postfix) with ESMTPS id 47TGYD2mx6z43WR for <patchwork@web04.haj.ipfire.org>; Thu, 5 Dec 2019 13:20:16 +0000 (UTC) Received: from mail02.haj.ipfire.org (mail02.haj.ipfire.org [172.28.1.201]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384 client-signature ECDSA (P-384) client-digest SHA384) (Client CN "mail02.haj.ipfire.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mail01.ipfire.org (Postfix) with ESMTPS id 47TGYB6d6Pz2Hd; Thu, 5 Dec 2019 13:20:14 +0000 (UTC) Received: from mail02.haj.ipfire.org (localhost [127.0.0.1]) by mail02.haj.ipfire.org (Postfix) with ESMTP id 47TGYB5HqYz2yHC; Thu, 5 Dec 2019 13:20:14 +0000 (UTC) Received: from mail01.ipfire.org (mail01.haj.ipfire.org [172.28.1.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) client-signature ECDSA (P-384)) (Client CN "mail01.haj.ipfire.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mail02.haj.ipfire.org (Postfix) with ESMTPS id 47TGY941nYz2xxq for <development@lists.ipfire.org>; Thu, 5 Dec 2019 13:20:13 +0000 (UTC) Received: from mout.gmx.com (mout.gmx.com [74.208.4.200]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mout.gmx.com", Issuer "GeoTrust RSA CA 2018" (verified OK)) by mail01.ipfire.org (Postfix) with ESMTPS id 47TGY720hvz2Hd for <development@lists.ipfire.org>; Thu, 5 Dec 2019 13:20:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=mail.com; s=dbd5af2cbaf7; t=1575551992; bh=ZAEiRC/87hxwqpVE99JqGZgHfQU8/GquTW1lJshluNU=; h=X-UI-Sender-Class:Subject:References:To:From:Date:In-Reply-To; b=N2GBHH9TnQC4UlQ1YORhGwprK3FUcSiDPBztlWQBHlDjTcv5W3IZZKoLhY1UM5e7N DZJ5eko5V9oGf77uitK8GeDhLBh16MYnddeAgDyhUvy3m4M4VSBIYiwoeJbRt9v82T 5flKE5Y1amawI9cRunvfK69i/TEvJqlkrFWk1cS8= X-UI-Sender-Class: 214d933f-fd2f-45c7-a636-f5d79ae31a79 Received: from [192.168.3.1] ([41.79.25.253]) by mail.gmx.com (mrgmxus001 [74.208.5.15]) with ESMTPSA (Nemesis) id 0MCcfk-1iUddV0tPu-009NCJ for <development@lists.ipfire.org>; Thu, 05 Dec 2019 14:07:12 +0100 Subject: [PATCH] Fix bug 12252: updxlrator - handle \ in filename References: <a345f1bf-67d7-9ec6-bba2-ac84ea02a272@mail.com> To: development@lists.ipfire.org From: Justin Luth <jluth@mail.com> X-Forwarded-Message-Id: <a345f1bf-67d7-9ec6-bba2-ac84ea02a272@mail.com> Message-ID: <9fdfa42e-cb04-be00-3605-1284d294f5c5@mail.com> Date: Thu, 5 Dec 2019 16:07:06 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 In-Reply-To: <a345f1bf-67d7-9ec6-bba2-ac84ea02a272@mail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Content-Language: en-US X-Provags-ID: V03:K1:j6rVFJrYOaAYn/epIYDKiVAyWo1lqjHzR0BaP3+Mr19xJs0RQoV omcRZ1jaiswM0LNUtnkexzP2xKr85nPID5wGROyCO50y1V/SkTJohcIMjWwsNAjg81c9yy5 yBtc91Jv+LAYnL8F4aBoN0Jqt1wzkHcdLmd8n39Z2Q+k15vvqtoG5kaC63jUgM8CrWZOvCH Z/yrgd8Yq8Dz9TbadWk+A== X-Spam-Flag: NO X-UI-Out-Filterresults: notjunk:1;V03:K0:vWgRfVwff9w=:xujaG8XgOcjbxsBykMOJHZ nUfk85/DQiUAqHIuUCsLynOw4jgVbYZRn54Tnaq5ynyJFVbDmZOke8Y5EwlxxYBriteVTVvTy f2qJhLI9i5s9/pdEYehjQ8D8ndnYqzr/EcNqbkPgQy04AfiHYdDaZdh3beljHWV1W/dfR5V/r ByMXomAI3OL2MtnTdjBX3VVZil8SeqyntXhhGK0hMGhdxrKfcccE4sLuOzkV2DjJjDJ7Q9GNv bcMYeOQDcepKR9wSti0438z89Ic9X4XC8+UsXiLYkq0hKI2WK3IJrzxjtTqAU8t8fL6T+9ZzU 2Y20qUDhUB4DfMai4LcT3QFRd21DYnuwD4zH99vRztsYaocW1iNWYiI0xOLNuobWyo6Id30Jq U9pv9SvLnZyjZFTgo1YiIMqT/QzxtrPhdHJKlJcFb5swwSB2rSCsWfF9E6O9J9S97+eS56fQO Wd9+USymFhbvg+3xnzXlfCQC7qwlU6xh+SDym1Ovvw+lZf8xJ4RfHb68Ryd0NXZOVo6FjM6iV ihwMexXQqjvHaci9VqSdz/2VM7KtA5Y74e+AWVs5qbOuTzWxmd6P0Z10wMhZiEFOgM0rdfGh6 aeVXzSyKFpX903Z2LXPVdNjMxBLMGrC/tGLfS4omKo1/7G9YqqRKyPKgGFeQs6XgqgMQ2jJp+ H5HgvEPUTxe+zau7yfdbTbDh6JfiDnFb7p2XLVO2gPR1V8wzFO4CxJ3Ft5/MFd2hyuY4+9MCf FxbKLrJmfUg8UTIt3diaFn869U7pfO18vTe5nu/c0XWrkPCQnBLWAO1pJ5m0iKBH/YAh6kncO 1FVxe3vf0nMAJGW+PB00Yz+xhUsrjOW6klwawa2wDh+hrRCtBrSGqNflRI23BExQjftkV0PyE iGB6anmoRsBDS1jfVytZYlOzpaIexEku1/GTGEXx/+ReEOlHTouSetyC/QWTPJdptAVZDNTNA oq7KxsIv0NViwlEKMKFOvzxFXh821zi46nFAMU405fJqQuruOcmAMmpz+sYEIsaOt8fA73dE4 AgVSvieCKZAloRY+yiaR6E5u5qjFA3NwfC0iENankwEzYKz6an+Vz+eyOZeN7MzjHkW17Fjpr ygDLMWu21fm+nBfdBosxbIgLtVLXeJcUq5he3QBpWgybfTDMJJD5Zbgo4d0xKgnbJesEXpdLi OUwvdzsT+r0TfAo0unaGgQQ46s7M5S8JR1VBCjScbtnHuv++2DFNQfZWD5Zyfj1HRGsd+Yhc7 /SPA08jcWqFqjV7Bb76m3wyxeRgccvOSy0W/fUQ== Authentication-Results: mail01.ipfire.org; dkim=pass header.d=mail.com header.s=dbd5af2cbaf7 header.b=N2GBHH9T; dmarc=none; spf=pass (mail01.ipfire.org: domain of jluth@mail.com designates 74.208.4.200 as permitted sender) smtp.mailfrom=jluth@mail.com X-Rspamd-Queue-Id: 47TGY720hvz2Hd X-Spamd-Result: default: False [-3.61 / 11.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; DWL_DNSWL_FAIL(0.00)[74.208.4.200:server fail]; FREEMAIL_FROM(0.00)[mail.com]; R_SPF_ALLOW(-0.20)[+ip4:74.208.4.192/26]; TO_DN_NONE(0.00)[]; DKIM_TRACE(0.00)[mail.com:+]; MX_GOOD(-0.01)[mx00.mail.com,mx01.mail.com]; RCVD_IN_DNSWL_LOW(-0.10)[74.208.4.200:from]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[mail.com]; MID_RHS_MATCH_FROM(0.00)[]; BAYES_HAM(-3.00)[100.00%]; DWL_DNSWL_NONE(0.00)[mail.com:dkim]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[mail.com:s=dbd5af2cbaf7]; RECEIVED_SPAMHAUS_PBL(0.00)[41.79.25.253:received]; FROM_HAS_DN(0.00)[]; ASN(0.00)[asn:8560, ipnet:74.208.0.0/16, country:DE]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain]; PREVIOUSLY_DELIVERED(0.00)[development@lists.ipfire.org]; DMARC_NA(0.00)[mail.com]; SENDER_REP_HAM(0.00)[asn: 8560(-0.22), country: DE(-0.01), ip: 74.208.4.200(0.00)]; RCPT_COUNT_ONE(0.00)[1]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_ALL(0.00)[] X-Rspamd-Server: mail01.haj.ipfire.org X-BeenThere: development@lists.ipfire.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: IPFire development talk <development.lists.ipfire.org> List-Unsubscribe: <https://lists.ipfire.org/mailman/options/development>, <mailto:development-request@lists.ipfire.org?subject=unsubscribe> List-Archive: <http://lists.ipfire.org/pipermail/development/> List-Post: <mailto:development@lists.ipfire.org> List-Help: <mailto:development-request@lists.ipfire.org?subject=help> List-Subscribe: <https://lists.ipfire.org/mailman/listinfo/development>, <mailto:development-request@lists.ipfire.org?subject=subscribe> Errors-To: development-bounces@lists.ipfire.org Sender: "Development" <development-bounces@lists.ipfire.org> |
Series |
Fix bug 12252: updxlrator - handle \ in filename
|
|
Commit Message
Justin Luth
Dec. 5, 2019, 1:07 p.m. UTC
Recently some Microsoft patches are alternating between using \ and / as directory separators, leaving files behind in /var/updatecache/download that can only be removed in a terminal. What was happening was that the lock file was created using the URL name containing %5c while wget was actually writing to a filename with a real backslash. The zero-sized lockfile was moved into the uuid folder, and the actual .cab file was orphaned in the download folder, indicated in the GUI as a stranded file that couldn't be deleted. An alternate way to fix the problem could be to force wget to use --restrict-file-names=windows - in which case \ | : ? " * < > would all be saved as escaped codes. That would seem to better match the URL mangling that apparently is happening. Cons: -removing sourceurl mangling means re-downloading existing caches. -less human-readable URLs / filesnames -conceptual alternative: I didn't prove/test this theory since it means re-downloading all oddly-named files. P.S. I didn't change perl's utime command to use updatefile_shellEscaped since the filepath is in quotes - the space didn't actually need to be escaped. --- config/updxlrator/download | 20 ++++++++++++++++---- config/updxlrator/updxlrator | 2 ++ 2 files changed, 18 insertions(+), 4 deletions(-) -- 2.17.1
Comments
Hi, Thank you very much for looking into this. This seems to be a tricky one. > On 5 Dec 2019, at 13:07, Justin Luth <jluth@mail.com> wrote: > > Recently some Microsoft patches are alternating between using > \ and / as directory separators, leaving files behind in > /var/updatecache/download that can only be removed in a terminal. > > What was happening was that the lock file was created using the > URL name containing %5c while wget was actually writing to a filename > with a real backslash. The zero-sized lockfile was moved into the uuid > folder, and the actual .cab file was orphaned in the download folder, > indicated in the GUI as a stranded file that couldn't be deleted. > > An alternate way to fix the problem could be to force wget to > use --restrict-file-names=windows - in which case \ | : ? " * < > > would all be saved as escaped codes. That would seem to better > match the URL mangling that apparently is happening. > Cons: > -removing sourceurl mangling means re-downloading existing caches. > -less human-readable URLs / filesnames > -conceptual alternative: I didn't prove/test this theory since it > means re-downloading all oddly-named files. > > P.S. I didn't change perl's utime command to use updatefile_shellEscaped > since the filepath is in quotes - the space didn't actually need to be > escaped. > > --- > > config/updxlrator/download | 20 ++++++++++++++++---- > config/updxlrator/updxlrator | 2 ++ > 2 files changed, 18 insertions(+), 4 deletions(-) > > diff --git a/config/updxlrator/download b/config/updxlrator/download > index afa6e6cb9..df4a58725 100644 > --- a/config/updxlrator/download > +++ b/config/updxlrator/download > @@ -26,6 +26,7 @@ my @http_header=(); > my $remote_size=0; > my $remote_mtime=0; > my $updatefile=''; > +my $updatefile_shellEscaped=''; #double escaped for re-processing by > shell/system calls > my $unique=0; > my $mirror=1; > > @@ -38,11 +39,24 @@ my $restartdl = defined($ARGV[3]) ? $ARGV[3] : 0; > > umask(0002); > > +# Hopefully correct documentation: based on debugging a "\" problem... > +# WGET doesn't use the filename from the URL, but gets the filename > from the server info. > +# -mangles that name according to --restrict-file-names. > +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled. > +# -(which suggests the entry for %2f is incorrect BTW) > +# The URL passed to this module is mangled (correct? - perhaps by > http://search.cpan.org/dist/URI/lib/URI/Escape.pm) > +# -but $updatefile needs to match wget's download name, regardless of > the URL's encoding. > +# -achievable either by mangling the URL and/or the filename. > +# -*make sure "updxlrator" has the same adjustments* > $sourceurl =~ s@\%2b@+@ig; > $sourceurl =~ s@\%2f@/@ig; > $sourceurl =~ s@\%7e@~@ig; > $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1); > +$updatefile_shellEscaped = $updatefile; > +$updatefile_shellEscaped =~ s@\%20@\\ @ig; > +$updatefile_shellEscaped =~ s@\%5c@\\\\@ig; > $updatefile =~ s@\%20@ @ig; > +$updatefile =~ s@\%5c@\\@ig; > $vendorid =~ tr/A-Z/a-z/; This is really messy code. Not yours. What has been here before. It is quite hard to figure out what was actually supposed to happen here. The replacements look more or less like fixes instead of thinking ahead and decoding any special characters. > unless (-d "$repository/download/$vendorid") > @@ -58,7 +72,7 @@ if($restartdl == 0) > > # hinder multiple downloads from starting simultaneously. Create > empty "lock" file. > # TODO: Another thread may sneak in between these two commands - > so not fool-proof, but good enough? > - system("touch $repository/download/$vendorid/$updatefile"); > + system("touch > $repository/download/$vendorid/$updatefile_shellEscaped"); However, this is all needed because we are calling shell commands here to move a file. I suppose that that is quite dangerous because any forged filenames could be used to run any shell commands. If we would use perl’s move() function, then we could avoid escaping the characters here. They should all be allowed to be used in the file system: https://perldoc.perl.org/File/Copy.html#move Would you like to try that option? > > } > else > @@ -169,11 +183,9 @@ if ($_ == 0) > } > > &writelog("Moving file to the cache directory: $vendorid/$uuid"); > - $updatefile =~ s@ @\\ @ig; > - system("mv $repository/download/$vendorid/$updatefile > $repository/$vendorid/$uuid"); > + system("mv $repository/download/$vendorid/$updatefile_shellEscaped > $repository/$vendorid/$uuid"); Same here. > # Workaround for IPCop's mv bug: > utime time,$remote_mtime,"$repository/$vendorid/$uuid/$updatefile"; > - $updatefile =~ s@\\ @ @ig; > > &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/source.url",$sourceurl); > &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/status",$UPDXLT::sfOk); > diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator > index cdc7eeb50..1febae51e 100644 > --- a/config/updxlrator/updxlrator > +++ b/config/updxlrator/updxlrator > @@ -338,11 +338,13 @@ sub check_cache > my $sourceurl=$_[0]; > my $cfmirror=$_[4]; > > + # Any changes made here must be mirrored in "download" > $sourceurl =~ s@\%2b@+@ig; > $sourceurl =~ s@\%2f@/@ig; > $sourceurl =~ s@\%7e@~@ig; > $updfile = substr($sourceurl,rindex($sourceurl,"/")+1); > $updfile =~ s@\%20@ @ig; > + $updfile =~ s@\%5c@\\@ig; I think we will still need this one then, don’t we? > > if ($cfmirror) > { > > — > 2.17.1 > -Michael
My minimally updated patch is re-included at the end of the message, and comments are responded to inline with the original context. I just tweaked some comment messages, and replaced a system call that I introduced recently with a perl-native implementation. On 12/6/19 9:55 AM, Michael Tremer wrote: > This seems to be a tricky one. > > +# Hopefully correct documentation: based on debugging a "\" problem... > +# WGET doesn't use the filename from the URL, but gets the filename > from the server info. > +# -mangles that name according to --restrict-file-names. > +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled. > +# -(which suggests the entry for %2f is incorrect BTW) > +# The URL passed to this module is mangled (correct? - perhaps by > http://search.cpan.org/dist/URI/lib/URI/Escape.pm) > +# -but $updatefile needs to match wget's download name, regardless of > the URL's encoding. > +# -achievable either by mangling the URL and/or the filename. > +# -*make sure "updxlrator" has the same adjustments* > $sourceurl =~ s@\%2b@+@ig; > $sourceurl =~ s@\%2f@/@ig; > $sourceurl =~ s@\%7e@~@ig; > $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1); > +$updatefile_shellEscaped = $updatefile; > +$updatefile_shellEscaped =~ s@\%20@\\ @ig; > +$updatefile_shellEscaped =~ s@\%5c@\\\\@ig; > $updatefile =~ s@\%20@ @ig; > +$updatefile =~ s@\%5c@\\@ig; > $vendorid =~ tr/A-Z/a-z/; > > It is quite hard to figure out what was actually supposed to happen here. The replacements look more or less like fixes instead of thinking ahead and decoding any special characters. Maybe, but most of the escaped characters are really weird for use in filenames, so likely the approach was "instead of decoding every possibility, lets just do the ones that are actually in use." And in 7 years I haven't noticed backslashes being a problem - but suddenly a new patch URL has popped up (wrongly) requesting a backslash. So now we need to deal with that situation or else have our ipfire cache look increasingly messy. > >> unless (-d "$repository/download/$vendorid") >> @@ -58,7 +72,7 @@ if($restartdl == 0) >> >> # hinder multiple downloads from starting simultaneously. Create >> empty "lock" file. >> # TODO: Another thread may sneak in between these two commands - >> so not fool-proof, but good enough? >> - system("touch $repository/download/$vendorid/$updatefile"); >> + system("touch >> $repository/download/$vendorid/$updatefile_shellEscaped"); > However, this is all needed because we are calling shell commands here to move a file. Not ALL of this is needed because of shell commands. Only the duplicate _shellEscaped variable is needed for the shell commands. The addition of $updatefile =~ s@\%5c@\\@ig; is required regardless of using shell commands or not. > If we would use perl’s move() function, then we could avoid escaping the characters here. They should all be allowed to be used in the file system: > > https://perldoc.perl.org/File/Copy.html#move > > Would you like to try that option? Not really, since I'm just a bugfixer and not a perl programmer. But I did try it, and it (silently) failed to work. I also tried to use make_path instead of mkdir -p and that also silently failed. So I abandoned the attempt to remove all system calls. I'm also not eager to introduce regressions, and that is exactly what would happen if I started playing with things that I don't understand and don't know how to debug. > > -Michael Recently some Microsoft patches are alternating between using \ and / as directory separators, leaving files behind in updatecache/download that can only be removed in a terminal. What was happening was that the lock file was created using the URL name containing %5c while wget was actually writing to a filename with a real backslash. The zero-sized lockfile was moved into the uuid folder, and the actual .cab file was orphaned in the download folder, indicated in the GUI as a stranded file that couldn't be deleted. An alternate way to fix the problem could be to force wget to use --restrict-file-names=windows - in which case \ | : ? " * < > would all be saved as escaped codes. That would seem to better match the URL mangling that apparently is happening. Cons: -removing sourceurl mangling means re-downloading existing caches. -less human-readable URLs / filenames -conceptual alternative: I didn't prove/test this theory since it means re-downloading all oddly-named files. P.S. I didn't change perl's utime command to use updatefile_shellEscaped since the filepath is in quotes - the space didn't actually need to be escaped. --- config/updxlrator/download | 21 +++++++++++++++++---- config/updxlrator/updxlrator | 2 ++ 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/config/updxlrator/download b/config/updxlrator/download index afa6e6cb9..ab89171e4 100644 --- a/config/updxlrator/download +++ b/config/updxlrator/download @@ -26,6 +26,7 @@ my @http_header=(); my $remote_size=0; my $remote_mtime=0; my $updatefile=''; +my $updatefile_shellEscaped=''; #double escaped for re-processing by shell/system calls my $unique=0; my $mirror=1; @@ -38,11 +39,24 @@ my $restartdl = defined($ARGV[3]) ? $ARGV[3] : 0; umask(0002); +# Hopefully correct documentation: based on debugging a "\" problem... +# WGET doesn't use the filename from the URL, but gets the filename from the server info. +# -mangles that name according to --restrict-file-names. +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled. +# -(which suggests the entry for %2f is incorrect BTW) +# The URL passed to this module is mangled (correct? - perhaps by http://search.cpan.org/dist/URI/lib/URI/Escape.pm) +# -but $updatefile needs to match wget's download name, regardless of the URL's encoding. +# -achievable either by mangling the URL and/or the filename. +# -*make sure "updxlrator" has the same adjustments* $sourceurl =~ s@\%2b@+@ig; $sourceurl =~ s@\%2f@/@ig; $sourceurl =~ s@\%7e@~@ig; $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1); +$updatefile_shellEscaped = $updatefile; +$updatefile_shellEscaped =~ s@\%20@\\ @ig; +$updatefile_shellEscaped =~ s@\%5c@\\\\@ig; $updatefile =~ s@\%20@ @ig; +$updatefile =~ s@\%5c@\\@ig; #added to fix ipfire bug 12252 $vendorid =~ tr/A-Z/a-z/; unless (-d "$repository/download/$vendorid") @@ -58,8 +72,9 @@ if($restartdl == 0) # hinder multiple downloads from starting simultaneously. Create empty "lock" file. # TODO: Another thread may sneak in between these two commands - so not fool-proof, but good enough? - system("touch $repository/download/$vendorid/$updatefile"); + #use a native perl implementation of: system("touch $repository/download/$vendorid/$updatefile_shellEscaped"); + { open my $lock_file, ">>", "$repository/download/$vendorid/$updatefile" } } else { @@ -169,11 +184,9 @@ if ($_ == 0) } &writelog("Moving file to the cache directory: $vendorid/$uuid"); - $updatefile =~ s@ @\\ @ig; - system("mv $repository/download/$vendorid/$updatefile $repository/$vendorid/$uuid"); + system("mv $repository/download/$vendorid/$updatefile_shellEscaped $repository/$vendorid/$uuid"); # Workaround for IPCop's mv bug: utime time,$remote_mtime,"$repository/$vendorid/$uuid/$updatefile"; - $updatefile =~ s@\\ @ @ig; &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/source.url",$sourceurl); &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/status",$UPDXLT::sfOk); diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator index cdc7eeb50..1febae51e 100644 --- a/config/updxlrator/updxlrator +++ b/config/updxlrator/updxlrator @@ -338,11 +338,13 @@ sub check_cache my $sourceurl=$_[0]; my $cfmirror=$_[4]; + # Any changes made here must be mirrored in "download" $sourceurl =~ s@\%2b@+@ig; $sourceurl =~ s@\%2f@/@ig; $sourceurl =~ s@\%7e@~@ig; $updfile = substr($sourceurl,rindex($sourceurl,"/")+1); $updfile =~ s@\%20@ @ig; + $updfile =~ s@\%5c@\\@ig; if ($cfmirror) { -- 2.17.1
Hi, Sorry for not replying earlier. It is the run-up to Christmas and I am quite busy with a thousand little things. So stay tuned if you do not hear from me very swiftly. > On 9 Dec 2019, at 17:57, Justin Luth <jluth@mail.com> wrote: > > My minimally updated patch is re-included at the end of the message, and > comments are responded to inline with the original context. I just > tweaked some comment messages, and replaced a system call that I > introduced recently with a perl-native implementation. > > On 12/6/19 9:55 AM, Michael Tremer wrote: >> This seems to be a tricky one. >> >> +# Hopefully correct documentation: based on debugging a "\" problem... >> +# WGET doesn't use the filename from the URL, but gets the filename >> from the server info. >> +# -mangles that name according to --restrict-file-names. >> +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled. >> +# -(which suggests the entry for %2f is incorrect BTW) >> +# The URL passed to this module is mangled (correct? - perhaps by >> http://search.cpan.org/dist/URI/lib/URI/Escape.pm) >> +# -but $updatefile needs to match wget's download name, regardless of >> the URL's encoding. >> +# -achievable either by mangling the URL and/or the filename. >> +# -*make sure "updxlrator" has the same adjustments* >> $sourceurl =~ s@\%2b@+@ig; >> $sourceurl =~ s@\%2f@/@ig; >> $sourceurl =~ s@\%7e@~@ig; >> $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1); >> +$updatefile_shellEscaped = $updatefile; >> +$updatefile_shellEscaped =~ s@\%20@\\ @ig; >> +$updatefile_shellEscaped =~ s@\%5c@\\\\@ig; >> $updatefile =~ s@\%20@ @ig; >> +$updatefile =~ s@\%5c@\\@ig; >> $vendorid =~ tr/A-Z/a-z/; >> >> It is quite hard to figure out what was actually supposed to happen here. The replacements look more or less like fixes instead of thinking ahead and decoding any special characters. > Maybe, but most of the escaped characters are really weird for use in > filenames, so likely the approach was "instead of decoding every > possibility, lets just do the ones that are actually in use." And in 7 > years I haven't noticed backslashes being a problem - but suddenly a new > patch URL has popped up (wrongly) requesting a backslash. So now we need > to deal with that situation or else have our ipfire cache look > increasingly messy. Yeah it looks like it, but try and error isn’t always the best approach for software development. >> >>> unless (-d "$repository/download/$vendorid") >>> @@ -58,7 +72,7 @@ if($restartdl == 0) >>> >>> # hinder multiple downloads from starting simultaneously. Create >>> empty "lock" file. >>> # TODO: Another thread may sneak in between these two commands - >>> so not fool-proof, but good enough? >>> - system("touch $repository/download/$vendorid/$updatefile"); >>> + system("touch >>> $repository/download/$vendorid/$updatefile_shellEscaped"); >> However, this is all needed because we are calling shell commands here to move a file. > Not ALL of this is needed because of shell commands. Only the duplicate > _shellEscaped variable is needed for the shell commands. The addition of > > $updatefile =~ s@\%5c@\\@ig; > > is required regardless of using shell commands or not. >> If we would use perl’s move() function, then we could avoid escaping the characters here. They should all be allowed to be used in the file system: >> >> https://perldoc.perl.org/File/Copy.html#move >> >> Would you like to try that option? > Not really, since I'm just a bugfixer and not a perl programmer. But I > did try it, and it (silently) failed to work. I also tried to use > make_path instead of mkdir -p and that also silently failed. So I > abandoned the attempt to remove all system calls. I'm also not eager to > introduce regressions, and that is exactly what would happen if I > started playing with things that I don't understand and don't know how > to debug. LOL, nobody is eager to introduce any regressions :) If you have IO operations like mkdir() you can catch any errors with the “die” keyword: mkdir(“$path/file”) or die “Could not create directory: $!”; That will cause the script to abort and log the message. >> >> -Michael > > Recently some Microsoft patches are alternating between using > \ and / as directory separators, leaving files behind in > updatecache/download that can only be removed in a terminal. > > What was happening was that the lock file was created using the > URL name containing %5c while wget was actually writing to a filename > with a real backslash. The zero-sized lockfile was moved into the uuid > folder, and the actual .cab file was orphaned in the download folder, > indicated in the GUI as a stranded file that couldn't be deleted. > > An alternate way to fix the problem could be to force wget to > use --restrict-file-names=windows - in which case \ | : ? " * < > > would all be saved as escaped codes. That would seem to better > match the URL mangling that apparently is happening. > Cons: > -removing sourceurl mangling means re-downloading existing caches. > -less human-readable URLs / filenames > -conceptual alternative: I didn't prove/test this theory since it > means re-downloading all oddly-named files. I would consider that that is a change that has more implications for the rest of the code which is not robust enough against it. Best, -Michael > > P.S. I didn't change perl's utime command to use updatefile_shellEscaped > since the filepath is in quotes - the space didn't actually need to be > escaped. > --- > config/updxlrator/download | 21 +++++++++++++++++---- > config/updxlrator/updxlrator | 2 ++ > 2 files changed, 19 insertions(+), 4 deletions(-) > > diff --git a/config/updxlrator/download b/config/updxlrator/download > index afa6e6cb9..ab89171e4 100644 > --- a/config/updxlrator/download > +++ b/config/updxlrator/download > @@ -26,6 +26,7 @@ my @http_header=(); > my $remote_size=0; > my $remote_mtime=0; > my $updatefile=''; > +my $updatefile_shellEscaped=''; #double escaped for re-processing by shell/system calls > my $unique=0; > my $mirror=1; > > @@ -38,11 +39,24 @@ my $restartdl = defined($ARGV[3]) ? $ARGV[3] : 0; > > umask(0002); > > +# Hopefully correct documentation: based on debugging a "\" problem... > +# WGET doesn't use the filename from the URL, but gets the filename from the server info. > +# -mangles that name according to --restrict-file-names. > +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled. > +# -(which suggests the entry for %2f is incorrect BTW) > +# The URL passed to this module is mangled (correct? - perhaps by http://search.cpan.org/dist/URI/lib/URI/Escape.pm) > +# -but $updatefile needs to match wget's download name, regardless of the URL's encoding. > +# -achievable either by mangling the URL and/or the filename. > +# -*make sure "updxlrator" has the same adjustments* > $sourceurl =~ s@\%2b@+@ig; > $sourceurl =~ s@\%2f@/@ig; > $sourceurl =~ s@\%7e@~@ig; > $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1); > +$updatefile_shellEscaped = $updatefile; > +$updatefile_shellEscaped =~ s@\%20@\\ @ig; > +$updatefile_shellEscaped =~ s@\%5c@\\\\@ig; > $updatefile =~ s@\%20@ @ig; > +$updatefile =~ s@\%5c@\\@ig; #added to fix ipfire bug 12252 > $vendorid =~ tr/A-Z/a-z/; > > unless (-d "$repository/download/$vendorid") > @@ -58,8 +72,9 @@ if($restartdl == 0) > > # hinder multiple downloads from starting simultaneously. Create empty "lock" file. > # TODO: Another thread may sneak in between these two commands - so not fool-proof, but good enough? > - system("touch $repository/download/$vendorid/$updatefile"); > > + #use a native perl implementation of: system("touch $repository/download/$vendorid/$updatefile_shellEscaped"); > + { open my $lock_file, ">>", "$repository/download/$vendorid/$updatefile" } > } > else > { > @@ -169,11 +184,9 @@ if ($_ == 0) > } > > &writelog("Moving file to the cache directory: $vendorid/$uuid"); > - $updatefile =~ s@ @\\ @ig; > - system("mv $repository/download/$vendorid/$updatefile $repository/$vendorid/$uuid"); > + system("mv $repository/download/$vendorid/$updatefile_shellEscaped $repository/$vendorid/$uuid"); > # Workaround for IPCop's mv bug: > utime time,$remote_mtime,"$repository/$vendorid/$uuid/$updatefile"; > - $updatefile =~ s@\\ @ @ig; > > &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/source.url",$sourceurl); > &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/status",$UPDXLT::sfOk); > diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator > index cdc7eeb50..1febae51e 100644 > --- a/config/updxlrator/updxlrator > +++ b/config/updxlrator/updxlrator > @@ -338,11 +338,13 @@ sub check_cache > my $sourceurl=$_[0]; > my $cfmirror=$_[4]; > > + # Any changes made here must be mirrored in "download" > $sourceurl =~ s@\%2b@+@ig; > $sourceurl =~ s@\%2f@/@ig; > $sourceurl =~ s@\%7e@~@ig; > $updfile = substr($sourceurl,rindex($sourceurl,"/")+1); > $updfile =~ s@\%20@ @ig; > + $updfile =~ s@\%5c@\\@ig; > > if ($cfmirror) > { > -- > 2.17.1 >
diff --git a/config/updxlrator/download b/config/updxlrator/download index afa6e6cb9..df4a58725 100644 --- a/config/updxlrator/download +++ b/config/updxlrator/download @@ -26,6 +26,7 @@ my @http_header=(); my $remote_size=0; my $remote_mtime=0; my $updatefile=''; +my $updatefile_shellEscaped=''; #double escaped for re-processing by shell/system calls my $unique=0; my $mirror=1; @@ -38,11 +39,24 @@ my $restartdl = defined($ARGV[3]) ? $ARGV[3] : 0; umask(0002); +# Hopefully correct documentation: based on debugging a "\" problem... +# WGET doesn't use the filename from the URL, but gets the filename from the server info. +# -mangles that name according to --restrict-file-names. +# -for unix, only mangles "/", so +, ~, \ etc. are not escaped/mangled. +# -(which suggests the entry for %2f is incorrect BTW) +# The URL passed to this module is mangled (correct? - perhaps by http://search.cpan.org/dist/URI/lib/URI/Escape.pm) +# -but $updatefile needs to match wget's download name, regardless of the URL's encoding. +# -achievable either by mangling the URL and/or the filename. +# -*make sure "updxlrator" has the same adjustments* $sourceurl =~ s@\%2b@+@ig; $sourceurl =~ s@\%2f@/@ig; $sourceurl =~ s@\%7e@~@ig; $updatefile = substr($sourceurl,rindex($sourceurl,"/")+1); +$updatefile_shellEscaped = $updatefile; +$updatefile_shellEscaped =~ s@\%20@\\ @ig; +$updatefile_shellEscaped =~ s@\%5c@\\\\@ig; $updatefile =~ s@\%20@ @ig; +$updatefile =~ s@\%5c@\\@ig; $vendorid =~ tr/A-Z/a-z/; unless (-d "$repository/download/$vendorid") @@ -58,7 +72,7 @@ if($restartdl == 0) # hinder multiple downloads from starting simultaneously. Create empty "lock" file. # TODO: Another thread may sneak in between these two commands - so not fool-proof, but good enough? - system("touch $repository/download/$vendorid/$updatefile"); + system("touch $repository/download/$vendorid/$updatefile_shellEscaped"); } else @@ -169,11 +183,9 @@ if ($_ == 0) } &writelog("Moving file to the cache directory: $vendorid/$uuid"); - $updatefile =~ s@ @\\ @ig; - system("mv $repository/download/$vendorid/$updatefile $repository/$vendorid/$uuid"); + system("mv $repository/download/$vendorid/$updatefile_shellEscaped $repository/$vendorid/$uuid"); # Workaround for IPCop's mv bug: utime time,$remote_mtime,"$repository/$vendorid/$uuid/$updatefile"; - $updatefile =~ s@\\ @ @ig; &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/source.url",$sourceurl); &UPDXLT::setcachestatus("$repository/$vendorid/$uuid/status",$UPDXLT::sfOk); diff --git a/config/updxlrator/updxlrator b/config/updxlrator/updxlrator index cdc7eeb50..1febae51e 100644 --- a/config/updxlrator/updxlrator +++ b/config/updxlrator/updxlrator @@ -338,11 +338,13 @@ sub check_cache my $sourceurl=$_[0]; my $cfmirror=$_[4]; + # Any changes made here must be mirrored in "download" $sourceurl =~ s@\%2b@+@ig; $sourceurl =~ s@\%2f@/@ig; $sourceurl =~ s@\%7e@~@ig; $updfile = substr($sourceurl,rindex($sourceurl,"/")+1); $updfile =~ s@\%20@ @ig; + $updfile =~ s@\%5c@\\@ig; if ($cfmirror) {