Fix bug 11567 updxlrator: don't prematurely release lock, file

Message ID 3407e12c-a7f6-6f39-c2b5-8ce33e325e8e@mail.com
State Dropped
Headers
Series Fix bug 11567 updxlrator: don't prematurely release lock, file

Commit Message

Justin Luth Dec. 30, 2017, 6:44 p.m. UTC
  With Microsoft's new style of downloading updates,
where portions of a patch are requested multiple times per second,
it has become extremely common for downloads to reach > 100%.
Due to an early unlinking of the "lock" file, there is a big window of
opportunity (between the unlink and wget actually saving some data)
for multiple download/wget threads to start, adding to the same file.
So not only is bandwidth wasted by duplicate downloads running
simultaneously, but the resulting file is corrupt anyway.

The problem is noticed more often by low bandwidth users
(who need the benefits of updxlrator the most)
because then wget's latency is even longer, creating
a very wide window of opportunity.

Ultimately, this needs something like "flock", where the
file is set and tested in one operation. But for now,
settle with the current test / create lock solution, and
just stop unnecessarily releasing the lock.

Since the file already exists as a lock when wget starts,
wget now must ALWAYS run with --continue, which
works fine on a zero-sized file.

Signed-off-by: Justin Luth  <jluth@mail.com>
---
  config/updxlrator/download | 12 ++++++------
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/config/updxlrator/download b/config/updxlrator/download
index dbc722c23..afa6e6cb9 100644
--- a/config/updxlrator/download
+++ b/config/updxlrator/download
@@ -30,7 +30,6 @@ my $unique=0;
  my $mirror=1;

  my %dlinfo=();
-my $wgetContinueFlag="";

  my $vendorid  = $ARGV[0]; if (!defined($vendorid)  || $vendorid eq '') { exit; }
  my $sourceurl = $ARGV[1]; if (!defined($sourceurl) || $sourceurl eq '') { exit; }
@@ -57,16 +56,15 @@ if($restartdl == 0)
      # this is a new download
      exit if (-e "$repository/download/$vendorid/$updatefile");

-    # dotzball: Why is this necessary?
+    # hinder multiple downloads from starting simultaneously. Create empty "lock" file.
+    # TODO: Another thread may sneak in between these two commands - so not fool-proof, but good enough?
      system("touch $repository/download/$vendorid/$updatefile");
-    $wgetContinueFlag = "-nc";

  }
  else
  {
      # this is a restart of a previous (unfinished) download
      # -> continue download
-    $wgetContinueFlag = "-c";
      &writelog("Continue download: $updatefile");
  }

@@ -133,7 +131,9 @@ unless($restartdl)
  {
      # this is a new download
      # -> download from scratch
-    unlink "$repository/download/$vendorid/$updatefile";
+
+
+    #already exited earlier if the file existed, and afterwards created this empty "lock", so if not empty now, another thread is already downloading it.
+    exit if ( -s "$repository/download/$vendorid/$updatefile" );
      unlink "$repository/download/$vendorid/$updatefile.info";
  }

@@ -147,7 +147,7 @@ $dlinfo{'REMOTESIZE'} = $remote_size;
  $dlinfo{'STATUS'} = "1";
  &UPDXLT::writehash("$repository/download/$vendorid/$updatefile.info", \%dlinfo);

-my $cmd = "$UPDXLT::wget $login $dlrate --user-agent=\"$UPDXLT::useragent\" -q -P $repository/download/$vendorid $wgetContinueFlag $sourceurl";
+my $cmd = "$UPDXLT::wget $login $dlrate --user-agent=\"$UPDXLT::useragent\" -q -P $repository/download/$vendorid --continue $sourceurl";

  $_ = system("$cmd");
  $ENV{'http_proxy'} = '';
  
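The patched control flow above — exit if the file exists, create the empty "lock" file, then check that it is still empty just before launching wget — can be sketched as a standalone Perl script. The path below is a placeholder for illustration, not the real updxlrator repository layout:

```perl
#!/usr/bin/perl
# Sketch of the patched control flow (placeholder path, not the
# actual updxlrator code).
use strict;
use warnings;

my $file = "/tmp/updx-demo-updatefile";
unlink $file;                 # clean slate for the demo

exit 0 if -e $file;           # another process already handled this file
system("touch", $file);       # create the empty "lock" file

# ... request metadata would be gathered here; meanwhile another
# thread that got past the -e test could start writing to $file ...

exit 0 if -s $file;           # non-empty now: someone else is downloading
print "would run: wget --continue $file\n";
```

Because the lock file is never unlinked before the download starts, the `-s` (non-zero size) test catches any competing thread whose wget has already written data, which is exactly the window the commit message describes.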

Comments

Michael Tremer Dec. 31, 2017, 1:33 a.m. UTC | #1
Hello,

good patch and okay with me.

(Has the same line wrapping issue as the previous one.)

Best,
-Michael


Patch

diff --git a/config/updxlrator/download b/config/updxlrator/download
index dbc722c23..afa6e6cb9 100644
--- a/config/updxlrator/download
+++ b/config/updxlrator/download
@@ -30,7 +30,6 @@ my $unique=0;
  my $mirror=1;

  my %dlinfo=();
-my $wgetContinueFlag="";

  my $vendorid  = $ARGV[0]; if (!defined($vendorid)  || $vendorid eq '') { exit; }
  my $sourceurl = $ARGV[1]; if (!defined($sourceurl) || $sourceurl eq '') { exit; }
@@ -57,16 +56,15 @@ if($restartdl == 0)
      # this is a new download
      exit if (-e "$repository/download/$vendorid/$updatefile");

-    # dotzball: Why is this necessary?
+    # hinder multiple downloads from starting simultaneously. Create empty "lock" file.
+    # TODO: Another thread may sneak in between these two commands - so not fool-proof, but good enough?
      system("touch $repository/download/$vendorid/$updatefile");
-    $wgetContinueFlag = "-nc";

  }
  else
  {
      # this is a restart of a previous (unfinished) download
      # -> continue download
-    $wgetContinueFlag = "-c";
      &writelog("Continue download: $updatefile");
  }

@@ -133,7 +131,9 @@ unless($restartdl)
  {
      # this is a new download
      # -> download from scratch
-    unlink "$repository/download/$vendorid/$updatefile";
+
+    #already exited earlier if the file existed, and afterwards created this empty "lock", so if not empty now, another thread is already downloading it.
+    exit if ( -s "$repository/download/$vendorid/$updatefile" );
      unlink "$repository/download/$vendorid/$updatefile.info";
  }

@@ -147,7 +147,7 @@ $dlinfo{'REMOTESIZE'} = $remote_size;
  $dlinfo{'STATUS'} = "1";
  &UPDXLT::writehash("$repository/download/$vendorid/$updatefile.info", \%dlinfo);

-my $cmd = "$UPDXLT::wget $login $dlrate --user-agent=\"$UPDXLT::useragent\" -q -P $repository/download/$vendorid $wgetContinueFlag $sourceurl";
+my $cmd = "$UPDXLT::wget $login $dlrate --user-agent=\"$UPDXLT::useragent\" -q -P $repository/download/$vendorid --continue $sourceurl";

  $_ = system("$cmd");
  $ENV{'http_proxy'} = '';