squid init v2: Update-suggestions for (3.5.xx)-initscript
| Message ID | 1463248376-1471-1-git-send-email-matthias.fischer@ipfire.org | 
|---|---|
| State | Superseded | 
| Headers | Return-Path: <development-bounces@lists.ipfire.org> Received: from mail01.ipfire.org (hedwig.ipfire.org [172.28.1.200]) by web02.ipfire.org (Postfix) with ESMTP id 46FE962924 for <patchwork@ipfire.org>; Sat, 14 May 2016 19:53:05 +0200 (CEST) Received: from mail01.ipfire.org (localhost [IPv6:::1]) by mail01.ipfire.org (Postfix) with ESMTP id 8E10EC17; Sat, 14 May 2016 19:53:04 +0200 (CEST) Received: from Devel.localdomain (p5DD82396.dip0.t-ipconnect.de [93.216.35.150]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-SHA256 (128/128 bits)) (No client certificate requested) by mail01.ipfire.org (Postfix) with ESMTPSA id 6B20BABE for <development@lists.ipfire.org>; Sat, 14 May 2016 19:53:00 +0200 (CEST) From: Matthias Fischer <matthias.fischer@ipfire.org> To: development@lists.ipfire.org Subject: [PATCH] squid init v2: Update-suggestions for (3.5.xx)-initscript Date: Sat, 14 May 2016 19:52:56 +0200 Message-Id: <1463248376-1471-1-git-send-email-matthias.fischer@ipfire.org> X-Mailer: git-send-email 2.8.2 X-BeenThere: development@lists.ipfire.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: IPFire development talk <development.lists.ipfire.org> List-Unsubscribe: <http://lists.ipfire.org/mailman/options/development>, <mailto:development-request@lists.ipfire.org?subject=unsubscribe> List-Archive: <http://lists.ipfire.org/pipermail/development/> List-Post: <mailto:development@lists.ipfire.org> List-Help: <mailto:development-request@lists.ipfire.org?subject=help> List-Subscribe: <http://lists.ipfire.org/mailman/listinfo/development>, <mailto:development-request@lists.ipfire.org?subject=subscribe> Errors-To: development-bounces@lists.ipfire.org Sender: "Development" <development-bounces@lists.ipfire.org> | 
Message
    Matthias Fischer
    15 May 2016, 3:52 a.m. UTC
  
  
These changes are mainly meant for upcoming squid 3.5.xx:
- Raised 'while-loop'-time for stopping squid to 120 seconds.
- Changed the 'while-loop' from using 'statusproc' to 'squid -k check',
  since 'statusproc' didn't find the still running '(squid-1)'-process.
  Thus, 'squid' hadn't enough time to close the index. This led to a
  damaged 'swap.state'-file and "Rebuilding storage in /var/log/cache"
  always ended with "(dirty log)".
- Added some 'boot_mesg' lines in stop-routine.
- Changed the 'flush'-command to delete the entire 'swap.state'-file - it
  will be automatically rebuild during the next start.
Best,
Matthias
Signed-off-by: Matthias Fischer <matthias.fischer@ipfire.org>
---
 src/initscripts/init.d/squid | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)
  
Comments
Hi, thanks for working on this. I still have a few questions and remarks: On Sat, 2016-05-14 at 19:52 +0200, Matthias Fischer wrote: > These changes are mainly meant for upcoming squid 3.5.xx: > > - Raised 'while-loop'-time for stopping squid to 120 seconds. Is there any objective for this value or is this just an estimate? I assume what you are trying to do here is to find a timeout that is high enough that even slow storage will have enough time to write back the cache from memory. I have seen systems where closing and reopening the cache takes 15-20 minutes. > - Changed the 'while-loop' from using 'statusproc' to 'squid -k check', > since 'statusproc' didn't find the still running '(squid-1)'-process. > Thus, 'squid' hadn't enough time to close the index. This led to a > damaged 'swap.state'-file and "Rebuilding storage in /var/log/cache" > always ended with "(dirty log)". > > - Added some 'boot_mesg' lines in stop-routine. That debugging output should not be shown to the user. Just show [ OK ] or [ FAIL ] at the end of the line. > - Changed the 'flush'-command to delete the entire 'swap.state'-file - it > will be automatically rebuild during the next start. I get that it is intentional to delete this file when it is damaged. It is however the journal of the cache index and when deleted will require squid to rebuild the entire cache from the all files in the cache. That will certainly take a long time (especially on those systems that are likely to lose their swap.state because of slow storage). We should *always* avoid this when ever we can. Generally I take this as a workaround for now since squid does not notice when the swap.state file was damaged and continues working with corrupt data. Would you like to file a bug report and see if the squid developers can find a way to automatically detect when squid is reading a damaged file and discard it then? I guess that would be the safest solution. > > Best, > Matthias > > Signed-off-by: Matthias Fischer <matthias.fischer@ipfire.org> > --- > src/initscripts/init.d/squid | 29 +++++++++++++++++------------ > 1 file changed, 17 insertions(+), 12 deletions(-) > > diff --git a/src/initscripts/init.d/squid b/src/initscripts/init.d/squid > index abed90a..5d9521b 100644 > --- a/src/initscripts/init.d/squid > +++ b/src/initscripts/init.d/squid > @@ -108,22 +108,28 @@ case "$1" in > # Wait until all redirectors have been stopped. > wait > > - # If squid is still running, wait up to 30 seconds > + # If squid is still running, wait up to 120 seconds > # before we go on to kill it. > - counter=30 > - > - while [ ${counter} -gt 0 ]; do > - statusproc /usr/sbin/squid >/dev/null && > break; > + n=0 > + while /usr/sbin/squid -k check && [ $n -lt 120 ]; do > sleep 1 > - counter=$(( ${counter} - 1)) > + n=$(( ${n} + 1 )) > done So the status output with the dots is gone now. Can we add a notice to the status line that says something like "Shutting down squid (this may take up to a few minutes). > > - # Kill squid service, if still running. > - killproc /usr/sbin/squid >/dev/null > + # If (squid-1) is still running, kill all squid > processes > + # and delete damaged '/var/log/cache/swap.state'. > + if ( ps ax | grep "(squid-1)$" ); then Can we use pgrep here? > + killproc /usr/sbin/squid >/dev/null > + /bin/rm -f /var/log/cache/swap.state Do not use the full path. "rm" should be enough. > + boot_mesg "(squid-1) was killed after 120 > seconds." > + else > + boot_mesg "(squid-1) exited normally, > shutdown OK." > + fi > + fi > > - # Trash remain pid file from squid. > + # Delete remaining pid file from squid if it STILL > exists. > rm -rf /var/run/squid.pid > - fi > + > ;; > > restart) > @@ -143,8 +149,7 @@ case "$1" in > > flush) > $0 stop > - echo > /var/log/cache/swap.state > - chown squid.squid /var/log/cache/swap.state > + /bin/rm -f /var/log/cache/swap.state See above for "rm". I am also wondering what this was supposed to do. These lines are executed when you click the button to clear the cache on the web user interface. However, the cache is not really cleared. The cache is rebuilt, but not cleared. Shouldn't we remove all data in the directory and reinitialize everything with squid -z? Here is some documentation about that: http://wiki.squid-cache.org/SquidFaq/ClearingTheCache Just removing the swap.state file still leaves all the data on disk and my assumption when clearing the cache is that I get rid of the data in it. Either to clear up space, because it was corrupted, etc... > sleep 1 > $0 start > ;;
On 15.05.2016 12:37, Michael Tremer wrote: > Hi, Hi, > thanks for working on this. No problem... ;-) > I still have a few questions and remarks: > > On Sat, 2016-05-14 at 19:52 +0200, Matthias Fischer wrote: >> These changes are mainly meant for upcoming squid 3.5.xx: >> >> - Raised 'while-loop'-time for stopping squid to 120 seconds. > > Is there any objective for this value or is this just an estimate? This value is the most used default. I didn't find a further explanation. > I assume what you are trying to do here is to find a timeout that is > high enough that even slow storage will have enough time to write > back the cache from memory. Yep. > I have seen systems where closing and reopening the cache takes 15-20 > minutes. Yep - I know that those systems exist. But if we don't want to end up waiting forever, we have to set a border. 120 seconds is the most found default. Make your choice. ;-) I set it to 360 seconds now - see below. >> - Changed the 'while-loop' from using 'statusproc' to 'squid -k >> check', since 'statusproc' didn't find the still running >> '(squid-1)'-process. Thus, 'squid' hadn't enough time to close the >> index. This led to a damaged 'swap.state'-file and "Rebuilding >> storage in /var/log/cache" always ended with "(dirty log)". >> >> - Added some 'boot_mesg' lines in stop-routine. > > That debugging output should not be shown to the user. Just show [ OK > ] or [ FAIL ] at the end of the line. Solved. See below. >> - Changed the 'flush'-command to delete the entire >> 'swap.state'-file - it will be automatically rebuild during the >> next start. > > I get that it is intentional to delete this file when it is damaged. > It is however the journal of the cache index and when deleted will > require squid to rebuild the entire cache from the all files in the > cache. That will certainly take a long time (especially on those > systems that are likely to lose their swap.state because of slow > storage). > > We should *always* avoid this when ever we can. ACK. This is why I altered the loop to 360 seconds. For now. > Generally I take this as a workaround for now since squid does not > notice when the swap.state file was damaged and continues working > with corrupt data. Would you like to file a bug report and see if the > squid developers can find a way to automatically detect when squid is > reading a damaged file and discard it then? I guess that would be the > safest solution. Yep. For me its a workaround, too. I'll try to file a bug report and see what happens... >> >> Best, Matthias >> >> Signed-off-by: Matthias Fischer <matthias.fischer@ipfire.org> --- >> src/initscripts/init.d/squid | 29 +++++++++++++++++------------ 1 >> file changed, 17 insertions(+), 12 deletions(-) >> >> diff --git a/src/initscripts/init.d/squid >> b/src/initscripts/init.d/squid index abed90a..5d9521b 100644 --- >> a/src/initscripts/init.d/squid +++ b/src/initscripts/init.d/squid >> @@ -108,22 +108,28 @@ case "$1" in # Wait until all redirectors >> have been stopped. wait >> >> - # If squid is still running, wait up to 30 seconds + # If >> squid is still running, wait up to 120 seconds # before we go on to >> kill it. - counter=30 - - while [ ${counter} -gt 0 ]; do - >> statusproc /usr/sbin/squid >/dev/null && break; + n=0 + while >> /usr/sbin/squid -k check && [ $n -lt 120 ]; do sleep 1 - >> counter=$(( ${counter} - 1)) + n=$(( ${n} + 1 )) done > > So the status output with the dots is gone now. Can we add a notice > to the status line that says something like "Shutting down squid > (this may take up to a few minutes). Working on it. See below. > >> >> - # Kill squid service, if still running. - killproc >> /usr/sbin/squid >/dev/null + # If (squid-1) is still running, >> kill all squid processes + # and delete damaged >> '/var/log/cache/swap.state'. + if ( ps ax | grep "(squid-1)$" ); >> then > > Can we use pgrep here? Yes. Done. See below. >> + killproc /usr/sbin/squid >/dev/null + /bin/rm -f >> /var/log/cache/swap.state > > Do not use the full path. "rm" should be enough. Done. >> + boot_mesg "(squid-1) was killed after 120 seconds." + else + >> boot_mesg "(squid-1) exited normally, shutdown OK." + fi + fi >> >> - # Trash remain pid file from squid. + # Delete remaining pid >> file from squid if it STILL exists. rm -rf /var/run/squid.pid - >> fi + ;; >> >> restart) @@ -143,8 +149,7 @@ case "$1" in >> >> flush) $0 stop - echo > /var/log/cache/swap.state - chown >> squid.squid /var/log/cache/swap.state + /bin/rm -f >> /var/log/cache/swap.state > > See above for "rm". Done as above. ;-) > I am also wondering what this was supposed to do. These lines are > executed when you click the button to clear the cache on the web user > interface. However, the cache is not really cleared. > > The cache is rebuilt, but not cleared. Shouldn't we remove all data > in the directory and reinitialize everything with squid -z? > > Here is some documentation about that: > > http://wiki.squid-cache.org/SquidFaq/ClearingTheCache > > Just removing the swap.state file still leaves all the data on disk > and my assumption when clearing the cache is that I get rid of the > data in it. Either to clear up space, because it was corrupted, > etc... ... This would be consistent (correct translation?). What about this: ... # If squid is still running, wait up to 360 seconds # before we go on to kill it and delete cache structure. boot_mesg "Shutting down squid (this may take up to a few minutes)\n" n=0 while [ /usr/sbin/squid -k check > /dev/null 2>&1 ] && [ $n -lt 360 ]; do sleep 1 n=$(( ${n} + 1 )) done # If (squid-1) is still running after 360 seconds, # kill all squid processes and delete damaged cache structure. if ( pgrep -fl "(squid-1)" > /dev/null 2>&1 ); then killproc /usr/sbin/squid >/dev/null rm -r /var/log/squid/cache/* boot_mesg -n "You should not be reading this warning.\n" boot_mesg -n "Some squid-processes had to be killed after 360 seconds,\n" boot_mesg -n "so index file and cache contents had to be deleted.\n" boot_mesg -n "Therefore, the complete cache structure must be rebuild\n" boot_mesg "during next start." echo_warning else boot_mesg "All squid processes exited normally." echo_ok boot_mesg "" fi fi # Delete remaining pid file from squid if it still exists. rm -rf /var/run/squid.pid ;; Best, Matthias
On Sun, 2016-05-15 at 16:57 +0200, Matthias Fischer wrote: > On 15.05.2016 12:37, Michael Tremer wrote: > > > > Hi, > Hi, > > > > > thanks for working on this. > No problem... ;-) > > > > > I still have a few questions and remarks: > > > > On Sat, 2016-05-14 at 19:52 +0200, Matthias Fischer wrote: > > > > > > These changes are mainly meant for upcoming squid 3.5.xx: > > > > > > - Raised 'while-loop'-time for stopping squid to 120 seconds. > > Is there any objective for this value or is this just an estimate? > This value is the most used default. I didn't find a further explanation. > > > > > I assume what you are trying to do here is to find a timeout that is > > high enough that even slow storage will have enough time to write > > back the cache from memory. > Yep. > > > > > I have seen systems where closing and reopening the cache takes 15-20 > > minutes. > Yep - I know that those systems exist. But if we don't want to end up > waiting forever, we have to set a border. 120 seconds is the most found > default. Make your choice. ;-) > > I set it to 360 seconds now - see below. > > > > > > > > > - Changed the 'while-loop' from using 'statusproc' to 'squid -k > > > check', since 'statusproc' didn't find the still running > > > '(squid-1)'-process. Thus, 'squid' hadn't enough time to close the > > > index. This led to a damaged 'swap.state'-file and "Rebuilding > > > storage in /var/log/cache" always ended with "(dirty log)". > > > > > > - Added some 'boot_mesg' lines in stop-routine. > > That debugging output should not be shown to the user. Just show [ OK > > ] or [ FAIL ] at the end of the line. > Solved. See below. > > > > > > > > > - Changed the 'flush'-command to delete the entire > > > 'swap.state'-file - it will be automatically rebuild during the > > > next start. > > I get that it is intentional to delete this file when it is damaged. > > It is however the journal of the cache index and when deleted will > > require squid to rebuild the entire cache from the all files in the > > cache. That will certainly take a long time (especially on those > > systems that are likely to lose their swap.state because of slow > > storage). > > > > We should *always* avoid this when ever we can. > ACK. This is why I altered the loop to 360 seconds. For now. > > > > > Generally I take this as a workaround for now since squid does not > > notice when the swap.state file was damaged and continues working > > with corrupt data. Would you like to file a bug report and see if the > > squid developers can find a way to automatically detect when squid is > > reading a damaged file and discard it then? I guess that would be the > > safest solution. > Yep. For me its a workaround, too. I'll try to file a bug report and see > what happens... Please connect that to the one on our bugtracker. > > > > > > > > > > > > Best, Matthias > > > > > > Signed-off-by: Matthias Fischer <matthias.fischer@ipfire.org> --- > > > src/initscripts/init.d/squid | 29 +++++++++++++++++------------ 1 > > > file changed, 17 insertions(+), 12 deletions(-) > > > > > > diff --git a/src/initscripts/init.d/squid > > > b/src/initscripts/init.d/squid index abed90a..5d9521b 100644 --- > > > a/src/initscripts/init.d/squid +++ b/src/initscripts/init.d/squid > > > @@ -108,22 +108,28 @@ case "$1" in # Wait until all redirectors > > > have been stopped. wait > > > > > > - # If squid is still running, wait up to 30 > > > seconds + # If > > > squid is still running, wait up to 120 seconds # before we go on to > > > kill it. - counter=30 - - wh > > > ile [ ${counter} -gt 0 ]; do - > > > statusproc /usr/sbin/squid >/dev/null && break; + n > > > =0 + while > > > /usr/sbin/squid -k check && [ $n -lt 120 ]; do sleep 1 - > > > counter=$(( ${counter} - 1)) + n=$(( ${n} + > > > 1 )) done > > So the status output with the dots is gone now. Can we add a notice > > to the status line that says something like "Shutting down squid > > (this may take up to a few minutes). > Working on it. See below. > > > > > > > > > > > > > > - # Kill squid service, if still running. - > > > killproc > > > /usr/sbin/squid >/dev/null + # If (squid-1) is > > > still running, > > > kill all squid processes + # and delete damaged > > > '/var/log/cache/swap.state'. + if ( ps ax | grep > > > "(squid-1)$" ); > > > then > > Can we use pgrep here? > Yes. > Done. > See below. > > > > > > > > > + killproc /usr/sbin/squid >/dev/null + > > > /bin/rm -f > > > /var/log/cache/swap.state > > Do not use the full path. "rm" should be enough. > Done. > > > > > > > > > + boot_mesg "(squid-1) was killed after 120 > > > seconds." + else + > > > boot_mesg "(squid-1) exited normally, shutdown OK." + > > > fi + fi > > > > > > - # Trash remain pid file from squid. + > > > # Delete remaining pid > > > file from squid if it STILL exists. rm -rf /var/run/squid.pid - > > > fi + ;; > > > > > > restart) @@ -143,8 +149,7 @@ case "$1" in > > > > > > flush) $0 stop - echo > /var/log/cache/swap.state - > > > chown > > > squid.squid /var/log/cache/swap.state + /bin/rm -f > > > /var/log/cache/swap.state > > See above for "rm". > Done as above. ;-) > > > > > I am also wondering what this was supposed to do. These lines are > > executed when you click the button to clear the cache on the web user > > interface. However, the cache is not really cleared. > > > > The cache is rebuilt, but not cleared. Shouldn't we remove all data > > in the directory and reinitialize everything with squid -z? > > > > Here is some documentation about that: > > > > http://wiki.squid-cache.org/SquidFaq/ClearingTheCache > > > > Just removing the swap.state file still leaves all the data on disk > > and my assumption when clearing the cache is that I get rid of the > > data in it. Either to clear up space, because it was corrupted, > > etc... ... > This would be consistent (correct translation?). > > What about this: > > ... > # If squid is still running, wait up to 360 seconds > # before we go on to kill it and delete cache structure. > boot_mesg "Shutting down squid (this may take up to a few minutes)\n" > n=0 > while [ /usr/sbin/squid -k check > /dev/null 2>&1 ] && [ $n -lt 360 ]; > do > sleep 1 > n=$(( ${n} + 1 )) > done > > # If (squid-1) is still running after 360 seconds, > # kill all squid processes and delete damaged cache structure. > if ( pgrep -fl "(squid-1)" > /dev/null 2>&1 ); then > killproc /usr/sbin/squid >/dev/null > rm -r /var/log/squid/cache/* > boot_mesg -n "You should not be reading this warning.\n" > boot_mesg -n "Some squid-processes had to be killed after 360 > seconds,\n" > boot_mesg -n "so index file and cache contents had to be > deleted.\n" > boot_mesg -n "Therefore, the complete cache structure must be > rebuild\n" > boot_mesg "during next start." > echo_warning > else > boot_mesg "All squid processes exited normally." > echo_ok > boot_mesg "" > fi > fi > > # Delete remaining pid file from squid if it still exists. > rm -rf /var/run/squid.pid > > ;; > > Best, > Matthias