mbox

squid init v2: Update-suggestions for (3.5.xx)-initscript

Message ID 1463248376-1471-1-git-send-email-matthias.fischer@ipfire.org
State Superseded
Headers

Message

Matthias Fischer May 15, 2016, 3:52 a.m. UTC
  These changes are mainly meant for upcoming squid 3.5.xx:

- Raised 'while-loop'-time for stopping squid to 120 seconds.

- Changed the 'while-loop' from using 'statusproc' to 'squid -k check',
  since 'statusproc' didn't find the still running '(squid-1)'-process.
  Thus, 'squid' hadn't enough time to close the index. This led to a
  damaged 'swap.state'-file and "Rebuilding storage in /var/log/cache"
  always ended with "(dirty log)".

- Added some 'boot_mesg' lines in stop-routine.

- Changed the 'flush'-command to delete the entire 'swap.state'-file - it
  will be automatically rebuild during the next start.

Best,
Matthias

Signed-off-by: Matthias Fischer <matthias.fischer@ipfire.org>
---
 src/initscripts/init.d/squid | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)
  

Comments

Michael Tremer May 15, 2016, 8:37 p.m. UTC | #1
Hi,

thanks for working on this.

I still have a few questions and remarks:

On Sat, 2016-05-14 at 19:52 +0200, Matthias Fischer wrote:
> These changes are mainly meant for upcoming squid 3.5.xx:
> 
> - Raised 'while-loop'-time for stopping squid to 120 seconds.

Is there any objective for this value or is this just an estimate?

I assume what you are trying to do here is to find a timeout that is high enough
that even slow storage will have enough time to write back the cache from
memory.

I have seen systems where closing and reopening the cache takes 15-20 minutes.

> - Changed the 'while-loop' from using 'statusproc' to 'squid -k check',
>   since 'statusproc' didn't find the still running '(squid-1)'-process.
>   Thus, 'squid' hadn't enough time to close the index. This led to a
>   damaged 'swap.state'-file and "Rebuilding storage in /var/log/cache"
>   always ended with "(dirty log)".
> 
> - Added some 'boot_mesg' lines in stop-routine.

That debugging output should not be shown to the user. Just show [ OK ] or
[ FAIL ] at the end of the line.

> - Changed the 'flush'-command to delete the entire 'swap.state'-file - it
>   will be automatically rebuild during the next start.

I get that it is intentional to delete this file when it is damaged. It is
however the journal of the cache index and when deleted will require squid to
rebuild the entire cache from the all files in the cache. That will certainly
take a long time (especially on those systems that are likely to lose their
swap.state because of slow storage).

We should *always* avoid this when ever we can.

Generally I take this as a workaround for now since squid does not notice when
the swap.state file was damaged and continues working with corrupt data. Would
you like to file a bug report and see if the squid developers can find a way to
automatically detect when squid is reading a damaged file and discard it then? I
guess that would be the safest solution.

> 
> Best,
> Matthias
> 
> Signed-off-by: Matthias Fischer <matthias.fischer@ipfire.org>
> ---
>  src/initscripts/init.d/squid | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/src/initscripts/init.d/squid b/src/initscripts/init.d/squid
> index abed90a..5d9521b 100644
> --- a/src/initscripts/init.d/squid
> +++ b/src/initscripts/init.d/squid
> @@ -108,22 +108,28 @@ case "$1" in
>  			# Wait until all redirectors have been stopped.
>  			wait
>  
> -			# If squid is still running, wait up to 30 seconds
> +			# If squid is still running, wait up to 120 seconds
>  			# before we go on to kill it.
> -			counter=30
> -
> -			while [ ${counter} -gt 0 ]; do
> -				statusproc /usr/sbin/squid >/dev/null &&
> break;
> +			n=0
> +			while /usr/sbin/squid -k check && [ $n -lt 120 ]; do
>  				sleep 1
> -				counter=$(( ${counter} - 1))
> +				n=$(( ${n} + 1 ))
>  			done

So the status output with the dots is gone now. Can we add a notice to the
status line that says something like "Shutting down squid (this may take up to a
few minutes).

>  
> -			# Kill squid service, if still running.
> -			killproc /usr/sbin/squid >/dev/null
> +			# If (squid-1) is still running, kill all squid
> processes
> +			# and delete damaged '/var/log/cache/swap.state'.
> +			if ( ps ax | grep "(squid-1)$" ); then

Can we use pgrep here?

> +				killproc /usr/sbin/squid >/dev/null
> +				/bin/rm -f /var/log/cache/swap.state

Do not use the full path. "rm" should be enough.

> +				boot_mesg "(squid-1) was killed after 120
> seconds."
> +			else
> +				boot_mesg "(squid-1) exited normally,
> shutdown OK."
> +			fi
> +		fi
>  
> -			# Trash remain pid file from squid.
> +			# Delete remaining pid file from squid if it STILL
> exists.
>  			rm -rf /var/run/squid.pid
> -		fi
> +
>  		;;
>  
>  	restart)
> @@ -143,8 +149,7 @@ case "$1" in
>  
>  	flush)
>  		$0 stop
> -		echo > /var/log/cache/swap.state
> -		chown squid.squid /var/log/cache/swap.state
> +		/bin/rm -f /var/log/cache/swap.state

See above for "rm".

I am also wondering what this was supposed to do. These lines are executed when
you click the button to clear the cache on the web user interface. However, the
cache is not really cleared.

The cache is rebuilt, but not cleared. Shouldn't we remove all data in the
directory and reinitialize everything with squid -z?

Here is some documentation about that:

  http://wiki.squid-cache.org/SquidFaq/ClearingTheCache

Just removing the swap.state file still leaves all the data on disk and my
assumption when clearing the cache is that I get rid of the data in it. Either
to clear up space, because it was corrupted, etc...

>  		sleep 1
>  		$0 start
>  		;;
  
Matthias Fischer May 16, 2016, 12:57 a.m. UTC | #2
On 15.05.2016 12:37, Michael Tremer wrote:
> Hi,

Hi,

> thanks for working on this.

No problem... ;-)

> I still have a few questions and remarks:
> 
> On Sat, 2016-05-14 at 19:52 +0200, Matthias Fischer wrote:
>> These changes are mainly meant for upcoming squid 3.5.xx:
>> 
>> - Raised 'while-loop'-time for stopping squid to 120 seconds.
> 
> Is there any objective for this value or is this just an estimate?

This value is the most used default. I didn't find a further explanation.

> I assume what you are trying to do here is to find a timeout that is
> high enough that even slow storage will have enough time to write
> back the cache from memory.

Yep.

> I have seen systems where closing and reopening the cache takes 15-20
> minutes.

Yep - I know that those systems exist. But if we don't want to end up
waiting forever, we have to set a border. 120 seconds is the most found
default. Make your choice. ;-)

I set it to 360 seconds now - see below.

>> - Changed the 'while-loop' from using 'statusproc' to 'squid -k
>> check', since 'statusproc' didn't find the still running
>> '(squid-1)'-process. Thus, 'squid' hadn't enough time to close the
>> index. This led to a damaged 'swap.state'-file and "Rebuilding
>> storage in /var/log/cache" always ended with "(dirty log)".
>> 
>> - Added some 'boot_mesg' lines in stop-routine.
> 
> That debugging output should not be shown to the user. Just show [ OK
> ] or [ FAIL ] at the end of the line.

Solved. See below.

>> - Changed the 'flush'-command to delete the entire
>> 'swap.state'-file - it will be automatically rebuild during the
>> next start.
> 
> I get that it is intentional to delete this file when it is damaged.
> It is however the journal of the cache index and when deleted will
> require squid to rebuild the entire cache from the all files in the
> cache. That will certainly take a long time (especially on those
> systems that are likely to lose their swap.state because of slow
> storage).
> 
> We should *always* avoid this when ever we can.

ACK. This is why I altered the loop to 360 seconds. For now.

> Generally I take this as a workaround for now since squid does not
> notice when the swap.state file was damaged and continues working
> with corrupt data. Would you like to file a bug report and see if the
> squid developers can find a way to automatically detect when squid is
> reading a damaged file and discard it then? I guess that would be the
> safest solution.

Yep. For me its a workaround, too. I'll try to file a bug report and see
what happens...

>> 
>> Best, Matthias
>> 
>> Signed-off-by: Matthias Fischer <matthias.fischer@ipfire.org> --- 
>> src/initscripts/init.d/squid | 29 +++++++++++++++++------------ 1
>> file changed, 17 insertions(+), 12 deletions(-)
>> 
>> diff --git a/src/initscripts/init.d/squid
>> b/src/initscripts/init.d/squid index abed90a..5d9521b 100644 ---
>> a/src/initscripts/init.d/squid +++ b/src/initscripts/init.d/squid 
>> @@ -108,22 +108,28 @@ case "$1" in # Wait until all redirectors
>> have been stopped. wait
>> 
>> -			# If squid is still running, wait up to 30 seconds +			# If
>> squid is still running, wait up to 120 seconds # before we go on to
>> kill it. -			counter=30 - -			while [ ${counter} -gt 0 ]; do -
>> statusproc /usr/sbin/squid >/dev/null && break; +			n=0 +			while
>> /usr/sbin/squid -k check && [ $n -lt 120 ]; do sleep 1 -
>> counter=$(( ${counter} - 1)) +				n=$(( ${n} + 1 )) done
> 
> So the status output with the dots is gone now. Can we add a notice
> to the status line that says something like "Shutting down squid
> (this may take up to a few minutes).

Working on it. See below.

> 
>> 
>> -			# Kill squid service, if still running. -			killproc
>> /usr/sbin/squid >/dev/null +			# If (squid-1) is still running,
>> kill all squid processes +			# and delete damaged
>> '/var/log/cache/swap.state'. +			if ( ps ax | grep "(squid-1)$" );
>> then
> 
> Can we use pgrep here?

Yes.
Done.
See below.

>> +				killproc /usr/sbin/squid >/dev/null +				/bin/rm -f
>> /var/log/cache/swap.state
> 
> Do not use the full path. "rm" should be enough.

Done.

>> +				boot_mesg "(squid-1) was killed after 120 seconds." +			else +
>> boot_mesg "(squid-1) exited normally, shutdown OK." +			fi +		fi
>> 
>> -			# Trash remain pid file from squid. +			# Delete remaining pid
>> file from squid if it STILL exists. rm -rf /var/run/squid.pid -
>> fi + ;;
>> 
>> restart) @@ -143,8 +149,7 @@ case "$1" in
>> 
>> flush) $0 stop -		echo > /var/log/cache/swap.state -		chown
>> squid.squid /var/log/cache/swap.state +		/bin/rm -f
>> /var/log/cache/swap.state
> 
> See above for "rm".

Done as above. ;-)

> I am also wondering what this was supposed to do. These lines are
> executed when you click the button to clear the cache on the web user
> interface. However, the cache is not really cleared.
> 
> The cache is rebuilt, but not cleared. Shouldn't we remove all data
> in the directory and reinitialize everything with squid -z?
> 
> Here is some documentation about that:
> 
> http://wiki.squid-cache.org/SquidFaq/ClearingTheCache
> 
> Just removing the swap.state file still leaves all the data on disk
> and my assumption when clearing the cache is that I get rid of the
> data in it. Either to clear up space, because it was corrupted,
> etc... ...

This would be consistent (correct translation?).

What about this:

...
# If squid is still running, wait up to 360 seconds
# before we go on to kill it and delete cache structure.
	boot_mesg "Shutting down squid (this may take up to a few minutes)\n"
	n=0
	while [ /usr/sbin/squid -k check > /dev/null 2>&1 ] && [ $n -lt 360 ]; do
	sleep 1
	n=$(( ${n} + 1 ))
	done

# If (squid-1) is still running after 360 seconds,
# kill all squid processes and delete damaged cache structure.
if ( pgrep -fl "(squid-1)" > /dev/null 2>&1 ); then
	killproc /usr/sbin/squid >/dev/null
	rm -r /var/log/squid/cache/*
		boot_mesg -n "You should not be reading this warning.\n"
		boot_mesg -n "Some squid-processes had to be killed after 360 seconds,\n"
		boot_mesg -n "so index file and cache contents had to be deleted.\n"
		boot_mesg -n "Therefore, the complete cache structure must be rebuild\n"
		boot_mesg "during next start."
		echo_warning
			else
		boot_mesg "All squid processes exited normally."
		echo_ok
		boot_mesg ""
	fi
fi

# Delete remaining pid file from squid if it still exists.
	rm -rf /var/run/squid.pid

	;;

Best,
Matthias
  
Michael Tremer May 17, 2016, 1:15 a.m. UTC | #3
On Sun, 2016-05-15 at 16:57 +0200, Matthias Fischer wrote:
> On 15.05.2016 12:37, Michael Tremer wrote:
> > 
> > Hi,
> Hi,
> 
> > 
> > thanks for working on this.
> No problem... ;-)
> 
> > 
> > I still have a few questions and remarks:
> > 
> > On Sat, 2016-05-14 at 19:52 +0200, Matthias Fischer wrote:
> > > 
> > > These changes are mainly meant for upcoming squid 3.5.xx:
> > > 
> > > - Raised 'while-loop'-time for stopping squid to 120 seconds.
> > Is there any objective for this value or is this just an estimate?
> This value is the most used default. I didn't find a further explanation.
> 
> > 
> > I assume what you are trying to do here is to find a timeout that is
> > high enough that even slow storage will have enough time to write
> > back the cache from memory.
> Yep.
> 
> > 
> > I have seen systems where closing and reopening the cache takes 15-20
> > minutes.
> Yep - I know that those systems exist. But if we don't want to end up
> waiting forever, we have to set a border. 120 seconds is the most found
> default. Make your choice. ;-)
> 
> I set it to 360 seconds now - see below.
> 
> > 
> > > 
> > > - Changed the 'while-loop' from using 'statusproc' to 'squid -k
> > > check', since 'statusproc' didn't find the still running
> > > '(squid-1)'-process. Thus, 'squid' hadn't enough time to close the
> > > index. This led to a damaged 'swap.state'-file and "Rebuilding
> > > storage in /var/log/cache" always ended with "(dirty log)".
> > > 
> > > - Added some 'boot_mesg' lines in stop-routine.
> > That debugging output should not be shown to the user. Just show [ OK
> > ] or [ FAIL ] at the end of the line.
> Solved. See below.
> 
> > 
> > > 
> > > - Changed the 'flush'-command to delete the entire
> > > 'swap.state'-file - it will be automatically rebuild during the
> > > next start.
> > I get that it is intentional to delete this file when it is damaged.
> > It is however the journal of the cache index and when deleted will
> > require squid to rebuild the entire cache from the all files in the
> > cache. That will certainly take a long time (especially on those
> > systems that are likely to lose their swap.state because of slow
> > storage).
> > 
> > We should *always* avoid this when ever we can.
> ACK. This is why I altered the loop to 360 seconds. For now.
> 
> > 
> > Generally I take this as a workaround for now since squid does not
> > notice when the swap.state file was damaged and continues working
> > with corrupt data. Would you like to file a bug report and see if the
> > squid developers can find a way to automatically detect when squid is
> > reading a damaged file and discard it then? I guess that would be the
> > safest solution.
> Yep. For me its a workaround, too. I'll try to file a bug report and see
> what happens...

Please connect that to the one on our bugtracker.

> 
> > 
> > > 
> > > 
> > > Best, Matthias
> > > 
> > > Signed-off-by: Matthias Fischer <matthias.fischer@ipfire.org> --- 
> > > src/initscripts/init.d/squid | 29 +++++++++++++++++------------ 1
> > > file changed, 17 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/src/initscripts/init.d/squid
> > > b/src/initscripts/init.d/squid index abed90a..5d9521b 100644 ---
> > > a/src/initscripts/init.d/squid +++ b/src/initscripts/init.d/squid 
> > > @@ -108,22 +108,28 @@ case "$1" in # Wait until all redirectors
> > > have been stopped. wait
> > > 
> > > -			# If squid is still running, wait up to 30
> > > seconds +			# If
> > > squid is still running, wait up to 120 seconds # before we go on to
> > > kill it. -			counter=30 - -			wh
> > > ile [ ${counter} -gt 0 ]; do -
> > > statusproc /usr/sbin/squid >/dev/null && break; +			n
> > > =0 +			while
> > > /usr/sbin/squid -k check && [ $n -lt 120 ]; do sleep 1 -
> > > counter=$(( ${counter} - 1)) +				n=$(( ${n} +
> > > 1 )) done
> > So the status output with the dots is gone now. Can we add a notice
> > to the status line that says something like "Shutting down squid
> > (this may take up to a few minutes).
> Working on it. See below.
> 
> > 
> > 
> > > 
> > > 
> > > -			# Kill squid service, if still running. -		
> > > 	killproc
> > > /usr/sbin/squid >/dev/null +			# If (squid-1) is
> > > still running,
> > > kill all squid processes +			# and delete damaged
> > > '/var/log/cache/swap.state'. +			if ( ps ax | grep
> > > "(squid-1)$" );
> > > then
> > Can we use pgrep here?
> Yes.
> Done.
> See below.
> 
> > 
> > > 
> > > +				killproc /usr/sbin/squid >/dev/null +	
> > > 			/bin/rm -f
> > > /var/log/cache/swap.state
> > Do not use the full path. "rm" should be enough.
> Done.
> 
> > 
> > > 
> > > +				boot_mesg "(squid-1) was killed after 120
> > > seconds." +			else +
> > > boot_mesg "(squid-1) exited normally, shutdown OK." +			
> > > fi +		fi
> > > 
> > > -			# Trash remain pid file from squid. +		
> > > 	# Delete remaining pid
> > > file from squid if it STILL exists. rm -rf /var/run/squid.pid -
> > > fi + ;;
> > > 
> > > restart) @@ -143,8 +149,7 @@ case "$1" in
> > > 
> > > flush) $0 stop -		echo > /var/log/cache/swap.state -		
> > > chown
> > > squid.squid /var/log/cache/swap.state +		/bin/rm -f
> > > /var/log/cache/swap.state
> > See above for "rm".
> Done as above. ;-)
> 
> > 
> > I am also wondering what this was supposed to do. These lines are
> > executed when you click the button to clear the cache on the web user
> > interface. However, the cache is not really cleared.
> > 
> > The cache is rebuilt, but not cleared. Shouldn't we remove all data
> > in the directory and reinitialize everything with squid -z?
> > 
> > Here is some documentation about that:
> > 
> > http://wiki.squid-cache.org/SquidFaq/ClearingTheCache
> > 
> > Just removing the swap.state file still leaves all the data on disk
> > and my assumption when clearing the cache is that I get rid of the
> > data in it. Either to clear up space, because it was corrupted,
> > etc... ...
> This would be consistent (correct translation?).
> 
> What about this:
> 
> ...
> # If squid is still running, wait up to 360 seconds
> # before we go on to kill it and delete cache structure.
> 	boot_mesg "Shutting down squid (this may take up to a few minutes)\n"
> 	n=0
> 	while [ /usr/sbin/squid -k check > /dev/null 2>&1 ] && [ $n -lt 360 ];
> do
> 	sleep 1
> 	n=$(( ${n} + 1 ))
> 	done
> 
> # If (squid-1) is still running after 360 seconds,
> # kill all squid processes and delete damaged cache structure.
> if ( pgrep -fl "(squid-1)" > /dev/null 2>&1 ); then
> 	killproc /usr/sbin/squid >/dev/null
> 	rm -r /var/log/squid/cache/*
> 		boot_mesg -n "You should not be reading this warning.\n"
> 		boot_mesg -n "Some squid-processes had to be killed after 360
> seconds,\n"
> 		boot_mesg -n "so index file and cache contents had to be
> deleted.\n"
> 		boot_mesg -n "Therefore, the complete cache structure must be
> rebuild\n"
> 		boot_mesg "during next start."
> 		echo_warning
> 			else
> 		boot_mesg "All squid processes exited normally."
> 		echo_ok
> 		boot_mesg ""
> 	fi
> fi
> 
> # Delete remaining pid file from squid if it still exists.
> 	rm -rf /var/run/squid.pid
> 
> 	;;
> 
> Best,
> Matthias