collectd: Do not sync

Message ID 20240130180152.4161491-1-michael.tremer@ipfire.org
State Staged
Commit 500b6311b439dd480ca2fb715a6f1a05b33fcad5
Headers
Series collectd: Do not sync |

Commit Message

Michael Tremer Jan. 30, 2024, 6:01 p.m. UTC
  Calling a global sync operation manually is generally a bad idea as it
can block for forever. If people have storage that does not retain
anything that is being written to it, they need to fix their hardware.

Signed-off-by: Michael Tremer <michael.tremer@ipfire.org>
---
 src/initscripts/system/collectd | 3 ---
 1 file changed, 3 deletions(-)
  

Comments

Robin Roevens Jan. 30, 2024, 7:50 p.m. UTC | #1
Hi Michael, all

First, this mail does not contain critique, I'm only trying to learn,
if I'm wrong. But I think your remark about fixing hardware is not
correct?

In my understanding, sync synchronizes the not-yet-written data
currently in kernel managed cache memory (hence in volatile system
memory) to permanent storage. 
Normally storage hardware also has its own cache, which I presume you
are referring to, but I don't think sync cares or even knows about that
and that is/should be handled within the hardware itself. 
As soon as storage hardware returns an ACK on a write operation, sync
is happy. Not actually guaranteeing any physical write, only that all
to-be-written data is actually delivered to the storage hardware. 

I do agree that a (full) sync there is probably overkill. The content
of the config will be available to the collectd daemon whether it is
actually written on disk or still in cache, so no sync needed. 
If the server crashes/powercycles/... on that point, I think a possible
commented line that should be uncommented in that config is then
probably of the least concern.
On the other hand, a sync that blocks forever, is probably more of an
indicator that the hardware should be fixed, as this would mean sync
doesn't get all the ACK's it expects.

Please correct me if I'm wrong.


Regards
Robin

Michael Tremer schreef op di 30-01-2024 om 18:01 [+0000]:
> Calling a global sync operation manually is generally a bad idea as
> it
> can block for forever. If people have storage that does not retain
> anything that is being written to it, they need to fix their
> hardware.
> 
> Signed-off-by: Michael Tremer <michael.tremer@ipfire.org>
> ---
>  src/initscripts/system/collectd | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/src/initscripts/system/collectd
> b/src/initscripts/system/collectd
> index bb8a2f54f..56b799d56 100644
> --- a/src/initscripts/system/collectd
> +++ b/src/initscripts/system/collectd
> @@ -146,9 +146,6 @@ case "$1" in
>  			sed -i -e "s|^#LoadPlugin swap|LoadPlugin
> swap|g" /etc/collectd.conf
>  		fi
>  
> -		# sync after config update...
> -		sync
> -
>  		if [ $(date +%Y) -gt 2011 ]; then
>  			boot_mesg "Starting Collection daemon..."
>  			/usr/sbin/collectd -C /etc/collectd.conf
> -- 
> 2.39.2
> 
>
  
Michael Tremer Jan. 31, 2024, 10:54 a.m. UTC | #2
Hello Robin,

> On 30 Jan 2024, at 19:50, Robin Roevens <robin.roevens@disroot.org> wrote:
> 
> Hi Michael, all
> 
> First, this mail does not contain critique, I'm only trying to learn,
> if I'm wrong. But I think your remark about fixing hardware is not
> correct?
> 
> In my understanding, sync synchronizes the not-yet-written data
> currently in kernel managed cache memory (hence in volatile system
> memory) to permanent storage. 

Yes.

> Normally storage hardware also has its own cache, which I presume you
> are referring to, but I don't think sync cares or even knows about that
> and that is/should be handled within the hardware itself. 
> As soon as storage hardware returns an ACK on a write operation, sync
> is happy. Not actually guaranteeing any physical write, only that all
> to-be-written data is actually delivered to the storage hardware.

Well, this is all a little bit more complicated, because it massively depends on the type of hardware we are talking about…

A classic example for a piece of hardware with a cache is a RAID controller. When we call sync(), the kernel sends all data to that RAID controller and as soon as the last operation has been confirmed, sync() is considered done. That does not mean that any of that data has actually been written to any physical storage device. RAID controllers have some temporary caches that are persistent and so that data will not be lost even if there was a power outage immediately after the sync().

Then there is plain old hard drives. They traditionally did not use to be very smart. So if the kernel wanted to write something, it would have been written straight to disk and once that was done, sync() would have returned.

Those two scenarios are not a problem at all.

Then there is the more problematic type of hardware which is usually (cheap?) flash storage. Those devices very often have a faster cache and then very slow persistent storage. To pretend that a device is much faster than it actually is, blocks are being written to a volatile cache and later written to the persistent storage, but there is no power source in case of a sudden loss of power.

We discovered this with a regular reboot where we also call sync to make sure that everything is properly written to disk and then we make the system reboot. Those devices confirm that everything is written when it isn’t but then the system cycles power and the last blocks are gone. That will result in a corrupt file system and those systems will perform a filesystem check (and usually repair too) at the next boot.

The visual way to see this phenomenon is a USB stick with a USB light. Calling sync() will return but the light will continue flashing because the device is actually busy, but has told the OS that it is already done. And hence we have this problem...

> I do agree that a (full) sync there is probably overkill. The content
> of the config will be available to the collectd daemon whether it is
> actually written on disk or still in cache, so no sync needed. 
> If the server crashes/powercycles/... on that point, I think a possible
> commented line that should be uncommented in that config is then
> probably of the least concern.

On top of all of this comes that the “flash image” that we ship used to come with no journal. In a journaling file system, a loss of power would either result in the old version of that file, or the new one - depending on how far the writes have come. Neither of that is a problem in our scenario. Without the journal, it is quite likely to have a broken file.

> On the other hand, a sync that blocks forever, is probably more of an
> indicator that the hardware should be fixed, as this would mean sync
> doesn't get all the ACK's it expects.

It’s not hardware. The system I was dealing with had an NFS volume mounted which was unavailable for a moment. That caused the entire dial-in sequence to hang because collectd was waiting for a sync() to finish just after reconnecting.

So, calling sync() is generally a very bad idea. It works around bugs that should not be there in the first place, but is just creating some fancy race condition where you increase chances that no data will be lost, but it will never be anywhere acceptable for me.

Ergo, don’t buy cheap flash. Use a journaling file system. If your flash dies because you are writing to it it was not fit for purpose.

As you can see I am very annoyed about these things because over all those years they have cost us so much time to debug and “fix” and it simply is not worth it…

> Please correct me if I'm wrong.

I hope this clears it up :)

-Michael

> 
> 
> Regards
> Robin
> 
> Michael Tremer schreef op di 30-01-2024 om 18:01 [+0000]:
>> Calling a global sync operation manually is generally a bad idea as
>> it
>> can block for forever. If people have storage that does not retain
>> anything that is being written to it, they need to fix their
>> hardware.
>> 
>> Signed-off-by: Michael Tremer <michael.tremer@ipfire.org>
>> ---
>>  src/initscripts/system/collectd | 3 ---
>>  1 file changed, 3 deletions(-)
>> 
>> diff --git a/src/initscripts/system/collectd
>> b/src/initscripts/system/collectd
>> index bb8a2f54f..56b799d56 100644
>> --- a/src/initscripts/system/collectd
>> +++ b/src/initscripts/system/collectd
>> @@ -146,9 +146,6 @@ case "$1" in
>>   sed -i -e "s|^#LoadPlugin swap|LoadPlugin
>> swap|g" /etc/collectd.conf
>>   fi
>>  
>> - # sync after config update...
>> - sync
>> -
>>   if [ $(date +%Y) -gt 2011 ]; then
>>   boot_mesg "Starting Collection daemon..."
>>   /usr/sbin/collectd -C /etc/collectd.conf
>> -- 
>> 2.39.2
>> 
>> 
> 
> -- 
> Dit bericht is gescanned op virussen en andere gevaarlijke
> inhoud door MailScanner en lijkt schoon te zijn.
>
  
Robin Roevens Jan. 31, 2024, 11:50 p.m. UTC | #3
Hi Michael

Thanks for the clear explanation. And indeed I can imagine sync locking
up on a stale NFS mount when it still needs data written to it..I
didn't consider that as I don't have NFS mounts on my ipfire instance,
but other users may indeed have. It does however fits in my "hardware
problem" scenario, as sync won't get an ACK from the stale NFS mount..
Only networking errors may be more frequent of a problem than actual
hardware failure, increasing the chances that sync indeed locks up.

And it is as you say, a journaling fs should not have a problem when no
manual syncs are performed during 'crucial' operations.

The usb flash drive story is new for me, but now that I read your
explanation, I now understand why I sometimes have trouble with corrupt
usb sticks where I just wrote a boot image to and then restart the
machine to boot from that stick to find out it doesn't want to boot due
to a corrupt root partition. Now I know exactly why this sometimes
happens :-)

Always happy to learn! Thanks

Regards
Robin

Michael Tremer schreef op wo 31-01-2024 om 10:54 [+0000]:
> Hello Robin,
> 
> > On 30 Jan 2024, at 19:50, Robin Roevens <robin.roevens@disroot.org>
> > wrote:
> > 
> > Hi Michael, all
> > 
> > First, this mail does not contain critique, I'm only trying to
> > learn,
> > if I'm wrong. But I think your remark about fixing hardware is not
> > correct?
> > 
> > In my understanding, sync synchronizes the not-yet-written data
> > currently in kernel managed cache memory (hence in volatile system
> > memory) to permanent storage. 
> 
> Yes.
> 
> > Normally storage hardware also has its own cache, which I presume
> > you
> > are referring to, but I don't think sync cares or even knows about
> > that
> > and that is/should be handled within the hardware itself. 
> > As soon as storage hardware returns an ACK on a write operation,
> > sync
> > is happy. Not actually guaranteeing any physical write, only that
> > all
> > to-be-written data is actually delivered to the storage hardware.
> 
> Well, this is all a little bit more complicated, because it massively
> depends on the type of hardware we are talking about…
> 
> A classic example for a piece of hardware with a cache is a RAID
> controller. When we call sync(), the kernel sends all data to that
> RAID controller and as soon as the last operation has been confirmed,
> sync() is considered done. That does not mean that any of that data
> has actually been written to any physical storage device. RAID
> controllers have some temporary caches that are persistent and so
> that data will not be lost even if there was a power outage
> immediately after the sync().
> 
> Then there is plain old hard drives. They traditionally did not use
> to be very smart. So if the kernel wanted to write something, it
> would have been written straight to disk and once that was done,
> sync() would have returned.
> 
> Those two scenarios are not a problem at all.
> 
> Then there is the more problematic type of hardware which is usually
> (cheap?) flash storage. Those devices very often have a faster cache
> and then very slow persistent storage. To pretend that a device is
> much faster than it actually is, blocks are being written to a
> volatile cache and later written to the persistent storage, but there
> is no power source in case of a sudden loss of power.
> 
> We discovered this with a regular reboot where we also call sync to
> make sure that everything is properly written to disk and then we
> make the system reboot. Those devices confirm that everything is
> written when it isn’t but then the system cycles power and the last
> blocks are gone. That will result in a corrupt file system and those
> systems will perform a filesystem check (and usually repair too) at
> the next boot.
> 
> The visual way to see this phenomenon is a USB stick with a USB
> light. Calling sync() will return but the light will continue
> flashing because the device is actually busy, but has told the OS
> that it is already done. And hence we have this problem...
> 
> > I do agree that a (full) sync there is probably overkill. The
> > content
> > of the config will be available to the collectd daemon whether it
> > is
> > actually written on disk or still in cache, so no sync needed. 
> > If the server crashes/powercycles/... on that point, I think a
> > possible
> > commented line that should be uncommented in that config is then
> > probably of the least concern.
> 
> On top of all of this comes that the “flash image” that we ship used
> to come with no journal. In a journaling file system, a loss of power
> would either result in the old version of that file, or the new one -
> depending on how far the writes have come. Neither of that is a
> problem in our scenario. Without the journal, it is quite likely to
> have a broken file.
> 
> > On the other hand, a sync that blocks forever, is probably more of
> > an
> > indicator that the hardware should be fixed, as this would mean
> > sync
> > doesn't get all the ACK's it expects.
> 
> It’s not hardware. The system I was dealing with had an NFS volume
> mounted which was unavailable for a moment. That caused the entire
> dial-in sequence to hang because collectd was waiting for a sync() to
> finish just after reconnecting.
> 
> So, calling sync() is generally a very bad idea. It works around bugs
> that should not be there in the first place, but is just creating
> some fancy race condition where you increase chances that no data
> will be lost, but it will never be anywhere acceptable for me.
> 
> Ergo, don’t buy cheap flash. Use a journaling file system. If your
> flash dies because you are writing to it it was not fit for purpose.
> 
> As you can see I am very annoyed about these things because over all
> those years they have cost us so much time to debug and “fix” and it
> simply is not worth it…
> 
> > Please correct me if I'm wrong.
> 
> I hope this clears it up :)
> 
> -Michael
> 
> > 
> > 
> > Regards
> > Robin
> > 
> > Michael Tremer schreef op di 30-01-2024 om 18:01 [+0000]:
> > > Calling a global sync operation manually is generally a bad idea
> > > as
> > > it
> > > can block for forever. If people have storage that does not
> > > retain
> > > anything that is being written to it, they need to fix their
> > > hardware.
> > > 
> > > Signed-off-by: Michael Tremer <michael.tremer@ipfire.org>
> > > ---
> > >  src/initscripts/system/collectd | 3 ---
> > >  1 file changed, 3 deletions(-)
> > > 
> > > diff --git a/src/initscripts/system/collectd
> > > b/src/initscripts/system/collectd
> > > index bb8a2f54f..56b799d56 100644
> > > --- a/src/initscripts/system/collectd
> > > +++ b/src/initscripts/system/collectd
> > > @@ -146,9 +146,6 @@ case "$1" in
> > >   sed -i -e "s|^#LoadPlugin swap|LoadPlugin
> > > swap|g" /etc/collectd.conf
> > >   fi
> > >  
> > > - # sync after config update...
> > > - sync
> > > -
> > >   if [ $(date +%Y) -gt 2011 ]; then
> > >   boot_mesg "Starting Collection daemon..."
> > >   /usr/sbin/collectd -C /etc/collectd.conf
> > > -- 
> > > 2.39.2
> > > 
> > > 
> > 
> > -- 
> > Dit bericht is gescanned op virussen en andere gevaarlijke
> > inhoud door MailScanner en lijkt schoon te zijn.
> > 
> 
>
  
Michael Tremer Feb. 1, 2024, 4:32 p.m. UTC | #4
Hello,

> On 1 Feb 2024, at 00:50, Robin Roevens <robin.roevens@disroot.org> wrote:
> 
> Hi Michael
> 
> Thanks for the clear explanation. And indeed I can imagine sync locking
> up on a stale NFS mount when it still needs data written to it..I
> didn't consider that as I don't have NFS mounts on my ipfire instance,
> but other users may indeed have. It does however fits in my "hardware
> problem" scenario, as sync won't get an ACK from the stale NFS mount..
> Only networking errors may be more frequent of a problem than actual
> hardware failure, increasing the chances that sync indeed locks up.

I am actually doing something a little bit more perverted on one system where I am mounting a large file that is on an NFS share with an ext4 file system (because NFS lacks certain features). So hence the locking issues.

> And it is as you say, a journaling fs should not have a problem when no
> manual syncs are performed during 'crucial' operations.

Should, but if the FS cannot rely on the data actually being written when it gets that sort if feedback, then there is only hope left.

> The usb flash drive story is new for me, but now that I read your
> explanation, I now understand why I sometimes have trouble with corrupt
> usb sticks where I just wrote a boot image to and then restart the
> machine to boot from that stick to find out it doesn't want to boot due
> to a corrupt root partition. Now I know exactly why this sometimes
> happens :-)

We have (used to have?) a couple of hacks in there. But really bad SSDs have kind of gone now. Even the not so bad ones don’t cost much these days… So hopefully we won’t have to think about these things for that much longer.

-Michael

> Always happy to learn! Thanks
> 
> Regards
> Robin
> 
> Michael Tremer schreef op wo 31-01-2024 om 10:54 [+0000]:
>> Hello Robin,
>> 
>>> On 30 Jan 2024, at 19:50, Robin Roevens <robin.roevens@disroot.org>
>>> wrote:
>>> 
>>> Hi Michael, all
>>> 
>>> First, this mail does not contain critique, I'm only trying to
>>> learn,
>>> if I'm wrong. But I think your remark about fixing hardware is not
>>> correct?
>>> 
>>> In my understanding, sync synchronizes the not-yet-written data
>>> currently in kernel managed cache memory (hence in volatile system
>>> memory) to permanent storage.
>> 
>> Yes.
>> 
>>> Normally storage hardware also has its own cache, which I presume
>>> you
>>> are referring to, but I don't think sync cares or even knows about
>>> that
>>> and that is/should be handled within the hardware itself. 
>>> As soon as storage hardware returns an ACK on a write operation,
>>> sync
>>> is happy. Not actually guaranteeing any physical write, only that
>>> all
>>> to-be-written data is actually delivered to the storage hardware.
>> 
>> Well, this is all a little bit more complicated, because it massively
>> depends on the type of hardware we are talking about…
>> 
>> A classic example for a piece of hardware with a cache is a RAID
>> controller. When we call sync(), the kernel sends all data to that
>> RAID controller and as soon as the last operation has been confirmed,
>> sync() is considered done. That does not mean that any of that data
>> has actually been written to any physical storage device. RAID
>> controllers have some temporary caches that are persistent and so
>> that data will not be lost even if there was a power outage
>> immediately after the sync().
>> 
>> Then there is plain old hard drives. They traditionally did not use
>> to be very smart. So if the kernel wanted to write something, it
>> would have been written straight to disk and once that was done,
>> sync() would have returned.
>> 
>> Those two scenarios are not a problem at all.
>> 
>> Then there is the more problematic type of hardware which is usually
>> (cheap?) flash storage. Those devices very often have a faster cache
>> and then very slow persistent storage. To pretend that a device is
>> much faster than it actually is, blocks are being written to a
>> volatile cache and later written to the persistent storage, but there
>> is no power source in case of a sudden loss of power.
>> 
>> We discovered this with a regular reboot where we also call sync to
>> make sure that everything is properly written to disk and then we
>> make the system reboot. Those devices confirm that everything is
>> written when it isn’t but then the system cycles power and the last
>> blocks are gone. That will result in a corrupt file system and those
>> systems will perform a filesystem check (and usually repair too) at
>> the next boot.
>> 
>> The visual way to see this phenomenon is a USB stick with a USB
>> light. Calling sync() will return but the light will continue
>> flashing because the device is actually busy, but has told the OS
>> that it is already done. And hence we have this problem...
>> 
>>> I do agree that a (full) sync there is probably overkill. The
>>> content
>>> of the config will be available to the collectd daemon whether it
>>> is
>>> actually written on disk or still in cache, so no sync needed. 
>>> If the server crashes/powercycles/... on that point, I think a
>>> possible
>>> commented line that should be uncommented in that config is then
>>> probably of the least concern.
>> 
>> On top of all of this comes that the “flash image” that we ship used
>> to come with no journal. In a journaling file system, a loss of power
>> would either result in the old version of that file, or the new one -
>> depending on how far the writes have come. Neither of that is a
>> problem in our scenario. Without the journal, it is quite likely to
>> have a broken file.
>> 
>>> On the other hand, a sync that blocks forever, is probably more of
>>> an
>>> indicator that the hardware should be fixed, as this would mean
>>> sync
>>> doesn't get all the ACK's it expects.
>> 
>> It’s not hardware. The system I was dealing with had an NFS volume
>> mounted which was unavailable for a moment. That caused the entire
>> dial-in sequence to hang because collectd was waiting for a sync() to
>> finish just after reconnecting.
>> 
>> So, calling sync() is generally a very bad idea. It works around bugs
>> that should not be there in the first place, but is just creating
>> some fancy race condition where you increase chances that no data
>> will be lost, but it will never be anywhere acceptable for me.
>> 
>> Ergo, don’t buy cheap flash. Use a journaling file system. If your
>> flash dies because you are writing to it it was not fit for purpose.
>> 
>> As you can see I am very annoyed about these things because over all
>> those years they have cost us so much time to debug and “fix” and it
>> simply is not worth it…
>> 
>>> Please correct me if I'm wrong.
>> 
>> I hope this clears it up :)
>> 
>> -Michael
>> 
>>> 
>>> 
>>> Regards
>>> Robin
>>> 
>>> Michael Tremer schreef op di 30-01-2024 om 18:01 [+0000]:
>>>> Calling a global sync operation manually is generally a bad idea
>>>> as
>>>> it
>>>> can block for forever. If people have storage that does not
>>>> retain
>>>> anything that is being written to it, they need to fix their
>>>> hardware.
>>>> 
>>>> Signed-off-by: Michael Tremer <michael.tremer@ipfire.org>
>>>> ---
>>>>  src/initscripts/system/collectd | 3 ---
>>>>  1 file changed, 3 deletions(-)
>>>> 
>>>> diff --git a/src/initscripts/system/collectd
>>>> b/src/initscripts/system/collectd
>>>> index bb8a2f54f..56b799d56 100644
>>>> --- a/src/initscripts/system/collectd
>>>> +++ b/src/initscripts/system/collectd
>>>> @@ -146,9 +146,6 @@ case "$1" in
>>>>   sed -i -e "s|^#LoadPlugin swap|LoadPlugin
>>>> swap|g" /etc/collectd.conf
>>>>   fi
>>>>  
>>>> - # sync after config update...
>>>> - sync
>>>> -
>>>>   if [ $(date +%Y) -gt 2011 ]; then
>>>>   boot_mesg "Starting Collection daemon..."
>>>>   /usr/sbin/collectd -C /etc/collectd.conf
>>>> -- 
>>>> 2.39.2
>>>> 
>>>> 
>>> 
>>> -- 
>>> Dit bericht is gescanned op virussen en andere gevaarlijke
>>> inhoud door MailScanner en lijkt schoon te zijn.
>>> 
>> 
>> 
> 
> -- 
> Dit bericht is gescanned op virussen en andere gevaarlijke
> inhoud door MailScanner en lijkt schoon te zijn.
>
  

Patch

diff --git a/src/initscripts/system/collectd b/src/initscripts/system/collectd
index bb8a2f54f..56b799d56 100644
--- a/src/initscripts/system/collectd
+++ b/src/initscripts/system/collectd
@@ -146,9 +146,6 @@  case "$1" in
 			sed -i -e "s|^#LoadPlugin swap|LoadPlugin swap|g" /etc/collectd.conf
 		fi
 
-		# sync after config update...
-		sync
-
 		if [ $(date +%Y) -gt 2011 ]; then
 			boot_mesg "Starting Collection daemon..."
 			/usr/sbin/collectd -C /etc/collectd.conf