[1/2] header.pl: Add utf-8 handling into cleanhtml command

Message ID 20240617111236.2926-1-adolf.belka@ipfire.org
State Staged
Commit eef090c3de39f6a625a07f44d6f133729dd165fd
Series [1/2] header.pl: Add utf-8 handling into cleanhtml command

Commit Message

Adolf Belka June 17, 2024, 11:12 a.m. UTC
- The existing cleanhtml command does not handle diacritical characters such as umlauts, acute,
   grave and circumflex accents.
- In bug 12395 the problem was resolved by adding decode before and encode after the
   cleanhtml command in dns.cgi
- Suggestion from @Michael Tremer was to add the decode and encode sections into the
   actual cleanhtml subroutine in header.pl
- This patch submission is the execution of that suggestion.
- This will ensure that whenever cleanhtml is used for any remark in a WUI page it will
   handle diacritical characters.
- Tested out on my vm testbed system and confirmed to be working when cleanhtml has the
   encode and decode lines.
- Combined with this patch is another one that changes the dns.cgi to remove the decode
   and encode entries added into the cgi code.

Suggested-by: Michael Tremer <michael.tremer@ipfire.org>
Tested-by: Adolf Belka <adolf.belka@ipfire.org>
Signed-off-by: Adolf Belka <adolf.belka@ipfire.org>
---
 config/cfgroot/header.pl | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
  

Comments

Adolf Belka July 17, 2024, 8:58 a.m. UTC | #1
I have tested this patch set on all the menu items that have remarks and use cleanhtml in Core Update 187 Testing.

All remarks accepted the full range of diacritical characters.
I tested with

ß Ф Ч < > Ӧ ü £ μ ô ò ó õ å ä ã â á à

as the remark and it worked everywhere tested, which it did not without this patch set.
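
For reference, the decode / escape / encode sequence can be sketched in isolation as below. This is only a minimal sketch, not the code in header.pl: cleanhtml_sketch is a hypothetical name, and it assumes that escape is a thin wrapper around HTML::Entities::encode_entities with the default set of unsafe characters. The sketch keeps the value returned by the escaping step.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();
use HTML::Entities ();

# Hypothetical stand-in for cleanhtml. It calls encode_entities directly,
# which is an assumption about what the real escape helper does.
sub cleanhtml_sketch {
	my ($text, $keep_commas) = @_;

	# Same comma handling as in the patch context.
	$text =~ tr/,/ / if not defined $keep_commas or $keep_commas ne 'y';

	# Bytes -> characters, so a multi-byte UTF-8 sequence becomes one character.
	my $chars = Encode::decode("UTF-8", $text);

	# Escape the text and keep the returned copy.
	my $escaped = HTML::Entities::encode_entities($chars);

	# Characters -> bytes again before the text is written out.
	return Encode::encode("UTF-8", $escaped);
}

# "ü < ô" as raw UTF-8 bytes, roughly as a CGI script would receive it.
my $remark = "\xC3\xBC < \xC3\xB4";
print cleanhtml_sketch($remark), "\n";
```

Without the decode step, each byte of a multi-byte character would be escaped on its own, which is consistent with the broken remarks seen before this patch set.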

Regards,
Adolf.

On 17/06/2024 13:12, Adolf Belka wrote:
> - The existing cleanhtml command does not handle diacritical characters such as umlauts, acute,
>     grave and circumflex accents.
> - In bug 12395 the problem was resolved by adding decode before and encode after the
>     cleanhtml command in dns.cgi
> - Suggestion from @Michael Tremer was to add the decode and encode sections into the
>     actual cleanhtml subroutine in header.pl
> - This patch submission is the execution of that suggestion.
> - This will ensure that whenever cleanhtml is used for any remark in a WUI page it will
>     handle diacritical characters.
> - Tested out on my vm testbed system and confirmed to be working when cleanhtml has the
>     encode and decode lines.
> - Combined with this patch is another one that changes the dns.cgi to remove the decode
>     and encode entries added into the cgi code.
> 
> Suggested-by: Michael Tremer <michael.tremer@ipfire.org>
> Tested-by: Adolf Belka <adolf.belka@ipfire.org>
> Signed-off-by: Adolf Belka <adolf.belka@ipfire.org>
> ---
>   config/cfgroot/header.pl | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/config/cfgroot/header.pl b/config/cfgroot/header.pl
> index a67ff92ee..66b49e411 100644
> --- a/config/cfgroot/header.pl
> +++ b/config/cfgroot/header.pl
> @@ -16,6 +16,7 @@ use File::Basename;
>   use HTML::Entities();
>   use Socket;
>   use Time::Local;
> +use Encode;
>   
>   our %color = ();
>   &General::readhash("/srv/web/ipfire/html/themes/ipfire/include/colors.txt", \%color);
> @@ -365,8 +366,13 @@ sub escape($) {
>   sub cleanhtml {
>   	my $outstring =$_[0];
>   	$outstring =~ tr/,/ / if not defined $_[1] or $_[1] ne 'y';
> -
> -	return escape($outstring);
> +	# decode the UTF-8 text so that characters with diacritical marks such as
> +	# umlauts are treated correctly by the escape command
> +	$outstring = &Encode::decode("UTF-8",$outstring);
> +	$outstring = escape($outstring);
> +	# encode the text back to UTF-8 after running the escape command
> +	$outstring = &Encode::encode("UTF-8",$outstring);
> +	return $outstring;
>   }
>   
>   sub connectionstatus
  

Patch

diff --git a/config/cfgroot/header.pl b/config/cfgroot/header.pl
index a67ff92ee..66b49e411 100644
--- a/config/cfgroot/header.pl
+++ b/config/cfgroot/header.pl
@@ -16,6 +16,7 @@  use File::Basename;
 use HTML::Entities();
 use Socket;
 use Time::Local;
+use Encode;
 
 our %color = ();
 &General::readhash("/srv/web/ipfire/html/themes/ipfire/include/colors.txt", \%color);
@@ -365,8 +366,13 @@  sub escape($) {
 sub cleanhtml {
 	my $outstring =$_[0];
 	$outstring =~ tr/,/ / if not defined $_[1] or $_[1] ne 'y';
-
-	return escape($outstring);
+	# decode the UTF-8 text so that characters with diacritical marks such as
+	# umlauts are treated correctly by the escape command
+	$outstring = &Encode::decode("UTF-8",$outstring);
+	$outstring = escape($outstring);
+	# encode the text back to UTF-8 after running the escape command
+	$outstring = &Encode::encode("UTF-8",$outstring);
+	return $outstring;
 }
 
 sub connectionstatus