location-importer.in: process unaligned IP ranges in RIR data files correctly

Message ID 8d42901f-e1be-285f-2060-639218e2b694@ipfire.org
State Accepted
Commit 1814283b82e479f60c0871f0518593f1c6fd4b87
Headers
Series location-importer.in: process unaligned IP ranges in RIR data files correctly |

Commit Message

Peter Müller March 29, 2021, 8:24 p.m. UTC
  The IP range given in an inetnum object apparently not necessarily
matches distinct subnet boundaries. As a result, the current attempt to
calculate its CIDR mask resulted in faulty subnets not covering the
entire IP range.

This patch leaves the task of enumerating subnets to the ipaddress
module itself, which handles things much more robust. Since the output
may contain of several subnets, a list for the inetnum key is necessary
as well as a loop over them when conducting the SQL statements.

Fixes: #12595

Cc: Michael Tremer <michael.tremer@ipfire.org>
Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
---
 src/python/location-importer.in | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)
  

Comments

Michael Tremer March 29, 2021, 8:27 p.m. UTC | #1
Thank you for this.

Are there any other things coming or can I go ahead and tag another version to roll these changes out into production?

-Michael

> On 29 Mar 2021, at 21:24, Peter Müller <peter.mueller@ipfire.org> wrote:
> 
> The IP range given in an inetnum object apparently not necessarily
> matches distinct subnet boundaries. As a result, the current attempt to
> calculate its CIDR mask resulted in faulty subnets not covering the
> entire IP range.
> 
> This patch leaves the task of enumerating subnets to the ipaddress
> module itself, which handles things much more robust. Since the output
> may contain of several subnets, a list for the inetnum key is necessary
> as well as a loop over them when conducting the SQL statements.
> 
> Fixes: #12595
> 
> Cc: Michael Tremer <michael.tremer@ipfire.org>
> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
> ---
> src/python/location-importer.in | 31 +++++++++++--------------------
> 1 file changed, 11 insertions(+), 20 deletions(-)
> 
> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
> index 2506925..e2f201b 100644
> --- a/src/python/location-importer.in
> +++ b/src/python/location-importer.in
> @@ -3,7 +3,7 @@
> #                                                                             #
> # libloc - A library to determine the location of someone on the Internet     #
> #                                                                             #
> -# Copyright (C) 2020 IPFire Development Team <info@ipfire.org>                #
> +# Copyright (C) 2020-2021 IPFire Development Team <info@ipfire.org>           #
> #                                                                             #
> # This library is free software; you can redistribute it and/or               #
> # modify it under the terms of the GNU Lesser General Public                  #
> @@ -604,18 +604,10 @@ class CLI(object):
> 					log.warning("Could not parse line: %s" % line)
> 					return
> 
> -				# Set prefix to default
> -				prefix = 32
> -
> -				# Count number of addresses in this subnet
> -				num_addresses = int(end_address) - int(start_address)
> -				if num_addresses:
> -					prefix -= math.log(num_addresses, 2)
> -
> -				inetnum["inetnum"] = "%s/%.0f" % (start_address, prefix)
> +				inetnum["inetnum"] = list(ipaddress.summarize_address_range(start_address, end_address))
> 
> 			elif key == "inet6num":
> -				inetnum[key] = val
> +				inetnum[key] = [ipaddress.ip_network(val, strict=False)]
> 
> 			elif key == "country":
> 				inetnum[key] = val.upper()
> @@ -630,15 +622,14 @@ class CLI(object):
> 				(inetnum.get("inet6num") or inetnum.get("inetnum")))
> 			return
> 
> -		network = ipaddress.ip_network(inetnum.get("inet6num") or inetnum.get("inetnum"), strict=False)
> -
> -		if not self._check_parsed_network(network):
> -			return
> -
> -		self.db.execute("INSERT INTO _rirdata(network, country) \
> -			VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
> -			"%s" % network, inetnum.get("country"),
> -		)
> +		# Iterate through all networks enumerated from above, check them for plausibility and insert
> +		# them into the database, if _check_parsed_network() succeeded
> +		for single_network in inetnum.get("inet6num") or inetnum.get("inetnum"):
> +			if self._check_parsed_network(single_network):
> +				self.db.execute("INSERT INTO _rirdata(network, country) \
> +					VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
> +					"%s" % single_network, inetnum.get("country"),
> +				)
> 
> 	def _parse_org_block(self, block):
> 		org = {}
> -- 
> 2.26.2
  
Peter Müller March 29, 2021, 8:32 p.m. UTC | #2
Hello Michael,

you're welcome.

Well, #11754 and #12594 would be the next issues on my list, but I have no working code for them, yet.

Thanks, and best regards,
Peter Müller


> Thank you for this.
> 
> Are there any other things coming or can I go ahead and tag another version to roll these changes out into production?
> 
> -Michael
> 
>> On 29 Mar 2021, at 21:24, Peter Müller <peter.mueller@ipfire.org> wrote:
>>
>> The IP range given in an inetnum object apparently not necessarily
>> matches distinct subnet boundaries. As a result, the current attempt to
>> calculate its CIDR mask resulted in faulty subnets not covering the
>> entire IP range.
>>
>> This patch leaves the task of enumerating subnets to the ipaddress
>> module itself, which handles things much more robust. Since the output
>> may contain of several subnets, a list for the inetnum key is necessary
>> as well as a loop over them when conducting the SQL statements.
>>
>> Fixes: #12595
>>
>> Cc: Michael Tremer <michael.tremer@ipfire.org>
>> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
>> ---
>> src/python/location-importer.in | 31 +++++++++++--------------------
>> 1 file changed, 11 insertions(+), 20 deletions(-)
>>
>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>> index 2506925..e2f201b 100644
>> --- a/src/python/location-importer.in
>> +++ b/src/python/location-importer.in
>> @@ -3,7 +3,7 @@
>> #                                                                             #
>> # libloc - A library to determine the location of someone on the Internet     #
>> #                                                                             #
>> -# Copyright (C) 2020 IPFire Development Team <info@ipfire.org>                #
>> +# Copyright (C) 2020-2021 IPFire Development Team <info@ipfire.org>           #
>> #                                                                             #
>> # This library is free software; you can redistribute it and/or               #
>> # modify it under the terms of the GNU Lesser General Public                  #
>> @@ -604,18 +604,10 @@ class CLI(object):
>> 					log.warning("Could not parse line: %s" % line)
>> 					return
>>
>> -				# Set prefix to default
>> -				prefix = 32
>> -
>> -				# Count number of addresses in this subnet
>> -				num_addresses = int(end_address) - int(start_address)
>> -				if num_addresses:
>> -					prefix -= math.log(num_addresses, 2)
>> -
>> -				inetnum["inetnum"] = "%s/%.0f" % (start_address, prefix)
>> +				inetnum["inetnum"] = list(ipaddress.summarize_address_range(start_address, end_address))
>>
>> 			elif key == "inet6num":
>> -				inetnum[key] = val
>> +				inetnum[key] = [ipaddress.ip_network(val, strict=False)]
>>
>> 			elif key == "country":
>> 				inetnum[key] = val.upper()
>> @@ -630,15 +622,14 @@ class CLI(object):
>> 				(inetnum.get("inet6num") or inetnum.get("inetnum")))
>> 			return
>>
>> -		network = ipaddress.ip_network(inetnum.get("inet6num") or inetnum.get("inetnum"), strict=False)
>> -
>> -		if not self._check_parsed_network(network):
>> -			return
>> -
>> -		self.db.execute("INSERT INTO _rirdata(network, country) \
>> -			VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
>> -			"%s" % network, inetnum.get("country"),
>> -		)
>> +		# Iterate through all networks enumerated from above, check them for plausibility and insert
>> +		# them into the database, if _check_parsed_network() succeeded
>> +		for single_network in inetnum.get("inet6num") or inetnum.get("inetnum"):
>> +			if self._check_parsed_network(single_network):
>> +				self.db.execute("INSERT INTO _rirdata(network, country) \
>> +					VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
>> +					"%s" % single_network, inetnum.get("country"),
>> +				)
>>
>> 	def _parse_org_block(self, block):
>> 		org = {}
>> -- 
>> 2.26.2
>
  
Peter Müller March 29, 2021, 8:34 p.m. UTC | #3
By the way: https://patchwork.ipfire.org/patch/3620/ is still waiting for a decision of yours. :-)

> Hello Michael,
> 
> you're welcome.
> 
> Well, #11754 and #12594 would be the next issues on my list, but I have no working code for them, yet.
> 
> Thanks, and best regards,
> Peter Müller
> 
> 
>> Thank you for this.
>>
>> Are there any other things coming or can I go ahead and tag another version to roll these changes out into production?
>>
>> -Michael
>>
>>> On 29 Mar 2021, at 21:24, Peter Müller <peter.mueller@ipfire.org> wrote:
>>>
>>> The IP range given in an inetnum object apparently not necessarily
>>> matches distinct subnet boundaries. As a result, the current attempt to
>>> calculate its CIDR mask resulted in faulty subnets not covering the
>>> entire IP range.
>>>
>>> This patch leaves the task of enumerating subnets to the ipaddress
>>> module itself, which handles things much more robust. Since the output
>>> may contain of several subnets, a list for the inetnum key is necessary
>>> as well as a loop over them when conducting the SQL statements.
>>>
>>> Fixes: #12595
>>>
>>> Cc: Michael Tremer <michael.tremer@ipfire.org>
>>> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
>>> ---
>>> src/python/location-importer.in | 31 +++++++++++--------------------
>>> 1 file changed, 11 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>>> index 2506925..e2f201b 100644
>>> --- a/src/python/location-importer.in
>>> +++ b/src/python/location-importer.in
>>> @@ -3,7 +3,7 @@
>>> #                                                                             #
>>> # libloc - A library to determine the location of someone on the Internet     #
>>> #                                                                             #
>>> -# Copyright (C) 2020 IPFire Development Team <info@ipfire.org>                #
>>> +# Copyright (C) 2020-2021 IPFire Development Team <info@ipfire.org>           #
>>> #                                                                             #
>>> # This library is free software; you can redistribute it and/or               #
>>> # modify it under the terms of the GNU Lesser General Public                  #
>>> @@ -604,18 +604,10 @@ class CLI(object):
>>> 					log.warning("Could not parse line: %s" % line)
>>> 					return
>>>
>>> -				# Set prefix to default
>>> -				prefix = 32
>>> -
>>> -				# Count number of addresses in this subnet
>>> -				num_addresses = int(end_address) - int(start_address)
>>> -				if num_addresses:
>>> -					prefix -= math.log(num_addresses, 2)
>>> -
>>> -				inetnum["inetnum"] = "%s/%.0f" % (start_address, prefix)
>>> +				inetnum["inetnum"] = list(ipaddress.summarize_address_range(start_address, end_address))
>>>
>>> 			elif key == "inet6num":
>>> -				inetnum[key] = val
>>> +				inetnum[key] = [ipaddress.ip_network(val, strict=False)]
>>>
>>> 			elif key == "country":
>>> 				inetnum[key] = val.upper()
>>> @@ -630,15 +622,14 @@ class CLI(object):
>>> 				(inetnum.get("inet6num") or inetnum.get("inetnum")))
>>> 			return
>>>
>>> -		network = ipaddress.ip_network(inetnum.get("inet6num") or inetnum.get("inetnum"), strict=False)
>>> -
>>> -		if not self._check_parsed_network(network):
>>> -			return
>>> -
>>> -		self.db.execute("INSERT INTO _rirdata(network, country) \
>>> -			VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
>>> -			"%s" % network, inetnum.get("country"),
>>> -		)
>>> +		# Iterate through all networks enumerated from above, check them for plausibility and insert
>>> +		# them into the database, if _check_parsed_network() succeeded
>>> +		for single_network in inetnum.get("inet6num") or inetnum.get("inetnum"):
>>> +			if self._check_parsed_network(single_network):
>>> +				self.db.execute("INSERT INTO _rirdata(network, country) \
>>> +					VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
>>> +					"%s" % single_network, inetnum.get("country"),
>>> +				)
>>>
>>> 	def _parse_org_block(self, block):
>>> 		org = {}
>>> -- 
>>> 2.26.2
>>
  
Michael Tremer March 29, 2021, 8:40 p.m. UTC | #4
Hello,

I was looking for this one, but could not find it.

It doesn’t apply. Would you like to rebase this to master and submit it again?

-Michael

P.S. Still unsure whether I should wait or not :)

> On 29 Mar 2021, at 21:34, Peter Müller <peter.mueller@ipfire.org> wrote:
> 
> By the way: https://patchwork.ipfire.org/patch/3620/ is still waiting for a decision of yours. :-)
> 
>> Hello Michael,
>> 
>> you're welcome.
>> 
>> Well, #11754 and #12594 would be the next issues on my list, but I have no working code for them, yet.
>> 
>> Thanks, and best regards,
>> Peter Müller
>> 
>> 
>>> Thank you for this.
>>> 
>>> Are there any other things coming or can I go ahead and tag another version to roll these changes out into production?
>>> 
>>> -Michael
>>> 
>>>> On 29 Mar 2021, at 21:24, Peter Müller <peter.mueller@ipfire.org> wrote:
>>>> 
>>>> The IP range given in an inetnum object apparently not necessarily
>>>> matches distinct subnet boundaries. As a result, the current attempt to
>>>> calculate its CIDR mask resulted in faulty subnets not covering the
>>>> entire IP range.
>>>> 
>>>> This patch leaves the task of enumerating subnets to the ipaddress
>>>> module itself, which handles things much more robust. Since the output
>>>> may contain of several subnets, a list for the inetnum key is necessary
>>>> as well as a loop over them when conducting the SQL statements.
>>>> 
>>>> Fixes: #12595
>>>> 
>>>> Cc: Michael Tremer <michael.tremer@ipfire.org>
>>>> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
>>>> ---
>>>> src/python/location-importer.in | 31 +++++++++++--------------------
>>>> 1 file changed, 11 insertions(+), 20 deletions(-)
>>>> 
>>>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>>>> index 2506925..e2f201b 100644
>>>> --- a/src/python/location-importer.in
>>>> +++ b/src/python/location-importer.in
>>>> @@ -3,7 +3,7 @@
>>>> #                                                                             #
>>>> # libloc - A library to determine the location of someone on the Internet     #
>>>> #                                                                             #
>>>> -# Copyright (C) 2020 IPFire Development Team <info@ipfire.org>                #
>>>> +# Copyright (C) 2020-2021 IPFire Development Team <info@ipfire.org>           #
>>>> #                                                                             #
>>>> # This library is free software; you can redistribute it and/or               #
>>>> # modify it under the terms of the GNU Lesser General Public                  #
>>>> @@ -604,18 +604,10 @@ class CLI(object):
>>>> 					log.warning("Could not parse line: %s" % line)
>>>> 					return
>>>> 
>>>> -				# Set prefix to default
>>>> -				prefix = 32
>>>> -
>>>> -				# Count number of addresses in this subnet
>>>> -				num_addresses = int(end_address) - int(start_address)
>>>> -				if num_addresses:
>>>> -					prefix -= math.log(num_addresses, 2)
>>>> -
>>>> -				inetnum["inetnum"] = "%s/%.0f" % (start_address, prefix)
>>>> +				inetnum["inetnum"] = list(ipaddress.summarize_address_range(start_address, end_address))
>>>> 
>>>> 			elif key == "inet6num":
>>>> -				inetnum[key] = val
>>>> +				inetnum[key] = [ipaddress.ip_network(val, strict=False)]
>>>> 
>>>> 			elif key == "country":
>>>> 				inetnum[key] = val.upper()
>>>> @@ -630,15 +622,14 @@ class CLI(object):
>>>> 				(inetnum.get("inet6num") or inetnum.get("inetnum")))
>>>> 			return
>>>> 
>>>> -		network = ipaddress.ip_network(inetnum.get("inet6num") or inetnum.get("inetnum"), strict=False)
>>>> -
>>>> -		if not self._check_parsed_network(network):
>>>> -			return
>>>> -
>>>> -		self.db.execute("INSERT INTO _rirdata(network, country) \
>>>> -			VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
>>>> -			"%s" % network, inetnum.get("country"),
>>>> -		)
>>>> +		# Iterate through all networks enumerated from above, check them for plausibility and insert
>>>> +		# them into the database, if _check_parsed_network() succeeded
>>>> +		for single_network in inetnum.get("inet6num") or inetnum.get("inetnum"):
>>>> +			if self._check_parsed_network(single_network):
>>>> +				self.db.execute("INSERT INTO _rirdata(network, country) \
>>>> +					VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
>>>> +					"%s" % single_network, inetnum.get("country"),
>>>> +				)
>>>> 
>>>> 	def _parse_org_block(self, block):
>>>> 		org = {}
>>>> -- 
>>>> 2.26.2
>>>
  
Peter Müller March 30, 2021, 3:49 p.m. UTC | #5
Hello Michael,

thank you for your reply.

Here you are: https://patchwork.ipfire.org/patch/4005/

Aside from that, there are still 8 patches left on https://patchwork.ipfire.org/project/location/list/.
Perhaps you might want to check these as well before tagging a new release.

#11754 and #12594 won't be ready that soon, so I am fine with a new libloc version after the patches
mentioned above have been checked on whether they are ready for merging them.

Thanks, and best regards,
Peter Müller


> Hello,
> 
> I was looking for this one, but could not find it.
> 
> It doesn’t apply. Would you like to rebase this to master and submit it again?
> 
> -Michael
> 
> P.S. Still unsure whether I should wait or not :)
> 
>> On 29 Mar 2021, at 21:34, Peter Müller <peter.mueller@ipfire.org> wrote:
>>
>> By the way: https://patchwork.ipfire.org/patch/3620/ is still waiting for a decision of yours. :-)
>>
>>> Hello Michael,
>>>
>>> you're welcome.
>>>
>>> Well, #11754 and #12594 would be the next issues on my list, but I have no working code for them, yet.
>>>
>>> Thanks, and best regards,
>>> Peter Müller
>>>
>>>
>>>> Thank you for this.
>>>>
>>>> Are there any other things coming or can I go ahead and tag another version to roll these changes out into production?
>>>>
>>>> -Michael
>>>>
>>>>> On 29 Mar 2021, at 21:24, Peter Müller <peter.mueller@ipfire.org> wrote:
>>>>>
>>>>> The IP range given in an inetnum object apparently not necessarily
>>>>> matches distinct subnet boundaries. As a result, the current attempt to
>>>>> calculate its CIDR mask resulted in faulty subnets not covering the
>>>>> entire IP range.
>>>>>
>>>>> This patch leaves the task of enumerating subnets to the ipaddress
>>>>> module itself, which handles things much more robust. Since the output
>>>>> may contain of several subnets, a list for the inetnum key is necessary
>>>>> as well as a loop over them when conducting the SQL statements.
>>>>>
>>>>> Fixes: #12595
>>>>>
>>>>> Cc: Michael Tremer <michael.tremer@ipfire.org>
>>>>> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
>>>>> ---
>>>>> src/python/location-importer.in | 31 +++++++++++--------------------
>>>>> 1 file changed, 11 insertions(+), 20 deletions(-)
>>>>>
>>>>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>>>>> index 2506925..e2f201b 100644
>>>>> --- a/src/python/location-importer.in
>>>>> +++ b/src/python/location-importer.in
>>>>> @@ -3,7 +3,7 @@
>>>>> #                                                                             #
>>>>> # libloc - A library to determine the location of someone on the Internet     #
>>>>> #                                                                             #
>>>>> -# Copyright (C) 2020 IPFire Development Team <info@ipfire.org>                #
>>>>> +# Copyright (C) 2020-2021 IPFire Development Team <info@ipfire.org>           #
>>>>> #                                                                             #
>>>>> # This library is free software; you can redistribute it and/or               #
>>>>> # modify it under the terms of the GNU Lesser General Public                  #
>>>>> @@ -604,18 +604,10 @@ class CLI(object):
>>>>> 					log.warning("Could not parse line: %s" % line)
>>>>> 					return
>>>>>
>>>>> -				# Set prefix to default
>>>>> -				prefix = 32
>>>>> -
>>>>> -				# Count number of addresses in this subnet
>>>>> -				num_addresses = int(end_address) - int(start_address)
>>>>> -				if num_addresses:
>>>>> -					prefix -= math.log(num_addresses, 2)
>>>>> -
>>>>> -				inetnum["inetnum"] = "%s/%.0f" % (start_address, prefix)
>>>>> +				inetnum["inetnum"] = list(ipaddress.summarize_address_range(start_address, end_address))
>>>>>
>>>>> 			elif key == "inet6num":
>>>>> -				inetnum[key] = val
>>>>> +				inetnum[key] = [ipaddress.ip_network(val, strict=False)]
>>>>>
>>>>> 			elif key == "country":
>>>>> 				inetnum[key] = val.upper()
>>>>> @@ -630,15 +622,14 @@ class CLI(object):
>>>>> 				(inetnum.get("inet6num") or inetnum.get("inetnum")))
>>>>> 			return
>>>>>
>>>>> -		network = ipaddress.ip_network(inetnum.get("inet6num") or inetnum.get("inetnum"), strict=False)
>>>>> -
>>>>> -		if not self._check_parsed_network(network):
>>>>> -			return
>>>>> -
>>>>> -		self.db.execute("INSERT INTO _rirdata(network, country) \
>>>>> -			VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
>>>>> -			"%s" % network, inetnum.get("country"),
>>>>> -		)
>>>>> +		# Iterate through all networks enumerated from above, check them for plausibility and insert
>>>>> +		# them into the database, if _check_parsed_network() succeeded
>>>>> +		for single_network in inetnum.get("inet6num") or inetnum.get("inetnum"):
>>>>> +			if self._check_parsed_network(single_network):
>>>>> +				self.db.execute("INSERT INTO _rirdata(network, country) \
>>>>> +					VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
>>>>> +					"%s" % single_network, inetnum.get("country"),
>>>>> +				)
>>>>>
>>>>> 	def _parse_org_block(self, block):
>>>>> 		org = {}
>>>>> -- 
>>>>> 2.26.2
>>>>
>
  
Michael Tremer April 1, 2021, 9:38 a.m. UTC | #6
Hello,

This patch has been merged and pushed into production and it looks like we now have some networks split into many smaller ones.

The file size of the database afterhasn’t changed though.

-Michael

> On 30 Mar 2021, at 16:49, Peter Müller <peter.mueller@ipfire.org> wrote:
> 
> Hello Michael,
> 
> thank you for your reply.
> 
> Here you are: https://patchwork.ipfire.org/patch/4005/
> 
> Aside from that, there are still 8 patches left on https://patchwork.ipfire.org/project/location/list/.
> Perhaps you might want to check these as well before tagging a new release.
> 
> #11754 and #12594 won't be ready that soon, so I am fine with a new libloc version after the patches
> mentioned above have been checked on whether they are ready for merging them.
> 
> Thanks, and best regards,
> Peter Müller
> 
> 
>> Hello,
>> 
>> I was looking for this one, but could not find it.
>> 
>> It doesn’t apply. Would you like to rebase this to master and submit it again?
>> 
>> -Michael
>> 
>> P.S. Still unsure whether I should wait or not :)
>> 
>>> On 29 Mar 2021, at 21:34, Peter Müller <peter.mueller@ipfire.org> wrote:
>>> 
>>> By the way: https://patchwork.ipfire.org/patch/3620/ is still waiting for a decision of yours. :-)
>>> 
>>>> Hello Michael,
>>>> 
>>>> you're welcome.
>>>> 
>>>> Well, #11754 and #12594 would be the next issues on my list, but I have no working code for them, yet.
>>>> 
>>>> Thanks, and best regards,
>>>> Peter Müller
>>>> 
>>>> 
>>>>> Thank you for this.
>>>>> 
>>>>> Are there any other things coming or can I go ahead and tag another version to roll these changes out into production?
>>>>> 
>>>>> -Michael
>>>>> 
>>>>>> On 29 Mar 2021, at 21:24, Peter Müller <peter.mueller@ipfire.org> wrote:
>>>>>> 
>>>>>> The IP range given in an inetnum object apparently not necessarily
>>>>>> matches distinct subnet boundaries. As a result, the current attempt to
>>>>>> calculate its CIDR mask resulted in faulty subnets not covering the
>>>>>> entire IP range.
>>>>>> 
>>>>>> This patch leaves the task of enumerating subnets to the ipaddress
>>>>>> module itself, which handles things much more robust. Since the output
>>>>>> may contain of several subnets, a list for the inetnum key is necessary
>>>>>> as well as a loop over them when conducting the SQL statements.
>>>>>> 
>>>>>> Fixes: #12595
>>>>>> 
>>>>>> Cc: Michael Tremer <michael.tremer@ipfire.org>
>>>>>> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
>>>>>> ---
>>>>>> src/python/location-importer.in | 31 +++++++++++--------------------
>>>>>> 1 file changed, 11 insertions(+), 20 deletions(-)
>>>>>> 
>>>>>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>>>>>> index 2506925..e2f201b 100644
>>>>>> --- a/src/python/location-importer.in
>>>>>> +++ b/src/python/location-importer.in
>>>>>> @@ -3,7 +3,7 @@
>>>>>> #                                                                             #
>>>>>> # libloc - A library to determine the location of someone on the Internet     #
>>>>>> #                                                                             #
>>>>>> -# Copyright (C) 2020 IPFire Development Team <info@ipfire.org>                #
>>>>>> +# Copyright (C) 2020-2021 IPFire Development Team <info@ipfire.org>           #
>>>>>> #                                                                             #
>>>>>> # This library is free software; you can redistribute it and/or               #
>>>>>> # modify it under the terms of the GNU Lesser General Public                  #
>>>>>> @@ -604,18 +604,10 @@ class CLI(object):
>>>>>> 					log.warning("Could not parse line: %s" % line)
>>>>>> 					return
>>>>>> 
>>>>>> -				# Set prefix to default
>>>>>> -				prefix = 32
>>>>>> -
>>>>>> -				# Count number of addresses in this subnet
>>>>>> -				num_addresses = int(end_address) - int(start_address)
>>>>>> -				if num_addresses:
>>>>>> -					prefix -= math.log(num_addresses, 2)
>>>>>> -
>>>>>> -				inetnum["inetnum"] = "%s/%.0f" % (start_address, prefix)
>>>>>> +				inetnum["inetnum"] = list(ipaddress.summarize_address_range(start_address, end_address))
>>>>>> 
>>>>>> 			elif key == "inet6num":
>>>>>> -				inetnum[key] = val
>>>>>> +				inetnum[key] = [ipaddress.ip_network(val, strict=False)]
>>>>>> 
>>>>>> 			elif key == "country":
>>>>>> 				inetnum[key] = val.upper()
>>>>>> @@ -630,15 +622,14 @@ class CLI(object):
>>>>>> 				(inetnum.get("inet6num") or inetnum.get("inetnum")))
>>>>>> 			return
>>>>>> 
>>>>>> -		network = ipaddress.ip_network(inetnum.get("inet6num") or inetnum.get("inetnum"), strict=False)
>>>>>> -
>>>>>> -		if not self._check_parsed_network(network):
>>>>>> -			return
>>>>>> -
>>>>>> -		self.db.execute("INSERT INTO _rirdata(network, country) \
>>>>>> -			VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
>>>>>> -			"%s" % network, inetnum.get("country"),
>>>>>> -		)
>>>>>> +		# Iterate through all networks enumerated from above, check them for plausibility and insert
>>>>>> +		# them into the database, if _check_parsed_network() succeeded
>>>>>> +		for single_network in inetnum.get("inet6num") or inetnum.get("inetnum"):
>>>>>> +			if self._check_parsed_network(single_network):
>>>>>> +				self.db.execute("INSERT INTO _rirdata(network, country) \
>>>>>> +					VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
>>>>>> +					"%s" % single_network, inetnum.get("country"),
>>>>>> +				)
>>>>>> 
>>>>>> 	def _parse_org_block(self, block):
>>>>>> 		org = {}
>>>>>> -- 
>>>>>> 2.26.2
>>>>> 
>>
  
Peter Müller April 1, 2021, 4:35 p.m. UTC | #7
Hello Michael,

seems to work as designed then. :-)

I will let the Tor folks know about this so they can distribute new location information with their next release.

Thanks, and best regards,
Peter Müller


> Hello,
> 
> This patch has been merged and pushed into production and it looks like we now have some networks split into many smaller ones.
> 
> The file size of the database afterhasn’t changed though.
> 
> -Michael
> 
>> On 30 Mar 2021, at 16:49, Peter Müller <peter.mueller@ipfire.org> wrote:
>>
>> Hello Michael,
>>
>> thank you for your reply.
>>
>> Here you are: https://patchwork.ipfire.org/patch/4005/
>>
>> Aside from that, there are still 8 patches left on https://patchwork.ipfire.org/project/location/list/.
>> Perhaps you might want to check these as well before tagging a new release.
>>
>> #11754 and #12594 won't be ready that soon, so I am fine with a new libloc version after the patches
>> mentioned above have been checked on whether they are ready for merging them.
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>
>>> Hello,
>>>
>>> I was looking for this one, but could not find it.
>>>
>>> It doesn’t apply. Would you like to rebase this to master and submit it again?
>>>
>>> -Michael
>>>
>>> P.S. Still unsure whether I should wait or not :)
>>>
>>>> On 29 Mar 2021, at 21:34, Peter Müller <peter.mueller@ipfire.org> wrote:
>>>>
>>>> By the way: https://patchwork.ipfire.org/patch/3620/ is still waiting for a decision of yours. :-)
>>>>
>>>>> Hello Michael,
>>>>>
>>>>> you're welcome.
>>>>>
>>>>> Well, #11754 and #12594 would be the next issues on my list, but I have no working code for them, yet.
>>>>>
>>>>> Thanks, and best regards,
>>>>> Peter Müller
>>>>>
>>>>>
>>>>>> Thank you for this.
>>>>>>
>>>>>> Are there any other things coming or can I go ahead and tag another version to roll these changes out into production?
>>>>>>
>>>>>> -Michael
>>>>>>
>>>>>>> On 29 Mar 2021, at 21:24, Peter Müller <peter.mueller@ipfire.org> wrote:
>>>>>>>
>>>>>>> The IP range given in an inetnum object apparently not necessarily
>>>>>>> matches distinct subnet boundaries. As a result, the current attempt to
>>>>>>> calculate its CIDR mask resulted in faulty subnets not covering the
>>>>>>> entire IP range.
>>>>>>>
>>>>>>> This patch leaves the task of enumerating subnets to the ipaddress
>>>>>>> module itself, which handles things much more robust. Since the output
>>>>>>> may contain of several subnets, a list for the inetnum key is necessary
>>>>>>> as well as a loop over them when conducting the SQL statements.
>>>>>>>
>>>>>>> Fixes: #12595
>>>>>>>
>>>>>>> Cc: Michael Tremer <michael.tremer@ipfire.org>
>>>>>>> Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
>>>>>>> ---
>>>>>>> src/python/location-importer.in | 31 +++++++++++--------------------
>>>>>>> 1 file changed, 11 insertions(+), 20 deletions(-)
>>>>>>>
>>>>>>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>>>>>>> index 2506925..e2f201b 100644
>>>>>>> --- a/src/python/location-importer.in
>>>>>>> +++ b/src/python/location-importer.in
>>>>>>> @@ -3,7 +3,7 @@
>>>>>>> #                                                                             #
>>>>>>> # libloc - A library to determine the location of someone on the Internet     #
>>>>>>> #                                                                             #
>>>>>>> -# Copyright (C) 2020 IPFire Development Team <info@ipfire.org>                #
>>>>>>> +# Copyright (C) 2020-2021 IPFire Development Team <info@ipfire.org>           #
>>>>>>> #                                                                             #
>>>>>>> # This library is free software; you can redistribute it and/or               #
>>>>>>> # modify it under the terms of the GNU Lesser General Public                  #
>>>>>>> @@ -604,18 +604,10 @@ class CLI(object):
>>>>>>> 					log.warning("Could not parse line: %s" % line)
>>>>>>> 					return
>>>>>>>
>>>>>>> -				# Set prefix to default
>>>>>>> -				prefix = 32
>>>>>>> -
>>>>>>> -				# Count number of addresses in this subnet
>>>>>>> -				num_addresses = int(end_address) - int(start_address)
>>>>>>> -				if num_addresses:
>>>>>>> -					prefix -= math.log(num_addresses, 2)
>>>>>>> -
>>>>>>> -				inetnum["inetnum"] = "%s/%.0f" % (start_address, prefix)
>>>>>>> +				inetnum["inetnum"] = list(ipaddress.summarize_address_range(start_address, end_address))
>>>>>>>
>>>>>>> 			elif key == "inet6num":
>>>>>>> -				inetnum[key] = val
>>>>>>> +				inetnum[key] = [ipaddress.ip_network(val, strict=False)]
>>>>>>>
>>>>>>> 			elif key == "country":
>>>>>>> 				inetnum[key] = val.upper()
>>>>>>> @@ -630,15 +622,14 @@ class CLI(object):
>>>>>>> 				(inetnum.get("inet6num") or inetnum.get("inetnum")))
>>>>>>> 			return
>>>>>>>
>>>>>>> -		network = ipaddress.ip_network(inetnum.get("inet6num") or inetnum.get("inetnum"), strict=False)
>>>>>>> -
>>>>>>> -		if not self._check_parsed_network(network):
>>>>>>> -			return
>>>>>>> -
>>>>>>> -		self.db.execute("INSERT INTO _rirdata(network, country) \
>>>>>>> -			VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
>>>>>>> -			"%s" % network, inetnum.get("country"),
>>>>>>> -		)
>>>>>>> +		# Iterate through all networks enumerated from above, check them for plausibility and insert
>>>>>>> +		# them into the database, if _check_parsed_network() succeeded
>>>>>>> +		for single_network in inetnum.get("inet6num") or inetnum.get("inetnum"):
>>>>>>> +			if self._check_parsed_network(single_network):
>>>>>>> +				self.db.execute("INSERT INTO _rirdata(network, country) \
>>>>>>> +					VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
>>>>>>> +					"%s" % single_network, inetnum.get("country"),
>>>>>>> +				)
>>>>>>>
>>>>>>> 	def _parse_org_block(self, block):
>>>>>>> 		org = {}
>>>>>>> -- 
>>>>>>> 2.26.2
>>>>>>
>>>
>
  

Patch

diff --git a/src/python/location-importer.in b/src/python/location-importer.in
index 2506925..e2f201b 100644
--- a/src/python/location-importer.in
+++ b/src/python/location-importer.in
@@ -3,7 +3,7 @@ 
 #                                                                             #
 # libloc - A library to determine the location of someone on the Internet     #
 #                                                                             #
-# Copyright (C) 2020 IPFire Development Team <info@ipfire.org>                #
+# Copyright (C) 2020-2021 IPFire Development Team <info@ipfire.org>           #
 #                                                                             #
 # This library is free software; you can redistribute it and/or               #
 # modify it under the terms of the GNU Lesser General Public                  #
@@ -604,18 +604,10 @@  class CLI(object):
 					log.warning("Could not parse line: %s" % line)
 					return
 
-				# Set prefix to default
-				prefix = 32
-
-				# Count number of addresses in this subnet
-				num_addresses = int(end_address) - int(start_address)
-				if num_addresses:
-					prefix -= math.log(num_addresses, 2)
-
-				inetnum["inetnum"] = "%s/%.0f" % (start_address, prefix)
+				inetnum["inetnum"] = list(ipaddress.summarize_address_range(start_address, end_address))
 
 			elif key == "inet6num":
-				inetnum[key] = val
+				inetnum[key] = [ipaddress.ip_network(val, strict=False)]
 
 			elif key == "country":
 				inetnum[key] = val.upper()
@@ -630,15 +622,14 @@  class CLI(object):
 				(inetnum.get("inet6num") or inetnum.get("inetnum")))
 			return
 
-		network = ipaddress.ip_network(inetnum.get("inet6num") or inetnum.get("inetnum"), strict=False)
-
-		if not self._check_parsed_network(network):
-			return
-
-		self.db.execute("INSERT INTO _rirdata(network, country) \
-			VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
-			"%s" % network, inetnum.get("country"),
-		)
+		# Iterate through all networks enumerated from above, check them for plausibility and insert
+		# them into the database, if _check_parsed_network() succeeded
+		for single_network in inetnum.get("inet6num") or inetnum.get("inetnum"):
+			if self._check_parsed_network(single_network):
+				self.db.execute("INSERT INTO _rirdata(network, country) \
+					VALUES(%s, %s) ON CONFLICT (network) DO UPDATE SET country = excluded.country",
+					"%s" % single_network, inetnum.get("country"),
+				)
 
 	def _parse_org_block(self, block):
 		org = {}