[v2] location-importer.in: skip networks with unknown country codes

Message ID 5d6a3267-6b61-fa2e-b35c-0fc42c713e33@ipfire.org
State Accepted
Commit 84b175e2fffbe28be8343e99705d8438c0daa3a0
Series [v2] location-importer.in: skip networks with unknown country codes

Commit Message

Peter Müller March 30, 2021, 3:47 p.m. UTC
There is no sense in parsing and storing networks whose country codes
cannot be found in the ISO-3166-x country code table. This avoids side
effects in applications using the location database, and introduces
another sanity check to compensate for bogus RIR data.

On location02, this affects some networks from APNIC (country code: ZZ)
as well as a bunch of smaller allocations within the RIPE region still
tagged to CS or YU (Yugoslavia). To my surprise, no network tagged as SU
(Soviet Union) was found - while the NIC for .su TLD is still
operational. :-)

Applying this patch causes the countries to be processed before
update_whois() is called. In case no countries are present in the SQL
table, this check is silently omitted.

Fixes: #12510

Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
---
 src/python/location-importer.in | 38 ++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 12 deletions(-)
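
The check described in the commit message can be sketched in isolation. This is only an illustration under stated assumptions, not the importer's actual code: `is_known_country` is a hypothetical helper, the sample codes are made up, and a set is used instead of the patch's list purely to make lookups cheaper. As in the patch, an empty collection of valid codes disables the check.

```python
def is_known_country(country_code, validcountries):
    """
    Return True if a network with this country code should be kept.

    An empty (or missing) collection of valid codes disables the check,
    mirroring the "silently omitted" behaviour of the patch.
    """
    if not validcountries:
        return True

    return country_code in validcountries


# Hypothetical sample data; the importer reads the codes from the countries table
validcountries = {"DE", "ME", "RS"}

print(is_known_country("RS", validcountries))  # known code, network is kept
print(is_known_country("ZZ", validcountries))  # unknown code, network is skipped
print(is_known_country("ZZ", set()))           # no countries loaded, check is omitted
```

The last call shows the trade-off of the silent fallback: if the countries table is empty, bogus codes pass through unnoticed.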
  

Comments

Michael Tremer April 1, 2021, 9:51 a.m. UTC | #1
Hello,

I merged this patch, but it has some unwanted side-effects:

Technically it works as designed: we are successfully dropping any networks whose country codes are not part of the imported list. I changed our scripts so that the countries will always be imported first now.

I ran a manual import which dropped CS, which is Serbia and Montenegro. This used to be a valid country code, but Serbia and Montenegro is not a single country any more. I decided to add it because we would have dropped too many networks without it. Now we are dropping a few networks with country code YU (Yugoslavia).

Montenegro became independent from Serbia in 2006; Yugoslavia became the State Union of Serbia and Montenegro in 2003. For some reason (probably because I didn’t do any research) I thought these events were closer together and therefore thought that all networks with country code CS had simply “forgotten” to update it, but that there never were any that actually existed during the time of Yugoslavia.

Long story short: Would anybody object to adding YU to the database although it doesn’t exist as a country any more? I guess we cannot just “rewrite” it because the situation is way too complicated. However, we wanted to give people an idea where some IP address is located, and that kind of does not work if the country does not exist any more. Returning nothing instead is not a great solution either, because we are then simply hiding networks that exist.

Or did I overlook an even better option?

-Michael

  
Peter Müller April 2, 2021, 7:58 p.m. UTC | #2
Hello Michael,

thank you for your reply and merging this.

On location02, the networks being ignored because of a "YU" country code are
(please excuse the crappy "sort" output - it is really useless when it comes to IP addresses):

> 194.106.185.0/26
> 194.106.185.144/28
> 194.106.185.96/28
> 194.194.158.0/25
> 194.194.158.128/26
> 194.247.200.160/28
> 194.247.207.224/27
> 194.247.223.12/30
> 194.247.223.16/28
> 194.247.223.32/28
> 194.247.223.48/28
> 194.247.223.72/29
> 194.247.223.80/28
> 194.247.223.96/28
> 195.178.61.128/26
> 195.178.62.192/27
> 195.178.62.64/28
> 195.178.63.0/29
> 195.178.63.128/28
> 195.178.63.144/28
> 195.178.63.32/27
> 195.178.63.8/29
> 195.250.104.135/32
> 195.250.104.140/32
> 195.250.104.145/32
> 195.250.104.224/28
> 195.250.104.240/28
> 195.250.113.144/28
> 195.250.113.192/27
> 195.250.114.224/27
> 195.250.116.0/26
> 195.250.116.64/26
> 195.252.107.128/29
> 195.252.110.192/26
> 195.252.111.128/26
> 195.252.111.192/26
> 195.252.115.0/24
> 195.252.118.0/26
> 195.252.118.64/26
> 195.252.120.0/24
> 195.66.165.0/24
> 217.26.77.64/26
> 217.26.79.0/27
> 62.108.115.0/28
> 62.108.117.16/28
> 62.108.117.96/28
> 62.193.141.192/28
> 62.193.141.224/28
> 62.193.141.240/28
> 62.193.141.56/29

Since we currently ignore anything more specific than a /24, only these networks are actually
relevant in this discussion, as we would have discarded the others anyway:

> 195.66.165.0/24
> 195.252.115.0/24
> 195.252.120.0/24
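
The filtering step above, together with a numeric ordering that plain "sort" cannot provide, can be sketched with Python's standard `ipaddress` module. This is a rough illustration only: `relevant_networks` is a hypothetical helper, the input is a small sample of the list above, and the /24 threshold is applied to IPv4 prefixes as stated in this mail.

```python
import ipaddress


def relevant_networks(prefixes, max_prefixlen=24):
    """
    Drop everything more specific than max_prefixlen and return the
    remaining networks in numeric (not lexical) order.
    """
    networks = [ipaddress.ip_network(p) for p in prefixes]
    kept = [n for n in networks if n.prefixlen <= max_prefixlen]

    # Sort by IP version first so IPv4 and IPv6 addresses are never compared
    return sorted(kept, key=lambda n: (n.version, n.network_address, n.prefixlen))


# A small sample taken from the list above
sample = [
    "195.66.165.0/24",
    "62.193.141.56/29",
    "195.252.115.0/24",
    "194.106.185.0/26",
    "195.252.120.0/24",
]

for network in relevant_networks(sample):
    print(network)
```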

Digging deeper into them, the first one dead-ends somewhere in the vicinity of Croatia, having
a RIPE database entry dated before 2003:

> inetnum:        195.66.165.0 - 195.66.165.255
> netname:        Posta_Crne_Gore
> descr:          Posta Crne Gore
> country:        YU
> admin-c:        MM609-RIPE
> tech-c:         MM609-RIPE
> status:         ASSIGNED PA
> mnt-by:         AS8585-MNT
> created:        2001-10-04T08:34:51Z
> last-modified:  2002-10-30T09:36:47Z
> source:         RIPE
> 
> person:         Martinovic Milan
> address:        Posta Crne Gore
> address:        Slobode 1
> address:        81000 Podgorica
> address:        Montenegro, Yugoslavia
> phone:          +381 81 225 181
> nic-hdl:        MM609-RIPE
> created:        1970-01-01T00:00:00Z
> last-modified:  2020-06-03T10:52:16Z
> source:         RIPE # Filtered
> mnt-by:         AS8585-MNT
> 
> route:          195.66.160.0/19
> descr:          Internet Crna Gora
> origin:         AS8585
> mnt-by:         AS8585-MNT
> created:        1970-01-01T00:00:00Z
> last-modified:  2001-09-22T09:33:48Z
> source:         RIPE # Filtered

The second is routed by AS6700 into a residential dial-up network pool somewhere in Serbia,
while its RIPE DB entry shows:

> inetnum:        195.252.115.0 - 195.252.115.255
> netname:        DRENIK
> descr:          Drenik ISP
> descr:          Beograd, Deligradska 19
> country:        YU
> admin-c:        DR47-RIPE
> tech-c:         DR47-RIPE
> status:         ASSIGNED PA
> mnt-by:         AS6700-MNT
> created:        2002-04-11T08:21:26Z
> last-modified:  2002-04-11T08:21:26Z
> source:         RIPE
> 
> person:         Nenad Repac
> address:        D.D. TELEFONIJA
> address:        Marsala Tolbuhina 56
> address:        11000 Beograd
> address:        Yugoslavia
> phone:          +381 11 444 11 44 Ext. 381
> fax-no:         +381 11 3248 953
> nic-hdl:        DR47-RIPE
> mnt-by:         AS6700-MNT
> created:        1970-01-01T00:00:00Z
> last-modified:  2001-09-21T23:28:31Z
> source:         RIPE # Filtered
> 
> route:          195.252.96.0/19
> descr:          BeotelNet ISP, Belgrade, RS
> origin:         AS6700
> mnt-by:         AS6700-MNT
> created:        1970-01-01T00:00:00Z
> last-modified:  2019-07-15T09:12:36Z
> source:         RIPE

Same goes for the third network, having a RIPE DB entry maintained by the same organisation:

> inetnum:        195.252.120.0 - 195.252.120.255
> netname:        ABSOFT
> descr:          AB SOFT
> descr:          Kneza Milosa 82, Beograd
> country:        YU
> admin-c:        DR47-RIPE
> tech-c:         DR47-RIPE
> status:         ASSIGNED PA
> mnt-by:         AS6700-MNT
> created:        2002-04-10T16:54:48Z
> last-modified:  2002-04-10T16:54:48Z
> source:         RIPE
> 
> person:         Nenad Repac
> address:        D.D. TELEFONIJA
> address:        Marsala Tolbuhina 56
> address:        11000 Beograd
> address:        Yugoslavia
> phone:          +381 11 444 11 44 Ext. 381
> fax-no:         +381 11 3248 953
> nic-hdl:        DR47-RIPE
> mnt-by:         AS6700-MNT
> created:        1970-01-01T00:00:00Z
> last-modified:  2001-09-21T23:28:31Z
> source:         RIPE # Filtered
> 
> route:          195.252.96.0/19
> descr:          BeotelNet ISP, Belgrade, RS
> origin:         AS6700
> mnt-by:         AS6700-MNT
> created:        1970-01-01T00:00:00Z
> last-modified:  2019-07-15T09:12:36Z
> source:         RIPE

Since we are only dealing with three networks here and their actual location seems to be pretty
clear to me, I suggest _not_ to add YU as a legitimate country. Instead, I would just write overrides
for these networks.
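
A rough sketch of what such overrides amount to, independent of the actual override file format used by libloc: the table and the `country_for` helper are hypothetical, and the country codes (ME for the Podgorica network, RS for the two Belgrade networks) are my reading of the whois records above, not something the RIR data states.

```python
import ipaddress

# Hypothetical override table; the country codes are assumptions derived
# from the postal addresses in the whois records quoted above
OVERRIDES = {
    ipaddress.ip_network("195.66.165.0/24"): "ME",
    ipaddress.ip_network("195.252.115.0/24"): "RS",
    ipaddress.ip_network("195.252.120.0/24"): "RS",
}


def country_for(network, rir_country):
    """
    Return the overridden country code for an exactly matching network,
    falling back to whatever the RIR data says.
    """
    return OVERRIDES.get(ipaddress.ip_network(network), rir_country)


print(country_for("195.66.165.0/24", "YU"))  # overridden
print(country_for("192.0.2.0/24", "DE"))     # no override, RIR value is kept
```

This only matches networks exactly; more specific subnets of an overridden network would need a containment check instead.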

Would you be fine with that?

Thanks, and best regards,
Peter Müller


  
Michael Tremer April 4, 2021, 12:37 p.m. UTC | #3
Hello,

Very good analysis. Thank you very much for investing your time.

Can you do the same for Serbia and Montenegro, please?

And I would like to silence the warning then (at least for special country codes like ZZ, YU and whatever else we find).
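
One way to silence the warning for such codes, sketched under assumptions: the `SILENCED_COUNTRY_CODES` set, the logger name and the `report_skipped` helper are all hypothetical and not part of the importer. Networks with a known special code are still skipped, but only reported at debug level.

```python
import logging

log = logging.getLogger("location-importer-sketch")

# Hypothetical set of special codes we expect to see and do not want to be warned about
SILENCED_COUNTRY_CODES = {"ZZ", "YU"}


def report_skipped(country_code, network):
    """
    Log a skipped network: debug level for expected special codes,
    warning level for everything else. Returns the level that was used.
    """
    if country_code in SILENCED_COUNTRY_CODES:
        level = logging.DEBUG
    else:
        level = logging.WARNING

    log.log(level, "Skipping network with bogus country '%s': %s", country_code, network)
    return level


report_skipped("YU", "195.66.165.0/24")  # logged at debug level only
report_skipped("XX", "192.0.2.0/24")     # still raises a warning
```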

-Michael

  

Patch

diff --git a/src/python/location-importer.in b/src/python/location-importer.in
index e2f201b..1e08458 100644
--- a/src/python/location-importer.in
+++ b/src/python/location-importer.in
@@ -388,10 +388,17 @@  class CLI(object):
 				TRUNCATE TABLE networks;
 			""")
 
+			# Fetch all valid country codes to check parsed networks against...
+			rows = self.db.query("SELECT * FROM countries ORDER BY country_code")
+			validcountries = []
+
+			for row in rows:
+				validcountries.append(row.country_code)
+
 			for source in location.importer.WHOIS_SOURCES:
 				with downloader.request(source, return_blocks=True) as f:
 					for block in f:
-						self._parse_block(block)
+						self._parse_block(block, validcountries)
 
 			# Process all parsed networks from every RIR we happen to have access to,
 			# insert the largest network chunks into the networks table immediately...
@@ -467,7 +474,7 @@  class CLI(object):
 				# Download data
 				with downloader.request(source) as f:
 					for line in f:
-						self._parse_line(line)
+						self._parse_line(line, validcountries)
 
 	def _check_parsed_network(self, network):
 		"""
@@ -532,7 +539,7 @@  class CLI(object):
 		# be suitable for libloc consumption...
 		return True
 
-	def _parse_block(self, block):
+	def _parse_block(self, block, validcountries = None):
 		# Get first line to find out what type of block this is
 		line = block[0]
 
@@ -542,7 +549,7 @@  class CLI(object):
 
 		# inetnum
 		if line.startswith("inet6num:") or line.startswith("inetnum:"):
-			return self._parse_inetnum_block(block)
+			return self._parse_inetnum_block(block, validcountries)
 
 		# organisation
 		elif line.startswith("organisation:"):
@@ -573,7 +580,7 @@  class CLI(object):
 			autnum.get("asn"), autnum.get("org"),
 		)
 
-	def _parse_inetnum_block(self, block):
+	def _parse_inetnum_block(self, block, validcountries = None):
 		log.debug("Parsing inetnum block:")
 
 		inetnum = {}
@@ -616,10 +623,10 @@  class CLI(object):
 		if not inetnum or not "country" in inetnum:
 			return
 
-		# Skip objects with bogus country code 'ZZ'
-		if inetnum.get("country") == "ZZ":
-			log.warning("Skipping network with bogus country 'ZZ': %s" % \
-				(inetnum.get("inet6num") or inetnum.get("inetnum")))
+		# Skip objects with unknown country codes
+		if validcountries and inetnum.get("country") not in validcountries:
+			log.warning("Skipping network with bogus country '%s': %s" % \
+				(inetnum.get("country"), inetnum.get("inet6num") or inetnum.get("inetnum")))
 			return
 
 		# Iterate through all networks enumerated from above, check them for plausibility and insert
@@ -652,7 +659,7 @@  class CLI(object):
 			org.get("organisation"), org.get("org-name"),
 		)
 
-	def _parse_line(self, line):
+	def _parse_line(self, line, validcountries = None):
 		# Skip version line
 		if line.startswith("2"):
 			return
@@ -667,8 +674,15 @@  class CLI(object):
 			log.warning("Could not parse line: %s" % line)
 			return
 
-		# Skip any lines that are for stats only
-		if country_code == "*":
+		# Skip any lines that are for stats only or do not have a country
+		# code at all (avoids log spam below)
+		if not country_code or country_code == '*':
+			return
+
+		# Skip objects with unknown country codes
+		if validcountries and country_code not in validcountries:
+			log.warning("Skipping line with bogus country '%s': %s" % \
+				(country_code, line))
 			return
 
 		if type in ("ipv6", "ipv4"):