[0/4] kernel: aarch64: Add support for Traverse Ten64 board

Message ID 20221003062019.19636-1-matt@traverse.com.au

Mathew McBride Oct. 3, 2022, 6:20 a.m. UTC
  Hi all,
This patchset adds support for our (Traverse) Ten64 board,
which is an ARM64 networking board using NXP's LS1088A SoC.

I have been intending to do this for a very long time but was waiting
for the kernel version to be upgraded to 5.10 or above given the
significant amount of work that has been done upstream for this
hardware in recent times.

There are four components to this patch:
1: Enable the relevant kernel options for our box
This follows our doc at https://ten64doc.traverse.com.au/kernel/

2: Add patches to fully support SFP+
One of these patches came in after 5.15+, while the other
fixes a deadlock issue that occurs when detaching/unloading
the SFP+ ports (such as rebooting the system). Unfortunately
this issue has been stalled upstream without resolution for
a while now.

3: Fix our real time clock (rtc-rx8025) not being modprobed
I haven't been able to figure out why our RTC driver does not
get loaded, given every other relevant module (like GPIO, I2C)
does get loaded.

If there is a better way to do this, feel free to NAK and
suggest a better method.
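One way to express item 3 (only attempting the load on a board where it can succeed) is sketched below. This is not the actual setclock change. The `dt_compatible` helper is hypothetical, and the board string `traverse,ten64` is an assumption about what the Ten64 device tree advertises.

```shell
#!/bin/sh
# Hypothetical helper: succeed if STRING is one of the NUL-separated
# entries in a device-tree "compatible" file.
dt_compatible() {
    file="$1"; want="$2"
    [ -r "$file" ] || return 1        # no device tree (e.g. x86/ACPI systems)
    tr '\0' '\n' < "$file" | grep -qxF "$want"
}

# Assumption: the Ten64 device tree root lists "traverse,ten64".
if dt_compatible /proc/device-tree/compatible "traverse,ten64"; then
    modprobe rtc-rx8025 || echo "rtc-rx8025 could not be loaded" >&2
fi
```

Matching whole entries (`grep -x`) avoids false positives from a prefix such as `traverse`, and machines without a device tree fall through without calling modprobe at all.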

4: Bypass the u-boot bootscript on Ten64
The Ten64 uses u-boot which has both EFI and classic
'distroboot' support. We much prefer to boot EFI as this
provides some benefits, such as not having to supply your
own device tree.

A quirk of the Ten64 implementation (related to how
the IOMMU hardware is configured) is that a "failed"
bootscript will block the boot of other types (like EFI),
so detect if we are on a Ten64 and jump straight to GRUB.

My intention is to prioritize EFI always in a future Ten64
firmware release so this doesn't happen, at which point this
hack can be removed.
(Removing boot.scr does the same thing, but I prefer
that it will boot out of the box without modification)

Here is the fireinfo from a Ten64:
https://fireinfo.ipfire.org/profile/97f7fd96a529ca2e5488ab095b7d9effe67d0ef3
(Note to self: I should figure out how to improve the fireinfo output on ARM platforms)

I have also tested this on an AWS Graviton (ARM64) instance
to verify there are no regressions on other "standard"
(EFI-capable) ARM64 systems.

Mathew McBride (4):
  linux: enable options for NXP Layerscape
  kernel: add patches for SFP support on NXP Layerscape/DPAA2 (arm64)
  config: u-boot: bypass the u-boot script on Traverse Ten64
  initscripts: load RTC module (RX8025) for Ten64 board

 config/kernel/kernel.config.aarch64-ipfire    | 76 +++++++++++++----
 config/u-boot/boot.cmd                        |  9 +++
 lfs/linux                                     |  3 +
 src/initscripts/system/setclock               |  8 ++
 ...rm64-dpaa2-add-support-for-10g-modes.patch | 39 +++++++++
 ...inux-5.15-arm64-dpaa2-fix-lock-issue.patch | 81 +++++++++++++++++++
 6 files changed, 202 insertions(+), 14 deletions(-)
 create mode 100644 src/patches/linux/linux-5-15-arm64-dpaa2-add-support-for-10g-modes.patch
 create mode 100644 src/patches/linux/linux-5.15-arm64-dpaa2-fix-lock-issue.patch
  

Comments

Michael Tremer Oct. 4, 2022, 8:56 a.m. UTC | #1
Hello Mathew,

Good to hear from you again...

> On 3 Oct 2022, at 07:20, Mathew McBride <matt@traverse.com.au> wrote:
> 
> Hi all,
> This patchset adds support for our (Traverse) Ten64 board,
> which is an ARM64 networking board using NXP's LS1088A SoC.

Great!

> I have been intending to do this for a very long time but was waiting
> for the kernel version to be upgraded to 5.10 or above given the
> significant amount of work that has been done upstream for this
> hardware in recent times.

We have been on 5.15 for quite a while now, and I have an experimental branch with 6.0 ready which I have not tested on ARM yet.

Will any of the changes in this patchset be incompatible with 6.0, or is it all in fact backported from mainline?

> There are four components to this patch:
> 1: Enable the relevant kernel options for our box
> This follows our doc at https://ten64doc.traverse.com.au/kernel/

I assume that this is all part of the upstream kernel. So I have no problem with enabling this. It should be very unlikely to break anything.

> 2: Add patches to fully support SFP+
> One of these patches came in after 5.15+, while the other
> fixes a deadlock issue that occurs when detaching/unloading
> the SFP+ ports (such as rebooting the system). Unfortunately
> this issue has been stalled upstream without resolution for
> a while now.

:(

> 3: Fix our real time clock (rtc-rx8025) not being modprobed
> I haven't been able to figure out why our RTC driver does not
> get loaded, given every other relevant module (like GPIO, I2C)
> does get loaded.
> 
> If there is a better way to do this, feel free to NAK and
> suggest a better method.

This is kind of ugly. But it is not as bad as trying to load the module on all machines. You have a good way to determine if there is at least a chance to be successful.

I can live with this for now, but maybe it is a good idea to file a bug upstream and have them work on getting the module loaded automatically like all the others?

> 4: Bypass the u-boot bootscript on Ten64
> The Ten64 uses u-boot which has both EFI and classic
> 'distroboot' support. We much prefer to boot EFI as this
> provides some benefits, such as not having to supply your
> own device tree.
> 
> A quirk of the Ten64 implementation (related to how
> the IOMMU hardware is configured) is that a "failed"
> bootscript will block the boot of other types (like EFI),
> so detect if we are on a Ten64 and jump straight to GRUB.
> 
> My intention is to prioritize EFI always in a future Ten64
> firmware release so this doesn't happen, at which point this
> hack can be removed.
> (Removing boot.scr does the same thing, but I prefer
> that it will boot out of the box without modification)

Great that you are supporting EFI.

As bad as EFI is, it is the only scalable way to make IPFire boot on as many devices as possible without complicated quirks, or tons of bootloaders that are 99% the same code but then are not, and so on.

If I could, I would only support EFI, but the cheap single-board computers do not really play ball yet.

> Here is the fireinfo from a Ten64:
> https://fireinfo.ipfire.org/profile/97f7fd96a529ca2e5488ab095b7d9effe67d0ef3
> (Note to self: I should figure out how to improve the fireinfo output on ARM platforms)

Oh, this is indeed a little bit short. Are the network interfaces not connected using PCIe or some other bus that can be enumerated?

> I have also tested this on an AWS Graviton (ARM64) instance
> to verify there are no regressions on other "standard"
> (EFI-capable) ARM64 systems.

That is very good to know. IPFire works like a charm on those :)

> Mathew McBride (4):
>  linux: enable options for NXP Layerscape
>  kernel: add patches for SFP support on NXP Layerscape/DPAA2 (arm64)
>  config: u-boot: bypass the u-boot script on Traverse Ten64
>  initscripts: load RTC module (RX8025) for Ten64 board
> 
> config/kernel/kernel.config.aarch64-ipfire    | 76 +++++++++++++----
> config/u-boot/boot.cmd                        |  9 +++
> lfs/linux                                     |  3 +
> src/initscripts/system/setclock               |  8 ++
> ...rm64-dpaa2-add-support-for-10g-modes.patch | 39 +++++++++
> ...inux-5.15-arm64-dpaa2-fix-lock-issue.patch | 81 +++++++++++++++++++
> 6 files changed, 202 insertions(+), 14 deletions(-)
> create mode 100644 src/patches/linux/linux-5-15-arm64-dpaa2-add-support-for-10g-modes.patch
> create mode 100644 src/patches/linux/linux-5.15-arm64-dpaa2-fix-lock-issue.patch

Core Update 171 is technically closed, but I would suggest still merging these patches into it, since the big testing phase has not started yet.

I do not want to ship another kernel in the next update if we don’t have to, so it makes sense to have this merged now. It is very unlikely to break anything else.

@Peter: Could you please merge this? I will submit my tags shortly.

-Michael

  
Mathew McBride Oct. 28, 2022, 5:11 a.m. UTC | #2
Hi Michael,

Just to finally get back to your other questions/comments.
Apart from the boot.scr issue (fixed by removing the boot.scr file), Core 171 is working well on the Ten64.

On Tue, Oct 4, 2022, at 7:56 PM, Michael Tremer wrote:
> Hello Mathew,
> 
> Good to hear from you again...
> [snip]
> 
> Will any of the changes in this patchset be incompatible with 6.0, or is it 
> all in fact backported from mainline?

It's all backported from mainline. I'm not aware of any upcoming changes that will break things.

> > There are four components to this patch:
> > 1: Enable the relevant kernel options for our box
> > This follows our doc at https://ten64doc.traverse.com.au/kernel/
> 
> I assume that this is all part of the upstream kernel. So I have no problem 
> with enabling this. It should be very unlikely to break anything.
> 
> > 2: Add patches to fully support SFP+
> > One of these patches came in after 5.15+, while the other
> > fixes a deadlock issue that occurs when detaching/unloading
> > the SFP+ ports (such as rebooting the system). Unfortunately
> > this issue has been stalled upstream without resolution for
> > a while now.
> 
> :(
The upstream experience for this particular SoC has been better than with previous ones, but there are novel parts of it that break the assumptions kernel (and other) developers have about network hardware. It's those parts that have stalled upstream.

The network complex is not a "fixed function" device: network interfaces and PHYs can be connected in arbitrary pairs (for example, I could change the PHY of a running interface from an SFP to a 1000Base-T port). It basically has a pool of resources shared across all the network ports, which one then configures however they want.
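A quick way to see that "pool of resources" is to count the objects on the fsl-mc bus by type. This is only a sketch: it assumes the bus exposes each object as `<type>.<id>` under `/sys/bus/fsl-mc/devices/` (consistent with names such as `dprc.1` and `dpni.9` in the sysfs output further down this thread), and `fslmc_summary` is a hypothetical name.

```shell
#!/bin/sh
# Count DPAA2 objects per type (dpni, dpmac, dprc, ...) on the fsl-mc bus.
# Assumption: each object appears as <type>.<id> in the bus "devices" dir.
fslmc_summary() {
    dir="${1:-/sys/bus/fsl-mc/devices}"
    for obj in "$dir"/*; do
        [ -e "$obj" ] || continue     # empty dir: the glob did not expand
        name=$(basename "$obj")
        echo "${name%%.*}"            # strip the ".<id>" suffix
    done | sort | uniq -c | awk '{ print $2, $1 }'
}
```

The optional directory argument lets the function run against a copy of the tree rather than only the live system.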

> 
> > 3: Fix our real time clock (rtc-rx8025) not being modprobed
> > I haven't been able to figure out why our RTC driver does not
> > get loaded, given every other relevant module (like GPIO, I2C)
> > does get loaded.
> >
> > If there is a better way to do this, feel free to NAK and
> > suggest a better method.
> 
> This is kind of ugly. But it is not as bad as trying to load the module on 
> all machines. You have a good way to determine if there is at least a chance 
> to be successful.
> 
> I can live with this for now, but maybe it is a good idea to file a bug 
> upstream and have them work on getting the module loaded automatically like 
> all the others?

I'm not sure anything is broken in the upstream kernel; first I need to understand what causes a kernel module to be loaded without an explicit modprobe.

The bigger distributions deal with it by modprobing all the available *.ko files in the initrd.
In our own kernel configuration for testing we just set CONFIG_RTC_DRV_RX8025=y so it's built in.

I could do the same if you're happy to have it built in (like the x86 BIOS/CMOS RTC).
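Whether the RTC driver ends up built in or modular can be checked directly in a kernel config file. A minimal sketch, assuming the standard `.config` syntax (`CONFIG_FOO=y`, `CONFIG_FOO=m`, `# CONFIG_FOO is not set`); `kconfig_state` is a hypothetical helper, and `RTC_DRV_RX8025` is the Kconfig symbol of the rtc-rx8025 driver.

```shell
#!/bin/sh
# Report how a Kconfig symbol is set in a kernel config file:
# "y" (built in), "m" (module) or "n" (not set or absent).
kconfig_state() {
    file="$1"; sym="CONFIG_$2"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    if grep -qxF "$sym=y" "$file"; then
        echo y
    elif grep -qxF "$sym=m" "$file"; then
        echo m
    else
        echo n
    fi
}
```

For example, `kconfig_state config/kernel/kernel.config.aarch64-ipfire RTC_DRV_RX8025` should print `m` while the driver is a module and `y` once it is built in.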

> [snip]
> > Here is the fireinfo from a Ten64:
> > https://fireinfo.ipfire.org/profile/97f7fd96a529ca2e5488ab095b7d9effe67d0ef3
> > (Note to self: I should figure out how to improve the fireinfo output on 
> > ARM platforms)
> 
> Oh, this is indeed a little bit short. Are the network interfaces not 
> connected using PCIe or some other bus that can be enumerated?
Indeed, they come from a special "fsl-mc-bus".

My suggestion would be to crawl through the [sysfs] device/ and device/driver for each net class device to identify the full device path and driver.

$ ls /sys/class/net/
eth0  eth1  eth2  eth3  eth4  eth5  eth6  eth7  eth8  eth9  lo

$ readlink -f /sys/class/net/eth0/device/
/sys/devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.9
$ readlink -f /sys/class/net/eth1/device/
/sys/devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.8
..
$ readlink -f /sys/class/net/eth9/device/
/sys/devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.0

$ readlink -f /sys/class/net/eth0/device/driver
/sys/bus/fsl-mc/drivers/fsl_dpaa2_eth

$ ls -la /sys/bus/fsl-mc/drivers/fsl_dpaa2_eth/
drwxr-xr-x    2 root     root             0 Oct 10 05:33 .
drwxr-xr-x    8 root     root             0 Oct 10 05:33 ..
--w-------    1 root     root          4096 Oct 10 05:33 bind
lrwxrwxrwx    1 root     root             0 Oct 10 05:33 dpni.0 -> ../../../../devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.0
lrwxrwxrwx    1 root     root             0 Oct 10 05:33 dpni.1 -> ../../../../devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.1
lrwxrwxrwx    1 root     root             0 Oct 10 05:33 dpni.2 -> ../../../../devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.2
lrwxrwxrwx    1 root     root             0 Oct 10 05:33 dpni.3 -> ../../../../devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.3
lrwxrwxrwx    1 root     root             0 Oct 10 05:33 dpni.4 -> ../../../../devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.4
lrwxrwxrwx    1 root     root             0 Oct 10 05:33 dpni.5 -> ../../../../devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.5
lrwxrwxrwx    1 root     root             0 Oct 10 05:33 dpni.6 -> ../../../../devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.6
lrwxrwxrwx    1 root     root             0 Oct 10 05:33 dpni.7 -> ../../../../devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.7
lrwxrwxrwx    1 root     root             0 Oct 10 05:33 dpni.8 -> ../../../../devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.8
lrwxrwxrwx    1 root     root             0 Oct 10 05:33 dpni.9 -> ../../../../devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.9
--w-------    1 root     root          4096 Oct 10 05:33 uevent
--w-------    1 root     root          4096 Oct 10 05:33 unbind
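The sysfs walk suggested above can be scripted as a small loop over `/sys/class/net`. This is a sketch of the idea, not fireinfo code: `list_net_devices` is a hypothetical name, the optional argument lets it run against a copy of the sysfs tree, and it relies on `readlink -f` (available in GNU coreutils and BusyBox).

```shell
#!/bin/sh
# For each interface under <root>/class/net, print:
#   <interface> <resolved device path> <driver name>
# Interfaces without a backing device (lo, bridges, ...) are skipped.
list_net_devices() {
    root="${1:-/sys}"
    for iface in "$root"/class/net/*; do
        [ -e "$iface/device" ] || continue
        name=$(basename "$iface")
        devpath=$(readlink -f "$iface/device")
        if [ -e "$iface/device/driver" ]; then
            driver=$(basename "$(readlink -f "$iface/device/driver")")
        else
            driver="-"                 # device present but no driver bound
        fi
        printf '%s %s %s\n' "$name" "$devpath" "$driver"
    done
}
```

Based on the readlink output above, the eth0 line on a Ten64 would be expected to read `eth0 /sys/devices/platform/soc/80c000000.fsl-mc/dprc.1/dpni.9 fsl_dpaa2_eth`.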


Regards,
Matt

  
Michael Tremer Nov. 2, 2022, 5:40 p.m. UTC | #3
Hello,

> On 28 Oct 2022, at 06:11, Mathew McBride <matt@traverse.com.au> wrote:
> 
> Hi Michael,
> 
> Just to finally get back to your other questions/comments.
> Apart from the boot.scr issue (fixed by removing the boot.scr file), Core 171 is working well on the Ten64.

I raised this with Arne.

> On Tue, Oct 4, 2022, at 7:56 PM, Michael Tremer wrote:
>> Hello Mathew,
>> 
>> Good to hear from you again...
>> [snip]
>> 
>> Will any of the changes in this patchset be incompatible with 6.0, or is it 
>> all in fact backported from mainline?
> 
> It's all backported from mainline. I'm not aware of any upcoming changes that will break things.

Very good.

>> > There are four components to this patch:
>> > 1: Enable the relevant kernel options for our box
>> > This follows our doc at https://ten64doc.traverse.com.au/kernel/
>> 
>> I assume that this is all part of the upstream kernel. So I have no problem 
>> with enabling this. It should be very unlikely to break anything.
>> 
>> > 2: Add patches to fully support SFP+
>> > One of these patches came in after 5.15+, while the other
>> > fixes a deadlock issue that occurs when detaching/unloading
>> > the SFP+ ports (such as rebooting the system). Unfortunately
>> > this issue has been stalled upstream without resolution for
>> > a while now.
>> 
>> :(
> The upstream experience for this particular SoC has been better than with previous ones, but there are novel parts of it that break the assumptions kernel (and other) developers have about network hardware. It's those parts that have stalled upstream.
> 
> The network complex is not a "fixed function" device: network interfaces and PHYs can be connected in arbitrary pairs (for example, I could change the PHY of a running interface from an SFP to a 1000Base-T port). It basically has a pool of resources shared across all the network ports, which one then configures however they want.

Cool, but how are we supposed to put this into any kind of UI?

>> 
>> > 3: Fix our real time clock (rtc-rx8025) not being modprobed
>> > I haven't been able to figure out why our RTC driver does not
>> > get loaded, given every other relevant module (like GPIO, I2C)
>> > does get loaded.
>> >
>> > If there is a better way to do this, feel free to NAK and
>> > suggest a better method.
>> 
>> This is kind of ugly. But it is not as bad as trying to load the module on 
>> all machines. You have a good way to determine if there is at least a chance 
>> to be successful.
>> 
>> I can live with this for now, but maybe it is a good idea to file a bug 
>> upstream and have them work on getting the module loaded automatically like 
>> all the others?
> 
> I'm not sure anything is broken in the upstream kernel; first I need to understand what causes a kernel module to be loaded without an explicit modprobe.

That depends. It could come from ACPI tables, which you typically don’t have on ARM boards like this one, or from the device tree, or the system simply enumerates any devices connected to a PCI bus.
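To narrow down which of these paths applies to the RTC, one could look for devices the kernel has enumerated but no driver has claimed. A sketch, assuming the usual sysfs layout in which an enumerated device typically exposes a `modalias` attribute and a bound device has a `driver` link; `unbound_modaliases` is a hypothetical helper.

```shell
#!/bin/sh
# List devices that advertise a modalias but have no driver bound.
# A hotplug helper such as udev normally hands this alias to modprobe, so
# anything printed here is a candidate for "why was my module not loaded?".
unbound_modaliases() {
    root="${1:-/sys/devices}"
    find "$root" -name modalias -type f 2>/dev/null | while read -r f; do
        dir=$(dirname "$f")
        [ -e "$dir/driver" ] && continue   # already bound, skip
        printf '%s %s\n' "$dir" "$(cat "$f")"
    done
}
```

The printed alias can then be fed to kmod's `modprobe --resolve-alias` to see which module, if any, claims to handle it.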

> The bigger distributions deal with it by modprobing all the available *.ko files in the initrd.
> In our own kernel configuration for testing we just set CONFIG_RTC_DRV_RX8025=y so it's built in.
> 
> I could do the same if you're happy to have it built in (like the x86 BIOS/CMOS RTC).

With an RTC I would be happy to have this built in. They are normally not very large, and that is still better than calling modprobe a thousand times. We should already have lots of RTCs compiled into our kernel.

> 
>> [snip]
>> > Here is the fireinfo from a Ten64:
>> > https://fireinfo.ipfire.org/profile/97f7fd96a529ca2e5488ab095b7d9effe67d0ef3
>> > (Note to self: I should figure out how to improve the fireinfo output on 
>> > ARM platforms)
>> 
>> Oh, this is indeed a little bit short. Are the network interfaces not 
>> connected using PCIe or some other bus that can be enumerated?
> Indeed, they come from a special "fsl-mc-bus".
> 
> My suggestion would be to crawl through the [sysfs] device/ and device/driver for each net class device to identify the full device path and driver.

And they don’t have any kind of PCI ID or something? Probably not, because this is not using PCI :)

That makes this very complicated.

-Michael
