[Linux-Xtensa] RAM load/store reorder
baruch at tkos.co.il
Thu Nov 28 04:31:25 UTC 2013
(Adding the list to Cc)
On Wed, Nov 27, 2013 at 04:14:31PM -0800, Chris Zankel wrote:
> Basically came to the same conclusions as the two of you when I
> looked at it this morning. There seems to be a weird reordering
> execution issue. Are you using just the default bitstream from
> Tensilica? What version? Were there any changes to the external
> memory? I'd be surprised that we hadn't caught such an issue when
> running the kernel and other tools on that board before.
This is not the default Tensilica bitstream. I'll check how this bitstream was
generated.
It is worth noting that U-Boot version 2009.08 and the kernel (versions 3.12
and 3.13-rc1), all built with the same toolchain, run happily on that board.
> Regarding the compiler, it's really weird that it would reload a3
> for every loop. Marc, you probably don't have the entire
> disassembled function in front of you, so here it is.
Which version of gcc have you used?
> (I can't really come up with any explanation other than that it
> might have optimized out some additional code, but that would have
> been something rather complex like TLS or so... weird..)
I didn't notice the jump label between the store and the load. This makes the
code a little less weird, but not by much.
> 00000344 <fdt_next_subnode>:
> 344: 008136 entry a1, 64
> 347: 03bd mov.n a11, a3
> 349: ff3331 l32r a3, 18 <___arch__swab32-0xfc>
> 34c: 180c movi.n a8, 1
> 34e: 0189 s32i.n a8, a1, 0
> 350: 4139 s32i.n a3, a1, 16
> 352: 4138 l32i.n a3, a1, 16 # reloads a3 for every loop, but a3 never changes
> 354: 02ad mov.n a10, a2
> 356: 20c110 or a12, a1, a1
> 359: 0003e0 callx8 a3
> 35c: 0abd mov.n a11, a10
> 35e: 00aa96 bltz a10, 36c <fdt_next_subnode+0x28>
> 361: 0188 l32i.n a8, a1, 0
> 363: 0518a6 blti a8, 1, 36c <fdt_next_subnode+0x28>
> 366: e81866 bnei a8, 1, JUMP_LABEL ## 352
> 369: 000046 j 36e <fdt_next_subnode+0x2a>
> 36c: fb7c movi.n a11, -1
> 36e: 0b2d mov.n a2, a11
> 370: f01d retw.n
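For reference, the listing above reads roughly as the C below. This is only a sketch reconstructed from the disassembly, not the libfdt source: next_subnode_sketch, next_node_fn, stub_next_node and ERR_NOTFOUND_SKETCH are made-up names, the callee is passed in as a parameter instead of being loaded with l32r, and the register-to-argument mapping assumes the windowed ABI (a10..a12 of the caller become a2..a4 of the callee after callx8).

```c
#include <stddef.h>

/* Assumed name; the listing only shows the constant -1 (movi.n a11, -1). */
#define ERR_NOTFOUND_SKETCH 1

/* Hypothetical type for the function reached through "callx8 a3". */
typedef int (*next_node_fn)(const void *fdt, int offset, int *depth);

/* Rough C reading of the loop at 0x352..0x366. */
static int next_subnode_sketch(next_node_fn next_node,
                               const void *fdt, int offset)
{
    int depth = 1;                      /* movi.n a8, 1 ; s32i.n a8, a1, 0 */

    do {
        /* a10 = fdt, a11 = offset, a12 = &depth (a1 + 0), then callx8 */
        offset = next_node(fdt, offset, &depth);
        if (offset < 0 || depth < 1)    /* bltz a10 ; blti a8, 1 */
            return -ERR_NOTFOUND_SKETCH;
    } while (depth != 1);               /* bnei a8, 1, back to 0x352 */

    return offset;                      /* mov.n a2, a11 ; retw.n */
}

/* Hypothetical callee used only to exercise the loop on a host machine. */
static int stub_next_node(const void *fdt, int offset, int *depth)
{
    (void)fdt;
    if (offset >= 100)
        return -1;                      /* pretend we ran off the end */
    offset += 4;
    *depth = (offset < 12) ? 2 : 1;     /* stay "deeper" until offset 12 */
    return offset;
}
```

With the stub, next_subnode_sketch(stub_next_node, NULL, 0) goes round the loop three times and returns 12. In this reading, depth has to live in the stack slot at a1+0 because its address is handed to the callee, while nothing in the listing passes the address of the a1+16 slot that holds a3.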
> On 11/27/13, 11:46 AM, Baruch Siach wrote:
> >On Wed, Nov 27, 2013 at 11:22:08AM -0800, Marc Gauthier wrote:
> >>Baruch Siach wrote:
> >>>Thanks for your prompt response.
> >>>On Wed, Nov 27, 2013 at 08:39:25AM -0800, Marc Gauthier wrote:
> >>>>Baruch Siach wrote:
> >>>>>1. Why is this seemingly NOP store/load pair at 0x5f3d414 and 0x5f3d416
> >>>>>needed at all?
> >>>>Looks like you're compiling at -O0. In that case, the compiler
> >>>>ensures that all variables are in memory at every source line
> >>>>boundary ie. nothing is cached in registers across such boundaries,
> >>>>so debuggers can access/modify variables in memory (eg. stack) only,
> >>>>and the right thing happens.
> >>>The source file in question builds with -Os. I checked this again by looking
> >>>at the actual build command line. Also, a3 holds an internal
> >>>function pointer, not a user set variable.
> >>Is fdt_next_node() a macro? What does it expand to?
> >No. fdt_next_node() is a regular function. See its full listing in my first
> >email at
> >>Is this GCC or XCC?
> >It's GCC version 4.7.2.
> >>>The FPGA runs at 25MHz which I believe is nowhere near the limit of this
> >>>board. I'll check here how exactly this bitstream was generated.
> >>Ok. I can't explain the behavior otherwise.
> >>Can you also try a different ML605 board?
> >It's a KC705. I'm not sure we have another one, but I'll check.
> >>Am assuming the code works fine when single-stepped, or putting
> >>breakpoints after each instruction?
> >Yes. Single stepping through the code makes the problem magically disappear.
http://baruch.siach.name/blog/ ~. .~ Tk Open Systems
- baruch at tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -