Closed Bug 840285 Opened 12 years ago Closed 12 years ago

OdinMonkey: ARM support

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla23

People

(Reporter: luke, Assigned: mjrosenb)

References

Details

(Whiteboard: [gdc2013])

Attachments

(2 files, 15 obsolete files)

146.40 KB, patch
luke
: review
Details | Diff | Splinter Review
142.53 KB, patch
Details | Diff | Splinter Review
Marty has graciously volunteered to add ARM support to Odin.  This is mostly adding codegen for new IM nodes and ARM trampolines.

One challenge is catching out-of-bounds heap access: we can neither use the x86 segment register trick nor the x64 large-address space trick.  The cheapest solution I can find is to use the 'usat' instruction to clamp all out-of-bounds access to a single address which we ensure will safely fault.
Blocks: 840282
this is still throws a few assertions.  I'm pretty sure most of them are due to stack issues.  I'll look into it a bit more as I'm able.
hacks on top of hacks.  I haven't rebased in a while (coming soon), and I'm pretty sure I've totally broken x64 support, but that should be trivial to fix (just it isn't yet)
however, the good news:
/home/mrosenberg/src/odin/odin/js/src/jit-test/lib/asm.js:5:0 TypeError: redeclaration of const ASM_OK_STRING
mrosenberg@tegra-ubuntu:~/src/odin/odin/js/objs/arm-dbg$ ../../src/jit-test/jit_test.py -f ./shell/js asm.js 
[17| 0| 0| 0] 100% ==================================================>|   8.6s
PASSED ALL
Attachment #713997 - Attachment is obsolete: true
Nice work!  I'm guessing you'll work on rebasing now and then we can look over the patch together in MV next week?
Is it possible for us to use the ThumbEE 'chka' when we have it instead of saturating?
I think we're generating ARM, not Thumb code, although there may be future plans for Thumb.
Yes.  Currently the Ion assembler is ARM only, If code size becomes a problem for odin, I can probably write a Thumb-2 assembler.
Adds in arm support, passes all jit-tests, and was rebased late last week.  There is currently a bug exposed by the banana bread test, but it looks like that is in the assembler rather than Odin.
Attachment #719297 - Attachment is obsolete: true
You're probably right, but just in case: if you see what looks like a corrupted instruction, I'd suspect patching bugs due to offset/actualOffset since that is an ARM-only thing that is only going to show up at scale.  That should only be happening in ModuleCompiler::finish, so it might be worth taking a slow pass over that.
The Android NDK does not appear to defined mcontext_t and ucontext_t which is a problem for a Fennec build. See: https://chromiumcodereview.appspot.com/10829122/ and http://code.google.com/p/android/issues/detail?id=34784

The first link has some definitions that may be appropriate, and these at least get  AsmJSSignalHandlers.cpp compiling.
ok, rebased off of mozilla-central.  passes all jit-tests
bananabread has some issues.  major cleanups needed, I'll get on them this evening.  I just left some of the methods that were recently merged as seperate ARM functions, because there was enough to rebase that I didn't want to also risk
Attachment #726260 - Flags: review?(luke)
Awesome work!

In the meantime, could you build a browser and verify that it runs http://kripken.github.com/misc-js-benchmarks/banana/benchmark.html and http://kripken.github.com/ammo.js/examples/new/ammo.html?
This doesn't build; same problem as what Doug said in comment #9 -- there is no mcontext_t defined in the Android NDK.
A possible patch to get AsmJSSignalHandlers.cpp compiling on Android ARM.
Comment on attachment 726260 [details] [diff] [review]
/home/mjrosenb/patches/addARMSupport-r3.patch

Review of attachment 726260 [details] [diff] [review]:
-----------------------------------------------------------------

Great progress!  Mostly code organization comments below:

It'd be nice if you took a pass over this patch; there are a bunch of printfs, #if 0, #if TODO_REMOVE, etc to be cleaned out.

Could you reuse AsmJSHeapAccess instead of adding an AsmJSBoundsCheck?  I realize that ARM doesn't need opLength or isFloat32Load, but we can just #ifdef those fields in AsmJSHeapAccess as we've done with x86/x64.  There seems to be a lot of code duplication resulting from this.

::: js/src/ion/AsmJS.cpp
@@  4389,5 @@
>      // CodeGenerator so we can destory it now.
>      return true;
>  }
>  
>  //static const unsigned CodeAlignment = 8;

rm

@@ -4761,5 @@
>      // we must align StackPointer dynamically. Don't worry about restoring
>      // StackPointer since throwLabel will clobber StackPointer immediately.
>      masm.andPtr(Imm32(~(StackAlignment - 1)), StackPointer);
> -    if (ShadowStackSpace)
> -        masm.subPtr(Imm32(ShadowStackSpace), StackPointer);

That needs to stay.

@@  4984,5 @@
>  #else
>  
>      //on ARM, we should always be aligned, just do that stuff.
>      LoadAsmJSActivationIntoRegister(masm, IntArgReg0);
>      LoadJSContextFromActivation(masm, IntArgReg0, IntArgReg0);

Comment is rather ambiguous...

@@  4988,5 @@
>      LoadJSContextFromActivation(masm, IntArgReg0, IntArgReg0);
>  
>      void (*pf)(JSContext*) = js_ReportOverRecursed;
>      masm.call(ImmWord(JS_FUNC_TO_DATA_PTR(void*, pf)));
>      masm.ma_b(throwLabel);

Can't the call/jump-to-throw path be shared with x86/64?

@@  5116,1 @@
>      masm.mov(Operand(activation, AsmJSActivation::offsetOfErrorRejoinSP()), StackPointer);

Is there not a portable assembler function here and for the 'ret'?

::: js/src/ion/AsmJSLink.cpp
@@  11,5 @@
>  #include "jstypedarrayinlines.h"
>  
>  #include "AsmJS.h"
>  #include "AsmJSModule.h"
>  #include "Ion.h"

\n after #include

@@  200,5 @@
>          }
>  #elif defined(JS_CPU_ARM)
>          // Now the length of the array is know, patch all of the bounds check sites
>          // with the new length.
>          ion::IonContext ic(cx, cx->compartment, NULL);

I'm curious which IonContext constructor this is calling, I only see IonContext constructors taking 1 and 2 arguments...

::: js/src/ion/AsmJSModule.h
@@  519,5 @@
>          ion::AutoFlushCache afc("patchBoundsCheck");
>          int bits = -1;
>          for (bits = -1; bits < 31; bits  ) {
>              if (heapSize >> (bits   1) == 0)
>                  break;

use JS_CEILING_LOG2

::: js/src/ion/CodeGenerator.cpp
@@  4327,5 @@
>      // according to the system ABI. The MAsmJSParameters which represent these
>      // parameters have been useFixed()ed to these ABI-specified positions.
>      // Thus, there is nothing special to do in the prologue except (possibly)
>      // bump the stack.
>      if (!generateAsmPrologue())

Now the name to use is generateAsmJSPrologue and generateAsmJSEpilogue.

@@  5738,3 @@
>      JS_ASSERT((AlignmentAtPrologue   masm.framePushed()) % StackAlignment == 0);
>  #else
>      JS_ASSERT((masm.framePushed()) % StackAlignment == 0);

Could you fix either AlignmentAtPrologue or StackAlignment so that the assertion doesn't need an #ifdef?

@@  5770,5 @@
>  
>  bool
>  CodeGenerator::visitAsmJSParameter(LAsmJSParameter *lir)
>  {
>  #if defined(JS_CPU_ARM) && ! defined(JS_CPU_ARM_HARDFP)

!defined

@@  5790,5 @@
>      // Don't emit a jump to the return label if this is the last block.
>  #if defined(JS_CPU_ARM) && !defined(JS_CPU_ARM_HARDFP)
>      if (lir->getOperand(0)->isFloatReg()) {
>          masm.ma_vxfer(d0, r0, r1);
>      }

No { }

::: js/src/ion/IonAllocPolicy.h
@@  94,5 @@
>  {
>    public:
>      void *malloc_(size_t bytes) {
>          void *ret = GetIonContext()->temp->allocate(bytes);
>          return ret;

Undo

::: js/src/ion/IonLinker.h
@@  18,5 @@
>  
>  namespace js {
>  namespace ion {
>  
>  //static const int CodeAlignment = 8;

Where is CodeAlignment coming from?

::: js/src/ion/LIR.cpp
@@  9,5 @@
>  #include "MIRGraph.h"
>  #include "LIR.h"
>  #include "IonSpewer.h"
>  #include "LIR-inl.h"
>  #include "shared/CodeGenerator-shared.h"

\n after

@@  340,5 @@
>      for (size_t i = 0; i < moves_.length(); i  )
>          JS_ASSERT(*to != *moves_[i].to());
>  #if 0
>      if (!from->isGeneralReg() && ! from->isFloatReg())
>          JS_ASSERT((ToStackOffset(from) & 3) == 0);

Remove

::: js/src/ion/RegisterAllocator.h
@@  312,5 @@
>  #ifdef JS_CPU_X64
>          if (mir->compilingAsmJS())
>              allRegisters_.take(AnyRegister(HeapReg));
>  #endif
>  #ifdef JS_CPU_ARM

#if defined(JS_CPU_X64)
#elif defined(JS_CPU_ARM)
#endif

::: js/src/ion/arm/Architecture-arm.h
@@  32,5 @@
>  // components of a js::Value.
>  static const int32_t NUNBOX32_TYPE_OFFSET    = 4;
>  static const int32_t NUNBOX32_PAYLOAD_OFFSET = 0;
>  
>  static const uint32_t ShadowStackSpace = 0;

\n after

::: js/src/ion/arm/Assembler-arm.h
@@  67,4 @@
>  static const Register CallTempNonArgRegs[] = { r5, r6, r7, r8 };
>  static const uint32_t NumCallTempNonArgRegs =
>      mozilla::ArrayLength(CallTempNonArgRegs);
>  #ifdef REDEFINED

REDEFINED?

::: js/src/ion/arm/CodeGenerator-arm.cpp
@@  1645,5 @@
>      return true;
>  }
>  #if 0
>  bool
>  CodeGeneratorARM::generateAsmPrologue(const MIRTypeVector &argTypes, MIRType returnType,

Remove this code and the code below.

::: js/src/ion/arm/IonFrames-arm.h
@@  8,5 @@
>  #ifndef jsion_ionframes_arm_h__
>  #define jsion_ionframes_arm_h__
>  
>  #include "ion/shared/IonFrames-shared.h"
>  #include "ion/arm/Assembler-arm.h"

\n

::: js/src/ion/arm/Lowering-arm.cpp
@@  42,5 @@
>  bool
>  LIRGeneratorARM::lowerConstantDouble(double d, MInstruction *mir)
>  {
>      uint32_t index;
>  #ifdef WHY_DO_I_NEED_THIS

good question

::: js/src/ion/arm/Lowering-arm.h
@@  62,5 @@
>      bool visitGuardShape(MGuardShape *ins);
>      bool visitRecompileCheck(MRecompileCheck *ins);
>      bool visitStoreTypedArrayElement(MStoreTypedArrayElement *ins);
>      bool visitInterruptCheck(MInterruptCheck *ins);
>      //bool visitAsmCheckStackAndInterrupt(MAsmCheckStackAndInterrupt *ins);

rm

::: js/src/ion/shared/CodeGenerator-shared.cpp
@@  70,5 @@
>  #else
>          bool forceAlign = false;
>  #endif
>  
>          if (gen->performsAsmJSCall() || forceAlign) {

As before, it'd be nice if this just fell out of the constants chosen for AlignmentAtPrologue and StackAlignment.

::: js/src/methodjit/BaseAssembler.h
@@  1147,5 @@
>      }
>  
>      template <typename T>
>  #if defined(__GNUC__) && __GNUC__ == 4 && __GNUC_MINOR__ == 7 && defined(JS_CPU_ARM)
>          __attribute__((optimize("-O1")))

Wasn't this fix already landed?

::: memory/mozalloc/mozalloc.h
@@  189,5 @@
>  #define MOZALLOC_THROW_IF_HAS_EXCEPTIONS /**/
>  #define MOZALLOC_THROW_BAD_ALLOC_IF_HAS_EXCEPTIONS
>  #else
>  #define MOZALLOC_THROW_IF_HAS_EXCEPTIONS throw()
>  #define MOZALLOC_THROW_BAD_ALLOC_IF_HAS_EXCEPTIONS abort()

Umm, I don't think you can do that.
Attachment #726260 - Flags: review?(luke)
Good news, with the patch above plus the patch from doug in comment #13, I can do an Android build.  Executing Asm.js hello-gles2 gets a "successfully compiled asm.js" message, and gears start spinning!

Bad news, it segfaults shortly thereafter -- given the work I just did for OSX,  I'm 99% sure it's segfaulting in the AsmJS signal handler/or when the signal is thrown.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #15)
> Good news, with the patch above plus the patch from doug in comment #13, I
> can do an Android build.  Executing Asm.js hello-gles2 gets a "successfully
> compiled asm.js" message, and gears start spinning!
> 
> Bad news, it segfaults shortly thereafter -- given the work I just did for
> OSX,  I'm 99% sure it's segfaulting in the AsmJS signal handler/or when the
> signal is thrown.

Yes, I have been getting crashes on the timer tests and am trying to
trace the callback exit patch to find the problem.  The signal handler
seems to obtain the correct program counter, so perhaps it's in the
return path.
The stack does not appear to be restored after a callback.

The example code below appears to have been generated
by GenerateOperationCallbackExit and stack change
annotations have been added and do not balanced.

I have some concern about state being exposed to corruption
on the stack by another signal, but need to double check the
ARM s/w manuals.

   0x400df090:	b	0x400df098
   0x400df094:	bkpt	0x00fb
   0x400df098:	sub	sp, sp, #60	; 0x3c   // sp -= 60
   0x400df09c:	stm	sp, {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr, pc}
   0x400df0a0:	sub	sp, sp, #0
   0x400df0a4:	mrs	r0, CPSR
   0x400df0a8:	vmrs	r1, fpscr
   0x400df0ac:	strd	r0, [sp, #-12]!          // sp -= 12
   0x400df0b0:	movw	r0, #4096	; 0x1000
   0x400df0b4:	movt	r0, #16466	; 0x4052
   0x400df0b8:	ldr	r0, [r0, #68]	; 0x44
   0x400df0bc:	ldr	r1, [r0, #24]
   0x400df0c0:	str	r1, [sp, #68]	; 0x44
   0x400df0c4:	ldr	r0, [r0]
   0x400df0c8:	b	0x400df0d0
   0x400df0cc:	bkpt	0x00fc
   0x400df0d0:	sub	sp, sp, #128	; 0x80  // sp -= 128
   0x400df0d4:	vstmia	sp!, {d0-d15}           // sp  = 128  (is the state expose on the stack?)
   0x400df0d8:	sub	sp, sp, #128	; 0x80  // sp -= 128
   0x400df0dc:	movw	r12, #38117	; 0x94e5
   0x400df0e0:	movt	r12, #6
   0x400df0e4:	blx	r12                     // Call into C. Stack preserved on return.

   0x400df0e8:	cmp	r0, #0
   0x400df0ec:	beq	0x400df118
   0x400df0f0:	vpop	{d0-d15}                // sp  = 128
   0x400df0f4:	sub	sp, sp, #128	; 0x80  // sp -= 128  (is this combination necessary?)
   0x400df0f8:	add	sp, sp, #128	; 0x80  // sp  = 128
   0x400df0fc:	ldrd	r0, [sp], #12           // sp  =  12
   0x400df100:	vmsr	fpscr, r1
   0x400df104:	msr	CPSR_fs, r0
   0x400df108:	sub	sp, sp, #0
   0x400df10c:	ldm	sp, {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr, pc}
   0x400df110:	add	sp, sp, #60	; 0x3c  // ?? not reached
   0x400df114:	nop	{0}

   0x400df118:	movw	r5, #4096	; 0x1000
   0x400df11c:	movt	r5, #16466	; 0x4052
   0x400df120:	ldr	r5, [r5, #68]	; 0x44
   0x400df124:	ldr	sp, [r5, #16]
   0x400df128:	sub	sp, sp, #0
   0x400df12c:	ldm	sp, {r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}
   0x400df130:	add	sp, sp, #40	; 0x28
   0x400df134:	mov	r0, #0
   0x400df138:	bx	lr
To expedite this work, I've put both of Marty's and Douglas's (thanks btw!) up on a user repo:
  http://hg.mozilla.org/users/lwagner_mozilla.com/armodin
We can continue to iterate on this repo and avoid the terrible patch game.  Douglas, feel free to post patches here and I'll land them on armodin for you.

Running the asm.js unit test suite (run: js/src/jit-tests/jit_tests.py $(SHELL) asm.js) under qemu (very easy to do: https://blog.mozilla.org/mjrosenb/2012/11/13/building-an-running-the-js-shell-for-arm) shows failures in asm.js/testTimeout{1-4}.js, so hopefully this is what we're seeing in comment 17.  I'll dig into this more tomorrow if noone beats me to it.
Attached patch ARM Callback exit part fix. (obsolete) — Splinter Review
This patch attempts to balance the stack pushes and pops so that it is preserved the stack upon return.  This appears to have stopped the crashes in the timeout tests. The emitted code is rather convoluted and needs more work.  Add some comments on areas still to be explored, in case someone beats me to it:

   0x403cb090:	b	0x403cb098            << are these needed?
   0x403cb094:	bkpt	0x012c                <<
   0x403cb098:	sub	sp, sp, #4                                                       <<< modified
   0x403cb09c:	b	0x403cb0a4            << are these needed?
   0x403cb0a0:	bkpt	0x012d                <<
   0x403cb0a4:	sub	sp, sp, #56	; 0x38                                           <<< modified
   0x403cb0a8:	stm	sp, {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}  <<< modified. Replace with a push
   0x403cb0ac:	sub	sp, sp, #0             << redundant?
   0x403cb0b0:	mrs	r0, CPSR
   0x403cb0b4:	vmrs	r1, fpscr
   0x403cb0b8:	strd	r0, [sp, #-12]!
   0x403cb0bc:	movw	r0, #0
   0x403cb0c0:	movt	r0, #16450	; 0x4042
   0x403cb0c4:	ldr	r0, [r0, #28]
   0x403cb0c8:	ldr	r1, [r0, #24]
   0x403cb0cc:	str	r1, [sp, #68]	; 0x44
   0x403cb0d0:	ldr	r0, [r0]
   0x403cb0d4:	b	0x403cb0dc           << are these needed?
   0x403cb0d8:	bkpt	0x012e               <<
   0x403cb0dc:	sub	sp, sp, #128	; 0x80             << 
   0x403cb0e0:	vstmia	sp!, {d0-d15}                      << looks broken. Just use vpop?
   0x403cb0e4:	sub	sp, sp, #128	; 0x80             <<
   0x403cb0e8:	movw	r12, #22885	; 0x5965
   0x403cb0ec:	movt	r12, #3
   0x403cb0f0:	blx	r12

   0x403cb0f4:	cmp	r0, #0
   0x403cb0f8:	beq	0x403cb120
   0x403cb0fc:	vpop	{d0-d15}
   0x403cb100:	sub	sp, sp, #128	; 0x80            << redundant?
   0x403cb104:	add	sp, sp, #128	; 0x80            <<
   0x403cb108:	ldrd	r0, [sp], #12
   0x403cb10c:	vmsr	fpscr, r1
   0x403cb110:	msr	CPSR_fs, r0
   0x403cb114:	pop	{r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}    <<< modified
   0x403cb118:	pop	{pc}                                                           <<< modified
   0x403cb11c:	nop	{0}

    // Is there test coverage for this code path?  A throw?
   0x403cb120:	movw	r5, #0
   0x403cb124:	movt	r5, #16450	; 0x4042
   0x403cb128:	ldr	r5, [r5, #28]
   0x403cb12c:	ldr	sp, [r5, #16]
   0x403cb130:	sub	sp, sp, #0
   0x403cb134:	ldm	sp, {r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}
   0x403cb138:	add	sp, sp, #40	; 0x28
   0x403cb13c:	mov	r0, #0
   0x403cb140:	bx	lr

Would like to know more about the stack alignment requirements - not currently pushing a multiple of 8 bytes.

The FP register saves appears broken, exposing the state to corruption on the stack, but this might be a more general issue with PopRegsInMaskIgnore and/or transferMultipleByRuns.
(In reply to Douglas Crosher from comment #19)
> Created attachment 727068 [details] [diff] [review]
> ARM Callback exit part fix.
> 
> This patch attempts to balance the stack pushes and pops so that it is
> preserved the stack upon return.  This appears to have stopped the crashes
> in the timeout tests. The emitted code is rather convoluted and needs more
> work.  Add some comments on areas still to be explored, in case someone
> beats me to it:
> 
>    0x403cb090:	b	0x403cb098            << are these needed?
>    0x403cb094:	bkpt	0x012c                <<
not needed, just for debugging, quite easy to remove.
>    0x403cb098:	sub	sp, sp, #4                                               
> <<< modified
>    0x403cb09c:	b	0x403cb0a4            << are these needed?
>    0x403cb0a0:	bkpt	0x012d                <<
ditto.
>    0x403cb0a4:	sub	sp, sp, #56	; 0x38                                       
> <<< modified
>    0x403cb0a8:	stm	sp, {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11,
> r12, lr}  <<< modified. Replace with a push
IIRC, nrc has a patch that does this.  I'll look into it asap.  He wrote the code to use stm rather than pushing each register individually.
>    0x403cb0ac:	sub	sp, sp, #0             << redundant?
should also get removed with nrc's patch.
>    0x403cb0b0:	mrs	r0, CPSR
>    0x403cb0b4:	vmrs	r1, fpscr
>    0x403cb0b8:	strd	r0, [sp, #-12]!
>    0x403cb0bc:	movw	r0, #0
>    0x403cb0c0:	movt	r0, #16450	; 0x4042
>    0x403cb0c4:	ldr	r0, [r0, #28]
>    0x403cb0c8:	ldr	r1, [r0, #24]
>    0x403cb0cc:	str	r1, [sp, #68]	; 0x44
>    0x403cb0d0:	ldr	r0, [r0]
>    0x403cb0d4:	b	0x403cb0dc           << are these needed?
>    0x403cb0d8:	bkpt	0x012e               <<
same debugging code
>    0x403cb0dc:	sub	sp, sp, #128	; 0x80             << 
>    0x403cb0e0:	vstmia	sp!, {d0-d15}                      << looks broken.
> Just use vpop?
>    0x403cb0e4:	sub	sp, sp, #128	; 0x80             <<
I'm pretty sure it is doing the right thing, just in a silly way.  What do you think it is doing incorrectly?
>    0x403cb0e8:	movw	r12, #22885	; 0x5965
>    0x403cb0ec:	movt	r12, #3
>    0x403cb0f0:	blx	r12
> 
>    0x403cb0f4:	cmp	r0, #0
>    0x403cb0f8:	beq	0x403cb120
>    0x403cb0fc:	vpop	{d0-d15}
>    0x403cb100:	sub	sp, sp, #128	; 0x80            << redundant?
>    0x403cb104:	add	sp, sp, #128	; 0x80            <<
>    0x403cb108:	ldrd	r0, [sp], #12
>    0x403cb10c:	vmsr	fpscr, r1
>    0x403cb110:	msr	CPSR_fs, r0
>    0x403cb114:	pop	{r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12,
> lr}    <<< modified
>    0x403cb118:	pop	{pc}                                                     
> <<< modified
we should be able to pop everything at once.  no need to pop the pc separately.
>    0x403cb11c:	nop	{0}
> 
>     // Is there test coverage for this code path?  A throw?
>    0x403cb120:	movw	r5, #0
>    0x403cb124:	movt	r5, #16450	; 0x4042
>    0x403cb128:	ldr	r5, [r5, #28]
>    0x403cb12c:	ldr	sp, [r5, #16]
>    0x403cb130:	sub	sp, sp, #0
>    0x403cb134:	ldm	sp, {r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}
>    0x403cb138:	add	sp, sp, #40	; 0x28
>    0x403cb13c:	mov	r0, #0
>    0x403cb140:	bx	lr
> 
> Would like to know more about the stack alignment requirements - not
> currently pushing a multiple of 8 bytes.
> 
> The FP register saves appears broken, exposing the state to corruption on
> the stack, but this might be a more general issue with PopRegsInMaskIgnore
> and/or transferMultipleByRuns.
(In reply to Marty Rosenberg [:mjrosenb] from comment #20)
> (In reply to Douglas Crosher from comment #19)
> > Created attachment 727068 [details] [diff] [review]
> > ARM Callback exit part fix.
...
> >    0x403cb0dc:	sub	sp, sp, #128	; 0x80             << 
> >    0x403cb0e0:	vstmia	sp!, {d0-d15}                      << looks broken.
> > Just use vpop?
> >    0x403cb0e4:	sub	sp, sp, #128	; 0x80             <<
> I'm pretty sure it is doing the right thing, just in a silly way.  What do
> you think it is doing incorrectly?

The 'vstmia sp!, {d0-d15}' instruction increases the sp above the data
it saves, so if the code is interrupted before the next instruction then
the state may be corrupted.  Could use a 'vpush' (not vpop sorry),
and it would replace all three instructions and avoid exposing the
state on the stack.

...
> >    0x403cb114:	pop	{r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12,
> > lr}    <<< modified
> >    0x403cb118:	pop	{pc}                                                     
> > <<< modified
> we should be able to pop everything at once.  no need to pop the pc
> separately.

Yes. The ARM manual states 'ARM instructions that include both the LR and
the PC in the list are deprecated', so it would be wise to separate them.
Your call.
Whiteboard: [gdc2013]
Fixed about 90% of luke's complaints, and rolled in the callback exit fix.
Attachment #724264 - Attachment is obsolete: true
Attachment #726260 - Attachment is obsolete: true
Attachment #727068 - Attachment is obsolete: true
Try run for b997bed88738 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=b997bed88738
Results (out of 23 total builds):
    warnings: 1
    failure: 22
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mrosenberg@mozilla.com-b997bed88738
Here's a possible patch to correct the FP register push sequence which exposed the state on the stack to corruption.  The patch also optimizes the instruction sequences, making use of the ability of the instructions to update the destination pointer.  This patch could stand on it's own, so let me know if a separate bug should be submitted for it?

With the latest patch set plus this patch, the callback exit becomes:

   0x40124070:	push	{r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr, pc}
   0x40124074:	mrs	r0, CPSR
   0x40124078:	vmrs	r1, fpscr
   0x4012407c:	strd	r0, [sp, #-12]!
   0x40124080:	movw	r0, #0
   0x40124084:	movt	r0, #16450	; 0x4042
   0x40124088:	ldr	r0, [r0, #28]
   0x4012408c:	ldr	r1, [r0, #24]
   0x40124090:	str	r1, [sp, #68]	; 0x44
   0x40124094:	ldr	r0, [r0]
   0x40124098:	vpush	{d0-d15}
   0x4012409c:	movw	r12, #23229	; 0x5abd
   0x401240a0:	movt	r12, #3
   0x401240a4:	blx	r12

   0x401240a8:	cmp	r0, #0
   0x401240ac:	beq	0x401240c8
   0x401240b0:	vpop	{d0-d15}
   0x401240b4:	ldrd	r0, [sp], #12
   0x401240b8:	vmsr	fpscr, r1
   0x401240bc:	msr	CPSR_fs, r0
   0x401240c0:	pop	{r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}
   0x401240c4:	pop	{pc}		; (ldr pc, [sp], #4)

   0x401240c8:	movw	r5, #0
   0x401240cc:	movt	r5, #16450	; 0x4042
   0x401240d0:	ldr	r5, [r5, #28]
   0x401240d4:	ldr	sp, [r5, #16]
   0x401240d8:	pop	{r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}
   0x401240dc:	mov	r0, #0
   0x401240e0:	bx	lr
In case it helps others, here's my set of patches against nightly for building the Android Fennec ARM with the asm.js support.  It is seems to be improving in reliability as more test code here is working without crashes.  The performance improvement is great for some code, over 10x faster than Ion nightly.
Here's another version of the callback exit that takes care to align the
stack. Code could be interrupted without the stack aligned, so it
might be necessary to align it here.  The ma_msr instruction emitter
was also broken, except when using r0.

The ARM specific code:
    masm.setFramePushed(0);         // set to zero so we can use masm.framePushed() below
    masm.PushRegsInMask(RegisterSet(GeneralRegisterSet(Registers::AllMask & ~(1<<Registers::sp)), FloatRegisterSet(uint32_t(0))));   // save all GP registers,excep sp

    // Save both the APSR and FPSCR in non-volatile registers.
    masm.as_mrs(r4);
    masm.as_vmrs(r5);
    // Save the stack pointer in a non-volatile register.
    masm.mov(sp,r6);
    // Align the stack.
    masm.ma_and(Imm32(~7), sp, sp);

    // Store resumePC into the return PC stack slot.
    LoadAsmJSActivationIntoRegister(masm, IntArgReg0);
    masm.loadPtr(Address(IntArgReg0, AsmJSActivation::offsetOfResumePC()), IntArgReg1);
    masm.storePtr(IntArgReg1, Address(r6, 14 * sizeof(uint32_t*)));

    // argument 0: cx
    masm.loadPtr(Address(IntArgReg0, AsmJSActivation::offsetOfContext()), IntArgReg0);

    masm.PushRegsInMask(RegisterSet(GeneralRegisterSet(0), FloatRegisterSet(FloatRegisters::AllMask)));   // save all FP registers
    JSBool (*pf)(JSContext*) = js_HandleExecutionInterrupt;
    masm.call(ImmWord(JS_FUNC_TO_DATA_PTR(void*, pf)));
    masm.branchTest32(Assembler::Zero, ReturnReg, ReturnReg, throwLabel);

    // Restore the machine state to before the interrupt. this will set the pc!
    masm.PopRegsInMask(RegisterSet(GeneralRegisterSet(0), FloatRegisterSet(FloatRegisters::AllMask)));   // restore all FP registers
    masm.mov(r6,sp);
    masm.as_vmsr(r5);
    masm.as_msr(r4);
    // Restore all GP registers
    masm.startDataTransferM(IsLoad, sp, IA, WriteBack);
    masm.transferReg(r0);
    masm.transferReg(r1);
    masm.transferReg(r2);
    masm.transferReg(r3);
    masm.transferReg(r4);
    masm.transferReg(r5);
    masm.transferReg(r6);
    masm.transferReg(r7);
    masm.transferReg(r8);
    masm.transferReg(r9);
    masm.transferReg(r10);
    masm.transferReg(r11);
    masm.transferReg(r12);
    masm.transferReg(lr);
    masm.finishDataTransfer();
    masm.ret();

Corrected as_msr:

BufferOffset
Assembler::as_msr(Register r, Condition c)
{
    // hardcode the 'mask' field to 0b11 for now.  it is bits 18 and 19, which are the two high bits of the 'c' in this constant.
    JS_ASSERT((r.code() & ~0xf) == 0);
    return writeInst(0x012cf000 | int(c) | r.code());
}


The resulting callback exit code becomes:
   0x40411070:	push	{r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr, pc}
   0x40411074:	mrs	r4, CPSR
   0x40411078:	vmrs	r5, fpscr
   0x4041107c:	mov	r6, sp
   0x40411080:	bic	sp, sp, #7
   0x40411084:	movw	r0, #0
   0x40411088:	movt	r0, #16466	; 0x4052
   0x4041108c:	ldr	r0, [r0, #28]
   0x40411090:	ldr	r1, [r0, #24]
   0x40411094:	str	r1, [r6, #56]	; 0x38
   0x40411098:	ldr	r0, [r0]
   0x4041109c:	vpush	{d0-d15}
   0x404110a0:	movw	r12, #23229	; 0x5abd
   0x404110a4:	movt	r12, #3
   0x404110a8:	blx	r12

   0x404110ac:	cmp	r0, #0
   0x404110b0:	beq	0x404110d0
   0x404110b4:	vpop	{d0-d15}
   0x404110b8:	mov	sp, r6
   0x404110bc:	vmsr	fpscr, r5
   0x404110c0:	msr	CPSR_fs, r4
   0x404110c4:	pop	{r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}
   0x404110c8:	pop	{pc}		; (ldr pc, [sp], #4)
Comment on attachment 727258 [details] [diff] [review]
/home/mjrosenb/patches/addARMSupport-r4.patch

Not sure if Doug's most recent patch is this rebased or one of my previous patches rebased,  Either way, there shouldn't be a huge amount of variance between them.
Attachment #727258 - Flags: review?(luke)
Depends on: 853701
Comment on attachment 727258 [details] [diff] [review]
/home/mjrosenb/patches/addARMSupport-r4.patch

Perhaps this isn't the right patch; this one contains a bunch of printfs and I can see a lot of the comments from above aren't addressed.
Attachment #727258 - Flags: review?(luke)
Revised patch re-based to mozilla-central which now includes the fix from bug 849489.  This started out with addARMSupport-r4.patch plus: fixes to get it to compile; some further cleanups suggested in Luke's review; a reworked callback exit that aligns the stack; and Android support.
Attachment #727615 - Attachment is obsolete: true
Oh wow, thanks!  If you want slightly more rapid responses than bugzilla supports, feel free to hop onto IRC, I am basically always responsive on IRC.
This patch attempts to also address build issues for non-ARM systems, and completed a x64 build.

>     b/js/src/ion/shared/CodeGenerator-shared.h
...
-    inline int32_t ToStackOffset(const LAllocation *a) const {
     int32_t ToStackOffset(const LAllocation *a) const {

Reverted this change in Marty's patch.  Looks like a debug
mod and should not be committed.  Put it back if necessary.

    b/memory/mozalloc/mozalloc.h
...
-#define MOZALLOC_THROW_BAD_ALLOC_IF_HAS_EXCEPTIONS throw(std::bad_alloc)
 #define MOZALLOC_THROW_BAD_ALLOC_IF_HAS_EXCEPTIONS abort()

Reverted this change in Marty's earlier patches.  Was this
every necessary for some ARM asm.js builds?

>     b/js/src/ion/shared/IonAssemblerBuffer.h
...
 struct AssemblerBuffer
-  : public IonAllocPolicy
 {
   public:
-    AssemblerBuffer() : head(NULL), tail(NULL), m_oom(false), m_bail(false), bufferSize(0) {}
     AssemblerBuffer() : head(NULL), tail(NULL), m_oom(false), m_bail(false), bufferSize(0), LifoAlloc_(8192) {}
...
    b/js/src/ion/shared/IonAssemblerBufferWithConstantPools.h
...
-    Pool(int maxOffset_, int immSize_, int instSize_, int bias_, int alignment_,
     Pool(int maxOffset_, int immSize_, int instSize_, int bias_, int alignment_, LifoAlloc &LifoAlloc_,
...

Might the LifoAlloc changes have been a separate enhancement
or do they add infrastruture needed for the ARM asm.js support?

Might it be better to split them out?
Attachment #728598 - Attachment is obsolete: true
Update re-basing after bug 850070 landed.
Attachment #728617 - Attachment is obsolete: true
Sorry for the large amount of divergent development without rebasing.  I'll merge the changes and bugfixes into your rebased patch shortly.  This one passes all jit-tests, as well as the emscripten tests, including bananabread.
Attachment #727258 - Attachment is obsolete: true
got rebased, and it still passes all jit-tests, as well as bananabread.
Attachment #728859 - Attachment is obsolete: true
Attachment #728878 - Attachment is obsolete: true
Attachment #729115 - Flags: review?(luke)
Marty: could you give me the backstory on these changes to Load() and all the associated test changes?  On first glance, it seems like these should be in a separate bug...
Comment on attachment 729115 [details] [diff] [review]
/home/mjrosenb/patches/addARMSupoort-rebased-r0.patch

Review of attachment 729115 [details] [diff] [review]:
-----------------------------------------------------------------

Great work getting the Emscripten tests to pass!  I think it's almost done, I'd just like to see the below small fixes and see the answer to my question in comment 35.

::: js/src/ion/CodeGenerator.cpp
@@  5806,3 @@
>      JS_ASSERT((AlignmentAtPrologue   masm.framePushed()) % StackAlignment == 0);
>  #else
>      JS_ASSERT((masm.framePushed()) % StackAlignment == 0);

Could you change AlignmentAtPrologue/StackAlignment so that the original assertion is valid on ARM?

::: js/src/ion/LIR.cpp
@@  339,5 @@
>      JS_ASSERT(*from != *to);
>      for (size_t i = 0; i < moves_.length(); i  )
>          JS_ASSERT(*to != *moves_[i].to());
>  #if 0
>      if (!from->isGeneralReg() && ! from->isFloatReg())

rm

::: js/src/ion/arm/Lowering-arm.cpp
@@  50,5 @@
>  bool
>  LIRGeneratorARM::visitConstant(MConstant *ins)
>  {
>      if (ins->type() == MIRType_Double) {
>  #ifdef WHY_DO_I_NEED_THIS

rm

::: js/src/ion/arm/MacroAssembler-arm.h
@@  1194,5 @@
>  
>      void lea(Operand addr, Register dest) {
>          ma_add(addr.baseReg(), Imm32(addr.disp()), dest);
>      }
>  #ifdef TODO

rm

::: js/src/ion/shared/CodeGenerator-shared.cpp
@@  75,1 @@
>              unsigned alignmentAtCall = AlignmentAtPrologue   frameDepth_;

Can you fix AlignmentAtPrologue/frameDepth etc so that forceAlign isn't necessary?  That is, it seems like that if you defined these with the right values, you'd get 'rem == 0' below.

::: js/src/ion/x64/MacroAssembler-x64.h
@@  940,5 @@
>          JS_ASSERT(nextInsn <= code   codeBytes);
>          uint8_t *target = code   codeBytes   globalDataOffset;
>          ((int32_t *)nextInsn)[-1] = target - nextInsn;
>      }
>  #if 0

rm
Attachment #729115 - Flags: review?(luke)
w.r.t. comment 35, that was me backing out sfink's changes, since they rendered me unable to run the shell version of bananbread at all.  Somehow or other, I managed to roll them up into my patch. They have been stripped out now.
Attachment #729115 - Attachment is obsolete: true
Attachment #729426 - Flags: review?(luke)
Seems to be running great, fantastic work!
Comment on attachment 729426 [details] [diff] [review]
/home/mjrosenb/patches/addARMSupoort-rebased-r1.patch

Review of attachment 729426 [details] [diff] [review]:
-----------------------------------------------------------------

Great work!

::: js/src/ion/AsmJS.cpp
@@  5365,5 @@
>      JS_ASSERT(masm.framePushed() == 0);
>  
>      masm.mov(Imm32(0), ReturnReg);
>      masm.abiret();
>  

rm extra \n
Attachment #729426 - Flags: review?(luke) → review
The ARM Android signal handler support re-based after bug 851880 landed.
Attachment #726389 - Attachment is obsolete: true
The stack alignment changes appear to have problems for the non-ARM ports.

* This change does not appear to preserve the original intent of the code.
The AlignmentAtPrologue was changed to 0 for the ARM, so perhaps this
was just a typo mistake and the original was intended to be maintained.

js/src/ion/AsmJS.cpp:

 bool
 CodeGenerator::visitAsmJSCall(LAsmJSCall *ins)
 ...
-    JS_ASSERT((AlignmentAtPrologue   masm.framePushed()) % StackAlignment == 0);
     JS_ASSERT((masm.framePushed()) % StackAlignment == 0);



* This change introduces AlignmentMidPrologue which is only defined for the ARM
but touched by all the ports.

js/src/ion/shared/CodeGenerator-shared.cpp

         if (gen->performsAsmJSCall() || forceAlign) {
             unsigned alignmentAtCall = AlignmentMidPrologue   frameDepth_;


On a separate matter, bug 854045 includes the Android ARM signal context
support and it is more comprehensive then the patch proposed here, so
could I suggest that bug 854045 be landed first.
This rebased patch set has a alternative native stack layout for the ARM asm.js support.  The approach used here is to expand the native frame size to 8 bytes: the 4 byte return pc, plus 4 bytes of unused fill.  The framePushed base then starts after this block and is aligned to an 8 byte boundary, and this simplifies some uses of the framePushed.  This also aligns the local stack on an 8 byte boundary and should speed access to double words in this area of the stack.  The AlignmentAtPrologue becomes zero for the ARM.

The patch makes the function 'adjustFrame' un-protected - perhaps a new function to allocate stack and set framePushed would be better.

There are other ways to address the issues.  The return pc slot could be managed by the Ion stack allocator, and then the 'unused fill' slot might be used.  Or the Ion stack allocator could be extended to account for the alignment.  Have some other patches to extend the ARM macro assembler to account for a framePushed alignment offset, etc.   However making the native frame size 8 bytes might be a simplest approach for now.

This patch set assumes that bug 854045 has landed so does not include the Android signal content support.

The Banana Benchmark completes when run from the shell, even with a debug build, but fails when run in the browser (fails in the browser even with asm.js disabled).  The Ammo benchmark runs for a non-debug build.
Attachment #727591 - Attachment is obsolete: true
Attachment #730548 - Attachment is obsolete: true
(In reply to Douglas Crosher [:dougc] from comment #42)
> The Banana Benchmark completes when run from the shell, even with a debug
> build, but fails when run in the browser (fails in the browser even with
> asm.js disabled).

This might be because it's trying to render, and it's likely not finding support for the proper compressed textures on your device (unless you're using a Tegra-powered device, such as a Nexus 7).  This all looks really encouraging; we should probably get things reviewed and land it, since it sounds like it's close enough that any latent bugs we can fix up after the landing, and we'll get much broader testing that way.  (They'll only appear in asm.js-using code anyway.)
I'm going to land the currently r 'ed patch, then doug, could you submit the stack fixes for review as a separate patch? thanks.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #43)
> (In reply to Douglas Crosher [:dougc] from comment #42)
> > The Banana Benchmark completes when run from the shell, even with a debug
> > build, but fails when run in the browser (fails in the browser even with
> > asm.js disabled).
> 
> This might be because it's trying to render, and it's likely not finding
> support for the proper compressed textures on your device (unless you're
> using a Tegra-powered device, such as a Nexus 7).  This all looks really
> encouraging; we should probably get things reviewed and land it, since it
> sounds like it's close enough that any latent bugs we can fix up after the
> landing, and we'll get much broader testing that way.  (They'll only appear
> in asm.js-using code anyway.)

The browser test can fail even using the 'headless' path.  It reports errors
loading the images from the decrunched blobs.  Moved the banana bench code
to a local server and added some checksums along the blob paths, but it works
now, so it needs further analysis.  Anyway it does not appear to be a
asm.js specific issue, so should not hold up asm.js, and I'll open a separate
bug if it can be narrowed down.
sorry, didn't land last night because doug pointed out that i'd totally break x86 and x64.
Depends on: 859580
https://hg.mozilla.org/mozilla-central/rev/8f3f965dc116
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: