Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mach: add more n_type mask constants #56

Merged
merged 1 commit into from
Oct 19, 2017
Merged

mach: add more n_type mask constants #56

merged 1 commit into from
Oct 19, 2017

Conversation

philipc
Copy link
Collaborator

@philipc philipc commented Oct 19, 2017

I needed the N_STAB constant, because if any of those bits are set then the N_TYPE bits have different meanings.

I also added the other new constants for completeness. Names and comments came from nlist.h.

However, N_TYPE already exists as NLIST_TYPE_MASK. So not sure if you want to keep both, or delete one of these, or rename the new constants I'm adding to include MASK in their name.

I also updated the NList helper methods to use these constants for consistency. This includes a semantic change to is_global since it is valid to have both N_EXT and N_PEXT set.

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philipc thanks for theis! I thought I added these for some reason, weird.

LGTM.

Is it possible to test this somehow? I wonder if we can run the get_type, is_global, is_undefined on the macho object file test embedded?

}
/// Gets the str representation of the type of this symbol
pub fn type_str(&self) -> &'static str {
n_type_to_str(self.get_type())
}
/// Whether this symbol is global or not
pub fn is_global(&self) -> bool {
self.n_type & !NLIST_TYPE_MASK == NLIST_TYPE_GLOBAL
self.n_type & N_EXT != 0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, actually I changed my mind; for masking, imho, should explicitly specify what it's equal to, not that it's not equal to 0.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here again, I think that what I really want to test is that any bit in the mask is set. But since it's a 1 bit mask I'm fine with changing this.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't think it's a bit mask, it's the expected value. It's hard to tell since the original stab.h is pretty underdocumented.

I also think that's why I wrote the above, because the globalness of the symbol is the first 4 bits (!NLIST_TYPE_MASK), and the type in the remaining.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not accept LLVM as an authority in this matter (actually on pretty much anything). However, nlist.h agrees with you.

Incidentally, this is some of the most ridiculous bit shifting shit I've ever seen take place in 8 bits, since if any bit is set in N_STAB (e.g., what you wrote, != 0), then the entire u8 is reinterpreted via stab.h, (that's where your 0x64 comes from, which made no sense to me, since 4 is in the lower 4 bits and should have technically been masked out and hence irrelevant to the field):

  • Only symbolic debugging entries have some of the N_STAB bits set and if any
  • of these bits are set then it is a symbolic debugging entry (a stab). In
  • which case then the values of the n_type field (the entire field) are given
  • in <mach-o/stab.h>

}
/// Whether this symbol is a symbolic debugging entry
pub fn is_stab(&self) -> bool {
self.n_type & N_STAB != 0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here; if n_type is garbage values, it could accidentally be true.

E.g.:

if n_type accidentally is = 0x80, then 0xe0 & 0x80 != 0

This is actually a bad example, because the random semantics whoever defined for N_STAB says if any of the bits are set, it's a debugging entry, but its more uniform to specify what its equal to in both cases.

Also it makes it more readable imho

Copy link
Collaborator Author

@philipc philipc Oct 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N_STAB is a mask. I want to ensure nothing in that mask is set. I don't understand what you want it changed to.

edit: actually I want the reverse, I want to ensure something is set, but I don't care what.

Copy link
Owner

@m4b m4b Oct 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, I want it set to self.n_type & N_STAB == N_STAB. In this case, it just so happens that its just what you want.

When I read a piece of code that is value & SOMETHING != 0, at a glance, without further context, I don't know if you're just checking it isn't equal to 0 (for whatever reason), or rather if you're specifically checking that it is equal to a specific value (that happens to be the mask, or the value on the right hand side).

These two things are not semantically equivalent, for the reason I gave above.

Masking values and checking they're != 0 is never clear to me. It won't happen in this case because N_STAB won't change its semantics, but if it did, the check could randomly become true since the test for != 0 is more permissive than ==

It's not too important, but that's essentially the gist of what I meant.

Also basically all bit masking in this lib follows that pattern

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, self.n_type & N_STAB == N_STAB is wrong. If any bits in N_STAB are set, then it is a stab. Eg if the value is 0x64 then it is a stab, and your suggested change won't match that. (0x64 is a value I picked out of the llvm-nm output for a stab).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also argue that using the value & BIT_PATTERN == BIT_PATTERN is dangerous in general. eg if BIT_PATTERN is 0x5, then value & BIT_PATTERN == BIT_PATTERN will match 0x5 and 0x7 and 0xff etc. What you should do is value & BIT_MASK == BIT_PATTERN. It also doesn't make sense to do value & BIT_MASK == BIT_MASK, you should always be checking against a specific pattern, not the complete mask.

In this case, N_STAB is a bit mask, not a bit pattern. So perhaps it would be better to add MASK to its name, but that means it is no longer named the same as in nlist.h.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see now, I thought N_STAB was the expected value, my suggestion was completely wrong.

eg if BIT_PATTERN is 0x5, then value & BIT_PATTERN == BIT_PATTERN will match 0x5 and 0x7 and 0xff etc

It's not dangerous at all; or if it is, its just as dangerous as doing value & MASK != 0 or value & PATTERN != 0 or all the other stuff I see, but I don't know if its productive to argue about this anymore

Copy link
Collaborator Author

@philipc philipc Oct 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is productive to argue this, because it's important to get this stuff right, so I'll give one more example, but can leave it at that if you want. The previous example I gave wasn't very good, so here's a better one using some values defined in goblin.

goblin defines these:

pub const BIND_OPCODE_MASK              : u8     = 0xF0;
pub const BIND_OPCODE_DONE              : Opcode = 0x00;
pub const BIND_OPCODE_SET_DYLIB_ORDINAL_IMM     : Opcode = 0x10;
pub const BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB        : Opcode = 0x20;
pub const BIND_OPCODE_SET_DYLIB_SPECIAL_IMM     : Opcode = 0x30;

The following test is wrong because it will match any opcode at all:

opcode & BIND_OPCODE_DONE == BIND_OPCODE_DONE

The following is wrong because it will match BIND_OPCODE_SET_DYLIB_ORDINAL_IMM and BIND_OPCODE_SET_DYLIB_SPECIAL_IMM:

opcode & BIND_OPCODE_SET_DYLIB_ORDINAL_IMM == BIND_OPCODE_SET_DYLIB_ORDINAL_IMM

Both of these tests are wrong, because they are doing value & BIT_PATTERN == BIT_PATTERN.

The correct tests are:

opcode & BIND_OPCODE_MASK == BIND_OPCODE_DONE
opcode & BIND_OPCODE_MASK == BIND_OPCODE_SET_DYLIB_ORDINAL_IMM

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I don't like to be wrong, but I'm very wrong here.

What's weird is when I have a mask I use it of course (like the BIND_OPCODE_* implementation) but I guess I never internalized the actual semantics of what I was doing 😆

So I don't know what this line is doing anymore: https://github.com//m4b/goblin/blob/47d23ce9fdb5fc01b87a54658fdadff1fc26923e/src/mach/fat.rs#L114

I recommended it be changed to current from != 0, but I didn't review clearly (mostly because I hate bitmasking and find it tedious). :/

I believe this is now incorrect, e.g.:

  1000 = val
& 1100 = mask
--------
  1000 != mask

In the case of CPU_ARCH_ABI64, my suggestion accidentally also turns out to be true, since the mask is 0x01000000, i.e., the mask only has 1 bit. (my (incorrect) formula will always be true for masks of 1 bit)

I never really thought about the logic of bitmasking. So the moral of the story is if the mask covers more than 1 bit, we always need to check if the result is != 0.

I will now return to kindergarten

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I don't know what this line is doing anymore: https://github.com//m4b/goblin/blob/47d23ce9fdb5fc01b87a54658fdadff1fc26923e/src/mach/fat.rs#L114

That's not wrong, because it's a 1 bit field. I did a grep through the rest of the crate and didn't find any comparisons that aren't operating on 1 bit fields. This IMPORT_RVA_MASK_32 definition looks weird though, I'd expect it to be 0x7fff_ffff instead, but I'm not familiar with the format and haven't investigated it.

@m4b
Copy link
Owner

m4b commented Oct 19, 2017

This includes a semantic change to is_global since it is valid to have both N_EXT and N_PEXT set.

What does this mean exactly? I don't know what a private external symbol is. Is this common?

@philipc
Copy link
Collaborator Author

philipc commented Oct 19, 2017

What does this mean exactly? I don't know what a private external symbol is. Is this common?

I don't know either. I just know from looking at llvm code that it is possible to have both set.

@philipc
Copy link
Collaborator Author

philipc commented Oct 19, 2017

Is it possible to test this somehow? I wonder if we can run the get_type, is_global, is_undefined on the macho object file test embedded?

Since it's a fixed test file, I could test that symbol at specific entries have the expected values for these.

@m4b
Copy link
Owner

m4b commented Oct 19, 2017

ah, we do have a macho file, can add a simple test iterating the file somewhere here: https://github.com//m4b/goblin/blob/71a3810e9933a1f20515e7f5d76d11743e182f6a/tests/macho.rs#L79

@m4b
Copy link
Owner

m4b commented Oct 19, 2017

Oh, and can delete the old NLIST_TYPE_* stuff

@m4b
Copy link
Owner

m4b commented Oct 19, 2017

Also what file are you seeing stab entries in? I've never encountered a file with stab entries on OSX, and afaik, they were long deprecated in favor of dwarf?

@m4b m4b merged commit 2fff1b9 into m4b:master Oct 19, 2017
@m4b
Copy link
Owner

m4b commented Oct 19, 2017

Thank you for clarification and the PR!

@m4b
Copy link
Owner

m4b commented Oct 19, 2017

Need to fix CI, rustc has changed internal rlib format again :D

@philipc
Copy link
Collaborator Author

philipc commented Oct 19, 2017

Also what file are you seeing stab entries in? I've never encountered a file with stab entries on OSX, and afaik, they were long deprecated in favor of dwarf?

It's a rust binary that I pulled out of travis some time early this year. I don't have any local OSX to test on so not sure if that has changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants