-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use 32b loads to set print strings #7545
Conversation
Instead of using either a series of etc_putc or setting a series of bytes one by one, use a simple macro to define 32b constants to build up strings. Saves ~30 bytes of program code in eboot
// 4 chars at a time. Smaller code than byte-wise assignment. | ||
uint32_t fmt[2]; | ||
fmt[0] = S('v', '%', '0', '8'); | ||
fmt[1] = S('x', '\n', 0, 0); | ||
ets_printf((const char*) fmt, ver); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more readable solution I guess is:
#define TINYSTR(str) [__builtin_strlen(str) 1] = str
#define TINYCHARS(chars) [__builtin_strlen(chars)] = chars
and then,
const char fmt TINYSTR("vx\n");
ets_printf(fmt, ver);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OMG, greater then 4 bytes string literals will be placed into RODATA (and memcpy
will copy them)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry,
#define S(str) ((uint32_t)(str"\0\0\0")[0] | ((uint32_t)(str"\0\0\0")[1] << 8) | ((uint32_t)(str"\0\0\0")[2] << 16) | ((uint32_t)(str"\0\0\0")[3] << 24))
and
const uint32_t fmt[] = { S("v"), S("x\n") };
ets_printf(fmt, ver);
are OK (now placed into .text
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there aren't any really good solutions. We don't allow RODATA because we don't have the C init stuff in the bootloader. Can't use PSTR because ets_printf can't handle PROGMEM byte accesses. GCC's multichar literal puts things in the wrong order for strings (endian issue). So we end up with either a 4-param or 1-param marco w/array indexing.
Honestly, I like the existing setup more than the string 4-chars because with the current macro, if you accidentally put in a >4 char element it will error (while the string[index] will silently allow it).
But, your idea to make it a const array to get in .text, that's pretty cool. Let me see what it does to the size, might buy 4-8 bytes per instance...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, looks like that's not going to work. I see .rodata appear when I do const uint32_t s={S(),S()};
Back where we were...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, anyway, the string needs to be in RAM since it will be byte accessed. So if it was reduced to just a pointer to ICACHE (where eboot lives)/PROGMEM it would end up SEGV'ing. Has to be a copy to RAM step no matter what.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, get to be quite weird definition...
( #include <assert.h>
needed )
#define S(x, str) \
do { \
const unsigned int l = (__builtin_strlen(str) 1 3) / 4; \
static_assert(sizeof(x) / sizeof(*x) == l); \
unsigned int i = 0; \
while(i < l) \
x[i] = ((uint32_t)(str"\0\0\0")[i * 4 0] | ((uint32_t)(str"\0\0\0")[i * 4 1] << 8) | ((uint32_t)(str"\0\0\0")[i * 4 2] << 16) | ((uint32_t)(str"\0\0\0")[i * 4 3] << 24)), i; \
} while(0)
and
void test(void) {
uint32_t fmt[3];
S(fmt, "foo bar baz");
extern int write(const void*);
write((const char*)fmt);
}
emits the below (good result)
.file "test.c"
.text
.literal_position
.literal .LC0, 544173926
.literal .LC1, 544366946
.literal .LC2, 8020322
.align 4
.global test
.type test, @function
test:
l32r a2, .LC0
addi sp, sp, -32
s32i.n a2, sp, 0
l32r a2, .LC1
s32i.n a0, sp, 28
s32i.n a2, sp, 4
l32r a2, .LC2
s32i.n a2, sp, 8
mov.n a2, sp
call0 write
l32i.n a0, sp, 28
addi sp, sp, 32
ret.n
.size test, .-test
.ident "GCC: (GNU) 10.1.0"
but this
void test(void) {
uint32_t fmt[4];
S(fmt, "foo bar baz "); /* 1 whitespace appended */
extern int write(const void*);
write((const char*)fmt);
}
emits bad...
.file "test.c"
.text
.section .rodata
.LC0:
.string "foo bar baz "
.string ""
.string ""
.string ""
.text
.literal_position
.literal .LC1, .LC0
.align 4
.global test
.type test, @function
test:
addi sp, sp, -32
l32r a3, .LC1
s32i.n a0, sp, 28
movi.n a4, 0
.L2:
l8ui a2, a3, 1
l8ui a5, a3, 2
slli a2, a2, 8
slli a5, a5, 16
or a2, a2, a5
l8ui a5, a3, 0
add.n a6, sp, a4
or a2, a2, a5
l8ui a5, a3, 3
addi.n a4, a4, 4
slli a5, a5, 24
or a2, a2, a5
s32i.n a2, a6, 0
addi.n a3, a3, 4
bnei a4, 16, .L2
mov.n a2, sp
call0 write
l32i.n a0, sp, 28
addi sp, sp, 32
ret.n
.size test, .-test
.ident "GCC: (GNU) 10.1.0"
@earlephilhower, At the moment I am having trouble remember the exact specifics; however, earlier this year (feels like years ago) I was able to get the Boot ROM loader to handle a special eboot, with two segments one for code and one for data which got loaded to DRAM. It also required changes to elft2bin.py. It was one of many tangents/attempts I went on when doing the HWDT Stack Dump. There were some issues in getting the checksum correct for each segment. Only the 1st segment is biased by 0xFE the additional segments use 0. My notes indicate there are also some payload length tweaks that are needed in creating the .bin. If you are interested I can hunt down and update the branch, I think there are changes in eboot and elft2bin.py since then that would need to be incorporated. It might take me a while, I am a bit distracted. |
Thanks, @mhightower83 . For now, though, I'd really like to just make it simple and save a few bytes because I know there are add'l changes coming in (fixed verify, eboot protect, OTA command in flash, etc.). The way it works now the ROM copies the first 4K to IRAM (per the flash copy structure which elf2bin.py generates and the ROM parses) and jumps to the start of code. In eboot.c it's definitely possible to manually copy from IRAM->DRAM for RODATA/etc., but that's additional code which eats up any savings. I don't remember if the ROM can do multiple region copies, bu if it does then we could do the same stuff for RODATA (just align start to 4K somewhere, not freely assigned as it would be normally). But, again, that's way more work than I'm thinking of right now, for little gain. |
@earlephilhower Yes, it does; however, I could not find it documented. I ended out using the empirical method to figure out what it wanted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does elf file need to be regenerated since aug24 ?
Hello, I think this should be:
I think the cast was using the actual text as the pointer value, not the location of fmt. Have I misunderstood something? |
Instead of using either a series of etc_putc or setting a series of bytes
one by one, use a simple macro to define 32b constants to build up strings.
Saves ~30 bytes of program code in eboot