// esi -> asciiz string
// cl = character to remove
// NB cl=0 won't work well ;)
mov edi, esi
cld
loop1:
lodsb // Load the byte from [esi] into al
cmp al, cl
je loop1 // Skip this byte
stosb
or al, al
jnz loop1
x86 assembly is full of old cruft, that 'cld' there is a thing that if you've ever forgotten it probably cost you a lot of time.
99% of the time the flag is clear... unless it isn't.
So for a seasoned assembly programmer that cld is idiomatic, axod probably typed the instruction reflexively because he knows he can't rely on the state of the direction flag, even if it has nothing to do with the problem per-se.
Huh? This is C (well, asm). If you pass in garbage, you get undefined behaviour (a crash by NULL pointer dereference); this is how all str* functions work. Even
assert(str != NULL)
is being overly generous, I think - if only because a debugger would lead you to the source of the crash immediately.
You have not received any specification about the context in which the code operates, therefore the correct course of action is to assume the worse and produce bullet proof code.
If you were operating under constraints in terms of speed that would warrant a more cavalier attitude towards input checking then that would be another thing but since that wasn't specified and there is no context, 'safety first' is the way to approach the issue.
The specification states that "your routine should modify the given zero-terminated string in place". This implies that a zero-terminated string is passed in, and that it is mutable. If this is not the case, it is impossible to perform as specified; any behavior is wrong. An assertion error would be the ideal way to handle things; an immediate segfault is almost as good (and probably the practical thing to do if the string is immutable or not null-terminated). And of course, there are some error situations you simply can't detect - eg, not all cases of non-terminated strings are detectable, because you run into and mutate some other heap garbage before hitting an unmapped or immutable page.
I think it is actually really cool that RoG did not specify that part of the problem because in real life that is exactly how stuff gets specified, and plenty of times that's where the trouble comes from. Asking to revise a spec so that it is explicit is usually a very good course of action.
Not all hardware will segfault when writing to NULL, the immutable should segfault and strings that are not null terminated are indeed hard to impossible to check for (pointer wrap..., after a lot of swapping, or a stack overwrite).
Also note that libc string functions generally seg-fault when passed invalid addresses (including NULL). Implicitly expecting more sophisticated error handling in this scenario is silly.
So, you either go back to the source and ask for more info, or you take a stab at what is the right thing to do.
In this case that wasn't possible, so I assumed the worst, and I think that simply noting that there is a potential problem there would already score you points with the interviewer because they'd realize you spotted that the specification was incomplete, it didn't tell you what to do in case of invalid input.
I think this may be a cultural thing. C programmers are very much used to the standard not defining the behaviour of some things, e.g. what happens when you call strcmp(str, NULL). Segfaulting is expected here, and easy to track down.
(I would use some assert()s if a function could cause silent memory corruption; but this particular function either works (valid string), segfaults (NULL) or cannot detect a problem (stray pointer).)
Oh definitely. I agree. I assumed though that the "problem" itself was the main aim here, not testing if someone can write scaffolding error checking code.
How far do you go? Check that the memory isn't mapped as read only? ;) Check that there's actually some memory mapped into that address space?
Obviously you need both skills, but they're pretty related I'd say. If you can solve low level problems you can probably think through any potential issues, invalid inputs, edge cases etc and churn out code to deal with them.
I guess my issue is I don't see what being able to solve this says about someone, apart from "isn't completely terrible at programming".
I'd be really interested to see your original version and your latest version side-by-side, because I'm wondering if your "clean-up" actually changed its behavior.
// esi -> asciiz string
// cl = character to remove
mov edi, esi
cld
loop1:
lodsb // Load the byte from [esi] into al
or al, al
jz alldone
cmp al, cl
je loop1 // Skip this byte, since it's one we want to remove
stosb // Store the byte at [edi] from al
jmp loop1 // Loop around for the next byte
alldone:
xor al, al
stosb // 0 terminate it
The only real behavioral change was that this version would cope with an input of cl=0. My cleaned up version would not cope with cl=0, but would need a sanity check for that.
Hmm. My memory must be faulty, but I don't remember the first one having the final xor and stosb. I do remember looking at your solution and thinking that you'd forgotten it, that's why I expressed an interest in seeing the original.
heheh my initial version also explicitly added the 0, and then I also found a way to do it in the loop. Seems a common evolution.
But your final solution is clearer than mine. While it uses an extra variable (register), it avoids looking up the same thing twice. So... more efficient and clearer.
BTW: TIL char * a = "hello"; doesn't work in C (I think it doesn't allocate memory). I'm actually happy enough that I could solve this at all, not having used C for 20 years ...old
Actually, I might just be tired. :-p I don't see anything wrong with axod's code. Just emailed you a fairly verbose C version. I hate writing production code...
Even if you didn't know x86 at all you should still be able to figure out what it does by looking at it given the fact that the specification of what it does is only one sentence long.
No doubt, but if my desire is not to see other answers, I'm not going to read the assembly closely enough to understand. Your objection makes more sense with indented programming languages whose indentation structure may give away more information.
I can ignore x86 assembly just like I can ignore French: the graphical structure (i.e., indentation) of an x86 program gives away nothing, just like a paragraph of French gives me no immediate sense of its meaning. In either case, I could probably puzzle out the meaning with intense reading, but the mere presence of the text does not give it away.
My C solution, on the other hand, has a distinct graphical structure which would indicate, even to people for whom C isn't their "native language," a lot of the meaning of the text with merely a glance.
I don't really do much C, but here's x86:
Did I miss the "gotcha"s?edit: condensed code a bit