It *seems* very trivial :/ I don't really do much C, but here's x86: // esi -&#6...

Agent101 · on May 27, 2010

Not much experienced with x86 asm, but you don't seem to be incrementing esi anywhere...

What am I missing?

jacquesm · on May 27, 2010

stosb = store sequential byte increments the destination pointer, just like lodsb increments the source pointer.

Agent101 · on May 27, 2010

Ah, thanks.

Also, eww. It decrements or increments dependent upon a flag. Not nice.

jacquesm · on May 27, 2010

x86 assembly is full of old cruft, that 'cld' there is a thing that if you've ever forgotten it probably cost you a lot of time.

99% of the time the flag is clear... unless it isn't.

So for a seasoned assembly programmer that cld is idiomatic, axod probably typed the instruction reflexively because he knows he can't rely on the state of the direction flag, even if it has nothing to do with the problem per-se.

JoachimSchipper · on May 27, 2010

Please don't post the solution, people would like to think for themselves.

axod · on May 27, 2010

The worrying thing is that you're suggesting there may be people on hackernews who can't bang this out in a few seconds :/

Does it even really require much thinking? I'm not trying to be arrogant here, but it seems like a good definition of "trivial".

jacquesm · on May 27, 2010

> The worrying thing is that you're suggesting there may be people on hackernews who can't bang this out in a few seconds :/

My sentiment exactly.

And by the way, you do have a bug in that code.

axod · on May 27, 2010

Really? give me a clue...?

jacquesm · on May 27, 2010

You are making some scary assumptions about the validity of that pointer...

JoachimSchipper · on May 27, 2010

Huh? This is C (well, asm). If you pass in garbage, you get undefined behaviour (a crash by NULL pointer dereference); this is how all str* functions work. Even

    assert(str != NULL)

is being overly generous, I think - if only because a debugger would lead you to the source of the crash immediately.

jacquesm · on May 27, 2010

No Huh, really.

You have not received any specification about the context in which the code operates, therefore the correct course of action is to assume the worse and produce bullet proof code.

If you were operating under constraints in terms of speed that would warrant a more cavalier attitude towards input checking then that would be another thing but since that wasn't specified and there is no context, 'safety first' is the way to approach the issue.

Your assertion would have done the job just fine.

bdonlan · on May 27, 2010

The specification states that "your routine should modify the given zero-terminated string in place". This implies that a zero-terminated string is passed in, and that it is mutable. If this is not the case, it is impossible to perform as specified; any behavior is wrong. An assertion error would be the ideal way to handle things; an immediate segfault is almost as good (and probably the practical thing to do if the string is immutable or not null-terminated). And of course, there are some error situations you simply can't detect - eg, not all cases of non-terminated strings are detectable, because you run into and mutate some other heap garbage before hitting an unmapped or immutable page.

jacquesm · on May 27, 2010

I think it is actually really cool that RoG did not specify that part of the problem because in real life that is exactly how stuff gets specified, and plenty of times that's where the trouble comes from. Asking to revise a spec so that it is explicit is usually a very good course of action.

Not all hardware will segfault when writing to NULL, the immutable should segfault and strings that are not null terminated are indeed hard to impossible to check for (pointer wrap..., after a lot of swapping, or a stack overwrite).

axod · on May 27, 2010

I didn't include all the boring error checking code, I included just the 'meat' ;) If you call it with valid inputs, it produces valid outputs.

IMHO That's not a 'bug'. That's a lack of boring gruntwork error checking.

jedbrown · on May 27, 2010

Also note that libc string functions generally seg-fault when passed invalid addresses (including NULL). Implicitly expecting more sophisticated error handling in this scenario is silly.

jacquesm · on May 27, 2010

So, you either go back to the source and ask for more info, or you take a stab at what is the right thing to do.

In this case that wasn't possible, so I assumed the worst, and I think that simply noting that there is a potential problem there would already score you points with the interviewer because they'd realize you spotted that the specification was incomplete, it didn't tell you what to do in case of invalid input.

JoachimSchipper · on May 27, 2010

I think this may be a cultural thing. C programmers are very much used to the standard not defining the behaviour of some things, e.g. what happens when you call strcmp(str, NULL). Segfaulting is expected here, and easy to track down.

(I would use some assert()s if a function could cause silent memory corruption; but this particular function either works (valid string), segfaults (NULL) or cannot detect a problem (stray pointer).)

jacquesm · on May 27, 2010

Yes, that's more or less the problem with lots of code.

I'm not saying your code is invalid or that it won't work, but 'defensive' programming says:

- assume your input is total garbage

- handle all correct situations correctly

- handle all garbage gracefully or throw an error depending on what the circumstances dictate

I'm also missing the stackframe handling, but as you said, this is the 'meat', that meat needs a bit of scaffolding to work.

axod · on May 27, 2010

Oh definitely. I agree. I assumed though that the "problem" itself was the main aim here, not testing if someone can write scaffolding error checking code.

How far do you go? Check that the memory isn't mapped as read only? ;) Check that there's actually some memory mapped into that address space?

Obviously you need both skills, but they're pretty related I'd say. If you can solve low level problems you can probably think through any potential issues, invalid inputs, edge cases etc and churn out code to deal with them.

I guess my issue is I don't see what being able to solve this says about someone, apart from "isn't completely terrible at programming".

NickPollard · on May 27, 2010

Defensive programming is not always the correct assumption though.

jacquesm · on May 27, 2010

any assumption is wrong, but when given the choice between two assumption (safe vs unsafe) safe is best.

I did note in my 'answer' that the problem wasn't specified fully.

tsally · on May 27, 2010

The result of the program should always be a valid string. ;) Tried to be as ambiguous as I could, not really sure of another way to say it.

axod · on May 27, 2010

"valid string" ? :/ can you give me an example of an invalid string?

It correctly tacks a 0 byte on the end if that's what you mean, of course cleaning up the unused space would be outside of the task at hand.

Maybe I'll look like a complete idiot soon if it transpires I've completely missed the 'gotcha' ;)

RiderOfGiraffes · on May 27, 2010

I'd be really interested to see your original version and your latest version side-by-side, because I'm wondering if your "clean-up" actually changed its behavior.

axod · on May 27, 2010

Original code:

  // esi -> asciiz string
  // cl = character to remove

  mov edi, esi
  cld
  loop1:
      lodsb      // Load the byte from [esi] into al
      or al, al
       jz alldone
      cmp al, cl
       je loop1  // Skip this byte, since it's one we want to remove
      stosb      // Store the byte at [edi] from al
      jmp loop1  // Loop around for the next byte
  alldone:
      xor al, al
      stosb      // 0 terminate it

The only real behavioral change was that this version would cope with an input of cl=0. My cleaned up version would not cope with cl=0, but would need a sanity check for that.

RiderOfGiraffes · on May 27, 2010

Hmm. My memory must be faulty, but I don't remember the first one having the final xor and stosb. I do remember looking at your solution and thinking that you'd forgotten it, that's why I expressed an interest in seeing the original.

Still, likely my memory is at fault.

10ren · on May 27, 2010

heheh my initial version also explicitly added the 0, and then I also found a way to do it in the loop. Seems a common evolution.

But your final solution is clearer than mine. While it uses an extra variable (register), it avoids looking up the same thing twice. So... more efficient and clearer.

BTW: TIL char * a = "hello"; doesn't work in C (I think it doesn't allocate memory). I'm actually happy enough that I could solve this at all, not having used C for 20 years ...old

tsally · on May 27, 2010

Actually, I might just be tired. :-p I don't see anything wrong with axod's code. Just emailed you a fairly verbose C version. I hate writing production code...

j_baker · on May 27, 2010

It's not that it requires much thinking. It's just that it's not every day that most of us have to code something like this in C.

JoachimSchipper · on May 27, 2010

From another I made on this page:

> If you don't find any egregious bugs, I hereby claim it really is trivial.

Still, it may be a nice exercise.

xenophanes · on May 27, 2010

Are you saying that some people couldn't do the problem, but could actually read that Assembly?

jemfinch · on May 27, 2010

How many HNers do you think know x86 assembly so well that simply seeing the structure of the code gives away the answer? :)

jacquesm · on May 27, 2010

Even if you didn't know x86 at all you should still be able to figure out what it does by looking at it given the fact that the specification of what it does is only one sentence long.

jemfinch · on May 27, 2010

No doubt, but if my desire is not to see other answers, I'm not going to read the assembly closely enough to understand. Your objection makes more sense with indented programming languages whose indentation structure may give away more information.

axod · on May 27, 2010

It's 8 instructions. It's a single loop. It's moving stuff from one buffer to another and ignoring specific bytes.

What are you talking about? What's to understand?

jemfinch · on May 29, 2010

I can ignore x86 assembly just like I can ignore French: the graphical structure (i.e., indentation) of an x86 program gives away nothing, just like a paragraph of French gives me no immediate sense of its meaning. In either case, I could probably puzzle out the meaning with intense reading, but the mere presence of the text does not give it away.

My C solution, on the other hand, has a distinct graphical structure which would indicate, even to people for whom C isn't their "native language," a lot of the meaning of the text with merely a glance.

jonsen · on May 27, 2010

edit: condensed code a bit

Well well, the other version seemed to work ;-)

axod · on May 27, 2010

Wasn't quite as elegant though, and 10 instructions vs 8.

jonsen · on May 27, 2010

Elegance is in the eye of the beholder. I found your first version easier to grasp.

barrkel · on May 27, 2010

I think it's trickier than FizzBuzz, but not by a whole lot.

jacquesm · on May 27, 2010

I think it is actually simpler.

And the fizz-buzz problem has two structurally different solutions.