OWL Web Language


  • Discourse touched me in a no-no place

    @masonwheeler said in OWL Web Language:

    There is no lock involved in atomic operations on an integer.

    0_1501599443398_16472814-da3c-4f8d-99f9-0a7d9cfbae6d-image.png

    The lock operation is at the machine level. There's a bunch of instructions that are dedicated to doing these operations, and these are definitely the cheapest way of making a decision between threads provided the threads don't need to make decisions about the same things at a time.


  • Impossible Mission - B

    @dkf Yeah, read the next paragraph after the one you quoted. Even when there :pendant:ically is technically a bus lock, there still is no bus lock the vast majority of times. Only in the vanishingly rare case where two cores are operating on the same data at the same time does that even become an issue at all.



  • @dkf Yeah, the refcount uses atomic increments/decrements, which are called "lock-free" because they don't use kernel locks (but in the x86 instruction set, this involves prefixing the instruction with lock).


  • Impossible Mission - B

    @wharrgarbl said in OWL Web Language:

    @dkf it isn't really mutable if the change made in another insance doesn't reflect on mine, is it?

    This question seems to be based on a fundamental misunderstanding of what COW means.

    The idea of mutable data is that any given value can change. Normally, in order to make this safe, every reference has to have its own unique copy of the value. This is just fine with things like ints, but strings can be arbitrarily large, and copying them around in memory all the time can be a massive performance bottleneck.

    COW is an optimization that essentially says, "strings are supposed to have value type semantics, but let's implement them as reference types to save on copying overhead." Every time you assign or pass a string to another place, it passes a reference and increments the ref count. The new version is allowed to pretend it has a mutable copy, even though what it really has is a reference. When you're done with each "copy", it decrements the ref count, and frees the string if the count falls to 0.

    The way value type semantics are implemented is that when you go to actually make changes within the string, the compiler inserts code that guarantees uniqueness of the string before any changes are made. After this code runs, you're guaranteed to have a unique copy of the string, with a reference count of 1, that you can safely modify without changing any of the other "copies" (references). And when this is done with atomic operations on an integer representing the ref count, there is no locking involved and no thread safety problems.

    If, on the other hand, two threads are trying to work on the same reference (a global string, for example,) then that's not a COW issue; that's a bog-standard race condition, and it needs to be handled with locks just like like any other race condition, the same as if it were an array of ints. Such a scenario is outside the scope of COW semantics. Saying "not thread safe because it's mutable" is therefore a meaningless tautology, whether or not there is COW semantics in play.



  • @masonwheeler said in OWL Web Language:

    If, on the other hand, two threads are trying to work on the same reference (a global string, for example,) then that's not a COW issue; that's a bog-standard race condition

    That doesn't happen with immutable strings, hence immutable is thread-safer.


  • Impossible Mission - B

    @wharrgarbl :rolleyes: See "meaningless tautology," above. Immutable strings also don't have the advantages that mutability gives.



  • @masonwheeler let's see:

    advantages:
    You can change them without creating a new one

    disadvantages:
    Cost of refcounting
    Not thread-safe

    I prefer the immutable ones. But like Dr. Dobbs said in the previously linked article, why not have both options?


  • Impossible Mission - B

    @wharrgarbl The linked article is full of crap, talking about how you need a mutex to implement COW in a thread-safe manner. It's true that doing it that way makes it perform badly, but absolutely wrong that that's necessary. (And a resource as well-regarded as Dr. Dobbs should have known better. By the time that article came out, Delphi had been doing it with atomic increments and decrements for several years already!)



  • @masonwheeler A memory barrier is still a cost you're paying that you didn't need too. The dealbreaker is the thread unsafety anyway.


  • Impossible Mission - B

    @wharrgarbl said in OWL Web Language:

    The dealbreaker is the thread unsafety anyway.

    Do you not see that this is exactly equivalent to saying "the problem with mutability is mutability"? It's a meaningless tautology.



  • @wharrgarbl pay attention

    advantages:
    if it's only yours, you can change it "for free", like mutable strings
    if it's not only yours, you can change it at the same cost as immutable strings
    thread-safe

    disadvantages:
    cost of refcounting



  • @twelvebaud A string that can change while you're iterating through it's characters isn't threadsafe



  • @wharrgarbl If you're the one iterating, that means you have a reference. Which means either you're the one changing, or the other thread notices "something else has a reference" and makes its own copy to change. Either way, you're still okay.



  • @twelvebaud You can't trust nobody will iterate it using directly a reference in a class attribute, so it's not safe. When we say something is whateversafe, it means it is still safe no matter whatever stupidity the programmer does.

    Also, welcome back.


  • Impossible Mission - B

    @wharrgarbl In that case, by your logic immutable strings aren't threadsafe either. What's to stop someone from replacing the entire string out from under you while you're iterating it?



  • @wharrgarbl Yes, someone can cast a spell to get a char * to it without maintaining an official reference for the lifetime of the pointer they get, which can screw you over. It can also screw them over too; if you release the managed string, it gets reclaimed, and their pointer is left dangling. Everyone has to obey the refcounting system or else bad things can happen, the same as nothing's holding a gun to the programmer's head to EnterCriticalSection() in lock-based code but they almost certainly should anyway. That's why most libraries for COW string have some kind of smart pointer magic to maintain that for you, so you actually need to do extra work to break them.

    And thanks. It looks like NodeBB has working ignore semantics now, so I can keep unwanted categories out of Unread and maintain Inbox Zero again.



  • ok, you're right


  • ♿ (Parody)

    @masonwheeler said in OWL Web Language:

    In this scenario, if some other thread tried to modify it, COW semantics means guaranteeing that it makes a copy first, and then modifies that copy. This one that we're iterating here remains unchanged.

    What happens if a thread increments the refcount to 2 while some other thread is in the process of modifying it?



  • @masonwheeler said in OWL Web Language:

    There is no lock involved in atomic operations on an integer.

    In the .NET VM there is. (Well you can read the value atomically. You can't change it atomically. Incrementing an integer in .NET is not atomic.) Maybe it's not using some magical AMD64 instruction it should be, but them's the facts.

    It seems like the core of the discussion here might be:

    Mason "Spends Too Much Time On Stack Overflow" Wheeler:

    "The CPU can do this thing well"

    RacePro "Almost Always Factually Wrong When Posting Here" UK:

    "The .NET VM can't do this thing well"

    But it's apples and oranges. The CPU != the .NET VM. They have different instruction sets. And when .NET was designed, it was designed without any kind of atomic "increment integer" opcode. (Probably because it had to run on CPUs that also didn't have one.)



  • @wharrgarbl said in OWL Web Language:

    But like Dr. Dobbs said in the previously linked article, why not have both options?

    Simplicity is always better.



  • @boomzilla said in OWL Web Language:

    What happens if a thread increments the refcount to 2 while some other thread is in the process of modifying it?

    Bad things, because reference counting isn't a thread-safety mechanism. More than one thread can access the same reference.

    So I guess @masonwheeler and @TwelveBaud are wrong.


  • Discourse touched me in a no-no place

    @masonwheeler said in OWL Web Language:

    And when this is done with atomic operations on an integer representing the ref count, there is no locking involved and no thread safety problems.

    I'm still not sure that this is correct. The issue is that you've got at some point to decide whether you are making a copy, but if you decide that you are OK to modify in-place then you need to be sure that nobody will acquire another reference while you are modifying the value. That's where everything becomes nasty in the multi-threaded case and why people decide “fuck it, I'm putting proper locks in”. It might be possible to do this without a full mutex, but it sure can't be done with a simple reference count as I'm fairly sure you can't use atomic operations to update two machine words in a single step in a complex fashion. (Thinking about the details of it makes my brain hurt.)

    BTW, CoW works wonderfully in the single-threaded case, and if you can corral your strings into per-thread arenas that they never leave, that works pretty well too.

    [EDIT]: Hmm, thinking a bit more about the multi-threaded reference management scenarios, I'm less certain that it is possible for a thread to get a reference unsafely. But I'm not sure that it is impossible, either. Erk.


  • Java Dev

    @boomzilla said in OWL Web Language:

    @masonwheeler said in OWL Web Language:

    In this scenario, if some other thread tried to modify it, COW semantics means guaranteeing that it makes a copy first, and then modifies that copy. This one that we're iterating here remains unchanged.

    What happens if a thread increments the refcount to 2 while some other thread is in the process of modifying it?

    Then one of your threads is modifying a shared reference which seems like a bad idea to me.



  • @blakeyrat C# has the Interlocked class that does that


  • ♿ (Parody)

    @pleegwat said in OWL Web Language:

    @boomzilla said in OWL Web Language:

    @masonwheeler said in OWL Web Language:

    In this scenario, if some other thread tried to modify it, COW semantics means guaranteeing that it makes a copy first, and then modifies that copy. This one that we're iterating here remains unchanged.

    What happens if a thread increments the refcount to 2 while some other thread is in the process of modifying it?

    Then one of your threads is modifying a shared reference which seems like a bad idea to me.

    Right. And so the thing you meant to prevent via COW happens anyways.



  • @boomzilla said in OWL Web Language:

    @pleegwat said in OWL Web Language:

    @boomzilla said in OWL Web Language:

    What happens if a thread increments the refcount to 2 while some other thread is in the process of modifying it?

    Then one of your threads is modifying a shared reference which seems like a bad idea to me.

    Right. And so the thing you meant to prevent via COW happens anyways.

    And then you go murder the guy who took a reference to the smart pointer across threads instead of copying it and "sending" the copy.


  • Impossible Mission - B

    @boomzilla said in OWL Web Language:

    What happens if a thread increments the refcount to 2 while some other thread is in the process of modifying it?

    That would be a classic race condition, and the answer is "it depends on which code runs first." It's exactly equivalent to asking, "what happens if I copy an int while another thread is modifying it?" As I said, COW semantics doesn't protect against that kind of race condition; it protects against modifications in a string with a refcount >= 2 showing up in other references.



  • @medinoc said in OWL Web Language:

    And then you go murder the guy who took a reference to the smart pointer across threads instead of copying it and "sending" the copy.

    A class can have it's methods called by multiple threads, and they'll access it's own attributes trough the same references. There is no "sending".


  • Discourse touched me in a no-no place

    @masonwheeler said in OWL Web Language:

    As I said, COW semantics doesn't protect against that kind of race condition; it protects against modifications in a string with a refcount >= 2 showing up in other references.

    (My builds are going slow today…)

    Having thought about this a bit more, a reference count is OK to give the safety guarantees provided you're using interlocked accesses to a volatile counter (or whatever equivalents are necessary to make that work in the language of your choice). Constants simply have one or more references held by the runtime. However, there's no safe way to optimise any of the reference count management either; the test, increments and decrements all have to be done exactly as written, and operations with a bus interlock are quite a lot more expensive than ordinary integer manipulation in the first place, so this will hurt.

    In the single-threaded case, by contrast, reference count management can be subject to substantial compiler-driven optimisation.


  • Impossible Mission - B

    @dkf said in OWL Web Language:

    operations with a bus interlock are quite a lot more expensive than ordinary integer manipulation in the first place, so this will hurt.

    [citation needed]

    As I've pointed out several times, the "bus interlock" hasn't been expensive for several generations of CPUs now, except in the rare case where multiple cores actually hold copies of the data that's being operated on and need to be synchronized.


  • Impossible Mission - B

    @blakeyrat "Doesn't Spend Enough Time On StackOverflow Or Learning Basic Theory" said in OWL Web Language:

    @masonwheeler said in OWL Web Language:

    There is no lock involved in atomic operations on an integer.

    In the .NET VM there is. (Well you can read the value atomically. You can't change it atomically. Incrementing an integer in .NET is not atomic.) Maybe it's not using some magical AMD64 instruction it should be, but them's the facts.

    It seems like the core of the discussion here might be:

    Mason "Spends Too Much Time On Stack Overflow" Wheeler:

    "The CPU can do this thing well"

    RacePro "Almost Always Factually Wrong When Posting Here" UK:

    "The .NET VM can't do this thing well"

    But it's apples and oranges. The CPU != the .NET VM. They have different instruction sets. And when .NET was designed, it was designed without any kind of atomic "increment integer" opcode. (Probably because it had to run on CPUs that also didn't have one.)

    Perhaps, but these days, all CPUs powerful enough to run a .NET VM are multicore machines, which by necessity have atomic instructions in their instruction sets. As @wharrgarbl pointed out, atomic instructions may not be part of the CIL instruction set, but they're supported by the Interlocked class, which the JIT special-cases to translate to the appropriate CPU instruction for the architecture in question.


  • ♿ (Parody)

    @masonwheeler said in OWL Web Language:

    @boomzilla said in OWL Web Language:

    What happens if a thread increments the refcount to 2 while some other thread is in the process of modifying it?

    That would be a classic race condition, and the answer is "it depends on which code runs first." It's exactly equivalent to asking, "what happens if I copy an int while another thread is modifying it?" As I said, COW semantics doesn't protect against that kind of race condition; it protects against modifications in a string with a refcount >= 2 showing up in other references.

    Right...so...a bad idea for multithreaded systems.


  • Impossible Mission - B

    @boomzilla As I pointed out to @wharrgarbl a while ago, there's nothing specifically bad about this that you don't also get with immutable strings. What happens if one thread is trying to read an immutable string while another is replacing the reference with a different immutable string? You get the exact same race condition.



  • @masonwheeler You still have the problem that multiple threads can access the same reference.



  • @masonwheeler said in OWL Web Language:

    Perhaps, but these days, all CPUs powerful enough to run a .NET VM are multicore machines, which by necessity have atomic instructions in their instruction sets. As @wharrgarbl pointed out, atomic instructions may not be part of the CIL instruction set, but they're supported by the Interlocked class, which the JIT special-cases to translate to the appropriate CPU instruction for the architecture in question.

    I'm not defending anything, I'm just saying that's probably the cause of your guys' weird pointless stupid debate and arcane bullshittery. You were comparing applies to oranges with C# vs. Delphi.


  • Impossible Mission - B

    @wharrgarbl Yes, obviously. That's exactly what I said to @boomzilla. COW doesn't magically do away with classic race conditions. It was never intended to. Its purpose is to implement value semantics on top of a reference type implementation, and it does that in a guaranteed thread-safe manner.



  • @masonwheeler said in OWL Web Language:

    and it does that in a guaranteed thread-safe manner.

    No, it doesn't. You can have 1000 threads acessing the same reference, the refcount can still be 1 and you have 1000 threads modifying that string.


  • ♿ (Parody)

    @masonwheeler said in OWL Web Language:

    @boomzilla As I pointed out to @wharrgarbl a while ago, there's nothing specifically bad about this that you don't also get with immutable strings. What happens if one thread is trying to read an immutable string while another is replacing the reference with a different immutable string? You get the exact same race condition.

    Um...no, that's not how immutable strings work. You end up with a new reference that's assigned to the variable or whatever at the place where it's being "changed."


  • ♿ (Parody)

    @masonwheeler said in OWL Web Language:

    @wharrgarbl Yes, obviously. That's exactly what I said to @boomzilla. COW doesn't magically do away with classic race conditions. It was never intended to. Its purpose is to implement value semantics on top of a reference type implementation, and it does that in a guaranteed thread-safe manner.

    Except that I showed that it doesn't do that in a thread-safe manner. Unless it's only one thread.


  • Impossible Mission - B

    @boomzilla said in OWL Web Language:

    Um...no, that's not how immutable strings work. You end up with a new reference that's assigned to the variable or whatever at the place where it's being "changed."

    Thread 1 and thread 2 both have access to the same object. This object has a public string field (:pendant: or a trivial property, which the JIT turns into field access) that thread 1 is reading while thread 2 is assigning to it. That's exactly how it works. You have the exact same race condition as you would have with mutable strings.


  • ♿ (Parody)

    @masonwheeler said in OWL Web Language:

    @boomzilla said in OWL Web Language:

    Um...no, that's not how immutable strings work. You end up with a new reference that's assigned to the variable or whatever at the place where it's being "changed."

    Thread 1 and thread 2 both have access to the same object. This object has a public string field (:pendant: or a trivial property, which the JIT turns into field access) that thread 1 is reading while thread 2 is assigning to it. That's exactly how it works. You have the exact same race condition as you would have with mutable strings.

    Yes, but now you're talking about the mutability of the other object, not the string. If two objects held a reference to the same string, you can do whatever you want to one of the objects but it will never affect the string that the other object is referencing.


  • Impossible Mission - B

    @boomzilla ...which is completely different from the race condition that @wharrgarbl is talking about. Immutable strings are reference types with reference semantics. COW strings are reference types with value semantics, so the only way to set up this scenario is to have multiple threads looking at the same reference, either on a shared object or a global string. I was pointing out that if you set up the equivalent scenario with immutable strings, you get the exact same race condition.


  • ♿ (Parody)

    @masonwheeler said in OWL Web Language:

    ...which is completely different from the race condition that @wharrgarbl is talking about.

    I don't realy care what he's talking about.

    @masonwheeler said in OWL Web Language:

    I was pointing out that if you set up the equivalent scenario with immutable strings, you get the exact same race condition.

    I was just pointing out that COW can fail in a multi-threaded system in a way that immutable strings cannot.



  • @masonwheeler

    class foobar {
        cowstring member1;
        
        void dosomething(int k) {
            // here, member1 will be a single reference, with a refcount 1, being modified by a thousand threads
            member1.replace("foo", "bar" + k.toString());
        }  
    };
    
    foobar f = new foobar();
    f.member1 = "foo foo foo foo foo foo";
    
    for (int i = 0; i < 1000; i++) {
        startnewthread({
            f.dosomething(i);
        });
    }
    
    Write(f.member1);

  • Impossible Mission - B

    @boomzilla said in OWL Web Language:

    I was just pointing out that COW can fail in a multi-threaded system in a way that immutable strings cannot.

    And I was pointing out that they can't. If you set up the same scenario (two threads viewing the same reference) the failure mode is exactly the same for both of them: a classic race condition.


  • Impossible Mission - B

    @wharrgarbl Do you expect the result to be any different from the following scenario?

    class foobar {
        string member1;
        
        void dosomething(int k) {
            member1 = string.Replace(member1, "foo", "bar" + k.toString());
        }  
    };
    
    foobar f = new foobar();
    f.member1 = "foo foo foo foo foo foo";
    
    for (int i = 0; i < 1000; i++) {
        startnewthread({
            f.dosomething(i);
        });
    }
    

    Why or why not?


  • ♿ (Parody)

    @masonwheeler said in OWL Web Language:

    @boomzilla said in OWL Web Language:

    I was just pointing out that COW can fail in a multi-threaded system in a way that immutable strings cannot.

    And I was pointing out that they can't. If you set up the same scenario (two threads viewing the same reference) the failure mode is exactly the same for both of them: a classic race condition.

    No. I'll try to expand a bit to explain why you're wrong. The race condition is that one thread (A) holds a reference and starts to modify it. At that time, the reference is 1. At that point another thread (B) takes the reference and the count is incremented to 2. Now thread A has modified a string with a reference count of 2.

    That's a race condition that is impossible for immutable strings because a new string would have been created by A instead of modifying it in place.



  • @masonwheeler yes, because with immutable strings, member1 would be replaced in a single atomic operation

    With your suggested cow string implementation, the refcount will be 1, and all the threads will proceed to modify the string at the same time, without duplicating it, and you'll get some undefined behavior.


  • Impossible Mission - B

    @boomzilla said in OWL Web Language:

    That's a race condition that is impossible for immutable strings because a new string would have been created by A instead of modifying it in place.

    Wait. Which code are you talking about here?

    sharedString = modify(sharedString) or myString = modify(sharedString)? The difference is very significant.


  • ♿ (Parody)

    @masonwheeler said in OWL Web Language:

    @boomzilla said in OWL Web Language:

    That's a race condition that is impossible for immutable strings because a new string would have been created by A instead of modifying it in place.

    Wait. Which code are you talking about here?

    sharedString = modify(sharedString) or myString = modify(sharedString)? The difference is very significant.

    I wasn't talking about any particular code.


Log in to reply