Efficient base 30



  • Long time lurker. First time post.

    I need to convert decimals to base 30 serial numbers and back again.

    Whilst easy enough to do, I wondered what the most efficient way to do it would be.

    WTFs welcome.

    End language will be tcl but any code will be understandable I'm sure.

    Filed under: I'm ready for tcl bashing.


  • Notification Spam Recipient

    I don't know about efficient, but I would loop the number, dividing by base and spitting out the modulus each round as the next character until you reached zero.

    Edit: And obviously the reverse going the other way.


  • Java Dev

    Untested C, lacks basic safety checks:

    char glyphs[] = "0123456789ABCDEFGHIJKLMNOPQRST";
    
    void to_base(unsigned n, char * dst)
    {
        while( n > 0 )
        {
            *dst++ = glyphs[n % 30];
            n /= 30;
        }
        *dst = 0;
    }
    

    This will return least significant digit first, and generates empty string for zero, so you'll want to add checks for that. And of course some error checking, like destination buffer size, but since you'll use a different language anyway...


  • FoxDev

    @PleegWat I'd use "0123456789ABCDEFGHJKLMNPQRSTVX" to minimise the chances of ambiguity


  • Java Dev

    @RaceProUK I figured if @Ben_Warre is looking for base30 specifically he's already got an alphabet, so I couldn't be bothered figuring out a good one and used a placeholder.



  • @PleegWat said:

    @RaceProUK I figured if @Ben_Warre is looking for base30 specifically he's already got an alphabet, so I couldn't be bothered figuring out a good one and used a placeholder.

    Yep.


  • Banned

    If it were Base-32, you could probably (unlikely) use some weird bit fiddling to make it (marginally) faster. But with non-power-of-2 base, I think this is the only method.


  • Java Dev

    @Gąska For number formatting, it's likely marginal. The value of bit fiddling is in encoding data, as large blocks of binary data can be encoded in small chunks, rather than having to do modulo operations over the entire data block.
    For base-32, each block of 5 input bytes converts to 8 encoded characters.
    For base-64, each block of 3 input bytes converts to 4 encoded characters.

    Here base-64 is more compact, but base-32 is easier for humans to copy over, and does not require using both upper and lower case letters so may be compatible with more character sets.


  • Discourse touched me in a no-no place

    @Ben_Warre said:

    End language will be tcl but any code will be understandable I'm sure.

    Well, since it's base-30 there's no real efficient technique. You're stuck with div-and-mod, though since you're working with bignums there's no real limit there.

    # The alphabet that we're using. Had better be exactly 30 chars long!
    set alphabet "0123456789ABCDEFGHJKLMNPQRSTVX"
    
    proc toBase30 {number} {
        global alphabet
        if {$number == 0} {
            return [string index $alphabet 0]
        } elseif {$number < 0} {
            error "negative numbers not handled by this code"
        }
        set str ""
        while {$number > 0} {
            set val [expr {$number % 30}]
            set number [expr {$number / 30}]
            append str [string index $alphabet $val]
        }
        return [string reverse $str]
    }
    
    proc fromBase30 {str} {
        global alphabet
        set num 0
        foreach ch [split $str ""] {
            set idx [string first $ch $alphabet]
            if {$idx < 0} {
                error "illegal character \"$ch\""
            }
            set num [expr {$num * 30 + $idx}]
        }
        return $num
    }
    

    Seems to work when I test it:

    % toBase30 1234567890
    1LQ4N30
    % fromBase30 1LQ4N30
    1234567890
    % toBase30 9876543210987654321
    JHG0BBJ81GDLM
    % fromBase30 JHG0BBJ81GDLM
    9876543210987654321
    

Log in to reply