Math formula wizardry

cartman82

And if the value is 850000 (assuming that is the number of events), the result should be 9.850000? That would mean my first guess about treating "x" like a version number was correct, even though I thought that was the least likely to be correct.

I'm sure at this point Boomzilla is going to burst in here and tell me what an idiot I am for not using my telepathic powers to instantly understand this.

Bad idea: Ask an ambiguous question, challenge math nerds to participate, then disappear without providing additional information, and watch as the entire thread devolves into anarchy.

@Captain said:

More on Möbius transformations, if you want something fancy. Do an interpolation on a few of the points (three points determine a Möbius transformation). The algorithm will yield a formula of the form (a·x + b)/(c·x + d), which takes only a few multiplications and additions and one division to transform the input. Use that formula in your program.

https://ics.uci.edu/~eppstein/pubs/BerEpp-SODA-03.pdf
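A minimal Python sketch of the three-point fit described above. It assumes the form y = (a·x + b)/(c·x + 1), with the constant in the denominator normalized to 1 (this fails only if the curve has a pole at x = 0). `fit_mobius` is a hypothetical helper name, and the three linear equations are solved with plain Gaussian elimination rather than any particular library:

```python
def fit_mobius(points):
    """Fit y = (a*x + b) / (c*x + 1) through exactly three (x, y) points.

    Hypothetical helper. Normalizing the denominator's constant to 1
    assumes the curve has no pole at x = 0.
    """
    # Each point gives one linear equation in a, b, c:  a*x + b - c*x*y = y
    rows = [[float(x), 1.0, -float(x) * y, float(y)] for x, y in points]
    n = 3
    # Gaussian elimination with partial pivoting on the 3x4 augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, n + 1):
                rows[r][k] -= factor * rows[col][k]
    # Back substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = rows[r][n] - sum(rows[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = acc / rows[r][r]
    a, b, c = coeffs

    def mobius(x):
        return (a * x + b) / (c * x + 1)

    return mobius


# Usage: three (input, output) pairs picked from the scale
f = fit_mobius([(0, 1), (2, 2.5), (4, 3)])
print(f(6))
```

If c·x + 1 crosses zero anywhere inside the input range, the fitted curve has a pole there and is not usable as a scale.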

@asdf said:

Try cubic spline interpolation, it might be what you want. (I'm pretty sure Wolfram Alpha can do that for you.)

In case you've never heard of splines: You basically find cubic functions (instead of linear functions) for each interval, so you get a "smooth" piecewise-defined function.

Ok, I'm reading up on those.

But I really need something super-simple, to match my math skills. Like with step-by-step instructions and lots of pictures of cartoons encouraging me to push on.

^-- like that, only for cubic spline interpolation.

asdf

@cartman82 said:

But I really need something super-simple, to match my math skills.

I just checked: Unfortunately, there's no Khan Academy video explaining spline interpolation. The concept is pretty easy if you know what derivatives are. I'll try to find some video or text explaining it without too many formulas.

BTW: If linear interpolation is sufficient (i.e. if you don't care that the resulting function has bends and isn't "smooth"), then there's no need to use cubic splines. Just use the solution @Yamikuronue provided in this case.
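Yamikuronue's solution isn't quoted in this thread, so this is only a generic Python sketch of piecewise linear interpolation between breakpoints (the same idea as the PHP `convertValueUsingMap` posted later). The breakpoints in the usage line are hypothetical, taken from the example format in that docblock:

```python
import bisect


def piecewise_linear(xs, ys):
    """Linear interpolation between (xs[i], ys[i]); xs must be strictly increasing.

    Inputs outside [xs[0], xs[-1]] are clamped to the end values.
    """
    def f(x):
        if x <= xs[0]:
            return float(ys[0])
        if x >= xs[-1]:
            return float(ys[-1])
        i = bisect.bisect_right(xs, x) - 1  # xs[i] <= x < xs[i + 1]
        frac = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + frac * (ys[i + 1] - ys[i])

    return f


# Usage with hypothetical breakpoints: 0 -> 0, 1000 -> 1, 3000 -> 2
scale = piecewise_linear([0, 1000, 3000], [0, 1, 2])
print(scale(2000))
```

The "bends" mentioned above are the corners at each breakpoint, where the slope jumps from one segment to the next.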

Rhywden

@asdf said:

BTW: If linear interpolation is sufficient (i.e. if you don't care that the resulting function has bends and isn't "smooth"), then there's no need to use cubic splines. Just use the solution @Yamikuronue provided in this case.

It's also much easier to maintain and understand, later on. Fancy stuff might look nice but unless you're using it on a regular basis (or there's a real technical requirement like heightened calculation performance), you're better off with the bog-standard not-so-nice-to-look-at variant.

blakeyrat

@cartman82 said:

Bad idea: Ask an ambiguous question, challenge math nerds to participate,

I'm the exact opposite of a math nerd, I hate math, I don't have a college degree because I failed calculus multiple times. And I still couldn't figure out what the question was asking. FYI.

And you still haven't confirmed what "x" is, which is annoying the fuck out of me because I'm certain you didn't actually mean what you said. (Or, I misinterpreted which number was the "event" because the concept of events just kind of appeared out of nowhere.)

asdf

I just found this gem:

Monotone cubic interpolation - Wikipedia

Ready-to-use JavaScript code for generating a monotone cubic spline function. Does that help?
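A minimal Python sketch of the Fritsch-Carlson method that the linked article describes. It uses cubic Hermite pieces whose tangents are clipped so the curve never overshoots monotone data. This is written from the algorithm description, not ported from the article's JavaScript, and the names are illustrative:

```python
import bisect
import math


def monotone_cubic(xs, ys):
    """Fritsch-Carlson monotone cubic interpolation (sketch).

    xs strictly increasing, at least two points. If ys is monotone,
    the returned function is monotone between the data points too.
    """
    n = len(xs)
    h = [xs[k + 1] - xs[k] for k in range(n - 1)]
    delta = [(ys[k + 1] - ys[k]) / h[k] for k in range(n - 1)]
    # Initial tangents: one-sided at the ends, averaged secants inside
    m = [delta[0]] + [0.0] * (n - 2) + [delta[-1]]
    for k in range(1, n - 1):
        if delta[k - 1] * delta[k] > 0:
            m[k] = (delta[k - 1] + delta[k]) / 2
        # otherwise: flat spot or local extremum, keep tangent 0
    # Clip tangents so each cubic piece stays monotone
    for k in range(n - 1):
        if delta[k] == 0:
            m[k] = m[k + 1] = 0.0
            continue
        a, b = m[k] / delta[k], m[k + 1] / delta[k]
        if a * a + b * b > 9:
            tau = 3 / math.hypot(a, b)
            m[k], m[k + 1] = tau * a * delta[k], tau * b * delta[k]

    def f(x):
        k = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        t = (x - xs[k]) / h[k]
        # Cubic Hermite basis functions on [0, 1]
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return (h00 * ys[k] + h10 * h[k] * m[k]
                + h01 * ys[k + 1] + h11 * h[k] * m[k + 1])

    return f
```

Unlike a plain cubic spline, this one trades some smoothness (the second derivative can jump at the data points) for the guarantee that "more events" never produces a lower score.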

swayde

http://spikedmath.com/comics/111-x.png

I couldn't find exactly the comic I was looking for - so here are the wrong ones

Rhywden

@asdf said:

I just found this gem:

Monotone cubic interpolation - Wikipedia

Ready-to-use JavaScript code for generating a monotone cubic spline function. Does that help?

Nice. Not linear between the data points, though. :)

anotherusername

It said cubic. Anyway, if he wanted linear, I posted one.

asdf

@Rhywden said:

Nice. Not linear between the data points, though.

Well, if you want a linear piecewise interpolation, just use a linear piecewise interpolation, right? That's what @Yamikuronue and @anotherusername already suggested. I thought I'd present a more sophisticated alternative that results in a "smooth" function in case he wants/needs that.

cartman82

@blakeyrat said:

And you still haven't confirmed what "x" is, which is annoying the fuck out of me because I'm certain you didn't actually mean what you said. (Or, I misinterpreted which number was the "event" because the concept of events just kind of appeared out of nowhere.)

x as in the input into the equation? It's the large integer. The number of generated events. The output is a float on a scale of 0-10.

I don't understand what's unclear here.

@asdf said:

Ready-to-use JavaScript code for generating a monotone cubic spline function. Does that help?

Oh cool! I'll give it a try.

@asdf said:

BTW: If linear interpolation is sufficient (i.e. if you don't care that the resulting function has bends and isn't "smooth"), then there's no need to use cubic splines. Just use the solution @Yamikuronue provided in this case.

It's not like the client is gonna check if the curve is smooth or anything. I can definitely get away with a jagged linear graph. And that's what I'll probably do in the end.

But I'll still look over your and @Captain's fancy math stuff. This function reversal stuff seems like a good thing to know in general.

PJH

@swayde said:

I couldn't find exactly the comic I was looking for - so here are the wrong ones

Callum Lyon on Twitter

Captain

If you're interested in this stuff, consider learning about regressions next, as opposed to learning about lots of interpolation algorithms.

Regressions are the statistical "version" of an interpolation, where you approximate a function by finding the function in a given class that minimizes some quantity (usually the squared error, which is called a least-squares approximation). Regressions are good for when interpolations get too "wild" or are over-determined (i.e., there's no exact solution because of conflicting data points).
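The simplest case of this is an ordinary least-squares straight line. Below is a minimal Python sketch (the helper name is made up). The second usage example has two conflicting y values at the same x, which no interpolation can match but a regression handles fine. Python 3.10+ also ships `statistics.linear_regression` for exactly this case.

```python
def least_squares_line(xs, ys):
    """Fit y ~ slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution: slope = cov(x, y) / var(x)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Clean data: recovers the exact line
print(least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))
# Conflicting points at x = 0 and x = 1: the fit goes between them
print(least_squares_line([0, 0, 1, 1], [0, 2, 1, 3]))
```

To fit a different "class" of functions (say y ~ a + b·log x), transform the inputs before calling the same routine.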

cartman82

@Captain said:

If you're interested in this stuff, consider learning about regressions next, as opposed to learning about lots of interpolation algorithms.

Frankly, I'm not. I'll never be a math guru of any description.

I'm mostly interested in ready made tools - formulas, equations - I can put in my toolbox and apply to practical problems. I know learning this with understanding would be preferable, but I just don't have mental capacity and/or interest for doing that.

Captain

Not trying to argue with you.

I'm just suggesting that there are two kinds of data: perfectly "clean" data that you can match perfectly (with an interpolation), and noisy data that you can only fit in a best (but not necessarily perfect) way. And that the keyword to look for in the latter case is "regression".

asdf

Just a short, basic explanation of what cubic spline interpolation is, since I couldn't find a good video about it:

You know N + 1 data points (x₀, y₀), …, (x_N, y_N) that the resulting function must match.
Between each pair of successive data points, you define the resulting function as a different cubic function (a function of the form ax³ + bx² + cx + d).
Since you want the resulting function to be "smooth", you also require that the first and second derivatives of two successive cubic functions be equal at the data point where they meet.
Now you only need two additional requirements for what the function should do at both ends (x₀ and x_N) to have enough equations to determine the cubic functions uniquely. For example, that the second derivative at both ends is zero (a so-called "natural" spline).

Then the magic starts (unless you want to manually calculate the coefficients with pen and paper). Basically, you play around with the formulas a bit to get a system of linear equations, which can easily be solved with an efficient algorithm.

That's what the JavaScript code does: Calculate the coefficients (a, b, c, d) for each of the cubic functions efficiently and return a JavaScript function which represents the resulting interpolation function (combination of all cubic functions).
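For the curious, here is a compact Python version of those steps for the "natural" case (second derivative zero at both ends), using the standard tridiagonal (Thomas algorithm) solve. It is a sketch, not the JavaScript from the link. Each piece is stored around its left data point as y_j + b_j·t + c_j·t² + d_j·t³ with t = x − x_j. This is the same cubic as ax³ + bx² + cx + d, just shifted.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Natural cubic spline through (xs[i], ys[i]); xs strictly increasing, >= 2 points."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Right-hand side of the tridiagonal system for the c coefficients
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep (c[0] = 0 encodes the natural condition at the left end)
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution (c[n] = 0 encodes the natural condition at the right end)
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def spline(x):
        # Pick the piece whose interval contains x (end pieces extrapolate)
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        t = x - xs[j]
        return ys[j] + b[j] * t + c[j] * t**2 + d[j] * t**3

    return spline
```

Solving the tridiagonal system this way is linear in the number of points, which is the "efficient algorithm" part.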

blakeyrat

@cartman82 said:

x as in the input into the equation? It's the large integer. The number of generated events. The output is a float on a scale of 0-10.

I mean I think I know the answer, and it's the answer that other people are guiding you towards, but you never explicitly gave the requirement. (And in fact, you explicitly said at one point that the x was the number of events, which is clearly incorrect.)

The output float is designated as: "1.x"

I'm JUST ASKING WHAT THE X IS.

My first guess was that it was like a version number, so the x would be the number of events. Then you basically confirmed that, which blew my mind because I was (and still am) sure that was wrong.

Then Yami made a guess that the X was the number of events divided into the maximum of the particular part of the scale it was in. But I'm pretty sure that was also wrong, because it means you skip from 1.000 to 1.200.

Anyway, I'm not going to review the whole thread, the POINT IS YOU'VE NEVER FULLY SPECIFIED THE PROBLEM.

I guess we just assume my second (and apparently everybody else's) ass-pull guess is correct, since you seem to be responding to those posts positively. Whatever.

@cartman82 said:

I don't understand what's unclear here.

I don't understand what you don't understand. You typed "1.x". We know the x is a placeholder for something. We don't know what it's a placeholder for. Until we do, the problem can't be solved. (Or, rather, everybody's just working off of ass-pull assumptions.) WHAT IS THE X?

Captain

@blakeyrat said:

I don't understand what you don't understand. You typed "1.x". We know the x is a placeholder for something. We don't know what it's a placeholder for. Until we do, the problem can't be solved. (Or, rather, everybody's just working off of ass-pull assumptions.) WHAT IS THE X?

It's a placeholder for the function he doesn't know how to calculate.

He just used some funny notation. He is asking for an f like:

f(z) = 1.x(z)

(again, using slightly funny notation with regards to the decimal).

See, x is a function of z, the input to f. A solution for x determines a solution for f and vice versa.

He asked his f (equivalently, x) to have certain properties. We've given him options that satisfy those properties. He can pick any one he wants, or keep looking now that he knows a keyword for the topic.

PleegWat

I'd try to guide the client toward either a logarithmic approximation, or a pure scoring function with integer output. Those two are the least likely to confuse the users.

blakeyrat

I believe I understand what he wants now, what I'm complaining about is:

Not ONCE in this thread has he specified what he wants, and
At one point, he explicitly said the x was something he obviously does not want

I mean, solve the problem, sure, but just recognize that everybody in this thread is solving it based on ass-pull assumptions about what cartman82 wants. (And, if he wants what he said he wanted, which he obviously does not, then the solution was found ages ago.)

There's a huge communication problem here, is what I'm saying.

Captain

Yeah, logs are good. The first three terms of its Taylor expansion should be pretty alright.
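For reference, the expansion meant here, and the catch: the series for ln(1 + u) only converges for −1 < u ≤ 1. With event counts in the hundreds of thousands you would expand around some reference count x₀ (an assumed parameter, not from the thread) rather than around 1, and the three-term version is only accurate near that x₀:

```latex
\ln(1+u) = u - \frac{u^2}{2} + \frac{u^3}{3} - \cdots, \qquad -1 < u \le 1

\ln x = \ln x_0 + \ln\!\left(1 + \frac{x - x_0}{x_0}\right)
      \approx \ln x_0 + u - \frac{u^2}{2} + \frac{u^3}{3},
      \qquad u = \frac{x - x_0}{x_0}
```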

Captain

@blakeyrat said:

There's a huge communication problem here, is what I'm saying.

Yeah, you don't get it.

He doesn't know exactly what he wants. And he asked for general methods for making solutions for things that are scales.

blakeyrat

@Captain said:

He doesn't know exactly what he wants.

All I'm asking is what the output should be. Since we know it shouldn't be literally "1.x".

Surely he knows that much.

And for the record, yes, I know I don't get it, that's why I'm asking all these clarifying questions. That's what people do when they don't get something.

Captain

No he doesn't! He has some points to work from. And he was told "go make a scale out of that." So, there are some sensible properties a scale should have, like bigger inputs giving bigger outputs (monotonicity), smoothness being nice, and closely matching the points he was given; but also ease of calculating what the solution should be given the data, and ease of implementing the solution. Which is why we suggested things like linear interpolations and Möbius transforms and cubic splines.

cartman82

@Captain said:

I'm just suggesting that there are two kinds of data: perfectly "clean" data that you can match perfectly (with an interpolation), and noisy data that you can only fit in a best (but not necessarily perfect) way. And that the keyword to look for in the latter case is "regression".

I understand what you mean.

In this use case, I would probably have to start removing datapoints until the cubic interpolation algorithm figures out a unique function generating exactly my points.

And the regression approach would be able to swallow some imperfections and find the best case solution.

@blakeyrat said:

The output float is designated as: "1.x"

I'm JUST ASKING WHAT THE X IS.

@blakeyrat said:

I don't understand what you don't understand. You typed "1.x". We know the x is a placeholder for something. We don't know what it's a placeholder for. Until we do, the problem can't be solved. (Or, rather, everybody's just working off of ass-pull assumptions.) WHAT IS THE X?

Oooooooooohhhhh.

That's what you were asking.

It's just a placeholder for decimals.
Like if I wrote 1.00-1.99

Was that really so obscure? It seems perfectly natural to me.

blakeyrat

Yes, yes, yes, I get all that.

But he also explicitly said the x in "1.x" referred to the number of events, which blatantly contradicts what you posted. That wouldn't meet any of your requirements for a "sensible scale".

Right?

So we're dealing with an OP who posts a really vague question a lot of people don't get, doesn't bother specifying what the output is, then when he does specify what the output is, what he specifies is clearly not what he actually wants.

Am I crazy here? Surely I'm not the only one who sees the problem here. Right?

blakeyrat

@cartman82 said:

Oooooooooohhhhh.

That's what you were asking.

Yeah, I was asking the thing I asked. What did you think I was asking?

@cartman82 said:

It's just a placeholder for decimals. Like if I wrote 1.00-1.99

Right; and we knew that because you specified the output was a float.

But before you clarified what you were asking, we had no way of knowing how to generate those decimals. And then you explicitly told me the decimal portion was the number of events, which really threw a wrench into my brain, but apparently that was because you thought when I asked "what is x?" I meant something entirely different than "what is x?"

@cartman82 said:

Was that really so obscure?

Obviously it was.

@cartman82 said:

It seems perfectly natural to me.

It wasn't to Yami, and it wasn't to me.

asdf

@blakeyrat said:

Not ONCE in this thread has he specified what he wants, and

That's why we provided different options that fulfill different requirements. @cartman82's problem was exactly that he didn't know what kind of interpolation function he needed, so we showed him some reasonable options.

@cartman82 said:

In this use case, I would probably have to start removing datapoints until the cubic interpolation algorithm figures out a unique function generating exactly my points.

No, cubic splines match the given data points exactly. That's the whole point of using splines (piecewise-defined functions) vs. using one interpolation function for the whole range.

blakeyrat

@asdf said:

That's why we provided different options that fulfill different requirements.

No; I mean he never specified what the output should be.

Look, whatever.

Captain

@cartman82 said:

In this use case, I would probably have to start removing datapoints until the cubic interpolation algorithm figures out a unique function generating exactly my points.

And the regression approach would be able to swallow some imperfections and find the best case solution.

The cubic spline interpolation will automatically fit your data exactly, if you use all the data points. That's what makes interpolation good: it's useful when you can intuitively think that a small number of data points "represents" all of the data. But if the data is noisy (or there are lots of points), you can over-fit, and include noise in whatever model you're building.

Compare this phenomenon to https://en.wikipedia.org/wiki/Gibbs_phenomenon, which happens when you try to fit sine waves to square waves. (The cause of that phenomenon is slightly different, but the picture is good.)

But yes, you have the right idea.

Edit: https://en.wikipedia.org/wiki/Runge's_phenomenon is the one for polynomial fitting.
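Runge's phenomenon is easy to reproduce. This Python sketch interpolates the classic Runge function 1/(1 + 25x²) with a single polynomial through equally spaced points and reports the worst error on [-1, 1]. The helper names are made up for illustration. Past a few points the error grows as you add more, which is why piecewise functions (splines) are preferred over one global polynomial:

```python
def lagrange_interpolant(xs, ys):
    """The single polynomial through all (xs[i], ys[i]), evaluated in Lagrange form."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def runge(x):
    return 1.0 / (1.0 + 25.0 * x * x)


def max_error(degree, samples=2001):
    # degree + 1 equally spaced nodes on [-1, 1] give a degree-`degree` polynomial
    nodes = [-1.0 + 2.0 * k / degree for k in range(degree + 1)]
    p = lagrange_interpolant(nodes, [runge(x) for x in nodes])
    grid = [-1.0 + 2.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(p(x) - runge(x)) for x in grid)


for degree in (4, 8, 12, 16):
    print(degree, max_error(degree))
```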

asdf

OT: Goddammit, it's hard to talk about maths in English.

cartman82

@asdf said:

OT: Goddammit, it's hard to talk about maths in English.

Fuck yeah. My head hurts trying to figure out what different people think I asked, versus how they explain to others what I asked, versus what I actually asked. Also reading up on advanced math and trying to remember what derivatives are, also trying to match English terms everyone is using with the Serbian terms they taught me in school. Also 5 different associates decided this is the perfect time to ask me shit on Skype.

Summary: I should have just coded my ifs and kept my mouth shut.

asdf

@cartman82 said:

trying to remember what derivatives are

If all you want is a mental image:

First derivative: Slope of the function (how fast does it rise/fall)
Second derivative: Curvature of the function (how much does it bend to the left/right)

Formulas for both can be easily derived (pun intended) from the original function.
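To make "easily derived" concrete for the cubic pieces a spline is built from: for f(x) = ax³ + bx² + cx + d, the power rule gives f′(x) = 3ax² + 2bx + c and f″(x) = 6ax + 2b. A small Python check against central finite differences (the coefficients are arbitrary example values):

```python
def cubic_and_derivatives(a, b, c, d):
    """Return f, f' and f'' for the cubic f(x) = a*x**3 + b*x**2 + c*x + d."""
    def f(x):
        return a * x**3 + b * x**2 + c * x + d

    def slope(x):  # first derivative
        return 3 * a * x**2 + 2 * b * x + c

    def bend(x):  # second derivative
        return 6 * a * x + 2 * b

    return f, slope, bend


f, slope, bend = cubic_and_derivatives(1.0, -2.0, 0.5, 3.0)
x, h = 1.5, 1e-4
# Central differences approximate the same quantities numerically
print(slope(x), (f(x + h) - f(x - h)) / (2 * h))
print(bend(x), (f(x + h) - 2 * f(x) + f(x - h)) / h**2)
```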

Circuitsoft

Since there doesn't seem to be too much consistency to the ranges, I think you're stuck with something like this:

Python:

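def transf(x):
    if x < 0:
        return float('-inf')
    elif x < 100:
        return x / 100.0
    elif x < 500:
        return 1 + (x - 100) / 400.0
    # ... (intermediate ranges elided in the original post)
    elif x < 1000000:
        # 200000.0 so Python 2 doesn't truncate with integer division
        return 9 + (x - 800000) / 200000.0
    else:
        # Assumed: cap at the top of the 0-10 output scale instead of returning None
        return 10.0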

Circuitsoft

@cartman82 said:

The ranges are provided by the client. As far as I can tell, he pulled the numbers out of his ass.

Based on what I see, take the log of x, multiply it by some value (1, 3, 10, etc.) to get numbers in a sort of useful range. Go back to the client and say "This table is much easier to generate. Is that close enough?"
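This suggestion can be sketched in Python under clearly stated assumptions. `max_events` (here 1,000,000, borrowed from the top of cartman82's example mapping) is assumed to be where the scale reaches 10, and log10(1 + x) is used so that 0 events gives 0 instead of −infinity. The printed table is the sort of thing you would take back to the client:

```python
import math


def log_score(events, max_events=1_000_000):
    """Map an event count onto 0-10 logarithmically (hypothetical sketch).

    max_events is an assumed cap; counts at or above it score 10.
    """
    if events <= 0:
        return 0.0
    scale = 10.0 / math.log10(1 + max_events)
    return min(10.0, scale * math.log10(1 + events))


for x in (10, 100, 1000, 10_000, 100_000, 1_000_000):
    print(x, round(log_score(x), 2))
```

With this choice every factor of 10 in events adds roughly the same number of points, which is the main property a log scale buys you.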

blakeyrat

@cartman82 said:

Summary: I should have just coded my ifs and kept my mouth shut.

I believe I recommended that in post 5?

Maciejasjmj

@r10pez10 said:

Also, I know it's Coding Help and all, but I've always wanted to ask you something.

You do realize you can actually sometimes type in the box you drop pictures in, right?

FrostCat

@blakeyrat said:

I'm JUST ASKING WHAT THE X IS.

I can't quite decide if you're trolling inclusive-or a math dummy. To me it was immediately obvious: x is a decimal scaled to the input range[1]. It's been mentioned several times now but you keep arguing about it, which makes me lean towards "trolling".

[1] once again, that is, if the input number is between (say) 1000 and 2000, and maps to 3.x, then 1000 -> 3.0, 2000 -> 4.0 or 3.9999999999999, 1500 -> 3.5, and so on.

FrostCat

@blakeyrat said:

Obviously it was.

Hardly, math dummy/troll.

cartman82

This is what I went with in the end.

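class CalculatorUtility {
    /**
     * $mappings should be in the following format:
     * [
     *  [1000, 1],
     *  [3000, 2],
     *  ...
     *  [1000000, 10] <- anything above also gets 10
     * ]
     * First number is the START of input range. Second number is the START of corresponding output range. Both inclusive.
     * The initial [0, 0] mapping is implied.
     *
     * @param int $input
     * @param array $mappings
     * @return float
     */
    public static function convertValueUsingMap($input, array $mappings) {
        // Negative inputs fall outside every range; without this guard they would
        // drop through the loop and get the top value
        if ($input < 0) {
            return 0.0;
        }

        $fromInput = 0;
        $fromOutput = 0;

        foreach ($mappings as $mapping) {
            list($toInput, $toOutput) = $mapping;
            if ($input >= $fromInput && $input < $toInput) {
                // Linear interpolation within [$fromInput, $toInput), scaled to the
                // actual output width (identical to before when outputs step by 1)
                return $fromOutput + ($input - $fromInput) / ($toInput - $fromInput) * ($toOutput - $fromOutput);
            }
            $fromInput = $toInput;
            $fromOutput = $toOutput;
        }

        // At or above the last mapping: clamp to its output value
        return end($mappings)[1];
    }
}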

I'll still look into the math algorithms, but this goes into production. People who advised me to KISS were right.

I accepted @anotherusername, as he provided the closest solution to what I used in the end.

Thanks everyone.

Dragnslcr

As much as I hate defending the rat, it was a legitimate question. Yes, the most likely meaning was that the range should map linearly between the integers, like most people said. Given the bizarre ranges in the input, though, I wouldn't have been surprised if the output values were supposed to be on some equally bizarre scale. Would you have been all that surprised to find out that each range should end up on a logarithmic scale?

flabdablet

@cartman82 said:

I intentionally didn't include the real scale.

As I said, I'm looking for methodology, not a spoon-fed solution.

It would be good if you did. There's every chance that what looks like ass-pull to you might look like "oh yeah, it's that relationship" to somebody with more of a numerics background.

Edit: also, such numbers as you did give look like you'd get smoother results from a function that takes the log of the input number and then does a piecewise linear interpolation on the result.
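This suggestion is sketched in Python below. It interpolates linearly in log10(input) instead of in the raw input, so equal ratios of events move the score by equal amounts within each segment. The breakpoints in the usage line are hypothetical, since the real client scale was never posted:

```python
import bisect
import math


def log_piecewise_linear(breakpoints, outputs):
    """Piecewise linear interpolation in log10(input) instead of the raw input.

    breakpoints: strictly increasing, positive inputs; outputs: matching scores.
    Inputs outside the breakpoints are clamped to the end scores.
    """
    logs = [math.log10(b) for b in breakpoints]

    def f(x):
        if x <= breakpoints[0]:
            return float(outputs[0])
        if x >= breakpoints[-1]:
            return float(outputs[-1])
        lx = math.log10(x)
        # Segment index, clamped against floating-point rounding at either end
        i = min(max(bisect.bisect_right(logs, lx) - 1, 0), len(logs) - 2)
        frac = (lx - logs[i]) / (logs[i + 1] - logs[i])
        return outputs[i] + frac * (outputs[i + 1] - outputs[i])

    return f


# Usage with hypothetical breakpoints: 100 -> 1, 1000 -> 2, 10000 -> 3
score = log_piecewise_linear([100, 1000, 10000], [1, 2, 3])
print(score(10 ** 2.5))
```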

CoyneTheDup

@Maciejasjmj said:

You do realize you can actually sometimes type in the box you drop pictures in, right?

Sometimes a picture is worth a thousand words.

Especially words by some of ...*... Never mind.