What (Who?) needs research



  • Background: I've been swamped with education for the past few years, and have only recently found myself with my head above water. I thus have time to start doing research again. What I researched a few years back is likely no longer relevant (if it ever was), so I need a fresh start. I could (and will) talk to our industry partners and see what research they need. I can also invent my own topic and then try to pawn it off on industry/academia. Or, I can ask here. So:

    Question: What software engineering challenges are you currently facing (or predicting) that you would like someone to have a further look into and try to solve?



  • INB4 Discourse.



  • an efficient type-safe language



  • Ada?



  • a modern efficient type-safe language



  • Ada 2012?


  • area_pol

    (1) Efficient virtualization / process isolation

    Cloud hosting is very popular and will probably continue to grow fast. However, the OSes we use were not really designed for reliably running a lot of untrusted code for multiple users.
    The main solution is using VMs, but that harms performance.
    On the other hand, Docker allows sharing of the OS kernel between containers, but may not be safe against untrusted code.
    Maybe this problem requires a redesign of the OS kernel, with isolation and virtualization as the main focus.

    Or maybe something similar to the idea at the end of this lecture.


  • Discourse touched me in a no-no place


  • Discourse touched me in a no-no place

    @Mikael_Svahnberg said:

    What software engineering challenges are you currently facing (or predicting) that you would like someone to have a further look into and try to solve?

    Managing data remains challenging, especially getting good metadata out of the real world so that one can go back later and figure out how all the bits and pieces relate to each other. We can solve this by forcing users to fill out a form that provides everything each time they save a file to the data repository, but the users detest this and either put in bad values or stop saving to the repository in the first place, both of which make the situation worse. So… how can we do better than that? How can we teach systems to infer more and ask less?


  • I survived the hour long Uno hand

    Static analysis tooling for Node.js. You can do wonderful analysis on C code; I'd love to be able to do the same in Node. The best attempt at complexity measures is no longer maintained, so all we've got is linting.


  • sockdevs

    @Yamikuronue said:

    Static analysis tooling for ~~Node.js~~ Javascript.

    FTFY, we'd like it for browser Javascript as well, and the problem domains overlap to a very high degree.

    but yeah. create an awesome new static analysis tool, or start maintaining and improving an existing one, and you will be praised by many Node.js and web developers! (well, at least the subset of them that actually care about delivering quality products)
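    As a tiny illustration of the kind of metric such a tool would report, here is a toy cyclomatic-complexity estimator that just counts branch tokens in a function's source text. This is a sketch only: a real analyzer would walk a parsed AST rather than match text, and the function names here are made up.

```javascript
// Toy sketch (not a real tool): estimate cyclomatic complexity by
// counting branch tokens in a function's source text. A serious
// analyzer would walk a parsed AST instead of matching tokens.
function roughComplexity(src) {
  // Each decision point adds one path; start at 1 for straight-line code.
  const branchTokens = /\b(if|for|while|case|catch)\b|&&|\|\||\?/g;
  const matches = src.match(branchTokens);
  return 1 + (matches ? matches.length : 0);
}

const sample = `
function classify(n) {
  if (n < 0) return "negative";
  return n === 0 ? "zero" : "positive";
}`;
console.log(roughComplexity(sample)); // 3 (one "if", one ternary "?")
```

    Even this naive version shows why text matching isn't enough: it can't tell a keyword from a string literal, which is exactly where an AST-based tool earns its keep.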



  • @dkf said:

    @Mikael_Svahnberg said:
    What software engineering challenges are you currently facing (or predicting) that you would like someone to have a further look into and try to solve?

    Managing data remains challenging, especially in ways that make getting good metadata out of the real world so that one can go back later to figure out how all the bits and pieces relate to each other.

    This :arrow_up: . (a simple example being the meat behind the NULL thread)

    "Risk-aware" optimization and simulation.

    Behavior of distributed intelligent agents.

    :spittake: a sensor in my percolator to tell me the coffee I just brewed is bilge.


  • Discourse touched me in a no-no place

    @accalia said:

    (well, at least the subset of them that actually care about delivering quality products)

    So… about 0.5% of them then?


  • sockdevs

    sounds about right. that would put the relative percentage about on par with that of the developers in other languages.


  • Grade A Premium Asshole

    Is C# really just Java without the portability? Also, what part of that question annoyed Blakey the most? INB4 Blakeyrant about Mono and Linux, etc.


  • Discourse touched me in a no-no place

    @Polygeekery said:

    Is C# really just Java without the portability?

    Not quite, but they have a heck of a lot of similarities.


  • sockdevs

    @dkf said:

    Not quite, but they have a heck of a lot of similarities.

    enough that my first job outside academia was at a C# shop. i'd never touched C# at that point. i was able to do the "lend me a book and i'll know it by monday" routine and was cranking out code on day one.

    it wasn't good code because i was fresh from academia, but it was valid c# code.

    i've gotten much better these last 5 years, but i still wouldn't call myself good. I'm just a hell of a lot better than past me, and not as good as future me.


  • Discourse touched me in a no-no place

    @Polygeekery said:

    Is C# really just Java without the portability?

    No, it's also got a far better-designed standard library.



  • And it's also portable to all major operating systems.



  • @Magus said:

    And it's also portable to all ~~major~~ Windows operating systems.

    FTFY
    [spoiler]Mono sucks big time[/spoiler]



  • I DON'T LIKE IT BECAUSE IT SUCKS BUT YOU DON'T NEED TO KNOW HOW OR WHY OR WHAT'S WRONG IT JUST SUCKS

    No, I can do without your fix. Besides, there's also the CoreCLR or whatever, which works on everything and isn't mono.



  • @Magus said:

    Besides, there's also the CoreCLR or whatever, which works on everything and isn't mono.

    That will probably improve things a lot.
    Still, Mono is so far behind .NET it's not even funny.
    And just go and try running ASP.NET on Mono and see how slow it is and how many resources it takes to render a simple page.



  • I'll admit it, my main experience with Mono was playing around with OpenGL, which it works really well for. That's where most of the development effort goes, since Xamarin is focused on mobile.

    But in general, it works quite well on all operating systems. Unless you're implying that the only extent to which other operating systems matter is web servers?



  • Since we are now in a world where web apps are a big deal, I believe that part to be significant.

    Never played with Mono for mobile, so I believe you that it works very well.

    Will have to re-investigate when CoreCLR is more mature.

    But... probably will not be that great since (quoting @blakeyrat) "Open Source Is Crap" :wink:



  • @TimeBandit said:

    Since we are now in a world where web apps are a big deal, I believe that part to be significant.

    No one is saying it isn't. I'm saying that it isn't the only thing. Until now, they had to reverse-engineer it all, so some pain points were expected, but they still did a remarkably good job in general.


  • Discourse touched me in a no-no place

    @Magus said:

    But in general, it works quite well on all operating systems.

    Did they get around to doing a replacement for WCF yet, or are developers who write server systems still restricted to Windows (or encouraged to use a different ecosystem)?



  • @dkf said:

    a replacement for WCF

    but WCF is open source?

    (oops, apparently just some weird magical modern client variant. silly microsoft)



  • Thanks for the suggestions. I let it simmer overnight without interfering, in order not to influence you. Now I need to think about your suggestions and see how they hook into my areas of interest and previous research.

    @Yamikuronue, @accalia: Revealing my ignorance of JavaScript here, but is it the linting tools themselves that are lacking, or is it research on the "best practices" for Javascript/Node programming? I had a quick look, and it seems that a lot of the patterns that are flagged are "inherited" from C/C++. That can't be right, can it?

    @ijij: Any particular aspect of the behaviour of distributed intelligent agents?

    @ijij: Would you also mind elaborating on the "risk-aware" optimisation and simulation?

    @dkf @ijij: The "managing data" is indeed a challenge. I will have to mull this one over a bit first before I come back with more questions.


  • Discourse touched me in a no-no place

    @NTAuthority said:

    but WCF is open source?

    I thought that was only some parts of it. As I understand it (though my understanding is definitely dated) other critical bits are not.


  • Discourse touched me in a no-no place

    @Mikael_Svahnberg said:

    I will have to mull this one over a bit first before I come back with more questions.

    Take your time. Take all the time you need. It's the area where I'm working, and I know we're nowhere near a great solution yet.


  • I survived the hour long Uno hand

    @Mikael_Svahnberg said:

    is it the linting tools itself that are lacking, or is it research on the "best practices" for Javascript/node programming?

    Yes.

    The Node community likes to pretend that they've invented programming for the first time and no previous lessons apply. It would be great if someone who understands the lessons learned from C would apply some of that knowledge to Javascript to provide static analysis tools that don't suck.


  • Discourse touched me in a no-no place

    @Yamikuronue said:

    It would be great if someone who understands the lessons learned from C would apply some of that knowledge to Javascript to provide static analysis tools that don't suck.

    It would be worth looking at other dynamic programming languages to see if they have static analysis tools that make sense. After all, even if Node's flavour of JS is fairly static, it's still a dynamic language in a way that C, C++, C# and Java really aren't.



  • Part of the philosophy of writing unit tests is that you're supposed to write the test before the code.

    The problem is you can't write the test until you have:

    1. The data objects required

    2. At least some vague awareness of what operations will need to be performed on those data objects

    And you don't have those two things (in 99.9% of cases) until you start writing the actual code. And by the time you've fully established your data model and which operations need to be performed on it, uh... you've written the code, so now test-first is out the window.

    It's a nasty chicken-and-egg problem that results either in tests having to be constantly rewritten (which is bad, because each rewrite could create a gap that didn't exist in the previous test version), or tests being skipped altogether except for trivial things that really don't need testing.

    For the types of problems they put in software textbooks, unit testing is fine. For problems like "implement a shared library that processes Federal healthcare spending account laws", even without considering the integration with the rest of the product, the test-first practice is complete garbage.

    So solve that or whatever.
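    For what it's worth, one partial way around the chicken-and-egg is to let the first test invent the interface, then grow the data objects just enough to fit. A minimal sketch; `SpendingAccount`, `contribute`, and the limit value are all made-up illustrations, not anyone's real domain model:

```javascript
// Sketch of "test-first" driving the data model: the assertions below
// were written first, against a hypothetical SpendingAccount type, and
// the type was then fleshed out just enough to satisfy them.
class SpendingAccount {
  constructor(annualLimit) {
    this.annualLimit = annualLimit;
    this.balance = 0;
  }
  contribute(amount) {
    if (this.balance + amount > this.annualLimit) {
      throw new Error("contribution exceeds annual limit");
    }
    this.balance += amount;
  }
}

// The "test": it pins down the two missing things the post mentions,
// the data object and one operation on it.
const acct = new SpendingAccount(2500);
acct.contribute(1000);
console.log(acct.balance); // 1000

let rejected = false;
try { acct.contribute(2000); } catch (e) { rejected = true; }
console.log(rejected); // true
```

    Of course, the hard part identified above remains: when the regulation changes, the test and the type change together, so the rewrites still happen; the sketch only localizes where they happen.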



  • @Yamikuronue said:

    The Node community likes to pretend that they've invented programming for the first time and no previous lessons apply.

    Ugh, that reminds me of the Ruby-on-Rails crap that was going on in the 2003-2004 timeframe. I have no idea whether that community eventually grew up or not; I did a couple of projects in it and left, I couldn't take it anymore.



  • @Mikael_Svahnberg said:

    @ijij: Any particular aspect of the behaviour of distributed intelligent agents?

    The tractable example that came to mind was that of traffic signals.

    Actually ;) back to step 0... inspiration: some of the best progress in walking robotics came not from cranking up processing and calculating what to do, but from using good algorithms for each of several legs. [citation needed]

    Back to signals... often signals can work better as four-way stops, but need to transition to signals as traffic increases, and transition to different behaviors as traffic grows more... but as traffic really increases, at some point the signals need to work intelligently/cooperatively as a system of signals.

    Is that behavior manageable and/or predictable?
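    The single-agent side of those mode transitions can be sketched as a tiny policy function; the thresholds here are invented purely for illustration:

```javascript
// Toy sketch of one signal agent's mode transitions as volume grows.
// Thresholds (vehicles per hour) are invented for illustration only.
function signalMode(vehiclesPerHour) {
  if (vehiclesPerHour < 200) return "four-way stop";
  if (vehiclesPerHour < 800) return "fixed-cycle signal";
  return "coordinated network"; // cooperate with neighboring signals
}

console.log(signalMode(50));   // "four-way stop"
console.log(signalMode(500));  // "fixed-cycle signal"
console.log(signalMode(1500)); // "coordinated network"
```

    The research question is what happens in that last mode, when many such agents have to agree; the per-intersection part is the easy bit.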

    @Mikael_Svahnberg said:

    @ijij: Would you also mind elaborating on the "risk-aware" optimisation and simulation?

    Two sides of this coin -

    1. it should be easier to visualize and include stochastic data in optimization models (LP as a start). See "Iowa is basically just a corn factory now" in the "Books" and "Meat is Delicious Murder" thread for an example... common-sense makes me suspicious of single-value solutions to optimizations

    2. (much) More difficult - what are the unlikely values that lead to the worst-case or boundary-case outcomes?

    Simulation is obviously already running off of RNGs and so is already "risk"-ified... but can the simulation software inherently devote the most work to the most information-rich parts of the solution?
    That's broad enough to keep us employed forever ;) A concrete idea in that vein - think about Monte-Carlo simulation with a better (more specific) distribution of the random variables
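    That "better (more specific) distribution of the random variables" idea is essentially importance sampling. A minimal sketch with an invented rare event (a standard-normal draw exceeding 3): sample from a shifted normal that hits the rare region often, and reweight each hit by the likelihood ratio.

```javascript
// Importance-sampling sketch: estimate P(Z > 3) for Z ~ N(0, 1) by
// drawing from N(3, 1), where the rare region is hit often, and
// reweighting each hit by the N(0,1)/N(3,1) density ratio.
function randNormal(mean) {
  // Box-Muller transform: one normal draw with the given mean.
  const u = 1 - Math.random(); // avoid log(0)
  const v = Math.random();
  return mean + Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function estimateTailProb(n, shift) {
  let sum = 0;
  for (let i = 0; i < n; i++) {
    const x = randNormal(shift); // sample where the action is
    if (x > 3) {
      // likelihood ratio: exp(-x^2/2) / exp(-(x-shift)^2/2)
      sum += Math.exp((-x * x + (x - shift) * (x - shift)) / 2);
    }
  }
  return sum / n;
}

// True value is about 0.00135; naive sampling from N(0, 1) would waste
// ~99.9% of its draws, while roughly half of these draws contribute.
console.log(estimateTailProb(100000, 3).toFixed(5));
```

    The second half of the wish, having the simulator find the information-rich region by itself rather than being told the shift, is the genuinely open part.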


    There is a current advertising campaign for Ford Trucks that ends with

    "...is that all you want? EVERYTHING?!...
    ...what color?"


  • Winner of the 2016 Presidential Election

    @ijij said:

    Ford

    @ijij said:

    ...what color?"

    Any, as long as it's black.


    @Mikael_Svahnberg said:

    Question: What software engineering challenges are you currently facing (or predicting) that you would like someone to have a further look into and try to solve?

    Well, I've been trying to factor these large integers efficiently...

    More seriously: what @dkf said, since the only other major one I know about now (for me) is probably solved research-wise, and just needs better/more-widespread implementation:

    • Relational db+language that easily handles things like graphs and intervals, and that
    • Does not require so much verbosity for common queries (e.g., the ability to have the db store and do the table connections automatically; declare a.b_id -> b.id when setting up, and not have to specify that in later queries), and that
    • Does not require so much guess-and-checking for query plan optimization.


  • Cool 2015?


  • Discourse touched me in a no-no place

    @ijij said:

    common-sense makes me suspicious of single-value solutions to optimizations

    It should particularly make you suspicious of the weighting of the factors.

    For anything complicated, you need to have a formula that combines the scoring for the different things, and that formula will have a bunch of constants in there to scale the aspects of goodness to a single measure. (It might also have a bunch of other things, depending on what you're doing.) Those constants are often completely arbitrary, and that can make a colossal difference to the “optimum” you find: someone who wants to slant the answer they get out can usually do it just fine by tweaking them.

    You can't get an answer out in the first place without constructing such a formula. But you should beware.
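    A toy illustration of how much those arbitrary constants matter: two made-up designs scored on two aspects, where nudging one weight flips which design "wins". All names and numbers here are invented.

```javascript
// Invented example: two candidate designs scored on two aspects.
// The weights are exactly the arbitrary scaling constants above.
const designs = [
  { name: "A", speed: 9, reliability: 3 },
  { name: "B", speed: 5, reliability: 8 },
];

function best(wSpeed, wReliability) {
  let top = null, topScore = -Infinity;
  for (const d of designs) {
    const score = wSpeed * d.speed + wReliability * d.reliability;
    if (score > topScore) { top = d.name; topScore = score; }
  }
  return top;
}

console.log(best(1, 1)); // "B" (12 vs 13)
console.log(best(2, 1)); // "A" (21 vs 18)
```

    Same data, same formula; only the scaling constant moved, and the "optimum" changed.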


  • BINNED

    1. I know 3 instructions are slower (on average) than 1 (branch prediction and such make this more complex), but I do not know which is more power efficient. I would like to have a -Op compiler flag to optimize for power.
    2. With the advent of large non-volatile memory, data gets closer to the code. Truly taking advantage of this may(?) need a new architecture and data structures; something that is not fully rooted in the von Neumann architecture and could resemble the neural pathways of the brain.

  • Discourse touched me in a no-no place

    @dse said:

    I know 3 instructions is slower (on average) than 1 (branch predictions and such make this more complex), but I do not know which one is more power efficient.

    Yes and no. The most efficient solution is the one that causes the least cache churn and which involves the smallest fraction of the transistors on the CPU chip. Naïve calculations of numbers of instructions miss the importance of caches and ignore the fact that different instructions have very different costs.



  • Does that mean something? I'm not following



  • I assume Ben is referring to the Cool Programming Language, except that AFAICT there is no actual 2015 version. I'm guessing that he was unaware that there actually is an Ada 2012, and thought that you were making a joke.


  • BINNED

    True, but I do not claim to know what is going to be the key, only that I would like a simpler -Op than what we currently have. Power optimization now is a combination of black arts, some OS-level knobs, and something similar to hand-optimization depending on the board. Perhaps compilers could do better knowing that the target is big.LITTLE, and that I do not need the OS scheduler for my Bluetooth Low Energy and ...



  • Cool 2015:



  • @dkf said:

    You can't get an answer out in the first place without constructing such a formula.

    There are... ways*... but since you usually need to explain yourself to someone... using weights is the "best" way. My preferred way to use the weights involves varying them to generate tradeoffs.

    *Many of those ways only hide the underlying weighting...


  • Discourse touched me in a no-no place

    @ijij said:

    There are... ways...

    Of course there are! But you still need to make a value judgement that aspect A is 3.7 times more important than aspect B but 4.2 times less important than aspect C. Or whatever. You can say that the ratios are 1:1:1 if you like, but that's making a very particular type of statement.

    Weighting isn't wrong. It's absolutely necessary. But it's a prime point to apply shenanigans.



  • I've forgotten the details, it must have something to do with Duals, but memory says there are subtle ways.

    In my line*, though, we need to explain our work to people who have no clue what we do, so weights are the only practical approach that can be explained/demonstrated...

    *ie at the decision-maker level²

    ² ... :wtf: :exclamation: somebody sells a solver for the :strawberry: π :interrobang:


  • Discourse touched me in a no-no place

    @ijij said:

    I've forgotten the details, it must have something to do with Duals, but memory says there are subtle ways.

    You can do all sorts of tricks (and have to: sometimes you need to start with log scales or work in the reciprocal domain) but you still need a way to convert things to a common “goodness” scale, to say that gaining 3 points of A is worth giving up 4 points of B (or not). If you don't, you end up making bizarre decisions, such as giving up one bar of gold to get 3 wooden nickels.



  • Distributed HTTP. The guys behind BitTorrent are trying to do this, and there are other efforts, but it would be great if you cracked that one.

    Image recognition is a big one too, but it seems everyone is doing this plus AI.



  • Clearly the most important thing to research at the moment is new forum software.

