async/await with IEnumerable in C#
-
Let's say I have a function like this:
```csharp
public async Task<IEnumerable<Post>> GetPageFromLikesThreadAsync(int pageNumber)
{
    // Implementation omitted
}
```
And I also have a function like this:
```csharp
public async Task<IEnumerable<Post>> GetAllPostsFromLikesThreadAsync()
{
    var allPosts = new List<Post>();
    for (int page = 0; ; page++)
    {
        var posts = await GetPageFromLikesThreadAsync(page);
        if (!posts.Any())
        {
            return allPosts;
        }
        allPosts.AddRange(posts);
    }
}
```
The second function has a problem: If the Likes thread has 8 billion pages of posts, all 8 billion of them will be loaded before it returns a value.
I can instead write the function without tasks:
```csharp
public IEnumerable<Post> GetAllPostsFromLikesThread()
{
    for (int page = 0; ; page++)
    {
        var posts = GetPageFromLikesThreadAsync(page).GetAwaiter().GetResult();
        if (!posts.Any())
        {
            yield break;
        }
        foreach (var post in posts)
        {
            yield return post;
        }
    }
}
```
But this means that if I call the third function from an async method, whatever iterates through the returned IEnumerable holds a thread-pool thread until it finishes, since it isn't using await anywhere. If there are enough concurrent reads of the Likes thread, the thread pool might never get around to running the HTTP requests, and the program deadlocks.
Here's a Go implementation of what I'm talking about:
In the Go implementation, none of the goroutines are taking up thread pool space while they're waiting.
I've heard suggestions like "use `IEnumerable<Task<T>>`", but that won't work: by the time the program knows whether there's another `Task<T>` in the `IEnumerable`, that `Task<T>` has already resolved. It's basically just a more complicated implementation of the third function.
-
@ben_lubar When you create the Task, mark it as LongRunning and it will get scheduled on a background thread instead of the thread pool.
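A minimal sketch of what this suggestion could look like, assuming .NET 6 or later with top-level statements. `GetPageFromLikesThreadAsync` here is a hypothetical stand-in that returns three string "posts" for pages 0 to 2 and nothing afterwards; the synchronous iterator is the question's third function. `TaskCreationOptions.LongRunning` is only a hint, but the default scheduler typically honours it by starting a dedicated thread, so the blocking loop stops occupying a pool worker. The page fetches themselves still complete on the thread pool.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical page fetcher: pages 0..2 hold three posts each, later pages are empty.
async Task<IEnumerable<string>> GetPageFromLikesThreadAsync(int pageNumber)
{
    await Task.Delay(10); // simulate network latency
    return pageNumber < 3
        ? Enumerable.Range(0, 3).Select(i => $"post {pageNumber * 3 + i}").ToList()
        : new List<string>();
}

// The question's synchronous iterator: each page fetch blocks the calling thread.
IEnumerable<string> GetAllPostsFromLikesThread()
{
    for (int page = 0; ; page++)
    {
        var posts = GetPageFromLikesThreadAsync(page).GetAwaiter().GetResult();
        if (!posts.Any()) yield break;
        foreach (var post in posts) yield return post;
    }
}

// LongRunning hints that the work should get its own thread rather than a
// thread-pool worker, so the blocking enumeration runs there instead.
var count = await Task.Factory.StartNew(
    () => GetAllPostsFromLikesThread().Count(),
    CancellationToken.None,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);

Console.WriteLine(count);
```

This answers "which Task gets marked": the one that wraps the whole blocking enumeration, not the individual page fetches.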
-
@masonwheeler said in async/await with IEnumerable in C#:
@ben_lubar When you create the Task, mark it as LongRunning and it will get scheduled on a background thread instead of the thread pool.
It's not doing a lot of work, though. It's just doing a tiny bit of work, then waiting for another `Task` to resolve, then doing a tiny bit more work. Isn't this what the entire async/await system is intended to handle?
-
@ben_lubar If it's working with 8 billion elements, that's doing a lot of work. :P
-
@masonwheeler said in async/await with IEnumerable in C#:
@ben_lubar If it's working with 8 billion elements, that's doing a lot of work. :P
The point is that not all 8 billion elements are known at the same time. Also, which Task would be marked as LongRunning? The caller of the third function? That seems like it would be hard to keep track of.
-
@ben_lubar said in async/await with IEnumerable in C#:
Also, which Task would be marked as LongRunning?
The one that's holding up the threadpool.
-
You've stated what you don't like, but not why or what your goal actually is.
-
@Magus His goal is to come up with some way of making C# look bad in comparison to Go.
I've talked to him a lot off-forum and I'm 90% certain that is his goal.
-
You need to structure your code differently so that you yield a task and then process it from there.
https://ctigeek.net/using-asyncawait-with-ienumerable-and-yield-return/
```csharp
public void ProcessAccount(string accountNumber)
{
    foreach (var accountTask in GetTasksForAccountWithSubAccounts(accountNumber))
    {
        var account = accountTask.Result;
        // ... process the accounts....
    }
}

//notice, no async keyword...
public IEnumerable<Task<Account>> GetTasksForAccountWithSubAccounts(string accountNumber)
{
    //get the parent account from the repo....
    var parentAccountTask = accountRepository.GetAccount(accountNumber);
    yield return parentAccountTask;
    var parentAccount = parentAccountTask.Result;
    foreach (var childAccountNumber in parentAccount.ChildAccountNumbers)
    {
        //notice there is no await. we want to return the task, not the account.
        var childAccountTask = accountRepository.GetAccount(childAccountNumber);
        yield return childAccountTask;
    }
}
```
So basically he yields each task instead of each result.
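A variation on that idea, sketched under assumptions (a hypothetical string-returning `GetPageFromLikesThreadAsync`, .NET 6 or later with top-level statements). Instead of reading `.Result`, the consumer awaits each task. Because the iterator never looks inside a page, it can simply yield page tasks forever, and the consumer decides to stop at the first empty page. This is one way around the objection that the enumerable cannot know whether another task exists without waiting, at the cost of pushing the stop condition onto every caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical page fetcher: pages 0..2 hold three posts each, later pages are empty.
async Task<IReadOnlyList<string>> GetPageFromLikesThreadAsync(int pageNumber)
{
    await Task.Delay(10); // simulate network latency
    return pageNumber < 3
        ? Enumerable.Range(0, 3).Select(i => $"post {pageNumber * 3 + i}").ToList()
        : new List<string>();
}

// Lazily yields one task per page, forever. A task is only created when the
// consumer asks for the next element, i.e. after it has awaited the previous one.
IEnumerable<Task<IReadOnlyList<string>>> GetPageTasks()
{
    for (int page = 0; ; page++)
        yield return GetPageFromLikesThreadAsync(page);
}

// The consumer awaits each task instead of reading .Result, so no thread is
// blocked while a page is in flight, and it stops at the first empty page.
var processed = 0;
foreach (var pageTask in GetPageTasks())
{
    var posts = await pageTask;
    if (posts.Count == 0) break;
    foreach (var post in posts)
    {
        Console.WriteLine(post); // real code would handle each post here
        processed++;
    }
}
```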
-
@blakeyrat said in async/await with IEnumerable in C#:
@Magus His goal is to come up with some way of making C# look bad in comparison to Go.
I've talked to him a lot off-forum and I'm 190% certain that is his goal.
ETTFTFM
-
I'm also not sure why it's suddenly a huge problem that `await` only returns upon completion of the task. Isn't that the whole point of async/await in the first place?
-
@Rhywden I am not sure why he wants to do this either with the code he has shown us.
-
I write C# code for a living. I've run into this exact problem several times. In fact, Microsoft has run into this problem and was also unable to solve it with the language's syntax for loops: https://github.com/Azure/azure-sdk-for-net/blob/3f736b5af3851ab99bbaa98483ae537de0d48cfb/src/SDKs/Batch/DataPlane/Azure.Batch/PagedEnumerableExtensions.cs
-
@blakeyrat said in async/await with IEnumerable in C#:
@Magus His goal is to come up with some way of making C# look bad in comparison to Go.
I've talked to him a lot off-forum and I'm 90% certain that is his goal.
...said Blakey in Coding Help section.
-
@Gąska said in async/await with IEnumerable in C#:
...said Blakey in Coding Help section.
I already told him exactly how to do this off-forum. The design I gave him is functionally identical to the Go code sample he provided me.
The only reason he's posting it here is because he's trying to make some point about how great Go is. He's not genuinely asking for help. We know this because he hasn't yet stated, either here or in the channel I was using to talk to him, what the holy fuck he's actually trying to accomplish.
So at BEST this is what StackOverflow calls an XY problem, with a very stubborn asker. At worst it's just an attempt to score internetpointzzz for Go.
Since Ben knows what an XY problem is, I can only conclude it's the latter.
-
@blakeyrat said in async/await with IEnumerable in C#:
I already told him exactly how to do this off-forum. The design I gave him is functionally identical to the Go code sample he provided me.
You told me to use `BlockingCollection<T>`, which does not use async/await and is exactly the same as my third function.
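For comparison, a minimal sketch of a `BlockingCollection<T>` pipeline (the fetcher and its data are hypothetical; .NET 6 or later with top-level statements assumed). The producer awaits each page, but the consuming `foreach` over `GetConsumingEnumerable()` blocks its thread whenever the buffer is empty, which is why, from the consumer's side, it behaves much like the third function.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical page fetcher: pages 0..2 hold three posts each, later pages are empty.
async Task<IReadOnlyList<string>> GetPageFromLikesThreadAsync(int pageNumber)
{
    await Task.Delay(10); // simulate network latency
    return pageNumber < 3
        ? Enumerable.Range(0, 3).Select(i => $"post {pageNumber * 3 + i}").ToList()
        : new List<string>();
}

// Bounded, so the producer cannot run arbitrarily far ahead of the consumer.
using var buffer = new BlockingCollection<string>(boundedCapacity: 5);

// Producer: page fetches are awaited, but Add blocks while the buffer is full.
var producer = Task.Run(async () =>
{
    try
    {
        for (int page = 0; ; page++)
        {
            var posts = await GetPageFromLikesThreadAsync(page);
            if (posts.Count == 0) break;
            foreach (var post in posts) buffer.Add(post);
        }
    }
    finally
    {
        buffer.CompleteAdding(); // lets the consumer's loop finish
    }
});

// Consumer: GetConsumingEnumerable blocks the calling thread whenever the
// buffer is empty, so this loop holds a thread for the whole stream.
var consumed = 0;
foreach (var item in buffer.GetConsumingEnumerable())
    consumed++;

await producer;
Console.WriteLine(consumed);
```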
-
-
For those wanting to play with this with a more complete basis, try:
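```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class Post
{
    // Public so callers can actually read them.
    public int Id { get; }
    public string Content { get; }

    public Post(int id, string content)
    {
        Id = id;
        Content = content;
    }
}

public class LikesThread // host class; the name is arbitrary
{
    // Simulated source: pages 0..1000 hold 10 posts each, later pages are empty.
    public IEnumerable<Post> GetPageFromLikesThread(int pageNumber)
    {
        var posts = new List<Post>();
        if (pageNumber <= 1000)
        {
            for (int i = 0; i < 10; i++)
            {
                var id = pageNumber * 10 + i;
                posts.Add(new Post(id, id.ToString()));
            }
        }
        Thread.Sleep(1000); // stand-in for a slow HTTP request
        return posts;
    }

    // Runs the blocking call on a thread-pool thread so callers can await it.
    public async Task<IEnumerable<Post>> GetPageFromLikesThreadAsync(int pageNumber)
    {
        var page = await Task.Run<IEnumerable<Post>>(() => GetPageFromLikesThread(pageNumber));
        return page;
    }
}
```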
-
@Dreikin yeh they have been making loads of improvements in this area.
-
@Dreikin so I guess the answer is "there's not currently a way to do it, but the people in charge of the language also want the feature I'm asking for, so it's quite probable that there will be a way to do it in the future".
Good enough for me.
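For readers arriving later: this feature did ship, as async streams in C# 8.0 (.NET Core 3.0). A minimal sketch, again with a hypothetical string-returning `GetPageFromLikesThreadAsync` and a page counter added only to show laziness: the method can both `await` and `yield return`, and the caller consumes it with `await foreach`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var pagesRequested = 0; // only here to show how many pages get fetched

// Hypothetical page fetcher: pages 0..2 hold three posts each, later pages are empty.
async Task<IReadOnlyList<string>> GetPageFromLikesThreadAsync(int pageNumber)
{
    pagesRequested++;
    await Task.Delay(10); // simulate network latency
    return pageNumber < 3
        ? Enumerable.Range(0, 3).Select(i => $"post {pageNumber * 3 + i}").ToList()
        : new List<string>();
}

// Async iterator: each post reaches the caller as soon as its page arrives,
// and no thread is blocked while a page is being fetched.
async IAsyncEnumerable<string> GetAllPostsFromLikesThreadAsync()
{
    for (int page = 0; ; page++)
    {
        var posts = await GetPageFromLikesThreadAsync(page);
        if (posts.Count == 0) yield break;
        foreach (var post in posts) yield return post;
    }
}

var seen = new List<string>();
await foreach (var item in GetAllPostsFromLikesThreadAsync())
{
    seen.Add(item);
    if (seen.Count == 4) break; // stop early: later pages are never requested
}

Console.WriteLine($"{string.Join(", ", seen)} ({pagesRequested} pages requested)");
// post 0, post 1, post 2, post 3 (2 pages requested)
```

Breaking out after four posts means only pages 0 and 1 are ever fetched, which is exactly the "don't load all 8 billion pages" behaviour the question asks for.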
-
@ben_lubar No, the answer is that there is a way of doing it, but you have to restructure your code.
-
You can make a stream of posts instead of loading everything into memory with the AsyncEnumerable library:
```csharp
using System.Collections.Async;

public IAsyncEnumerable<Post> GetAllPostsFromLikesThread() =>
    new AsyncEnumerable<Post>(async yield =>
    {
        for (var page = 0; ; page++)
        {
            var posts = await GetPageFromLikesThreadAsync(page);
            foreach (var post in posts)
                await yield.ReturnAsync(post);
            if (!posts.Any())
                break;
        }
    });
```
The GitHub page has examples of how to consume an `IAsyncEnumerable<T>`. You can install the library from its NuGet package.
-
@sergiis that looks promising, but because it's not integrated into the language, `yield.Break();` doesn't actually stop the execution of the current function. In Go, that could be implemented with `runtime.Goexit()`, but if you're using Go anyway, you could just return a `<-chan Post`.
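The closest C# analogue to returning a `<-chan Post` is probably `System.Threading.Channels` (built into .NET Core 3.0 and later). A rough sketch, with the same kind of hypothetical fetcher and .NET 6 top-level statements assumed: a background producer writes into a bounded channel, and the function returns only the `ChannelReader<T>`, which the caller drains with `ReadAllAsync()`. Both sides wait asynchronously, so neither holds a thread while idle.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical page fetcher: pages 0..2 hold three posts each, later pages are empty.
async Task<IReadOnlyList<string>> GetPageFromLikesThreadAsync(int pageNumber)
{
    await Task.Delay(10); // simulate network latency
    return pageNumber < 3
        ? Enumerable.Range(0, 3).Select(i => $"post {pageNumber * 3 + i}").ToList()
        : new List<string>();
}

// Rough counterpart of a Go function returning <-chan Post: start a producer
// and hand back only the reading end of the channel.
ChannelReader<string> StreamPostsFromLikesThread()
{
    var channel = Channel.CreateBounded<string>(capacity: 5);
    _ = Task.Run(async () =>
    {
        try
        {
            for (int page = 0; ; page++)
            {
                var posts = await GetPageFromLikesThreadAsync(page);
                if (posts.Count == 0) break;
                foreach (var post in posts)
                    await channel.Writer.WriteAsync(post); // waits asynchronously while full
            }
            channel.Writer.Complete(); // like closing a Go channel
        }
        catch (Exception ex)
        {
            channel.Writer.Complete(ex); // surfaces the failure to the reader
        }
    });
    return channel.Reader;
}

var received = 0;
await foreach (var item in StreamPostsFromLikesThread().ReadAllAsync())
    received++;

Console.WriteLine(received);
```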
-
@ben_lubar Yes, but then you'd have Go on you!
-
@masonwheeler I wonder how hard it would be to make an MSIL compiler backend for Go.
-
@ben_lubar Probably mostly simple, but with plenty of awkwardness in making the runtime right (and you'd have to avoid directly calling other languages' code due to the impedance mismatches with exception semantics).
-
@ben_lubar Massively difficult, because it would need to be able to adapt to the CLR type system at the very least, which is very different from Go's type system, just to do the bare minimum. To make it useful, it would also need to be able to import the CLR type system (i.e., to interoperate with non-Go CLR assemblies in order to use third-party libraries), which would require accepting concepts such as OOP and generics that the design of Go has actively rejected.
-
@dkf said in async/await with IEnumerable in C#:
@ben_lubar Probably mostly simple, but with plenty of awkwardness in making the runtime right (and you'd have to avoid directly calling other languages' code due to the impedance mismatches with exception semantics).
Not really - exceptions from foreign calls could be translated to either multi-value returns or panics. Panics that escape Go code and go to another language would probably be treated as exceptions anyway.
-
@ben_lubar said in async/await with IEnumerable in C#:
exceptions from foreign calls could be translated to either multi-value returns or panics
A multi-value return where the second value is the exception object (or `nil`) would work. You'd have to do it for everything (where the other side isn't directly participating in the Go shared hallucination) so it would be really annoying, but it would work. Also you'd need to bridge the type systems, but that's actually easier with appropriate value boxing.
-
(disclaimer: I have only browsed this discussion, not studied it)
It looks like you really want an async enumerator, not an async enumeration. The former releases each element as it becomes available; the latter releases only when it is known that there will be no more elements. Be very careful of examples on the web, as many of them are based on very old information.
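A small sketch of that distinction, assuming C# 8.0 or later with top-level statements: the event log shows that a `Task<IEnumerable<int>>` hands nothing to the caller until all three elements exist, while an `IAsyncEnumerable<int>` releases each element as soon as it is produced. The `Task.Delay` calls only stand in for real asynchronous work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var log = new List<string>();

// "Async enumeration": a Task of a finished collection. The caller gets
// nothing until every element has been produced.
async Task<IEnumerable<int>> ProduceAllAsync()
{
    var items = new List<int>();
    for (int i = 0; i < 3; i++)
    {
        await Task.Delay(10); // simulate waiting for each element
        log.Add($"produced {i}");
        items.Add(i);
    }
    return items;
}

// "Async enumerator": each element is released as soon as it exists.
async IAsyncEnumerable<int> ProduceEachAsync()
{
    for (int i = 0; i < 3; i++)
    {
        await Task.Delay(10);
        log.Add($"produced {i}");
        yield return i;
    }
}

foreach (var item in await ProduceAllAsync()) log.Add($"consumed {item}");
var enumerationLog = string.Join(" ", log);
log.Clear();

await foreach (var item in ProduceEachAsync()) log.Add($"consumed {item}");
var enumeratorLog = string.Join(" ", log);

Console.WriteLine(enumerationLog);
// produced 0 produced 1 produced 2 consumed 0 consumed 1 consumed 2
Console.WriteLine(enumeratorLog);
// produced 0 consumed 0 produced 1 consumed 1 produced 2 consumed 2
```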
The alternative (which I often prefer) is to implement an Observable Queue. This completely decouples the producer from the consumer...