.multicombine required for rbind & foreach reduce stage is generally quite slow #46

@locklin

The documentation indicates that you don't need .multicombine=T when using foreach with .combine=rbind.

This is incorrect; trying to return an array without .multicombine=T produces an absurdly slow result.


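library(doMC)  # also attaches foreach, which doMC depends on

registerDoMC(cores=8)

# Time how long foreach takes to gather n length-10 vectors into a matrix.
testFun <- function(multicomb, n=64000) {
    out <- foreach(com=1:n, .combine=rbind, .multicombine=multicomb) %dopar% {
        Sys.sleep(8/n)  # about 8 seconds of simulated work in total, spread over the workers
        if (com == n) {
            print(paste("preparing to return last value at", strftime(Sys.time(), format="%H:%M:%S")))
        }
        rnorm(10)
    }
    print(paste("finished gathering my ", n, "arrays at", strftime(Sys.time(), format="%H:%M:%S")))
    nrow(out)
}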

testFun(F)
[1] "preparing to return last value at 14:49:18"
[1] "finished gathering my  64000 arrays at 14:50:27"
[1] 64000

testFun(T)
[1] "preparing to return last value at 14:47:10"
[1] "finished gathering my  64000 arrays at 14:47:14"
[1] 64000


Personally I think the result is bad regardless of the .multicombine setting; 4 seconds to stick 64000 rows together is absurd, even on a Raspberry Pi. But it gets horrendously bad without .multicombine: the run above spends about 69 seconds (14:49:18 to 14:50:27) between starting the last task and finishing the gather. For a similar problem (prop trading stuff instead of Sys.sleep) I clock 7 minutes to cons the 64000 rows into a report with .multicombine=F, when the actual task only takes 3 minutes. With .multicombine=T the same task still takes 19 seconds to cons together the 64000 rows; acceptable for my uses but still nuts. It's a Threadripper, not a 6809.
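
The gap between the two settings looks like what you would expect if .multicombine=F reduces the results two at a time, so every rbind call copies the matrix built so far. That is an assumption about the cause, not something confirmed against the foreach source. A minimal base-R sketch (no foreach, no workers; `n` and the variable names are made up for illustration) that times only the combine step both ways:

```r
# Stand-alone sketch: compare pairwise rbind with a single rbind call.
# n is deliberately smaller than 64000 so the pairwise case finishes quickly.
n <- 4000
rows <- lapply(seq_len(n), function(i) rnorm(10))

# Pairwise: rbind(rbind(rbind(r1, r2), r3), ...) re-copies the growing matrix on every step.
t_pairwise <- system.time(pairwise <- Reduce(rbind, rows))[["elapsed"]]

# Single call: rbind(r1, r2, ..., rn) builds the result in one go.
t_single <- system.time(single <- do.call(rbind, rows))[["elapsed"]]

# Same values either way; only the row names differ, so drop them before comparing.
stopifnot(isTRUE(all.equal(unname(pairwise), unname(single))))
print(c(pairwise = t_pairwise, single = t_single))
```

If the pairwise time grows much faster than linearly as `n` increases while the single-call time stays small, that would point at the combine strategy rather than at the workers.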

FWIW the same thing happens when you ignore .combine and .multicombine and return the results as a list. Are you guys doing some giant garbage collection before you return? If so, that would make sense on fork-based multicore doodads.
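
One way to separate fork overhead from foreach's own gathering would be to push the same kind of workload through parallel::mclapply, which also forks, and time the gather and the bind separately. This is only a diagnostic sketch: `n` and `mc.cores = 2` are arbitrary choices for illustration, and it runs on Unix-alikes only.

```r
library(parallel)  # ships with R; mclapply forks, so this is Unix-only

n <- 2000  # hypothetical size, small enough to run in a moment

# Fork, run the tasks, and gather the results as a plain list.
t_gather <- system.time(
    rows <- mclapply(seq_len(n), function(i) rnorm(10), mc.cores = 2)
)[["elapsed"]]

# Bind the list into a matrix with one rbind call.
t_bind <- system.time(out <- do.call(rbind, rows))[["elapsed"]]

print(c(gather = t_gather, bind = t_bind))
```

If this baseline gathers quickly at the full 64000 rows, the slowdown is more likely in how foreach collects results than in the forking itself.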

version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          4
minor          4.1
year           2024
month          06
day            14
svn rev        86737
language       R
version.string R version 4.4.1 (2024-06-14)
nickname       Race for Your Life
