This set of OSS projects is fantastic; congrats to the CPP-AMP team and their bosses for making this happen!
Let me be the first to file a request - nonlinear optimization routines would be a huge addition.
In many scientific dataset analyses there is an embarrassingly parallel set of calls to unconstrained or constrained optimization functions (Levenberg-Marquardt, genetic algorithms, etc.), where an objective function is evaluated iteratively until some criterion
(goodness of fit) is reached. This is hard to get right on a data-parallel device like a GPU, but the workload is inherently well suited to the architecture. Conditional evaluation (branching paths) and thread termination/batching are the biggest challenges when moving
from CPU-oriented codes, but I'm sure these can be dealt with, and I believe there are already some examples out there.
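To make the pattern concrete, here is a rough sketch in plain standard C++ (not C++ AMP, and not taken from any existing library) of the kind of batched fit I have in mind. Everything in it is an assumption for illustration: the `Problem` struct, the `cost` and `fit_batch` names, the one-parameter model y = exp(-k * t), and the simple Levenberg-style damping. Each problem iterates in lock step and an `active` flag stands in for thread termination; on a GPU the inner per-problem loop body would presumably become the kernel.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout: each problem is one decay curve y = exp(-k * t).
struct Problem {
    std::vector<double> t;
    std::vector<double> y;
};

// Sum of squared residuals of one problem for a candidate k.
double cost(const Problem& p, double k) {
    double s = 0.0;
    for (std::size_t j = 0; j < p.t.size(); ++j) {
        const double r = p.y[j] - std::exp(-k * p.t[j]);
        s += r * r;
    }
    return s;
}

// Fits all problems in lock step with a damped Gauss-Newton update.
// Returns the fitted k for every problem.
std::vector<double> fit_batch(const std::vector<Problem>& problems,
                              double k0, int max_iters = 100,
                              double tol = 1e-10) {
    const std::size_t n = problems.size();
    std::vector<double> k(n, k0);
    std::vector<double> lambda(n, 1e-3);  // per-problem damping
    std::vector<char> active(n, 1);       // 1 while a problem still iterates

    for (int it = 0; it < max_iters; ++it) {
        bool any_active = false;
        // On a data-parallel device, this loop body would be one thread.
        for (std::size_t i = 0; i < n; ++i) {
            if (!active[i]) continue;     // finished problems do no work
            const Problem& p = problems[i];
            double jtj = 0.0, jtr = 0.0;
            for (std::size_t j = 0; j < p.t.size(); ++j) {
                const double m = std::exp(-k[i] * p.t[j]);
                const double jac = -p.t[j] * m;   // d(model)/dk
                const double r = p.y[j] - m;      // residual
                jtj += jac * jac;
                jtr += jac * r;
            }
            const double delta = jtr / (jtj + lambda[i]);
            const double trial = k[i] + delta;
            if (cost(p, trial) < cost(p, k[i])) {
                k[i] = trial;          // accept the step, relax damping
                lambda[i] *= 0.1;
            } else {
                lambda[i] *= 10.0;     // reject the step, increase damping
            }
            if (std::fabs(delta) < tol) {
                active[i] = 0;         // converged: drop out of the batch
            } else {
                any_active = true;
            }
        }
        if (!any_active) break;        // every problem has terminated
    }
    return k;
}
```

Calling `fit_batch(problems, 1.0)` returns one fitted rate per problem. The branch on step acceptance and the early exit per problem are exactly the divergence and termination issues mentioned above; a real GPU version would need to decide how to compact or re-batch the problems that are still active.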
Sorry if this request is in the wrong place; if so, please suggest a new home and I'll move it over.