On 4/11/17 2:49 PM, Bruce Elliott wrote:
> This works well for memory usage - I'm now using about 10% of the memory
> previously used with the non-categorical data - but it's taking far
> longer to process the data in the table. When I run the profiler I'm seeing
> a large amount of time being spent in the function cell.ismember and its
> child function cell.ismember>cellismemberR2012a.
>
> I'm not sure exactly where these functions are being called (I don't
> call them directly), but it appears to be related to comparison
> functions for some of the categorical data.
Bruce, it's hard to say what's going on without more specifics. Often,
performance questions about tables come up because the code is
written with scalar loops that access one value or one row of the table
at a time. Sometimes that's unavoidable, but often it's possible (and
even more readable) to vectorize operations -- that's what tables are
best at.
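To make that concrete, here is a rough sketch of the two styles. The table, its variable names (Color, Value) and the category names are all made up for illustration, since I don't know what your data looks like.

```matlab
% Hypothetical table: one categorical variable and one numeric variable.
n = 1000;
names = {'red'; 'green'; 'blue'};
T = table(categorical(names(randi(3, n, 1))), rand(n, 1), ...
    'VariableNames', {'Color', 'Value'});

% Scalar loop: one table access and one categorical comparison per row.
totalLoop = 0;
for i = 1:height(T)
    if T.Color(i) == 'red'
        totalLoop = totalLoop + T.Value(i);
    end
end

% Vectorized: one comparison over the whole variable, then logical indexing.
isRed = (T.Color == 'red');
totalVec = sum(T.Value(isRed));
```

Both versions compute the same total (up to floating-point rounding), but the second makes a single categorical comparison instead of one per row.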
Even if you can't do that, it's often possible to "hoist" the variables
out of the table for the time-critical code. If you put that code in a
separate function, the code becomes very natural-looking, nothing more
than calling a function with things like T.A, T.B, etc.
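A minimal sketch of the hoisting idea, assuming a loop that really does need the previous row and so is awkward to vectorize. The data and the run-length calculation are invented for illustration; A and B just stand in for your own table variables.

```matlab
% Hypothetical data; A and B are illustrative variable names.
n = 1000;
cats = {'lo'; 'mid'; 'hi'};
T = table(categorical(cats(randi(3, n, 1))), rand(n, 1), ...
    'VariableNames', {'A', 'B'});

% Hoist the variables out of the table once, before the loop.
A = T.A;   % categorical column vector
B = T.B;   % double column vector

% The loop now indexes plain arrays rather than the table.
runLength = zeros(n, 1);   % length of the current run of equal categories
runSum = zeros(n, 1);      % sum of B over that run
runLength(1) = 1;
runSum(1) = B(1);
for i = 2:n
    if A(i) == A(i-1)
        runLength(i) = runLength(i-1) + 1;
        runSum(i) = runSum(i-1) + B(i);
    else
        runLength(i) = 1;
        runSum(i) = B(i);
    end
end
```

Moving the loop into its own function and calling it with T.A and T.B gives the same effect without the explicit copies. If the categorical comparison itself still dominates, comparing the integer codes from double(A) instead might help, but that is a guess without seeing your code.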
You've used the profiler to identify categorical comparisons as a
bottleneck. That seems surprising, but maybe those are going on inside
a scalar loop.
Can't say for sure how to address this without knowing more. As dpb
suggests, if you can put together an example and post it to Answers (the
formatting is much better there) or contact support, we can try to get
to the bottom of it. Looking forward to it.