
efficiency using categorical data

I've been experimenting with converting some older code to use categorical data to reduce memory use, but it seems to be causing unexpected slow-down in processing time. I'm wondering if this is expected, or if I'm just doing something wrong.

The older code reads a number of data records from several files - typically tens of files, each with thousands of records. The files are all of the same format, so the data from each file goes into a separate structure array, all of the structures having the same 20-30 fields. The structure arrays are stored in the first column of a cell array, with the corresponding filenames in the second column.

This all works well enough, but it's a little cumbersome having to index first into the cell array, and then into the selected structure array. It is much simpler to have a single large structure array with an additional field to identify the source file for each data record. Once I've done that, I've also converted the structure array to a table for convenience.
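A minimal sketch of that restructuring, using made-up field names (`id`, `status`, `source`) and filenames rather than the real data:

```matlab
% Hypothetical stand-in for the original layout: column 1 holds one
% structure array per file, column 2 holds the matching filename.
data = { struct('id', {1, 2}, 'status', {'ok', 'fail'}), 'fileA.dat';
         struct('id', {3},    'status', {'ok'}),         'fileB.dat' };

allRecords = [];
for k = 1:size(data, 1)
    recs = data{k, 1};
    [recs.source] = deal(data{k, 2});   % tag every record with its source file
    allRecords = [allRecords, recs];    %#ok<AGROW> fine for tens of files
end

% One row per record; the text fields become cell arrays of char vectors.
T = struct2table(allRecords(:));
```

After this, a record is selected with a single table index instead of a cell index followed by a structure index.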

Since we're talking about thousands to tens of thousands of records, many of which have values from a relatively short list of possibilities (e.g. the filenames), I decided to convert selected fields to categorical data types, to reduce the memory footprint.
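The conversion itself is roughly the following. This is only a sketch on synthetic data: the variable name `source` and the three filenames are placeholders, and the actual savings depend on how long and how repetitive the real text values are.

```matlab
% Synthetic column: 15000 records drawn from three hypothetical filenames.
names = repmat({'fileA.dat'; 'fileB.dat'; 'fileC.dat'}, 5000, 1);
T  = table(names, 'VariableNames', {'source'});

% Same table, with the text column stored as categorical instead.
Tc = T;
Tc.source = categorical(Tc.source);

% Compare the footprint of the two tables as reported by whos.
infoCell = whos('T');
infoCat  = whos('Tc');
fprintf('cellstr: %d bytes, categorical: %d bytes\n', ...
        infoCell.bytes, infoCat.bytes);
```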

This works well for memory usage - I'm now using about 10% of the memory previously used with the non-categorical data - but it's taking far longer to process the data in the table. When I run the profiler I'm seeing a large amount of time being spent in the function cell.ismember and its child function cell.ismember>cellismemberR2012a.

I'm not sure exactly where these functions are being called (I don't call them directly), but it appears to be related to comparison functions for some of the categorical data.
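One way to test that suspicion is to time the same comparison written two ways. The sketch below assumes (without having confirmed it) that comparing a categorical array against plain text makes MATLAB match the text against the category names on every call, whereas comparing against a value that is already categorical does not. The data and the name `target` are placeholders.

```matlab
% Synthetic categorical column with three hypothetical filenames.
names = repmat({'fileA.dat'; 'fileB.dat'; 'fileC.dat'}, 5000, 1);
c = categorical(names);
target = 'fileB.dat';

% Variant 1: compare directly against a character vector.
tText = timeit(@() c == target);

% Variant 2: convert the text to a categorical once, outside the hot
% path, then compare categorical against categorical.
targetCat = categorical({target}, categories(c));
tCat = timeit(@() c == targetCat);

fprintf('vs text: %.3g s, vs categorical: %.3g s\n', tText, tCat);
```

If the first variant is noticeably slower and is called inside a loop, the repeated text-to-category lookups would be a plausible source of the time the profiler attributes to cell.ismember.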

Does all this sound correct? Should I expect a slow-down in processing categorical data in exchange for less memory use?
