On 04/12/2017 1:21 PM, Bruce Elliott wrote:
> "Yair Altman" wrote in message <ocli7j$k5c$1@newscl01ah.mathworks.com>...
>> Categorical data and tables in general use more memory and are less
>> performant than the corresponding implementation using simple arrays.
...
> Why would categorical data use more memory? In my example, I could
> repeat the char array containing the full path and filename for every
> row of data that came from that file, but I don't see how that could
> take less memory than simply storing an index to a list of filenames.
> In my case, converting five columns to categoricals reduced the size of
> the table from 95.4 MB to 8.8 MB.
What Yair is alluding to is that a table, cell array, structure, or
other higher-level abstraction uses memory beyond the data itself in
order to provide the amenities of that abstract data type.
Here's a simple demonstration:
>> ans=int8(1:5);
>> whos ans
  Name      Size      Bytes  Class          Attributes
  ans       1x5           5  int8
>> ans=int32(1:5);
>> whos ans
  Name      Size      Bytes  Class          Attributes
  ans       1x5          20  int32
>> ans=1:5;
>> whos ans
  Name      Size      Bytes  Class          Attributes
  ans       1x5          40  double
>> ans=categorical(1:5);
>> whos ans
  Name      Size      Bytes  Class          Attributes
  ans       1x5         378  categorical
>>
So, while turning a longer string representation or other
memory-intensive data into a categorical can save memory, it's not the
same internally as turning it into just an integer array. The
categorical took 378 bytes against 40 for the double, an extra 338
bytes for an array that internally only needs a short integer per
element.
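Bruce's case is the other side of the same coin. As a rough sketch (the file name and row count below are made up for illustration, not taken from his data), one can compare a cell array holding the same long char vector in every row against the categorical built from it:

```matlab
% Hypothetical data: one long file name repeated for every row.
fname = 'C:\data\project\run_0001\measurements.csv';  % made-up path
nRows = 10000;                                        % made-up row count

c = repmat({fname}, nRows, 1);   % cell array of char, one entry per row
k = categorical(c);              % same values stored as a categorical

% whos returns a struct whose bytes field is the reported size.
sC = whos('c');
sK = whos('k');
fprintf('cell array of char: %d bytes\n', sC.bytes);
fprintf('categorical:        %d bytes\n', sK.bytes);
```

With long repeated strings I'd expect the categorical to come out far smaller, which fits the 95.4 MB to 8.8 MB reduction reported above; the exact byte counts may differ between releases.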
Experimenting with N = 1 to 5 elements one finds that
mCat=64*N+58
bytes for integer values with no additional properties stored, which
matches the 378 bytes above for N=5. So, for the convenience there is a
price. If the values you are replacing take less memory than that, then
memory usage will actually increase, not decrease.
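To repeat that experiment, something like the following should do; it is only a sketch, and the byte counts whos reports may differ between releases, so the formula column is there for comparison rather than as a guaranteed match. Note that categorical(1:N) creates N distinct categories, so the 64 bytes may belong to each category rather than to each element; the second loop (my own addition, not part of the measurements above) holds the categories fixed at five and grows only the number of elements.

```matlab
% Size of categorical(1:N): N elements and N distinct categories.
for N = 1:5
    x = categorical(1:N);
    s = whos('x');
    fprintf('N=%d  bytes=%d  64*N+58=%d\n', N, s.bytes, 64*N + 58);
end

% Same five categories every time, with more and more elements.
for reps = [1 10 100 1000]
    y = categorical(repmat(1:5, 1, reps));   % 5*reps elements
    s = whos('y');
    fprintf('elements=%d  bytes=%d\n', numel(y), s.bytes);
end
```

If the second loop grows much more slowly than 64 bytes per element, the fixed cost is mostly in the category list, and a categorical with few categories and many rows is cheap per row.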
--