"Yair Altman" wrote in message <ocm19f$n6v$1@newscl01ah.mathworks.com>...
>
> Duplicating a long string multiple times is indeed wasteful in memory. Storing a cell array of unique strings and then storing just the numeric index into that cell array in another array next to the data would be much more efficient in memory, and you'd probably find that it would also be more efficient than the corresponding categorical array. But of course, this would mean investing more coding time and would result in less readable / maintainable / debuggable code. This goes back to my point that the real tradeoff here is maintainability and dev time/cost vs. performance and memory. There are no free lunches in this world...
>
> Yair Altman
> http://UndocumentedMatlab.com
>
Thanks, Yair Altman and dpb - your responses make perfect sense.
It sounds like the memory-efficiency benefits grow with array size, but there can be a performance hit: the methods called to process every element of a categorical array can cost more than the equivalent operations on fundamental data types.
I'm starting to think that Yair's suggestion of sticking with fundamental types and writing the code to do the bookkeeping is probably the most efficient way to work, albeit at the cost of more complex code. This is the point when I tend to start defining classes to keep the low-level processing encapsulated where other users won't have to see it.
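For what it's worth, the bookkeeping Yair describes is only a few lines in practice, since `unique` on a cellstr already returns the index array. A minimal sketch (variable names are mine, not from the thread):

```matlab
% Store each distinct string once, plus a cheap numeric index per record.
labels = {'alpha','beta','alpha','gamma','beta','alpha'};

% idx is a numeric array such that labels = uniqueLabels(idx).
[uniqueLabels, ~, idx] = unique(labels);

% Keep only uniqueLabels (small cellstr) and idx (e.g. cast to uint32)
% alongside the rest of the data, instead of the full cellstr.
idx = uint32(idx);

% Recover the full string for record k on demand:
k = 4;
labelK = uniqueLabels{idx(k)};   % 'gamma'
```

Wrapping `uniqueLabels` and `idx` in a class, as you suggest, keeps that indirection hidden from other users of the data.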
My biggest concern is that I'm getting tempted to start defining unique indices - or better, keys - for each record in a table. That keeps sounding like a database, which I don't really want to take on.