Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
Each timeseries is associated with a MeasurementSchema, which contains the `measurementId`, data type, encoding, compression type, and properties. However, because there are only a few data types, encodings, and compression types, and the properties are mostly null, MeasurementSchemas are highly redundant.
To be more specific, we currently have 7 data types, 9 encodings, and 8 compression types, so apart from `measurementId` (and assuming null properties) there are at most 7*9*8 = 504 distinct MeasurementSchemas. However, each timeseries creates its own MeasurementSchema instance, so with 1M timeseries at most about 0.05% (504 / 1,000,000) of the MeasurementSchemas are actually distinct.
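The bound above is simple arithmetic; the sketch below just reproduces it. The class and method names are hypothetical, and the constants are the counts quoted in this issue, which may change as new data types, encodings, or compressors are added. Because not every encoding applies to every data type, 504 is an upper bound rather than the exact number of valid combinations.

```java
public class SchemaCombinations {
    // Counts quoted in this issue (hypothetical constants, not read from TsFile).
    static final int DATA_TYPES = 7;
    static final int ENCODINGS = 9;
    static final int COMPRESSIONS = 8;

    /** Upper bound on distinct schemas once measurementId is excluded and properties are null. */
    static int maxDistinctSchemas() {
        return DATA_TYPES * ENCODINGS * COMPRESSIONS;
    }

    /** Upper bound, in percent, on how many per-series schema objects can be distinct. */
    static double distinctPercent(long timeseries) {
        return 100.0 * Math.min(maxDistinctSchemas(), timeseries) / timeseries;
    }

    public static void main(String[] args) {
        System.out.println(maxDistinctSchemas());                   // 504
        System.out.printf("%.4f%%%n", distinctPercent(1_000_000));  // 0.0504%
    }
}
```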
If we move `measurementId` out of MeasurementSchema, then MeasurementSchema instances can be shared among timeseries, which greatly reduces their number. In the example above, about 1M MeasurementSchema instances would be eliminated; assuming 20 bytes per instance, that saves about 20 MB of memory, and the savings grow almost linearly with the number of timeseries.
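A minimal sketch of the proposed sharing, written as a flyweight/interning pool. Everything here is hypothetical and not IoTDB's or TsFile's actual API: `SharedSchema` stands in for a MeasurementSchema without `measurementId`, the trimmed-down enums stand in for the real data type, encoding, and compression enums, and properties are omitted (assumed null). `measurementId` moves into the per-series map key, so many series can point at the same canonical schema object. Records require Java 16 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SchemaSharingSketch {
    // Hypothetical, trimmed-down enums standing in for the real TsFile enums.
    enum DataType { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, TEXT }
    enum Encoding { PLAIN, RLE, TS_2DIFF, GORILLA }
    enum Compression { UNCOMPRESSED, SNAPPY, LZ4, GZIP }

    /** Hypothetical schema without measurementId; a record gives value-based equals/hashCode. */
    record SharedSchema(DataType type, Encoding encoding, Compression compression) {}

    // Interning pool: one canonical SharedSchema per (type, encoding, compression) triple.
    private static final Map<SharedSchema, SharedSchema> POOL = new ConcurrentHashMap<>();

    /** Returns the canonical instance for this triple, creating it on first use. */
    static SharedSchema intern(DataType t, Encoding e, Compression c) {
        SharedSchema candidate = new SharedSchema(t, e, c);
        // putIfAbsent returns the already-pooled instance, or null if candidate was just stored.
        SharedSchema existing = POOL.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    /** Number of distinct schema objects currently held by the pool. */
    static int poolSize() {
        return POOL.size();
    }

    public static void main(String[] args) {
        // measurementId now lives only in the map key, not in the schema object.
        Map<String, SharedSchema> seriesSchemas = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            DataType type = (i % 2 == 0) ? DataType.DOUBLE : DataType.INT64;
            seriesSchemas.put("s" + i, intern(type, Encoding.GORILLA, Compression.SNAPPY));
        }
        System.out.println(seriesSchemas.size()); // 10000
        System.out.println(poolSize());           // 2
    }
}
```

With one schema object per timeseries, the number of objects grows with the number of series; with the pool it is bounded by the number of distinct triples (at most 504 by the count above), while each series keeps only a reference.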