SolarNetwork

News » Datum database schema redesign

A big part of what SolarNetwork does is collect and store data from all sorts of devices and services. The type of data collected varies quite a bit, so SolarNetwork has a flexible design for storing it, which we call datum. A datum is a collection of properties tied to a specific timestamp and object and source. An object is either a SolarNode device (a node) or a physical location (such as a building address or GPS coordinates). A source is a user-assigned name.

For example, a node datum collected from a power meter might have properties like this:

Property Value
ts 2021-01-01 12:00:00+13
node_id 123
source_id /meter/1
current 9.733
voltage 239.5
watts 2331
wattHours 340983098
status OK

Old database design

Until now, SolarNetwork stored this data as a set of JSON objects, based on a simple classification of instantaneous vs accumulating vs status property types. Along with columns for the timestamp, object ID, and source ID of the datum, the database table had JSON columns for each set of properties, sort of like this:

Column Datatype Value
ts timestamp 2021-01-01 12:00:00+13
node_id bigint 123
source_id text /meter/1
jdata_i JSON {"current":9.733,"voltage":239.5,"watts":2331}
jdata_a JSON {"wattHours":340983098}
jdata_s JSON {"status":OK}

Additionally, users are allowed to store arbitrary metadata for object + source ID pairs, in the form of a JSON object. In the old database design a separate table holds this information, using the object ID and source ID pair as its primary key. A row in the database table for node source metadata might look like this:

Column Datatype Value
node_id bigint 123
source_id text /meter/1
jdata JSON {"description":"Power meter for house supply."}

Design problems

Although quite flexible (and having served SolarNetwork well for many years), the database design had some notable drawbacks:

New database design

To overcome these design problems, we’ve developed a new database design that preserves the general flexibility of how datum are stored but solves the design issues. We introduced the concept of a datum stream, with these changes:

A diagram makes this easier to take in:

Table redesign diagram

The new da_datm table has a primary key defined by a stream ID and timestamp. The datum property values are stored as numeric[] and text[] array columns.

The new da_datm_meta and da_loc_datm_meta tables associate stream IDs to node/location ID and source ID pairs. These tables hold the datum property names for each property classification in text[] array columns (the names_* columns). They also hold user-defined JSON metadata for each stream in the jdata column, formerly stored in the da_meta and da_loc_meta tables.

Using the same example datum as shown earlier, in the new database design there would be one row in the da_datm_meta table like this:

Column Datatype Value
stream_id UUID a4840264-2b5b-4673-83f7-66a404d6faf0
node_id bigint 123
source_id text /meter/1
names_i text[] [current,voltage,watts]
names_a text[] [wattHours]
names_s text[] [status]

Then there would be one row in the da_datm table like this:

Column Datatype Value
stream_id UUID a4840264-2b5b-4673-83f7-66a404d6faf0
ts timestamp 2021-01-01 12:00:00+13
data_i numeric[] [9.733,239.5,2331]
data_a numeric[] [340983098]
data_s text[] [OK]

Problems solved

The new design solves the first problem of renaming source IDs quite nicely. Now a source ID is just a bit of metadata for a datum stream: updating a single row in the da_datm_meta table is all it takes to update the source ID for an entire stream. In fact, object IDs are now essentially another bit of metadata for a datum stream, and could be easily updated if needed.

The new design solves the second problem of duplicate property names by turning the property names into metadata on a datum stream as well. As more and more datum are collected for a given stream, there will still be only one row in the da_datm_meta table. SolarNetwork is also flexible enough that property names can be added over time to a given stream, and NULL (empty) values are allowed (and stored efficiently) if a datum doesn’t always contain values for all properties.