|
SimonHa
|
 |
« on: January 12, 2011, 12:44:52 PM » |
|
Here's a little lunchtime post for people to ponder over sometime... There have been several discussions in the past that have strayed into logging to a database, but the recent OWFS discussion ( http://www.navitron.org.uk/forum/index.php?topic=12604) made me think perhaps we should come up with a "best approach" based on our experiences for people new to all this. My feeling is that we should be taking a layered approach to building our HA systems so that any given layer is easily defined, relatively decoupled and testable. Assumptions/general requirements: * we're using a low power (energy & compute) 24/7 server * we're using Linux (or a *nix) * it has to be robust and relatively simple, preferably with little or no maintenance required * it should use open source, or at least free for non-commercial use, products * we've already chosen OWFS to provide 1-wire bus access On the latter point I have also used digitemp, and looked at logtemp, but I think owfs provides the best hardware support (i.e. not just temperature), has a good architecture with multiple APIs, and has pretty good documentation. Now, last week I think we all agreed that logging temperatures to a database was a good thing. The idea of this discussion is to agree how best to do that. I suggest we leave output/graphing for another day (though it may need consideration in passing as some approaches may make the data easier to extract)! So, my starter for 10 is: 1) Use SQLite This seems the natural choice for me as for the majority of the time this database is used by a single processes so doesn't warrant the overhead of a client/server database (like MySQL) 2) Have a cron job that logs data regularly Nice and simple. I'm not sure if the interval should be different for different sensors - you could do that with some thermometers at 1/5/15 minute intervals depending on location/use. Note: we tend to focus on temperature measurement here for obvious reasons, but whilst we're thinking we should also consider pressure, insolation and counters (for electricity/current) etc. Ideally I reckon we want one mechanism for data logging from all 1-wire devices. So, what do people think we need, and <CanOfWorms.open()> how do we do it? Simon
|
|
|
|
|
Logged
|
|
|
|
|
wyleu
Guest
|
 |
« Reply #1 on: January 12, 2011, 01:48:35 PM » |
|
Funnily enough I'm writing this as we speak/type. The core logging process should really do little more than that. It should concentrate as much as it can on getting those data values. The only other thing it should report are errors of some sort. It's much more useful to have a report of an error than a duff reading ( 85 degree readings from DS1820's) but in the one-wire world they do seem to just happen. So I would prefer to seperate errors like that which require some knowledge of the context of the sensor from actual bus errors and other such stuff. The next issue is which sensor do we read from? We can read the whole bus or read individual sensors and rely on the owfs cache to keep things consistent. But once again errors are a problem. Because it's all well and good expecting there to be a reading from a specific sensor, but what so you do if it's blatted out by a noise spike or similar? Do you consider that sensor dead or do you define conditions it has to meet to bereported? And then there is the issue of do you report all the sensors you can see or only a requested list? For what it's worth I poll the bus for all sensors every once in a while and then report against that list of sensors from there on. If one of those sensors doesn't read ( it's unplugged or similar) that is flagged as an error but simply specifies unknown till the next sensor check. Given we have a sensor list poll then we have the actual polling itself which requires times to be defined. Do we poll every sensor at the same rate ( possibly ) or do we allow specific sensors to have their individual polling times set ( more desirable). So we have a set of messages. Coming out | 1 | Error | timestamp | Error code | | 2 | Sensorval | timestamp | sensor identification | sensor value | | 3 | UnknownSensorVal | timestamp | sensor identification | Unknown | | 4 | Reader Starting | timestamp | | 5 | Reader Stopping | timestamp | | 6 | Sensor List | timestamp | Sensor List Values[/] |
So messages going in | 101 | Read SensorVal | timestamp | sensor identification | | 102 | Read All SensorVal | timestamp | | 103 | Reader Start | timestamp | | 104 | Reader Stop | timestamp | | 105 | Sensor List | timestamp |
Now once we 've got this message loop running we need something that can add messages to the system. If we get the structure right then we have a message system that can take messages from anything, not just one-wire, A TDC-4e for instance. A serial port connected to an inverter. That's nice so we'd probably add a client ID to the above messages to say where it's from. Coming out | 1 | Client_id | Error | timestamp | Error code | | 2 | Client_id | Sensorval | timestamp | sensor identification | sensor value | | 3 | Client_id | UnknownSensorVal | timestamp | sensor identification | Unknown | | 4 | Client_id | Reader Starting | timestamp | | 5 | Client_id | Reader Stopping | timestamp | | 6 | Client_id | Sensor List | timestamp | Sensor List Values | | 7 | Client_id | Heartbeat | timestamp |
So messages going in | 101 | Client_id | Read SensorVal | timestamp | sensor identification | | 102 | Client_id | Read All SensorVal | timestamp | | 103 | Client_id | Reader Start | timestamp | | 104 | Client_id | Reader Stop | timestamp | | 105 | Client_id | Sensor List | timestamp | | 106 | Client_id | Set Heartbeat | timestamp | Heartbeat time period | | 107 | Client_id | Set sensor Poll Heartbeat | timestamp | Sensor Poll Heartbeat time period |
If we have multiple clients then we need a way of adding them to the loop and removing them if that client closes down. More messages in fact... | 201 | Client_id | Request to Register New Client | timestamp | New_Client_id | | 202 | Client_id | Request to De Register Client | timestamp | Client_id |
with the answers: | 301 | Client_id | Registered New Client | timestamp | New_Client_id | | 302 | Client_id | DeRegistered Client | timestamp | Client_id |
So the database hook is just a client, that hooks into the system, as is the RRD system as is a LCD display. All nice and modular! The issues of Client ID are interesting ( because they need to be unique and be renderable as, in the case of the one-wire client, as one wire addresses. Likewise for other devices but, for the moment if we can add any other messages then we will start to see some structure before it's set in stone. This is obviously a bit more complicated than a while loop but stands a much better chance of actually doing something useful rather than stopping for no apparentreason when you try to read from a sensor that ain't there any more.
|
|
|
|
|
Logged
|
|
|
|
|
KLD
|
 |
« Reply #2 on: January 12, 2011, 05:49:59 PM » |
|
Wyleu
Is your model essentially a "owserver plus"? A central program (for want of another word) that collects the raw data, and makes them available to be stored in a database, displayed somewhere, taken as input for control functions, etc.
Klaus
|
|
|
|
|
Logged
|
|
|
|
|
SimonHa
|
 |
« Reply #3 on: January 12, 2011, 05:59:11 PM » |
|
Thanks wyleu - that's very interesting you mention proper error handling.
It's interesting that you think in terms of a stream of messages, whereas I would be trying to mold the data into a relational schema...
e.g. something like (biased towards 1-wire I'll admit):
SENSOR_TYPES type reading_units read_wait_time
SENSORS (foreign key to SENSOR_TYPES) id/serial number location (should be a key to another table I suppose) client/bus type polling interval start_date/time end_date/time calibration/offset? last(successful?)_reading_date/time (de-normalised bit for convenience)
READINGS (foreign key to SENSORS) reading_date/time value status (normal/various error codes)
Plus there's a bunch of extra data you'd need if you wanted to do things like conversions from m3 to kWh for gas, but that's probably the next level up.
Then I thought you'd have a daemon (or a job run every minute from cron) which would look through the SENSORS table, work out which ones needed reading, read them and insert the result into the READINGS table. By the way, having the effective date range on the SENSORS entity means that sensor locations could be changed without affecting the historical data (e.g. if one fails or is a bit dodgy, or you just decide to move it).
|
|
|
|
|
Logged
|
|
|
|
|
SimonHa
|
 |
« Reply #4 on: January 12, 2011, 06:06:56 PM » |
|
Wyleu
Is your model essentially a "owserver plus"? A central program (for want of another word) that collects the raw data, and makes them available to be stored in a database, displayed somewhere, taken as input for control functions, etc.
Klaus
Interesting thought - what we'd want is pretty generic really. I think digitemp has a built-in option to record all readings to a db, but then I just stumbled into a post (here again needless to say! https://www.navitron.org.uk/forum/index.php?topic=8673.0 and comment by Wookey about OWFS temploggerd). I do think we should cast our net a little more widely than just temp though.
|
|
|
|
|
Logged
|
|
|
|
|
wyleu
Guest
|
 |
« Reply #5 on: January 12, 2011, 07:05:20 PM » |
|
Wyleu
Is your model essentially a "owserver plus"? A central program (for want of another word) that collects the raw data, and makes them available to be stored in a database, displayed somewhere, taken as input for control functions, etc.
Klaus
yes. Owserver does a lot of this stuff but I think messages ( An API if you will) is really the way to go. Adding and removing clients for different functions makes for a much more robust system. Simple CSV files for graphing in Excel, complex database structures for analysis, RRD type storage,One Wire LCD displays or even a client to track the international scrap copper price to tell you how much your system is worth in scrap value? The database schema gets complicated with the units and sensor types. Devices like the Half Bee's are starting to aggregate various readings into one device, so the layers of sensor type become involved. Personally I'd like to see organisations distribute XML or JSON based descriptions of devices so that a particular solar controller can just be loaded into the system and just appearswithout lots of tedious configaration. I've done a bit of this for some of my bit's and pices and you don't half spend a lot of time describing sensors context. Such a mechansim could link to technical specs and pictures, and for the first time we might have somewhere to file all those pictures people take of the installs as they do them. There is another aspect of this. Sensors get 'bent' what is actually a half bee pretending to be a DS1820 is in fact a sensor for on/off. At the higher level this should be a boolean, if we realistically want point and click interfaces that are easy to use and don't need massive configuration, which no one would actually do. Units are very important because they allow you to construct meaningful relationships that actually mean something, rather than an exceptionally brittle system that really only are relevant in one situation under one configuration at one time. The difficulty here is that the Units will have differing structures, which means it's a little more complicated than just a foreign key; it's content types. The sme is true of sensor types. These will need categories and the definitions become involved. It's not too difficult to deal with a system that doesn't work. You stick it in a cardboard box and loose it at the back of a cupboard, the difficulty is the successful project because then you find yourself desperatel retro fitting new features to add to what you have learnt from the information it's already gathered. Anyway I'll see what I can get running and get it packaged up so rather than lots of hand coding for people it's just loading up a synaptic package, which will keep the non terminal people happy. I'd look at doing RRD, a couple of databases, and owfs for starters but if anyone has any other suggestions then now is the time to speak up. I'm without an employer at the moment so I can really get my teeth into it.
|
|
|
|
|
Logged
|
|
|
|
Talisker
Newbie
Online
Posts: 16
|
 |
« Reply #6 on: January 12, 2011, 08:44:13 PM » |
|
Sounds a good way forwards (my next step now I have Owserver up and running). It would be most appreciated if 'the solution' could be packaged and documented for those of us who are relatively new to Linux etc.....
Jason
|
|
|
|
|
Logged
|
|
|
|
|
SimonHa
|
 |
« Reply #7 on: January 12, 2011, 08:49:16 PM » |
|
Owserver does a lot of this stuff but I think messages ( An API if you will) is really the way to go. Adding and removing clients for different functions makes for a much more robust system. Simple CSV files for graphing in Excel, complex database structures for analysis, RRD type storage,One Wire LCD displays or even a client to track the international scrap copper price to tell you how much your system is worth in scrap value? ... Anyway I'll see what I can get running and get it packaged up so rather than lots of hand coding for people it's just loading up a synaptic package, which will keep the non terminal people happy. I'd look at doing RRD, a couple of databases, and owfs for starters but if anyone has any other suggestions then now is the time to speak up. I'm without an employer at the moment so I can really get my teeth into it.
Interesting. So your program would be a module that sits between owserver and other modules? Does it proxy the owserver network protocol ( http://owfs.org/index.php?page=owserver-protocol), and sniff the results? Or does one of the client modules initiate the reading? If so I'm thinking your program as a central intersection, it sounds like it has 3 types of client (arbitrary names): 1) data storage 2) data consumers 3) event triggers I assume at least one form of storage would be mandatory. It's interesting about those hybrid readings - I assume you mean Eric's PICs etc - as I hadn't thought about that. 1-wire must have a standard way of representing them though (otherwise you couldn't put them in the OWFS hierarchy for example)? If they need some kind of further processing shouldn't that be done in a higher layer? (admittedly that's another one between the data and the graphing etc). One think that's confusing me - why does a Half Bee binary I/O sensor have to look like a DS1820? Why can't it look like a DS2405 (e.g. the .PIO entries in the owfs filesystem)? Is this because it does multi-functions and there's no equivalent Maxim chip? One point I realised after my last post is that for my heating it would be very handy to have a history of the zone on/off statuses (in the .PIO fields) as I could plot them alongside the room temperature for gratuitous frivolity  (I saw a gorgeous RRD graph where they had the zone heat status as a transparent overlay to the temp curves and with a red to blue tint background... I've completely forgotten where now of course  ) A final point - I whole heartedly agree in some kind of standardised device description supplied by the device "manufacturer". I think most software just caters for DS18x20s (except for owfs of course) and any new device currently presumably needs extra code. What worries me slightly is that this could get a bit out of hand - if the database is a requirement for all use cases why don't we put it at the centre? Just my (novice) view...
|
|
|
|
|
Logged
|
|
|
|
|
KLD
|
 |
« Reply #8 on: January 12, 2011, 09:14:08 PM » |
|
Simon (A) "One think that's confusing me - why does a Half Bee binary I/O sensor have to look like a DS1820?"(B) "I think most software just caters for DS18x20s "Could (B) be the answer to (A)? I believe Eric does most of the HalfBee development under Windows and LogTemp, so that would fit in as well. "if the database is a requirement for all use cases why don't we put it at the centre?"I could envisage a system without a database, just measure and put the current data on a display. Or control your heating system with it. Wyleu, sorry to hear you are out of employment. Good for us, though  The missing part in your list is probably the visualisation of the data. Once it's all in a database, we want to play with it /show it around / make pretty pictures. Klaus
|
|
|
|
|
Logged
|
|
|
|
|
SimonHa
|
 |
« Reply #9 on: January 12, 2011, 09:54:46 PM » |
|
Could (B) be the answer to (A)?
Ha ha - good point - must have been a rambling post! I think I was assuming that OWFS was the way to go though - it seems they've done a lot of the hard work already. I could envisage a system without a database, just measure and put the current data on a display. Or control your heating system with it.
Errm, but the idea (well, title) was to log 1-wire readings to a database for short and long term analysis purposes. Otherwise wouldn't we just use Munin/RRD as your example in http://www.navitron.org.uk/forum/index.php/topic,12604.msg140346.html#msg140346? Simple control of heating is fairly trivial (i.e. if time is x and temp < y then switch on relay z), particularly if you're not embarrassed by shell script. I'm not sure the logic for more sophisticated heating control (i.e. setpoint = y at time x so switch on relay t hours earlier depending on outside temp and historic heat up time for zone) is really in scope for this layer (hence my discussion of misterhouse elsewhere). Plus there might only be one person on this forum actually interested in it 
|
|
|
|
|
Logged
|
|
|
|
|
KLD
|
 |
« Reply #10 on: January 12, 2011, 10:03:09 PM » |
|
Errm, but the idea (well, title) was Oups, did I go off topic? Greenbeast I think has developed a system (Arduino based?) that just collects and displays, just a few numbers, no graphs / historical data. If this new endeavour here could offer a click-and-install solution for something that "simple", it might be interesting for those who just want to know whether there's enough hot water in the tank for a bath. Klaus
|
|
|
|
|
Logged
|
|
|
|
|
wookey
|
 |
« Reply #11 on: January 12, 2011, 10:45:40 PM » |
|
So, my starter for 10 is: 1) Use SQLite This seems the natural choice for me as for the majority of the time this database is used by a single processes
This assumption is wrong. There will be at least two processes: One reading data and putting it in the database. Another reading the database in order to do system control. And probably another reading it for graphing. SQlite may manage this fine - I'm not sure of the details. You _could_ use owfs as the 'neutral point' between logging, control and graphing (one of its purpose's is to manage contention to a 1-wire bus from multiple clients), but I think that it makes more sense to read the data just once, then record it. Other processes can act on changes or graph it, or whatever else is desired. I am currently running one process (munin) that graphs temps and another ('solar') which does the solar control, but they read temps entirely independently which seems inefficient to me (as temp reading is really slow in comparison to the other aspects). There are issues with delays in response time, but I think that can be managed. 2) Have a cron job that logs data regularly Nice and simple. I'm not sure if the interval should be different for different sensors - you could do that with some thermometers at 1/5/15 minute intervals depending on location/use.
You can't assume all sensors are sampled and recorded at same rate. You want a PHE-monitoring sensor(or any sensor used as a control input) to be monitored at a high rate. A room sensor or outdoor sensor can be monitored at a much lower rate. There is no point logging more freqently than new data is acquired. You could do everything at the rate of the most-freqent sensor but that's a lot of redundant data. I'd prefer a smarter design (I don't think it's too hard by any means). Note: we tend to focus on temperature measurement here for obvious reasons, but whilst we're thinking we should also consider pressure, insolation and counters (for electricity/current) etc. Ideally I reckon we want one mechanism for data logging from all 1-wire devices.
We have that. I don;t believe anything special is needed to cope with this. Stored data does need to be 'typed' the way munin does, so we know if it it a reading, or a cumulative counter, or an on/off state - check the munin docs for other types - those are the only three I've used.
|
|
|
|
|
Logged
|
Wookey
|
|
|
|
MN
|
 |
« Reply #12 on: January 13, 2011, 10:12:54 AM » |
|
Just my pennies worth on the database
I started using access MDB as a DB but have since moved to SQL Server (I already had a server running)
BUT
I split into multiple databases by month so as I write the data I check that it is the right month, if not I create a new database. When I read data it reads from multiple databases (if it has to) and returns as one dataset. Obviously the reading is in a layer so I can call GetData(Device,Jan01,Dec02) and it would read through all required databases. Most of the queries I run are show graphs for yesterday or last week – so a lot of the time only one DB is hit
This keeps the ‘live’ database relatively small and fast. It also means when the data is corrupted and you have forgotten to back up your data you only loose the current month!
Just an idea
MN
|
|
|
|
|
Logged
|
|
|
|
|
EccentricAnomaly
Guest
|
 |
« Reply #13 on: January 13, 2011, 11:10:33 AM » |
|
Personally, I'd avoid relational databases of any sort. I think they tend to be used, rather thoughtlessly, when often a different setup would be far more suitable. There is, thankfully, a bit of a movement away from relational DBs around the edges of the computer industry these days ¹ - going under the banner of NoSQL. Something which might be worth a look is CouchDB ( Wikipedia). It's a non-relational database designed to store a collection of "documents" which are typically small JSON fragments. It's set up so appending is a very efficient operation and uses a map/reduce architecture for query evaluation (queries a written in JavaScript) which allows partial query results (e.g., running totals on already entered data) to be cached. A neat feature is that the JSON fragments don't have to all have the same structure in order to run useful queries across them. I've only had a brief play at the "hello world" level but was sufficiently impressed that I plan to have another look when next have a suitable application. ¹ If you think Google is "round the edges".
|
|
|
|
|
Logged
|
|
|
|
|
wyleu
Guest
|
 |
« Reply #14 on: January 13, 2011, 11:20:31 AM » |
|
Did You transfer the old data from Access to the new system? I don't think I would... There's a million sketches. Personally I like to keep everything, for ever, but don't manage it anything like properly. It's really no different to the office job bookings system or on-line revenue forms for that matter. At the end of the day you want to ask questions of something. How do you phrase the question and off what do you ask it? owserver is great but it only takes you so far. Something has to abstract over the top of it. There is a small database at the core, You want to maintain a little state after all. It records a list of very basic sensor objects it has seen. Importantly it records Location stuff, sensor stuff, client stuff, first and last reads and thats about it ( apart from the no doubt obvious one that some-else will add... ) . It shouldn't contain any other measurements. You can't get much lighter than that On start up it sends messages to the owclient, or any clients it knows about. asking for all the sensors they know about . Once it knows what it's got it tells everybody about it. Once at the start and then when asked by the appropriate message. LOCATION lat long altitude precision Postgres has a geo database built in. I've never used it. Probably a bit bulky and unless we have a GPC client running we can't really be sure quite where we are. So dumb at the moment. CLIENT type SOURCE, SINK, BI, datatype The types described by Wookey end up here. first connect last connect SENSOR_TYPES type reading_units read_wait_time SENSORS (foreign key to CLIENT) (foreign key to SENSOR_TYPES) id/serial number (foreign key to LOCATION) client/bus type polling interval start_date/time end_date/time calibration/offset? last(successful?)_reading_date/time (de-normalised bit for convenience) active READINGS (foreign key to SENSORS) reading_date/time value status (normal/various error codes) Should only really contain the first & last reading but if we do it this way rather than place them in SENSOR then the basic data shape can be used by any database type thing. EccentricAnomoly makes a great point about the non relational stuff. Big Table et al, and the cloud... That could well be a great way to go. Not having to bother about a relational structure is something to think about.
|
|
|
|
« Last Edit: January 13, 2011, 11:31:40 AM by wyleu »
|
Logged
|
|
|
|
|