Temporary (failsafe) storage suggestions


  • :belt_onion:

    It came up the other day by accident in another thread, and since I'll have to implement this fairly soon...

    So, I have a service that captures some live data, does some parsing on it and stores it into a database. Note that, typically, this database will be on the same machine as the service, and as such it should be fairly safe to assume that it will be available (unless the whole thing keels over). There is also a possibility of client-specific features that store data into their own existing database which will most likely NOT be on the same machine, maybe not even on-site.

    So, as always, it's possible that Shit Happens™ and I can't store the parsed data right away. This data is going to be given to me once, and once only; there is no way to request a re-transmission. For that reason I want to add some kind of temporary storage where I can dump the data to be inserted into the target DB if for any reason it's not immediately available.

    Note that:

    • this data does not have to be accessible from any other system, so the speed of fetching it does not matter
    • the data does not have to be structured in the same way as long as it's complete and can be parsed again if needed
    • the data I'm getting is just plain text, with blocks formatted like Attribute1: Value1\r\n ... AttributeN: ValueN\r\n\r\n. No fancy XML and/or JSON
    • all the relevant data is also mapped to attributes of different classes at runtime. I currently don't save the raw text in any way once I'm done with it. In fact, large amounts of it are discarded because it's irrelevant. All objects holding this data are deleted as soon as the data is saved and they are no longer needed (they can and do change as the new data comes in until I can detect there will be no more changes, otherwise I wouldn't even bother with them)
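
    For illustration, a minimal sketch of splitting that block format into attribute/value pairs. It is in Python purely for brevity (the service discussed here is C++), and `parse_blocks` is a hypothetical name. It assumes the whole text is already in memory and that every line uses `": "` as the separator; a real socket reader would also have to buffer partial blocks.

```python
def parse_blocks(raw):
    """Split raw text into blocks and map each block's attributes to values."""
    blocks = []
    # A blank line (\r\n\r\n) terminates each block.
    for chunk in raw.split("\r\n\r\n"):
        fields = {}
        for line in chunk.split("\r\n"):
            # Split on the first ": " only, so values may contain colons.
            name, sep, value = line.partition(": ")
            if sep:
                fields[name] = value
        if fields:
            blocks.append(fields)
    return blocks


sample = "Attribute1: Value1\r\nAttribute2: a: b\r\n\r\nAttribute1: Value2\r\n\r\n"
events = parse_blocks(sample)
```

    Lines without the separator are silently skipped here, which is an assumption of the sketch rather than anything the data source guarantees.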

    I'm debating on how to handle this backup storage at the moment. Things that come to mind:

    • SQLite
    • Dumping objects to JSON or XML (data types of attributes are well known and do not change, so JSON is viable)
    • Dumping raw text of all data to a file and cleaning it up after storage is successful (worst idea IMHO: I would have to store the text regardless of DB connectivity, because I have to wait for the point at which I know no new data will come in)

    Leaning towards SQLite myself. Then again, maybe there's something easier that I'm just missing.



  • @Onyx said:

    I currently don't save the raw text in any way once I'm done with it. In fact, large amounts of it are discarded because it's irrelevant.

    Do you know immediately what to discard? If so, I'd probably save the relevant text into some kind of memory stream, and then when it's time to store the data and it fails, dump the stream into a file to check later.

    I'm assuming re-parsing the data won't be expensive?


  • :belt_onion:

    @Maciejasjmj said:

    Do you know immediately what to discard?

    Yes.

    @Maciejasjmj said:

    I'm assuming re-parsing the data won't be expensive?

    Not really, but it's a bit of a pain. The data is a disjointed mess that, you'd assume, should be something you can parse by a simple shell script. Yeah, not really. I have something like 5 or 6 classes whose only job is to hold a few pieces of relevant data, so they can actually be looked up in a somewhat sane way when needed.

    Also, there are some triggers for when specific bits of data come in that send realtime notifications over websocket connections. If I wanted to re-parse the data I'd have to add switches that neuter that, so obsolete notifications don't get pushed out again.


  • Discourse touched me in a no-no place

    @Onyx said:

    Leaning towards SQLite myself.

    That has the advantage of really having written it if it says it has written it, and SQLite's pretty careful about that aspect of things, far more so than anything you're likely to write. Since you only get one shot at the data, making sure it is durable is important. (Don't write to a network share, OK? The semantics of networked filesystems are… broken at best.) Once it's saved, you can parse at your leisure.

    I assume that the message rate is low enough that the time to sync to disk won't be a problem? (Using an SSD will help a lot if you've got problems there; they accelerate databases far more than most other apps.)


  • :belt_onion:

    @dkf said:

    Once it's saved, you can parse at your leisure.

    Again, some of the parsing is, unfortunately, tied into the live notification thing. I'll elaborate a bit now to avoid further confusion. I left this out of the OP to keep it short, but I realize now that it's too confusing without it.

    I'm getting events from Asterisk PBX. Example of an event definition:

    Event: Dial
    SubEvent: <value>
    Channel: <value>
    Destination: <value>
    CallerIDNum: <value>
    CallerIDName: <value>
    ConnectedLineNum: <value>
    ConnectedLineName: <value>
    UniqueID: <value>
    DestUniqueID: <value>
    Dialstring: <value>
    

    See those UniqueID and Channel bits? I have to use those to connect that event with a specific call. I can't rely on any kind of sequence really, since multiple calls can be made in parallel. There actually is a "sequence number" that's not documented here, but that fails under some circumstances (transfers, redirects, some fax wizardry I'm doing, things like that). Dial is actually very verbose; some events don't have nearly as much data in them. Since I have to trigger notifications on some events, and they don't contain enough data to reliably gather all the required info I have to send, I have to parse everything as it comes, so I can gather that data from objects that contain it.

    Also, my cutoff point is an event as well. When I receive a Hangup I can gather all the data I need, store it to the DB, and delete all objects that particular call spawned from memory.
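
    A deliberately simplified sketch of that lifecycle, in Python for brevity (the real service is C++). `CallTracker` and its `store` callback are hypothetical names, and keying on UniqueID alone is an assumption; as noted above, the real correlation also involves Channel and has to survive transfers and redirects.

```python
class CallTracker:
    """Collect events per call and hand the call over once Hangup arrives."""

    def __init__(self, store):
        self.store = store   # callable that persists one finished call
        self.calls = {}      # UniqueID -> accumulated call data

    def handle(self, event):
        uid = event.get("UniqueID")
        if uid is None:
            return  # this sketch ignores events it cannot tie to a call
        call = self.calls.setdefault(uid, {"UniqueID": uid, "events": []})
        call["events"].append(event)
        if event.get("Event") == "Hangup":
            # Cutoff point: persist the call, then drop it from memory.
            self.store(self.calls.pop(uid))


saved = []
tracker = CallTracker(saved.append)
tracker.handle({"Event": "Dial", "UniqueID": "1", "Channel": "SIP/100"})
tracker.handle({"Event": "Dial", "UniqueID": "2", "Channel": "SIP/200"})
tracker.handle({"Event": "Hangup", "UniqueID": "1"})
```

    After the three events, call 1 has been passed to `store` and removed, while call 2 is still held in memory because no Hangup has arrived for it.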

    During the parsing all the data is in-memory. Since the data is generally small enough, and I'm trying to be as memory-efficient as possible (the service that handles this is written in C++), current benchmarks show it will not cause any problems on that front.

    The idea was to save the parsed data somewhere if DB operations fail. I would like to avoid saving individual chunks of data, so I don't have to worry about the notification thing if recovery is required.
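
    One way that fallback could look, sketched in Python with the standard `sqlite3` module for brevity (the service itself is C++, where the SQLite C API would play the same role). `Spool`, `save`, the `pending` table and the JSON payload are all assumptions made for illustration, not the design actually used here.

```python
import json
import sqlite3


class Spool:
    """Local SQLite store that holds finished records until the target DB accepts them."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        # FULL makes SQLite sync to disk before a commit returns.
        self.db.execute("PRAGMA synchronous = FULL")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def put(self, record):
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO pending (payload) VALUES (?)", (json.dumps(record),)
            )

    def drain(self, insert):
        """Retry spooled records oldest first; stop at the first failure."""
        sent = 0
        rows = self.db.execute(
            "SELECT id, payload FROM pending ORDER BY id"
        ).fetchall()
        for row_id, payload in rows:
            try:
                insert(json.loads(payload))
            except Exception:
                break  # target still unavailable, try again later
            with self.db:
                self.db.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            sent += 1
        return sent


def save(record, insert, spool):
    """Try the real database first and fall back to the spool if that fails."""
    try:
        insert(record)
        return True
    except Exception:
        spool.put(record)
        return False


def db_down(record):
    raise ConnectionError("target DB unavailable")


spool = Spool(":memory:")  # a real service would use a file on local disk
delivered = []
stored_directly = save({"UniqueID": "1"}, db_down, spool)
retried = spool.drain(delivered.append)
```

    Only complete, already-parsed records go into the spool, so replaying them never touches the notification logic. Delivery is at-least-once: if the process dies between a successful insert and the DELETE, the record is sent again on the next drain, so the target insert should tolerate duplicates.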


  • Discourse touched me in a no-no place

    @Onyx said:

    Again, some of the parsing is, unfortunately, tied into the live notification thing.

    OK, but still do the minimum necessary.



  • It seems you don't really need any of the capabilities of a relational database. Just a place to permanently store data and then later delete it once you're done with it.

    If you think you might need SQL for other things in the system (now or later), then go with SQLite. You can use one table for dumping this data in whatever form is the easiest to parse in C++ (probably normal rows, since you say data is fixed).

    If you don't think you'll need SQL for anything else, you can consider dumping each data record into an individual file, using whatever format is the easiest to parse. Once you can transmit the data for a single file, you just delete it. Ghetto, but should be the least resource demanding and still easy enough.
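
    A rough sketch of the file-per-record idea, in Python for brevity (the service is C++). The function names, the JSON format and the file naming scheme are assumptions for illustration. The record is written to a temporary name, synced, and then renamed, so a crash leaves either a complete file or a stray `.tmp` that the reader ignores.

```python
import json
import os
import tempfile
import time
import uuid


def spool_record(directory, record):
    """Write one record to its own file so it is either fully there or absent."""
    os.makedirs(directory, exist_ok=True)
    # Timestamp prefix keeps files roughly in arrival order; uuid avoids clashes.
    name = "%020d-%s.json" % (time.time_ns(), uuid.uuid4().hex)
    final_path = os.path.join(directory, name)
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(record, handle)
        handle.flush()
        os.fsync(handle.fileno())  # force the bytes to disk before renaming
    os.replace(tmp_path, final_path)  # atomic on POSIX within one filesystem
    return final_path


def flush_spool(directory, send):
    """Send each spooled file and delete it only after send() succeeds."""
    sent = 0
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".json"):
            continue  # skip half-written .tmp files
        path = os.path.join(directory, name)
        with open(path, encoding="utf-8") as handle:
            record = json.load(handle)
        try:
            send(record)
        except Exception:
            break  # target unavailable, keep the remaining files
        os.remove(path)
        sent += 1
    return sent


spool_dir = tempfile.mkdtemp()
spool_record(spool_dir, {"UniqueID": "1"})
received = []
count = flush_spool(spool_dir, received.append)
```

    The sketch does not sync the directory itself after the rename, which a stricter implementation might add. The spool directory should also be on a local disk rather than a network share.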



  • What happens if something else goes wrong, like your server is unreachable? The information is just lost?


  • :belt_onion:

    @cartman82 said:

    If you think you might need SQL for other things in the system (now or later), then go with SQLite. You can use one table for dumping this data in whatever form is the easiest to parse in C++ (probably normal rows, since you say data is fixed).

    SQL is used already. The data should end up in MySQL in the end. What I want to do here is have a backup plan if something fails during that insert, for whatever reason. There is also a possibility I'll have to interface with other DBs which I can't guarantee the availability of (client-specific setups), and that's the case I really want to have covered.

    @Keith said:

    What happens if something else goes wrong, like your server is unreachable? The information is just lost?

    The system is intended to host both Asterisk and the monitoring service on the same machine, if at all possible. There should be no reason why it wouldn't, really, but it is possible to run them on separate machines. It's not a cloud service in the sense that you connect your Asterisk install to it, if that's what you mean; it comes packaged together.

    There is some native Asterisk logging that's turned on. Most of the data should be recoverable from that and I will put some recovery procedures in place using that data. Was a bit lacking last I checked though, that's the main reason I had to build this freaking reverse-engineered beast in the first place.



  • @Onyx said:

    SQL is used already. The data should end up in MySQL in the end. What I want to do here is have a backup plan if something fails during that insert, for whatever reason. There is also a possibility I'll have to interface with other DBs which I can't guarantee the availability of (client-specific setups), and that's the case I really want to have covered.

    Ok, I see. Go with files then. This eliminates at least a few potential problems that could crop up with SQL (poorly formatted data, for example). If you can't write files, you might as well give up completely and keel over.



  • What about dumping stuff into a simple key-value store? Redis seems to be fairly light-weight (but requires running an external process) and it's relatively easy to interface with.



  • @cvi said:

    What about dumping stuff into a simple key-value store? Redis seems to be fairly light-weight (but requires running an external process) and it's relatively easy to interface with.

    I'm using it.

    :white_check_mark: Easy to interface with from node.js

    :red_circle: ... but maybe not from c++

    :red_circle: ... also a separate process

    :red_circle: ... and not 100% persistent, which is IMO the deal-breaker.



  • I'm using it from C++ via hiredis - it seems to be the officially blessed C interface. It's a bit primitive compared to e.g. the Lua bindings that I also played around with, but it works and is simple enough to use.

    As for data being 100% persistent... The redis docs mention something about an append-only file mechanism that can help here, but that's about as much as I know about that.

    Originally, I was also going to mention Berkeley DB, but decided against it, since I had never used it for anything - I believe that it doesn't require a separate process, but I might be wrong. Also, a quick Google showed that it's now an Oracle product (which I don't remember it being previously).

    Anyway, perhaps somebody here knows of a semi-competent, simple-to-use persistent in-process key-value store with a C/C++ API... I might be interested in that, too. :-)



  • @cvi said:

    As for data being 100% persistent... The redis docs mention something about an append-only file mechanism that can help here, but that's about as much as I know about that.

    It has two storage mechanisms:

    • snapshot style, where it dumps current content at certain intervals / under certain conditions. On average 10-20 secs of data loss using default settings.

    • append only, that writes all changes into a log, then occasionally chews them up into a more permanent snapshot. There's a configurable cached delay before things are written to HD. 1 second data loss using defaults.

    • or you can use both, or neither of these.

    I use redis in several places. In one system, as a simple memory cache. In another, using both append-only and snapshots.

    In the first system, I couldn't be happier. Great cached db.

    The second system, however, occasionally has to deal with hardware shutdown. In that case, I get AOF corruption that prevents DB from starting up, so I have to SSH and deal with it manually. Fixing it is a single command and it's not a big deal in my use case, but I don't know.

    Persistence in redis just doesn't feel serious enough if I really had critical data that must be saved at any cost.


  • Discourse touched me in a no-no place

    @Onyx said:

    SQL is used already. The data should end up in MySQL in the end. What I want to do here is have a backup plan if something fails during that insert, for whatever reason. There is also a possibility I'll have to interface with other DBs which I can't guarantee the availability of (client-specific setups), and that's the case I really want to have covered.

    SQLite definitely fits this bill. It's intended to be a grown-up version of the cheap-ass dump-stuff-to-files solution, and once you've got your SQL written it should be reliable. (How do I know this? Because I know the author of SQLite personally and have had him tell me to my face that this was the original design brief. Nice guy. Very competent programmer. Slight NIH mentality, but good enough that this isn't a hindrance to his productivity.)

    Fun fact: early versions of SQLite used GDBM, a dbm-style library much like Berkeley DB, as the storage engine. This was changed to something custom for speed and reliability.


  • :belt_onion:

    I was leaning towards SQLite myself all the time, but I wanted to hear some more suggestions. I'm happy I did in any case, never can learn enough stuff, right?

    I also forgot to mention I already interface with SQLite in this project anyway - Asterisk has its local DB, which is held in RAM, but every change is saved to SQLite so that it can restore its state when restarted. Since it's far easier to get that data from SQLite than futzing around with commands through telnet (ugh), I actually read some initial settings from its SQLite "backup" on startup.

    Now if only I could write data to it that way, but no, there is no way to refresh its state from the database other than a service restart. So telnet it is! Yay! Fake enthusiasm!

    I'm most likely going SQLite in the end, even though I don't hate files as much for this use case.



  • Are you trying to write data to things like peer definitions (sip.conf), or dialplan (extensions.conf), or voicemail, or what? The Asterisk Realtime Architecture does a pretty fair job of allowing on-the-fly external modification of voicemail out of the box, and I've kind of hacked together a simplistic way of doing "realtime" dialplan (the original realtime modules aren't directly supported any more, but with judicious use of ODBC functions you can roll your own way to do it). Sadly, your choices with peer definitions (at least as of Asterisk 11 -- I haven't mucked around with 12 and PJSIP yet) are more limited, at least if your peers are behind NAT and you're relying on Asterisk's qualify process to keepalive the NAT connection, but Realtime might be worth investigation depending on the data you're trying to write & refresh "live"...


  • :belt_onion:

    No, I'm capturing my own CDRs, because internal ones are an unreliable mess. If AMI is a correct reflection of how they do it in code, I can see why. Additionally, I'm pushing notifications when certain events trigger.

    Config side, there's a universal dialplan written that pulls settings from Asterisk DB and allows us to route calls or inject custom code on the fly. Other conf files are just files atm, but we might move them to realtime. Currently the focus is on giving users the ability to set routes / DIDs, toggle recording, edit star codes and such from a web UI. Messing with adding extensions to sip.conf and similar is something we'll work on for v2; we still haven't decided how we'll handle that exactly.



  • Ah, guess I'm not much versed in CDRs (in my particular application I have a non-Asterisk CDR source that is authoritative so I don't even record them in Asterisk).

    If you don't need SIP Qualify, Realtime would give you the ability to hot-add/change SIP extensions, and every time you try to place or receive a call to/from that peer, Asterisk will use the then-current SIP information. Qualify requires caching the Realtime peers, however, which then requires an asterisk -rx "sip prune realtime peer XYZ" any time you make a change to peer XYZ (yay telnet).


  • :belt_onion:

    @izzion said:

    Ah, guess I'm not much versed in CDRs (in my particular application I have a non-Asterisk CDR source that is authoritative so I don't even record them in Asterisk).

    Count yourself lucky - I was originally brought on to make an application that would just read the provided CDRs from MySQL and show them to the user. What's that, a week of work on that? And then I could move on to fun stuff. Yeah... don't dare to try doing any fancy subs if you want that to work. Hell, don't even use h or i extensions, because the damned thing will find a way to record that into the DB in the dst field.

    @izzion said:

    If you don't need SIP Qualify, Realtime would give you the ability to hot-add/change SIP extensions, and every time you try to place or receive a call to/from that peer, Asterisk will use the then-current SIP information.

    Didn't play with it yet, but sounds good. Not sure about Qualify but hey, my service should be running on the same server anyway, so I don't even need to mess with telnet. I can just spawn a process, shoot off a command and be done with it.



  • I work with IVRs/call center data all the time, and can give you disgusting amounts of details about the stuff if you need it.

    Depending on the load, sqlite is good for client side, a secondary mysql database is better for a server hosting the ivr itself since it's better with concurrency/locking issues. Either will let you bulk grab data for retransmission, but sqlite will lock the database down while you're accessing it so be careful with how you re-sync the data.
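
    To keep lock times short during a re-sync, one option is to read a small batch, finish the read, transmit with no transaction open, and only then delete the acknowledged rows. The sketch below is in Python for brevity and uses hypothetical helper names and a hypothetical `pending` table. It also switches the file to WAL journal mode, in which readers and a writer do not block each other; the 5 second `timeout` is just the time a connection waits for a lock before giving up.

```python
import os
import sqlite3
import tempfile


def open_spool(path):
    db = sqlite3.connect(path, timeout=5.0)  # wait up to 5 s for a lock
    db.execute("PRAGMA journal_mode = WAL")
    db.execute(
        "CREATE TABLE IF NOT EXISTS pending ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    db.commit()
    return db


def take_batch(db, limit):
    """Read up to `limit` of the oldest rows; fetchall() completes the read."""
    return db.execute(
        "SELECT id, payload FROM pending ORDER BY id LIMIT ?", (limit,)
    ).fetchall()


def acknowledge(db, ids):
    """Delete rows that were transmitted, in one short write transaction."""
    with db:
        db.executemany("DELETE FROM pending WHERE id = ?", [(i,) for i in ids])


path = os.path.join(tempfile.mkdtemp(), "spool.db")
db = open_spool(path)
with db:
    db.executemany(
        "INSERT INTO pending (payload) VALUES (?)",
        [("call-%d" % n,) for n in range(5)],
    )

batch_sizes = []
while True:
    batch = take_batch(db, 2)
    if not batch:
        break
    # A real service would transmit the batch here, with no transaction open.
    acknowledge(db, [row_id for row_id, _ in batch])
    batch_sizes.append(len(batch))
```

    With five spooled rows and a batch size of two, the loop runs three times and leaves the table empty.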



  • I'm using "Qualify" to refer to the process of periodically probing your peers to see if they're still there (and, incidentally, keep their NAT connection open, which I trust Asterisk to do a lot more than I trust the NAT Keep-Alive settings on the hodgepodge of endpoints I get to support). If you're on a LAN type setup where you don't have to worry about that, then you don't have to allow Realtime to cache peers, and you're in great shape for Realtime to be truly dynamic. If you do need the qualify process to periodically probe peers, then you're stuck with caching -- qualify isn't "Realtime aware". Not that it probably should be.

    And yay for the mess that is Asterisk CDRs. I've had to plumb through them a few times to try to help differentiate between two types of calls that hit our internal queues, and that's more than enough to make me sing hosannas that I don't have to try to deal with them for customer billing.


  • :belt_onion:

    @Matches said:

    I work with IVRs/call center data all the time, and can give you disgusting amounts of details about the stuff if you need it.

    Been there, mostly cleaned the guts up. Still a few edge cases that I'm fixing up, but I'm happy with the data now.

    @izzion said:

    try to deal with them for customer billing

    If anyone asked me to do it using only native CDRs... There would be blood. Whether it would be a murder or a suicide is unknown at this time.

    And yeah, LAN setups, for now at least. Avoiding NAT traversal like the plague, if at all possible. Will have a play with realtime in a week or so hopefully, if I sort out the last bits of mess left in the system by then.

