Was Googling about OLAP in Azure, and found this new ColumnStore feature



  • Has anybody used this before? The column seems to be saying that SQL Server will automatically use the ColumnStore indexes for "analytics queries"-- but how does it identify which queries are analytics queries?

    It also doesn't seem to have any information on how to translate an OLAP design to ColumnStore indexes. Anybody have information on this?

    I assume these are like a special case of a materialized view, is that assumption correct? I can't imagine what else this would be.

    Any help appreciated.



  • @blakeyrat

    Column Store indices are for OLTP not OLAP. (Yes, you can analyze "regular" SQL information!)

    Have you seen this page: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview?view=sql-server-2017

    As far as "this new feature", I am not sure if June 1st, 2016 really counts as "new" ;)



  • Also, here is a decent walkthrough that might help.



  • Oopppss...



  • @thecpuwizard said in Was Googling about OLAP in Azure, and found this new ColumnStore feature:

    Column Store indices are for OLTP not OLAP.

    So does OLTP do what OLAP does? In other words, can you replace an OLAP cube with a bunch of these columnstores?



  • @blakeyrat said in Was Googling about OLAP in Azure, and found this new ColumnStore feature:

    @thecpuwizard said in Was Googling about OLAP in Azure, and found this new ColumnStore feature:

    Column Store indices are for OLTP not OLAP.

    So does OLTP do what OLAP does? In other words, can you replace an OLAP cube with a bunch of these columnstores?

    Depends on what you mean by the first question :) There is much more to it than I can put in a reasonable post here. I would suggest doing some of the available labs after study of the material(not just specific pages).

    As an analogy, consider someone who has good OLTP knowledge, but never hear of OLAP....then try to give them working knowledge of OLAP and related in a single post... :eek:

    Information coming from Microsoft indicates that OLAP [aka Analysis Services in the classic mode] is going away. For many use-cases it has already officially been deprecated.

    To answer your second question... Yes and OLAP cube can be replaced with a different means of doing Analytics which will (almost certainly) include ColumnStores. But also many other aspects.

    My personal favorite reason for OLAP Cubes to go away is that it means MDX queries go away [they are partially responsible for my hair loss]



  • I really wish I could find a non-Microsoft tutorial on this, the Microsoft ones are written in this language I always have so much trouble parsing and understanding, and always seem to focus on getting into every little complex detail before answering simple questions like, "if I already know OLAP, how do I use that knowledge for columnstore?" which is really what I want.

    Like this:

    rowgroup

    A row group is a group of rows that are compressed into columnstore format at the same time. A rowgroup usually contains the maximum number of rows per rowgroup which is 1,048,576 rows.

    Is this REALLY something I need in an overview article? Even if I do (for some reason) need to know what a "rowgroup" (or perhaps "row group", its inconsistent) is, do I really need to know exactly how many rows the maximum number is? Why not start with HOW TO USE IT.

    To reduce fragmentation of the column segments and improve performance, the columnstore index might store some data temporarily into a clustered index, which is called a deltastore, and a btree list of IDs for deleted rows. The deltastore operations are handled behind the scenes.

    If it's handled behind-the-scenes, WHY IS AN OVERVIEW ARTICLE TELLING ME ABOUT IT? I WANT AN OVERVIEW! Grump.



  • @blakeyrat From the "Getting Started" article here https://docs.microsoft.com/en-us/sql/relational-databases/indexes/get-started-with-columnstore-for-real-time-operational-analytics?view=sql-server-2017

    1. Identify the tables in your operational schema that contain data required for analytics.

    2. For each table, drop all btree indexes that are primarily designed to speed up existing analytics on your OLTP workload. Replace them with a single columnstore index. This can improve the overall performance of your OLTP workload since there will be fewer indexes to maintain.

    Seriously? That's step 2? What if I don't have any btree indexes? What if I'm not sure what a btree index is (which is actually my case)? Why isn't the term "btree indexes" linked to useful documentation on that?

    0_1524515497289_6a00d8341c5f3053ef0120a8ed794e970b.jpg

    Seriously, this technology might be the bee's knees, but this documentation is worse than useless. I need an actual practical example just once here, Microsoft.


    Oh and the punchline, here's step 3:

    1. This is all you need to do!


  • It turns out that I guess B-tree indexes are the same thing as clustered indexes.

    A bigger problem is that some of my columns are JSON and I'm guessing that this columnstore index isn't going to be able to help with JSON columns; so I'm still going to need some kind of ETL to yank that data out.



  • @blakeyrat said in Was Googling about OLAP in Azure, and found this new ColumnStore feature:

    What if I'm not sure what a btree index is (which is actually my case)? Why isn't the term "btree indexes" linked to useful documentation on that?

    It was directly in one of the articles I posted a link to....

    normal SQL "CREATE INDEX" creates a btree index. Pretty hard to be effective at creating/designing performant databases without understanding how indices work.....

    Other points: If you already know OLAP, the first thing to do is forget (for this purpose) ALL of it. This type of analytics is a completely different paradigm.



  • @thecpuwizard said in Was Googling about OLAP in Azure, and found this new ColumnStore feature:

    normal SQL "CREATE INDEX" creates a btree index. Pretty hard to be effective at creating/designing performant databases without understanding how indices work.....

    I know how indexes work, I didn't know those indexes had two different names that apparently Microsoft's documentation uses interchangeably.

    @thecpuwizard said in Was Googling about OLAP in Azure, and found this new ColumnStore feature:

    Other points: If you already know OLAP, the first thing to do is forget (for this purpose) ALL of it. This type of analytics is a completely different paradigm.

    Yeah I'm quickly coming to that conclusion.

    Other conclusions I'm coming two:

    • Data in JSON columns is fine for OLAP, but completely useless for columnstore indexes, so I need to ETL that out somehow
    • I think I should just write my analytics queries "normally" (in SQL; without using OLAP) then later add columnstore indexes to the database

Log in to reply