Can SQL (relational databases) do this?



  • Here's the markov chain generator I scrawled out last night. I'm curious, because people seem to like row/table DBs more than document-based DBs: how would this be done in something like MySQL or SQL Server?



  • Microsoft T-SQL:

    -- *** Create tables
    
    -- Drop old tables
    drop table chain
    drop table word
    
    -- Word
    CREATE TABLE [dbo].[word](
    	[wordId] [bigint] IDENTITY(1,1) NOT NULL,
    	[word] [nvarchar](50) NOT NULL,
     CONSTRAINT [PK_wordList] PRIMARY KEY CLUSTERED 
    (
    	[wordId] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    
    -- Chain
    CREATE TABLE [dbo].[chain](
    	[chainId] [bigint] IDENTITY(1,1) NOT NULL,
    	[word1] [bigint] NULL,
    	[word2] [bigint] NULL,
    	[word3] [bigint] NULL,
     CONSTRAINT [PK_chainList] PRIMARY KEY CLUSTERED 
    (
    	[chainId] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    
    GO
    
    ALTER TABLE [dbo].[chain]  WITH CHECK ADD  CONSTRAINT [FK_chain_word] FOREIGN KEY([word1])
    REFERENCES [dbo].[word] ([wordId])
    GO
    
    ALTER TABLE [dbo].[chain] CHECK CONSTRAINT [FK_chain_word]
    GO
    
    ALTER TABLE [dbo].[chain]  WITH CHECK ADD  CONSTRAINT [FK_chain_word1] FOREIGN KEY([word2])
    REFERENCES [dbo].[word] ([wordId])
    GO
    
    ALTER TABLE [dbo].[chain] CHECK CONSTRAINT [FK_chain_word1]
    GO
    
    ALTER TABLE [dbo].[chain]  WITH CHECK ADD  CONSTRAINT [FK_chain_word2] FOREIGN KEY([word3])
    REFERENCES [dbo].[word] ([wordId])
    GO
    
    ALTER TABLE [dbo].[chain] CHECK CONSTRAINT [FK_chain_word2]
    GO
    
    -- *** Populate the word table
    SET IDENTITY_INSERT [dbo].[word] ON
    INSERT [dbo].[word] ([wordId], [word]) VALUES (1, N'this')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (2, N'is')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (3, N'a')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (4, N'sample')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (5, N'markov')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (6, N'chain')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (7, N'generator')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (8, N'written')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (9, N'in')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (10, N'SQL')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (11, N'three')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (12, N'dogs')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (13, N'are')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (14, N'hungry')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (15, N'four')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (16, N'cats')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (17, N'like')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (18, N'playing')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (19, N'with')
    INSERT [dbo].[word] ([wordId], [word]) VALUES (20, N'balls')
    SET IDENTITY_INSERT [dbo].[word] OFF
    
    -- *** Populate the chain table
    declare @numberLoops bigint
    declare @firstWord bigint
    declare @secondWord bigint
    declare @thirdWord bigint
    
    set @numberLoops = 10000
    
    truncate table chain
    
    while @numberLoops > 0
    begin
    	
    	set @numberLoops = @numberLoops - 1
    	
    	set @firstWord = ( select top 1 wordId from word order by NewID() )
    	set @secondWord = ( select top 1 wordId from word order by NewID() )
    	set @thirdWord = ( select top 1 wordId from word order by NewID() )
    	
    	insert into chain (word1, word2, word3 )
    	values( @firstWord, @secondWord, @thirdWord )
    	
    end
    
    select top 1000 * from chain
    
    declare @initialChainId bigint
    declare @nextChainId bigint
    declare @outputWord nvarchar(50)
    
    -- *** Initialize
    -- Select a random entry to be the first chain
    set @initialChainId = ( select top 1 chainId from chain order by NewID() )
    
    -- Set initial first and second word
    set @firstWord = ( select word1 from chain where chainId = @initialChainId )
    set @secondWord = ( select word1 from chain where chainId = @initialChainId )
    
    set @numberLoops = 100
    
    while @numberLoops > 0
    begin
    
    	set @numberLoops = @numberLoops - 1
    	
    	-- Find third word matching the first and second word
    	set @nextChainId = ( select top 1 chainId from chain where word1 = @firstWord and word2 = @secondWord order by NewID() )
    	set @thirdWord = ( select word3 from chain where chainId = @nextChainId )
    	
    	set @outputWord = ( select word from word where wordId = @thirdWord )
    	
    	print @outputWord
    	
    	-- Move the words "back" for the next link in the chain
    	set @firstWord = @secondWord
    	set @secondWord = @thirdWord
    
    end
    

    Output is gibberish because the chain table is filled with random combinations, but you get the idea.

    That all said, the map/reduce database can probably do the same task *faster*, but I'm not 100% sure, you'd have to implement both with the same data and benchmark.



  • I got home to this:


    Turns out trying to reduce 5551887 values with 3850233 unique keys takes a lot of memory. And a lot of memory use means a lot of swapping.

    I wrote some Go that does the same thing, but uses the hard drive as a stream instead of loading everything into memory and then processing and then dumping everything to the hard drive, and I'm currently testing it.

    MORE AT 8



  • Uh. Ok.



  • @blakeyrat said:

    Uh. Ok.

    Uh. Ok.



  • You asked how to do it in SQL Server; I replied with a method that works. I don't understand why you brought up... whatever was represented by that graph and whatever is represented by those two links to GitHub. If you're not interested in your own goddamned topic, I don't know what to tell you.

    So.

    Uh. Ok.



  • SQL Server or Oracle (though a lot of ppl on forum dislike it) can do most thing that your average programming language can.

    There is reason to dislike SQL that is mostly F-U-D (fear / uncertainty and doubt). Another thing I mention is that SQL is not most efficiency way to do things. Some things can be done very quickly and some thing are slow.
    My college is currently writing code to make conversion of "XML" to "JSON" using sql server stored procedure. I will tell you how it goes.



  • @Nagesh said:

    My college is currently writing code to make conversion of "XML" to "JSON" using sql server stored procedure. I will tell you how it goes.

    Holy shit.



  • @morbiuswilters said:

    @Nagesh said:
    My college is currently writing code to make conversion of "XML" to "JSON" using sql server stored procedure. I will tell you how it goes.

    Holy shit.

     

     

    This person "Phil Factor" has already done it in 2010, so my friend will  need variance of this very same procedure.


  • Considered Harmful

    @morbiuswilters said:

    @Nagesh said:
    My college is currently writing code to make conversion of "XML" to "JSON" using sql server stored procedure. I will tell you how it goes.

    Holy shit.

    You could do this in 15 minutes. Write a thin .NET class library that wraps the framework JSON code, import that into SQL Server as an assembly, then import the methods as functions/stored procedures.


  • ♿ (Parody)

    @joe.edwards said:

    @morbiuswilters said:
    @Nagesh said:
    My college is currently writing code to make conversion of "XML" to "JSON" using sql server stored procedure. I will tell you how it goes.

    Holy shit.

    You could do this in 15 minutes. Write a thin .NET class library that wraps the framework JSON code, import that into SQL Server as an assembly, then import the methods as functions/stored procedures.

    That feels like cheating. And wholly beyond the imagination of Nagesh's colleague.



  • @joe.edwards said:

    You could do this in 15 minutes. Write a thin .NET class library that wraps the framework JSON code, import that into SQL Server as an assembly, then import the methods as functions/stored procedures.

    I still have to wonder why you'd want to convert XML to JSON in the database..



  •  Especially considering there is a pretty decent open source library for converting XML to JSON on codeplex (and much faster than the one Microsoft provides).



  • @lucas said:

     Especially considering there is a pretty decent open source library for converting XML to JSON on codeplex (and much faster than the one Microsoft provides).


    Go standard library



  •  That link isn't on codeplex is it? I was talking about NewtonSoft's JSON library for .NET



  • @Ben L. said:

    @lucas said:

     Especially considering there is a pretty decent open source library for converting XML to JSON on codeplex (and much faster than the one Microsoft provides).


    Go standard library

    It silently discards duplicate key names.. (try giving it another Butt)



  • @boomzilla said:

    @joe.edwards said:
    @morbiuswilters said:
    @Nagesh said:
    My college is currently writing code to make conversion of "XML" to "JSON" using sql server stored procedure. I will tell you how it goes.

    Holy shit.

    You could do this in 15 minutes. Write a thin .NET class library that wraps the framework JSON code, import that into SQL Server as an assembly, then import the methods as functions/stored procedures.

    That feels like cheating. And wholly beyond the imagination of Nagesh's colleague.

     

     

    Agreed. I explain that use of CLR is easy, but our team struggle a lot due to overzeal of infrastructure team. So now Phil Factor is our man.

    My  colleague already writing him mail and he will reply soon, we hope.

     



  • @lucas said:

     That link isn't on codeplex is it? I was talking about NewtonSoft's JSON library for .NET

     

     

    Yes, that is beast and used a lot for developers working with FQL

     



  • Does any post in this thread have anything to do with it's predecessor?



    Ooh, I get it! You're guys doing a markov chain with posts instead of words! That's pretty genius!



  • @PSWorx said:

    Does any post in this thread have anything to do with it's predecessor?

    Ooh, I get it! You're guys doing a markov chain with posts instead of words! That's pretty genius!
     

     

    Every forum thread get derailed after a while. This is universal law. Staying on topic is never happening for any of threads in any of forums.


  • Discourse touched me in a no-no place

    @morbiuswilters said:

    It silently discards duplicate key names.
    Isn't that the correct thing to do with an associative map?


Log in to reply