andrewlocatelliwoodcock

Thoughts on Software

Archive for the ‘Databases’ Category

Unable to replicate from local CouchDB to Cloudant – a quick resolution for the {“error”:”shutdown”} message

leave a comment »

Just in case anyone else hits an issue whereby you are unable to replicate a local CouchDB database to Cloudant … I was using Curl on Windows (which never helps of course ūüôā ) and found that I was unable to replicate from a local instance to Cloudant even though my passwords, ca cert bundle, etc. were all correct. Naturally, it took a couple of goes to get the curl escaping on Windows right but even once I’d figured that out it was still failing with the uninformative

{"error":"shutdown"}

I eventually found some posts that pointed to my (very) out-of-date CouchDB instance being the issue: you need to be running CouchDB 1.2.0 preferably on Erlang R15 for this to work correctly.

I upgraded everything and key presto! It works!

Written by andrewlocatelliwoodcock

June 15, 2012 at 18:13

Posted in CouchDb, Databases, Erlang

Tagged with , ,

Connecting to Cloudant from Erlang: a quick example of using HTTPS from httpc:request

with 2 comments

Wiser heads than me will no doubt already know this but I for one struggled with working out how to do this so I thought I’d put the example up in the hope it will help others. So, what I was trying to do was get and put to a Cloudant database from Erlang. Cloudant requires both HTTPS and a username and password and I struggled to find an Erlang example online.

First, make sure that you can connect to Cloudant over curl and fetch at least the _all_dbs resource. If you can’t, Erlang isn’t going to work either. I found that to get curl working correctly, I needed to update the ca cert bundle that came with it.

Here’s how to retrive _all_dbs for your Cloudant account using curl:

curl https://username:password@username.cloudant.com/_all_dbs

where¬†username is your Cloudant username and¬†password is the password associated with your username. For example, if your username were ‘foo’ and your password ‘bar’, here’s the command you would use:

curl https://foo:bar@foo.cloudant.com/_all_dbs

If this doesn’t work, the most likely explanation is that your ca cert bundle is out of date: follow the instructions on the curl website for how to update them.
Assuming this worked, your now in a position to try the same with Erlang. I’m going to use the httpc library here:

inets:start(),

ssl:start(),

httpc:request (get, {“https://foo:bar@foo.cloudant.com/_all_dbs”, []}, [{ssl,[{verify,0}]}], []).

It is important to remember to start inets and ssl before attempting to use httpc:request. A put to Cloudant is similar, here’s an example that assumes the existance of a database barfoo¬†and a resource widget on your Cloudant instance and also that you have a suitably initialized variable Update which contains the data which you are writing to the widget resource:

inets:start(),

ssl:start(),

httpc:request (put, {“https://foo:bar@foo.cloudant.com/barfoo/widget”, [], [], Update}, [], []).

And there you have it

Written by andrewlocatelliwoodcock

June 12, 2012 at 11:40

Using SQL and LEFT JOINs to find missing data

leave a comment »

It’s not unusual when working with data-driven applications to be asked to produce ad-hoc exception reoprst of the type: there should a at least one row in TableB for each row in TableA – can you tell me any rows in TableA¬†that do not have at least one corresponding row in TableB?

How do you find something that’s not there?!

Our old friend the LEFT JOIN to the rescue. We already know from this post, that we can use LEF T JOINs to return every row from one table and any matching row from another and this is an extension of the same problem: in this case, we are looking specifically for every row in table A that does not have a matching row in table B.

Here’s how to do it:

SELECT
      a.Id
FROM
      TableA a
LEFT JOIN TableB b ON a.Id = b.Id
WHERE
      b.Id IS NULL

So, we’re returning everything in TableA, anything matching from TableB but then limiting the resultset to only those rows in TableA that DO NOT have a match in TableB. We do that with the statement:

WHERE b.Id IS NULL

This works because the database is returning all rows from TableA and matching rows from TableB but it still has to return something in the case where there are no matching rows in TableB: the special value NULL meaning “unknowable”. What our WHERE clause is saying is “only return those rows in TableA where the matching row in TableB is unknowable”, i.e. where we don’t have a matching row!

Simples.

Written by andrewlocatelliwoodcock

June 4, 2012 at 21:29

Posted in Databases, mySQL, SQL, SQL Server, T-SQL

Tagged with , ,

How and why to use LEFT JOINs in SQL statements

with one comment

LEF T JOINs are something I’ve been using in SQL statements for literally years without thinking much about it but a few conversations recently have made me realize that with the rise of ORMs, a lot of people are a lot less SQL-savvy than they were even a few years ago to the point that JOINs are a bit of a mystery. Most people seem to be able to use INNER JOINs correctly but LEFT JOINs cause a lot of confusion and hence this post …

A LEFT JOIN is used to return data from two tables where there are definitely rows in one table and there may be corresponding rows in the other. If there are, we want to see them and if there aren’t we don’t care: that is, we don’t want to see data only where there is data present in both tables. An example of this could be a report showing all students enrolled in college course and their grades, where some students may not yet have taken any exams but we still want to see all students and then any results for exams they have taken.

So, how do we do this?

Continuing with our example, we’ll need three tables,¬†Students, Courses and Grades with the following schema:

Students: Id, FirstName, SecondName
Courses: Id, Description
Grades: StudentId, CourseId, Grade

What we want to see is all students and every grade they have received over the year. We’ll also want to see the course description so we’ll need to JOIN all three tables. Here’s how we do it:

SELECT 
      s.Id, s.FirstName, s.SecondName, g.Grade, c.Description 
FROM
      Students s
LEFT JOIN Grades g ON s.Id = g.StudentId
INNER JOIN Courses c ON c.Id = g.CourseId

(And just to explain s is declared as an alias of Students so s.Id is the same as writing Students.Id, etc. …)

The LEFT JOIN means: “give me everything on the left of the equals sign and any matching rows on the right of the equals sign”. So

FROM
      Students s
LEFT JOIN Grades g ON s.Id = g.StudentId

means: “give me everything from Students and any matching rows from the Grades table”

So, why the INNER JOIN on Courses? This is what allows us to get the course description from the Courses table. We are assuming here that there should never be a grade for a course that doesn’t exist (pretty reasonable assumption!), so we are restricting the grades we are returning to only those with a matching course.

So there you have it: how to return all students and any matching grades complete with course description in one simple SQL statement.

Later in the week, I’ll be publishing a post on how to use LEFT JOINs to find missing data …

Written by andrewlocatelliwoodcock

May 30, 2012 at 22:11

Posted in Databases, mySQL, SQL, SQL Server

Tagged with , ,

Creating a table directly from a SELECT statement in mySQL

with one comment

It is often useful to be able to select some data we are working with into a new table for further analysis. This is often achieved by first creating the new table and then populating it via a separate SQL statement but it can be quite time-consuming, especially when we want to work with larger numbers of columns, to write the full CREATE TABLE statement for the new table.

What would be ideal would be to be able to infer the structure of the new table directly from the SELECT statement and mySQL does actually give us a way to do that: the CREATE TABLE … SELECT statement.

Here’s an example of how it works:

CREATE TABLE my_new_working_table SELECT column_a, column_b, column_d, column_f FROM my_original_table;

And that’s it: mySQL will create the new table for you, inferring the correct structure from the SELECT statement and also insert the data that matches the SELECT. It won’t add indexes or the like but these can be added afterwards if required and when using this approach, often we only need the table for a short-term analysis task anyway.

Written by andrewlocatelliwoodcock

May 21, 2012 at 21:39

Posted in Databases, mySQL, SQL

Tagged with , ,

In mySQL, how do you find all tables that have foreign key constraints against another table

with one comment

The situation is that I need to know which tables in a database hold a foreign key constraint against a particular table, let’s call it TableX. The reason I need to know this is because I am planning to rename and retire TableX and in mySQL at least, the foreign key contraints follow the table rename. So really, I’m looking for metadata about my database and searching it for references to TableX.

It turns out that this is actually pretty simple:

USE information_schema;
SELECT * FROM KEY_COLUMN_USAGE WHERE REFERENCED_TABLE_NAME = 'TableX' AND TABLE_SCHEMA='[my database name]';

This query will give you all the foreign keys in the named database that reference TableX. To find all foreign keys in the database is even simpler (although I can’t think of a use case for it at the moment):

USE information_schema;
SELECT * FROM KEY_COLUMN_USAGE WHERE TABLE_SCHEMA='bigfishgames';

Simples. Once you know how …

Written by andrewlocatelliwoodcock

May 8, 2012 at 21:53

Posted in Databases, mySQL, SQL

Tagged with , ,

Restoring CouchDB databases from file

with one comment

I had a situation recently where I had the database files for several CouchDB databases backed up in source control but no longer had the actual databases available to clone.

What I needed to do was to restore my databases onto a different machine using the database files that were in the source repository.

As it turns out, this is actually pretty simple although not well documented.

CouchDB stores its data files in %COUCHDB%\var\lib\couchdb\ which on my Windows VM equated to C:\Program Files (x86)\Apache Software Foundation\CouchDB\var\lib\couchdb

In order to restore my databases from source control (or any other source where you have stored the physical files for that matter), it was simply a matter of copying them to C:\Program Files (x86)\Apache Software Foundation\CouchDB\var\lib\couchdb

Open Futon afterwards and there are your recovered CouchDB files. Simples!

Written by andrewlocatelliwoodcock

September 22, 2011 at 18:55

Posted in CouchDb

Tagged with ,

Introduction to SQL Injection

leave a comment »

I wrote a post recently about creating flexible WHERE clauses in stored procedures to support, amongst other things, requirements for flexible search functionality without resorting to building and executing dynamic SQL against the database. It realised afterwards that the concept of SQL injection is still not universally understood and that it would be a good idea to write a post explaining what it is, what threat it poses and how to guard against it. I will also reveal the worst, most SQL injection friendly piece of code I have ever come across …

SQL injection is a technique whereby an attacker manages to get your database to run SQL of their choosing thereby gaining access to your complete user list, causing the database server to drop a table or even a database and generally causing mayhem. Typically an attacker will append extra instructions or conditions onto a valid SQL statement and when that statement is executed so is the attacker’s extra code.¬†An attacker gaining access to your user account table is bad enough but imagine if you’ve also been sloppy enough to have stored passwords, credit card details, etc. …

Read the rest of this entry »

Written by andrewlocatelliwoodcock

June 1, 2011 at 19:55

Posted in SQL Server, T-SQL

Tagged with , , ,

Flexible Search using Stored Procedures: varying the WHERE clause at run-time

with 4 comments

Stored Procedures offer a number of advantages when developing databases, perhaps the chief among them being performance and security. I am loathe to lose these advantages but because they are pre-compiled blocks of SQL code, it can be challenging to develop Stored Procedures that can meet requirements for very flexible functionality such as search.

Over the years, I have seen a number of attempts to provide flexible search functionality. The worst of these involved building SQL Script on the client machine and then passing that script to the server for execution … SQL injection, anyone? Just as bad were stored procedures where the entire where clause was built on the client before being executed on the server: simply passing the where clause as a parameter to the stored procedure does not prevent SQL Injection attacks. Another method is to create a series of stored procedures, one for each combination of search terms. This is used in conjunction with business logic which decides which search stored procedure to run; this approach prevents SQL injection attacks (unless the stored procedure is spectacularly badly written) and also means that each search term has its own execution plan computed. The main disadvantage to this approach is that the number of stored procedures required increases exponentially with each additional search term. In these situations, the technique I prefer sacrifices some performance in order to retain security and improve flexibility.

Read the rest of this entry »

Written by andrewlocatelliwoodcock

May 30, 2011 at 20:37

Posted in SQL Server, T-SQL

Tagged with , , ,

Curl returning ‚ÄúInvalid UTF-8 JSON‚ÄĚ error from CouchDb on Windows although JSON is correct

leave a comment »

I have been playing around with CouchDB for a while now but early on hit a problem with some of the examples in O’Reilly’s “CouchDB: The Definitive Guide”: Curl, when used on Windows, kept giving an “Invalid UTF-8 JSON” error even though the JSON itself was correct.

The example I was trying to use was taken from the O’Reilly book:

curl -X PUT http://username:password@127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af -d '{"title":"There is Nothing Left to Lose","artist":"Foo Fighters"}' 

The database albums existed and the username and password were correct … so what was going on? Turns out it was an issue with escaping quotes in the cmd shell in Windows … Basically, use double instead of single quotes for the JSON string being posted and escape all quotes within that string with \”:

curl -X PUT http://username:password@127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af -d "{\"title\":\"There is Nothing Left to Lose\",\"artist\":\"Foo Fighters\"}" {"ok":true,"id":"6e1295ed6c29495e54cc05947f18c8af","rev":"1-4b39c2971c9ad54cb37e08fa02fec636"}

Obviously, replace username and password values with your actual user name and password but this example will work.

Written by andrewlocatelliwoodcock

May 11, 2011 at 21:57

Posted in CouchDb

Tagged with ,