andrewlocatelliwoodcock

Thoughts on Software

Posts Tagged ‘SQL’

Using SQL and LEFT JOINs to find missing data

It’s not unusual when working with data-driven applications to be asked to produce ad-hoc exception reports of the type: there should be at least one row in TableB for each row in TableA, so can you tell me which rows in TableA do not have at least one corresponding row in TableB?

How do you find something that’s not there?!

Our old friend the LEFT JOIN to the rescue. We already know from my earlier post on LEFT JOINs that we can use them to return every row from one table and any matching rows from another, and this is an extension of the same problem: in this case, we are looking specifically for every row in TableA that does not have a matching row in TableB.

Here’s how to do it:

SELECT
      a.Id
FROM
      TableA a
LEFT JOIN TableB b ON a.Id = b.Id
WHERE
      b.Id IS NULL

So, we’re returning everything in TableA, anything matching from TableB but then limiting the resultset to only those rows in TableA that DO NOT have a match in TableB. We do that with the statement:

WHERE b.Id IS NULL

This works because the database returns all rows from TableA and matching rows from TableB, but it still has to return something where there are no matching rows in TableB: every TableB column in that row is set to NULL, the special marker meaning “no value”. What our WHERE clause is saying is “only return those rows in TableA where the matching TableB row has no value”, i.e. where we don’t have a matching row! (This relies on b.Id never being NULL in a real TableB row, which holds for a primary key or any NOT NULL column.)
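Here’s a minimal sketch of the same query that can be run without a database server. It uses Python’s built-in sqlite3 module and an in-memory SQLite database rather than mySQL or SQL Server; the sample rows are made up purely for illustration.

```python
import sqlite3

# In-memory SQLite database: an assumption made so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Id INTEGER PRIMARY KEY);
    CREATE TABLE TableB (Id INTEGER NOT NULL);
    INSERT INTO TableA (Id) VALUES (1), (2), (3);
    -- Id 1 has two matching rows, Id 3 has one, Id 2 has none.
    INSERT INTO TableB (Id) VALUES (1), (1), (3);
""")

# Keep only the TableA rows for which the LEFT JOIN found nothing in TableB.
missing = conn.execute("""
    SELECT a.Id
    FROM TableA a
    LEFT JOIN TableB b ON a.Id = b.Id
    WHERE b.Id IS NULL
    ORDER BY a.Id
""").fetchall()

print(missing)  # [(2,)]
```

Only Id 2 comes back, because it is the only row in TableA with nothing to join to in TableB.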

Simples.

Written by andrewlocatelliwoodcock

June 4, 2012 at 21:29

Posted in Databases, mySQL, SQL, SQL Server, T-SQL

How and why to use LEFT JOINs in SQL statements

LEFT JOINs are something I’ve been using in SQL statements for literally years without thinking much about it, but a few recent conversations have made me realize that, with the rise of ORMs, a lot of people are a lot less SQL-savvy than they were even a few years ago, to the point that JOINs are a bit of a mystery. Most people seem to be able to use INNER JOINs correctly but LEFT JOINs cause a lot of confusion, hence this post …

A LEFT JOIN is used to return data from two tables where there are definitely rows in one table and there may be corresponding rows in the other. If there are, we want to see them, and if there aren’t, we still want the rows from the first table: that is, we don’t want to restrict the results to rows with data present in both tables. An example of this could be a report showing all students enrolled in a college course and their grades, where some students may not yet have taken any exams but we still want to see all students and then any results for exams they have taken.

So, how do we do this?

Continuing with our example, we’ll need three tables, Students, Courses and Grades with the following schema:

Students: Id, FirstName, SecondName
Courses: Id, Description
Grades: StudentId, CourseId, Grade

What we want to see is all students and every grade they have received over the year. We’ll also want to see the course description so we’ll need to JOIN all three tables. Here’s how we do it:

SELECT
      s.Id, s.FirstName, s.SecondName, g.Grade, c.Description
FROM
      Students s
LEFT JOIN Grades g ON s.Id = g.StudentId
LEFT JOIN Courses c ON c.Id = g.CourseId

(And just to explain: s is declared as an alias of Students, so s.Id is the same as writing Students.Id, etc. …)

The LEFT JOIN means: “give me everything from the table on the left of the JOIN keyword and any matching rows from the table on its right”. So

FROM
      Students s
LEFT JOIN Grades g ON s.Id = g.StudentId

means: “give me everything from Students and any matching rows from the Grades table”

So, why a second LEFT JOIN on Courses rather than an INNER JOIN? This join is what allows us to get the course description from the Courses table. We are assuming here that there should never be a grade for a course that doesn’t exist (pretty reasonable assumption!), so every grade will find its course. But for a student with no grades, g.CourseId is NULL and matches no row in Courses, so an INNER JOIN at this point would throw those students away again and undo the first LEFT JOIN. Keeping the join to Courses as a LEFT JOIN keeps them in the results, with NULL for both Grade and Description.

So there you have it: how to return all students and any matching grades complete with course description in one simple SQL statement.
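To see the behaviour on real rows, here’s a small sketch using Python’s built-in sqlite3 module with an in-memory database (an assumption, chosen so it runs without a server). The student and course rows are invented. Both joins are LEFT JOINs so that a student with no grades survives the join to Courses; the second query shows what happens if that join is an INNER JOIN instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (Id INTEGER PRIMARY KEY, FirstName TEXT, SecondName TEXT);
    CREATE TABLE Courses  (Id INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE Grades   (StudentId INTEGER, CourseId INTEGER, Grade TEXT);

    -- Illustrative data: student 2 has not taken any exams yet.
    INSERT INTO Students VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    INSERT INTO Courses  VALUES (10, 'Databases');
    INSERT INTO Grades   VALUES (1, 10, 'A');
""")

# Every student, plus grade and course description where they exist.
rows = conn.execute("""
    SELECT s.Id, s.FirstName, s.SecondName, g.Grade, c.Description
    FROM Students s
    LEFT JOIN Grades g ON s.Id = g.StudentId
    LEFT JOIN Courses c ON c.Id = g.CourseId
    ORDER BY s.Id
""").fetchall()

# Same query with an INNER JOIN on Courses: the NULL g.CourseId of an
# ungraded student matches no course, so that student is filtered out.
inner_rows = conn.execute("""
    SELECT s.Id, s.FirstName, s.SecondName, g.Grade, c.Description
    FROM Students s
    LEFT JOIN Grades g ON s.Id = g.StudentId
    INNER JOIN Courses c ON c.Id = g.CourseId
    ORDER BY s.Id
""").fetchall()

for row in rows:
    print(row)
print(len(inner_rows))  # 1
```

The first query returns both students, with None (NULL) in the Grade and Description columns for the student without grades; the second returns only the graded student.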

Later in the week, I’ll be publishing a post on how to use LEFT JOINs to find missing data …

Written by andrewlocatelliwoodcock

May 30, 2012 at 22:11

Posted in Databases, mySQL, SQL, SQL Server

Creating a table directly from a SELECT statement in mySQL

It is often useful to be able to select some data we are working with into a new table for further analysis. This is often achieved by first creating the new table and then populating it via a separate SQL statement but it can be quite time-consuming, especially when we want to work with larger numbers of columns, to write the full CREATE TABLE statement for the new table.

Ideally, we would infer the structure of the new table directly from the SELECT statement, and mySQL does actually give us a way to do that: the CREATE TABLE … SELECT statement.

Here’s an example of how it works:

CREATE TABLE my_new_working_table SELECT column_a, column_b, column_d, column_f FROM my_original_table;

And that’s it: mySQL will create the new table for you, inferring the correct structure from the SELECT statement and also insert the data that matches the SELECT. It won’t add indexes or the like but these can be added afterwards if required and when using this approach, often we only need the table for a short-term analysis task anyway.
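The same idea can be tried out quickly with Python’s built-in sqlite3 module. This is only a sketch under a couple of assumptions: it uses SQLite rather than mySQL, and SQLite requires the AS keyword (CREATE TABLE … AS SELECT), which mySQL treats as optional. The columns and rows in my_original_table are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_original_table (
        column_a INTEGER PRIMARY KEY,
        column_b TEXT,
        column_c TEXT,
        column_d REAL
    );
    CREATE INDEX idx_original_b ON my_original_table (column_b);
    INSERT INTO my_original_table VALUES (1, 'x', 'skip', 1.5), (2, 'y', 'skip', 2.5);

    -- Structure and data of the new table both come from the SELECT.
    CREATE TABLE my_new_working_table AS
        SELECT column_a, column_b, column_d FROM my_original_table;
""")

# Column names the new table ended up with (second field of table_info).
new_columns = [row[1] for row in conn.execute("PRAGMA table_info(my_new_working_table)")]
new_rows = conn.execute("SELECT * FROM my_new_working_table ORDER BY column_a").fetchall()
# Indexes are not carried over from the original table.
new_indexes = conn.execute("PRAGMA index_list(my_new_working_table)").fetchall()

print(new_columns)  # ['column_a', 'column_b', 'column_d']
print(new_rows)     # [(1, 'x', 1.5), (2, 'y', 2.5)]
print(new_indexes)  # []
```

The new table gets the selected columns and rows but none of the original’s indexes, which matches the behaviour described above.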

Written by andrewlocatelliwoodcock

May 21, 2012 at 21:39

Posted in Databases, mySQL, SQL

In mySQL, how do you find all tables that have foreign key constraints against another table

The situation is that I need to know which tables in a database hold a foreign key constraint against a particular table, let’s call it TableX. I need to know this because I am planning to rename and retire TableX and, in mySQL at least, the foreign key constraints follow the table rename. So really, I’m looking for metadata about my database and searching it for references to TableX.

It turns out that this is actually pretty simple:

USE information_schema;
SELECT * FROM KEY_COLUMN_USAGE WHERE REFERENCED_TABLE_NAME = 'TableX' AND TABLE_SCHEMA='[my database name]';

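This query will give you all the foreign keys in the named database that reference TableX. Finding all foreign keys in the database is just as simple (although I can’t think of a use case for it at the moment). KEY_COLUMN_USAGE also lists primary and unique keys, so filter on REFERENCED_TABLE_NAME, which is NULL for anything that isn’t a foreign key:

USE information_schema;
SELECT * FROM KEY_COLUMN_USAGE WHERE REFERENCED_TABLE_NAME IS NOT NULL AND TABLE_SCHEMA='[my database name]';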

Simples. Once you know how …
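The same kind of metadata search can be sketched outside mySQL. SQLite has no information_schema, so the example below swaps in a different mechanism, PRAGMA foreign_key_list, and runs it through Python’s built-in sqlite3 module. The Orders and Notes tables and the tables_referencing helper are hypothetical names made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableX (Id INTEGER PRIMARY KEY);
    CREATE TABLE Orders (Id INTEGER PRIMARY KEY, XId INTEGER REFERENCES TableX (Id));
    CREATE TABLE Notes  (Id INTEGER PRIMARY KEY, Body TEXT);
""")

def tables_referencing(conn, target):
    """Return (table, column) pairs whose foreign key points at `target`."""
    found = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA arguments cannot be bound as parameters, so quote the name.
        quoted = '"' + table.replace('"', '""') + '"'
        for fk in conn.execute(f"PRAGMA foreign_key_list({quoted})"):
            # fk fields: id, seq, referenced table, from column, to column, ...
            if fk[2].lower() == target.lower():
                found.append((table, fk[3]))
    return found

referencing = tables_referencing(conn, "TableX")
print(referencing)  # [('Orders', 'XId')]
```

Only Orders is reported, since Notes has no foreign key against TableX.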

Written by andrewlocatelliwoodcock

May 8, 2012 at 21:53

Posted in Databases, mySQL, SQL

Introduction to SQL Injection

I wrote a post recently about creating flexible WHERE clauses in stored procedures to support, amongst other things, requirements for flexible search functionality without resorting to building and executing dynamic SQL against the database. I realised afterwards that the concept of SQL injection is still not universally understood and that it would be a good idea to write a post explaining what it is, what threat it poses and how to guard against it. I will also reveal the worst, most SQL injection friendly piece of code I have ever come across …

SQL injection is a technique whereby an attacker manages to get your database to run SQL of their choosing thereby gaining access to your complete user list, causing the database server to drop a table or even a database and generally causing mayhem. Typically an attacker will append extra instructions or conditions onto a valid SQL statement and when that statement is executed so is the attacker’s extra code. An attacker gaining access to your user account table is bad enough but imagine if you’ve also been sloppy enough to have stored passwords, credit card details, etc. …
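As a concrete illustration of “appending extra conditions onto a valid SQL statement”, here’s a minimal sketch using Python’s built-in sqlite3 module. The Users table, its rows and both function names are hypothetical; the point is only the contrast between pasting input into the SQL text and passing it as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (Id INTEGER PRIMARY KEY, Name TEXT, Secret TEXT);
    INSERT INTO Users (Name, Secret) VALUES ('alice', 's1'), ('bob', 's2');
""")

def find_user_unsafe(conn, name):
    # Vulnerable: the input becomes part of the SQL text itself.
    sql = "SELECT Name, Secret FROM Users WHERE Name = '" + name + "' ORDER BY Id"
    return conn.execute(sql).fetchall()

def find_user_safe(conn, name):
    # Safe: the input is passed as a bound value and is never parsed as SQL.
    sql = "SELECT Name, Secret FROM Users WHERE Name = ? ORDER BY Id"
    return conn.execute(sql, (name,)).fetchall()

# The attacker closes the string literal and adds a condition that is always true.
attack = "nobody' OR '1'='1"

leaked = find_user_unsafe(conn, attack)
blocked = find_user_safe(conn, attack)

print(leaked)   # [('alice', 's1'), ('bob', 's2')]
print(blocked)  # []
```

The concatenated version hands over every row in the table, while the parameterised version simply looks for a user whose name is the whole attack string and finds nobody.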

Written by andrewlocatelliwoodcock

June 1, 2011 at 19:55

Posted in SQL Server, T-SQL

Flexible Search using Stored Procedures: varying the WHERE clause at run-time

Stored Procedures offer a number of advantages when developing databases, perhaps chief among them performance and security. I am loath to lose these advantages but, because they are pre-compiled blocks of SQL code, it can be challenging to develop Stored Procedures that can meet requirements for very flexible functionality such as search.

Over the years, I have seen a number of attempts to provide flexible search functionality. The worst of these involved building a SQL script on the client machine and then passing that script to the server for execution … SQL injection, anyone? Just as bad were stored procedures where the entire WHERE clause was built on the client before being executed on the server: simply passing the WHERE clause as a parameter to the stored procedure does not prevent SQL injection attacks. Another method is to create a series of stored procedures, one for each combination of search terms, used in conjunction with business logic which decides which search stored procedure to run. This approach prevents SQL injection attacks (unless the stored procedure is spectacularly badly written) and also means that each combination of search terms has its own execution plan computed. The main disadvantage is that the number of stored procedures required roughly doubles with each additional search term. In these situations, the technique I prefer sacrifices some performance in order to retain security and improve flexibility.
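The details of that technique are not included in this excerpt, so the following is only a sketch of one widely used pattern that fits the description, not necessarily the one the full post goes on to present: a single fixed, parameterised statement in which each search term is optional, so a term passed as NULL is simply ignored. It is shown with Python’s built-in sqlite3 module rather than a T-SQL stored procedure, and the People table, its rows and the search function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (Id INTEGER PRIMARY KEY, FirstName TEXT, City TEXT);
    INSERT INTO People (FirstName, City) VALUES
        ('Ann', 'Leeds'), ('Bob', 'Leeds'), ('Ann', 'York');
""")

# One fixed statement: a NULL parameter switches its condition off.
# The SQL text never changes, so there is nothing for an attacker to inject into.
SEARCH_SQL = """
    SELECT FirstName, City
    FROM People
    WHERE (:first_name IS NULL OR FirstName = :first_name)
      AND (:city IS NULL OR City = :city)
    ORDER BY Id
"""

def search(conn, first_name=None, city=None):
    """Run the fixed search statement; omitted terms are passed as NULL."""
    params = {"first_name": first_name, "city": city}
    return conn.execute(SEARCH_SQL, params).fetchall()

print(search(conn))                                # all three rows
print(search(conn, first_name="Ann"))              # [('Ann', 'Leeds'), ('Ann', 'York')]
print(search(conn, first_name="Bob", city="Leeds"))  # [('Bob', 'Leeds')]
```

The trade-off is the one mentioned above: a single statement has to serve every combination of terms, so the database cannot tailor an execution plan to each combination.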

Written by andrewlocatelliwoodcock

May 30, 2011 at 20:37

Posted in SQL Server, T-SQL

Inserting the results of a stored procedure into a table

There’s an extremely useful feature in T-SQL that lets you insert a rowset from a stored procedure or function directly into a table or (from SQL Server 2005 onwards) a table variable.

Where this comes in really handy is that it allows you to manipulate the output of a stored procedure outside the context of that stored procedure, basically allowing better code reuse.

This feature is not undocumented and is hardly a state secret but whilst I can always remember that it exists, I can’t always remember how to use it, hence this post …

Examples:

INSERT #myTemporaryTable EXEC myStoredProcedure

INSERT myTable EXEC myStoredProcedure

INSERT @myTableVariable EXEC myStoredProcedure

There is one caveat however: the structure of the table must match the output of the stored procedure, so if the stored procedure is updated, this can break unrelated code …

Written by andrewlocatelliwoodcock

May 6, 2011 at 14:44

Posted in SQL Server, T-SQL