Posts Tagged: data science


8
Oct 15

SQL Beginners Cheat Sheet

So its been a while again, but push comes to shove and I’m determined to get back on to making a return to this blog. I’m on a bit of a skills development fix at the moment and decided here would be a great spot to report back on what I’ve been doing and to create some documentation on the key learnings.

I recently went through a beginners course on SQL through Code Academy (link to the course here) which I found really interesting and potentially useful for future endeavors.

SUMMARY 

What is SQL? Known as a ‘Standard Query Language’ its a method of organizing and summarizing data. My general thinking when it comes to this is that taking a structured dataset it allows the user to summarize and manipulate the data. In the case of multiple datasets, it connects them through a common tie so the user can get a better understanding of how they interact.

Why is it potentially useful? I’ve experienced SQL in the context of a data tool where the language organizes raw data in such a way that its  a readable output for the end-user. When thinking about it in terms of even a basic excel spreadsheet, the language allows for the data to be summarized and reformatted saving the user from having to do a lot of manual work.

Below is a select set of terms from the course and example code:

TERMS/COMMANDS:

; is used as an ‘End statement’ and is placed at the end of a command. It tells the system to move to the next query statement.

There are three basic functions of the SQL language: To BUILD, ANALYZE or JOIN data tables.

 BUILD:

CREATE TABLE: is a command telling the system to build a new data table.

table_name(***): labels a table for use in queries and joining with other tables.

column_1 data_type: defines what data goes into a particular column (can be text, date, number etc.) These can be layered up to create a series of columns

INSERT INTO: is a formatting command used for define the set columns. Following this, column titles can be added ex: (id, name, age).

VALUES: This is a command used to enter in individual entries. Its formatted as VALUES (row value 1, row value 2…);

UPDATE: used to change entries in a table. Using ‘SET’ and ‘WHERE’ entries, data within the table can be altered

ALTER TABLE: Where as UPDATE alters an entry, ALTER TABLE allows the user to change or add data columns.

DELETE FROM: Deletes rows from a table.

 

Example Code:

CREAT TABLE table_name (
 column_1 data_type,
 column_2 data_type,
 column_3 data_type
 );
INSERT INTO celebs (id, name, age)
VALUES (1, 'Justin Bieber', 21);
 SELECT *FROM celebs;
UPDATE celebs
 SET age = 22
 WHERE id= 1;
 SELECT *FROM celebs;
ALTER TABLE celebs ADD COLUMN
 twitter_handle TEXT;
 SELECT *FROM celebs;
DELETE FROM celebs WHERE twitter_handle IS NULL;

ANALYZE:

SELECT *FROM ‘table name': displays all data from the labeled table the SQL interface. The * is used to notate ‘select the whole table’. Other commands such as ‘SELECT DISTINCT’ can be used to summarize a table further based on a column or row entry.

BETWEEN: Is a filter option for the use of numerical data.

ORDER BY: organizes a data set. A related command DESC/ASC(aka: Descending/Ascending) can be used to organize numbers by a set parameter.

LIMIT: Sets maximum number of rows pulled from a dataset.

GROUP BY: organizes data by a specific column entry (example: type of apple).

COUNT: summarizes numerical data by a category (example: how many entries contain X)

SUM: Counts up the numerical data of a particular column in a table ( Ex: How many impressions in total).

MAX(): finds the largest value of a criteria.

MIN(): finds the smallest value of a criteria.

AVG(): finds the average of a criteria.

ROUND(): rounds decimal numbers to a specific integer.

EXAMPLE CODE: 

SELECT* FROM movies WHERE imdb_rating >8;
SELECT * FROM movies
 WHERE name LIKE 'Se_en';
SELECT *FROM movies
 ORDER BY imdb_rating DESC;

SELECT price, ROUND(AVG(downloads),2) FROM fake_apps GROUP BY price;

JOIN:

JOIN: creates a common column between two tables which joins the data set together (ex: id_actormovie JOIN ID_Actor)

ON: states the parameter of the join.

Inner Join: joins entries when a join criteria is met.

Left Join: Combines two tables together side by side and provides a NULL value when the join parameters are not met.

AS: When joining a table this allows columns of the joined table to be renamed without affecting the original unjoined tables.

EXAMPLE CODE: 

SELECT
albums.name AS 'Album', albums.year, artists.name AS 'Artist'
 FROM albums JOIN artists ON albums.artist_id = artists.id WHERE albums.year >1990