Data scientists work with time series data daily, so being able to manipulate and analyze it is a core part of the job. SQL window functions are built for this kind of work, and they come up often in data science interviews. So let's talk about what time series data is, when to use it, and how to use window functions to manage it.
What is time series data?
Time series data is data in which each value is tied to a point in time: every observation carries a date, a time, or both. Here are some examples of time series data:

• The daily price of a company's shares, because each price is associated with a specific day
• The daily value of a stock index over the past several years, because each value is assigned to a specific day
• Unique visits to a website over a month
• Daily platform logs
• Monthly sales and revenue
• Daily logins to an application
LAG and LEAD window functions
When dealing with time series data, a common task is to calculate growth or averages over time. To do that, you need to compare each row's value with the value from an earlier or a later date.

Two window functions that let you do this are LAG and LEAD, which are extremely useful for time-related data. The main difference between them is direction: LAG returns data from preceding rows, while LEAD returns data from subsequent rows.
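To make the difference concrete, here is a minimal sketch that runs both functions through Python's built-in sqlite3 module, so it can be tried without a database server. The daily_price table and its three rows are hypothetical sample data, and the sketch assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical sample data: one closing price per day.
rows = [
    ("2024-01-01", 100.0),
    ("2024-01-02", 102.0),
    ("2024-01-03", 101.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_price (day TEXT, price REAL)")
conn.executemany("INSERT INTO daily_price VALUES (?, ?)", rows)

# LAG looks one row back, LEAD looks one row ahead, both in date order.
query = """
SELECT day,
       price,
       LAG(price, 1)  OVER (ORDER BY day) AS prev_price,
       LEAD(price, 1) OVER (ORDER BY day) AS next_price
FROM daily_price
ORDER BY day
"""
lag_lead_rows = conn.execute(query).fetchall()
conn.close()

for row in lag_lead_rows:
    print(row)
```

The first row has no predecessor and the last row has no successor, so prev_price and next_price come back as NULL (None in Python) at those edges.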

We can use either function to compare growth from month to month, for example. As a data analytics professional, you will most likely work with time-related data, and knowing how to use LAG and LEAD well will make that work much faster.

A data science interview question that requires a window function
Let’s review an advanced data science SQL interview question that calls for one of these window functions. Window functions are often part of interview questions, and you will also see them a lot in your daily work, so knowing how to use them is important.

Let’s go over an Airbnb question called Airbnb growth. If you want to follow it interactively, you can do it here.

The question is to estimate Airbnb’s growth each year using the number of registered hosts as a growth metric. The growth rate is calculated by taking ((number of registered hosts in the current year – number of registered hosts in the previous year) / number of registered hosts in the previous year) * 100.

Show the year, the number of hosts in the current year, the number of hosts in the previous year, and the growth rate. Round the growth rate to the nearest percent and sort the result in ascending order by year.
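Before turning to SQL, the growth formula can be checked with plain arithmetic. The helper below is only an illustrative sketch; the name growth_rate and the sample numbers are hypothetical and not part of the original question.

```python
def growth_rate(current_hosts: int, previous_hosts: int) -> int:
    """Year-over-year growth in percent, rounded to a whole number."""
    if previous_hosts == 0:
        # The formula divides by the previous year's count.
        raise ValueError("growth is undefined when the previous year has no hosts")
    return round((current_hosts - previous_hosts) / previous_hosts * 100)


# Hypothetical counts: 120 hosts last year, 150 this year.
print(growth_rate(150, 120))  # (150 - 120) / 120 * 100 = 25
print(growth_rate(90, 120))   # (90 - 120) / 120 * 100 = -25
```

Note that Python's round() sends exact halves to the nearest even integer, so on values that end in exactly .5 it can differ from the rounding a database applies.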
Approach Step 1: Count the hosts for the current year
The first step is to count the hosts by year, so we will need to extract the year from the date values.

SELECT extract(year FROM host_since::date) AS year,
       count(id) AS host_current_year
FROM airbnb_search_details
WHERE host_since IS NOT NULL
GROUP BY extract(year FROM host_since::date)
ORDER BY year
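To try this step without a PostgreSQL server, the sketch below runs an equivalent query through Python's built-in sqlite3 module. Two things are assumptions here: the six sample rows are hypothetical, and because SQLite has no extract() function or :: cast, strftime('%Y', ...) is swapped in to pull out the year.

```python
import sqlite3

# Hypothetical sample rows; the real airbnb_search_details table has more columns.
sample_hosts = [
    (1, "2012-03-14"),
    (2, "2012-11-02"),
    (3, "2013-01-20"),
    (4, "2013-06-05"),
    (5, "2013-09-30"),
    (6, None),  # missing host_since, filtered out by the WHERE clause
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airbnb_search_details (id INTEGER, host_since TEXT)")
conn.executemany("INSERT INTO airbnb_search_details VALUES (?, ?)", sample_hosts)

# strftime('%Y', ...) stands in for PostgreSQL's extract(year FROM ...).
query = """
SELECT CAST(strftime('%Y', host_since) AS INTEGER) AS year,
       count(id) AS host_current_year
FROM airbnb_search_details
WHERE host_since IS NOT NULL
GROUP BY strftime('%Y', host_since)
ORDER BY year
"""
hosts_per_year = conn.execute(query).fetchall()
conn.close()

print(hosts_per_year)
```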
Approach Step 2: Count the hosts from the previous year
This is where the LAG window function comes in. Build a result set with three columns: the year, the number of hosts in that year, and the number of hosts in the previous year. LAG takes the previous year's count and places it in the same row as the current year's count. With both values in one row, any metric that compares them, such as a growth rate, becomes a simple expression. Here is the code for it:

SELECT year,
       host_current_year,
       LAG(host_current_year, 1) OVER (ORDER BY year) AS prev_year_host
FROM
  (SELECT extract(year FROM host_since::date) AS year,
          count(id) AS host_current_year
   FROM airbnb_search_details
   WHERE host_since IS NOT NULL
   GROUP BY extract(year FROM host_since::date)
   ORDER BY year) t1
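The effect of LAG on the yearly counts is easier to see on a tiny input. This sketch assumes the per-year counts already exist in a hypothetical hosts_per_year table (the table name and the three rows are made up for illustration) and runs the window function through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical yearly host counts, as if the per-year count had already run.
yearly_counts = [(2011, 40), (2012, 50), (2013, 80)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts_per_year (year INTEGER, host_current_year INTEGER)")
conn.executemany("INSERT INTO hosts_per_year VALUES (?, ?)", yearly_counts)

# The second argument of LAG is the offset: 1 means "one row back".
query = """
SELECT year,
       host_current_year,
       LAG(host_current_year, 1) OVER (ORDER BY year) AS prev_year_host
FROM hosts_per_year
ORDER BY year
"""
with_previous = conn.execute(query).fetchall()
conn.close()

for row in with_previous:
    print(row)
```

The earliest year has no earlier row to look back to, so its prev_year_host is NULL. LAG works on row position rather than calendar distance, so if a year were missing from the input, the "previous" value would come from the nearest earlier year that is present.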
Approach Step 3: Implement the growth metric
As mentioned above, it is much easier to implement a metric like this one when all the values are in one row, which is why we used the LAG function. Compute the growth rate and round it: round(((host_current_year - prev_year_host) / cast(prev_year_host AS numeric)) * 100) AS estimated_growth. The cast to numeric avoids integer division, which would otherwise truncate the ratio.

SELECT year,
       host_current_year,
       prev_year_host,
       round(((host_current_year - prev_year_host) / cast(prev_year_host AS numeric)) * 100) AS estimated_growth
FROM
  (SELECT year,
          host_current_year,
          LAG(host_current_year, 1) OVER (ORDER BY year) AS prev_year_host
   FROM
     (SELECT extract(year FROM host_since::date) AS year,
             count(id) AS host_current_year
      FROM airbnb_search_details
      WHERE host_since IS NOT NULL
      GROUP BY extract(year FROM host_since::date)
      ORDER BY year) t1) t2
ORDER BY year
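The three steps can be run end to end on sample data. The sketch below uses Python's built-in sqlite3 module and is an adaptation rather than the PostgreSQL query itself: the eleven host rows are hypothetical, strftime('%Y', ...) replaces extract(), and CAST(... AS REAL) replaces the cast to numeric, since SQLite supports neither PostgreSQL form.

```python
import sqlite3

# Hypothetical data: 2 hosts joined in 2011, 3 in 2012, 6 in 2013.
host_since_dates = ["2011-02-01"] * 2 + ["2012-05-10"] * 3 + ["2013-08-20"] * 6
sample_hosts = list(enumerate(host_since_dates, start=1))  # (id, host_since)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airbnb_search_details (id INTEGER, host_since TEXT)")
conn.executemany("INSERT INTO airbnb_search_details VALUES (?, ?)", sample_hosts)

query = """
SELECT year,
       host_current_year,
       prev_year_host,
       -- SQLite's ROUND returns a float, so cast back to an integer percent
       CAST(ROUND((host_current_year - prev_year_host)
                  / CAST(prev_year_host AS REAL) * 100) AS INTEGER) AS estimated_growth
FROM
  (SELECT year,
          host_current_year,
          LAG(host_current_year, 1) OVER (ORDER BY year) AS prev_year_host
   FROM
     (SELECT CAST(strftime('%Y', host_since) AS INTEGER) AS year,
             count(id) AS host_current_year
      FROM airbnb_search_details
      WHERE host_since IS NOT NULL
      GROUP BY strftime('%Y', host_since)) t1) t2
ORDER BY year
"""
growth_rows = conn.execute(query).fetchall()
conn.close()

for row in growth_rows:
    print(row)
```

The first year has no previous count, so both prev_year_host and estimated_growth are NULL for that row. The other rows follow the formula directly: (3 - 2) / 2 * 100 = 50 and (6 - 3) / 3 * 100 = 100.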