In recent years, the travel industry in developing countries has seen a sharp rise in the number of travelers choosing to fly, driven by the growing global middle class. This has led to many online portals that let users search for flights, compare options, and book tickets. These portals also let everyday travelers, travel agents, and tour operators keep watch on flight prices so they can pick the best time to travel for themselves and their clients, saving everyone money. Flight prices change often with demand and availability, so it is useful to track these prices and analyze them for further insights. To that end, I have created a web scraper to fetch useful flight data. Let’s get started.
In this project, I used Python, Selenium, and the Chrome webdriver to scrape flight data from the MakeMyTrip portal. The technologies were chosen based on the requirements and ease of execution. Python is used to code the web scraper, while Selenium and the Chrome webdriver simulate user interaction in the browser. The scraper’s logic can be divided into two parts. In part one, all the required data is fetched using Selenium and the Chrome webdriver. In part two, the fetched data is parsed using BeautifulSoup and written to a CSV file.
Please note that this project assumes proper installation of Python 3.7, Selenium and Chrome webdriver (corresponding to your installed version of Chrome).
First, we go to the MakeMyTrip website and search for a flight. After a few searches, we notice that the results URL follows a pattern that we can reuse. For example, a flight from Mumbai to Delhi on 30 May 2019 produces the following URL in the address bar:
This URL follows the pattern:
"https://www.makemytrip.com/flight/search?itinerary=" + origin + "-" + destination + "-" + travelDate + "&tripType=O&paxType=A-1_C-0_I-0&intl=false&=&cabinClass=E"
We can now use this URL pattern to fetch data for any flight, given the origin and destination airport codes and the travel date. Let’s get coding.
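As a minimal sketch of the pattern above, the helper below assembles the search URL from its three variable parts. The function name `build_search_url` is my own, the airport codes `BOM` and `DEL` are only example inputs, and the `DD/MM/YYYY` date format is an assumption.

```python
BASE_URL = "https://www.makemytrip.com/flight/search"


def build_search_url(origin, destination, travel_date):
    """Build a search URL following the pattern shown above.

    origin and destination are airport codes (for example "BOM" and "DEL").
    travel_date is assumed to be a DD/MM/YYYY string.
    """
    itinerary = f"{origin}-{destination}-{travel_date}"
    # The remaining query parameters are copied unchanged from the pattern.
    return (
        f"{BASE_URL}?itinerary={itinerary}"
        "&tripType=O&paxType=A-1_C-0_I-0&intl=false&=&cabinClass=E"
    )


print(build_search_url("BOM", "DEL", "30/05/2019"))
```

Keeping the fixed query parameters in one place means only the itinerary changes from one search to the next.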
First, we do all the necessary module imports in our Python script. We need the Selenium, csv, and BeautifulSoup modules for this task. Let us import them.
Then, we define variables to hold the origin city code, the destination city code, the travel date (taken as a cmd/terminal input), and the base URL for fetching data using the URL pattern above.
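One way to take the travel date from the terminal is sketched below. It assumes the date is passed as the first command-line argument in `DD/MM/YYYY` form. The helper name `read_travel_date` and the validation step are my own additions, not necessarily how the original script reads its input.

```python
from datetime import datetime


def read_travel_date(argv):
    """Return the travel date given on the command line.

    argv is expected to look like sys.argv: [script_name, "DD/MM/YYYY"].
    """
    if len(argv) < 2:
        raise SystemExit("usage: python <script> DD/MM/YYYY")
    tr_date = argv[1]
    # strptime raises ValueError for malformed or impossible dates.
    datetime.strptime(tr_date, "%d/%m/%Y")
    return tr_date


# Example values; in a real run you would pass sys.argv instead of a literal list.
origin = "BOM"
destin = "DEL"
tr_date = read_travel_date(["scraper.py", "30/05/2019"])
print(origin, destin, tr_date)
```

Validating the date before the browser opens avoids launching Chrome for a search that cannot succeed.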
Now, we request the base URL. This opens Chrome and starts simulating user behavior: you can see the browser navigate to the base URL and begin loading data. The page loads more results as you scroll, so we need to keep scrolling to the bottom of the page until all the data is in the DOM. This is where Selenium comes in, simulating a user who waits and scrolls to the bottom of the page.
Once the data is fully loaded, we fetch the page’s inner HTML and assign it to a variable for parsing and data extraction. After this, the Chrome browser window is closed.
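The scroll-and-fetch step described above could be sketched as follows. This is one common approach, not necessarily the one the original script uses: scroll, pause, and stop once the page height no longer grows. The function names, the two-second pause, and the cap on scroll rounds are all assumptions.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=30):
    """Scroll until the page height stops growing or max_rounds is reached."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded results time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so we reached the bottom
        last_height = new_height


def fetch_page_html(url, pause=2.0):
    """Open url in Chrome, scroll to the bottom, and return the body's inner HTML."""
    # Imported here so scroll_to_bottom can be exercised without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes a chromedriver matching your Chrome version
    try:
        driver.get(url)
        scroll_to_bottom(driver, pause=pause)
        return driver.execute_script("return document.body.innerHTML")
    finally:
        driver.quit()  # close the browser even if loading fails
```

The `try`/`finally` ensures the Chrome window is closed whether or not the page loads cleanly.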
Part two – Data Parsing and Saving to File:
In this part, we parse the HTML fetched in part one using the BeautifulSoup module in Python. The HTML tags that contain the required flight data are then selected from the parsed document.
Next, we create a list of lists to hold the extracted data. For each collected tag, the text is extracted and appended as one row to this list of lists.
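To make the parsing step concrete, here is a small sketch that runs on a made-up HTML snippet. The tag and class names (`flight-card`, `airline`, `depart`, `arrive`, `price`) are placeholders of my own, not MakeMyTrip’s actual markup. You would need to inspect the live page and substitute the real selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: these class names are placeholders, not the portal's real ones.
SAMPLE_HTML = """
<div class="flight-card">
  <span class="airline">Air Example</span>
  <span class="depart">06:00</span>
  <span class="arrive">08:10</span>
  <span class="price">3,499</span>
</div>
<div class="flight-card">
  <span class="airline">Sample Air</span>
  <span class="depart">09:30</span>
  <span class="arrive">11:45</span>
  <span class="price">4,120</span>
</div>
"""

FIELDS = ["airline", "depart", "arrive", "price"]


def extract_flights(html):
    """Return one list per flight card, with the text of each field in FIELDS."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="flight-card"):
        row = []
        for field in FIELDS:
            tag = card.find("span", class_=field)
            # Use an empty string when a field is missing so rows stay aligned.
            row.append(tag.get_text(strip=True) if tag else "")
        rows.append(row)
    return rows


print(extract_flights(SAMPLE_HTML))
```

Filling missing fields with an empty string keeps every row the same length, which makes the later CSV step straightforward.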
Finally, all of this data will be written to a CSV file and stored in the same directory.
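Writing the list of lists to disk takes only a few lines with the standard `csv` module. In the sketch below, the function name `write_flights_csv` and the header columns are assumptions chosen for illustration.

```python
import csv


def write_flights_csv(path, rows, header=("Airline", "Departure", "Arrival", "Price")):
    """Write a header row followed by one CSV row per flight."""
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

The `csv` writer quotes values that contain commas, such as a price written as `3,499`, so they survive a round trip through the file.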
All of this logic is wrapped in a try-except block so that any exception is caught and reported for resolution. On successful execution, you will see a CSV file in the working directory whose name is built as "FlightsData_" + origin + "-" + destin + "-" + trDate.split("/")[0] + "-" + trDate.split("/")[1] + "-" + trDate.split("/")[2] + ".csv". It should look as follows:
The full script is available on GitHub here:
We can run this script by executing the following command in the cmd/terminal:
Upon successful code execution, the command line will have the following output on screen.
This script can easily be changed to take command-line inputs for the origin and destination cities. You may wish to run it for multiple dates and multiple locations, or schedule it as a cron job to gather flight data automatically on your system. In short, the script can easily be adapted to custom tasks. Please comment on this post with any questions or related issues.