Portfolio Project powered by Infare

Web Data Extraction Robot Development

This task is designed to evaluate your ability to develop a web data extraction robot, a crucial skill for data collection and analysis in many industries, especially travel, e-commerce, and market research. By completing this task, you'll demonstrate your proficiency in navigating web-based APIs, parsing JSON data, and organizing extracted information into a structured format.

In this context, the ability to extract flight information from a JSON-based API is particularly relevant to Infare's work of collecting real-time data about flight routes, prices, and availability. Mastering web data extraction techniques empowers data professionals to automate data collection, enabling faster decision-making and deeper insights.

Through this task, we aim to assess several key skills:

  • Technical Proficiency
  • Data Analysis
  • Problem-Solving
  • Attention to Detail

Using the given URL, http://homeworktask.infare.lt/, create a web data extraction robot that collects roundtrip flight information from the API behind it and saves it to a local CSV file.

Suggested steps:

    1. Page Load and JSON Retrieval: Have your program send the page request and retrieve the JSON data from the response.

    2. Extract Flight Data: Extract outbound and inbound flight data for flights from MAD to AUH, regardless of dates.

    3. Flight Combinations: Generate roundtrip flight combinations of outbound and inbound flights for each price category.

    4. Price and Tax Calculation: Extract all available prices and calculate taxes for each roundtrip flight combination.

    5. Parameter Flexibility: Ensure your program can work with any search parameter set, such as origin, destination, and dates.

    6. CSV Data Export: Save the extracted data in CSV format.
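
Steps 3 and 4 can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names ("category", "direction", "tax"), the per-category price mapping, and the sample values are all made up here, and the real JSON returned by the API may be organized differently. The sketch assumes each journey belongs to one price category and carries its own tax amount, so the tax of a roundtrip is the sum of its outbound and inbound taxes.

```python
from itertools import product


def roundtrip_combinations(journeys, prices):
    """Pair every outbound journey with every inbound journey of the same price category.

    `journeys` and `prices` use hypothetical field names; adapt them to the real JSON.
    """
    combos = []
    for category, total_price in prices.items():
        outbound = [j for j in journeys
                    if j["category"] == category and j["direction"] == "outbound"]
        inbound = [j for j in journeys
                   if j["category"] == category and j["direction"] == "inbound"]
        # Cartesian product: each outbound option combined with each inbound option.
        for out, back in product(outbound, inbound):
            combos.append({
                "category": category,
                "outbound": out,
                "inbound": back,
                "price": total_price,
                # Assumption: roundtrip tax = outbound tax + inbound tax.
                "taxes": round(out["tax"] + back["tax"], 2),
            })
    return combos


# Made-up sample data, not taken from the API.
journeys = [
    {"category": 1, "direction": "outbound", "flights": ["XX100"], "tax": 10.5},
    {"category": 1, "direction": "outbound", "flights": ["XX102"], "tax": 10.5},
    {"category": 1, "direction": "inbound", "flights": ["XX101"], "tax": 12.25},
    {"category": 2, "direction": "outbound", "flights": ["XX104"], "tax": 11.0},
    {"category": 2, "direction": "inbound", "flights": ["XX105"], "tax": 12.0},
]
prices = {1: 450.0, 2: 520.0}

combos = roundtrip_combinations(journeys, prices)
for combo in combos:
    print(combo["category"], combo["outbound"]["flights"],
          combo["inbound"]["flights"], combo["price"], combo["taxes"])
```

With this sample, category 1 yields two combinations (two outbound options, one inbound option) and category 2 yields one.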

    Data to be extracted (CSV example visible on the start page):

    • Departure and arrival airport three-letter IATA codes for each flight (including connections).
    • Departure and arrival dates with times for each flight (including connections).
    • Flight numbers of each flight (two-character airline designator followed by the flight number in digits, e.g. BA4040).
    • All roundtrip flight combinations with price.
    • Roundtrip flight combination total taxes. 
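
One possible way to flatten the fields listed above into CSV rows is sketched below. The column names, the leg dictionary keys, the limit of two legs per direction, and the sample values are assumptions made for illustration; the real column layout should be copied from the CSV example on the start page.

```python
import csv
import io

# Hypothetical keys for one flight leg; empty values pad missing connections.
EMPTY_LEG = dict.fromkeys(("from", "to", "departs", "arrives", "number"), "")


def leg_columns(prefix, leg):
    """Map one flight leg onto prefixed CSV columns (column names are assumptions)."""
    return {
        f"{prefix}_airport_departure": leg["from"],
        f"{prefix}_airport_arrival": leg["to"],
        f"{prefix}_time_departure": leg["departs"],
        f"{prefix}_time_arrival": leg["arrives"],
        f"{prefix}_flight_number": leg["number"],
    }


def combination_to_row(combo, max_legs=2):
    """Flatten one roundtrip combination into a single flat row."""
    row = {"price": combo["price"], "taxes": combo["taxes"]}
    for direction in ("outbound", "inbound"):
        legs = combo[direction]
        for i in range(max_legs):
            leg = legs[i] if i < len(legs) else EMPTY_LEG
            row.update(leg_columns(f"{direction}_{i + 1}", leg))
    return row


def write_csv(rows, fh):
    """Write rows to an open text file object, header first."""
    writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# Made-up combination: direct outbound flight, inbound flight with one connection.
combo = {
    "price": 450.0,
    "taxes": 22.75,
    "outbound": [
        {"from": "MAD", "to": "AUH", "departs": "2030-01-10 09:00",
         "arrives": "2030-01-10 18:30", "number": "XX100"},
    ],
    "inbound": [
        {"from": "AUH", "to": "XXX", "departs": "2030-01-17 08:00",
         "arrives": "2030-01-17 12:00", "number": "XX201"},
        {"from": "XXX", "to": "MAD", "departs": "2030-01-17 14:00",
         "arrives": "2030-01-17 16:30", "number": "XX202"},
    ],
}

buffer = io.StringIO()
write_csv([combination_to_row(combo)], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To save to disk instead of an in-memory buffer, the same `write_csv` call can be given a file opened with `open("flights.csv", "w", newline="", encoding="utf-8")` (the file name is arbitrary).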

      1. The website acts as a front-end for a JSON-based API. The result page doesn't have a front-end, so to better understand the JSON data, use the front-end example provided on the start page. The example visualizes the data you will get from the API with the given search parameters.

      2. Feel free to use any coding language.

      3. Prefer HTTP requests over headless browsers.

      4. Data needs to be saved in CSV format. The CSV data structure should be the same as in the CSV example file provided on the start page.

      5. Suggested completion time is 10-15 hours.

      6. If you want the Infare team to review your work and provide feedback, please send your completed task to jzu@infare.com (Subject Line – Women go tech task for review_Your Name Surname).
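
A plain HTTP request with flexible search parameters might look like the sketch below, using only the Python standard library. The endpoint path (`search.php`) and the query parameter names (`from`, `to`, `depart`, `return`) are assumptions, not confirmed by this brief; inspect the request the start page actually sends (for example in the browser's developer tools or in Fiddler) and adjust them accordingly.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://homeworktask.infare.lt/"
SEARCH_PATH = "search.php"  # hypothetical endpoint path, verify against the real site


def build_search_url(origin, destination, depart_date, return_date):
    """Build the request URL from search parameters (parameter names are assumptions)."""
    query = urlencode({
        "from": origin,
        "to": destination,
        "depart": depart_date,
        "return": return_date,
    })
    return f"{BASE_URL}{SEARCH_PATH}?{query}"


def fetch_json(url, timeout=30):
    """Send a GET request and decode the JSON response body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


url = build_search_url("MAD", "AUH", "2030-01-10", "2030-01-17")
print(url)
# data = fetch_json(url)  # uncomment to perform the actual network request
```

Keeping URL construction separate from the request makes it easy to run the same robot for any origin, destination, and date pair.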

      Recommended Resources

      • The HttpClient library for making HTTP requests if using C#, or a similar library in other languages.
      • The Newtonsoft library (Json.NET) for working with JSON if using C#, or a similar library in other languages.
      • JSON Viewer as a visual JSON viewer.
      • Telerik Fiddler for tracking network traffic.