Project Update Presentation

This is a presentation of everything that has been completed in this project so far.

What is the project?

The goal of this project is to compare the reviews of three different products that are produced by the same manufacturer and sold by different merchants. The manufacturer is Ulker, a leading food company in Turkey that is famous for its sweet snacks. Ulker products can be found on Trendyol, the largest e-commerce platform in Turkey. As on Amazon, the same product can be bought from different sellers on Trendyol. We observed that when the same product is sold by different merchants, the comments raise different concerns. Therefore, we wanted to compare the comment sections of three different products.

The products:

Progress of the Project

Identify the products to be used in analysis
Implement a web scraper to obtain the comments from the website
Clean and format the data, save and store in Excel files
Implement a Django Rest Framework API and post the data
Analyze the data

Code Implemented for Web Scraping:

Utilized Selenium and BeautifulSoup

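 import time

 from bs4 import BeautifulSoup
 from selenium import webdriver
 from webdriver_manager.chrome import ChromeDriverManager

 # open the browser (url is defined elsewhere in the project)
 # Selenium 3 style call; Selenium 4 takes the driver path via a Service object
 browser = webdriver.Chrome(ChromeDriverManager().install())
 browser.maximize_window()
 browser.get(url)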

 # load the page and manually sort comments,
 # then scroll down automatically
 # (loop, the number of scroll steps, is set elsewhere)
 time.sleep(10)
 i = 0
 while i < loop:
     browser.execute_script("window.scrollBy(0, 700);")
     time.sleep(0.8)
     i += 1
     django_logger.debug(f' Loop number: {i}')

 # finish scrolling before reading the page source,
 # so the parsed HTML includes everything loaded so far
 browser.execute_script("window.scrollBy(0, 400);")
 time.sleep(1)

 html = browser.page_source
 soup = BeautifulSoup(html, 'lxml')

 # Get the seller, its score, the competing sellers and the reviews
 brand = soup.find('div', class_='seller-name-text')
 brand_score = soup.find('div', class_='sl-pn')
 competitors = soup.find_all('div', class_='merchant-name-container')
 reviews = soup.find_all('div', class_='rnr-com-w')

 # store data in lists (revs_arr, dates_arr and shops_arr are created elsewhere)
 for i, r in enumerate(reviews, start=1):
     django_logger.debug(f' Review number: {i}')

     # review text
     review_div = r.find_all('div', class_='rnr-com-tx')
     for r_div in review_div:
         django_logger.debug(f' Review text: {r_div.text}')
         revs_arr.append(r_div.text)

     # review date: the part after the '|' in the user line
     review_date = r.find_all('span', class_='rnr-com-usr')
     for r_date in review_date:
         date_review = r_date.text.split('|')[1]
         django_logger.debug(f' Review date: {date_review}')
         dates_arr.append(convert(date_review))

     # seller the review was written for
     review_shop = r.find_all('span', class_='seller-name-info')
     for r_shop in review_shop:
         django_logger.debug(f'store: {r_shop.text}')
         shops_arr.append(r_shop.text)
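
The `convert` helper called in the loop above is not shown in this post. Here is a minimal sketch of what it could look like, assuming the scraped date is written as day, Turkish month name, and year (for example "12 Mart 2022"). That format is an assumption, so the parsing would need to be adjusted to whatever the site actually returns.

```python
from datetime import date

# Turkish month names mapped to month numbers.
TURKISH_MONTHS = {
    "Ocak": 1, "Şubat": 2, "Mart": 3, "Nisan": 4, "Mayıs": 5, "Haziran": 6,
    "Temmuz": 7, "Ağustos": 8, "Eylül": 9, "Ekim": 10, "Kasım": 11, "Aralık": 12,
}


def convert(raw: str) -> date:
    """Parse a date such as ' 12 Mart 2022 ' (assumed format) into a date object."""
    # split() with no arguments also drops the leading and trailing whitespace
    day, month_name, year = raw.split()
    return date(int(year), TURKISH_MONTHS[month_name], int(day))


print(convert(" 12 Mart 2022 "))  # 2022-03-12
```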

Here is an example screenshot from the comments that were scraped:

[Screenshot: example of the scraped comments]

Here is a screenshot showing how the data is stored in Excel Files:

[Screenshot: the data stored in an Excel file]
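
The cleaning step between scraping and saving can be sketched as follows. The function name `build_rows`, the column names, and the sample values are made up for illustration. Note that `zip` stops at the shortest list, so a review with a missing date or seller would shift the pairing and should be checked separately.

```python
def build_rows(reviews, dates, shops):
    """Pair up the scraped lists and tidy the text of each review."""
    rows = []
    for text, day, shop in zip(reviews, dates, shops):
        cleaned = " ".join(text.split())  # collapse newlines and repeated spaces
        if not cleaned:
            continue  # skip comments that contain no text
        rows.append({"review": cleaned, "date": day, "seller": shop.strip()})
    return rows


# Made-up sample values, only to show the shape of the result.
rows = build_rows(
    ["  Çok  taze,\n hızlı kargo ", ""],
    ["2022-03-12", "2022-03-13"],
    [" Seller A ", "Seller B"],
)
print(rows)
```

If pandas is used for the Excel step, `pandas.DataFrame(rows).to_excel("reviews.xlsx", index=False)` would write these rows to a sheet. The file name here is only an example.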

Complications and Problems:

Sorting the comments: After the browser opens automatically, we have to sort the comments manually from the drop-down menu. This step will need to be automated in order to fully automate the process later in the project.
Obtaining the rating of the product in each review: The HTML for the stars is complicated to parse. Each star has an empty and a full component, and both components are present whether or not the star is filled. I am now looking for a way to count the stars that are actually full.
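
One possible way to count the full stars is sketched below. It assumes that the "full" component of each star carries an inline `width` style, with 0% for an unfilled star. The markup in `SAMPLE` and the class names `star-w`, `empty` and `full` are hypothetical stand-ins, so the real page needs to be inspected before relying on this.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: every star has both parts, and only the inline
# width of the "full" part tells whether the star is filled.
SAMPLE = """
<div class="ratings">
  <div class="star-w"><div class="empty"></div><div class="full" style="width: 100%;"></div></div>
  <div class="star-w"><div class="empty"></div><div class="full" style="width: 100%;"></div></div>
  <div class="star-w"><div class="empty"></div><div class="full" style="width: 100%;"></div></div>
  <div class="star-w"><div class="empty"></div><div class="full" style="width: 0%;"></div></div>
  <div class="star-w"><div class="empty"></div><div class="full" style="width: 0%;"></div></div>
</div>
"""

# Matches "width: N%" but not "max-width: N%" or "min-width: N%".
WIDTH = re.compile(r"(?<![\w-])width\s*:\s*(\d+(?:\.\d+)?)%")


def count_full_stars(review) -> int:
    """Count the 'full' star parts whose inline width is greater than 0%."""
    count = 0
    for part in review.find_all("div", class_="full"):
        match = WIDTH.search(part.get("style", ""))
        if match and float(match.group(1)) > 0:
            count += 1
    return count


soup = BeautifulSoup(SAMPLE, "html.parser")
print(count_full_stars(soup))  # 3
```

In the scraper, `count_full_stars(r)` could be called for each review `r` inside the existing loop.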

Next Steps:

Build an API to post the data tables
- Any recommendations?
Automate the process of scraping the data and posting to the API
Analyze the differences in the data (waiting on further information from my supervisor)
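
As a starting point for the posting step, here is a minimal sketch that wraps one review in a JSON POST request using only the standard library. The endpoint URL and the field names are hypothetical, since the API has not been built yet, and the example only constructs the request without sending it.

```python
import json
from urllib.request import Request

# Hypothetical endpoint; the real API route is not defined in this post.
API_URL = "http://localhost:8000/api/reviews/"


def build_post_request(row: dict, url: str = API_URL) -> Request:
    """Wrap one review row in a JSON POST request (nothing is sent here)."""
    body = json.dumps(row, ensure_ascii=False).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


request = build_post_request(
    {"review": "Çok taze", "date": "2022-03-12", "seller": "Seller A"}
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to check before the scraper is connected to a live API.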

Serra Goker
