1 minute read

Update Week 2

Tools to be used for the project

As mentioned in the proposal, this project will utilize web scraping tools to obtain data from websites. Web scraping refers to the automated process of extracting data from website using software tools. For the scope of this project I will utilize these tools:

As the webpage that I am scraping has a dynamic comments section, the ChromeDriver package from Selenium opens up a chrome browser run by the script. This script scrolls down to the comments section of the product. With the sleep functionality of chromedriver, the browser freezes for 3 seconds before proceeding to scraping the comments. During these three seconds, I click the sort comments by date dropdown to ease the process of refining the latest comments. This is currently a loophole I am intending to fix, so that the scraping can be fully automated.

After sorting the comments, utilizing beautifulsoup4, I parsed the date, the text of the comment, and the seller name. Another challenge that arose during this process was parsing the score of the review. The score of the review is represented in the html as star icons filled in. I will be working furter on this challenge to come up with a way to parse the scores as well.

To-do for next time:

  • automate date sorting
  • parse review scores

Updated: