Books

28.12.2018 - Jay M. Patel

Getting Structured Data from Internet: Web Scraping and REST APIs

ISBN-10: [TBA]
ISBN-13: [TBA]
Paperback: est. 2019

Leanpub

About the Book

This book will teach you web scraping so you can quickly collect large amounts of freely available data from the web in a structured format. You’ll learn to write Python scripts that not only access free APIs to get structured data from websites such as Twitter, but also scrape data from any HTML or JavaScript page and convert it into Excel, CSV, or the SQL database of your choice. We will go beyond the basics of web scraping and cover advanced topics such as natural language processing and text analytics, which let you extract top keywords, text summaries, names of people and places, email addresses, contact details, and more from a page. All the code used in the book will be available to help you understand the concepts in practice and write your own web scraper.

Table of contents

  1. Introduction to web scraping: Why is web scraping essential, and who uses it?

  2. Introduction to web services for getting structured data

    • 2.1 Getting data from Twitter APIs
    • 2.2 Getting stock market data from Alpha Vantage

  3. Web scraping in Python using the Beautiful Soup library

    • 3.1 Tags and structure of HTML documents
    • 3.2 Cascading style sheets (CSS)
    • 3.3 Building your first scraper with Beautiful Soup
    • 3.4 Scraping an HTML table into a pandas DataFrame
    • 3.5 Scraping XML files from clinicaltrials.gov

  4. Using Selenium to scrape JavaScript-rendered pages

  5. Advanced Topics

    • 5.1 Boilerplate text removal
    • 5.2 Solving captchas
    • 5.3 Extracting top keywords and summarizing text from scraped documents
    • 5.4 Extracting names and other entities from scraped documents

Green Chemistry Education: Recent Developments

ISBN 978-3-11-056649-9 (PDF)
ISBN 978-3-11-056588-1 (EPUB)
ISBN 978-3-11-056578-2 (Hardcover)
Hardcover: 219 pages
de Gruyter (December 17, 2018)


About the Book

This book covers recent advances in green chemistry, including the application of cheminformatics, quantitative structure-activity relationships (QSARs), and statistical approaches to modeling chemical reactivity. With my co-authors, I contributed a chapter on using machine learning and knowledge-based systems, which are currently used by the US Environmental Protection Agency, to predict the environmental degradation of organic chemicals:

  • Mills T., Patel J.M., Stevens C.T. (2018). The environmental fate of synthetic organic chemicals.