Articles
Project walkthroughs, tool teardowns, interviews, and more.
Articles tagged: scraping
-
Running scrapers on GitHub to simplify your workflow
By Iris Lee
How the LAT Data and Graphics team uses GitHub Actions to keep code and data in one place, and track scraper history for free.
-
How to Save DNAInfo/Gothamist Bylines
By Erin Kissane
The owner of the DNAInfo and Gothamist family of local news websites shut the sites down today, which means that not only are all their 115 journalists out of work, but all their bylines, and all the vital information in their years of reporting, are gone.
-
How We Tracked Cable News Chyrons
By Kevin Schaul
Reporting on media bias and the bubbles it creates is nothing new. But last week’s Senate Intelligence Committee hearing provided a rare opportunity to explore a new angle. CNN, MSNBC, and Fox News all aired former FBI director James Comey’s testimony live and uninterrupted. The graphics team at The Washington Post tracked what each network displayed in its lower-third caption panel, also called a chyron, and showed it to readers as the hearing unfolded. (You can see the finished piece here.)
-
The Twitterverse of Donald Trump, In 26,234 Tweets
By Lam Thuy Vo
We wanted to get a better idea of where President-elect Donald Trump gets his information. So we analyzed everything he has tweeted since he launched his campaign to take a look at the links he has shared and the news sources they came from. But first, we had to get the tweets.
-
Tracking Amtrak 188
By Michael Keller
How curiosity and tinkering let Al Jazeera America publish historical data for a derailed train’s route without Amtrak’s cooperation.
-
Scraping Nevada
By Derek Willis
Derek Willis breaks down the three stages of scraping (denial, annoyance, and acceptance) while confronting the election-results form from hell.
-
To Scrape, Perchance to Tweet
By Abe Epton
At the Chicago Tribune, we had a simple goal: to automatically tweet contributions to Illinois politicians of $1,000 or more, which campaigns are required to report within five business days. To see, in something approximating real time, which campaigns are bringing in the big bucks and who those big-buck-bearers are. The Illinois State Board of Elections (ISBE) has helpfully published exactly this data for years online, in a format that appears to have changed very little since at least the mid-2000s. There’s no API for this data, but the stability of the format is encouraging. A scraper is hardly an ideal tool for anything intended to last for a while and produce public-facing data, but if we can count on the format of the page not to change much over at least the next several months, it’s probably worth it.