Trendlist: [New post] My Experience Writing a Long-Running PHP Script to Parse News Content From the Associated Press News API

Wednesday, October 6, 2021

My Experience Writing a Long-Running PHP Script to Parse News Content From the Associated Press News API

by Dwayne

Filed under: super-specific use case with hints of generality for those wanting to write long-running PHP scripts.

For a little while now, I have been building a site with WordPress that consumes news content from the Associated Press news API and then stores the news content in WordPress.

My first iteration of the ingest engine with PHP worked quite well, but I encountered dreaded NGINX server timeouts and other issues.

You see, a feed request to the Associated Press API works a little like this.

Make request to feed endpoint
Some items might be returned in the first request
Parse items (if any), which requires making a separate request for an NITF file
If there is a next_page link property, make a request to it

The AP API is designed around long-running requests. Using long polling, a request stays open for about 15 seconds, after which you follow the next-page link (if one exists). The idea is that you stay perpetually connected to the API, which matters most when there is breaking news.

By their nature, PHP and web servers won't let you just run a script and expect it to keep running reliably. PHP isn't a language that lends itself to long-running processes, and servers like Apache or NGINX aren't built for them either, but it can work with some patience. Many of the solutions you will find for this problem date back as far as 2010.

My solution involves creating four shell scripts for ultimate control and using native cron functionality.

A script to run our PHP application and create a process for it
A script to check on our PHP application to ensure it is running (start it if it's not)
A script to reset our PHP application
A script to stop our PHP application

You don't have to use shell scripts. You could call the commands directly from your cron entries, but shell scripts let you maintain the functionality cleanly, run status checks, and so on.

The only tool we need for our long-running PHP script is nohup. I am using Debian, where nohup is installed by default (it is part of GNU coreutils). If you don't have nohup, install it with your package manager.

The first shell script we create is the one that starts everything.

#!/bin/bash
nohup /opt/bitnami/php/bin/php -q /opt/bitnami/wordpress/fetch-news.php >/dev/null 2>&1 &

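We run our PHP script with nohup, and the trailing & puts it in the background. The >/dev/null 2>&1 redirect discards both standard output and standard error, so nothing the script prints (PHP errors included) shows up anywhere. The -q flag only suppresses HTTP header output, which matters for the CGI binary rather than the CLI.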
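Throwing away all output makes it hard to see why the parser died. Here is a minimal sketch of a variant that appends output to a log file and records the process ID in a PID file. The function name start_parser and the PID_FILE and LOG_FILE paths are my own placeholders for illustration, not part of the original setup.

```shell
#!/bin/bash
# Hypothetical variant of the start script: keeps a log and a PID file.
# Both paths are assumptions; point them wherever suits your server.
PID_FILE="${PID_FILE:-/tmp/news-parser.pid}"
LOG_FILE="${LOG_FILE:-/tmp/news-parser.log}"

start_parser() {
    # "$@" is the command to run, e.g. the php binary plus the script path.
    # Output and errors are appended to the log instead of being discarded.
    nohup "$@" >>"$LOG_FILE" 2>&1 &
    # $! holds the PID of the most recently started background job.
    echo "$!" >"$PID_FILE"
    echo "started with PID $(cat "$PID_FILE")"
}

# Example call (same command as the start script above):
# start_parser /opt/bitnami/php/bin/php /opt/bitnami/wordpress/fetch-news.php
```

With a PID file in place, the other scripts could read the ID from the file instead of searching the process list, as long as the file is removed or checked when the process is gone.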

Now, let's create a shell script that allows us to stop the process.

#!/bin/bash
PID=`ps -eaf | grep '/opt/bitnami/wordpress/fetch-news.php' | grep -v grep | awk '{print $2}'`
if [ "" != "$PID" ]
then
    echo "killing $PID"
    kill -9 $PID
else
    echo "not running"
fi

The script looks for our PHP script in the list of running processes. If it finds it, it kills the process by its process ID using kill -9 (SIGKILL); otherwise it reports that nothing is running.
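One thing to keep in mind is that kill -9 cannot be caught, so the PHP script gets no chance to finish the item it is working on. A gentler option, sketched below under my own assumptions, is to send SIGTERM first and only fall back to SIGKILL if the process is still around after a timeout. The function name stop_gracefully and the 10 second default are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: ask a process to exit, and only force it if it lingers.
stop_gracefully() {
    local pid="$1"
    local timeout="${2:-10}"   # seconds to wait before SIGKILL (assumed default)

    # kill -0 sends no signal; it only tests whether the PID exists.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "not running"
        return 0
    fi

    echo "sending SIGTERM to $pid"
    kill "$pid"

    local waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "still alive after ${timeout}s, sending SIGKILL"
            kill -9 "$pid"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "stopped"
}
```

In the stop script this would replace the kill -9 line, for example stop_gracefully "$PID". It only helps if the PHP script actually handles SIGTERM (for instance through the pcntl extension); without a handler, SIGTERM simply ends the process.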

Next is a script that checks whether our PHP script is running.

#!/bin/bash
PID=`ps -eaf | grep '/opt/bitnami/wordpress/fetch-news.php' | grep -v grep | awk '{print $2}'`
if [ "" != "$PID" ]
then
    echo "Parser running on $PID"
else
    echo "Not running, going to start it"
    cd /opt/bitnami/wordpress/
    ./run-news-parser.sh
fi

When I run my cron, this is the script that I call. It will check if my parser is running or not. If no process ID can be found, it's not running and, therefore, needs to be started using our run-news-parser.sh script.
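For reference, the cron entry for this could look something like the sketch below. It assumes the check script is saved as check-news-parser.sh next to the other scripts (a file name I made up, since the script isn't named above) and that checking once a minute is often enough; both are assumptions to adjust.

```crontab
# m h dom mon dow  command
* * * * * /opt/bitnami/wordpress/check-news-parser.sh >/dev/null 2>&1
```

You would add the line with crontab -e as the user that should own the parser process.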

Some improvements here could be a maximum number of start attempts before bailing altogether, and maybe a notification that something went wrong (remote API went down, credentials expired or revoked, etc.).
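The attempt limit could be sketched roughly like this: keep a counter in a small state file, bump it on every start attempt, and refuse once it hits a ceiling. Everything here (the function names, the ATTEMPTS_FILE path and the limit of 5) is hypothetical and only meant to show the idea.

```shell
#!/bin/bash
# Hypothetical retry guard for the cron-driven check script.
ATTEMPTS_FILE="${ATTEMPTS_FILE:-/tmp/news-parser.attempts}"
MAX_ATTEMPTS="${MAX_ATTEMPTS:-5}"

# Prints the stored attempt count, or 0 if none has been recorded.
read_attempts() {
    if [ -s "$ATTEMPTS_FILE" ]; then
        cat "$ATTEMPTS_FILE"
    else
        echo 0
    fi
}

# Returns 0 if another start attempt is allowed (and records it),
# or 1 once MAX_ATTEMPTS has been reached.
register_attempt() {
    local attempts
    attempts="$(read_attempts)"
    if [ "$attempts" -ge "$MAX_ATTEMPTS" ]; then
        return 1
    fi
    echo $((attempts + 1)) >"$ATTEMPTS_FILE"
    return 0
}

# Call this when the parser is found running, so the counter starts over.
reset_attempts() {
    rm -f "$ATTEMPTS_FILE"
}
```

In the check script, the "Parser running" branch would call reset_attempts, and the "Not running" branch would only call run-news-parser.sh when register_attempt succeeds. When it fails, that is the place to send whatever notification you prefer.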

And finally, a shell script that can restart our script.

#!/bin/bash
PID=`ps -eaf | grep '/opt/bitnami/wordpress/fetch-news.php' | grep -v grep | awk '{print $2}'`
if [ "" != "$PID" ]
then
    echo "killing $PID"
    kill -9 $PID
else
    echo "not running"
fi

echo "Starting again"
# Same location the check script starts the parser from
/opt/bitnami/wordpress/run-news-parser.sh

This script looks similar to the ones that came before it. It's a combination of the start and stop scripts. If it finds a process, it kills it, and either way it then starts the parser again.

This approach might not be pure and some might laugh at how simple it is, but it works and will continue to work well into the future. I am probably going to rewrite these scripts to use Node.js, but for now, it's something I will keep using because it works (even if a little hacky).

Dwayne | October 7, 2021 at 3:30 pm | Categories: PHP | URL: https://wp.me/p56NQW-2it

New post on I Like Kill Nerds
