PHP: Email scraping from a list of URLs in a text file

I am trying to extract email addresses from the URLs listed in a text file.

However, emails are displayed only for the last URL in the file.

The output looks like this:

url1

url2

url3

url4

email: email address1

email: email address2

I do not understand what I am doing wrong.

Is there something obvious that I am missing? Thank you for your help.

The code:

<?php
$handle = fopen("url-list.txt", "r");
if ($handle) {
    while (($url = fgets($handle)) !== false) {
        // process the line read.
        echo "<br>";
        echo $url;
        echo "<br>";
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HEADER, FALSE);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
        $result = curl_exec($ch);
        curl_close($ch);

        $emails = extract_emails_from($result);

        foreach (array_unique($emails) as $email) {
            echo "email: ", trim($email);
            echo "<br>";
        }
    }
    fclose($handle);
} else {
    // error opening the file.
}
function extract_emails_from($string) {
    preg_match_all("/[._a-zA-Z0-9-]+@[._a-zA-Z0-9-]+/i", $string, $matches);
    return $matches[0];
}
?>

Answer

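The problem is that results are overwritten rather than accumulated.

$emails = extract_emails_from($result);

The $emails variable is reassigned on every iteration of the while loop, so it only ever holds the matches for the current URL.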

Solution:

Create a new array outside of the loop and merge the results into it.

$allEmails = [];

while (($url = fgets($handle)) !== false) {
   // ... rest of your code
   $emails = extract_emails_from($result);
   $allEmails = array_merge($allEmails, $emails);
}
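
A fuller sketch of the same idea, in case it helps. It also covers something that may be worth checking: fgets() returns each line with its trailing newline still attached, so every URL except possibly the last is passed to curl_init() with a "\n" on the end, which could explain why only the last URL yields matches. The helper names read_url_list and collect_emails are hypothetical (they are not in the question), and the regex is the one from the question. The fetching step is left out so that the example runs without network access.

```php
<?php
// Hypothetical helper: return the URLs from a text file, one per line,
// with line endings and surrounding whitespace removed.
function read_url_list(string $path): array
{
    // FILE_IGNORE_NEW_LINES drops the newline that fgets() would keep.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    // trim() also removes a stray "\r" left by Windows line endings.
    $lines = array_map('trim', $lines);
    return array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
}

// Same pattern as in the question.
function extract_emails_from(string $string): array
{
    preg_match_all('/[._a-zA-Z0-9-]+@[._a-zA-Z0-9-]+/i', $string, $matches);
    return $matches[0];
}

// Hypothetical helper: merge the matches from several page bodies
// into one de-duplicated list instead of overwriting them.
function collect_emails(array $pages): array
{
    $allEmails = [];
    foreach ($pages as $html) {
        $allEmails = array_merge($allEmails, extract_emails_from($html));
    }
    return array_values(array_unique($allEmails));
}

// Stand-in page bodies; in the real script each one would be the
// $result returned by curl_exec() for one trimmed URL.
$pages = [
    '<a href="mailto:a@example.com">contact</a>',
    'b@example.com and again a@example.com',
];
foreach (collect_emails($pages) as $email) {
    echo "email: ", $email, "<br>";
}
```

In the original loop, the equivalent change would be to call trim($url) before curl_init($url) and to print $allEmails once after the loop ends.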
