When parsing XML, CSV, JSON or similar documents, there are two approaches to consider:
DOM loading: loads the whole document into memory, which makes it easy to navigate and parse and gives developers maximum flexibility.
Streaming: iterates through the document like a cursor, stopping at each element along the way, which avoids excessive memory use.
With big files, streaming means callbacks are executed while the file is still downloading, which is far more efficient as far as memory is concerned.
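To make the difference concrete, here is a minimal sketch that reads the same small document both ways. It uses PHP's built-in SimpleXML and XMLReader extensions rather than this library, and the inline sample document is an assumption made for illustration; it is not a description of how StreamParser works internally.

```php
<?php
// Illustration only: built-in PHP extensions, not StreamParser itself.
$xml = <<<XML
<bookstore>
  <book ISBN="10-000000-001"><title>The Iliad and The Odyssey</title></book>
  <book ISBN="10-000000-002"><title>Anthology of World Literature</title></book>
</bookstore>
XML;

// DOM-style loading: the whole document is parsed into memory first.
$document = simplexml_load_string($xml);
$domTitles = [];
foreach ($document->book as $book) {
    $domTitles[] = (string) $book->title;
}

// Streaming: a forward-only cursor visits one node at a time, and only
// the current <book> element is expanded into an object.
$reader = new XMLReader();
$reader->XML($xml);
$streamTitles = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'book') {
        $current = simplexml_load_string($reader->readOuterXml());
        $streamTitles[] = (string) $current->title;
    }
}
$reader->close();
```

Both loops collect the same titles. With a large remote file, the second form would typically keep only one element in memory at a time.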
composer require rodenastyle/stream-parser
Delegate the callback execution as much as possible so that it doesn't block the document reading:
(Laravel Queue based example)
use Illuminate\Support\Collection;
StreamParser::xml("https://example.com/users.xml")->each(function(Collection $user){
    dispatch(new App\Jobs\SendEmail($user));
});
<bookstore>
    <book ISBN="10-000000-001">
        <title>The Iliad and The Odyssey</title>
        <price>12.95</price>
        <comments>
            <userComment rating="4">
                Best translation I've read.
            </userComment>
            <userComment rating="2">
                I like other versions better.
            </userComment>
        </comments>
    </book>
    [...]
</bookstore>
use Illuminate\Support\Collection;
StreamParser::xml("https://example.com/books.xml")->each(function(Collection $book){
    var_dump($book);
    var_dump($book->get('comments')->toArray());
});
class Tightenco\Collect\Support\Collection#19 (1) {
  protected $items =>
  array(4) {
    'ISBN' =>
    string(13) "10-000000-001"
    'title' =>
    string(25) "The Iliad and The Odyssey"
    'price' =>
    string(5) "12.95"
    'comments' =>
    class Tightenco\Collect\Support\Collection#17 (1) {
      protected $items =>
      array(2) {
        ...
      }
    }
  }
}
array(2) {
  [0] =>
  array(2) {
    'rating' =>
    string(1) "4"
    'userComment' =>
    string(27) "Best translation I've read."
  }
  [1] =>
  array(2) {
    'rating' =>
    string(1) "2"
    'userComment' =>
    string(29) "I like other versions better."
  }
}
Additionally, you can use ->withSeparatedParametersList() to get the parameters of each element separated into the __params property.
Also, ->withoutSkippingFirstElement() can help when you need to parse the very first item (usually the element that contains the other elements).
[
    {
        "title": "The Iliad and The Odyssey",
        "price": 12.95,
        "comments": [
            {"comment": "Best translation I've read."},
            {"comment": "I like other versions better."}
        ]
    },
    {
        "title": "Anthology of World Literature",
        "price": 24.95,
        "comments": [
            {"comment": "Needs more modern literature."},
            {"comment": "Excellent overview of world literature."}
        ]
    }
]
use Illuminate\Support\Collection;
StreamParser::json("https://example.com/books.json")->each(function(Collection $book){
    var_dump($book->get('comments')->count());
});
int(2)
int(2)
title,price,comments
The Iliad and The Odyssey,12.95,"Best translation I've read.,I like other versions better."
Anthology of World Literature,24.95,"Needs more modern literature.,Excellent overview of world literature."
use Illuminate\Support\Collection;
StreamParser::csv("https://example.com/books.csv")->each(function(Collection $book){
    var_dump($book->get('comments')->last());
});
string(29) "I like other versions better."
string(39) "Excellent overview of world literature."
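The output above suggests that the quoted comments field ends up as a list split on commas. The following is a rough sketch of that row-by-row idea, using only PHP's built-in fgetcsv() on an in-memory stream. The splitting step and the variable names are assumptions made for illustration, not the library's actual implementation.

```php
<?php
// Illustration only: plain PHP, not StreamParser itself.
$csv = <<<CSV
title,price,comments
The Iliad and The Odyssey,12.95,"Best translation I've read.,I like other versions better."
Anthology of World Literature,24.95,"Needs more modern literature.,Excellent overview of world literature."
CSV;

// Put the sample into a stream so it can be read one row at a time.
$handle = fopen('php://memory', 'r+');
fwrite($handle, $csv);
rewind($handle);

// The first row holds the column names.
$header = fgetcsv($handle, 0, ',', '"', '\\');

$lastComments = [];
while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    // Map column names to values for the current row only.
    $book = array_combine($header, $row);
    // Assumption: a field containing commas is treated as a list of values.
    $comments = explode(',', $book['comments']);
    $lastComments[] = end($comments);
}
fclose($handle);
```

Here $lastComments holds the last comment of each book, matching the two strings printed above.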
This library is released under the MIT license.

