This guide shows how to extract any available metadata from a single archived collection page, including:
title
description
original collection URL
image URL
image alt text
collection handle
archived page URL
You do not need any coding experience. You only need to open the archived collection page, paste a script into your browser console, and copy the result.
Before you start
Make sure you already have the archived collection page open in your browser.
This works best on pages opened from the Wayback Machine or another archived version of a collection page. See this article if you have not generated the collection list file yet.
Step 1: Open the archived collection page
Open the archived collection page in your browser. You should be on the exact page you want to extract data from.
Step 2: Open your browser console
Choose the instructions for your browser.
Chrome
Windows: press Ctrl + Shift + J
Mac: press Cmd + Option + J
Safari
First enable the Develop menu in Safari's settings if needed.
Then press Cmd + Option + C
Firefox
Windows: press Ctrl + Shift + K
Mac: press Cmd + Option + K
A panel will open, usually at the bottom or side of the browser window.
Step 3: Paste the script into the console
Copy the full script below.
Note: A description of what the script checks can be found further down in this guide.
(() => {
  // Column headers for the CSV row.
  const headers = [
    'title',
    'description',
    'url',
    'image src url',
    'image alt text',
    'handle',
    'url to the archived version of the collection page'
  ];

  // Read the content attribute of a meta tag, or '' if it is missing.
  const getMeta = (selector) =>
    document.querySelector(selector)?.getAttribute('content')?.trim() || '';

  // Resolve a possibly relative URL against the current page.
  const getAbsUrl = (value) => {
    if (!value) return '';
    try {
      return new URL(value, location.href).href;
    } catch {
      return value;
    }
  };

  // Collect every JSON-LD node on the page, flattening arrays and @graph wrappers.
  const getJsonLdNodes = () => {
    return [...document.querySelectorAll('script[type="application/ld+json"]')]
      .flatMap((el) => {
        try {
          const json = JSON.parse(el.textContent.trim());
          const flatten = (obj) => {
            if (!obj) return [];
            if (Array.isArray(obj)) return obj.flatMap(flatten);
            if (obj['@graph']) return flatten(obj['@graph']);
            return [obj];
          };
          return flatten(json);
        } catch {
          return [];
        }
      });
  };

  // Prefer a node whose URL points at a collection, then any page-level node.
  const nodes = getJsonLdNodes();
  const bestNode =
    nodes.find(x => String(x.url || x['@id'] || '').includes('/collections/')) ||
    nodes.find(x => String(x['@type'] || '').match(/CollectionPage|WebPage/i)) ||
    {};

  const ldUrl = getAbsUrl(bestNode.url || bestNode['@id'] || '');
  const ogUrl = getAbsUrl(getMeta('meta[property="og:url"]'));
  const canonicalUrl = document.querySelector('link[rel="canonical"]')?.href || '';

  // A JSON-LD image may be a string, an array, or an object with a url property.
  const imageFromLd = (() => {
    const img = bestNode.image;
    if (typeof img === 'string') return img;
    if (Array.isArray(img) && typeof img[0] === 'string') return img[0];
    if (Array.isArray(img) && img[0]?.url) return img[0].url;
    if (img?.url) return img.url;
    return '';
  })();

  const imageAltFromLd = (() => {
    const img = bestNode.image;
    if (Array.isArray(img) && typeof img[0] === 'object') {
      return img[0].caption || img[0].name || img[0].description || '';
    }
    if (img && typeof img === 'object') {
      return img.caption || img.name || img.description || '';
    }
    return '';
  })();

  // Each field falls back through JSON-LD, Open Graph, Twitter, and standard tags.
  const title =
    bestNode.name ||
    bestNode.headline ||
    getMeta('meta[property="og:title"]') ||
    getMeta('meta[name="twitter:title"]') ||
    getMeta('meta[name="title"]') ||
    document.title.trim() ||
    '';

  const description =
    bestNode.description ||
    getMeta('meta[property="og:description"]') ||
    getMeta('meta[name="twitter:description"]') ||
    getMeta('meta[name="description"]') ||
    '';

  const url =
    ldUrl ||
    ogUrl ||
    canonicalUrl ||
    location.href;

  const imageSrc =
    getAbsUrl(
      imageFromLd ||
      getMeta('meta[property="og:image:secure_url"]') ||
      getMeta('meta[property="og:image"]') ||
      getMeta('meta[name="twitter:image"]')
    );

  const imageAlt =
    imageAltFromLd ||
    getMeta('meta[property="og:image:alt"]') ||
    getMeta('meta[name="twitter:image:alt"]') ||
    '';

  // The handle is the last path segment of the collection URL.
  const handle = (() => {
    try {
      return new URL(url).pathname.split('/').filter(Boolean).pop() || '';
    } catch {
      return '';
    }
  })();

  const row = {
    'title': title,
    'description': description,
    'url': url,
    'image src url': imageSrc,
    'image alt text': imageAlt,
    'handle': handle,
    'url to the archived version of the collection page': location.href
  };

  // Build a two-line CSV: a header row plus one quoted data row.
  const csv = [
    headers.join(','),
    headers.map(h => JSON.stringify(row[h] ?? '')).join(',')
  ].join('\n');

  console.log('Row object:', row);
  console.log(csv);
  return csv;
})();
Paste it into the console and press Enter.
Step 4: Copy the result
After running the script, you will see two outputs in the console:
1. A row object
This is a readable preview of the extracted data.
2. A CSV result
This is the output you need to copy: a header row followed by one data row. It will look something like this:
title,description,url,image src url,image alt text,handle,url to the archived version of the collection page
"Abstract","Abstract collection","https://www.example.com/collections/abstract","https://www.example.com/image.jpg","","abstract","https://web.archive.org/..."
Copy the CSV output.
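The script quotes each CSV field with JSON.stringify, so commas inside a title or description will not break the row. A minimal illustration of that quoting:

```javascript
// JSON.stringify doubles as a simple CSV field escaper:
// it wraps each value in double quotes and backslash-escapes embedded quotes.
const fields = ['Abstract, modern', 'A "bold" look'];
const row = fields.map((f) => JSON.stringify(f)).join(',');
console.log(row); // "Abstract, modern","A \"bold\" look"
```

One caveat: strict CSV (RFC 4180) escapes an embedded double quote by doubling it (`""`) rather than with a backslash, so fields that contain double quotes may need a manual fix after pasting into a spreadsheet.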
Step 5: Paste into a spreadsheet
Open Excel or Google Sheets and paste the CSV output.
If you are collecting multiple pages, repeat the same process for each archived collection page and paste each new row underneath the previous one.
What the script checks
The script looks for metadata in several places on the page, then uses the best available value.
It checks:
structured data in application/ld+json
Open Graph tags like og:title, og:description, og:image
Twitter meta tags
the standard meta description
the canonical URL
the current archived page URL
This helps it return useful values even if some metadata is missing.
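The fallback pattern boils down to "take the first non-empty value from an ordered list of candidates". A minimal sketch of that idea (the function name is illustrative, not part of the script itself):

```javascript
// Return the first candidate that is truthy and not just whitespace, else ''.
const firstNonEmpty = (...candidates) =>
  candidates.find((v) => v && String(v).trim() !== '') || '';

// e.g. title resolution: JSON-LD name, then og:title, then document.title
console.log(firstNonEmpty('', undefined, 'Abstract')); // Abstract
```

In the script this ordering is written out with chained || operators, which behaves the same way for string values.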
Troubleshooting
Nothing happens
Make sure you pasted the full script and pressed Enter.
The result is blank
Some archived pages do not include all metadata. The script will still return whatever it can find.
The URL looks like the archived page instead of the original page
That usually means the original collection URL was not available in the page metadata, so the script used the best fallback.
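If you do end up with a Wayback Machine URL, the original URL is usually embedded in it after the snapshot timestamp, so you can often recover it by hand or with a small helper. A sketch, assuming the common https://web.archive.org/web/<timestamp>/<original-url> format:

```javascript
// Pull the original URL out of a Wayback Machine URL, if present.
// Wayback URLs follow https://web.archive.org/web/<timestamp>/<original-url>,
// sometimes with a modifier suffix (e.g. "if_") after the timestamp.
function originalFromWayback(archivedUrl) {
  const match = archivedUrl.match(/web\.archive\.org\/web\/\d+[a-z_]*\/(.+)/);
  return match ? match[1] : '';
}

console.log(originalFromWayback(
  'https://web.archive.org/web/20240101000000/https://www.example.com/collections/abstract'
)); // https://www.example.com/collections/abstract
```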
Tips
Run the script on the exact collection page, not on the collection list or homepage
Use the CSV output, not the preview object, when pasting into a spreadsheet
Keep the header row only once if you are combining results from many pages
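If you prefer to merge rows programmatically instead of by hand, the last tip can be sketched as a small helper (a hypothetical function, not part of the script above), which keeps only the first header line when combining several two-line CSV outputs:

```javascript
// Merge several single-row CSV outputs, keeping only the first header line.
// Assumes every chunk uses the same header, as the script's output does.
function combineCsvChunks(chunks) {
  if (chunks.length === 0) return '';
  const [first, ...rest] = chunks;
  // Drop the header (first line) from every chunk after the first.
  const bodies = rest.map((chunk) => chunk.split('\n').slice(1).join('\n'));
  return [first, ...bodies].filter(Boolean).join('\n');
}

const a = 'title,handle\n"Abstract","abstract"';
const b = 'title,handle\n"Floral","floral"';
console.log(combineCsvChunks([a, b]));
```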
Need to collect many pages?
If you are working through a large list of archived collection pages, here are the condensed steps from this guide:
open a page
run the script
copy the CSV row
paste it into your spreadsheet
repeat for the next page
