Extract data from a single collection page

How to extract additional data from a single archived collection page, including the SEO meta title, description, image alt text, and more.

Written by Bjorn Forsberg

This guide shows how to extract any available metadata from a single archived collection page, including:

  • title

  • description

  • original collection URL

  • image URL

  • image alt text

  • collection handle

  • archived page URL

You do not need any coding experience. You only need to open the archived collection page, paste a script into your browser console, and copy the result.


Before you start

Make sure you already have the archived collection page open in your browser.

This works best on pages opened from the Wayback Machine or another archived version of a collection page.

To get a list of your site's historical collection pages, see the Recovering a site's collections pages article first. You can open an archived version of a collection page from the last column in the CSV file.


Step 1: Open the archived collection page

Open the archived collection page in your browser. You should be on the exact page you want to extract data from.

Step 2: Open your browser console

Choose the instructions for your browser.

Chrome

  • Windows: press Ctrl + Shift + J

  • Mac: press Cmd + Option + J

Safari

  • First enable the Develop menu in Safari settings if needed

  • Then press Cmd + Option + C

Firefox

  • Windows: press Ctrl + Shift + K

  • Mac: press Cmd + Option + K

A panel will open, usually at the bottom or side of the browser window.

Step 3: Paste the script into the console

Copy the full script below.

Note: A description of what the script checks can be found further down in this guide.

(() => {
  // Read the content attribute of a <meta> tag, or '' if missing
  const getMeta = (selector) =>
    document.querySelector(selector)?.getAttribute('content')?.trim() || '';

  // Resolve a possibly relative URL against the current page
  const getAbsUrl = (value) => {
    if (!value) return '';
    try {
      return new URL(value, location.href).href;
    } catch {
      return value;
    }
  };

  // Parse every JSON-LD block on the page into a flat list of nodes
  const getJsonLdNodes = () => {
    return [...document.querySelectorAll('script[type="application/ld+json"]')]
      .flatMap((el) => {
        try {
          const json = JSON.parse(el.textContent.trim());

          const flatten = (obj) => {
            if (!obj) return [];
            if (Array.isArray(obj)) return obj.flatMap(flatten);
            if (obj['@graph']) return flatten(obj['@graph']);
            return [obj];
          };

          return flatten(json);
        } catch {
          return [];
        }
      });
  };

  // Quote values containing quotes, newlines, or surrounding whitespace
  const pretty = (value) => {
    const v = value ?? '';
    if (v === '') return '""';
    if (/[\n\r"]/.test(v)) return JSON.stringify(v);
    if (v !== v.trim()) return JSON.stringify(v);
    return v;
  };

  const nodes = getJsonLdNodes();

  // Prefer the JSON-LD node that describes the collection page itself
  const bestNode =
    nodes.find(x => String(x.url || x['@id'] || '').includes('/collections/')) ||
    nodes.find(x => String(x['@type'] || '').match(/CollectionPage|WebPage/i)) ||
    {};

  const ldUrl = getAbsUrl(bestNode.url || bestNode['@id'] || '');
  const ogUrl = getAbsUrl(getMeta('meta[property="og:url"]'));
  const canonicalUrl = document.querySelector('link[rel="canonical"]')?.href || '';

  // JSON-LD images can be a string, an array, or an object with a url
  const imageFromLd = (() => {
    const img = bestNode.image;
    if (typeof img === 'string') return img;
    if (Array.isArray(img) && typeof img[0] === 'string') return img[0];
    if (Array.isArray(img) && img[0]?.url) return img[0].url;
    if (img?.url) return img.url;
    return '';
  })();

  const imageAltFromLd = (() => {
    const img = bestNode.image;
    if (Array.isArray(img) && typeof img[0] === 'object') {
      return img[0].caption || img[0].name || img[0].description || '';
    }
    if (img && typeof img === 'object') {
      return img.caption || img.name || img.description || '';
    }
    return '';
  })();

  // For each field, take the first non-empty source in priority order
  const title =
    bestNode.name ||
    bestNode.headline ||
    getMeta('meta[property="og:title"]') ||
    getMeta('meta[name="twitter:title"]') ||
    getMeta('meta[name="title"]') ||
    document.title.trim() ||
    '';

  const description =
    bestNode.description ||
    getMeta('meta[property="og:description"]') ||
    getMeta('meta[name="twitter:description"]') ||
    getMeta('meta[name="description"]') ||
    '';

  const url =
    ldUrl ||
    ogUrl ||
    canonicalUrl ||
    location.href;

  const imageSrc = getAbsUrl(
    imageFromLd ||
    getMeta('meta[property="og:image:secure_url"]') ||
    getMeta('meta[property="og:image"]') ||
    getMeta('meta[name="twitter:image"]')
  );

  const imageAlt =
    imageAltFromLd ||
    getMeta('meta[property="og:image:alt"]') ||
    getMeta('meta[name="twitter:image:alt"]') ||
    '';

  // The handle is the last path segment of the collection URL
  const handle = (() => {
    try {
      return new URL(url).pathname.split('/').filter(Boolean).pop() || '';
    } catch {
      return '';
    }
  })();

  const archivedUrl = location.href;

  // One CSV row, with every field safely quoted
  const csvLine = [
    title,
    description,
    url,
    imageSrc,
    imageAlt,
    handle,
    archivedUrl
  ].map(v => JSON.stringify(v ?? '')).join(',');

  // Also print a readable line per field
  const output = [
    csvLine,
    '',
    `title: ${pretty(title)}`,
    `description: ${pretty(description)}`,
    `url: ${pretty(url)}`,
    `image src url: ${pretty(imageSrc)}`,
    `image alt text: ${pretty(imageAlt)}`,
    `handle: ${pretty(handle)}`,
    `url to the archived version of the collection page: ${pretty(archivedUrl)}`
  ].join('\n');

  console.log(output);
})();


Paste it into the console and press Enter.

Step 4: Copy the result

After running the script, you will see two parts in the console output:

1. A CSV result

This is the line you need to copy. It will look something like this:

"Abstract","Abstract collection","https://www.example.com/collections/abstract","https://www.example.com/image.jpg","","abstract","https://web.archive.org/..."

Copy the CSV output.
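The quoting in this row comes from JSON.stringify: each field is escaped so that embedded quotes or line breaks cannot break the row apart. A minimal illustration of the same technique, runnable in any console:

```javascript
// Each CSV field is escaped with JSON.stringify, so embedded quotes
// and newlines stay inside their cell instead of breaking the row.
const fields = ['Abstract', 'Line one\nLine two', 'He said "hi"'];
const row = fields.map(v => JSON.stringify(v)).join(',');
console.log(row);
// → "Abstract","Line one\nLine two","He said \"hi\""
```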

2. A line for each field

This is a readable preview of the extracted data, useful for copy-pasting individual fields:

title: Abstract
description: Abstract collection
url: https://www.example.com/collections/abstract
image src url: https://www.example.com/image.jpg
image alt text: Abstract collection
handle: abstract
url to the archived version of the collection page: https://web.archive.org/...


Step 5: Paste into a spreadsheet

Open Excel or Google Sheets and paste the CSV output.

If you are collecting multiple pages, repeat the same process for each archived collection page and paste each new row underneath the previous one.
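The columns always appear in the same order, so if you want a header row at the top of your spreadsheet you can generate a matching one in the console. The header names below are suggestions for readability, not a requirement of any import format:

```javascript
// Print a quoted CSV header row matching the script's column order.
const headers = [
  'title',
  'description',
  'url',
  'image src url',
  'image alt text',
  'handle',
  'archived url'
];
const headerRow = headers.map(h => JSON.stringify(h)).join(',');
console.log(headerRow);
// → "title","description","url","image src url","image alt text","handle","archived url"
```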



What the script checks

The script looks for metadata in several places on the page, then uses the best available value.

It checks:

  • structured data in application/ld+json

  • Open Graph tags like og:title, og:description, og:image

  • Twitter meta tags

  • standard meta description

  • canonical URL

  • the current archived page URL

This helps it return useful values even if some metadata is missing.
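For example, the collection handle is simply the last non-empty path segment of the recovered collection URL. This standalone sketch shows the same logic the script uses:

```javascript
// Derive the collection handle: the last non-empty path segment of the URL.
const handleFromUrl = (url) => {
  try {
    return new URL(url).pathname.split('/').filter(Boolean).pop() || '';
  } catch {
    return ''; // invalid URL: return an empty handle
  }
};
console.log(handleFromUrl('https://www.example.com/collections/abstract')); // → abstract
```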

Troubleshooting

Nothing happens

Make sure you pasted the full script and pressed Enter.

The result is blank

Some archived pages do not include all metadata. The script will still return whatever it can find.

The URL looks like the archived page instead of the original page

That usually means the original collection URL was not available in the page metadata, so the script used the best fallback.
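The script takes the first non-empty URL candidate, in priority order: the JSON-LD URL, then og:url, then the canonical link, and finally the archived page's own address. A minimal sketch of that fallback pattern (the example values are illustrative, not real lookups):

```javascript
// First non-empty value wins; the archived address is the last resort.
const firstNonEmpty = (...values) => values.find(v => v) || '';
const url = firstNonEmpty(
  '',                                              // JSON-LD url (missing)
  '',                                              // og:url (missing)
  'https://www.example.com/collections/abstract',  // canonical link
  'https://web.archive.org/web/2020/...'           // archived page address
);
console.log(url); // → https://www.example.com/collections/abstract
```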

Tips

  • Run the script on the exact collection page, not on the collection list or homepage

  • Use the CSV output, not the per-field preview, when pasting into a spreadsheet

  • Keep the header row only once if you are combining results from many pages

Need to collect many pages?

If you are working through a large list of archived collection pages, here are the condensed steps from this guide:

  1. open a page

  2. run the script

  3. copy the CSV row

  4. paste it into your spreadsheet

  5. repeat for the next page



Creating Collections in Shopify Admin

Once you have the collection page data, you can create the collections in the Shopify Admin.

It is important to create each collection with the same handle it had previously. This restores all previous links, SEO value, and other references to the collection page.

See the Shopify SmartCollections help article for detailed instructions when creating these collections in Shopify.
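If you prefer to create collections programmatically rather than in the Admin UI, the extracted fields map naturally onto a collection payload. The sketch below is hypothetical: it builds a payload shaped like Shopify's Admin REST CustomCollection resource (Shopify distinguishes custom/manual and smart/automated collections) from one extracted row, with the actual API call left commented out. Verify the endpoint, field names, API version, and authentication against current Shopify documentation before using anything like this:

```javascript
// Hypothetical sketch: map one extracted row onto a payload shaped like
// Shopify's Admin REST CustomCollection resource. Field names and the
// commented-out endpoint should be verified against current Shopify docs.
const row = {
  title: 'Abstract',
  description: 'Abstract collection',
  handle: 'abstract',
  imageSrc: 'https://www.example.com/image.jpg',
  imageAlt: 'Abstract collection'
};

const payload = {
  custom_collection: {
    title: row.title,
    handle: row.handle, // reuse the old handle so existing links keep working
    body_html: row.description,
    image: { src: row.imageSrc, alt: row.imageAlt }
  }
};

// Example call (do not run as-is; the store domain, API version,
// and access token below are placeholders):
// fetch('https://YOUR-STORE.myshopify.com/admin/api/2024-01/custom_collections.json', {
//   method: 'POST',
//   headers: {
//     'Content-Type': 'application/json',
//     'X-Shopify-Access-Token': 'YOUR-TOKEN'
//   },
//   body: JSON.stringify(payload)
// });
console.log(JSON.stringify(payload));
```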
