mercury-parser/scripts/templates/custom-extractor-test.js

feat: generator for custom parsers and some documentation Squashed commit of the following: commit deaf9e60d031d9ee06e74b8c0895495b187032a5 Author: Adam Pash <adam.pash@gmail.com> Date: Tue Sep 20 10:31:09 2016 -0400 chore: README for custom parsers commit a8e8ad633e0d1576a52dbc90ce31b98fb2ec21ee Author: Adam Pash <adam.pash@gmail.com> Date: Mon Sep 19 23:36:09 2016 -0400 draft of readme commit 4f0f463f821465c282ce006378e5d55f8f41df5f Author: Adam Pash <adam.pash@gmail.com> Date: Mon Sep 19 17:56:34 2016 -0400 custom extractor used to build basic parser for theatlantic commit c5562a3cede41f56c4e723dcfa1181b49dcaae4d Author: Adam Pash <adam.pash@gmail.com> Date: Mon Sep 19 17:20:13 2016 -0400 pre-commit to test custom parser generator commit 7d50d5b7ab780b79fae38afcb87a7d1da5d139b2 Author: Adam Pash <adam.pash@gmail.com> Date: Mon Sep 19 17:19:55 2016 -0400 feat: added nytimes parser commit 58b8d83a56927177984ddfdf70830bc4f328f200 Author: Adam Pash <adam.pash@gmail.com> Date: Mon Sep 19 17:17:28 2016 -0400 feat: can do fuzzy search or go straight to file commit c99add753723a8e2ac64d51d7379ac8e23125526 Author: Adam Pash <adam.pash@gmail.com> Date: Mon Sep 19 10:52:26 2016 -0400 refactored export for custom extractors for easier renames commit 22563413669651bb497f1bb2a92085b71f2ae324 Author: Adam Pash <adam.pash@gmail.com> Date: Fri Sep 16 17:36:13 2016 -0400 feat: custom extractor generation in place commit 2285a29908a7f82a5de3c81f6b2b902ddec9bdaa Author: Adam Pash <adam.pash@gmail.com> Date: Fri Sep 16 16:42:20 2016 -0400 good progress
2016-09-20 14:35:23 +00:00
import template from './index';
feat: custom parser + generator + detailed readme instructions Squashed commit of the following: commit 02563daa67712c3679258ebebac60dfa9568dffb Author: Adam Pash <adam.pash@gmail.com> Date: Fri Sep 30 12:25:44 2016 -0400 updated readme, added newyorker parser for readme guide commit 0ac613ef823efbffbf4cc9a89e5cb2489d1c4f6f Author: Adam Pash <adam.pash@gmail.com> Date: Fri Sep 30 11:16:52 2016 -0400 feat: updated parser so the saved fixture absolutizes urls commit 85c7a2660b21f95c2205ca4a4378a7570687fed0 Author: Adam Pash <adam.pash@gmail.com> Date: Fri Sep 30 10:15:26 2016 -0400 refactor: attribute selectors must be an array for custom extractors commit f60f93d5d3d9b2f2d9ec6f28d27ae9dcf16ef01e Author: Adam Pash <adam.pash@gmail.com> Date: Thu Sep 29 10:13:14 2016 -0400 fix: whitelisting srcset and alt attributes commit e31cb1f4e8a9fc9c3d9b20ef9f40ca6c8d6ad51a Author: Adam Pash <adam.pash@gmail.com> Date: Thu Sep 29 09:44:21 2016 -0400 some housekeeping for coverage tests commit 39eafe420c776a1fe7f9fea634fb529a3ed75a71 Author: Adam Pash <adam.pash@gmail.com> Date: Wed Sep 28 17:52:08 2016 -0400 fix: word count for multi-page articles commit b04e0066b52f190481b1b604c64e3d0b1226ff02 Author: Adam Pash <adam.pash@gmail.com> Date: Thu Sep 22 10:40:23 2016 -0400 major improvements to output commit 3f3a880b63b47fe21953485da670b6e291ac60e5 Author: Adam Pash <adam.pash@gmail.com> Date: Wed Sep 21 17:27:53 2016 -0400 updated test command commit 14503426557a870755453572221d95c92cff4bd2 Author: Adam Pash <adam.pash@gmail.com> Date: Wed Sep 21 16:00:30 2016 -0400 shortened generator command commit 5ebd8343cd4b87b3f5787dab665bff0de96846e1 Author: Adam Pash <adam.pash@gmail.com> Date: Wed Sep 21 15:59:14 2016 -0400 feat: can disable fallback to generic parser (this will be useful for testing custom parsers)
2016-09-30 16:26:25 +00:00
const IGNORE = [
  'url',
  'domain',
  'content',
  'word_count',
  'next_page_url',
  'excerpt',
  'direction',
  'total_pages',
  'rendered_pages',
];

function testFor(key, value, dir, file, url) {
  if (IGNORE.find(k => k === key)) return '';

  return template`
    it('returns the ${key}', async () => {
      // To pass this test, fill out the ${key} selector
      // in ${dir}/index.js.
      const html =
        fs.readFileSync('${file}');
      const articleUrl =
        '${url}';
      const { ${key} } =
        await Mercury.parse(articleUrl, html, { fallback: false });
      // Update these values with the expected values from
      // the article.
      assert.equal(${key}, '${String(value || '').replace(/[\\']/g, '\\$&')}');
    });
  `;
}

export default function (file, url, dir, result) {
  return template`
    import assert from 'assert';
    import fs from 'fs';
    import URL from 'url';
    import cheerio from 'cheerio';
    import Mercury from 'mercury';
    import getExtractor from 'extractors/get-extractor';
    // Rename CustomExtractor
    describe('CustomExtractor', () => {
      it('is selected properly', () => {
        // To pass this test, rename your extractor in
        // ${dir}/index.js
        // (e.g., CustomExtractor => NYTimesExtractor)
        // then add your new extractor to
        // src/extractors/all.js
        const url =
          '${url}';
        const extractor = getExtractor(url);
        assert.equal(extractor.domain, URL.parse(url).hostname)
      })
      ${Reflect.ownKeys(result).map(k => testFor(k, result[k], dir, file, url)).join('\n\n')}
      it('returns the content', async () => {
        // To pass this test, fill out the content selector
        // in ${dir}/index.js.
        // You may also want to make use of the clean and transform
        // options.
        const html =
          fs.readFileSync('${file}');
        const url =
          '${url}';
        const { content } =
          await Mercury.parse(url, html, { fallback: false });
        const $ = cheerio.load(content || '');
        const first13 = $('*').first()
          .text()
          .trim()
          .split(/\\s+/)
          .slice(0, 13)
          .join(' ')
        // Update these values with the expected values from
        // the article.
        assert.equal(first13, null);
      });
    });
  `;
}
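
// A minimal sketch, not part of the generator itself, of which result keys
// end up with a generated per-key test and roughly what the emitted assertion
// line looks like. The names SKIPPED_KEYS, sampleResult, keysWithTests and
// sketchAssertion are hypothetical, the sample result is made up, and the
// quoting is a simplified stand-in for the interpolation in testFor.

```javascript
// Keys that get no generated per-key test (mirrors the IGNORE list).
const SKIPPED_KEYS = [
  'url', 'domain', 'content', 'word_count', 'next_page_url',
  'excerpt', 'direction', 'total_pages', 'rendered_pages',
];

// Made-up parse result, for illustration only.
const sampleResult = {
  title: 'An Example Headline',
  author: null,
  url: 'http://example.com/article',
  content: '<p>Body</p>',
  word_count: 1,
};

// Reflect.ownKeys returns string keys in insertion order.
const keysWithTests = Reflect.ownKeys(sampleResult)
  .filter(key => !SKIPPED_KEYS.includes(key));

// Simplified stand-in for the assertion emitted per key:
// truthy values are single-quoted, anything falsy becomes ''.
const sketchAssertion = (key, value) =>
  `assert.equal(${key}, ${value ? `'${value}'` : "''"})`;

const assertionLines = keysWithTests
  .map(key => sketchAssertion(key, sampleResult[key]));

console.log(assertionLines.join('\n'));
// assert.equal(title, 'An Example Headline')
// assert.equal(author, '')
```

// Fields the parser left empty (author here) still get a test, asserting
// against '' until the expected value is filled in by hand.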