These scores implement the algorithm described in the following paper:
Kelly Thompson and Stacie Traill (2017). Leveraging Python to improve ebook metadata selection, ingest, and management. Code4Lib Journal, Issue 38, 2017-10-18. http://journal.code4lib.org/articles/12828
Their approach calculates the quality of ebook records coming from different data sources.
Each record gets a score based on a number of criteria. Each criterion results in a positive score. The final score is the sum of these criterion scores.
| | Record Element | MARC field/position/subfield | How counted |
|---|---|---|---|
| 1. | ISBN | 020 | 1 point for each occurrence of field |
| 2. | Authors | 100, 110, 111 | 1 point for each occurrence of field(s) |
| 3. | Alternative Titles | 246 | 1 point for each occurrence of field |
| 4. | Edition | 250 | 1 point for each occurrence of field |
| 5. | Contributors | 700, 710, 711, 720 | 1 point for each occurrence of field(s) |
| 6. | Series | 440, 490, 800, 810, 830 | 1 point for each occurrence of field(s) |
| 7. | Table of Contents and Abstract | 505, 520 | 2 points if both fields exist; 1 point if either field exists |
| 8. | Date (MARC 008) | 008/07 | 1 point if valid coded date exists |
| 9. | Date (MARC 26X) | 260$c, 264$c | 1 point if 4-digit date exists; 1 point if matches 008 date. |
| 10. | LC/NLM Classification | 050, 060, 090 | 1 point if any field exists |
| 11. | Subject Headings: Library of Congress | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 10 total points |
| 12. | Subject Headings: MeSH | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 10 total points |
| 13. | Subject Headings: FAST | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 10 total points |
| 14. | Subject Headings: GND (This was not part of the original algorithm) | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 10 total points |
| 15. | Subject Headings: Other | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 5 total points |
| 16. | Description | 008/23, 300$a | 2 points if both elements exist; 1 point if either exists |
| 17. | Language of Resource | 008/35 | 1 point if likely language code exists |
| 18. | Country of Publication Code | 008/15 | 1 point if likely country code exists |
| 19. | Language of Cataloging | 040$b | 1 point if either no language is specified, or if English is specified |
| 20. | Descriptive cataloging standard | 040$e | 1 point if value is “rda” |
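
The three most common rule types in the table (one point per occurrence, "both or either", and a capped count) can be sketched as follows. This is a minimal illustration, not the code used by the tool or by the paper's authors: the function `score_record`, the dictionaries `PER_OCCURRENCE` and `SUBJECT_CAPS`, and the input shapes are all hypothetical. It assumes that field occurrences have already been counted per MARC tag and that subject headings have already been assigned to a vocabulary (LC, MeSH, FAST, GND, other) upstream. The date, classification, 008 and 040 criteria are left out.

```python
# Hypothetical sketch of a subset of the scoring rules in the table above.

# Criteria 1-6: one point for each occurrence of the listed fields.
PER_OCCURRENCE = {
    "isbn": ["020"],
    "authors": ["100", "110", "111"],
    "alternative_titles": ["246"],
    "edition": ["250"],
    "contributors": ["700", "710", "711", "720"],
    "series": ["440", "490", "800", "810", "830"],
}

# Criteria 11-15: one point per subject heading, capped per vocabulary.
SUBJECT_CAPS = {"lc": 10, "mesh": 10, "fast": 10, "gnd": 10, "other": 5}


def score_record(tag_counts, subject_counts):
    """Return per-criterion scores and their sum for one record.

    tag_counts: MARC tag -> number of occurrences in the record.
    subject_counts: vocabulary name -> number of subject headings
    (assumed to be classified before this function is called).
    """
    scores = {}
    for name, tags in PER_OCCURRENCE.items():
        scores[name] = sum(tag_counts.get(tag, 0) for tag in tags)

    # Criterion 7: 2 points if both 505 and 520 exist, 1 if only one does.
    scores["toc_and_abstract"] = sum(
        1 for tag in ("505", "520") if tag_counts.get(tag, 0) > 0
    )

    for vocabulary, cap in SUBJECT_CAPS.items():
        count = subject_counts.get(vocabulary, 0)
        scores["subject_" + vocabulary] = min(count, cap)

    # The final score is the sum of the criterion scores.
    scores["total"] = sum(scores.values())
    return scores


example = score_record(
    {"020": 2, "100": 1, "700": 3, "505": 1, "650": 13},
    {"lc": 12, "other": 1},
)
print(example["total"])  # 18
```

In the example, the 12 Library of Congress headings contribute only 10 points because of the cap, which keeps records with very long subject lists from dominating the total.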
The histograms of the individual components:
[Figure: one histogram panel per component. Panels: 1. ISBN; 2. Authors; 3. Alternative Titles; 4. Edition; 5. Contributors; 6. Series; 7. Table of Contents and Abstract; 8. Date 008; 9. Date 26X; 10. LC/NLM Classification; 11. Subject Headings: Library of Congress; 12. Subject Headings: MeSH; 13. Subject Headings: FAST; 14. Subject Headings: GND; 15. Subject Headings: Other; 16. Online; 17. Language of Resource; 18. Country of Publication; 19. Language of Cataloging; 20. Descriptive cataloging standard is RDA]
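
A histogram for one component is a count of how many records received each score value. The sketch below shows one way to compute and print such a histogram from a list of per-record scores. The function names `histogram` and `render` are hypothetical, and the input list is made-up illustration data, not values from this analysis.

```python
from collections import Counter


def histogram(scores):
    """Return (score, number_of_records) pairs sorted by score."""
    return sorted(Counter(scores).items())


def render(hist, width=40):
    """Draw a text bar chart, scaling the tallest bar to `width` characters."""
    if not hist:
        return ""
    peak = max(count for _, count in hist)
    lines = []
    for score, count in hist:
        bar = "#" * max(1, round(count * width / peak))
        lines.append(f"{score:>3} | {bar} {count}")
    return "\n".join(lines)


# Illustration data only: ISBN scores of six imaginary records.
isbn_scores = [0, 1, 1, 2, 1, 0]
print(render(histogram(isbn_scores), width=6))
```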

| Parameter | Value |
|---|---|
| files | kbr-0.xml.gz, kbr-1.xml.gz, kbr-2.xml.gz |
| marcVersion | KBR |
| marcFormat | XML |
| dataSource | FILE |
| limit | -1 |
| offset | -1 |
| id | — |
| defaultRecordType | BOOKS |
| alephseq | false |
| marcxml | true |
| lineSeparated | false |
| trimId | true |
| recordFilter | {"conditions":null,"empty":true} |
| ignorableFields | {fields: [590, 591, 592, 593, 594, 595, 596, 659, 900, 912, 916, 940, 941, 942, 944, 945, 946, 948, 949, 950, 951, 952, 953, 954, 970, 971, 972, 973, 975, 977, 988, 989 ], empty: false } |
| stream | — |
| defaultEncoding | — |
| alephseqLineType | — |
| picaIdField | 003@$0 |
| picaSubfieldSeparator | $ |
| picaSchemaFile | — |
| picaRecordTypeField | 002@$0 |
| schemaType | MARC21 |
| groupBy | — |
| groupListFile | — |
| solrForScoresUrl | http://localhost:8983/solr/kbr_scores |
| processRecordsWithoutId | false |
| fileName | tt-completeness.csv |
| replacementInControlFields | # |
| marc21 | true |
| unimarc | false |
| pica | false |
| mqaf.version | 0.9.8 |
| qa-catalogue.version | 0.8.0-SNAPSHOT |
| duration | 00:02:39 |
| numberOfprocessedRecords | 2953005 |
| analysisTimestamp | 2026-06-25 12:12:24 |
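
To illustrate how the `files`, `marcxml` and `ignorableFields` parameters relate to the input, the following sketch streams a gzipped MARCXML file and counts field occurrences per record while skipping ignorable tags. It is not the tool's own reader. The function `iter_tag_counts` is hypothetical, and the sketch assumes the files use the standard MARC21 slim namespace, which this report does not state.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: records are in the standard MARCXML ("slim") namespace.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"
FIELD_ELEMENTS = (MARC_NS + "controlfield", MARC_NS + "datafield")


def iter_tag_counts(path, ignorable=frozenset()):
    """Yield a Counter of MARC tag -> occurrences for each record in the file.

    Tags listed in `ignorable` are left out of the counts.
    """
    with gzip.open(path, "rb") as handle:
        # iterparse reads incrementally, so large files need not fit in memory.
        for _, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag != MARC_NS + "record":
                continue
            counts = Counter(
                field.get("tag")
                for field in elem
                if field.tag in FIELD_ELEMENTS
                and field.get("tag") not in ignorable
            )
            yield counts
            elem.clear()  # free the record's child elements once counted
```

A call such as `iter_tag_counts("kbr-0.xml.gz", ignorable={"590", "900"})` would then produce one tag count per record, using a subset of the ignorable tags listed in the table for brevity.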