[dns-operations] Follow up to the talk - Beta availability of Two Data Sets
Edward Lewis
edward.lewis at icann.org
Tue Feb 22 17:48:33 UTC 2022
(This isn't operational, but it relates to the DNS-OARC workshop held last week, which raised a side question: ought there to be a dns-research at lists.dns-oarc.net?)
As a follow-up to comments that JSON is necessary: I've added a JSON version of each CSV file on the DNS Core Census website, even a JSON version of the catalog. The catalog.csv lists only the CSV files; the catalog.json lists only the JSON files. I figure that is appropriate.
For those who didn't attend, the slides for the talk are here: https://indico.dns-oarc.net/event/42/contributions/903/attachments/872/1594/Beta%20Availability%20of%20two%20TLD%20Data%20Products.pdf
On slide 8, where it mentions CSV, there is now JSON as well (including csv.gz -> json.gz).
Uncompressed, JSON is 2-3 times the size of CSV; compressed, JSON is about half the size of CSV. I'd never have expected that.
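If you want to check those numbers yourself, a minimal sketch along these lines works (this is not part of the census code; the only thing taken from the post is the catalog URL that also appears in the diff below):

import gzip
import urllib.request

base = 'https://observatory.research.icann.org/dns-core-census/v010/table/catalog'

for ext in ('csv', 'json'):
    # fetch the catalog in this format
    with urllib.request.urlopen(base + '.' + ext) as resp:
        data = resp.read()
    # compare the raw size with the gzip-compressed size
    print(ext, len(data), 'bytes raw,', len(gzip.compress(data)), 'bytes gzipped')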
In addition, in the code directory there are now these two scripts demonstrating how to download the census:
get_dns_core_census_from_web_via_csv.py
get_dns_core_census_from_web_via_json.py
The diff between the two is below, showing how "easy" pandas makes this in Python (;)) and why I was wondering why JSON was preferred.
47c47 --> change the catalog
< catalog_url = 'https://observatory.research.icann.org/dns-core-census/v010/table/catalog.csv'
---
> catalog_url = 'https://observatory.research.icann.org/dns-core-census/v010/table/catalog.json'
51c51 --> read the catalog in the right format
< catalog = pd.read_csv (catalog_url,dtype=str,na_filter=False)
---
> catalog = pd.read_json (catalog_url,dtype=str)#,na_filter=False)
83c83 --> read each table in the right format
< dataframes[row['TABLE_TOPIC']] = pd.read_csv (read_file,dtype=str,na_filter=False)
---
> dataframes[row['TABLE_TOPIC']] = pd.read_json (read_file,dtype=str)#,na_filter=False)
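For readers who want the two snippets in context, here is a minimal sketch of the JSON path, under assumptions about the overall structure of the script: the catalog URL, the pd.read_json calls, and the TABLE_TOPIC column come from the diff above, while the TABLE_URL column used to locate each table is a hypothetical stand-in for however the real script builds read_file.

import pandas as pd

# catalog URL from the diff above
catalog_url = 'https://observatory.research.icann.org/dns-core-census/v010/table/catalog.json'

# read_json has no na_filter argument, hence the commented-out option in the diff
catalog = pd.read_json(catalog_url, dtype=str)

# read each table listed in the catalog, keyed by topic;
# 'TABLE_URL' is a hypothetical column name standing in for however
# the real script derives read_file from each catalog row
dataframes = {}
for _, row in catalog.iterrows():
    dataframes[row['TABLE_TOPIC']] = pd.read_json(row['TABLE_URL'], dtype=str)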