[dns-operations] Follow up to the talk - Beta availability of Two Data Sets
Edward Lewis
edward.lewis at icann.org
Tue Feb 22 17:48:33 UTC 2022
(This isn't operational, but it relates to the DNS-OARC workshop held last week, which raised a side question: ought there to be a dns-research at lists.dns-oarc.net?)
As a follow-up to comments that JSON is necessary: I've added a JSON version of each CSV file on the DNS Core Census website, even a JSON version of the catalog. The catalog.csv lists only the CSV files; the catalog.json lists only the JSON files. I figure that is appropriate.
For those who didn't attend, the slides for the talk are here: https://indico.dns-oarc.net/event/42/contributions/903/attachments/872/1594/Beta%20Availability%20of%20two%20TLD%20Data%20Products.pdf
On slide 8, where it mentions CSV, there is now JSON as well (including csv.gz -> json.gz).
Uncompressed, JSON is 2-3 times the size of CSV; compressed, JSON is about half the size of CSV. I'd never have expected that.
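If you want to check those numbers yourself, a minimal sketch along these lines works (this is not part of the census code; the only thing taken from the post is the catalog URL that also appears in the diff below):

import gzip
import urllib.request

base = 'https://observatory.research.icann.org/dns-core-census/v010/table/catalog'

for ext in ('csv', 'json'):
    # fetch the catalog in this format
    with urllib.request.urlopen(base + '.' + ext) as resp:
        data = resp.read()
    # compare the raw size with the gzip-compressed size
    print(ext, len(data), 'bytes raw,', len(gzip.compress(data)), 'bytes gzipped')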
In addition, in the code directory there are now these two scripts demonstrating how to download the census:
get_dns_core_census_from_web_via_csv.py
get_dns_core_census_from_web_via_json.py
The diff between the two is below, showing how "easy" pandas makes this in Python (;)) and why I was wondering why JSON was preferred.
47c47 --> change the catalog
< catalog_url = 'https://observatory.research.icann.org/dns-core-census/v010/table/catalog.csv'
---
> catalog_url = 'https://observatory.research.icann.org/dns-core-census/v010/table/catalog.json'
51c51 --> read the catalog in the right format
< catalog = pd.read_csv (catalog_url,dtype=str,na_filter=False)
---
> catalog = pd.read_json (catalog_url,dtype=str)#,na_filter=False)
83c83 --> read each table in the right format
< dataframes[row['TABLE_TOPIC']] = pd.read_csv (read_file,dtype=str,na_filter=False)
---
> dataframes[row['TABLE_TOPIC']] = pd.read_json (read_file,dtype=str)#,na_filter=False)
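For readers who want the two snippets in context, here is a minimal sketch of the JSON path, under assumptions about the overall structure of the script: the catalog URL, the pd.read_json calls, and the TABLE_TOPIC column come from the diff above, while the TABLE_URL column used to locate each table is a hypothetical stand-in for however the real script builds read_file.

import pandas as pd

# catalog URL from the diff above
catalog_url = 'https://observatory.research.icann.org/dns-core-census/v010/table/catalog.json'

# read_json has no na_filter argument, hence the commented-out option in the diff
catalog = pd.read_json(catalog_url, dtype=str)

# read each table listed in the catalog, keyed by topic;
# 'TABLE_URL' is a hypothetical column name standing in for however
# the real script derives read_file from each catalog row
dataframes = {}
for _, row in catalog.iterrows():
    dataframes[row['TABLE_TOPIC']] = pd.read_json(row['TABLE_URL'], dtype=str)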