Let's say we want to assess the data quality of Company A's big data. Due
to security, privacy, and workload concerns, it is impossible to view or access A's whole data repository (data lake or data ocean).
We can only request a sample of Company A's big data and then apply a quality-assessment toolkit to analyze it.
My question is: how should such a data sample be drawn, and what requirements should we set for it?
Moreover, Company A may "optimize" or "decorate" the sample data that it gives out. What would be a good scheme or mechanism design
that lets us prevent or detect this "optimization" or "decoration"?
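
To make the question concrete, here is a rough sketch of one direction I can imagine, not a vetted protocol. It assumes Company A is willing to publish a digest for every record before it knows which records will be requested, and that we reveal our random seed only afterwards. The function names (`commit`, `choose_indices`, `verify`), the seed string, and the toy records are all made up for illustration.

```python
import hashlib
import random


def commit(records):
    """Company A publishes one SHA-256 digest per record before any sampling."""
    return [hashlib.sha256(r.encode("utf-8")).hexdigest() for r in records]


def choose_indices(seed, n_records, sample_size):
    """Auditor derives sample positions from a seed revealed only after the commitment."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_records), sample_size))


def verify(sample, commitment):
    """Return the indices whose delivered record does not match its committed digest."""
    return [
        i for i, rec in sample.items()
        if hashlib.sha256(rec.encode("utf-8")).hexdigest() != commitment[i]
    ]


# Toy data standing in for Company A's repository (hypothetical).
records = [f"row-{i},value={i * i}" for i in range(1000)]
commitment = commit(records)

# The auditor picks the seed after receiving the commitment.
indices = choose_indices("auditor-secret-seed", len(records), 20)

# An honest delivery passes; a "decorated" record is flagged.
honest = {i: records[i] for i in indices}
tampered = dict(honest)
first = indices[0]
tampered[first] = "row-cleaned"

print(verify(honest, commitment))    # no mismatches expected
print(verify(tampered, commitment))  # the altered index is reported
```

This only shows that the delivered records were not altered after the indices were chosen. It does not show that the committed records are A's real production data, and unsalted digests of low-entropy records could leak information, so I would treat both points as open parts of the question.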