Apologies, I am not familiar with clustering algorithms, so I am not really sure how this all hangs together.
I would recommend storing as little data on-chain as possible, and the same goes for processing.
For example, you might store only the hash of your dataset, the name of the algorithm (or a hash of the algorithm) and the results (or, if the results are large, the hash of the results) on-chain.
As you suggested, the data could be stored in decentralized storage such as IPFS.
This would allow someone to access the dataset, run the algorithm themselves and compare their results with what is stored in the contract.
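To make the verification idea a bit more concrete, here is a rough off-chain sketch in Python. It is only an illustration under several assumptions: the record layout (`dataset_hash`, `algorithm`, `results_hash`), the function names and `toy_algorithm` are all made up for this example, and the toy function is just a stand-in for your real clustering step. I have used SHA-256 from the standard library; a contract would more commonly use keccak256, which is a different function, so you would need to match whatever hash the contract actually uses.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def hash_results(results) -> str:
    """Hash results via a canonical JSON encoding so equal results hash equally."""
    canonical = json.dumps(results, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical.encode("utf-8"))


def make_record(dataset: bytes, algorithm_name: str, results) -> dict:
    """Build the small record that would live on-chain (hypothetical layout)."""
    return {
        "dataset_hash": sha256_hex(dataset),
        "algorithm": algorithm_name,
        "results_hash": hash_results(results),
    }


def verify(record: dict, dataset: bytes, algorithm_name: str, results) -> bool:
    """Recompute the record off-chain and compare it with the stored one."""
    return record == make_record(dataset, algorithm_name, results)


def toy_algorithm(dataset: bytes) -> dict:
    """Stand-in for the real clustering step: labels each row by a threshold."""
    rows = [line.split(",") for line in dataset.decode("utf-8").splitlines() if line]
    labels = [0 if float(row[0]) < 5.0 else 1 for row in rows]
    return {"labels": labels}


if __name__ == "__main__":
    dataset = b"1.0,2.0\n1.1,2.1\n9.0,9.5\n"  # made-up data, e.g. fetched from IPFS
    record = make_record(dataset, "toy-v1", toy_algorithm(dataset))
    # A verifier re-runs the algorithm on the same data and checks the record.
    print(verify(record, dataset, "toy-v1", toy_algorithm(dataset)))
```

The point of the canonical JSON step is that two people who get the same results also get the same hash, regardless of key order. Note that this only works if the algorithm is deterministic (for example, a fixed random seed), otherwise an honest verifier could get a different result hash.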
You can use Remix to experiment with simple contracts.
It might help if you can expand a bit more on what you are trying to do.