| Big Data | Data sets that range from many terabytes to petabytes in size, and that usually consist of less-structured information such as Web log files. |
| Hadoop cluster | A type of scalable computer cluster inspired by the Google Cluster Architecture and intended for cost-effectively processing less-structured information. |
| Apache Hadoop | The core of an open-source ecosystem that makes Big Data analysis more feasible through the efficient use of commodity computer clusters. |
| Cascading | A bridge from Hadoop to common Java-based programming techniques not previously usable in cluster-computing environments. |
| NoSQL | A class of non-relational data stores and data analysis techniques that are intended for various kinds of less-structured data. Many of these techniques are part of the Hadoop ecosystem. |
| Gray Data | Data from multiple sources that isn’t formatted or vetted for specific needs, but worth exploring with the help of Hadoop cluster analysis techniques. |