Portable Data Exchange (PDX) is VMware GemFire’s data serialization protocol for cross-language objects and JSON data. When a PDX-serializable object is serialized for the first time, a PdxType is generated for it. The PdxType represents the data structure of that object and is used to serialize and deserialize it.
PdxTypes can proliferate in the TypeRegistry, especially with unstructured JSON data. The structure of a class is the same for every instance of that class, but the same is not necessarily true of JSON data. If uniquely structured JSON data is added to a Region, a PdxType specific to that structure is generated. If that data is later deleted, its PdxType remains in the TypeRegistry as an unused orphan.
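The effect can be illustrated with a minimal sketch in plain Java. This is a hypothetical, simplified model and not GemFire's TypeRegistry: the class `TypeRegistrySketch`, its methods, and the sample documents are assumptions made for illustration. The model identifies a structure only by its sorted field names and Java value types, which is cruder than what a real PdxType records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of a type registry (not GemFire's TypeRegistry).
class TypeRegistrySketch {
    // Maps a structure signature (sorted field names plus value types) to a type id.
    private final Map<String, Integer> typeIdsBySignature = new LinkedHashMap<>();
    private int nextTypeId = 1;

    // Returns the type id for the document's structure, registering a new one if needed.
    int register(Map<String, Object> document) {
        List<String> fields = new ArrayList<>();
        // TreeMap sorts the field names, so this model ignores field order.
        for (Map.Entry<String, Object> e : new TreeMap<>(document).entrySet()) {
            Object value = e.getValue();
            String type = (value == null) ? "null" : value.getClass().getSimpleName();
            fields.add(e.getKey() + ":" + type);
        }
        String signature = String.join(",", fields);
        Integer existing = typeIdsBySignature.get(signature);
        if (existing != null) {
            return existing;
        }
        int id = nextTypeId++;
        typeIdsBySignature.put(signature, id);
        return id;
    }

    int typeCount() {
        return typeIdsBySignature.size();
    }

    public static void main(String[] args) {
        TypeRegistrySketch registry = new TypeRegistrySketch();
        Map<String, Map<String, Object>> region = new HashMap<>();

        Map<String, Object> first = new HashMap<>();
        first.put("name", "Ann");
        first.put("age", 41);

        Map<String, Object> second = new HashMap<>();
        second.put("name", "Bob");
        second.put("age", 37);

        // Same fields as above plus one extra field: a new structure.
        Map<String, Object> third = new HashMap<>();
        third.put("name", "Cy");
        third.put("age", 29);
        third.put("nickname", "C");

        region.put("1", first);
        region.put("2", second);
        region.put("3", third);
        region.values().forEach(registry::register);
        System.out.println("types after puts: " + registry.typeCount());

        // Deleting the data does not touch the registry: the type is now an orphan.
        region.remove("3");
        System.out.println("types after delete: " + registry.typeCount());
    }
}
```

In this model, the first two documents share one type, the third creates a second type, and removing the third document leaves both types registered.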
This article describes a way to remove unused PdxTypes from the Distributed System.
For additional information regarding PDX serialization, see the VMware GemFire documentation here.
Several examples of JSON data and the PdxType generated for each are shown below. Each field in the JSON data becomes a PdxField in the PdxType. A PdxField describes the field's name, its type, and the location in the serialized data that contains the field's value.
This is a very simple example of multiple PdxTypes being created for similar unstructured JSON data. In this case, the best approach is to standardize the JSON address on one format and convert all JSON to that format before storing it in a Region. In a more complex case, PdxTypes can proliferate easily, and many of them become unused if the JSON data is removed or becomes more standardized. The RemoveUnusedPdxTypesFunction below shows a way to remove any unused PdxTypes.
All source code described in this article as well as an example usage is available here. The RemoveUnusedPdxTypesFunction:
The PdxTypes are stored in a replicated Region called PdxTypes. EnumInfo instances (which represent Enums in PDX) are also stored in that Region. This method filters those out and returns just the PdxTypes.
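import java.util.Map;
import java.util.stream.Collectors;
import org.apache.geode.pdx.internal.PdxType;

// Returns only the PdxType entries of the PdxTypes Region, skipping EnumInfo entries.
private Map getAllPdxTypes() {
  return this.cache.getRegion("PdxTypes").entrySet()
      .stream()
      .filter(entry -> entry.getValue() instanceof PdxType)
      .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
}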
This method recursively removes every PdxType used by the input object from the collection of candidate PdxTypes. If the object is a PdxInstanceImpl, the method gets its PdxType and removes that type's id from the collection. It then iterates each field of the PdxInstanceImpl and calls itself on the field's value. Collections and Maps are iterated separately, but every PDX object they contain eventually reaches the first conditional.
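import java.util.Collection;
import java.util.Map;
import org.apache.geode.pdx.internal.PdxField;
import org.apache.geode.pdx.internal.PdxInstanceImpl;
import org.apache.geode.pdx.internal.PdxType;

private void removeInUsePdxTypes(Map<Integer,PdxType> allPdxTypesCopy, Object parent,
    String objFieldName, Object obj) {
  if (obj instanceof PdxInstanceImpl) {
    // The type of this instance is in use, so it is no longer a removal candidate.
    PdxInstanceImpl pdxInstance = (PdxInstanceImpl) obj;
    PdxType pdxType = pdxInstance.getPdxType();
    allPdxTypesCopy.remove(pdxType.getTypeId());
    // Recurse into each field, since field values can be nested PDX instances.
    for (PdxField field : pdxType.getFields()) {
      String fieldName = field.getFieldName();
      Object fieldValue = pdxInstance.readField(fieldName);
      removeInUsePdxTypes(allPdxTypesCopy, obj, fieldName, fieldValue);
    }
  } else if (obj instanceof Collection) {
    // Cast to Collection rather than List so that Sets and other Collections do not
    // cause a ClassCastException.
    ((Collection<?>) obj).forEach(
        value -> removeInUsePdxTypes(allPdxTypesCopy, obj, objFieldName, value));
  } else if (obj instanceof Map) {
    // Both keys and values can contain PDX instances.
    ((Map<?, ?>) obj).forEach((key, value) -> {
      removeInUsePdxTypes(allPdxTypesCopy, obj, objFieldName, key);
      removeInUsePdxTypes(allPdxTypesCopy, obj, objFieldName, value);
    });
  }
}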
Any PdxTypes remaining in the collection are unused and are removed from the Region with removeAll:
this.cache.getRegion("PdxTypes").removeAll(allPdxTypesCopy.keySet());
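Taken together, the steps above resemble a mark-and-sweep pass: start with every type id as a candidate, strike out each id reachable from the stored data, and delete what remains. The following is a minimal sketch of that idea in plain Java, independent of GemFire. The names `UnusedTypeSweepSketch`, `TypedValue`, `removeInUse`, and `findUnused` are hypothetical, and passing the data in as a single collection of values is an assumption made to keep the example self-contained.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UnusedTypeSweepSketch {
    // Hypothetical stand-in for a PDX instance: a type id plus named field values.
    static final class TypedValue {
        final int typeId;
        final Map<String, Object> fields;

        TypedValue(int typeId, Map<String, Object> fields) {
            this.typeId = typeId;
            this.fields = fields;
        }
    }

    // Mark phase: removes from 'candidates' every type id reachable from 'obj'.
    static void removeInUse(Set<Integer> candidates, Object obj) {
        if (obj instanceof TypedValue) {
            TypedValue typed = (TypedValue) obj;
            candidates.remove(typed.typeId);
            typed.fields.values().forEach(value -> removeInUse(candidates, value));
        } else if (obj instanceof Collection) {
            ((Collection<?>) obj).forEach(value -> removeInUse(candidates, value));
        } else if (obj instanceof Map) {
            ((Map<?, ?>) obj).forEach((key, value) -> {
                removeInUse(candidates, key);
                removeInUse(candidates, value);
            });
        }
        // Any other object (String, Integer, null, ...) carries no type id.
    }

    // Returns the type ids that no stored value references, directly or through nesting.
    static Set<Integer> findUnused(Set<Integer> allTypeIds, Collection<?> storedValues) {
        Set<Integer> candidates = new HashSet<>(allTypeIds); // work on a copy
        storedValues.forEach(value -> removeInUse(candidates, value));
        return candidates;
    }

    public static void main(String[] args) {
        TypedValue address = new TypedValue(2, Map.of("city", "Portland"));
        TypedValue customer =
            new TypedValue(1, Map.of("name", "Ann", "addresses", List.of(address)));

        // Type ids 1 and 2 are reachable from the stored customer; 3 is not.
        Set<Integer> unused = findUnused(Set.of(1, 2, 3), List.of(customer));
        System.out.println(unused); // prints [3]
    }
}
```

Working on a copy of the candidate set mirrors the allPdxTypesCopy parameter in the method above: the sweep only decides what is unused, and the actual removal is a separate step.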
Here are a few caveats and comments:
The concepts in this Function can become the implementation of a gfsh command that removes unused PdxTypes from an offline Distributed System. In order to provide this behavior for an online Distributed System, the Function would have to be modified to use the TypeRegistry so that proper locking is done around access to the PdxTypes Region. Also, the TypeRegistry would have to be enhanced to be able to delete PdxTypes from itself as well as the PdxTypes Region.