parse_json
langroid/parsing/parse_json.py
is_valid_json(json_str)
Check whether the input string is valid JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
json_str | str | The input string to check. | required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if the input string is valid JSON, False otherwise. |
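A check of this kind can be sketched with the standard library alone. The snippet below is a minimal illustration, not the code in langroid/parsing/parse_json.py; the name `is_valid_json_sketch` is hypothetical, and the assumption is that "valid" means "accepted by `json.loads`".

```python
import json


def is_valid_json_sketch(json_str: str) -> bool:
    """Return True if json_str parses as JSON, False otherwise.

    Hypothetical helper for illustration; assumes validity means
    "json.loads accepts the string".
    """
    try:
        json.loads(json_str)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(is_valid_json_sketch('{"city": "New York"}'))  # quoted key: parses
print(is_valid_json_sketch('{city: "New York"}'))    # un-quoted key: rejected
```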
Source code in langroid/parsing/parse_json.py
flatten(nested_list)
Flatten a nested list into a single list of strings.
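One plausible way to flatten arbitrarily nested lists is simple recursion. The sketch below is an assumption about the behavior described above, not the library's implementation; `flatten_sketch` is a hypothetical name, and it assumes every non-list element is already a string.

```python
from typing import Any, List


def flatten_sketch(nested_list: List[Any]) -> List[str]:
    """Recursively flatten nested lists, preserving left-to-right order.

    Hypothetical helper; assumes non-list items are strings.
    """
    flat: List[str] = []
    for item in nested_list:
        if isinstance(item, list):
            # Descend into the sub-list and append its items in order.
            flat.extend(flatten_sketch(item))
        else:
            flat.append(item)
    return flat


print(flatten_sketch(["a", ["b", ["c", "d"]], "e"]))
```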
get_json_candidates(s)
Get top-level JSON candidates, i.e. strings between curly braces.
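Finding "strings between curly braces" at the top level can be sketched by tracking brace depth while scanning the input. This is an illustrative approach under stated assumptions, not the library's code: `get_json_candidates_sketch` is a hypothetical name, braces inside string literals are not treated specially, and unbalanced braces are simply skipped.

```python
from typing import List


def get_json_candidates_sketch(s: str) -> List[str]:
    """Return substrings spanning each balanced top-level {...} group.

    Hypothetical helper; does not account for braces inside quoted strings.
    """
    candidates: List[str] = []
    depth = 0
    start = -1
    for i, ch in enumerate(s):
        if ch == "{":
            if depth == 0:
                start = i  # a new top-level candidate begins here
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                candidates.append(s[start : i + 1])
    return candidates


text = 'noise {"a": {"b": 1}} more {"c": 2} end'
print(get_json_candidates_sketch(text))
```

The candidates are not guaranteed to be valid JSON; they are only balanced brace groups that a later step can validate or repair.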
add_quotes(s)
Replace accidentally un-quoted string-like keys and values in a potential JSON string. Intended to handle cases where a weak LLM produces a JSON-like string containing, e.g., "rent": DO-NOT-KNOW, where it "forgot" to put quotes on the value, or city: "New York", where it "forgot" to put quotes on the key. It will even handle cases like 'address: do not know'.
Got this fiendishly clever solution from https://stackoverflow.com/a/66053900/10940584. Far better/safer than trying to do it with regexes.
Args:
- s (str): The potential JSON string to parse.
Returns:
- str: The (potential) JSON string with un-quoted string-like values replaced by quoted values.
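To make the problem concrete, the snippet below shows why such strings need repair. It does not call or reimplement add_quotes: both strings are hand-written illustrations, with `intended` showing the kind of result the description above implies, and `parses` is a hypothetical helper built on the standard `json` module.

```python
import json


def parses(s: str) -> bool:
    """Hypothetical helper: True if json.loads accepts s."""
    try:
        json.loads(s)
        return True
    except ValueError:
        return False


# Hand-written examples (assumptions, not actual output of add_quotes):
llm_output = '{"rent": DO-NOT-KNOW, city: "New York"}'       # value and key un-quoted
intended = '{"rent": "DO-NOT-KNOW", "city": "New York"}'     # quotes restored

print(parses(llm_output))  # the raw LLM string is rejected by the parser
print(parses(intended))    # the quoted form is accepted
```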
repair_newlines(s)
Attempt to load as JSON, and if it fails, try with newlines replaced by spaces. Intended to handle cases where weak LLMs produce JSON-like strings in which some string values contain explicit newlines, e.g.:
{"text": "This is a text
with a newline"}
These would not be valid JSON, so we try to clean them up here.
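The load-then-retry idea can be sketched as follows. This is a minimal illustration, not the library's implementation: `repair_newlines_sketch` is a hypothetical name, and returning the input unchanged when the retry also fails is an assumption, since the description does not say what happens in that case.

```python
import json


def repair_newlines_sketch(s: str) -> str:
    """Return s if it parses; otherwise try again with newlines as spaces.

    Hypothetical helper. Assumption: if the retry also fails, the
    original string is returned unchanged.
    """
    try:
        json.loads(s)
        return s  # already valid, leave untouched
    except ValueError:
        pass
    candidate = s.replace("\n", " ")
    try:
        json.loads(candidate)
        return candidate
    except ValueError:
        return s


# The "\n" below is a literal newline inside the string value, which
# json.loads rejects as an invalid control character.
raw = '{"text": "This is a text\nwith a newline"}'
print(json.loads(repair_newlines_sketch(raw)))
```

Note that pretty-printed JSON, where newlines appear only between tokens, already parses on the first attempt and is returned as-is.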
extract_top_level_json(s)
Extract all top-level JSON-formatted substrings from a given string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s | str | The input string to search for JSON substrings. | required |
Returns:
Type | Description |
---|---|
List[str] | A list of top-level JSON-formatted substrings. |
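As a rough illustration of extracting JSON substrings from surrounding text, the sketch below uses the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON document starting at a given index and reports where it ended. This is a swapped-in technique, not the library's method: `extract_top_level_json_sketch` is a hypothetical name, and unlike the repair helpers described earlier it accepts only strictly valid JSON objects.

```python
import json
from typing import List


def extract_top_level_json_sketch(s: str) -> List[str]:
    """Return each substring of s that parses as a top-level JSON object.

    Hypothetical helper; performs no repair of malformed candidates.
    """
    decoder = json.JSONDecoder()
    found: List[str] = []
    i = 0
    while i < len(s):
        if s[i] != "{":
            i += 1
            continue
        try:
            # raw_decode returns (object, index just past the document).
            _, end = decoder.raw_decode(s, i)
        except ValueError:
            i += 1  # not valid JSON from here; keep scanning
            continue
        found.append(s[i:end])
        i = end  # skip past the object so nested ones are not re-reported
    return found


text = 'Here: {"a": {"b": 1}} and {not json} and {"c": [1, 2]}'
print(extract_top_level_json_sketch(text))
```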
top_level_json_field(s, f)
Extract the value of a field f from a top-level JSON object. If there are multiple, just return the first.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s | str | The input string to search for JSON substrings. | required |
f | str | The field to extract from the JSON object. | required |
Returns:
Name | Type | Description |
---|---|---|
str | Any | The value of the field f in the top-level JSON object, if any. Otherwise, return an empty string. |
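The lookup described above can be sketched with the standard library. This is an illustration under stated assumptions, not the library's code: `top_level_json_field_sketch` is a hypothetical name, "the first" is read as "the first top-level object that contains the field", and only strictly valid JSON objects are considered.

```python
import json
from typing import Any


def top_level_json_field_sketch(s: str, f: str) -> Any:
    """Return the value of field f from the first top-level JSON object
    in s that has it, or "" if none does.

    Hypothetical helper; performs no repair of malformed JSON.
    """
    decoder = json.JSONDecoder()
    i = 0
    while i < len(s):
        if s[i] != "{":
            i += 1
            continue
        try:
            obj, end = decoder.raw_decode(s, i)
        except ValueError:
            i += 1  # not parseable from this brace; keep scanning
            continue
        if f in obj:  # obj is a dict because parsing started at "{"
            return obj[f]
        i = end
    return ""


text = 'first {"name": "a"} then {"name": "b", "age": 3}'
print(top_level_json_field_sketch(text, "name"))
print(top_level_json_field_sketch(text, "age"))
```

Because the return type is Any, a caller cannot distinguish a missing field from a field whose value is the empty string.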